A noise-correcting function F, one that takes an input sequence with random errors in it and produces an output with the errors corrected, is another type of map that reduces the effective size of an input set. For example, consider all the possible character sequences constituting correct and meaningful English text but with misspellings. Since each sequence without misspellings has many possible misspellings, the number of sequences with misspellings is much greater than the number without, and so an F which corrects the misspellings compresses the input set (with misspellings) into a much smaller set
One way to make a map M more efficient is to assume it is a composition of maps, and make each constituent map more efficient. For example, let's make M into a composition Mb * Ma, where Ma takes all n^1000 possible inputs and produces n^100 possible outputs, and Mb takes those n^100 possible outputs and produces M's correct output. If Ma is fast, and Mb is in the same speed class as the original M but runs over that much smaller input set, then M's new cost is almost certainly less than its original cost. In this case, you're effectively weeding out (using Ma) the inputs that don't contribute to the output: all the inputs that have a low probability of producing a given output are filtered out by Ma
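Here's a runnable toy sketch of that Mb * Ma idea in Python (all of these functions are illustrations I made up, not anyone's real code): Ma is a cheap filter that discards candidates unlikely to matter, and Mb is the "expensive" scorer that only runs on the survivors.

```python
# A toy sketch of the Mb * Ma composition: Ma cheaply prunes the candidate
# inputs, and only the survivors pay Mb's full evaluation cost.
def ma_filter(candidates, hint, threshold=0.1):
    # Ma: cheap pass that keeps candidates whose rough score clears a threshold
    return [c for c in candidates if hint(c) >= threshold]

def mb_score(candidate):
    # Mb: stand-in for the costly, accurate evaluation (here a toy formula)
    return sum(ord(ch) for ch in candidate) / (len(candidate) or 1)

def composed_m(candidates, hint):
    survivors = ma_filter(candidates, hint)   # huge set -> much smaller set
    return max(survivors, key=mb_score)       # expensive step runs on few inputs

# toy usage: the hint is a crude, cheap proxy for the real score
candidates = ["aaa", "abc", "zzz", "qqq", "xyz"]
print(composed_m(candidates, hint=lambda c: 1.0 if "z" in c else 0.0))  # "zzz"
```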
A perfect Bayesian agent would be able to predict the correct probabilities for possible outcomes based on all of the information it's seen so far, regardless of how huge and complex the thing it's predicting is, or how much information it's seen. In terms of machine learning we can actually make such an agent with a simple 2-deep ANN, but the computation cost of doing so, even for simple systems, can be extreme. The challenge is really how to make such an agent efficient
In fact, almost all of those length-1000 input sequences are meaningless noise. This is also true for images: imagine an image with most of it covered up, where the little bit you can see is comprehensible. The set of possible images the whole image could be is huge, and most of those images are random noise except for the part you can see. In terms of machine learning, a more efficient map M would be one that can quickly weed out the possible input sequences that don't effectively contribute to inferring the output
For a map M from an input set X to an output set Y, I'd imagine that for a given particular output y, the set of inputs mapping to y with probabilities greater than some reasonable threshold value is typically tiny compared to the size of X. I want to say: i.e., output-conditional input set probabilities are typically highly modular (this might not be an accurate compression of the first sentence). For instance, in text prediction given some large input sequence (say the last 1000 characters), for some predicted character y, almost all of the n^1000 possible input sequences have probabilities essentially equal to 0, but there is some relatively small set of possible input sequences with probabilities much closer to 1
Long. Me talking about the programming language (SetTL) I'm developing
Here's a snippet of an e2ee (end-to-end execution) test for SetTL -- the language I'm writing with set-algebra-isomorphic typing. It's in the earliest stage of development right now, and I haven't added any of the actually unique features I'm developing yet. But this snippet is interesting in itself, I think. Its syntax is guaranteed to be unattractive to most people for funny reasons: it's sort of like a Lisp (so unattractive to many, many people), but with the function heads on the outside of the parentheses, and the closing parentheses of multiline blocks on their own lines (blasphemous). And notice the inconsistent use of commas. Commas, semicolons, and newlines actually do absolutely nothing right now in the core syntax (they're equivalent to a space), despite being tokenized; they're just immediately deleted after tokenization. I'm a fan of adding extra information to aid comprehension when programming (this is also one of the cool parts about tag types and set typing), so the commas help make the code comprehensible for now (and I removed some here for demonstrative purposes)
```settl
do(
  set(x, 100)
  set(steps, 0)
  while(>(x 0), do(
    set(x, -(x 10))
    set(steps, +(steps, 1))
  ))
  assert(==(x 0))
  assert(==(steps, 10))
)
```
This is the *core syntax* of SetTL and is a compromise between easily parsable syntax and easily comprehensible syntax. Now I'm going to spend a few days / weeks (not weeks, hopefully) incorporating some of the features of the *extended syntax* to make it faster and smoother to program in. Specifically, I want to eliminate parenthesis pairs so the actual path your cursor takes when writing or modifying any particular line is more linear. As it is, you have to move your cursor around a lot when blocking any particular sequence of elements, because you have to traverse the entire sequence to add the block opener and closer (ie: `(` and `)`)
In the (hopefully) near future, SetTL's syntax will look more like this (when using extended syntax):
```settl
{
  let x = 100
  let steps = 0
  while x > 0 {
    x = x - 10
    steps = steps + 1
  }
  assert x == 0
  assert steps == 10
}
```
Which involves, in no particular order:
* Line call parsing: `[\n,;]? foo x1 x2 ... xn [\n,;]?` eq to `foo(x1 x2 ... xn)` in do blocks `{...}`
* Line call parsing: `[,;]? foo x1 x2 ... xn [,;]?` eq to `foo(x1 x2 ... xn)` everywhere else
* Curly brace do block replacement: `{...}` eq to `do(...)`
* `let(... = ...)` normalization to `set(... ...)` (note: `let` will have a *lot* more power in the future if all goes well)
* Infix operations `a + b` eq to `+(a b)`
* Chained infix flattening `x1 + x2 + ... + xn` eq to `+(x1 x2 ... xn)` instead of `+(x1 +(x2 +(x3 ...)))`
These additions will make commas, semicolons, and newlines act like proper separators, so for instance you'll be able to do `set x 100; set steps 0`. And the core syntax will still always work, so even without line call separators you could do `set(x 100) set(steps 0)`
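As a rough illustration of one of these rules, here's a toy Python sketch (not SetTL's actual parser) of the chained-infix flattening: a token sequence `x1 + x2 + ... + xn` becomes a single call `+(x1 x2 ... xn)` rather than a right-nested `+(x1 +(x2 +(x3 ...)))`.

```python
# Toy illustration of chained-infix flattening:
# ["x1", "+", "x2", ..., "+", "xn"] -> "+(x1 x2 ... xn)"
def flatten_infix(tokens, op="+"):
    operands = [t for t in tokens if t != op]
    separators = [t for t in tokens if t == op]
    if len(separators) != len(operands) - 1:
        raise ValueError("not a simple chained infix expression")
    return f"{op}({' '.join(operands)})"

print(flatten_infix(["a", "+", "b", "+", "c"]))  # prints: +(a b c)
```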
If you have any questions or are interested in SetTL, feel free to talk to me :)
I tested how long it takes to entirely remove all nodes from a randomly constructed tree, where removing a parent node also removes all child nodes below it, and it seems that on average the removal count scales like O(sqrt(n)), where n is the number of initial nodes in the random tree. If you remove the nodes one by one, with no subtree removal on parent removal, then it takes n steps to remove all the nodes
Note: the O(sqrt(n)) complexity depends on what type of random tree is constructed; I built the random trees for this test by adding each new child to a uniformly-selected random node already in the tree
Now to try and find out why this is true, theoretically (I imagine it's either really simple or really complex)
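For reference, here's a small Python simulation of the experiment as described above; the tree model (attach each new node to a uniformly-selected existing node) matches the note above, while the counting loop is my own rough reconstruction rather than the original test script.

```python
# Grow a tree by attaching each new node to a uniformly random existing node,
# then count how many random picks (removing the picked node plus its whole
# subtree) it takes to empty the tree, and compare the average to sqrt(n).
import random

def build_random_tree(n):
    parent = [None] * n                     # parent[i] = parent index of node i
    for i in range(1, n):
        parent[i] = random.randrange(i)     # uniform over the nodes added so far
    return parent

def removals_to_destroy(parent):
    n = len(parent)
    alive = [True] * n
    removals = 0
    while any(alive):
        v = random.choice([i for i in range(n) if alive[i]])
        doomed = {v}
        for i in range(n):                  # indices are in topological order,
            if parent[i] in doomed:         # so one forward pass finds the subtree
                doomed.add(i)
        for i in doomed:
            alive[i] = False
        removals += 1
    return removals

n = 10_000
trials = [removals_to_destroy(build_random_tree(n)) for _ in range(10)]
print(sum(trials) / len(trials), n ** 0.5)  # average removal count vs sqrt(n)
```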
long and stupid
(Written while programming today. Ignore this post!)
Programming today:
Writing a node replacement function `replace_node_with_new` for my [python shallow tree library](https://github.com/jmacc93/shtree_py)
I find I need to be able to remove each child subtree from the to-be-removed node (or else I might end up with an accidental forest)
So I have to make a subtree remover function `remove_subtree`. But it's probably a better idea to change the existing `remove_node` function so it has an option to also remove child nodes recursively (thus removing the subtree). I could use recursion for that, but some subtrees might be really deep, so I'll have to make a leaf-first node index generator to iterate a subtree so I can remove it without function recursion. Which is tricky, because if the node you yielded is mutated (eg: removed) after the yield, and you try to get properties from that node, that will fail or result in undefined behavior
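Here's a rough sketch of what such a leaf-first (post-order) generator can look like with an explicit stack; the plain dict-of-children tree is just a stand-in for shtree_py's real node accessors. Because every descendant of a node is yielded before the node itself, each yielded node is safe to remove immediately.

```python
# Leaf-first (post-order) generator using an explicit stack instead of recursion
def leaf_first(children, root):
    # children: dict mapping node -> list of child nodes (stand-in accessor)
    stack = [(root, iter(children.get(root, ())))]
    while stack:
        node, it = stack[-1]
        child = next(it, None)
        if child is None:
            stack.pop()
            yield node                         # every descendant already yielded
        else:
            stack.append((child, iter(children.get(child, ()))))

# small usage example: 0 -> {1, 2}, 1 -> {3}
tree = {0: [1, 2], 1: [3]}
print(list(leaf_first(tree, 0)))               # [3, 1, 2, 0]: leaves before parents
```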
---
Ok, finished with the subtree node generator and its test. And with `remove_node` and `replace_node_with_new`. I'm also mostly finished with a `replace_node` that I wrote, but I ran into a bug, so I'll shelve that function for now. For reference, this all took 2 hours
Ran into some test failures with `remove_node`, which I solved by collecting all the traversed child subtree nodes and then just nullifying them in the tree (setting them to `None` nullifies them in this case), which is much faster (and evidently less error prone) than calling `disconnect_parent`, etc on each child. While doing this, I noticed that my random-tree-building-and-destroying `remove_node` unit test destroys the tree it builds much more quickly when it picks a random node and removes that node together with its subtree, rather than just removing the node itself. This makes sense, but now I'm wondering how many random selections and `remove_node` calls with subtree removal it takes to destroy a tree on average. I might write a script to test this later
---
And now that I have `replace_node_with_new`, I can move on to completing `greater_than_ef`, which is what SetTL's draft executor calls when it encounters a `>` function (or it will, when I finish it), and which needed `replace_node_with_new` to be finishable
Ok, I finished `greater_than_ef` and it seems to work, but I discovered that the executor is now skipping certain call nodes and jumping straight to those nodes' parents, instead of calling the call nodes and then continuing to their parents
I found that this problem was because I wasn't returning `True` from `print_ef`, which is an external function that removes itself during its execution, and so also advances the execution head itself. External functions like `print_ef` can advance to the next executable node on their own, and if they don't, the executor will advance after they're done; but to signal to the executor that an external function has already advanced, and that the executor shouldn't advance automatically, the external function should return `True`, or something else that isn't `None`. Since I wasn't returning `True`, `print_ef` was advancing and then the executor was advancing automatically as well, so every time I called `print_ef` while testing `greater_than_ef` it would skip over one node during execution
Anyway, I fixed that, and after fixing it I discovered that I can cheat a little with simple external functions: I made `generic_fn_ef`, which creates a closure that acts like `greater_than_ef`, a less-than ef, any other n-ary operation, and any other kind of function that just takes arguments and returns a result. So I can even replace `print_ef`, and probably any other function that doesn't mess around with the stack or have to return non-value nodes, or whatever.
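For illustration, a hedged Python sketch of the `generic_fn_ef` idea; the executor interface shown here (a hypothetical `push_value` method, and returning `None` so the executor advances automatically, per the `print_ef` note above) is simplified and may not match the draft executor's real signatures.

```python
# Wrap any plain arguments-in / result-out function in a closure that can act
# like a hand-written external function. The executor API here is hypothetical.
import operator

def generic_fn_ef(fn):
    def ef(executor, arg_values):
        result = fn(*arg_values)       # just take arguments and return a result
        executor.push_value(result)    # hypothetical: hand the value back
        return None                    # None => the executor advances automatically
    return ef

greater_than_ef = generic_fn_ef(operator.gt)
less_than_ef    = generic_fn_ef(operator.lt)
sum_ef          = generic_fn_ef(lambda *xs: sum(xs))  # n-ary operations work too
```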
This took around 2 hours
---
Now, since I have a condition to test with (`5 < 10`): onto testing the `if_ef` implementation I wrote yesterday -- which is the analog of `if`, `if-else`, `if-else-if`, etc statements
I'm immediately running into the issue that the 3 print statements in the `if` statement I'm testing are all printing, when only one should print, depending on which condition in the `if` arguments evaluates to true.
..
After about 2 hours of running into multiple small bugs, `if_ef` apparently works correctly
A circular definition for sublimate properties: a property that cannot be explained in terms of non-sublimate properties
If we assume the Reality exists *explicably*, then its existence must be sublimate since we cannot otherwise explain its existence without a composition of non-sublimate axioms (which themselves are inexplicably fundamental), or infinite regress (which, incidentally, potentially involves sublimate processes)
Theoretically, explanations like the Omega Model imply an infinite regress of further higher-order sublimate property categories. Note: the Omega Model (OM) is the model that there is no false thing; that literally everything is true, and exists. This presents some strangenesses, like physicalism and dualism both being true simultaneously, and contradictory and paradoxical things also being unequivocally true; but otherwise the OM not-so-neatly sidesteps the problem of universal specificness (the problem that the Universe in particular, and the Reality in general, is limited to a specific form, rather than being completely unlimited; note: it may not be, even without the OM, and this implies sublimate properties as well)
At a looser, conceptual level, sublimate things are fundamentally (even: beyond fundamentally, sublimately) distinct from non-sublimate things in that there are no non-sublimate things that can combine to make a sublimate thing. They're sort of like stepping into a different dimension of information and ontology: no matter what you do, you cannot escape your N dimensions, but there may be some other things that exist outside your dimension, and they may make your dimension possible (eg: a ball making your sphere surface possible). Though, all analogies necessarily fail absolutely and completely to explain sublimate things, for obvious reasons
A further note: the whole sublimate / not sublimate distinction sure harkens to the divine / not divine distinction
Anyway, I like me some pizza!
While messing around in MMA, I was trying to draw shapes in `Spectrogram`s of `Audio@Table[__]`s and I discovered that you can make a curve appear in the spectrogram with `Sin[F[t]]` where `F[t] := Integrate[f[x],{x,0,t}]` and `f` is the function of the curve you want to appear in the spectrogram, from time to frequency. And if you want to color in a region in the spectrogram, you add the curves a la `Audio@Sum[Table[__],_]`. To get something like white / brown / etc noise you add up a bunch of your function with a random sample of some distribution for a parameter, eg: `fa t + f (-fa + fb) t` linearly interpolates between `fa` and `fb`, and the parameter `f` is the interpolating parameter. If you insert a random uniform on `[0, 1]` in for `f`, then you fill in the region from `[fa, fb]` in your spectrogram
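The same trick sketched in Python/NumPy rather than Mathematica (my own toy version, with the 2π factor written explicitly since the frequencies are in Hz): synthesize `sin(2π F(t))`, where `F` is the running integral of the time-to-frequency curve `f`, approximated by a cumulative sum.

```python
# A curve f(t) appears in the spectrogram of sin(2*pi*F(t)), where F is the
# running integral of f, approximated here with a cumulative sum.
import numpy as np

sr = 8000                      # sample rate in Hz
t = np.arange(0, 3, 1 / sr)    # 3 seconds of samples

f = 500 + 300 * t              # the time -> frequency curve you want to see
F = np.cumsum(f) / sr          # F(t) ~ integral of f from 0 to t
signal = np.sin(2 * np.pi * F)

# Summing many such signals with a random interpolation parameter fills in the
# band between two curves fa(t) and fb(t), as described above
fa = 500 + 0 * t
fb = 1500 + 0 * t
band = sum(np.sin(2 * np.pi * np.cumsum(fa + u * (fb - fa)) / sr)
           for u in np.random.uniform(0, 1, 50))
# feed `signal` or `band` into a spectrogram (eg: scipy.signal.spectrogram) to view
```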
Idea: in the usual web ChatGPT interface: a subdialog where you can discuss with ChatGPT about a particular response it produced and what it did well, what it did poorly, and how it could improve, to aid in its development
I have to imagine that the future of LLMs for chat / dialog like ChatGPT involves training using its own predictions about what it did well and what it could improve, then retraining to correct what it didn't do well
Idea: train a smaller LLM / classifier which takes input text and produces YES/NO/MAYBE/FAIL answers by generating training data using ChatGPT or another fluent LLM
You can potentially generate training data given a set of input questions: you append the subprompt `(Only give Yes, No, Maybe, or Fail answers. An answer that isn't Yes, No, or Maybe should be Fail)` to each question and feed them into ChatGPT. Its responses (if they match Yes, No, or Maybe; anything else is implicitly Fail) are the unit-vector outputs used to train the new classifier
You could also potentially produce training data by taking random snippets S of text from some large dataset of arbitrary text, and ask ChatGPT: `Given the text "S", please list N questions related to the above text that can be answered with Yes, No, or Maybe, and at the end of each question write their answer (one of: Yes, No, or Maybe)`. Where `N` is some small integer (maybe `5 <= N <= 100`)
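A minimal sketch of the first data-generation recipe, assuming some `ask_llm(prompt)` function that returns the chat model's text response (that function, and the exact label parsing, are placeholders for illustration):

```python
# Build (question text, unit-vector label) training pairs from a chat model
SUBPROMPT = ("(Only give Yes, No, Maybe, or Fail answers. "
             "An answer that isn't Yes, No, or Maybe should be Fail)")
LABELS = ["Yes", "No", "Maybe", "Fail"]

def label_of(response):
    # anything that isn't Yes / No / Maybe is implicitly Fail
    words = response.strip().split()
    word = words[0].strip(".,!").capitalize() if words else "Fail"
    return word if word in LABELS[:3] else "Fail"

def make_examples(questions, ask_llm):
    examples = []
    for q in questions:
        response = ask_llm(f"{q} {SUBPROMPT}")
        target = [1.0 if label == label_of(response) else 0.0 for label in LABELS]
        examples.append((q, target))   # (input text, unit-vector output)
    return examples
```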
This classifier could potentially be used to update a system that is keeping track of how some human-programmable state is evolving when the evolved state is not human-programmable but human-describable: you evolve the system and describe it in text, then ask a finite set of questions to synchronize the programmable state with the new system state description
For example, anyone who played the old AI Dungeon back when it used GPT-2 (and probably still now), or who has played a text adventure using ChatGPT (which is really fun: try it out!), knows that the finite length of the input for those systems means they lose track of information frequently, and there are a lot of small details that are lost in general. A human-programmable text adventure, on the other hand, has limited generality, but has a definitive state. With the above classifier you could potentially make a program with a definitive, human-programmable state, evolve the state using a LLM, then update the human-programmable state with the new state's text-description using the classifier
This same technique might be useful for LLMs themselves to generate notes to augment their memories
Another stable diffusion controlnet idea:
A module similar to the reference preprocessor but with a text prompt. The prompt controls what the model's attention goes to in the reference image. Presumably this would allow you to reference just one feature of the reference image, and essentially ignore everything else
Averaging different seeds at the same denoising strength in img2img shows the scale at which that denoising strength affects the image
As seen in this video / image: https://imgur.com/a/pz2BCgS
@trinsec Yeah, definitely! Here's an image with a diagram of my anonymized main workflow and tasking canvas. Sorry it's cluttered
I really like having a representation of what I'm going to do laid out in time, but most tasks take an indeterminate amount of time to complete, and I also want to limit as much as I can the amount of choice I have when adding and doing tasks, so I evolved away from the calendar-like canvas I used originally. Now, on a single canvas, I use a week-scale timeline of tasks that have definite start and end times, a logarithmic timeline of goals I want to achieve, and I use a sort-of kanban-like card system for most tasks
Since I can't automate the movement of cards (yet), I want to limit the amount of habitual updates I have to make each day, so, for example, the week timeline has a fixed order, and I just cycle an image of an arrow around to indicate what to look at. A note: you can't independently scale the size of markdown cards, but you can scale the size of image cards as much as you want, so images are a good way to indicate where things are when you're zoomed out. Along that same line, group headers are always visible, and scale up when you zoom out, so they can also be used similarly (though the headers can overlap with stuff, whereas images can't)
The kanban-like system takes the form of an ordered list of (FIFO) queues, each of which is split into a top and a bottom section: the tasks at the top I do whenever the arrow gets to that queue, and the tasks at the bottom I can optionally add to the top to do when I come back around, but I have to keep the order they're in. I always try to add things to the very bottom of whichever (top / bottom) part of a queue, to minimize the amount of choice I have in which order things are added. I don't really want to introduce bias into which tasks I'm doing, though the length of tasks obviously introduces bias into how long I spend doing certain tasks. I'm betting there is an optimal queue-ordering algorithm given you know the distribution of task lengths, or similar
Idea for stable diffusion: train a model to correct an image which has been randomly deformed. It may be cheap enough to use Perlin noise, or similar, to generate random deformations, but otherwise something like GIMP's pick noise, which just randomly exchanges pixels with nearby pixels n times, may be faster
Theoretically, you could use a regular image as the initial noisy image, and the model would then deform it to match what it thinks is the denoised equivalent. This might allow for, eg: correction of anatomical problems for characters, composition problems, etc
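For reference, a rough toy reimplementation of the pick-noise-style deformation mentioned above (my own sketch, not GIMP's actual filter): swap each of `n_swaps` randomly chosen pixels with a random nearby pixel.

```python
# Toy pick-noise-style deformation: swap random pixels with random nearby pixels
import numpy as np

def pick_noise(image, n_swaps=10_000, radius=2, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    out = image.copy()
    h, w = out.shape[:2]
    ys = rng.integers(0, h, n_swaps)
    xs = rng.integers(0, w, n_swaps)
    dys = rng.integers(-radius, radius + 1, n_swaps)
    dxs = rng.integers(-radius, radius + 1, n_swaps)
    for y, x, dy, dx in zip(ys, xs, dys, dxs):
        ny = int(np.clip(y + dy, 0, h - 1))
        nx = int(np.clip(x + dx, 0, w - 1))
        tmp = np.copy(out[y, x])          # works for grayscale and RGB pixels
        out[y, x] = out[ny, nx]
        out[ny, nx] = tmp
    return out
```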
I've been using [obsidian](https://obsidian.md/)'s [canvas](https://obsidian.md/canvas?trk=public_post-text) feature as a sort-of multi-dimensional [kanban board](https://en.wikipedia.org/wiki/Kanban_board) plus calendar and goal timeline, and it works amazingly well. It seems like it's really important to have the right representation for these sorts of things, and this works really well as a representation for me. I highly recommend checking it out