These are public posts tagged with #ngrams.
I was even thinking of using #ngrams data from https://marcoxbresciani.codeberg.page/keyboards/ergodash/ergodash.html#org2ca8e47 but even if I have those numbers, I have no idea how to use them to create a better #Italian-based #ColemakDH layout.
Also, is it worth it?
Hints? Ideas? Help!
Has anyone cited Google #NGrams slash does anyone have ideas of how to do it? I'm using APA, and it's for a throwaway "the term gains popularity around XXXX" comment.
My first thought is to follow the Merriam-Webster citation format, but that hinges on using the word as the web title, and how do I even approach that with NGrams? I do expect to cite NGrams more than once, so I guess I need a specific title? Ughhhhhhhhhh
@johnwehrle I'm defending the notion of effective and fact-based criticism here, not longtermism ...
... but note that the term "existential risk" LONG predates the emergence of "longtermism", and through 2000 is also far more prevalent. See screenshot, and note that "longtermism" is multiplied 3x to scale equivalently to "existential risk".
I've strong concerns with any argument which leans heavily on such readily-refuted claims. The viewpoint may well be justified, but a bit less hyperventilating hyperbole and poor scholarship would greatly help the case.
The notion of "existential risk" was originally applied in a religious context (by Paul Tillich) and to nuclear weapons.
See:
Bulletin of the Atomic Scientists (1946): https://www.google.com/books/edition/Bulletin_of_the_Atomic_Scientists/KLMhAQAAMAAJ?hl=en&gbpv=1&bsq=%22existential+risk%22&dq=%22existential+risk%22&printsec=frontcover
Tillich reference / religious context (1959): https://www.google.com/books/edition/American_Scientist/8-9UAAAAMAAJ?hl=en&gbpv=1&bsq=%22existential+risk%22&dq=%22existential+risk%22&printsec=frontcover
#longtermism #ExistentialRisk #GoogleNgramViewer #Ngrams #WeakArguments #EmilePTorres
Doing some n-gram analysis of texts, trying to see what sort of abstract structural features of language exist on a statistical kind of level.
Image is a graph of probability (log scale) against n-gram size.
The solid purple curve decreases increasingly rapidly as n increases. I think this indicates that distinct n-grams get increasingly sparse (in the space of all character combinations) as n increases.
The dashed green curve decreases very rapidly until a minimum at n = 4, P = 10^{-26}, then increases less rapidly but at a steady rate, reaching nearly 10^{-4} at n = 10. This shows that given a string of 140 characters whose (n-1)-grams are all found in the corpus, it's increasingly likely (as n increases) that all its n-grams are found in the corpus too (provided n > 3).
I don't know what this implies about the nature of patterns of various scales in human language.
Corpus for this experiment was https://www.gutenberg.org/files/48320/48320-0.txt
Runtime of my Haskell code to analyse the data set was 1m40s.
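For anyone who wants to try something similar, here is a minimal Python sketch of one possible reading of the purple curve: the number of distinct character n-grams in a text divided by the number of possible n-character strings over that text's alphabet. This is not the Haskell code mentioned above, and the function names (`char_ngrams`, `ngram_density`, `all_ngrams_known`) are made up for illustration. The exact probabilities plotted in the graph may be defined differently.

```python
def char_ngrams(text, n):
    """Return the set of distinct character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_density(text, n):
    """Fraction of all possible n-character strings that occur in text.

    Assumption: the alphabet is taken to be the set of characters
    that appear in the text itself.
    """
    alphabet_size = len(set(text))
    return len(char_ngrams(text, n)) / alphabet_size ** n


def all_ngrams_known(candidate, corpus, n):
    """True if every n-gram of candidate also occurs in corpus."""
    return char_ngrams(candidate, n) <= char_ngrams(corpus, n)


if __name__ == "__main__":
    # Tiny stand-in corpus; a real run would read a full text file.
    corpus = "the cat sat on the mat and the rat sat on the hat"
    for n in range(1, 6):
        print(n, ngram_density(corpus, n))
```

The `all_ngrams_known` helper is the membership check behind the green curve's condition (all n-grams of a string found in the corpus). Estimating the conditional probability itself would need some way of sampling strings, which this sketch does not attempt.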
Who else is interested in #ngrams and #GoogleTrends and uses custom tools on their #data?
@emacsomancer I mean, we might want to talk about the British war on ... Easter:
In case you were wondering, Christmas is in fact doing just fine
If there ever was in fact a war against it, that ran from 1950--1980.
Via Google Ngram Viewer US English Corpus
I suspect this is a fundamental principle I've just stumbled on.