
@ampanmdagaba @davidho
ok, we emit 37 040 000 000 tonnes of CO2 yearly (37.04 billion tonnes).

given your numbers, i come up with a forest with an edge length of about 2434 km:

> # your numbers imply one 20 km × 20 km patch per 2 500 000 t of CO2
> puts [ expr sqrt((37040000000 / 2500000.0) * (20 * 20)) ]
2434.4198487524704

restore mother europe to the continuous forest she once was.

let's go.

@Czerion i did some work in that area once, with tap water and wastewater (something is happening there right now because of pharmaceutical residues). people here really make an effort. imo it still runs well, maybe because it's so decentralized.

microplastics etc. are 100% in our rainwater too.

@zleap i really don't know where you find the people you boost at times, but it's pretty effective in keeping my blood pressure high :)

Trump survived assassination attempt. So what? Brandon survived several of them! Remember how them evil people put him on a bicycle? How they made him climb the stairs? Never forget!

> Violence has no place in democracy, wrote EU Commission President von der Leyen.

FASCINATING, THEN WHY WASN'T I ALLOWED INTO THE HARDWARE STORE FOR A YEAR?

@gabriel next post Erich will lecture you about why the 3rd Reich was bad.

Wrote doc/why-did-i-not-think-of-this.txt immediately after returning from the toilet. It's mostly codebase-specific technical notes. Nothing that would make sense to anyone besides me.

I've been unhappy with the amount of overhead the ingestor introduces, right? It's the thing that gets in my head and stays there; it's kinda burrowed in there. (Maybe because it is more fun to do that sort of thing; the words "fucking" and "bullshit" appear way more often in the ActivityPub module than in the bits that deal with the network or with the encoder.) Bear with me a moment: blocks aren't just a chunk of storage in this case. Blocks are additional network traffic, an increase in false positives in the bloom filters, filesystem overhead, slots in the cache. It's good to minimize them; you will have to take my word for it if that seems not worth considering.

I had this stupid normalization process because I was thinking of things the way you think of them in Postgres: here are the values, we split them out and put an ID in there, and so on. In the meantime I had this byte-level encoder, which picks boundaries around block sizes and then does some of the trivial stuff: truncating around zeroes, headerless deflate. So it really was like how you'd structure it in a conventional application: you pull some object out, say you have `"to":[$long_ass_hellthread_list]`, and you turn that into `"toID":"$some_sha256_id"`. Not pretty, and it requires accessors to handle things uniformly (i.e., so that the "to"/"toID" split doesn't need to be handled elsewhere in the application), and it amounts to a weakref. Weakrefs are fine here, but they're slower: you don't see one until you're parsing the JSON, and the network layer doesn't parse the JSON, so when it tries to fill in pieces it expects you'll be asking for, or to propagate pieces you use more often, it can't do much with weakrefs.
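To make the shape concrete, here's roughly what that pattern looks like in Go (an illustrative sketch only: the struct, the map-backed store, and all of the names are stand-ins for this post, not Revolver's actual code):

```go
// Illustrative sketch of the "to"/"toID" normalization; made-up names.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Activity carries either the inline "to" list or the ID of the block it
// was split out into, Postgres-normalization style.
type Activity struct {
	To   []string `json:"to,omitempty"`
	ToID string   `json:"toID,omitempty"`
}

// store stands in for the block store: sha256 hex -> encoded JSON.
var store = map[string][]byte{}

// normalize splits the "to" list into its own block and leaves a weakref.
func (a *Activity) normalize() error {
	if len(a.To) == 0 {
		return nil
	}
	blob, err := json.Marshal(a.To)
	if err != nil {
		return err
	}
	sum := sha256.Sum256(blob)
	a.ToID = hex.EncodeToString(sum[:])
	store[a.ToID] = blob
	a.To = nil
	return nil
}

// Recipients is the accessor that hides the "to"/"toID" split from the
// rest of the application; the weakref is only resolved when someone asks.
func (a *Activity) Recipients() ([]string, error) {
	if a.To != nil {
		return a.To, nil
	}
	var to []string
	err := json.Unmarshal(store[a.ToID], &to)
	return to, err
}

func main() {
	a := &Activity{To: []string{"https://example.com/users/a", "https://example.com/users/b"}}
	if err := a.normalize(); err != nil {
		panic(err)
	}
	to, _ := a.Recipients()
	fmt.Println(a.ToID[:12], to)
}
```

The point is that `Recipients()` is the only place that knows about the split; everything else calls the accessor, which is exactly the uniformity tax described above.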

On the other hand, being smarter about the encoding means that instead of shifting the reference up a level, so that we've got to worry about it at the level where we're dealing with AP objects and activities, we push it down one more level. The way the encoding works is that the data is split into blocks, a header is generated for each of the blocks, the headers are concatenated to produce the top of the tree, and this process is recursive. (I don't remember what it actually does, but I recall tarsnap describing some cleverness around finding optimal block boundaries.) It actually doesn't matter to the decoder *where* the boundaries are: instead of storing the string "abcd" as four bytes and a metadata block (48 bytes total), you can store it as four one-byte strings plus four headers, and then another header up one level (4+(4*44)+44=224 bytes), and the decoder handles it fine. That's obviously contrived to be perverse, but you could engineer it to actually find some reasonable boundaries. Take a hellthread, for example: the to/cc list is going to be the same a thousand times (or it would be, if you sorted it). Take a post from Mastodon: there's going to be "content" and "contentMap", and in the overwhelmingly likely case, "contentMap" is going to have only one entry, and it's going to be the same as the value for "content". (That's two copies of the Bee Movie.) I was accounting for the to/cc stuff, but there's "content", there's "contentMap" on Mastodon, there's "source" on Pleroma, and that's before investigating the weird ones.
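As a sanity check on that arithmetic (the 44-byte header size is just backed out of the 48-byte "abcd" example; it's illustrative, not the real on-disk format):

```go
// Back-of-the-envelope check on the block/header arithmetic above.
package main

import "fmt"

const headerSize = 44 // inferred from: "abcd" as one block = 4 + 44 = 48

// storedSize is n data bytes split into `blocks` leaves; splitting into
// more than one leaf costs one header per leaf plus the header one level
// up that the leaf headers get concatenated under.
func storedSize(n, blocks int) int {
	size := n + blocks*headerSize
	if blocks > 1 {
		size += headerSize
	}
	return size
}

func main() {
	fmt.Println(storedSize(4, 1)) // 48: "abcd" as a single block
	fmt.Println(storedSize(4, 4)) // 224: four one-byte blocks, 4+(4*44)+44
}
```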

I think it's probably *possible* to cram this into the MarshalJSON() functions, so you can tell the encoder "Hey, the boundaries are here and here", basically compiling a list of hints for it to break on. But that would require some sort of side-channeling, and it would require planning for and accounting for all of the weird shit all of these servers do (including different versions), and that is a pain in the ass. Just telling the encoder "This is well-formed JSON" and letting it pick things out is easier. It can find long strings and arrays, and this ends up transparent to the decoder, but also transparent to everything upstream of the encoder, so the litepub.Activity doesn't need to worry about the encoder any more (beyond sorting), I don't need to worry about which things are likely to be redundant, etc. I also don't have to implement it yet; I can delete code that is already written, code that adds unnecessary complications, and then add this stuff later, like if I get bored while watching the import start from the top again. (136,127,584 objects, 161,084,319 activities, 717,471 actors. It does take a minute.)
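Something like this, for the "just hand it well-formed JSON" version (a sketch under a pile of assumptions: the threshold and every name are invented, and a real pass would also break around long arrays by tracking the '[' and ']' Delim tokens):

```go
// Sketch of the boundary heuristic: walk the JSON token stream and record
// offsets around strings long enough to be worth their own block.
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

const minBlockable = 64 // hypothetical "worth deduplicating" threshold

// breakHints returns approximate byte offsets bracketing every long string
// token. dec.InputOffset() lands at the end of the previously returned
// token, so the "start" can include leading whitespace; fine for hints.
func breakHints(raw string) []int64 {
	dec := json.NewDecoder(strings.NewReader(raw))
	var hints []int64
	for {
		start := dec.InputOffset()
		tok, err := dec.Token()
		if err != nil {
			return hints // io.EOF (or garbage, in which case: no hints)
		}
		if s, ok := tok.(string); ok && len(s) >= minBlockable {
			hints = append(hints, start, dec.InputOffset())
		}
	}
}

func main() {
	doc := `{"content":"` + strings.Repeat("ya like jazz? ", 10) + `",` +
		`"contentMap":{"en":"` + strings.Repeat("ya like jazz? ", 10) + `"}}`
	fmt.Println(breakHints(doc))
}
```

Strings shorter than the threshold aren't worth the extra headers, per the arithmetic above, and the encoder stays free to ignore any hint that lands badly against its block sizes.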

Obviously, the heuristics will need to be clever, but the worst case is that they make sub-optimal breaks. We also lose single-pass encoding, since the encoder will have to backtrack, but that's not a huge problem for an encoder, and keeping some JSON in memory isn't so bad: if we're normalizing it, then we've already parsed it, so it's already in memory, and it's not going to be pathological (that is, if it were going to crash us, it would have done so before we parsed it). So that's the downside, and I think it's not so bad.

Anyway, if it sounds like I've missed something important (I don't think I have, but I'm uneasy), say some words. In the meantime, I think I can gut a bunch of conditionals before the next rerun of the ingestor.

That brings me to the second half of this post, which is less "someone sanity-check this design" and more "here is a fit and/or start as FSE resurrects itself".

I'll also have (probably after I restart the ingestor again and it runs long enough for me to alter it) a patch for Pleroma's lib/pleroma/object/fetcher.ex (around line 83) to use the object proxy feature, if you want to test that out. The patch will be against the version running on FSE, but I'll probably also get a patch that applies cleanly against a reasonably current version. The proxy responds to URLs that it already has, obviously, but it will also start responding to URLs that the ingestor has ingested, while it ingests them (wherever I was running the ingestor; the test instance at screamshitter.club will have the data, although I am running the ingestor on a different machine where I can blow out the data and then bring it back), which means that Fediverse Jesus will be calling the dead to rise from their graves. (It will also start trying to fetch some objects on demand at some point, but it's going to be very conservative about that until I can make sure that I'm not going to DoS fedi. Fetches for AP objects will be signed, if I remember to turn that on, and in either case it will be usable as a media proxy fallback, though that will probably require an extra patch. At minimum, FSE's media/emoji/etc. will be back when you try to view them from here or from another instance that proxies to a Revolver test instance.)

Anyway, I'm pretty excited. There has been a lot of stuff that is less visible, like the /inbox fork (for example, you may have seen FSE's IP attempt to double-fetch objects with two different UAs; that is why). This is going to be pretty visible, at least from FSE. There is also a surprise. :revolvertan: (There may actually be two surprises, but no one will like the other one, because it is going to look cooler than it is and I'll say "no, no, that doesn't work yet, it just looks like it does if you look at it from the right angle".)

@torparskytt haha :D

i think i didn't even interact with them, but they posted some bullshit during the so-called pandemic. then i saw someone from my feed interacting with them and figured i should take a look again. nothing much changed :)

@ThatCrazyDude i only used it a bit, but it feels pretty nice. i think it's cool that it is focusing on being a darknet, not on routing to the clearnet.

@DCR @begsby i actually buy it because i just like sparkling water :)

what's so bad about tap water, anyway?

@DCR @begsby are glass bottles with regional mineral water ok? 😬

*unmutes someone

*looks at profile

> collectivist bullshit

*mutes again

:blobfoxpolice: who the hell are you?
:blobcatrainbow: i'm the scatman
yub dub dub dub

Things No One Tells You, But They Should:

All the trauma you've repressed through your teens and 20s will come back to bite your ass in your 30s; you will not escape this reckoning.

What else do you wish someone would have told you?

"I can confidently say that code quality should be the least of your concerns when it comes to your FOSS project."

Modern-day devs, ladies and gentlemen.

RT: https://social.treehouse.systems/users/TheEvilSkeleton/statuses/112241679352421223