:9front: Okay, here's webvac, which is of very limited use: https://git.freespeechextremist.com/gitweb/?p=webvac;a=blob_plain;f=README;hb=31f6b1e9fca63584c466e646bbf0af7faf855fc2

:bwk: webvac serves static files using venti. It solves the "big directory" problem Pleroma has if you have a lot of uploads, it serves HEAD requests *faster* than static files on-disk, it automatically compresses/dedups, and you can use the same venti server for as many instances as you have, so you get cross-instance dedup. It has been in production on FSE for a few days, serving the hell out of files, and (given that serving uploads is an I/O-bound process) there has been no latency increase for the most part. Almost all of this is thanks to venti, which is great software.

:ken: FSE uses venti for its backups as well, by running vac against the media directory, then using venti/copy to replicate this to a venti server offsite. After the initial backup of the full 100GB (which took most of a day), replication (three times a day) with full history takes under ten minutes, usually.

:mcilroy: You will need: venti (for storing the blocks; a Plan 9 venti server or a P9P one running on Linux/BSD/whatever works fine), redis (for storing the map of filenames to blocks), vac/unvac from P9P ( https://github.com/9fans/plan9port , also available over IPFS: QmW7ytEymcMpw1KAsqQh73gTiP5iCQFNZeD1DxRVgMpFDA), Ruby, and the ability to tweak nginx without getting an aneurysm from the rage.

:pike: There are two main pieces: webvac-server (which serves the files) and webvac-sweep (which sweeps files into venti). webvac-sweep pulls a file into venti and then records the pathname, score, and metadata to redis. If all of that is successful, it can (optionally) delete the file. The file is now ready to be served! What FSE does is sweep everything and then remove files smaller than 4MB from disk (above 4MB it takes about a second to retrieve a file from venti; I plan to fix the source of this problem and then serve *everything* from venti). Then we have nginx check if the file is present in the FS and, if not, forward the request to webvac-server. (There is an example nginx config file for testing.)
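
For illustration, the nginx fallback arrangement might look something like this. The paths and port here are made up, and this is not the repo's example config, which is authoritative:

```nginx
# Hypothetical sketch: serve an upload from disk if it is still there,
# otherwise hand the request off to webvac-server.
location /media/ {
    root /var/lib/instance;              # made-up uploads root
    try_files $uri @webvac;
}

location @webvac {
    proxy_pass http://127.0.0.1:8080;    # wherever webvac-server listens
    proxy_set_header Host $host;
}
```

With this shape, files kept on disk (the big ones, per the 4MB cutoff above) are served directly, and everything else falls through to webvac-server.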

You can browse the tree at https://git.freespeechextremist.com/gitweb/?p=webvac . You can clone it using:

$ git clone git://git.freespeechextremist.com/webvac

Or in a completely decentralized manner by grabbing a checkout from IPFS:

$ ipfs get -o webvac QmXhsk7s87NSh5fQmhwK3HNizt69iWUDn2bAYASEhpjMAF

There are installation instructions in the README at the top of the source tree and linked above.

There are more announcements coming today.

@p You made claims this was faster. Do you have benchmarks between this and the old way to see?

@freemo I eyeballed it locally and I ran awk against the server logs. The uploads dir for FSE was 9.6MB, so just a full readdir was a 9.6MB pointer-chase across non-consecutive 4kB blocks of the disk; now it's a lookup in an in-memory hash table. The worst-case scenario for that was obscene: it was like ten seconds to do an 'ls' when the cache was cold. Across the board, HEAD requests are now pretty uniform and stay under 1ms of backend time.

If you need something more scientific than that, then feel free to not believe me. I'm already hacking on the next thing.

@p
I don't need anything. A proper benchmark would be nice to back up your claim, but if it's just a guesstimate then it is what it is. I just wanted to know where the claim came from and if it had any weight.

@freemo

> A proper benchmark would be nice to back up your claim

If I managed to make a hash table lookup in RAM slower than a pointer chase across the disk, I would publish that. This is just a property of the data structures and their respective storage media. It's an in-memory hash table versus an on-disk linked list; that'd be like devising and running a benchmark to determine that a search index is faster than a full scan of the disk. The only reason I bothered to eyeball it was to make sure I hadn't completely botched it by doing something stupid.

> I just wanted to know where the claim came from and if it had any weight.

Okay, the FS's block size is 4kB. The dirents are not contiguous because they are appended piecemeal, so between two blocks that represent a directory, you will find thousands of blocks representing the files that were written to disk in the interim. You can't predict it because it's a linked list. (Every clever thought you're having right now about putting an index in has to account for the physical medium and the failure modes and space overhead and performance. If you have any intuition about this, it is almost certainly tuned for RAM rather than disk, and most of the clever ideas have been tried already, and they did not result in a reliable, performant filesystem.) A pointer-chase across *random-access* memory is already bad, but even on SATA or NVMe disks, a seek is costlier.

Here is a hasty diagram; this is CS 101 stuff, dude.
filesystem_pointer_chase.png

@p

Ya know what is also CS 101: writing unit tests and benchmarks to go alongside the code that is written, even when it is known to be an improvement... Why? Simple: it helps us track the performance improvement, and it also helps us tweak future modifications to the code and know when we make mistakes beyond what we intended.

No one is saying your intent isn't justified; this is just how you write good code. That includes good tests, not just to prove out your current code, but more importantly as a measure for future tweaks to the code.

Good job and thanks for the hard work.

The thing is, optimisation is a tricky beast.

> that'd be like devising and running a benchmark to determine that a search index is faster than a full scan of the disk

I have seen many cases where a search index can be slower than a full scan, just as hash tables can sometimes be slower than linked lists under the right circumstances.

I am not saying that applies here, I am not saying you are under any obligation to do a benchmark, I am not saying this is bad work in any way.

All I'm saying is a benchmark would have been interesting. I don't rule out the possibility that under certain conditions it might show a slowdown and in others a speedup; either might be marginal, and it would be interesting to see where the tradeoff occurs and just how much of an improvement you get as various conditions grow.

Again, not saying you need to do this to determine it was a good move. Just saying it would have been interesting to see, and benchmarks in general are useful for future diagnostics.

On the projects I run, I like to create extensive benchmarking alongside my unit tests. As features or fixes are added, we watch the benchmarks change alongside them, and that provides a CI tool similar to unit tests. So I generally find it a worthwhile effort, even if it may not be critical for knowing that the current feature set makes sense performance-wise.

@freemo @p FUCK YOUR ASS, BITCH! I FUCK YOU I FUCK AHUFHDUFHDUSHFUJHSIFHDISHFUIHSUIFHD
@freemo @p I think you both are over-estimating the level of content taught in CS 101.

@tmy

LOL true that. I work on some really advanced projects that most even advanced CS guys don't seem to be able to understand or touch (though they do use the libraries)...

I think I've gotten to the point where I don't realize how the vast majority of CS professionals never really learned even the basics, let alone the advanced stuff.

Frankly I doubt many CS noobs would understand what P is doing, let alone the need for CI that includes both benchmarks and tests.

@p

@freemo @p I probably fall in that category of CS noob and yeah, it's way over my head. Something about having things make more efficient use of the hard-drive's block size.

That being said, CS 101 is dumbed-down trash that students should have the option to test out of as a required pre-req.

@tmy

Preaching to the choir.

I spend a lot of time tutoring CS students for free. I do this partly because I want to try to teach them good habits early on so I don't have to deal with idiots in the workplace so much.. I'm a good guy like that.

@p

@freemo @p I would say thank you, but as an IT guy working with engineers, them actually being complete idiots is my job security...

@tmy

lol good point. If it weren't for their stupidity I wouldn't be rolling around on a mountain of cash either :)

@p

@tmy @freemo

> it's way over my head.

Nah, nah, trust me. Picture a linked list that you have written to disk. Then imagine that hand-wavy properties of disks mean that you will want each node in the list to be 4kB or 8kB or 16kB or some arbitrary size. If it were a flat, contiguous array, then to find the Nth item, you would just seek N blocks ahead and read that block. But because it's a linked list, you have to start at the head, read that, extract the pointer to the next node, seek there, read that, etc. That's how reading 9.6MB off the disk ends up taking 10 seconds.
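
To make that concrete, here is a back-of-the-envelope model in Ruby. The 4kB block size and 9.6MB directory are the numbers from upthread; the 4ms cold-seek cost is an illustrative guess, not a measurement:

```ruby
# Toy model, not webvac code: how many block reads it takes to reach the
# end of a directory stored as a contiguous array vs. a linked list.

BLOCK_SIZE = 4 * 1024                  # 4kB FS blocks
DIR_SIZE   = (9.6 * 1024 * 1024).to_i  # the 9.6MB uploads directory
NBLOCKS    = DIR_SIZE / BLOCK_SIZE     # ~2457 blocks

# Contiguous array: the block holding entry N is computable, one read.
def array_reads(_n)
  1
end

# Linked list: each block only tells you where the next one is, so
# reaching block N means reading blocks 0..N serially, one seek each.
def list_reads(n)
  n + 1
end

seeks = list_reads(NBLOCKS - 1)
puts seeks      # dependent seeks for one cold readdir
puts seeks * 4  # at a guessed 4ms per cold seek: ~10 seconds, in ms
```

The point is not the exact numbers but the shape: the array cost is constant while the list cost is serial in the directory size, and every hop is a seek the disk cannot overlap.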

@xair

"We have to rebuild the array about every week, but man is it fast for that week!"

@tmy @p

@freemo @tmy @p hey that's still better uptime than most people can manage in a small-medium workplace lol
@xair @freemo @tmy The solution is GreenArrays chips everywhere and no more disks.
@tmy @freemo Hyperbole, maybe. Read "CS 101" like "It is the merest elementary, my dear Watson" but where "Watson" is the IBM AI rather than the doctor/roommate/reader-proxy.

@p

I wasn't headed anywhere. I asked you if you had a benchmark. My intent was for you to either say yes, and share it, or no, and I'd say "ok".

However you decided, as you tend to, to get insulting and childish and be like "this is just CS 101, you should know this, dumbass".. so here we are.

Personally I don't care, I'm not developing on the project so it doesn't affect me. At some point someone might come in and write some benchmarks; that would be great. It would be a very useful tool.

I just laugh at the fact that you get your undies in a knot over someone asking a simple question, to the point you feel the need to start spewing rude quips rather than just saying "no sorry, don't have one" and calling it a day.

At some point I just hope you grow up and join the rest of the adults in being able to have a normal conversation with someone.

@tmy

@freemo @tmy Dude, next time I want to know what the HN hivemind has to say about best practices, I'm glad you're available as a resource, but I'd rather be dead than bored.

@p

lol good coding isn't always fun, sometimes it's boring. But hey, if you don't care about doing a good job, more power to you. Best of luck.

@tmy

@freemo @tmy Work tedium is completely different from hearing a guy read you the development methodology four-color glossies. At least something's getting accomplished. Just assume the person you're talking to has heard everything you've heard and mention the name instead of paraphrasing the bullet points under the beaker/gear/light bulb/person icons on the Bootstrap page about the thing. That way you don't come across as tedious or patronizing and if they don't know what the thing is, they can ask instead of trying to scan a sermon to see if there is anything in it that is worth addressing.

> if you dont care about doing a good job, more power to you.

We've reached the stage where you downgrade your post quality from "Hacker News" to "Reddit", I see.

@p

Says the man... all I wanted to know was what benchmarks you had, if any, no judgement, and you went right to telling me my question was "CS 101".. It was a yes or no question and you've gone on moaning about even daring to be asked it for over an hour now...

We call this projection.

Seriously, grow up at this point. You've wasted far too much time trying to flex your ego. It's toxic and you don't do yourself any favors.

@tmy

@realcaseyrollins

Exactly! That's what I'm fricking saying.. Asked the dude a question and every response is some condescending bullshit, and he wonders why I keep calling him out on it.. It's getting old.

@p @tmy

@freemo @p @tmy Well, IDK about that; you just seem mad that he won't make a benchmark. A benchmark isn't the only way to test efficiency, as P is proving here, although it's probably better.

Berating P over not making a benchmark just seems childish to me imho... y'all have different opinions on how this sort of thing should be handled, just agree to disagree man

@realcaseyrollins

I've said the exact opposite several times, several of my quotes in this thread:

> if it's just a guesstimate then it is what it is. I just wanted to know where the claim came from

> I am not saying that applies here, I am not saying you are under any obligation to do a benchmark

> Again not saying you need to do this to determine it was a good move.

> I wasn't headed anywhere. I asked you if you had a benchmark. My intent was for you to either say yes, and share it, or no, and I'd say "ok".

I am **not** berating P over not doing the benchmark. I am berating P over being condescending and rude, and therefore am giving the same in kind and defending against his attacks (like telling me it's CS 101 and that I should know this and not dare even ask).

@p @tmy

@realcaseyrollins

The rudeness started in his very first response..

"A proper benchmark would be nice to back up your claim, but if it's just a guesstimate then it is what it is"

He started when he said "this is CS 101 stuff, dude," which I took as rude and condescending

Then he went on to tell me I was lecturing for just asking the question:

"I bleed tree sap and I cannot think of anything more boring than being lectured"

and other such condescending or rude comments as:

"next time I want to know what the HN hivemind has to say about best practices, I'm glad you're available as a resource, but I'd rather be dead"

"hearing a guy read you the development methodology four-color glossies"

No one lectured him, I asked a freaking question. He challenged the absurdity of even asking for such a thing, I told him it was good practice, and he went on for over an hour with this crap.

@p @tmy

@freemo @p @tmy You might be right, but I can see how one might view "this is just how you write good code, that includes good tests, not just to prove out your current code, but more importantly as a measure for future tweaks to the code." as a sort of lecturing

@realcaseyrollins

Yes, I could see that too, but it also came after he had already started in with the condescending remarks I quoted. That's why I turned off the delicate touch.

I will say this: P seems to be getting a little better, so maybe he is on the right path, and this childish phase of him getting defensive over a simple question and then losing me in back-and-forth for hours might (I hope) not be the norm for much longer.

@p @tmy

He just wants people to reply to him because he is trying to get his mention count up and reach the #1 spot this month.

@realcaseyrollins @p @tmy

@freemo @realcaseyrollins @p @tmy
No, the reason @p responds is because he's active and likes to defend himself, and also thanks for using irony and sarcasm in your sentences
@freemo @realcaseyrollins @tmy

> P seems to be getting a little better, so maybe he is on the right path and this childish phase

Eat every dick, you humorless, oblivious knob-jokey. Every time I have a conversation with you, you do your damnedest to make me regret it. Plonk.

@realcaseyrollins

LOL FYI this comment triggered him so much he blocked me. I'll call QED on that one.

@p @tmy

@realcaseyrollins

Anything is better than that. I just hoped he could have grown the fuck up instead. But he has some combination of ego issues (insecure) and a bit of a personality disorder mixed in, it seems.

Which is fine, no one is perfect. It's probably just years of being fired from programming jobs for refusing to get along with the group that have led him to be insecure, I imagine.

If he acted this way in any group where he wasn't the one in control he wouldn't last a day, which is probably why he needed to start his own server. It was the only way he could be in enough control and feel secure he wouldn't get kicked out of some other server.

@p @tmy

@realcaseyrollins @freemo @tmy I ain't been triggered, I just regret every interaction I've ever had with this guy about a third of the way through the interaction.
@freemo @realcaseyrollins @tmy

> every response is some condescending bullshit

The first response was "I don't have a benchmark for this because it seemed obvious to me that RAM is faster than disk, and I have no benchmarks to prove it", to which you replied with several paragraphs of...it's like you trained a Markov bot on HN comments.

> he wonders why I keep calling him out on it.. Its getting old.

"It hurts when I do this, and he wonders why I keep doing it!"
jagoff.gif

@p

Except that wasn't your first response. Earlier, I already quoted directly the parts of your first response and subsequent responses that were condescending; I am not going to repeat them. But no, that quote of yourself is literally not what you said.

That would have been the polite way to have said it, mind you, and I wish that had been your response, but it wasn't.

I don't know what personality disorder you're struggling with, but every time I try to set up a conversation to say something nice about you or to you, you come in with some bullshit that makes it hard.

The last thread was when I came in and told the person attacking you about hell threads that you weren't violating any laws and that I had always seen you as acting morally when I complained about the issue... but even then you came in with the flippant response "another country heard from"

You literally make it impossible for people to be nice to you. You should work on that.

@realcaseyrollins @tmy

@realcaseyrollins @freemo @tmy I am mainly annoyed that the guy's asking dumb questions like "Can you back up your claim that a hash table in RAM is faster than reading from disk?" and he's gonna paraphrase at me some blog post where some thought leader expounded on how Pivotal Tracker changed the future of Moleskine notepads *forever*. It's like, why? Why am I seeing this?

@p

Actually, I asked because I was hoping you'd give a specific figure like "It has increased speed 25x" and then I could go "Nice job P, you do amazing work", and it would be a nice way to try to support you rather than criticize you, and you might not feel so attacked. My plan was, if you said "no", my response would be "well, thanks for all the hard work, still good work"; same result.

But instead you decided to jump in with the condescending bullshit and it didn't play out that way.

But yea, that was literally why I asked and what was going on in my mind. I just couldn't get past your insecurity, apparently, and you think everything is going to be an attack. Sadly, I can see why: you set yourself up to be hated, so people wind up attacking you even when they come in with good intentions.

@realcaseyrollins @tmy

@tmy @freemo Every girl's crazy about a sharp-dressed man, but if you want her to stick around, giving her pork chop sandwiches helps.
@p @freemo Give her the poke chop sammich and do the tube snake boogie
@tmy @freemo You put the spurs to her, then you give her a sandwich, and she's definitely calling back.
@freemo I was worried this was where you were going with this. I've seen the thin end of the "grand vision of best practices" wedge before.

> this is just how you write good code

It is one of many ways that a team (one of many software-producing organizations) might write code. It doesn't make the code good, and it doesn't stop the code from being bad.

Another way to write code is to make it 400 lines long with fewer than a dozen conditional expressions.

You know where HN is, they eat this stuff up. I have been in the game so long I bleed tree sap and I cannot think of anything more boring than being lectured on the $current_year Silver Bullet. Take this back twenty years and no one will give a damn about unit tests and they will tell you that "how you write good code" is UML diagrams.

There is no silver bullet and software methodology follows the same kind of trend pattern as diet fads. Atkins : RAD/XP :: Paleo : Agile Development. You use what works for the thing you're making with the team you have, and what your team will put up with matters a great deal more than what you think is the best methodology. I had, in this case, myself for the team, and a very simple server for the thing I was making. Proving the concept took longer than writing the code, and the changes to the way things already work are largely architectural.

> The thing is, optimisation is a tricky beast.

I didn't optimize it. (I left some very low-hanging optimization fruit, in fact.)

> Just saying it would have been interesting to see

Well, unless you're storing your media files in an unconventional FS, it is orders of magnitude faster to do a lookup in a hash table. For a HEAD request, this code just does a regex substitution and says "HGET" to the Redis server, then gives a 404 if it got no data back and a 200 if it did. A benchmark of this would be, essentially, benchmarking Redis, not this code.
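
For a sense of what that means in code, here is a minimal Ruby sketch of that HEAD path. This is not the webvac source: a plain Hash stands in for Redis (the real server would issue HGET against the score map), and the key, path, and score strings are invented:

```ruby
# Hypothetical stand-in for the Redis map of pathname => venti score.
SCORES = {
  "media/abc123/cat.png" => "0123456789abcdef0123456789abcdef01234567",
}

# Normalize the request path (the "regex substitution" mentioned above),
# then answer the HEAD request with one O(1) lookup: 200 if the score
# map knows the file, 404 if it does not.
def head_status(request_path)
  key = request_path.sub(%r{\A/}, "")   # strip the leading slash
  SCORES.key?(key) ? 200 : 404
end

puts head_status("/media/abc123/cat.png")  # 200
puts head_status("/media/missing.png")     # 404
```

Either way, the uploads directory is never touched, which is why benchmarking this amounts to benchmarking Redis.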

> I like to create along side my unit tests extensive benchmarking.

For cases when it matters, I just crank the profiler all the way up and write a fuzzer, and if it's interactive, I just use cgroups stuff to give it a tenth of the CPU. In this case, that wouldn't have helped, because it's a pretty flat system: there are two pieces (one comes with a wrapper) and they both do one thing each. It doesn't do so hot with bigger files, but I know why and what to do about it, I just have other things to do before I worry about that.

It doesn't even need to be faster, just reliable, and there's so little code that it is about as reliable as its pieces. It's a happy side-effect that it is faster for some cases (like fetching metadata), because it already enables me to pool the storage across instances and it saves me time and disk space and makes the backups much faster.