**Simon Willison** @simon@simonwillison.net · Feb 10, 2024, 16:24

S3 question: I'm considering a bucket design where every uploaded file is stored as:

2014-02-10/73WakrfVbNJBaAmhQtEeDv/original-filename.pdf

So that's /date/UUID/filename

This is to avoid clashes if a user uploads multiple files with the same name

Are there any downsides to having a unique "folder" prefix per file like this?

I'm specifically interested in non-obvious performance or cost implications

(Side question: would this be a problem on regular Linux file systems aside from S3?)
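
The key layout described in the question can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the function name `build_object_key` is hypothetical, and it uses a plain `uuid4` hex string rather than the shorter ID encoding shown in the example path.

```python
import datetime
import uuid
from typing import Optional


def build_object_key(filename: str, day: Optional[datetime.date] = None) -> str:
    """Build a key of the form date/unique-id/filename.

    The unique middle segment means two uploads with the same
    filename on the same day still get distinct keys.
    """
    day = day or datetime.date.today()
    # Assumption: a random UUID in hex form (32 characters); the example
    # in the post uses a more compact encoding of the same idea.
    unique_id = uuid.uuid4().hex
    return f"{day.isoformat()}/{unique_id}/{filename}"


if __name__ == "__main__":
    print(build_object_key("original-filename.pdf"))
```

Calling it twice with the same filename gives two different keys, which is the clash-avoidance property the design is after.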

**Biggles** @Biggles@qoto.org · 2024-02-11T05:11:20Z

@simon

Most legacy-but-still-used filesystems on Linux (ext3 and ext4 in particular) sort when you do a directory listing, and the performance falls off exponentially with the number of files in that directory, with about 1000 being a practical upper limit. The noted symptom is hanging on "ls", but if you know the exact name you're fine. The usual workaround if you need "ls" is directory hashing, i.e. "mount/ab/cd/ef/abcdefghi", so any given directory tops out at "fast enough". (The workaround to "ls" when you have a giant directory is to use "find", which doesn't try to sort.)

If you don't need to "ls" - and random programs won't be accessing it - no idea what the upper limit is, but I hit this at about 40 thousand files. Fun!
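
A rough sketch of the directory hashing workaround mentioned above. The example path in the reply shards on the leading characters of the name itself; this version shards on a SHA-256 digest of the name instead, which is a swapped-in variation that spreads files evenly even when names share a prefix. The function name `hashed_path` and its `levels` and `width` parameters are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(root: str, name: str, levels: int = 3, width: int = 2) -> PurePosixPath:
    """Return root/xx/yy/zz/name, where xx, yy, zz come from a hash of name.

    With 2 hex characters per level, each directory holds at most
    256 subdirectories, so no single listing grows without bound.
    """
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    # Take consecutive fixed-width slices of the digest as shard directories.
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *shards, name)


if __name__ == "__main__":
    print(hashed_path("mount", "abcdefghi"))
```

The mapping is deterministic, so a file can always be found again from its name alone without listing any directory.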
