S3 question: I'm considering a bucket design where every uploaded file is stored as:

2014-02-10/73WakrfVbNJBaAmhQtEeDv/original-filename.pdf

So that's date/UUID/filename.

This is to avoid clashes if a user uploads multiple files with the same name.
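For concreteness, here is a minimal sketch of how such a key could be built. The function name `make_object_key` is hypothetical, and it uses a 32-character hex UUID rather than the shorter 22-character ID in the example above, so treat it as an illustration of the layout rather than the exact scheme.

```python
import uuid
from datetime import date, datetime, timezone
from typing import Optional


def make_object_key(filename: str, day: Optional[date] = None) -> str:
    """Build a key of the form date/UUID/filename (hypothetical helper)."""
    if day is None:
        # Assumption: upload dates are recorded in UTC.
        day = datetime.now(timezone.utc).date()
    # Keep only the last path component so a client-supplied name
    # cannot introduce extra "/" levels into the key.
    safe_name = filename.replace("\\", "/").rsplit("/", 1)[-1]
    # A random UUID per upload keeps identical filenames from clashing.
    unique = uuid.uuid4().hex
    return f"{day.isoformat()}/{unique}/{safe_name}"
```

Calling `make_object_key("original-filename.pdf", date(2014, 2, 10))` gives a key that starts with `2014-02-10/` and ends with `/original-filename.pdf`, with a fresh random segment in between on every call.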

Are there any downsides to having a unique "folder" prefix per file like this?

I'm specifically interested in non-obvious performance or cost implications.

(Side question: would this also be a problem on regular Linux file systems, not just on S3?)


@simon

Most legacy-but-still-used filesystems on Linux (ext3 and ext4 in particular) get slow when you list a big directory, because "ls" reads and sorts every entry. Performance falls off sharply with the number of files in that directory, with about 1000 being a practical upper limit. The noted symptom is "ls" hanging; if you know the exact name, you're fine. The usual workaround if you need "ls" is directory hashing, i.e. "mount/ab/cd/ef/abcdefghi", so any given directory tops out at "fast enough". (The workaround for "ls" when you have a giant directory is to use "find", which doesn't try to sort.)
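A minimal sketch of directory hashing, under some assumptions: the helpers `hashed_path` and `store` are hypothetical names, and the shard directories come from a SHA-256 digest of the filename rather than from the filename's own leading characters as in the "mount/ab/cd/ef/..." example. Hashing first is a swapped-in variation that spreads files evenly even when many names share a prefix.

```python
import hashlib
from pathlib import Path


def hashed_path(root: Path, name: str, levels: int = 3, width: int = 2) -> Path:
    """Return root/xx/yy/zz/name, where xx, yy, zz come from a hash of name."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    # Slice the hex digest into `levels` shard names of `width` characters each.
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*shards, name)


def store(root: Path, name: str, data: bytes) -> Path:
    """Write data under its hashed directory, creating parents as needed."""
    target = hashed_path(root, name)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

With two hex characters per level, each directory holds at most 256 subdirectories, so the file count per leaf directory stays small until the total gets very large. Because the path is derived from the name alone, a lookup by exact name never needs to list anything.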

If you don't need "ls", and random programs won't be accessing the directory, I have no idea what the upper limit is, but I hit this problem at about 40 thousand files. Fun!
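If a program does have to walk a giant directory, the same idea as using "find" instead of "ls" can be sketched in Python: `os.scandir` yields entries lazily in whatever order the filesystem returns them, without sorting. The function name `iter_names` is hypothetical.

```python
import os
from typing import Iterator


def iter_names(directory: str) -> Iterator[str]:
    """Yield entry names one at a time, in arbitrary (unsorted) order."""
    with os.scandir(directory) as entries:
        for entry in entries:
            yield entry.name
```

Since this is a generator, a caller can stop early or process names as they arrive instead of waiting for the whole listing.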
