It's kind of interesting how little formal standardization there is in archive formats. The zip format has a standard-ish: ISO standardized part of it in ISO/IEC 21320-1:2015 by taking APPNOTE-6.3.3.TXT and forbidding certain parts of the full specification from being used. The pax format (that is used by approximately nobody) has a formal description in POSIX. And that seems to be just about it.


@rq

Is the zip format standardized, or parsing of zips? Without looking I expect the former, because there are multiple ways to parse a zipfile: there's the infamous "does first or last entry for a given path count" problem (exacerbated by the distinction between "first in the directory in footer" and "first in byte order in the file itself").
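To make the ambiguity concrete, here is a minimal sketch using Python's standard `zipfile` module. It builds an archive in memory with two entries under the same path and shows that the answer depends on how you ask. The helper names and the file name `a.txt` are made up for illustration. That a lookup by name resolves to the last directory entry is just what CPython's `zipfile` happens to do, not something the format mandates.

```python
import io
import warnings
import zipfile


def build_duplicate_zip() -> bytes:
    """Return a zip archive (as bytes) holding two entries named a.txt."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        with warnings.catch_warnings():
            # zipfile warns about duplicate names but still writes them.
            warnings.simplefilter("ignore", UserWarning)
            zf.writestr("a.txt", b"first")
            zf.writestr("a.txt", b"second")
    return buf.getvalue()


def read_by_name(data: bytes) -> bytes:
    """Look the path up by name, letting the library pick an entry."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read("a.txt")


def read_first_listed(data: bytes) -> bytes:
    """Read the first entry in central-directory order."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read(zf.infolist()[0])
```

With this archive, `read_by_name` and `read_first_listed` return different bytes for the same path, so two tools that each pick one of these strategies will disagree about what the archive contains.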

@rq

Well, it's a spec for the format. It says things like "a valid ZIP file MUST ...", but doesn't specify the behaviour of the parser (can the parser assume that this holds? must the parser fail if it doesn't hold? what if it doesn't hold in a part of the file the parser wouldn't even ordinarily read? etc.).

Quick grepping through those appnotes doesn't reveal any expectations around paths other than the ones in 4.4.17, which don't specify anything about uniqueness (nor about the interpretation of non-uniqueness, but that is not something I'd expect in a specification of a format as opposed to a specification of parsing).
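Since the spec leaves this open, a parser has to pick its own policy. One possible choice, sketched here with Python's standard `zipfile` module, is to refuse archives whose directory repeats a path. The function names `duplicate_names` and `open_strict` are hypothetical, and the check only looks at the central directory; it says nothing about local headers that the directory doesn't reference.

```python
import io
import zipfile
from collections import Counter


def duplicate_names(data: bytes) -> list[str]:
    """Return paths that occur more than once in the central directory."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        counts = Counter(zf.namelist())
    return sorted(name for name, n in counts.items() if n > 1)


def open_strict(data: bytes) -> zipfile.ZipFile:
    """Hypothetical strict opener: refuse archives with repeated paths."""
    dups = duplicate_names(data)
    if dups:
        raise ValueError(f"duplicate entry names: {dups}")
    return zipfile.ZipFile(io.BytesIO(data))
```

A lenient parser would skip the check and silently pick first or last; both behaviours are compatible with a spec that only says what a valid file looks like.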

@rq

BTW. github.com/google/wuffs/blob/m is an example of a spec that describes what a decoder must do, in a case where there are nontrivial constraints on it (that serve to make sure that you can't construct a compressed file that will decompress differently depending on whether you read from the beginning or not).
