There are many attributes that cannot be portably stored in a POSIX ustar archive. IEEE Std 1003.1-2001 (“POSIX.1”) defined a “pax interchange format” that uses two new types of entries to hold text-formatted metadata that applies to following entries. Note that a pax interchange format archive is a ustar archive in every respect. The new data is stored in ustar-compatible archive entries that use the “x” or “g” typeflag. In particular, older implementations that do not fully support these extensions will extract the metadata into regular files, where the metadata can be examined as necessary.
An entry in a pax interchange format archive consists of one or two standard ustar entries, each with its own header and data. The optional first entry stores the extended attributes for the entry that follows it. It has an "x" typeflag and a size field that indicates the total size of the extended attributes. The extended attributes themselves are stored as a series of text-format lines encoded in the portable UTF-8 encoding. Each line consists of a decimal number, a space, a key string, an equals sign, a value string, and a newline. The decimal number indicates the length of the entire line in bytes, including the initial length field and the trailing newline. An example of such a line, where \n stands for the newline character, is:
25 ctime=1084839148.1212\n
Keys in all lowercase are standard keys. Vendors can add their own keys by prefixing them with an all uppercase vendor name and a period. Note that, unlike the historic header, numeric values are stored using decimal, not octal. A description of some common keys follows:
atime, ctime, mtime
File access, inode change, and modification times. These fields can be negative or include a decimal point and a fractional value.
uname, uid, gname, gid
User name, group name, and numeric UID and GID values. The user name and group name stored here are encoded in UTF-8 and can thus include non-ASCII characters. The UID and GID fields can be of arbitrary length.
linkpath
The full path of the linked-to file. Note that this is encoded in UTF-8 and can thus include non-ASCII characters.
path
The full pathname of the entry. Note that this is encoded in UTF-8 and can thus include non-ASCII characters.
realtime.*, security.*
These keys are reserved and may be used for future standardization.
size
The size of the file. Note that there is no length limit on this field, allowing conforming archives to store files much larger than the historic 8GB limit.
SCHILY.*
Vendor-specific attributes used by Joerg Schilling's star implementation.
SCHILY.acl.access, SCHILY.acl.default
Stores the access and default ACLs as textual strings in a format that is an extension of the format specified by POSIX.1e draft 17. In particular, each user or group access specification can include a fourth colon-separated field with the numeric UID or GID. This allows ACLs to be restored on systems that may not have complete user or group information available (such as when NIS/YP or LDAP services are temporarily unavailable).
SCHILY.devminor, SCHILY.devmajor
The full minor and major numbers for device nodes.
SCHILY.fflags
The file flags.
SCHILY.realsize
The full size of the file on disk. XXX explain? XXX
SCHILY.dev, SCHILY.ino, SCHILY.nlinks
The device number, inode number, and link count for the entry. In particular, note that a pax interchange format archive using Joerg Schilling's SCHILY.* extensions can store all of the data from struct stat.
LIBARCHIVE.xattr.namespace.key
Libarchive stores POSIX.1e-style extended attributes using keys of this form. The key portion is URL-encoded: all non-ASCII characters and the two special characters “=” and “%” are encoded as “%” followed by two uppercase hexadecimal digits. The value of this key is the extended attribute value encoded in base 64. XXX Detail the base-64 format here XXX
VENDOR.*
XXX document other vendor-specific extensions XXX
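The URL-encoding rule given for LIBARCHIVE.xattr keys can be sketched in Python as follows. The function name encode_xattr_key is hypothetical. The sketch assumes that a non-ASCII character is first converted to UTF-8 and that each resulting byte is escaped separately; the text above does not say this explicitly, and an actual implementation may escape additional characters. The base-64 encoding of the value is left out because its details are not specified here.

```python
def encode_xattr_key(name):
    """Return the pax key for an extended attribute name such as 'user.comment'."""
    out = []
    for byte in name.encode("utf-8"):  # assumption: escape UTF-8 bytes one by one
        if byte >= 0x80 or byte in b"=%":
            out.append(f"%{byte:02X}")  # '%' plus two uppercase hex digits
        else:
            out.append(chr(byte))
    return "LIBARCHIVE.xattr." + "".join(out)
```

With this sketch, the name user.mime=type becomes LIBARCHIVE.xattr.user.mime%3Dtype.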
Any values stored in an extended attribute override the corresponding values in the regular tar header. Note that compliant readers should ignore the regular fields when they are overridden. This is important, as existing archivers are known to store non-compliant values in the standard header fields in this situation. There are no limits on length for any of these fields. In particular, numeric fields can be arbitrarily large. All text fields are encoded in UTF-8. Compliant writers should store only portable 7-bit ASCII characters in the standard ustar header and use extended attributes whenever a text value contains non-ASCII characters.
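As an illustration of the writer behaviour described above, the following sketch uses the tarfile module from the Python standard library, which is not otherwise part of this description, to write a member with a non-ASCII name in pax format and read it back. The helper names write_pax_member and read_pax_headers are hypothetical.

```python
import io
import tarfile


def write_pax_member(name, payload):
    """Write one member into an in-memory pax archive and return the raw bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_pax_headers(data):
    """Return (member name, pax attribute dict) for the first member."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as archive:
        member = archive.getmembers()[0]
        return member.name, dict(member.pax_headers)


data = write_pax_member("na\u00efve.txt", b"hello")
typeflag = data[156:157]  # typeflag byte of the first 512-byte ustar header
name, attrs = read_pax_headers(data)
```

Here the first header in the archive carries the "x" typeflag, and the reader recovers the full name from the path attribute rather than from the ASCII-only ustar name field.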
In addition to the x entry described above, the pax interchange format also supports a g entry. The g entry is identical in format, but specifies attributes that serve as defaults for all subsequent archive entries. The g entry is not widely used.
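A reader that supports both entry types has to remember the g attributes across entries and combine them with the x attributes and the ustar header of each entry. The sketch below is one possible arrangement; the class name PaxAttributeState and its methods are hypothetical. It assumes that x attributes take precedence over g attributes, and that g attributes in turn take precedence over the ustar header fields, which is one reading of "defaults" and is not spelled out above.

```python
class PaxAttributeState:
    """Track g attributes and resolve the effective fields of each entry."""

    def __init__(self):
        self.global_attrs = {}

    def apply_global(self, attrs):
        # A later g entry replaces earlier values for the same keys.
        self.global_attrs.update(attrs)

    def resolve(self, ustar_fields, entry_attrs=None):
        fields = dict(ustar_fields)         # lowest priority: ustar header
        fields.update(self.global_attrs)    # assumption: g overrides the header
        fields.update(entry_attrs or {})    # highest priority: x attributes
        return fields


state = PaxAttributeState()
state.apply_global({"uname": "builder"})
first = state.resolve(
    {"path": "a.txt", "uname": "root", "size": "0"},
    {"size": "10000000000"},  # decimal, and beyond the historic 8GB limit
)
second = state.resolve({"path": "b.txt", "uname": "root", "size": "3"})
```

The x attributes apply only to the entry that follows them, so the second entry keeps the size from its own header while still picking up the g value for uname.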
Besides the new x and g entries, the pax interchange format has a few other minor variations from the earlier ustar format. The most troubling one is that hardlinks are permitted to have data following them. This allows readers to restore any hardlink to a file without having to rewind the archive to find an earlier entry. However, it creates complications for robust readers, as it is no longer clear whether or not they should ignore the size field for hardlink entries.