One of the defining features of Unix is its hierarchical filesystem: directories
on Unix systems can contain other directories, without a limit to the depth of
the nesting. This isn’t a big deal nowadays, but Unix was one of the first
operating systems to feature a hierarchical filesystem. And believe it or not,
developers today are still writing code that relies on the PATH_MAX constant
and doesn’t handle long file paths correctly.
PATH_MAX is defined by POSIX, and is supposed to be the largest possible
size for a filesystem path. There are a few compelling reasons to define such a
constant:
- Fixing the size of paths makes it easier to declare file paths inline in
structs or on the stack, simplifying manual memory management.
- In practice most filesystems have a limit on the length of filenames, so it
makes sense to expose this limit somehow.
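To make the first bullet concrete, here is a sketch of the style of code it has in mind: a record type that stores its path inline, so no heap allocation is needed. The names file_entry and set_entry_path are hypothetical, and the fallback definition (4096 bytes) is an assumption used only on systems whose headers omit PATH_MAX.

```c
#include <limits.h>  /* PATH_MAX, where the platform defines it */
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096  /* assumption: fallback only for headers that omit it */
#endif

/* Hypothetical record that stores its path inline: no malloc(), no free(). */
struct file_entry {
    int flags;
    char path[PATH_MAX];
};

/* Copy `path` into the fixed-size field. Returns 0 on success, or -1 if
 * the path plus its terminating NUL does not fit in PATH_MAX bytes. */
int set_entry_path(struct file_entry *e, const char *path) {
    size_t len = strlen(path);
    if (len >= sizeof e->path)
        return -1;  /* the only options are to fail or to truncate */
    memcpy(e->path, path, len + 1);
    return 0;
}
```

Everything about this design hinges on PATH_MAX being a true upper bound on path lengths, which is exactly the assumption the rest of this post questions.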
The problem is that you can’t meaningfully define a constant like this in a
header file. The maximum path size is really something like a filesystem
limitation, or at the very least a kernel parameter. This means that it’s a
dynamic value, not something preordained. The <limits.h> header file doesn’t
know what filesystems you’re trying to use or what kernel you’re running; it
just exports a static value. For this reason alone we know that the value of
PATH_MAX is at best a lower bound.
File Paths Can Be Arbitrarily Long
Most filesystems will have some limits on files, such as a maximum length on
components in a file path. The limit on path components (also known as the file
name limit) is defined as NAME_MAX, generally 255 bytes. But a file path can
have many components, and thus a full path can be much longer. Unix filesystems
have directory inodes that map relative file names to file inodes, and file
inodes do not actually contain file names at all.
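The arithmetic is easy to check. Assuming NAME_MAX is 255, twenty components of 255 bytes each, each preceded by a slash, add up to 20 × 256 = 5,120 bytes, even though no single component breaks the file name limit. The two helpers below are hypothetical, written just for this illustration: one builds such a path, the other checks each component on its own.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255  /* assumption: common value, used only if the header omits it */
#endif

/* True if every '/'-separated component of `path` is at most NAME_MAX
 * bytes long. Says nothing at all about the total length. */
bool components_within_name_max(const char *path) {
    size_t run = 0;
    for (const char *p = path; *p; p++) {
        if (*p == '/')
            run = 0;
        else if (++run > NAME_MAX)
            return false;
    }
    return true;
}

/* Build "/xx.../xx..." with `depth` components of `len` bytes each.
 * Returns a malloc()ed string the caller frees, or NULL on failure. */
char *make_deep_path(size_t depth, size_t len) {
    char *s = malloc(depth * (len + 1) + 1);
    if (!s)
        return NULL;
    char *p = s;
    for (size_t i = 0; i < depth; i++) {
        *p++ = '/';
        memset(p, 'x', len);
        p += len;
    }
    *p = '\0';
    return s;
}
```

Calling make_deep_path(20, 255) yields a 5,120-byte path in which every component passes components_within_name_max().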
Unix also allows filesystems to be mounted hierarchically. Even if a
hypothetical filesystem had a size limitation on the length of full path names
(and none of the mainstream ones do), that filesystem could be mounted at a
mount point other than
/. The mounted filenames would all be exposed with the
mount point as a prefix, and thus their full file names would become longer than
what the underlying filesystem supported!
Exercise for the reader: Consider how one could
implement hard links on a Unix
system, and why hard links preclude storing full file paths in inodes.
System Calls and PATH_MAX
As a practical consideration, the kernel must enforce a limit on the length of
all strings supplied via system calls. There are a couple of reasons for this,
but the most important is that the kernel must actually do a memory copy of
non-value parameters like strings from userspace into kernel memory, and that
copy needs an upper bound on its size.
For system call arguments that are file names, the kernel will return
ENAMETOOLONG if the supplied file name is too long. On Linux, system calls like
open(2) perform this check when copying the path into the kernel, using the
4096-byte PATH_MAX value, which, as I just finished explaining, is not really a
file path limit!
PATH_MAX is actually defined as the maximum permitted size of file paths
supplied via system calls. If you try to open a path whose length equals or
exceeds 4096 bytes, you’ll get an error. But that doesn’t mean it’s impossible
to open such a file: it just means that you need to use a shorter (relative)
file path when opening the file.
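Here’s a sketch of that distinction on Linux, assuming /tmp is writable and allows 255-byte file names. It builds a directory tree using only short relative names and directory descriptors (mkdirat() and openat()), so no single system call ever sees a long string, then tries to reach the same file with one long absolute path. The function long_path_demo() and its parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* mkdtemp, mkdirat, openat, unlinkat */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define DEPTH 20          /* 20 levels of 255-byte names: over 5,000 bytes */
#define COMPONENT_LEN 255 /* assumption: the filesystem under /tmp allows this */

/* Builds /tmp/pathmax-XXXXXX/ddd.../.../file using only relative names.
 * *rel_ok is 1 if creating "file" relative to the deepest directory worked;
 * *abs_errno gets errno from open() on the absolute path (0 if it worked).
 * Everything is removed again. Returns 0 if the full tree was built. */
int long_path_demo(int *abs_errno, int *rel_ok) {
    char name[COMPONENT_LEN + 1];
    memset(name, 'd', COMPONENT_LEN);
    name[COMPONENT_LEN] = '\0';

    char root[] = "/tmp/pathmax-XXXXXX";
    if (!mkdtemp(root))
        return -1;

    int fds[DEPTH + 1];
    int built = 0;
    fds[0] = open(root, O_RDONLY | O_DIRECTORY);
    if (fds[0] < 0) { rmdir(root); return -1; }
    while (built < DEPTH) {
        if (mkdirat(fds[built], name, 0700) != 0)
            break;
        fds[built + 1] = openat(fds[built], name, O_RDONLY | O_DIRECTORY);
        if (fds[built + 1] < 0) { unlinkat(fds[built], name, AT_REMOVEDIR); break; }
        built++;
    }

    *abs_errno = 0;
    *rel_ok = 0;
    if (built == DEPTH) {
        /* Relative to a directory descriptor, the string is just "file". */
        int fd = openat(fds[DEPTH], "file", O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd >= 0) { *rel_ok = 1; close(fd); }

        /* The same file by absolute path: root + DEPTH * "/ddd..." + "/file". */
        size_t root_len = strlen(root);
        char *abs = malloc(root_len + DEPTH * (COMPONENT_LEN + 1) + sizeof "/file");
        if (abs) {
            size_t off = root_len;
            memcpy(abs, root, root_len);
            for (int i = 0; i < DEPTH; i++) {
                abs[off++] = '/';
                memcpy(abs + off, name, COMPONENT_LEN);
                off += COMPONENT_LEN;
            }
            memcpy(abs + off, "/file", sizeof "/file");  /* copies the NUL too */
            int afd = open(abs, O_RDONLY);
            if (afd < 0) *abs_errno = errno; else close(afd);
            free(abs);
        }
        if (*rel_ok)
            unlinkat(fds[DEPTH], "file", 0);
    }

    /* Tear down from the bottom up, again using only relative names. */
    for (int i = built; i > 0; i--) {
        close(fds[i]);
        unlinkat(fds[i - 1], name, AT_REMOVEDIR);
    }
    close(fds[0]);
    rmdir(root);
    return built == DEPTH ? 0 : -1;
}
```

On Linux I’d expect the absolute open() to fail with ENAMETOOLONG while the openat() succeeds: the file itself is perfectly valid, and only the string used to name it is too long.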
Many functions defined in libc can accept or return file names, and those file
names are not necessarily limited by PATH_MAX.
Path Metadata With pathconf()
To make things more sane, POSIX defines a less well-known function called
pathconf().
This function lets you get at low-level information about kernel limits related
to things like path lengths:
// POSIX method for getting file path metadata.
long pathconf(const char *path, int name);
The maximum length of a relative path name can be fetched for a path by
supplying the value _PC_PATH_MAX as the second argument. There are a few
important usage caveats.
The first thing you’ll notice when using this API is that
pathconf() takes a
file path as its argument. Thus you can’t use
pathconf() to get a single maximum
path length that applies to arbitrary files, because no such limit exists. You
can only call pathconf() with a file whose name you already know.
When you do know the filename, the return value for
_PC_PATH_MAX is the maximum relative path size when that directory is the
working directory, since Unix files don’t really have absolute paths. Therefore
the data returned for _PC_PATH_MAX is not as general as what you might think at
first: most code will need the ability to handle longer paths than what
pathconf() reports.
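Calling pathconf() correctly takes a little care: POSIX distinguishes “there is no limit” (a return value of -1 with errno left unchanged) from a real error (-1 with errno set), so errno has to be cleared before the call. The wrapper path_max_for() and its return convention are hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Wrapper around pathconf(path, _PC_PATH_MAX).
 * Returns  1 and stores the limit in *out if the system reports one,
 *          0 if there is no determinate limit (-1 with errno untouched),
 *         -1 on error, with errno set by pathconf(). */
int path_max_for(const char *path, long *out) {
    errno = 0;  /* pathconf() leaves errno alone for "no limit" */
    long v = pathconf(path, _PC_PATH_MAX);
    if (v == -1)
        return errno == 0 ? 0 : -1;
    *out = v;
    return 1;
}
```

Keep in mind that any number this returns describes relative paths from the given directory, not a bound on every path a program might encounter.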
What Directory Am I In?
The evolution of the Unix filesystem APIs is illustrative of how to properly
deal with long file paths. I’m going to use accessing the current directory as
an example of how things have changed. Back in ancient times, you would have
used getwd() to get the current directory name:
// Deprecated old-school Unix way of getting the current working directory.
char *getwd(char *buf);
The buffer supplied to
getwd() is supposed to be at least
PATH_MAX bytes in
length. This will fail in a bunch of cases, since
PATH_MAX isn’t a reliable
way to tell the maximum length of a directory name. This was fixed by
introducing a new, more general function called
getcwd(). The major difference
is that it accepts another parameter indicating the buffer size:
// Current POSIX way of getting the current directory; does not allocate.
char *getcwd(char *buf, size_t size);
If the buffer you supply is too small,
getcwd() will return a null pointer and set errno to
ERANGE. Since paths can be of arbitrary size, to correctly use
getcwd() you actually need a loop that resizes the underlying buffer and
retries when this happens.
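The resize loop might look like the following sketch. The name xgetcwd() and the starting size of 256 bytes are arbitrary choices; the only behavior it relies on is getcwd() returning a null pointer with errno set to ERANGE when the buffer is too small.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Return the current directory in a malloc()ed buffer, doubling the buffer
 * until getcwd() stops failing with ERANGE. Returns NULL on any other
 * error, with errno preserved. The caller must free() the result. */
char *xgetcwd(void) {
    size_t size = 256;  /* arbitrary starting guess */
    for (;;) {
        char *buf = malloc(size);
        if (!buf)
            return NULL;
        if (getcwd(buf, size))
            return buf;
        int saved = errno;  /* free() is not guaranteed to preserve errno */
        free(buf);
        if (saved != ERANGE) { errno = saved; return NULL; }
        if (size > (size_t)-1 / 2) { errno = ENOMEM; return NULL; }
        size *= 2;
    }
}
```

Doubling keeps the number of retries logarithmic in the path length, and every other error (for example, ENOENT when the current directory has been removed) is passed straight back to the caller.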
The POSIX specification for getcwd()
says the behavior is unspecified if
buf is a null pointer.
GNU libc takes advantage of this by turning
getcwd() into an allocating
version when a null pointer is supplied as the buffer. To simplify this further,
GNU libc defines an extension called
get_current_dir_name() that takes no
parameters, and just returns a newly-allocated directory name for you:
// GNU extension, caller must call free() after.
char *get_current_dir_name(void);
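The GNU libc implementation of get_current_dir_name() first checks whether the
PWD environment variable correctly names the current directory, and otherwise
just calls getcwd() with a null pointer.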
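On GNU libc, both allocating forms can be exercised directly. This sketch assumes glibc (it defines _GNU_SOURCE to get the get_current_dir_name() declaration) and uses a hypothetical helper, show_cwd_glibc(), that prints both results and frees them.

```c
#define _GNU_SOURCE  /* exposes get_current_dir_name() in <unistd.h> */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* glibc-only: print the working directory using both allocating interfaces.
 * Each result comes from malloc(), so each is free()d.
 * Returns how many of the two calls succeeded. */
int show_cwd_glibc(void) {
    int ok = 0;
    char *a = getcwd(NULL, 0);         /* glibc: NULL buffer + size 0 => allocate */
    char *b = get_current_dir_name();  /* glibc: may return $PWD if it is accurate */
    if (a) { printf("getcwd(NULL, 0):        %s\n", a); ok++; }
    if (b) { printf("get_current_dir_name(): %s\n", b); ok++; }
    free(a);  /* free(NULL) is a no-op */
    free(b);
    return ok;
}
```

The two lines can legitimately differ: if $PWD reaches the current directory through a symbolic link, get_current_dir_name() returns that spelling, while getcwd() returns the physical path.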
Portable code can check to see if
get_current_dir_name() is available, and
then fall back to a loop that uses
getcwd() if necessary.
Canonicalizing File Names
Not everything in POSIX has been updated for compatibility with long file names.
POSIX defines a function called
realpath(), which can be used to get
a “canonical” path for a file,
i.e. an absolute path with no extra slashes, no . or .. components, and no
symbolic links. It takes a path to
canonicalize, and an output buffer to store the canonical path in:
// POSIX way to get the "canonical" path for a file.
char *realpath(const char *path, char *resolved_path);
The caller is supposed to supply an output buffer to
realpath() whose size is
PATH_MAX bytes. As we know, this isn’t sufficient. Unlike the
previous example of getting the current directory, POSIX doesn’t define a
variant of realpath() that accepts the size of the buffer the result
should be copied into.
As in the previous example, POSIX.1-2001 left the behavior
implementation-defined when a null pointer is used as the output parameter
(POSIX.1-2008 later standardized the allocating behavior).
As before, GNU libc takes advantage of this to turn
realpath() into an
allocating version when the second parameter is null. When using this interface,
you supply a null value for
resolved_path, and then later you’re expected to call
free() on the returned pointer. To make things even easier, GNU libc
exports a non-standard function called
canonicalize_file_name() which is like
realpath(), but only takes one argument, the file path to be resolved. Under
the hood, canonicalize_file_name() is implemented by just calling realpath()
with the supplied path, and a null parameter for the resolved path. This is
exactly analogous to the previous example, and is a good example of how
intentional ambiguity in the POSIX specification allows vendor extensions.
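The allocating interfaces make canonicalization a one-liner. This sketch uses canonicalize_file_name() when compiling against glibc and realpath() with a null second argument elsewhere, which assumes the platform implements the POSIX.1-2008 allocating behavior; the wrapper name canonical_path() is hypothetical.

```c
#define _GNU_SOURCE  /* exposes canonicalize_file_name() on glibc */
#include <stdlib.h>

/* Canonicalize `path` without guessing a buffer size.
 * Returns a malloc()ed string the caller must free(), or NULL with errno
 * set (e.g. ENOENT if some component does not exist). */
char *canonical_path(const char *path) {
#ifdef __GLIBC__
    return canonicalize_file_name(path);  /* glibc: realpath(path, NULL) */
#else
    return realpath(path, NULL);          /* allocating form from POSIX.1-2008 */
#endif
}
```

Because realpath() resolves symbolic links, it has to consult the filesystem: a path with a nonexistent component yields a null pointer rather than a purely textual cleanup.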
The GNU libc man page for
realpath() has some interesting notes about these issues and the use of
pathconf(). The source code is also interesting:
the GNU libc source code for realpath()
is nearly 200 lines long,
which includes the logic for computing the right buffer size, and a lot of very
careful error handling. This code also demonstrates the correct way to use
pathconf() with _PC_PATH_MAX. Take a gander if you’re interested in seeing
very portable, correct C file handling code.