Environment Variables

Environment variables are one of the aspects of Unix programming that a lot of people understand poorly. In particular, I've noticed that many people misunderstand how "exported" environment variables work. I think the root of this misunderstanding is that people primarily encounter environment variables in shell scripts, where the syntax for accessing environment variables is identical to the syntax for accessing regular variables. So I'm going to explain environment variables the other way around, starting outside of shell programming, which I think makes the whole concept easier to understand.

On Unix systems every process has an environment associated with it. The environment is a mapping of string keys to string values. In a C program you look up the value of a variable with getenv(), set a value with setenv() or putenv(), and delete a variable with unsetenv(). Names and values cannot contain null bytes, and names cannot contain an equals sign (each entry is stored as a NAME=value string), but otherwise there are no restrictions on the contents. Environment variable names are typically written in all upper case, but this is just a convention.
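The same operations are available from Python through os.environ, which can be handy for experimenting. This is a minimal sketch; DEMO_GREETING, DEMO_NOT_SET and DEMO_BAD are made-up names used only for illustration.

```python
import os

# os.environ is Python's view of the process environment. Reading it is the
# equivalent of getenv(); assigning or deleting keys also updates the C-level
# environment (via putenv()/setenv() and unsetenv()), so child processes see it.
os.environ["DEMO_GREETING"] = "hello world"
greeting = os.environ.get("DEMO_GREETING")
print(greeting)  # hello world

fallback = os.getenv("DEMO_NOT_SET", "default")  # missing key -> default
print(fallback)  # default

del os.environ["DEMO_GREETING"]
print("DEMO_GREETING" in os.environ)  # False

# Each entry is stored as a NUL-terminated "NAME=value" C string, so a value
# cannot contain a NUL byte; Python rejects one with ValueError.
try:
    os.environ["DEMO_BAD"] = "a\0b"
    nul_rejected = False
except ValueError:
    nul_rejected = True
print(nul_rejected)  # True
```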

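New processes are created on Unix by calling fork(), which gives the child a copy of the parent's memory (including its environment), followed by a call to one of the [exec](https://en.wikipedia.org/wiki/Exec_(computing)) family of functions. Which exec variant is used determines the initial environment of the new program: execve() and execle() take an explicit environment array, while variants such as execv() and execvp() pass along the calling process's current environment. In the most common situation, then, the new process inherits a copy of its parent's environment.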
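To make the fork-then-exec sequence concrete, here is a minimal Unix-only sketch using os.fork(), os.execv() and os.execve(), which are thin wrappers over the C calls. The child is a tiny Python program that prints DEMO_VAR (a hypothetical name); run_child is likewise a made-up helper, and the pipe exists only so the parent can collect the child's output.

```python
import os
import sys

os.environ["DEMO_VAR"] = "from-parent"  # hypothetical variable for the demo

def run_child(env=None):
    """Fork and exec a child that prints DEMO_VAR; return what it printed.

    env=None uses execv(), which hands the child the current environment;
    a dict uses execve(), which gives the child exactly that environment.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(read_fd)
        os.dup2(write_fd, 1)  # the child's stdout now goes into the pipe
        argv = [sys.executable, "-c",
                "import os; print(os.environ.get('DEMO_VAR'))"]
        try:
            if env is None:
                os.execv(sys.executable, argv)
            else:
                os.execve(sys.executable, argv, env)
        finally:
            os._exit(127)  # only reached if exec failed
    os.close(write_fd)  # parent keeps only the read end
    with os.fdopen(read_fd) as pipe:
        output = pipe.read().strip()
    os.waitpid(pid, 0)
    return output

inherited = run_child()
explicit = run_child(dict(os.environ, DEMO_VAR="explicit"))
removed = run_child({k: v for k, v in os.environ.items() if k != "DEMO_VAR"})
print(inherited, explicit, removed)  # from-parent explicit None
```

The explicit environments are built by copying os.environ rather than starting from an empty dict, so the child Python still finds anything it might need (such as library paths) to start up.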

This is also what happens when you run a shell script: if the script launches new processes, those processes inherit a copy of the script's environment. There's nothing magic about shell scripts and environment variables. They work exactly the same way as in any other program on a Unix system.
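A quick way to check this is to start sh with an environment containing a made-up variable, DEMO_VAR, and have it run printenv. On typical systems printenv is a separate program rather than a shell builtin, so the value it prints can only have come from the environment the shell passed down to it. A minimal sketch:

```python
import os
import subprocess

# Copy the parent's environment and add one (hypothetical) variable.
env = dict(os.environ, DEMO_VAR="set-by-python")

# sh inherits env; printenv is a child of sh, so it inherits sh's copy in turn.
result = subprocess.run(
    ["sh", "-c", "printenv DEMO_VAR"],
    env=env, capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # set-by-python
```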

What is confusing about shell scripts is that the syntax for using environment variables is exactly the same as the syntax for using regular variables. For instance, in a shell script you can typically access $USER as if the variable were defined somewhere in the script. You can think of what happens as a two-level lookup: the shell first looks for a variable the script itself has set, and if there is none it falls back to the value inherited from the environment. (Shells such as bash actually copy the entire environment into their own variable table at startup instead of calling getenv() on every lookup, but the visible behavior is the same.)
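The lookup is easy to observe. In this sketch DEMO_USER is a made-up name standing in for something like $USER: the script never assigns it, yet "$DEMO_USER" expands to the inherited value, and once the script assigns its own value, later expansions see that instead.

```python
import os
import subprocess

script = '''
echo "inherited: $DEMO_USER"
DEMO_USER=local
echo "after assignment: $DEMO_USER"
'''
result = subprocess.run(
    ["sh", "-c", script],
    env=dict(os.environ, DEMO_USER="alice"),  # hypothetical inherited value
    capture_output=True, text=True, check=True,
)
print(result.stdout)
# inherited: alice
# after assignment: local
```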

When you set a variable in a shell script, what happens depends on whether that variable came from the environment. If it did, the new value is automatically passed on to new processes, just as if the environment had been updated with putenv(). Such variables are "exported": the value you set will be visible to new processes. If the variable isn't in the environment, the shell will not add it, so if your script has a line like FOO=1 then by default FOO won't be visible in the environment of new processes created by the shell. To add a new variable to the environment you must tell the shell to do so with the export statement (export FOO, or export FOO=1 to assign and export in one step). You only need to use export once per variable, since later assignments are exported automatically, and you only need to export variables that actually have meaning to subprocesses. A common (but harmless) mistake is to export too many variables because of a lack of understanding of what export actually does.
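The export rules can be checked by asking a child process what it sees. This sketch uses two hypothetical variables: DEMO_NEW is created inside the script, while DEMO_INHERITED is already in the environment the shell starts with. printenv runs as a separate process, so it only reports what the shell actually exports.

```python
import os
import subprocess

# Start from the parent's environment without DEMO_NEW, and pre-export
# DEMO_INHERITED so the shell receives it from its environment.
env = {k: v for k, v in os.environ.items() if k != "DEMO_NEW"}
env["DEMO_INHERITED"] = "original"

script = '''
DEMO_NEW=1
echo "before export: [$(printenv DEMO_NEW)]"
export DEMO_NEW
echo "after export: [$(printenv DEMO_NEW)]"
DEMO_INHERITED=changed
echo "inherited: [$(printenv DEMO_INHERITED)]"
'''
result = subprocess.run(["sh", "-c", script], env=env,
                        capture_output=True, text=True, check=True)
print(result.stdout)
# before export: []
# after export: [1]
# inherited: [changed]
```

Note that the last line needs no export at all: the variable came from the environment, so the shell already treats it as exported.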

Debugging

If you understand how environment variables work as described above, you usually won't run into any problems. However, from time to time you may need to debug complex shell scripts or programs where it's not immediately obvious what is in the environment. On Linux there's a fun trick for reading the environment of a running process: the proc pseudo-filesystem exposes it as a file called /proc/PID/environ. This file shows the environment the process was started with; changes the process makes later (for example with setenv() or putenv()) are not reflected in it. The entries are NAME=value strings separated by null bytes, so in Python you can decode them with some code like:

print(open('/proc/self/environ', 'rb').read().split(b'\0'))
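Building on that one-liner, here is a slightly fuller sketch that parses /proc/PID/environ into a dictionary and reads it back from a child started with a known variable. It assumes Linux and permission to read the target's file (normally fine for your own processes); read_environ and DEMO_VAR are hypothetical names. Values are kept as bytes because nothing requires them to be valid UTF-8.

```python
import os
import subprocess
import sys

def read_environ(pid):
    """Return the environment of process `pid` as a dict of bytes -> bytes.

    Linux-only: parses /proc/<pid>/environ, whose NAME=value entries are
    separated by NUL bytes (with a trailing NUL after the last one).
    """
    with open(f"/proc/{pid}/environ", "rb") as f:
        data = f.read()
    env = {}
    for entry in data.split(b"\0"):
        if not entry:
            continue  # skip the empty item left by the trailing NUL
        name, _, value = entry.partition(b"=")
        env[name] = value
    return env

# Start a long-running child with a known (hypothetical) variable, read it back.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    env=dict(os.environ, DEMO_VAR="hello"),
)
try:
    value = read_environ(child.pid)[b"DEMO_VAR"]
finally:
    child.kill()
    child.wait()
print(value)  # b'hello'
```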

You can only read the environment this way; the file is read-only. If you're ever in the unfortunate position of needing to update the environment of a running process, you need to attach to it with GDB and call setenv() from inside it.
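As a sketch of what that looks like, the hypothetical helper below builds a gdb command line that attaches to a process, calls setenv() inside it, and detaches. It assumes gdb is installed and that you are allowed to ptrace the target (kernel settings such as Yama's ptrace_scope may forbid it). The (int) cast tells gdb the return type of setenv() when the target has no debug information. The PID is a placeholder.

```python
import shlex

def gdb_setenv_command(pid, name, value):
    """Build a gdb argv that calls setenv(name, value, 1) inside process `pid`."""
    def c_string(s):
        # Minimal C string-literal escaping for backslashes and double quotes.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    expr = f"call (int) setenv({c_string(name)}, {c_string(value)}, 1)"
    return ["gdb", "-p", str(pid), "-batch", "-ex", expr]

cmd = gdb_setenv_command(12345, "FOO", "bar")  # 12345 is a placeholder PID
print(shlex.join(cmd))
```

Passing cmd to subprocess.run() would actually perform the call. The change only affects that one process and any children it starts afterwards, and it will not show up in /proc/PID/environ.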