Surprising Things Found While Exploring Bash

A few days ago I was making some changes to my .bashrc file and noticed a few interesting things regarding bash aliases and functions.

In my actual .bashrc file I had only the following lines that were related to setting up aliases:

alias grep='grep --color=auto'
alias ls='ls --color=auto'

if which vim &>/dev/null; then
    alias vi=vim
fi

But here's what I got when I typed alias:

$ alias
alias egrep='egrep --color=auto'
alias fgrep='fgrep --color=auto'
alias grep='grep --color=auto'
alias l.='ls -d .* --color=auto'
alias ll='ls -l --color=auto'
alias ls='ls --color=auto'
alias vi='vim'
alias which='(alias; declare -f) | /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot'
alias xzegrep='xzegrep --color=auto'
alias xzfgrep='xzfgrep --color=auto'
alias xzgrep='xzgrep --color=auto'
alias zegrep='zegrep --color=auto'
alias zfgrep='zfgrep --color=auto'
alias zgrep='zgrep --color=auto'

Weird, right? The only ones I had defined in my .bashrc were the aliases for grep, ls, and vi. It turns out that my distribution already adds --color=auto to ls for me, which is pretty reasonable, and I found bug 1034631, which is the origin of all the grep variants being aliased automatically. That seems a little odd, but I do understand it. However, I find it amusing that the obscure ls variant vdir isn't colorized, even though it is part of coreutils and supports colorization (perhaps I should file an RFE).

But WTF is going on with that which alias? Actually, what's going on here is pretty neat. The alias pipes the list of aliases and functions defined in the current shell into which, so which can report on those too and you can see where a name really comes from. And if I run declare -f on my system, I see that there's actually a lot of stuff in there:

$ declare -f | wc -l
2489

A couple of those are functions I defined in my own .bashrc file, but that accounts for perhaps 50 of those nearly 2500 lines of shell script.
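
If you want a rough breakdown rather than a line count, something like the following sketch works. It relies only on declare -F, which lists function names without their bodies. The two functions defined at the top (_demo_completion and demo_helper) are made-up stand-ins so the script has something to count when run outside an interactive shell; in a real session you would drop them.

```shell
#!/usr/bin/env bash
# Stand-in functions (hypothetical names) so there is something to count
# when this runs as a script rather than in an interactive shell.
_demo_completion() { :; }
demo_helper() { :; }

# declare -F prints one "declare -f NAME" line per function; keep the name.
names=()
while read -r _ _ name; do
    names+=("$name")
done < <(declare -F)

# Split the names by the leading-underscore convention.
private=0
public=0
for name in "${names[@]}"; do
    if [[ $name == _* ]]; then
        private=$((private + 1))
    else
        public=$((public + 1))
    fi
done

echo "functions: ${#names[@]} (underscore-prefixed: $private, other: $public)"
```

The underscore test is only a naming convention, so treat the split as a hint about which functions are meant to be internal, not as a guarantee.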

Nearly all of the functions that I see listed by declare -f appear to be functions that are installed for bash command completion. There are some real gems in here. Check out this one:

_known_hosts_real ()
{
    local configfile flag prefix;
    local cur curd awkcur user suffix aliases i host;
    local -a kh khd config;
    local OPTIND=1;
    while getopts "acF:p:" flag "$@"; do
        case $flag in
            a)
                aliases='yes'
            ;;
            c)
                suffix=':'
            ;;
            F)
                configfile=$OPTARG
            ;;
            p)
                prefix=$OPTARG
            ;;
        esac;
    done;
    [[ $# -lt $OPTIND ]] && echo "error: $FUNCNAME: missing mandatory argument CWORD";
    cur=${!OPTIND};
    let "OPTIND += 1";
    [[ $# -ge $OPTIND ]] && echo "error: $FUNCNAME("$@"): unprocessed arguments:" $(while [[ $# -ge $OPTIND ]]; do printf '%s\n' ${!OPTIND}; shift; done);
    [[ $cur == *@* ]] && user=${cur%@*}@ && cur=${cur#*@};
    kh=();
    if [[ -n $configfile ]]; then
        [[ -r $configfile ]] && config+=("$configfile");
    else
        for i in /etc/ssh/ssh_config ~/.ssh/config ~/.ssh2/config;
        do
            [[ -r $i ]] && config+=("$i");
        done;
    fi;
    if [[ ${#config[@]} -gt 0 ]]; then
        local OIFS=$IFS IFS='
' j;
        local -a tmpkh;
        tmpkh=($( awk 'sub("^[ \t]*([Gg][Ll][Oo][Bb][Aa][Ll]|[Uu][Ss][Ee][Rr])[Kk][Nn][Oo][Ww][Nn][Hh][Oo][Ss][Tt][Ss][Ff][Ii][Ll][Ee][ \t]+", "") { print $0 }' "${config[@]}" | sort -u ));
        IFS=$OIFS;
        for i in "${tmpkh[@]}";
        do
            while [[ $i =~ ^([^\"]*)\"([^\"]*)\"(.*)$ ]]; do
                i=${BASH_REMATCH[1]}${BASH_REMATCH[3]};
                j=${BASH_REMATCH[2]};
                __expand_tilde_by_ref j;
                [[ -r $j ]] && kh+=("$j");
            done;
            for j in $i;
            do
                __expand_tilde_by_ref j;
                [[ -r $j ]] && kh+=("$j");
            done;
        done;
    fi;
    if [[ -z $configfile ]]; then
        for i in /etc/ssh/ssh_known_hosts /etc/ssh/ssh_known_hosts2 /etc/known_hosts /etc/known_hosts2 ~/.ssh/known_hosts ~/.ssh/known_hosts2;
        do
            [[ -r $i ]] && kh+=("$i");
        done;
        for i in /etc/ssh2/knownhosts ~/.ssh2/hostkeys;
        do
            [[ -d $i ]] && khd+=("$i"/*pub);
        done;
    fi;
    if [[ ${#kh[@]} -gt 0 || ${#khd[@]} -gt 0 ]]; then
        awkcur=${cur//\//\\\/};
        awkcur=${awkcur//\./\\\.};
        curd=$awkcur;
        if [[ "$awkcur" == [0-9]*[.:]* ]]; then
            awkcur="^$awkcur[.:]*";
        else
            if [[ "$awkcur" == [0-9]* ]]; then
                awkcur="^$awkcur.*[.:]";
            else
                if [[ -z $awkcur ]]; then
                    awkcur="[a-z.:]";
                else
                    awkcur="^$awkcur";
                fi;
            fi;
        fi;
        if [[ ${#kh[@]} -gt 0 ]]; then
            COMPREPLY+=($( awk 'BEGIN {FS=","}
            /^\s*[^|\#]/ {
            sub("^@[^ ]+ +", ""); \
            sub(" .*$", ""); \
            for (i=1; i<=NF; ++i) { \
            sub("^\\[", "", $i); sub("\\](:[0-9]+)?$", "", $i); \
            if ($i !~ /[*?]/ && $i ~ /'"$awkcur"'/) {print $i} \
            }}' "${kh[@]}" 2>/dev/null ));
        fi;
        if [[ ${#khd[@]} -gt 0 ]]; then
            for i in "${khd[@]}";
            do
                if [[ "$i" == *key_22_$curd*.pub && -r "$i" ]]; then
                    host=${i/#*key_22_/};
                    host=${host/%.pub/};
                    COMPREPLY+=($host);
                fi;
            done;
        fi;
        for ((i=0; i < ${#COMPREPLY[@]}; i++ ))
        do
            COMPREPLY[i]=$prefix$user${COMPREPLY[i]}$suffix;
        done;
    fi;
    if [[ ${#config[@]} -gt 0 && -n "$aliases" ]]; then
        local hosts=$( sed -ne 's/^['"$'\t '"']*[Hh][Oo][Ss][Tt]\([Nn][Aa][Mm][Ee]\)\{0,1\}['"$'\t '"']\{1,\}\([^#*?%]*\)\(#.*\)\{0,1\}$/\2/p' "${config[@]}" );
        COMPREPLY+=($( compgen -P "$prefix$user"             -S "$suffix" -W "$hosts" -- "$cur" ));
    fi;
    if [[ -n ${COMP_KNOWN_HOSTS_WITH_AVAHI:-} ]] && type avahi-browse &> /dev/null; then
        COMPREPLY+=($( compgen -P "$prefix$user" -S "$suffix" -W             "$( avahi-browse -cpr _workstation._tcp 2>/dev/null |                  awk -F';' '/^=/ { print $7 }' | sort -u )" -- "$cur" ));
    fi;
    COMPREPLY+=($( compgen -W         "$( ruptime 2>/dev/null | awk '!/^ruptime:/ { print $1 }' )"         -- "$cur" ));
    if [[ -n ${COMP_KNOWN_HOSTS_WITH_HOSTFILE-1} ]]; then
        COMPREPLY+=($( compgen -A hostname -P "$prefix$user" -S "$suffix" -- "$cur" ));
    fi;
    __ltrim_colon_completions "$prefix$user$cur";
    return 0
}

Huh? It turns out that when the functions are loaded into bash they're stripped of comments, so the declare -f output makes it look a bit more cryptic than it really is. If you look at the bash-completion source code, you'll see that the original is actually well commented, and that the other weird functions showing up in my shell (including a bunch of non-underscore-prefixed functions like quote and dequote) come from this package. It's still fun to look at. You know you're in for a treat whenever you see IFS being redefined. You can see a lot of fun things like hacks for the lack of case-insensitive regular expressions in awk and sed, and the rather remarkable fact that avahi can be used for host discovery from bash. From this code I also learned about the existence of bash regular expressions.
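
Two of the things mentioned above are easy to try in isolation. The sketch below first shows that declare -f prints the parsed function, not its source text, and then runs a small version of the quoted-field loop from _known_hosts_real. The greet function and the sample line are invented for the demonstration, and I keep the pattern in a variable (re), which is my choice here, not what the original code does.

```shell
#!/usr/bin/env bash

# Part 1: comments do not survive in declare -f output.
# greet is a hypothetical function defined only for this demonstration.
greet() {
    # This comment exists in the source text only.
    local name=${1:-world}
    echo "hello, $name"
}
listing=$(declare -f greet)
printf '%s\n' "$listing"

# Part 2: peel double-quoted fields off a line with [[ =~ ]] and BASH_REMATCH,
# modeled on the loop in _known_hosts_real. The sample line is made up.
line='"/tmp/known hosts" /etc/ssh/known_hosts "/tmp/other"'
re='^([^"]*)"([^"]*)"(.*)$'
quoted=()
while [[ $line =~ $re ]]; do
    quoted+=("${BASH_REMATCH[2]}")               # text between the quotes
    line=${BASH_REMATCH[1]}${BASH_REMATCH[3]}    # the line with that field removed
done
printf 'quoted field: %s\n' "${quoted[@]}"
printf 'remainder: [%s]\n' "$line"
```

Each pass through the loop removes one quoted field, so the loop ends as soon as the line contains no more pairs of double quotes. Whatever is left can then be split on whitespace, which is what the second for loop in the original function does.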

It's also interesting to see just how many commands have completion, and how much code has gone into all of this:

$ complete -p | wc -l
314

$ find /usr/share/bash-completion -type f | wc -l
543

$ find /usr/share/bash-completion -type f | xargs wc -l
...
40545 total

Wow. That's a lot of code. Who wrote it all? Does it all work? Is everything quoted properly? Are there security bugs?
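
For a sense of what a single one of those registrations involves, here is a minimal sketch of a completion function. The command name demo_fruit, the function _demo_fruit, and the word list are all made up; only complete, compgen, COMP_WORDS, COMP_CWORD, and COMPREPLY are real bash features. Normally bash sets COMP_WORDS and COMP_CWORD itself when you press Tab, so setting them by hand here is just a way to exercise the function from a script.

```shell
#!/usr/bin/env bash
# demo_fruit is a made-up command; _demo_fruit is its completion function.
_demo_fruit() {
    local cur=${COMP_WORDS[COMP_CWORD]}
    # compgen -W filters the word list down to entries that start with $cur.
    COMPREPLY=($(compgen -W "apple apricot banana cherry" -- "$cur"))
}
complete -F _demo_fruit demo_fruit

# Simulate the state bash would set up for "demo_fruit ap<TAB>".
COMP_WORDS=(demo_fruit ap)
COMP_CWORD=1
_demo_fruit
printf 'candidate: %s\n' "${COMPREPLY[@]}"

# The registration now appears in the complete -p listing.
complete -p demo_fruit
```

The unquoted command substitution that fills COMPREPLY is the usual idiom, and it is also exactly the kind of place where the quoting question bites: it splits on whitespace, so candidates containing spaces need more care than this sketch takes.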

There are even amusing things in here like completion code for cd. That's right, the shell builtin cd has custom completion code. You might think that to complete cd you just have to look at the directories in the current working directory... not so. Two settings affect how the builtin cd resolves its argument: the CDPATH variable, which lists directories other than the current working directory that should be searched, and the shopt option cdable_vars, which lets cd treat an argument that isn't a directory as the name of a variable holding the target. The bash-completion package has code that handles both of these cases.
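
Both behaviors are easy to see in a throwaway directory tree. This is a small sketch, with every directory and variable name invented for the example; it only uses cd, CDPATH, and shopt -s cdable_vars as documented in the bash manual.

```shell
#!/usr/bin/env bash
# Scratch directories under a temporary root; all names here are made up.
root=$(mktemp -d)
mkdir -p "$root/projects/blog" "$root/elsewhere" "$root/data"

cd "$root/elsewhere" || exit 1

# CDPATH: a relative argument is also looked up under each listed directory.
CDPATH=$root/projects
cd blog >/dev/null || exit 1     # cd prints the directory it picked; discard that
via_cdpath=$PWD
unset CDPATH

# cdable_vars: if the argument is not a directory, try it as a variable name.
shopt -s cdable_vars
datadir=$root/data
cd datadir >/dev/null || exit 1
via_var=$PWD

echo "CDPATH took us to:      $via_cdpath"
echo "cdable_vars took us to: $via_var"

# Clean up the scratch tree.
cd / && rm -rf "$root"
```

Neither blog nor datadir exists as a directory where cd is run, which is why a completion that only listed the current directory's subdirectories would miss both.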

A lot has already been written about how Unix systems, which used to be quite simple and easy to understand, have turned into incredibly baroque and complicated systems. This is a complaint that people level particularly at Linux and the GNU utilities, which are perceived to be especially complex compared to systems with a BSD lineage. After finding things like this, it's hard to disagree. At the same time, this complexity is there for a reason, and I'm not sure I'd want to give any of it up.