I am an avid Emacs user, and helm and projectile are big parts of my workflow. I combine them using helm-projectile, which provides helm file completion with projectile as the backend.[^1] When I'm in a project and want to open a file, I type `SPC p f` to invoke `helm-projectile-find-file`. This opens a fuzzy-matching helm dialog over the files in my current project.
A few months ago I started a new job where I work in a very large git repository (well over 100k checked-in files). For a project this size, `helm-projectile-find-file` with fuzzy matching is very slow. I'd hit `SPC p f` to find a file, and it would take 3 to 5 seconds to update the helm dialog after every keystroke. If I was typing a file name like `logging.h`, it would take helm several seconds to refresh after I typed the `l`, several more to refresh after I typed the `o`, and so on. I would end up typing most of the file name blindly. Worse, there seems to be a race condition in helm in this situation: if too many keys are typed blindly this way, the helm dialog sometimes gets out of sync and stops updating or responding to keystrokes.
A Red Herring: Projectile Caching
In my initial investigations, the only fix I was able to find was the `projectile-enable-caching` variable, which can be set to `t` to enable projectile caching:
```elisp
;; XXX: Don't actually do this!
(setq projectile-enable-caching t)
```
Normally, projectile generates the candidate list of files for `helm-projectile-find-file` by running `git ls-files` every time the command is invoked. It uses git instead of just searching the filesystem so that `helm-projectile-find-file` only gets files that are actually checked in, which keeps git-ignored files out of the helm query. Setting `projectile-enable-caching` causes the `git ls-files` list to be generated only once and then cached, avoiding the need to invoke git every time.
Unfortunately this didn't speed anything up; helm felt just as slow as before. To check whether git was the bottleneck, I ran `git ls-files` a few times and found it took only 0.40s of wall time, an insignificant fraction of the time I was spending waiting. Watching `top` while searching for a file confirmed this: the Emacs process was spinning at 100% CPU (in a single thread, of course), indicating that the time was spent in some CPU-bound operation inside Emacs rather than waiting for the git command to finish.
One last note: if you turn on `projectile-enable-caching`, the projectile file cache will get out of sync as the repository changes. New files won't appear in your queries, and deleted or moved files will still appear even though they're no longer there. If you use this option, you'll have to manually refresh the projectile cache whenever you notice it's out of sync. I would recommend using this option only as a last resort, once you've confirmed that your VCS is extremely slow.
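To make the caching trade-off concrete, here is a minimal sketch of per-project file-list caching in Emacs Lisp. It is not projectile's implementation: the names `my/project-file-cache`, `my/project-files`, and `my/invalidate-project-files` are hypothetical, and the file lister is passed in as a function so the sketch doesn't depend on a real repository (in practice it might run `git ls-files` in the project root). It shows both why a cache avoids re-running the lister and why the cached list goes stale.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical sketch of per-project file-list caching.
;; This is NOT projectile's code; all names here are made up for illustration.

(defvar my/project-file-cache (make-hash-table :test #'equal)
  "Map from a project root (a string) to its cached list of files.")

(defun my/project-files (root lister)
  "Return the file list for ROOT, calling LISTER only on a cache miss.
LISTER is a function of one argument (the root) that returns a list
of file names, for example by running `git ls-files'."
  ;; On a hit, return the stored list without calling LISTER at all;
  ;; on a miss, compute the list once and store it (`puthash' returns it).
  (or (gethash root my/project-file-cache)
      (puthash root (funcall lister root) my/project-file-cache)))

(defun my/invalidate-project-files (root)
  "Drop the cached file list for ROOT so the next lookup re-runs the lister."
  (remhash root my/project-file-cache))
```

With a lister that counts its calls, two lookups of the same root invoke it only once. A file added to the repository after the first lookup does not show up until the cache entry is invalidated, which is exactly the staleness described above. Projectile's own way to do that refresh is the `projectile-invalidate-cache` command.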
The Solution: Exact Matching Helm Queries
Recently a fellow helm and projectile user at work found a solution to this problem! It turns out there's a simple way to disable helm fuzzy matching: precede the helm query with a space. For example, say I know the file I want to open is named `logging.h`, but I don't necessarily know (or want to type out) the directory it's in. Instead of typing just `logging.h` into the helm dialog, I type a leading space and then the file name. I call this mode "exact matching".
The difference between exact matching and the default fuzzy matching is that exact matching only matches file names that contain the string you type as a substring, whereas fuzzy matching matches a broader selection. For example, consider the query `foo`. In fuzzy matching, `foo` is implicitly converted to a regex like `/f.*o.*o/`. This matches any file name that has an `f`, followed by an `o` anywhere later in the string, followed by another `o` anywhere after that. In exact matching mode, the same query is converted to the fixed regex `/foo/`, i.e. a regex that matches any string containing the literal substring `foo`.
As a concrete example, `foo` will match a file named `src/files/blog.socket` in fuzzy matching mode, but not in exact matching mode. Both modes will match a file named `src/files/foo.socket`. In general, fuzzy matching results are always a superset of exact matching results.
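The two modes can be modeled in a few lines of Emacs Lisp. This is a simplified sketch of the behavior described above, not helm's actual matching code (helm's real fuzzy matching also does scoring and other work); every function name here is hypothetical. The fuzzy regex joins the quoted characters of the query with `.*`, the exact regex is just the quoted query, and a leading space selects exact mode.

```elisp
;;; -*- lexical-binding: t; -*-
;; Simplified model of fuzzy vs. exact matching.
;; Hypothetical names; this is NOT helm's implementation.
(require 'seq)

(defun my/fuzzy-regexp (query)
  "Turn QUERY like \"foo\" into a regexp like \"f.*o.*o\"."
  ;; Quote each character so regexp specials such as \".\" match literally.
  (mapconcat (lambda (ch) (regexp-quote (char-to-string ch)))
             query ".*"))

(defun my/exact-regexp (query)
  "Turn QUERY into a regexp that matches it as a literal substring."
  (regexp-quote query))

(defun my/query-regexp (input)
  "Pick a regexp for INPUT: a leading space means exact matching."
  (if (string-prefix-p " " input)
      (my/exact-regexp (substring input 1))
    (my/fuzzy-regexp input)))

(defun my/filter-files (input files)
  "Return the members of FILES that INPUT matches."
  (let ((re (my/query-regexp input)))
    (seq-filter (lambda (f) (string-match-p re f)) files)))
```

With the two paths from the example, `(my/filter-files "foo" files)` keeps both, while `(my/filter-files " foo" files)` keeps only `src/files/foo.socket`. Any string containing the literal substring `foo` also satisfies `f.*o.*o`, which is why the exact results are always a subset of the fuzzy ones in this model.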
Using exact matching might sound less convenient than fuzzy matching, but I've found it often works just as well (and sometimes better). It's very common to know an exact substring of a file name without knowing the directory it's in, or some leading or trailing component of the name. Exact matching works great in this situation. In the example I gave earlier, in a huge project I might know there is a header named `logging.h` somewhere but not be sure which subdirectory it's in, and exact matching works perfectly for this. In fact, since I learned this trick I use it quite often on smaller projects too, because in some cases it gives better matching results, particularly when typing a very short string.
[^1]: I actually use Spacemacs, a wonderful Emacs distribution that has helm and projectile set up and integrated out of the box.