Generator Comprehensions / Using any() and all() in Python

February 27, 2016

Way back in 2006 (can you believe it's been almost ten years?) Python 2.5 added two awesome new builtin functions: any() and all(). Each takes a single iterable, which people usually write inline using the standard list comprehension or generator syntax, optionally with a filter condition. any() returns True if at least one item in the iterable is true-ish, and all() returns True if every item in the iterable is true-ish.
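Here's a quick sketch of those truthiness rules, including what happens with an empty iterable. The values are made up for illustration, and I'm assuming Python 3 syntax:

```python
# Made-up sample data: three falsy values and one truthy value
values = [0, "", None, 3]

any_result = any(values)   # True, because 3 is truthy
all_result = all(values)   # False, because 0, "" and None are falsy

# Edge cases: an empty iterable
empty_any = any([])        # False: no truthy item was found
empty_all = all([])        # True: no falsy item was found

print(any_result, all_result, empty_any, empty_all)
```

The empty case for all() surprises people, so it's worth remembering when the input might be empty.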

Here is a mistake I see over and over and over again (code example is obviously contrived here):

# creates a list and evaluates every user
if any([user.is_cool() for user in users if is_prime(user.id)]):
    unleash_awesome_machine()

This code is correct, in the sense that it works. But:

it's overly verbose
it uses more cpu than necessary
it uses more memory than necessary
it causes an unnecessary GC

The correct way to write this code is only very slightly different:

# creates a generator and stops at the first cool user
if any(user.is_cool() for user in users if is_prime(user.id)):
    unleash_awesome_machine()

See, the only thing I changed is I removed those two square brackets. That's it.

The difference between the two is that the first uses the list comprehension syntax. This creates an actual Python list object which is passed to the any() function. Since this is a list comprehension, the whole list is built before any() even starts: every user in the users variable is run through the is_prime() filter, and is_cool() is called on every user that passes it.

The "correct" way I've shown uses a generator comprehension instead (the official docs call this a generator expression). This creates a lazy generator object that pulls from users one user at a time. The function any() in that case can exit as soon as it hits the first user that meets our criteria.
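To make the short-circuiting visible, here's a minimal sketch that counts how many times the check actually runs. The is_cool() function and the user_ids list are hypothetical stand-ins I made up for this demo, and I'm assuming Python 3:

```python
calls = []

def is_cool(user_id):
    """Hypothetical stand-in for user.is_cool(); records each call."""
    calls.append(user_id)
    return user_id % 2 == 0

user_ids = [1, 2, 3, 4, 5]

# List comprehension: the whole list is built before any() sees it
calls.clear()
any([is_cool(u) for u in user_ids])
list_calls = len(calls)

# Generator: any() stops pulling at the first truthy result
calls.clear()
any(is_cool(u) for u in user_ids)
gen_calls = len(calls)

print(list_calls, gen_calls)  # 5 2
```

With the brackets, all five users get checked. Without them, any() stops after the second one.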

You can also create explicit generator objects in Python using the generator comprehension syntax. The syntax looks like this:

x = (user.is_cool() for user in users if is_prime(user.id))

This creates a generator object. You can't call len() on it. Instead it has some weird methods like .next() defined (in Python 3 the method is spelled .__next__() and you'd normally call the builtin next(x) instead). If you really want to know the details of how it works, read the official Python wiki, which explains things in plain English and also refers to the original PEP (which unfortunately does not seem to cover the generator comprehension syntax; that syntax is specified separately in PEP 289).

I'll give an extremely easy to follow, simple example.

Consider the following Python 2 shell session:

>>> x = [1, 2, 3]
>>> type(x)
<type 'list'>
>>> len(x)
3
>>> x[1]
2

It's just a list. Easy as pie. Here's the generator equivalent:

>>> x = (x for x in [1, 2, 3])
>>> type(x)
<type 'generator'>
>>> len(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()
>>> x.next()
1
>>> x.next()
2
>>> x.next()
3
>>> x.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Clearly something much weirder is going on. Generators are lazily evaluated. Now this case is dumb, because I'm lazily evaluating a non-lazy object (a list) to prove a point.
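For a case where the laziness actually matters, here's a small sketch over an infinite source. It assumes Python 3 and uses the builtin next() rather than the .next() method. The squares stream is just something I made up for the demo:

```python
import itertools

# An infinite, lazily produced stream: 0, 1, 4, 9, 16, ...
squares = (n * n for n in itertools.count())

first = next(squares)    # 0
second = next(squares)   # 1

# any() pulls values only until one passes the test, so this terminates
# even though the stream never ends
found = any(sq > 50 for sq in squares)

# The generator picks up right where any() stopped (just after 64)
after = next(squares)

print(first, second, found, after)  # 0 1 True 81
```

Try that with square brackets around the inner expression and your machine will be busy for a very long time.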

Here's a less obvious and more harmful example (although it's still contrived):

import re

# matches a capital letter at the start of a line
CAPITAL_AT_START = re.compile(r'^[A-Z]')

# reads the ENTIRE FILE, creates a HUGE list object, and then has to GC it
def any_capitalized_1(filename):
    with open(filename) as f:
        return any(CAPITAL_AT_START.match(line) for line in f.readlines())


# lazily reads the file, creates less GC pressure
def any_capitalized_2(filename):
    with open(filename) as f:
        return any(CAPITAL_AT_START.match(line) for line in f)

This example is a little bit more subtle because both are actually using generator comprehensions, but the first is using a generator comprehension over an eagerly evaluated list object (readlines() returns a list of every line), whereas the second is using a generator comprehension over a lazily evaluated iterator (the file object itself, which hands out one line at a time).
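Here's a rough way to see the difference for yourself. This is only a sketch under my own assumptions: the sample file contents and the counting_lines() helper are made up for the demo, and it assumes Python 3:

```python
import os
import re
import tempfile

CAPITAL_AT_START = re.compile(r'^[A-Z]')

def counting_lines(fileobj, counter):
    """Hypothetical helper: yield lines, tallying how many were pulled."""
    for line in fileobj:
        counter[0] += 1
        yield line

# Made-up sample file: a capitalized line near the top, then filler
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('lower\nUpper\n' + 'filler\n' * 1000)
    path = tmp.name

# Eager: readlines() puts every line in memory at once
with open(path) as f:
    lines_loaded = len(f.readlines())

# Lazy: any() only pulls lines until the first match
pulled = [0]
with open(path) as f:
    found = any(CAPITAL_AT_START.match(line)
                for line in counting_lines(f, pulled))
lines_pulled = pulled[0]

os.remove(path)
print(lines_loaded, lines_pulled, found)  # 1002 2 True
```

The file object still reads from disk in buffered chunks, so this counts lines handed to any(), not bytes read. The point is that the lazy version never builds a 1002-element list.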

Here's my real takeaway. Any time you see any([...]) or all([...]) you should be very suspicious of those square brackets. You can almost always remove them and get code that runs faster, uses less memory, and saves a whole two bytes of disk space!
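The same thing applies to all(), which stops at the first false-ish item. The sketch below also shows the one situation I can think of where the brackets change behavior: when the expression has side effects you need to happen for every item. The is_valid() function and the numbers list are hypothetical, and I'm assuming Python 3:

```python
checked = []

def is_valid(n):
    """Hypothetical validator that records each value it inspects."""
    checked.append(n)
    return n > 0

numbers = [3, 5, -1, 7, 9]

# Without brackets: all() stops at the first false-ish result
all_ok = all(is_valid(n) for n in numbers)
checked_lazy = list(checked)

# With brackets: every call happens before all() even starts
del checked[:]
all_ok_eager = all([is_valid(n) for n in numbers])
checked_eager = list(checked)

print(checked_lazy)   # [3, 5, -1]
print(checked_eager)  # [3, 5, -1, 7, 9]
```

Both versions return False. If you're relying on that side effect, though, an explicit loop is probably clearer than keeping the brackets.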