Way back in 2006 (can you believe it's been almost ten years?) Python 2.5 added
two new awesome builtin language functions:
any() and all(). Both of these
take an iterable, often built with an optional conditional using the standard list/generator
comprehension syntax. any() returns true if any item in the iterable is true-ish;
all() returns true if all items in the iterable are true-ish.
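Here's a quick sketch of those true-ish rules. The sample values are made up for illustration, and the empty-iterable cases are included because they tend to surprise people:

```python
values = [0, "", None, 3]

# any() is true as soon as one item is truthy (here, the 3)
some_truthy = any(values)

# all() is true only if every item is truthy (the 0 already fails)
every_truthy = all(values)

# Edge cases: nothing in an empty iterable is truthy,
# but nothing in it is falsy either
any_of_empty = any([])
all_of_empty = all([])

print(some_truthy, every_truthy, any_of_empty, all_of_empty)
```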
Here is a mistake I see over and over and over again (code example is obviously contrived here):
    # creates a list and evaluates every user
    if any([user.is_cool() for user in users if is_prime(user.id)]):
        unleash_awesome_machine()
This code is correct, in the sense that it works. But:
- it's overly verbose
- it uses more cpu than necessary
- it uses more memory than necessary
- it causes an unnecessary GC
The correct way to write this code is only very slightly different:
    # creates a generator and does not evaluate every user
    if any(user.is_cool() for user in users if is_prime(user.id)):
        unleash_awesome_machine()
See, the only thing I changed is I removed those two square brackets. That's it.
The difference between the two is that the first uses the list comprehension
syntax. This creates an actual Python list object which is passed to the
any() function. Since this is a list comprehension, every user in
users will be evaluated.
The "correct" way I've shown uses a generator comprehension instead. This
creates a lazy generator object that pulls from
users one user at a time.
any() in that case can exit upon the first user detected that meets the condition, without evaluating the rest.
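To make the difference concrete, here's a minimal sketch that counts how many times the check actually runs. The is_cool() function and the user ids are hypothetical stand-ins for user.is_cool() and users, not code from a real project:

```python
calls = []

def is_cool(user_id):
    # Hypothetical stand-in for user.is_cool(); logs every call
    calls.append(user_id)
    return user_id == 2

user_ids = [1, 2, 3, 4, 5]

# List comprehension: the whole list is built before any() sees it
any([is_cool(u) for u in user_ids])
list_calls = len(calls)

del calls[:]  # reset the call log

# Generator expression: any() stops at the first truthy result
any(is_cool(u) for u in user_ids)
gen_calls = len(calls)

print("list: %d calls, generator: %d calls" % (list_calls, gen_calls))
```

With the list version the check runs for all five users; with the generator version it stops after the second one.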
You can also create explicit generator objects in Python using the generator comprehension syntax. The syntax looks like this:
x = (user.is_cool() for user in users if is_prime(user.id))
This creates a generator object. You can't call
len() on it. Instead it has
some weird methods like
.next() defined. If you really want to know the
details of how it works read the
official Python wiki which explains
things in plain English and also refers to the original PEP (which unfortunately
does not seem to cover the generator comprehension syntax).
I'll give an extremely easy to follow, simple example.
Consider the following python shell session:
    >>> x = [1, 2, 3]
    >>> type(x)
    <type 'list'>
    >>> len(x)
    3
    >>> x[1]
    2
It's just a list. Easy as pie. Here's the generator equivalent:
    >>> x = (x for x in [1, 2, 3])
    >>> type(x)
    <type 'generator'>
    >>> len(x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object of type 'generator' has no len()
    >>> x.next()
    1
    >>> x.next()
    2
    >>> x.next()
    3
    >>> x.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
Clearly something much weirder is going on. Generators are lazily evaluated. Now this case is dumb, because I'm lazily evaluating a non-lazy object (a list) to prove a point.
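One caveat on the session above: the .next() method is Python 2 only (Python 3 renamed it to __next__()). Here's a version-neutral sketch using the builtin next(), which exists from Python 2.6 onward. The variable names are mine, purely for illustration:

```python
gen = (n for n in [1, 2, 3])

# next() pulls one value at a time from the generator
first = next(gen)
second = next(gen)
third = next(gen)

# Passing a default avoids the StopIteration once it is exhausted
after_end = next(gen, "exhausted")

# Generators have no length because the values are produced on demand
try:
    len(gen)
    has_len = True
except TypeError:
    has_len = False
```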
Here's a less obvious and more deleterious example (although it's still contrived):
    import re

    CAPITAL_AT_START = re.compile('^[A-Z]')

    # reads the ENTIRE FILE, creates a HUGE list object, and then has to GC it
    def any_capitalized_1(filename):
        with open(filename) as f:
            return any(CAPITAL_AT_START.match(line) for line in f.readlines())

    # lazily reads the file, creates less GC pressure
    def any_capitalized_2(filename):
        with open(filename) as f:
            return any(CAPITAL_AT_START.match(line) for line in f)
This example is a little bit more subtle because both are actually using generator comprehensions, but the first is using a generator comprehension over an eagerly evaluated list object, whereas the second is using a generator comprehension over a lazily evaluated file object (an iterator that yields one line at a time).
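If you want to see the eager/lazy difference for yourself, here's a rough sketch. It writes a throwaway temp file, then compares how many lines readlines() materializes against how many lines the lazy version pulls before any() exits. The counting() helper and the file contents are assumptions made up for this demo:

```python
import os
import re
import tempfile

CAPITAL_AT_START = re.compile('^[A-Z]')

def counting(lines, counter):
    # Hypothetical helper: passes lines through while counting them
    for line in lines:
        counter[0] += 1
        yield line

# Throwaway file: the second of 1002 lines starts with a capital letter
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('lower\nUpper\n' + 'lower\n' * 1000)
    path = tmp.name

# Eager: readlines() builds a list of every line up front
with open(path) as f:
    eager_lines = f.readlines()
    eager_count = len(eager_lines)
    eager_result = any(CAPITAL_AT_START.match(line) for line in eager_lines)

# Lazy: iterating the file object hands over one line at a time
lazy_seen = [0]
with open(path) as f:
    lazy_result = any(CAPITAL_AT_START.match(line)
                      for line in counting(f, lazy_seen))

os.remove(path)
print("eager held %d lines, lazy pulled %d" % (eager_count, lazy_seen[0]))
```

Both versions give the same answer, but the eager one holds every line in memory first. The file object may still buffer a chunk of bytes internally, so this counts lines handed to any(), not bytes read from disk.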
Here's my real takeaway. Any time you see any([...]) or all([...]) you
should be very suspicious that the square brackets are necessary. You can almost
always remove them and get code that runs faster, uses less memory, and saves a
whole two bytes of disk space!