Sometimes you want to write some code that maintains a running sorted list of items. For example, let's say you have some code that services requests and maintains running median or percentiles of response times. You might imagine some Python code like this:
import time
class Service:
def __init__(self):
self.latencies = []
def work(self, request):
"""Do some work and time it."""
start = time.time()
# work implemented here...
# record latency and maintain sortedness of latencies
self.record_latency(start)
def record_latency(self, start):
"""Record elapsed latency and keep self.latencies sorted."""
elapsed = time.time() - start
self.latencies.append(elapsed)
self.latencies.sort()
def top_n_slowest(self, n):
"""Return the n slowest times."""
return self.latencies[-n:]
def median(self):
"""Return the median latency."""
ix = len(self.latencies) // 2
if len(self.latencies) % 2:
# odd case
return self.latencies[ix]
# even case
a = self.latencies[ix - 1]
b = self.latencies[ix]
return (a + b) / 2
The code shown here works just fine, but the way record_latency()
maintains
the sorted latency list is kind of weird. After every request, the code appends
a new timing latency and then resorts the list. This is wasteful: the old list
is already sorted, so it shouldn't really be necessary to resort the new list
from scratch. It would be more efficient the sorted list could be maintained
using the insertion step from insertion
sort, where we binary search the
array to find the right insertion point, and then insert the item directly at
that location.
Today I was reintroduced to the Python bisect module, and noticed that it has a function that does just this. The method is called bisect.insort. Here's how you use it:
import bisect
class Service:
def record_latency(self, start):
"""Record elapsed latency and keep self.latencies sorted."""
elapsed = time.time() - start
bisect.insort(self.latencies, elapsed)
# other methods as before
This isn't the only useful thing in the module: you can also use bisect.bisect to do binary searches and range queries on sorted lists. Check out the documentation examples for more examples.