Maintaining Sorted Lists In Python

Sometimes you want to write some code that maintains a running sorted list of items. For example, let's say you have some code that services requests and maintains running median or percentiles of response times. You might imagine some Python code like this:

import time

class Service:

    def __init__(self):
        self.latencies = []

    def work(self, request):
        """Do some work and time it."""
        start = time.time()
        # work implemented here...

        # record latency and maintain sortedness of latencies
        self.record_latency(start)

    def record_latency(self, start):
        """Record elapsed latency and keep self.latencies sorted."""
        elapsed = time.time() - start
        self.latencies.append(elapsed)
        self.latencies.sort()

    def top_n_slowest(self, n):
        """Return the n slowest times."""
        return self.latencies[-n:]

    def median(self):
        """Return the median latency."""
        ix = len(self.latencies) // 2
        if len(self.latencies) % 2:
            # odd case
            return self.latencies[ix]
        # even case
        a = self.latencies[ix - 1]
        b = self.latencies[ix]
        return (a + b) / 2

The code shown here works just fine, but the way record_latency() maintains the sorted latency list is kind of weird. After every request, the code appends a new timing latency and then resorts the list. This is wasteful: the old list is already sorted, so it shouldn't really be necessary to resort the new list from scratch. It would be more efficient the sorted list could be maintained using the insertion step from insertion sort, where we binary search the array to find the right insertion point, and then insert the item directly at that location.

Today I was reintroduced to the Python bisect module, and noticed that it has a function that does just this. The method is called bisect.insort. Here's how you use it:

import bisect

class Service:

    def record_latency(self, start):
        """Record elapsed latency and keep self.latencies sorted."""
        elapsed = time.time() - start
        bisect.insort(self.latencies, elapsed)

    # other methods as before

This isn't the only useful thing in the module: you can also use bisect.bisect to do binary searches and range queries on sorted lists. Check out the documentation examples for more examples.