Counting Overlapping Elements in Python

Wed 16 April 2014

Say you have a list of lists, like:

matrix = [[1, 2, 3], [4, 5, 6], [1, 2, 6]]

You want to know how many times pairs of numbers occur in the same list, or overlap. In the example above, you'd have:

  • 1 & 2: paired twice
  • 1 & 3: paired once
  • 2 & 3: paired once
  • 4 & 5: paired once
  • 4 & 6: paired once
  • 5 & 6: paired once
  • 1 & 6: paired once
  • 2 & 6: paired once

You'd like some code that tells you that there's one pairing that occur twice and seven pairings that occur once. Stack Overflow will tell you how to do it:

from collections import Counter
from itertools import chain, combinations
Counter(chain.from_iterable(combinations(row, 2) for row in matrix))

That works, but it's completely confusing. I'm going to break down what it's doing, beginning with combinations(row, 2).

Producing combinations

combinations() produces an iterator containing the pairs in an input iterable (e.g. a list). The code below uses combinations to generate all the pairs in the matrix's first row. Each tuple represents one pairing; for example, (1, 2) indicates that 1 and 2 were in the same row one time.

>>> x = combinations(matrix[0], 2)

>>> print x
<itertools.combinations object at 0x10d1719f0>

>>> print list(x)
[(1, 2), (1, 3), (2, 3)]

To find the combinations in all of the rows in the matrix, we apply combinations() to each row through a generator comprehension. A generator comprehension has the same syntax as a list comprehension but produces a generator instead of a list.

>>> combos = (combinations(row, 2) for row in matrix)

>>> print combos
<generator object <genexpr> at 0x10d168f00>

>>> print [list(x) for x in combos]
[[(1, 2), (1, 3), (2, 3)], [(4, 5), (4, 6), (5, 6)], [(1, 2), (1, 6), (2, 6)]]

Chaining the combinations

Our end goal is to count how many there are of each tuple; we should end up with one that occurs twice ((1, 2)) and seven that occur once. By inspecting the output above, we can see that this is the case, but Python can't because the tuples aren't all in the same combinations object. Enter chain.from_iterable(), a function that will combine each of the combinations objects in combos into one giant iterator:

>>> chained_combos = chain.from_iterable(combos)
>>> list(chained_combos)
[(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (1, 2), (1, 6), (2, 6)]

Counting the pairs in the chain

Now we can create a Counter object to tally the frequency of each pairing. A Counter is a type of dictionary used for creating tallies: the keys are the items you're counting, and the values are the tallies.

>>> counts = Counter(chained_combos)
>>> print counts
Counter({(1, 2): 2, (1, 3): 1, (4, 6): 1, (4, 5): 1, (5, 6): 1, (2, 6): 1, (2, 3): 1, (1, 6): 1})
>>> print counts.values()
[2, 1, 1, 1, 1, 1, 1, 1]

Tallying the overall frequencies

Finally, to count the overall frequencies, we use a second counter:

>>> overall_counts = Counter(counts.values())
>>> print overall_counts
Counter({1: 7, 2: 1})

Ta da! 7 pairs together once, 1 pair together twice.

This can be easily extended from pairs to trios, quartets, and larger groups by replacing combinations(row, 2) with combinations(row, <your-size>).

Finally, an important note: this will only work if the values inside each tuple are always listed in the same order. (1, 2) is not the same as (2, 1).


Category: misc.