Saturday, 7 May 2016

python - Comparing three (or more) dictionaries and finding a match if at least two are equal



I'm faced with an issue similar to this one. However, that SO question is strictly focused on three variables. I am looking for a solution that would work for more than three as well.



Here's my code for two variables:



for track_a in collection_a:

for track_b in collection_b:

t1 = track_a["tempo"]
t2 = track_b["tempo"]
k1 = track_a["key"]
k2 = track_b["key"]
m1 = track_a["mode"]
m2 = track_b["mode"]

if (t1 == t2) and (k1 == k2) and (m1 == m2):

collection_c.append((track_a, track_b))


Here's my solution for three variables:



for track_a in collection_a:
for track_b in collection_b:
for track_c in collection_c:

t1 = track_a["tempo"]

t2 = track_b["tempo"]
t3 = track_c["tempo"]
k1 = track_a["key"]
k2 = track_b["key"]
k3 = track_c["key"]
m1 = track_a["mode"]
m2 = track_b["mode"]
m3 = track_c["mode"]

a = (t1 == t2) and (k1 == k2) and (m1 == m2)

b = (t2 == t3) and (k2 == k3) and (m2 == m3)
c = (t3 == t1) and (k3 == k1) and (m3 == m1)

if a: collection_c.append((track_a, track_b))
if b: collection_c.append((track_b, track_c))
if c: collection_c.append((track_c, track_a))


Obviously, this solution is not scalable and slow. Considering the fact I'd have to check all of them, I doubt it will ever be fast since we have to iterate over all possible combinations, but could I at least make it scale? (Up to at least 5). Also, if possible, allow more comparison characteristics to be added later.


Answer




An efficient approach that solves the issue in linear time is to convert the dicts to frozen sets of key-value tuples (over keys that are used for equality tests) so that they can be hashable and used as dict keys (signatures) themselves, and so that you can simply use a dict of sets to group them:



groups = {}
for track in collections: # collections is a combination of all the collections you have
groups.setdefault(frozenset((k, track[k]) for k in ('tempo', 'key', 'mode')), set()).add(track['name'])


so that:



[group for group in groups.values() if len(group) >= 3]



will return you a list of sets of names of the 3 tracks whose signatures are identical.


No comments:

Post a Comment

c++ - Does curly brackets matter for empty constructor?

Those brackets declare an empty, inline constructor. In that case, with them, the constructor does exist, it merely does nothing more than t...