Does set membership use a different test than list membership

I am trying to produce a Simple report with children. I have used a list:

    child_list = []
    for child in sa.children(person):
        if child not in child_list:
            child_list.append(child)

# then later something like

                for child in sa.children(family):
                    if child not in child_list:
                        child_list.append(child)

This does not produce duplicate children in the child_list

However, if I use a set:

    children = set(sdb.children(person))

    if additional_parents:
        for spouse in additional_parents:
            for family in sdb.parent_in(spouse):
                children.update(sdb.children(family))

It does produce duplicates.

I can’t find anywhere that discusses this, but is it because Person defines eq

    def __eq__(self, other):
        return isinstance(other, Person) and self.handle == other.handle

So is update set testing the hash first (see test for equality), and “in list” using the defined equality operation.

I am more interested in understanding what is going on than in getting a fix, but if this is correct, should Person define __hash__ compatibly with __eq__?

Yes it looks like “item in set()” uses hash, and “item in list()” uses equals.

Personally, I would never create a set of objects, but use a set of handles instead.

1 Like

Well, no @dsblank, you can’t in SimpleAccess, because SimpleAccess doesn’t deal with handles.

I was really looking for an authoritative explanation, rather that “looks like” which was just what I guessed.

It is definitive and authoritative.

Can you please provide a source for this definitive and authoritative statement?

You can test this yourself:

# Create a class:
class Obj():
    def __hash__(self):
        print("hash!")
        return 42
    def __eq__(self, other):
        print("eq!")
        return True
# Create an instance:
o = Obj()

# Create a set:
s = set([o, o])
o in s
# prints "hash!"

# Create a list: 
l = list[o, o, o]
o in l
# prints "eq!"

You can also look up the details in the documentation, or even ask a chatbot. It looks like it is a bit more complicated for inserts of larger collections of sets:

In summary:

* object in list: Primarily uses == for comparison, iterating through elements.

* object in set: Primarily uses hash() for quick lookup, and then == for collision resolution.

You are correct that if Gramps properly implements both __hash__ and __eq__ then we should cover all bases.

I don’t think that is correct.

If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__() , since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

Person is mutable, so __eq__ should be defined (which it is), and __hash__ should not (it isn’t).

The documentation says:

For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y) .

It doesn’t say that object in set can use hash() for quick lookup, which it evidently is doing. I think it should say so (if that is indeed what it is doing).

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.