I am trying to produce a Simple report with children. I have used a list:
child_list = []
for child in sa.children(person):
if child not in child_list:
child_list.append(child)
# then later something like
for child in sa.children(family):
if child not in child_list:
child_list.append(child)
This does not produce duplicate children in the child_list
However, if I use a set:
children = set(sdb.children(person))
if additional_parents:
for spouse in additional_parents:
for family in sdb.parent_in(spouse):
children.update(sdb.children(family))
It does produce duplicates.
I can’t find anywhere that discusses this, but is it because Person defines eq
def __eq__(self, other):
return isinstance(other, Person) and self.handle == other.handle
So is update set testing the hash first (see test for equality), and “in list” using the defined equality operation.
I am more interested in understanding what is going on than in getting a fix, but if this is correct, should Person define __hash__ compatibly with __eq__?
# Create a class:
class Obj():
def __hash__(self):
print("hash!")
return 42
def __eq__(self, other):
print("eq!")
return True
# Create an instance:
o = Obj()
# Create a set:
s = set([o, o])
o in s
# prints "hash!"
# Create a list:
l = list[o, o, o]
o in l
# prints "eq!"
You can also look up the details in the documentation, or even ask a chatbot. It looks like it is a bit more complicated for inserts of larger collections of sets:
In summary:
* object in list: Primarily uses == for comparison, iterating through elements.
* object in set: Primarily uses hash() for quick lookup, and then == for collision resolution.
You are correct that if Gramps properly implements both __hash__ and __eq__ then we should cover all bases.
If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__() , since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
Person is mutable, so __eq__ should be defined (which it is), and __hash__ should not (it isn’t).
The documentation says:
For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y) .
It doesn’t say that object in set can use hash() for quick lookup, which it evidently is doing. I think it should say so (if that is indeed what it is doing).