Gramps Proxy Refactor Plan
Problem
Gramps supports “proxy” databases that wrap a real database (or another proxy) and
filter out certain items. For example:
PrivateProxyDb hides objects marked private
LivingProxyDb hides living people
FilterProxyDb hides objects not matching a filter
The current implementation has a fundamental API flaw: when you request a filtered
object by handle, the proxy returns None. This means every caller must defensively
check:
person = db.get_person_from_handle(handle)
if person is not None:
...
Worse, a caller may obtain a handle from a cross-reference list (e.g., iterating
family.child_ref_list) and then look it up only to receive None, because the
reference list was never cleaned of filtered handles. This is the root cause: the
proxy filters at lookup time but not at reference-list return time, so filtered
handles leak into the caller’s hands.
Solution Overview
Two cleanly separated layers of filtering:
-
include_*(handle) methods — binary gate: should this entire object be
visible? Used to raise HandleNotFoundError on direct filtered lookups, and to
strip filtered handles from cross-reference lists in returned objects.
-
sanitize_*(data) methods — attribute-level cleanup on a DataDict: given a
visible object’s raw data, strip sub-attributes that should not be shown (e.g.,
private notes on a public person, private citations, private associations).
Together these guarantee: a caller who receives an object from the proxy will never
hold a handle that resolves to None or raises unexpectedly. Every handle in every
reference list has already been validated as included.
Working Medium: DataDict instead of reconstructed objects
The key implementation insight is to work on DataDicts (returned by
get_raw_*_data()) rather than reconstructed full objects (like Person()).
The underlying real database’s get_raw_person_data(handle) returns a DataDict —
a dict subclass that supports direct attribute access (.family_list,
.child_ref_list, .father_handle, etc.) and is much cheaper to create than a
full Person object.
Revised call chain
Instead of the current chain where get_raw_person_data calls
get_person_from_handle:
(current) get_raw_person_data → get_person_from_handle → object_to_data
The proxy reverses this: the core filtering and sanitization happen on the DataDict,
and get_person_from_handle is just a thin wrapper:
(new) get_raw_person_data(handle):
check include_person(handle) → raise if filtered
data = self.db.get_raw_person_data(handle) # DataDict from inner db
strip filtered cross-refs directly on data
sanitize_person(data) # strip private sub-attrs
return data
get_person_from_handle(handle):
return self.get_raw_person_data(handle) # DataDict is the object
This means proxies override get_raw_*_data() as the primary method. No full object
reconstruction is ever needed. sanitize_* methods are simple list-filter operations
on the DataDict.
Example: get_raw_person_data in ProxyDbBase
def get_raw_person_data(self, handle):
if not self.include_person(handle):
raise HandleNotFoundError(handle)
data = self.db.get_raw_person_data(handle)
# Layer 1: strip cross-refs to filtered objects using include_* predicates
data.family_list = [
h for h in data.family_list if self.include_family(h)
]
data.parent_family_list = [
h for h in data.parent_family_list if self.include_family(h)
]
data.event_ref_list = [
ref for ref in data.event_ref_list if self.include_event(ref.ref)
]
data.person_ref_list = [
ref for ref in data.person_ref_list if self.include_person(ref.ref)
]
data.note_list = [h for h in data.note_list if self.include_note(h)]
data.citation_list = [h for h in data.citation_list if self.include_citation(h)]
data.media_list = [
ref for ref in data.media_list if self.include_media(ref.ref)
]
# Layer 2: strip private sub-attributes (no-op in base class)
return self.sanitize_person(data)
Example: get_raw_family_data in ProxyDbBase
def get_raw_family_data(self, handle):
if not self.include_family(handle):
raise HandleNotFoundError(handle)
data = self.db.get_raw_family_data(handle)
if not self.include_person(data.father_handle):
data.father_handle = None
if not self.include_person(data.mother_handle):
data.mother_handle = None
data.child_ref_list = [
ref for ref in data.child_ref_list if self.include_person(ref.ref)
]
data.event_ref_list = [
ref for ref in data.event_ref_list if self.include_event(ref.ref)
]
data.note_list = [h for h in data.note_list if self.include_note(h)]
data.citation_list = [h for h in data.citation_list if self.include_citation(h)]
data.media_list = [
ref for ref in data.media_list if self.include_media(ref.ref)
]
return self.sanitize_family(data)
Direct attribute access (.family_list, .child_ref_list, .father_handle) is
used throughout, compatible with DataDict objects.
include_* — base class defaults
The base ProxyDbBase defaults to including everything (passthrough):
def include_person(self, handle): return handle is not None
def include_family(self, handle): return handle is not None
def include_event(self, handle): return handle is not None
def include_note(self, handle): return handle is not None
def include_citation(self, handle): return handle is not None
def include_source(self, handle): return handle is not None
def include_media(self, handle): return handle is not None
def include_place(self, handle): return handle is not None
def include_tag(self, handle): return handle is not None
def include_repository(self, handle): return handle is not None
Subclasses override only the types they filter.
sanitize_* — base class defaults
Base class sanitize methods are no-ops on the DataDict, returning it unchanged:
def sanitize_person(self, data): return data
def sanitize_family(self, data): return data
# ... etc.
Subclasses override these for attribute-level filtering. Because the working medium
is a DataDict, no object reconstruction is needed — just filter list fields:
# Example: PrivateProxyDb.sanitize_person
def sanitize_person(self, data):
# Strip private attributes from the person's own attribute list
data.attribute_list = [
a for a in data.attribute_list if not a.private
]
# Strip private names
data.alternate_names = [
n for n in data.alternate_names if not n.private
]
# Strip private addresses
data.address_list = [
a for a in data.address_list if not a.private
]
# Strip private LDS ordinances
data.lds_ord_list = [
o for o in data.lds_ord_list if not o.private
]
return data
This replaces the existing large sanitize_person(db, person) standalone function
in private.py which manually reconstructed a new Person() object — that
reconstruction is no longer needed.
Performance: Precomputed Handle Sets
The include_*(handle) predicate is called many times during reference list cleanup.
It must be fast — O(1) set membership.
Database interface additions
Two new methods on the database interface:
-
db.is_filter_override(table, filter_name) — returns True if the database has
a native SQL implementation for this named filter on this table. Collapses into a
single call the questions of: does the DB support SQL, and does it have SQL for
this specific filter? Returns False for proxy databases, non-SQL backends, or
filters that cannot be expressed in SQL.
-
db.apply_filter(table, filter_name) — runs the filter natively and returns a
list of matching handles. Convenience wrapper around filter.apply(db) so callers
don’t need to hold a reference to the filter object. Only call when
is_filter_override returns True.
Two paths for building the included-handle set
Path 1 — Native SQL override (fast, eager)
When db.is_filter_override('person', filter_name) is True:
def _build_included_persons(self):
if self.db.is_filter_override('person', self.filter_name):
self._included_persons = set(self.db.apply_filter('person', self.filter_name))
else:
self._included_persons = {
h for h in self.db.iter_person_handles()
if self._check_person(h)
}
include_person(handle) is then always O(1):
def include_person(self, handle):
return handle in self._included_persons
Path 2 — Python-only predicates (lazy)
For expensive predicates like probably_alive(), build the set lazily on first use:
@property
def _included_persons(self):
if self.__included_persons is None:
self.__included_persons = {
h for h in self.db.iter_person_handles()
if not self._is_living(h)
}
return self.__included_persons
self.db.iter_person_handles() composes naturally with inner proxies — only handles
already passing the inner proxy’s filter are iterated.
Summary by proxy
| Proxy |
Filter |
is_filter_override? |
Set-build strategy |
PrivateProxyDb |
'private' |
Yes (if wrapping real SQL DB) |
Eager via apply_filter |
FilterProxyDb |
user-defined |
Yes (if SQL-expressible and real DB) |
Eager via apply_filter |
LivingProxyDb |
'living' |
No (Python-only) |
Lazy via iter_person_handles |
ProxyDbBase |
n/a |
n/a |
handle is not None — no set needed |
Cross-Reference Map
Every get_raw_*_data() method strips filtered handles from the DataDict using
include_* predicates:
| Object |
Cross-references to strip |
Person |
family_list, parent_family_list, event_ref_list[*].ref, person_ref_list[*].ref, media_list[*].ref, citation_list, note_list |
Family |
father_handle, mother_handle, child_ref_list[*].ref, event_ref_list[*].ref, media_list[*].ref, citation_list, note_list |
Event |
place_handle, media_list[*].ref, citation_list, note_list |
Citation |
source_handle, media_list[*].ref, note_list |
Source |
reporef_list[*].ref, media_list[*].ref, note_list |
Place |
placeref_list[*].ref, media_list[*].ref, citation_list, note_list |
Media |
citation_list, note_list |
Repository |
note_list |
Subclass Responsibilities After Refactor
PrivateProxyDb
- Override
include_*: build precomputed sets via apply_filter (SQL) or iteration
(fallback). Each set contains handles of non-private objects.
- Override
sanitize_*: strip private sub-attributes (private attribute_list items,
private alternate_names, private addresses, private LDS ordinances) directly on
the DataDict. No object reconstruction needed.
- Remove existing standalone
sanitize_person(db, person) etc. functions —
replaced by the instance methods operating on DataDicts.
LivingProxyDb
- Override
include_person: lazy set built from probably_alive().
- Override
sanitize_person: for modes 1–3 (name replacement), modify the DataDict’s
name fields directly (e.g., data.primary_name.first_name = "[Living]") instead
of constructing a new Person().
- Remove
__remove_living_from_family — now handled by base class get_raw_family_data.
- Remove
__restrict_person — replaced by sanitize_person on DataDict.
FilterProxyDb
- Override
include_*: use existing precomputed sets (self.plist, self.flist,
self.elist, self.nlist) — keep this pattern.
- Override
sanitize_* for note filtering: replace sanitize_notebase calls with
direct DataDict note_list filtering.
- Remove all the per-type
get_*_from_handle overrides — now handled by base class.
Files to Change
| File |
Change |
gramps/gen/proxy/proxybase.py |
Add default include_* and sanitize_*; rewrite all get_raw_*_data() to raise + clean refs on DataDict + call sanitize_*; make get_*_from_handle delegate to get_raw_*_data |
gramps/gen/proxy/private.py |
Override include_* (SQL-built sets); override sanitize_* (DataDict field filtering); remove old standalone sanitize_* functions and all get_*_from_handle overrides |
gramps/gen/proxy/living.py |
Override include_person (lazy set); override sanitize_person (modify DataDict name fields); remove __restrict_person, __remove_living_from_family |
gramps/gen/proxy/filter.py |
Override include_* (existing sets); override sanitize_* for note filtering on DataDicts; remove per-type get_*_from_handle overrides |
gramps/gen/db/base.py |
Add is_filter_override(table, filter_name) and apply_filter(table, filter_name) to interface (default returns False / raises) |
gramps/plugins/db/dbapi/dbapi.py |
Implement is_filter_override and apply_filter for SQL backends |
gramps/gen/proxy/test/proxies_test.py |
Verify no cross-reference list contains a filtered handle; test DataDict attribute-level sanitization; test chained proxies |
Invariant Guaranteed After This Refactor
For any DataDict returned by a proxy’s get_raw_*_data() (or get_*_from_handle()),
every handle appearing in any cross-reference list will pass the proxy’s
include_* predicate. No caller will ever receive a handle that resolves to
None or raises HandleNotFoundError as a result of following a cross-reference.
Direct lookup of a filtered handle raises HandleNotFoundError immediately and
clearly, rather than silently returning None.