Fixing Proxies properly

@Nick-Hall @SteveY @DavidMStraub @kulath: just letting you know that I am planning to fix the root of our type issues (and fix lots of other issues along the way). Currently, proxies use get_ITEM_from_handle() as a filter, so those methods can return None. I’ll post a proposed fix here (below).

The plan (in order) is to:

  1. Fix proxies (no returning None allowed on getting objects); then:
  2. https://github.com/gramps-project/gramps/pull/1934 - Minimal code changes to support type hinting of db methods; then:
  3. https://github.com/gramps-project/gramps/pull/1919 - Add type hints to db classes

Gramps Proxy Refactor Plan

Problem

Gramps supports “proxy” databases that wrap a real database (or another proxy) and
filter out certain items. For example:

  • PrivateProxyDb hides objects marked private
  • LivingProxyDb hides living people
  • FilterProxyDb hides objects not matching a filter

The current implementation has a fundamental API flaw: when you request a filtered
object by handle, the proxy returns None. This means every caller must defensively
check:

person = db.get_person_from_handle(handle)
if person is not None:
    ...

Worse, a caller may obtain a handle from a cross-reference list (e.g., iterating
family.child_ref_list) and then look it up only to receive None, because the
reference list was never cleaned of filtered handles. This is the root cause: the
proxy filters at lookup time but not at reference-list return time, so filtered
handles leak into the caller’s hands.
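
To make the leak concrete, here is a toy reproduction. None of this is Gramps code: ToyDb, ToyPrivateProxy and the dict layout are made up for illustration. Only the behaviour (filtering at lookup time, reference lists returned untouched) mirrors the problem described above.

```python
# Toy reproduction of the leak; these classes are NOT real Gramps classes.

class ToyDb:
    """Stand-in for a real database: handle -> record."""

    def __init__(self):
        self.people = {
            "p1": {"handle": "p1", "private": False},
            "p2": {"handle": "p2", "private": True},
        }
        self.families = {
            "f1": {
                "handle": "f1",
                "private": False,
                "child_ref_list": ["p1", "p2"],
            },
        }

    def get_person_from_handle(self, handle):
        return self.people.get(handle)

    def get_family_from_handle(self, handle):
        return self.families.get(handle)


class ToyPrivateProxy:
    """Mimics the current behaviour: filter at lookup time only."""

    def __init__(self, db):
        self.db = db

    def get_person_from_handle(self, handle):
        person = self.db.get_person_from_handle(handle)
        if person is None or person["private"]:
            return None  # filtered objects silently become None
        return person

    def get_family_from_handle(self, handle):
        family = self.db.get_family_from_handle(handle)
        if family is None or family["private"]:
            return None
        return family  # child_ref_list is returned untouched


proxy = ToyPrivateProxy(ToyDb())
family = proxy.get_family_from_handle("f1")
# Handles the caller was handed but cannot resolve through the same proxy.
leaked = [
    h for h in family["child_ref_list"]
    if proxy.get_person_from_handle(h) is None
]
print(leaked)
```

The family is visible, yet one of the handles it hands out resolves to None through the very same proxy, which is why every caller ends up needing a None check.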


Solution Overview

Two cleanly separated layers of filtering:

  1. include_*(handle) methods — binary gate: should this entire object be
    visible? Used to raise HandleNotFoundError on direct filtered lookups, and to
    strip filtered handles from cross-reference lists in returned objects.

  2. sanitize_*(data) methods — attribute-level cleanup on a DataDict: given a
    visible object’s raw data, strip sub-attributes that should not be shown (e.g.,
    private notes on a public person, private citations, private associations).

Together these guarantee: a caller who receives an object from the proxy will never
hold a handle that resolves to None or raises unexpectedly. Every handle in every
reference list has already been validated as included.


Working Medium: DataDict instead of reconstructed objects

The key implementation insight is to work on DataDicts (returned by
get_raw_*_data()) rather than reconstructed full objects (like Person()).

The underlying real database’s get_raw_person_data(handle) returns a DataDict,
a dict subclass that supports direct attribute access (.family_list,
.child_ref_list, .father_handle, etc.) and is much cheaper to create than a
full Person object.
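
For readers who have not looked at DataDict, here is a rough sketch of what "a dict subclass with attribute access" can look like. AttrDict is a hypothetical toy written for this post, not the actual Gramps DataDict, and the real class may behave differently in the details.

```python
class AttrDict(dict):
    """Toy stand-in for DataDict: a dict whose keys are also attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .keys() or .items() keep working as usual.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so chained access works too.
        if isinstance(value, dict) and not isinstance(value, AttrDict):
            value = AttrDict(value)
            self[name] = value
        return value

    def __setattr__(self, name, value):
        # Attribute assignment writes straight into the dict.
        self[name] = value


data = AttrDict({
    "handle": "p1",
    "family_list": ["f1", "f2"],
    "primary_name": {"first_name": "Ada"},
})
# Filtering a list field is a plain attribute assignment.
data.family_list = [h for h in data.family_list if h != "f2"]
```

The point is that filtering a list field is a one-line assignment on the raw data, with no Person() construction involved.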

Revised call chain

In the current chain, get_raw_person_data calls get_person_from_handle:

(current) get_raw_person_data → get_person_from_handle → object_to_data

The proxy reverses this: the core filtering and sanitization happen on the DataDict,
and get_person_from_handle is just a thin wrapper:

(new)
get_raw_person_data(handle):
    check include_person(handle) → raise if filtered
    data = self.db.get_raw_person_data(handle)   # DataDict from inner db
    strip filtered cross-refs directly on data
    sanitize_person(data)                         # strip private sub-attrs
    return data

get_person_from_handle(handle):
    return self.get_raw_person_data(handle)       # DataDict is the object

This means proxies override get_raw_*_data() as the primary method. No full object
reconstruction is ever needed. sanitize_* methods are simple list-filter operations
on the DataDict.

Example: get_raw_person_data in ProxyDbBase

def get_raw_person_data(self, handle):
    if not self.include_person(handle):
        raise HandleNotFoundError(handle)
    data = self.db.get_raw_person_data(handle)
    # Layer 1: strip cross-refs to filtered objects using include_* predicates
    data.family_list = [
        h for h in data.family_list if self.include_family(h)
    ]
    data.parent_family_list = [
        h for h in data.parent_family_list if self.include_family(h)
    ]
    data.event_ref_list = [
        ref for ref in data.event_ref_list if self.include_event(ref.ref)
    ]
    data.person_ref_list = [
        ref for ref in data.person_ref_list if self.include_person(ref.ref)
    ]
    data.note_list = [h for h in data.note_list if self.include_note(h)]
    data.citation_list = [h for h in data.citation_list if self.include_citation(h)]
    data.media_list = [
        ref for ref in data.media_list if self.include_media(ref.ref)
    ]
    # Layer 2: strip private sub-attributes (no-op in base class)
    return self.sanitize_person(data)

Example: get_raw_family_data in ProxyDbBase

def get_raw_family_data(self, handle):
    if not self.include_family(handle):
        raise HandleNotFoundError(handle)
    data = self.db.get_raw_family_data(handle)
    if not self.include_person(data.father_handle):
        data.father_handle = None
    if not self.include_person(data.mother_handle):
        data.mother_handle = None
    data.child_ref_list = [
        ref for ref in data.child_ref_list if self.include_person(ref.ref)
    ]
    data.event_ref_list = [
        ref for ref in data.event_ref_list if self.include_event(ref.ref)
    ]
    data.note_list = [h for h in data.note_list if self.include_note(h)]
    data.citation_list = [h for h in data.citation_list if self.include_citation(h)]
    data.media_list = [
        ref for ref in data.media_list if self.include_media(ref.ref)
    ]
    return self.sanitize_family(data)

Direct attribute access (.family_list, .child_ref_list, .father_handle) is
used throughout; DataDict objects support it natively.


include_* — base class defaults

The base ProxyDbBase includes every non-None handle by default (passthrough):

def include_person(self, handle):     return handle is not None
def include_family(self, handle):     return handle is not None
def include_event(self, handle):      return handle is not None
def include_note(self, handle):       return handle is not None
def include_citation(self, handle):   return handle is not None
def include_source(self, handle):     return handle is not None
def include_media(self, handle):      return handle is not None
def include_place(self, handle):      return handle is not None
def include_tag(self, handle):        return handle is not None
def include_repository(self, handle): return handle is not None

Subclasses override only the types they filter.
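
As a small illustration of "override only the types you filter", here is a sketch with toy classes. ToyProxyBase and ToyHidePeopleProxy are hypothetical names and only two of the include_* methods are shown.

```python
class ToyProxyBase:
    """Toy base class: every non-None handle is included."""

    def __init__(self, db):
        self.db = db

    def include_person(self, handle):
        return handle is not None

    def include_family(self, handle):
        return handle is not None


class ToyHidePeopleProxy(ToyProxyBase):
    """Hypothetical subclass that filters people and nothing else."""

    def __init__(self, db, hidden_people):
        super().__init__(db)
        self._hidden = set(hidden_people)

    def include_person(self, handle):
        # Keep the base-class None check, then apply the extra filter.
        return handle is not None and handle not in self._hidden


# db=None is enough here because the predicates never touch the database.
proxy = ToyHidePeopleProxy(db=None, hidden_people={"p2"})
```

Families still go through the inherited default, so a proxy that only cares about people never has to mention the other nine types.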


sanitize_* — base class defaults

Base class sanitize methods are no-ops on the DataDict, returning it unchanged:

def sanitize_person(self, data):  return data
def sanitize_family(self, data):  return data
# ... etc.

Subclasses override these for attribute-level filtering. Because the working medium
is a DataDict, no object reconstruction is needed — just filter list fields:

# Example: PrivateProxyDb.sanitize_person
def sanitize_person(self, data):
    # Strip private attributes from the person's own attribute list
    data.attribute_list = [
        a for a in data.attribute_list if not a.private
    ]
    # Strip private names
    data.alternate_names = [
        n for n in data.alternate_names if not n.private
    ]
    # Strip private addresses
    data.address_list = [
        a for a in data.address_list if not a.private
    ]
    # Strip private LDS ordinances
    data.lds_ord_list = [
        o for o in data.lds_ord_list if not o.private
    ]
    return data

This replaces the existing large standalone sanitize_person(db, person) function
in private.py, which manually reconstructs a new Person() object; that
reconstruction is no longer needed.


Performance: Precomputed Handle Sets

The include_*(handle) predicate is called many times during reference list cleanup.
It must be fast — O(1) set membership.

Database interface additions

Two new methods on the database interface:

  • db.is_filter_override(table, filter_name) — returns True if the database has
    a native SQL implementation for this named filter on this table. This answers two
    questions in a single call: does the DB support SQL, and does it have SQL for
    this specific filter? Returns False for proxy databases, non-SQL backends, or
    filters that cannot be expressed in SQL.

  • db.apply_filter(table, filter_name) — runs the filter natively and returns a
    list of matching handles. Convenience wrapper around filter.apply(db) so callers
    don’t need to hold a reference to the filter object. Only call when
    is_filter_override returns True.
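
A minimal sketch of how the two methods might fit together, assuming the interface default is "return False" for is_filter_override and "raise" for apply_filter. The class names, the use of NotImplementedError, and the canned handle list standing in for a real SQL query are all assumptions made for illustration.

```python
class ToyDbBase:
    """Toy interface defaults: no native filters available."""

    def is_filter_override(self, table, filter_name):
        return False

    def apply_filter(self, table, filter_name):
        # Assumption: the plan only says the default "raises".
        raise NotImplementedError(
            f"no native filter {filter_name!r} for table {table!r}"
        )


class ToySqlDb(ToyDbBase):
    """Hypothetical backend with one natively supported filter."""

    # Canned result standing in for what a SQL query would return.
    _NATIVE = {("person", "private"): ["p1", "p3"]}

    def is_filter_override(self, table, filter_name):
        return (table, filter_name) in self._NATIVE

    def apply_filter(self, table, filter_name):
        return list(self._NATIVE[(table, filter_name)])


def included_handles(db, table, filter_name, fallback):
    """Use the native path when offered, else a caller-supplied fallback."""
    if db.is_filter_override(table, filter_name):
        return set(db.apply_filter(table, filter_name))
    return set(fallback())


native = included_handles(ToySqlDb(), "person", "private", lambda: [])
slow = included_handles(ToyDbBase(), "person", "private", lambda: ["p9"])
```

Because callers always ask is_filter_override first, the raising default in apply_filter should only ever fire on a programming error.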

Two paths for building the included-handle set

Path 1 — Native SQL override (fast, eager)

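When db.is_filter_override('person', filter_name) is True, the set comes straight
from the native filter; otherwise it is built by checking every handle in Python:

def _build_included_persons(self):
    if self.db.is_filter_override('person', self.filter_name):
        # Native path: the database evaluates the filter and returns handles.
        self._included_persons = set(self.db.apply_filter('person', self.filter_name))
    else:
        # Fallback: test each handle with the proxy's own per-person check.
        self._included_persons = {
            h for h in self.db.iter_person_handles()
            if self._check_person(h)
        }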

include_person(handle) is then always O(1):

def include_person(self, handle):
    return handle in self._included_persons

Path 2 — Python-only predicates (lazy)

For expensive predicates like probably_alive(), build the set lazily on first use:

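# Assumes __init__ has set self.__included_persons = None.
@property
def _included_persons(self):
    if self.__included_persons is None:
        # First use: walk the inner db once and cache the non-living handles.
        self.__included_persons = {
            h for h in self.db.iter_person_handles()
            if not self._is_living(h)
        }
    return self.__included_persons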

self.db.iter_person_handles() composes naturally with inner proxies — only handles
already passing the inner proxy’s filter are iterated.
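
Here is a toy sketch of that composition. ToyDb and ToyProxy are hypothetical, KeyError stands in for HandleNotFoundError, and the "private"/"living" flags are just booleans on a dict, so treat it as an illustration of the idea rather than of the real proxies.

```python
class ToyDb:
    def __init__(self, people):
        self.people = people  # handle -> {"private": bool, "living": bool}

    def iter_person_handles(self):
        return iter(self.people)

    def get_raw_person_data(self, handle):
        return self.people[handle]


class ToyProxy:
    """Toy proxy: hides people for which hide(record) is true."""

    def __init__(self, db, hide):
        self.db = db
        self._hide = hide
        self._included = None  # built lazily on first use

    def _included_persons(self):
        if self._included is None:
            self._included = {
                h for h in self.db.iter_person_handles()
                if not self._hide(self.db.get_raw_person_data(h))
            }
        return self._included

    def include_person(self, handle):
        return handle in self._included_persons()

    def iter_person_handles(self):
        return (h for h in self.db.iter_person_handles()
                if self.include_person(h))

    def get_raw_person_data(self, handle):
        if not self.include_person(handle):
            raise KeyError(handle)  # stand-in for HandleNotFoundError
        return self.db.get_raw_person_data(handle)


seen = []  # records which people the outer predicate was asked about

def is_living(record):
    seen.append(record["name"])
    return record["living"]

db = ToyDb({
    "p1": {"name": "p1", "private": False, "living": False},
    "p2": {"name": "p2", "private": True, "living": True},
    "p3": {"name": "p3", "private": False, "living": True},
})
inner = ToyProxy(db, hide=lambda r: r["private"])
outer = ToyProxy(inner, hide=is_living)
visible = sorted(outer.iter_person_handles())
```

In this toy, the expensive outer predicate is never asked about p2, because the inner proxy has already removed it from the iteration.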

Summary by proxy

| Proxy | Filter | is_filter_override? | Set-build strategy |
|---|---|---|---|
| PrivateProxyDb | 'private' | Yes (if wrapping real SQL DB) | Eager via apply_filter |
| FilterProxyDb | user-defined | Yes (if SQL-expressible and real DB) | Eager via apply_filter |
| LivingProxyDb | 'living' | No (Python-only) | Lazy via iter_person_handles |
| ProxyDbBase | n/a | n/a | handle is not None (no set needed) |

Cross-Reference Map

Every get_raw_*_data() method strips filtered handles from the DataDict using
include_* predicates:

| Object | Cross-references to strip |
|---|---|
| Person | family_list, parent_family_list, event_ref_list[*].ref, person_ref_list[*].ref, media_list[*].ref, citation_list, note_list |
| Family | father_handle, mother_handle, child_ref_list[*].ref, event_ref_list[*].ref, media_list[*].ref, citation_list, note_list |
| Event | place_handle, media_list[*].ref, citation_list, note_list |
| Citation | source_handle, media_list[*].ref, note_list |
| Source | reporef_list[*].ref, media_list[*].ref, note_list |
| Place | placeref_list[*].ref, media_list[*].ref, citation_list, note_list |
| Media | citation_list, note_list |
| Repository | note_list |
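
Since the map is so regular, one option would be to drive the stripping from a table instead of hand-writing each get_raw_*_data(). This is only a sketch of that idea, not what the plan commits to: CROSS_REFS, strip_cross_refs and the include(table, handle) callback are hypothetical, only a few Family fields are listed, and SimpleNamespace stands in for a DataDict.

```python
from types import SimpleNamespace

# (field, kind, target table); kind is "handle", "handle_list" or "ref_list".
# Only part of the Family row is shown here.
CROSS_REFS = {
    "family": [
        ("father_handle", "handle", "person"),
        ("mother_handle", "handle", "person"),
        ("child_ref_list", "ref_list", "person"),
        ("note_list", "handle_list", "note"),
    ],
}


def strip_cross_refs(data, table, include):
    """Drop references for which include(target_table, handle) is false."""
    for field, kind, target in CROSS_REFS[table]:
        value = getattr(data, field)
        if kind == "handle":
            if value is not None and not include(target, value):
                setattr(data, field, None)
        elif kind == "handle_list":
            setattr(data, field, [h for h in value if include(target, h)])
        else:  # "ref_list": objects carrying the handle in .ref
            setattr(data, field, [r for r in value if include(target, r.ref)])
    return data


hidden = {("person", "p2"), ("note", "n1")}
family = SimpleNamespace(
    father_handle="p1",
    mother_handle="p2",
    child_ref_list=[SimpleNamespace(ref="p2"), SimpleNamespace(ref="p3")],
    note_list=["n1", "n2"],
)
strip_cross_refs(family, "family", lambda table, h: (table, h) not in hidden)
```

Whether this is nicer than the explicit per-type methods is a matter of taste; the explicit versions are easier to read, while the table is harder to get out of sync with the map.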

Subclass Responsibilities After Refactor

PrivateProxyDb

  • Override include_*: build precomputed sets via apply_filter (SQL) or iteration
    (fallback). Each set contains handles of non-private objects.
  • Override sanitize_*: strip private sub-attributes (private attribute_list items,
    private alternate_names, private addresses, private LDS ordinances) directly on
    the DataDict. No object reconstruction needed.
  • Remove existing standalone sanitize_person(db, person) etc. functions —
    replaced by the instance methods operating on DataDicts.

LivingProxyDb

  • Override include_person: lazy set built from probably_alive().
  • Override sanitize_person: for modes 1–3 (name replacement), modify the DataDict’s
    name fields directly (e.g., data.primary_name.first_name = "[Living]") instead
    of constructing a new Person().
  • Remove __remove_living_from_family — now handled by base class get_raw_family_data.
  • Remove __restrict_person — replaced by sanitize_person on DataDict.

FilterProxyDb

  • Override include_*: use existing precomputed sets (self.plist, self.flist,
    self.elist, self.nlist) — keep this pattern.
  • Override sanitize_* for note filtering: replace sanitize_notebase calls with
    direct DataDict note_list filtering.
  • Remove all the per-type get_*_from_handle overrides — now handled by base class.

Files to Change

| File | Change |
|---|---|
| gramps/gen/proxy/proxybase.py | Add default include_* and sanitize_*; rewrite all get_raw_*_data() to raise + clean refs on DataDict + call sanitize_*; make get_*_from_handle delegate to get_raw_*_data |
| gramps/gen/proxy/private.py | Override include_* (SQL-built sets); override sanitize_* (DataDict field filtering); remove old standalone sanitize_* functions and all get_*_from_handle overrides |
| gramps/gen/proxy/living.py | Override include_person (lazy set); override sanitize_person (modify DataDict name fields); remove __restrict_person, __remove_living_from_family |
| gramps/gen/proxy/filter.py | Override include_* (existing sets); override sanitize_* for note filtering on DataDicts; remove per-type get_*_from_handle overrides |
| gramps/gen/db/base.py | Add is_filter_override(table, filter_name) and apply_filter(table, filter_name) to interface (default returns False / raises) |
| gramps/plugins/db/dbapi/dbapi.py | Implement is_filter_override and apply_filter for SQL backends |
| gramps/gen/proxy/test/proxies_test.py | Verify no cross-reference list contains a filtered handle; test DataDict attribute-level sanitization; test chained proxies |

Invariant Guaranteed After This Refactor

For any DataDict returned by a proxy’s get_raw_*_data() (or get_*_from_handle()),
every handle appearing in any cross-reference list will pass the proxy’s
include_* predicate. No caller will ever receive a handle that resolves to
None or raises HandleNotFoundError as a result of following a cross-reference.

Direct lookup of a filtered handle raises HandleNotFoundError immediately and
clearly, rather than silently returning None.
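
The invariant lends itself to a mechanical check. Below is a rough sketch of what such a test could look like against a toy proxy. ToyProxy, find_leaks and the locally defined HandleNotFoundError are stand-ins invented for this example, records are plain dicts, and child_ref_list holds bare handles to keep it short.

```python
class HandleNotFoundError(Exception):
    """Local stand-in for the exception named in the plan."""


class ToyProxy:
    def __init__(self, families, hidden_people=(), hidden_families=()):
        self._families = families
        self._hidden_people = set(hidden_people)
        self._hidden_families = set(hidden_families)

    def include_person(self, handle):
        return handle is not None and handle not in self._hidden_people

    def include_family(self, handle):
        return handle in self._families and handle not in self._hidden_families

    def get_raw_family_data(self, handle):
        if not self.include_family(handle):
            raise HandleNotFoundError(handle)
        # Copy first: this toy keeps its records in memory.
        data = dict(self._families[handle])
        data["child_ref_list"] = [
            h for h in data["child_ref_list"] if self.include_person(h)
        ]
        return data


def find_leaks(proxy, family_handles):
    """Return (family, child) pairs that violate the invariant."""
    leaks = []
    for fh in family_handles:
        try:
            family = proxy.get_raw_family_data(fh)
        except HandleNotFoundError:
            continue  # a filtered family is simply not visible
        leaks.extend(
            (fh, h) for h in family["child_ref_list"]
            if not proxy.include_person(h)
        )
    return leaks


families = {
    "f1": {"child_ref_list": ["p1", "p2"]},
    "f2": {"child_ref_list": ["p2"]},
}
proxy = ToyProxy(families, hidden_people={"p2"}, hidden_families={"f2"})
leaks = find_leaks(proxy, families)
```

A real test would walk every object type and every field in the cross-reference map, but the shape stays the same: fetch through the proxy, then assert that every handle you were given passes include_*.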

Thank you for giving me the opportunity to actually read this in an easy way 🙂
