This is yet another example where a network-graph approach would clearly outperform traditional search-and-match techniques.
If all primary Gramps object data were stored in a graph structure within the database—using two additional tables, one for nodes (objects like persons, sources, places, etc.) and one for edges (relationships between them)—then existing graph algorithms could be applied directly. This could significantly reduce the workload, especially for deep and complex genealogical queries.
This can already be done in a SQLite database today by creating these two tables and using libraries that support graph data extraction from relational databases. In these cases, you don’t need much more than names and key vital data in the graph store—additional object details can be fetched dynamically as needed.
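As a sketch of the two-table idea, the node and edge tables could be created with nothing more than the standard `sqlite3` module. The table and column names below are my own invention for illustration, not the actual Gramps schema:

```python
import sqlite3

# Hypothetical schema -- names are illustrative, not the real Gramps layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    handle   TEXT PRIMARY KEY,   -- Gramps object handle
    obj_type TEXT NOT NULL,      -- 'person', 'source', 'place', ...
    label    TEXT                -- name or other key vital data
);
CREATE TABLE edges (
    src      TEXT NOT NULL REFERENCES nodes(handle),
    dst      TEXT NOT NULL REFERENCES nodes(handle),
    rel_type TEXT NOT NULL       -- 'parent_of', 'cited_by', ...
);
""")
conn.execute("INSERT INTO nodes VALUES ('I1', 'person', 'Ada')")
conn.execute("INSERT INTO nodes VALUES ('I2', 'person', 'Byron')")
conn.execute("INSERT INTO edges VALUES ('I2', 'I1', 'parent_of')")
row_count = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
print(row_count)
```

Everything beyond the handle, type, and a short label stays in the ordinary Gramps tables and is looked up only when a query actually needs it.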
To extract data from these tables, you can use pandas, which makes it easy to query SQLite and load the node and edge data into memory as DataFrames. These can then be passed directly into graph libraries like:
- NetworkX (BSD license): Pure Python, easy to integrate, ideal for prototyping and analysis. It works well with pandas and supports a wide range of graph algorithms.
- python-igraph (GPL-2 license): High-performance and memory-efficient, with native support for graph structures and algorithms. Fully compatible with Gramps’ licensing. Its backend is a compiled C core optimized for in-memory graph operations, not a database.
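For example, a hypothetical edge table can be pulled into a DataFrame with `pd.read_sql` and handed straight to NetworkX (table and column names here are made up for the sketch):

```python
import sqlite3

import networkx as nx
import pandas as pd

# Hypothetical edge table -- illustrative only, not the Gramps schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (src TEXT, dst TEXT, rel_type TEXT);
INSERT INTO edges VALUES ('I2', 'I1', 'parent_of'),
                         ('I3', 'I1', 'parent_of');
""")

# pandas reads the SQL result straight into a DataFrame ...
edges = pd.read_sql("SELECT src, dst, rel_type FROM edges", conn)

# ... which NetworkX consumes directly, keeping rel_type as an edge attribute.
G = nx.from_pandas_edgelist(edges, source="src", target="dst",
                            edge_attr="rel_type", create_using=nx.DiGraph)
print(sorted(G.nodes()))
```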
Of course, it’s entirely possible to bypass pandas if you don’t need advanced filtering, grouping, or transformation. Both libraries can be fed directly from SQLite or PostgreSQL using standard Python database connectors like sqlite3 or psycopg2.
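A minimal sketch of that no-pandas route, streaming cursor rows from `sqlite3` straight into a NetworkX graph (again with a made-up edges table):

```python
import sqlite3

import networkx as nx

# Illustrative edges table only -- not the real Gramps schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (src TEXT, dst TEXT, rel_type TEXT);
INSERT INTO edges VALUES ('I2', 'I1', 'parent_of'),
                         ('I3', 'I1', 'parent_of'),
                         ('I4', 'I2', 'parent_of');
""")

G = nx.DiGraph()
# add_edges_from accepts (u, v, attr_dict) tuples, so the cursor rows
# can be fed into the graph lazily, without building a DataFrame first.
G.add_edges_from(
    (src, dst, {"rel_type": rel})
    for src, dst, rel in conn.execute("SELECT src, dst, rel_type FROM edges")
)
print(G.number_of_nodes(), G.number_of_edges())
```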
Using a graph model opens up access to algorithms like:
- Breadth-First Search for shortest paths between individuals or objects
- Bidirectional Search, which explores from both endpoints at once and roughly halves the effective search depth
- Dijkstra’s or A\* for optimal paths with weighted edges (e.g., generational distance or relevance)
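As a small illustration of the first two: on an unweighted graph, NetworkX’s `shortest_path` runs a bidirectional breadth-first search under the hood. The tiny pedigree here is invented:

```python
import networkx as nx

# Tiny invented pedigree, purely for illustration.
G = nx.Graph()
G.add_edges_from([
    ("grandmother", "mother"),
    ("mother", "me"),
    ("mother", "aunt"),
    ("aunt", "cousin"),
])

# With no weights given, shortest_path uses bidirectional BFS,
# i.e. the minimal number of relationship hops.
path = nx.shortest_path(G, "me", "cousin")
print(path)
```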
Naturally, there are many more shortest-path and minimal-hop algorithms that can be applied depending on the structure and goals of the query. And beyond that, this graph-based approach could be extended to incorporate other types of data—such as DNA matches, segment overlaps, or source citations—already present in Gramps, allowing for hybrid queries that combine genealogical, genetic, and contextual relationships. For example, you could query all individuals linked to a specific source, or all people associated with a particular location, regardless of its hierarchical level.
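A hybrid query such as “all individuals linked to a specific source” then reduces to a one-hop neighbourhood lookup filtered on node type. The mixed graph below is toy data with invented handles and labels:

```python
import networkx as nx

# Invented mixed graph: persons, one source, one place, with typed edges.
G = nx.Graph()
G.add_node("S1", obj_type="source")
G.add_node("P1", obj_type="place")
for person in ("anna", "bjorn", "kari"):
    G.add_node(person, obj_type="person")
G.add_edge("anna", "S1", rel_type="cited_in")
G.add_edge("bjorn", "S1", rel_type="cited_in")
G.add_edge("kari", "P1", rel_type="born_in")

# "All individuals linked to source S1" = neighbours of S1
# that carry obj_type 'person'.
linked = sorted(
    n for n in G.neighbors("S1")
    if G.nodes[n]["obj_type"] == "person"
)
print(linked)
```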
In terms of performance, even on a modest machine with 8–16 GB of DDR4 RAM, both NetworkX and python-igraph can comfortably handle:
- NetworkX: ~50,000–100,000 nodes and ~500,000 edges for in-memory analysis, depending on algorithm complexity
- python-igraph: ~500,000–1 million nodes and several million edges, thanks to its optimized backend
These are hypothetical but realistic estimates for genealogical datasets, assuming the data is already extracted from the database and loaded into memory.
I’m fairly certain I’ve written something about this specifically at some point earlier, so there may be more details in an older thread or comment.
Note: I’ve used Copilot (Microsoft’s AI assistant) to help translate this from Norwegian, verify the technical feasibility of the proposed functionality, and ensure that the concepts discussed are aligned with what existing tools and libraries can realistically support in this context. If anything reads oddly, it may be because English isn’t my first language and something slipped past my final review.