Algorithm for Deep Connections gramplet

I notice, from the new display in the Deep Connections gramplet, that the process appears to start with one person and proceed by a depth-first search looking for the other. This has work factor b^d, where b is the average branching factor at each level and d is the search depth.

I suggest that, because the gramplet is computationally intensive, it would be worth trying the more complex but more efficient method of starting at both ends and looking for matches between the frontier from one end and the frontier from the other. Since each side only needs to search to half the depth, this has work factor roughly 2·b^(d/2).
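As a sketch of the idea, here is a minimal meet-in-the-middle BFS in Python. The graph, the node names, and the `neighbors` callable are purely illustrative assumptions, not the gramplet's actual code or the Gramps API:

```python
from collections import deque

def bidirectional_search(neighbors, start, goal):
    """Meet-in-the-middle BFS between two nodes.

    Each side only explores to roughly depth d/2, so the total work is
    on the order of 2*b**(d/2) instead of b**d for a one-sided search.
    `neighbors` is any callable returning the nodes adjacent to a node.
    Returns the hop distance between start and goal, or None if the
    two nodes are not connected.
    """
    if start == goal:
        return 0
    dist_s, dist_g = {start: 0}, {goal: 0}
    frontier_s, frontier_g = deque([start]), deque([goal])
    while frontier_s and frontier_g:
        # Always expand the smaller frontier first.
        if len(frontier_s) <= len(frontier_g):
            frontier, dist, other = frontier_s, dist_s, dist_g
        else:
            frontier, dist, other = frontier_g, dist_g, dist_s
        best = None
        for _ in range(len(frontier)):  # expand one whole BFS level
            node = frontier.popleft()
            for nxt in neighbors(node):
                if nxt in other:        # the two frontiers have met
                    cand = dist[node] + 1 + other[nxt]
                    best = cand if best is None else min(best, cand)
                elif nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    frontier.append(nxt)
        if best is not None:
            return best
    return None

# Toy undirected chain a-b-c-d-e (made-up data for illustration)
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
         "d": ["c", "e"], "e": ["d"]}
print(bidirectional_search(lambda n: graph.get(n, ()), "a", "e"))  # 4
```

The one-sided search would visit every node within four hops of "a"; the bidirectional version only needs two hops from each end before the frontiers meet.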


This is yet another example where a network graph approach would clearly outperform any kind of traditional search and match.

If all primary Gramps object data were stored in a graph structure within the database—using two additional tables, one for nodes (objects like persons, sources, places, etc.) and one for edges (relationships between them)—then existing graph algorithms could be applied directly. This could significantly reduce the workload, especially for deep and complex genealogical queries.

This can already be done in a SQLite database today by creating these two tables and using libraries that support graph data extraction from relational databases. In these cases, you don’t need much more than names and key vital data in the graph store—additional object details can be fetched dynamically as needed.
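As an illustration of the two-table idea, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (`nodes`, `edges`, `handle`, `rel`, and so on) are hypothetical, not the actual Gramps schema:

```python
import sqlite3

# Hypothetical graph view of primary objects: one table for nodes,
# one for edges. Names and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    handle   TEXT PRIMARY KEY,  -- object handle
    obj_type TEXT NOT NULL,     -- 'person', 'source', 'place', ...
    label    TEXT               -- name or other key vital data
);
CREATE TABLE edges (
    src TEXT NOT NULL REFERENCES nodes(handle),
    dst TEXT NOT NULL REFERENCES nodes(handle),
    rel TEXT NOT NULL           -- 'child_of', 'spouse_of', 'cites', ...
);
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)",
                 [("p1", "person", "Ada"), ("p2", "person", "Byron")])
conn.execute("INSERT INTO edges VALUES ('p1', 'p2', 'child_of')")
rows = conn.execute("SELECT src, dst, rel FROM edges").fetchall()
print(rows)  # [('p1', 'p2', 'child_of')]
```

Only the handles and a label live in the graph tables; full object details would be fetched from the existing tables on demand.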

To extract data from these tables, you can use pandas, which makes it easy to query SQLite and load the node and edge data into memory as DataFrames. These can then be passed directly into graph libraries like:

  • NetworkX (BSD license): Pure Python, easy to integrate, ideal for prototyping and analysis. It works well with pandas and supports a wide range of graph algorithms.

  • python-igraph (GPL-2 license): High-performance and memory-efficient, with native support for graph structures and algorithms. Fully compatible with Gramps’ licensing. Its backend is a compiled C core optimized for in-memory graph operations, not a database.

Of course, it’s entirely possible to bypass pandas if you don’t need advanced filtering, grouping, or transformation. Both libraries can be fed directly from SQLite or PostgreSQL using standard Python database connectors like sqlite3 or psycopg2.
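For example, a hypothetical edges table can be fed straight into NetworkX with nothing but sqlite3 (assuming NetworkX is installed; the table and node names are made up):

```python
import sqlite3
import networkx as nx  # BSD-licensed; assumed installed

# Build a toy edges table in memory (illustrative schema, not Gramps').
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (src TEXT, dst TEXT);
INSERT INTO edges VALUES ('a', 'b'), ('b', 'c'), ('c', 'd');
""")

# The cursor yields (src, dst) tuples, which NetworkX accepts directly.
G = nx.Graph()
G.add_edges_from(conn.execute("SELECT src, dst FROM edges"))
print(nx.shortest_path(G, "a", "d"))  # ['a', 'b', 'c', 'd']
```

With pandas in between, `pd.read_sql_query` would do the same extraction but also allow filtering and grouping before the graph is built.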

Using a graph model opens up access to algorithms like:

  • Breadth-First Search for shortest paths between individuals or objects

  • Bidirectional Search to halve the search depth

  • Dijkstra’s or A* for optimal paths with weighted edges (e.g., generational distance or relevance)

Naturally, there are many more shortest-path and minimal-hop algorithms that can be applied depending on the structure and goals of the query. And beyond that, this graph-based approach could be extended to incorporate other types of data—such as DNA matches, segment overlaps, or source citations—already present in Gramps, allowing for hybrid queries that combine genealogical, genetic, and contextual relationships. For example, you could query all individuals linked to a specific source, or all people associated with a particular location, regardless of its hierarchical level.
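As a toy illustration of such a hybrid query, the sketch below finds all persons reachable from a given source node using a plain BFS over adjacency lists. The node names and the type-prefix convention are invented for the example:

```python
# A toy hybrid graph mixing persons, a place, and a source, stored as
# plain adjacency lists (all names are made up for illustration).
graph = {
    "person:Ada":        ["person:Byron", "source:Census1841"],
    "person:Byron":      ["person:Ada", "place:London"],
    "place:London":      ["person:Byron"],
    "source:Census1841": ["person:Ada"],
}

def linked(graph, node, prefix):
    """All nodes of a given type reachable from `node`, via BFS."""
    seen, queue, hits = {node}, [node], []
    while queue:
        cur = queue.pop(0)
        if cur.startswith(prefix) and cur != node:
            hits.append(cur)
        for nxt in graph.get(cur, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return hits

# "All individuals linked to a specific source":
print(linked(graph, "source:Census1841", "person:"))
```

The same traversal answers "all people associated with a place" simply by starting from the place node instead.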

In terms of performance, even on a modest machine with 8–16 GB of DDR4 RAM, both NetworkX and python-igraph can comfortably handle:

  • NetworkX: ~50,000–100,000 nodes and ~500,000 edges for in-memory analysis, depending on algorithm complexity

  • python-igraph: ~500,000–1 million nodes and several million edges, thanks to its optimized backend

These are hypothetical but realistic estimates for genealogical datasets, assuming the data is already extracted from the database and loaded into memory.

I’m fairly certain I’ve written something about this specifically at some point earlier, so there may be more details in an older thread or comment.


Note: I’ve used Copilot (Microsoft’s AI assistant) to help translate this from Norwegian, verify the technical feasibility of the proposed functionality, and ensure that the concepts discussed are aligned with what existing tools and libraries can realistically support in this context. If anything reads oddly, it may be because English isn’t my first language and something slipped past my final review.


Speaking of graphs, I was watching this video a few minutes ago: https://youtu.be/94k0oiifkqs?si=X8x1FT8CigEG0AJ5

Presenter: David Allen Stumpf, MD, PhD; Graphs For Genealogists, a plugin for Neo4j (there is also a website and a Facebook group).


Yeah, they have a few good videos on the topic of genealogy over at Neo4j.

Yes, you are correct that I used depth-first search. Because the gramplet is designed to find all of the paths, not just the shortest path, and stops at each path, I don’t think it makes much difference.
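For readers following along, an all-paths depth-first enumeration can be sketched like this (illustrative only, not the gramplet's actual implementation):

```python
def all_paths(graph, start, goal, path=None):
    """Depth-first enumeration of every simple path from start to goal.

    Unlike a shortest-path search, this backtracks at each completed
    path and keeps going, so the whole reachable space is explored
    either way. Cycles are avoided by keeping paths simple.
    """
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in graph.get(start, []):
        if nxt not in path:  # no repeated nodes within one path
            yield from all_paths(graph, nxt, goal, path)

# Toy diamond graph: two distinct routes from a to d (made-up data)
g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(list(all_paths(g, "a", "d")))  # [['a', 'b', 'd'], ['a', 'c', 'd']]
```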

I did try a bi-directional search, but it added quite a bit of complexity to the code.

Did you try the updated “Deep Connections Gramplet”? Is it still too slow? If so, can you tell me how many people are in your tree?

I tested on the largest tree that I have (30k people) and it worked very fast.


Yes, I have tried it and it’s significantly faster (19,000 people): thank you. It’s now fast enough that waiting for the results is not inconvenient. But it isn’t fast enough that I want to leave it active when I don’t need the results. So I still have the mild inconvenience of activating and deactivating Deep Connections.


Just to clarify: my comment was meant as a general tip on what a network graph approach can offer in solving genealogical problems. With such a model, you naturally have access to algorithms that can show all paths between two or more nodes. Most graph libraries support a wide range of search patterns and traversal strategies—those two I mentioned were just examples among many.

I don’t use Deep Connections myself at the moment, so my comment wasn’t related to its current implementation. I simply wanted to highlight how graph-based techniques can open up new possibilities for querying relationships and structures in genealogical data—much like what Deep Connections already does today, as I understand it.

Note: I’ve used Copilot (Microsoft’s AI assistant) to help translate and format this reply based on my original Norwegian notes. The technical concepts and suggestions are my own, but Copilot assisted in ensuring clarity and consistency in English.

I wonder if the “Pause” button in the gramplet keeps its state between selected person changes? Probably not. But that could be changed.

Or I might bite the bullet and explore bi-directional search.

Thanks for the original comment and further explanation. While it is true that graph-based tools are better designed for graph operations, we can do any of those operations using a graph algorithm on our Gramps databases.

An interesting exploration would be for someone to implement our generic database class on top of such a graph engine.

Could you share links for this video?
I’m trying to experiment with graph technologies using python-igraph.
I can do some elementary things like searching for all ancestors, but I’m a bit stuck, for example, with the deep connections problem.

See this Perplexity response requesting a collated list of genealogy-related videos at Neo4j:
https://www.perplexity.ai/search/could-you-list-videos-on-the-t-8J9GqRcLSYm37XuNClA7Eg#0

We should also invite the Isotammi team to comment. They were using Neo4j for the data imported from Gramps. Unfortunately, we know very little about that.

There are multiple videos on neo4j’s Youtube channel, so here is a link with the search for “genealogy” on their channel.

But let us not drag this too far from the original topic…
Maybe we should instead create a new, more general post where different tools for using graphs in the research and visualization of genealogy data can be discussed?


This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.