Could Gramps use a Proxy table for persistent tree subsets?

Problem Statement (AI rewrite)

In Gramps, genealogical databases stored in SQLite can grow to contain tens of thousands of individuals, families, and associated records. As these databases scale, interactive features such as dynamic charting and data analysis become increasingly unresponsive. The primary bottleneck arises from the need to scan and process the entire dataset for each operation, leading to slow performance and analyses that are often skewed by outlier data or irrelevant branches.

Currently, the only practical workaround is to “fork” the tree: manually export a relevant subset of the database and re-import it into a new tree. This approach is cumbersome, error-prone, and disrupts workflow continuity. There is a clear need for a more efficient and persistent method to define and work with meaningful subsets of genealogical data within the same Gramps environment, without duplicating or fragmenting the database.

Human Postscript: There have been previous discussions about adding a limited-scope Dashboard option. The current Dashboard has statistical gramplets that look at the whole tree, and once the tree has collateral line data, the statistics become less pertinent. (As an example, consider the Age Stats or Surname Cloud gramplets: their statistical results become meaningless when run on a whole tree, but could be very informative when applied to just the Ancestors and/or Descendants of the Proband/Active Person.)

Perplexity.ai suggestion

Possible approach: Persistent Subset via Proxy Tree Feature

To efficiently work with large Gramps SQLite databases, consider implementing a proxy tree feature that allows users to define and persistently store a working subset of the data:

  • Subset Definition: Let users select a root person, family, or branch, and specify rules (e.g., ancestors, descendants, tagged individuals) to define the subset.
  • Persistent Proxy Table: Store the subset’s handles (unique IDs) in a dedicated table within the SQLite database. This table acts as a filter for all queries and chart displays, so only relevant data is loaded and analyzed.
  • Integration: Modify Gramps’ data access layer to use this proxy table as a filter, bypassing the need to scan the entire database for every operation. This approach leverages SQLite’s efficiency and avoids repeated export/import cycles[4][6].
  • Performance: This method ensures interactive charts and analyses remain responsive, as only the proxy subset is processed, greatly reducing overhead and minimizing the impact of outlier data[2][3].

This approach is similar to pre-computing filter maps for fast access, as discussed in recent Gramps filter optimizations, and can be implemented using standard Python and SQLite techniques[3][6].
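As a rough illustration of the proxy table idea described above, here is a minimal sketch using Python's standard `sqlite3` module. The schema is a deliberately simplified assumption, not the real Gramps layout: the `person`, `parent_of`, and `subset_person` tables and both function names are hypothetical. The subset is rebuilt once with a recursive query and then reused as a join filter.

```python
import sqlite3

# Hypothetical, simplified schema. The real Gramps tables differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (handle TEXT PRIMARY KEY, name TEXT, birth_year INTEGER);
    CREATE TABLE parent_of (parent TEXT, child TEXT);
    -- The proxy table: one row per handle in the working subset.
    CREATE TABLE subset_person (handle TEXT PRIMARY KEY);
""")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [
    ("P1", "Proband", 1980),
    ("P2", "Father", 1950),
    ("P3", "Grandmother", 1925),
    ("P4", "Unrelated", 1700),
])
conn.executemany("INSERT INTO parent_of VALUES (?, ?)",
                 [("P2", "P1"), ("P3", "P2")])


def rebuild_ancestor_subset(conn, root_handle):
    """Replace the stored subset with root_handle plus all of its ancestors."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM subset_person")
        conn.execute("""
            WITH RECURSIVE anc(handle) AS (
                SELECT ?
                UNION
                SELECT parent_of.parent
                FROM parent_of JOIN anc ON parent_of.child = anc.handle
            )
            INSERT INTO subset_person SELECT handle FROM anc
        """, (root_handle,))


def subset_birth_years(conn):
    """Read birth years for the stored subset only, via a join on the proxy table."""
    rows = conn.execute("""
        SELECT p.birth_year
        FROM person AS p JOIN subset_person AS s ON s.handle = p.handle
        ORDER BY p.birth_year
    """)
    return [row[0] for row in rows]


rebuild_ancestor_subset(conn, "P1")
years = subset_birth_years(conn)
subset_handles = sorted(r[0] for r in conn.execute("SELECT handle FROM subset_person"))
```

Because the subset lives in an ordinary table, it survives between sessions and only needs rebuilding when the root person or the rule changes. Whether this would fit Gramps' actual data access layer is an open question.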

Citations:
[1] Gramps Performance - Gramps
[2] Tips for large databases - Gramps
[3] Making Gramps filters Faster, and then Superfast
[4] GEPS 010: Relational Backend - Gramps
[5] Gramps
[6] 2.5.2.4 Lab - Working with Python and SQLite Answers
[7] Collaborate on Optimizing a new Custom Rule
[8] http://app.aspell.net/create?max_size=70&spelling=US&max_variant=0&diacritic=strip&download=wordlist&encoding=utf-8&format=inline


Answer from Perplexity: https://www.perplexity.ai/search/as-an-expert-in-python-tree-da-JkbXuZw9SdupvH2A9.7GbA?utm_source=copy_output

I would also like to have more responsive versions of the Interactive Family Tree (Topola viewer) by @PeWu, FamilyTreeView (FTV) by @ztlxltl, and the “All Connected” mode of Graph View by Gary Burton.

If there were an option to quickly copy the filtered people to a new tree database in the cache (with a 30-day purge … which Gramps would have to manage) and switch over Loaded Trees, then the forked-tree approach would become more viable.
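A cached fork with a 30-day purge could look something like the sketch below. Everything here is an assumption for illustration: the single simplified `person` table, the `fork_subset` and `purge_old_forks` helpers, and the file layout are all hypothetical, and a real fork would need to copy every related Gramps table. It uses SQLite's `ATTACH DATABASE` to copy rows between files and file modification times to decide what to purge.

```python
import os
import sqlite3
import tempfile
import time

PURGE_AFTER_SECONDS = 30 * 24 * 60 * 60  # the 30-day purge window


def fork_subset(source_path, fork_path, handles):
    """Copy the person rows whose handles are listed into a new database file."""
    placeholders = ",".join("?" for _ in handles)
    conn = sqlite3.connect(source_path)
    try:
        conn.execute("ATTACH DATABASE ? AS fork", (fork_path,))
        conn.execute("CREATE TABLE fork.person (handle TEXT PRIMARY KEY, name TEXT)")
        conn.execute(
            "INSERT INTO fork.person SELECT handle, name FROM main.person "
            f"WHERE handle IN ({placeholders})",
            list(handles),
        )
        conn.commit()
    finally:
        conn.close()


def purge_old_forks(cache_dir, now=None):
    """Delete cached .db files not modified within the purge window."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if name.endswith(".db") and now - os.path.getmtime(path) > PURGE_AFTER_SECONDS:
            os.remove(path)
            removed.append(name)
    return removed


# Usage: build a tiny source tree, fork two people into the cache, then purge.
with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "source.db")
    cache_dir = os.path.join(workdir, "cache")
    os.makedirs(cache_dir)

    src = sqlite3.connect(source_path)
    src.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, name TEXT)")
    src.executemany("INSERT INTO person VALUES (?, ?)",
                    [("P1", "Proband"), ("P2", "Father"), ("P4", "Unrelated")])
    src.commit()
    src.close()

    fork_path = os.path.join(cache_dir, "fork.db")
    fork_subset(source_path, fork_path, ["P1", "P2"])

    fork = sqlite3.connect(fork_path)
    forked_handles = [r[0] for r in fork.execute("SELECT handle FROM person ORDER BY handle")]
    fork.close()

    kept = purge_old_forks(cache_dir)  # fresh fork, nothing to remove
    removed = purge_old_forks(cache_dir, now=time.time() + 31 * 24 * 60 * 60)
```

The `now` parameter exists only so the purge can be exercised without waiting 30 days. Switching the loaded tree over to the fork is left out, since that depends on Gramps internals.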

(However, if forking were TOO transparent, a jarring “Mode” indicator would be needed, similar to the different theme browsers use when in Private browsing mode.)

I have limited experience (compared to other Gramps developers) with databases, but I think you could speed up many DB-dependent processes by defining what you want to do not in Python but in SQL. My guess is that writing filters in SQL could greatly improve their performance. (But I’m happy to be corrected by database experts.)
My guess is that getting rid of the JSON string in the DB and storing that data in dedicated tables (if I remember correctly, that is the plan) will speed up queries even more. I’m not sure if the AI you asked is aware of this peculiarity in the Gramps database. With my limited database experience, my guess is that using ordinary DB tables instead of the JSON string is the first step to take. (Again, I’m happy to be corrected by an expert!)
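To make the contrast concrete, here is a small sketch of the same surname filter written both ways. The table layout is an assumption for illustration only (a hypothetical `person` table holding a serialized `json_data` string alongside a dedicated, indexed `surname` column), and it makes no claim about how much faster either version is in practice.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: each person is stored as a JSON string and also has
# a dedicated, indexed surname column.
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, json_data TEXT, surname TEXT)")
conn.execute("CREATE INDEX person_surname ON person (surname)")

people = [("P1", "Smith"), ("P2", "Jones"), ("P3", "Smith")]
for handle, surname in people:
    blob = json.dumps({"handle": handle, "surname": surname})
    conn.execute("INSERT INTO person VALUES (?, ?, ?)", (handle, blob, surname))
conn.commit()


def filter_in_python(conn, surname):
    """Fetch every row, decode its JSON string, and test it in Python."""
    matches = []
    for handle, blob in conn.execute("SELECT handle, json_data FROM person"):
        if json.loads(blob)["surname"] == surname:
            matches.append(handle)
    return sorted(matches)


def filter_in_sql(conn, surname):
    """Let SQLite do the filtering on the dedicated column."""
    rows = conn.execute(
        "SELECT handle FROM person WHERE surname = ? ORDER BY handle", (surname,)
    )
    return [row[0] for row in rows]


python_result = filter_in_python(conn, "Smith")
sql_result = filter_in_sql(conn, "Smith")
```

The Python version has to read and decode every row, while the SQL version can use the index and return only matching handles. Both return the same handles here.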

That said, looking at FamilyTreeView without filters, the main performance issues are not DB related, but UI related. The improvements discussed here will most likely not speed up FTV’s tree building without improving UI performance at the same time, which I think is possible, but not easy.

Looking at FTV with filters, possible performance improvements made to filters will directly affect performance when FTV applies and uses them.

If you can pinpoint a specific performance issue or a typical workflow that is slow with FTV, feel free to describe it so I can look into it in detail.

It is not so much degraded performance with FTV.

It is more about being able to loosen some constraints, to increase the scope of which relationships have the expansion gadgets.

The Graph View chart has a wonderful feature to show All Connections. But if the tree has more than 1,000 people, it has problems.

My fear is that if FTV expanded its scope, it would suffer similarly.

I know that it would be lovely to be able to reach 1st and 2nd cousins through Expanders in FTV. But not at the expense of bogging down the interface. That would keep too many new users from adopting the addon… and @Nick-Hall has already expressed a desire to see your view rolled into the core. That would be a big benefit to the general Gramps community that should not be risked.
