Threading (or multiprocessing), performance and refactoring

Hello!
I am starting to look at some refactoring and performance issues in an old third-party addon. It was written some years ago and tested on old hardware with few resources. So, the idea was to provide (yet) another alternative interface to the Relationships Calculator module, with output via a hack of the standard OpenDocument Spreadsheet exporter.


Performance sounded good with the example.gramps data (less than 5 seconds for calculation and display) using default values (generation depth).

As filter rule performance has been improved in Gramps 6.x (and before migrating from 5.2 to 6.0), I started to look at ways of improving this addon tool. Since it re-uses the Relationships module many times, which can be expensive (memory, calculation), maybe the first step should be a quick refactoring. Also, the more the number of columns increases, the more time calculation and display take. So, maybe something around this too.

My first exploration is to look at the threading and multiprocessing modules. That makes sense for more modern configurations and ever larger datasets. It is the first time I use them. The basic test was to move some potentially expensive sections into their own threads.
Well, the total processing time increased from 5 seconds to 12 seconds with a simple database and default values (~2000 individuals)! I did not test with large databases. So, is this normal, or did I introduce mistakes with the threads? Here is a link to these first changes (tests, experimentations):
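One general factor worth ruling out first: CPython's GIL means pure-Python, CPU-bound work does not run in parallel across threads, so thread creation and synchronization overhead alone can make things slower. A small self-contained sketch (not addon code) to check this on your machine:

```python
import threading

def cpu_work(n):
    """CPU-bound busy loop, a stand-in for relationship calculations."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_serial(n, workers):
    """Run the work sequentially in the main thread."""
    return [cpu_work(n) for _ in range(workers)]

def run_threaded(n, workers):
    """Run the same work in one thread per worker."""
    results = [None] * workers

    def task(idx):
        results[idx] = cpu_work(n)

    threads = [threading.Thread(target=task, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Same results either way; under the GIL the threaded version is
# usually no faster (often slower) for pure-Python CPU work.
assert run_serial(100_000, 4) == run_threaded(100_000, 4)
```

Wrapping both calls with `time.perf_counter()` typically shows the threaded version gains nothing here; threads mainly help I/O-bound work.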

Something looks wrong in my thread usage and implementation.

-            filter.add_rule(related)
-            self.progress.set_pass(_('Please wait, filtering...'))
-            filtered_list = filter.apply(self.dbstate.db, plist)
+            t_filter = Thread(target=self.t_filter_rules(related, plist))
+            t_filter.start()
+            t_filter.join()

+def t_filter_rules(self, related, plist):
+        """
+        """
+        self.filter.add_rule(related)
+        self.progress.set_pass(_('Please wait, filtering...'))
+        self.filtered_list = self.filter.apply(self.dbstate.db, plist)
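For what it's worth, the diff above contains a classic pitfall: `target=self.t_filter_rules(related, plist)` calls the method immediately in the main thread and passes its return value (`None`) as the `Thread` target, which is exactly what later raises `TypeError: 'NoneType' object is not callable`. The callable and its arguments must be passed separately. A minimal sketch of the corrected pattern, using a simplified stand-in class rather than the addon's real code:

```python
from threading import Thread

class Demo:
    """Minimal stand-in for the addon class, just to show the pattern."""

    def __init__(self):
        self.filtered_list = None

    def t_filter_rules(self, related, plist):
        # Placeholder for filter.add_rule(...) + filter.apply(...).
        self.filtered_list = [p for p in plist if p != related]

    def run(self, related, plist):
        # Pass the callable itself plus its args; do NOT call it here.
        t = Thread(target=self.t_filter_rules, args=(related, plist))
        t.start()
        t.join()
        return self.filtered_list

print(Demo().run("x", ["a", "x", "b"]))  # ['a', 'b']
```

Note that `start()` immediately followed by `join()` still blocks until the worker finishes, so even the corrected form gains no concurrency over a direct call.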

Should I rather try a thread pool? Or should I rather improve the monitoring[1]? It looks like iteration and filtering may be different in Gramps 6.0, but I first need to improve it on 5.2.x, to understand the approaches and modifications.

[1] $ gramps -d "relation_tab"

Best regards,
Jérôme

Maybe I should also add a close event to the threads?

It seems that I should not use return inside the thread section dedicated to rank calculation?

def t_rank(self, dist, max_level):
    self.rank = dist[0][0]
    if self.rank == -1 or self.rank > max_level: # not related and ignored people            
        return

This test might go inside the loop (with continue) to avoid extra calculations (for not related and ignored people).
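The idea can be sketched like this (loop shape and data layout assumed from the snippet above, not the addon's actual code):

```python
def rank_people(distances, max_level):
    """Collect ranks, skipping unrelated and too-distant people in the loop."""
    ranked = []
    for dist in distances:
        rank = dist[0][0]
        if rank == -1 or rank > max_level:
            continue  # not related, or beyond the generation depth: skip early
        ranked.append(rank)
    return ranked

print(rank_people([[(-1,)], [(2,)], [(7,)], [(3,)]], 5))  # [2, 3]
```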

There are better ways of handling threads than just starting 3. But in general, you’re going to run into issues with incompatibilities in accessing the database from different threads. I worked on trying to have a generic interface for parallel processing a couple of weeks ago, and I don’t think this is a good solution in general. Not to say that we can’t do parallel processing, but we have to do it in an abstract manner so that each database backend has options.

In any event, using Gramps 6.0.5 is going to be faster than any parallelism because of the Optimizer. You’ll want to move to it to get ready for 6.1.

Do you mean a Pool, or something high-level like the concurrent.futures module?
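For reference, `concurrent.futures.ThreadPoolExecutor` is the high-level standard-library layer over a thread pool; a generic sketch of the API (deliberately not touching the Gramps database, given the thread-safety caveats already mentioned, and with a hypothetical `compute_rank` stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

def compute_rank(person_id):
    # Hypothetical stand-in for a per-person calculation.
    return person_id * 2

person_ids = [1, 2, 3, 4]
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() keeps input order and joins all workers when the block exits.
    ranks = list(pool.map(compute_rank, person_ids))
print(ranks)  # [2, 4, 6, 8]
```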

The primary idea was to move some basic calculations into functions and their related threads. I made a mistake in one of the functions: the test for limiting entries inside the loop was broken, generating extra calculations and increasing processing time. So, testing my primary idea around threading should now be complete. After looking at some samples, I just understood that dealing with lock() and playing with the location of join() might not give the expected result. The thread around the filter can be ignored anyway, as it is outside my iteration loop.

I suppose that I hit some of these incompatibilities…

Exception in thread Thread-56:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 1182, in run
    self.function(*self.args, **self.kwargs)
TypeError: 'NoneType' object is not callable
Exception in thread Thread-58:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 1182, in run
    self.function(*self.args, **self.kwargs)
TypeError: 'NoneType' object is not callable

  File "/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line 1002, in _get_raw_data
    self.dbapi.execute(sql, [handle])
  File "/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/sqlite.py", line 136, in execute
    self.__cursor.execute(*args, **kwargs)
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140130013738816 and this is thread id 140129322649344.
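This error comes from the sqlite3 module itself: by default (`check_same_thread=True`) a connection may only be used by the thread that created it. A standalone sketch of the rule, outside Gramps:

```python
import sqlite3
import threading

errors = []

# A connection created in the main thread...
conn = sqlite3.connect(":memory:")

def use_foreign_connection():
    try:
        conn.execute("SELECT 1")  # ...used from another thread
    except sqlite3.ProgrammingError as exc:
        errors.append(str(exc))  # same kind of error as in the traceback

def use_own_connection():
    # Each thread opening its own connection is fine for plain sqlite3.
    local = sqlite3.connect(":memory:")
    local.execute("SELECT 1")
    local.close()

for func in (use_foreign_connection, use_own_connection):
    t = threading.Thread(target=func)
    t.start()
    t.join()

print(len(errors))  # 1: only the shared-connection access failed
```

Opening a connection per thread works for plain sqlite3, but since Gramps owns its connection, an addon cannot apply that workaround without backend support.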


Is this a common limitation (Python, other applications, etc.), or one specific to the Gramps database backends?

About the Gramps 6 installation: I need a large OS upgrade first…

Well, after some refactoring, I looked at additional (or advanced) features like family network centrality, or shared subtree size… It sounds very good!

(sorry, I did not force the English locale for the screenshot)

You can see two additional columns. That's something that I could not add alone in one day (maybe in one week!)…

I was behind the instructions given to the "copilot", and the spirit and ideas are still present. :wink:

Maybe I can polish the RelID (ID Rel) map design. Anyway, by re-using most core modules from Gramps, we could quickly go very far in relationship analysis. I will not make a PR. This analysis tool has some custom behaviors, like asking to select a folder for the save-to-.ods action. Sure, after refactoring, the code is more pythonic and clean, but some sections are very experimental or still pending (DNA stuff?):

import hashlib

name = name_displayer.display(person)
# pseudo privacy; sample for DNA stuff and mapping
no_name = hashlib.sha384(name.encode() + handle.encode()).hexdigest()
_LOG.info(no_name)  # own internal password via handle

The model should be very close to the TreeView, but I cannot maintain a new View (e.g., a Relation Views Category) or make deep interactions with filter rules (e.g., as in some Graphical Views).

Note: during experimentation, I also got a ProgressMeter window via the CLI…

$ gramps -O 'example' -a tool -p name=relationtab -d "relation_tab"

I made a draft Pull Request against gramps60.

Copyright (C) 2000-2006 Donald N. Allingham

It was the plugin environment (and global gramps application).

Copyright (C) 2008 Brian G. Matherly

It was around the GUI stuff, maybe the hack for the folder selector (gtk2 to gtk3).

Copyright (C) 2010 Jakim Friant

It was the filter rules handling into the tool.

Copyright (C) 2012 Doug Blank

Might be related to the TreeView model, the ODS file format support or the Tools options.

I kept a way to generate issues around threading, via a dedicated function! So, it returns errors (on the console) without crashing the tool. The addon is not listed (include in listing = False) and there is an additional file (import number; numbers already exists as a built-in Python module).

@romjerome I am not sure what you are doing. At one point you have:

1. thread = Thread(target=self.long_running_task, args=(default_person, person,))
2. thread.start()
3. dist = self.relationship.get_relationship_distance_new(
       self.dbstate.db, default_person, person, only_birth=True)

where line 1 calls long_running_task that does this:

        dist = self.relationship.get_relationship_distance_new(
            self.dbstate.db, default_person, person, only_birth=True)

But you call that again on line 3. I don’t think this code is doing what you think it is doing. And this doesn’t look like proper thread management. But even if it were done properly, Gramps makes no guarantees that you can handle database actions in a thread.
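For the record, the usual shape is: do the expensive call once in the worker, hand the result back (via an instance attribute or a queue), and read it after `join()`, rather than repeating the call in the main thread. A generic sketch with a hypothetical stand-in for the distance calculation:

```python
import queue
import threading

def get_distance(a, b):
    # Hypothetical stand-in for get_relationship_distance_new().
    return abs(a - b)

results = queue.Queue()

def worker(a, b):
    # The expensive call happens once, here, and the result is handed back.
    results.put(get_distance(a, b))

t = threading.Thread(target=worker, args=(10, 3))
t.start()
t.join()
dist = results.get()  # reuse the worker's result instead of recomputing it
print(dist)  # 7
```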

BTW, there is a "git blame" button and CLI:

But in general it is impossible to associate a copyright with particular lines (because they may be revised or removed).

Yes, that was the plan! I just call it twice to keep a trace of the incompatibilities you warned me about, which are also listed on:

Something like testing without a crash, because the thread section does not really generate useful data; the second call does. Sure, a final version (polished, outside the draft PR) should remove the thread lines.

I remember that one script on the experimental Gram.py could generate something very close (at least the primary idea).

Looking at some comments in the code, history and documentation, there were some performance issues in the past (e.g., the SQLite backend was 30% slower than the bsddb3 backend). As that was some years ago, I am just trying to improve, a little bit, any possible way to limit slowdowns or extra processing time with a large database.

It seems that I just added a copyright when I looked at another module, or section (piece or part) of code in Gramps, then re-used it in the addon.

Maybe it makes sense for the core set of plugins. Does it still make sense when we create an addon? It seems that I re-used the GUI logic, the Plugins classes for Tools, and part of the OpenDocument logic, and just hid my name when it should have been added. For me, it is a plugin for Gramps. So, the licence, and maybe the copyright, are related to development around Gramps. As you pointed out, git blame (or svn blame!) can find the history of commits and authors.

My problem is now with coding policy, AI, copyright & co. I should be able to quickly add advanced calculations (via new columns in the table) related to Relationships, DNA, Families, Surnames, Statistics, etc. Most of them will be basic calculations (a few lines), but what about additions of copyrighted code provided by an AI? Do not worry, I checked before adding any lines from the AI suggestions. :wink: The problem might be including a "real" algorithm. I remember the lunisolar calendar issue (e.g., for Chinese dates, see the bug tracker). It was not included, despite some patches (even from myself!). Today, any AI will include it (at a glance).

Is it dangerous to keep it? Can this crash more than Gramps? Can this corrupt the database (in read-only state/mode)? Python uses its pseudo-sandbox (the GIL environment), doesn't it?

I saw some projects like this one:

but I thought that warnings were only for printing information, and that the threads were closed before.

Update: I suppose I now understand what you wrote about a proper implementation… For testing threading, I should often skip or limit the use of return inside the related function, and prefer yield for performance reasons.
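On the yield point: a generator hands results back one at a time, so the caller can interleave progress updates instead of waiting for a full list to be built and returned. A small sketch (data layout assumed, not addon code):

```python
def rank_iter(distances, max_level):
    """Yield ranks one by one, skipping unrelated/ignored entries."""
    for dist in distances:
        rank = dist[0][0]
        if rank == -1 or rank > max_level:
            continue
        yield rank  # the caller regains control after every item

# The caller drives the iteration and can update a progress meter per item:
ranks = []
for rank in rank_iter([[(1,)], [(-1,)], [(4,)]], max_level=3):
    ranks.append(rank)
print(ranks)  # [1]
```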

An ambitious feature could be: “Person Comparison: Add a feature to compare relationships and metrics between two specific individuals.”

So, it does not assign a UUID to records in a Big Tree, like FamilySearch or Ancestry. It only checks our relationship maps. I am not sure that we could fully hide surnames; maybe Soundex could be an alternative. Anonymized matrices mean decentralized storage and fewer resources too. In the 1990s, the 'Tafel Matching System' was very popular, at least in France, and 'Geneanet' has more or less improved on it.

Doug, I suppose I have it clearer and cleaner now (yes, it is possible!), and I could provide a simple test or proof-of-concept of these experiments.

How can such code be made to fit a family tree with more than 200,000 individuals?

from gramps.gen.filters import GenericFilterFactory, rules

FilterClass = GenericFilterFactory('Person')
self.filter = FilterClass()
default_person = self.dbstate.db.get_default_person()
plist = self.dbstate.db.iter_person_handles()
if default_person:  # rather designed for a run via GUI...
    root_id = default_person.get_gramps_id()
    ancestors = rules.person.IsAncestorOf([str(root_id), True])
    descendants = rules.person.IsDescendantOf([str(root_id), True])
    related = rules.person.IsRelatedWith([str(root_id)])
    self.filter.add_rule(related)
    _LOG.info("Filtering people related to the root person...")
    self.progress.set_pass(_('Please wait, filtering...'))
    self.filtered_list = self.filter.apply(self.dbstate.db, plist)
    for handle in self.filtered_list:
       ...

Someone reported that this (or another piece of code in the addon) could take more than 2 hours.

Sure, calling Relationships will use some resources, but the poor timing issue seems (to me) to lie rather in the filter rule.

Checking and retrieving information for all individuals via person_handle iteration (without filtering) seems to take around 15 minutes at most. Has the 'related' filter rule been improved/optimized in Gramps 6.0.5?
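One way to settle where the two hours go is to time the filter pass and the iteration pass separately before optimizing either. A small helper sketch (names and commented call sites hypothetical):

```python
import logging
import time

_LOG = logging.getLogger("relation_tab")

def timed_pass(label, func, *args):
    """Run one pass and log its wall-clock duration."""
    start = time.perf_counter()
    result = func(*args)
    _LOG.info("%s took %.1f s", label, time.perf_counter() - start)
    return result

# Hypothetical usage mirroring the snippet above:
# filtered_list = timed_pass("filter.apply", self.filter.apply,
#                            self.dbstate.db, plist)
# rows = timed_pass("iteration", self.build_rows, filtered_list)
```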


Right, it seems that I did some strange experiments!

I just see now that the IsRelatedWith filter rule already uses pseudo-parallel processing!

I was wondering why I get some blocks of records after the filtering pass. In my mind, the filter rule was only for limiting the dataset. Currently, the progress meter seems (to me) to list part of the filtered people while iterating over the first matching handles! Looking at the recursive code in the filter rule, I cannot really monitor this from the addon, which is only an interface. This probably cannot explain the performance issue with a large database, but maybe it explains why I wanted to explore threading during the iteration pass.


I have more difficulty properly implementing custom filter rule support in tools than using yield and generators in code!

Finally, the workflow, for a human with or without "machine" support, did not really change! And I still find possible issues never reported before, outside AI monitoring or check passes…

Sure, the AI's answer sounds good and the solution seems logical. I have now skipped the threading experiments, even those indirectly tested with yield, stacks, lists and gtk events. So, the code should be more pythonic and modern, and there is no real feature addition, except maybe more columns.

One issue is still pending. It is specific to the GUI (Gtk TreeView model); no crash or error via the CLI. It is in the gtk window used for displaying the list of results. As a pseudo-Sosa/Kekulé numbering was expected, an experimental numbering for descendants or cousins has been tested. The main problem is to have a real number, not too complicated to understand at a glance: maybe a positive one, not used by Sosa/Kekulé (so, not from 2 to infinity) and unique. Maybe something between 0 and 1? A float value will crash the gtk model:

self.model = ListModel(treeview, self.titles)
for entry, sort_key in batch: 
    self.model.add(entry, sort_key)
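If the crash comes from the column type, one common workaround is to keep strings in the displayed columns and carry the raw float only as the sort key. A plain-Python sketch of the idea (ListModel specifics assumed; `display_value` is a hypothetical helper):

```python
def display_value(value):
    """Render a numeric ranking as a fixed-format string for a text column."""
    return "%.4f" % value  # e.g. 0.1250; still parseable back to float

batch = [(0.125, "Alice"), (0.5, "Bob")]
# Each row is ((display columns...), sort_key): the model only ever sees
# strings, while sorting still uses the raw float.
rows = [((display_value(num), name), num) for num, name in batch]
for entry, sort_key in rows:
    print(entry, sort_key)
```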

It is a design issue. So it is no longer related to refactoring, polishing or cleanup, but still around performance and maybe threading with the gtk window.

PS: there is also a cache issue with re-using a list. This might display an incomplete list (signal, active person and database, etc.)
