Performance Issues with Gramps

I think Nick’s alternative is a good first step…

Going from BLOBS to JSON in tables utilizing the JSON features that is in sqlite and postgresql as a first step and include at least some indexes to those tables… and maybe even some views…

If that works, it would be a lot easier for someone to convert some of the Python code to functions and/or stored procedures and see if that speeds up things even more.

I am no Python developer, but if the next step was to change/convert the python code used for rules and filter to a function in sql, I do think it will be possible and relatively easy to make a GUI and a transcoder that let people write their filters in Gramps and push a button to transcode it to sql, json or the language used by a selected multi-model database use…

Today’s relational databases have built in support for json documents, so I really don’t think it will matter much if Gramps use a relational way, e.g. lots of tables with lots of joins, or a json document approach with less joins.

And by just changing from BLOBS to JSON, those data vill actually be open for anyone that want to query the data directly…

It’s not huge differences to query a JSON table in a sqlite or postgresql or to query a relational type datastore.

I am usually the one with the big “demands” here by using Gramps way outside what it was ment for, and even I see the benefit of starting with the BLOBS. utilizing the JSON feature in sqlite, then eventually change the sql queries that needs to be changed for backends like postgresql and arangodb or orientDB or similar systems.

It will even be possible to write sync functions from sqlite to other databases with this one change.

And to be honest, with the little I have played with sqlite, postgresql and MS SQLServer, I think this little change will do a lot for those with larger datasets, regardless of if it is people, places, events or any other object that is in focus.
As long as it is stored using the JSON functions in the database engines used, and that there are indexes on those tables.

Regardless, when that work is done, tweaking the database and how Gramps query the database will be possible to change one step of the time when people see what could be better.

I tested an XML import with approx. 300k place name in hierarchy a few years ago, and I found that I had to go down to approx. 80k place names for Gramps to populate the view within a reasonable time, I don’t remember now but I think it made the Place list browsable after approx. 20-28 sec, the problem was that every time I did a change to anything, it took another 20-28 sec or more before I could use Gramps again.
In this test I only had approx. 3000 people and 200 sources, don’t remember the citation numbers because it was a test for a place database.

The 300k place name database was near impossible to import and use, and for the CSV import it took so long that I had to kill it.

All this was done with a Windows 11 pro, with a AMD 3900x CPU, 64GB ECC RAM, 2 NVME PCI-3 1TB SSD’s, one with system and software, the other with databases and documents, I tested the import with the csv and xml files on separate SSD, on system SSD, on a SATA-3 SSD and on a 7200 rpm HDD.
I stopped using the HDD because it was near impossible to get the import done, even the one with 80k place names.

I imported both to sqlite and to a local mongodb running the databases on the data SSD, there was not much difference in speed, but the mongodp-api had a little different response pattern than sqlite… but nothing significant…
One of the problems with the mongodb solution was that the driver was extremely outdated and was known to have som performance problems, if those was triggered in the way Gramps used it, I don’t know.