This is pgAdmin now. The big peak may belong either to my attempts to switch to the person view or to the start of the new import. The list of locks displayed is still approx. 100 items long. The import now runs at 1–2 transactions per second, which is identical to the first import. The only difference I can see is that I now have many more peaks of 2 tx/sec and many more peaks of approx. 1.3k tuples out. So it appears as if something has changed with your new code, but the fundamental transaction problem slowing everything down still exists.
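For what it’s worth, a sustained rate of 1–2 transactions per second is the classic signature of committing once per object instead of wrapping the whole import in one enclosing transaction, since every commit forces a synchronous flush to disk. Below is a minimal sketch of that difference using a throwaway SQLite file; the table name and payload are hypothetical, and this is only an illustration of why the per-commit pattern would produce numbers like the ones above, not a claim about what the Gramps code actually does:

```python
import os
import sqlite3
import tempfile
import time

def import_rows(db_path: str, n: int, per_row_commit: bool) -> float:
    """Insert n dummy rows; return elapsed seconds.

    per_row_commit=True mimics one transaction per object;
    False wraps the whole import in a single transaction.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE person (handle TEXT PRIMARY KEY, json_data TEXT)")
    start = time.perf_counter()
    for i in range(n):
        conn.execute("INSERT INTO person VALUES (?, ?)",
                     (f"h{i:06d}", "{}"))
        if per_row_commit:
            conn.commit()  # each commit forces a journal flush
    conn.commit()          # single flush for the batched case
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

with tempfile.TemporaryDirectory() as tmp:
    slow = import_rows(os.path.join(tmp, "a.db"), 300, per_row_commit=True)
    fast = import_rows(os.path.join(tmp, "b.db"), 300, per_row_commit=False)
    print(f"per-object commits: {slow:.3f}s  single transaction: {fast:.3f}s")
```

On most filesystems the per-object variant is dramatically slower, because each commit waits for the storage layer to acknowledge the write; PostgreSQL behaves the same way with its WAL fsyncs.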
Hi Ulrich,
This is really annoying, and I’m sorry you’re experiencing so many difficulties. This is a great solution, and I’m doing what I can to help you. However, I’m being called away to Cairo on another project emergency, and I’m leaving now. I don’t expect to be back to work on this before Tuesday, but I’ll put it at the top of my list Tuesday morning and try to stop working all night, which is what I’ve been doing. Going to Cairo and nipping this problem in the bud should allow me to do that.
So I’ll be back on the case Tuesday, and I’ll be able to focus more on it then. Right now I’m working on two things at once, so I’m not doing a great job of looking at all the details of what’s happening, and it shows in the convoluted mess we’ve been going through.
So I’ll be in touch Tuesday.
Greg
No problem Greg! This is new and fascinating stuff with a great perspective, so I’m more than willing to spend some time on it. Would it help if I gave you my data so that you are free to test without having to wait for my feedback?
Christmas is coming up here so starting Wednesday morning, I’ll be forced to switch off my computer anyway.
This has been quite a long discussion, so it may be helpful to add a short “interim report” on where we currently are:
(1) Greg has been able to resolve all the login-related problems. The code on GitHub is now fully functional for creating a new Gramps tree or logging into an already existing one using PostgreSQL (in my case v18) as the database backend. The setup and authentication data necessary for the login are defined in the Gramps preferences, but it is still necessary to provide a user ID and password in a login dialog. Apparently there is no way to skip this login dialog.
(2) The import of a Gramps XML file (some 3.5M lines) is possible, but it is still much slower than importing the same file in Gramps 5.1.6 on BSDDB or Gramps 6 on SQLite. So this performance problem has not been resolved yet.
(3) Working effectively with the imported data in Gramps is virtually impossible due to performance issues. In particular, I’ve not yet been able to display the person view of a tree with some 80k individuals, and my assumption is that this is a consequence of the performance problems.
(4) If I access the PostgreSQL tables with DBeaver as a database manager/frontend and run some JSON SQL queries on the tables, everything appears to be OK. Compared with “conventional” de-normalized PostgreSQL tables without JSON data, the queries are of course significantly slower, but this is to be expected since JSON data are “expensive” in terms of performance. Still, those queries are orders of magnitude faster than queries run from within Gramps. So it appears that the massive performance problems when using Gramps as a frontend for a PostgreSQL database backend are connected to Gramps and its code, not to PostgreSQL and its tables/indexes.
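To make point (4) concrete, here is a minimal sketch of the kind of JSON lookup described above. The table and field names (`person`, `json_data`, `primary_name.surname`) are hypothetical stand-ins, not Gramps’ actual schema, and SQLite’s `json_extract()` is used as a portable stand-in for a PostgreSQL `data->'primary_name'->>'surname'` expression:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, json_data TEXT)")
people = [
    ("h1", {"gramps_id": "I0001", "primary_name": {"surname": "Demlehner"}}),
    ("h2", {"gramps_id": "I0002", "primary_name": {"surname": "Huber"}}),
]
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(h, json.dumps(d)) for h, d in people])

# Filter on a field inside the JSON payload. Without an index the engine
# must parse every blob; this is the "expensive" part mentioned above.
rows = conn.execute(
    "SELECT handle FROM person "
    "WHERE json_extract(json_data, '$.primary_name.surname') = ?",
    ("Huber",)).fetchall()
print(rows)  # → [('h2',)]

# An expression index lets the engine skip re-parsing each blob; the
# PostgreSQL analogue would be an index on (data->'primary_name'->>'surname').
conn.execute("CREATE INDEX idx_surname ON person "
             "(json_extract(json_data, '$.primary_name.surname'))")
```

Since a standalone SQL client like DBeaver issues exactly one such statement per query, its speed mainly reflects the database engine; Gramps adds its own object loading and view logic on top, which is consistent with the gap observed here.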
This has been known for a long time; it’s one of the main reasons why it was decided to move from pickled BLOBs to JSON.
That way they can start moving frontend logic to the backend and add indexes, functions, and stored queries where needed, but this is work that takes time…
I have been waiting since the 5.0 beta for real functional changes, like main/sub-events, events for places, and a better repository, source, and citation system that supports CSL…
Nobody “needs” 300–500k or a million entities in a single database. If speed is a problem, split your database into smaller logical databases until the developers have had the chance to change the logic in both the frontend and the backend…
Well, I guess there are a lot of fields of research where you have such a database. In my case, I’m doing regional research in quite a small part of Bavaria you probably have never heard of, and my 80k individuals are only a small share of its inhabitants over four centuries. So I’d really be very cautious about making a statement like that. But of course, you’re entitled to your perspective of the world and of what is necessary to have.
You still don’t need 500k or a million entities in a single database.
That’s the answer I got from most of the active users here and on the mailing list when I started suggesting changes: you don’t need this or that, fork it and do it yourself, etc. etc.
And that is what I did! I found a workaround, until whatever limits my use of Gramps is changed…
I have census datasets with 800k and 2.7 million individuals, and I don’t use Gramps for that — Gramps is simply not optimized for genealogy and social-research work at those scales. I extract what I need with other tools, or I search the data using tools that are actually built for large-scale analysis.
Alternatively, I would split the dataset into smaller, logical subsets.
You never research 200k, 400k, or even 80k people at the same time, and it’s easy to merge the data again if you need to export it. And if you were doing professional research, you would be using software solutions far better suited for this than a genealogy software solution built on SQLite or other transactional relational databases. A graph database — or even a multi‑model database — is much better suited for datasets of that size.
There’s also a reason why I’ve been advocating for the move from pickled blobs to JSON, and for more historically accurate research workflows in Gramps. Very early on, it became clear to me that even though SQLite and PostgreSQL can theoretically handle near‑unlimited amounts of data, Gramps itself is not optimized for datasets of that magnitude — just like most other genealogy software solutions. That’s also why I’ve suggested supporting either a graph‑based or multi‑model database backend in Gramps, and shifting the heavy frontend logic into the backend, whether that backend is SQLite, Neo4j, ArangoDB, or some other multi‑model database. A document/graph model is simply far better suited for serious genealogical research at scale.
This is also why I’ve proposed main/sub‑events, events attached to places, and a more robust repository/source/citation system with CSL support, along with export options to one or more open‑source, open‑data file formats. These changes would allow Gramps to become what it truly has the potential to be: a serious tool for large‑scale social and genealogical research.
But so far, very few people actually support this direction — even though many of those asking for various extra features are, in practice, asking for exactly what I’ve been proposing for years. Aside from Nick, I think maybe one or two others understand the advantages of what I’m suggesting. The rest tend to request changes that only solve narrow, single‑scenario use cases, and that still remain locked into a very limited lineage‑linked data model.
Even plain text formats — Markdown files, CSV, and other simple text‑based structures — handle datasets of that size better than Gramps does today. But that doesn’t mean Gramps can’t become significantly faster. The developers just need the opportunity to make changes to the DB‑API and the program logic step by step. Until they get there, split your datasets into logical subsets.
There are reasons why I use Obsidian and other tools as workarounds — precisely because Gramps has limitations for the kind of research I’m doing.
@glamberson @UlrichDemlehner Does this mean that you believe that PR 781 is sufficiently tested to merge/publish?
Since the major performance problems are not yet resolved, I would be hesitant to designate it as “sufficiently tested”. At the end of the day, what is the benefit of resolving login problems if you cannot work with the data after the successful login? But that’s my personal opinion, strictly from the user’s perspective.
