Sorry, I only read the initial post in this forum, which is about Windows support.
Apparently, this post has moved to a different topic.
Is it now about Gramps Web performance? Why? Most of the thread seems to refer to Gramps desktop. I’m confused.
If you have followed the discussions about switching from BLOBs to JSON, you will know that this is only the first step toward achieving faster database transactions in Gramps. These improvements need to be introduced gradually. The next step might involve adding indexes, views, and functions directly within the database. However, it is important to remember that anything beyond basic SQL differs significantly between MySQL/MariaDB, SQLite, Oracle, PostgreSQL, and MS SQL Server. To make this work across different backends, developers need expertise not only in Python but also in the respective database query languages.
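To illustrate the portability point: a plain index or view is roughly the same everywhere, but anything beyond that diverges quickly between engines. A minimal sketch in Python against SQLite (table and column names are invented for illustration, not the actual Gramps schema):

```python
import sqlite3

# Hypothetical schema loosely modeled on a person table; names are
# illustrative only, not Gramps' real tables.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT, birth_year INTEGER)")
con.executemany("INSERT INTO person VALUES (?, ?, ?)",
                [("h1", "Smith", 1900), ("h2", "Jones", 1850)])

# Plain index creation like this is portable across SQLite, PostgreSQL,
# MariaDB, and most other engines:
con.execute("CREATE INDEX idx_person_surname ON person (surname)")

# Simple views are also widely portable -- but stored procedures,
# triggers with procedural logic, partitioning, etc. are where the
# dialects diverge sharply (SQLite has no stored procedures at all).
con.execute("CREATE VIEW v_born_1900 AS SELECT handle FROM person WHERE birth_year >= 1900")
rows = con.execute("SELECT handle FROM v_born_1900").fetchall()
print(rows)  # [('h1',)]
```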
That said, I agree with you: in many cases, it would be beneficial to have a faster, “real” database engine as the backend. Nick has expressed openness to the idea of supporting multiple database formats in the future—or at least an interest in the concept. If enough people speak positively about such changes, they may eventually happen.
For those who want to run Gramps against a PostgreSQL database on Windows, one option is to use WSL2, install Gramps there, and connect to the database on the Windows host. I did this myself with both MongoDB and PostgreSQL while the MongoDB DB-API was still functional, but since I am not a developer, I was unable to get the updated MongoDB drivers working.
At that time, the practical speed differences compared to SQLite were negligible—whether running in the same environment or installed directly on the Windows host. I later switched from SATA SSDs to NVMe SSDs, which shaved off a few seconds, but the improvement was barely noticeable. And again, back then the backend was still based on BLOBs…
Nick and the developers listened and changed from BLOBs to JSON…
but you use Linux all the way, do you not?
We usually install different database backends to get better performance in general, even on Windows…
I can tune an MS SQL Server far better than I can MariaDB or SQLite; actually, even a JET database can reach decent speed and hold a relatively big dataset if done right…
This is only about general performance, especially with bigger datasets…
I don’t know if you remember, but a few years ago I tried to run some performance tests by importing a 300K place dataset into Gramps; it was not possible except by importing into the MongoDB backend directly.
86,000 records was the absolute maximum I managed on my AMD Windows workstation, with more than enough RAM, NVMe SSDs, etc.
Gramps just stopped midway… with both CSV and XML as the input file…
My personal experience after nearly 50 years of software development tells me that you always come to the point where it makes more sense to start all over again instead of trying to implement gradual improvements. Everybody (me included) hates this point and the consequences if and when arriving at it, but the existence of this point is simply a fact of life.
But of course, this is not a decision I have to make and I trust in Nick and all the other guys that they will make the necessary and most appropriate decisions. They are doing a wonderful job and even if I keep pointing at the performance problem and how it is connected to the basic software architecture, I nevertheless keep telling everybody that Gramps is the software to use.
I see your point, but I do not agree. I work a lot with database backends, and the most significant quantum leap in performance happens when you use basic SQL queries with basic index support. As long as you are not trying to do fancy things with SQL, nearly everything boils down to a classic “SELECT … FROM … WHERE … ORDER BY …” combined with a few JOINs. OK, if you use JOINs, things may get a bit more complicated, but at the end of the day, we’re not talking about databases with billions of records.
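The query shape described above can be sketched concretely. A minimal, self-contained example with invented table and column names (not Gramps' actual schema), again using SQLite as a stand-in:

```python
import sqlite3

# A sketch of the "classic" SELECT ... FROM ... WHERE ... ORDER BY ...
# plus a JOIN. Table and column names are invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT);
CREATE TABLE event  (handle TEXT PRIMARY KEY, person_handle TEXT, year INTEGER);
INSERT INTO person VALUES ('p1', 'Smith'), ('p2', 'Jones');
INSERT INTO event  VALUES ('e1', 'p1', 1900), ('e2', 'p2', 1850);
""")
# One basic index supporting the JOIN condition:
con.execute("CREATE INDEX idx_event_person ON event (person_handle)")

rows = con.execute("""
    SELECT p.surname, e.year
    FROM person AS p
    JOIN event AS e ON e.person_handle = p.handle
    WHERE e.year >= 1880
    ORDER BY p.surname
""").fetchall()
print(rows)  # [('Smith', 1900)]
```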
Right, this could be done, but why should it be done? Even if you use PostgreSQL, you’re still stuck in the basic software architecture of Gramps, where the frontend code insists on digging through the haystack in search of the needle instead of having the database backend deliver the needle instead of a haystack. A change to another database backend alone will not change anything in this situation. My tests just proved this statement: the oldest database backend, BSDDB, still has the best performance, since the frontend code and the backend are a perfect fit. It simply makes no sense to change the database backend as long as the frontend code does not change its approach as well. That’s the point I’ve been making for quite some time now, and I hope Nick is listening.
Yes. I only use Linux. It is the best OS, unlike others.
You forgot to add, “In my opinion”, or perhaps, “for my needs”.
For me, Linux is not the best OS. It cannot run a certain piece of software that I use almost daily, not even in one of the many Windows emulators / virtual environments.
There is a simple reason for this: earlier versions of Gramps were created for use with BSDDB, but that database engine was prone to severe stability issues under certain conditions. Before that, Gramps even used XML as its backend, and when BSDDB was adopted, the program was optimized specifically for that database type. That explains why BSDDB often appeared to perform best in practice, even though it had serious stability problems.
And when it comes to SQL, if you really want to achieve speed in a relational transactional database, you need much more than just “SELECT.” You need views to optimize queries, automatic functions and stored procedures, proper indexing strategies, and sometimes even partitioning. Without these, performance gains are marginal, and the complexity of maintaining cross‑backend compatibility (MySQL, PostgreSQL, SQLite, etc.) increases dramatically.
As for running Gramps on WSL2 as a workaround: Gramps was originally built for Linux, and Linux is far more flexible when it comes to Python and other types of development. Of course, version dependencies can be a nightmare—similar to the old “DLL hell” and ActiveX hell we had in earlier versions of Windows—but the Linux environment still provides a more natural fit for Gramps’ architecture.
Now that Gramps 6 uses JSON instead of BLOBs for serialized data in the database, it becomes much easier to move the logic that currently resides in the frontend (Python) over to the backend. Because Gramps’ queries have historically retrieved broad datasets for frontend filtering, backend features like indexes and views have limited effect; shifting logic server‑side enables precise, indexed result sets. This requires careful planning and gradual changes to get it right—and this is precisely what the transition from BLOBs to JSON unlocks: moving logic to the backend for significant performance gains while preserving stability in more robust database engines.
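As a concrete sketch of what "moving logic to the backend" can mean with JSON-serialized data: SQLite's JSON1 functions (bundled with most modern builds, and built into SQLite since 3.38) allow filtering, and even indexing, on fields inside the JSON. The schema below is invented for illustration; Gramps 6's actual column names may differ.

```python
import json
import sqlite3

# Hypothetical table holding JSON-serialized person objects. Instead of
# deserializing every row in Python and filtering in the frontend, the
# database filters on a field inside the JSON. Requires SQLite's JSON1
# functions.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, json_data TEXT)")
people = [("h1", {"surname": "Smith"}), ("h2", {"surname": "Jones"})]
con.executemany("INSERT INTO person VALUES (?, ?)",
                [(h, json.dumps(d)) for h, d in people])

# An expression index lets the database answer this query from the
# index rather than scanning and parsing every JSON blob:
con.execute("CREATE INDEX idx_surname ON person (json_extract(json_data, '$.surname'))")
rows = con.execute(
    "SELECT handle FROM person WHERE json_extract(json_data, '$.surname') = ?",
    ("Smith",)).fetchall()
print(rows)  # [('h1',)]
```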
But to be completely honest, the real advantages at larger data scales will come from using graph databases and network‑graph logic as the backend. Genealogy data is fundamentally about relationships and traversal—object and document relationships—not transactional rows. With numbers like 200 million individuals, relational databases become impractical for genealogy queries (e.g., multi‑generation relationship paths and subgraph extractions). A graph or multi‑format backend is the most logical path forward at this scale.
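The traversal argument can be made concrete with a toy example: in the graph view of genealogy, people are nodes, parent/child links are edges, and a "relationship" is a path between nodes. A graph database does this traversal natively; here it is sketched in plain Python with invented names:

```python
from collections import deque

# Toy parent -> child adjacency list; names are invented.
edges = {
    "Alice": ["Bob"],   # Alice is a parent of Bob
    "Bob": ["Carol"],
    "Carol": [],
}

def relationship_path(start, goal):
    """Breadth-first search for the shortest ancestor-to-descendant path.

    This is the kind of multi-generation traversal that is awkward to
    express in relational SQL (recursive self-joins) but is the native
    operation of a graph database.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for child in edges.get(path[-1], []):
            if child not in seen:
                seen.add(child)
                queue.append(path + [child])
    return None  # no relationship found in this direction

print(relationship_path("Alice", "Carol"))  # ['Alice', 'Bob', 'Carol']
```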
I agree
“for my needs”.
We all have quirks and foibles and some operating systems and software match them better than others.
I have only one program I use regularly that has not been compiled for the Raspberry Pi processor and likely never to be otherwise I would move away from the intel range of processors. So that one is not system specific but hardware.
phil
We should not forget where we come from. As I described here in some posts, I “mirror” my Gramps data into a PostgreSQL database, and I do not use any fancy things there besides basic indexes and one or two views (which of course are functionally nothing other than stored SELECTs, and I wouldn’t really need those either). The performance gain compared with native Gramps is simply breathtaking, so much so that I have enough time to use a UI that is really not fast in any sense of the word. So Gramps currently defines quite a low bar in terms of performance, and we should not make things more complicated than they really need to be. At the end of the day, we are not in the multi-billion-row business of real Big Data, right? As so often, we should ask the guy named Pareto, and he will tell us that we will be able to achieve 80 % of the performance gains with 20 % of the tools you mentioned. If you care to hear my personal opinion (at this point of course without any proof, but based on my experience with PostgreSQL): we will reduce the run time of queries (including the UI overhead) to less than 5 % (probably to 2 or 1 %) of what we have today by using simple SELECTs with simple indexes on a modern database backend.
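The "simple SELECT with a simple index" effect can be verified without benchmarking: the query plan shows whether the database locates rows via the index or scans the whole table. A minimal sketch with an invented table (any engine's EXPLAIN works similarly; SQLite is used here as a stand-in):

```python
import sqlite3

# Invented table for illustration. With the index in place, the plan
# should report an index search instead of a full table scan.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT)")
con.executemany("INSERT INTO person VALUES (?, ?)",
                [(f"h{i}", f"Name{i}") for i in range(1000)])
con.execute("CREATE INDEX idx_surname ON person (surname)")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT handle FROM person WHERE surname = 'Name42'"
).fetchall()
# Each plan row is (id, parent, notused, detail); the detail string
# should mention 'USING INDEX idx_surname' rather than 'SCAN'.
print(plan)
```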
And again: I fully understand where Gramps comes from and why things are as they are. Nothing I say here amounts to “this is all BS”. I’m trying to point out some software-strategic issues and where the strategic focus for development should be, in my opinion. Gramps is by far the most advanced software for describing an individual’s life and his/her social environment, and I want to help it stay on top.
This is why I asked about using the SharedPostgreSQL addon with @DavidMStraub 's Gramps Web. (I had paid attention to the aggravation expressed by @StoltHD about there being problems with the newer incarnations of the other PostgreSQL database add-on.) The hope was that a (local) Gramps Web installation might allow you to skip the “mirroring” portion of your workflow.
And the work that @jj0192 is doing with a higher performance external VB import of large scale .gramps and .ged files was likewise interesting for your stated objective. Although it adds a step, the time saving (4 days to 7 hours for a 2 million person import) more than pays for the extra task.
But if you’re sticking with BSDDB for any Gramps tasks, the Gramps Web route might not be viable.
I see, and thanks for pointing this out, but my problem with Gramps Web is that it appears not to support RegEx searches, which are crucial for me. Besides that, I need full access to the database backend with classical SQL statements, and I’m not sure whether that is possible, but I’ve never tried. The missing RegEx support was the real party crasher.
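For context on the RegEx point: regex filtering at the database level is at least technically possible with the SQLite backend, since SQLite defines a REGEXP operator but leaves its implementation to the host application, and Python's sqlite3 can supply one. Whether Gramps or Gramps Web exposes this is a separate question; the table and names below are illustrative.

```python
import re
import sqlite3

# SQLite has a REGEXP operator but no built-in implementation; the
# Python sqlite3 module lets the application provide one.
con = sqlite3.connect(":memory:")
con.create_function(
    "REGEXP", 2,
    lambda pattern, value: value is not None
    and re.search(pattern, value) is not None)

con.execute("CREATE TABLE person (handle TEXT, surname TEXT)")
con.executemany("INSERT INTO person VALUES (?, ?)",
                [("h1", "Meyer"), ("h2", "Maier"), ("h3", "Smith")])

# Match both spelling variants with one pattern:
rows = con.execute("SELECT handle FROM person WHERE surname REGEXP ?",
                   (r"^M[ae][iy]er$",)).fetchall()
print(rows)  # [('h1',), ('h2',)]
```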
I guess I’m simply not understanding the point you’re trying to make here. I already have a highly efficient workflow from the Gramps XML file into PostgreSQL tables, so I do not see how this project would help me in any way.
The point here is: the “classic” Gramps desktop UI is a wonderful UI even if running on BSDDB. I resolved its main problem, the bad performance with large databases, by “mirroring” the data into PostgreSQL tables which I additionally use as a starting point for all downstream activities (and I have a lot of those). Switching to Gramps Web would introduce a completely new UI. This new UI is good, but in my humble opinion not better than the “classic” one. It’s missing some crucial capabilities I need, so this way makes no sense for me even if this means that I’m currently stuck with BSDDB.
But all you describe is exactly why the Gramps developers changed from BLOBs to JSON, in addition to responding to requests for direct database access. With the data in SQLite stored as JSON‑serialized strings, you can query the database with almost standard SQL—just as you do in PostgreSQL with a table structure.
The developers have already started this transition: first from BSDDB to SQL, then from BLOBs to JSON. Next they can begin moving logic to the backend and reducing or changing some of the frontend Python logic. And when I talk about backend logic in this context, I mean the logic being executed on the database side, which includes SQL queries but is not limited to them. The point is to move processing away from the frontend so that the receiving logic still gets the same variables and results, but with the heavy lifting done in the backend. This transition started a long time ago; I think the first time I asked about changing the BLOBs was just after I began using the 5.0.x versions, because I saw the benefit of using the experimental MongoDB backend. With that I could combine data in ways I couldn’t in Gramps alone.
And regarding the Pareto principle: it does not really apply here. Building new tables, removing frontend logic, and rewriting all code that references them would not be “20 % of the work for 80 % of the gain.” It would be closer to 100 % of the effort for maybe 80 % or even less of the performance improvement. By contrast, moving logic from the frontend to the backend only changes where the logic is executed. The receiving logic still gets the same variables and results as before. If you were to base everything on simple SQL, large parts of the Gramps frontend would have to be rewritten, which is far more disruptive than shifting the execution of queries to the backend.
And regarding the size: it was you who first mentioned 200M people, not me. Personally, I have “a few” research logs in Obsidian—around 300,000 place notes (some just names, some with extra information), 2,000 ship notes, a few thousand people notes, events, sources, etc. I’m building a journey event base for each ship between 1900 and 1939, roughly two journeys per ship per month. I link whatever relations I find with wikilinks and aliases, I work with both structured and unstructured text from a multitude of sources, all structured in a folder hierarchy based on the Gramps object model (people, families, events, places, sources, and more). I also integrate Zotero data into this pipeline. And I find what I’m looking for far faster there than in Gramps or in any SQL database today.
I would not gain any speed benefit from a transactional relational database for this workload, not even a well‑indexed one with lots of logic. But I do gain huge benefits from adding the data to a graph database—or even just applying graph logic in Obsidian.
I also want to emphasize why I use the phrase multitude of sources. Genealogical and social research always involves data from many different and often unstructured sources—church records, ship lists, census data, newspapers, archives, oral traditions, and more.
Trying to force all of this into transactional SQL tables is not only inefficient, it is conceptually misaligned.
A relational database in the SQL sense models relations between tables and columns, but genealogical relations are about connections between people, events, places, and sources.
These are semantic and historical links, not transactional ones. This is exactly why graph logic, or graph databases, are a better fit for this type of research data.
And we also need to acknowledge that the way many of us store data and relations in this type of research is not historically correct.
Genealogical and social data are often forced into structures that fit the database engine rather than the historical reality.
This mismatch is exactly why features like Main/Sub Events, Events for Places, and an extended repository with sources and citations utilizing CSL are far more important than shaving off a few seconds in a data import or query.
That is about how to correctly register historical data.
The speed discussion, by comparison, is just another first‑world problem.
If there ever were a reason for a full rewrite of Gramps, it would be precisely to move away from serial objects in BSDDB and toward nodes with document stores and edges in a graph or multi-model database. In the latter case, you could even continue to use most of the basic SQL in your queries.
So just be patient—more changes will come.
And if you really know PostgreSQL, then the best way to help is by contributing to the backend logic.

In fact, I think we are largely in agreement: performance improvements are important, but I also see that the changes are already underway.
Given that Gramps’ structure and logic are as robust as they are, there is no need for a total rewrite—the project can evolve step by step without losing stability. Not even adopting a graph backend would actually require a complete rewrite.
A couple of points related to this thread:
I am not sure who this is, but maybe it is an experimental alternative for those who need more speed?
He is Greg Lamberson who was behind the Better Gedcom initiative. His database backend is certainly worth looking at.
Could it be that he realized that initiative was moribund and that a much better Gedcom, named Gramps XML, already exists? Just thinking …
I’ll give it a try, even if I’m a bit sceptical based on the discussions we’ve had here about the basic software architecture issues (the BLOBs, the UI looking for the needle in the haystack instead of ordering a needle from the database backend, etc.).