Error using PostgreSQL as database backend with Gramps v5.2.2

Have you any idea how long it will take to get rid of the BLOBs? Just give me some hope please :wink:

2 Likes

A few years ago, I found a Karel de Grote GEDCOM on the web with more than 600,000 persons, which I imported into Gramps 3.4.9 (I think), which ran on BSDDB. When I moved to 4.x, I found no big difference in speed between the original and the SQLite version, and in some situations the SQLite version was even faster. I tested that by running tasks like check & repair and looking for duplicates on my own tree, not that big one, and counting the seconds needed for such operations with an ordinary watch.

These results felt weird, because with the introduction of the Gramps Database API and SQL, there is an extra layer between the Python objects and the database, including an SQL engine, which we don't really need, because the Gramps database is not relational.

For me it's quite logical that things get slower when you switch to PostgreSQL running on localhost, because that means that you need to pass all data between Gramps and the PostgreSQL server process running on the same machine. Passing that through a socket, with a process on the other end, is likely to cost more time than just sending a pointer to your data to another layer in the same process. Things would only be faster if the PostgreSQL server were far more efficient than the SQLite code that is directly called by Gramps, and that is not the case. The SQLite name might be a bit deceptive here, because of the word "lite", which might suggest something like low power. And in fact, you can find SQLite in lots of products, like Thunderbird, and on your phone, if it runs Android.

1 Like

This has been a long-term goal for some time now. All the preparation work has been done.

I wrote a prototype in PR #800. Unfortunately Paul Culley found some performance issues, but it seems that he mostly found solutions for these. The other outstanding issue relating to the BSDDB backend is no longer relevant.

My plan is for a fairly quick release of v5.3, then perhaps we can consider this again for v5.4.

2 Likes

I did some informal testing with Gramps v5.x and found the SQLite backend a bit slower, so I switched back to BSDDB. But this was far from really significant, more of a gut feeling. What I see now is that PostgreSQL and SQLite as backends have about the same performance, but BSDDB is far ahead. So I do not really agree with your socket argument.

I think both PostgreSQL and SQLite would play on the same level, maybe with PostgreSQL a bit ahead if I trust all that I read and hear about both DB systems, and both would be far ahead of BSDDB, but only if the Python frontend code were perfectly fitted to the DB backend, leveraging all its benefits. Currently we have the situation that the Python code leverages the benefits of BSDDB and ignores most of the benefits of other backends, and that of course must lead to the performance problems we see.

1 Like

I can only offer my personal opinion here. From an IT strategy perspective, getting rid of the BLOBs is the most important task I can identify. This would open the way to leverage the power of modern database backends which is not possible while the BLOBs are still being used.

Could we maybe ask for an explanation of
"a quick release for 5.3"? Why? Is there a list of the planned
updates/improvements that are going to be made?
Are there so many known issues with 5.2 that they require a quick release?
Would it be better to skip 5.3 and focus the efforts on eliminating
BLOBs, which might make the next release 6 rather than 5.4?
phil

GRAMPS: 5.1.6
Python: 3.10.12 (main, Nov 20 2023, 15:14:05)
BSDDB: 6.2.9 (5, 3, 28)
sqlite: 3.37.2 (2.6.0)
LANG: en_GB.UTF-8
OS: Linux
Distribution: 6.5.0-28-generic

There is a roadmap for v5.3, which is the current plan. The schedule will depend on the availability of volunteers, but as there are not many goals I don't see why we can't aim for a fairly quick release.

I will be happy to consider implementing a change in raw data format for v5.4 or v6.0.

Why a change in raw data format? Do you mean getting rid of the BLOBs?

Yes. At the moment we serialise the objects into a tuple containing simple types which are then pickled.
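To make the pattern concrete, here is a minimal, hypothetical sketch of "serialise to a tuple of simple types, then pickle"; the field layout is invented for illustration and is not the actual Gramps raw-data format:

```python
import pickle

# Illustrative only: an object is flattened to a tuple of simple
# types, then pickled into bytes that are stored as a BLOB column.
# The tuple layout below is made up, not the real Gramps schema.
person = ("I0001", "Smith", "John", 1890)  # (handle, surname, given, birth_year)

blob = pickle.dumps(person)    # opaque bytes: SQL cannot see inside
restored = pickle.loads(blob)  # the tuple round-trips unchanged

assert restored == person
```

The point of replacing the BLOBs is exactly that `blob` above is opaque to the database engine, so no SQL query can filter or index on the surname inside it.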

Thanks Nick, I must have missed that link; at least it was not a result of issues with 5.2.

My worry was about the sheer number of outstanding requests, as discussed recently, being included.

Thank you for all your efforts and the same to the other Developers

phil

Another great backend for Gramps, other than PostgreSQL, would be a multi-model database like ArangoDB or OrientDB.
PostgreSQL also has some addons for this…

I used the MongoDB backend for a long time, but the driver was very outdated, and building a new installer for Windows with an updated driver was way over my head… and the few who did it never shared it…

And I bet that 5.2 has a few too many changes in the schema for me to figure out how to get it working with 5.2.x…

Maybe I can figure it out with the help of CoPilot
I really liked the MongoDB backend, because I could sync it directly with neo4j, MS SQL Server and of course PostgreSQL, and in MongoDB all data was readable… And from neo4j I could query the data directly in Cytoscape and a few other network graph programs… in addition to all the addons for both MongoDB and neo4j.

1 Like

Are you sure about that? If BLOBs are replaced by JSON, you still need tools to extract data from those pieces of JSON before you can do some nice analysis in SQL, like you can already do with the SQLite export, which gives you a full relational database containing 99.9 % of the contents of your Gramps database. The missing 0.1 % are some things that are not exported, which might be favorites, or some settings, something that I forgot but was mentioned on this board.
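For what it's worth, SQL engines can query JSON directly without external tools. Here is a small illustrative sketch using SQLite's JSON1 functions from Python; the table and field names are invented, and it assumes a SQLite build with JSON1 available (true for most modern builds):

```python
import json
import sqlite3

# Hypothetical schema: one row per object, with the object stored
# as a JSON text column instead of a pickled BLOB.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, data TEXT)")
con.execute(
    "INSERT INTO person VALUES (?, ?)",
    ("I0001", json.dumps({"surname": "Smith", "birth": 1890})),
)

# With JSON instead of a BLOB, SQL can reach inside the stored object.
row = con.execute(
    "SELECT json_extract(data, '$.surname') FROM person"
).fetchone()
print(row[0])  # Smith
```

With pickled BLOBs this query is impossible, because the database engine cannot interpret the bytes; that is the practical difference the JSON change would make.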

There is an importer too, but Gramps will probably only accept objects that have valid handles, meaning that any app that adds new objects needs to make sure that they are included.

1 Like

I do all my analysis work directly by extracting from the XML file. I used to work with the SQLite export for quite a long time, but since the SQLite export is not part of the core Gramps project and therefore not guaranteed to be in the development focus (at least this is what I understood from some forum discussions), I decided to switch to the XML file as the starting point. And I simply wanted to see if this all works the way I expected.

An R script parses the XML file (basically as a simple text parser, which is much more efficient than interpreting it as an XML document) and feeds all the information into a PostgreSQL database running on my Win10/64 machine. The PostgreSQL database is then the starting point for every downstream activity, and since I'm able to use the full power of PostgreSQL, queries that run in Gramps for nearly a full minute take only fractions of a second there. The downside of this approach is that the PostgreSQL database is a kind of "nightly build", since it needs the XML file that is automatically written when shutting down Gramps. So it is not fully up to date with the Gramps database, but I can live with that. Of course it would be much better to have full access to the Gramps database, but as long as this is not possible, my solution is probably the next best alternative.
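The same pipeline can be sketched in a few lines of Python instead of R. The element names below are simplified assumptions, not the exact Gramps XML schema (the real file also uses an XML namespace that a parser must handle):

```python
import xml.etree.ElementTree as ET

# Stand-in for the exported file; the real Gramps XML is larger and
# namespaced, so treat these tag names as illustrative only.
xml_data = """<database>
  <people>
    <person handle="_I0001"><surname>Smith</surname></person>
    <person handle="_I0002"><surname>Jones</surname></person>
  </people>
</database>"""

# Flatten the tree into (handle, surname) tuples, i.e. rows.
rows = [(p.get("handle"), p.findtext("surname"))
        for p in ET.fromstring(xml_data).iter("person")]

# These rows could then be bulk-loaded into PostgreSQL, e.g. with
# psycopg2's executemany() or the COPY command.
print(rows)  # [('_I0001', 'Smith'), ('_I0002', 'Jones')]
```

The same "parse once, load into a relational schema, query downstream" structure applies whether the parser is written in R or Python.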

I will go back to the last Gramps version with BSDDB support since running a Gramps database of my size (100k individuals, 250k events) is simply impossible. Besides the speed problem, I found a major and nearly incredible bug in v5.2.2 (which I still have to post to the bug tracker) that is a major nuisance for me.

If you want, it's quite easy to test the quality of the SQLite export by creating a local loopback, meaning importing the SQL file into another, empty tree in Gramps. All you need to do then is a text diff on the XML extracted from both backup files, which is a process that you know already.
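The final diff step of that loopback can be done with Python's standard library instead of an external diff tool. A minimal sketch, with two small strings standing in for the real exported XML files:

```python
import difflib

# In the real loopback test these would be read from the two XML
# backup files (original tree vs. re-imported tree).
xml_before = "<people>\n  <person handle='_I0001'/>\n</people>\n"
xml_after  = "<people>\n  <person handle='_I0001'/>\n</people>\n"

# An empty diff means the export/import round-trip lost nothing.
diff = list(difflib.unified_diff(
    xml_before.splitlines(), xml_after.splitlines(), lineterm=""))
print("identical" if not diff else "\n".join(diff))
```

Any lines the round-trip dropped (favorites, settings, etc.) would show up directly in the diff output.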

I occasionally use a 600k tree here, saved as SQLite, and it loads reasonably fast for browsing and simple searches. That's the Charlemagne tree that I once found on the web, originally created in PAF, which can still load it on a 20-year-old Toshiba laptop with Windows 2000 and 64 MB RAM, and do some things faster than Gramps does on this HP games machine disguised as an office one. And PAF does that with its own database stored in a single file, just like its successor Ancestral Quest, and 'new' kid on the block RootsMagic. The latter uses SQLite too, but its fully relational database makes many things way faster than Gramps, like looking for duplicates, and importing GEDCOMs for that matter.

I must add that I'm still using 5.1.6, because I also found a few nasty bugs in 5.2.2, which are quite difficult to analyze without some digging into the code, or the XML. And I have better things to do right now.

I'm looking forward to your post on Mantis, though.

Right, but look at it as a risk management decision. And as I said: I just wanted to know if my approach would work and it did :wink:

And that's the problem. Every program on the market, with the sole exception of Gramps, uses GEDCOM and inherits its limitations, and if you need event types and roles that are not supported by GEDCOM (and, probably more important, by the respective import/export routines), then you're stuck. But I guess we already covered that topic.

It's already there (# 0013274) and it's probably something you'd call "a nasty bug". I don't have any idea how this could happen. So I am also forced to go back to v5.1.6, because of the speed issue with PostgreSQL and SQLite as database backends and because of this bug. That's really no fun, is it?

This behaviour is the expected result of bug fix #13086. See pull request #1618.

Nick, please help me understand what the bug fix is saying. It appears to say that the bug when displaying those hyphenated surnames is fixed in 5.2. The fix appears to be that a second version of the name is created, with some spaces around hyphens simply eliminated. I'd understand this not as a bug fix but as a classic work-around that will cause a lot of trouble elsewhere (as in my case, where the spaces that were eliminated were completely "legal" and necessary). And why does the DWR use that second version of the surname and not the original one?

This is just a change to the name displayer, the underlying data remains unchanged.

The essential problem was the padding of the surname connector.

Consider the following two examples:

Surname 1: "Smith"
Connector: "-"
Surname 2: "Jones"

This was displayed as "Smith - Jones". Extra spaces were inserted that were not in the original data. The expected result was "Smith-Jones".

Surname 1: "Fernández"
Connector: "de"
Surname 2: "Calderón"

The extra spaces added by the name displayer here give "Fernández de Calderón", which is the expected result.

I expect that most people don't want to enter " de " with the extra spaces. Perhaps there is a better fix though.
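To make the two behaviours concrete, here is a hypothetical Python reconstruction of the padding rule under discussion. This is not the actual Gramps name-displayer code, just a guess at the logic implied by the two examples above:

```python
def display_padded(surname1, connector, surname2):
    """Old behaviour: the connector is always padded with spaces."""
    return f"{surname1} {connector} {surname2}"

def display_fixed(surname1, connector, surname2):
    """Guessed new rule: a bare hyphen joins tightly,
    word connectors keep their padding."""
    if connector == "-":
        return f"{surname1}{connector}{surname2}"
    return f"{surname1} {connector} {surname2}"

print(display_padded("Smith", "-", "Jones"))         # Smith - Jones
print(display_fixed("Smith", "-", "Jones"))          # Smith-Jones
print(display_fixed("Fernández", "de", "Calderón"))  # Fernández de Calderón
```

Note that under any such rule, a user who deliberately entered spaces around a hyphen can no longer get "Smith - Jones", which is exactly the objection raised in this thread.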

But you understand that this fix is creating major problems and (sorry if I'm a bit rude now) isn't a fix at all? A name that has been entered with spaces around hyphens should not appear without those spaces. This is not a fix but a new bug.

If I enter a name as a single string and not as two or more separate strings, it should never show up with a "surname connector", whatever that is, since it should not be connected to anything. I would say that this is the real bug, i.e. not identifying correctly which names should be connected and which ones not.

Nick, I really appreciate all the fantastic work the people in the Gramps project have done, so please understand my comments not as defeatism on my part, but as an attempt to help.

The reporter of bug #13086 identified a problem and submitted a fix for it. The pull request was open for review for about three months before it was merged, and even after that we had a beta and release candidate. Nobody raised a concern, so I regard this as a minor issue.

Why don't you discuss this with the author of the fix? Maybe you will come up with a solution that you are both happy with.