Import from SQLite export

Yes, I think that is the case indeed. These pickles make a lot of sense if you want to store and retrieve native Python objects fast, and in the old days I don’t think there were free SQL servers available, and certainly nothing fully embedded in a DLL like SQLite, where you don’t have a real server process. Our old hardware was not powerful enough for that, so people put all the business logic in code and used whatever tool there was, like BSDDB, to store their person and other objects fast. You can see that in a program like Brother’s Keeper, which fully relies on the old BTrieve engine.

It sounds like it’s somewhat larger than Ulrich’s database. Have you tried exporting it from Gramps to SQLite and then importing again? I’m curious if you’ll see the same slow performance. Of course you have different machines as well, but it would be interesting to know what happens.

George,

the indexes really make the difference! I created an index on each and every handle column I found in the SQLite export database. The re-import of the SQLite database back into Gramps now runs in some 15–20 minutes. That is still significantly slower than the import of the XML file, but in an acceptable range.
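
Roughly, what I did boils down to something like this; the file name is just a placeholder, and the assumption that every referencing column has “handle” in its name is mine, based on what I saw in the export:

```python
import sqlite3

# Placeholder file name -- point this at your own Gramps SQLite export.
con = sqlite3.connect("gramps_export.sqlite")

# Find every column with "handle" in its name and put an index on it.
# The naming convention is an assumption about the export schema.
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

for table in tables:
    columns = [row[1] for row in con.execute(f'PRAGMA table_info("{table}")')]
    for col_name in columns:
        if "handle" in col_name:
            index_name = f"idx_{table}_{col_name}"
            con.execute(f'CREATE INDEX IF NOT EXISTS "{index_name}" '
                        f'ON "{table}" ("{col_name}")')

con.commit()
con.close()
```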

Thank you for your comments!

Ulrich

I tried, and gave up. Export took just a few minutes, but after two and a half hours of import I’d seen enough and killed it, because I wanted to get some sleep, and the progress bar had filled about one millimeter. That was without the extra indexes that Ulrich made.

That’s good news, thanks for trying it!

I think the performance could be improved further but it would involve a lot of rework (simplification, actually). Currently the import selects all rows from a given primary object table (such as person), then for each handle, selects rows from other tables (for example, to get lists of handles for related objects), and stitches all of the data together. I imagine SQLite could do all of this joining work faster on its own. This would require the creation of some SQL views. Then the import code would just select * from a person_view, for example, and receive all of the data needed to create all of the person objects in the new database. The performance of each view could be analyzed and improved within SQLite; the import program would not have as much to do with it.
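
As a sketch of the idea, it could look something like this; the table and column names are only illustrative, not the actual export schema:

```python
import sqlite3

con = sqlite3.connect("gramps_export.sqlite")   # placeholder path

# Illustrative only: these table and column names are not the real export
# schema, they just show the shape of the idea.
con.execute("""
    CREATE VIEW IF NOT EXISTS person_view AS
    SELECT p.handle,
           p.gramps_id,
           n.first_name,
           n.surname,
           group_concat(DISTINCT r.event_handle) AS event_handles
      FROM person AS p
      LEFT JOIN name      AS n ON n.person_handle = p.handle
      LEFT JOIN event_ref AS r ON r.person_handle = p.handle
     GROUP BY p.handle
""")

# The importer would then only need one query per object type:
for row in con.execute("SELECT * FROM person_view"):
    pass  # build one Person object from one pre-joined row
```

SQLite would then do the joining and grouping internally, and each view could be tuned with EXPLAIN QUERY PLAN without touching the import code.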

Please take all of my comments with the understanding that I know a little about SQL and less about Python!

Thanks for trying. For comparison, have you ever tried exporting and re-importing it as Gramps XML? I haven’t studied the XML import, but I imagine the main reason it’s faster is because it can load the entire XML file and process it without making repeated external calls like the SQLite import.

Basically that is pretty similar to what I’m doing with the SQLite export. The database schema of the export is quite “fragmented” if I may use this term. This means of course that “doing something useful” with the data in the SQLite database means using a lot of JOINs, temporary tables and similar SQL stuff. This is not a real problem, it’s only a bit boring. So I developed a SQL script that I will run after the export to “stitch together” all the data I’m going to need for my downstream processing. At the end of the day this means that I’m roughly recreating the “fragmentation level” of the XML file in the SQLite database. As Enno said before: results count, not elegance :wink:
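
For what it’s worth, the stitching script is essentially a series of statements of this shape; the table and column names here are made up, since the real export schema is more involved:

```python
import sqlite3

con = sqlite3.connect("gramps_export.sqlite")   # placeholder path

# Materialize the joins once, then run all downstream queries against the
# flat table instead of re-joining every time.
con.executescript("""
    DROP TABLE IF EXISTS person_flat;

    CREATE TABLE person_flat AS
    SELECT p.handle,
           p.gramps_id,
           n.surname,
           group_concat(DISTINCT e.event_handle) AS event_handles
      FROM person AS p
      LEFT JOIN name      AS n ON n.person_handle = p.handle
      LEFT JOIN event_ref AS e ON e.person_handle = p.handle
     GROUP BY p.handle;

    CREATE INDEX idx_person_flat_handle ON person_flat (handle);
""")
con.commit()
```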

I agree, but would like to point out that the real leap forward would be a good database backend for Gramps without the Python pickles. This would finally be the gravestone for all import and export performance discussions.

I tried that, and it is way faster indeed. I had to use that whenever I wanted to copy a tree from Linux to Windows, back when I still had those trees in BSDDB format, because the BSDDB versions for Linux and Windows are not compatible.

With SQLite, you can simply copy the contents of whatever folder you find inside the grampsdb folder, because the files in there are either text or SQLite, and the latter is portable.
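
In practice that is nothing more than a plain directory copy; the paths below are only placeholders, since the Gramps user directory differs per OS and version:

```python
import shutil
from pathlib import Path

# Placeholder paths: each tree lives in its own randomly named folder
# under grampsdb inside the Gramps user directory.
source_tree = Path.home() / ".gramps" / "grampsdb" / "59abc123"
destination = Path("/media/usbstick/grampsdb-backup") / source_tree.name

# The folder only contains small text files plus the SQLite database file,
# and both are portable between Linux and Windows.
shutil.copytree(source_tree, destination)
```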

Note that there is no real need to load the entire XML file into memory in that case, and if you have enough RAM, your OS will cache it anyway. The main advantage is that in the XML format you don’t need to reassemble everything from the normalized tables that exist in the SQL export format. The XML file has the same contents as the main database, meaning that, for example, a person object includes all the person’s names and attributes, and lists of handles that reference associated sources, events, and other objects. The only thing the importer needs to do is read it into a Python object, pickle that, and write it to the database together with the handle and ID, so that the table can be indexed on those. This is a linear process, with no random disk access like you get when reading from a normalized database, and that is way faster.
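
In very rough terms, the importer’s work per object then looks like this; the table layout below is my simplification, not the real Gramps schema:

```python
import pickle
import sqlite3

con = sqlite3.connect("grampsdb.sqlite")   # placeholder path

# Simplified stand-in for the real table layout: one pickled blob per object,
# indexed by handle and by Gramps ID.
con.execute("""
    CREATE TABLE IF NOT EXISTS person (
        handle    TEXT PRIMARY KEY,
        gramps_id TEXT,
        blob_data BLOB
    )
""")
con.execute("CREATE INDEX IF NOT EXISTS idx_person_id ON person (gramps_id)")

def store_person(person_obj, handle, gramps_id):
    """Pickle the fully assembled person object and write it in one go."""
    con.execute(
        "INSERT OR REPLACE INTO person (handle, gramps_id, blob_data) "
        "VALUES (?, ?, ?)",
        (handle, gramps_id, pickle.dumps(person_obj)),
    )

# One call per parsed <person> element: the writes stay sequential, so there
# is no random access into a set of normalized tables.
store_person({"surname": "Smith", "given": "John"}, "H0001", "I0001")
con.commit()
```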

The fragmentation mentioned by Ulrich is a natural consequence of normalisation, and is not a problem at all. It’s just how normalized databases work, and programs like My Heritage Family Tree Builder can perfectly deal with that, even when you have almost a million persons, like I have, when I import two different medieval GEDCOM files into a single database.

As far as I understand from the information on the Internet about the MongoDB backend, it’s available only for Linux but not for Win10. As I said, I’m a Windows guy, so this backend appears not to be a real solution for me. Did I miss something?

SQLite would not be a problem if multi-thread functionality were enabled both in Gramps and in the SQLite database.
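
To illustrate what that would mean at the SQLite level, here is a small sketch using Python’s sqlite3 module; the path is a placeholder and the settings are only examples:

```python
import sqlite3

# DB-API thread-safety level of the underlying SQLite build
# (1 = connections must not be shared between threads, 3 = they may be).
print("sqlite3.threadsafety =", sqlite3.threadsafety)

# Placeholder path; check_same_thread=False lets one connection be used from
# several threads, but the application must then serialize access itself.
con = sqlite3.connect("grampsdb.sqlite", check_same_thread=False)

# WAL mode allows readers to run concurrently with a single writer, which is
# the usual first step towards multi-threaded use of an SQLite database.
con.execute("PRAGMA journal_mode=WAL")
```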

The database itself can handle many millions of entries (I don’t remember the exact numbers now, but a few years ago there was a massive discussion about this in the forums). But no database is better than its indexes.
And when the blobs are gone, it would be possible to use the Simple Graph approach to create graph data from the Gramps data directly inside SQLite, and to use SQLite’s Generated Columns feature on the JSON objects, so that SQLite could serve as a (semi) document database for the JSON data and/or as a (semi) multi-model database…
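
A minimal sketch of the Generated Columns idea, assuming SQLite 3.31 or newer with the built-in JSON functions:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")

# Generated column pulled out of the JSON document; it can be indexed and
# queried like any normal column while the JSON stays the source of truth.
con.execute("""
    CREATE TABLE person (
        handle    TEXT PRIMARY KEY,
        json_data TEXT,
        surname   TEXT GENERATED ALWAYS AS
                  (json_extract(json_data, '$.surname')) VIRTUAL
    )
""")
con.execute("CREATE INDEX idx_person_surname ON person (surname)")

con.execute("INSERT INTO person (handle, json_data) VALUES (?, ?)",
            ("H0001", json.dumps({"surname": "Smith", "given": "John"})))

for row in con.execute("SELECT handle, surname FROM person WHERE surname = ?",
                       ("Smith",)):
    print(row)
```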

PostgreSQL would be a great backend when the blobs are deprecated, especially if there were a way to use a geospatial addon for PostgreSQL.
With PostgreSQL as the data store it would also most likely be possible to use the AgensGraph multi-model database solution as a graph and document database…

Another extremely useful backend for advanced genealogical and other historical linked or relational data research would be a graph database (or multi-model database) like Neo4j, Nebula DB, ArangoDB, ArcadeDB, OrientDB, etc.
Several of them also have geospatial “add-ons”.
And all of them can easily handle millions of nodes and edges, with an unlimited number of attributes for each node/edge, each with an unlimited number of citations, etc.

I use it on Windows 11 with Gramps 5.1.5; you will find a zip file with an old and outdated driver in josib’s answer on this feature request: 0010491: [Gramps AIO & MacOS installers] Add prerequistes MongoDB [Wontfix][Windows use : MSYS2/ MacOS use: MacPorts] - Gramps - Bugtracker – Free Genealogy Software

You need to change the supported Gramps version and enable it in the Gramplet files, etc.
Just don’t try to use this on a public/online database, because there are multiple security risks in this old driver.
I have asked for an update for the AIO, but it was denied because “it’s experimental” and “no one uses it”.
There was someone who had an updated driver for Windows a while back; I asked if he could share the zip, but I never got an answer.
