JSON inside table

In v6.0, Gramps converted from using BLOBs to JSON for object storage.

I fully agree. I’ve always said, here and elsewhere, that Gramps is by far the most advanced and sophisticated software for describing a person’s life. Therefore, it’s really frustrating to see that Gramps cannot reach its full potential until the BLOBs are gone and the UI is rewritten to fully leverage the performance of modern database backends. Once that is done, a 200M dataset will still be a challenge, but it will be possible.

Having said this, I still do not see how you would be able to work with a 200M dataset in Gramps, e.g. by searching for individuals, adding events to an individual’s life, and so on. How do you envision this working?

Besides the fact that I cannot code in Python either, I see some parallels. I do a lot of data wrangling with the Gramps data. Instead of writing Gramps extensions, I use PostgreSQL, with the Gramps data transferred into SQL tables. This allows me to use any technology that can access PostgreSQL. Gramps is basically just the GUI for PostgreSQL, used to enter or modify data.
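For illustration, a minimal sketch of the kind of query this setup allows, in Python with psycopg2; the table and column names here (person, event, person_event) are made-up assumptions for the example, not my actual schema:

```python
# Sketch of querying Gramps data that has been transferred into PostgreSQL.
# The table/column names are illustrative assumptions, not the real schema.
import psycopg2

conn = psycopg2.connect("dbname=gramps_mirror user=gramps")
with conn, conn.cursor() as cur:
    # Example: people with the most recorded events, something the UI
    # is not really built to answer in one step.
    cur.execute("""
        SELECT p.surname, p.given_name, COUNT(e.id) AS event_count
        FROM person p
        JOIN person_event pe ON pe.person_id = p.id
        JOIN event e ON e.id = pe.event_id
        GROUP BY p.surname, p.given_name
        ORDER BY event_count DESC
        LIMIT 10
    """)
    for surname, given, n_events in cur.fetchall():
        print(f"{given} {surname}: {n_events} events")
```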

Right, I’m aware of this, but it did nothing to address the performance issue. I hope it is a stepping stone towards better integration of database backends.

To be honest, I haven’t gotten far enough to work with all 200 million people. The first major hurdle is the data import, which would be a nightmare at that scale. In my testing, importing about 2 million individuals through GEDCOM took roughly four days. After converting the data to XML, that same import dropped to around seven hours.

Even so, importing my full database into Gramps right now isn’t realistic. At the XML rate, it would still take an astonishing 700 hours (100 times my 2M-person test, at roughly 7 hours each). That said, I do see the potential. Gramps’ interface is far more user-friendly than working directly in SQL all the time, and that makes it an appealing long-term solution despite the current limitations.

Ok, I see. Just to give you a figure: for various reasons, I regularly import a Gramps XML file with 3.5M lines that covers some 80k individuals with their events, places, citations, sources, notes, etc. This runs for some 5–10 minutes, so just enough time for a coffee break :wink:

For that number of data points, there is really only one type of database backend that makes sense: a graph database, combined with network graph algorithms to query the data. Genealogy data is not transactional by nature, so even with PostgreSQL, trying to handle 200 million individuals in a traditional relational setup is impractical.
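As a rough sketch of what I mean, using networkx as a stand-in (the handles and edges below are made up, and real data would be loaded from the database):

```python
# Sketch of treating genealogy data as a graph and querying it with
# graph algorithms. The handles are invented for the example.
import networkx as nx

G = nx.DiGraph()
# Edges point from parent to child.
G.add_edge("I0001", "I0010")   # I0001 is a parent of I0010
G.add_edge("I0002", "I0010")
G.add_edge("I0010", "I0100")

# All ancestors of an individual: a graph traversal instead of a
# recursive SQL join.
ancestors = nx.ancestors(G, "I0100")

# A kinship path between two individuals, ignoring edge direction.
path = nx.shortest_path(G.to_undirected(), "I0001", "I0100")

print(ancestors)
print(path)
```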

I can look into this. Most of what I have is collected trees and other data. I appreciate the feedback.

I have committed all the code that was created to GitHub and reduced the amount of documentation. Although there are still tasks to be completed, I hope someone finds this project both interesting and useful. Again, it’s not yet complete, but it’s a good starting point.


Maybe you’ll find this useful…

Thanks, I’m sure there are things in here that I wasn’t able to cover in my application. I appreciate it.

High-performance batch importer

for genealogy data from Gramps XML (schema 1.0) to Gramps SQLite database (database format 21)
Version 1.0.0 by Jeff Jones in Visual Basic .NET (VB.NET)

Commit 605ca90, 12 Dec 2025: Removed a lot of extraneous .md files and updated form1. The current time for almost 2 million people is about 9 minutes; the total imported data is almost 16 million items.


In the README.md, Jeff notes that the main difference is that (for his test with 815k records) the batch application does 1 commit to disk. Gramps, on the other hand, commits each record separately, so it performs 815k disk actions.
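Not the actual importer code (which is VB.NET), but a small Python/sqlite3 sketch of why one commit for the whole batch is so much cheaper than one commit per record; the table and rows are made up:

```python
# Sketch comparing one commit per record with a single commit per batch.
# Table name and data are placeholders, not the Gramps schema.
import sqlite3
import time

def import_records(db_path, rows, commit_each_record):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS person (handle TEXT, data TEXT)")
    start = time.perf_counter()
    for row in rows:
        conn.execute("INSERT INTO person VALUES (?, ?)", row)
        if commit_each_record:
            conn.commit()          # one disk-level commit per record
    conn.commit()                  # otherwise: a single commit for the batch
    conn.close()
    return time.perf_counter() - start

rows = [(f"I{i:06d}", "{}") for i in range(50_000)]
print("per-record commits:", import_records("per_record.db", rows, True))
print("single commit:     ", import_records("one_commit.db", rows, False))
```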

Is this similar to when the progress bar in the statusbar was refreshing after every record? Or when deletes or tagging of a mass selection were triggering a view refresh after every deleted record or tag addition?

PR 997 (“Faster Multiple Person Delete”) by @prculley increased the delay in the statusbar refresh interrupt to curtail the excessive refreshes. His test, selecting all Persons in the People flat view and deleting them, demonstrated the improvement: 21.5 minutes with the original code versus 4 seconds with the PR. (Also see https://gramps-project.org/bugs/view.php?id=12186 )
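The PR itself touches Gramps internals, but the general idea is simply rate-limiting an expensive refresh. A generic sketch (not the actual Gramps code, names are made up):

```python
# Generic sketch of the idea behind PR 997: refresh the UI at most every
# min_interval seconds instead of after every record. Not Gramps code.
import time

class ThrottledProgress:
    def __init__(self, refresh, min_interval=0.25):
        self.refresh = refresh            # the expensive UI update callback
        self.min_interval = min_interval  # minimum seconds between refreshes
        self._last = 0.0

    def step(self, done, total):
        now = time.monotonic()
        # Refresh only when enough time has passed, or on the final record.
        if now - self._last >= self.min_interval or done == total:
            self.refresh(done, total)
            self._last = now

progress = ThrottledProgress(lambda done, total: print(f"{done}/{total}"))
for i in range(1, 100_001):
    progress.step(i, 100_000)   # cheap unless a refresh is actually due
```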

@Nick-Hall, @dsblank and @DavidMStraub: is there an opportunity here to reduce the import time for Gramps for desktops and Gramps Web? For example, collating 1,000 records at a time in memory using a proxy (or 500, to allow for padding within the Python 1,000-record limit) and writing them to disk in blocks, as sketched below?
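Something along these lines, as a sketch of the idea only; this is not how the Gramps importer is structured, and the table name and batch size are placeholders:

```python
# Sketch of "collate records in memory, write them to disk in blocks".
# Placeholder table and batch size; not the actual Gramps import path.
import sqlite3

def import_in_blocks(db_path, records, batch_size=500):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS person (handle TEXT, data TEXT)")
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) >= batch_size:
            conn.executemany("INSERT INTO person VALUES (?, ?)", buffer)
            conn.commit()      # one disk write per block, not per record
            buffer.clear()
    if buffer:                 # flush the final partial block
        conn.executemany("INSERT INTO person VALUES (?, ?)", buffer)
        conn.commit()
    conn.close()
```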

Having Gramps for desktops unavailable during an import only inconveniences a single user, and only slightly, because they are AWARE of the other activity. But having the server unavailable for extended periods will lose Gramps Web collaborators, because they will be unable to plan for the unavailability. Instead, they will mistakenly assume that Gramps Web is flaky.

I’m not sure all of those words (such as blocks, padding and proxy) line up with the way we use those terms in Gramps. But if you are asking what the fastest way is to get data into a SQLite or PostgreSQL database, there are probably tweaks to be made. Increasing the number of items in a “commit” probably has tradeoffs between memory, safety, and speed.

I haven’t looked at the speed of import in a long time. I am currently finishing up a round of exploration of all of the filters (250+) in Gramps and seeing how we can best take advantage of the new JSON.


Yep. I’m stepping out of my comfort zone.

I will gladly edit (with delete insert markup) if you advise of more correct words/phrasing.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.