Can someone point me to the correct JSON schema for Gramps?
I haven’t been able to find any documentation about it on the Gramps website.
I’m working on testing a new import, but I’m running into an error that seems to indicate Gramps is expecting certain values in the JSON data. Unfortunately, I don’t know what those expected fields or structures are supposed to be.
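For anyone else searching: below is a minimal sketch of how the expected object structures can apparently be inspected from Python. The `get_schema()` classmethods on the `gramps.gen.lib` classes are my assumption about the API in recent Gramps releases, not something I found in official documentation.

```python
# Hypothetical sketch: print the JSON Schema that (as far as I know) recent
# Gramps releases expose for their primary objects via get_schema().
# Run with the Gramps source tree on PYTHONPATH; names are assumptions.
import json
from gramps.gen.lib import Person, Family, Event, Place, Source

for cls in (Person, Family, Event, Place, Source):
    schema = cls.get_schema()                     # JSON Schema dict for the object
    print(cls.__name__)
    print(json.dumps(schema, indent=2)[:500])     # show only the first part
```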
Here’s the error message I’m getting:
6210959: ERROR: dbloader.py: line 107: not enough values to unpack (expected 8, got 0)
Traceback (most recent call last):
File "C:\Program Files\GrampsAIO64-6.0.5\gramps\gui\dbloader.py", line 198, in read_file
db.load()
File "C:\Program Files\GrampsAIO64-6.0.5\gramps\gen\db\generic.py", line 834, in load
self._gramps_upgrade(dbversion, directory, callback)
File "C:\Program Files\GrampsAIO64-6.0.5\gramps\gen\db\generic.py", line 2784, in _gramps_upgrade
gramps_upgrade_14(self)
File "C:\Program Files\GrampsAIO64-6.0.5\gramps\gen\db\upgrade.py", line 1762, in gramps_upgrade_14
ValueError: not enough values to unpack (expected 8, got 0)
I can already tell this is beyond my skills to troubleshoot. Still, with so much information here, maybe I can at least locate the part of my (VB?) module that writes the JSON. Then a developer will have a pre-targeted starting point.
The goal I am working toward is an importer that uses batching so that importing is much faster. Since a lot of my research requires database work, I currently use SQL, but I really want to get to the point of importing all the information I have into Gramps and using it there; just one of the databases I maintain has about 200,000,000 people in it. The biggest limiter for me is the speed and time it would take to transition this data. I did a test of a GEDCOM import of roughly 2,000,000 people and it took about 4 days. However, when I converted the data straight from GEDCOM to an XML file and imported the XML instead, it took 7 hours. I feel we can definitely get this to a better spot.
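To make the batching idea concrete, here is a rough sketch (not Gramps code; the table and column names are invented for illustration) of inserting parsed rows into SQLite in large chunks inside a single transaction, which is where most of the speed-up usually comes from:

```python
# Minimal sketch of batched SQLite inserts: many rows per executemany() call,
# one commit for the whole import. Table/column names are hypothetical.
import sqlite3

def bulk_insert(db_path, rows, batch_size=10_000):
    con = sqlite3.connect(db_path)
    con.execute("PRAGMA journal_mode = WAL")   # fewer fsyncs during a one-off import
    con.execute("PRAGMA synchronous = OFF")    # acceptable risk for a throwaway load
    cur = con.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS person (handle TEXT PRIMARY KEY, json_data TEXT)")
    batch = []
    for row in rows:                           # rows = iterable of (handle, json) tuples
        batch.append(row)
        if len(batch) >= batch_size:
            cur.executemany("INSERT INTO person VALUES (?, ?)", batch)
            batch.clear()
    if batch:
        cur.executemany("INSERT INTO person VALUES (?, ?)", batch)
    con.commit()                               # single commit at the end
    con.close()
```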
I did that too. The GEDCOM was the “Catalog of Life” from 2010.
My problem with your import is the language. Gramps is first and foremost a Linux application.
Your project uses Microsoft’s Visual Basic, which is a proprietary language. I will never accept this project being included in Gramps. Please use Python.
I don’t have the knowledge base to code in Python, and this project is just a proof of concept. It’s something that could be looked into once I’m done, to help push Gramps further along. The main goal for me is just to get this working and then go from there.
Examples of reading Gramps XML and writing database files in other programming languages are almost a necessity if Gramps is to be part of a genealogical toolbox.
We proudly state that one of the main benefits of open source (in general) and Gramps (specifically) is that we do not trap a person’s data in our system.
So tools that handle our files in other programming languages (even those that are the proprietary property of corporate overlords) extend this philosophy.
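To make that concrete, here is a minimal sketch of reading a Gramps XML export with nothing beyond the Python standard library. The element names are from memory, so check them against a real export before relying on this; `.gramps` files are usually gzip-compressed XML.

```python
import gzip
import xml.etree.ElementTree as ET

# Minimal sketch (not official tooling): list person handles and surnames
# from a Gramps XML export.
def load_root(path):
    try:
        with gzip.open(path, "rb") as fh:      # .gramps is usually gzipped XML
            return ET.parse(fh).getroot()
    except OSError:                            # fall back to plain, uncompressed XML
        with open(path, "rb") as fh:
            return ET.parse(fh).getroot()

root = load_root("data.gramps")
# The {*} wildcard (Python 3.8+) avoids hard-coding the xmlns schema version.
for person in root.iterfind(".//{*}person"):
    surname = person.findtext("{*}name/{*}surname", default="")
    print(person.get("handle"), surname)
```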
Before looking for clues to answer your question, the screen captures remind me of something.
Since you are working on validating import, you probably ought to use the View:Configure in each view mode to enable ALL the available columns of data in the Category.
(I noticed that your Place views didn’t have the Title, Latitude, and Longitude columns enabled. The latter two are essential for the Geography view to plot Event Places; those views skip any Event whose Place has no coordinates.)
@Nick-Hall Those docs repeatedly have blob_data BLOB lines. Do they need to be revised for JSON? (And does it now support reading both BLOB and JSON for backwards compatibility, while writing only JSON?)
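For anyone checking their own tree in the meantime, here is a quick way to see whether a given sqlite.db still has the old blob_data column or something newer. The file and column names are my assumptions about the old and new layouts, not taken from the docs:

```python
import sqlite3

# Hypothetical check: old pickled-blob layout vs. newer JSON layout.
con = sqlite3.connect("sqlite.db")
columns = {row[1] for row in con.execute("PRAGMA table_info(person)")}
if "blob_data" in columns:
    print("old pickled-blob layout")
else:
    print("no blob_data column; presumably the JSON layout:", sorted(columns))
con.close()
```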
@jj0192 … You are aware that, while the default database backend has been SQLite since Gramps 5.2, before that it was BSDDB? And that other database backends are possible (and likely for Gramps Web instances)? So, hopefully, you are verifying that before merge attempts?
To be honest, I have focused solely on SQLite so far. The main goal is to see if we can get some speed-ups. While converting straight from GEDCOM to XML has made vast improvements in speed, I need something that can handle bulk imports. From this I am hoping that someone will see this and create something that integrates directly into Gramps once it’s done. Sometime today or tomorrow I hope to be at a point where I can attempt a much larger import, just to test it.
That wasn’t a suggestion that you support other backends, just that you might want to check the type so there’s no accidental corruption when writing, and no attempt to read incompatible files.
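For example, each family-tree folder contains (as far as I recall) a small database.txt file naming the backend, so a tool could refuse to write to anything unexpected. A minimal sketch, with a hypothetical tree path:

```python
from pathlib import Path

# Sketch of a defensive check before writing: read the backend marker file
# that (as far as I recall) lives in each Gramps family-tree directory.
def backend_of(tree_dir):
    marker = Path(tree_dir) / "database.txt"
    return marker.read_text().strip() if marker.exists() else None

tree = r"C:\Users\me\AppData\Roaming\gramps\grampsdb\exampletree"  # hypothetical path
if backend_of(tree) != "sqlite":
    raise SystemExit("Not a SQLite tree; refusing to write to avoid corruption.")
```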
I realize that you’re probably coding primarily for your own situation with those large databases. Still, your documentation is so extensive that you might have more in mind.
And there are already interpreter-console Gramplet plug-ins that allow running SQL and Python code on the current tree. So a VB (or R) console isn’t beyond imagination.
I appreciate the encouragement, and I wanted to give an update. There are still some things that need to change, but here is a snippet of the outcome of my first major test:
[22:52:26] Starting import…
XML Path: XXXXXXXXX\data.gramps
Database Path: XXXXXX\SQLite.db
[22:52:26] Import engine created, beginning import…
[22:53:45] ✓ XML Parsing: 15493831 records in 78.80s
[22:53:46] ✓ Database Setup: 0 records in 0.26s
[22:53:52] ✓ Places: 215180 records in 6.46s
[22:53:52] ✓ Sources: 8704 records in 0.26s
[23:02:43] ✓ Events: 12692533 records in 530.70s
[23:02:43] ✓ Notes: 9145 records in 0.24s
[23:02:43] ✓ Media: 0 records in 0.00s
[23:05:25] ✓ People: 1909594 records in 162.19s
[23:05:53] ✓ Families: 658675 records in 27.92s
[23:09:05] ✓ Associations: 10280414 records in 192.01s
Very interesting, even if I haven’t yet understood what you are trying to do with those data in Gramps. I’ve had a lot of discussions here regarding the performance of Gramps, and the final outcome was that databases with more than 100k individuals (or maybe 200k if you have more patience than I usually do) are currently off limits for Gramps as long as its technology is based on blobs.

My tests even convinced me to downgrade from v6.x (with SQLite as database backend) to v5.1.2 (with BSDDB), since the SQLite backend made life considerably more miserable by roughly halving the performance of (some) searches. My solution to this problem was to build some R-based software (I cannot write Python) that basically transfers my Gramps data to a PostgreSQL database, where I do the data-wrangling that’s not possible, or too slow, in Gramps.
So maybe you can spend a few minutes to explain what the scenario for using Gramps in your 200M individuals case would be? I’d be really interested in understanding it better.
The main reason I prefer Gramps is its interface and its ability to handle far larger datasets than RootsMagic, Ancestral Quest, or Legacy—I’ve used all of them. Gramps also has the flexibility to represent many different aspects of a person’s life, making it much more versatile than most genealogy programs.
Beyond its storage capabilities, Gramps offers one major advantage that the others do not: the ability to create custom tools and extensions. Even though I don’t personally code in Python, it wouldn’t be difficult to get help from people who do. With the large amount of data I’ve collected over the years, having the option to build my own tools makes it much easier to view, filter, and analyze specific pieces of information.
I originally planned to make my final update this week, but since I’m back on here, I can now go ahead and finish it. I’ve managed to reduce the processing time to about nine minutes. Without some form of multi-core or parallel processing, that’s probably the best achievable speed. Although Gramps doesn’t use Visual Studio, the goal was never to integrate my tool directly into Gramps, but rather to demonstrate a more efficient approach. Once I upload the new Visual Studio VB.NET code, I’m confident a developer could convert it back into Python if needed. There are still a few things that need to be completed, but even if the time increases slightly after conversion, it would still be significantly faster than the current process.