Rewrite Gramps XML importer?

Hi.
The title is a kind of “question without an expected answer”, because I do not have any plan to rewrite the Gramps importer.

I was just a little bit surprised how fast we could get a pseudo-serialization of primary objects from a Gramps XML file. Sure, this is different from a commit to the DB or from checking during object creation. But it sounds like a simple DB dump.

I can try to copy something like the test case used for testing (it looks like this):

    from xml.etree import ElementTree

    NAMESPACE = '{http://gramps-project.org/xml/1.7.2/}'

    tree = ElementTree.parse(filename)
    root = tree.getroot()

    lines = []
    for one in root:  # root.getchildren() was deprecated, then removed in Python 3.9
        # call findall() once, instead of re-running it at every index
        for event in one.findall(NAMESPACE + 'event'):
            lines.append(event.items())
            print('Event:', ElementTree.tostring(event))
        lazy = list(lines)
        print(lazy)

I suppose it should be “easy” to build a bridge to JSON, or some other way to quickly import primary objects and related records.
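A rough sketch of such a JSON bridge (hedged: the inline sample below stands in for a real `.gramps` file, and its handles, ids, and `change` values are invented for illustration):

```python
import json
from xml.etree import ElementTree

NAMESPACE = '{http://gramps-project.org/xml/1.7.2/}'

# Minimal stand-in for a real Gramps XML file (illustrative data only).
XML = """<database xmlns="http://gramps-project.org/xml/1.7.2/">
  <events>
    <event handle="_e1" change="1700000000" id="E0001"/>
    <event handle="_e2" change="1700000100" id="E0002"/>
  </events>
</database>"""

root = ElementTree.fromstring(XML)

# Turn each event's attribute pairs into a plain dict, then serialize.
events = [dict(ev.items()) for ev in root.iter(NAMESPACE + 'event')]
print(json.dumps(events, indent=2))
```

The same pattern would work for any primary-object tag (person, family, place, …), since they all carry their core metadata as XML attributes.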

For fun, I also looked at a pseudo timestamp revision control, using a simple sum of the timestamp values stored in the Gramps XML. I know, these cannot be unique values when one removes some records, but it is close to a UID for our Gramps XML file and backups.

    timestamp_int = []
    for two in one.iter():
        if two.get('change') is not None:
            timestamp_int.append(int(two.get('change')))
    timestamp_control_like = sum(timestamp_int)
    print('Timestamp sum (control-like): %d' % timestamp_control_like)

Are you suggesting we explore optimizing the XML export to improve parsing performance and reduce overhead in subsequent imports?

Or is there another primary objective?

Sorry, this method (`getchildren()`) is deprecated, and removed as of Python 3.9.
Thank you, giotodibondone, for pointing this out.

Also, I should really think about a Python version upgrade…

  1. XPath 1.0 support (Python 3.8)
  2. Namespaces improvements (Python 3.8+)
  3. Parser enhancements (Python 3.9)
  4. New indent() function (Python 3.9)
  5. Efficient Parsing with iterparse (Python 3.10)
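For instance, `iterparse` lets one stream a large `.gramps` file without keeping the whole tree in memory. A hedged sketch (the sample XML and its handles are invented; a real file would be opened by path instead of `io.StringIO`):

```python
import io
from xml.etree import ElementTree

NAMESPACE = '{http://gramps-project.org/xml/1.7.2/}'

SAMPLE = """<database xmlns="http://gramps-project.org/xml/1.7.2/">
  <people>
    <person handle="_p1" change="1700000000"/>
    <person handle="_p2" change="1700000050"/>
  </people>
</database>"""

handles = []
# Stream 'end' events so each element is complete when we see it.
for _, elem in ElementTree.iterparse(io.StringIO(SAMPLE)):
    if elem.tag == NAMESPACE + 'person':
        handles.append(elem.get('handle'))
        elem.clear()  # free the element's children as we go

print(handles)  # → ['_p1', '_p2']
```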

Well, the primary idea was to get something fast and less “complex” than the Database Differences Report

(see also feature request #7228 for the history/genesis)

and Import Merge Tool


I do not know about export. Would exporting table by table improve the processing time for a backup (no filter)?
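I am not sure either, but as a thought experiment, each top-level “table” section of the XML could at least be serialized on its own, giving per-table dumps. A hedged sketch with invented sample data (a real tool would write each snippet to its own file):

```python
from xml.etree import ElementTree

NS = '{http://gramps-project.org/xml/1.7.2/}'

SAMPLE = """<database xmlns="http://gramps-project.org/xml/1.7.2/">
  <people><person handle="_p1"/></people>
  <events><event handle="_e1"/></events>
</database>"""

root = ElementTree.fromstring(SAMPLE)

# One standalone XML snippet per top-level section ("table").
tables = {}
for section in root:
    name = section.tag.replace(NS, '')  # strip the namespace prefix
    tables[name] = ElementTree.tostring(section, encoding='unicode')

print(sorted(tables))  # → ['events', 'people']
```

Whether this is actually faster than a single full dump would have to be measured; the parsing and I/O work is roughly the same, only split differently.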

An improvement could be explored on the Database Differences Report, because there are limitations in metadata handling:

"Some of these ‘metadata’ are not ‘regular’ in the way they are stored, comparing them would be a fair amount of additional code work.

Also note that some of the ‘metadata’ would be difficult to compare because they are not stored in the backup XML files, but rather in the Gramps initialization files.

According to the author, not all changes are pertinent. For example an object that just has an updated timestamp is ignored, as are dates that may have irrelevant differences. Note that it will report changed order of list items as two different changes. It doesn’t say “tag list items 1 and 2 were switched” (which would be nice) but rather “tag list, #1 [ToDo]”, “tag list, #2 [Imported]” for item1 and “tag list, #1 [Imported]”, “tag list, #2 [To Do]” for item2. "

Maybe the same questions, or issues, apply to the Import Merge Tool?

A second exploration could focus on primary-object filtering (isolation, standalone, raw tables), looking at one “raw” table only (like the gen.lib API). Getting a sum of the current change attributes of primary objects (either per table or for the complete Family Tree) might be a first “clue” for finding the right backup file during a long session on one day (e.g., many changes and many backup files).
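A minimal sketch of that per-table checksum idea (the sample XML and its `change` values are invented; only the summing pattern is the point):

```python
from xml.etree import ElementTree

NS = '{http://gramps-project.org/xml/1.7.2/}'

SAMPLE = """<database xmlns="http://gramps-project.org/xml/1.7.2/">
  <people>
    <person handle="_p1" change="100"/>
    <person handle="_p2" change="250"/>
  </people>
  <events>
    <event handle="_e1" change="300"/>
  </events>
</database>"""

root = ElementTree.fromstring(SAMPLE)

# Sum the 'change' timestamps per top-level table, then overall.
per_table = {}
for section in root:
    name = section.tag.replace(NS, '')
    per_table[name] = sum(int(e.get('change'))
                          for e in section.iter() if e.get('change'))
total = sum(per_table.values())

print(per_table, total)  # → {'people': 350, 'events': 300} 650
```

Comparing these per-table sums between two backup files would point at which table changed, without running a full record-by-record diff.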

I do not really “care” about DB secondary indices! If one can generate an alternate DB table for, for example, the events table, I suppose one would need to calculate the sum of the change attributes on events? I tested with a simple sum, but there could be a cleverer, smarter way, or a more complex method.

Using the change attribute seems the logical way: the limited time information in the Gramps XML header, or in the file name (date and time), may help, but not really during a long editing session with multiple backups, or in case of DB table corruption.

As an example, I used lists inside a global list. We do not know; maybe one record could ‘escape’! :face_with_raised_eyebrow:
The format could be different, e.g., via json or xmltodict.

import xmltodict  # third-party library

doc = xmltodict.parse(file, dict_constructor=dict)
xmltodict.unparse(doc, pretty=True)

or to look at advanced use, like:

Mmmh, it seems I have been trying to track down a request for a while!
I guess the central “Feature Request” was #3885.

Currently, I am just exploring paths under Windows (blind check) and shutil stuff (POSIX OS)… That is maybe my “high level” coding stage! So, while a “cost on demand” feature for importing data from Gramps XML could be done via the Import gramplet (I suppose), I am rather navigating on the “old time” security side, from before AI and its formatting models. You can call this exploration something like an “emergency parachute” or an “extra .gramps analysis”.

Finally, I suppose that as long as Python 3.10 is not available to every user, the built-in ElementTree module will still have some limitations compared to the third-party lxml library. “External” lxml (or shutil?) seems to have some problems properly handling spacing characters in some “old” versions. So, some transition issues.
On the other hand, there is no more excuse not to support .gramps on import in any other project dedicated to genealogy. :wink:

I’m sorry, I did not grasp the objective. So I asked Perplexity to rewrite it for a non-programmer (with extra attention to the differences between French and English sentence construction).

How much did it mangle your intended points?

At the moment, I’m focusing on testing file-path compatibility and data-handling reliability. This might sound like advanced coding, but it’s really about ensuring stability! For handling Gramps XML imports, we could add this as an optional feature in the Import Gramplet (a plugin-like tool). My priority is strengthening security measures — think of it as building an ‘emergency parachute’ or creating a backup analysis tool (like an ‘extra .gramps analysis’) to safeguard data integrity. This approach predates modern AI formatting models, so it’s more about foundational safety.

On a technical note: Even though Python 3.10 isn’t universally available, we’re limited by the built-in ElementTree module’s capabilities compared to the third-party lxml library. However, older versions of lxml sometimes struggle with spacing characters in data, which could cause hiccups during transitions. That said, there’s no reason other genealogy projects shouldn’t fully support .gramps imports now! :wink:

I am proud that Perplexity cannot match Jerôme’s model… :face_in_clouds:

Seriously, the primary objective was an exploration (more basic than deep), so it was done for myself. The second was to share, and spare others some time on, these investigations; that is +/- done, for myself and shared as experimental addons. Now, I am looking at where it could be useful, and asking why there is only limited or incomplete Gramps XML file format support in “modern” genealogical applications. It is “easy” to read and write a set of valid Gramps XML files, on import or on export. We all know that Gramps XML is for Gramps apps, but as the file format is documented and open, it should be easy to at least provide import support. I thought there was a lack of, or poor, documentation. But, as I was able to read it and parse some sets of data, maybe there is no real difficulty in handling this file format, either on import or export. Can Perplexity find an answer to this?


Perplexity’s rewrite (after 6, SIX!, refining prompts, where Perplexity stubbornly misaligned the perspective… it needed to remain your perspective, not mine):

Objectives and Observations
I initially reworked the Gramps XML importer for personal exploration, focusing on a dive that was more basic than deep. This was done primarily for myself. My second objective was to share my investigations and thus save others some time, which I achieved by creating experimental add-ons. Now, I am looking into where this work could be useful, and why there is limited or incomplete support for the Gramps XML file format in modern genealogical applications.

Thoughts on Gramps XML Support
It seems straightforward to import and write valid Gramps XML files, whether importing or exporting. Given that Gramps XML is documented and open, I expected it to be easier for applications to at least provide import support. Initially, I thought the issue might be poor documentation, but since I was able to read and parse some data, I wonder if there are no significant difficulties in handling this format.

Seeking Insights
I am curious about why Gramps XML isn’t more widely supported and if there are specific challenges that I haven’t considered. Does anyone have insights into this?


This will not match… I tried something like that with Mistral AI too, and the words or sentence construction can be close in any language. Even in French, such construction is not conventional or academic. There is a mix-up of casual speech and comments, with direct and indirect address. So maybe ask Perplexity to analyse sentence by sentence, or split out all the ideas marked between commas, then join something to get a summary. If there is a “strictness” level in your AI, maybe it needs to move between “Mediterranean flexibility” and the “rigor of Western Europe”…

I know that sentence construction in English can also be very flexible. For the wording, we have dictionaries. There is maybe still a problem with linear formatting (or form). E.g., when we switch from French or German to English, my custom local dictionary breaks! Not “always” in my mind…

Sometimes we need to look back often to see if there is any typo or switched text, because words do not exist in English or are just missing from our “local AI dictionary”. A real problem on a mobile device with web access. Sure, we need a format and a model for a common exchange. Maybe AI needs some alternate “doors” or “ways”, like a fallback, before making too many intrusive changes? Automatic tools vs. formatting tools? The most annoying issues are not always cultural issues. Mobile devices (and OSes), with their automation tools, generate many mistakes while writing.


Yeah, it is time to use good tests and benchmarks for AI…
I got the same “rewrite” with Mistral.


Ultimately, it was simpler to let the data flow freely rather than provide support for all operating systems and platforms… Therefore, additional tests and workarounds for enhanced security focused on desktop environments rather than XML parsing. There might still be an issue with the icon stock, but it should function as intended within a Gramplet, which is a component of Gramps that uses Gtk icons. Initially, I thought the problem was related to a specific GNOME environment. However, since it works under the Pantheon desktop manager, it could also function outside common GNOME desktop environments.