Error when importing a Gramps XML file

I’m trying to import a large Gramps XML file (>3 M lines) and get the following error report:

Error Details: 
===================

8536485: ERROR: dbloader.py: line 569: Failed to import database.
Traceback (most recent call last):
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\gui\dbloader.py", line 558, in do_import
    dbstate=self.dbstate))
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\plugins\importer\importxml.py", line 149, in importData
    info = parser.parse(xml_file, line_cnt, person_cnt)
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\plugins\importer\importxml.py", line 936, in parse
    self.p.ParseFile(ifile)
  File "C:/repo/mingw-w64-python3/src/Python-3.6.4/Modules/pyexpat.c", line 468, in EndElement
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\plugins\importer\importxml.py", line 3146, in endElement
    self.func(''.join(self.tlist))
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\plugins\importer\importxml.py", line 2678, in stop_event
    if self.event.get_description() == "" and \
AttributeError: 'NoneType' object has no attribute 'get_description'
9327775: ERROR: dbloader.py: line 569: Failed to import database.
Traceback (most recent call last):
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\gui\dbloader.py", line 558, in do_import
    dbstate=self.dbstate))
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\plugins\importer\importxml.py", line 149, in importData
    info = parser.parse(xml_file, line_cnt, person_cnt)
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\plugins\importer\importxml.py", line 936, in parse
    self.p.ParseFile(ifile)
  File "C:/repo/mingw-w64-python3/src/Python-3.6.4/Modules/pyexpat.c", line 468, in EndElement
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\plugins\importer\importxml.py", line 3146, in endElement
    self.func(''.join(self.tlist))
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\plugins\importer\importxml.py", line 2678, in stop_event
    if self.event.get_description() == "" and \
AttributeError: 'NoneType' object has no attribute 'get_description'
9698210: ERROR: dbloader.py: line 569: Failed to import database.
Traceback (most recent call last):
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\gui\dbloader.py", line 558, in do_import
    dbstate=self.dbstate))
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\plugins\importer\importxml.py", line 149, in importData
    info = parser.parse(xml_file, line_cnt, person_cnt)
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\plugins\importer\importxml.py", line 936, in parse
    self.p.ParseFile(ifile)
  File "C:/repo/mingw-w64-python3/src/Python-3.6.4/Modules/pyexpat.c", line 468, in EndElement
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\plugins\importer\importxml.py", line 3146, in endElement
    self.func(''.join(self.tlist))
  File "C:\Program Files\GrampsAIO64-5.1.6\gramps\plugins\importer\importxml.py", line 2678, in stop_event
    if self.event.get_description() == "" and \
AttributeError: 'NoneType' object has no attribute 'get_description'

System Information: 
===================

Gramps version: AIO64-5.1.6-1 
Python version: 3.6.4 (default, Jan 23 2018, 13:17:37)  [GCC 7.2.0 64 bit (AMD64)] 
BSDDB version: 6.1.0 (6, 0, 30) 
sqlite version: 3.21.0 (2.6.0) 
LANG: de_DE.UTF-8
OS: Windows

GTK version    : 3.18.9
gobject version: 3.26.1
cairo version  : (1, 16, 1)

As I understand it, Gramps encounters the same problem three times and then gives up. Does anybody know what the problem could be? I’d assume that it is not a malformed XML entry, since Gramps usually detects XML syntax problems and shows a warning with the line number. I can also open the file in Notepad++ and check the XML syntax there without any error message.

Thanks for your help!

Ulrich

This is possibly due to malformed XML. Does it validate against the schema?

How can I check that? As I said, the XML plugin for Notepad++ does not raise an error and the XML Notepad from Microsoft does not either.

https://www.gramps-project.org/wiki/index.php/Gramps_XML#Validating_Gramps_XML_file

Instructions are for Linux. But @UlrichDemlehner is using Gramps 5.1.6 on Windows.

Do you remember what version of Gramps produced the XML file?

I produced the XML file, not Gramps. I’m playing around a lot with the XML file, which of course means that I’m creating a lot of malformed XML. My experience is that Gramps is quite good at identifying those problems and the line where they are located. So my assumption was that I had not run into a malformed XML problem but something else. But of course this assumption may be wrong …

It might be well formed XML but still not valid. That is why we publish both DTD and RelaxNG schemas.

Ok, I see. The XML plugin for Notepad++ supports the validation on Win10/64. This plugin does not find any malformed XML elements. If I try to run the validation, I’m greeted with this message box:

image

I have to confess I’m an XML illiterate, so I have no idea what I should enter here.

This looks like a missing event, and that may be one that couldn’t be found by its handle. And that can mean that it isn’t there, or the handle itself is malformed.

I’m assuming that Gramps expects that all references in an XML file are valid, because normally, they’re made by Gramps, and are always perfect, sort of.

Are you sure? When using an XML file that was made by Gramps, I get the same message box.

Would it be possible that I lost that event some time ago when experimenting with the XML file, and that Gramps never said anything about it when importing an XML file where this event was missing?

We don’t provide an XML Schema (XSD).

The choice is:

  • Document type definition (DTD)
  • RelaxNG (RNG)

It is possible, but in that case I would expect that a Check & Repair would result in a better export. And I have never seen such a thing with a regular export from Gramps.

In practical situations, I can only say that every piece of software is based on assumptions, meaning that we don’t check every object when there are no external influences that can mess them up. Our code would be much larger and more difficult to read and maintain if we made it more paranoid. And that probably means that GEDCOM imports are more robust than XML ones.

What I write here is based on heuristics, and actual files may prove me wrong.

This appears to be a problem for the XML validator. The first lines of the XML file are:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE database PUBLIC "-//Gramps//DTD Gramps XML 1.7.1//EN"
"http://gramps-project.org/xml/1.7.1/grampsxml.dtd">
<database xmlns="http://gramps-project.org/xml/1.7.1/">

When running the XML syntax test it says that DTD is not allowed here and I have to correct this first before running the validation. So I simply delete lines 2 and 3 which obviously then leads to the problem that the validator does not know about the DTD and presents the message box with the two input fields.

I tried entering the URL http://gramps-project.org/xml/1.7.1/grampsxml.dtd into the first input field, “Please select XML schema (XSD)”. That is apparently accepted: I get no error, and after a short time a message box says “No error detected”. So if I believe this, my XML file has been successfully validated against the schema.

Please note: what I described above happened both with “my” XML file and with an XML file made by Gramps. So it appears that both files are ok.

To be honest, I’ve never tried a GEDCOM import that did not lead to major problems. But I have done a lot of XML imports without any problems at all. And when I had problems, Gramps was quite good at identifying them. It was always a malformed XML element, quite easy to spot if you know where to look among 3 million lines. So I allow myself to disagree with you that GEDCOM imports are more robust than XML ones :wink:

But coming back to your arguments: I see your point and agree with the

part of it. It should be quite easy to check if an event is really missing. The XML file has all the events as objects with their respective handles. And it has all the links to events also with their handles. If I compare those two lists from the objects and the event links, a missing event should immediately pop up. Something to do tomorrow …
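The comparison described above could be sketched roughly like this. It is only a minimal sketch, assuming that event objects carry their handle in a `handle` attribute and that event links are `eventref` elements with an `hlink` attribute. The function names `local_name` and `find_dangling_eventrefs` are made up for this example, and the file is assumed to be plain, uncompressed XML.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def find_dangling_eventrefs(source):
    """Return the set of eventref hlink values that match no event handle.

    `source` is a file name or a binary file object. The file is
    streamed element by element, so it does not need to fit in memory
    as one fully populated tree.
    """
    defined = set()      # handles of <event> objects
    referenced = set()   # hlink values of <eventref> links
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "event" and "handle" in elem.attrib:
            defined.add(elem.attrib["handle"])
        elif name == "eventref" and "hlink" in elem.attrib:
            referenced.add(elem.attrib["hlink"])
        # Drop text, attributes and children that are no longer needed.
        elem.clear()
    return referenced - defined
```

Calling `find_dangling_eventrefs("data.gramps")` (the file name is a placeholder) returns an empty set when every link points to an existing event. The order of the sections in the file does not matter, because the two sets are only compared after the whole file has been read.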

The current schema version is 1.7.2, but unfortunately exports created specify 1.7.1. If I try to validate our example file against the 1.7.1 schema I get the following errors:

ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element attribute, got citationref
ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element eventref has extra content: attribute
ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element eventref, got name
ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element lds_ord, got name
ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element person has extra content: name

It does validate against 1.7.2 as expected though.

Does the backup (as opposed to export) create 1.7.2 ?

And as a followup to:

I’ve always had problems with .gpkg archive imports. Even with fewer than a dozen media objects, they are always problematic to import between systems. (Windows to another Windows is enough of a pain with different user names. Windows to Fedora is a nightmare.)

Backups are just exports. It shouldn’t cause a problem for most users though.

I’ll fix the bug and add another unit test to prevent this happening again in the future.

I made the following tests with the XML plugin for Notepad++:

Schema and namespace were provided by means of the message box with the two input fields
image

Each and every combination was ok. I guess this could either mean that the XML plugin is not reliable at least with my XML file, or that everything is really fine and I simply do not have elements in my data that will trigger Nick’s error message when validating against the 1.7.1 schema.

I tested this hypothesis by comparing the handles that appear in eventref links with the handles of event objects. No eventref link points to a nonexistent event object, so this hypothesis can be ruled out.

To test what Gramps would do when an event that an eventref link points to is deleted from the XML file on purpose, I simply deleted such an event from an XML file that had been produced by Gramps and then imported the corrupted file into Gramps. Gramps has no problem at all importing this XML file, but it reports that the XML file was not complete. A note is created pointing to the missing event.

OK, good to know, and thanks for checking this. This does not rule out a missing event, though, because the error message says that some object is supposed to hold an event, from which the code tries to get the description text. While it is quite normal for that text to be empty, it is not normal for the event referenced by that object (self) to be missing. In that situation, where you see the text ‘self.event’ and the error says that the event is None where a Python object would be expected, you need to know what that ‘self’ is.

This is the equivalent of a case where code expects a person to have an embedded name object, which is not there. If that happens, and ‘self’ is a person object, you may see a similar error where ‘self.name’ turns out to be None when the program tries to retrieve the given name, or something like that.
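To make that failure mode concrete, here is a toy reproduction. It is not Gramps code: the classes `Event` and `ToyImporter` are hypothetical stand-ins, and calling the stop handler without a preceding start is just one simple way of leaving `self.event` unset, not a claim about what happens in the real importer.

```python
class Event:
    """Hypothetical stand-in for an event object."""

    def __init__(self):
        self.description = ""

    def get_description(self):
        return self.description


class ToyImporter:
    """Hypothetical importer that keeps the current event in self.event."""

    def __init__(self):
        self.event = None  # nothing is being processed yet

    def start_event(self):
        self.event = Event()

    def stop_event(self):
        # Assumes start_event() ran first; fails if self.event is None.
        description = self.event.get_description()
        self.event = None
        return description


importer = ToyImporter()
importer.start_event()
importer.stop_event()  # fine: an event object was created first

try:
    importer.stop_event()  # self.event is None at this point
except AttributeError as exc:
    message = str(exc)
    print(message)
```

The message names only the missing attribute. It says nothing about which object was being processed, which is why the traceback alone does not point to a place in the file.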

In both situations, the XML probably does not look malformed, in the sense that the brackets are there, and it will most likely have no misspelled tags either. And if the schema doesn’t have a rule that says that a person must have at least one name, the schema validation will not give an error either, so you’re stuck, in the dark.

You’re in the dark here, because the error message doesn’t show anything about the object that was being processed, meaning what that ‘self’ is. You can’t see what type of object it is, and you can’t see its handle either.

When you look at the code, at the line(s) mentioned in the error message, you may see what kind of object it’s about, but since you have thousands of them, that doesn’t help much either, except when you know that you changed one such object.

I think it’s safe to assume that you did not produce a malformed XML in the sense of missing brackets, or misspelled tags, but the error does suggest that you produced a sort of nonsense, in the sense that the semantics aren’t right.

And if you have no recollection of the exact ‘nonsense’ produced, your remedies are either to go back to an older file that did not trigger this error, or to figure out a way to debug this in a way that works for you. That may be by adding print statements to the code that show the object type and handle, or by using a real Python debugger. There are several free ones available on the web, and I use Visual Studio for that.
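The print-statement idea can be sketched outside of Gramps as well. The following is a minimal sketch, not the Gramps importer: `parse_with_context` and its `on_end` callback are hypothetical names. It assumes the interesting objects carry a `handle` attribute, and it uses the expat parser's `CurrentLineNumber` to remember where each element started, so that a failure inside the callback is reported together with the element's tag, handle and line.

```python
import xml.parsers.expat


def parse_with_context(data, on_end):
    """Parse XML bytes and call on_end(tag, attrs) for each closed element.

    If on_end raises, the exception is re-raised with the tag, the
    handle attribute and the line of the element being processed.
    """
    parser = xml.parsers.expat.ParserCreate()
    open_elements = []  # stack of (tag, attrs, line of the start tag)

    def start(tag, attrs):
        open_elements.append((tag, attrs, parser.CurrentLineNumber))

    def end(tag):
        name, attrs, line = open_elements.pop()
        try:
            on_end(name, attrs)
        except Exception as exc:
            raise RuntimeError(
                "<%s handle=%r> opened on line %d: %r"
                % (name, attrs.get("handle"), line, exc)
            ) from exc

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(data, True)
```

The same pattern, catching the exception where the element is closed and adding the handle and line number before re-raising, is what a temporary print statement in the importer would give you.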