Note: you will have to comment out or disable all the other lines that call self.text.set_text(), before and after this one, because the run is so fast that 'pretty printing' the XML content should happen at the end of the run…
>>> from pathlib import Path
>>> from xml.etree import ElementTree
>>> tree = ElementTree.parse(Path('filename'))  # sample with example.gramps
>>> root = tree.getroot()
>>> NAMESPACE = '{http://gramps-project.org/xml/1.7.2/}'
>>> print(root.find(NAMESPACE + 'header'))
>>> print(ElementTree.tostring(root.find(NAMESPACE + 'header')))
>>> for nf in root.find(NAMESPACE + 'name-formats'):
...     print(ElementTree.tostring(nf))
will return something like this:
<Element '{http://gramps-project.org/xml/1.7.2/}header' at 0x7f3d9a73a278>
b'<ns0:header xmlns:ns0="http://gramps-project.org/xml/1.7.2/">\n <ns0:created date="2025-03-18" version="6.0.0" />\n <ns0:researcher>\n <ns0:resname>Alex Roitman,,,</ns0:resname>\n </ns0:researcher>\n <ns0:mediapath>{GRAMPS_RESOURCES}/example/gramps</ns0:mediapath>\n </ns0:header>\n '
b'<ns0:format xmlns:ns0="http://gramps-project.org/xml/1.7.2/" active="1" fmt_str="SURNAME, given (common)" name="SURNAME, Given (Common)" number="-1" />\n '
...
Calling the Element for the matching tag gives us its child hierarchy and content, and so on for the other tags.
There are just some minor differences in behavior and XPath handling between ElementTree and the lxml API. The same goes for the counting, depending on their versions.
Also, matching on XML attributes does not always make sense, because the parent tag/object keeps them as items (an Element in the tree). That is also why I tried to be as direct as possible. find() matches the first tag in the hierarchy, which works fine for primary objects like Events, People, etc. findall() was used for counting, and could also serve for a specific iteration or pseudo-mapping. I did not look at other functions, as I am not using the most recent Python versions (3.10 or 3.12).
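To make the find() / findall() difference concrete, here is a minimal sketch that parses a tiny inline document instead of a real file. The sample XML, its handles and the helper name count_children are made up for illustration; only the namespace string comes from the session above.

```python
from xml.etree import ElementTree

NAMESPACE = '{http://gramps-project.org/xml/1.7.2/}'

# Hypothetical, heavily trimmed sample; a real Gramps XML file has many more sections.
SAMPLE = """<database xmlns="http://gramps-project.org/xml/1.7.2/">
  <events>
    <event handle="_e1" id="E0000"><type>Birth</type></event>
    <event handle="_e2" id="E0001"><type>Death</type></event>
  </events>
  <people>
    <person handle="_p1" id="I0000"><gender>F</gender></person>
  </people>
</database>"""

root = ElementTree.fromstring(SAMPLE)

# find() returns only the first matching direct child (or None if there is none).
events = root.find(NAMESPACE + 'events')
first_event = events.find(NAMESPACE + 'event')

# Attributes are stored on the element itself and read with get() or .attrib.
first_id = first_event.get('id')


def count_children(root, container, tag):
    """Count the `tag` children under the top-level `container` element."""
    parent = root.find(NAMESPACE + container)
    if parent is None:  # compare with None: an Element without children is falsy
        return 0
    # findall() returns every matching direct child as a list.
    return len(parent.findall(NAMESPACE + tag))


print(first_id,
      count_children(root, 'events', 'event'),
      count_children(root, 'people', 'person'))
```

Testing the result of find() against None matters here, because an Element without children evaluates as false.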
>>> for one in root:
...     print(one)
<Element '{http://gramps-project.org/xml/1.7.2/}header' at 0x7f6e66cad278>
<Element '{http://gramps-project.org/xml/1.7.2/}name-formats' at 0x7f6e6cb5f188>
<Element '{http://gramps-project.org/xml/1.7.2/}tags' at 0x7f6e6cb5f2c8>
<Element '{http://gramps-project.org/xml/1.7.2/}events' at 0x7f6e6cb5f458>
<Element '{http://gramps-project.org/xml/1.7.2/}people' at 0x7f6e651e9cc8>
<Element '{http://gramps-project.org/xml/1.7.2/}families' at 0x7f6e5fbe3f48>
<Element '{http://gramps-project.org/xml/1.7.2/}citations' at 0x7f6e5f8787c8>
<Element '{http://gramps-project.org/xml/1.7.2/}sources' at 0x7f6e5eb90868>
<Element '{http://gramps-project.org/xml/1.7.2/}places' at 0x7f6e5eb963b8>
<Element '{http://gramps-project.org/xml/1.7.2/}objects' at 0x7f6e5e84f368>
<Element '{http://gramps-project.org/xml/1.7.2/}repositories' at 0x7f6e5e84fb38>
<Element '{http://gramps-project.org/xml/1.7.2/}notes' at 0x7f6e5e8556d8>
<Element '{http://gramps-project.org/xml/1.7.2/}bookmarks' at 0x7f6e5e8705e8>
<Element '{http://gramps-project.org/xml/1.7.2/}namemaps' at 0x7f6e5e870818>
for one in root:
    print(one)
    for two in one:
        print(two)
        for three in two:
            print(three)
<Element '{http://gramps-project.org/xml/1.7.2/}header' at 0x7feaa1100278>
<Element '{http://gramps-project.org/xml/1.7.2/}created' at 0x7feaa1100a98>
<Element '{http://gramps-project.org/xml/1.7.2/}researcher' at 0x7feaa1100408>
<Element '{http://gramps-project.org/xml/1.7.2/}resname' at 0x7feaa2f25e08>
<Element '{http://gramps-project.org/xml/1.7.2/}mediapath' at 0x7feaa2f25f48>
<Element '{http://gramps-project.org/xml/1.7.2/}name-formats' at 0x7feaa2f25ef8>
<Element '{http://gramps-project.org/xml/1.7.2/}format' at 0x7feaa2f25e58>
<Element '{http://gramps-project.org/xml/1.7.2/}tags' at 0x7feaa2f25d18>
<Element '{http://gramps-project.org/xml/1.7.2/}tag' at 0x7feaa2f25b38>
<Element '{http://gramps-project.org/xml/1.7.2/}tag' at 0x7feaa2f25cc8>
<Element '{http://gramps-project.org/xml/1.7.2/}events' at 0x7feaa2f25c28>
<Element '{http://gramps-project.org/xml/1.7.2/}event' at 0x7feaa2f25b88>
<Element '{http://gramps-project.org/xml/1.7.2/}type' at 0x7feaa2f259f8>
<Element '{http://gramps-project.org/xml/1.7.2/}dateval' at 0x7feaa2f258b8>
<Element '{http://gramps-project.org/xml/1.7.2/}place' at 0x7feaa2f256d8>
<Element '{http://gramps-project.org/xml/1.7.2/}description' at 0x7feaa2f25868>
<Element '{http://gramps-project.org/xml/1.7.2/}event' at 0x7feaa2f257c8>
<Element '{http://gramps-project.org/xml/1.7.2/}type' at 0x7feaa2f25778>
<Element '{http://gramps-project.org/xml/1.7.2/}description' at 0x7feaa2f25728>
<Element '{http://gramps-project.org/xml/1.7.2/}event' at 0x7feaa2f255e8>
<Element '{http://gramps-project.org/xml/1.7.2/}type' at 0x7feaa2f25638>
<Element '{http://gramps-project.org/xml/1.7.2/}dateval' at 0x7feaa2f252c8>
<Element '{http://gramps-project.org/xml/1.7.2/}place' at 0x7feaa2f25318>
<Element '{http://gramps-project.org/xml/1.7.2/}description' at 0x7feaa2f25368>
<Element '{http://gramps-project.org/xml/1.7.2/}event' at 0x7feaa2f254a8>
<Element '{http://gramps-project.org/xml/1.7.2/}type' at 0x7feaa2f25458>
etc.
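The three nested loops above could also be written as one small recursive generator. This is only a sketch with the standard library: the name walk, the max_depth parameter and the inline sample are my own assumptions, not something taken from the gramplet.

```python
from xml.etree import ElementTree


def walk(element, depth=0, max_depth=3):
    """Yield (depth, local tag name) pairs, down to max_depth levels below the start."""
    # Drop the '{namespace}' prefix so that the printed lines stay short.
    local = element.tag.rsplit('}', 1)[-1]
    yield depth, local
    if depth < max_depth:
        for child in element:
            yield from walk(child, depth + 1, max_depth)


# Hypothetical, trimmed sample standing in for a real Gramps XML file.
SAMPLE = """<database xmlns="http://gramps-project.org/xml/1.7.2/">
  <header>
    <created date="2025-03-18" version="6.0.0"/>
    <researcher><resname>Someone</resname></researcher>
  </header>
  <events>
    <event><type>Birth</type></event>
  </events>
</database>"""

root = ElementTree.fromstring(SAMPLE)
for depth, tag in walk(root):
    print('  ' * depth + tag)
```

With max_depth=3 and the root as starting point, this visits the same levels as the "one", "two" and "three" loops, plus the root element itself.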
Sorry, there was an old remaining issue with whitespace in some common default paths, like Program Files (Windows OS), Application Support (Mac OS), Google Drive, etc., and with gzip(p)ed Gramps XML files.
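For what it is worth, here is one way to handle both cases with the standard library only. This is a sketch and not the actual fix in the gramplet: the function name parse_gramps_xml and the test file names are hypothetical. It checks the two gzip magic bytes and passes a pathlib.Path around, so a folder name with a space is never split.

```python
import gzip
import tempfile
from pathlib import Path
from xml.etree import ElementTree

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of any gzip stream


def parse_gramps_xml(path):
    """Parse an XML file, whether it is gzip-compressed or plain text."""
    path = Path(path)
    with path.open('rb') as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    with opener(path, 'rb') as stream:
        return ElementTree.parse(stream)


# Small self-test in a temporary folder whose name contains a space.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / 'Program Files'
    folder.mkdir()
    xml = b'<database xmlns="http://gramps-project.org/xml/1.7.2/"><header/></database>'
    plain = folder / 'plain example.gramps'
    packed = folder / 'packed example.gramps'
    plain.write_bytes(xml)
    packed.write_bytes(gzip.compress(xml))
    tags = [parse_gramps_xml(p).getroot().tag for p in (plain, packed)]

print(tags)
```

Sniffing the first bytes avoids relying on the file extension, since both kinds of file can carry the .gramps extension.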
Feel free to improve or modify the etree gramplet. The “playground” used for the above testing is in the def parse_xml() section. Fewer than ten lines in a Python console should provide the expected information too.
for three in two:
    print(three, three.tag, three.tail, three.attrib, three.items())
will return:
…
{'hlink': '_a5af0ebb9cb14a540b8', 'role': 'Primary'} [('hlink', '_a5af0ebb9cb14a540b8'), ('role', 'Primary')]
<Element '{http://gramps-project.org/xml/1.7.2/}childof' at 0x7f6483f7fd68> {http://gramps-project.org/xml/1.7.2/}childof
{'hlink': '_PQLKQCZXJL39KAJ927'} [('hlink', '_PQLKQCZXJL39KAJ927')]
<Element '{http://gramps-project.org/xml/1.7.2/}citationref' at 0x7f6483f7fdb8> {http://gramps-project.org/xml/1.7.2/}citationref
{'hlink': '_c140d26615d00343a2a'} [('hlink', '_c140d26615d00343a2a')]
<Element '{http://gramps-project.org/xml/1.7.2/}first' at 0x7f6483f7fc78> {http://gramps-project.org/xml/1.7.2/}first
{} []
<Element '{http://gramps-project.org/xml/1.7.2/}surname' at 0x7f6483f7fcc8> {http://gramps-project.org/xml/1.7.2/}surname
{} []
<Element '{http://gramps-project.org/xml/1.7.2/}gender' at 0x7f6483f7fe58> {http://gramps-project.org/xml/1.7.2/}gender
{} []
<Element '{http://gramps-project.org/xml/1.7.2/}name' at 0x7f6483f7fea8> {http://gramps-project.org/xml/1.7.2/}name
{'type': 'Birth Name'} [('type', 'Birth Name')]
<Element '{http://gramps-project.org/xml/1.7.2/}eventref' at 0x7f6483f7ff98> {http://gramps-project.org/xml/1.7.2/}eventref
...
{'hlink': '_a5af0ed68655861efbe', 'role': 'Family'} [('hlink', '_a5af0ed68655861efbe'), ('role', 'Family')]
<Element '{http://gramps-project.org/xml/1.7.2/}childref' at 0x7f6483899db8> {http://gramps-project.org/xml/1.7.2/}childref
{'hlink': '_ID6KQC0QKF8901H8ZG'} [('hlink', '_ID6KQC0QKF8901H8ZG')]
<Element '{http://gramps-project.org/xml/1.7.2/}childref' at 0x7f6483899e08> {http://gramps-project.org/xml/1.7.2/}childref
{'hlink': '_3LWKQCO1STR7E2WKB5'} [('hlink', '_3LWKQCO1STR7E2WKB5')]
<Element '{http://gramps-project.org/xml/1.7.2/}childref' at 0x7f6483899e58> {http://gramps-project.org/xml/1.7.2/}childref
{'hlink': '_PLWKQCF4RWXWG1G60A'} [('hlink', '_PLWKQCF4RWXWG1G60A')]
<Element '{http://gramps-project.org/xml/1.7.2/}childref' at 0x7f6483899ea8> {http://gramps-project.org/xml/1.7.2/}childref
{'hlink': '_EMWKQC03WYSNOW7OS2'} [('hlink', '_EMWKQC03WYSNOW7OS2')]
<Element '{http://gramps-project.org/xml/1.7.2/}childref' at 0x7f6483899ef8> {http://gramps-project.org/xml/1.7.2/}childref
{'hlink': '_PNWKQC1MHXVPWXURT3'} [('hlink', '_PNWKQC1MHXVPWXURT3')]
<Element '{http://gramps-project.org/xml/1.7.2/}childref' at 0x7f6483899f48> {http://gramps-project.org/xml/1.7.2/}childref
{'hlink': '_NUWKQCO7TVAOH0CHLV'} [('hlink', '_NUWKQCO7TVAOH0CHLV')]
<Element '{http://gramps-project.org/xml/1.7.2/}childref' at 0x7f6483899f98> {http://gramps-project.org/xml/1.7.2/}childref
{'hlink': '_AVWKQCFEVZ1VAPVY8O'} [('hlink', '_AVWKQCFEVZ1VAPVY8O')]
<Element '{http://gramps-project.org/xml/1.7.2/}childref' at 0x7f64838a0048> {http://gramps-project.org/xml/1.7.2/}childref
{'hlink': '_3XWKQCDDBNSGVE84ET'} [('hlink', '_3XWKQCDDBNSGVE84ET')]
<Element '{http://gramps-project.org/xml/1.7.2/}childref' at 0x7f64838a0098> {http://gramps-project.org/xml/1.7.2/}childref
{'hlink': '_SXWKQCHK1ZFY3K3U27'} [('hlink', '_SXWKQCHK1ZFY3K3U27')]
<Element '{http://gramps-project.org/xml/1.7.2/}citationref' at 0x7f64838a00e8> {http://gramps-project.org/xml/1.7.2/}citationref
{'hlink': '_c140d28db3d175718fb'} [('hlink', '_c140d28db3d175718fb')]
<Element '{http://gramps-project.org/xml/1.7.2/}rel' at 0x7f64838a0188> {http://gramps-project.org/xml/1.7.2/}rel
{'type': 'Married'} [('type', 'Married')]
<Element '{http://gramps-project.org/xml/1.7.2/}father' at 0x7f64838a01d8> {http://gramps-project.org/xml/1.7.2/}father
{'hlink': '_ZDPKQCR0W4EC0JYQ0H'} [('hlink', '_ZDPKQCR0W4EC0JYQ0H')]
<Element '{http://gramps-project.org/xml/1.7.2/}mother' at 0x7f64838a0228> {http://gramps-project.org/xml/1.7.2/}mother
{'hlink': '_HDPKQCVUZ1TN61K6DS'} [('hlink', '_HDPKQCVUZ1TN61K6DS')]
<Element '{http://gramps-project.org/xml/1.7.2/}eventref' at 0x7f64838a0278> {http://gramps-project.org/xml/1.7.2/}eventref
...
{'hlink': '_c9658726f7b5d7a6086246c1242'} [('hlink', '_c9658726f7b5d7a6086246c1242')]
<Element '{http://gramps-project.org/xml/1.7.2/}ptitle' at 0x7f6482158408> {http://gramps-project.org/xml/1.7.2/}ptitle
{} []
<Element '{http://gramps-project.org/xml/1.7.2/}pname' at 0x7f6482158458> {http://gramps-project.org/xml/1.7.2/}pname
{'value': 'Bennington'} [('value', 'Bennington')]
<Element '{http://gramps-project.org/xml/1.7.2/}placeref' at 0x7f64821584f8> {http://gramps-project.org/xml/1.7.2/}placeref
{'hlink': '_c96587264e44365e02812c02bbe'} [('hlink', '_c96587264e44365e02812c02bbe')]
<Element '{http://gramps-project.org/xml/1.7.2/}ptitle' at 0x7f6482158598> {http://gramps-project.org/xml/1.7.2/}ptitle
{} []
<Element '{http://gramps-project.org/xml/1.7.2/}pname' at 0x7f64821585e8> {http://gramps-project.org/xml/1.7.2/}pname
{'value': 'Shawnee'} [('value', 'Shawnee')]
<Element '{http://gramps-project.org/xml/1.7.2/}placeref' at 0x7f6482158688> {http://gramps-project.org/xml/1.7.2/}placeref
...
<Element '{http://gramps-project.org/xml/1.7.2/}bookmark' at 0x7f6481fee688> {http://gramps-project.org/xml/1.7.2/}bookmark
{'target': 'person', 'hlink': '_AWFKQCJELLUWDY2PD3'} [('target', 'person'), ('hlink', '_AWFKQCJELLUWDY2PD3')]
<Element '{http://gramps-project.org/xml/1.7.2/}bookmark' at 0x7f6481fee6d8> {http://gramps-project.org/xml/1.7.2/}bookmark
{'target': 'person', 'hlink': '_35WJQC1B7T7NPV8OLV'} [('target', 'person'), ('hlink', '_35WJQC1B7T7NPV8OLV')]
<Element '{http://gramps-project.org/xml/1.7.2/}bookmark' at 0x7f6481fee728> {http://gramps-project.org/xml/1.7.2/}bookmark
{'target': 'person', 'hlink': '_Q8HKQC3VMRM1M6M7ES'} [('target', 'person'), ('hlink', '_Q8HKQC3VMRM1M6M7ES')]
<Element '{http://gramps-project.org/xml/1.7.2/}bookmark' at 0x7f6481fee778> {http://gramps-project.org/xml/1.7.2/}bookmark
{'target': 'family', 'hlink': '_9OUJQCBOHW9UEK9CNV'} [('target', 'family'), ('hlink', '_9OUJQCBOHW9UEK9CNV')]
[]
<Element '{http://gramps-project.org/xml/1.7.2/}map' at 0x7f6481fee8b8> {http://gramps-project.org/xml/1.7.2/}map
{'type': 'group_as', 'key': 'Fernández', 'value': 'Fernandez'} [('type', 'group_as'), ('key', 'Fernández'), ('value', 'Fernandez')]
etc.
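Since most of these attributes are hlink references, a possible next step is to index the elements by their handle attribute and look the references up. This is only a sketch: the inline sample, its handles and the helper name resolve are made up, and I am assuming that an hlink value always equals the handle of its target.

```python
from xml.etree import ElementTree

NAMESPACE = '{http://gramps-project.org/xml/1.7.2/}'

# Hypothetical sample: one event and one person referencing it.
SAMPLE = """<database xmlns="http://gramps-project.org/xml/1.7.2/">
  <events>
    <event handle="_e1" id="E0000"><type>Birth</type></event>
  </events>
  <people>
    <person handle="_p1" id="I0000">
      <eventref hlink="_e1" role="Primary"/>
    </person>
  </people>
</database>"""

root = ElementTree.fromstring(SAMPLE)

# Index every element that carries a 'handle' attribute.
by_handle = {el.get('handle'): el
             for el in root.iter()
             if el.get('handle') is not None}


def resolve(ref):
    """Return the element targeted by the hlink attribute of ref, or None."""
    return by_handle.get(ref.get('hlink'))


person = root.find(f'{NAMESPACE}people/{NAMESPACE}person')
for ref in person.findall(NAMESPACE + 'eventref'):
    target = resolve(ref)
    print(ref.attrib, '->', target.get('id'), target.findtext(NAMESPACE + 'type'))
```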
UPDATE
I enabled some options (Boolean and String). This could also be more flexible for debugging or for any print statement. One option dumps the file and prints something like this:
{http://gramps-project.org/xml/1.7.2/}database = None [ObjectifiedElement]
{http://gramps-project.org/xml/1.7.2/}header = None [ObjectifiedElement]
{http://gramps-project.org/xml/1.7.2/}created = '' [StringElement]
* date = '2025-03-18'
* version = '6.0.0'
{http://gramps-project.org/xml/1.7.2/}researcher = None [ObjectifiedElement]
{http://gramps-project.org/xml/1.7.2/}resname = 'Alex Roitman,,,' [StringElement]
{http://gramps-project.org/xml/1.7.2/}mediapath = '{GRAMPS_RESOURCES}/example/gramps' [StringElement]
{http://gramps-project.org/xml/1.7.2/}name-formats = None [ObjectifiedElement]
{http://gramps-project.org/xml/1.7.2/}format = '' [StringElement]
* number = '-1'
* name = 'SURNAME, Given (Common)'
* fmt_str = 'SURNAME, given (common)'
* active = '1'
{http://gramps-project.org/xml/1.7.2/}tags = None [ObjectifiedElement]
{http://gramps-project.org/xml/1.7.2/}tag = '' [StringElement]
* handle = '_bb80c229eef1ee1a3ec'
* change = '1288512479'
* name = 'complete'
* color = '#076780873bf0'
* priority = '1'
{http://gramps-project.org/xml/1.7.2/}tag = '' [StringElement]
* handle = '_bb80c2b235b0a1b3f49'
* change = '1288512442'
* name = 'ToDo'
* color = '#efb60c280c28'
* priority = '0'
{http://gramps-project.org/xml/1.7.2/}events = None [ObjectifiedElement]
{http://gramps-project.org/xml/1.7.2/}event = None [ObjectifiedElement]
* handle = '_a5af0eb667015e355db'
* change = '1284030602'
* id = 'E0000'
{http://gramps-project.org/xml/1.7.2/}type = 'Birth' [StringElement]
{http://gramps-project.org/xml/1.7.2/}dateval = '' [StringElement]
* val = '1987-08-29'
{http://gramps-project.org/xml/1.7.2/}place = '' [StringElement]
* hlink = '_08TJQCCFIX31BXPNXN'
{http://gramps-project.org/xml/1.7.2/}description = 'Birth of Warner, Sarah Suzanne' [StringElement]
{http://gramps-project.org/xml/1.7.2/}event = None [ObjectifiedElement]
* handle = '_a5af0eb696917232725'
* change = '1284030602'
* id = 'E0001'
{http://gramps-project.org/xml/1.7.2/}type = 'LVG' [StringElement]
{http://gramps-project.org/xml/1.7.2/}description = 'Custom FTW5 tag to specify LIVING not specified in GEDCOM 5.5' [StringElement]
{http://gramps-project.org/xml/1.7.2/}event = None [ObjectifiedElement]
* handle = '_a5af0eb698f29568502'
* change = '1284030602'
* id = 'E0002'
{http://gramps-project.org/xml/1.7.2/}type = 'Birth' [StringElement]
{http://gramps-project.org/xml/1.7.2/}dateval = '' [StringElement]
* val = '1928-07-09'
{http://gramps-project.org/xml/1.7.2/}place = '' [StringElement]
* hlink = '_1GTJQCCXZ3YO5QOFS'
{http://gramps-project.org/xml/1.7.2/}description = 'Birth of Garner, Howard Lane' [StringElement]
{http://gramps-project.org/xml/1.7.2/}event = None [ObjectifiedElement]
* handle = '_a5af0eb69b82a6cdc5a'
* change = '1284030612'
* id = 'E0003'
{http://gramps-project.org/xml/1.7.2/}type = 'Birth' [StringElement]
{http://gramps-project.org/xml/1.7.2/}place = '' [StringElement]
* hlink = '_Q8VJQCBTTFJ6B54QBI'
{http://gramps-project.org/xml/1.7.2/}description = 'Birth of Schultz, John' [StringElement]
There is also a recover option, which should (in theory) point to a possible parsing issue by stopping the stream. I made a quick test with a malformed file using the .gramps extension and the correct DTD version:
test.xml:291: parser error : Opening and ending tag mismatch: event line 286 and evant
</evant>
but it is xmllint stuff, so I am not certain that this will work fine under Windows (AIO) or the Mac OS bundle, within an lxml object/element. The standalone xmllint command is OK. A 3rd-party lib like lxml might have different support depending on the OS.
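Where neither xmllint nor lxml is available, the standard library can at least report where a malformed file breaks. This is not the recover option itself, just a sketch with xml.etree: the function name check_well_formed and the broken sample are mine.

```python
from xml.etree import ElementTree


def check_well_formed(text):
    """Return None if text parses, else a (line, column, message) tuple."""
    try:
        ElementTree.fromstring(text)
    except ElementTree.ParseError as err:
        # ParseError carries the failing position as a (line, column) tuple.
        line, column = err.position
        return line, column, str(err)
    return None


# Same kind of typo as in the test above: <event> closed by </evant>.
BROKEN = "<events>\n  <event>\n    <type>Birth</type>\n  </evant>\n</events>"

print(check_well_formed(BROKEN))
print(check_well_formed("<events/>"))
```

Unlike a recovering parser, this stops at the first error and does not try to build a partial tree.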
debug_xml will also look at the xmltodict 3rd-party lib (if it exists) and at some lxml options/functions. I just grouped some command lines or tools for generating print or logging statements. These options could make debugging less confusing.
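The "if exists" part can be done without importing anything up front. A minimal sketch, assuming the check only needs to know whether a module is installed; the names optional_tools and report are hypothetical and do not come from the gramplet:

```python
import importlib.util
import logging

LOG = logging.getLogger('debug_xml_sketch')  # hypothetical logger name


def optional_tools(names=('lxml', 'xmltodict')):
    """Map each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report(names=('lxml', 'xmltodict')):
    """Log which optional tools are present and return the mapping."""
    found = optional_tools(names)
    for name, present in found.items():
        LOG.info('%s: %s', name, 'available' if present else 'missing, skipped')
    return found


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    report()
```

This way a missing 3rd-party lib only produces a log line instead of an ImportError.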