… this proposal (AI script) needs some ‘human local’ adjustements (fix)…
import xml.etree.ElementTree as ET
import json
# Charger le fichier XML
with open('input.xml', 'r', encoding='utf-8') as file:
xml_data = file.read()
# Analyser les données XML
root = ET.fromstring(xml_data)
# Extraire les événements et convertir en JSONL
jsonl_lines = []
for event in root.findall('.//{http://gramps-project.org/xml/1.7.2/}event'):
type_elem = event.find('{http://gramps-project.org/xml/1.7.2/}type')
dateval_elem = event.find('{http://gramps-project.org/xml/1.7.2/}dateval')
place_elem = event.find('{http://gramps-project.org/xml/1.7.2/}place')
description_elem = event.find('{http://gramps-project.org/xml/1.7.2/}description')
event_data = {
"handle": event.get('handle'),
"change": event.get('change'),
"id": event.get('id'),
"type": type_elem.text if type_elem is not None else None,
"dateval": dateval_elem.get('val') if dateval_elem is not None else None,
"place": place_elem.get('hlink') if place_elem is not None else None,
"description": description_elem.text if description_elem is not None else None
}
jsonl_lines.append(json.dumps(event_data))
# Joindre tous les objets JSON en une seule chaîne avec des séparateurs de nouvelle ligne
jsonl_output = "\n".join(jsonl_lines)
# Sauvegarder le résultat dans un fichier JSONL
with open('output.jsonl', 'w', encoding='utf-8') as file:
file.write(jsonl_output)
print("Conversion terminée. Le fichier JSONL a été enregistré sous 'output.jsonl'.")
to be continued…
1 validation error for FinetuningMessages messages Input should be a valid list [type=list_type, input_value={'handle': '_a5af0eb66701... Warner, Sarah Suzanne'}, input_type=dict] For further information visit https://errors.pydantic.dev/2.9/v/list_type (line 1)
this can take a while since… I find the expected file format! Anyway, it is still unclear for me (see also dataset).
Is there any “limit” for the number of characters before the “end of line” (CPL) on JSONL file format? 112 characters per line?
I am not certain of the “right” JSONL file format! So, I made some experimentations with a basic custom XML file format
, after parsing the content of a Gramps XML
file (example.gramps
).
The custom XML file
is for the POC. Few lines for a very basic transformer/exporter (new JSONL file generation and print into GtkTextBuffer on gramplet). Anyway, I suppose that I should be able to generate more templates with this “old” method, but still active with few ressources and time process.