I’m making improvements to how Betty loads files/media from Gramps XML files. The DTD documents the mediapath element as optional. If the element is omitted, what is the correct way to resolve relative file/media paths when parsing the XML yourself?
@emyoulation Polite soft ping because I’ve got a Gramps user who ran into a problem with this in Betty
The answer probably is GRAMPSHOME, meaning that all files are loaded as if the base path is something like C:\Users\bartf or /home/bart
Thanks for the reply, Enno!
I should clarify that the solution is limited to using whatever is included in a Gramps export. So if Gramps does not export a mediapath element and that means parsers cannot correctly resolve file paths, I’d say that’s perhaps a Gramps bug, but not something I can handle on Betty’s end.
It is a bit worse than the XML missing a base path reference for the media path.
Another discussion collates a series of issues related to the relative media paths in Gramps.
Part of the problem seems to be related to trying to side-step cross-platform paths.
But a big issue is that importing a .gpkg simply FAILS if a folder already exists in the temporary directory designated for relative paths. There is no option to skip existing files, either blindly or intelligently by comparing checksums. (I have not looked to see if the MD5 checksums are included for the media object XML.)
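To illustrate the "intelligent skip" idea mentioned above, here is a minimal sketch of a checksum comparison. It assumes the checksum attribute on a file element holds an MD5 hex digest, which this thread does not confirm. The helper names are hypothetical and are not part of Gramps or Betty.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Compute the MD5 hex digest of a file without loading it all into memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def can_skip_existing(path: Path, expected_checksum: str) -> bool:
    """Return True when a file already exists at `path` and matches the checksum.

    ASSUMPTION: `expected_checksum` is an MD5 hex digest taken from the XML.
    """
    return path.is_file() and md5_of_file(path) == expected_checksum.lower()
```

An importer could call can_skip_existing before extracting each media file and only overwrite, or report a conflict, when the digests differ.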
I know that Betty uses the .gramps, not the .gpkg. But addressing the bigger problem should probably lead to a solution here too.
Note: another discussion has a SuperTool script to list the Constants (Environment Variables) leveraged by Gramps. They can vary by session, so they may not be stored in the database or exported XML. But they can be useful for troubleshooting behaviors that differ between instances of Gramps.
OK, I get it, and I just checked to see whether this GRAMPSHOME variable actually exists. And on my Linux PCs, the answer is no.
But even without this variable, your software should be able to do a simple test, by checking whether a media file with a relative path can be read by adding the user’s home directory to the path. And that’s something that you should know, or can guess, because you know the OS that Betty runs on, while parsing the file.
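The simple test described above could look roughly like the following sketch. The function name and the list of candidate base directories are illustrative assumptions, not Betty's or Gramps' actual code.

```python
from pathlib import Path
from typing import Iterable, Optional


def probe_media_file(relative_path: str, candidate_bases: Iterable[Path]) -> Optional[Path]:
    """Return the first existing file found by joining each base with the path.

    Returns None when no candidate base directory contains the file.
    """
    for base in candidate_bases:
        candidate = base / relative_path
        if candidate.is_file():
            return candidate
    return None


# Hypothetical usage: try the home directory of the user running the parser.
found = probe_media_file("photos/example.jpg", [Path.home()])
```

A hit only shows that a file with that name exists under the guessed base. It does not prove that the guess matches the base path Gramps used when the paths were recorded.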
Have a look at the expand_media_path utility function.
If the media path is missing we use USER_PICTURES, which is the XDG pictures directory, or the user home directory if that doesn’t exist.
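For an external parser that cannot call Gramps, the fallback order described above might be approximated as in this sketch. It is not Gramps' implementation: the function names are hypothetical, and reading XDG_PICTURES_DIR from the contents of a user-dirs.dirs file is an assumption about how a standalone tool could locate the pictures directory on Linux.

```python
import re
from pathlib import Path
from typing import Optional


def pictures_dir_from_user_dirs(user_dirs_text: str, home: Path) -> Optional[Path]:
    """Extract XDG_PICTURES_DIR from the text of a user-dirs.dirs file, if set."""
    match = re.search(r'^XDG_PICTURES_DIR="(.*)"\s*$', user_dirs_text, re.MULTILINE)
    if match is None:
        return None
    value = match.group(1)
    if value.startswith("$HOME/"):
        # Entries are commonly written relative to $HOME.
        return home / value[len("$HOME/"):]
    return Path(value)


def fallback_media_base(media_path: Optional[str], home: Path, user_dirs_text: str = "") -> Path:
    """Pick a base directory: explicit media path, else pictures dir, else home."""
    if media_path:
        return Path(media_path)
    pictures = pictures_dir_from_user_dirs(user_dirs_text, home)
    if pictures is not None and pictures.is_dir():
        return pictures
    return home
```

This only gives a sensible answer when the parser runs as the same user, on the same machine, as the Gramps instance that produced the export.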
Doesn’t this leverage data that is in the database but not conveyed in the XML?
Betty does not access the database file or the Gramps engine, it is an external application that reads XML and GEDCOM.
An external application that reads XML should add a base directory path consistent with where it expects to find the media. It could choose to install the media anywhere.
But that’s exactly the problem that Bart is having. He cannot determine the base directory path from the XML file.
This is only a potentially resolvable problem for users of Betty running it on the same machine that has Gramps and the media data. (Although I suppose it is possible the users would be on different machines on the same network using a NAS for the media path.)
Can he determine the database name that corresponds to the XML? Then do a Gramps CLI call to extract the Media Path from that database?
The original base media path on the machine that ran the export is not really relevant. All the application needs to know is where the media has been installed on the system reading the XML file.
Even if using our backup format which contains the media, the machine doing the import may use a different base media directory.
In the case of Users generating static sites with Betty instead of the Narrated Website report, Gramps AND Betty are installed on the same machine. They are going to do regular updates with Betty.
So it is relevant.
But the mechanism for conveying data is XML, not the database. Since the media data is accessible from that machine, Bart needs a method for Betty to extract the Media Path.
@bartfeenstra
The solution might not be forthcoming… either by changing the XML to convey the media path or with a CLI/API call to extract that data from the database.
I think you might want to suggest the user utilize the Gramps tool to convert all their Media Objects from relative paths to absolute paths. Then the relative path becomes irrelevant to that exported XML.
If you are running a script on the same machine that ran the XML export, then you can just use the XDG Pictures directory or home directory as appropriate.
That is a workaround that does not retrofit easily. It is not a universal solution.
The universally supportable solution is to tell Betty users to convert to Absolute Paths with the existing Gramps tool.
Absolute paths would work, but the question asked by @bartfeenstra was:
“If the element is omitted, what is the correct way to resolve relative file/media paths when parsing the XML yourself?”
My replies were intended to answer this question.
@bartfeenstra
Do you think it is an opportune time to improve the handshaking between Gramps and Betty?
You could make a Python call to the Gramps Engine API to extract a .gramps XML (with a pre-defined filter) that overwrites a file at a pre-defined location, and also write a .txt file with the media path. Then Betty could grab the XML and text file and do its update.
This would significantly simplify the workflow for Betty Users.
Another option might be to adapt the built-in .gramps Exporter plugin (exportxml.py) to make an optimized “Betty dialect” XML addon Exporter plug-in that includes the Relative Media Path in the header. Or its Media object writer could convert Media paths to absolute on the fly.
You could also make it exclude any XML data chunks that Betty does not support. Which would be a faster export and less resource-intensive for Betty to process.
There is nothing to resolve.
Gramps imports any object files as found in the XML file.
<objects>
<object handle="" change="" id="">
<file src="" mime="" checksum="" description=""/>
</object>
</objects>
and IF it exists, Gramps imports
<mediapath></mediapath>
Any problems that arise after an import are for the user (often with help of other users here) to resolve.
If Betty NEEDS a media path setting, then if it encounters an XML file without the <mediapath></mediapath> line, it halts the process and alerts your user to the problem.
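A minimal sketch of that halt-and-alert check, using only the Python standard library, might look like this. The element and attribute names (mediapath, object, file, src) come from the snippet above. The error class and function names are hypothetical and are not Betty's actual API. The check matches on local element names so that it works regardless of which XML namespace the export declares.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath, PureWindowsPath


class MissingMediaPathError(ValueError):
    """Hypothetical error type for illustration; not part of Betty or Gramps."""


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{uri}file' becomes 'file'."""
    return tag.rsplit("}", 1)[-1]


def is_absolute(src: str) -> bool:
    """Treat a path as absolute if either POSIX or Windows rules say so."""
    return PurePosixPath(src).is_absolute() or PureWindowsPath(src).is_absolute()


def check_media_paths(xml_text: str) -> None:
    """Raise if the XML has relative media paths but no usable mediapath element."""
    root = ET.fromstring(xml_text)
    media_path = None
    relative_sources = []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "mediapath" and element.text and element.text.strip():
            media_path = element.text.strip()
        elif name == "object":
            for child in element:
                src = child.get("src", "")
                if local_name(child.tag) == "file" and src and not is_absolute(src):
                    relative_sources.append(src)
    if media_path is None and relative_sources:
        raise MissingMediaPathError(
            f"{len(relative_sources)} media file(s) use relative paths, such as "
            f"{relative_sources[0]!r}, but the export contains no mediapath element. "
            "Set a media path in Gramps or convert media paths to absolute, then re-export."
        )
```

Exports that contain only absolute paths pass the check even without a mediapath element, so users who have already converted their media paths are not blocked.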
Thanks! That’s helpful if I want to go with a best-effort solution.
Betty currently parses any of Gramps’ own export formats, including *.gramps and *.gpkg. However, this may or may not happen on a machine where Gramps is available. Either way, Gramps was only recently made available as a pip-installable package on PyPI, and Betty does not integrate with any Gramps APIs directly, and will not do so for the purpose of fixing this media path import problem right now due to where it is in its release cycle. Gramps API integration is planned for future releases, but that will require some work.
TL;DR: All we’ve got to work with is whatever is contained in the *.gramps or *.gpkg file Betty tries to load.
It appears that this is not doable in a reliable fashion, with the current state of Gramps and Betty. What @DaveSch proposes is what I was thinking as a fallback solution, and it seems that at this moment, simply providing a helpful error upon a missing media path + relative file path situation is the best I can deliver right now.
@emyoulation I do wish to eventually integrate with Gramps’ Python APIs directly. It’s been lower on the priority list (although moving upwards rapidly), but practical challenges are learning Gramps’ APIs and ensuring that what Betty needs is in fact possible in such a way that it’s worth the effort. One thing that would be incredibly helpful is the introduction of type hints to the Gramps code base.