Today, I met a fellow genealogist online who wanted to download pictures from his tree on Geneanet. With some testing, I found out that when you download a GEDCOM file from the site, as exported by GeneWeb, you will get nice paths like this one:
And as you can see, Discourse immediately uses that path to retrieve the picture from my tree.
Gramps, however, just warns me that I need to be connected to the internet, which I obviously am, because I wouldn’t be able to type this otherwise, and then tells me that there were errors. And when I run Gramps in a terminal, there are no clues in the logs either, so I’m stuck.
This suggests that although I am connected to the internet, by telepathy, I assume, and my PC is too, Gramps is not convinced that we are, with we meaning me and my PC. And I have no idea where I can tell Gramps about the username and password needed for Geneanet, which are obviously not needed anyway, because Discourse CAN retrieve the picture without problems.
That’s interesting, but the existence of the Media Add-on suggests that it should work with the current code too. And that’s because the GEDCOM importer simply stores whatever it finds after the FILE tag, so there is no need to add such support on the import side. The picture shown above is the result of such an import, meaning that I pasted its path from the imported media, not from the GEDCOM file.
Disclaimer: I haven’t looked at the code, because I hope that this fellow genealogist can use a release version of Gramps to get things done.
There is a Download Media add-on tool to harvest URL-linked media objects from the net.
Since thumbnail generation & caching is a Gramps function (as opposed to a native OS function), it is likely that Gramps only generates thumbnails for local files. The extra Internet handling (such as recognizing that files are dangerous file types or too large for the connection type) may have been deemed out of scope by the developer of the thumbnail generator.
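To make the local-versus-remote distinction concrete, here is a minimal sketch of how a tool could tell a web URL from a local file path before deciding whether to build a thumbnail. The helper name `is_remote_media` is hypothetical and is not taken from the Gramps source; it only checks the URL scheme.

```python
from urllib.parse import urlparse


def is_remote_media(path: str) -> bool:
    """Return True if a media path looks like a web URL rather than a local file.

    Hypothetical helper for illustration; Gramps' own check may differ.
    """
    # urlparse lower-cases the scheme; a Windows drive letter like "C:" becomes "c".
    scheme = urlparse(path).scheme
    return scheme in ("http", "https")


print(is_remote_media("http://gw.geneanet.org/public/img/media/deposits/8c/61/50211503/medium.jpg?t=1688289897"))  # True
print(is_remote_media("/home/user/Pictures/grandma.jpg"))  # False
print(is_remote_media("C:\\Pictures\\grandma.jpg"))  # False
```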
That is the tool that I was referring to, and I’m looking at the code right now. It has provisions for username and password entry, which have been commented out, and which are not necessary, since most pictures are public anyway, like the one above.
But looking at the code, it seems that it’s making some assumptions about the URL that are not met on Geneanet, like the files having unique names. They’re all called medium.jpg, so that part will need tweaking, because Gramps wants to save them all in a single folder, and the user has 4000 pictures in his tree. Geneanet has unique paths for all the millions, or billions, of media files that they host, so I will probably need to replace some slashes with dots or underscores to make the names unique.
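The name clash described above is easy to demonstrate: take the last path segment of each URL and count duplicates. This is a minimal sketch; the first URL is the one from my own tree quoted further down in this thread, while the second is a hypothetical URL invented for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse


def basename(url: str) -> str:
    """Return the last path segment of a URL, ignoring any ?query part."""
    return PurePosixPath(urlparse(url).path).name


urls = [
    "http://gw.geneanet.org/public/img/media/deposits/8c/61/50211503/medium.jpg?t=1688289897",
    # Hypothetical second URL, for illustration only.
    "http://gw.geneanet.org/public/img/media/deposits/aa/bb/12345678/medium.jpg?t=1688289900",
]

counts = Counter(basename(u) for u in urls)
clashes = {name: n for name, n in counts.items() if n > 1}
print(clashes)  # {'medium.jpg': 2}
```

Saving both into one folder under their plain file names would make the second download overwrite the first.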
Further on, it looks like the code to fetch the files is there, and it will download anything with a proper URL. And now it’s up to me to figure out what’s proper …
I recall seeing a note that addressed rewriting paths to make the delimiter conformant to the OS. But that may have been part of the Gramps Web Sync… which is very likely to have delimiter differences between the server and desktop OSes. (Or it might have been this GitHub report.)
If harmonizing Relative and Absolute media paths to the path delimiter that the OS expects isn’t a feature of the Media Verify Tool, then that seems like a good enhancement to request.
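As a rough idea of what such harmonizing could involve, here is a minimal sketch that rewrites a relative media path stored with Windows backslashes into the current OS’s form. The function name is hypothetical, and it assumes backslashes are always separators, which is not guaranteed on Linux, where a backslash is a legal filename character. Absolute paths with a drive letter have no portable equivalent, so the sketch refuses them.

```python
from pathlib import Path, PureWindowsPath


def relative_to_native(stored: str) -> Path:
    """Convert a relative media path written with Windows delimiters
    into a Path that uses the current OS's separator.

    Hypothetical helper; assumes backslashes are separators, never
    part of a file name.
    """
    win = PureWindowsPath(stored)  # accepts both "\" and "/" as separators
    if win.anchor:
        # Drive letters (C:\) and UNC shares cannot be mapped portably.
        raise ValueError(f"not a relative path: {stored!r}")
    return Path(*win.parts)


print(relative_to_native(r"media\photos\grandma.jpg"))  # media/photos/grandma.jpg on Linux/macOS
```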
and the pattern matching doesn’t seem to like the part after .jpg. When I reverse the check to find files that don’t match, I get loads of files with names like these in my Downloads folder. And although those are legitimate file names in Linux, I bet that they’re quite problematic in Windows, so I will need to do something with (part of) the full file path, which looks like this:
It would be surprising if anything but Geneanet DID like that… since they’re passing a parameter. Those tend to be specifically configured for parsing by that server’s content management system.
I wonder if it is a thumbnail ID (‘t’?), cached on the server, that overrides and is served for any scaled preview. Maybe @grocanar can tell us the true purpose of that parameter.
Good point about that thumbnail, thanks. When I only parse the URL before that parameter, the resulting URL is a proper file, so it will pass the test. And then the only thing to do to make the file names unique is to replace the slashes with dashes.
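Here is a minimal sketch of that step: cut the URL off before the parameter and then run an extension check on what remains. The regular expression `IMAGE_URL` is a hypothetical stand-in for the add-on’s real pattern, which may differ; it simply requires the URL to end in a common image extension, which is why the trailing `?t=…` makes it fail.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical pattern; the Download Media add-on's actual check may differ.
IMAGE_URL = re.compile(r"^https?://.+\.(?:jpe?g|png|gif|tiff?)$", re.IGNORECASE)


def strip_query(url: str) -> str:
    """Drop the ?query and #fragment parts, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


url = "http://gw.geneanet.org/public/img/media/deposits/8c/61/50211503/medium.jpg?t=1688289897"
print(bool(IMAGE_URL.match(url)))               # False: '?t=...' follows '.jpg'
print(bool(IMAGE_URL.match(strip_query(url))))  # True
```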
1 OBJE
2 FILE http://gw.geneanet.org/public/img/media/deposits/8c/61/50211503/medium.jpg?t=1688289897
This part comes from the GEDCOM that I can download from my own tree. It seems that the person that asked this has larger pictures, so you may ask about those too, if you want.
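The fragment above shows that the URL is simply the value of the FILE line, which is why an importer that stores whatever follows the tag keeps it intact. A minimal sketch of pulling those values out of a GEDCOM file could look like this; the function and regex names are hypothetical, and continuation lines (CONC/CONT) are ignored for brevity.

```python
import re

# One GEDCOM line: level, optional @XREF@, tag, optional value after one space.
GEDCOM_LINE = re.compile(r"^\s*(\d+)\s+(?:@[^@]+@\s+)?(\w+)(?: (.*))?$")


def file_values(lines):
    """Yield the raw text after each FILE tag, exactly as it appears.

    Sketch only: CONC/CONT continuation lines are not handled.
    """
    for line in lines:
        match = GEDCOM_LINE.match(line.rstrip("\r\n"))
        if match and match.group(2) == "FILE" and match.group(3):
            yield match.group(3)


sample = [
    "1 OBJE",
    "2 FILE http://gw.geneanet.org/public/img/media/deposits/8c/61/50211503/medium.jpg?t=1688289897",
]
print(list(file_values(sample)))
```

With a real file, `open(path, encoding="utf-8")` can be passed in place of `sample`, since iterating over a text file yields its lines.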
Thanks! I saw a very interesting answer already, about replacing medium with normal. In the other tree that I mentioned, I found that he has a couple of large pictures, which can be seen on-line if you follow the link that I gave, and he already sent me a GEDCOM to test with.
This was a GEDCOM from MyHeritage, which gave me other challenges, but I hope to get his Geneanet tree too.
OK, great. I saw a new answer, which basically confirms that I can ignore the parameter and pass the resulting URL to a function that takes the full path after //, replaces every / with -, and stores the file under the resulting name. These names are long, but they’re almost guaranteed to be unique, because with this code, the domain becomes part of the file name. And then it’s the user’s responsibility to deal with all the ugly names, should he want to.
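The naming rule described above fits in a few lines. This is a minimal sketch with a hypothetical function name, not the add-on’s actual code: it drops the query, keeps host and path (everything after //), and turns each / into -.

```python
from urllib.parse import urlsplit


def local_name(url: str) -> str:
    """Turn a media URL into a flat, almost certainly unique file name.

    Hypothetical helper: the ?query part is ignored, and host + path
    are joined with every '/' replaced by '-'.
    """
    parts = urlsplit(url)
    return (parts.netloc + parts.path).replace("/", "-")


url = "http://gw.geneanet.org/public/img/media/deposits/8c/61/50211503/medium.jpg?t=1688289897"
print(local_name(url))  # gw.geneanet.org-public-img-media-deposits-8c-61-50211503-medium.jpg
```

One caveat with this approach: a URL with an explicit port, like `host:8080`, would leave a `:` in the name, which Windows does not accept in file names.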
I got a new GEDCOM from him, this time from Geneanet, and my Gramps downloaded 362 pictures in a reasonable time. And when I want, I can create a new tree with the same GEDCOM, use the media manager to replace medium with normal, and then try the download again.
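The media manager does a search-and-replace on the path text, so a more targeted variant that only touches the file name might look like this. It is a sketch with a hypothetical function name, and the assumption that a normal.jpg sits next to every medium.jpg comes from this thread, not from any Geneanet documentation.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit


def to_normal_size(url: str) -> str:
    """Swap a 'medium.<ext>' file name for 'normal.<ext>' and drop the query.

    Only the last path segment is changed, so 'medium' elsewhere in the
    path is left alone. Assumes the larger file exists under that name.
    """
    parts = urlsplit(url)
    path = PurePosixPath(parts.path)
    if path.stem == "medium":
        path = path.with_name("normal" + path.suffix)
    return urlunsplit((parts.scheme, parts.netloc, str(path), "", ""))


print(to_normal_size("http://gw.geneanet.org/public/img/media/deposits/8c/61/50211503/medium.jpg?t=1688289897"))
# http://gw.geneanet.org/public/img/media/deposits/8c/61/50211503/normal.jpg
```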
In the end, it is of course the user’s responsibility to set up a strategy to manage trees on different platforms, like Geneanet and MyHeritage here, but this was a nice exercise for me to learn what I can do with this add-on, just in case.
It seems like a correction to their GEDCOM export is worth requesting.
Rather than degrade the media in an export, Geneanet should generate a GEDCOM that points at the original (normal) image, not at a reduced-size copy that the server generates to reduce load.
And it should not include that trailing timestamp parameter of “?t=” plus the Unix epoch time in seconds.
If they want to also include a custom tag with this thumbnailing path & timestamp, that seems reasonable.
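If the `t` parameter really is a Unix epoch time in seconds, as guessed here, it can be decoded with the standard library. This sketch, with a hypothetical function name, reads the value from the URL in my own tree; it shows what the number would mean under that assumption, not what Geneanet uses it for.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def thumbnail_timestamp(url: str):
    """Return the 't' query parameter as a UTC datetime, or None if absent.

    Assumes 't' holds a Unix epoch time in seconds.
    """
    values = parse_qs(urlsplit(url).query).get("t")
    if not values:
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


url = "http://gw.geneanet.org/public/img/media/deposits/8c/61/50211503/medium.jpg?t=1688289897"
print(thumbnail_timestamp(url).isoformat())  # 2023-07-02T09:24:57+00:00
```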
I understand what you’re saying, and I have noticed that replacing medium with normal gives much bigger downloads, so I bet that the user will be happy with my hack, but …
Geneanet has no obligation to make such a change, and they are a commercial company, owned by Ancestry. And as a software engineer, I have always known that most users don’t act responsibly and don’t even try to make sure that they safeguard their data. And Geneanet never promised that you could download your pictures in their original format, like I can, as a developer who knows how things work.
P.S. When you use Ancestry with RootsMagic, it IS possible to download your media in their original format. But that is a feature, and users really need to educate themselves.
I’m being optimistic and hoping that Geneanet hasn’t noticed that the GEDCOM export points to scaled images instead of the originals. So reporting it might lead to an easy tweak.
But naturally, such requests would have more impact from a registered customer.
I know what you’re saying, but you don’t know Geneanet like I know them. I’ve been a paying customer, and asked for changes that I thought were simple. And I got blamed for not accepting the system as is. And I bet that their software team is quite small, so I do understand them.
The thing is that they have a tool to upload images, because that’s probably what users asked for. I used that myself, and they even have a version for Linux, which is nice. But OTOH, the fact that there is no official tool to download them should be a sign that this is a one-way street, for normal people at least. And it’s a one-way street for Ancestry users too, unless they’re smart enough to rely on RootsMagic.
Years ago, I did help desk work for the Dutch police, and when I was on duty, and received a call from a colleague who was working in the field, my first question was: What did you do? And I asked that, because in the majority of cases, it was the user who triggered something.
The main issue here is that there is no easy way to download pictures from Geneanet, because there is no official tool for that, at least not that I know of. So in a way, the inclusion of media refs in the GEDCOM could be seen as a stupid mistake.