URL in media file path

dixonge · January 22, 2021, 10:44am

In my recent import from WikiTree there were almost 300 images. Unfortunately these were imported as URLs. So now every entry in my Media category has a URL pointing to the image in the Path column.

I thought that the DownloadMedia plugin would bring a copy of all of those images back down, but it just crashes. I think I may know why now.

If I try to ‘Open Containing Folder’ there is an error because Gramps is trying to access the file at Basepath/http... so I can see why that would crash the downloader. If I remove the basepath, it reverts to /user/myusername/http... so now I’m unsure in what scenario this downloader would actually ever work!
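The Basepath/http... result is what ordinary path joining produces when a URL is treated as a relative file name. Here is a minimal Python illustration of that behaviour. It is not Gramps's actual code, and the base path is a hypothetical placeholder:

```python
import posixpath
from urllib.parse import urlparse

base_path = "/Users/myusername/gramps-media"  # hypothetical base path
media_path = "https://www.WikiTree.com/wiki/Image:Taylor-46013.jpg"

# The URL does not start with "/", so it is handled like a relative
# path and simply appended to the base path.
joined = posixpath.join(base_path, media_path)
print(joined)


def is_url(path):
    """Return True if the path looks like an http(s) URL, not a local file."""
    return urlparse(path).scheme in ("http", "https")
```

A check like `is_url` is one way a tool could tell the two kinds of Path value apart before trying to open them as files.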

Rather unsure how to proceed at this point…

Gramps 5.1.3
Mac OS Big Sur

DaveSch · January 22, 2021, 1:54pm

Again, there is a third-party addon, similar to Media Verify.

menu Edit >> Utilities >> Download Media files from the Internet.

dixonge · January 22, 2021, 2:01pm

Yes, that is the one I have installed and am trying to use…it gets an error every time.

DaveSch · January 22, 2021, 2:29pm

I have only tested the tool on a non-login type of site. Not from one of the web based genealogy sites. Gramps cannot access a media object with a URL path.

What type of error? What does it say is the problem?

The problem may be that the website is preventing the download. If you put just the URL from the record’s path field into a browser do you get the image in the browser?

dixonge · January 22, 2021, 3:29pm

Ya know, I thought I had checked that. It looks like a complete URL to the image, but it isn’t. It resolves to a page that displays the image.

URL:
https://www.WikiTree.com/wiki/Image:Taylor-46013.jpg
Resolves to:
https://www.wikitree.com/photo/jpg/Taylor-46013
Actual image url:
https://www.wikitree.com/photo.php/6/6c/Taylor-46013.jpg

I guess I could export them to CSV, search-and-replace the /wiki/Image: part into photo.php/6/6c/ but only if all image files have the 6/6c part in them.

This is not looking good…

UPDATE: they do not all have the same letter/combo in the middle *sigh*

dixonge · January 22, 2021, 3:35pm

Not directly related to the issue, but the download I did from WikiTree was for all of my ancestors. As it crawled up the branches, it went out into a LOT of individuals I’ve never seen or dreamed of before. I am unlikely to care if those records keep their images. As I go back in and crawl around and clean things up, I can manually take care of media for my surnames that I am interested in. I already have most of that media on hand, and can bulk drop it into the media section. So in the meantime I’ll probably just delete all the media entries from WikiTree. It would have been nice to automate that part, but not a deal-breaker…

DaveSch · January 22, 2021, 6:15pm

FYI: there is a Media Manager (this one is built into the Gramps install).

One of the functions is a search and replace of the Media record’s Path field.

StoltHD · January 23, 2021, 12:49am

You can use regex, e.g. in Notepad++ or VS Code, to search for everything up to the last “/” in any string starting with https, and replace that whole match with your correct path.

I am sure there is someone here who can write that query for you. I’m not good enough at regex to help, I just know it’s possible…

dixonge · January 23, 2021, 12:53am

As I mentioned earlier, the images use random letter/number combos in the middle, so there is no way to do any sort of search/replace…

StoltHD · January 23, 2021, 1:04am

With regex you can search for a string that starts with one pattern and ends with another pattern or character, and replace that string with something else… It doesn’t matter if the characters or the lengths differ.

It’s like: in any string starting with https, select everything from the start of the line to the last “/” and replace it with “X”.
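As a rough sketch, the search described here could look like this in Python. The function name `replace_prefix` is made up for illustration:

```python
import re

# "^https" anchors the match at the start of the line; the greedy ".*/"
# then runs up to the last "/" on that line.
PREFIX = re.compile(r"^https.*/")


def replace_prefix(line, replacement):
    """Replace everything from the leading https up to the last "/"."""
    return PREFIX.sub(replacement, line)


url = "https://www.WikiTree.com/wiki/Image:Taylor-46013.jpg"
print(replace_prefix(url, "X"))  # XImage:Taylor-46013.jpg
```

Note that the “Image:” prefix comes after the last “/”, so it survives the replacement and would need to be handled separately.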

dixonge · January 23, 2021, 2:21am

No, I’m familiar with regex. The problem isn’t finding what is there, it’s what to replace it with. That is the part that appears to be very random. Some would need to be changed to /6/6c/ in the middle, others to /5/5d/ and I’m not sure how many combos there are. Otherwise yeah, I’d throw it into VS Code. It’s just not possible with this particular set up.

StoltHD · January 23, 2021, 5:39am

If you say so.

Regex to get all data before second last special character (xspdf.com)

dixonge · January 23, 2021, 9:47am

@StoltHD - the search part isn’t the problem, and doesn’t even require regex - just search for /wiki/Image: (as I mentioned yesterday) - it’s the replace side that is the issue. Replace with what? No idea, it’s different on every image. Each image seems to have been assigned a random path, with no discernible pattern related to the filename. There is no regex that will solve that part.

Just looking at my own profile images on WikiTree, one has /a/a5/ - one has /b/bc/ in the middle of the file path of the actual image url. Random.
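One thing that may be worth checking: /a/a5/ and /b/bc/ both have the shape of one character followed by two characters starting with that same character. That is the shape of MediaWiki's hashed upload directories, where both levels are taken from the MD5 hash of the file name. WikiTree appears to be built on MediaWiki, so the paths may be computable rather than random, but that is an unverified assumption. The sketch below is a guess: test it against a known pair, such as Taylor-46013.jpg and /6/6c/, before relying on it. The function name is hypothetical.

```python
import hashlib


def guess_image_url(file_name):
    """Guess the direct image URL, ASSUMING MediaWiki-style hashed paths.

    Assumption (not confirmed for WikiTree): the two directory levels are
    the first one and first two hex digits of the MD5 of the file name.
    """
    name = file_name.replace(" ", "_")  # MediaWiki stores spaces as underscores
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"https://www.wikitree.com/photo.php/{digest[0]}/{digest[:2]}/{name}"


print(guess_image_url("Taylor-46013.jpg"))
```

If the guess matches for a handful of known images, the Path column could be rewritten in bulk. If it does not match, the paths have to be looked up page by page instead.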

dixonge · January 23, 2021, 2:16pm

@StoltHD I am afraid you misunderstand what I need to do.

1 - here is the media path as listed when I first do an import - it is in the form of a URL
https://www.WikiTree.com/wiki/Image:Smead-34-1.jpg

Unfortunately, if you go to that URL, it’s not the actual image. It takes you to a page with the image inside of it.
https://www.wikitree.com/photo/jpg/Smead-34-1
2 - So to get to the actual image URL, you have to click the image on that page, which then takes you to the actual, real image URL:
https://www.wikitree.com/photo.php/d/d2/Smead-34-1.jpg
If all of the images used /d/d2/ in their URLs then the search and replace would be easy and would not require regex at all. Just search for /wiki/Image: and replace it with /photo.php/d/d2/ and they would all be fixed.
3 - But if you tried that, you would break most of the image links, because almost every one of them has a unique path. As far as I know, THIS image is the ONLY ONE that uses /d/d2/ in it. All of the other images have random numbers and letters in their path.
So how would you fix this with search and replace or regex?

Simple: YOU CAN’T

I’m tired of explaining this over and over. I am unsubscribing from this thread.

StoltHD · January 23, 2021, 2:40pm

You talked about local images and csv.

But your problem can easily be solved by a web scraper that visits the page and grabs the image.
There are multiple web scrapers that automate this; you just give one a list of web pages to scrape.
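A minimal sketch of that scraping idea, using only the Python standard library. It assumes the page embeds the full-size image in an `<img>` tag whose `src` contains `/photo.php/`. That assumption comes from the URLs quoted earlier in the thread and has not been checked against WikiTree's real markup. The class and function names are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ImageSrcFinder(HTMLParser):
    """Collect the src attribute of every <img> tag on a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_image_url(page_html, page_url, marker="/photo.php/"):
    """Return the first <img> URL containing marker (assumed), or None."""
    finder = ImageSrcFinder()
    finder.feed(page_html)
    for src in finder.sources:
        if marker in src:
            return urljoin(page_url, src)  # resolve relative src values
    return None


def fetch_image_url(page_url):
    """Download a page and look for the image URL in it (needs network)."""
    with urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return find_image_url(html, page_url)
```

Calling `fetch_image_url` on each URL from the Path column would give a list of direct image URLs to download. If the page links to the image through an `<a href>` instead, the parser would need to look at that attribute. It is also worth checking the site's terms of use before scraping in bulk.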

An even easier way is to use Zotero and Zotero web clipper.

I do this nearly every day for a multitude of sites…
But since you say it’s impossible, I shall let you stay in that belief.

Good luck
