In my recent import from WikiTree there were almost 300 images. Unfortunately these were imported as URLs, so every entry in my Media category now has a URL pointing to the image in the Path column.
I thought that the DownloadMedia plugin would bring a copy of all of those images back down, but it just crashes. I think I may know why now.
If I try to ‘Open Containing Folder’ there is an error because Gramps is trying to access the file at Basepath/http... so I can see why that would crash the downloader. If I remove the basepath, it reverts to /user/myusername/http... so now I’m unsure in what scenario this downloader would actually ever work!
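A minimal sketch of why that lookup fails, assuming Gramps treats the Path value as a relative file path and joins it onto the base path (the real Gramps code may differ). The base path `/home/myusername/media` and the helpers `looks_like_url` and `resolve_media_path` are hypothetical names used only for illustration.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical base path; substitute your own media base path.
BASE_PATH = "/home/myusername/media"

def looks_like_url(path):
    """Return True if the media path is an http(s) URL rather than a file path."""
    return urlparse(path).scheme in ("http", "https")

def resolve_media_path(path, base_path=BASE_PATH):
    """Join a relative media path onto the base path (assumed behaviour)."""
    return posixpath.join(base_path, path)

media_path = "https://www.WikiTree.com/wiki/Image:Taylor-46013.jpg"
resolved = resolve_media_path(media_path)
print(resolved)                    # a file path that cannot exist on disk
print(looks_like_url(media_path))  # the check a tool would need before joining
```

A check like `looks_like_url` is what would let a tool skip the join and treat the value as a remote address instead.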
I have only tested the tool on a site that does not require a login, not on one of the web-based genealogy sites. Gramps cannot access a media object with a URL path.
What type of error? What does it say is the problem?
The problem may be that the website is preventing the download. If you put just the URL from the record’s path field into a browser do you get the image in the browser?
Ya know, I thought I had checked that. It looks like a complete URL to the image, but it isn’t. It resolves to a page that displays the image.
URL: https://www.WikiTree.com/wiki/Image:Taylor-46013.jpg
Resolves to: https://www.wikitree.com/photo/jpg/Taylor-46013
Actual image url: https://www.wikitree.com/photo.php/6/6c/Taylor-46013.jpg
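To make the difference between those URLs concrete, here is a small sketch that pulls the file name out of the imported page URL and splits the actual image URL into its middle directory and file name. The function names are hypothetical, and the URL shapes are assumed from the examples in this thread only.

```python
from urllib.parse import urlparse

def filename_from_wiki_url(url):
    """Return the file name from a .../wiki/Image:<name> page URL."""
    path = urlparse(url).path          # e.g. /wiki/Image:Taylor-46013.jpg
    return path.split("Image:", 1)[1]

def split_image_url(url):
    """Return (middle_directory, file_name) from a .../photo.php/<x>/<xy>/<name> URL."""
    path = urlparse(url).path          # e.g. /photo.php/6/6c/Taylor-46013.jpg
    parts = path.split("/")            # ['', 'photo.php', '6', '6c', 'Taylor-46013.jpg']
    return "/".join(parts[2:-1]), parts[-1]

page_url = "https://www.WikiTree.com/wiki/Image:Taylor-46013.jpg"
image_url = "https://www.wikitree.com/photo.php/6/6c/Taylor-46013.jpg"

print(filename_from_wiki_url(page_url))
print(split_image_url(image_url))
```

The file name is the same in both URLs; the middle directory is the only piece that the imported Path value does not contain.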
I guess I could export them to CSV and search-and-replace the /wiki/Image: part with /photo.php/6/6c/, but that only works if all image files have the 6/6c part in them.
This is not looking good…
UPDATE: they do not all have the same letter/combo in the middle. Sigh.
Not directly related to the issue, but the download I did from WikiTree was for all of my ancestors. As it crawled up the branches, it went out into a LOT of individuals I’ve never seen or dreamed of before. I am unlikely to care if those records keep their images. As I go back in and crawl around and clean things up, I can manually take care of media for my surnames that I am interested in. I already have most of that media on hand, and can bulk drop it into the media section. So in the meantime I’ll probably just delete all the media entries from WikiTree. It would have been nice to automate that part, but not a deal-breaker…
You can use regex, e.g. in Notepad++ or VS Code, to search for everything up to the last “/” in any string starting with https, and replace that whole match with your correct path.
I am sure there is someone here who can write that query for you. I’m not good enough at regex to help; I just know it’s possible…
With regex you can search for a string that starts with one pattern and ends with another pattern or character, and replace that string with something else… It doesn’t matter if the characters or the lengths differ…
It’s like: in any string starting with https, select everything from the start of the line to the last “/” and replace it with “X”.
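The search described here can be sketched in Python's `re` module as follows. The function name `replace_prefix` is hypothetical, and "X" is the placeholder replacement from the post. A greedy `.*` followed by `/` runs to the last slash on the line, which is what makes this work regardless of length.

```python
import re

# Match from the start of a line that begins with "https" up to and
# including the last "/" on that line (greedy .* backs off to the last slash).
PREFIX = re.compile(r"^https.*/", re.MULTILINE)

def replace_prefix(text, replacement="X"):
    """Replace everything up to the last "/" on each https line."""
    return PREFIX.sub(replacement, text)

print(replace_prefix("https://www.WikiTree.com/wiki/Image:Taylor-46013.jpg"))
```

This only handles the search side. The replacement is a single fixed string, so it cannot supply a different middle directory for each image.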
No, I’m familiar with regex. The problem isn’t finding what is there, it’s what to replace it with. That is the part that appears to be very random. Some would need to be changed to /6/6c/ in the middle, others to /5/5d/ and I’m not sure how many combos there are. Otherwise yeah, I’d throw it into VS Code. It’s just not possible with this particular setup.
@StoltHD - the search part isn’t the problem, and doesn’t even require regex - just search for /wiki/Image: (as I mentioned yesterday) - it’s the replace side that is the issue. Replace with what? No idea, it’s different on every image. Each image seems to have been assigned a random path, with no discernible pattern related to the filename. There is no regex that will solve that part.
Just looking at my own profile images on WikiTree, one has /a/a5/ - one has /b/bc/ in the middle of the file path of the actual image url. Random.
@StoltHD I am afraid you misunderstand what I need to do.
1 - here is the media path as listed when I first do an import - it is in the form of a URL https://www.WikiTree.com/wiki/Image:Smead-34-1.jpg
Unfortunately, if you go to that URL, it’s not the actual image. It takes you to a page with the image inside of it. https://www.wikitree.com/photo/jpg/Smead-34-1
2 - So to get to the actual image URL, you have to click the image on that page, which then takes you to the actual, real image URL: https://www.wikitree.com/photo.php/d/d2/Smead-34-1.jpg
If all of the images used /d/d2/ in their URLs, then the search and replace would be easy and would not require regex at all. Just search for /wiki/Image: and replace it with /photo.php/d/d2/ and they would all be fixed.
3 - But if you tried that, you would break most of the image links, because almost every one of them has a unique path. As far as I know, THIS image is the ONLY ONE that uses /d/d2/ in it. All of the other images have random numbers and letters in their path.
So how would you fix this with search and replace or regex?
Simple: YOU CAN’T
I’m tired of explaining this over and over. I am unsubscribing from this thread.
But your problem can easily be solved by a web scraper that visits the page and grabs the image.
There are multiple web scrapers that automate this; you just give one a list of web pages to scrape.
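A rough sketch of that scraping idea using only the Python standard library. It assumes that each page's HTML contains the direct image address in the /photo.php/<x>/<xy>/<name> form seen earlier in the thread, and that the image lives on the same host as the page; neither has been verified against the live site, which may also block automated requests. All function names are hypothetical.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: the direct image link appears in the page HTML in this form.
IMAGE_LINK = re.compile(r"""/photo\.php/[0-9a-z]/[0-9a-z]{2}/[^"'\s<>]+""")

def fetch_html(url):
    """Download a page and return its HTML as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def find_image_url(page_url, html):
    """Return the first photo.php image URL found in the HTML, or None."""
    match = IMAGE_LINK.search(html)
    if match is None:
        return None
    # Resolve the matched path against the page's own host.
    return urljoin(page_url, match.group(0))

def resolve_all(page_urls, fetch=fetch_html):
    """Map each page URL to the direct image URL found on that page (or None)."""
    return {url: find_image_url(url, fetch(url)) for url in page_urls}
```

Calling `resolve_all` with the list of Path values exported from Gramps would give a page-to-image mapping that could then be fed to a downloader. The `fetch` parameter is there so the lookup can be tried on saved HTML before hitting the site.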
An even easier way is to use Zotero and Zotero web clipper.
I do this nearly every day for a multitude of sites…
But since you say it’s impossible, I shall let you keep that belief.