Photo scanner recommendations?

I searched but somehow didn’t find anything.

I think a lot of us by default would have to digitize photo collections (to get them in gramps, to safeguard against eventual degradation, etc.)

Anyone have one they highly recommend? No strict budget in mind but willing to perhaps spend a bit more to get something that works well. Mainly looking for something that can do bulk scans without a lot of intervention (i.e. put a stack of photos on and it just auto-separates, etc.). Most likely not flatbed. Probably wouldn’t need slides/negatives scanning, just normal photos and newspaper clippings, etc.

Bonus points if it works by default on Linux but I highly doubt one does that, especially with auto separation.

I am finishing up a major photo scanning project. My mother passed a few months ago and I got the task of digitizing her stash of family memorabilia. It did not all wind up in Gramps, but certainly some of it got there.

I use Fedora Linux 37. I wound up using four different devices to do the scans.

  1. A VERY old Umax Astra 1220s. This is a SCSI interface flatbed scanner at least 25 years old. Slow, but it works. I used this to scan a bunch of 2 inch by 2 inch photos which I think originated as 120-format film. I also used it for newspaper clippings that could not go through the Fujitsu scanner and some odd-size stuff.

  2. A Fujitsu ScanSnap 5500M. This lives on a USB interface. It is a document feeder and will scan front and back at the same time. I used it for photos printed in modern sizes such as 3.5x5, 4x6 etc. I could not put a stack on the sheet feeder, but it happily pulled them through one at a time. It only takes 3 or 4 seconds to scan a photo. I was feeding and clicking steadily. This scanner also happily pulled through postcards.

  3. A Xonos film scanner. This is a stand-alone device that scans to an SD memory card. It can handle all 35mm formats. Supposedly it handles 8mm and 110 as well. I found 110 film very hard to feed through.
    Amazon has dozens of different brands of these devices.

  4. A digital camera mounted on a tripod. I used this for some flip-style calendars that my nieces made for several years.

And I had five items on 11x17 paper. FedEx Store has a self-service scanner that will make either JPG or PDF of them for 25 cents per scan.

Software tools: Gwenview, exiftool, xsane, pdfunite, convert, pdfsandwich, tesseract and some others. I used convert to combine JPG files so that front and back are in one image. Pdfunite does the same thing for PDF files. Pdfsandwich and tesseract are optical character recognition tools. Exiftool embeds comments and captions directly in JPG (and other format) image files.

The hardest part of the whole task was identifying what the photos were, then typing in the captions.

I did that a few years ago and recently bought an Epson 800 for a new campaign covering thousands of new photos (as my father passed away, leaving his whole collection of professional and private photos).

Don’t go for the 850, the main difference being software, unfortunately targeting only Windows while I am under Linux (Fedora 37 presently).

My choice of this model (Epson 800) was motivated by its wide handling of various negative films since my originals cover 110 or lower up to 6×6cm (120), also stereo pairs on glass (largest dimension higher than 15 cm). Consider that flatbed is quite reasonable because you also have positives. For instance, I inherited several albums from a great-great-great-uncle with copies dating back to the late 1870’s. Since the copies are glued in the album, I need a large surface in order not to damage the book. The 800 is A4, which is hardly sufficient and I must arrange for other holding devices around the scanner. A3 would have been good but is too expensive for personal use.

One thing to consider is the sensitivity of the scanner. Old photos may have faded, so the sensor needs a fair number of bits to extract the information. The 800 is capable of 14 bits per channel. This makes it possible to retrieve nearly wiped-out photos. Of course, you must scan in 16-bit mode. GIMP can’t request that itself. You must scan with xsane and hand over the result to GIMP for post-processing (curiously, GIMP can handle a 16-bit original but doesn’t know how to request 16-bit from xsane).
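To put rough numbers on the bit-depth argument, here is a small sketch. The 10% figure is only an assumption for illustration (a badly faded print whose tones occupy a tenth of the full range); the function name is made up for this example.

```python
def levels_available(bits_per_channel: int, fraction_of_range: float) -> int:
    """Count the distinct tonal levels that fall inside a slice of the full range."""
    total_levels = 2 ** bits_per_channel
    return int(total_levels * fraction_of_range)


# Assumption for illustration: the faded image spans only 10% of the range.
faded_span = 0.10

for bits in (8, 14):
    print(f"{bits} bits/channel: {levels_available(bits, faded_span)} usable levels")
```

With only a couple of dozen levels to stretch, an 8-bit scan tends to show banding once the histogram is expanded, while a 14-bit capture leaves far more room for that correction.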

From experience, you can’t rely on automatic scanning because exposure varies on every frame. Even two photos taken in a row have different histograms. Every photo needs post-processing for luminance range, white point balance or even to compensate for negative or darkroom processing. With skill, you can even eliminate, at least partially, vignetting caused by old optics. All in all, you spend more time in post-processing than in scanning itself. And it is sometimes fun to extract unexpected data from photos in areas which visually are all black or bright, thanks to the 16-bit scanning.

For this kind of memorabilia saving job, avoid entry-level scanners limited to 8-bit.

My next step will be digitizing Super-8 films, but I have no idea about the scanner to choose.

Could you provide a link? I can’t seem to locate this model on either the Epson site or Amazon??

Craig

PDFunite is part of the Poppler suite of pdf tools, in case anyone (else) is wondering.

Craig

It looks like it has been discontinued, shame. The exact reference was Epson Perfection V800 ref B11B223401. I couldn’t guess your home country from your profile, but assuming you’re in the US, this is what the Epson site says: search for V800. A refurbished one is offered, but I’d comment that I didn’t pay this much for a brand new one. And the price tag for the V850 is excessive for an extra CD and 2 film trays.

When choosing a scanner for transparent media (films) pay special attention to the optical resolution (very often commercial papers boast interpolated resolution). My oldest scans for 135 films (24×36 mm) were done at 1600dpi, giving ~3.5 Mpixels, and stored at 8 bits/channel after post processing due to the limited size of my then archive media (DVDs). Since I spent a lot of time on the scan, I saved as JPEG2000 lossless. For smaller formats, such as 110 or Super8, you need a higher resolution unless the lenses are really awful (I think of the Instamatic disposable cameras).
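The resolution arithmetic above is easy to check. This is a minimal sketch; the function names are made up for the example, and the 110 frame size used in the comment (about 17×13 mm) is a nominal figure I am assuming, not something measured.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm, height_mm, dpi):
    """Pixel width, pixel height and megapixels for a frame scanned at a given optical dpi."""
    w = round(width_mm / MM_PER_INCH * dpi)
    h = round(height_mm / MM_PER_INCH * dpi)
    return w, h, w * h / 1e6


def dpi_for_megapixels(width_mm, height_mm, target_mp):
    """Optical dpi needed to get target_mp megapixels out of a frame of this size."""
    area_sq_in = (width_mm / MM_PER_INCH) * (height_mm / MM_PER_INCH)
    return (target_mp * 1e6 / area_sq_in) ** 0.5


# 135 film (24x36 mm) at 1600 dpi, as in the post above.
print(scan_pixels(36, 24, 1600))

# Assumed nominal 110 frame (17x13 mm): dpi needed for the same pixel count.
print(round(dpi_for_megapixels(17, 13, 3.43)))
```

The second call shows why small formats need more: matching the pixel count of a 1600 dpi 135 scan on a 110 frame takes a little over 3,100 dpi of true optical resolution.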

Think twice about the save format. Avoid classical JPEG because it is lossy. It won’t allow useful editing as you’d lose information after each save. I’m considering replacing JPEG 2000 with TIFF, as JPEG 2000 hasn’t met the success once expected of it despite its qualities (notably lossless wavelet compression), and I am not sure it handles 16-bit/channel.

Since scanning is highly time-consuming, save several copies in different locations. My “live” collection is on my desktop PC and for safety I configured the discs in RAID1. And I was very happy for that because one of the discs failed. All I had to do was to replace it and 2 hours later the discs were automatically re-sync’ed. One year later the second disc failed too (same manufacturing batch), but no harm. So my advice would be: install discs of different manufacturers to maximise the MTBF.

Considering that disc sizes have now grown dramatically, I plan to rescan at a higher resolution and store the original bit depth.

You all are my kind of people- I am also running Fedora 37 so this is absolutely perfect for me to know what works and what doesn’t.

Good call on the flatbed @pgerlier. I hadn’t thought of needing to lay books out to get scans, etc. I will have to also see about not only dpi (something on my list) but what bits per channel it can do. I’m not quite at the point where I’ve looked at all the old pictures so unsure just how far gone some might be.

Thank you @bgee for the long, long list of software. That is extremely helpful to know what I will need to get where I need to be.

I assume TIFF then is a proper format for photos? Are you guys for some reason then using PDFs? I do when it’s the only format I have (or for something simple like a marriage cert where I don’t need tagging), but I’m thinking I would like to have tags inside the photos themselves (via exif presumably) so I can use whatever photo software recognizes tags and makes them easily searchable (thus negating the need to use one software FOREVER where I’ve created labels/tags in it). I.e. person=“John Michael Smith” and for group photos doing something akin to person1=“Nicholas William Smith” so that picture could be found using a search term of “John Smith”, “John Michael Smith”, “Smith”, etc.

I do need to read up on tags within an image. Digikam apparently recommends adding tags like /person/John_Smith whereas exif tool might only do certain predefined tags? ie. exiftool -person1=John 1.png fails with “Tag ‘person1’ is not defined.”

PDFs are good when they originate from some text document initially, i.e. have been exported from LibreOffice Writer or MSO Word, because text will be inserted verbatim (with some sauce around to compose the page). If you think of storing images (coming out of cameras or scanners), I’d recommend against it because the complete picture is encoded inside in some alphanumerical encoding. I have no idea what compression, if any, is applied before encoding, but since this is not a binary format, the whole file is larger than the photographic data itself.

For certificates which are only an illustration to the facts recorded in Gramps (birth, marriage, death, …), I’d use JPEG with high compression ratio (up to the readability limit) to spare disc space. This is sufficient if you have a means to retrieve the original good-quality picture, like recording the source URL in a note. Of course, if the certificate is part of the memorabilia, save it with care as lossless TIFF.

Presently, all my photos are under Digikam. One valuable feature in Digikam is the hierarchical tag system. It is very important to organise and structure the tags to avoid a huge mess when the number of tags grows.

At level 0, there are category names: Places, Events, Lineage, Homes, Monuments, Quality
Level 1 gives more details:

  • Places have one entry per country
  • Events: birth, baptism, marriage, other family events, special events like Paris Universal Exhibition 1900, Colonial Exhibition 1931, …
  • Quality: blurred, bad white balance, to be processed, to dump?, …

Level 2 goes into even more details:

  • Places>Country: town or intermediate division depending on the number of photos, so that selecting a tag won’t retrieve too many pictures
  • Lineage>family (in a broad sense): name of person

The Digikam tag system is quite smart. You need not tag with intermediate items. For instance, if you have USA>NY>Poughkeepsie, you only need to tag Poughkeepsie. Digikam knows it is a member of NY state. So, when you query NY, all pictures about Poughkeepsie are also retrieved without being tagged NY. Similarly, querying USA retrieves Poughkeepsie and Seattle, WA. This is in contrast with other photo managers I used under MacOS where you had to repeat all levels of the hierarchy.

The tag system is very versatile as there are no constraints on tag names, nesting or else. An image may have any number of tags.
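The retrieval behaviour described above can be sketched in a few lines. This is only an illustration of the idea, not how Digikam implements it; the file names, the "/" separator and the function names are all made up for the example.

```python
def matches(tag_path: str, query: str) -> bool:
    """True if the query names the tag itself or any of its ancestors."""
    return query in tag_path.split("/")


def find(photos: dict, query: str) -> list:
    """Return the photos carrying at least one tag at or below the queried tag."""
    return sorted(
        name
        for name, tags in photos.items()
        if any(matches(tag, query) for tag in tags)
    )


# Hypothetical collection: each photo is tagged only with its most specific tag.
photos = {
    "img001.jpg": ["Places/USA/NY/Poughkeepsie"],
    "img002.jpg": ["Places/USA/WA/Seattle"],
    "img003.jpg": ["Places/France/Paris", "Events/Special/Paris Universal Exhibition 1900"],
}

print(find(photos, "NY"))
print(find(photos, "USA"))
```

Because each tag carries its full path, querying an intermediate level finds everything beneath it without that level ever being assigned by hand.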

But tagging and commenting photos is also a time-consuming task. However, it may be as important as annotating Gramps data with “proofs” (detailed citations).

Yes- I had half a mind to use exiftool thinking it might speed it up but between needing a different config file and digikam just seemingly working easy as can be and with nesting, I think I’ll go with digikam.

Do you have a recommended way to tag people? I don’t know if I should just do /person/First Middle Last and then tag as appropriate (even if it’s many people) or maybe break it down by /person/GivenLastName/First Middle or some variation thereof. I’ll keep playing around with it.

And yes- I know this is a HUGE task. I’ve been diligently filling out gramps with just normal data and adding pictures/supporting docs as I can but this will only add to the project. But I’m young enough so I should have plenty of time to keep chipping away.

If you are on linux, you should look at the sane site compatibility:
http://www.sane-project.org/sane-mfgs.html

Just the same as a recommended way to use Gramps: you have to think about what you want to emphasise. What is right for me may not be for you.

My recommendation is: your scheme must be useful and manageable. I am not sure if the slash / in your /person/GivenLastName/First stands for tag level separation. You MUST use hierarchy to avoid having thousands of tags at level 0.

In my photos, people are seldom shot individually. They usually appear with a spouse and possibly children. What I did then was to have a “family” tag made of husband+wife names and, one level lower, first names. Unfortunately, this is not perfect because when children grow, they themselves create new families. I’ll perhaps assign several tags for the same person: as child, then as member of “families”.

Remember also that you can combine tags in query “equations” and save these “equations” for future use. This frequently eases retrieval without needing more tags.

I agree with Patrick - Use what works for you. There are many ways to approach this, and all have merits.

In my case, I did NOT use DigiKam or any other photo catalog system. The collection of photos that I scanned will be sent out on USB sticks to many family members, none of whom use DigiKam. Therefore it was imperative for me to label photos in a way that would stick with the file no matter what computer system or image viewer was used. I wanted a method which can be seen on almost all image viewers and operating systems. DigiKam is a fantastic application - and about as portable as a nuclear power plant.

EXIF data fulfills that need because it is embedded in the file. It is not an auxiliary file or index. It is completely independent of whatever application is used to view and organize images.

One major disadvantage to using EXIF data fields is they are not easily searchable. I cannot, for example, search all my photos for Susan Gladys Cundiff. It is possible to write a script which would open each file, extract the Comment field and run a text scan on it. That would be VERY slow. So far I have had little need to do this.
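For what it is worth, such a search script might look roughly like this. It is a sketch only: it assumes exiftool is installed and on the PATH, and that the captions live in the Comment field as described above. Asking exiftool for JSON output over a whole directory tree starts it once rather than once per file, which should help with speed. The function names are made up.

```python
import json
import subprocess


def read_comments(directory):
    """Run exiftool once over a directory tree and return its JSON records."""
    result = subprocess.run(
        ["exiftool", "-json", "-r", "-Comment", directory],
        capture_output=True,
        text=True,
        check=False,  # exiftool may exit non-zero if a few files are unreadable
    )
    return json.loads(result.stdout or "[]")


def search(records, needle):
    """Case-insensitive substring search over the Comment field of each record."""
    needle = needle.casefold()
    return [
        record["SourceFile"]
        for record in records
        if needle in str(record.get("Comment", "")).casefold()
    ]


# Example use (the directory name is hypothetical):
# for path in search(read_comments("Photos"), "Susan Gladys Cundiff"):
#     print(path)
```

Saving the output of read_comments to a file would make repeated searches instant, at the cost of re-running it after captions change.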

Exiftool has two features that I used a lot. First, it can tag multiple files in one go. Second, it can copy fields within a file. I used the Comment, UserComment and Caption standard fields. I first put my notes in the Comment field, then used Exiftool to copy that to the caption and UserComment fields. Another feature which I used only once is to create a text file of tags, then apply them to a group of photos.

It is worth noting that another Linux tool - exiv2 - can do all of the same things that exiftool can do. I found some GUI tools for editing EXIF data, but they were very hard to use. The EXIF field editing feature was buried many layers down in menus. Command-line tools were much easier to use.

When I assigned names to the people in the photos, I used right-to-left, back-to-front sequence as much as possible. When that was not clear in the photo, I expanded the note to try and make it clear. I always put maiden names in parentheses, which makes it easier to find people in databases.

My photos are organized in directories and sub-directories, either by base family name or by event. I have, for example, a bunch of Christmas directories which are named similar to “1995-Christmas-AnnArbor” or “1997-Christmas-GrandIsland”. There are directories for massive family gatherings. I mentioned the flip calendars that my nieces made. Those go in their own directory. I wound up using only three levels of directories.

For file format, I used JPG as much as possible. Dealing with TIFF would have taken more time than I wanted to spend, and the scanners I have are not that good. Correspondence which can be run through OCR is saved as PDF/A format with the text embedded. I transcribed a few handwritten letters. The scanned image is in a PDF with a TXT file embedded. Most of the handwritten letters remain as images.

When I used photos or letters in Gramps, I made a hard link from the main photo directories over to the Gramps media directory. Gramps is perfectly happy with this. Symbolic links work too.
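A minimal sketch of that linking step, for anyone who prefers to script it. The function name and the paths in the comment are made up; note that a hard link only works when both directories are on the same filesystem, which is where the symbolic option comes in.

```python
import os
from pathlib import Path


def link_into_media(photo: Path, media_dir: Path, symbolic: bool = False) -> Path:
    """Make the photo appear in the media directory without copying its data."""
    media_dir.mkdir(parents=True, exist_ok=True)
    target = media_dir / photo.name
    if symbolic:
        # A symbolic link stores the path, so it can cross filesystems.
        os.symlink(photo.resolve(), target)
    else:
        # A hard link is a second name for the same file on the same filesystem.
        os.link(photo, target)
    return target


# Example use (hypothetical paths):
# link_into_media(Path("Photos/1995-Christmas-AnnArbor/tree.jpg"), Path("gramps-media"))
```

Either way there is one copy of the image on disk, so an EXIF edit made in the photo directory shows up in the media directory too.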

There was a time when I imagined tagging my photos and documents from Windows or a DigiKam-type software. I have given that up (for the moment; as the saying goes, never say “Fountain, I will not drink your water”) and instead take advantage of the attributes associated with media in Gramps. This allows me to add notes and citations as needed, then search directly in Gramps via filters. The ideal would be to be able to export these attributes and automatically associate them with the image files. This would require a script that translates the Gramps XML to XMP, for example. In the meantime I had written this article, which illustrates my use of media attributes.

The official Digikam site provides versions for Linux, Windows and MacOS. So “portability” solutions are possible.

Digikam doesn’t embed the photos in its DB, only a link to them. So you can send a copy of your directory on a USB stick plus the SQLite DB for the comments. I have not experimented, thus I don’t know how the paths will be interpreted on “foreign” computers (other than the original one). I looked at the DB and saw that ‘albumroots’ may need to be modified on the destination computer. It records paths below /home (i.e. starting at /user/directory1/directory2/…). So the photo directory needs to be reloaded unchanged (for the relative paths in table ‘albums’ to remain valid) and the paths in ‘albumroots’ need to be adjusted to reflect the destination location. This can easily be done with SQLite utilities like Sqliteman (at your own risk, work first on a copy).
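If a GUI tool is not at hand, the same adjustment could be scripted. Treat this strictly as a sketch: the table name follows the post above, but the column name specificPath is my assumption and should be checked against the schema of your own DB copy first; other columns may need attention too. The paths in the comment are hypothetical.

```python
import sqlite3


def retarget_album_root(db_path, old_path, new_path):
    """Rewrite one album root path in a COPY of the database; return rows changed."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            cur = con.execute(
                # ASSUMPTION: the path is stored in a column named specificPath.
                "UPDATE AlbumRoots SET specificPath = ? WHERE specificPath = ?",
                (new_path, old_path),
            )
        return cur.rowcount
    finally:
        con.close()


# Example use (hypothetical file and paths, always on a copy):
# retarget_album_root("digikam4-copy.db", "/alice/Pictures", "/bob/FamilyPhotos")
```

A return value of 0 means nothing matched, which is the cue to inspect the table by hand rather than guess.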

I was also looking for a scanner recently, but I wanted a flatbed I could take with me (in a car at least). I also needed Linux compatibility, plus decent quality, as my old scanner cannot do anything beyond 1200 dpi and is also very slow.

I ended up with an Epson Perfection 3200 Photo from Ebay Kleinanzeigen (2nd hand), in Germany. It is also 8 years old or so, but sufficient, and it was only 30 Euros or so. I guess there are better ones, but I only use it a few times per month.

My approach was to look at https://sane-project.gitlab.io/website/sane-mfgs.html#Z-EPSON to see what works on Linux.

All good points about DigiKam. But they are swamped by my first criterion, which is that the resulting photos WITH CAPTIONS must be usable by people who have no idea how a computer works. Many of the people I will send the photos to barely know how to plug in a USB stick. They know that they plug it in and something appears. They click on a thumbnail and it opens.

How all that happens is a complete mystery. Telling them to open Windows Explorer is like speaking Swahili. Expecting them to install DigiKam and import a database is beyond impossible.

It is true that the EXIF fields in the default Windows image viewer are not obvious. You have to right-click on the file and go to Properties to see them.

I have thought about standing up a web site to host the photos. Since I have 100/100 Internet service, it would be possible. It would be a huge amount of work and there are privacy concerns. Some of the photos I scanned are babies in a bath. Those just don’t fly in today’s culture.
For the same reason I am not willing to use something like Pinterest.

This is what I do on my own computer. It is quite easy to output the web pages. You can do this manually (quite long) or, if you have Digikam, there is a function to create the pages with their captions. You get thumbnails in the pages and you can download the original with a right-click.

The hard part is to configure the server engine if you never did that. I have experience with Apache, lighttpd, Cherokee, Nginx and thttpd. thttpd is very fast and has a tiny footprint. Unfortunately you can’t “extend” it security-wise, so I’d ruled it out. Out of the 4 “majors” remaining, I chose Nginx because of its versatility. It allowed me to add a “security layer” over my Gramps site leg: I intercept the request to display a page and check credentials (I initially query for user-id/password and accept it for 15 minutes, after which user must re-identify him/herself). If credentials are not correct, page display is prohibited, i.e. not sent to user, so there is no workaround, instead a warning page is displayed. This is done without modification to the stored web pages. I have an alternate schema, requiring some customisation in the page header, to provide selective access, i.e. a member of a family branch can display pages related to the branch but none of the pages for another branch.

In addition, though traffic is HTTP for the “common” pages of my site, genealogical data is under HTTPS so that potentially sensitive personal data can’t be “sniffed”. You can create your own keys for HTTPS but not all OS’es are equal. MacOS users have no problems because private keys are linked to Apple certificates. I am under Linux and there is no such link with a CA (certificate authority). This means users get a security warning when accessing my site because browsers don’t trust self-signed pages. This may frighten unwarned persons and this is good.

If you’re interested, we can continue on private mail.

To this end, I have been testing it with pigallery2. It’s lightweight and can be hosted quite simply with docker (well, simply for me- I work in IT so it’s all old hat.) But even then, that and a reverse proxy wouldn’t be too hard to set up. It seems to take the tags that digikam adds just fine and is easily searchable.

It also has authentication so setting up a generic “read only” (ie. view) user with a simple password you can pass on to family seems just fine. I’m also available if you want to DM and go down that road.

I have been using an Epson V39 scanner for photographs, operating on a Windows 7 platform, and I recall it was under $100. It seems to work great; I have roughly 29,000 family photos on my computer. I have been scanning in my great-grandparents’, grandparents’ and parents’ photo collections from both my side and my wife’s side. We have just started scanning in the slides, and this scanner seems slow and awkward for them, so we went back to an eight-year-old slide scanner, moving its memory chip into the laptop, then transferring the files onto a flash drive and over to our main computers. It sounds tedious but isn’t, since it is a really fast slide scanner; we haven’t found anything newer that is faster, and it still works fine. There probably is a dongle that we could plug into our main computer to bypass the laptop, but for the five or six times we have had to do it, changing doesn’t seem justified.

For slides, if you have a decent camera, you can cobble together a homemade system to photograph the images. Basically as quick as you can push the slides through. This used to require a DSLR back when I looked at it but I think a modern smartphone camera would now be quite capable.

Your favourite internet search engine should find dozens of designs.

Craig