I have a database with about 14,000 individuals in Gramps.
I also have an archive of around 1,700 scanned documents (obituaries, PDFs, images), only some of which are already included in Gramps as sources.
I do not intend to import this entire collection wholesale into Gramps.
My idea is to develop a Gramps plugin that:
scans an external folder of documents,
applies OCR and name detection,
stores the results in a JSON index (document → names found),
and, when a person is selected in Gramps, allows checking whether that name (or variants) appears in that index.
The goal is not to replace Gramps’ source system, but to make it easier to locate documents relevant to a specific person within a large collection of scans.
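To make the idea concrete, here is a minimal sketch of the index part only, independent of Gramps. Everything in it is an assumption for illustration: the helper names (extract_text, build_index, documents_for) are hypothetical, the OCR step is stubbed out by reading plain-text files, and "name detection" is simplified to matching a list of names already known from the tree.

```python
import json
import re
from pathlib import Path


def extract_text(path: Path) -> str:
    """Hypothetical stand-in for OCR: a real plugin would call an OCR engine here.
    This sketch only reads plain-text files so that it runs on its own."""
    return path.read_text(encoding="utf-8", errors="ignore")


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so matching ignores case and line breaks."""
    return " ".join(name.lower().split())


def build_index(folder: Path, known_names: list[str]) -> dict[str, list[str]]:
    """Map each document (path relative to folder) to the known names found in it."""
    index: dict[str, list[str]] = {}
    for path in sorted(folder.rglob("*.txt")):
        text = normalize(extract_text(path))
        found = [
            name for name in known_names
            if re.search(r"\b" + re.escape(normalize(name)) + r"\b", text)
        ]
        if found:
            index[path.relative_to(folder).as_posix()] = found
    return index


def save_index(index: dict[str, list[str]], target: Path) -> None:
    """Store the index as JSON (document -> names found)."""
    target.write_text(json.dumps(index, indent=2, ensure_ascii=False), encoding="utf-8")


def load_index(source: Path) -> dict[str, list[str]]:
    """Read an index previously written by save_index."""
    return json.loads(source.read_text(encoding="utf-8"))


def documents_for(index: dict[str, list[str]], name_variants: list[str]) -> list[str]:
    """Return the documents that mention any of the given name variants."""
    wanted = {normalize(v) for v in name_variants}
    return sorted(
        doc for doc, names in index.items()
        if wanted & {normalize(n) for n in names}
    )
```

In the plugin, the list of name variants passed to documents_for would come from the person currently selected in Gramps; how that is wired up is deliberately left out of this sketch.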
I am posting here to ask whether this is viable within Gramps’ architecture or whether this is an unrealistic idea.
This sounds interesting! Having a way to cross-reference external documents, without cluttering the (non-hierarchical) Media Object category view, sounds valuable.
There is a Media Manager feature (Add images not included in database) that scans a folder containing Media Objects and adds any that do not already exist as Media Objects. But there is nothing that helps correlate those media objects with possible matches in the Tree.
The Merge Media addon, however, will find references to the same media objects and merge them. (So this might be used to correlate files that have already been manually added from that particular folder.)
Gramps currently includes a built-in SoundEx gramplet and a Soundex match of People with the <name> filter rule. (I don’t see a corresponding SoundEx name field in the Gramps Data Model diagram, and searching for SoundEx in the Sphinx-based developer docs gives no hits. But there is a gramps/gen/soundex.py module.)
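For anyone unfamiliar with Soundex, here is a minimal sketch of the standard American Soundex rules, to show how phonetic codes could group name variants in a document index. It is written from the published rules, not copied from gramps/gen/soundex.py, so the Gramps module may behave differently in edge cases. Note that this sketch simply drops non-ASCII letters, which is one reason Soundex is a weak fit for international names.

```python
# Digit assigned to each consonant group under the standard American Soundex rules.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a name, or "" if it has no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def same_soundex(name_a: str, name_b: str) -> bool:
    """True when both names have the same non-empty Soundex code."""
    code = soundex(name_a)
    return bool(code) and code == soundex(name_b)
```

With this, same_soundex("Meyer", "Meier") is true, so a lookup for one spelling could also surface documents indexed under the other.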
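But name matching might be more internationally compatible with Double Metaphone. (There’s probably no analog for the commercial Metaphone3 in the open source realm.) Perplexity suggested jellyfish.double_metaphone(name) (pip install jellyfish metaphone) for “Gramps or Python/SQL fuzzy name matching” as a “Genealogy/Software Dev Fit”. I have not verified that call: as far as I know, jellyfish provides metaphone() but not Double Metaphone, which comes from the separate metaphone package as doublemetaphone(), so check before relying on it.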