Find duplicate persons in large database

My “usual” database consists of around 6000 individuals. Now I have a large GEDCOM file with over 300000 individuals (which I have successfully imported into a new database). I would like to find the individuals that are present in both databases. Is there a good way to do that? I can merge the two databases into a temporary one, but I have a feeling that the “find duplicates tool” will not perform well with that many people. (And ideally I only want to compare persons in database A with persons in database B, and not get a lot of false positives where both persons are in database B.)
I’m not limited to Gramps, other tools might be fine as well.
(I’m not concerned about merging, I just want to identify the persons.)

You could try exporting both databases with the SQLite export.

Then open both exports in OpenRefine (with its SQLite add-on; I don’t remember its name, but I can find it if you want), combine fields such as name and birth date from each database into a new field, and compare these fields with OpenRefine’s reconciliation tools.

For each record you find to be similar, you could add a tag to your own database export to mark that record as a possible match (or append some text or an emoji to the name, which may be simpler than a tag).

Re-import your modified export, with its tags (or text/emoji), into a new database and search for the tagged/marked people.

If you’re on Windows, you could install Portable Gramps, then open your modified database in the portable Gramps app and the big database in the regular Gramps app. That way both are open at the same time and you can compare the tagged records manually.
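
If scripting is an option, the key-matching idea above (one combined name and birth-date field per person, compared across the two databases) can also be sketched in Python. This is only a rough illustration, not how OpenRefine or Gramps do it: the record layout (dicts with `id`, `name`, `birth`) and all function names are made up for the example, loading the records from your exports is not shown, and it only finds exact matches on normalized name plus birth year. It compares A against B only, so duplicates inside the big database are never reported.

```python
import re
import unicodedata
from collections import defaultdict

def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())

def birth_year(date_text):
    """Return the first four-digit year found in a date string, or ''."""
    m = re.search(r"\b(\d{4})\b", date_text or "")
    return m.group(1) if m else ""

def match_key(person):
    """Combined comparison field: (normalized name, birth year)."""
    return (normalize(person["name"]), birth_year(person.get("birth", "")))

def cross_matches(db_a, db_b):
    """Return (id_a, id_b) pairs with equal keys, comparing only A against B."""
    index_b = defaultdict(list)
    for p in db_b:                      # index the big database once
        index_b[match_key(p)].append(p["id"])
    pairs = []
    for p in db_a:                      # one lookup per person in the small one
        for id_b in index_b.get(match_key(p), []):
            pairs.append((p["id"], id_b))
    return pairs

# Hypothetical sample records standing in for the two exports.
db_a = [
    {"id": "I0001", "name": "Anna /Müller/", "birth": "12 JAN 1880"},
    {"id": "I0002", "name": "Karl /Berg/", "birth": "1901"},
]
db_b = [
    {"id": "X17", "name": "anna muller", "birth": "ABT 1880"},
    {"id": "X18", "name": "Karl /Berg/", "birth": "1902"},
    {"id": "X19", "name": "Karl /Berg/", "birth": "1902"},  # duplicate within B, ignored
]

pairs = cross_matches(db_a, db_b)
print(pairs)  # [('I0001', 'X17')]
```

Because the big database is indexed once in a dictionary, the cost grows with the number of people rather than with 6000 × 300000 comparisons. Note that people without a birth year all share an empty year in the key, so you may want to skip those or add another field (birth place, a parent’s name) to the key.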

I also merged a large amount of data into my database and have been cleaning up mistakes, errors, duplicates, fields with no data, etc. I found a tool, Family Tree Analyzer (FTAnalyzer), to be a great complement to Gramps. A free version of FTAnalyzer can be obtained from the official Microsoft Store. It reads in a GEDCOM file and reports a large variety of possible issues: you can look for duplicates, dates that don’t make sense (parents too young or too old, marriages before or after death dates), duplicate facts, and many more. It lets you toggle any or all of the types of data to display when looking for issues.
Hope this helps.
