Find duplicates feature is too slow

emyoulation · June 8, 2024, 3:47pm

Yes, it would be nice to have this feature even as a follow-on tool.

Since the Person merging process adds a Merge ID to the surviving Person (although the Handle seems to be lost), a follow-on process could find those Merge IDs (perhaps filtered to the IDs created today) and offer to merge identical duplicates.

First, it could merge Families with the same Spouses and Marriage date. (Don’t want to merge Families where they divorced and remarried in case there was a marriage to someone else between.)

Then merge identical secondary objects of the Merged Families. (Merging Children might create more duplicate Families to merge.)

Then merge identical secondary objects of the Merged Persons.
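A minimal sketch of the first step, in plain Python rather than against the Gramps API. The `Family` class and its fields are hypothetical stand-ins. It assumes the Persons have already been merged, so that duplicate Families share the same spouse IDs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Family:
    # Hypothetical stand-in, NOT the Gramps Family class.
    family_id: str
    spouse_ids: frozenset          # IDs of both spouses after Person merging
    marriage_date: Optional[str]   # e.g. "1850-05-01"; None if unknown


def duplicate_family_groups(families):
    """Group Families that share both spouses AND a known marriage date."""
    groups = defaultdict(list)
    for fam in families:
        if fam.marriage_date is None:
            continue  # without a date we cannot rule out a remarriage
        groups[(fam.spouse_ids, fam.marriage_date)].append(fam.family_id)
    return [ids for ids in groups.values() if len(ids) > 1]


families = [
    Family("F0001", frozenset({"I0001", "I0002"}), "1850-05-01"),
    Family("F0002", frozenset({"I0001", "I0002"}), "1850-05-01"),
    # Same couple, later date: a remarriage, so it is left alone.
    Family("F0003", frozenset({"I0001", "I0002"}), "1862-09-12"),
    Family("F0004", frozenset({"I0001", "I0003"}), "1858-03-20"),
]
print(duplicate_family_groups(families))
```

Only F0001 and F0002 are offered for merging. The remarriage keeps its own Family because the marriage date is part of the key.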

ennoborg · June 8, 2024, 4:26pm

You can do that indeed, but at the moment, I prefer to keep things simple, by starting with the events, and the notes. And if we think that this is too complex as an addition to the existing code that does the merging, a follow-on tool is a great thing indeed.

Starting simple, this means detecting all events with an identical date and place. That can be a challenge by itself already, because after an import from GEDCOM or Gramps XML, the places are not the same object, so you may need to decide based on the title. And that is not easy for simple titles, like Harlingen or New York. The first exists in several places, like our Friesland province and your state, and the second is a typical example of a place that is a state, county, and city, as we have with Groningen and Utrecht.
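To make that concrete, here is a rough sketch of title-based matching in plain Python. The `Event` class, `candidate_duplicates` and `is_bare_title` are hypothetical names for illustration, not Gramps code, and the comma test assumes titles list enclosing places separated by commas.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical stand-in, NOT the Gramps Event class.
    event_id: str
    event_type: str
    date: str
    place_title: str


def candidate_duplicates(events):
    """Group events with the same type, date and normalized place title."""
    groups = defaultdict(list)
    for ev in events:
        key = (ev.event_type, ev.date, ev.place_title.strip().casefold())
        groups[key].append(ev.event_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def is_bare_title(title):
    """True when the title names no enclosing place, e.g. "Harlingen"."""
    return "," not in title


events = [
    Event("E0001", "Birth", "1900-01-01", "Harlingen"),
    Event("E0002", "Birth", "1900-01-01", "harlingen "),
    Event("E0003", "Birth", "1900-01-01", "Harlingen, Texas, USA"),
]
for (etype, date, title), ids in candidate_duplicates(events).items():
    note = "needs review: bare title" if is_bare_title(title) else "ok"
    print(etype, date, title, ids, note)
```

E0001 and E0002 match on title, but the match is flagged because a bare "Harlingen" does not say which Harlingen is meant.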

Fun fact: Gramps does some smart merging for families already, meaning that when both spouses have been merged, there is only one family object in the database, so you don’t have to work on that. You will still see duplicate family events though, which can be merged in the exact same way as for persons.

emyoulation · June 8, 2024, 4:45pm

Kari @kku has an Automerge tab in the Multimerge gramplet for identically named/titled objects in the Places, Sources and Repositories categories.

Not for duplicate Citations, Notes or Events though.

Using the MergedID to narrow the scope (from the Entire Tree to just the secondary objects of 1 person at a time) seems like a good way to exclude a lot of false positives.

ennoborg · June 9, 2024, 11:03am

I use automerge all the time, for places, sources, and repos, and there is a tool for citations too, though not by Kari. The problem with that, however, is that I’m more or less forced to merge all places before I start merging persons, and that’s a bit tedious. It’s also not a good idea to do that when there are lots of places without a hierarchy, because you don’t know whether they are real duplicates at all.

The MergedID attribute is useless, however, because it doesn’t exist here under that name. It’s localized. Its value is not much use either, because the corresponding person was removed by the merge.

I have a filter that shows all persons with the localized attribute, so that I can work through all persons that I merged in a session, and remove all duplicate events, and the attribute.

When I merge persons, and find a birth in Paris, France, and another birth in Paris, for that same person, I can bet on it, that the other Paris is not Paris, Texas, so that it’s safe to merge the events. It is not safe in a raw place table however, because I might have a relative in the latter. Merging places can also be a waste of time, because when I remove duplicate events that reference unprocessed places, those places often become unused objects, so that I won’t have to waste time on merging them anyway.
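A small sketch of that person-scoped check, in plain Python with hypothetical helper names (`place_parts`, `compatible_places`, `mergeable_pairs`). It assumes place titles are comma-separated from the most specific part outward, and it only ever compares events that belong to one person.

```python
def place_parts(title):
    """Split "Paris, France" into ["paris", "france"]."""
    return [p.strip().casefold() for p in title.split(",") if p.strip()]


def compatible_places(a, b):
    """True when the shorter title is a leading part of the longer one."""
    shorter, longer = sorted((place_parts(a), place_parts(b)), key=len)
    return bool(shorter) and longer[:len(shorter)] == shorter


def mergeable_pairs(person_events):
    """person_events: (event_id, type, date, place_title) tuples of ONE person."""
    pairs = []
    for i, (id_a, type_a, date_a, place_a) in enumerate(person_events):
        for id_b, type_b, date_b, place_b in person_events[i + 1:]:
            if (type_a == type_b and date_a == date_b
                    and compatible_places(place_a, place_b)):
                pairs.append((id_a, id_b))
    return pairs


events_of_one_person = [
    ("E0001", "Birth", "1880-02-03", "Paris, France"),
    ("E0002", "Birth", "1880-02-03", "Paris"),
    ("E0003", "Death", "1950-01-01", "Paris, Texas, USA"),
]
print(mergeable_pairs(events_of_one_person))
```

The two births are paired. Run over the raw place table instead, the same title test would also pair "Paris" with "Paris, Texas, USA", which is why the person scope matters here.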

The essence is, that if you want the merging to be user friendly, you allow me to focus on the merging of those persons. Having to merge the places first is so tedious, that it causes lots of irritations here, especially since I know how fast things are in RootsMagic.

Urchello · June 9, 2024, 11:43am

Correct me if I’m wrong. As I understand it, the first step and main condition for automatic merging of places is the use of some standardized international database of places, possibly even with codes for these places. But even then, I hardly understand how to compare places if in Gramps these are not just places, but a nested structure that can vary depending on the year.
I wonder if there are such standardized databases? Do they have a coding system? If they exist, can such databases be connected to Gramps to standardize places? And what to do with places if some are written in Latin and others in Cyrillic? In this case, I see only one solution: merging places not by their names, but by some generally accepted codes (again, if such codes exist).
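To illustrate what merging by code would look like, here is a rough sketch in plain Python. The codes shown ("CODE-001" and so on) are made-up placeholders, not identifiers from any real database, and `group_by_code` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_code(places):
    """places: iterable of (place_id, name, code_or_None) tuples.

    Returns (merge_groups, uncoded): place IDs that share a code, and
    place IDs without any code that still need manual review.
    """
    by_code = defaultdict(list)
    uncoded = []
    for place_id, _name, code in places:
        if code:
            by_code[code].append(place_id)
        else:
            uncoded.append(place_id)
    merge_groups = [ids for ids in by_code.values() if len(ids) > 1]
    return merge_groups, uncoded


places = [
    ("P0001", "Kyiv", "CODE-001"),
    ("P0002", "Київ", "CODE-001"),   # same place, Cyrillic spelling
    ("P0003", "Lviv", "CODE-002"),
    ("P0004", "Harlingen", None),    # no code assigned yet
]
print(group_by_code(places))
```

The names never enter the comparison, so the script the name is written in stops mattering. Places without a code fall back to manual review.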

ennoborg · June 9, 2024, 12:46pm

There are standards, in different forms. Many commercial programs have place databases that exist outside your tree, and the most recent one that I tried comes with Legacy 10. It’s the latest development of an American program that can connect to FamilySearch, one that I sometimes use, because it’s the only one in its class that also speaks Dutch. Legacy was recently bought by MyHeritage, and was a program with a limited free version, and with version 10, you get the full program for free, possibly because MyHeritage has enough ways to make money already.

Legacy is a Windows program, but RootsMagic does the same and has a Mac version, which is the Windows program with a built-in compatibility layer.

In Gramps, you can extract data from external sources like GeoNames and GOV:

http://gov.genealogy.net/search/index

We have Gramplets for both, and they both have codes, in different forms. You can find more information about these on the wiki.

Davesellers · June 9, 2024, 1:03pm

Thankfully I don’t have to merge places very often, only when I have entered a duplicate. I base most of the places on FamilySearch. I think this is a near-impossible task, because so many names are shared by a village, city, parish, district, borough, or county. And they could all be a little different in scope when you are adding the lat/long to the records.
I don’t know how coding a place would work unless your database was purely places such as a village or city. I have many that are a parish or a district, etc.

ennoborg · June 9, 2024, 1:19pm

For me, new places often come from FamilySearch too, when I import new branches with Ancestral Quest or RootsMagic. The former has an option to import all places as standardized by FamilySearch, which makes merging easier, so that I don’t have to deal with names like Inglaterra, or Paises Bajos, where I live. There will still be some manual merging left, because most country names are anglicized, and some provinces too, and I prefer to use local names.

FamilySearch also tries to promote the proper name for the time of the event.

There is a drawback, and that’s that when FamilySearch can’t find the name of a church or a cemetery, the standardized place defaults to the enclosing populated place, so I can lose information on the way.

Davesellers · June 9, 2024, 2:37pm

Thinking more about the complexity: thankfully, Gramps allows you to attach dates when adding the enclosed-by place. I have places that changed counties at different times. So merging such a place, when the records refer to different times, adds more issues.
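A tiny sketch of why the dates matter, in plain Python. The `enclosing_place` helper, the county names and the years are all hypothetical, and this is not how Gramps stores its place hierarchy.

```python
def enclosing_place(enclosed_by, year):
    """enclosed_by: list of (parent, start_year_or_None, end_year_or_None).

    Returns the parent that is valid in the given year, or None.
    """
    for parent, start, end in enclosed_by:
        if (start is None or start <= year) and (end is None or year <= end):
            return parent
    return None


# One village that moved from County A to County B in 1900.
village = [
    ("County A", None, 1899),
    ("County B", 1900, None),
]
print(enclosing_place(village, 1850))
print(enclosing_place(village, 1920))
```

Two imported records titled "Village, County A" and "Village, County B" can therefore be the same place at different times, so a title comparison alone cannot decide whether to merge them.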

system · July 9, 2024, 2:38pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
