When duplicate Persons are found using this tool, and the merge of the two profiles is accepted via this tool, the People window should then highlight the merged Person, and/or make that Person the Active Person. This would facilitate “cleanup” of the Events for the newly merged Person.
[I’m unsure how to create a Feature Request for this suggestion.]
Plugin name: Find Possible Duplicate People [Tool]
Id: dupfind Version: 1.0
Description: Searches the entire database, looking for individual entries that may represent the same person.
Filename: finddupes.py
Location: /home/districtsupport/.local/lib/python3.11/site-packages/gramps/plugins/tool
Authors: Donald N. Allingham
Email: don@gramps-project.org
Audience: Everyone Status: Stable
I think that a secondary object FindDupes dialog might be preferable.
Or maybe an option to copy to the clipboard a list of IDs for the Events (Citations, Notes) of the merged person (and Family events) for that person? The list of IDs can be pasted into the ID field of a (freshly reset/cleared) Events Filter Gramplet, with the RegEx option enabled. This would give a sortable Events view that would simplify finding mergeable Events.
Also, a list of Families of the potentially duplicate persons has potential, because a Family Merge can process 2 sets of 2 duplicate people at a time.
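To illustrate the ID-list idea, here is a minimal sketch of turning a list of Event IDs into one regular expression that could be pasted into an ID filter field. It assumes the filter accepts Python-style regular expressions; the IDs and the `ids_to_regex` helper are made up for the example and are not part of Gramps.

```python
import re

def ids_to_regex(ids):
    """Build one anchored alternation that matches exactly the given IDs."""
    # Drop repeated IDs but keep the original order.
    unique = list(dict.fromkeys(ids))
    # Escape each ID in case it contains regex metacharacters.
    return "^(" + "|".join(re.escape(i) for i in unique) + ")$"

# Hypothetical Event IDs collected from a merged person.
event_ids = ["E0012", "E0345", "E0012", "E1020"]
pattern = ids_to_regex(event_ids)
print(pattern)  # ^(E0012|E0345|E1020)$
```

The `^` and `$` anchors keep `E0345` from also matching longer IDs such as `E03450`.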
I imported the Sample.ged GEDCOM twice… creating exact duplicates of every person. (Found in the example parent folder as Example.gramps file, but the GEDCOM is much smaller … just 42 people.)
My expectation was that this doubled import would provide the tool with a large set of persons with VERY high correlation duplicates.
But the correlation values were all over the map. These were 100% identical people (except for IDs and handles) so why were there so many lower ranks in the Rating column of the matches?
The rankings seem to be related to the quantity of secondary objects. And the quality too. So 2 Persons of the same name with matching births dated “about 1492” will be low in both quantity and quality. (A name, gender, and a Vital Statistic event with iffy dates.) While the same Persons where the birth dates are “1492” will still be low quantity but a bit higher quality. And if those persons also have deaths dated 1538, that is a bit more quantity with reasonable quality.
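A toy model of that guess, to show how identical people could still get different ratings. This is not the actual scoring in finddupes.py; the `rate` and `date_score` functions, the weights, and the dict layout are all assumptions made up for illustration.

```python
def date_score(a, b):
    """Score one pair of date strings (hypothetical weights)."""
    if a is None or b is None:
        return 0.0   # nothing to compare, so nothing is added
    if a != b:
        return -1.0  # conflicting dates count against the match
    # An exact match on an approximate date is worth less.
    return 0.5 if a.startswith("about") else 1.0

def rate(p1, p2):
    """Sum the evidence that two person records are the same person."""
    score = 1.0 if p1["name"] == p2["name"] else 0.0
    for key in ("birth", "death"):
        score += date_score(p1.get(key), p2.get(key))
    return score

sparse_about = {"name": "John Smith", "birth": "about 1492"}
sparse_exact = {"name": "John Smith", "birth": "1492"}
richer = {"name": "John Smith", "birth": "1492", "death": "1538"}

# Each record is compared with an identical copy of itself.
for person in (sparse_about, sparse_exact, richer):
    print(rate(person, dict(person)))  # 1.5, then 2.0, then 3.0
```

In a model like this the rating measures how much matching evidence exists, not how similar the two records are, so 100% identical but sparse people rank low.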
The tool is not of much (any) use. When it merges a person it creates duplicate events like birth and death dates, and duplicate (identical) notes etc for the merged person. It does not understand that duplicate families are created, so you have to merge those manually anyhow. And some other problems also arise, but can’t recall all, gave up on the tool.
My solution to merge is to tag the people in the smaller file when importing it, colour them, and then manually go through them one by one.
It collates the Events (Notes, Sources, etc) from both persons. It also collates parent families and children.
All that redundant data already existed; new items are not created with a merge. But what it lacks is what FamilySearch has: a Primary Object merge dialog that allows secondary object merging options in the same dialog. (Their secondary merge has had problems that needed years to resolve. As an example, they JUST added an option to manually change the data in the Surviving secondary object without exiting the dialog. The previous version just offered to Replace or Reject. If each side had a piece of the data that the other lacked, you had to choose one or the other. And take notes about the missing part to enter manually post-merge.)
Maybe we are misunderstanding each other? I just tested: exported a small part of my tree as Gedcom, duplicated the file, imported both to a fresh Gramps DB, and used the merge tool to “merge” them. Each and every person has two identical notes when they had one. And two births, two deaths, two burials, two occupations, and two marriages (when there was only one). BTW, 16 married persons resulted in 43 families, probably as some have spouses not included in the sample, didn’t really check the mess.
Well, yeah that duplication is going to happen with GEDCOM version 5.5.1 and earlier.
That specification explicitly has no discriminating features using IDs. GEDCOM 7 has the potential for storing user IDs for that purpose.
And the Gramps XML can use a combination of Tree ID and Person ID to ensure uniqueness. So there is a Family Tree Processing → Import and Merge addon tool that requires both branches to be descended from the same parent tree. But importing GEDCOMs doesn’t create the same Tree ID.
And the newest Text Import allows overwrite updates of Vital Statistic data.
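The Tree ID plus Person ID point can be pictured with a small sketch. It only illustrates matching on a composite key and is not how the Import and Merge addon is implemented; the field names, the tree ID values, and the `match_by_origin` helper are assumptions made up for the example.

```python
def match_by_origin(existing, incoming):
    """Split incoming records into those with a known (tree, person) key and the rest."""
    index = {(p["tree_id"], p["person_id"]): p for p in existing}
    matched, unmatched = [], []
    for person in incoming:
        key = (person["tree_id"], person["person_id"])
        (matched if key in index else unmatched).append(person)
    return matched, unmatched

existing = [{"tree_id": "tree-A", "person_id": "I0001", "name": "John Smith"}]

# A branch exported from the same parent tree keeps the same tree ID.
from_same_tree = [{"tree_id": "tree-A", "person_id": "I0001", "name": "John Smith"}]
# A GEDCOM import is assumed here to arrive under a different tree ID.
from_gedcom = [{"tree_id": "tree-B", "person_id": "I0001", "name": "John Smith"}]

print(match_by_origin(existing, from_same_tree))  # the person lands in "matched"
print(match_by_origin(existing, from_gedcom))     # the person lands in "unmatched"
```

With a key like this, two records with identical content but different origins never pair up automatically, which is why the GEDCOM route still ends in manual merging.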
Unfortunately, Gedcom 5.x is still the de facto standard for genealogy; Gedcom 7 is rarely seen. What I don’t understand is how the limitations in Gedcom stop the merge tool in Gramps from merging a person’s (identical) birth date when he/she is selected for merging… Or, even better, two identical marriages, spouses etc., for that matter. If not identical, well, tag the person, so the needed manual check is easy to find.
As far as I’ve understood the Gramps XML way only works if both files (to be merged) originate from the same Gramps database. Or are you saying it works if you load a Gedcom 5.x, export it to a Gramps XML, and import that to another Gramps database?
Since 1999, GEDCOM has operated under the assumption that distinct person (individual) objects represent separate people, even if their sublevel contents are identical. This means:
Two separate Individual records could have 100% identical content (names, dates, events, etc.) but would still be considered different people in the GEDCOM structure.
The only way to indicate that two Individual records refer to the same person was through an explicit alias relationship that cross-references the two.
This approach has been a point of criticism in the genealogical community, as it can lead to duplication and complicate the process of identifying and merging records that actually refer to the same person.
GEDCOM 7, released more recently, has attempted to address this issue by providing alternatives to this strict separation. However, it’s worth noting that comprehensive guidelines for merging individuals across different genealogical software and databases have not yet been fully developed nor widely adopted within the GEDCOM standard itself.
The Isotammi team is testing a new version of the Family window which supports automatic child sorting and deleting duplicate children, and, in the Person window, automatic sorting of events. Together they greatly speed up cleaning merged families. Deleting similar events would demand complicated rules covering citations, notes, places, dates etc.
This sounds interesting. But the (superficially duplicate) children could so easily have unique (and hidden) secondary objects. Maybe a merge would be possible over a delete? (Not certain how that interface would work… since selecting 2 rows in the Family Editors is not normally possible. Delete is easy … it only requires selecting 1 row.)
Merging would maintain the historical ID (but not the handle) for future comparisons too.
This is not a Gedcom problem, but I suppose most imports are done with Gedcom.
The issue is exactly the same, if you create the two identical persons entirely within Gramps, no Gedcom involved at all, and then merge them. You get one person, with two (identical) birth events.
You may have two identical birth events, but one has a different note or attribute than the other, or different media.
This is why automatic merging is very difficult. So for me it must be a manual merge.
Please test it. Create two persons with identical name and identical birth date - nothing else. Merge them, and you have a person with two births.
If they have different notes or attributes then they are of course no longer identical. The merge tool is still a “manual” merge; it does not merge automatically, and each pair must be merged individually.
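The two positions can be combined in a small sketch: collapse events only when every field is identical, and flag the person for a manual check when two events look alike but differ in a note or other detail. This is not what the Gramps merge does today; the `Event` class, its fields, and `merge_events` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str
    date: str
    place: str = ""
    notes: tuple = ()

def merge_events(primary, secondary):
    """Collate two event lists, dropping only exact duplicates."""
    merged = list(primary)
    needs_review = False
    for ev in secondary:
        if ev in merged:
            continue  # every field is equal, so one copy is enough
        if any(e.kind == ev.kind and e.date == ev.date for e in merged):
            needs_review = True  # same type and date, but something else differs
        merged.append(ev)
    return merged, needs_review

plain = [Event("Birth", "1492")]
copy_of_plain = [Event("Birth", "1492")]
with_note = [Event("Birth", "1492", notes=("Baptism record",))]

print(merge_events(plain, copy_of_plain))  # one Birth, no review needed
print(merge_events(plain, with_note))      # two Births, flagged for review
```

The review flag could then drive a tag on the merged person, so the pairs that still need a manual look are easy to find.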