Importing GEDCOM from RootsMagic 9 yields lots of duplicates

Gramps 5.1.6

There are a couple of us collaborating with a much larger group of people in order to construct a massive family tree (eight generations). The one guy fields the requests from the family via RM 9 and then sends me an updated GEDCOM and any new images, which I then update into a Git repository along with the media itself and a Gramps backup. It’s been looking somewhat wonky in Gramps. Typically, there are a large number of duplicates. I didn’t realize until today that it seems like most/all of the records in the GEDCOM are perceived as duplicates.

image

There are also some weird relationships, such as I, Dustin, appearing to be my own stepmother.

image

The other guy is not so very technical, so I launched a Windows instance in AWS, installed RM 9, and loaded the GEDCOM that he’s been sending me there, and it appears to have loaded perfectly. So, clearly it deviates from standard GEDCOM features.

Does anyone have any experience or wisdom regarding this? I know that the compatibility chart seems to rate the level of compatibility with the last version of RM (8) as being disastrous. Can I include some excerpt of the GEDCOM data to help debug this? I’m not sure what I could paste that could be helpful.

I’m a fluent Python developer. I could write a routine to traverse the data using a library in order to investigate or extract certain chains of nodes if need be. Even though it seems like RM 9 is my only path forward at the moment, I’d like to be able to contribute some evidence/support on issues like this to help myself/others downstream.
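A minimal sketch of such a routine, using only the Python standard library rather than a GEDCOM package. It assumes the usual `level [@xref@] TAG [value]` line layout, and it treats two individuals as likely duplicates only when their first `NAME` and first birth `DATE` match exactly. The function names (`parse_lines`, `individuals`, `likely_duplicates`) are hypothetical, not part of Gramps or RootsMagic.

```python
import re
from collections import defaultdict

# One GEDCOM line: level, optional @xref@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def parse_lines(text):
    """Yield (level, xref, tag, value) for each well-formed GEDCOM line."""
    for raw in text.splitlines():
        match = LINE_RE.match(raw.lstrip("\ufeff"))  # tolerate a leading BOM
        if match:
            level, xref, tag, value = match.groups()
            yield int(level), xref, tag, value or ""


def individuals(text):
    """Return {xref: {"name": ..., "birth": ...}} for every INDI record."""
    people = {}
    current = None
    in_birth = False
    for level, xref, tag, value in parse_lines(text):
        if level == 0:
            # A new top-level record starts; track it only if it is a person.
            in_birth = False
            current = None
            if tag == "INDI" and xref:
                current = people.setdefault(xref, {"name": "", "birth": ""})
        elif current is not None:
            if level == 1:
                in_birth = tag == "BIRT"
                if tag == "NAME" and not current["name"]:
                    current["name"] = value.strip()
            elif level == 2 and in_birth and tag == "DATE" and not current["birth"]:
                current["birth"] = value.strip()
    return people


def likely_duplicates(people):
    """Group xrefs that share the same (name, birth date), ignoring case."""
    groups = defaultdict(list)
    for xref, person in people.items():
        key = (person["name"].lower(), person["birth"].lower())
        groups[key].append(xref)
    return {key: xrefs for key, xrefs in groups.items() if len(xrefs) > 1}
```

To run it against an export, read the file (the name `export.ged` is only a placeholder) with `open("export.ged", encoding="utf-8").read()` and pass the text to `likely_duplicates(individuals(text))`. Exact matching is deliberately strict; it will miss duplicates whose names or dates are spelled differently.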

Hello Dustin,

When you import data into an existing tree, persons are never merged automatically, not even when their IDs are the same. It makes no difference whether you import a GEDCOM file, or Gramps XML.

Also, when you export data from RM, 7, 8, or 9, you can select whether you want to include RM specific extensions, like mark-up in notes, and other things, like information about source templates, and their contents, and fact/event definitions. Switching those off, and maybe tasks too, makes the exported GEDCOM much cleaner.

I use RM 8 or 9 a lot to synchronize data with FamilySearch, or my tree on Ancestry, and whenever I know that the data from either has new information, like a new branch of my tree, or new data for existing family members, I import the RM GEDCOM into a new tree in Gramps, and make a selective export, in Gramps XML, from there, which I then integrate into my working tree, by importing the selected persons, and merging known duplicates, or let Gramps find them, in case I’m not sure.

Note that, unlike RM, when you merge persons, Gramps does not recognize duplicate events, even when for example birth dates and places are exactly the same. You have to eliminate duplicate events after merging, and make sure that you select the right one when you see conflicting data.

If Jessica is your wife, which is something I guess because she shows up as your stepfather, there might be a mix-up in the RootsMagic file.

I run RootsMagic in Wine, and may be able to help if you send the RM file to my email, which is my user name on this forum at Gmail dot com. That is probably better than sending an exported GEDCOM, because it gives me the chance to look at the original RM data, and check how they are exported when I use different options in RM.

I will remove your files after testing.

One of the things being added (to support GEDCOM7) is allowing internal IDs from sources outside of Gramps to be persistent after import. This should make it more viable to identify updates of records from the same sources to reduce duplicates.

The Import and Merge tool has been using the Gramps internal identifiers (object handles) in XML to identify the same records during Import merges. Gramps Web Sync currently synchronizes using the same method.

Okay. Confirmed. This is the first time that I’m sitting here, staring at the data, and identifying issues. I should’ve identified the duplicate thing in Gramps, earlier.

Regarding any RM9 export options, I had completely missed it:

image

It improved the number of errors (1349 → 762), though it appears to still use unsupported tags (disregard the image errors):

image
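One rough way to compare two exports is to count the custom tags in each. The sketch below assumes that vendor extensions follow the GEDCOM convention of starting with an underscore; it says nothing about standard tags that Gramps may also reject, so its totals will not necessarily match the importer's error count. `custom_tag_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Capture the tag of a GEDCOM line, skipping the level and any @xref@.
TAG_RE = re.compile(r"^\s*\d+\s+(?:@[^@]+@\s+)?(\S+)")


def custom_tag_counts(text):
    """Count underscore-prefixed (vendor-specific) tags in GEDCOM text."""
    counts = Counter()
    for raw in text.splitlines():
        match = TAG_RE.match(raw.lstrip("\ufeff"))  # tolerate a leading BOM
        if match and match.group(1).startswith("_"):
            counts[match.group(1)] += 1
    return counts
```

Calling `custom_tag_counts(text).most_common()` on the export made with and without the RM-specific options shows which extension tags remain after switching them off.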

The other guy mentioned that things somehow got screwed up and/or disappeared in the RM file after he made some recent changes. That might have been caused, or at least further exacerbated, by underlying data issues (unfortunately):

image

My initial attempt was via Wine, but the RM9 window appeared to stop rendering from the point that you select the GEDCOM.

Thanks for offering. I think we got a handle on the genesis of the issues.

I’m tempted to do grafting like you do, but I get worried about just making things more complicated when I have limited opportunity for additional overhead at this stage as I’m preparing for the reunion.

Do you know the status of that persisted-ID support? The mere fact that you know about it implies that it’s complete at some level (unless you’re the one working on it).

Thanks for mentioning the IM Tool. At least it works for me for my current tree for the simplest case (importing itself and not seeing any changes).

Paul ( @prculley ) has been the lead on GEDCOM and adding _UID (unique identifier custom GEDCOM tag) and would be the obvious person with whom to coordinate developing enhancements.

Take a look at :

  • PR#1005: Better support for GEDCOM _UID
  • PR#1000: Find possible duplicate people enhancements (_UID related)

References

Gramps wiki articles

MantisBT reports

  • 0009925: Support import and export of Custom [_APID] records to and from Ancestry.com generated GEDCOM files
  • 0008332: Provide an option to omit imported “3 _APID” Tags from Ancestry.com generated GEDCOMs
  • 0009249: GEDCOM import improvements to support Ancestry.com, FTM 2012/2014 and FTM for MAC 3
    : Notes include a patch for libgedcom.py
  • 0011116: Support import and export of Custom [_APID] records to and from Ancestry.com generated GEDCOM files

GitHub

  • PR#1000: Find possible duplicate people enhancements (_UID related)
  • PR#1002: GEDCOM export, add _UID when not present (superseded by PR#1005)
  • PR#1005: Better support for GEDCOM _UID
  • PR#1233: Add round trip Ancestry.com _APID tag support
    : Notes include an Ancestry gramplet
    : Notes include a revision of the GEDCOM export that supports the APID

Seems like an interesting read, not least to get some perspective on GEDCOM 7 and the issues we’ve been running into.

Seems like we’ve already tried to approach this from multiple angles. Without yet sitting down to read through the chain of PRs/issues, I’d guess that coming up with an optimal and maintainable flow might’ve been the desired outcome (though, given maybe the innate urgency of the G7 compatibility as well as the ID persistence, this must’ve really become a critical concern after some way down the path in the previous PRs)? At the risk of wasting your time, I’ll hopefully be able to find time to read through. I love storage structures.

Thanks for sharing.

From here: I suggest that you treat our PRs, and GEDCOM 7, as a red herring. And I say that, because the truth is out there, meaning that there are more than a dozen programs that can work with UIDs, and some are older than 20 years, like PAF. The article that Tamura Jones wrote about the subject is more than 10 years old, and still valid:

This does not mean that there’s something fundamentally wrong with these PRs, our discussions, or GEDCOM 7, but I don’t expect any of these to solve your problem in a reasonable time, and you have a reunion coming.

And for that I recommend that you forget Gramps for a while, and concentrate on the merging options in RootsMagic, which are listed here:

http://wiki.rootsmagic.com/wiki/RootsMagic_8:Merging_Duplicate_People

Automatic Merges are only available for registered users, but they look quite powerful, and the ShareMerge feature seems to be _UID based. And if the persons that you found as duplicates are all copies of one person created in one program some time ago, they all have the _UID representing that person’s creation, so that ShareMerge is possible, for all persons that have matching _UIDs and no conflicting data. Good old PAF can merge on _UIDs too, but you probably don’t want to go back to that, for a couple of reasons. I used that automatic merge by _UID years ago, and I know that it’s fast.
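To check whether the duplicates in an export really do share _UIDs, something like the following sketch can group individuals by that tag. It is not how RootsMagic or PAF implement their merges. It assumes `_UID` appears as a level-1 line under each `INDI` record and compares the whole value case-insensitively; `group_by_uid` is a hypothetical name.

```python
import re
from collections import defaultdict

# One GEDCOM line: level, optional @xref@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def group_by_uid(text):
    """Return {uid: [xref, ...]} for _UID values shared by 2+ individuals."""
    groups = defaultdict(list)
    current = None  # xref of the INDI record being read, if any
    for raw in text.splitlines():
        match = LINE_RE.match(raw.lstrip("\ufeff"))  # tolerate a leading BOM
        if not match:
            continue
        level = int(match.group(1))
        xref, tag = match.group(2), match.group(3)
        value = (match.group(4) or "").strip()
        if level == 0:
            current = xref if tag == "INDI" else None
        elif level == 1 and tag == "_UID" and current and value:
            uid = value.upper()
            if current not in groups[uid]:  # ignore a repeated _UID line
                groups[uid].append(current)
    return {uid: xrefs for uid, xrefs in groups.items() if len(xrefs) > 1}
```

An empty result would suggest that the duplicates were created independently, or that the export omits _UIDs, in which case a _UID-based merge has nothing to match on.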

The duplicate search merge is available in the free version too, and it’s better and faster than what we have in Gramps. I call it better, because it detects identical data, shown in green, so it doesn’t create useless duplicate events like Gramps currently does.

Marking persons as non-duplicates is a paid feature too, but it can be worth your money.

As a developer, I know that you might lose some data exporting from Gramps to RM, like shared events, but for this task, I think that they are minor issues.

It’s great that you have such interests! I’m a kludgy programmer but have been trying to learn Python for Gramps. Like when I started as a New User, the docs are both overwhelming (disorganized & spread across multiple systems) and underwhelming (spotty with no ‘onboarding’ framework and not written in introductory terms). Since I learn best by documenting, I expect my progress will be glacial… because I’ll have to write the docs I would need to get me started.

You probably want to look at the notes in this MantisBT report too:

0012226: [GEDCOM 7]Support Import & Export of New (June 2021) version

RM 9 can run in Wine, but it needs a few tweaks, so if you don’t have time for those, yet, I suggest that you run it in Windows.

Untangling the family relations in your screenshot is easy, but also quite different from how we do that in Gramps. In this screen, you can use the context menu (right click) to unlink yourself from your parents, your spouse, and (other) family members, and then reassemble your family as it should be. And it may be better to remove the duplicates first, i.e. before reassembling.

I just looked at the GEDCOM 7 implementation status page on the FamilySearch site, and when I read that, it’s quite clear that the big players are not really interested. Half of the companies that are listed as being committed have status TBD, and popular Windows programs like Ancestral Quest, Family Tree Maker, and Legacy (now owned by MyHeritage) are missing completely. And although there seem to be more than a dozen Germans that support GEDCOM 7, one of those Germans is actually Dutch, and most of their programs most probably don’t have a big following outside their respective countries. And in fact, the fellow student that built Pro-Gen has not even started.

This means that, in practice, the perspective is nil, and I suggest that you don’t hold your breath. The current _UID as described by Tamura has been the de facto standard for years, and it’s easy to implement that, and remove the underscore and some other peculiarities later.

@dsoprea Dustin,
Did you try the Isotammi Multimerge addon gramplet? It has some automatic duplicate merging of identical records.

Isotammi tools can’t find identical persons.

That’s quite right. Sorry.