Leveraging external Tools

emyoulation · July 8, 2024, 11:23am

A thought occurred to me while reading the following support request: matching Handles generated for Gramps objects are critical to synchronization tools like Import Merge and possibly GrampsWeb Sync.

If the GEDCOM import seeded the Gramps Handle ID generator based on the unique identifier in the GEDCOM file instead of the record creation date, could Gramps take better advantage of other Tools features (like hints or sync with FamilySearch) by building consistent Handles on subsequent imports to fresh trees?

ennoborg · July 8, 2024, 12:19pm

The only ID that makes sense is the _UID that was introduced with PAF, and adopted by most other programs. PAF has an automatic merge on that, meaning that it can do bulk merge for persons that have the same _UID, names, and vitals.

The UID introduced in GEDCOM 7 is useless, because the big players don’t support that, and have no reason to introduce it either, which means that the article that Tamura Jones wrote about that is still valid:

Davesellers · July 8, 2024, 1:26pm

How much work would this be to implement? Gramps could then be added to the list. It would be a good start to enhance our merging capabilities.

RokeJulianLockhart · July 8, 2024, 1:47pm

ennoborg:

The only ID that makes sense is the _UID that was introduced with PAF.
ennoborg:

The UID introduced in GEDCOM 7 is useless, because the big players don’t support that.

@ennoborg, since UID is vendor-agnostic (albeit unsupported by most), would it be problematic to export both a UID and a _UID for each entry?

I ask because it doesn’t seem like _UID is supported correctly across vendors either, by your own account:

…I suppose with a multitude of entries, a redundant key would substantially increase the size of the dataset, maybe?

ennoborg · July 8, 2024, 3:14pm

We have an old PR, created by Paul Culley, 4 years ago:

github.com/gramps-project/gramps: Better support for GEDCOM _UID

gramps-project:master ← prculley:uid2, opened by prculley on 22 Jan 2020, 05:49 PM UTC (+4876 -3272)

As a result of Nick's comments in #1002, I closed that PR and #1000 to create this more comprehensive PR.

- This adds a uid_list to both Person and Family objects. A single uid is attached to the object at creation (which required me to replace it on import to avoid an unnecessary extra uid).
- As a result, it requires a database upgrade, and the upgrade code is included. If no uids were found in attributes, a single uid is added to Persons and Families.
- XML import/export and schema have been updated to deal with the new uid lists.
- The XML import moves any _UID attributes found in Persons or Families to the new uid list.
- GEDCOM import/export also now stores _UID tags in the uid list and exports from there.
- The FindDups tool now uses the uid list to match up people with very high confidence.
- The FindDups tool also has a new "Very High" threshold level (the "Very High" was already in the gramps.po). This higher level will skip the more time-consuming second pass of the tool if persons matching via uid are found.
- Carefully updated the import/export tests.

Comments: When attempting to merge in a GEDCOM that contained _UID for the people, with a tree that already had some of the same people, I noted that our 'Find Possible Duplicate People' was ignoring the _UID entries. The tree also happened to have a set of actual duplicates which I could not get to show up. On investigation, I saw that the tool did not display all possible duplicates for a single person, only the one it thought had the highest score. And it did not look at _UID attributes at all.

This PR contains several commits:

- a change to GEDCOM import to remove the find_family_from_handle method; this was required because it was messing up the GEDCOM import by adding an additional uid.
- the basic UID support
- a pylint on finddups
- the finddups patch to allow all duplicate pairs to show up
- the finddups patch to allow matching uid on persons to give the highest score (10).
**Note:** this PR is based on top of the #1010 database upgrade PR, as that is necessary to implement the upgrade. When looking at code, look at the individual commits.

It has conflicts with master, and is still a work in progress. You can read a lot of comments in that PR, which were written in an era when people thought that GEDCOM 7 would be the new standard, which we now know didn’t really materialize.

We also have support for importing _UIDs as attributes, based on a small change that I made when I migrated from PAF to Gramps. I did the same for FamilySearch IDs (_FSFTID) later, when I found those in GEDCOMs created by Ancestral Quest and RootsMagic.

These attributes are exported too, so they are somewhat persistent. This makes it easy for me to work with these programs, and with Legacy 10, which is completely free, most likely because Legacy was bought by My Heritage, and they make enough money with their site.

I also made a small tool that scans all persons in my tree, and adds _UID attributes in PAF format to all persons that don’t have them yet. I have that in the gramps51 branch of my fork, but part of the code was made with the help of ChatGPT, so it can’t be integrated with the main repo.
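
A tool along these lines could look roughly like the sketch below. It is not Enno's code: persons are modelled as plain dictionaries with a hypothetical `attributes` list instead of going through the Gramps database API, and the checksum is assumed to be PAF's two running byte sums (each modulo 256). That assumed checksum does reproduce the `AED3` suffix of the _UID exported from Gramps later in this thread.

```python
import uuid


def paf_checksum(hex32: str) -> str:
    """Return the 4-digit PAF checksum for a 32-digit hex UUID string.

    Assumed algorithm: two running sums over the 16 UUID bytes,
    each reduced modulo 256 and written as two hex digits.
    """
    sum_a = sum_b = 0
    for byte in bytes.fromhex(hex32):
        sum_a += byte
        sum_b += sum_a
    return f"{sum_a % 256:02X}{sum_b % 256:02X}"


def new_paf_uid() -> str:
    """Create a new PAF-style _UID: a random UUID plus its checksum (36 digits)."""
    hex32 = uuid.uuid4().hex.upper()
    return hex32 + paf_checksum(hex32)


def add_missing_uids(persons: list) -> int:
    """Add a _UID attribute to every person that has none; return how many changed.

    Hypothetical data model: each person is a dict whose "attributes" entry
    is a list of (type, value) tuples, standing in for Gramps attributes.
    """
    changed = 0
    for person in persons:
        attributes = person.setdefault("attributes", [])
        if not any(attr_type == "_UID" for attr_type, _ in attributes):
            attributes.append(("_UID", new_paf_uid()))
            changed += 1
    return changed


# Usage: the first person keeps the _UID it already has, the second gets a new one.
people = [
    {"name": "William Trye",
     "attributes": [("_UID", "B9F4535B0AE65341B28116E843B8F3B0AED3")]},
    {"name": "Someone new"},
]
print(add_missing_uids(people))  # prints 1
```

Existing _UIDs are never touched, which matters because their value lies in staying the same across programs.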

When persons are merged in Gramps, all attributes are copied from the original persons, so unlike in the other programs, persons can have more than one _UID and _FSFTID.

I did not invest time in checking checksums, nor in a test for duplicates, but both are easy to add.
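
A duplicate test could be as small as the following sketch. It assumes a hypothetical mapping from person ID to that person's _UID values, and compares only the first 32 hex digits, so that a PAF _UID with checksum and the same UID without checksum count as one value.

```python
from collections import defaultdict


def uid_key(uid: str) -> str:
    """Reduce a UID to its first 32 hex digits, upper case, hyphens removed.

    A 36-digit PAF _UID and the same UID without checksum then compare equal.
    """
    return uid.strip().replace("-", "").upper()[:32]


def find_duplicate_uids(uids_by_person: dict) -> dict:
    """Return {uid_key: sorted person IDs} for each UID held by more than one person.

    `uids_by_person` is a hypothetical mapping: person ID -> list of UID strings.
    """
    owners = defaultdict(set)
    for person_id, uids in uids_by_person.items():
        for uid in uids:
            owners[uid_key(uid)].add(person_id)
    return {key: sorted(ids) for key, ids in owners.items() if len(ids) > 1}


tree = {
    "I0001": ["B9F4535B0AE65341B28116E843B8F3B0AED3"],
    "I0002": ["b9f4535b0ae65341b28116e843b8f3b0"],  # same UID, no checksum
    "I0003": ["0123456789ABCDEF0123456789ABCDEF"],
}
print(find_duplicate_uids(tree))
# {'B9F4535B0AE65341B28116E843B8F3B0': ['I0001', 'I0002']}
```

A person holding two copies of the same UID is not reported, because each owner is counted once per key.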

Nick-Hall · July 8, 2024, 3:26pm

The PAF _UID tag consists of a UUID plus a checksum formatted as a 36 digit hexadecimal string. The advice given by Tamura Jones is: “Applications should reject all _UID values with invalid checksums, and all _UID values containing non-hexadecimal characters.”

Prior to the publication of Gedcom 7.0 this was the approach that I favoured, except that the checksum would be generated as required.

The Gedcom 7.0 UID tag is similar, except that existing UID values are used without modification and new ones are generated according to RFC 4122, without checksums.

Where a UID can be interpreted as a UUID, I don’t see a reason why we can’t let the user choose the export format.
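
Both rules can be expressed in a few lines of Python. The sketch below is illustrative, not Gramps code: `is_valid_paf_uid` applies the advice quoted above, `as_uuid` accepts a PAF _UID, a bare 32-digit UID or a hyphenated RFC 4122 string, and `export_uid` writes the value in whichever style the user picked. The style names `"gedcom7"` and `"paf"` are made up for this example, and the checksum is assumed to be PAF's two running byte sums.

```python
import string
import uuid
from typing import Optional

HEX_DIGITS = set(string.hexdigits)


def paf_checksum(hex32: str) -> str:
    """Assumed PAF checksum: two running byte sums modulo 256, as 4 hex digits."""
    sum_a = sum_b = 0
    for byte in bytes.fromhex(hex32):
        sum_a += byte
        sum_b += sum_a
    return f"{sum_a % 256:02X}{sum_b % 256:02X}"


def is_valid_paf_uid(value: str) -> bool:
    """Accept only 36 hex digits whose last 4 are the checksum of the first 32."""
    if len(value) != 36 or not set(value) <= HEX_DIGITS:
        return False
    return paf_checksum(value[:32]) == value[32:].upper()


def as_uuid(value: str) -> Optional[uuid.UUID]:
    """Interpret a stored UID as a UUID, or return None if that is not possible.

    Accepts a valid PAF _UID (the checksum is dropped), a bare 32-digit hex
    UID, or an RFC 4122 string with hyphens.
    """
    value = value.strip()
    if len(value) == 36 and "-" not in value:
        if not is_valid_paf_uid(value):
            return None
        value = value[:32]
    try:
        return uuid.UUID(value)
    except ValueError:
        return None


def export_uid(value: str, style: str) -> Optional[str]:
    """Write a UID in the chosen (hypothetical) export style: "gedcom7" or "paf"."""
    parsed = as_uuid(value)
    if parsed is None:
        return None  # not interpretable as a UUID: leave it to the caller
    if style == "gedcom7":
        return str(parsed)  # RFC 4122 string form: lower case, with hyphens
    if style == "paf":
        hex32 = parsed.hex.upper()
        return hex32 + paf_checksum(hex32)
    raise ValueError(f"unknown export style: {style!r}")


print(export_uid("B9F4535B0AE65341B28116E843B8F3B0AED3", "gedcom7"))
# b9f4535b-0ae6-5341-b281-16e843b8f3b0
```

Values that cannot be read as a UUID return `None`, so a real exporter would still need a policy for those, such as writing them unchanged.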

Davesellers · July 8, 2024, 4:22pm

How would a merge be done if Gramps had the _UID implemented?
I have 17,000 persons in Ancestry. 7,000 of these are in Gramps.
I could install RootsMagic with the 17,000 persons loaded.
Would I have to use Enno’s program to add the _UID to Gramps before doing an import/merge with the records from RootsMagic? Would Gramps be able to recognize the duplicates and take the appropriate action to merge?

ennoborg · July 8, 2024, 5:30pm

It would, if it were redundant, in the sense of real UIDs vs PAF _UIDs, but I see no reason to store duplicates anyway, if the UID part is the same. And with that UID part, I mean the 1st 32 characters of a PAF _UID.

And when persons are merged, and end up with two or more _UIDs, they’re not the same, and hence not redundant either, because they show a piece of the history of that person. And when persons are merged, Gramps adds an attribute showing the Gramps/GEDCOM ID of the merged person anyway.

I just made a small export from a free My Heritage account, and see that the _UIDs in that GEDCOM have a length of 32 characters, meaning that they probably are UIDs without checksum. And my advice would be to store them anyway, because they are unique.

ennoborg · July 8, 2024, 5:51pm

No, it wouldn’t help, because the _UID is not a fingerprint, but a random number which is supposed to be unique, worldwide. And because of that, adding _UIDs to persons that don’t have them will not result in _UIDs that match the ones generated by Ancestry or My Heritage, or RootsMagic.

UIDs are normally created with the object that they are attached to, just like Gramps handles, but unlike the latter, they are supposed to be persistent. That means that when another program reads a UID, it will store it, and not overwrite it with its own, so that it is attached to the person forever, as long as you don’t re-enter that person’s data as a new person instead of importing it. That new person would get another UID.

Gramps handles are based on uuids, but have to be unique in any database. And that means that, when you import a Gramps XML into a tree that has the same objects already, the handles are changed to make sure that they’re unique. Their UIDs will then be the same.
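
The behaviour described here (handle replaced on a collision, UID carried over unchanged) can be sketched with a dictionary standing in for the database. `import_person` is hypothetical, and `uuid4().hex` is only a placeholder for whatever handle format Gramps actually generates.

```python
import uuid


def import_person(db: dict, person: dict) -> str:
    """Store `person` in `db` (handle -> person) and return the handle used.

    Hypothetical model of the behaviour described above: if the incoming
    handle already exists, the copy gets a fresh handle, while its "_UID"
    value is copied unchanged.
    """
    handle = person["handle"]
    if handle in db:
        # Placeholder handle; Gramps uses its own handle format.
        handle = uuid.uuid4().hex
        person = {**person, "handle": handle}
    db[handle] = person
    return handle


db = {}
person = {"handle": "a1b2c3", "name": "William Trye",
          "_UID": "B9F4535B0AE65341B28116E843B8F3B0AED3"}
first = import_person(db, person)
second = import_person(db, person)  # same data imported a second time
print(first == second)                          # False: handles differ
print(db[first]["_UID"] == db[second]["_UID"])  # True: the UID persists
```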

A few months ago, I found that I had a duplicate UID in Gramps, for persons with different names and vitals. And that happened, because I changed the name of a spouse of a person, after finding new sources, and later merged some data from a tree with the old name. This happened, because the person had two spouses, and had children with both. Gramps never warned me about this, because I never wrote the code for that, but it caused a synchronization problem between the My Heritage desktop program and their site. And when I contacted their help desk about that, which I needed to do because their error message was too vague, I was told that the cause might be a duplicate UID, which it was indeed, for that other spouse.

Davesellers · July 8, 2024, 9:17pm

Thinking on this subject more, it seems there is no solution to the problem.
If everyone has a unique ID like a Social Insurance Number (SIN) then things could be brought together under that ID ( central authority assigning the ID), however our old relatives never had these and even today, for privacy reasons, these are included on few records.
The _UID is only unique to your own family tree in the one application, since each application creates its own. As Enno said, the application would have to store each UID to keep track of them by application.
It would help you move your tree between applications or make additions from another source, but would still require intervention until a UID has been established.
No magic solution.

Nick-Hall · July 8, 2024, 9:31pm

Yes. The Gedcom specification allows many UID tags per object. We could include a list of UIDs in all our primary objects. That would be quite easy.

Complications arise when people are merged and split apart again.
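
A list of UIDs per object, and what a merge does with it, might look like this sketch. `Person` and `merge_persons` are hypothetical, not the classes from the PR mentioned earlier. The union keeps every distinct UID, which also shows why a later split is hard: nothing records which UID came from which original person.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical primary object with a list of UIDs instead of a single one."""
    handle: str
    name: str
    uid_list: list[str] = field(default_factory=list)


def merge_persons(primary: Person, secondary: Person) -> Person:
    """Merge two persons, keeping every distinct UID (primary's UIDs first).

    After this, nothing records which UID came from which original person,
    so splitting the merged person again cannot restore them reliably.
    """
    uids = list(primary.uid_list)
    for uid in secondary.uid_list:
        if uid not in uids:
            uids.append(uid)
    return Person(primary.handle, primary.name, uids)


a = Person("h1", "William Trye", ["B9F4535B0AE65341B28116E843B8F3B0AED3"])
b = Person("h2", "Wm. Trye", ["0123456789ABCDEF0123456789ABCDEF"])
merged = merge_persons(a, b)
print(merged.uid_list)
```

Keeping the split possible would need extra data, for example a per-UID note of the handle it arrived with.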

ennoborg · July 9, 2024, 10:00pm

UIDs are only created when you create a new person, or import one that doesn’t have one, which means that most of the UIDs in my database were created when I migrated the tree made by my parents from Brother’s Keeper to PAF, somewhere around 2004.

When I moved to Gramps, in 2010, I made sure that it imported all UIDs made by PAF, even though Gramps didn’t actually use them. And that means that most of the UIDs that I now have in my tree are about 20 years old, and stable.

When I import new branches from FamilySearch, with Ancestral Quest or RootsMagic, Gramps imports the UIDs assigned by those programs, and the same goes for persons added on-line on Ancestry, mostly from hints. They get a UID created by Ancestry, which also follows the same industry standard, adopted from PAF. This means that in my tree on Ancestry, most UIDs are the ones created by PAF, in 2004.

This means that, if you don’t merge persons, each one has only one UID, and that’s the one that’s transferred through GEDCOM from one program to another. This also means that you don’t have a UID per application, but that you do have one per person, and tree author.

When you merge persons, most programs delete one UID, because their data model doesn’t accept more than one. And the fact that Gramps doesn’t do that is just a consequence of the way that Gramps deals with attributes, where all UIDs are stored. And in most cases, I remove one of these afterwards, because I know that most programs can’t deal with more than one, or remove one on import without telling me, which means that in my tree, it’s still one man, one UID. And by sticking to one, I know that I can follow the new composite person everywhere, because all programs that I use import that one UID. And that includes Gramps.

In a way this means that I’m the authority assigning UIDs to the persons in my tree. And when I’m careful, and don’t create the same person in two different programs, like on Ancestry and in Gramps, each person has only one UID, and no more.

And that means that, when I export this person from Gramps

0 @I0000@ INDI
1 NAME William /Trye/
2 GIVN William
2 SURN Trye
1 SEX M
1 BIRT
2 DATE 1578
2 PLAC Hardwicke, Gloucestershire, England, United Kingdom
1 DEAT
2 DATE 13 MAR 1609
2 PLAC Gloucester, Gloucestershire, England, United Kingdom
1 _UID B9F4535B0AE65341B28116E843B8F3B0AED3
1 CHAN
2 DATE 14 JUL 2023
3 TIME 21:34:58

and later export him from Ancestry

0 @I122582933599@ INDI
1 NAME William /Trye/
2 GIVN William
2 SURN Trye
1 SEX M
1 BIRT
2 DATE 1578
2 PLAC Hardwicke, Gloucestershire, England, United Kingdom
1 DEAT
2 DATE 13 Mar 1609
2 PLAC Gloucester, Gloucestershire, England, United Kingdom
1 UID B9F4535B0AE65341B28116E843B8F3B0
1 NOTE Ontleend aan “Mormonen”

he can be automatically merged by a program like PAF. I tested that today, and it still works, when I replace the UID tag in Ancestry’s GEDCOM by _UID. And this works even without the checksum, which was apparently removed by Ancestry.

PAF can merge these persons automatically, because they have the same UID, name and vitals, when compared by date and place title. And that means that in a scenario where you upload your tree to Ancestry, add a few dozen persons from hints, and change some for whom you find new sources, and then download it again, you can focus on the 1 % that can’t be merged by UID, and that’s a huge time saver, all made possible by that one UID.
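
The matching rule can be approximated as follows. This is a rough sketch of the rule as described in this post, not PAF's actual algorithm: `parse_indi` only understands the flat layout of the two records above, UIDs are compared on their first 32 hex digits (so Ancestry's UID without checksum matches the Gramps _UID), and dates and places are compared case-insensitively so that `13 Mar 1609` matches `13 MAR 1609`.

```python
VITALS = [("BIRT", "DATE"), ("BIRT", "PLAC"), ("DEAT", "DATE"), ("DEAT", "PLAC")]


def parse_indi(lines: list) -> dict:
    """Read NAME, UID/_UID and birth/death DATE and PLAC from one INDI record.

    Only understands the flat layout of the two example records; it is not
    a general GEDCOM parser.
    """
    record = {}
    event = None
    for line in lines:
        if not line.strip():
            continue
        parts = line.strip().split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 1:
            event = tag if tag in ("BIRT", "DEAT") else None
            if tag == "NAME":
                record["name"] = value
            elif tag in ("UID", "_UID"):
                record["uid"] = value
        elif level == 2 and event and tag in ("DATE", "PLAC"):
            record[(event, tag)] = value
    return record


def same_person(a: dict, b: dict) -> bool:
    """Rough version of the rule described: same UID part, name and vitals."""
    uid_a, uid_b = a.get("uid", ""), b.get("uid", "")
    if not uid_a or uid_a.upper()[:32] != uid_b.upper()[:32]:
        return False
    if a.get("name") != b.get("name"):
        return False
    # Case-insensitive, so Ancestry's "13 Mar 1609" matches "13 MAR 1609".
    return all(a.get(k, "").upper() == b.get(k, "").upper() for k in VITALS)


gramps = parse_indi("""0 @I0000@ INDI
1 NAME William /Trye/
1 BIRT
2 DATE 1578
2 PLAC Hardwicke, Gloucestershire, England, United Kingdom
1 DEAT
2 DATE 13 MAR 1609
2 PLAC Gloucester, Gloucestershire, England, United Kingdom
1 _UID B9F4535B0AE65341B28116E843B8F3B0AED3""".splitlines())
ancestry = parse_indi("""0 @I122582933599@ INDI
1 NAME William /Trye/
1 BIRT
2 DATE 1578
2 PLAC Hardwicke, Gloucestershire, England, United Kingdom
1 DEAT
2 DATE 13 Mar 1609
2 PLAC Gloucester, Gloucestershire, England, United Kingdom
1 UID B9F4535B0AE65341B28116E843B8F3B0""".splitlines())
print(same_person(gramps, ancestry))  # True
```

Anything that fails this strict test would fall into the small remainder that needs a manual look.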

ennoborg · July 9, 2024, 10:03pm

What does this mean, historically speaking? Is this a tree that you started in Gramps, and expanded on-line? Did you add persons on-line, and in Gramps? Are there persons in Gramps that don’t exist on Ancestry?

Davesellers · July 9, 2024, 11:04pm

We started the research in 1973 on a Commodore 64. It has migrated over time from one program to another, and when we adopted Gramps, we imported about 1500 persons. Those people were uploaded to Ancestry and Find My Past to find hints. New people were added to Gramps with all the sources/citations. About 85% of those in Gramps have now been verified, which I show as complete. I did start a tree on FMP but have given up on it because it was too much work maintaining three trees. So the biggest tree is on Ancestry, because it is very easy just to check a mark and add them to your tree. However, fewer than 7,000 of the 17,000 have been checked, so the error rate on Ancestry is likely very high.
Yes, they are likely part of the family, but I find little interest in making copies in Gramps for a 6th cousin, 6 times removed.
I’m more focused on using the End of Line Report and trying to fill in those connections. I have traced a line back to the early 1600s.
I do download a copy of the Ancestry tree monthly, and import it to Gramps. This is just to make a backup.
We have a cousin in New Zealand who has a tree of 25,000 persons. We communicate on a regular basis and check each other’s work. Good to have more eyes on what you do. We all make mistakes.
We have a common link with the Hitchcox family line and we have focused a lot of time on it. Using the map in Gramps you can see they migrated from England to New Zealand, Canada and the USA.
I can’t start using the Ancestry tree in Gramps because there is so much wrong with it. I have put a lot of time into standardizing my workflow to work with Ancestry, FMP and FamilySearch.
The Ancestry persons aren’t much good unless you have verified the data by going through all the hints, adding FMP and FamilySearch data that Ancestry doesn’t have.
So if I merged the Ancestry persons into Gramps, 10,000 would not be verified or have the hints attached, so it would be a useless task.
I can add and verify about 100 persons per month using my current process.
I have never used RootsMagic, so I was wondering whether transferring the data from Ancestry via RootsMagic to Gramps was a possibility to speed up the workflow.
Dave

ennoborg · July 10, 2024, 9:30pm

It is, for a couple of reasons. One is, that RootsMagic can download your tree from Ancestry with media, so that you don’t need the download media tool in Gramps. And another is, that it writes better GEDCOM files than Ancestry itself, with proper _UID tags, and better month names too.

The main reason why I use it, as an external tool, is that it can download data from the shared tree on FamilySearch, and upload too, one person at a time. And once persons are synced with that tree, you will also see hints about new sources that can be attached to those persons, and by doing that, you can contribute to that shared tree. And it can also show hints for FMP and My Heritage.

This means that, even if you don’t export data from RM to Gramps, via GEDCOM, you can use the program as an external research hub, that gives you access to all 4 sites in one place. And if 4 sites is too much, you just switch off hints for one, like My Heritage.

Other advantages are that RM has problem indicators in most screens, a fast duplicate person finder, with a merge feature that’s smarter than the one in Gramps, and relationship indicators. And these things are all available in the free version.

If you buy the program, it can also synchronize data with your Ancestry tree, so that you don’t have to download that copy once a month, although it still might be safe to do so. And synchronizing means that it can also upload changes to Ancestry, if you want.

For me, the FamilySearch features are the most important, even though I have a tree on Ancestry too. And for you, I think it’s the idea of a hub that connects you to the sites that you mentioned.

I have a copy of my tree in RM 9.1.1, running in Wine in Linux Mint, so that I can always look for hints on FamilySearch, and check whether they have sources or new data for my end-of-line persons, which are all synced with FamilySearch. Seeing hints for those, even if it’s just for one site, makes life much easier than visiting the site via Firefox, and that advantage will of course increase if you switch on hints for Ancestry and FMP.

And last but not least, because it’s external, it can also act as a second ‘screen’.

ennoborg · July 10, 2024, 9:49pm

No. Handles and GEDCOM/Gramps IDs are not persistent, in the sense that they need to be changed if you import a person with a handle and/or GEDCOM ID that already exists in the database. And that’s because both handles and GEDCOM IDs need to be unique for database integrity, internal and external.

Using external tools means that we need to store the IDs that they use as attributes, or in a special structure, like the IDENTIFIER_STRUCTURE defined in GEDCOM 7, which can store a REFN, UID, and EXID, where the latter can store existing IDs like the old AFN, RIN, and the IDs used by FamilySearch, including the _FSFTID that you can see in GEDCOMs created by FS compliant programs.
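
As an illustration, stored attributes could be mapped onto those GEDCOM 7 identifier lines roughly as in this sketch. The attribute layout is hypothetical, `_UID` values are passed through unchanged, and the EXID `TYPE` URI used for FamilySearch IDs is an assumption that should be checked against the GEDCOM 7 EXID type registry before use. The FamilySearch ID in the example call is made up.

```python
# Assumption: verify this EXID TYPE URI against the GEDCOM 7 EXID registry.
FSFTID_TYPE = "https://www.familysearch.org/tree/person/"


def identifier_lines(attributes: list, level: int = 1) -> list:
    """Turn (type, value) attributes into GEDCOM 7 identifier lines.

    Hypothetical mapping: _UID -> UID (value unchanged), _FSFTID -> EXID with
    a TYPE URI, REFN -> REFN. Other attribute types are ignored.
    """
    lines = []
    for attr_type, value in attributes:
        if attr_type == "_UID":
            lines.append(f"{level} UID {value}")
        elif attr_type == "_FSFTID":
            lines.append(f"{level} EXID {value}")
            lines.append(f"{level + 1} TYPE {FSFTID_TYPE}")
        elif attr_type == "REFN":
            lines.append(f"{level} REFN {value}")
    return lines


for line in identifier_lines([("_UID", "B9F4535B0AE65341B28116E843B8F3B0AED3"),
                              ("_FSFTID", "ABCD-123")]):  # made-up FamilySearch ID
    print(line)
```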

emyoulation · July 10, 2024, 11:47pm

I agree that Gramps IDs are largely irrelevant, except in the Text Import, where square brackets in the CSV import can force matching, a fractured form of merge functionality; and where the explicit Gramps ID (without square brackets) and handle in the source data are given preference when there’s no conflict.

However, the Gramps Handle is a different animal. It has a lot more significance for merge control than the user-modifiable Gramps ID. Looking at an old Dev Maillist thread, coincidental handle collisions were expected to be almost unheard of, so a collision could be expected to be a strong indicator that a merge was intended. (Discounting the likelihood of malicious handle duplication.)

So what I’m saying is: if 4 distant cousins (say, 10th cousins) downloaded their direct pedigrees from FamilySearch and imported them into Gramps, they would probably share only 2 of their 2,048 9th great-grandparents.

But if those 2 relatives in each tree generated a consistent Gramps handle based on the xxxx-xxx profile ID in FamilySearch instead of being based on the creation timestamp, then those 4 cousins could export a Gramps XML of their pedigrees and have a reasonable expectation that an Import Merge into a blank Tree could join the 6 pedigrees automagically… even if the pedigrees from FamilySearch were downloaded months (or years) apart.
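
The proposal amounts to a name-based UUID: hash a fixed namespace together with the FamilySearch ID, so every user gets the same handle for the same profile. Below is a minimal sketch using Python's `uuid.uuid5`; the namespace value is an arbitrary assumption, and the sketch does nothing about the collision problem raised in the reply that follows.

```python
import uuid

# Assumed namespace: any fixed UUID works, as long as every user uses the same one.
FAMILYSEARCH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://www.familysearch.org/")


def handle_from_fsftid(fsftid: str) -> str:
    """Derive a repeatable 32-hex-digit handle from a FamilySearch profile ID.

    uuid5 hashes the namespace and the name, so the same ID gives the same
    handle on any machine, however far apart the downloads were made.
    """
    return uuid.uuid5(FAMILYSEARCH_NAMESPACE, fsftid.strip().upper()).hex


print(handle_from_fsftid("abcd-123") == handle_from_fsftid("ABCD-123"))  # True
```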

ennoborg · July 11, 2024, 3:43pm

OK, I get that, but the problem lies somewhere else, because when you use external tools, you can’t rely on handles for joins, or use import merge, because you’re dealing with GEDCOM files, which have no such handles. Import merge is nice when you’re dealing with several generations of the same tree, when it’s available in Gramps (XML) format, but it’s completely useless outside that scope. Also, there is no stable way to derive handles from external IDs, whether they’re FSFTIDs or UIDs, because handles must be unique, and any tree can have persons that have the same UID and/or FSFTID, or whatever ID you think is unique. It’s not, because no one forbids you to import the same person twice, in any program.

I just ran a small test, by importing the same tiny data.gramps twice in a new tree, and found that each person had a unique handle, and a unique GEDCOM ID. And that’s the exact reason why you can never rely on a handle, and need that extra ID, because that’s the only thing that persists.

Duplicate persons are a natural part of the workflow, when you deal with external tools, and they can’t be detected with handles, or GEDCOM IDs, because both are made unique during import.

emyoulation · July 11, 2024, 3:50pm

Agreed. Merge is the messiest problem in Genealogy records. And even when Gramps manages to Merge a person (or a family), it tends not to properly zip up the secondary objects.

That’s why I wish the Import Merge tool saved a worklist of potential Matches. Completing them all is too much for one tool or process.

ennoborg · July 14, 2024, 8:45pm

I must say, that I don’t use import merge for import or merge, but only to compare my tree with the latest backup. And I do that, because the merge process is not transparent to me, and I often see so many differences that it seems impossible to decide on all in one session.

Another reason is, that when I want to merge new branches coming from other sources, they don’t have handles that import merge can work with, so in that case, it’s useless anyway.

Things would be different if import merge could work with UIDs, because they are saved when I make an export from Gramps, import that into another program, and let that download new data from FamilySearch, which either changes existing persons, or adds new relatives to those.

When I follow that route, I often import the external tree into a new Gramps database, clean it a bit, to get rid of traces of weird GEDCOM tags, and then export a filtered Gramps XML from that for a regular import into my main tree, where I merge existing persons in the usual manner so that I get the new branches attached automagically. Import and merge has no extra value in that process, because of its lack of transparency, and when there’s a lot of new data, it’s easier to merge a few persons at a time, check those for duplicate and possibly conflicting events, and continue with another batch the next day(s).

This process might become a lot easier if there were an import merge for GEDCOM files, especially when it could work with UIDs, or any other ID used by the other program.
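
A first step toward such a GEDCOM import merge could be to match incoming persons to existing ones by UID and produce a worklist for review, rather than merging automatically. The sketch assumes both trees have already been reduced to a hypothetical mapping from person ID to UIDs; matching again uses only the first 32 hex digits, and the IDs in the example are made up.

```python
def uid_key(uid: str) -> str:
    """First 32 hex digits of a UID, upper case, hyphens removed."""
    return uid.strip().replace("-", "").upper()[:32]


def plan_uid_merge(existing: dict, incoming: dict) -> tuple:
    """Match incoming persons to existing ones by UID.

    Both arguments are hypothetical mappings: person ID -> list of UIDs.
    Returns (pairs, new_ids): `pairs` lists (incoming ID, existing ID)
    merge candidates to review; `new_ids` are incoming persons with no match.
    """
    index = {}
    for person_id, uids in existing.items():
        for uid in uids:
            index.setdefault(uid_key(uid), set()).add(person_id)

    pairs, new_ids = [], []
    for person_id, uids in incoming.items():
        matches = set()
        for uid in uids:
            matches |= index.get(uid_key(uid), set())
        if matches:
            pairs.extend((person_id, existing_id) for existing_id in sorted(matches))
        else:
            new_ids.append(person_id)
    return pairs, new_ids


existing = {"I0000": ["B9F4535B0AE65341B28116E843B8F3B0AED3"], "I0001": []}
incoming = {
    "I122582933599": ["B9F4535B0AE65341B28116E843B8F3B0"],  # same person, no checksum
    "I2": ["0123456789ABCDEF0123456789ABCDEF"],
    "I3": [],
}
pairs, new_ids = plan_uid_merge(existing, incoming)
print(pairs)    # [('I122582933599', 'I0000')]
print(new_ids)  # ['I2', 'I3']
```

Because the output is just a list of pairs, it could be saved and worked through a few persons at a time.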
