Gramps and recording and comparing DNA-matches

I have created a new experimental DNA gramplet. You can install it manually for v5.1 by copying the DNA directory here into your plugins folder.

You will need to add the gramplet to a person view and then add some segment data to a person. The segments are stored as CSV in a note attached to an association. Set the relationship to “DNA”. Each line in the note represents one segment. It has 5 fields: chromosome (1-22 or X), start position, end position, side (‘P’ or ‘M’), and cMs. The positions can be either integers or floating-point values; floating-point values will be multiplied by 1 million.
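For anyone who wants to check their note text before pasting it in, here is a minimal sketch of how one line of that format could be parsed. It is not the gramplet's actual code: the names `Segment`, `parse_position`, `parse_segment_line` and `parse_note` are made up for illustration, and treating any value that contains a decimal point as "floating point" is an assumption.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    chromosome: str  # "1".."22" or "X"
    start: int       # start position in base pairs
    end: int         # end position in base pairs
    side: str        # "P" (paternal) or "M" (maternal)
    cms: float       # segment length in centimorgans


def parse_position(text: str) -> int:
    """Return a base-pair position; values with a decimal point are scaled by 1 million."""
    text = text.strip()
    if "." in text:  # assumption: a decimal point marks a floating-point value
        return int(round(float(text) * 1_000_000))
    return int(text)


def parse_segment_line(line: str) -> Segment:
    """Parse one 'chromosome,start,end,side,cMs' line."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    chromosome, start, end, side, cms = fields
    return Segment(chromosome.upper(), parse_position(start),
                   parse_position(end), side.upper(), float(cms))


def parse_note(text: str) -> List[Segment]:
    """Parse every non-blank line of a note into a Segment."""
    return [parse_segment_line(line) for line in text.splitlines() if line.strip()]
```

For example, `parse_note("X,12.5,20.25,M,10.1")` would give one segment running from 12,500,000 to 20,250,000 on the maternal X.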

All feedback is welcome. The chart is at a very early stage of development.

Thanks Nick! I was able to install it, add a segment, and view it in the map. I will play with it and give you more feedback. The first thing I will mention is that it needs to be sensitive to the gender of each person. I set up an association between myself and my grandfather, and as we are both males, each of us has only one X chromosome (and one Y chromosome), but I am seeing two X chromosomes on the map. (If you copied it from the Genome Mate Pro video, that person is female.) If I create an association between a male and a female, I would expect either an XY combination or an XX combination to appear, depending on which person I’m viewing with the gramplet. It would be invalid for a female to have a Y segment. The best way to handle this, in my opinion, is not to have X and Y but rather just use chromosome number 23. For males, the 23-paternal is Y and the 23-maternal is X, and for females both are X. Users will probably want to see them displayed as X and Y, but internally at least they should just be 23, with the limitations as noted. Actually, you could just eliminate the Y, since it is never part of the autosomal DNA comparisons (its segments are not analyzed in the same way as the other chromosomes), and set it up so that males have a maternal X and females have both a maternal and paternal X.
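To make that last suggestion concrete, here is a rough sketch of the rule (males: maternal X only; females: maternal and paternal X; no Y). The function names and the "male"/"female" strings are hypothetical and not taken from Gramps or the gramplet, and falling back to two X copies when the sex is unknown is only an assumption.

```python
def x_sides_to_draw(sex: str) -> tuple:
    """Return which copies of X to draw: 'P' = paternal, 'M' = maternal."""
    if sex.lower() == "male":
        return ("M",)  # a male inherits his single X from his mother
    if sex.lower() == "female":
        return ("P", "M")
    # Assumption: when the sex is unknown, draw both copies rather than hide data.
    return ("P", "M")


def is_valid_segment(sex: str, chromosome: str, side: str) -> bool:
    """Reject X segments on a side the person cannot have; autosomes always pass."""
    if chromosome.upper() != "X":
        return True
    return side.upper() in x_sides_to_draw(sex)
```

With this rule, a paternal X segment recorded for a male would be flagged as invalid instead of being drawn on a second X chromosome.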

Yes, I just copied the chart from the video. I’ll fix the XY chromosome problems tomorrow.

We also need to think about colour coding individuals. At the moment it only works for one comparison.

First tests already look good, @Nick-Hall, but could the side parameter (‘P’ or ‘M’) be optional? People often don’t know the relationship to their match.

Also as already mentioned by @GeorgeWilmes, the Y chromosome is missing.

Another problem is that the Gramplet does not update yet. Right now you have to remove the gramplet and add it again to update it. The gramplet also always shows the same parameters/data for every person, even if they don’t have any DNA associations yet.

With that ‘snapshot’ update behavior, maybe this should be a Quick View instead of a Gramplet?

Hi @Mattkmmr, I think a different type of display would be more appropriate for viewing all matches on a given chromosome to see how they overlap. Using Genome Mate Pro again as an example, its “Chromosomes” tab lets you view all of your matches on a given chromosome, optionally filtering them on maternal vs. paternal. The Segment Map is designed rather to show where the segments came from.

By the way, I am not suggesting that we need to replicate all of the Genome Mate Pro functionality in Gramps, nor am I suggesting that its displays are the best; I am just using it as an example to discuss the concepts. The various DNA companies also have different displays.

Oh, I see now. I wasn’t aware that we are implementing different views for display. How does Genome Mate Pro check for paternal vs. maternal segments? Do you have to use your parents’ DNA results too, e.g. a three-to-one comparison?

Genome Mate Pro depends on you to figure out how you are related to your matches. It’s really just a tool for organizing and displaying the data that you get from GEDmatch, FamilyTree, Ancestry, etc. I don’t use it very often but I like the Segment Map feature, or rather I like the idea of somehow displaying which segments of my DNA came from which ancestors.

Maybe this software could provide some ideas for graphs and presentations?
https://progenygenealogy.com/products/family-tree-charts/

What data is supplied by the testing company or DNA matching website when a match is found?

Is it anything like the output file format from the lineage package that a couple of you have already mentioned?

Perhaps we could derive the side from the family tree?

Hi Nick,

Deriving the side from the family tree (if one is provided) is exactly what people attempt to do with each DNA match, before contacting them and hoping for a reply. I understand that some sites now attempt to automate this (Ancestry’s “ThruLines” and MyHeritage’s “Theory of Family Relativity”), but I haven’t used them.

Otherwise, the only information a person receives about their matches is limited to things like the following (the columns from a CSV file provided by FamilyTreeDNA):

"Full Name","First Name","Middle Name","Last Name","Match Date","Relationship Range","Suggested Relationship","Shared cM","Longest Block","Linked Relationship","Email","Ancestral Surnames","Y-DNA Haplogroup","mtDNA Haplogroup","Notes","Matching Bucket"

In that case, the “Relationship Range” and “Suggested Relationship” are about the distance of the relationship, not which side it’s on. The “Matching Bucket”, however, predicts whether a given match is on your maternal or paternal side, but only to the extent that you have already mapped sufficient other matches onto your tree. Here’s an explanation of that Family Matching feature.

On GEDmatch, people use the Triangulation feature to relate matches. (That link might not be available unless you sign in.) Again, it’s only helpful if you’ve already figured out one of them.

Even if you had another user’s raw data (the file containing hundreds of thousands of SNPs) and compared it directly with your own, you wouldn’t know whether the matching segments were maternal or paternal unless you had first “phased” your data by comparing it with that of one or both of your parents. In the raw data file, the two letters that appear for each SNP do represent the maternal and paternal values, but they are not in that order; rather, they are ordered alphabetically (for example, you will find “AG” but not “GA”). In fact, you wouldn’t be sure whether the matches were really matches (see examples here).

Just as our chromosomes don’t have numbers stamped on them (the traditional numbering is simply according to decreasing length), so also they don’t have “maternal” and “paternal” stamped on their component parts.
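To illustrate the phasing idea mentioned above, here is a toy sketch for a single SNP, using only the mother's genotype. It is an illustration of the concept, not how any vendor or tool actually does it; the function name `phase_snp` is made up, and real phasing also has to cope with no-calls, genotyping errors and both parents.

```python
from typing import Optional, Tuple


def phase_snp(child: str, mother: str) -> Optional[Tuple[str, str]]:
    """Return (maternal_allele, paternal_allele) for one SNP, or None if undecidable.

    Both genotypes are unordered two-letter strings such as "AG", as found
    in a raw data file.
    """
    first, second = child
    if first == second:
        # Homozygous child: both copies are the same allele, so order is irrelevant.
        # (This sketch does not check that the mother could have supplied it.)
        return (first, first)
    # Heterozygous child: see which of its two alleles the mother carries.
    shared = [allele for allele in (first, second) if allele in mother]
    if len(shared) == 1:
        maternal = shared[0]
        paternal = second if maternal == first else first
        return (maternal, paternal)
    # Mother carries both alleles (ambiguous) or neither (inconsistent data).
    return None
```

So a child reported as "AG" with a mother who is "GG" must have received G from the mother and A from the father, while a child and mother who are both "AG" cannot be phased at that SNP from the mother alone.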

That is useful to know. Is the segment data in a separate file?

What I meant was that when we add an individual to a family tree we either add it to the paternal or maternal side. Of course the person can be left floating if we don’t know the relationship.

The ancestral surnames list is another intriguing opportunity.

It’d be helpful if the program could take such a list of surnames and generate a list of fuzzy (SoundEx/Metaphone) matches from an individual’s direct ancestors. Perhaps even color tag a fan chart or Pedigree Gramplets list.
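As a rough idea of what that could look like, here is a small sketch using classic American Soundex. Nothing here is an existing Gramps function: `soundex` and `fuzzy_surname_matches` are hypothetical helpers, the ancestor surnames are assumed to be already collected into a plain list, and non-ASCII letters are simply dropped, which is a real limitation for many European surnames.

```python
from typing import Dict, List

# Soundex digit for each coded consonant; vowels, Y, H and W have no code.
_CODES = {}
for _digit, _letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for _letter in _letters:
        _CODES[_letter] = str(_digit)


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or '' if there are no letters."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so repeated codes across vowels count
    return (result + "000")[:4]


def fuzzy_surname_matches(match_surnames: List[str],
                          ancestor_surnames: List[str]) -> Dict[str, List[str]]:
    """Map each surname from a match's list to the ancestor surnames that sound alike."""
    by_code: Dict[str, List[str]] = {}
    for surname in ancestor_surnames:
        by_code.setdefault(soundex(surname), []).append(surname)
    hits: Dict[str, List[str]] = {}
    for surname in match_surnames:
        code = soundex(surname)
        if code and code in by_code:
            hits[surname] = by_code[code]
    return hits
```

Soundex is quite coarse, so this would produce false positives; a Metaphone variant or a stricter comparison on top of it might work better in practice.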

Even when a DNA service doesn’t give access to the Centimorgan or Pedigree data at a particular subscription level, they’ll still often have a surname cloud for each potential match. My brother’s FamilyFinder level test on FamilyTreeDNA has a list of hundreds of distant cousins. Most have neither the DNA nor pedigree posted but DO have a (comma separated) surname cloud as a field in the site’s CSV download.

Most of the distant cousins don’t register as Maternal or Paternal. Yet since most of my brick walls are at 5 or 6 generations out in my paternal grandfather’s branch, I’d like to prioritize contacting cousins that correlate to his surname cloud. If Gramps could help with that prioritization, it’d be wonderful.

It could be useful elsewhere: pre-DNA fad postings to Rootsweb and other newsgroups often had ancestor surname lists; surname indices from published genealogies…

Having a list of potentially matching ancestors to offer a blind match would make a ‘first contact’ more promising & also look less like a phishing expedition.

Finally, some ‘uncertainty’ feature might be worth considering. Even for the DNA sample data. The Lazarus feature (where a particular ancestor’s DNA is a reconstruction from descendant tests) is becoming popular and will surely grow more sophisticated.

Usually the fields ‘chromosome #’, ‘start pos’, ‘end pos’, ‘cMs’ and ‘Number of SNPs’ are provided. Depending on the testing company, there can be additional fields.

You can derive the side for users who know their relationship to a match, but for most matches the relationship will stay unknown, because they are just results from their testing company’s database. You often just get hundreds of matches with distant cousins you have never heard of before.
You need to be able to compare to at least one parent or a close relative to sort the matches to a side.

Yes, and with your suggested approach of using Associations, they are not floating freely but are somewhat “tethered” to another person.

By the way, what would be the relative merits and drawbacks of using a shared Event instead of an Association for purposes of this experimental gramplet?

The segment data is in the same file; that was just the header row. Here’s a sample data row:

"(hidden)","(hidden)","","hidden","3/19/2017","2nd Cousin - 3rd Cousin","2nd Cousin","166.8782501221","37.4334907532","","(hidden)"," (hidden) / (hidden) / (hidden) / (hidden) / (hidden) / (hidden) / (hidden) / (hidden) / (hidden)","","","","N/A"

Using an association was easier to code for the prototype. A DNA match isn’t really an event, but it would be an option.

So the file just consists of a single header row followed by multiple segment rows?

Yes. That’s just for the list of matches. I forgot to add, there’s another csv file with the details about each matching segment. Its header row looks like this:

NAME,MATCHNAME,CHROMOSOME,START LOCATION,END LOCATION,CENTIMORGANS,MATCHING SNPS

where NAME is the name of the person whose DNA kit this is, and MATCHNAME is the name of the matching person, for example:

"George Wilmes","(hidden)",1,67855642,70452479,2.55,600

The names are whatever the users chose, not unique userids. Even in the other file, multiple people can have the same email address, because people often manage the accounts of other family members.

Those are just the file formats created by FamilyTreeDNA. I imagine the ones from the other vendors have similar data but not necessarily the same format.
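In case it helps with testing, here is a small sketch that turns rows of the segment file described above into the five-field lines the experimental gramplet expects. The column names come from the header row quoted above; the function name `ftdna_segments_to_note` is made up, and because the file does not say which side a match is on, the side is passed in by hand as an assumption.

```python
import csv
import io


def ftdna_segments_to_note(csv_text: str, match_name: str, side: str) -> str:
    """Build 'chromosome,start,end,side,cMs' lines for one match.

    csv_text   -- full contents of the segment CSV, including the header row
    match_name -- value of the MATCHNAME column to select
    side       -- 'P' or 'M', supplied by the user since the file has no such column
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        if row["MATCHNAME"] != match_name:
            continue  # segment belongs to a different match
        lines.append(",".join([
            row["CHROMOSOME"],
            row["START LOCATION"],
            row["END LOCATION"],
            side,
            row["CENTIMORGANS"],
        ]))
    return "\n".join(lines)
```

Because the names in the file are not unique, selecting on MATCHNAME alone could mix up two different people who chose the same name, so the result is worth checking by eye before pasting it into a note.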
