DNA Pedigree chart

(As requested, starting a new thread from this one.)

@GaryGriffin, it would be nice if the data used by your DNA Segment gramplet could also be used for other visualizations. I’m thinking particularly of a “DNA pedigree chart” such as the examples that I’ve mocked up and pasted below. I’ve used Ahnentafel numbers for these illustrations, but the actual chart could show names, dates, whatever.

The bars would be scaled according to the percent of DNA that each ancestor (number 4 and beyond) is known to have contributed to the active person’s genome, based on match data entered and the relationships derived. The gaps (in the first example) and different bar heights (in the second example) would change over time as more match relationships are determined. In more distant generations, some ancestors would not appear unless/until there is match data to support them.

Maybe there could also be the option of filtering on a particular chromosome.

Of course, it could also just be a fan chart in which each ancestor’s wedge is shaded accordingly and possibly annotated with a percentage in the tooltip. That way all ancestors would appear, but some would lack any shading.

I don’t know if these would be easier to implement as gramplets or as new views in the Charts category. The fan chart view already has some shading options, if that helps.

Now some notes about calculations:

By “percent of DNA” I mean adding up the lengths (end position minus start position, plus 1) of an ancestor’s segments across all chromosomes and dividing by the total length of all chromosomes. (Or if filtering on a particular chromosome, then the lengths of those segments and the total length of that chromosome.)
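To make the arithmetic concrete, here is a minimal Python sketch of that calculation. The function names, the `(chromosome, start, end)` tuple layout, and the toy chromosome lengths are assumptions made for illustration; they are not taken from the DNA Segment Map gramplet. The sketch also assumes the segments passed in do not overlap.

```python
def segment_length(start, end):
    """Length of a segment whose start and end positions are both inclusive."""
    return end - start + 1


def percent_of_dna(segments, chrom_lengths, chromosome=None):
    """Percent of the genome (or of one chromosome) covered by the segments.

    segments: iterable of (chromosome, start, end) tuples, assumed not to overlap.
    chrom_lengths: dict mapping chromosome name to its total length.
    chromosome: if given, restrict numerator and denominator to that chromosome.
    """
    if chromosome is not None:
        segments = [s for s in segments if s[0] == chromosome]
        total = chrom_lengths[chromosome]
    else:
        total = sum(chrom_lengths.values())
    covered = sum(segment_length(start, end) for _, start, end in segments)
    return 100.0 * covered / total


# Toy numbers for illustration only, not real chromosome lengths.
toy_lengths = {"1": 200, "2": 200}
toy_segments = [("1", 1, 100), ("2", 51, 100)]
print(percent_of_dna(toy_segments, toy_lengths))       # 37.5
print(percent_of_dna(toy_segments, toy_lengths, "2"))  # 25.0
```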

Obviously, the entirety of all my paternal segments count for my father (2) even if I don’t know which portions of them are attributable to which of his parents (4 and 5). Same goes for my mother (3) and her parents (6 and 7).

As I am male, I can attribute the entire length of my paternal 23rd chromosome (Y) to ancestors 2, 4, 8, 16, etc. That does not need to be in the match data, but can just be assumed and factored into the calculations. The Y has a shorter length than the X, so my bars for 2 and 3 would not be exactly equal. My X is entirely attributable to my mother (3). More about the X later.

If, for a particular matching segment, the common ancestral couple is my great-great-grandparents (Ahnentafel numbers 22 and 23), I may not know which of them is the source, but I do know that I inherited it from their daughter, my great-grandmother (11), so the segment counts as hers, and also counts for her daughter, my grandmother (5).
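With Ahnentafel numbers, this attribution rule amounts to repeated integer halving: the child of ancestor n is n // 2. A small sketch, using a hypothetical helper name, that lists which ancestors a segment counts for when only the common ancestral couple is known:

```python
def credited_ancestors(couple_member):
    """Ancestors credited with a segment when only the couple is known.

    couple_member: Ahnentafel number of either member of the common
    ancestral couple (e.g. 22 or 23). Neither member is credited, because
    the source is ambiguous; their child and every later descendant on the
    line down to the active person's parent are credited.
    """
    credited = []
    n = couple_member // 2  # the couple's child on the active person's line
    while n > 1:            # stop before the active person (number 1)
        credited.append(n)
        n //= 2
    return credited


print(credited_ancestors(22))  # [11, 5, 2]
print(credited_ancestors(10))  # [5, 2]
```

Both members of a couple give the same result, since 22 // 2 and 23 // 2 are both 11.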

Meanwhile, I may have another segment where the common ancestral couple is (10 and 11), and so that one is also attributable to (5). To the extent that it overlaps with the one mentioned in the previous paragraph, it must not be double-counted. (Users may have entered many overlapping segments from different generations.)
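One way to avoid the double-counting is to merge overlapping segments before adding up lengths. The sketch below assumes all segments belong to the same ancestor and the same chromosome, with inclusive start and end positions; the function names are illustrative only.

```python
def merge_segments(segments):
    """Merge overlapping or directly adjacent (start, end) segments."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


def covered_length(segments):
    """Total length covered, counting each position only once."""
    return sum(end - start + 1 for start, end in merge_segments(segments))


segments_for_5 = [(100, 200), (150, 300), (400, 450)]
print(merge_segments(segments_for_5))  # [(100, 300), (400, 450)]
print(covered_length(segments_for_5))  # 252
```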

If there are no segments attributable to a particular ancestor, they would simply not appear in the chart (except in the fan chart version, where their wedge would have no shading). And if there is a chromosome-specific version (not saying there should be; not sure if it’s worth the trouble) then more ancestors may drop off if they are not represented on a particular chromosome. This is especially to be expected on the X chromosome, based on its inheritance pattern.

In the version with vertical bars, the gaps do not represent specific positions on chromosomes, but rather the percent of the total chromosome lengths that is so far unaccounted for in the match data (or present but “unknown”). I spaced them so that each gap is between the pair of ancestors to whom some portion of that gap may eventually be attributed.

I will stop now and wait for questions in case I have been unclear or illogical.

Two more examples to illustrate how a chromosome-specific view would look for a male vs. a female active person, showing the unique inheritance patterns of chromosome 23:

(Fun fact: there actually is some recombination between the X and Y chromosomes, in their pseudoautosomal regions. See: The Human Pseudoautosomal Region (PAR): Origin, Function and Future - PMC)

I had a few thoughts. As I understand it, any associate that is a 1st cousin or closer will be irrelevant for this calculation, right? So if we are only collecting data from 2nd cousins or more distant, the shared segments are getting small.

In my case, I have about 25 associates with a known genetic connection that are further than 1st cousin. If I pick a well-populated chromosome, I have 5 people with 6 segments total that overlap with me. All are on my maternal side.

2nd cousin: 47M - 49M
3rd cousin: 173M - 205M
3rd cousin: 173M - 208M (sibling of above)
3rd cousin: 173M - 180M (child of above)
3rd cousin: 0 - 5M
3rd cousin: 11M - 21M and 208M - 218M

All of these are through my maternal grandmother (Ahnentafel 7, I think). The 3rd cousins are through my maternal-line great-grandmother (Ahnentafel 15).

I think of my data as being reasonably large. But there is so little that this is still almost completely empty.

Given this, I question the value of this analysis. Or do I have little data compared to other users? Or do I misunderstand something?

Results will certainly vary among users, depending on:

  • how many other descendants their ancestors had (the more children in each generation, the more likely that a particular segment of an ancestor’s DNA can be found in another person today)
  • how many of those other descendants have tested and made their DNA available for comparison (only a tiny fraction of the overall population has done so)
  • whether the exact relationship paths between matches can be determined

I doubt that anyone would ever achieve a complete chart. For me, the gaps help me focus on areas where I need to work harder to figure out my relationship with matches. I’m still tallying mine and will share whatever I’m able to come up with (the examples above are just illustrations, not based on my data.)

One thing you can do to help assess the potential value of this for your own data is to analyze your match data to see if there are large areas of your genome that are not covered. You could do this, for example, by loading your matches and your raw data into a database and joining them to see what percent of SNP positions are not included in any of the match ranges. Or just take your match data, calculate the midpoint of each range, and make a histogram for each chromosome.
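The midpoint histogram can be sketched in a few lines of Python. The `(chromosome, start, end)` tuple layout, the function name, and the 10 Mb default bin size are assumptions chosen for illustration; empty bins point to regions with no match coverage.

```python
from collections import Counter, defaultdict


def midpoint_histogram(matches, bin_size=10_000_000):
    """Count match-range midpoints per chromosome in fixed-width bins.

    matches: iterable of (chromosome, start, end) tuples.
    Returns a dict mapping chromosome -> Counter of {bin index: count}.
    """
    histogram = defaultdict(Counter)
    for chromosome, start, end in matches:
        midpoint = (start + end) // 2
        histogram[chromosome][midpoint // bin_size] += 1
    return histogram


# Made-up ranges, for illustration only.
example = [
    ("1", 0, 10_000_000),
    ("1", 12_000_000, 14_000_000),
    ("2", 1_000_000, 3_000_000),
]
hist = midpoint_histogram(example)
for chromosome in sorted(hist):
    for bin_index in sorted(hist[chromosome]):
        print(chromosome, bin_index * 10, "Mb:", hist[chromosome][bin_index])
```

Midpoints are a rough proxy: a long segment lands in a single bin, so merging ranges and measuring covered length gives a more accurate picture of coverage.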

Analyzing my own data in a spreadsheet, I’ve so far managed to map almost 20% of my genome at the grandparent level. So I have a long way to go.

I agree it need not be a priority for implementation in Gramps. If it were there, it would be helpful.

Anyway, do you think the data from the DNA Segment Map gramplet could be leveraged for this? Or would it need to be stored differently? I’m still on the fence as to how much of my match data I want to load into Gramps at this point.

I cannot think of another data format that would be more useful than what the DNA Segment Map gramplet uses. By combining that data and the Common Ancestor calculation, I think you can derive all that is derive-able.

Is a spreadsheet a better tool for this? That is a good question.

I am surprised that you have 20% coverage at the grandparent level. How many people’s DNA do you have, and how many of them are useful (more distant than 1st cousin, with a known location in your tree)? This may give some guidance on how large the data set needs to be to be useful.

Let me analyze my data (in a spreadsheet) and give a reference point for viability. For instance, if I have 20 useful people who share DNA with me, what coverage at the grandparent level do I get? I will review both on a per-chromosome level and overall.

That’s based on 70+ segments from 25 known matches, most of whom are second or third cousins, the rest more distant. I ignored a number of their segments which were overlapping, so as not to double-count them. (And the current total is not quite 20%, it’s more like 18.5%, including my Y chromosome.)

I have a number of more distant matches that I haven’t factored in yet, but they tend to be single, smaller segments and are often duplicates of what I’ve already included.

Some of the more distant ones are also difficult to resolve because one pair of my maternal great-great-grandparents were third cousins to each other; in that case I know which great-grandparent to attribute a segment to, but can’t be sure beyond that.

I reviewed all of my DNA data. I have 75 segments from 20 known matches at 2nd cousin or greater. For chromosomes 1-22, my grandparent coverage (omitting overlaps) is:

  • maternal-maternal: 15%
  • maternal-paternal: 0%
  • paternal-maternal: 6%
  • paternal-paternal: 10%

A spreadsheet seems like a better tool at this point for this analysis, given the low coverage.

I have to admit that this analysis was interesting and may lead me to encourage certain parts of my tree to test or to export Ancestry data to calculate shared matches.