Finding statistical anomalies

(Gramps 5.1.3-2 and Win10)

As I was doing tests for a response in another thread, I ran the Statistics Report and wondered how I could isolate the oddities it finds.

In particular, it reported a high count of direct ancestors who were older than 249 years of age. I was able to find & fix most of these by sorting the Person list (filtered to my direct ancestors) on Birth Date, then sorting again by Death Date. I added an estimated Death for those with a Birth date but no Death.

But when I ran the report again, it found 9 instances that had not been fixed.

The Statistics Gramplet can generate a drill-down list of the data being summarized, but the Report has no such feature.

I tried several combinations of birth & death ranges without success. And there doesn’t seem to be an Age filter rule that I could combine with a Direct Ancestor filter rule.
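Outside Gramps, the combination being asked for (an age rule layered on a direct-ancestor rule) can be sketched in a few lines of Python. This is a minimal sketch on a hypothetical, simplified `Person` record (`pid`, `birth_year`, `death_year`, `parents`), not the Gramps API. It also assumes that anyone without a death year is treated as alive, so their age runs up to `this_year`.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical, simplified record; NOT the Gramps Person object.
@dataclass
class Person:
    pid: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    parents: List[str] = field(default_factory=list)  # pids of father and mother


def direct_ancestors(people: Dict[str, Person], home: str) -> List[str]:
    """Breadth-first walk up the parent links from the home person."""
    seen = {home}
    queue = deque(people[home].parents)
    result: List[str] = []
    while queue:
        pid = queue.popleft()
        if pid in seen or pid not in people:
            continue
        seen.add(pid)
        result.append(pid)
        queue.extend(people[pid].parents)
    return result


def age(p: Person, this_year: int) -> Optional[int]:
    """Age at death; with no death year the person is assumed to be alive."""
    if p.birth_year is None:
        return None
    end = p.death_year if p.death_year is not None else this_year
    return end - p.birth_year


def too_old(people: Dict[str, Person], home: str, this_year: int,
            limit: int = 110) -> List[str]:
    """Direct ancestors whose computed age exceeds `limit` years."""
    hits = []
    for pid in direct_ancestors(people, home):
        a = age(people[pid], this_year)
        if a is not None and a > limit:
            hits.append(pid)
    return hits


people = {
    "me": Person("me", 1960, None, ["dad", "mom"]),
    "dad": Person("dad", 1930, 2000, ["gf"]),
    "mom": Person("mom", 1935, 2010),
    "gf": Person("gf", 1700),  # birth but no death: counted as alive
}
print(too_old(people, "me", 2021))  # ['gf']
```

The `limit` is deliberately well below 249 so that merely implausible ages surface along with the impossible ones.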

Any suggestions on how I can find these persons?

Try the Records Report, deselecting all options except “Oldest living person” and “Person died at oldest age”. Increase the “Number of ranks to display” to see more people who meet the criteria.


That did not get me there. It DID find 2 additional persons that Gramps thought were even older. The report was affected by children with birth date typos: the Records Report decided the parents were older than their Death date minus their Birth date because those children were born outside the parents' lifespans.

I tried filtering for Ancestors still alive 10 years ago (thinking that would find Ancestors with a Birth but no Death) with no joy.

How about this … sort of hit-or-miss, but might catch some more … in the People view filter, enter for example “before 1700” in the Birth date and “after 1800” in the Death date and see what you get. Then play with the dates (and also with your Preference settings for “before” and “after”) to catch more people.

No joy there either. I ran each century individually from 1500 forward and looked at the fewer than a dozen people before 1500. I also set the before, after & about ranges to 10 years.

The report says that there are 10 alive in the line, including myself. There should be none but me. I suppose the next step will be to check the Report filtering parameters. Perhaps it is a christening or other birth-related event.

Actually, since you're looking for people who supposedly lived for more than 300 years, you'd have to run it over a span of more than one century. And it might help to make the preference settings larger rather than smaller. If I understand those correctly, with settings of 10 years, looking for people born "before 1700" who died "after 1800" really only covers birth dates from 1690 to 1700 and death dates from 1800 to 1810?

Better yet, try born “between 1500 and 1700” and died “between 1800 and 2000”.
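To see why the wider ranges help, here is a rough model of how a "before", "after" or "between" term might turn into a year interval. It is an assumption about how the preference settings behave, not Gramps' actual matching code, and `before_range`/`after_range` are hypothetical names standing in for the Preferences values.

```python
from typing import Tuple

Interval = Tuple[int, int]


def to_interval(modifier: str, year: int, year2: int = 0,
                before_range: int = 50, after_range: int = 50) -> Interval:
    """Simplified (assumed) mapping of a date-filter term to a year interval."""
    if modifier == "before":
        return (year - before_range, year)
    if modifier == "after":
        return (year, year + after_range)
    if modifier == "between":
        return (year, year2)
    return (year, year)  # plain year


def matches(value: int, interval: Interval) -> bool:
    lo, hi = interval
    return lo <= value <= hi


# A person born 1550 whose death year was mistyped as 1900.
birth, death = 1550, 1900

narrow = (to_interval("before", 1700, before_range=10),
          to_interval("after", 1800, after_range=10))
wide = (to_interval("between", 1500, 1700),
        to_interval("between", 1800, 2000))

print(matches(birth, narrow[0]) and matches(death, narrow[1]))  # False
print(matches(birth, wide[0]) and matches(death, wide[1]))      # True
```

Under this model, the narrow query only looks at 1690 to 1700 and 1800 to 1810, which is why it misses the example, while the "between" query covers two whole centuries on each side.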

Is this a GEDCOM import?

If so, how does Gramps interpret the "Living" flag?

Greater than 100 years should find 300-year spans.

The reports are not individual files. The specifications are in %appdata%\roaming\gramps\report_options.xml and refer to

I have not yet found the filtering or summarizing code that determines what criteria are being used.

The problem is sure to be non-birth events that are redefining the lifespan in combination with other rules. I’ll probably be able to simulate it with custom filter rules once I discover the criteria.
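To illustrate how a non-birth event could stretch a lifespan, this sketch compares a Birth-to-Death span with an earliest-to-latest-event span. The `(type, year)` event tuples are hypothetical, and the sketch does not claim that the Statistics Report actually computes age the second way. It only shows why one stray event date could produce a 300-year "life".

```python
from typing import List, Optional, Tuple

Event = Tuple[str, int]  # (event type, year): simplified, hypothetical


def span_birth_death(events: List[Event]) -> Optional[int]:
    """Lifespan from the Birth and Death events only."""
    years = {kind: year for kind, year in events}
    if "Birth" in years and "Death" in years:
        return years["Death"] - years["Birth"]
    return None


def span_all_events(events: List[Event]) -> Optional[int]:
    """Earliest to latest dated event, whatever its type."""
    if not events:
        return None
    ys = [year for _, year in events]
    return max(ys) - min(ys)


events = [("Birth", 1650), ("Death", 1710), ("Residence", 1960)]  # 1960 is a typo
print(span_birth_death(events))  # 60
print(span_all_events(events))   # 310
```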

No. These were all hand-keyed in, no imports.

Which is how the typos in the offspring births caught me. I need to look for a report on other events that might make Gramps think it has zombie parents. Perhaps Family events that occur outside the parents' lifespans?
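A check for "zombie parents" can be sketched like this: flag any child whose birth year falls before a parent could plausibly have children, or after that parent's death. The `Person` and `Family` classes are hypothetical stand-ins for the real Gramps objects, and the 12-year minimum age and 1-year grace period are arbitrary assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Person:  # hypothetical, simplified record
    name: str
    birth: Optional[int] = None
    death: Optional[int] = None


@dataclass
class Family:
    father: Optional[Person] = None
    mother: Optional[Person] = None
    children: List[Person] = field(default_factory=list)


def outside_lifespan(parent: Person, child: Person,
                     min_age: int = 12, grace: int = 1) -> bool:
    """True if the child's birth year is implausible for this parent."""
    if child.birth is None:
        return False
    if parent.birth is not None and child.birth < parent.birth + min_age:
        return True  # parent too young, or not yet born
    if parent.death is not None and child.birth > parent.death + grace:
        return True  # child born well after the parent's death
    return False


def zombie_parents(families: List[Family]) -> List[Tuple[str, str, int]]:
    """(parent, child, child birth year) for every implausible pairing."""
    hits = []
    for fam in families:
        for parent in (fam.father, fam.mother):
            if parent is None:
                continue
            for child in fam.children:
                if outside_lifespan(parent, child):
                    hits.append((parent.name, child.name, child.birth))
    return hits


john = Person("John", 1700, 1760)
mary = Person("Mary", 1705, 1780)
fam = Family(john, mary, [Person("Tom", 1730), Person("Ann", 1873)])  # 1873: typo?
print(zombie_parents([fam]))  # [('John', 'Ann', 1873), ('Mary', 'Ann', 1873)]
```

The same test could be applied to Family events such as a marriage by swapping the child's birth year for the event year.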

This is where an open, interchangeable network graph format would have been great…

You could simply have exported the full database to a graph, set color and other parameters on any related field, and found the "error" relatively easily, because it would have shown up as an abnormal value…

How good are you with Excel?
You could export the problem selection, together with a few records without errors, to XML, then use Power Query to import the XML into Excel and see if you can find any abnormal date fields there…

With Power Query you can import each object type in the XML file to a separate worksheet and go through each of the columns to see if you can spot any faulty values…

One sheet for Person, one for Family, one for Event, etc.
It's important to split it that way if you're going to try it; otherwise you will get a lot of columns and it will be really difficult to see the patterns in each column/cell…

Use Excel's conditional formatting to color the cells: one color for all dates, one for NULL, and so on…
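If Python is more comfortable than Power Query, the same "look for abnormal date fields" idea can be sketched against an XML export. This assumes the uncompressed Gramps XML layout, where each `<event>` has a `<type>` child and a `<dateval val="YYYY-MM-DD"/>` child. The namespace version in the sample, the year limits, and the function names are assumptions, and a `.gramps` export may be gzip-compressed and need decompressing first.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def suspicious_dates(xml_text: str, earliest: int = 1000,
                     latest: int = 2025) -> List[Tuple[str, str, str]]:
    """(event id, event type, raw date) for every event year outside the limits."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        if local(elem.tag) != "event":
            continue
        etype = next((c.text for c in elem if local(c.tag) == "type"), "?")
        for c in elem:
            if local(c.tag) != "dateval":
                continue
            year = int(c.get("val", "0").split("-")[0] or 0)
            if not earliest <= year <= latest:
                hits.append((elem.get("id"), etype, c.get("val")))
    return hits


sample = """<database xmlns="http://gramps-project.org/xml/1.7.1/">
  <events>
    <event handle="_e1" id="E0001"><type>Birth</type><dateval val="1650-05-01"/></event>
    <event handle="_e2" id="E0002"><type>Burial</type><dateval val="2650"/></event>
  </events>
</database>"""
print(suspicious_dates(sample))  # [('E0002', 'Burial', '2650')]
```

Only single-date `dateval` elements are checked here; ranges and spans would need their own handling.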

You could import it into OpenRefine too, but I think that would be a lot more work…

I should have asked this first, but I assume you have already run a database repair and rebuilt the indexes.
This is my best advice at the moment…


I could easily export and comb through the data with Excel.

It just seemed like an opportune reason to look at how Gramps coded the older reports, and maybe update them a bit if they haven't been reviewed in the last decade.

Nick's update of the Age Stats Gramplet from an ASCII chart to a bar chart will be one such example in version 5.2.

Pull 941 (screenshots: the old ASCII chart, dramatically improved to the new bar chart)


Totally agree with you…

I just thought it was an urgent problem…

I saw that in my own data some time ago, but it was a test database, so I didn't do anything about it. Before I imported the "live" data from Legacy into Gramps, I ran multiple cleaning passes in both OpenRefine and Excel, as you may remember…


OK. I still haven't been able to locate the underlying code for this report, and tracking down the 9 offenders was painful.

Matt was nice enough to post an Age at Death filter rule. But it turned out that most likely none of these people had a Death event, so no joy there.

I found 7 by filtering on direct ancestors & people without a known death date. Of those, 7 had a Birth (or Baptism) but no Death. These were quickly dealt with by adding an "Estimated Before" death of the birth year + 100.
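The fix described in the post (an estimated death at the birth or baptism year + 100) can be sketched as follows. The event dictionary is hypothetical, the Birth, then Baptism, then Christening fallback order is an assumption, and the returned string is just a label for the date one would enter by hand, not something Gramps generates.

```python
from typing import Dict, Optional

# Assumed order of preference when there is no Birth event.
FALLBACKS = ("Birth", "Baptism", "Christening")


def birth_year(events: Dict[str, int]) -> Optional[int]:
    """Birth year, falling back to Baptism or Christening if there is no Birth."""
    for kind in FALLBACKS:
        if kind in events:
            return events[kind]
    return None


def estimated_death(events: Dict[str, int], span: int = 100) -> Optional[str]:
    """Label for an estimated death date, or None if none is needed or possible."""
    if "Death" in events:
        return None
    start = birth_year(events)
    if start is None:
        return None
    return f"estimated before {start + span}"


print(estimated_death({"Baptism": 1712}))               # estimated before 1812
print(estimated_death({"Birth": 1700, "Death": 1760}))  # None
```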

This left 3 that were a REAL pain to find using features INSIDE Gramps. I ended up traversing up the tree and re-running the report. I figured they were being evaluated as alive, so their age was computed as this year minus their birth year. One was in my Maternal Grandfather's line, the other 2 in his wife's. I slowly walked up a couple of generations at a time while looking at a Charts:Fan Chart (Full Circle type with an Age gradient) for the highest density of people in the 100+ color.

The hardest 2 turned out to be people with a Burial date but no Death date. But one had a granddaughter in the wrong family. (She was born outside her grandfather's lifespan, which made him into a zombie in the report.)
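A quick way to catch the Burial-without-Death case is a simple set check over each person's event types. The `people` mapping of ID to event types is hypothetical (as are the IDs), and treating Cremation like Burial is an assumption.

```python
from typing import Dict, List, Set


def burial_without_death(people: Dict[str, Set[str]]) -> List[str]:
    """IDs of people with a Burial or Cremation event but no Death event."""
    disposal = {"Burial", "Cremation"}
    return sorted(pid for pid, kinds in people.items()
                  if kinds & disposal and "Death" not in kinds)


people = {
    "I0001": {"Birth", "Death", "Burial"},
    "I0002": {"Baptism", "Burial"},
    "I0003": {"Birth"},
}
print(burial_without_death(people))  # ['I0002']
```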