Date quality in GEDCOM exports

The Gramps data model supports date quality on all dates including compound dates. This allows dates to be marked as estimated or calculated. A common example is when a source includes a person’s age on a given date. In this case, a date range for the date of birth can be calculated from the age and event date.
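As an illustration of the age example, here is a minimal sketch of how such a birth date range could be worked out. It is not how Gramps computes it; `shift_years` and `birth_range` are hypothetical helpers, and the sketch assumes the age is given in completed years and that Python's proleptic Gregorian `datetime.date` is adequate for the dates involved.

```python
from datetime import date, timedelta


def shift_years(d, years):
    """Move a date by whole years; 29 Feb falls back to 28 Feb (an assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def birth_range(event_date, age_years):
    """Return (earliest, latest) possible birth dates for a person who had
    completed `age_years` years on `event_date`."""
    # Latest: the person turned `age_years` on the event date itself.
    latest = shift_years(event_date, -age_years)
    # Earliest: the person turns `age_years + 1` the day after the event.
    earliest = shift_years(event_date, -(age_years + 1)) + timedelta(days=1)
    return earliest, latest


earliest, latest = birth_range(date(1900, 6, 1), 30)
print(earliest, latest)  # 1869-06-02 1870-06-01
```

A range produced this way is what would be recorded as a calculated compound date.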

About 10 years ago we included this functionality in the GEDCOM export. However, we included it as a non-standard extension. GEDCOM does not allow date ranges or spans to be marked as calculated or estimated.

We have received pull request #987, which removes date quality from compound dates in the GEDCOM export. Will this cause anyone any problems? Do you use this feature?

Perhaps one of our GEDCOM experts would like to comment.

I have not used GEDCOM since migrating to Gramps and importing from a GEDCOM.

If I have estimated a date, I would use between… and … or use the calculated date but not label it as such. I treat all dates as unconfirmed until I find documentation for them.

So in short, this change would have no effect on my work.

Does this PR recommend that Gramps use the DATE_APPROXIMATED (or DATE_PHRASE or DATE_RANGE) field instead of a DATE field for compound dates? Or does it just adopt a lossy approach by discarding the date compounding and exclusively using the simple DATE?

From page 80 of Tamura Jones’ proposed 5.5.5 GEDCOM:

DATE_APPROXIMATED:=
[
ABT + space + <DATE> |
CAL + space + <DATE> |
EST + space + <DATE>
]
where:
ABT = About, meaning the date is not exact.
CAL = Calculated mathematically, for example, from an event date and age.
EST = Estimated based on an algorithm using some other event date.
space = U+0020, the Space character

About Abt
The FamilySearch GEDCOM specification provides three different date modifiers for approximated dates, with three slightly different definitions. The implied demand to use different date modifiers for slightly different situations is not placed on genealogy software, but on users. In practice, users almost always use ABT. Few users even know that usage of CAL or EST is possible.

The PR removes the quality modifier from compound dates in Gedcom exports. So, for example, a date calculated to be between the 1st and 31st of January 1900 would be exported as “BET 1 JAN 1900 AND 31 JAN 1900” instead of “CAL BET 1 JAN 1900 AND 31 JAN 1900”, resulting in data loss.
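The difference can be sketched as follows. This is not the code from the pull request or from the Gramps exporter; `export_date_value` and its parameters are hypothetical names used only to show the two behaviours side by side.

```python
# Hypothetical mapping from a date quality to the GEDCOM modifier.
QUALITY_PREFIX = {"calculated": "CAL", "estimated": "EST"}


def export_date_value(text, quality=None, compound=False,
                      keep_quality_on_compound=False):
    """Return the DATE value to write for an already formatted date.

    text     -- e.g. "1967" or "BET 1 JAN 1900 AND 31 JAN 1900"
    quality  -- "calculated", "estimated" or None
    compound -- True for ranges and spans
    keep_quality_on_compound -- True mimics the old non-standard export
    """
    prefix = QUALITY_PREFIX.get(quality)
    if prefix is None:
        return text
    if compound and not keep_quality_on_compound:
        # Standard-conforming, but the quality is lost.
        return text
    return f"{prefix} {text}"


rng = "BET 1 JAN 1900 AND 31 JAN 1900"
print(export_date_value(rng, "calculated", compound=True))
# BET 1 JAN 1900 AND 31 JAN 1900
print(export_date_value(rng, "calculated", compound=True,
                        keep_quality_on_compound=True))
# CAL BET 1 JAN 1900 AND 31 JAN 1900
print(export_date_value("1967", "calculated"))
# CAL 1967
```

Simple dates keep their modifier in both cases; only compound dates are affected.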

In that case, it seems like a bad idea.

Since the DATE_PHRASE is a standard GEDCOM field, why wouldn’t we opt for a non-lossy approach by using that instead of DATE?

Especially since a symmetric export/import has the potential to re-parse the Phrase back into our fully recognized Date format with zero loss?
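A rough sketch of what such a symmetric round trip might look like, assuming the quality and the compound date are simply wrapped in a parenthesised phrase. `to_phrase` and `from_phrase` are hypothetical helpers, not existing Gramps functions, and other software would still see the value only as free text.

```python
import re

# Matches a phrase such as "(CAL BET 1 JAN 1900 AND 31 JAN 1900)".
PHRASE_RE = re.compile(r"\((CAL|EST) (.+)\)")


def to_phrase(modifier, compound_date):
    """Wrap a modifier and a compound date in a date phrase for export."""
    return f"({modifier} {compound_date})"


def from_phrase(value):
    """Recover (modifier, compound_date) on import, or None if the phrase
    does not follow the pattern written by to_phrase."""
    match = PHRASE_RE.fullmatch(value)
    if match is None:
        return None
    return match.group(1), match.group(2)


exported = to_phrase("CAL", "BET 1 JAN 1900 AND 31 JAN 1900")
print(exported)               # (CAL BET 1 JAN 1900 AND 31 JAN 1900)
print(from_phrase(exported))  # ('CAL', 'BET 1 JAN 1900 AND 31 JAN 1900')
```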

This is a case where you have to read all of the GEDCOM spec, at least around DATE. If you look where it is acceptable to use DATE_APPROXIMATED, you will see

DATE_VALUE:=
[
<DATE> |
<DATE_PERIOD> |
<DATE_RANGE> |
<DATE_APPROXIMATED> |
INT + space + <DATE> + space + (<DATE_PHRASE>) |
(<DATE_PHRASE>)
]

This indicates that DATE_APPROXIMATED cannot be combined with DATE_PERIOD or DATE_RANGE; a value has to be one or the other.
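To make the point concrete, the grammar above can be approximated with a regular expression. This is a simplified sketch, not a real validator: it assumes Gregorian dates with upper-case English month abbreviations, ignores calendar escapes, B.C. dates and dual years, and `is_valid_date_value` is a hypothetical name.

```python
import re

MONTH = r"(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)"
# Simplified <DATE>: optional day, optional month, then a year.
DATE = rf"(?:(?:\d{{1,2}} )?{MONTH} )?\d{{3,4}}"

DATE_VALUE = re.compile(
    rf"(?:{DATE}"                                     # <DATE>
    rf"|FROM {DATE} TO {DATE}|FROM {DATE}|TO {DATE}"  # <DATE_PERIOD>
    rf"|BEF {DATE}|AFT {DATE}|BET {DATE} AND {DATE}"  # <DATE_RANGE>
    rf"|(?:ABT|CAL|EST) {DATE}"                       # <DATE_APPROXIMATED>
    rf"|INT {DATE} \(.+\)"                            # interpreted date + phrase
    rf"|\(.+\))"                                      # date phrase only
)


def is_valid_date_value(value):
    """True if the whole value matches one alternative of the simplified grammar."""
    return DATE_VALUE.fullmatch(value) is not None


print(is_valid_date_value("BET 1 JAN 1900 AND 31 JAN 1900"))        # True
print(is_valid_date_value("CAL 1967"))                              # True
print(is_valid_date_value("CAL BET 1 JAN 1900 AND 31 JAN 1900"))    # False
print(is_valid_date_value("(CAL BET 1 JAN 1900 AND 31 JAN 1900)"))  # True
```

Because the alternatives are mutually exclusive, a modifier followed by a range matches none of them unless the whole value is wrapped in parentheses as a phrase.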

I started the PR to fix this because someone posted a question about Gramps failing a GEDCOM validator; they wanted to make sure the validator was correct. The consensus was that Gramps is in error.

Yes, valid dates include:

DATE BET 1 JAN 1900 AND 31 JAN 1900
DATE (CAL BET 1 JAN 1900 AND 31 JAN 1900)

There is a note in the Gedcom 5.5.5 specification that PAF actually allows date phrases without parentheses. So the following would be valid:

DATE CAL BET 1 JAN 1900 AND 31 JAN 1900

I think that anything that causes data loss should be avoided, and changes should only be made if there are tags that could replace what is done today…
Even though Gramps has multiple superior export formats, not many of them are supported by other software, and if you have the data, it would be easier to ask others to support that tag…

But… can’t this be stored with a custom tag, “_GDCA” or something similar? Just so that the data is not lost for those who use GEDCOM, until a new GEDCOM standard is actually supported by most vendors?

I decided to merge this pull request.

In the future our Gedcom export should conform to the Gedcom 5.5.1 specification using the unofficial 5.5.5 specification for clarification where necessary.

It is interesting to note that the FHISO ELF date microformat doesn’t allow compound dates to be specified as approximate or calculated/estimated either. The GedcomX date format doesn’t allow calculated/estimated dates at all.

We can always add a custom tag if necessary, but would anyone use it?

I don’t know what others use; I can only speak for myself and about what I believe would be of some help to others…
I don’t use calculated dates at all myself. I do use periods, “before”, “after”, “appr” and “between”, and I would have liked to have “to” and “from”, but I only use those where I have an event with dates and I know that the event in question was somehow related in time to what I already have…

But I was going to use Calculated as a “marker” for those census records that have a birth year calculated from an age field. I am starting to get a few of those, and many of them are from 10-15 years ago, imported via GEDCOM from Legacy, so it’s just a big mess anyway, and therefore I was going to restructure all the information…
And it would be great if it was saved to GEDCOM, because I still save GEDCOM copies as a “backup”, since no other software reads the Gramps XML or any of the other export formats…

As long as it’s still in Gramps, it’s not a “problem” for me personally…

The GedcomX format is as limited as the GEDCOM 5.x.x format regarding dates. It is still built on the mindset of the LDS and what they “need”, not what is used in the “real world”…
And FHISO seems to create standards that are compatible with GEDCOM 5.5.1, so it is as limited as that…
I know the Gramps devs can’t do anything about this, and the most important thing is to be able to create GEDCOM files that can be read by other software…


As a side note: Both Legacy Familytree and RootsMagic use Calculated dates, but I have never thought about how they export them…

This is how it looks in a GEDCOM 5.5.1 UTF-8 file from Legacy:

0 HEAD
1 SOUR Legacy
2 VERS 9.0
2 NAME Legacy ®
2 CORP Millennia Corp.
3 ADDR PO Box 9410
4 CONT Surprise, AZ 85374
1 DEST Gedcom5.5.1
1 DATE 16 Feb 2020
1 SUBM @S0@
1 FILE R:\Applikasjon - Databaser\Legacy Familytree\Gedcom\Test of calc dates.ged
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
1 CHAR UTF-8
0 @S0@ SUBM
1 NAME Ikke gitt
0 @I1@ INDI
1 NAME Jaran //
2 GIVN Jaran
1 SEX M
1 BIRT
2 DATE Cal 1967
1 CHR
2 DATE Cal 1967
1 EVEN Myndighetsår
2 TYPE Myndig
2 DATE Cal 1985
1 _UID A151947C970F417CA93FA9BF3735878B3398
1 CHAN
2 DATE 16 Feb 2020
3 TIME 20:24
0 TRLR

Do they allow date ranges and spans to be marked as calculated?

Legacy does somehow support Cal between “date 1” and “date 2”, but it gives a warning about an unsupported date format… Legacy still takes the date into account in the chronology view, so I think it just strips the prefix when it is unsupported.
Legacy exports it to the GEDCOM file as “DATE Cal A ‘date 1’ - ‘date 2’”.

In its Chronology View it looks like this:

Name: Jaran
Life Range: Cal 1967 - Present
Age:

0	Cal 1967	Birth:	_______________.  
2	Cal A 1969-1971	Christening:	_______________.  
18	Cal Bef 1985	Myndig:	_______________.  Myndighetsår.  

RootsMagic does not support “Calc CA 1880”. It accepts the input, but does not use the date field in any calculation…

As I wrote, I have never used it before, and looking at it now, I don’t think this is a feature I’m going to use. CAL alone is enough; I do not need a calculated period of time (the period in itself is some kind of “calculation”)…

So with this, I will only say: I’m sorry, you got it right :)