Date quality vs. modifier

I am carefully re-reading the user manual while studying the source code.

I am puzzled by an apparent ambiguity in dates. One modifier is “about”, which implies some uncertainty about the date. Quality could (should?) then be set to “estimated”.

Isn’t there redundancy between modifier and quality? Is a date with no modifier but with quality “estimated” equivalent to an “about” date? How is this related to the “precision” of about, before and after dates when comparing?

Would it make sense to consider that fuzzy comparison is enabled as soon as date quality is “estimated”?

Also what is the intended use of “calculated”? Does it also imply some fuzziness? Is it simply a flag on how the date was obtained?

I only ever use ‘about’, ‘before’, and ‘after’. Use of these implies an estimation.

The ‘calculated’ identifier would be something like knowing the death date and the age at death, which would give you a birth date that may or may not be the actual date. I just assume that any date without a source/citation is an ‘estimated’ date awaiting confirmation.

There are users that will add a s/c for these calculated or estimated dates; a census record can give an estimated birth year. Users add that census s/c to the birth record because they want to know where they got that estimated information. Personally I do not add the census source to the birth because it does not ‘prove’ the event.

FYI: Gramps allows you to define what ‘about’, ‘before’, and ‘after’ mean. The default is set to 50 years in Preferences, on the Date tab. This comes into play in age calculations and, more notably, in filtering.

You set a filter to find all those born in 1900. Based upon the 50-year default, a person born ‘about 1950’ or ‘before 1950’ will also return as true for the filter. The same goes for ‘after 1850’ and ‘about 1850’. The same is true if you enter ‘about 1900’ as the filter option: any birth date between 1850 and 1950 will return as valid, and the effect compounds, so that even a birth date of ‘before 2000’ is included. So be mindful of how you create a date filter. A better filter would use ‘between <date1> and <date2>’.
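
The matching described above can be sketched as an interval-overlap test. This is a simplified, year-only model of the behaviour as described in this post, not Gramps code; `year_interval`, `fuzzy_match` and `TOLERANCE_YEARS` are hypothetical names chosen for illustration.

```python
TOLERANCE_YEARS = 50  # assumed to mirror the default mentioned above


def year_interval(modifier: str, year: int, tol: int = TOLERANCE_YEARS) -> tuple[int, int]:
    """Expand a (modifier, year) pair into an inclusive range of years."""
    if modifier == "about":
        return year - tol, year + tol
    if modifier == "before":
        return year - tol, year
    if modifier == "after":
        return year, year + tol
    if modifier == "":
        return year, year  # exact year, no widening
    raise ValueError(f"unknown modifier: {modifier!r}")


def fuzzy_match(a: tuple[str, int], b: tuple[str, int]) -> bool:
    """Two dates match when their expanded ranges overlap."""
    a_lo, a_hi = year_interval(*a)
    b_lo, b_hi = year_interval(*b)
    return a_lo <= b_hi and b_lo <= a_hi


print(fuzzy_match(("", 1900), ("about", 1950)))        # True
print(fuzzy_match(("about", 1900), ("before", 2000)))  # True
print(fuzzy_match(("", 1900), ("about", 1951)))        # False
```

Under this model, both sides of the comparison are widened, which is why an ‘about’ filter reaches so much further than an exact one.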

You can also change the default ‘before’, ‘after’, ‘about’ definitions.

It is not because something happens “about” such a date that it is estimated. A battle took place around/about such a date, without further details, but it is not estimated, it did take place at this time, it is widely documented.

On the contrary, I can “estimate” that someone who marries was probably born 20 to 25 years earlier. I don’t really know, just a guess from experience.

One can also “estimate about”; I estimate around/about 1980 the date of my cousin’s wedding because I remember it was a little before my military service in 1981. Estimate because I have no real proof of it, about because I still have some kind of reference.

I do. I add the s/c to the birth record, not to it directly because, as you said, it does not ‘prove’ the event, but to an attribute Age associated to that record to document it.

A part of the answer lies in the code.

As far as computations on date are concerned, MOD_ABOUT is treated the same as QUAL_ESTIMATED. The difference appears mainly when a human-readable representation is generated. Prefixes will be different.
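
As a rough illustration of that reading (not a copy of the Gramps source), the two flags can be modelled as follows. The `Modifier` and `Quality` enums are stand-ins for the `MOD_*` and `QUAL_*` constants, and `is_fuzzy` and `display` are hypothetical helpers; the prefix order in `display` is an assumption.

```python
from enum import Enum


class Modifier(Enum):  # stand-in for the MOD_* constants
    NONE = "none"
    ABOUT = "about"


class Quality(Enum):  # stand-in for the QUAL_* constants
    NONE = "none"
    ESTIMATED = "estimated"
    CALCULATED = "calculated"


def is_fuzzy(modifier: Modifier, quality: Quality) -> bool:
    """For computations, 'about' and 'estimated' both widen the date."""
    return modifier is Modifier.ABOUT or quality is Quality.ESTIMATED


def display(year: int, modifier: Modifier, quality: Quality) -> str:
    """For display, each flag contributes its own prefix."""
    prefixes = []
    if quality is not Quality.NONE:
        prefixes.append(quality.value)
    if modifier is not Modifier.NONE:
        prefixes.append(modifier.value)
    return " ".join(prefixes + [str(year)])


# Same behaviour in computations...
print(is_fuzzy(Modifier.ABOUT, Quality.NONE))      # True
print(is_fuzzy(Modifier.NONE, Quality.ESTIMATED))  # True
# ...but different text for the human reader.
print(display(1900, Modifier.ABOUT, Quality.NONE))      # about 1900
print(display(1900, Modifier.NONE, Quality.ESTIMATED))  # estimated 1900
```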

It looks like date quality has no practical utility. QUAL_NONE need not be specified (default) and attaches no prefix to displayed dates. QUAL_ESTIMATED can be advantageously replaced by MOD_ABOUT. There remains QUAL_CALCULATED which seems to originate in GEDCOM but it is used nowhere in Gramps.

GEDCOM also has INT which stands for “interpreted” or “interpolated” from other information. Gramps maps this quality to QUAL_ESTIMATED.

So, in the end, it seems that date quality in Gramps is imported from GEDCOM but is not actively used. Fuzzy dates about, before, after and between provide the same information for comparisons, sorting and computation. The only added value of quality is to offer a reminder about researcher opinion on date reliability and the way it was obtained, a bit like the trust level for sources. Under this condition, wouldn’t it be interesting to “open” the date qualification more widely, the way event or note types can be customized? This departs from GEDCOM compatibility but this is already done (see my remark about INT).

What I would like is less precise than “between” and less fuzzy than “about”, “before”, and “after”.

Currently I don’t use “about”, “before”, and “after” because (1) I can’t come up with preference settings (+/- number of years) that I want to use in all cases, (2) if I were able to decide on some settings, I would have to revisit all of my dates if I ever made a change to those settings, and (3) when I export the data, those values are lost anyway. Instead, I use “between” when I have evidence for the possible range of dates.

What I would like, I think, is to somehow specify the +/- tolerance for “about”, “before”, and “after” for each estimated date that I enter, rather than as global preferences (although those preferences could serve as defaults).

There is another thread that suggested the code should be updated to treat about, before and after differently. If the modifier was about and the preference was set to 3, then entering YYYY would mean YYYY +/- 3 years; YYYY-MM would mean +/- 3 months; and YYYY-MM-DD would mean +/- 3 days. The same logic would apply to before and after.
I would like to see this change to the code. It makes sense.
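
A minimal sketch of that proposal, for the ‘about’ case only, might look like this. It is not existing Gramps behaviour; `about_interval` and `shift_month` are hypothetical helpers, and the choice to widen year and month dates to whole years and whole months is an assumption.

```python
import calendar
from datetime import date, timedelta


def shift_month(year: int, month: int, delta: int) -> tuple[int, int]:
    """Move a (year, month) pair by delta months, rolling over years."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def about_interval(text: str, n: int) -> tuple[date, date]:
    """Widen 'about <text>' by n units of the precision that was entered."""
    parts = [int(p) for p in text.split("-")]
    if len(parts) == 1:  # YYYY: +/- n years
        (year,) = parts
        return date(year - n, 1, 1), date(year + n, 12, 31)
    if len(parts) == 2:  # YYYY-MM: +/- n months
        year, month = parts
        lo_y, lo_m = shift_month(year, month, -n)
        hi_y, hi_m = shift_month(year, month, n)
        last_day = calendar.monthrange(hi_y, hi_m)[1]
        return date(lo_y, lo_m, 1), date(hi_y, hi_m, last_day)
    if len(parts) == 3:  # YYYY-MM-DD: +/- n days
        centre = date(*parts)
        return centre - timedelta(days=n), centre + timedelta(days=n)
    raise ValueError(f"unsupported date text: {text!r}")


print(about_interval("1900", 3))        # 1897-01-01 .. 1903-12-31
print(about_interval("1900-08", 3))     # 1900-05-01 .. 1900-11-30
print(about_interval("1900-08-15", 3))  # 1900-08-12 .. 1900-08-18
```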

It really doesn’t make any more sense than using the “Between” that already exists and is supported in GEDCOM. It would just complicate the interface. In all probability, maybe 1 in 500 users would realize a variably scaled fuzzy feature was there or understand how to use it. And fewer would be able to convert it for export.

It might be something that could be shoe-horned in as an extension of the Date Calculator … provided the scalar functionality was visible enough and the Date Calculator could poll (to Expression 1) & push (from Results) date field values for the active dialog.

Say that you had a “about Aug 1900” Event. The date calculator might poll that and express it explicitly (after checking the Preferences) in 3 equivalents:

  • About Aug 1900
  • Aug 1900 ±50 years
  • Between Aug 1850 and Aug 1950

Then you could tweak any of those and the Calculator could recalculate the Equivalents. (Dimming any that can’t be pushed currently. For instance if you changed “±50 years” to “±4 months”, then only the recalculated “Between Apr 1900 and Dec 1900” is compatible with the date validator. The “About” preference is limited to years so it cannot be changed to “4 months”)
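
The recalculation of equivalents could be sketched like this. The Date Calculator does not currently work this way; `equivalents` and `shift` are hypothetical names, and only month/year precision is handled.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def shift(year: int, month: int, months: int) -> tuple[int, int]:
    """Move a (year, month) pair by a number of months."""
    index = year * 12 + (month - 1) + months
    return index // 12, index % 12 + 1


def equivalents(year: int, month: int, amount: int, unit: str) -> dict[str, str]:
    """Express 'month year +/- amount unit' as an explicit Between range."""
    if unit not in ("years", "months"):
        raise ValueError(f"unsupported unit: {unit!r}")
    span = amount * 12 if unit == "years" else amount
    lo_y, lo_m = shift(year, month, -span)
    hi_y, hi_m = shift(year, month, span)
    centre = f"{MONTHS[month - 1]} {year}"
    return {
        "tolerance": f"{centre} ±{amount} {unit}",
        "between": f"Between {MONTHS[lo_m - 1]} {lo_y} and {MONTHS[hi_m - 1]} {hi_y}",
    }


print(equivalents(1900, 8, 50, "years")["between"])
# Between Aug 1850 and Aug 1950
print(equivalents(1900, 8, 4, "months")["between"])
# Between Apr 1900 and Dec 1900
```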

I think the About qualifier is very useful and necessary. Take, for example, my G-G-Grandmother. I have no conclusive evidence of her birth date. The most persuasive information available is that her age was recorded as 101 years on her gravestone following her death in 1885. (Thus her date of birth is “Calculated”.) However, she appeared in Census records at least 3 times and her DOB calculated from her listed age is different each time. Cagey!
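
The arithmetic behind such a “Calculated” date can be sketched in a few lines. `birth_year_range` is a hypothetical helper; it assumes only the year of death and an age in completed years are known.

```python
def birth_year_range(death_year: int, age_at_death: int) -> tuple[int, int]:
    """Return the (earliest, latest) possible birth year.

    With only a year of death and an age in completed years, the birthday
    may or may not have passed yet in the year of death, so two birth
    years are possible.
    """
    latest = death_year - age_at_death
    earliest = latest - 1
    return earliest, latest


print(birth_year_range(1885, 101))  # (1783, 1784)
```

Even a trustworthy age therefore yields a two-year window, before any doubt about the recorded age itself is taken into account.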

Until I find something more trustworthy, I think the About qualifier is an excellent way of reminding myself that there is conflicting or incomplete evidence.

Incidentally, if I calculate a date (based on a person’s age at some point, say) and I feel that the source is trustworthy but there is still an element of doubt, I record it as a “Calc” date. If I have conflicting evidence or I feel the source is not necessarily trustworthy, I will record it as “About”. IOW, to me, About has a greater degree of uncertainty than Calc. I can’t express that as a range of years since it is very different talking about the 1700’s or the 2010’s!

Craig

@trlvn: what you write confirms my opinion that date handling is not well specified in Gramps, and GEDCOM, with its reduced purpose (information exchange, not genealogical data management), doesn’t help.

I feel that the “value” (in its mathematical sense) of a date is some YYYY-MM-DD which can be recorded more or less accurately thanks to the before, after, between and about modifier. This “value” can then be annotated with a trust level or a reminder how we obtained the date. This role could be supported by “quality”, provided it is a bit extended.

We thus have on one hand a value made ready for calculations, without the need to reference another field of object Date. On the other hand, there is “quality” which is a flag but does not participate in calculations. This flag could start with the GEDCOM options “estimated”, “calculated”, “interpreted” but, just like note types, user should be free to add any variants like “low trust”, “high trust”, “grand-father memoir”, “drafting record” as an opinion about the source or even a sub-qualification for the source.

Decoupling the date itself from its “quality” is a good thing IMHO. This means that present usage of an “exact” date + “estimated” quality should be replaced by “about” date + “estimated” quality so that quality need not be sensed before computing. Anyway present behaviour where only “estimated” is interrogated is faulty.
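
One way to picture this decoupling is a record whose free-form “quality” is excluded from comparisons. This is only a sketch of the idea, not the Gramps `Date` class; `AnnotatedDate` and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnnotatedDate:
    # The "value": everything a computation needs.
    modifier: str   # "", "about", "before" or "after"
    year: int
    month: int = 0  # 0 means unknown
    day: int = 0    # 0 means unknown
    # The annotation: free text, ignored by == and hash().
    quality: str = field(default="", compare=False)


a = AnnotatedDate("about", 1980, quality="estimated")
b = AnnotatedDate("about", 1980, quality="grand-father memoir")
c = AnnotatedDate("", 1980, quality="estimated")

print(a == b)  # True: same value, different annotation
print(a == c)  # False: the modifier is part of the value
```

With this layout, no computation ever has to inspect `quality`, so any wording the researcher finds useful can go there.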

@emyoulation:

How would you handle the case where you know for sure the MM-DD of an event but not the YYYY? For example, you know from records that a person was born on St. John’s day (June 24th) but the year is imprecise.

I see no way in the present date structure to express this. A date can be YYYY (some time within the year), YYYY-MM (in the month) or YYYY-MM-DD (exact day), i.e. Date assumes a “prefix” notation of uncertainty.

Oh, I quite agree. (Although my preference is for a “circa” rather than the “about”.) I’ve reduced the approximation by 90% though: from ±50yrs to ±5 years. It works fairly well for those purported ages for ladies on Censuses too… provided they shaved their ages rather than slashed!

My point was that an “about” that varies concurrently within a tree is a scary concept. So varying the approximation would have greater clarity using an explicit “Between now and then” for that particular purpose.

@pgerlier If I understand you correctly, you seem to be saying that modifiers like “About”, and “Calc” should be eliminated and replaced with a ‘trust level’ or ‘quality’ indicator.

In Gramps now, when one adds a Source Citation to an Event (like Birth), one of the available fields is “Confidence”. The choices for Confidence are Very low, Low, Normal, High or Very high. I believe the intention of this field is directed at the same thing as the ‘trust level’ or ‘quality’ that you mentioned. The Confidence field does not allow user-entered values, however, like “drafting record”, etc.

I don’t know if or how the Confidence field is expressed in GEDCOM. Personally I think I’ve only ever changed the default a handful of times. [Is this value displayed on any of the standard reports?]

When you say that “date handling is not well specified in Gramps”, I would have to disagree. I feel Gramps has quite an appropriate set of tools for recording dates. There may be edge cases, like knowing the month and day of an event but not the specific year, that are not directly accommodated. I think that such anomalies are more than adequately covered by Gramps’s ability to add Notes and Attributes to Events. Supporting more edge cases almost always makes software more complex for general use in order to make it simpler for an uncommon situation. Not a good trade-off in my view.

Are you comparing Gramps with another program that you’ve used? If so, perhaps you could direct us to an example or screenshot that illustrates a better approach?

Craig

I should also add that I believe the Circa, About, Est., and Calc modifiers all have a long history in genealogy–far before it was common to use software to record such data. I’m sure I have a family history prepared in the 1970’s that has these indicators sprinkled throughout. Unfortunately, the document I’m thinking of is in a storage box that I’m not prepared to dig out right now.

We don’t have to keep doing things the way they were done in the past but I think it remains to be proven that this was a “bad” practise.

Craig

Yes, the limits of the GEDCOM design are why the Confidence functionality is as limited as it exists today.

There have been discussions of making it more useful… to the point of including negative confidences for disproven postulations.

One of the few places I’ve noticed Confidence being woven seamlessly into the GUI is with Christopher Horn’s experimental Browse view [formerly called the Person view Profile mode or the Relationship view Linked mode].

@trlvn I probably was not clear enough. I highly value the uncertainty encoded in the Date concept of Gramps. Modifying a date with exact, about (circa), before, after and between is an absolute need in genealogy. My purpose is to separate this uncertainty from qualifying flags.

Presently, according to the code, there is an ambiguous way to confer “about” uncertainty to a date. There is the obvious “about” qualifier in the date “value” itself and there is the “estimated” quality which implies automatically “about”. This means you can write an estimated date as YYYY-MM-DD or about YYYY-MM-DD with the exact same effect.

I think this is not good because when data is exchanged it can be interpreted differently by a human reader. In my view, the modifier+date should bear the uncertainty by itself and the “quality” should be at researcher’s disposal to annotate how (s)he arrived at such a date. When quality is not blank, this should attract reader’s attention and warn him against immediate conclusions.

Since there are many ways to derive a date, the quality (as it is presently named) should not be limited to what GEDCOM defines (all the more since INTERPRETED is commented out without explaining why). I think it comes as a complement to source confidence level (which I erroneously called “trust level” in my previous post). A record has a global confidence level but individual data in it may have a different one. This is why I suggested that the date “quality” might “patch” the source evaluation. The “quality” should then be any reminder useful to the researcher/reader.

A Gramps date object holds two YYYY-MM-DD tuples. This is done to easily encode a fuzzy date given as “between”. So far, so good. But the same payload is used for specifying a duration, i.e. a start date and an end date (for an event).
The vocabulary to distinguish between both is not obvious (range vs. span) and I must always refer to the user’s manual to make sure I use the right category. And the situation is even worse with translations (e.g. in French “étendue”=extended vs. “incrémentée”=incremented which is even less understandable).
Also, the fact that the payload is fully occupied in the span case implies that both start and end dates are exact and we can’t add any modifier lest the date interpretation reverts to single date.
When we have a fuzzy duration, should we drop range date for an event and replace the duration event by an “about” start event plus an “about” end event?

I always think…

“between <date1> and <date2>” as a single definite event that occurred on a date at some point in time between the two dates. A range of time. A person was born “between <date1> and <date2>”.

“from <date1> to <date2>” as an activity that started and ended on the dates. A span of time. A person was in the military “from <date1> to <date2>”.

In other words, instead of saying “estimated between date1 and date2”, we would say “between about date1 and about date2”?

And if one of the dates is not fuzzy, then we could say “between date1 and about date2” or “between about date1 and date2”? (The latter sounds a bit awkward to me.) I think I might prefer those to using “after date1” or “before date2”, respectively, if I have some idea of approximately how many years after or before.

It might be nice if users could choose the wording they prefer, regardless of the internal representation of the dates, as long as it still makes logical sense.

This makes use of a “span” date and a “quality”. But in my reading (with my own background), it could mean the event (e.g. a residency) was “active” between the mentioned dates and the dates resulted from an estimation or interpolation from sources.

This is what we can do with the present date implementation. But I’d like to separate uncertainty from “quality”/annotation.

This is not possible presently and as you point out this can become cumbersome and ugly. The solution, which is probably the best direction for ease of reading/understanding, is to split the residency event (which has a duration) into two “instantaneous” events: a start-of-residency and an end-of-residency. Thus we can tell:

  • start-of-residency “before date1” plus estimated, calculated, interpolated, written evidence, whatever, …
  • end-of-residency “between date1 and date2” plus lapse-in-records, whatever, …

More generally, this calls for clarifying the notion of a Gramps event. I took the example of residency, which obviously spans some period of time. But Gramps has some difficulty recording the uncertainty we can have about it. Gramps has good tools for “spot” events (instant events) and can translate all our doubts about them.

Consequently should we restrict ourselves to instantaneous events?

Instead of recording “sick from date1 to date2”, we should enter “diagnosed sick on date1±n days” and “recovered on date2”. This may seem more work (2 events instead of 1) but brings more versatility.

Sorry, I should have said “from … to” instead of “between” but yes now I understand what you mean.

Another advantage of using two events, in that example at least, is that the Place might very well be different for the beginning and ending of the illness. On the other hand, it might not be necessary to specify those places if you had separate residence event(s) that covered those dates.

How do you folks handle this condition?
The baptism record shows March 15, 1850; however, the birth was registered in the 2nd quarter of 1850.
There is a birth and baptism event, what dates are entered in each?