Making date input more flexible

Building on “Dates are bold on importing for a few dates”:

When I set Gramps to use Dutch dates in the dd month year format, it will recognize Dutch month names like Mei, Juni, and Oktober, or their 3 character abbreviations, which is nice when I use Dutch sites that write month names in full. It will not recognize a format like 30-09-2020 however, which is how today is written on most Dutch archive sites.

I also often use English speaking genealogy sites, which use May, Jun(e), or Oct(ober), and these dates give errors here, because when configured as above, Gramps can only read Dutch names.

This means that in both situations I have to retype the month, from May to Mei, or from 09 to Sep, which I find quite annoying. I think I have some code in my gramps34 branch that cures this, by trying English as a fallback, and by running the ISO date processor and letting it reverse the date interpretation when the day number has more than 2 digits.
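To make the fallback idea concrete, here is a minimal stand-alone sketch. It is not the code from the gramps34 branch and does not use the Gramps date parser; the month tables and the names `month_table` and `parse_day_month_year` are made up for illustration. It tries the configured language first and only then English.

```python
# Illustrative month tables; these are assumptions, not taken from Gramps.
DUTCH_MONTHS = ["januari", "februari", "maart", "april", "mei", "juni",
                "juli", "augustus", "september", "oktober", "november",
                "december"]
ENGLISH_MONTHS = ["january", "february", "march", "april", "may", "june",
                  "july", "august", "september", "october", "november",
                  "december"]


def month_table(names):
    """Map each full month name and its first three letters to 1..12."""
    table = {}
    for number, name in enumerate(names, start=1):
        table[name] = number
        table[name[:3]] = number
    return table


def parse_day_month_year(text, local_months=DUTCH_MONTHS):
    """Return (year, month, day), or None when the text is not recognised."""
    parts = text.split()
    if len(parts) != 3 or not (parts[0].isdigit() and parts[2].isdigit()):
        return None
    key = parts[1].lower()
    # Try the configured language first, then fall back to English.
    for names in (local_months, ENGLISH_MONTHS):
        number = month_table(names).get(key)
        if number is not None:
            return (int(parts[2]), number, int(parts[0]))
    return None
```

With this, both "30 Mei 2020" and "30 May 2020" give the same result, while a month name from a third language is still rejected.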

Reading (long) English dates in dd month year format will also cure the problem listed in the cited message, which refers to Ancestry always exporting long names, even though that is against the GEDCOM rules. And I know that these long names also appear in the GEDCOM files created by the GetMyAncestors tool, also known as FSTOGEDCOM.

Would it be an idea to integrate my hack in 5.1?

Maybe a Tool would add less overhead?

There is a “HasInValidDate” Events rule too. (It is a Manually Installable Isotammi add-on.) There might be a way to leverage that? Perhaps an export, exit Gramps, switch to the other language, restart Gramps and import? Unfortunately, that might overwrite or it might duplicate. (If it duplicated, I doubt the merge tools will recognize an Event as ‘mergable’ if it has the same ID but different date types: ‘string’ vs. ‘date’.) Do tests on a disposable Tree first before trying on your (recently backed up) main Tree.


Creating a SuperTool script to revalidate the date might be another option.

A script could leverage @kku’s HasInValidDate (part of HasValidDate add-on) query rule’s code and just try to recommit all the invalid date events while Gramps is active in the appropriate language.

To make it easy, I looked at my old hack on GitHub, and changed the ISO parser a bit, so that it swaps day and year when the day is larger than 99.

And that works.
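For readers who want to see the swap in isolation, this is a rough sketch of the idea, not the actual patch on GitHub. The function name `parse_iso_or_reversed` and the regular expression are assumptions for illustration. The text is first read in ISO order (year-month-day), and day and year are swapped when the "day" is larger than 99.

```python
import re

# Three numeric fields separated by hyphens, e.g. 2020-09-30 or 30-09-2020.
_NUMERIC_DATE = re.compile(r"^\s*(\d{1,4})-(\d{1,2})-(\d{1,4})\s*$")


def parse_iso_or_reversed(text):
    """Return (year, month, day), or None when the text does not fit."""
    match = _NUMERIC_DATE.match(text)
    if match is None:
        return None
    # First interpretation: ISO order, year-month-day.
    year, month, day = (int(group) for group in match.groups())
    # A "day" above 99 can only be a year, so the date was written dd-mm-yyyy.
    if day > 99:
        year, day = day, year
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None
    return (year, month, day)
```

The range check is deliberately crude: it does not know month lengths, it only rejects values that can never be a month or a day.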

I recently imported a 1996 Brother’s Keeper GEDCOM to a blank tree to make an example Gramps tree that actually invites exploration.

It had hundreds of approximate dates in the format “yyyy/yy” which described a date range in the same century. Gramps saw most of those as Invalid dates. (Unfortunately, some were interpreted as valid Gregorian/Julian calendar entries.)

So, with Kari gently coaching me, here’s an example SuperTool (for v 1.2.5, released 17Oct2022) Events script that tweaks the “Selected” invalid date.

It is in “test flight” mode to start. The “Commit” is turned off and the actual “Set” line is commented out. So Executing the script merely generates a list so you can visually confirm that the data is reformatting correctly. It has a “Safety” conditional which only allows the script to work on non-blank but invalid dates. And only for Events that are SELECTED in the Events view.

Make a backup before running ANY SuperTool script

Try it by creating a dummy Event with the Date “1767/68”. Select that event, run the SuperTool add-on and try the Script.

When the test flight is good, remove the “#” comment, select the “Commit changes” and Execute.

Reload the original script before changing the "my_reformatted_date = " line to slice & dice replicas of malformed dates in your Tree. (Simulate the bad formats in the Example.gramps tree for testing. DON’T develop formulas or test them on actual good Trees.)

I had to disable the “invalid” safety when I tweaked formulas for some bad dates that Gramps THOUGHT were good.

[Gramps SuperTool script file]
version=1

[title]
Invalid Date re-Formatter

[category]
Events

[initial_statements]
# Invalid Date re-Formatter.script 16Oct2022 v0.0.4  for SuperTool v1.2.5 
# Reformats invalid Event date ranges imported from Brother's Keeper v4.2 GEDCOM in the form YYYY/YY
# where the portion before the slash is a 4 digit beginning year. After the slash is a 2 digit ending year in the same century 
# found how to get the date in the 'Events with an invalid date' (v 1.0.8) add-on rule
#    https://github.com/Taapeli/isotammi-addons/tree/master/source/_hasvaliddate
# visually confirm that the reformatting makes a valid date string
from gramps.gen.datehandler import parser

[statements]
if not (event.get_date_object().get_text() == "" or event.get_date_object().is_valid()):
    my_date = event.get_date_object().get_text()
    my_reformatted_date = "bet " + my_date[0:4] + " and " + my_date[0:2] + my_date[-2:]
    dateval = parser.parse(my_reformatted_date)
    #event.set_date_object(dateval)

[filter]

[expressions]
type, participants, my_date, my_reformatted_date, dateval

[scope]
selected

[unwind_lists]
False

[commit_changes]
False

[summary_only]
False
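The slicing in the [statements] section can be tried outside Gramps before touching a tree. The sketch below is a stand-alone variation, not part of the script file: `reformat_slash_year` is a hypothetical helper that adds a pattern check, and it returns None for anything that is not strictly "yyyy/yy" or where the end year would not fall after the start year in the same century.

```python
import re

# Exactly four digits, a slash, and two digits, e.g. 1767/68.
_SLASH_YEAR = re.compile(r"^(\d{4})/(\d{2})$")


def reformat_slash_year(text):
    """Turn 'yyyy/yy' into 'bet yyyy and yyyy', or return None."""
    match = _SLASH_YEAR.match(text.strip())
    if match is None:
        return None
    start, end_suffix = match.groups()
    # Same-century assumption: reuse the first two digits of the start year.
    end = start[:2] + end_suffix
    if int(end) <= int(start):
        # e.g. 1799/01 probably crosses a century; leave it for manual review.
        return None
    return "bet " + start + " and " + end
```

Returning None instead of guessing keeps the odd cases visible in the test-flight list, so they can be fixed by hand.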

Dates present even more complexity.

The Events Filter Gramplet’s filtering:

  • is on the date value unless it is an Invalid date
  • does not search the formatted date string as displayed
  • has a doubled factoring of date range adjustments
  • does not apply RegEx in the compares

That means that to effectively use the SuperTool to find and fix mangled import dates, we’re going to need to know when we’re searching the formatted date value (dateval) or the string (text).

Another complication in SuperTool searching Events is a need to consider the Fallback Event for Death and Birth dates… and the ability to exclude those from being considered. The same strategy could be applied to name searches.

Hi,
I’m new to gramps and just met a similar issue.
I exported a GEDCOM from familysearch. In Familysearch I have some dates that are in english and some other dates that are in Italian (they are both valid dates for familysearch).
My Gramps is set to use the English language and the YYYY-MM-DD ISO format for dates.
When I import the GEDCOM, all the Italian dates are parsed correctly, but the Italian date text (e.g. “16 GENNAIO 1950” for 1950-01-16) is copied into the text comment field of the event, and this apparently invalidates the date and makes it appear in bold.
I came up using the following with supertool:

  1. Events → created a filter on “invalid dates”

  2. Selected all the events

  3. With supertool:

Initialization:


import locale
from datetime import datetime
from gramps.gen.lib import Event,Date

locale.setlocale(locale.LC_ALL, 'it_IT.utf8')

def set_date(italian_date):
    italian_date_to_dateObject = datetime.strptime(italian_date.get_text(),'%d %B %Y')
    italian_date.set_yr_mon_day(italian_date_to_dateObject.year, italian_date_to_dateObject.month, italian_date_to_dateObject.day)
    italian_date.set_text_value(italian_date_to_dateObject.strftime('%Y-%m-%d'))
    return italian_date_to_dateObject



Statements executed for each object:

italian_date = event.date

result = set_date(italian_date)



Filter:

result

It worked, but even with “commit changes” ticked I still had to open every single event and press OK; otherwise, even though everything looked good, the dates still appeared in bold as invalid (this is still much quicker than editing every single date). Maybe an option to just delete the text comments? The only problem seems to be related to the text comment, since the dates are parsed correctly.

As an example, here one of the problematic dates:

https://www.familysearch.org/tree/person/details/G6RJ-T3P
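For anyone adapting this, the text-to-ISO step can also be done without switching the process locale, which avoids depending on an it_IT locale being installed. The sketch below is only that conversion step, under the assumption that the text is always "day MONTH year"; the table `ITALIAN_MONTHS` and the function `italian_text_to_iso` are made-up names, and nothing here touches a Gramps Date object.

```python
from datetime import date

# Italian month names, lower case, mapped to month numbers.
ITALIAN_MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}


def italian_text_to_iso(text):
    """Convert e.g. '16 GENNAIO 1950' to '1950-01-16', or return None."""
    parts = text.split()
    if len(parts) != 3:
        return None
    day_text, month_text, year_text = parts
    month = ITALIAN_MONTHS.get(month_text.lower())
    if month is None or not (day_text.isdigit() and year_text.isdigit()):
        return None
    try:
        # date() rejects impossible combinations such as 31 February.
        return date(int(year_text), month, int(day_text)).isoformat()
    except ValueError:
        return None
```

The resulting ISO string would still have to be handed to whatever sets the date in Gramps; this sketch stops at producing the string.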

Hi Rocco,

Nice to see you here. Like I wrote on Facebook, the real problem is in RootsMagic, and maybe it’s better if we address the problem there. All dates are OK if you download Adele and relatives with Ancestral Quest or Legacy Family Tree.

The problem that we encountered is that RootsMagic downloads the date text entered by the user, and not the verified date object behind that, which is downloaded by AQ and Legacy.

Note that I marked this topic as solved yesterday, because I was tired, and assumed that the solution provided by @emyoulation would work, because it calls the parser to create a valid dateval. And that parser call is not in the code sample that you gave, see below:

I don’t see this parser call in your code, so I guess that that is the reason that it is not working. And this parser is called when you open the event and press OK.

IMO, using a tool is not the solution, because it causes extra work for the user, and the code to parse dates in English and the user’s language is available in every installation, I think.

And as a lazy developer, I also see no reason to compensate for download errors made by RootsMagic.

I asked Rocco to cross post. The thought was that maybe some discussion can help evolve his SuperTool script and make it more easily adapted by the next user.

The particular problem he describes is usually short-lived. So the first person makes a broad hack at a resolution, manually tweaks the results and never thinks about the issue again. The motivation to improve the script is gone.

The next person puzzles out what was done, adds some comments where it was confusing, makes the code a bit more flexible to address their problem and, again, moves on.

So, by sharing the evolution, we might have an increasingly capable script where there are only surges of motivation.

OK, thanks! Cross posting is a good idea, because Discourse gives us way more flexibility with messages, and also way more screen real estate, to keep things clear.

I’m still in doubt about the right approach though. We have GEDCOM files with weird dates, caused by anomalies in other software, like Ancestry and RootsMagic, and we have things like the example that you gave for Brother’s Keeper, where people often use a slash to express some form of doubt. My late father did that too, to express that he wasn’t sure about the exact year, and I think that he did not use the slash to express a range, but rather two alternatives. I mean, that’s how we often use a slash on this side of the ocean, when it’s not Gregorian/Julian. That means that I might have a date like 18-10-2012/22.

Another thing that I often found is that he put a text behind the date, like RK for a baptism done in a Roman Catholic church. We write Catholic as Katholiek.

Date formats like these can better not be handled by the GEDCOM parser, because they are quite unpredictable, because they’re based on personal habits. And that suggests that they can best be handled by a tool, also for situations where one can assume that the author wasn’t aware of using the slash for Gregorian/Julian dates.

OTOH, I know that Ancestry writes long month names when you export a GEDCOM from the site, so that anomaly is predictable, and can be handled by the GEDCOM importer. These month names are always English, so any Gramps setup can handle them.

And statistically, one might argue that when non English month names appear in a GEDCOM file, they are often in the user’s own language, so that they can be parsed too, if that language is installed, which is the default in Linux and Windows.

Things can get a bit more difficult when there is a 3rd language, which can give dates like 10. März 1618. In this case, there’s not only a German month name, but also a dot behind the day, which is the equivalent of ‘th’, so 10. means 10th. That dot would be fine if it were used as a separator, like in 10.03.1618, but in this case it goes wrong.
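A tool could deal with the ordinal dot before the text ever reaches a parser. This is a small sketch of that pre-processing step only; `strip_ordinal_dot` is a hypothetical helper, and the rule it uses (a dot after a 1 or 2 digit day, followed by whitespace and a non-digit) is an assumption that fits the two examples above.

```python
import re

# A leading day number with a dot, followed by whitespace and a non-digit.
# The lookahead keeps separator dots, as in 10.03.1618, untouched.
_ORDINAL_DOT = re.compile(r"^(\s*\d{1,2})\.(?=\s+\D)")


def strip_ordinal_dot(text):
    """Remove the ordinal dot after the day number, if there is one."""
    return _ORDINAL_DOT.sub(r"\1", text)
```

After this step the month name still has to be recognised, so a German date only parses if German month names are available.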

I often find these by sorting all events on date, because with that sorting, all invalid dates appear between the empty and the valid ones.

All in all, I think that the proper approach may depend on the frequency of the errors, and their consistency. The long month names in Ancestry are a well known phenomenon, and dates in the user’s language are quite common too, so IMO, they should be handled by the GEDCOM importer.

The other ones are a perfect job for a tool, if there are too many for manual correction.


The problem is that the dates were not saved (in FamilySearch or RootsMagic, etc) as a valid date and are imported as purely Text.

When we enter the date in an Event record, as part of saving the event Gramps takes the date, which we entered as text, and converts it to a valid date. You probably noticed that if you edit an event with these invalid dates the saved results converted the date to your desired format. Edit event, make no changes, Save.

I do not know of any bulk process that will take care of these dates. Others may have found a process to make the conversion from Text to Date.

With SuperTool, it is simple to find all the invalid dates events, poll the text string, pass it back through the parser & commit the parsed value.

@DaveSch I know what you mean, but it isn’t true in the case mentioned by Rocco. And I know that, because when I look at the profile, the summary on top shows dates in English, or Dutch, when I switch FamilySearch to Dutch, and in the vitals the dates are in Italian, but there is no warning telling me that they are invalid, and need to be normalized. That means that in the FamilySearch database, they are stored as a valid date, and that date is retrieved by Ancestral Quest and Legacy. And I know that, because I tested this.

When I have the FS language set to English, the profile title says

Adele Zoppichini
13 May 1917 – 8 December 1958 • G6RJ-T3P

and when I switch that to Dutch it says

Adele Zoppichini
13 mei 1917 – 8 december 1958 • G6RJ-T3P

and the site can only do this, if these dates are valid.

And incidentally, in German they look like this

Adele Zoppichini
13. Mai 1917 – 8. Dezember 1958 • G6RJ-T3P

which means that the dot behind the day number is valid in German, at least in the world of FS.


That’s right, but I expect that the parser only speaks English and the language installed by the user, meaning that for me, it won’t correct a German date.

Interestingly for Windoze users, the PortableApps fork of Gramps comes with ALL languages installed & makes it easy to change the locale. So the same script can be run after switching to the identified language.

Awkward, but viable.

… and if you want, you can also select more than one language in the standard Windows installer.

Very true. But that re-installs and loses any patches you’ve applied to the core. I suppose you could designate a different destination folder and then un-install after.

I have found the SuperTool anything but Easy!

True… If Gramps has a steep learning curve, SuperTool has a vertical wall.

But in this case, it is just 3 lines & 2 options

[initial_statements]
from gramps.gen.datehandler import parser

[statements]
my_date = event.get_date_object().get_text()
event.set_date_object(parser.parse(my_date))

With the “Selected objects” and “Commit changes” selected.

Then I would find an Event to test (note the Event ID & date) and execute the script.

If it works, process in batches. There is a table display limit of 1,000 records somewhere, Python or GTK. So that’s the biggest batch you ought to process.
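If you collect the Event IDs first, splitting them into batches is simple. The sketch below is generic Python and not a SuperTool feature; `batches` is a made-up helper, and the default of 1,000 just mirrors the display limit mentioned above.

```python
def batches(items, size=1000):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each slice can then be selected and processed on its own, so a bad formula never touches more than one batch before you notice.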

This revalidation might make a good addition to the “Calculate Estimated Dates” tool.

But there’s a LOT of interface and that might be difficult to use when looking at it in your invalid date’s target foreign language.
