Interest in enhancing verify.py

Hi,

I might be willing to work a bit on verify.py, which is the foundation of Tools → Utilities → Verify the Data. I have the following ideas:

  • use a TreeStore to group the results by their rule instead of a flat list
  • add a check which verifies that children of a family are sorted in the correct order (by birth)
  • add a check which verifies that families are sorted in the correct order for a person (only if children with birth dates exist, or marriage/divorce dates are given)
  • add a check which verifies that baptisms happened at around the same distance from the birth within a family, with some deviation allowed (probably configurable). This is to raise a warning when a baptism usually happens within 2 months for a certain family, but there is one child whose baptism happened 10 years after the birth.
  • add a check which verifies that burials are not too far away from the death and/or happen at around the same distance after the death within a family (like the baptism check above).
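
A minimal sketch of the grouping idea from the first bullet, in plain Python. The shape of the result rows (rule name plus object label) and the sample names are assumptions for illustration, not the actual row layout used by verify.py. With a Gtk.TreeStore, each rule would become a parent row and each finding a child row under it.

```python
from collections import defaultdict

# Hypothetical flat results as (rule_name, object_label) pairs.
# The real tool's rows carry more columns; this shape is an assumption.
flat_results = [
    ("Baptism before birth", "Smith, John"),
    ("Old age at death", "Doe, Jane"),
    ("Baptism before birth", "Miller, Anna"),
]


def group_by_rule(results):
    """Return {rule_name: [object_label, ...]}, keeping first-seen rule order."""
    grouped = defaultdict(list)
    for rule_name, label in results:
        grouped[rule_name].append(label)
    return dict(grouped)


grouped = group_by_rule(flat_results)
for rule_name, labels in grouped.items():
    # One "parent" line per rule, with the number of findings
    print(f"{rule_name} ({len(labels)})")
    for label in labels:
        # One "child" line per affected person or family
        print(f"    {label}")
```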

Please let me know if there is a general interest within the project in those changes/enhancements before I start investing more time here.
I am the author of a big GEDCOM file (a one-place study of a village) and need those checks. I was thinking of writing my own Java tool (because I’m a professional Java dev), but it might be better to enhance Gramps so everyone could benefit from it?

Attached you’ll find a screenshot which shows my current progress (bear with me, it’s far from complete!) - only featuring the grouping so far…

It’s a start, but before I continue I’d like to check what you guys think about it.

The tree list looks good. I also don’t see a problem with adding extra rules.

This is interesting. Thank you for working on it!

Note that one of the difficulties in previous tools doing verification tests and auto-sorting has been related to supporting imprecise dates. Date ranges, spans, and approximations all create their own special cases to be handled.

For example, the ‘before’, ‘after’ and ‘about’ date preferences cause a lot of problems when auto-sorting birth orders of offspring. (As do undefined dates)


When I do certain filtering operations, I’ll temporarily tweak the Dates preferences values to reduce false positives and overlooked items in the results. For example, an ‘about’ year for a guestimated birth of a child is more likely to be ±1 year rather than ±50. An overly broad ‘about’ wreaks havoc in the birthorder sorting too.

Meanwhile a ‘before’ date in a filter with only 50 years doesn’t even cover a “probably alive” full lifespan.
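
To illustrate why the width of these preference values matters for ordering checks, here is a small sketch. It does not use the Gramps Date class; `RoughDate`, `year_span` and `definitely_earlier` are hypothetical names, and the default ranges are only stand-ins for the preference values discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoughDate:
    """Hypothetical simplified date: a year plus a qualifier."""
    year: int
    qualifier: str = "exact"  # "exact", "about", "before" or "after"


def year_span(date, about=1, before=50, after=50):
    """Return the (earliest, latest) year the date could plausibly mean."""
    if date.qualifier == "about":
        return (date.year - about, date.year + about)
    if date.qualifier == "before":
        return (date.year - before, date.year)
    if date.qualifier == "after":
        return (date.year, date.year + after)
    return (date.year, date.year)


def definitely_earlier(first, second, about=1):
    """True only if the whole span of `first` lies before the span of `second`."""
    return year_span(first, about=about)[1] < year_span(second, about=about)[0]


older = RoughDate(1850, "about")
younger = RoughDate(1853)
print(definitely_earlier(older, younger, about=1))   # spans do not overlap
print(definitely_earlier(older, younger, about=50))  # spans overlap
```

With a ±1 year 'about' the two births can be ordered with confidence; with ±50 years the spans overlap and no safe ordering exists.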

Also note that some other ‘sanity check’ and ‘data harmonization’ tools exist. The Isotammi project has built a significant data validation toolbox. You might save both yourself (and them) some duplication of work by beginning a conversation with @kku (Kari Kujansuu)

A built-in utility: the Verify Data tool:

Thank you VERY much for this community-minded choice.

Might I suggest that you also validate that the importer and exporter round trip doesn’t lose any data in your dialect of GEDCOM?

For example, if the other tool used to produce the original GEDCOM uses any custom tags, you might want to create a set of modified import/export addons that puts that data into an Attribute instead of a Note. Then outputs that Attribute as the original custom GEDCOM tag. (The GEDCOM Extensions import plugin might already support some dialects.)

Thanks for your opinions on that topic @Nick-Hall and @emyoulation.
It might be a good idea to split my ideas into, let’s say, two separate things - a tree-like appearance as one part and additional rules as another.
You are right about the issues that come up when processing dates. I’m not really sure how other tools out there handle it, but I’m in favour of only signalling an error/warning when the data in question is precise. So birth dates which are only “about” might be ignored when checking whether children are ordered by their birth dates. One could suggest that validating the order of children by birth should follow the same rules as the “order children” tool in Gramps.
On the other hand, a burial date like “before 1900” might be considered an error when the death was on 1901-02-04… well - it’s a can of worms waiting to be opened :wink:
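
The "only warn on precise data" approach for the child order check could look roughly like this. It is a sketch under assumptions: the `(name, birth_date, is_exact)` tuples are a hypothetical stand-in for what would be read from the Gramps database, and it does not claim to match the rules of the existing child ordering tool.

```python
from datetime import date


def out_of_order_children(children):
    """Return names of children whose exact birth date precedes an earlier sibling's.

    `children` is the family's child list in stored order, given as
    (name, birth_date, is_exact) tuples (a hypothetical shape).
    """
    flagged = []
    latest_exact = None
    for name, birth, is_exact in children:
        if birth is None or not is_exact:
            continue  # imprecise or missing dates never trigger a warning
        if latest_exact is not None and birth < latest_exact:
            flagged.append(name)
        else:
            latest_exact = birth
    return flagged


family = [
    ("Anna", date(1850, 3, 1), True),
    ("Bert", date(1848, 1, 1), False),  # "about 1848": ignored
    ("Carl", date(1849, 5, 5), True),   # exact and earlier than Anna: flagged
    ("Dora", date(1852, 7, 9), True),
]
print(out_of_order_children(family))
```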

And regarding the GEDCOM import/export topic - sorry for the confusion I caused. I maintain my tree in Gramps and use GEDCOM only when I export it, so the data can be used by the webmaster of the site which hosts my work.

If work is also being done in that area elsewhere, that would be nice too; I like to avoid redundancies. I tried to check Isotammi, but unfortunately all the links into their wiki appear in Finnish to me :frowning:

Yes. Small changes are easier to test and merge.

With their permission, we’ve re-worked some of the tool README.md data into English on our wiki. Being a monoglot, I am overwhelmed by their (and your) willingness to communicate in a non-native language.

I have had some success in using the browser translation tool plugins. Although the occasional “hovercraft full of eels” translation problem occurs. And the translation plugin refuses to perform quite frequently. So I have to resort to cut&paste of paragraphs using Google Translate.

Funny, I also had the idea of using a Treeview - my experimental version looks basically identical to yours:

I wouldn’t say that Isotammi has a “significant data validation toolbox”. We built some tools to fix problematic GEDCOM files and some tools to help in mass editing data in Gramps - but those don’t help much in validating data!

On Sat, 26 Aug 2023 at 17:44, Brian McCullough via The Gramps Project (Discourse Forum & Mailing List) (notifications@gramps.discoursemail.com) wrote:

Hello Oliver,

May I suggest a third thing: Creating a modeless window for this?

I have my tree on Geneanet, and use the consistency check there, because it is independent of Gramps, so that I can put the list of warnings on one side of my screen, and work with Gramps on the other. I can then check off which warnings I correct, and which ones I ignore, and these check marks are saved on the site, so that, when I’m tired, I can pause my work, and come back later.

And sometimes I also use the Genealogica Grafica program, in VirtualBox, for the same purpose. That’s a bit faster, because it runs locally, and it’s also persistent.

With a modeless window, one might be able to do the same in Gramps, meaning running a check, moving the results window to the side of the screen, and changing one’s focus back to a view on the main screen, where one can edit persons and families, and their relations.

Here too, it might be nice to be able to save check marks for all sorts of situations where you know that the person was re-buried years and years after, or married very late in life, etc. Those are things that could be stored in attributes, or saved somewhere else, depending on how far we might want to take this.

Having a modeless window would be a good start. :slight_smile:

Thanks for presenting your ideas!

Enno

I agree with you that the UX of that tool could be enhanced. I’ll check what I can do to improve it. But one step after another - I’m still in the process of getting settled with Python and Gtk :slight_smile:

Persisting the “ignore” flag has also been on my wishlist for a very long time - but that seems to be no easy task right now. It was one of the big reasons why I didn’t really use that tool: I get swamped with thousands of “errors” which are not errors at all. But persisting the “ignore” flag needs a design concept for how and where to store it. Would you need to keep a list of ignored rule IDs on each Person or Family? I’m currently not digging any further into that.
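
One way to picture the "list of ignored rule IDs per Person or Family" idea is a plain mapping from object handle to rule IDs, serialized as JSON. This is only a sketch of the data shape: the handles and rule IDs below are made up, and where such data would actually live (an attribute, a separate file, or something else) is exactly the open design question.

```python
import json


def load_ignored(text):
    """Parse saved ignore marks into {object_handle: set_of_rule_ids}."""
    return {handle: set(rule_ids) for handle, rule_ids in json.loads(text).items()}


def dump_ignored(ignored):
    """Serialize ignore marks with stable ordering so saved output is reproducible."""
    return json.dumps({h: sorted(ids) for h, ids in sorted(ignored.items())})


def filter_results(results, ignored):
    """Drop (handle, rule_id) findings that the user has marked as ignored."""
    return [(h, r) for h, r in results if r not in ignored.get(h, set())]


# Hypothetical handles and rule IDs, for illustration only
ignored = {"I0042": {"BuryTooLate"}, "F0007": {"LargeChildrenSpan"}}
results = [("I0042", "BuryTooLate"), ("I0042", "BaptTooLate"), ("I0100", "BuryTooLate")]

saved = dump_ignored(ignored)
print(filter_results(results, load_ignored(saved)))
```

An ignore mark stored this way only hides one rule for one object, so a reburial accepted for one person does not silence the same rule elsewhere.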

I used Genealogica Grafica too in the past; that’s basically where the idea of the additional filters came from.

Did you know that you can double-click each row to open the person or family editor and fix the issue? It would also be cool if the warning got re-evaluated after the editing dialog is closed, and removed once the issue is fixed… :wink:

there we go :smile:

So - while I continue implementing additional verifications (BaptTooLate and BuryTooLate are already done), I came across the fact that verify.py only considers events of type Baptism to be “Baptism-like” events. Other software also considers Christening a “Baptism-like” event. I noticed that because I have always used Christening so far and, well… no baptism-related rule violation was ever triggered :slight_smile:

What do you think about trying to fall back to a Christening Event if no Baptism Event can be found?

get_bapt_date

def get_bapt_date(db, person, estimate=False):
    """get a person's baptism date"""
    bapt_date = get_date_from_event_type(db, person, EventType.BAPTISM, estimate)
    # Proposed change: fall back to a Christening event if no Baptism date was found
    if bapt_date == 0:
        return get_date_from_event_type(db, person, EventType.CHRISTEN, estimate)
    return bapt_date

Another issue I noticed - I have people with a recorded Christening/Baptism event where they acted as a godparent, so the role was not primary but something else. Unfortunately the event was nevertheless picked up as “their event”, because the PRIMARY role is only checked for burials. I am really wondering what the reason is for limiting this to Burial.

get_date_from_event_type

def get_date_from_event_type(db, person, event_type, estimate=False):
    """get a date from a person's specific event type"""
    if not person:
        return 0
    for event_ref in person.get_event_ref_list():
        event = find_event(db, event_ref.ref)
        if event:
            # Non-primary roles are currently skipped for Burial events only
            if (
                event_ref.get_role() != EventRoleType.PRIMARY
                and event.get_type() == EventType.BURIAL  # <-- the limitation in question
            ):
                continue
            if event.get_type() == event_type:
                date_obj = event.get_date_object()
                # Without estimation, a date missing its day or month counts as "no date"
                if not estimate and (
                    date_obj.get_day() == 0 or date_obj.get_month() == 0
                ):
                    return 0
                return date_obj.get_sort_value()
    return 0

I’d like to remove that consideration for Burial events only and consider found events only if they are of role primary regardless of their type. What do you think?
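
The effect of the proposed change can be shown with a small standalone sketch. It does not use the Gramps classes: `FakeEventRef`, the string roles and `first_primary_date` are hypothetical stand-ins, so this only illustrates the intended behaviour, not the actual patch.

```python
from dataclasses import dataclass

PRIMARY = "Primary"  # stand-in for EventRoleType.PRIMARY


@dataclass
class FakeEventRef:
    """Hypothetical stand-in for an event reference plus its event."""
    role: str
    event_type: str
    sort_value: int


def first_primary_date(event_refs, event_type):
    """Return the sort value of the first matching event with a primary role, else 0."""
    for ref in event_refs:
        if ref.role != PRIMARY:
            continue  # godparents, witnesses etc. are skipped for every event type
        if ref.event_type == event_type:
            return ref.sort_value
    return 0


refs = [
    FakeEventRef("Godparent", "Baptism", 100),  # someone else's baptism
    FakeEventRef(PRIMARY, "Baptism", 50),       # the person's own baptism
]
print(first_primary_date(refs, "Baptism"))
```

With the role check applied to all event types, the godparent entry no longer shadows the person's own baptism.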

You might also want to insert checks for ‘Unknown’ roles. An ‘Unknown’ Role is always an error.

An early error made by many users is to use the clipboard or Sharing for Events… but fail to set the Role. New users become so conditioned to the role being automatically set to ‘Primary’ (or ‘Family’) during an ‘add’ that they don’t even discover the Roles feature until later.

(The add-on Event custom filter rule to find Unknown roles was very useful. It is used in the workflow fixing key Events that don’t show in Relationships view, charts or reports.)

Of course, as you write that, you also pull an example out of your personal toolbox of the same thing he’s doing! :grin:

And you probably don’t even recall the number of made-to-purpose SuperTool scripts you could pull out.

I think that if @OlliL had some of those, he might decide it was worth having a way for verify.py to be expandable with those scripts … maybe to convert the existing checks into scripts in a batch script execution framework.

There was an earlier thread about spacing of children

I’ve added 8 more rules in my fork - not sure how to submit them as a pull request, because the fork also contains my other changes (but the patch applies to the original master version too, because the affected code parts don’t overlap).

So - those are the rules I added:

  1. Baptism too late according to family tradition
    This rule determines the median number of days between birth and baptism over all children of a family. It then compares the number of days between the birth and the baptism of the person in question against that median, allowing a grace period for deviation. Currently that grace period is hardcoded to 120 days. Should this be a parameter, or might that confuse the user?
  2. Burial too late
    A Burial is considered “too late” when it’s more than 14 days after the date of death. Should this be a parameter, or might that confuse the user?
  3. Children are not ordered chronological
    For each family, the rule checks that birth dates ascend through the list of children (if a birth date is missing and estimation is on, the baptism date is used instead). Children without either date are ignored.
  4. Families are not ordered chronological
    This rule uses the marriage date to check that a person’s families are in chronological order. If no marriage date is available, a divorce date or even the birth date of the oldest child of each family is used. The birth date is the last fallback, to account for non-married families with illegitimate children.
  5. Family has events of type Unknown
  6. Person has events of type Unknown
  7. Family events not ordered chronological
  8. Person events not ordered chronological
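
Rules 1 and 2 can be sketched with the standard library as follows. This is an illustration of the description above, not the code from the fork: the function names and the `(birth, baptism)` pair layout are assumptions, and only the 120-day and 14-day values come from the rule descriptions.

```python
from datetime import date
from statistics import median

GRACE_DAYS = 120            # grace period from rule 1
MAX_BURIAL_DELAY_DAYS = 14  # limit from rule 2


def late_baptisms(family, grace_days=GRACE_DAYS):
    """Return indexes of children baptised later than the family median plus grace.

    `family` is a list of (birth, baptism) pairs with exact dates for all
    children of one family (a hypothetical shape).
    """
    delays = [(baptism - birth).days for birth, baptism in family]
    if not delays:
        return []
    limit = median(delays) + grace_days
    return [index for index, delay in enumerate(delays) if delay > limit]


def burial_too_late(death, burial, max_days=MAX_BURIAL_DELAY_DAYS):
    """True if the burial is more than `max_days` after the death."""
    return (burial - death).days > max_days


family = [
    (date(1850, 1, 1), date(1850, 1, 10)),
    (date(1852, 3, 1), date(1852, 3, 15)),
    (date(1855, 6, 1), date(1865, 6, 1)),  # baptised 10 years after birth
]
print(late_baptisms(family))
print(burial_too_late(date(1901, 2, 4), date(1901, 3, 1)))
```

Using the median rather than the mean keeps one extreme outlier, such as the 10-year delay here, from inflating the family's "usual" delay. Making `grace_days` and `max_days` arguments also shows how the two hardcoded values could become parameters.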

Discussion about those rules in general, phrasing and spelling as well as the code in particular is welcome.

This is interesting but… my tree often has burial events where the day is approximated to the month. Typically, interment was within 2 days. Often there will be a burial notice in the paper within the week of death but without a day of interment.

Since FamilySearch changes ‘Between’ spans into ‘From-To’ spans, I generally just use dates without days to approximate these. (e.g., died 18 March 1954, buried March 1954) That gets awkward if they died end of month… or end of year. But having FamilySearch implying the burial occurred incrementally would be worse.

I have those too. Mostly with birth and baptism, where only the baptism date was recorded in the church book. I still tend to record a birth event in the same month, because I like to write down the place of birth, which can be different from the place of baptism.

That’s why the “Baptism too late” and “Burial too late” rules only check when the birth and baptism (or death and burial) dates are both exact to the day. Approximate, Before, After, month-only, year-only and similar dates make those two rules skip (if you find they are not skipped, it is a bug in my implementation :smile:) - because, well… you could have a recorded death in 1880 and the corresponding burial in 1881 that were only 2 days apart in reality :smile:

Yes, I knew that I can double click, but the editor doesn’t give me the context that I need. I often use the relations view for that, or whatever that is called in your language.

A possible work-around would be the ability to export the results in a report. You can then open that in a word processor, and edit it to remove the persons that you’ve dealt with.

Flags are nice but may need more thinking. And maybe we don’t need flags to ignore specific warnings, but the opposite, like flags that you are very sure about a specific fact, like a marriage at old age, or that reburial.

Ok, but can’t you already move the verification result window to another monitor and use Gramps and its views while the report is open? So I’m not sure what’s missing, sorry.