Tree vivisection experiments with the Isotammi SuperTool

This is the start of a “How do I…?” article for the wiki on Approaching Automated Data Harmonization.

Manually cleaning data to make it consistent and fill in omissions is pure misery. There are excellent tools for cleaning messy data outside of Gramps: you could use OpenRefine or manipulate CSV data in a spreadsheet like Excel. But that means taking the data out of the Tree structure of Gramps, losing the benefit of Gramps understanding how the data elements interrelate, and risking data loss during the export/import cycle.

Some issues have arisen that affect broad swathes of the Gramps community, and the grumbles have led some talented people to create a number of special-purpose tools and gift them to the community.

These sorts of Harmonizing operations tend to be relatively simple as add-on Tools. But continually asking developers for more bespoke Single Purpose Tools seems like a poor use of their energy. They could be doing more creative things.

The traditional alternative (when trying to do it yourself) is to learn the model of data structures in a Gramps tree, determine which elements have the data needed to confirm the issue exists, then write a Python script (with the appropriate field names) to run through the data, check the conditions and update records as needed.

It makes me tired to just write that out!

On the other hand, the SuperTool is already a dictionary of the data, fields and expressions. It enables a mildly technical person to hack on the data.

I’ve wanted to try out some of the capabilities of the Isotammi SuperTool for a while. But it was too intimidating for a basic user to just jump in and start experimenting. And the potential for mangling a Tree is … frightening. It’s more likely to turn a Tree into mulch than a bonsai masterpiece.

So I’ve been looking for a good (but small enough to be digestible) Real World target in data harmonization that would exercise the add-on: something basic enough to be used as an introduction.

Suddenly, a task loomed that I was dreading: harmonizing the origin Types for Surnames and multiple surnames.

I discussed a couple of particular scenario variations with Kari Kujansuu (the creator of SuperTool) and he cobbled together some example scripts. One is very basic and the second is a bit more complex.

Now we have a tiny chunk of code which is a solution that can be reverse-engineered to a known objective. It is not pseudocode and it is an issue that impacts everyone. So we may be able to learn how to re-create the progression from “problem statement” to working script using the tool.

Kari notes that if Gramps is launched from the Console (Command Line), then the SuperTool also prints all changes to the console.

The basic objective, simply stated

I want the blank Origin types for surnames that match the Father’s surname to be marked as “Patrilineal”. I don’t want to overwrite ANY existing Origin types. If I don’t have a father, I don’t want to ‘assume’ the origin type is ‘probably’ Patrilineal.
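The rule can be pinned down before touching any tool. What follows is a minimal sketch in plain Python, not the SuperTool script and not the Gramps API: the `Surname`, `Person` and `Family` classes and the `mark_patrilineal` function are hypothetical stand-ins invented for illustration. It assumes an exact, case-sensitive surname match and treats an empty string as a blank Origin type.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PATRILINEAL = "Patrilineal"  # label taken from the objective above


@dataclass
class Surname:
    text: str
    origin: str = ""  # assumption: empty string stands in for a blank Origin type


@dataclass
class Person:
    name: str
    surnames: List[Surname] = field(default_factory=list)


@dataclass
class Family:
    father: Optional[Person]
    children: List[Person] = field(default_factory=list)


def mark_patrilineal(family: Family) -> List[str]:
    """Set blank origins to Patrilineal where a child's surname matches the father's.

    Returns a log of the changes made, one line per changed surname.
    """
    changes: List[str] = []
    if family.father is None:
        return changes  # no father recorded: assume nothing
    father_surnames = {s.text for s in family.father.surnames if s.text}
    for child in family.children:
        for surname in child.surnames:
            if surname.origin:
                continue  # never overwrite an existing Origin type
            if surname.text in father_surnames:
                surname.origin = PATRILINEAL
                changes.append(f"{child.name}: {surname.text} -> {PATRILINEAL}")
    return changes
```

Looping over every surname of each child is how the sketch covers people with multiple surnames. Returning the list of changes gives a record of exactly what was modified, which makes a trial run on sample data easier to check.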

I know that I’ve overlooked some outlying conditions. (Naturally, blended family situations could affect the determination.) But we can come back to add those refinements.

The problem arises from depending too blindly on the built-in Guessing

Gramps has 3 Surname Guessing options to offer. You select the option in the Display tab of Preferences.

The default is a Patrilineal variant, labeled “Father’s surname”, which will fill in a Surname as a Father (or his offspring) is added. But this feature overlooks setting the Origin to ‘Patrilineal’ to match the guess type.

Preparations

Experiments in data harmonization should always be tried on expendable sample data. Do NOT test on your real data.
  • Copy the script below to a text file on your system
  • Install the SuperTool from Isotammi
  • Quit Gramps
  • Restart Gramps from the Command Line (Console)
  • Create a new Tree and import the Example.gramps file
  • Select a subset of rows in the Families view
  • Choose Tools > Isotammi Tools > SuperTool…
  • Click the Load button and open the saved Script. (This will only load, not execute.)
  • Warning: the following step modifies the Tree. To process the selected Families, click the Execute button

The first script

This Families view oriented SuperTool script is named set-surname-origin-to-patrilineal.script. This script processes ONLY the Families currently selected in the Families category view.

Two tests but difficulties too: une petite mine d'or - Forums Geneanet [fr]

SuperTool and SuperFilter too!

Creation and sharing of a new attributes filter for sources or citations, filtering is based on attributes name or value.

Very easy (to create and to share) with ST.

Note/Idea: It would be cool to be able to use a path (defined in preferences?) for includes, something like media path (relative to the preferences path).

Note/Idea: It would be cool to be able to use a path (defined in preferences?) for includes, something like media path (relative to the preferences path).

There is now a new version at isotammi-addons/source/SuperTool at master · Taapeli/isotammi-addons · GitHub which has this feature.

Thanks for the suggestion.

Kari

Hi Kari.

I had misplaced the Deep Connections Graph Gramplet before ever trying it. (I rarely add anything to the Dashboard.) I had to look at the .gpr.py file to discover that it is a “Dashboard only” gramplet. So I’ve finally had a chance to try it.

It was interesting. It did NOT like my big Tree – it kept saying that trying again might help. But, after a few attempts, the Gramplet worked with the PseudonymTree.gramps file. But that is a simple tree for testing Graphs. It is without any pedigree collapse… so it does not exercise the tools very well.

https://gramps-project.org/wiki/index.php/PseudonymTree.gramps

There were 4 items that didn’t have English translations: the “Uudelleenyritys voi auttaa” (“Trying again can help”?) error message and the “child/sibling/parent” labels. Could you make those translatable?

Those “child/sibling/parent” labels overwrite the vertical line connecting the boxes. Perhaps you could move them to the right about 15 pixels? (Or you could just add a non-breaking space to the front of each string.)

And would you consider feeding the current “Home Person” to the Person1 selection and the “Active Person” to Person2 when the browser is started? I would happily close and re-launch the browser each time to avoid the Person selection dialogs on my 40,000 person tree!

I posted a feature request to make the Add-On list server location more friendly to 3rd parties. It looks like the idea will meet a lot of resistance.

https://gramps-project.org/bugs/view.php?id=12363

Thanks!

Thanks. Again, good suggestions - I hope I am able to make the changes. It might be difficult to move the labels since IIRC the complete image is generated by Graphviz. The gramplet needs other improvements also… Maybe I will also move it to the Tools menu.

I thought that moving the labels might be a pain. Perhaps padding those strings with a non-breaking space at the left (& on the right, to support Right-to-Left localization) would do the same thing with little effort?
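To make the padding idea concrete, here is a minimal sketch in Python. The `pad_label` helper is hypothetical and is not taken from the gramplet's source. Whether Graphviz keeps the padding in the rendered image has not been checked here.

```python
NBSP = "\u00a0"  # non-breaking space


def pad_label(label: str, width: int = 1) -> str:
    """Return the label with `width` non-breaking spaces on each side."""
    padding = NBSP * width
    return f"{padding}{label}{padding}"


# The three labels mentioned above; repr() makes the padding visible.
for label in ("child", "sibling", "parent"):
    print(repr(pad_label(label)))
```

Padding both sides means the same string should work for Left-to-Right and Right-to-Left layouts without knowing which one is active.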