This is a start of a “How do I…?” article for the wiki on Approaching Automated Data Harmonization.
Manually cleaning data to make it consistent and fill in omissions is pure misery. There are excellent tools for cleaning messy data outside of Gramps: you could use OpenRefine, or manipulate CSV data in a spreadsheet like Excel. But that means taking the data out of the Tree structure of Gramps, losing the benefit of Gramps understanding how the data elements interrelate, and risking data loss during the export/import cycle.
Some issues have arisen that affect broad swathes of the Gramps community. And the grumbles have led some talented people to create a number of special-purpose tools and gift them to the community.
These sorts of Harmonizing operations tend to be relatively simple as add-on Tools. But continually asking for more bespoke Single Purpose Tools seems like a poor use of their energy. They could be doing more creative things.
The traditional alternative (when trying to do it yourself) is to get the model of the data structures in a Gramps tree, determine which elements hold the data needed to confirm the issue exists, then write a Python script (with the appropriate field names) that runs through the data, checks the conditions and updates records as needed.
It makes me tired to just write that out!
On the other hand, the SuperTool is already a dictionary of the data, fields and expressions. It enables a mildly technical person to hack on the data.
I’ve wanted to try out some of the capabilities of the Isotammi Supertool for a while. But it was too intimidating to a basic user to just jump in and start experimenting. And the potential for mangling a Tree is … frightening. It’s more likely to turn a Tree into mulch than a bonsai masterpiece.
So I’ve been looking for a good (but small enough to be digestible) Real World target in data harmonization that would exercise the add-on: something basic enough to be used as an introduction.
Suddenly, a task loomed that I was dreading: harmonizing the origin Types for Surnames and multiple surnames.
I discussed a couple of particular scenario variations with Kari Kujansuu (the creator of SuperTool) and he cobbled together some example scripts. The first is very basic and the second is a bit more complex.
Now we have a tiny chunk of code which is a solution that can be reverse-engineered to a known objective. It is not pseudocode and it is an issue that impacts everyone. So we may be able to learn how to re-create the progression from “problem statement” to working script using the tool.
Kari notes that if Gramps is launched from the Console (Command Line), then the SuperTool also prints all changes to the console.
The basic objective, simply stated
I want the blank Origin types for surnames that match the Father’s surname to be marked as “Patrilineal”. I don’t want to overwrite ANY existing Origin types. If I don’t have a father, I don’t want to ‘assume’ the origin type is ‘probably’ Patrilineal.
I know that I’ve overlooked some outlying conditions. (Naturally, blended family situations could affect the determination.) But we can come back to add those refinements.
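To make the rule concrete before looking at the SuperTool version, here is a minimal sketch of the decision logic in plain Python. It does not use the Gramps API or SuperTool syntax: the `Surname`, `Person` and `Family` classes, the `mark_patrilineal` function and the sample names are all hypothetical stand-ins invented for illustration, and an empty string stands in for a blank Origin type. It also assumes that only the children's surnames are examined, since the father's own Origin depends on his own father.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BLANK = ""                   # stand-in for an unset Origin type (assumption)
PATRILINEAL = "Patrilineal"


@dataclass
class Surname:               # hypothetical stand-in, not the Gramps class
    text: str
    origin: str = BLANK


@dataclass
class Person:                # hypothetical stand-in, not the Gramps class
    name: str
    surnames: List[Surname] = field(default_factory=list)


@dataclass
class Family:                # hypothetical stand-in, not the Gramps class
    father: Optional[Person]
    children: List[Person] = field(default_factory=list)


def mark_patrilineal(family: Family) -> List[str]:
    """Apply the rule to one family and return a simple change log."""
    changes: List[str] = []
    if family.father is None:
        return changes       # no father recorded: never assume an origin
    father_surnames = {s.text for s in family.father.surnames if s.text}
    for child in family.children:
        for surname in child.surnames:
            # Only fill blanks, and only when the surname matches the father's.
            if surname.origin == BLANK and surname.text in father_surnames:
                surname.origin = PATRILINEAL
                changes.append(f"{child.name}: {surname.text} -> {PATRILINEAL}")
    return changes


# Sample data covering the three cases in the objective.
father = Person("John Smith", [Surname("Smith")])
child_a = Person("Ann Smith", [Surname("Smith")])                  # blank, matches
child_b = Person("Bob Jones", [Surname("Jones")])                  # blank, no match
child_c = Person("Cy Smith", [Surname("Smith", origin="Taken")])   # already set
family = Family(father, [child_a, child_b, child_c])
orphan_family = Family(None, [Person("Dee Smith", [Surname("Smith")])])

changes = mark_patrilineal(family)
for line in changes:
    print(line)
```

Only the first child is changed: the second has a different surname, the third already has an Origin type, and the family without a father is left untouched.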
The problem arises from depending too blindly on the built-in Guessing
Gramps has 3 Surname Guessing options to offer. You select the option in the Display tab of Preferences.
The default is a Patrilineal variant, labeled “Father’s surname”, which fills in a Surname as a Father (or one of his offspring) is added. But this feature does not set the Origin to ‘Patrilineal’ to match the guess type.
Preparations
Experiments in data harmonization should always be tried on expendable sample data. Do NOT test on your real data.
- Copy the script below to a text file on your system
- Install the SuperTool from Isotammi
- Quit Gramps
- Restart Gramps from the Command Line (Console) to enable access to a change log
- Create a new Tree and import the Example.gramps file
- Select a subset of rows in the Families view
- Choose Tools > Isotammi Tools ▼ SuperTool…
- Click the Load button and open the saved Script. (This will only load, not execute.)
Warning: the following step modifies the Tree.
- To process the selected Families, click the Execute button
- Review the change log that was written to the Console.
The first script
This Families view oriented SuperTool script is named set-surname-origin-to-patrilineal.script. This script processes ONLY the Families currently selected in the Families category view.