Merging Filter framework

Merging a duplicate Family Grouping is a massive & fussy undertaking. But it could be made more manageable.

@prculley recently gifted us with an “Active Person” filter add-on Rule. Along with a similar rule that returns all the People of the current Relationships view, I thought Filters could be built on these 2 rules for each of the other Categories.

Those rules could provide a more natural way to walk through a family and double-check it. They could also make merging duplicates in all Categories a cleaner & more confident process.

Unfortunately, it can’t work yet.

The idea was:
Given that you found a duplicate John Smith… when you merge the 2 profiles, you’ll also want to merge duplicate Events, Families, Citations, Sources, Repositories, Notes, Media, Places, Addresses, Attributes. (Then work through those of the duplicate parents, sibs, spouses & children.) But… it is painful to find anything but duplicate people for merging. The other views lack navigation features like those created for People records.

I found I could simulate the navigation for Events & Places.

I already had an ActivePerson filter. It has been useful for making a dynamic version of any rule that allows a Person <filter>.

I built a (slow) simulation of the Relationships view’s people lookup with a custom Person filter named ActiveRelationship, made of the following rules:

  1. Active Person
  2. Parents of <ActivePerson>
  3. Siblings of <ActivePerson>
  4. Children of <ActivePerson>
  5. Spouses of <ActivePerson>
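For anyone designing a dedicated rule, the five rules above boil down to a union of sets. Here is a minimal Python sketch, assuming a hypothetical, simplified Family record (father, mother and children handles) rather than Gramps’ real objects or filter API; the function names are made up for illustration:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


# Hypothetical, simplified stand-in for a Gramps Family; not the real gramps.gen.lib API.
@dataclass
class Family:
    handle: str
    father: Optional[str] = None
    mother: Optional[str] = None
    children: List[str] = field(default_factory=list)


def spouses_of(family: Family) -> Set[str]:
    """Handles of the (up to two) spouses recorded on a family."""
    return {h for h in (family.father, family.mother) if h}


def active_relationship(active: str, families: List[Family]) -> Set[str]:
    """Union of the five ActiveRelationship rules for one active person."""
    result = {active}                      # 1. Active Person
    for fam in families:
        parents = spouses_of(fam)
        if active in fam.children:
            result |= parents              # 2. Parents of <ActivePerson>
            result |= set(fam.children)    # 3. Siblings (other children of the same family)
        if active in parents:
            result |= set(fam.children)    # 4. Children of <ActivePerson>
            result |= parents              # 5. Spouses (the active person is already included)
    return result


families = [
    Family("F1", father="dad", mother="mom", children=["john", "sis"]),
    Family("F2", father="john", mother="mary", children=["kid"]),
]
print(sorted(active_relationship("john", families)))
# ['dad', 'john', 'kid', 'mary', 'mom', 'sis']
```

Because it only walks the families the active person belongs to, a dedicated rule along these lines would touch a handful of records; whether that would cure the slowness of the chained custom filter is untested speculation.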

This restricts the post-merge People view to immediate family & duplicates. So far, so good, so slow. I can Merge People with this too because the view doesn’t re-filter as I change the Active Person or merge.

(I would’ve used Matt’s Degrees of Separation rule… except that it doesn’t yet have an Active Person option.)

I copy the duplicate results in the Person view to the Clipboard as a W.I.P. list. (It is easy to lose track of work as everything changes dynamically.) Once a person is completed AND their immediate family is merged, I remove them from the Clipboard.

Family
[Can’t search for Families where a person returned by a Filter is one of the spouses]

Events custom filter named ActiveRelationshipEvents:
Events of Persons matching <ActiveRelationship> (include Family events)

Places custom filter named ActiveRelationshipEventPlaces:
Places of events matching <ActiveRelationshipEvents>

(It’s nice to have an alternative that finds the same Places but only those with no Lat/Long, so you can fix Places that would fail to plot & thus be able to map every Event.)
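The Events-to-Places chain above, plus the no-Lat/Long variant, can be sketched as three set operations. This is a hypothetical illustration in Python on simplified stand-in records; the names events_of_persons, places_of_events and missing_coordinates are made up and are not Gramps rule classes:

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


# Hypothetical, simplified records; not the real gramps.gen.lib classes.
@dataclass
class Person:
    handle: str
    event_handles: List[str] = field(default_factory=list)
    family_handles: List[str] = field(default_factory=list)  # families where the person is a spouse


@dataclass
class Event:
    handle: str
    place: Optional[str] = None


@dataclass
class Place:
    handle: str
    lat: str = ""
    long: str = ""


def events_of_persons(matched: Iterable[str], people: Dict[str, Person],
                      family_events: Dict[str, List[str]]) -> Set[str]:
    """ActiveRelationshipEvents: events of matched people, including their Family events."""
    found: Set[str] = set()
    for h in matched:
        person = people[h]
        found.update(person.event_handles)
        for fam in person.family_handles:
            found.update(family_events.get(fam, []))
    return found


def places_of_events(event_handles: Iterable[str], events: Dict[str, Event]) -> Set[str]:
    """ActiveRelationshipEventPlaces: places referenced by the matched events."""
    return {events[e].place for e in event_handles if events[e].place}


def missing_coordinates(place_handles: Iterable[str], places: Dict[str, Place]) -> Set[str]:
    """Places lacking a latitude or a longitude, i.e. ones that would fail to plot."""
    return {h for h in place_handles if not (places[h].lat and places[h].long)}
```

Treating a Place as unplottable when either coordinate is empty is an assumption based on the wording of the “no latitude or longitude” rule name.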

You cannot filter for Sources, Citations, Repositories, Media, or Notes associated with People, Events… or much of anything else.

The rule-based searches that are possible are also incredibly slow.

Let’s try to break it into actionable steps and see if there’s any low-hanging fruit.

It sounds like you want a family filter rule called “Families of people <filter> match” (Matches people that are members of the same family as anybody matched by a filter), and in your case, you would use your custom ActiveRelationship rule as the <filter>?
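As a rough sketch of what such a “Families of Persons matching <filter>” rule would check: the Python below assumes hypothetical, simplified Family records and a set of person handles already returned by the <filter>; it is not Gramps’ actual rule API. The spouses_only flag is an invented option covering the narrower “one of the spouses” search:

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


# Hypothetical, simplified Family record; not the real gramps.gen.lib.Family API.
@dataclass
class Family:
    handle: str
    father: Optional[str] = None
    mother: Optional[str] = None
    children: List[str] = field(default_factory=list)


def families_of_persons_matching(families: Iterable[Family], matched: Set[str],
                                 spouses_only: bool = False) -> List[str]:
    """Return handles of families with at least one member in `matched`.

    With spouses_only=True, only the father/mother slots count.
    """
    hits = []
    for fam in families:
        members = {h for h in (fam.father, fam.mother) if h}
        if not spouses_only:
            members |= set(fam.children)
        if members & matched:  # any overlap with the filter's results
            hits.append(fam.handle)
    return hits
```

Fed with the ActiveRelationship results, this would return the parents’ family and the active person’s own families, which is the set you would want in the Families view for merging.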

Can you suggest specific filter names and descriptions, as I did for the family filter above, that would be the most helpful (if you had to prioritize them)? I, too, have often thought it would be nice to have more filters in these areas, but haven’t spent much time thinking about the details. It sounds like you’ve got some specific use cases in mind.

There’s already a places filter rule called “Places with no latitude or longitude given”. Can you include that in your custom rule?

Yes. I used that rule when the example.gramps Tree wasn’t plotting enough Events for a useful illustration to add to the Ancestors Map. [I could also have used the Events Coordinates Gramplet.]

It was just mentioned as an ‘aside’ because checking for Places associated with a Person (or Family group) that are not plottable is a secondary use for a similar rule set.

The naming convention seems to be more like “Families of Persons matching”.

That said, I’m not sure we’ve seen any notice of an official convention. The overriding concern might be the limited default width for displaying Rule names: not many characters are shown.

Frankly, I haven’t found a lot of reason to use the Families view.

There was a recent mailing list posting about it being a way to simultaneously merge 3 objects: the Families, the Fathers & the Mothers. This was intriguing as it is the only simultaneous merge I’ve seen in Gramps.

If there was a streamlined way to filter to just families of a duplicate family grouping, merging using the Families view would be more efficient.

Not necessarily:

  1. Events or other objects might be from another, unknown person with the same name who was incorrectly connected to one of the two persons you want to merge.
    To auto-merge all of them, you’d first need to make sure that all objects are the right ones.

  2. You also have the problem of conflicting objects during merging, e.g. different dates, places, etc. So you will probably always need a way for users to make decisions in the merging process.

That can be added if needed.


And that’s why this is about winnowing data for examination and simplifying selection for manual merging comparison. It isn’t another pie-in-the-sky request for auto-merge.

Right now, it is easy to see when you have duplicate Events in a post-merge Person (using the Edit Person dialog or the Events Gramplet). But once you’ve identified Event merge candidates, you still have to find them in the Events view to select & merge.

(For Events, this is easily done: I can use an ‘ActivePersonEvents’ custom filter. It also shows that person’s Family Events. Suddenly the Events view has less than a screenful of Event records… and I don’t have to keep swapping back to the Person view to see if I’ve harmonized all the data that ought to be harmonized. I used to Tag them from the Person Editor view and filter on a ‘MergeCandidate’ tag.)

But you can’t build a similar winnowing filter for Citations. It is further complicated because there could be Sources tabs at the Person level, individual Event level, Place level, Note level, Note-for-a-Place level, et cetera ad nauseam.

The citations from an import might be nearly the same. Perhaps the contributor massaged the format to make sorting more meaningful: one might have recorded leading zeroes in the page number or used a different Source title format (“The Evening Sun” vs. “Evening Sun, The” vs. “periodical; Evening Sun, The”). Perhaps one citation has a Transcription note but the other doesn’t. Perhaps one has fully qualified publisher info (Company; place; pub. date) while the other merely has a publication date. Perhaps one found the same microfilm roll at a different Repository. The possibilities are endless because the identifying data is in free-form text fields rather than being structured. But the Citations should still be merged.
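To illustrate how such near-identical citations could at least be grouped for side-by-side comparison (not merged automatically), here is a hedged Python sketch that folds the title and page-number variants mentioned above into a comparison key. The helper names and normalization rules, e.g. treating the text after the last “;” as the title, are assumptions; they would miss the publisher, note and Repository differences entirely:

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def title_key(title: str) -> str:
    """Fold variants like 'The Evening Sun' / 'Evening Sun, The' / 'periodical; Evening Sun, The'."""
    t = title.split(";")[-1].strip().lower()   # assumption: drop a leading 'category;' prefix
    t = re.sub(r",\s*(the|a|an)$", "", t)       # trailing inverted article
    t = re.sub(r"^(the|a|an)\s+", "", t)        # leading article
    return re.sub(r"\s+", " ", t)


def page_key(page: str) -> str:
    """Ignore leading zeroes in numbers: 'p. 007' and 'p. 7' compare equal."""
    return re.sub(r"\b0+(\d)", r"\1", page.strip().lower())


def candidate_groups(citations: List[Tuple[str, str, str]]) -> List[List[str]]:
    """Group (citation_id, source_title, page) rows whose normalized keys agree."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for cid, title, page in citations:
        groups[(title_key(title), page_key(page))].append(cid)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Each returned group is only a list of merge candidates to open side by side; the decision to merge stays manual.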

Here again, it would ease the harmonization process if the Citations view could be winnowed down to just those related to a <filter> (whether Active Person or ‘persons currently shown in the Relationships view’.)

Even if the Citations view HAD the option to show those applied in a particular <person filter>’s Sources, I don’t know how you’d write filters that simultaneously found citations from all the recursive sub-levels where a Source could be applied, although the Citations Gramplet seems to manage it.
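One way to think about the recursive sub-level problem is as a graph walk: start from the people the <person filter> returns, follow every reference (events, places, notes, media, etc.) and collect the citations found along the way. A minimal breadth-first sketch in Python, assuming hypothetical refs/citations lookup tables in place of real database calls; it makes no claim about how the Citations Gramplet actually works:

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def citations_reachable(starts: Iterable[str],
                        refs: Dict[str, List[str]],
                        citations: Dict[str, List[str]]) -> Set[str]:
    """Collect citations attached anywhere below the starting objects.

    refs maps an object to the objects it references (events, places,
    notes, media, ...); citations maps an object to the citations attached
    directly to it. Both maps are hypothetical stand-ins for database lookups.
    """
    seen: Set[str] = set(starts)
    queue = deque(seen)
    found: Set[str] = set()
    while queue:
        obj = queue.popleft()
        found.update(citations.get(obj, []))
        for child in refs.get(obj, []):
            if child not in seen:  # guards against cycles and objects shared by several people
                seen.add(child)
                queue.append(child)
    return found
```

Keeping a `seen` set matters here because the same Place or Note is often shared by many Events, so without it a family-sized walk would revisit objects repeatedly.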

But, if you don’t harmonize the sources & citations, you end up with a nasty mess in a report that would compile Endnotes & Bibliography.

An Active Person option would be lovely! At your convenience, of course.

Thank you for the custom filters. I may need to pick your brain to design a template for documenting Rule add-ons. (Like the “How do I” wiki template.) You’re creating useful Rules so quickly that the docs are lagging!

One idea from the past was to allow merging of events, media, citations, etc. directly from the EditPerson (and similar) dialogs, in the displaytabs. This requires changes in the Gramps main code, but may be easier for the user than trying to create filters and use them to find the appropriate objects.

Food for thought.


I always find it easier to just delete the extra events. I make sure that the record I am keeping has any of the extra items (the citations and attributes on the event, for example).

But usually, I am merging before I am starting the full documentation process. I would find it strange to get two people fully documented and then discover they need to be merged.

Would adding binary selection and a Merge button be easier to add to the Gramplets than the Main Code?

The duplicate records for the Active Person can be seen in these Gramplets: Ancestors, Descendants, Children, Attributes, Extended Attributes, References, Citations, Events, Events Coordinates, Gallery, Details.

If I could, then I could detach all those Gramplets, arrange them on a 2nd monitor and just step through merging People in a filtered People view & merging each Person’s duplicated secondary objects via the Gramplets.

There’s no way to see duplicate Sources, Repositories, Notes or Addresses simultaneously via Gramplets.

And while this works for an Active Person, it wouldn’t allow a family or filtered set of People to be worked simultaneously.

There are 2 paths where I see this happen:

  1. Importing
  2. Entering data from genealogies of different lines without recognizing that there’s a crossover or collapsed pedigree within the same Source

In the 2nd case, they tend to be fragmentary bios. Something like: “in 1877, he was joined to Mary, daughter of William Jones of Bethel. She survived him by 4 years.” But Mary already exists (with no death or spouse details) under William Jones’ wife’s line. The chance of recognizing this early is slim.