[Living] Placeholder contamination of trees

A user just posted a very worrisome problem having to do with an export with [Living] record placeholders backwashing sewage into his tree.

He mentions using 5.2.2 and Gramps Web Sync by @DavidMStraub on macOS. Somehow placeholder data (found in the Text tab of preferences) found its way back into his main Gramps tree.


It is possible (even likely) that he had imported an export with placeholders into a new tree. But I am wondering if Gramps needs some prophylactic measures to prevent placeholder data from being propagated. And maybe some recovery tools.

Should the list of placeholders from the Preferences be part of the XML exported into .gramps and .gpkg files so that the import tools can recognize placeholder data and NEVER use it to overwrite real data?
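To make the idea concrete, here is a minimal sketch of the check an import or merge step could run if the placeholder list travelled with the file. Everything in it is hypothetical: `PLACEHOLDERS`, `is_placeholder` and `merge_field` are illustration names, not part of Gramps, and the placeholder strings are assumed values, since they are language dependent and user-configurable.

```python
# Hypothetical sketch; none of these names exist in Gramps.
# Assumed placeholder strings: the real ones come from the Text tab of
# Preferences and can differ per language and per user.
PLACEHOLDERS = {"[Living]", "[Private Record]"}


def is_placeholder(value, placeholders=PLACEHOLDERS):
    """Return True if the value is exactly one of the known placeholders."""
    return value.strip() in placeholders


def merge_field(local_value, incoming_value, placeholders=PLACEHOLDERS):
    """Pick the value to keep for one text field.

    An incoming placeholder never replaces non-empty local data;
    in every other case the incoming value is accepted.
    """
    if is_placeholder(incoming_value, placeholders) and local_value.strip():
        return local_value
    return incoming_value
```

With this rule, `merge_field("Anna", "[Living]")` keeps `"Anna"`, while a genuine correction such as `merge_field("Anna", "Anne")` still goes through.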

Should Gramps Web Sync (on the private Desktop side), the Command Line merge tool, and the Import and Merge tool be aware of placeholder data?

Should a Backup browsing tool be created that imports a specific Primary Object record (by GrampsID or handle) from a series of backups (with the backup archive cited as a citation, and the Last Changed date restored after the “citation being added during import” mangles the timestamp)? Such a tool would allow users to peruse an object that was mangled at some time in the past and determine the last “known good” archive.

I’d love to see that, if it’s possible, but it isn’t easy, because placeholder strings are language dependent, and can also be changed by the user. And I must add that I really hate them, because in many cases there are alternatives, like keeping a field empty, or using a symbol, which is sort of universal.

In this particular case, I hope that we can find a way to prevent this backwashing. For other situations, including one that you mentioned, the extraction of event descriptions, it would really help if we had a change log and a way to roll back changes. That would also help in situations like merging citations, where there is an optional protection against merging citations with notes, but none against merging citations with forms data.

The user confirms having a backup before using the GrampsWeb Sync where the Placeholder problem doesn’t appear.

If he uses the current version to filter for the placeholder list, he can export a CSV list of the GrampsIDs that are affected. Then he can create a new Tree from an import of that backup and export just the records with those IDs.
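Outside of Gramps, the "list the affected IDs" step could look roughly like the sketch below. It assumes the people have already been dumped to simple dictionaries with hypothetical keys `gramps_id`, `first_name` and `surname`. That layout, the function names and the `[Living]` string are my assumptions for illustration, not a Gramps export format.

```python
import csv
import io

# Assumed placeholder string; the real value is set in Preferences.
PLACEHOLDERS = {"[Living]"}


def affected_ids(people, placeholders=PLACEHOLDERS):
    """Return the Gramps IDs of people whose name holds a placeholder."""
    return [
        person["gramps_id"]
        for person in people
        if person["first_name"] in placeholders
        or person["surname"] in placeholders
    ]


def ids_to_csv(ids):
    """Render the IDs as a one-column CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["gramps_id"])
    for gramps_id in ids:
        writer.writerow([gramps_id])
    return buffer.getvalue()


people = [
    {"gramps_id": "I0001", "first_name": "Anna", "surname": "Smith"},
    {"gramps_id": "I0002", "first_name": "[Living]", "surname": "Smith"},
]
print(ids_to_csv(affected_ids(people)))
```

The resulting list is what would then drive the selective export from the tree restored from the clean backup.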

Can the Import and Merge tool be used to repair the placeholder names without affecting other portions of the data?

(I’m assuming some of the Placeholder people will have Alternative Names… and possibly some layers of Nicknames/Callnames… which the Living/Privacy export filter will have redacted too. Otherwise, I could just use Doug Blank’s 5.2 CSV Import fixes to overwrite the Given Name only.)

@emyoulation I don’t see what Gramps Web Sync has to do with this issue; it only replicates the data as it is in the database, so it worked as expected.

And I also don’t see why Gramps should prevent anyone from importing data with placeholders. If a distant relative shares an export with me where some names are redacted for privacy reasons, I would still want to import it because it might still contain useful information for me, like the number of children of a family.

So, for me, no need to change anything.

When syncing from Gramps Web TO Gramps for Desktops, the redaction placeholder of an updated person should never overwrite real data. Gramps for Desktops completely ignores privacy… except optionally (typically opting to respect privacy) when exporting or writing reports.

Can we find out what it actually is? You mentioned backwashing in your introduction, which implies that placeholders were sent back from Gramps Web into the desktop program. It is a weird thing though, because on a site, you can put the placeholders in the presentation layer, and there is no need to put them in the database itself. And when you put them there, there is no chance that they’re sent back to the desktop program, in a two-way sync.

If it’s an export, I’m inclined to say that it’s the user’s own fault, and I see no way to prevent it. And like @DavidMStraub I don’t want that either, because someone may indeed send me a file with placeholders, which I find perfectly acceptable.

I’m trying to understand what could cause it. The first step of the sync addon is to download a Gramps XML export (by simply fetching /api/exporters/gramps/file) and import it into an in-memory SQLite database. I don’t see how this Gramps XML export could end up having placeholders in it.

Was it a backup of the database, or was it an export, which can be anonymized by the user when it is created?

It seems like the initial export from Gramps for Desktops (which is subsequently imported into GrampsWeb) has to be the source of the placeholders. The string comes from the Text preferences, and the data replaced with a placeholder (whether that is given name, surname or description) is specified by the Export Assistant settings, particularly the Person, Living and Privacy menu selections under Filters and privacy.

It seems reasonable that the Gramps Web data is going to have a “more fresh” timestamp than the data being exported. (Certainly that will be true if Citation and Tags are added during import.)

At that point, won’t the Gramps Web Sync try to flow data from fresher records in GrampsWeb to stale records in GrampsDT? So this first use is when the Placeholders are most likely to incorrectly backwash.

(Placeholder records are unlikely to be updated using GrampsWeb… they are too ambiguous. Unless someone dies and the placeholder names are replaced when the death data is entered… and at that point the placeholder data has been replaced and ‘backwash’ is moot)

I agree that would explain it. Exporting an incomplete tree, importing it to Gramps Web, and then synchronizing is something I would also classify as a clear user error. We have to check how we can make our documentation even clearer to make it less likely.

Perhaps GrampsWeb Sync could have an “Initialize” that validates connection and a good sync setup (requiring either: a blank tree online and a local populated tree; or, a populated online tree and a blank local tree) then does the initial sync … flowing from the populated to empty. (Bypass the whole Export Assistant “failure by user misunderstanding” point.)
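As a rough sketch of that precondition check (a hypothetical helper, not something the sync addon provides; the record counts are assumed to be obtained elsewhere):

```python
def initial_sync_direction(local_count, remote_count):
    """Decide which way a first sync may flow.

    Exactly one side must be populated; the data then flows from the
    populated tree to the empty one. Any other combination is refused.
    """
    if local_count > 0 and remote_count == 0:
        return "local_to_remote"
    if local_count == 0 and remote_count > 0:
        return "remote_to_local"
    raise ValueError(
        "Initial sync needs one populated tree and one blank tree "
        f"(local={local_count}, remote={remote_count})"
    )
```

Refusing the "both populated" case is the point: that is the situation in which a filtered export already sitting online could flow back over the complete local tree.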

Yes, that makes sense!

When I start an export in the desktop program, it starts with the last used settings. Can that be the same when you use the API, that it uses those settings? It’s something that you don’t want, because it makes the web tree incomplete from the start, but it might be the cause of this issue.

I don’t know anything about the API, so this is a wild guess, but it’s the only explanation I can find, and if it’s true, I hope that the API has a way to export all data, without any filter.

The backup seems to bypass the filtering.

That presupposes that the user wants the complete database to be present on both platforms. A filtered export would still be valuable.

The way I read the response from @DavidMStraub is that that is the only supposition that the Gramps Web Sync supports at this time.

Disappointing, because that will make a collaboration FAR more unlikely. My maternal cousins are not going to want to have their data cluttered with my paternal half of my tree. And Gramps is not good with harmonizing a shared branch between two local trees.

Heck… there isn’t even a way to have 2 trees up so you can clone a Family object (and its secondary objects) from one to the other.

The backup does, I think, but David mentioned export. And as far as I can see, these things are not the same, since exports can be filtered.

My main concern is, that if this contamination was caused by backwashing, where more things can occur than placeholders, like completely eliminating objects that are marked as private, we have something that’s invisible to users, and can go on for quite a long time before it is discovered. And that’s a sort of damage that I really don’t want in my tree.

I had such a thing when I migrated from Brother’s Keeper to PAF, where the latter silently truncated parts of my late father’s citations. I discovered that years later, when I saw truncated texts in Gramps.

As a software engineer, I always want to know why something happened, because in most cases, that is the only way to a possible cure.

Your comment was about not knowing if there was the option to bypass filters in the API. I was implying that the backup is likely to be in the API as Nick has stated elsewhere that the backup is just an Export. (So I read that as the backup using part of the Export API.)

That’s true, and that’s why I wrote that my hint is a wild guess. It’s wild, but I can’t think of any other way to explain the pollution.

I have no idea what the API can and can’t do, but like in Air Crash Investigation, there’s often a sequence of events, and I think that we need to find the actual trigger, which can be a filter used in an earlier export.

The thing is, that I never buy it when someone says that what I just saw can’t happen, and in that case, I press on, till we find the real cause.

If it’s an export, yes, but in that case you need a smart way to deal with the way back, because in that case, you can’t allow the filtered export to overwrite the original data, which does seem to have happened here.