Restoring merged sources

Yesterday night, I discovered that, earlier this year, I messed up a few dozen sources by using the Automerge feature that comes with the Isotammi tools. This feature works great for locations, when you merge those on title, but it can merge sources by title too, and that’s quite destructive, when you have sources that are all titled ‘email’, but which are all by different authors. In that case, I definitely don’t want to have those merged, but the Automerge did that, because it merges on title, so it does what it says, and not what I want.

As a consequence, I need to restore these sources from a tree that I backed up in January, and I see no clear way to do that, using regular means. If I’m lucky, the connected citations are still intact, so the obvious solution would be to export all citations connected to sources titled ‘email’ to a .gramps file, and merge them back into my latest tree, assuming that their handles, and possibly theit ID’s, haven’t changed, so that it’s easy to restore all relations.

To my regret, I can’t find a filter for this. I can select the proper citations in source or citation view, but all export facilities work on a person filter, so the regular export does not help.

I bet that I can solve this by exporting both trees to normalized SQLite, and writing some advanced queries, but I’ve never tried that, and fear that it will be more complex than the alternative, which is manually restoring the sources from a source and citations report.

Is there anyone with experience in this? My latest experience with Oracle is from 2010, so that’s a bit old, and I have no idea whether I can do this with an advanced query using the supertool.

Hacking and merging Gramps XML may be easier, in this case.

In case it helps in some way, earlier this year @kku published the “Import CSV sources” tool among the Isotammi addons. See discussion here.

I’m happy to help you create SQL queries.

Did you try “Export View” to create a CSV export of the Sources view? (After filtering on the desired sources, not selecting them.)

I just did, and it gave me some interesting files, and if you like to see them, I can send them by mail. I’d rather not upload them for public view, because some relatives don’t want their name on the web.

So if I understand:

  • Your January database has many different sources titled “email”, each having their own citations.
  • Your current database has one merged source titled “email”, having all of the citations from the previously distinct sources, but the citations are still attached to the correct people, events, etc.
  • You’re able to export a CSV file of the various “email” sources from your January database.

Can you now:

  • Import the exported sources into your current database using the Import CSV Sources tool.
  • Manually reattach the citations to the correct source.
  • Delete the merged source.

Or is there more to it than that? Maybe you also had Notes and Media attached to the Sources?

Your understanding is mostly right, but for point 3 it seems to be better to export from the combined source/citations view, or from the citations view, modified to show source ID, author, and title, so that it shows a join of the source and citation tables with citation ID, date, volume/page, and source ID, author, and title.

When I export the reults of this join, I get 59 rows from the old tree, and 56 from the new one, which suggests that some citations have been merged, because they had empry dates and volume/page fields, and they could be merged, because they were attached to the same source after the Automerge.

The main thing that I like to avoid now is that manual labor of reattaching citations. If I was happy with that, I could just as well use the report, which is still an option. And at the database level that would basicaly mean that I update the source reference for every citation matched by ID (or handle) and to recreate the missing sources, some of which may have notes attached too. And at the moment, I don’t see how Gramps can do that, but I may be missing something. It’s a long time since I imported anything from CSV, and I need to make sure that the program doesn’t create new citations, but only recreates the citation source links, and the corresponding sources, which are all ‘new’, except one.

I think that this is possible when I modify the citation ID column, so that the importer will look for existing ID’s, and not try to create new ones, because that would spoil the whole effort.

In SQL terms, this means adding the old sources, where one might need to be replaced instead, and updating the source references in all existing citations with matching ID’s. When I can do that, only 3 citations will have to be recreated and reconnected to their original objects.

And that might be a fun project with Visual C# and XML queries too. :slight_smile:

So, to sum up: If I want to do this with SQLite, I need to:

Insert all missing sources and attached notes, overwriting the one that was left.

Update all source ID’s for the citations that point to the merged source to their original values as found in my old database.

Can you write some SQL for that?

Importing the CSV’s does nothing here. Any idea why?

Are you specifically running Tools → Isotammi Tools → Import Sources, and not the usual import from CSV? It has worked for me.

The import program won’t do anything with citation information, or notes. It only handles source data (title, author, abbrev, and pubinfo) and repository data (ID of existing, or name of new). Other columns are ignored (although I tweaked my copy to also create repository ref fields mediatype and callnumber).

Let’s assume for now that the insertion of sources will be done with the Isotammi Import CSV Sources tool.

Are all of the notes currently attached to the merged source, just like the citations?

Once you’ve imported the sources, and maybe manually created the three citations that got merged, then your current database will have the distinct objects that existed in the January backup, but they just won’t be connected properly.

And also reattach the notes to the correct sources.

Working with SQLite exports of the old and new databases, I think it’s a matter of generating INSERT/UPDATE/DELETE statements to fix the “links” table in the newer export and then reimporting it as a new database. Assuming that you use the original Gramps IDs (the visible ones, not the underlying handles) when you import the sources and recreate the three merged citations, then the old database can be used as a point of reference. The “links” table contains handles, not Gramps IDs, but all that can be sorted out correctly as long as the Gramps IDs are the same between old and new.

It looks like my Isotammi tools are a bit outdated, because I copied them from my old PC, and didn’t import them from GitHub, like I should normally do. And that version, there is no Import CSV Sources, so I tried the regular import.

Anyway, because I suspected that some things might be not as consistent as I hoped, I used the source/citation report to see how the original sources were attached, and reattached the citations to newly created sources, that now have a title like ‘email ’, so that they won’t be merged with the old Automerge anyway, making them also look better on screen.

And while doing that, I also found that some of the emails were in notes attached to sources instead of citations, so there were a few that I had to move too. Most of them were attached to the citations though, like I expected, because that’s the strategy that I now use to store a sequence of emails exchanged with the same author. Doing everything manually was also a nice way to review those email exchanges, add some formatting, and make sure that the private ones got the right flags.

I found one changed ID too, so merging on ID wouldn’t have been safe either, but merging by handle would probably have been quite safe, if I’d done the whole oparation in SQL, like I originally had in mind.

All in all, it was a nice experiment, and I thank you for bearing with me. :slight_smile:

Yes, I figured you could be done already if you had gone with the manual approach! Just to clarify on the IDs vs. handles, if you had re-imported the sources, they would be given new handles since they are new objects. Keeping the Gramps IDs the same would have let us query the relationships in the old database and then (by generating appropriate SQL statements) establish corresponding relationships in the new database, even though those relationships are actually encoded using handles (which would be different in old vs new databases). If you had 600 of these sources rather than 60 then it could have been worth a try.

That’s right, it would have been worth it, with 600 of these sources, and in that case, I’d do it all in SQLite, so I would not use that importer either, and only use joins on handles, because they are persistent, at least in this case, where I have two generations of the same database.

This would then imply a join between tables in two different databases, and that’s simple enough. The only problem with that is that I find it hard to figure out what’s going on inside, when I run such a query.

And that’s why it’s still quite tempting to do the whole thing in C#, right on the Gramps XML.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.