Matching duplicates before uploading (via ImportText)

So this is (the start of) what I’m going to do with the SQL I ran on an exported SQLite database.

I have lots of records to upload, which I plan to use ImportText for, but many of the people in them will already be in my GRAMPS, and I want to minimise the number of times I have to use ‘Eliminate possible duplicates’.

With the records to be uploaded, I’ll have a surname, given name(s) and a date. The date might be when they were born / baptised or died / buried, but often it is when they were involved in some other event, like getting married. So I want to see if there is someone in my GRAMPS already where these fields match.

The matches don’t have to be exact: Soundex is going to help a lot with the text matching, and there can be some tolerance on the dates.
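As a rough illustration of that idea, here is a minimal sketch of fuzzy matching on names plus a date window. It is not taken from GRAMPS; the function names (`soundex`, `is_candidate`), the record layout (dicts with `surname`, `given`, `date`) and the default one-year tolerance are all assumptions made for the example. The Soundex here is the standard American variant, written out by hand so nothing beyond the standard library is needed.

```python
from datetime import date

# Letter-to-digit table for American Soundex.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code of a name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break up a run of same-coded letters
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (result + "000")[:4]


def _first_given(given):
    """First given name only; later given names are often missing in records."""
    parts = given.split()
    return parts[0] if parts else ""


def is_candidate(record, person, tolerance_days=365):
    """True if the names sound alike and the dates are within the tolerance."""
    if soundex(record["surname"]) != soundex(person["surname"]):
        return False
    if soundex(_first_given(record["given"])) != soundex(_first_given(person["given"])):
        return False
    return abs((record["date"] - person["date"]).days) <= tolerance_days


# Hypothetical example records, not real data.
new_record = {"surname": "Smyth", "given": "Jon", "date": date(1851, 3, 1)}
existing = {"surname": "Smith", "given": "John William", "date": date(1850, 11, 20)}
match = is_candidate(new_record, existing)
```

A birth or baptism date could use a tight tolerance, whereas a marriage date compared against a birth date would need a much wider window, so in practice `tolerance_days` would probably depend on the event type.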

I’ve not started on this yet, but it’s going to be an exercise in using DataFrames and applying some functions I’ll write in my notebook. I think I’ll be able to get fairly good matching, and then reuse the existing [I0000] format codes when I upload via ImportText.

I’m aware I may be reinventing what is done with the GRAMPS tool to eliminate possible duplicates after they’ve been imported, but as explained in earlier posts, I don’t know how to access such code directly. I had a look to see where it was, so I could maybe do some cutting and pasting, but I couldn’t find it.

If what I’m doing is something the GRAMPS team can see other people wanting to do, then I’d be delighted if this could be seen as some prototyping for future developments.

Another option is to go ahead and import everything into Gramps (duplicates and all), and only then do the SQLite Export. At this point you could clean up the exported database using whatever tools you like. Presumably you would only be deleting rows from tables (including the “link” table, as appropriate). Whatever remains, you would then use to create a new Gramps tree using the SQLite Import. And then in Gramps you could run the various tools that check for consistency. I can’t say whether this would be more work than finding and modifying the Gramps code that checks for duplicates. I just know it would be the approach I would try first, since I’m much more comfortable with SQL than Python.
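To make the "deleting rows" step concrete, here is a small sketch run against an in-memory stand-in for the export. The schema is an assumption for illustration only: a `person` table keyed by `handle`, and a `link` table with `from_type`, `from_handle`, `to_type`, `to_handle` columns. Check the real exported database before relying on any of these names. The helper `delete_person` is hypothetical too.

```python
import sqlite3


def delete_person(conn, handle):
    """Delete one person row and every link row that points to or from it.

    Assumes handles are unique across object types, so matching on the
    handle alone is enough to find the relevant link rows.
    """
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM link WHERE from_handle = ? OR to_handle = ?",
            (handle, handle),
        )
        conn.execute("DELETE FROM person WHERE handle = ?", (handle,))


# Tiny in-memory database with the assumed (hypothetical) schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT);
    CREATE TABLE link (from_type TEXT, from_handle TEXT,
                       to_type TEXT, to_handle TEXT);
    INSERT INTO person VALUES ('h1', 'I0001'), ('h2', 'I0002');
    INSERT INTO link VALUES ('person', 'h1', 'event', 'e2'),
                            ('person', 'h2', 'event', 'e1'),
                            ('family', 'f1', 'person', 'h2');
""")

delete_person(conn, "h2")  # treat I0002 as the duplicate to drop
remaining_people = conn.execute("SELECT gramps_id FROM person").fetchall()
remaining_links = conn.execute("SELECT COUNT(*) FROM link").fetchone()[0]
```

Doing both deletes inside one transaction means a failure part-way through does not leave link rows pointing at a person that no longer exists.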


That’s certainly a way I could do it. As things stand, I’ll still be writing some Python in my Notebook to look up likely (surname, given, (dates)) tuples against any exported SQLite db, and I’ll be able to use it to match persons already within the DB and do the deletes you suggest, as a way of bulk cleaning the exported DB.
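A sketch of what that lookup might look like, again under assumptions: the flattened table `person_flat` with `gramps_id`, `surname`, `given` and `birth_year` columns is hypothetical (the real export spreads names and dates over several tables), and `build_index` / `lookup` are names invented for the example. The `key` argument defaults to a case-insensitive comparison, but a phonetic function such as Soundex could be passed in instead.

```python
import sqlite3
from collections import defaultdict


def _first_given(given):
    """First given name only, or '' if there is none."""
    parts = given.split()
    return parts[0] if parts else ""


def build_index(conn, key=str.casefold):
    """Group people from the (hypothetical) person_flat table by surname key."""
    index = defaultdict(list)
    query = "SELECT gramps_id, surname, given, birth_year FROM person_flat"
    for gramps_id, surname, given, year in conn.execute(query):
        index[key(surname)].append((gramps_id, given, year))
    return index


def lookup(index, surname, given, year, tolerance=2, key=str.casefold):
    """Return Gramps IDs whose names match and whose year is within tolerance."""
    hits = []
    for gramps_id, db_given, db_year in index.get(key(surname), []):
        if key(_first_given(db_given)) != key(_first_given(given)):
            continue
        # A missing year in the database is treated as "cannot rule out".
        if db_year is None or abs(db_year - year) <= tolerance:
            hits.append(gramps_id)
    return hits


# In-memory stand-in for an exported database; rows are made-up examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_flat (gramps_id TEXT, surname TEXT,
                              given TEXT, birth_year INTEGER);
    INSERT INTO person_flat VALUES
        ('I0001', 'Smith', 'John William', 1850),
        ('I0002', 'Smith', 'Mary', 1852),
        ('I0003', 'Jones', 'John', 1850);
""")

index = build_index(conn)
found = lookup(index, "SMITH", "john", 1851)
not_found = lookup(index, "Smith", "John", 1860)
```

Any ID returned by `lookup` can then be written into the ImportText file in place of a new one, or used as the list of rows to delete when cleaning the exported database.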

