Matching duplicates before uploading (via ImportText)

Tim · December 8, 2022, 8:01pm

So this is (the start of) what I’m going to do with the SQL I ran on an exported SQLite database.

I have lots of records to upload, which I plan to use ImportText for, but many of the people in them will already be in my GRAMPS, and I want to minimise the number of times I have to use ‘Eliminate possible duplicates’.

With the records to be uploaded, I’ll have a surname, given name(s) and a date, which might be when they were born / baptised or died / buried, but is often when they were involved in some other event, like getting married. So I want to see if there is someone already in my GRAMPS where these fields match.

The matches don’t have to be exact, and using soundex is going to help a lot with text matching, and there can be some tolerance on dates.
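A minimal sketch of that kind of fuzzy match, assuming records are plain dictionaries with `surname`, `given` and `date` keys (those names, `is_candidate`, and the two-year default tolerance are illustrative choices, not anything taken from GRAMPS). It uses standard American Soundex on the surname and the first given name, plus a day-count tolerance on the date:

```python
from datetime import date

# Standard American Soundex digit groups.
_CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODES.get(c, "")  # vowels (and anything unmapped) give ""
        if code and code != prev:
            out += code
        prev = code
    return (out + "000")[:4]


def _first_word(text: str) -> str:
    """First given name only, so 'John Henry' can still match 'Jon'."""
    parts = text.split()
    return parts[0] if parts else ""


def is_candidate(record: dict, person: dict, tolerance_days: int = 730) -> bool:
    """True if names sound alike and the dates are within the tolerance."""
    return (
        soundex(record["surname"]) == soundex(person["surname"])
        and soundex(_first_word(record["given"])) == soundex(_first_word(person["given"]))
        and abs((record["date"] - person["date"]).days) <= tolerance_days
    )


new_record = {"surname": "Smyth", "given": "Jon", "date": date(1851, 3, 1)}
in_gramps = {"surname": "Smith", "given": "John Henry", "date": date(1850, 6, 12)}
match = is_candidate(new_record, in_gramps)
```

Soundex is quite coarse, so a match here is better treated as "worth a look" than as proof that two records are the same person. The tolerance probably wants to be wider when the date is a marriage or other event rather than a birth.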

I’ve not started on this yet, but it’s going to be an exercise in using DataFrames and applying some functions I’ll write in my notebook. I think I’ll be able to get fairly good matching, and use existing [I0000] format codes when I upload via ImportText.
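As a rough illustration of the "reuse existing codes" step, the sketch below tags each incoming record with a bracketed ID when exactly one existing person matches. Plain lists of dictionaries stand in for DataFrames so that it runs without pandas. The function name `tag_with_existing_ids`, the `person` and `candidates` output keys, and the simple matcher are all hypothetical, and how ImportText treats the resulting column is not checked here.

```python
def tag_with_existing_ids(incoming, existing, is_match):
    """Copy each incoming record, adding a bracketed ID if one person matches.

    is_match(record, person) is any function returning True/False, so a
    fuzzier matcher (Soundex, date tolerance) can be swapped in later.
    """
    rows = []
    for rec in incoming:
        hits = [p["gramps_id"] for p in existing if is_match(rec, p)]
        # Reuse an ID only when the match is unambiguous; zero or several
        # hits are left blank for manual review.
        person = f"[{hits[0]}]" if len(hits) == 1 else ""
        rows.append({**rec, "person": person, "candidates": len(hits)})
    return rows


def same_name_near_year(rec, person, tolerance_years=2):
    """Deliberately simple matcher: case-insensitive names, year within tolerance."""
    return (
        rec["surname"].casefold() == person["surname"].casefold()
        and rec["given"].casefold() == person["given"].casefold()
        and abs(rec["year"] - person["year"]) <= tolerance_years
    )


existing = [
    {"gramps_id": "I0001", "surname": "Smith", "given": "John", "year": 1850},
    {"gramps_id": "I0002", "surname": "Brown", "given": "Mary", "year": 1850},
]
incoming = [
    {"surname": "smith", "given": "john", "year": 1851},
    {"surname": "Jones", "given": "Ann", "year": 1900},
]
tagged = tag_with_existing_ids(incoming, existing, same_name_near_year)
```

Keeping the `candidates` count alongside the ID makes it easy to filter out the ambiguous rows and check those by hand before uploading.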

I’m aware I may be reinventing what is done with the GRAMPS tool to eliminate possible duplicates after they’ve been imported, but as explained in earlier posts, I don’t know how to access such code directly. I had a look to see where it was, so I could maybe do some cutting and pasting, but I couldn’t find it.

If what I’m doing is something the GRAMPS team can see other people wanting to do, then I’d be delighted if this could be seen as some prototyping for future developments.

GeorgeWilmes · December 9, 2022, 12:10am

Another option is to go ahead and import everything into Gramps (duplicates and all), and only then do the SQLite Export. At this point you could clean up the exported database using whatever tools you like. Presumably you would only be deleting rows from tables (including the “link” table, as appropriate). Whatever remains, you would then use to create a new Gramps tree using the SQLite Import. And then in Gramps you could run the various tools that check for consistency. I can’t say whether this would be more work than finding and modifying the Gramps code that checks for duplicates. I just know it would be the approach I would try first, since I’m much more comfortable with SQL than Python.

Tim · December 9, 2022, 12:02pm

That’s certainly a way I could do it. As things stand, I’ll still be writing some Python in my Notebook to look up likely (surname, given, (dates)) tuples against any exported SQLite db, and I’ll be able to use it to match persons already within the DB and do the deletes you suggest, as a way of bulk cleaning the exported DB.
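A sketch of how that lookup and bulk delete might be done from Python with the standard `sqlite3` module. The `person` and `link` tables and their columns below are a made-up, simplified schema in an in-memory database, not the real layout of a Gramps SQLite export, and `name_key` is a crude stand-in for a proper phonetic code such as Soundex. The point is only that a Python function can be registered with `create_function` and then used inside the SQL.

```python
import sqlite3


def name_key(name):
    """Crude phonetic stand-in: first letter plus the later consonants."""
    letters = [c for c in (name or "").upper() if c.isalpha()]
    if not letters:
        return ""
    return letters[0] + "".join(c for c in letters[1:] if c not in "AEIOUYHW")


conn = sqlite3.connect(":memory:")
conn.create_function("name_key", 1, name_key)  # callable from SQL as name_key(x)

# Hypothetical, simplified schema and sample rows for illustration only.
conn.executescript("""
CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT,
                     surname TEXT, given TEXT, year INTEGER);
CREATE TABLE link (from_handle TEXT, to_handle TEXT);
INSERT INTO person VALUES
  ('h1', 'I0001', 'Smith', 'John', 1850),
  ('h2', 'I0002', 'Smyth', 'Jon',  1851),
  ('h3', 'I0003', 'Brown', 'Mary', 1850);
INSERT INTO link VALUES ('h2', 'e9'), ('h3', 'e7');
""")


def drop_duplicates(conn, tolerance_years=2):
    """Delete the later handle of each likely pair; return (kept_id, dropped_id)."""
    pairs = conn.execute(
        """
        SELECT a.gramps_id, b.gramps_id, b.handle
        FROM person AS a JOIN person AS b ON a.handle < b.handle
        WHERE name_key(a.surname) = name_key(b.surname)
          AND name_key(a.given) = name_key(b.given)
          AND ABS(a.year - b.year) <= ?
        ORDER BY a.handle, b.handle
        """,
        (tolerance_years,),
    ).fetchall()
    for _kept_id, _dropped_id, handle in pairs:
        # Remove references to the dropped person before the person row itself.
        conn.execute(
            "DELETE FROM link WHERE from_handle = ? OR to_handle = ?",
            (handle, handle),
        )
        conn.execute("DELETE FROM person WHERE handle = ?", (handle,))
    conn.commit()
    return [(kept, dropped) for kept, dropped, _ in pairs]


removed = drop_duplicates(conn)
remaining = [row[0] for row in conn.execute("SELECT gramps_id FROM person ORDER BY gramps_id")]
```

In practice it would be safer to run only the `SELECT` first and review the pairs before deleting anything, since deleting a person's link rows also discards whatever was attached only to the dropped copy.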

system · January 8, 2023, 12:02pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
