I have been downloading some trees from FamilySearch, exporting them from RootsMagic to Gramps, then importing the ones I want into my main file. Unfortunately I had it set to include sources and now all my sources are doubled. I see the menu item to merge citations but nothing for sources. Please tell me I don’t have to merge all these identical sources one at a time. I have done some extensive work on the file since importing and cannot feasibly go back to the previous version. I searched for this topic but nothing came up.
On the Sources category view you can Merge Sources
How did you search? Did your search include the index page of the user manual and the Find in page feature of most Internet browsers?
oh my gosh, I missed that. THANK YOU!!!
nope - not what I am looking for. I need to merge duplicates of about 500 sources
well crud, that just lets me merge two at a time - I knew about that part. I need to merge a bunch
Checking the feature requests shows 10047: "Multi-merge more than 2 items at a time", along with all the associated requests!
OK, thank you very much.
if anybody needs me, I’ll be merging 500 duplicate sources. LOL
Are they identical Source records? Or is it the same source but identified differently? (like MLA vs. AP stylings of the same source.)
What about the Citations? are they duplicated too?
Everything was duplicated when I imported a file I had created in my Gramps: double of everything, each identical. I finally have the sources merged (took me all night), but the Merge Citations tool did not seem to work. Each citation has two of each of my notes (such as links, transcripts, etc.).
Deduplication is a common operation in Big Data. I wonder if we could leverage an XML tool to do this task externally?
There’s an interesting blurb about an XML deduplication tool called XClean, developed at the Hasso Plattner Institute in Potsdam. But the link is bad.
Maybe one of our German users can check with the researchers?
I think that is above my ability to use things like that. But, perhaps one of you smart folks can keep that in mind for a future “fix” for the next person that makes a big screw-up like this!
Give it a day or two for something to percolate here. Some of the external tools are quite automated. And the process of saving your Tree as XML, cleaning the XML & importing to a fresh Tree requires less effort than half a dozen merges! (Although I inevitably forget to reset the Preferences items critical for customized new record creation: ID formats, Default family relationship, et cetera.)
The duplicates won’t replicate further in the meantime.
Here’s a way.
I am assuming you have 100 entries of the exact same source each with one citation.
Pick one source and copy the source to the clipboard. Then in Citations, filter on the Source Title. This should bring up the relevant citations.
You can then select say 20 of those citations at once, right click, edit. It will bring up 20 stacked edit windows. Then it is simple to substitute the source of each citation for the one on the clipboard. Yes, it is repetitious but easier than select two, merge, select two, merge…
In the end, one source will have all 100 citations and 99 will have none. Then it is a simple matter to Remove Unused Objects…
Tip: Make a slight alteration to the Source title on the clipboard, for example by adding XX- to the start. This will help identify which citations have been cleaned.
Note: I find this stacked edit option only works if each edit is exactly the same. You cannot be asking yourself, which edit window gets which edit. All edit windows getting the same edit works best.
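The re-pointing idea above can be sketched in a few lines of Python. This is only an illustration on plain dictionaries, not the Gramps API: the `sources` and `citations` structures and the `repoint_to_canonical` helper are hypothetical stand-ins, and it assumes that sources with the same title really are identical.

```python
# Hypothetical in-memory stand-ins for Gramps records; this is NOT the Gramps API.
sources = {
    "S1": "1900 Census",
    "S2": "1900 Census",      # duplicate of S1
    "S3": "Parish Register",
}
citations = {
    "C1": "S1",  # citation id -> id of the source it points to
    "C2": "S2",
    "C3": "S3",
}


def repoint_to_canonical(sources, citations):
    """Point every citation at the first source seen with each title."""
    canonical = {}  # title -> first source id carrying that title
    for source_id, title in sources.items():
        canonical.setdefault(title, source_id)

    repointed = {
        citation_id: canonical[sources[source_id]]
        for citation_id, source_id in citations.items()
    }
    # Sources that no citation points to any more are the "unused" ones.
    unused = sorted(set(sources) - set(repointed.values()))
    return repointed, unused


repointed, unused = repoint_to_canonical(sources, citations)
print(repointed)
print(unused)
```

The `unused` list corresponds to the records that Remove Unused Objects would then clean up.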
Partial data harmonization can leave a person with a bigger mess than before, or having done unnecessary work.
For instance, merging citations would’ve serendipitously ‘orphaned’ the duplicate Sources. Those orphans would be ripe for automated pruning with that Remove Unused Objects tool.
Once the Citations are merged, each Citation isolates a much smaller pool of duplicated Notes, Gallery items & Attributes. It’d be a shorter automation process to compare each of the small pool.
Come to think of it, if these are exact duplicates and the People were merged already, deleting the new duplicates would be MUCH faster than merging too. You’d only need to Merge if they were not synchronized. Right?
Acting prematurely is probably a mistake.
Well, there are not 100 of each source with the same citation. There were 2 of each source with the exact same citations. Now there is only one copy of each source, but each citation in the source has double the “notes”. For example, for each citation I usually use at least three notes: 1 link to online data, 1 transcript of whatever it is, and a “citation” note for the actual citation information. Now, inside each main citation, each of these notes is doubled.
The only way I see to get rid of all the extras is to go into each one manually and remove the extra lines. Unless someone knows how to bail me out.
thanks for the help
Question to any developer regarding this issue…
Would it be possible with Python to create a feature that uses hash values to do an automated search and replace for all objects in a database, and maybe have a few parameters set manually for which fields to use…
E.g. for a person: take all/any name fields, birth date and one parent link, create a hash of those values, then search for any other person with the same hash for those fields and merge…
Citations; create hash for source, date, volume and search all citations, merge any with same hash.
and so on…
To be able to do a clean merge of an entire database, the script should possibly start with Repositories, then Sources, then Citations, then Places, then Families, and then People.
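A minimal sketch of the hashing idea, using only the Python standard library. The record layout (plain dictionaries with `source`, `date` and `page` keys) and the helper names `fingerprint` and `group_duplicates` are assumptions made for illustration; nothing here touches a real Gramps database, and it only reports merge candidates rather than merging them.

```python
import hashlib
from collections import defaultdict


def fingerprint(record, fields):
    """Return a SHA-256 hex digest of the chosen fields, lightly normalised."""
    parts = [str(record.get(field, "")).strip().lower() for field in fields]
    # A separator character keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def group_duplicates(records, fields):
    """Return lists of record ids that share a fingerprint (2 or more ids)."""
    groups = defaultdict(list)
    for record_id, record in records.items():
        groups[fingerprint(record, fields)].append(record_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical citation records; the field names are illustrative only.
citations = {
    "C1": {"source": "S1", "date": "1900-06-01", "page": "Vol. 2, p. 14"},
    "C2": {"source": "S1", "date": "1900-06-01", "page": "vol. 2, p. 14 "},
    "C3": {"source": "S1", "date": "1900-06-01", "page": "Vol. 3, p. 1"},
}

dupes = group_duplicates(citations, ["source", "date", "page"])
print(dupes)
```

The fields passed to `group_duplicates` play the role of the manually set parameters. A hash only matches values that are identical after normalisation, so near-duplicates (different spellings or citation styles) would still need a human decision.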
Some AI library might be of use for something like this?
It would be possible to do a job like this in OpenRefine or Microsoft Excel with Power Query, from an XML file, but the biggest job would be to write the cleaned data back to a new XML file.
In both Excel and OpenRefine you would need to set up a complete XSLT to be able to write data to a new file. I still have no clue how to do that… sorry.
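For comparison, a scripting language can read, edit and write XML without an XSLT. The following is a rough sketch with Python’s standard `xml.etree.ElementTree`. The sample document is a simplified, hypothetical stand-in loosely modelled on a Gramps XML export: a real export uses an XML namespace, has many more elements, and may be gzip-compressed, so the tag names and the `drop_duplicate_noterefs` helper are assumptions that would need adjusting.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical sample; a real Gramps export differs (namespace, more tags).
SAMPLE = """<database>
  <notes>
    <note handle="n1"><text>Transcript of record</text></note>
    <note handle="n2"><text>Transcript of record</text></note>
    <note handle="n3"><text>Link to online data</text></note>
  </notes>
  <citations>
    <citation handle="c1">
      <noteref hl="n1"/>
      <noteref hl="n2"/>
      <noteref hl="n3"/>
    </citation>
  </citations>
</database>"""


def drop_duplicate_noterefs(root):
    """Within each citation, keep one noteref per distinct note text."""
    text_of = {
        note.get("handle"): (note.findtext("text") or "").strip()
        for note in root.iter("note")
    }
    removed = 0
    for citation in list(root.iter("citation")):
        seen = set()
        for ref in list(citation.findall("noteref")):
            text = text_of.get(ref.get("hl"))
            if text is not None and text in seen:
                citation.remove(ref)  # same text already referenced here
                removed += 1
            elif text is not None:
                seen.add(text)
    return removed


root = ET.fromstring(SAMPLE)
removed = drop_duplicate_noterefs(root)
cleaned_xml = ET.tostring(root, encoding="unicode")
print(removed)
print(cleaned_xml)
```

Writing to disk instead of a string would be `ET.ElementTree(root).write("cleaned.xml", encoding="utf-8", xml_declaration=True)`, where the file name is just a placeholder. The note that is no longer referenced stays in the file, so it would still need removing afterwards, and any such script should only ever be run on a copy of the export.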
Can you go into the Notes view, use whatever filters might help, possibly also sorting by last changed date, select all of the unwanted notes, and delete them all at once? You will get a warning that they are in use, and that by deleting them you will also delete any references to them, but that’s what you want, right?
I have been looking at the notes view since you posted and I think that might work. I do notice that I will have to be extremely careful as some of the references seem to have been lost! Not for each one but on some of them, although the note is identical, one of them has no references. Oh boy, at least I think this way will be quicker than trying to open up each citation.
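The caution about references can be made concrete with a small sketch. It groups identical notes and separates copies that nothing references (deleting those removes no links) from copies that are still referenced (deleting those would drop links, so they need a closer look). The `notes` and `citation_notes` dictionaries and the `classify_duplicates` helper are hypothetical; in particular, this toy assumes only citations reference notes, which is not true of a real tree.

```python
from collections import defaultdict

# Hypothetical stand-ins: note id -> text, citation id -> note ids it references.
notes = {
    "N1": "Transcript of record",
    "N2": "Transcript of record",
    "N3": "Link to online data",
    "N4": "Link to online data",
}
citation_notes = {
    "C1": ["N1", "N3", "N4"],
    "C2": ["N1"],
}


def classify_duplicates(notes, citation_notes):
    """Split duplicate notes into unreferenced copies and referenced groups."""
    ref_count = defaultdict(int)
    for note_ids in citation_notes.values():
        for note_id in note_ids:
            ref_count[note_id] += 1

    by_text = defaultdict(list)
    for note_id, text in notes.items():
        by_text[text.strip()].append(note_id)

    safe_to_delete, needs_review = [], []
    for ids in by_text.values():
        if len(ids) < 2:
            continue  # not a duplicate
        referenced = [n for n in ids if ref_count[n] > 0]
        unreferenced = [n for n in ids if ref_count[n] == 0]
        if not referenced:
            unreferenced = unreferenced[1:]  # keep one copy of the note
        safe_to_delete.extend(unreferenced)
        if len(referenced) > 1:
            needs_review.append(referenced)
    return safe_to_delete, needs_review


safe_to_delete, needs_review = classify_duplicates(notes, citation_notes)
print(safe_to_delete)
print(needs_review)
```

Here `N2` is an unreferenced copy, while `N3` and `N4` are both still attached to a citation and land in the review list.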
Oh and for reference, I did run the merge citations tool last night, I think that is what got me into the duplicate notes
You could go back to a previously saved version of the tree and start afresh. Only applicable for later versions of Gramps.
I would, but I would have to redo days’ worth of research and entries to get back to where I need to be, because I did not catch the duplicated sources for a few days.