I have a large database which I want to break up to make it more manageable and would like to understand how best to do this without losing any data.
Can I simply identify any completely unrelated trees within the database and export them into separate databases?
Can I apply export filters to create multiple output files, for example “all males”, “all females” and “all unknown”, without breaking the links between male and female siblings or man and wife etc.?
You can create a filter of the people you want for the export. People related to <person> is a possible filter. It would depend on who and how extensive you want to be.
You would not want to create an export that ‘breaks’ relationships, and an “all males” filter will do just that. The export will still include their families, but all the females will be missing and will show as unknown.
You mention you want to do this to make them more “manageable”. One area that will be less manageable is the creating and managing of the place database. You would now have two of those databases that will probably have many duplicate locations.
Personally, I would keep the one database.
But, if you provide a little more information as to what and how these families are related (or not related) we can tell you which filtering options will be best.
And yes, any filter created will be able to include all of the information you have included in People and Family records while excluding everything not needed for the export…
If I export all the males and females separately, I can then work on the reports to remove duplicate people and fix names, locations, etc. Then, when I merge them back together, will it “reconnect” the relevant relationships without creating duplicates?
My current database was built by importing hundreds of disparate GEDCOMs together, so it includes many families and many unrelated trees, which is why I need to cleanse the data. Ultimately I want a single tree with everyone related to my children, both directly and indirectly, no matter how remotely.
My database has 9 million individuals, so the only way I can process it is by splitting it up.
I downloaded and added GEDCOMs when the internet first came into being, but I merged the files as I downloaded them. I am at 200K records.
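For what it is worth, “unrelated trees” are what graph people call connected components: two people are in the same tree if you can get from one to the other by following family links. Here is a minimal sketch of that idea in Python. It assumes you have already pulled person IDs and family memberships out of your data somehow; the function names and the toy IDs are made up for illustration and are not part of Gramps or any GEDCOM library.

```python
from collections import defaultdict


def find(parent, x):
    # Follow parent pointers to the root, shortening the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def split_into_trees(people, families):
    """Group person IDs into connected components (unrelated trees).

    people   -- iterable of person IDs
    families -- iterable of lists; each list holds the IDs of everyone
                in one family (spouses and children)
    """
    parent = {p: p for p in people}
    for members in families:
        # Ignore IDs that are not in the people list.
        members = [m for m in members if m in parent]
        for other in members[1:]:
            root_a = find(parent, members[0])
            root_b = find(parent, other)
            if root_a != root_b:
                parent[root_b] = root_a
    trees = defaultdict(set)
    for p in parent:
        trees[find(parent, p)].add(p)
    # Largest tree first.
    return sorted(trees.values(), key=len, reverse=True)


# Toy data (hypothetical IDs): I3 links the first two families together.
people = ["I1", "I2", "I3", "I4", "I5", "I6"]
families = [["I1", "I2", "I3"], ["I3", "I4"], ["I5", "I6"]]
trees = split_into_trees(people, families)
```

With the toy data, `trees` holds two groups: one with I1 to I4 and one with I5 and I6. The tree containing your children would be the one to keep; the others are the unrelated ones.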
Since migrating to Gramps, I am documenting and confirming what I have. I am also extensively pruning. It is a slow process!!
Oh WOW!!??!!
Say you have 25 John Joneses married to 25 Mary Smiths. If you exported by gender, then merged and cleaned, you would have one John Jones in a family record with no one else, and in another file one Mary Smith in a family with no one else. Bringing the two files back together in a new file, you would have to reunite John Jones with Mary Smith in a single family. And you would have to do that for all the Smiths, Joneses, and Adamses to Zwells.
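To make the problem concrete, here is a toy sketch in Python of what a gender-only export does to family records. The data layout and the `export_by_gender` function are invented for illustration; this is not how Gramps stores or exports anything.

```python
# Toy data (hypothetical IDs): two couples.
people = {"I1": "M", "I2": "F", "I3": "M", "I4": "F"}
families = [
    {"husband": "I1", "wife": "I2"},
    {"husband": "I3", "wife": "I4"},
]


def export_by_gender(people, families, gender):
    # Keep only people of the requested gender.
    kept = {pid for pid, g in people.items() if g == gender}
    exported = []
    for fam in families:
        # Anyone filtered out leaves an empty slot in the family.
        exported.append(
            {role: (pid if pid in kept else None) for role, pid in fam.items()}
        )
    return kept, exported


males, male_families = export_by_gender(people, families, "M")
# Families that lost a member in the export.
broken = [fam for fam in male_families if None in fam.values()]
```

Every family ends up in `broken`, because each one has lost its wife. Nothing left in the male-only file says which Mary Smith belonged with which John Jones.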
Unless others know of ways to export – clean – then merge back into a new file, I see this as a Merge activity in the current tree.
The problem is that you can only merge two items at a time. There is a tool, Find Possible Duplicate People, but I fear it would take so long to process that it could be days(??) before it makes its first comparison: 9m records, each compared to the other 9m records.
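Some rough arithmetic on why size matters so much here. If every record had to be compared with every other record, the number of pairs would be n(n-1)/2. The sketch below works that out in Python and also shows how grouping records first (here by surname, purely as an example) cuts the count. This says nothing about how the Gramps tool is actually implemented; the function names and sample records are hypothetical.

```python
from collections import defaultdict


def pair_count(n):
    # Number of distinct pairs among n records.
    return n * (n - 1) // 2


def blocked_pair_count(records, key):
    # Only compare records that share the same key (a "block").
    block_sizes = defaultdict(int)
    for record in records:
        block_sizes[key(record)] += 1
    return sum(pair_count(size) for size in block_sizes.values())


full_db = pair_count(9_000_000)    # 40,499,995,500,000 pairs
smaller_db = pair_count(200_000)   # 19,999,900,000 pairs

# Toy records as (surname, given name) tuples.
sample = [
    ("Jones", "John"),
    ("Jones", "John"),
    ("Smith", "Mary"),
    ("Smith", "Mary"),
    ("Adams", "Ann"),
]
all_pairs = pair_count(len(sample))
same_surname_pairs = blocked_pair_count(sample, key=lambda r: r[0])
```

For the five sample records that is 10 pairs unblocked versus 2 when only same-surname records are compared. The same effect is why working on a smaller file, or one tree at a time, should be far quicker than attacking all 9 million at once.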
Some other questions…
Do you still have the original file before you started the imports?
When you imported, did the records get tagged with a common import tag or with a source/citation? Gramps has these options in >> Preferences >> General tab, if you are not familiar with the concept. It would help in separating the trees.
Truthfully, I fear the best course of action would be to do an export of your base families (you and your wife) out to as many generations as you feel confident and then bring those families forward to include as many cousins that are linked.
Then import that file into a clean, empty database.
You would spend so much time paring the large database down that you would lose the forest by concentrating too much on individual trees, and you have admitted to having a lot of trees. (Maybe a bad analogy, considering that genealogy looks at the family tree.)
I will be happy to help build that filter. But I think first you have to pick a plan of activity. Of course if others have other options for @CarlB258 to consider… please help.
Even GENMERGE can’t handle the files because of its 200,000 record limit. It would be good to get hold of a copy of GENMERGEDB, if that is possible. The “Find Possible Duplicate People” tool crashes after running for eleven days with no error messages.
If you have the original file, the one that started it all (I’ll call it the Prime), then I would go back to that file. If you still want to have those other GEDCOM files merged with the Prime, then import one GEDCOM at a time and merge it before adding the next.
Tags are what you add to the records. The things in common will be the people and families from the two files, the Prime file and the imported GEDCOM.
As you Import a GEDCOM, you can add a Tag or Source to those records. See the Preferences General tab.
As an added aid, when I am manually Merging two people, I created a Merge tag that I colored Hot Pink. This is especially helpful making sure that the correct two John Smith records get merged. Obviously, the tag has to be cleared before using on other records.
As an added step, before adding the GEDCOM to your Prime file, import the GEDCOM into its own empty Gramps database. This may help you evaluate that GEDCOM to see if it is a file you want to merge into your Prime file.
Because you are starting with a smaller Prime file, if you import and merge one GEDCOM at a time, the Find Possible Duplicate People tool may be able to function in a shorter time frame.
My Merge experience within Gramps is now confined to those odd people that I have as a child in a family now marrying someone that I also have as child of their parents. Whenever I add a new person, I also check to see if they are already in the database. I am long past done doing the bulk mass merging.
And remember, after people and families are merged, their records may now have two birth records that now must be dealt with.