Import sources in bulk

chorlito · October 23, 2020, 8:12am

Gramps version 5.1.3-1 under Windows 10

Having used other software, I have a folder of my sources – text files, images, pdfs mostly. I cannot find (in help wiki or this forum) how to load all their names into my Gramps sources, other than to type up a form for each one. I don’t expect to live that long, so if anyone can point me towards a bulk operation, I’d be grateful.

SNoiraud · October 23, 2020, 8:55am

There is no bulk management in gramps.
I am not sure that what you want to do is a good idea.
You’ll have a lot of documents with no relation to events, persons, …

chorlito · October 23, 2020, 9:29am

Thank you for the ‘no bulk in Gramps’ note. Appreciate your time on that.

DaveSch · October 23, 2020, 12:20pm

Welcome

I assume you did an import from your previous program into Gramps. How did your previous application handle Sources and Citations?

Even if you could now do a bulk import of just Source information, that is all you would have, a list of sources not attached to any person or event.

When I migrated my database to Gramps, I had a database that was not adequately sourced. So now I am going family by family, person by person, adding the source/citations records. In the process, I am able to add more information about these people and families.

And yes, it is a slow but rewarding process.

emyoulation · October 23, 2020, 1:32pm

You CAN do bulk operations in Gramps via import. It just isn’t terribly intuitive that you can leverage features for bulk operations. (For those of us conditioned by the old days of punch cards, leveraging features to re-use data sources is 2nd nature. You just have to be aware of the law of unintended consequences.)

For instance, outputting a client’s phone & eMail address books as vCards for import is an interesting way to jumpstart creating their personal Tree.

As @SNoiraud pointed out, a list of Sources (without relationship to Events, Persons, et cetera) is of limited utility. Personally, I think you’d be better off importing the directory into a document management system and organizing it there first. You could then output a CSV sample from both the Document Management System & an example 1 person sourced Tree in Gramps (to facilitate reconfiguring the data) and import that into a blank Test Gramps Tree.

Several of the Gramplets also do mini bulk updates.

emyoulation · October 23, 2020, 1:48pm

If I was to approach a problem like this, I think I’d want a Repository created for Sources that ARE the Media objects of appropriate types (text, PDFs, multi-page scans) & generate a citation to that source. (Again test the import into a blank Tree!)

(Personally, I like to create a Person for the authors of Genealogical reference books and link the book as a Source to a dated Occupation Event with description ‘author, published’. If the reference is used extensively, it is enlightening to determine how the author is related to the book’s subject & link them into my Tree.)

At that point, you’d have created a library of Citations that can be drag’n’dropped onto Events.

chorlito · October 23, 2020, 3:34pm

Thank you, emyoulation. Your response is very helpful. I have experimented a little and can work with that.

chorlito · October 23, 2020, 3:37pm

Thank you for taking the time to respond, DaveSch.

emyoulation · October 23, 2020, 3:39pm

Umm… The wrong message is marked as the solution. Instead of the message that has details about which approach will be used, you marked the message that acknowledges that message.

StoltHD · October 23, 2020, 3:55pm

I fully see the reason for bulk import of sources and citations, place names, media, Event data and unlinked people data…

But I do not work in a lineage-linked way, so for me it’s the information I start with, the Documents, the Facts, and then I start to link things together, one relation or one connection at a time…

So if I have 400 documents mentioning 800 Events and 2000 people, I will start by connecting the documents to the Events with Citations, then connect the Events to the Places, and the people to the Events with the roles they were given in the document. I still struggle a little when a person has more than one role in relation to an Event, though…
And this would be much easier if all the data was already imported into Gramps from the CSV, JSON, and XML files I find them in.

This would be easy to achieve if the CSV import/export also held the Sources and Citations.
The same goes for the “Import Text” gramplet.

Sadly, there are too few developers and users who think that better handling, including import/export, of subjects other than people is important.

A full CSV database export/import would have provided features like this without any problems.

If you know a little SQL, you can use the SQLite import,
or if you can write some VBA, you can export to the Gramps JSON file (it might be a little easier than creating an XML file in the right format from Excel or LibreOffice).

emyoulation · October 23, 2020, 4:04pm

This discussion thread on importing directories of PDFs into Zotero is interesting.

They were talking about throwing together a plug-in in Feb 2020. It would be nice to see if that worked out.

StoltHD · October 23, 2020, 4:16pm

It’s here:

StoltHD · October 23, 2020, 4:25pm

He also has a Python script that creates an RDF file that can be imported into Zotero.

Maybe that script could be changed a little to create a Gramps XML instead…?
If I just knew a Python Developer that was willing to do something like that…

PLegoux · October 24, 2020, 6:19pm

I’ve tried it. It returns an RDF-formatted listing of the files of the specified type in a directory:

<?xml version="1.0" ?>
<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:link="http://purl.org/rss/1.0/modules/link/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:z="http://www.zotero.org/namespaces/export#">
	<z:Attachment rdf:about="#14520574_1485069028186229_4687748316610910615_n.jpg">
		<z:itemType>attachment</z:itemType>
		<rdf:resource rdf:resource="14520574_1485069028186229_4687748316610910615_n.jpg"/>
		<dc:title>14520574_1485069028186229_4687748316610910615_n.jpg</dc:title>
		<link:type>14520574_1485069028186229_4687748316610910615_n.jpg</link:type>
	</z:Attachment>

If you do the same thing to obtain a Gramps XML file, you’ll only get the media structure with the paths to the objects the script found. There would be no source or citation, because the script couldn’t define them for you.
The only interesting thing it does is look recursively into folders. Is that what you are interested in?
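To make that limit concrete, here is a minimal Python sketch of such a recursive scan. It is not the Zotero script mentioned above. The function names, the extension list and the CSV column names are all made up for illustration. The `source_title` column is left blank on purpose: a folder scan can find paths, file names and a guessed MIME type, but the source information still has to be filled in by a person.

```python
import csv
import io
import mimetypes
from pathlib import Path

# Hypothetical column names, chosen for this sketch only.
FIELDS = ["path", "file_name", "mime", "source_title"]


def inventory_rows(folder, extensions=(".pdf", ".jpg", ".png", ".txt")):
    """Walk `folder` recursively and describe each file of a wanted type."""
    wanted = {ext.lower() for ext in extensions}
    rows = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in wanted:
            rows.append({
                "path": str(path),
                "file_name": path.name,
                # Guessed from the file extension only; empty if unknown.
                "mime": mimetypes.guess_type(path.name)[0] or "",
                # The scan cannot know this; it is left for manual entry.
                "source_title": "",
            })
    return rows


def inventory_csv(folder):
    """Return the inventory as CSV text, ready to open in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(inventory_rows(folder))
    return buf.getvalue()
```

Calling `inventory_csv("my_sources")` on a folder of that (hypothetical) name returns text you can save and complete in any spreadsheet tool before deciding how to get it into Gramps.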

StoltHD · October 24, 2020, 6:33pm

I just suggested that it might be possible to do something like that with Python… there are Python libraries that read metadata from PDFs, and there are Python libraries that read CSV files or spreadsheets…

Creating a Gramps XML file that holds all your media files in an importable format would help people with large archives of images and documents, instead of having to import each and every one, one at a time.
It could also be possible to create sources by using the file names, or a metadata Title or Description…
I don’t know what people actually want…
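A rough Python sketch of that idea, using only the standard library: scan a folder recursively, create one media object per file, and create one source per file with the file name (without its extension) as the title. Treat every format detail as an assumption. The namespace, the element and attribute names (`stitle`, `objref hl`, `file src/mime/description`), the `_`-prefixed handles and the ID pattern come from my reading of the Gramps XML 1.7.1 format, so compare the result with a file exported from your own Gramps version before relying on it. The function names are invented for this sketch.

```python
import mimetypes
import time
import uuid
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: schema namespace for the Gramps 5.1 series.
GRAMPS_NS = "http://gramps-project.org/xml/1.7.1/"


def new_handle():
    """Generate a unique handle (assumed form: underscore plus hex digits)."""
    return "_" + uuid.uuid4().hex


def build_gramps_xml(folder):
    """Return XML text with one media object and one source per file."""
    stamp = str(int(time.time()))
    db = ET.Element("database", {"xmlns": GRAMPS_NS})
    header = ET.SubElement(db, "header")
    ET.SubElement(header, "created",
                  {"date": time.strftime("%Y-%m-%d"), "version": "5.1.3"})
    ET.SubElement(header, "researcher")
    # Assumed section order: sources come before objects.
    sources = ET.SubElement(db, "sources")
    objects = ET.SubElement(db, "objects")

    files = sorted(p for p in Path(folder).rglob("*") if p.is_file())
    for n, path in enumerate(files):
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        media_handle = new_handle()
        obj = ET.SubElement(objects, "object",
                            {"handle": media_handle, "change": stamp,
                             "id": f"O{n:04d}"})
        ET.SubElement(obj, "file",
                      {"src": str(path), "mime": mime,
                       "description": path.stem})
        # The source title is just the file name; nothing smarter is known.
        src = ET.SubElement(sources, "source",
                            {"handle": new_handle(), "change": stamp,
                             "id": f"S{n:04d}"})
        ET.SubElement(src, "stitle").text = path.stem
        ET.SubElement(src, "objref", {"hl": media_handle})

    body = ET.tostring(db, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The output contains no citations and no links to people or events, which is the limitation raised earlier in the thread. As suggested above, try any such import in a blank test Tree first.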

Personally, I want a complete export/import to CSV and to at least one common network graph format such as GraphML, and the JSON export reconfigured to be a JSON-LD export/import…

With those three changes, Gramps would be able to interchange data with 99.98% of ALL research tools used, including Tropy, Zotero, Gephi, Omeka, Archers, Heurist, Constellation, Tulip, Pajek, Cytoscape, Social Network Analyzer, InfraNodus, HeadStart, Palladio, Mendeley, JabRef, Openrefine… SAGA, QGIS and a bunch of other tools I don’t even know the names of…

chorlito · October 25, 2020, 1:54pm

emyoulation - thank you for the pointer on where to put the ‘solved’ tick.

StoltHD - the features you mention would, from my point of view, be great additions to Gramps.

This thread has been very useful to me. Thank you all.

StoltHD · October 25, 2020, 2:29pm

The most helpful step towards bulk import, regardless of object type, would be to extend the CSV export/import to include Repositories, Sources, Citations and Media…

With that, any user could create a compatible list in any spreadsheet or table tool and import that list into Gramps. If no connections were to be found in the CSV, Gramps should just import whatever is in the lists as items/objects, and if the CSV was wrongly formatted, just refuse to import it.

There should be a choice of “Overwrite or Add” for existing objects where the ID is the same. That way it would also be possible to extend a list with more information later in e.g. Openrefine or any other spreadsheet/list/table tool, and get that information into an existing database without having to “ruin” earlier work in Gramps or manually merge thousands of entries…
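The two rules above (refuse a badly formatted CSV, and choose between “Overwrite” and “Add” when an ID already exists) can be sketched in a few lines of Python. This is not how Gramps works today. The column names `id` and `title`, the function names, and the reading of “Add” as “fill in empty fields only, never replace existing values” are assumptions made for this illustration.

```python
import csv
import io

# Hypothetical minimum set of columns a source list must contain.
REQUIRED = {"id", "title"}


def read_rows(csv_text):
    """Parse CSV text; raise ValueError instead of importing a bad file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"refusing import, missing columns: {sorted(missing)}")
    rows = list(reader)
    for line_no, row in enumerate(rows, start=2):
        # DictReader marks extra cells with a None key, short rows with None values.
        if None in row or None in row.values():
            raise ValueError(f"refusing import, malformed row at line {line_no}")
        if not row["id"]:
            raise ValueError(f"refusing import, empty id at line {line_no}")
    return rows


def merge(existing, rows, mode="add"):
    """Merge rows into a copy of `existing`, a dict keyed by ID."""
    if mode not in ("add", "overwrite"):
        raise ValueError("mode must be 'add' or 'overwrite'")
    merged = {key: dict(value) for key, value in existing.items()}
    for row in rows:
        key = row["id"]
        if key not in merged or mode == "overwrite":
            merged[key] = dict(row)
        else:
            # "add": keep earlier work, only fill fields that are still empty.
            for field, value in row.items():
                if value and not merged[key].get(field):
                    merged[key][field] = value
    return merged
```

With `mode="add"`, a row for an existing ID leaves a title you already typed untouched and only supplies, say, a missing author. With `mode="overwrite"`, the row replaces the stored entry completely.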

system · November 24, 2020, 2:29pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
