Distinguishing between archives, sources, and third-party publishers

One of my difficulties with Gramps (version 5) is organizing sources and repositories. I have materials coming from the same records, digitized and published by different parties. Each of those parties digitizes records from different, unrelated archives as well. How do I express this in Gramps, making sure that each party is attributed correctly?

Say a Local Administration has an archive full of births, deaths, and marriages of that area, covering 200 or so years. LDS came in in 1954 and digitized the first 100 years, and in the 2000s a local nonprofit digitized the rest. Both parties publish their scans themselves, under their own terms and conditions.

As far as I understand, the Local Administration’s archives are the repository. Their “Records of Births” are the source, which I reference from multiple citations. But how would I attribute these parties that digitized the data in such a way that they’re separate entities in Gramps that I can reference from any other media item they published, even if the originals all come from the same or different archives?

My end goal is to be able to filter all media programmatically, easily selecting only those published by FamilySearch, or only those published by Local Administration X, etc.

Using the example you gave, instead of having a single source “Records of Births”, you could have two sources, each titled “Records of Births” plus a distinguishing suffix, one for each of the two periods.

Even when there is just a single microfilm reel, I create a separate source for each book whose pages were filmed.


Thanks! I like it because it’s simple, and it definitely helps me attribute everything correctly and allows me to perform some form of access control when publishing (I’m using Betty). I’d have to find a way to link the related sources together, so the visitors of my site can browse them as one.

I don’t know much about Betty (just saw the recent posts), but you could give the sources the same Author, whether that is the name of a church parish, county courthouse, or whatever jurisdiction was responsible for producing the original records.

I might be thinking about this completely the wrong way. It looks like media items can have their own citations. I can use that to attribute the digitized documents, in addition to using other citations to support the facts in the family tree. Then each party that provides these documents can be represented by a repository, and I can then attach behavior to documents from specific repositories. The documents themselves can still be attached to the citations that support facts.

I’ll give this a try tomorrow and report back.

One thing you need to think about is which reports (if any) you may want to use. Many reports do not show images attached to citations. So you may want to do a test to see what is displayed with the various reports.

I use the Narrative Web to send to my cousins requesting their branch of the family. I do not post it to the web (living people not privatized) but send it on a thumb drive. The web site generated is in a form they are familiar with and it will display all citations and images on all records.

I don’t use any of Gramps’ own reports. That’s what I use Betty for, and it shows all of this stuff. It’s more of a Wikipedia-like site than anything else. But because it can show everything, I need to organize my sources and repos in such a way that I can easily tell it to exclude certain repositories’ documents from being published if that repo does not allow it through their terms and conditions, for example.

Here is an example of Belgian registers on FamilySearch:

  1. Repository is Belgium archives
  2. If a digitized microfilm contains more than one register (or parts of registers), the Item number is there to distinguish them*
  3. In reference notes you could add any information you need to know about the register or the site where you’ve found it

* You could have more than one repository reference associated with one source if your register spans more than one microfilm. Here the Brussels 1882 births register is split across two microfilms:

With that kind of reference to repositories, you can filter them using this source filter (sorry, I don’t know its English name; the search string means “Online, FamilySearch.org”):

Yes, you could do that, but you could also simply share the image media with their source record to link them together.

You know that you can have multiple repositories for each source/document, right?

So for each source, you can add every place where you find that document as a repository, each with its own type.

So you can add your local administrative archive as Archive, and FamilySearch and any other website repository as Electronic or Digital Copy Archive, or whatever type you find best for the different places where you find a copy of that document/source.
And you can also add different “Link Notes” for each of the versions of the Source.
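
To make the idea concrete, here is a minimal Python sketch of one source listing several repositories, each with its own type. The class and function names are hypothetical and only illustrate the shape of the data; this is not the Gramps API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Repository:
    name: str
    type: str  # e.g. "Archive" or "Digital Copy Archive"


@dataclass
class Source:
    title: str
    repositories: list[Repository] = field(default_factory=list)


def sources_held_by(sources, repository_name):
    """Return the sources that list the named repository, whatever its type."""
    return [
        s for s in sources
        if any(r.name == repository_name for r in s.repositories)
    ]


# Hypothetical example data.
local_archive = Repository("Local Administration Archive", "Archive")
familysearch = Repository("FamilySearch", "Digital Copy Archive")

births_1881 = Source("Birth Records, 1881-1882", [local_archive, familysearch])
births_1883 = Source("Birth Records, 1883-1884", [local_archive])

online = sources_held_by([births_1881, births_1883], "FamilySearch")
print([s.title for s in online])  # ['Birth Records, 1881-1882']
```

Because each copy of the document is a structured reference rather than free text, selecting everything available from one provider is a simple lookup.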

When I have multiple locations for a source, I always use the Original Archive as the “primary” source when citing, but create “Citation Notes” for any extra place I find the document. But in the Citation itself, I use the Open Data free access place if I can’t download a copy locally.

So if the Local Administration Archive charges for copies, but FS has a free copy, I use the local archive “name and number” for the source and use FamilySearch (or any other free web page) as the citation. I never use a private “behind a paywall” service as a source unless I can also download a copy of the document to my local storage.

So if your local administrative archive has birth records as “Birth Records, Charlston County, 1881-1882” and “Birth Records, Charlston County, 1883-1884”, each of them becomes a source. Then I create citations for the pages where I find information for each Event/piece of information/object of interest.

If I have also found that information on FS, I add the full FamilySearch source as a Link Note on the Source and add FamilySearch as a repository. I also add the full FamilySearch citation string as a Citation Note on the Citation for that page/document. If the administrative archive only offers paid copies, I change the citation string in the Volume/Page field of the Citation to point at the FamilySearch source, and add the citation to the local administrative archive as a Citation Note, if not already done.

Note that FS is generally not a repository (except for their own published works, often books). The repositories for their records are indicated in their catalog:

So you just have to reproduce them in Gramps (I’ve selected the main one):

It can be found in the second images tab too, in the citation quote they offer:

“Belgique, Brabant, registres d’état civil, 1582-1914,” database with images, FamilySearch (https://familysearch.org/ark:/61903/3:1:3QS7-89MN-D63B?cc=1482191&wc=STKW-SPD%3A966896201%2C967218901 : 22 May 2014), Brussel > Bijgevoegde akten 1878 Huwelijksbijlagen nr. 1-200 1879 > image 1 of 2950; België Nationaal Archief, Brussels (Belgium National Archives, Brussels).

Right, so it’s not a repository of the original information, or even the original documents. But one could argue it is a repository for the scans they provide, and to which they own the copyright. So by that thinking, if I want to attribute each little bit of data correctly, I’d do this:

  • For a birth, add a citation pointing to the Book of Births of Local Administration (source), which is contained by the Local Administration Archive (repository).
  • To that citation, I add the scan of the birth certificate from FamilySearch as a media file, which has a different citation pointing to the FamilySearch album (source), which is contained by the FamilySearch repository.
  • If Local Administration has digitized this birth certificate as well, then this media file could reuse the same citation as the birth event, as both the information and the scan come from the same source, and the same repository.

This is a bit verbose, but it’s also very explicit about where each bit of data (facts and media files) comes from. I can then toggle privacy per repository, allowing all citations and sources and repositories to be published, but preventing the media files from certain repositories from being included in the publication if that is what the repository’s terms and conditions demand.
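
As a rough sketch of that structure in Python: the classes below are hypothetical stand-ins (not Gramps or Betty classes), and the `private` flag on the repository is an assumption about how the toggle could be stored. A media file carries its own citation, so the fact and the scan can point at different repositories.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    name: str
    private: bool = False  # assumed flag: terms forbid republishing media


@dataclass
class Source:
    title: str
    repository: Repository


@dataclass
class Citation:
    source: Source
    page: str
    media: list["Media"] = field(default_factory=list)


@dataclass
class Media:
    path: str
    citation: Citation  # where the scan itself comes from


def publishable_media(fact_citation):
    """Scans on a fact citation whose own repository is not private."""
    return [
        m for m in fact_citation.media
        if not m.citation.source.repository.private
    ]


# Hypothetical example data following the three bullets above.
archive = Repository("Local Administration Archive")
familysearch = Repository("FamilySearch", private=True)

book_of_births = Source("Book of Births", archive)
fs_album = Source("FamilySearch album", familysearch)

birth = Citation(book_of_births, "page 12")
fs_scan = Media("fs-scan.jpg", Citation(fs_album, "image 1"))
own_scan = Media("archive-scan.jpg", birth)  # reuses the fact's citation
birth.media += [fs_scan, own_scan]

print([m.path for m in publishable_media(birth)])  # ['archive-scan.jpg']
```

The birth citation itself stays publishable in this sketch; only the scan that traces back to the private repository is left out.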

I’m still adding and updating some of this data in Gramps, and will then update my site based on it to see how this ends up working out, and so I can share the results with you all :)

FamilySearch is a repository for the digital copies they have, so I add them with Digital Copy Archive as the Repository Type.
And that’s also the reason I always add the original repository if I can find it.

That’s why the “Type” is there, to be able to set what type of repository you found the sources in!

Please don’t make a discussion out of something that’s already explained in the comment.

I was adding multiple image files to a common source. They were all PDF books from Google Books, published by the New England Historical Society. One volume was published each year starting in 1847. One source held all the PDFs, and citations started with “Volume:## Page:##”. I thought all was good.

Then a cousin wanted her branch of the family, and I created a Narrative Web for her. The problem was that, for a few citations referencing this source, all of the PDF records were transferred into her report instead of just the one PDF she would actually need: 76 PDF books.

I separated the source into individual Sources by Volume.

I’m not saying you need to do the same; I am just putting this out there to think about. The individual image page scans I would put with the citations for that page.
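
A small Python sketch of why the split helps, under the assumption that a report pulls in every file attached to any source it cites. That collection rule and all names here are hypothetical; this is not how Narrative Web is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    title: str
    media: list[str] = field(default_factory=list)


@dataclass
class Citation:
    source: Source
    page: str


def media_for_report(citations):
    """Collect every file attached to any source the citations point at."""
    files = []
    for citation in citations:
        for path in citation.source.media:
            if path not in files:
                files.append(path)
    return files


volumes = range(1, 77)  # 76 volumes, as in the post above

# One source holding every volume's PDF.
combined = Source("Register, all volumes", [f"vol{n:02d}.pdf" for n in volumes])
all_in_one = media_for_report([Citation(combined, "Volume:12 Page:34")])
print(len(all_in_one))  # 76

# One source per volume, each holding only its own PDF.
per_volume = {n: Source(f"Register, vol. {n}", [f"vol{n:02d}.pdf"]) for n in volumes}
split = media_for_report([Citation(per_volume[12], "Page:34")])
print(split)  # ['vol12.pdf']
```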

You can think and say what you want…

…and me too ! (I can also make use of exclamation marks !!)

Yes, I do it that way too. I group all the images I used from that source in the source record, and in the citation’s media I put the one (or more) image(s) the citation is citing.

Just want to add that while you can of course do what you want, standard academic practice is to consider a single book to be a single source. Citations generally go to the page number, maybe a note number, if there is one.

Whether you read it in person, on microfilm, in a scan, is of no import. That’s repository information.

Well, even if you say something many times, you may not be right.
So maybe you should stop advocating heresy.

I like that idea, and might adopt a similar strategy. My practice so far has been to include a “link” type note in the repository reference for a source, which in the case of FamilySearch links to their catalog entry for the source, which includes information about other repositories. But I know that such links could someday break, or that the organization of their catalog could change.

So far, I haven’t published anything, so I haven’t attached citations to media. Rather, my media files (scans etc.) are simply attached to source citations. I do include a “link” type note in the citation as well, and again I realize those links might someday break.

You raise an important point about being able to observe the repository’s terms and conditions. For example, FamilySearch’s terms and conditions say:

“You may not post content from this site on another website or on a computer network without our permission. You may not transmit or distribute content from this site to other sites.”

However, it’s possible that some other repository holding other copies of a given source may have different terms. (For example, if the other repository had made their own digitizations.)

The discussion here is typical of many I’ve read about the Citations, Sources & Repositories of Gramps.

It highlights that the implementation is a bit too close to the rudimentary GEDCOM specification, and that users have to shoehorn information into a structure that has no dedicated place for it.

It is like the way that Place model has evolved.

Originally, it was a simple field with the Civil Divisions concatenated in an ambiguous CSV format inspired by the GEDCOM’s ADDRESS_LINE.

Over the next few versions it evolved incrementally: a new Place Tree view was introduced in 2010 for Gramps 3.2, an additional level was added in 3.3, then a flexible new hierarchical place structure was added in version 4.1. In 4.2, features were added to allow Alternative Names for places. A variety of Tools were slowly added to mirror data in standardized mapping systems. And a decade after a Place hierarchy was introduced, Gramps is revising the Place Model again and enhancing its GEDCOM interaction. Maybe there will be a direct external database reference ID that can be called by that database’s API… along with aliased metadata attributes that are internally indexed in Gramps.

Citation models are currently in the same rudimentary boat as the original Place model. There is an entire industry around organizing sources… library science. And another industry involved in validating citations. Right now, you have to shoehorn in the data that allow either of those industries to disambiguate enough data or parse it well enough to get query parameters for an API.

But to get where we WANT to be is a long trip. One that needs to be broken down into steps short enough that we don’t fall on our faces. And that won’t take a decade to traverse.

It could be helped by a working group to develop a specification and plan a workable trip. Then negotiate with the developers to scale the trip plan back to something that no one is happy about but that is realistic.

Here are the GEPS that ought to be considered in that process.
GEPS 15: Repository Research Support
GEPS 18: Evidence style sources
GEPS 23: Storing data from large sources
GEPS 24: Natural transcription of Records

Yes, but that won’t solve my problem, because I must identify all media files coming from a specific party, and not accidentally include the citations about facts (without media files) in this selection, or I’ll prevent all of that from being published on my site, which defeats the purpose.

The disadvantage of notes is that they are harder to reliably process programmatically. If I can reliably assign all media files coming from a specific party to a resource, such as a repository, representing that party, I can easily traverse the graph of referenced resources and make only those private that are required to be private. With notes I’d have to parse each note and hope there are no typos (and gracefully handle any errors) to do the same.
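
To illustrate the difference, here is a minimal Python sketch with made-up data. The `MediaFile` and `Repository` classes are hypothetical, and the typo in one note is deliberate. Matching on free-text notes misses the misspelled entry, while following a structured reference does not.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    name: str


@dataclass
class MediaFile:
    path: str
    note: str               # free-text attribution
    repository: Repository  # structured attribution


FAMILYSEARCH = Repository("FamilySearch")
ARCHIVE = Repository("Local Administration Archive")

files = [
    MediaFile("a.jpg", "Scan from FamilySearch", FAMILYSEARCH),
    MediaFile("b.jpg", "Scan from FamilySerch", FAMILYSEARCH),  # typo in note
    MediaFile("c.jpg", "Scan from the local archive", ARCHIVE),
]

# Note-based selection depends on every note being spelled correctly.
by_note = [f.path for f in files if re.search(r"\bFamilySearch\b", f.note)]

# Reference-based selection only follows the link to the repository.
by_reference = [f.path for f in files if f.repository is FAMILYSEARCH]

print(by_note)       # ['a.jpg']
print(by_reference)  # ['a.jpg', 'b.jpg']
```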

I like reading how everyone tries to solve their own problems. Data management is hard! Please keep this thread friendly :)

It took a bit longer than intended as I had to add support for this and I had a few other things to do today, but I’ve got an example of what I was talking about earlier and I’d like to show you. Let’s start with my ancestor Jan Jans de Boer. In the media gallery near the bottom you’ll find two certificates (marriage banns and national militia). These all come from the Burgerlijke Stand (civil registry) of the Gemeente Blankenham (municipality of Blankenham), but the scans were made by the GSU. The Burgerlijke Stand is fully public and the province of Overijssel it is part of has digitized lots, but not all of it, so in cases like this one we have to fall back to FamilySearch’s older microfilm scans. You’ll also see the marriage banns scan corresponds to reference #7 (at the time of writing, because these reference numbers are sequential and change as the list of citations used on the page changes).

The citation for that reference shows the facts extracted from it, and the media files associated with that citation. As you can see, all of this references two public archives (Burgerlijke Stand), but none of it refers to FamilySearch or GSU, because the information itself comes from the public source. The image attached to that citation, however, is listed as coming from the GSU’s records. While under FamilySearch’s terms and conditions I should perhaps not be publishing this (am I not a nonprofit news forum exempt from FamilySearch’s restrictions?), now that I have assigned the image to a different source/repository, I can relatively easily make the repository private and write the code that privatizes all associated media files automatically, while keeping the citations about facts, which reference public sources, intact.