Sources and Citations: some thoughts

StoltHD · November 1, 2022, 11:19pm

You are not alone in wanting to do that…

That's one of the reasons I now use Zotero and Obsidian/Foam for VS Code for all my research.

ennoborg · November 2, 2022, 12:15am

@GeorgeWilmes can you define external? Is that another program, like Zotero, on your PC, with its own database, filled by you collecting sources? Or is it something outside your home, like on the web?

GeorgeWilmes · November 2, 2022, 2:18am

I don’t use Zotero but I think I understand what it does. If Gramps could do a little bit of that and just pull the source details directly from certain places, then I wouldn’t need to use another program. For example, currently there is the “GetGOV” addon which will populate place data from the online GOV gazetteer. Could there be similar functionality that pulls data from WorldCat or the FamilySearch Catalog or others (if they had a suitable API or their pages could be easily scraped or whatever)? For example, I would point to a particular volume of a particular book that I am citing, and it would pull in those details plus the source and repository hierarchy above it (if they are not already in my Gramps database).

Maybe I’m just lazy, but I would prefer not to spend my time copying and pasting information that is already better organized elsewhere. (And I should admit that I don’t use the GetGOV addon. I have a minimal place hierarchy and would gladly do without storing any detailed place data if I could somehow link to an external source such as Wikidata, though I don’t know how that could work.)

ennoborg · November 2, 2022, 6:07pm

Well, I looked at that, and I didn’t see enough metadata to do that, and as far as I know, there is no public API to get citation variables from FamilySearch. I use RootsMagic to connect to that site, and I only get formatted citations from them. They must be using variables to create them, but those seem to be for internal use, just like the variables that our local archives make available to the Dutch portals. There is one portal that has these variables in its page source, but that’s about it.

This suggests that web scraping is the only thing that might work in the short term, or maybe just a source import Gramplet into which you can paste the relevant part of a page.
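A paste-in source import Gramplet, as suggested above, would mostly need to turn labelled lines into source fields. This is a minimal sketch, assuming the pasted text has one `Label: value` pair per line. The label names and the mapping to `title`, `author`, and `pubinfo` are hypothetical; they are not taken from any particular site, and the code does not call the Gramps API.

```python
from typing import Dict

# Hypothetical mapping from labels seen on a catalogue page to source fields.
LABEL_TO_FIELD = {
    "title": "title",
    "author": "author",
    "authors": "author",
    "publisher": "pubinfo",
    "publication": "pubinfo",
}


def parse_pasted_source(text: str) -> Dict[str, str]:
    """Turn pasted 'Label: value' lines into source fields.

    Lines without a colon and labels not in LABEL_TO_FIELD are skipped.
    Repeated labels (e.g. several author lines) are joined with '; '.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Split at the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        if not sep:
            continue
        field = LABEL_TO_FIELD.get(label.strip().lower())
        if field and value.strip():
            fields[field] = "; ".join(filter(None, [fields.get(field), value.strip()]))
    return fields


pasted = """Title: Registers
Authors: Parish A
Authors: Parish B
Notes: ignored"""
print(parse_pasted_source(pasted))
```

Skipping unknown labels keeps the sketch safe for messy pastes; a real Gramplet would probably let the user confirm the mapping before creating any objects.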

I’m just as lazy as you are, so I don’t like to use another program, even though I tested some.

P.S. When you install RootsMagic, you get a separate location database that lives outside your tree.

P.P.S. Can you give some examples of the sources that you use?

PLegoux · November 2, 2022, 8:54pm

I don’t know whether there is a FamilySearch (FS) API to get their citations, but you could take a look at that code to see whether it uses one (it replicates some of the functionality of Legacy 9):

From this post on the Gramps Geneanet Forum [fr] (which includes some screenshots):

Note: it works only on Linux platforms; it needs the asyncio Python library, which is unavailable in the Windows version of Gramps.

ennoborg · November 2, 2022, 10:26pm

I know what it is, because that interface is also used by RootsMagic, and the only things that you get are a title, a formatted citation, and a URL, which are written to TITL, AUTH, and PUBL. That means that everything goes into the source object, and the citation fields are always empty.

You can also see this in the GEDCOMX specification, which only mentions the bibliographic citation and no variables for film, page, item of interest, or anything else. I know that because I just checked again, and the discussion about defining variables stopped years ago.

You can see what I mean by checking a source that I just added to ‘my’ part of the tree:

When you look at that page and change the language using the globe symbol at the top of the page, you will see that the title changes, and so do the labels before the source data, but the long reference note stays the same. I checked that for French, English, German, and Italian, and no matter which of these languages you choose, the collection title is always Dutch and the rest of the text mostly English, with the archive name in Dutch at the end. You can also see that the collection information right above the long citation text does change with your language. Only when you choose Dutch is it the same as the first element of the citation.

GeorgeWilmes · November 3, 2022, 12:35am

Sure, here’s a typical example:

You can see the Repository is FamilySearch and the Call Number is 1,305,530 (the number of the microfilm roll).

I also attach a Repository Reference Note containing a link to the FamilySearch Catalog entry:

If you go there, you’ll see that it’s part of one of their “collections” (“Ontario, Roman Catholic Church records …”) but I don’t store that information. I prefer to store information about the original book that was microfilmed. Notice that I use the Author and Publisher information as it appears in the FamilySearch catalog entry. That’s the page that I would like to scrape data from. I have to be more careful about the Source Title, since a typical microfilm has several different sources on it.

A typical citation looks like this:

That shows exactly where to find the entry in the original book (if it still exists) and the attached note provides a link to the digital image (which I also have a copy of in the Media). If you go there, you’ll find yourself in the 1849 section of the book, and entry number 80 is the second one on the left.

I don’t use the citation strings that FamilySearch generates.

ennoborg · November 3, 2022, 1:36pm

OK, thanks. I see that you treat a single film/volume as a source, which is something that I don’t do. When I use the Amsterdam civil registry, or just the Amsterdam death register, I treat that as a source, and put film, volume, page, and record numbers in the citation.

And it looks like I need to rephrase my question a bit, because I really wanted to know which external sources you use, like physical archive repositories and different web sites, because that helps us to get an idea of the different fields that they present, how they could be stored in Gramps, and whether the sites have some form of metadata that we can use.

I know FamilySearch quite well, for Dutch sources, and I can compare how they’re presented with similar sources on Wie Was Wie, or openarch.nl, but I don’t use much from Ancestry, except for the 1950 US census, because it’s free, and I have some emigrant relatives in that.

Can you tell how you use Ancestry, or other American sites that I may not be aware of?

GeorgeWilmes · November 3, 2022, 2:29pm

No, actually I treat each “item” (original physical book) on the film as a separate source. (An ability to nest sources would be nice!) I realize my approach may be somewhat extreme compared to others. I based it on trial and error to make the endnotes in reports look the way I wanted them to (not wanting to attempt changing the code myself). Depending on how Sources and Citations evolve, I’ll certainly consider changing the way I do things.

I use Ancestry only when they have something I can’t get on FamilySearch or other places. (I don’t have a personal subscription to Ancestry; I go to the library when I need to use it.)

In case it helps, here are some of the North American repositories that I use, that might also be used by others:

Name	Home URL
Allen County Public Library	https://acpl.lib.in.us/
America’s GenealogyBank	https://infoweb.newsbank.com/gbnl/
Canadiana Online	canadiana.ca
Chronicling America	https://chroniclingamerica.loc.gov/
Community History Archive	https://directory.advantage-preservation.com/SiteDirectory
FOLD3	https://www.fold3.com
Heritage Hub	User account
HeritageQuest	http://www.heritagequestonline.com
Library and Archives Canada	https://www.bac-lac.gc.ca/eng/pages/home.aspx
Library of Congress	loc.gov
National Archives and Records Administration	http://archives.gov
National Cemetery Administration, U.S. Department of Veterans Affairs	cem.va.gov
NewsBank	newsbank.com
Newspaper Archive	http://newspaperarchive.com
Newspapers.com	newspapers.com
Old Fulton NY Post Cards [and Newspapers]	Old Fulton New York Post Cards
Our Ontario	ourontario.ca
The Newberry	http://www.newberry.org
U.S. Department of the Interior, Bureau of Land Management, General Land Office Records	http://glorecords.blm.gov
United States Patent and Trademark Office	uspto.gov

ennoborg · November 3, 2022, 4:40pm

OK, thanks for that list. I hope it helps me and others to get an idea of the types of elements that exist in the wild.

I have a free account on Ancestry, and just attached one 1950 census hint to a cousin, which creates a citation like this:

And I assume that these fields will all show in the GEDCOM file when I download that from the site. And if that’s the case, I will probably edit the citation part, to remove everything before Record Group. And I hope that will look good in the endnotes.

Another nice thing that Ancestry does is that it shows the facts that the citation is linked to, and these facts can also be used to support the discussions on working with evidence. See below:

ennoborg · November 3, 2022, 9:15pm

And I hope that this discussion helps the evolution a bit, especially because it had become quite difficult without examples. For me, supporting hundreds of templates, like we had in an earlier enhancement proposal inspired by the templates made by John Yates, would still be something of a showstopper, because that many templates would be quite difficult to translate, and you’d need a wizard to pick the right one.

So, here’s another question: How many levels of nesting would you need, when you think of the sources that you see in the wild? I ask that, because in the sources that I work with, a total of 4 levels would be enough, and that can also be realized with a flat model, if you want.

I write this, because the only extra level that I really want is a collection of books. And that need is inspired by the way the Dutch archives work, and the similarities that I see with archives in England and Germany.

I need that level because here, in most archives, all church books have been removed from their original fonds and rearranged into a collection of DTB books, where DTB stands for Doop (Baptism), Trouw (Marriage), and Begraven (Burial). That means that every archive repository has an archive collection (inventory) in which the DTB books are numbered consecutively from 1, even though they are actually grouped by event type and church, so there is a deeper hierarchy behind that numbering. That hierarchy is effectively flattened, because each book has a unique number, no matter where it sits in the hierarchy. It has a path too, for those who are interested, but most old-fashioned Dutch genealogists just record the DTB number.

This system works, because the DTB collection itself has a call number that is unique for the repository. So for example, in Amsterdam, the DTB books have call number 5001, and book number 137 has the baptisms of the English Presbyterian Church between 1607 and 1811.

For this book, the official path is

https://archief.amsterdam/archief/5001/137

but when you click that, it redirects to

https://archief.amsterdam/inventarissen/details/5001/path/1.1.12

and I pasted this as code to prevent Discourse from following that link. It shows that the actual book sits 3 levels deep in the 5001 inventory, but that it can be found by its unique ID.
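The short form of that link only needs the two numbers. This is a minimal sketch, assuming (from this single example only) that any item in any inventory resolves at `/archief/<collection>/<item>`; the function name `archive_permalink` is hypothetical, and the redirect to the hierarchical `/inventarissen/details/...` form is left to the web site.

```python
def archive_permalink(collection: int, item: int,
                      base: str = "https://archief.amsterdam/archief") -> str:
    """Build a link from the collection call number and the unique item number.

    Assumption: the archive accepts /archief/<collection>/<item> for any item,
    as it does for 5001/137; the site itself redirects to the path form.
    """
    return f"{base}/{collection}/{item}"


# DTB collection 5001, book 137 (baptisms, English Presbyterian Church).
print(archive_permalink(5001, 137))
# prints: https://archief.amsterdam/archief/5001/137
```

Storing only the two numbers, rather than the redirected path, keeps the citation stable if the archive reorganises its inventory tree.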

In a similar fashion, documents made by the Amstel Beer Brewery, where my grandfather once worked, can be found 5 levels deep inside the Heineken inventory, where they also have a unique ID, so that they can be found using only two numbers.

For me this means that a flat model with distinct variables for the top and bottom items, i.e. for the collection (inventory) and the actual document, in which I can store a name and a number for both, would already solve a lot of problems.
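The flat model described above could look roughly like this. It is a hypothetical sketch, not an existing Gramps object: the class and field names are made up for illustration, and it assumes, as for the Amsterdam DTB books, that the item number is unique within its collection, so the levels in between need not be stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlatArchiveReference:
    """Hypothetical flat citation model: only the top and bottom levels.

    Intermediate levels (e.g. path 1.1.12 in inventory 5001) are not stored,
    on the assumption that the item number is unique within the collection.
    """
    repository: str
    collection_number: str
    collection_name: str
    item_number: str
    item_name: str
    page: Optional[str] = None

    def short_form(self) -> str:
        # Collection and item numbers alone are enough to find the document.
        text = (f"{self.repository}, collection {self.collection_number}, "
                f"item {self.item_number}")
        if self.page:
            text += f", page {self.page}"
        return text


book = FlatArchiveReference(
    repository="Amsterdam City Archives",
    collection_number="5001",
    collection_name="DTB books",
    item_number="137",
    item_name="Baptisms, English Presbyterian Church, 1607-1811",
)
print(book.short_form())
# prints: Amsterdam City Archives, collection 5001, item 137
```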

The only exception to that is when such a book is indexed into a database, and you want to cite both the indexed item and its origin, in a layered citation.

Nick-Hall · November 3, 2022, 10:05pm

A catalogue hierarchy is different from the citation layers that I described earlier.

If you went to an archive and cited an actual document you had seen, then your citation would consist of a single layer, but the document may be several layers deep in a catalogue hierarchy. If you viewed a microfilm then this would add an extra citation layer, and an online database of the microfilm would add yet another layer.
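To make the distinction concrete: the number of citation layers depends on what you actually viewed, not on how deep the document sits in the catalogue. This is a minimal sketch, assuming layers are plain strings listed from what was viewed back to the original. The function name is hypothetical, and the "; citing " separator is modelled loosely on the layered-citation style of Evidence Explained, not on anything in Gramps.

```python
from typing import List


def layered_citation(layers: List[str]) -> str:
    """Join citation layers, starting with what the researcher actually viewed.

    Each later layer is the source that the previous one reproduces, e.g.
    online database -> microfilm -> original document.
    """
    if not layers:
        raise ValueError("a citation needs at least one layer")
    return "; citing ".join(layers)


# Seen in the archive: one layer, however deep it sits in the catalogue.
print(layered_citation(["Original register, entry 80"]))

# Seen in an online database of a microfilm of that register: three layers.
print(layered_citation([
    "Online database image",
    "Microfilm of the register",
    "Original register, entry 80",
]))
```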

We could allow users to construct catalogue hierarchies within Gramps, which could be useful for organising sources. If an external archive had an online catalogue that conformed to the ISAD(G) standard, we might even be able to import the catalogue, or the parts of it that interest a user.
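A user-built catalogue hierarchy is essentially a tree in which each node knows its parent. This is a hypothetical sketch: the class is not part of Gramps, the `level` values borrow ISAD(G) level-of-description names (fonds, series, file, item), and the references and titles in the example are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CatalogueNode:
    """Hypothetical node in a user-built catalogue hierarchy."""
    level: str       # e.g. "fonds", "series", "file", "item"
    reference: str   # the archive's own number at this level
    title: str
    parent: Optional[CatalogueNode] = None

    def path(self) -> List[CatalogueNode]:
        # Walk up to the root, then reverse so the fonds comes first.
        nodes: List[CatalogueNode] = []
        node: Optional[CatalogueNode] = self
        while node is not None:
            nodes.append(node)
            node = node.parent
        return list(reversed(nodes))

    def reference_code(self, sep: str = "/") -> str:
        # Join the references from the root down to this node.
        return sep.join(n.reference for n in self.path())


fonds = CatalogueNode("fonds", "F1", "Example fonds")
series = CatalogueNode("series", "S2", "Example series", fonds)
item = CatalogueNode("item", "137", "Example item", series)
print(item.reference_code())
# prints: F1/S2/137
```

A citation would still point only at `item`; the tree just helps organise sources, which is the difference from citation layers.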

GeorgeWilmes · November 3, 2022, 11:09pm

I consider the Record Group number to be repository reference information, not source information. It’s where you can find the source within the repository’s catalog (the National Archives).

I currently keep a separate source for each Enumeration District (within each county, within each state – which is why multiple levels of sources would be nice) because each one was originally a separate document created by people who went door-to-door within a district. (More recent censuses, in which residents mailed in forms, or now submit something online, are a different matter, but I probably won’t be around to deal with them when they are released).

I’ll have to think about that. In my census example, the ideal source hierarchy would partly mirror my place hierarchy. I don’t currently create a place for each enumeration district (the boundaries can change for each census) but county and state are more stable as places.

Yes, but in the case of the FamilySearch “collection” of Ontario Roman Catholic church records in the example above, the original records are still at the individual parishes (at least, that’s what it says in the catalog entry for my example). In other cases they might be in diocesan archives, but my point here is that a FamilySearch “collection” may be only a logical collection of their creation, not something that actually exists anywhere.

As a lazy user, I would much prefer to import than to create, or better yet to somehow link to it.

ennoborg · November 4, 2022, 10:50am

Well, I can give you an answer to your last sentence, because I visited the Family History Library in Salt Lake City in 2007, where I saw the collection in the form of a large number of drawers with films. This means that it actually exists, in several places, because the originals are stored in the Granite Mountain Vault in Little Cottonwood Canyon, east of the city.

I went there because access to these films is restricted in Europe, and I preferred to look at them myself rather than depend on a German archivist to look at them for me. Here, I’m referring to the first film listed here:

That was 2007, but even today, I can’t see any of these images on-line, but maybe you can.

GeorgeWilmes · November 4, 2022, 1:11pm

No, I don’t mean the original microfilms, I mean the original paper records. In my example, the catalog entry says “original records in possession of the Purification of the Blessed Virgin Mary Rectory, Lindsay, Ontario.” I like to know where those are, in case some portions didn’t get microfilmed.

Yes, that catalog entry says “Access in Germany limited to members of The Church of Jesus Christ of Latter-day Saints. No circulation to all other family history centers in Europe including the United Kingdom, Norway, Sweden, Finland and Iceland.” And I cannot access it from home here in the US, though I might be able to from an affiliate library.

Also note that the catalog entry says that some of those records are not the originals either: “Early records reconstructed after destruction in 1765.”

ennoborg · November 4, 2022, 1:45pm

I know that, and I looked at them, but I didn’t find my ancestor in it, because I was unable to read that old German script, so in that respect, my trip to Utah didn’t help much, although I enjoyed the scenery.

And later on, a friend who is a pastor in the area helped me get in touch with the archive, so now I have a color scan of a page in that book that has my ancestor on it. When I saw that, I realized that I had seen it before, because another friend in San Diego had sent me a copy of that page, but she misread it, and so did I. That is sad, because it did have what my father was looking for; I got it while he was still alive and passed it on to him, but we were both unaware of its value at that time.

Today, I can see the scans of the real book on Archion.de, when I pay, so I have those too. And I got some scans from the film too, because they were sent to me by a friend in New York.

To summarize this, there are at least 4 ways to cite this source:

As a page on that film, which I can cite by its number and its repository,
As a page in the book, as it’s held by the Lutheran archive in Bielefeld,
As a scan found on Archion.de, with the path needed to find it on that site,
As a document that I got mailed by the chief archivist, with explanation.

And in this case, there are several choices for layered citations too, when I want, but I can also use a template which has the layers built-in. Examples of those can be found here:

https://www.evidenceexplained.com/index.php/content/sample-quickcheck-models

And to me this means that there are still a lot of choices available, and the real question is how much support we want for these in Gramps, and how much of that can be built in a reasonable time, and can be created in such a way that it appeals to us, and fellow users.

And that’s the whole point.

ennoborg · November 4, 2022, 7:58pm

When I think about this, and about layers too, I think it’s a good idea, because it’s more flexible than our current model, and because it follows the structures of the archives that I use and the model behind GEDCOMX.

And if possible, I would also like to combine this with the idea that authors may be people, or other types of entities, like churches, or government offices, which can make use of our locations table, just like persons do.

There is one possible issue here: GEDCOMX has no definitions for citation elements yet, so we must either define our own, re-use the ones that already exist in GEDCOM 5.5.1 or 7, or adopt CSL variables where appropriate.

Do you know any archive that has such a catalogue? I Googled this, and it seems that some Dutch archives have an API, but what they actually present is a local standard, promoted by my fellow countryman Bob Coret. That’s the A2A standard that you can see in the page source of his Open Archives site (openarch.nl), but that standard does not seem to support hierarchies or layers. Also, a large part of its documentation has disappeared from the web.

And a new Google search today reveals that this so-called A2A is nothing like the international A2A (archive-to-archive) standard, for which I can find lots of documents, and which does support hierarchies.

Nick-Hall · November 4, 2022, 9:30pm

The Discovery online catalogue from The National Archives in the UK is ISAD(G) compliant. I think the results are returned in EAD XML.

See also the Adoption section in the ISAD(G) wiki page.

I was just pointing out that if an archive catalogue followed such a standard then it would be fairly easy to import entries. There are likely to be usage limits, so I’m not sure how it would work on a practical level.
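As an illustration of why a standard catalogue format would make importing easy, here is a sketch that walks nested EAD-style components with the standard library. It is an assumption-laden example: the XML fragment is hypothetical, real EAD files normally declare an XML namespace (left out here), the sketch only handles unnumbered `<c>` components (not `<c01>`, `<c02>`, ...), and nothing here is known to match what Discovery actually returns.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Hypothetical, namespace-free EAD-style fragment: nested <c> components,
# each with a level attribute and a <did> holding <unitid> and <unittitle>.
SAMPLE = """
<dsc>
  <c level="series">
    <did><unitid>A</unitid><unittitle>Series A</unittitle></did>
    <c level="file">
      <did><unitid>A/1</unitid><unittitle>File A/1</unittitle></did>
      <c level="item">
        <did><unitid>A/1/3</unitid><unittitle>Item A/1/3</unittitle></did>
      </c>
    </c>
  </c>
</dsc>
"""


def walk(component: ET.Element, depth: int = 0) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (depth, level, unitid, title) for each nested <c> component."""
    for child in component.findall("c"):
        did = child.find("did")
        unitid = did.findtext("unitid", default="") if did is not None else ""
        title = did.findtext("unittitle", default="") if did is not None else ""
        yield depth, child.get("level", ""), unitid, title
        yield from walk(child, depth + 1)


root = ET.fromstring(SAMPLE.strip())
for depth, level, unitid, title in walk(root):
    print("  " * depth + f"{level}: {unitid} {title}")
```

Each yielded tuple carries enough to rebuild the tree inside Gramps; the usage limits mentioned above would still apply to fetching the XML.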

ennoborg · November 6, 2022, 3:32pm

OK, I understand, and today I found a message in a local genealogy software group saying that a fellow Dutchman has created a plug-in for TNG that reads the Dutch Open Archives API:

https://tng.lythgoes.net/wiki/index.php/OpenArch

Like I wrote earlier, I don’t think that this API is ISAD(G) compliant, but it does show what can be done.

Nick-Hall · November 6, 2022, 7:47pm

I’ve updated my pull request.

The plugins can now define citation variables which can be entered using the citation editor. I’ve created a simple example plugin called “Template” which defines the variables page, url and accessed. A formatted citation of the form <page> (<url> accessed <accessed>) is generated. It is also possible to open the url in a web browser from within the editor.
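The format string quoted above can be filled in with a simple placeholder substitution. This is a minimal sketch, not the code from the pull request: it assumes `<name>` always marks a variable, leaves unknown names untouched, and uses a placeholder URL.

```python
import re
from typing import Dict

# The format string quoted in the post; <name> marks a citation variable.
TEMPLATE = "<page> (<url> accessed <accessed>)"
PLACEHOLDER = re.compile(r"<(\w+)>")


def format_citation(template: str, variables: Dict[str, str]) -> str:
    """Replace each <name> with its value; unknown names are left as-is."""
    return PLACEHOLDER.sub(
        lambda m: variables.get(m.group(1), m.group(0)), template)


print(format_citation(TEMPLATE, {
    "page": "entry 80",
    "url": "https://example.org/image/123",
    "accessed": "3 November 2022",
}))
# prints: entry 80 (https://example.org/image/123 accessed 3 November 2022)
```

A real template would probably also need to drop the whole "(... accessed ...)" phrase when `url` is empty, which this sketch does not attempt.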

At the moment I store the variables as a comma separated list of key: value pairs. This conforms to the Gedcom standard, but if we go down this route we will need to think about how to handle variables that contain commas and colons.
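One common way to handle commas and colons inside values is backslash escaping. The sketch below is a hypothetical version of that idea, not what the pull request does; the function names are made up for illustration, and it does not preserve whitespace around keys or values.

```python
from typing import Dict, List


def encode_variables(variables: Dict[str, str]) -> str:
    """Serialise to 'key: value, key: value', backslash-escaping \\ , and :."""
    def escape(text: str) -> str:
        return (text.replace("\\", "\\\\")
                    .replace(",", "\\,")
                    .replace(":", "\\:"))
    return ", ".join(f"{escape(k)}: {escape(v)}" for k, v in variables.items())


def _split_unescaped(text: str, sep: str) -> List[str]:
    """Split on `sep` wherever it is not part of a backslash escape."""
    parts: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape for _unescape
            i += 2
            continue
        if text[i] == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(text[i])
        i += 1
    parts.append("".join(current))
    return parts


def _unescape(text: str) -> str:
    out: List[str] = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def decode_variables(text: str) -> Dict[str, str]:
    """Inverse of encode_variables; raises ValueError on a malformed pair."""
    result: Dict[str, str] = {}
    if not text.strip():
        return result
    for pair in _split_unescaped(text, ","):
        # Each pair must contain exactly one unescaped colon.
        key, value = _split_unescaped(pair, ":")
        result[_unescape(key.strip())] = _unescape(value.strip())
    return result


stored = encode_variables({"url": "https://example.org/a,b",
                           "accessed": "3 Nov 2022, 10:15"})
print(decode_variables(stored))
```

Without the escaping, a value such as "3 Nov 2022, 10:15" would be split into two malformed pairs.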
