Sources and Citations: some thoughts

The Gedcom standard handles simple citations quite well.

For published material, the source record contains fields for author, title, abbreviation and publication information. The source-citation allows any key-value pairs to be stored in the PAGE tag. This makes citations quite flexible.

For unpublished material, such as documents and artifacts, we can store a catalogue reference (call number) in a structure that points to a repository record. The repository is the physical location, such as an archive, where the source can be found.

Where it doesn’t work quite as well is for layered citations. There is a good article called Layered Citations Work Like Layered Clothing on the Evidence Explained website. I also found a video if anyone wants to look into this further. The same issue applies with applications such as Zotero.

In Gramps, we introduced an extra layer to handle large sources. This was a fixed two layer Citation-Source solution. It doesn’t help with three layer (or more) citations, and doesn’t provide a neat way of recording single layer citations.

A solution to this would be to allow multiple layers of sources. A citation would become a source layer. Each layer should be able to store key-value pairs to use in a formatted citation.

Such a solution would obviously need a different way of producing formatted citations. At the moment, we only use the Gedcom fields. Creating citation formatter plugins would seem like the way to go here. We could write a default formatter for Gedcom fields, another that uses templates, and perhaps another that generates Chicago Manual of Style compatible citations. I have already experimented with this and it works quite well.
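To make the plugin idea a bit more concrete, here is a minimal sketch in Python. It is not the experimental code mentioned above: the registry, the `register` decorator, and the `template` key are all hypothetical names. Only the `AUTH`, `TITL`, `PUBL` and `PAGE` tags come from Gedcom. Each source layer is assumed to be a plain dictionary of key-value pairs, outermost layer first.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A layer is a bag of key-value pairs; the outermost layer comes first.
Layers = List[Dict[str, str]]
Formatter = Callable[[Layers], str]

# Hypothetical registry: formatter name -> formatter function.
FORMATTERS: Dict[str, Formatter] = {}


def register(name: str) -> Callable[[Formatter], Formatter]:
    """Decorator that adds a formatter to the registry under `name`."""
    def decorator(func: Formatter) -> Formatter:
        FORMATTERS[name] = func
        return func
    return decorator


def merge(layers: Layers) -> Dict[str, str]:
    """Flatten the layers; inner layers override outer ones."""
    merged: Dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    return merged


@register("gedcom")
def format_gedcom(layers: Layers) -> str:
    """Default formatter: author, title, publication info, then page."""
    merged = merge(layers)
    parts = [merged.get(key, "") for key in ("AUTH", "TITL", "PUBL", "PAGE")]
    return ", ".join(part for part in parts if part)


@register("template")
def format_template(layers: Layers) -> str:
    """Template formatter: fills a user-supplied template string."""
    merged = merge(layers)
    template = merged.get("template", "{TITL}")
    # Missing keys are rendered as empty strings instead of raising KeyError.
    return template.format_map(defaultdict(str, merged))


example_layers = [
    {"AUTH": "Jane Doe", "TITL": "Parish Registers", "PUBL": "London, 1900"},
    {"PAGE": "p. 12", "template": "{AUTH}, {TITL} ({PUBL}), {PAGE}"},
]
gedcom_text = FORMATTERS["gedcom"](example_layers)
template_text = FORMATTERS["template"](example_layers)
```

A Chicago or CSL formatter would then simply be one more function registered under its own name.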

I really like the idea of a citation formatting service with plugins for different styles.

I like that idea, in particular the template one. My sources and my citations are standardized (my own standard :wink: ), and a template that could recognize fields and dispatch them the way I want would be very cool.

Something like in some reports, where we could choose fields with %something (%a, %A, ...), or using regex.

For example:

That citation of a civil register (the 2nd source above the citation) if it were transformed into a 3rd layer source, could look like the first source line, which is that of a death certificate among the civil registers:

image

Using something like a regex and a %field like this:

  • from source title: ^(.*, )
  • from citation date: %d
  • from citation page: ^\d{1,4}\. Acte de (.*)\. (.*)$

which could be reformatted with a formula like this one: S:($1), C:($1). C:%d; C:($2); C:($3)

would make it possible to obtain this 3rd layer source title, similar to the blue line on the picture:
FR-93053, Noisy-le-Sec. Etat civil, Acte. 1917-06-08; décès, n° 230; Legoux, Pierre Paul Eléonore
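As a rough illustration of the regex-plus-%field idea, here is a small Python sketch that produces that title. The screenshot is not reproduced here, so the source title and citation page values are hypothetical, and the page pattern is an assumed variant of the one listed above that also captures the entry number and the word "Acte", so that every piece of the target line comes from a capture group. The function name `build_layer_title` is made up for this example.

```python
import re
from typing import Optional

# Greedy: everything up to and including the last ", " of the source title.
SOURCE_TITLE_RE = re.compile(r"^(.*, )")
# Assumed variant of the page pattern: number, "Acte", event type, person.
CITATION_PAGE_RE = re.compile(r"^(\d{1,4})\. (Acte) de (.*)\. (.*)$")


def build_layer_title(source_title: str, citation_date: str,
                      citation_page: str) -> Optional[str]:
    """Combine source and citation fields into one third-layer title.

    Returns None when either field does not match its pattern.
    """
    source_match = SOURCE_TITLE_RE.match(source_title)
    page_match = CITATION_PAGE_RE.match(citation_page)
    if source_match is None or page_match is None:
        return None
    number, kind, event, person = page_match.groups()
    prefix = source_match.group(1)
    return f"{prefix}{kind}. {citation_date}; {event}, n° {number}; {person}"


# Hypothetical field values, chosen to match the example line above.
title = build_layer_title(
    source_title="FR-93053, Noisy-le-Sec. Etat civil, registres",
    citation_date="1917-06-08",
    citation_page="230. Acte de décès. Legoux, Pierre Paul Eléonore",
)
```

Returning None for non-matching fields means a citation that does not follow the personal standard is simply left alone rather than mangled.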

Interesting idea.

Would the connection from source to source be one-to-many (basically having a Source with parent_source_handle) like with citations now or many-to-many (basically having a source_ref_list) like for places?

I would prefer the first option as it’s much simpler.

Do we still need repositories then? Aren’t they just top-level sources after all?

It’s been a while since ESM asked her publisher to give me a free digital copy of her book, and since I lost it in the dungeons of Adobe’s software libraries years ago, I can’t read it anymore, but I know the article, because I’ve seen it before.

I also think that I didn’t fully migrate to Gramps until the large sources change was ready, because I was already dealing with large sources in PAF, and this change made Gramps powerful enough to convert to. I mean, this is what GEDCOM 5.5.1 advised long ago, and Gramps added the feature that GEDCOM SOURCE_CITATIONS have their own ID, and can be shared, which is still quite unique.

Anyway, if you ignore the layered citation idea for a while, or allow for one citation object to reference another, as a source, or a type of nesting, when you really want to, like in DeadEnds, and GEDCOM X, and read a few QuickCheck models, you will soon see that all citations can be stored in a single table with less than a dozen columns. The maximum number of columns that I counted in today’s sample was 11. And in fact, that number 11 came from an example where the 11th column addressed a layered citation in a single field.

This also means that all fields that you need to generate a full reference note can be stored in a single reference object, which holds the attributes that GEDCOM followers tend to store in the citation, source, and repository objects, which means that we really don’t need those. They’re nice for grouping purposes, but this grouping can be automated, just like Gramps does this for person names.

This is what scientists do, when they use a reference manager like Zotero, which seems to use a flat table with attachments, or something like that. It is also what FamilySearch does, as far as I can check, because the whole API that they offer is based on GEDCOM X, a standard that all client programs need to use, and which has no source, citation, and repository layers. Instead, they have SourceDescription objects, that can use SourceReference objects for layers and hierarchies. And they have a SourceCitation object that was defined as a container for meta data.

And alas, the contents of that container were never expanded to include the citation elements that we find in CSL and similar standards, and Zotero.

One-to-many with the Source object having a parent_source_handle.
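A minimal sketch of what that could look like, assuming a simplified `Source` class rather than the real Gramps object. Only `parent_source_handle` is taken from this discussion; the other field names, the `source_chain` helper, and the handles are hypothetical. The example data follows the periodical / volume / page idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Source:
    handle: str
    title: str
    # None marks a top-level source; otherwise the handle of the parent layer.
    parent_source_handle: Optional[str] = None
    variables: Dict[str, str] = field(default_factory=dict)


def source_chain(handle: str, db: Dict[str, Source]) -> List[Source]:
    """Return the layers from the top-level source down to `handle`."""
    chain: List[Source] = []
    seen = set()
    current: Optional[str] = handle
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at source {current!r}")
        seen.add(current)
        source = db[current]
        chain.append(source)
        current = source.parent_source_handle
    chain.reverse()  # outermost layer first
    return chain


# Hypothetical three-layer example: periodical -> volume -> page.
db = {
    "S1": Source("S1", "Annual Register"),
    "S2": Source("S2", "Vol. 12", "S1", {"volume": "12"}),
    "S3": Source("S3", "p. 345", "S2", {"page": "345"}),
}
titles = [source.title for source in source_chain("S3", db)]
```

Because each source has at most one parent, walking up the chain is a simple loop. A many-to-many design would need a graph traversal instead.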

We would probably keep repositories for Gedcom compatibility. However, we could allow storing repository references as variables in the top-level source as an alternative.

There is also the problem that Gedcom allows a source to point to more than one repository.

And this is a useful feature.

I have family history books that are in my personal collection, the Family association’s collection, the local public library & the US Library of Congress. Being able to easily find call numbers for each or determine which is the nearest repository with all the required books is great.

Yes, it would be possible to store all the reference fields in a single Source object. The user could choose to use CSL variables and we could provide a CSL formatting plugin. An import from Zotero may also be possible.

Other users may prefer a two level solution as we have at present. More levels would also be possible.

I also use multiple repository entries for a multi volume source. One for each PDF download.

Everyone approaches cataloging multi-volume assets differently.

The Cooperative Computer Services (CCS) (serving a consortium of 28 libraries) wiki notes Multivolume call numbers raise several issues.
The Library of Congress site mentions other points.

The major consideration was how the NarrWeb handled multi volume sources.

In the example I attached, the four-volume source has continual page numbering through the four volumes. And so far, four volumes have been the largest set. So those represent a single source.

There is another source that publishes a volume once a year and has been doing so starting in 1847 to the present day. My intention was to have a single source with the citations Vol:## Page:## etc. I attached the first hundred years of volumes as PDFs to the source.

The problem was when I sent my first NarrWeb to a cousin. At the time, only a few of the volumes were referenced for my cousin’s tree. But because all 100 volumes were attached to the source, they were all sent to the cousin. I soon broke each volume into its own source and moved and reworked the citations to their correct source.

Unless there is a strong need to keep multi volume sources as a single source, my preference is to create a single source by volume.

When I have a source where I have the book as a PDF, I write the citation as Page:## (pdf:##), meaning the page number of the book (the source) and the PDF page number. But be careful. I occasionally find a better PDF copy of the book, so I make the switch. When this happens, I have to confirm that the PDF page numbers of the citations still match up with the new PDF.

With a three level source you could have a periodical as the top level, volume as the second level and a page citation at the third level.

But in current Gramps, that doesn’t work well, because the repository is ignored in most reports. For that reason, I always put the name of the periodical, or a site name like Wikipedia, in the source, and volume, page, and/or article title in the page field of the citation.

I only add a repository, when the title is not well known, and/or impossible to find without that.

And now that I think of it, I would love to have a separate (and official) URL in the citation. I know that I can put that in an attribute, but there it’s not clickable, so most of the time, I put it in the citation note, and make sure that notes are visible in the bottom pane.

And for some odd reason, the WWW tag, which is available for addresses, and repositories, does not exist for sources and citations, so exporting that would be against GEDCOM rules.

I tried putting a citation URL in an attribute but found that they are not exported in a gedcom. The only way I could get it to be included was to tack it onto the end of the volume/page field. I agree with you, there should be a better way to handle the citation URL. Putting it into a note does not work well for uploads to WikiTree.

Perhaps there could be a URL variation of the Source/Citation combination that passes a “unique identifier” parameter from the Vol/Page field into the Source string.

Say that one of your sources was a wiki page on the Gramps website:
https://gramps-project.org/wiki/index.php/Addon:DataEntryGramplet#Usage

The source would have a URL source: “https://gramps-project.org/wiki/index.php/<cit:vol_page>”
and the UID Citation would have a Vol/Page of “Addon:DataEntryGramplet#Usage”

This would let us handle dynamic content linkrot by changing a single URL.

So example source/citation combinations might be:

  • WikiTree :
    “https://www.wikitree.com/wiki/<cit:vol_page>” ;
    “Washington-11”
  • FamilySearch (sources) :
    “https://www.familysearch.org/tree/person/sources/<cit:vol_page>” ;
    “KNDX-MKG”
  • FamilySearch (person) :
    “https://www.familysearch.org/tree/person/details/<cit:vol_page>” ;
    “KNDX-MKG”
  • FindAGrave :
    “https://www.findagrave.com/memorial/<cit:vol_page>” ;
    “1075/george-washington”
  • Wikipedia (english) :
    “https://en.wikipedia.org/wiki/<cit:vol_page>” ;
    “George_Washington”
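The parameter passing in these examples could be sketched in Python like this. The `<cit:vol_page>` placeholder is the one used in the list; the function name `build_url` and the choice to percent-encode the value are assumptions for illustration, not an existing Gramps or WebConnection API.

```python
from urllib.parse import quote

PLACEHOLDER = "<cit:vol_page>"


def build_url(source_url: str, vol_page: str) -> str:
    """Insert the citation's Vol/Page value into the source URL template."""
    if PLACEHOLDER not in source_url:
        raise ValueError(f"source URL has no {PLACEHOLDER} placeholder")
    # Keep '/', ':' and '#' as-is so paths and fragments survive;
    # anything else that is unsafe in a URL (spaces, for instance) is encoded.
    return source_url.replace(PLACEHOLDER, quote(vol_page, safe="/:#"))


wiki_url = build_url(
    "https://gramps-project.org/wiki/index.php/<cit:vol_page>",
    "Addon:DataEntryGramplet#Usage",
)
wikitree_url = build_url(
    "https://www.wikitree.com/wiki/<cit:vol_page>",
    "Washington-11",
)
```

If a site changes its URL scheme, only the template stored in the source needs to change. The citations keep their identifiers.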

Obviously, that parameter passing pattern is an abstraction. But whatever was adopted for the core could be used to make the WebConnection Add-on.

Maybe it could follow the Report template RegEx parsing as described by @PLegoux above?

Indeed, just the other day I implemented editing/adding URLs in Gramps Web and was really puzzled realizing repositories have a urls attribute while sources and citations don’t. I often have a website as repository (say, Wikipedia) and the specific page as a source or citation.

The URL I’m recording is a specific URL to the document on the site.
For example, you click on the “shared” icon in Ancestry and “copy link”. This gives you a long string which includes a token.
It should be clickable in Gramps to take you to the document.
Ancestry has just made some changes to the URL format.
What is implemented has to be generic so it works with all sites, because they all may be different.

Right, that’s very close to what can work for many links, except for two of the items in your example:

  • For persons, FamilySearch ID’s are already stored in attributes, so you don’t need a citation for those. The attribute is filled automatically when you import a GEDCOM file from Ancestral Quest, Legacy, RootsMagic, and other known FamilySearch clients, including the getmyancestors program.
  • People may use Wikipedia in different languages, and forget to add the proper qualifiers, or the underscores. I know that I forget both.

Also, for FamilySearch sources, those ID’s don’t look that good in a report, so for those I also think that the ID’s can be better stored in an attribute.

I will try again: supporting the CSL Citation Style Language will do ALL of what’s needed in Gramps.

CSL supports custom templates, so someone could create a CSL Gramps Genealogy Template and submit it to the CSL team.
CSL even supports translations of the template as separate translation files.

If using CSL, you can easily use a single text field for a simple or advanced Citation string for those who want to use that (as in Gramps today), and you can build a fully key-based citation and bibliography based on “predefined” or custom “citation attributes”.
e.g.,

(In Gramps attributes) vs. (CSL)
Cit_Page_Numbers = Number of pages
Cit_Author = Author
Cit_Author_2 = Author 2

etc. etc.
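To give an idea of the mapping, here is a small Python sketch that turns such attributes into a CSL-JSON item. The `Cit_*` attribute names are only the illustrative ones from the list above (plus a hypothetical `Cit_Title`), and the mapping table and `to_csl_item` function are assumptions, not an existing Gramps feature. The CSL variable names used (`title`, `author`, `number-of-pages`) and the `family`/`given`/`literal` name fields are standard CSL-JSON.

```python
import json
from typing import Dict, List

# Hypothetical mapping from Gramps attribute names to CSL variable names.
ATTRIBUTE_TO_CSL = {
    "Cit_Title": "title",
    "Cit_Page_Numbers": "number-of-pages",
}
# Attributes that each hold one author, written as "Family, Given".
AUTHOR_ATTRIBUTES = ("Cit_Author", "Cit_Author_2")


def to_csl_item(item_id: str, item_type: str,
                attributes: Dict[str, str]) -> Dict[str, object]:
    """Build a CSL-JSON item from a flat dictionary of attributes."""
    item: Dict[str, object] = {"id": item_id, "type": item_type}
    for attribute, variable in ATTRIBUTE_TO_CSL.items():
        if attribute in attributes:
            item[variable] = attributes[attribute]
    authors: List[Dict[str, str]] = []
    for attribute in AUTHOR_ATTRIBUTES:
        if attribute in attributes:
            family, _, given = attributes[attribute].partition(", ")
            if given:
                authors.append({"family": family, "given": given})
            else:
                # Names that cannot be split are passed through unparsed.
                authors.append({"literal": family})
    if authors:
        item["author"] = authors
    return item


# Hypothetical example data.
item = to_csl_item("C0001", "book", {
    "Cit_Title": "A Family History",
    "Cit_Author": "Doe, Jane",
    "Cit_Author_2": "Family Association",
    "Cit_Page_Numbers": "312",
})
csl_json = json.dumps([item], ensure_ascii=False, indent=2)
```

A list of such items is what CSL processors and tools like Zotero read and write, so the same structure could serve for both export and import.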

AND NO, CSL does not have the limitations that some people try to preach. The different software, e.g., Zotero, Citavi, JabRef etc., can have limited fields, but CSL does not; it is totally up to the template builder to define the fields/attributes of the template and how the end product of the Citation and Bibliography should look.

So if someone wants to create templates for the EE definition, no problem (and I am sure the author will help out as best she can if she sees that it is serious). It can be one template for each variant defined in the EE “framework”, or those who want to can just use any of the more than 6000 Citation styles already in the system, e.g. Chicago or APA, just by importing that template.

Yes, there will be a lot of programming, but there are already multiple libraries in Python supporting CSL, so some reuse should be possible, hopefully?

If you then create an export/import of the Citations/Bibliography that conforms to CSL (it can be in a multitude of formats: json, xml, bibtex, biblatex or even csv), Gramps suddenly has an interchangeable format.

And, if it reads and writes it to a file, it can even be interoperability friendly, using e.g., Zotero with the Better BibTeX addon or any other software that read a bibtex/csl-json on the fly, e.g. Obsidian/Zettlr/VSC etc.

But since there was one user in particular in this forum who always attacked me when I wrote about this, and preached that it was demanding and stupid and not possible, I ended up deleting anything I wrote about this.

And I bet the same “profile” will start all over again now, it usually does!

One note: I’ve taken part in and have seen conversations with Elizabeth Shown Mills, and she does not endorse or license any kind of software. The templates that we used in an earlier attempt to make this work, meaning the templates made by John Yates, were sort of ‘tolerated’ I think, and the same goes for the much larger number of templates that are available in RootsMagic.

And also, and I repeat that, CSL variables have English names, so storing those as attributes is not very friendly for users that work with other languages. A Dutch user would need a string like “Aantal pagina’s”, or “2e auteur”, so every attribute that we accept must be processed by a translator.

So that’s two notes in the end. :slight_smile: