Adding media and citations, scanned sources, digital only sources. Discussion

I have some thoughts about how I have been entering media, sources and citations into my tree (I may be overthinking it). Sorry in advance for the wall of text.

I know how information is entered is a matter of personal preference, but I would like to hear opinions:

General questions:

  1. When you have a link to a digital source, is it better to add it as an attribute or as a link in a note?

  2. Currently, I treat one page in a book (or several pages, if they are connected) as one citation. But I have come across trees that had multiple citations per page, more like one per block of information (a paragraph or similar). Does anyone else do that? Or maybe use an article in a magazine as the citation rather than the page?

  3. In addition to links, if a source is digital text, I try to copy and paste the text into a note attached to the citation, while if it is an image, I add it as media on the citation. I could export the digital text to PDF and attach that instead of a note, but other than seeming more real, that seems a little pointless?

  4. If you have a media file or note covering part of a source (no point downloading the whole thing), would you attach it to anything other than the citation(s) it relates to? To the source?

  5. I have set the source media type based on what the original source was rather than how I viewed it, so a scanned book is a book rather than electronic. What do you do?

  6. Census: I have not made specific census events. I have only added information from them to other events, e.g. birth or residence events, extending the date, etc. But I have read about others who make them specific events, which seems like it might be a good idea. Do you add it as a shared event on all the people in it, including the family?

Some more specific questions:

It is one thing to cite books that have a clear book/page distinction, but when the data is digital only, it can have more layers than that.
For example, Deaths 1951-2014 (Døde 1951-2014) on Digitalarkivet is digital only and has roughly three layers, where a book has two.

  1. I have put the area of the source in as the page, for example “1983-02-07”. But there is a third layer, which is the person, so I am considering adding the name or permanent ID there too.
    The permanent ID may fit better as an attribute.
    With first published as the date.

With things like that, or digitally scanned physical books on the same site, I have not been setting the Repository to that site but to the depository, i.e. “Riksarkivet”, with “Digitalarkivet” in the publication info.

  1. I am now considering making “Digitalarkivet” its own Repository, and having two Repositories on the source even if I only used one of them. Good idea?

Another item to consider…

It is great to be able to quantify all the elements of a source. But when you transcribe dozens of pieces of information from one page of a source, it would save work to have every new entry tagged with the same citation until the page is finished.

Just a couple of comments.
I like putting the URL for the online resource at the end of the Page/Vol field. This field is part of the gedcom protocol, but attributes are not. I also copy the transcribed record into a note but don’t keep a media copy, because it is redundant if you have the transcript and the URL.
I upload to WikiTree, so this process works well with that site. Because I put the URL in the Page field, the URL is kept with the citation. If I put it into a note, it would end up as a stand-alone note and it would not be obvious that it is part of the citation.
I create a Census event for each one I document but record little info. I use them to verify that I have all the family members but, more importantly, the location where they lived. Each Place has the Lat/Long included in the record. With each of these created as events, you can then see where the family was by using the Geographical module. The Census event can be shared, except for those where you cite the line number of the person; those should be unique citations.


This sounds like a working solution, but it’s more like a hack. For example, I often search on other information that I record in the Page/Vol field. If I write a source URL there, I will lose some of those search capabilities. I write a page number and a record number there. This makes it easier to check whether I already have a certain record in the database or whether I should add it now.
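
A minimal sketch of that kind of lookup, written as plain Python rather than against the Gramps API. The function names and the sample source titles are made up for illustration; the only assumption is that each citation can be reduced to a (source title, Page/Vol text) pair.

```python
def normalize(page):
    """Lower-case and collapse whitespace so small typing differences still match."""
    return " ".join(page.lower().split())


def build_index(citations):
    """Build a lookup set from an iterable of (source_title, page_text) pairs."""
    return {(source, normalize(page)) for source, page in citations}


def already_recorded(index, source, page):
    """Return True if this source already has a citation with the same Page/Vol text."""
    return (source, normalize(page)) in index


# Made-up sample data: page number plus record number in the Page/Vol field.
index = build_index([
    ("Parish register 1850", "p. 14, record 7"),
    ("Parish register 1850", "p. 15, record 2"),
])
print(already_recorded(index, "Parish register 1850", "P. 14,  record 7"))
print(already_recorded(index, "Parish register 1850", "p. 16, record 1"))
```

A lookup like this only stays simple while the Page/Vol text holds nothing but the page and record number, which is the point being made about keeping URLs out of that field.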

I would suggest adding a native attribute called, for example, url, source url or public url, which would be concatenated to the Page/Vol field when a report is generated. This approach seems a bit cleaner. Although there are as many opinions as there are people. I’m just wondering if anyone supports my idea of creating a native attribute for this purpose?
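
To make the suggestion concrete, here is a rough sketch of what "concatenated at report time" could look like. It is plain Python, not Gramps code: the function name, the dictionary of attributes and the example URL are all hypothetical, and the attribute names are just the three candidates mentioned above.

```python
def page_with_url(page, attributes, url_keys=("url", "source url", "public url")):
    """Return the Page/Vol text with the first URL attribute found appended to it."""
    # Compare attribute names case-insensitively, since users type them by hand.
    lowered = {key.lower(): value for key, value in attributes.items()}
    for key in url_keys:
        url = lowered.get(key)
        if url:
            return f"{page}, {url}" if page else url
    # No URL attribute: the stored Page/Vol text is used unchanged.
    return page


print(page_with_url("p. 256, record 12", {"URL": "https://example.org/record/12"}))
```

The stored Page/Vol value is never modified, so searching on page and record numbers keeps working; only the text handed to the report changes.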


Case 1. If the description of the event begins on one page and ends on another.
In this case, I add two different media to the same citation, and in each of them I highlight the relevant fragment. I also make a manual transcription of these fragments and add them to the notes.
Also, somewhere on the forum I saw that researchers use special OCR programs (I think like FineReader) or maybe even something more specialized (I wonder if there is something for high-quality digitization, maybe even a paid API). For now, I do all this manually, because I do not really believe that old handwritten documents can be digitized with satisfactory quality. Maybe I’m wrong. I will have to try teaching some program Old Slavic letters and see how it works.

Case 2. If several fragments are on the same page and they are a logical continuation of each other.
In this case I add the same media several times to the same citation, and select a different fragment of the image on each one.

Case 3. If several fragments are on the same page and they are NOT a logical continuation of each other.
In this case I create several separate citations with the same media, and select a different image fragment on each.
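
The three cases above boil down to one rule: one citation per logical record, with one media reference per highlighted fragment. A small sketch of that rule in plain Python follows. The `Fragment` class, the function name, the file names and the region format (corner coordinates in percent) are all assumptions for illustration, not the Gramps data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    record: str    # label of the logical record this fragment belongs to
    image: str     # file name of the page scan
    region: tuple  # (x1, y1, x2, y2) of the highlighted area, in percent


def citations_from_fragments(fragments):
    """Group fragments into one citation per logical record."""
    citations = defaultdict(list)
    for fragment in fragments:
        # Each fragment becomes one media reference on its record's citation.
        citations[fragment.record].append((fragment.image, fragment.region))
    return dict(citations)


fragments = [
    Fragment("baptism 12", "page_014.jpg", (10, 60, 90, 100)),  # Case 1: starts on one page...
    Fragment("baptism 12", "page_015.jpg", (10, 0, 90, 20)),    # ...and ends on the next
    Fragment("marriage 3", "page_015.jpg", (10, 40, 90, 70)),   # Case 3: unrelated, same page
]
for record, media_refs in citations_from_fragments(fragments).items():
    print(record, media_refs)
```

Case 2 is the same as Case 1 except that both fragments of a record point at the same image file with different regions.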


I sometimes make a PDF from the note text. This is when I have a lot of text (more than 1-2 pages), because notes have glitches with a lot of text. I believe this is a bug and I have written about these problems. I think there should be a setting to disable the text calculations inside notes to avoid the glitches, but currently that is how it works. That’s why I sometimes convert to PDF. Anyway, in my opinion saving the data as text only is OK; media isn’t required.

This question is partly related to repositories. And I saw heated discussions on this forum about how to organize it better. I would like to describe how I have been doing it recently and I am also interested in hearing alternative opinions.

About repositories:
All the documents that I am looking at are held in various regional archives as paper documents, and they may also be digitized. For each regional archive, I have a separate repository with the type Archive. I always add this repository to the Source if I know the document exists in the regional archive. But usually I do not go to the regional archives; I use their already digitized media.
If the media are located on Familysearch, then I add a second repository for it to the source, with the type Website.
If they are media that I ordered from the regional archive (paid or even free), then I add a second repository called Local Archive to the source, because I have them on a local PC. The type of this repository is Archive.

About sources:
In the sources, for convenience, in the field, I briefly list the places where this document-case is located. Example:

Regional archive..., local archive

Also, in the source attributes, I add an attribute called: Media provider, where I indicate where exactly I downloaded the images from.

This will help me later to keep some media private, such as the images I downloaded from Familysearch. If I bought them, I can share them. In other words, this attribute can be used as a filter.
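
As a rough illustration of using the attribute as a filter, here is a plain Python sketch. It does not use the Gramps filter framework; the function name, the set of private providers and the sample source titles are made up, and sources are represented simply as dictionaries of attribute names and values.

```python
# Hypothetical provider names; use whatever values you store in the attribute.
PRIVATE_PROVIDERS = {"Familysearch"}


def is_shareable(source_attributes):
    """True when the "Media provider" is known and is not marked as private."""
    provider = source_attributes.get("Media provider")
    return provider is not None and provider not in PRIVATE_PROVIDERS


# Made-up sample data: source title -> attributes.
sources = {
    "Parish register 1850": {"Media provider": "Familysearch"},
    "Revision list 1858": {"Media provider": "Regional archive"},
    "Unsorted scans": {},
}
shareable = [title for title, attrs in sources.items() if is_shareable(attrs)]
print(shareable)
```

Treating a missing attribute as not shareable is a cautious default: media whose origin was never recorded stays private until it is checked.
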
I also add the title pages (and last pages) of the document as media to the source. These title pages apply to all citations included in the source.
When I add a new repository to a source, I set the media type to Electronic for the Local repository or Fiche for the Familysearch repository. These are my most common cases.

About citations:
All my citations are part of the sources.

About media:
If they are title pages, I add them to the source as media, but if they are pages that I cite, I only add them to the citation and not to the source.

I’m wondering if the concept I’m currently using will always work correctly for report generation. I checked it with the NAVWEB reports and it suited me. But Gramps has many different reports, and maybe some of them will not accept this approach to data organization.

An example of how I work is the UK Census

I use Forms for data input.
I download the images for the census from either Familysearch, Ancestry or
FindMyPast, and also a .pdf of their transcription. These are all stored.
I then enter my own transcription in the Form, which becomes attached to
the individuals contained within that household. I do not include non-family
members (i.e. servants, boarders etc.), merely making a note that
there were others in the household.
The citation merely includes the Reference No for the Census, the Piece,
Folio and Page, and is used as a pointer to the source.

On occasion I have downloaded transcriptions from all 3 because they are
all different and in my opinion wrong, which is why at some point maybe
a year or so after input I go back and check mine for typos and
incorrect interpretation.

Transcriptions by whoever are subject to error, so keeping the original
in digital form if possible is invaluable.
I never save URLs for any purpose other than temporary links as to where
data may be found, or to Family Trees that might help corroborate or
otherwise some area I am working on. These are such that if anything
should happen, losing them is an inconvenience, not a disaster.


What happens if you try to export a Gedcom? Is the information lost, or is it converted to a note or similar? Because if it is converted to a note, I wouldn’t consider that a big deal, since having it as a note is what I have been doing so far.

Hm, interesting, I didn’t know that.

So you do not also have a residence event with the same information? Would the census event replace the residence event (and maybe be more accurate too)?

Even if it’s one “article” or one household’s information, but it covers information that would go on different events and even different people, for example parents and their children?

If yes, it’s similar to what I am doing.
Usually in my case it would be one PDF media with, for example, two pages.
Plenty of the things I have put into my tree until now already had a transcribed version, for example the Norwegian 1920 Census. Then I copy that into a note in addition to the media.
But if it has been news articles / magazines (from the national library) that already have OCR embedded in the downloaded PDF, even if it’s not perfect, I haven’t bothered adding a note with a manual transcription.

Attributes are not converted to notes. The attribute data is just not included in the gedcom. The note associated with the citation is included in the gedcom. I chose my current method as it works best in WikiTree. Like me, you may have to adjust your workflow based on what you want to do with your data. I could have added the URL to the note, but I don’t always have a note with the citation and that would cause clutter.

The Census could be a residence event, however there is a Census event in the menu which seems to be more accurate as to how the data was collected.
For a Census event, I only enter in the Page/Vol field the important fields that would identify the record such as the ED, etc., plus the URL. If capturing the names of all the family members is important, I copy that information into a note in the Family record.

I think I changed my workflow 3-4 times before becoming comfortable with the results. I’m amazed with what others do to collect all the info, however, I decided that my remaining time was limited and my objective is focused on building the tree out with sufficient supporting citations. Family members, their birth year and occupation (if recorded), was my focus with a Census. I have recorded 6600 people in GRAMPS, and I have 10,000 ahead of me. I see no point recording data I will never use. The important data is what allows the features in GRAMPS to work.


I would never include a transcription or an OCR as anything other than an interesting/amusing attachment. What is in my tree is my responsibility, not someone or something else’s.

Let’s say you have an individual provably born in 1820, and yet an OCR document comes back with 1828. Do you accept the OCR, or leave the date you have got? If you choose the latter, you are admitting the OCR is inaccurate or unreliable and therefore a waste of space in your tree.


Whilst agreeing with most of what you say, I fail to see how including the URL as part of the Citation Detail is relevant or indeed accurate, because surely your Citation has a Source and Repository which point the way to where you found it.

A URL is only good for a relatively (genealogically) short period of time, companies come and go, government and statutory bodies change as does the very nature of the internet.


If you are citing an online resource, it is usual to cite the url of the resource and the date that it was accessed. In the short-term it is useful to be able to open this link in a browser.

In Gedcom, the Page/Vol field is the correct place to record this information. The specification says “It is recommended that the data in this field be formatted comma-separated with label:value pairs”.
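
A small sketch of what comma-separated label: value pairs could look like in practice, in plain Python. The function names, the labels and the example URL are illustrative assumptions; the only thing taken from the specification is the comma-separated "label: value" layout quoted above.

```python
def format_page(pairs):
    """Join (label, value) pairs as 'Label: value, Label: value', skipping empty values."""
    return ", ".join(f"{label}: {value}" for label, value in pairs if value)


def parse_page(text):
    """Split a Page/Vol string back into (label, value) pairs.

    Simplification: values must not contain ', ' themselves.
    """
    pairs = []
    for part in text.split(", "):
        label, separator, value = part.partition(": ")
        if separator:
            pairs.append((label, value))
    return pairs


page = format_page([
    ("Page", "256"),
    ("ED", "12"),
    ("URL", "https://example.org/record/1"),
    ("Accessed", "2024-01-02"),
])
print(page)
print(parse_page(page))
```

Because the labels stay in the text, a page or record number can still be found by searching for "Page: 256", even with a URL in the same field.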

Adding new “URL” and “Accessed on” fields in the Gramps citation editor seems like a good suggestion.


Having the “Accessed on” default to a today() value when a URL value is entered could reduce the data entry burden.
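
A minimal sketch of that default, in plain Python. The `OnlineReference` class and its field names are hypothetical stand-ins for whatever the citation editor would actually store; nothing here is existing Gramps code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class OnlineReference:
    url: str = ""
    accessed_on: Optional[date] = None

    def set_url(self, url, today=None):
        """Store the URL; fill in "Accessed on" only if it is still empty."""
        self.url = url.strip()
        if self.url and self.accessed_on is None:
            # The 'today' parameter exists only to make the behaviour testable.
            self.accessed_on = today or date.today()


ref = OnlineReference()
ref.set_url("https://example.org/record/1")
print(ref.accessed_on)
```

The date is only filled in when the field is empty, so correcting a typo in the URL later does not silently overwrite the date on which the record was actually viewed.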

Yes. We could also put a button next to the “URL” field to open the link in a browser.

Another “Jump to” button?

Is there any possibility of changing that non-standard interface? You could eliminate the edit text box in favor of a more familiar (not directly editable) hotlinked text (blue or purple with underline) with an edit button beside it?

There might be a way to make that data-entry friendly too.

If the Edit button swapped over to a table form of edit fields (one that always had 1 blank line for new data), that could be used to simplify the data entry on ‘Internet tabs’ too. I often want to quickly add a FamilySearch, WikiTree, FindAGrave, and my personal website link to a person or in sources.

An additional future benefit might be disabling the Edit, Add, Remove buttons when letting Aunt Martha browse your tree. (A lot easier than fixing the tree after she clicks the “Remove” button half a dozen times while asking “what’s this do?”)

That’s why I download all the documents I find, save them, and use them locally in Gramps. There are thousands of documents already. This is a big job, but I don’t see another way: any URL is really temporary.

The URL is a required entry for a citation on WikiTree (with few exceptions).


Yes. If you accessed the source online then it is good practice to record the URL.

Downloading documents or taking screenshots is also a good idea, but you should record where you got them from and also when.


This is why I wonder what other people are doing, because the way I am doing it now leaves an increasing number of notes that only have links in them.

So the Page/Vol would, for example, be something like:
p.256 [ED] [URL]

One thing about that is that some sites have a specific permanent/long-lasting URL that is different from the normal URL, and it only links to the source and not to specific pages. That seems to fit better on the source than on the citation, but maybe it’s fine on the citation too.

A URL is still way better than nothing in my eyes; it makes finding your way back much easier when it works. If they exist, and they do on the two main websites I use the most (government-run ones), I use the specific “permanent/long-lasting” URLs they have for sources.