Update Sources and citations

StoltHD · June 14, 2021, 9:31am

jacrider:

That is a difference between Microsoft Windows users and long-time Unix
users. The Unix philosophy for many years was that tools should do one
thing very well, and that if you needed additional capability, you would
feed the output of one tool into another tool to perform another
function and it might be necessary to use several tools to accomplish
what was needed. In the days before GUIs, pipes and redirection of
standard input and standard output made this easy, and shell scripts
could automate the process if it needed to be done frequently. And most
developers consider it foolish to reinvent the wheel. So if there is a
tool that can convert an output of Gramps to an input for another
program, it would be foolish for the Gramps developers to expend the
effort to duplicate and maintain the functionality of that tool (unless
you can find a developer who is sufficiently bored that they would be
willing to do it).

Well, the big problem with Gramps is that you can’t “pipe the outcome” to another tool, because Gramps doesn’t have any interchangeable formats other than GEDCOM, and GEDCOM is a lossy, limited format. Neither the XML nor the JSON is in a format that other software can read without programming knowledge, and to use the SQLite format a user needs to know both SQLite and SQL.

And if you read any of my comments, you will see that I don’t ask for or talk about large feature enhancements in Gramps itself. I talk about supporting interchangeable formats that actually give Gramps just that functionality: to pipe data to other tools, or to pipe data from other tools to Gramps.

That is what interchangeable formats are all about, and that is what interoperability is all about: being able to use data in other software.

And in the case of this post, being able to use data from a choice of multiple external Reference Managers in Gramps.

I have never talked about changing what’s already in the Gramps GUI, because the single text field in both Citations and Sources is more than enough for me: I use Zotero as my Reference Manager and Source Manager, and copy and paste the Citation or Bibliography string from Zotero into those fields. Other people ask for a better and more automated way to do this, or for better source handling that conforms more closely to the Reference Standards in use.
Support for reading and writing CSL JSON and BibLaTeX would provide much of this.

And just so it’s clear, I have never once demanded anything. I have asked for features, and I have tried to write what I think is needed as a researcher working outside of the somewhat narrow lineage-linked niche. BUT when people start attacking my thoughts with a “fork it and do it yourself” attitude, I do answer! And in this forum and in the mailing lists, that is the standard answer for anything those people don’t like!

StoltHD · June 14, 2021, 9:46am

It really starts to get boring and tiresome that you never read what I write!

I have never once written that direct interoperability with Zotero is the solution. I have only written about supporting an OPEN STANDARD like CSL JSON and BIBLATEX, because by doing that users have the freedom to use whatever Reference Manager they want, and if what’s in Gramps is enough, they can use that.
But many people use their sources and references for multiple “things”, and it would be great not to have to register the same source multiple times in different software.

SO AGAIN: The only thing I write about is supporting an Open Data exchange file format like CSL JSON.

I wonder how many times I need to write that before it gets picked up!

Zotero is just one of many Reference Managers or other software that support export/import of those formats, and by using those formats, there will also be an easier way to use the same references in Word, Open Office, R Studio, many LaTeX editors, Zettlr, Obsidian, Joplin, Atom, VSC, Omeka, Arches, etc.

PS. Since people here don’t understand that giving examples of software that use a format doesn’t mean a demand for direct interoperability with that software: these are only examples of software that use this specific FILE FORMAT to read data from a Reference Manager of choice.

Support for an Open Data, Open Source interchangeable FILE FORMAT like CSL JSON and BibLaTeX, not direct interoperability with any Reference Manager.

Do I need to repeat it 100 times?

jacrider · June 14, 2021, 4:37pm

I have been using Gramps for over 15 years, and it has evolved
significantly in that time. I’ve spent a considerable amount of time
learning to use new features and improving my own research as a result,
and I frequently discover that there are other features that I hadn’t
discovered, some of which I find useful and others that don’t appear
useful for my purposes.

I would challenge you to name any project that depends largely or
entirely on volunteers that ever has enough volunteers. There are
always people who criticize those projects for not doing enough but
won’t get involved. It is true of most open source projects (the
exceptions are those that have some sort of corporate sponsorship and
are not solely reliant on volunteers and those that keep the focus of
the project sufficiently narrow to not overwhelm the volunteers
available) and it is true of many other endeavors in life. I’ve seen it
over and over in the local Chamber of Commerce in the six years that
I’ve been active. There are always ideas about what we should do and
critics that we don’t do what they want but never enough volunteers to
make it possible.

If what you want is just a matter of exporting existing data in a format
not currently provided by Gramps and if you don’t have the expertise to
develop it yourself, consider writing a detailed specification that
would help a developer understand what you are looking for. If it
really does fit that description, it could be developed as a report
add-on and would be an ideal project for someone interested in becoming
a Gramps developer that doesn’t have the experience to come up with an
idea for something new and needs a specification to work from. If you
don’t know how to write such a specification, find someone with similar
interests that could help you with it.

ennoborg · June 16, 2021, 11:47am

Good points, and focus is part of the problem. Most reference managers can write way more variables than currently exist in the Gramps data model for repos, sources, and citations, so reading CSL JSON is not as easy as it looks. That is because, for each of the dozens of potential variables, you have to decide what to do with it. Some can be easily mapped to existing variables, like author, or title, or date, but volume/page is already a problem. And for each of the other ones, you have to make a decision about how to present them to the user, i.e. to aunt Martha, and accept that they must be translated to all the languages supported by Gramps. Just storing them all as extras in citation attributes is not an option, because that means that they will always appear in English, and we are an international community.
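To make the mapping problem concrete, here is a minimal Python sketch. It assumes a hand-written table from CSL variable names to Gramps-style field names; the field names on the right (`source_title`, `source_author`, `citation_date`) are hypothetical labels chosen for illustration, not identifiers from the Gramps code base. Everything the table does not cover ends up in a leftover bucket, and those leftovers are exactly the variables that would each need a decision about presentation and translation.

```python
# Hypothetical mapping from CSL variable names to Gramps-style field names.
# The right-hand names are illustrative only, not real Gramps identifiers.
CSL_TO_GRAMPS = {
    "title": "source_title",
    "author": "source_author",
    "issued": "citation_date",
}


def split_csl_item(item):
    """Separate one CSL JSON item into mapped fields and leftovers."""
    mapped, unmapped = {}, {}
    for key, value in item.items():
        if key in CSL_TO_GRAMPS:
            mapped[CSL_TO_GRAMPS[key]] = value
        else:
            # No obvious home in the data model: someone has to decide.
            unmapped[key] = value
    return mapped, unmapped


# A made-up CSL JSON item, used only to exercise the function.
item = {
    "id": "example-1",
    "type": "book",
    "title": "Example Parish Register",
    "author": [{"family": "Doe", "given": "Jane"}],
    "issued": {"date-parts": [[1901, 5, 2]]},
    "volume": "3",
    "page": "12-14",
}

mapped, unmapped = split_csl_item(item)
print("mapped:", sorted(mapped))
print("needs a decision:", sorted(unmapped))
```

Even in this tiny example, `volume` and `page` land in the leftover bucket, which is the volume/page problem described above.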

As far as I’m concerned, this isn’t worth the effort for any of the reference managers that I know, because the ones that I tested, and that includes Zotero, do not detect the metadata that is essential for genealogy. That is the information that I mentioned earlier, which is needed to represent the hierarchical nature of archives, and the wish to point to a particular group of lines or a record number in a document, or a time fragment in a piece of film or audio. These are all data that do not exist in CSL.

CSL and other reference standards also don’t have enough provisions for the record data itself, for which other standards exist, so if there is a standard to follow for open data, I think it’s way more interesting to follow any of those, like the one that you can find on schema.org

In either case, any decision we make, as developers, will mean a further diversion from GEDCOM itself, so basically it will mean that we’re on our own, and that makes investing time even more difficult.

emyoulation · June 16, 2021, 12:38pm

It seems like this might be a good opportunity to document how to dip your toe into hacking Gramps.

I did beta testing with a Gramplet that took a CSV of known structure, pulled the data in & visualized it.

Because the format was known, the developer wrote code that parsed the data directly. (Doing it yourself because you have known data is often a programmer’s first instinct.) So the code has the potential to fail badly if the unexpected happens … there’s no error handling.

But there is a CSV library that could have handled the CSV gracefully with full error handling for file errors or data errors. It would’ve been better to use the library to parse the data into Gramps Attribute records, then work with known attributes.
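As a rough illustration of that point, the sketch below uses Python's standard `csv` module and collects problems instead of crashing. The column names `name` and `value` and the helper `read_rows` are assumptions made up for this example; they are not taken from the Gramplet in question.

```python
import csv
import io


def read_rows(stream, required=("name", "value")):
    """Parse CSV rows, collecting errors instead of failing on the first one."""
    rows, errors = [], []
    reader = csv.DictReader(stream)
    # Accessing fieldnames reads the header row; it is None for an empty file.
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        return rows, ["missing columns: " + ", ".join(missing)]
    try:
        for row in reader:
            # DictReader stores surplus fields under the key None and fills
            # absent fields with the value None.
            if None in row or any(row[c] is None for c in required):
                errors.append(f"line {reader.line_num}: wrong number of fields")
                continue
            rows.append({c: row[c].strip() for c in required})
    except csv.Error as exc:
        errors.append(f"line {reader.line_num}: {exc}")
    return rows, errors


# In-memory sample with one good row, one short row and one long row.
sample = io.StringIO("name,value\nBirth,1901\nbroken-row\nDeath,1975,extra\n")
rows, errors = read_rows(sample)
print(rows)
print(errors)
```

For a real file, the same function could be called inside `with open(path, newline="") as handle:` wrapped in a `try`/`except OSError` block to cover file errors as well.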

So maybe the best idea is to start a project where we learn to turn a Note of a custom Type (like CSL or JSON) into custom Attributes & back again using the built-in Python package json or process it with a project like citeproc-py. (Using a Note is a stepping stone. Later refinements might be to import a specific file, poll a data stream, or handshake with an API to pipe the data from another application.)
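A minimal sketch of the Note-to-Attributes round trip, using only the built-in `json` package. Attributes are modelled here as plain `(name, value)` string pairs rather than real Gramps Attribute objects, and the `CSL:` and `CSL-JSON:` name prefixes are a hypothetical convention invented for this example to tell plain strings from structured values.

```python
import json

# Hypothetical naming convention, not an existing Gramps feature.
PLAIN = "CSL:"            # value is stored as-is
STRUCTURED = "CSL-JSON:"  # value is stored as a JSON fragment


def note_to_attributes(note_text):
    """Turn one CSL JSON item (the text of a Note) into (name, value) pairs."""
    item = json.loads(note_text)
    attributes = []
    for key, value in sorted(item.items()):
        if isinstance(value, str):
            attributes.append((PLAIN + key, value))
        else:
            # Names, dates and other nested values stay JSON so nothing is lost.
            attributes.append((STRUCTURED + key, json.dumps(value)))
    return attributes


def attributes_to_note(attributes):
    """Rebuild the CSL JSON text from the (name, value) pairs."""
    item = {}
    for name, value in attributes:
        if name.startswith(STRUCTURED):
            item[name[len(STRUCTURED):]] = json.loads(value)
        elif name.startswith(PLAIN):
            item[name[len(PLAIN):]] = value
    return json.dumps(item, sort_keys=True)


# Made-up example item standing in for the content of a Note.
note = json.dumps(
    {
        "type": "book",
        "title": "Example Parish Register",
        "author": [{"family": "Doe", "given": "Jane"}],
        "issued": {"date-parts": [[1901]]},
    },
    sort_keys=True,
)

attributes = note_to_attributes(note)
for name, value in attributes:
    print(name, "=", value)
print(attributes_to_note(attributes) == note)
```

Keeping the structured values as JSON fragments is a deliberate shortcut: it makes the round trip lossless, at the price of attribute values that are not pleasant to read.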

Once the data becomes available in a predictable place, people can begin experimenting with it in Reports or Gramplets.

But the big thing is to document how to expand getting different forms of data into usable Gramps attributes.

Mihle · June 17, 2021, 3:21pm

I will just give my small opinion, which might or might not be the same as that of the average Gramps user.

I don’t care that much about export and import of my tree to anything else, other than possibly to have alternative visual reports or whatever.
(I don’t care about whatever you call research platforms; I don’t know what they can do that Gramps can’t, and I don’t have knowledge about them.)

But something I DO want a lot is putting events under other events, or having folders of sources. Those two things would be so good to have.

emyoulation · June 17, 2021, 3:33pm

I do not understand the intent of an ‘event under other events’.

Do you have example how it would be useful? (Or maybe there is a discussion I’ve forgotten & ‘search’ didn’t list.)

Mihle · June 17, 2021, 3:52pm

I don’t think any examples were mentioned here, but the concept was, maybe in other threads; I don’t remember.
As an example, take someone who had military service and was then stationed in multiple places with the military.
It would be great to have just Military as a main event, and then the different locations or groups in the military as events under that one main event.

Or another example that I have in my tree: a person who worked in the Salvation Army. There would be one main event, Occupation Salvation Army, and then events under that one for the different divisions, or for something else the person did while being in it.

PLegoux · June 17, 2021, 4:02pm

Subevents discussion: here

PLegoux · June 17, 2021, 4:05pm

Yes, I would like to have them too !

emyoulation · June 17, 2021, 4:20pm

So, more like Groupings rather than actual Events?

Or more than that? Like a master Event such as Military Service, with an enlistment covering a 4-year span, where any date outside that range is invalid. Sub-events might be: enlistment, mustered-in, training, promotions, a series of deployments, mustered-out.
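One way to picture that idea is the small Python sketch below. The `Event` class and its fields are hypothetical and exist only for this example; Gramps has no such sub-event structure, and the dates are made up.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Event:
    """Hypothetical event that can hold sub-events within its own date span."""
    description: str
    start: date
    end: date
    sub_events: list = field(default_factory=list)

    def add_sub_event(self, sub):
        # A sub-event must fall entirely inside the master event's span.
        if sub.start < self.start or sub.end > self.end:
            raise ValueError(
                f"{sub.description!r} falls outside {self.description!r}"
            )
        self.sub_events.append(sub)


service = Event("Military Service", date(1914, 8, 1), date(1918, 7, 31))
service.add_sub_event(Event("Training", date(1914, 9, 1), date(1914, 12, 1)))

try:
    service.add_sub_event(Event("Deployment", date(1920, 1, 1), date(1920, 3, 1)))
except ValueError as exc:
    print("rejected:", exc)
```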

I’ve always hoped that emigration & immigration would somehow become linked. And it’d be all the better with direct linking of port-of-call waypoints.

PLegoux · June 17, 2021, 4:52pm

Yes it’s exactly that. I really like that idea.

A property event and buying property, building the house and selling the property as subevents,
A war and its battles as subevents,
A trip and the different steps as subevents,
A railway construction and all stations opening as subevents,
Etc…

I currently use attributes associated with the event, but they cannot (really) be shared with other people (or, soon, places).

cdhorn · June 19, 2021, 1:12am

I think it would likely be best modelled as a Group entity.

A group could consist of people, events, places, artifacts.

As a top level entity a group exists over a particular time span, there are events and attributes that apply to it as members may come and go and they undergo different experiences.

Group was recognized as one of the core concepts in the GenTech data model if I recall.

GeorgeWilmes · June 19, 2021, 1:43pm

See also this discussion of sets.

cdhorn · June 19, 2021, 4:41pm

Thanks, yes, I agree same concept pretty much. I think I prefer the word group but that is just semantics.

StoltHD · June 30, 2021, 12:27am

ennoborg:

Good points, and focus is part of the problem. Most reference managers can write way more variables than currently exist in the Gramps data model for repos, sources, and citations, so reading CSL JSON is not as easy as it looks. That is because, for each of the dozens of potential variables, you have to decide what to do with it. Some can be easily mapped to existing variables, like author, or title, or date, but volume/page is already a problem. And for each of the other ones, you have to make a decision about how to present them to the user, i.e. to aunt Martha, and accept that they must be translated to all the languages supported by Gramps. Just storing them all as extras in citation attributes is not an option, because that means that they will always appear in English, and we are an international community.

I am Norwegian and I use Zotero, JabRef and Citavi for ALL my sources, and I have no problem whatsoever getting the citation/bibliography string I need from any of them… It is also possible to copy/paste strings from, FOR EXAMPLE, Zotero to any text field in Gramps, either with any of the predefined CSL Standard Citation/Bibliography Style formats or with your own format created with the style editor in Zotero. I cannot tell if that is possible in JabRef or Citavi, because I don’t use those two for that purpose.
I don’t see the problem with “Volume/Page” the way you do. In that field I can paste any fully qualified Citation or Bibliography string I like from my Bibliography Manager of choice, and if I need more information for some reason, I can add it manually by editing the string. The problem is that doing this manual job 500 times when adding new information is time consuming, and because of that, it would be great if there was support for already existing bibliography formats. CSL is one that is well established in most research environments, including most universities worldwide; another widely used standard is Dublin Core.

The bibliography ontology that schema.org bases its work on is dated; the newest information I could find about it was from 2015, and it is as limited as CSL, if not more limited.
And it leans heavily on RDF/OWL.
The risk is again that a project based on this will be the same as inventing the wheel all over again.
And personally I really don’t understand why it’s so important for developers to do that, to “invent” their own solutions to something that already has multiple well-supported and working solutions.

CSL JSON is only one of multiple standards that can be used. I have asked for import/export of both JSON-LD (RDF/OWL) and other open data/open standard formats, like a graph format, but the same answer occurs regardless: some “developer” or someone in the “inner circle of Linux users” doesn’t understand what it can be used for, and therefore the idea is bad, it’s not usable, it’s not possible, etc., and then the standard comment comes: “Fork it and do it yourself”.

And since all of my suggestions are so useless and so bad… I just dropped using Gramps as anything but a simple data storage. All my research logs, research notes, data analysis, etc. are now in Markdown and network graph format stored in neo4j, and my unstructured text in Markdown is easily analyzed in Cytoscape, Tulip or Gephi with the use of the Juggl add-on for Obsidian. Citations and notes made in Zotero can be easily used with plugins, I can export anything in any “style” or “text format” I like with a plugin for Zotero, and I can easily use any Bibliography style I like with plugins for either Obsidian, VS Codium + Foam or Zettlr, and I get the exact same information into any network graph software or any other research tool I use.
One of the reasons I stopped using Legacy was the locked-in data, but I actually find that data in Gramps is just as locked in when it comes to utilizing it for multiple purposes, and I’m about to stop using Gramps for the same reason.
My bad for believing that Open Source Software also meant Open Data.

So what I have done is to use Excel with Power Query as a hub (until my Python script is working as I want it to), merging data to and from the Gramps XML file.
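For readers wondering what such a script might involve, here is a minimal sketch of one direction only, from Gramps XML to CSV. It is not the script mentioned above. It assumes sources appear as `source` elements with `stitle` and `sauthor` children and an `id` attribute, which should be checked against the Gramps XML DTD for your version; the sample document and its namespace URL are made up for illustration, and real exports are usually gzip-compressed and would need `gzip.open` first.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so the code does not depend on the version."""
    return tag.rsplit("}", 1)[-1]


def sources_to_csv(xml_text):
    """Return CSV text with one row per source element (assumed layout)."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "title", "author"])
    for elem in root.iter():
        if local(elem.tag) != "source":
            continue
        fields = {local(child.tag): (child.text or "") for child in elem}
        writer.writerow(
            [elem.get("id", ""), fields.get("stitle", ""), fields.get("sauthor", "")]
        )
    return out.getvalue()


# Made-up sample; the namespace URL is illustrative only.
sample = """<database xmlns="http://gramps-project.org/xml/1.7.1/">
  <sources>
    <source handle="_a1" id="S0001">
      <stitle>Parish register</stitle>
      <sauthor>Example Parish</sauthor>
    </source>
  </sources>
</database>"""

csv_text = sources_to_csv(sample)
print(csv_text)
```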

It was exactly the same when I first suggested Main-/Sub-Event and Places as Subjects for Events…
And here you are, asking for the same thing, 1.5 to 2 years or so later.

I also remember all the negative responses from a few users in the “inner circle” when I first suggested using OpenRefine to clean up badly formatted place data for a user on the email list.

That was at the same time I asked for a full database export to CSV, but the answer was that CSV wasn’t a format many used, even though most big datasets are shared as CSV.

ennoborg · June 30, 2021, 11:33am

You can go on and on, but as a user, I want real open data, and the formatted citations that you paste from your reference tools are not open data. IMO, they’re mostly tedious, and not fit for publication in a local magazine, should I ever want to publish there.

The real fact is that none of the tools that you mention support the metadata used by archives, which means that they are completely useless for me. And although I’m not the benevolent dictator that rules Gramps, it does mean that I won’t spend a minute writing software to support these tools.

Your reasoning that these tools are used by scientists, and therefore must be good, also does not make sense to me. I’m really not impressed by that, on the contrary.

I do however think that Gramps supports open data, in the sense that our XML format is documented, so you can write tools for that. I’m doing that myself too. The only problem with citation data is, that although there are a few standards, they are not widely used, for various reasons, and they are not part of any current GEDCOM or GedcomX.

You can see a working example of Dutch open data on this site, operated by a fellow Dutch genealogist, and web master:

What you can see here is data about a possible cousin of yours, coming from a regional archive. And on this page, you can see a formatted citation, which is generated from the meta data, and which changes when you change the page language.

If you look at the page source, you can actually inspect the open data behind it. It’s close to the end of the source code, in the part identified by the name of the standard, which is A2A.

If you look further, or rather earlier in the source, you will also find references to historical-data.org, which is also an open standard used by archivists.

moorob · June 30, 2021, 1:30pm

The reason I would like to see a citation template engine is that it helps keep all my citations consistent. It doesn’t have to be strictly EE based. Right now I’m going through my citations and fixing them, all 2000+ of them. Unless I keep a cheat sheet where I can see it, I forget the order that I’ve been using without looking at another citation first. Sucks to get old, I think my old organic hard drive has a bunch of bad sectors.

Just add a Template engine that allows a user to select from a set of standard fields such as Date, Place, Text, etc., in the order they want them. If they want to set them like EE they can.

PLegoux · June 30, 2021, 3:31pm

I like that idea. Something that looks like the Form gramplet: an XML definition of the fields to be used in repository, source or citation fields.

Fields could be grouped together to go into one of the fields of these three objects, e.g.:

<source>
  <sourceTitle format="#1, #2">
    <subfield n="1">Volume name</subfield>
    <subfield n="2">Author name</subfield>
  </sourceTitle>
</source>
<citation>
  <citationDate />
  <citationTitle format="p. #1. #2">
    <subfield n="1">Page number</subfield>
    <subfield n="2">Citation description</subfield>
  </citationTitle>
</citation>

to generate a form with two fields (Volume name and Author name), giving a Source title like this: Evidence Explained, Elizabeth Mills; and a Citation like this: p. 100. Citation of a FamilySearch record explanation

Each field of a repository, source and citation, including attributes, could be chosen from existing records (for those which can be chosen: repository for source, source for citation…) or entered manually.
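The substitution step of such a template could be sketched in a few lines of Python. This is only an illustration of the `#1`, `#2` placeholder idea from the definition above; the function name `fill_template` is made up, and nothing like it is claimed to exist in Gramps or the Form gramplet.

```python
import re


def fill_template(template, values):
    """Replace each #N placeholder with the value entered for subfield N."""
    def substitute(match):
        return values[int(match.group(1))]
    return re.sub(r"#(\d+)", substitute, template)


source_title = fill_template(
    "#1, #2", {1: "Evidence Explained", 2: "Elizabeth Mills"}
)
citation = fill_template(
    "p. #1. #2",
    {1: "100", 2: "Citation of a FamilySearch record explanation"},
)
print(source_title)
print(citation)
```

A missing subfield raises `KeyError` here, which a real form would presumably catch and report to the user instead.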

StoltHD · June 30, 2021, 3:50pm

There is nothing special about that source citation. I have no problem generating one in Zotero that is exactly the same, except for the HTML/CSS formatting of the page.
If I liked, I could even create a complete page looking like that by setting up a template in the Markdown plugin for Zotero, so I really don’t understand all the noise and buzz you create!

I have no problem making a citation or bibliography in Zotero that gives me all the information needed to find information again in Norwegian or Swedish archives, which are notorious for not supporting any standards at all. I do not have any problem creating a bibliography for English or American archives and newspapers either… it’s really strange that I am able to do that, when the software is so totally useless as you say.
And the Bibliography I can get from Zotero is no different from the example you use. Even a direct copy/paste of the text string of information under “sources” on that web page is not a problem at all, and if I can manage to create a bibliography like that, you as a developer shouldn’t have any problem doing it…

And it’s really strange that so many researchers in the world actually use these tools, if the tools are as useless as you want them to be.

Strange that so many use LaTeX with BibLaTeX,
strange that Pandoc supports CSL and not the bibliography extension for Schema.org, if the CSL style format is so useless.

But please, add support for JSON-LD RDF/OWL; it would be great and absolutely useful. But you need to learn to read: I am using CSL, Zotero and Citeproc as examples of software that is actively maintained and developed, and that has interoperability via a standard.

The fact that you think you can’t use that standard, doesn’t mean others don’t have huge benefits of it.

Gramps XML is “open” but not interchangeable with any other system without major programming skills, i.e. useless for anyone who doesn’t know at least one programming language. Open Data is not only about being able to “read” the data in a text editor, but about actually using the data without an extreme effort and high programming skills, something most “normal” users don’t have and will never have.
And even though a “new” version of GEDCOM has been released, that version is still lossy and has a lot of limitations.

I don’t care what “standard” is used for bibliography and citations in Gramps, as long as it is interoperable with other systems without the need for weeks of hacking or developing scripts to read the data… or, actually, I don’t really care at all at this point, because it is very clear that some things are never going to happen, and one of them is that some developers understand the benefit of interoperability for users outside the “inner circle of Linux users”, i.e. less than 2%.

Why is it that you are so afraid of interchangeable formats and interoperability between other research software?
