Proposed changes to Source and Citation objects to better support evidence analysis

I have been giving some thought to sources and citations in Gramps with respect to the Evidence Analysis Process Map from Evidence Explained among other things.

There are some changes I think might be beneficial to make and I wanted to solicit feedback from the Development team as to whether others see them as useful and whether a PR might be accepted for them.

Source Objects

  • The Source object schema would be extended to add a SourceType.

  • The SourceType would allow the source to be identified as an original record, derivative record, authored narrative, or unknown.

Citation Objects

  • The Citation object schema would be extended to add InformationType, EvidenceType, and an assertion_list containing AssertionType objects.

  • The InformationType would allow the information cited to be identified as primary (firsthand) information, secondary (secondhand) information, or information of an undetermined origin.

  • The EvidenceType would allow the information cited to be identified as providing direct, indirect, negative or unknown evidence.

  • An Event is actually a collection of assertions, and those are not modeled separately in Gramps, so the assertion_list would contain a list of AssertionType objects to indicate which assertions the citation supports.

  • An AssertionType would classify an assertion as being about a type (either an attribute type or an event type), a date, a place, a name, or a relationship. Note that while relationships have types, I think assertions about them need to be identified separately.
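As a rough illustration of the two lists above, the proposed types could be sketched in plain Python as follows. This is not Gramps code: Gramps has its own conventions for typed values, which this sketch does not attempt to follow, and the class names SourceSketch and CitationSketch, the field names, and the sample census values are all made up for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SourceType(Enum):
    UNKNOWN = "unknown"
    ORIGINAL = "original record"
    DERIVATIVE = "derivative record"
    AUTHORED = "authored narrative"


class InformationType(Enum):
    UNDETERMINED = "undetermined"
    PRIMARY = "primary"      # firsthand information
    SECONDARY = "secondary"  # secondhand information


class EvidenceType(Enum):
    UNKNOWN = "unknown"
    DIRECT = "direct"
    INDIRECT = "indirect"
    NEGATIVE = "negative"


class AssertionType(Enum):
    TYPE = "type"  # an attribute type or an event type
    DATE = "date"
    PLACE = "place"
    NAME = "name"
    RELATIONSHIP = "relationship"


@dataclass
class SourceSketch:
    """Hypothetical stand-in for a Source with the proposed SourceType."""
    title: str
    source_type: SourceType = SourceType.UNKNOWN


@dataclass
class CitationSketch:
    """Hypothetical stand-in for a Citation with the proposed additions."""
    source: SourceSketch
    page: str
    information_type: InformationType = InformationType.UNDETERMINED
    evidence_type: EvidenceType = EvidenceType.UNKNOWN
    assertion_list: List[AssertionType] = field(default_factory=list)


# Made-up sample data, for illustration only.
census = SourceSketch("Census return (example)", SourceType.ORIGINAL)
citation = CitationSketch(
    source=census,
    page="sheet 12, line 4",
    information_type=InformationType.SECONDARY,
    evidence_type=EvidenceType.INDIRECT,
    assertion_list=[AssertionType.NAME, AssertionType.DATE, AssertionType.PLACE],
)
```

Defaulting every classification to its unknown or undetermined member means existing sources and citations would not need to be classified before they can be loaded.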

I think the above would be sufficient to develop some kind of evidence analysis report for use in helping formulate a genealogical proof argument.

It might also be used to generate a metric that better measures the quality of the information. I know the “confidence” attribute is intended to capture that, but it is a subjective judgment made by the researcher.

A written proof argument is of course saved as a note, and I imagine most people use the “Report” note type for that purpose. But it might also be nice to have a specific “Proof Argument” type for notes as well to make them easy to identify. An evidence analysis report would also be able to identify them then.


I would love to have an evidence analysis report in Gramps. I think you could prototype your idea without any data model changes, simply by using Attributes on the Source and Citation objects, by storing the various pieces of information from a citation in appropriate attributes (on Persons, Events, etc.), and by attaching citations to names, childrefs, etc.


Yes that could be one way. I suppose for the assertion_list nothing stops me from having a jsonified string as an attribute value.
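A minimal sketch of that idea, using only the standard library: the assertion list is serialized to a JSON string that could be stored as a single attribute value. The attribute names ("Assertions", "Information Type", "Evidence Type") are hypothetical, and the list of (type, value) pairs is just a stand-in for a citation's attributes, not the Gramps attribute API.

```python
import json

# Hypothetical attribute name; nothing in Gramps defines it.
ASSERTIONS_KEY = "Assertions"


def encode_assertions(assertions):
    """Serialize a list of assertion type names into one attribute value."""
    return json.dumps(list(assertions))


def decode_assertions(value):
    """Return the list stored in an attribute value, or [] if it is empty."""
    return json.loads(value) if value else []


# Stand-in for a citation's attributes: (attribute type, value) pairs.
citation_attributes = [
    ("Information Type", "secondary"),
    ("Evidence Type", "direct"),
    (ASSERTIONS_KEY, encode_assertions(["name", "date", "place"])),
]

stored = dict(citation_attributes)
assertions = decode_assertions(stored[ASSERTIONS_KEY])
```

One trade-off of this approach is that the JSON string is opaque to the existing filters, so any report or gramplet using it would have to decode the value itself.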


Looking at this again after a couple of days, I think it is better handled a little differently.

For the Source object there would also be a CredibilityType. I realized this may make sense to add after looking at Tony Proctor’s STEMMA model again. Credibility would be expert, questionable, trusted, unsubstantiated, or unknown.

For the Citation object there would only be the addition of assertion_list. It would be a list of Assertion objects, not just types.

An Assertion object would then have a number of attributes: type, subject, value, information type, and evidence type, with the InformationType and EvidenceType types kept for the last two. I think this makes sense after looking at the old GENTECH model again. It might also have a boolean conclusion indicator.
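The revised design might look something like the sketch below. It is plain Python rather than Gramps code, and everything beyond the attribute names listed above is an assumption: the class names SourceSketch and CitationSketch, the with_evidence helper, the use of strings for subject and value, and the sample data are all invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class CredibilityType(Enum):
    UNKNOWN = "unknown"
    EXPERT = "expert"
    TRUSTED = "trusted"
    QUESTIONABLE = "questionable"
    UNSUBSTANTIATED = "unsubstantiated"


class InformationType(Enum):
    UNDETERMINED = "undetermined"
    PRIMARY = "primary"
    SECONDARY = "secondary"


class EvidenceType(Enum):
    UNKNOWN = "unknown"
    DIRECT = "direct"
    INDIRECT = "indirect"
    NEGATIVE = "negative"


@dataclass
class Assertion:
    type: str      # e.g. "name", "date", "place", "relationship"
    subject: str   # who or what the assertion is about
    value: str     # what is asserted
    information_type: InformationType = InformationType.UNDETERMINED
    evidence_type: EvidenceType = EvidenceType.UNKNOWN
    conclusion: bool = False  # True if this records a research conclusion


@dataclass
class SourceSketch:
    title: str
    credibility: CredibilityType = CredibilityType.UNKNOWN


@dataclass
class CitationSketch:
    source: SourceSketch
    page: str
    assertion_list: List[Assertion] = field(default_factory=list)

    def with_evidence(self, evidence_type):
        """Return the assertions classified with the given evidence type."""
        return [a for a in self.assertion_list if a.evidence_type is evidence_type]


# Made-up sample data, for illustration only.
narrative = SourceSketch("Family history narrative (example)",
                         CredibilityType.QUESTIONABLE)
citation = CitationSketch(
    source=narrative,
    page="p. 17",
    assertion_list=[
        Assertion("name", "child", "John Example",
                  InformationType.SECONDARY, EvidenceType.DIRECT),
        Assertion("date", "birth of child", "about 1850",
                  InformationType.SECONDARY, EvidenceType.INDIRECT),
    ],
)
direct = citation.with_evidence(EvidenceType.DIRECT)
```

Moving the information and evidence classifications onto each Assertion lets one citation carry a mix of primary and secondary information, which a single citation-level field could not express.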

At least these are my current thoughts.

I had not seen the STEMMA and GENTECH models before, but will study them now. Thanks for making me aware of them.

When I think about trying to attach information type and credibility type to my sources, I struggle with how to do that for records containing pieces of information having different types. It seems each piece of information would need its own citation in that case.

For example, in a death certificate, the information provided by the doctor/coroner/etc. would be considered primary and presumably expert, whereas other information about the deceased (provided by some other informant) would be considered secondary, and the credibility for each piece of that information could vary.

In my mind the citation references the portion of the source record you are extracting information from, and each piece of information extracted is an assertion. Each piece of information may come from a primary, secondary, or unknown informant, and that is a measure of its reliability or credibility. Each piece of information may also provide direct, indirect, or negative evidence of something. That is why a citation would have one or more assertions associated with it.

In some sense the confidence measure associated with the citation should perhaps live at the assertion level as well, so that you can capture your own subjective measure for each assertion should you choose.
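One way to picture assertion-level confidence is as an optional override that falls back to the citation's value. The sketch below applies this to the death certificate example. It assumes a five-step integer scale from 0 (very low) to 4 (very high), and the class names, the informant field, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assertion:
    subject: str
    value: str
    informant: str
    information_type: str
    confidence: Optional[int] = None  # None means "inherit from the citation"


@dataclass
class CitationSketch:
    page: str
    confidence: int = 2  # assumed scale: 0 (very low) to 4 (very high)
    assertion_list: List[Assertion] = field(default_factory=list)

    def effective_confidence(self, assertion):
        """Use the assertion's own confidence if set, else the citation's."""
        if assertion.confidence is None:
            return self.confidence
        return assertion.confidence


# Made-up death certificate data, for illustration only.
certificate = CitationSketch(
    page="certificate no. (example)",
    confidence=2,
    assertion_list=[
        Assertion("cause of death", "pneumonia", "physician", "primary", 4),
        Assertion("birth date", "about 1850", "son of deceased", "secondary", 1),
        Assertion("name", "John Example", "son of deceased", "primary"),
    ],
)

levels = [certificate.effective_confidence(a) for a in certificate.assertion_list]
```

With this fallback, existing citations keep working unchanged, and a researcher only records a per-assertion value where it differs from the citation as a whole.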

The measure of credibility for the source is intended to apply to the source as a whole and may not be applicable to all sources. For example, the source type might be an authored narrative work, and the author might be considered an expert in their field, like Robert Charles Anderson. Or it could be authored by someone you have never heard of, in which case it may contain more questionable material that you would want to confirm for yourself. Or some of the stories passed down about the family history perhaps came from the uncle everyone knew liked to spin tall tales, so there could be some truth in them but they are largely unsubstantiated.

I am not sure if anyone is aware of these two evidence-based genealogy applications. They both take a different approach, building from the bottom up instead of from the top down:

https://www.centurial.net/en/blog/2019/2/15/evidence-based-genealogy-part-1-what-is-evidence

https://evidentiasoftware.com/evidentia-step-step-part-1/

Clooz is also a research tool, with a more document (source) focused approach.

The new version 4 will have a lot of new features and looks really great (I’m beta testing it).

It will sync with both Legacy and RootsMagic, and can read and write GEDCOM to “sync” with other software.
At the moment it is Windows only.

The problem with Centurial, Evidentia and Clooz is that none of them syncs with the Gramps database or Gramps XML, and GEDCOM is a lossy, limited file format intended for lineage-linked research only.

Thank you, I was not aware of that one and will take a look at it now too!

Each of these has been useful to examine.

The Centurial approach clearly recognizes that correlation is part of the analysis process that turns an assertion into evidence.

The Clooz document construct captures all of the information in a source citation and in doing so all the possible assertions. The process of merging people is the correlation step.

The Evidentia approach walks through the analysis of the information, breaking the citation information down into claims and further into specific assertions classifying things along the way. Claims in Evidentia are higher order assertions about assertions, something the GENTECH model recognizes, that help capture more of the information analysis process. The whole process flow really forces you to document your analysis and conclusions.

In all cases assertions are about a persona, and the conclusions about a person are built up from combining or merging the personas.

To support this better it seems like a separate Persona object is needed, and the Person object would represent the research conclusions and have a corresponding persona_list maybe.

I guess what I’m trying to get my head around is whether Gramps can evolve to support both the legacy lineage-linked model and this other sort of evidence-linked or document-linked model at the same time.

This, too, is something you could prototype in the current data model. You could create multiple Person objects for a single (real-life) person, and give all but one of them a tag or attribute identifying them as a “persona” so that you can easily filter them out of views, reports, etc. Then, create Associations from the real person to the personae. Anything you do to the real person would be considered your conclusions, while anything attached to the persona would be all of the evidence. You would only need to do this as the need arises, and probably not for the vast majority of your people.
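The prototyping approach described above can be sketched with simple stand-in objects. This does not use the Gramps API: PersonSketch, the "persona" tag name, and the use of handles in an associations list are assumptions made only to show the filtering and lookup logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

PERSONA_TAG = "persona"  # hypothetical tag name


@dataclass
class PersonSketch:
    handle: str
    name: str
    tags: Set[str] = field(default_factory=set)
    associations: List[str] = field(default_factory=list)  # handles of associated people


def conclusions_only(people: Dict[str, PersonSketch]) -> List[PersonSketch]:
    """Filter out personae so views and reports see only conclusion persons."""
    return [p for p in people.values() if PERSONA_TAG not in p.tags]


def personas_of(person: PersonSketch,
                people: Dict[str, PersonSketch]) -> List[PersonSketch]:
    """Follow associations from a real person to the tagged personae."""
    return [people[h] for h in person.associations
            if PERSONA_TAG in people[h].tags]


# Made-up sample data: one real person with two personae holding the evidence.
people = {
    "P1": PersonSketch("P1", "John Example", associations=["P2", "P3"]),
    "P2": PersonSketch("P2", "Jno. Example (census)", tags={PERSONA_TAG}),
    "P3": PersonSketch("P3", "John Exampel (death cert.)", tags={PERSONA_TAG}),
}

real_people = conclusions_only(people)
evidence_personas = personas_of(people["P1"], people)
```

Because the personae are ordinary person records distinguished only by a tag, they can be added just for the people who need them, as suggested above.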

I haven’t thought through all of the implications for Families and Events, but again this is just for purposes of prototyping, not an actual solution (unless it turns out to be surprisingly robust). As part of the effort, you’d probably need to customize or create a few gramplets (or if you’re more comfortable with SQL than Python, export your database to SQLite and write queries).


Yeah that could work for prototyping.

I see that GEPS 040 considers Persona. I was also not aware of the Form Gramplet and am starting to look at that now.