Possible Research Journal design and ideas

Nick-Hall · October 19, 2022, 2:49pm

Yes. We would want to be able to export evidence as XML and then import it into another database.

emyoulation · October 19, 2022, 3:09pm

PDF fillable forms have internally ID’d field names & metadata. (Most PDF forms have generic autogenerated field names & tabbing navigation orders) There are Python libraries for polling & pushing data to/from those fields.

Maybe just generate an XML DOM to associate with the PDF media object’s field & metadata? Then the data could be stored in XML ready for manipulation. And the Crosswalk could be matched to the DOM.
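
A minimal sketch of that idea, using only Python's standard library. The element names (`evidence`, `field`), the media ID, and the field names are all hypothetical, and reading the values out of the PDF itself is left out:

```python
import xml.etree.ElementTree as ET

def fields_to_xml(media_id, fields):
    """Build an XML element that holds form field names and their values."""
    root = ET.Element("evidence", {"media": media_id})
    for name, value in fields.items():
        field = ET.SubElement(root, "field", {"name": name})
        field.text = value
    return root

# Hypothetical field names, as they might be read from a fillable PDF form.
fields = {"given_name": "John", "surname": "Smith", "birth_date": "12 Jan 1874"}
root = fields_to_xml("M0001", fields)
xml_text = ET.tostring(root, encoding="unicode")

# Round trip: parsing the XML gives back the same name/value mapping,
# which is what an export and re-import into another database relies on.
parsed = {f.get("name"): f.text for f in ET.fromstring(xml_text).iter("field")}
```

A crosswalk could then be expressed as a mapping from these field names to evidence attributes.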

ennoborg · October 19, 2022, 3:25pm

Yes, that’s exactly what I have in mind. And I just found a nice example in my own tree, when I found a police report from 1944, listing the arrest of a person, and his brother.

The entry is just 5 lines, but it provides a lot of data, like:

the date and time of the arrest
the place and nature of the offence (they had a hand cart with a lot of wood)
birth dates and places
a home address
their occupations

When you import all of this into the current lineage-linked conclusion model, you need to create a lot of different objects, and a quick count would lead to a dozen or so. What I mean is that for each person you need a person object which stores their name, and an event object with an associated place for their birth data, and another one for their address (residence event?), and yet another one for their profession (occupation event). And then you also need a family object to which you connect both as a child.

You can decrease the number of objects a lot, when you introduce a subject object (pun intended), that stores all known facts of these people as attributes, so that you get a single object with first and last name, sex (or gender …), birth date and place, current address, occupation, and role, where for the latter, you designate one as the leader (or head) and the other as his brother.

With this approach, each evidence person (or subject) can be stored as a single row in a table, and indeed the same can be done for the event, by storing the location as a string inside the event object, instead of linking to our place object(s).
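
As a rough illustration of the row count, here is a sketch with an in-memory SQLite database. The table and column names are assumptions, not an existing Gramps schema, and the values are placeholders rather than the real report's details:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE evidence_event (
        id INTEGER PRIMARY KEY,
        type TEXT, date TEXT, time TEXT,
        place TEXT,          -- place exactly as written in the source
        description TEXT
    );
    CREATE TABLE evidence_subject (
        id INTEGER PRIMARY KEY,
        event_id INTEGER REFERENCES evidence_event(id),
        first_name TEXT, last_name TEXT, sex TEXT,
        birth_date TEXT, birth_place TEXT,
        address TEXT, occupation TEXT, role TEXT
    );
""")

# One row for the arrest itself.
conn.execute(
    "INSERT INTO evidence_event VALUES (?,?,?,?,?,?)",
    (1, "Arrest", "1944", None, "<place as written>", "hand cart with wood"),
)
# One row per subject, with every known fact stored as a plain attribute.
subjects = [
    (1, 1, "<first>", "<last>", "M", "<date>", "<place>", "<address>", "<occupation>", "Head"),
    (2, 1, "<first>", "<last>", "M", "<date>", "<place>", "<address>", "<occupation>", "Brother"),
]
conn.executemany("INSERT INTO evidence_subject VALUES (?,?,?,?,?,?,?,?,?,?)", subjects)

event_rows = conn.execute("SELECT COUNT(*) FROM evidence_event").fetchone()[0]
subject_rows = conn.execute("SELECT COUNT(*) FROM evidence_subject").fetchone()[0]
row_count = event_rows + subject_rows
```

That is three rows for the whole entry, instead of the dozen or so objects that the conclusion model needs.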

I chose this arrest as an example, because I found that today, but the model is the same for the usual events, like births, baptisms, marriages, deaths, burials, and of course censuses.

Is that what you mean?

GeorgeWilmes · October 19, 2022, 3:52pm

Why not link to the place objects? Then, for example, the References gramplet in the Places category view could show the evidence events as well as the conclusion events, and distinguish the two somehow. (Or there could be a separate Evidence References gramplet.)

ennoborg · October 19, 2022, 4:14pm

Because you don’t always know what the actual place is. When in The Netherlands, it’s quite obvious that Harlingen refers to our province of Friesland, but for a Dutchman in Texas, it might be Harlingen, TX. And in a similar way, the string New York may refer to the city, the county, or state, and you often need more analysis to figure out what it is.

For this reason, I think that a place in a source should always be just a string, as you found it in the source document. And the other reason is that I want evidence objects to be as independent as possible, so that I can use them in different databases, regardless of how well the place table has been curated.

cdhorn · October 19, 2022, 4:40pm

I agree that might be a good approach for identifying separate claims in unstructured data.

The actual Source objects remain largely the same and would be used in common across the two submodels.

I am unsure about all that is actually needed in terms of additional database tables to make it work how I envision it. I have ideas, but thought that when the time came I would experiment a bit to see what might work best. I may try to share something later when I have more time; I need to grab lunch and get back to my day job.

ennoborg · October 19, 2022, 5:35pm

That sounds like the easiest way, modelling wise, but since it’s one of my hidden pleasures to rock the boat, or ruffle a few feathers …

What about a single source object, that has all the attributes that we find in the repository, source and citation? I mention that, because most objects in the world outside the GEDCOM (and Gramps) model look like that. It can be an email, stored as a file, a picture with meta data, or an object saved with EverNote, OneNote, or Zotero, or any other application that users may like.

Nick-Hall · October 19, 2022, 6:58pm

Yes.

A person record would store each piece of information as a string along with a type. So you would have an event record for the arrest but not for the person’s occupation. This would just be stored as a string with the type “Occupation”. It could be used to create an event in the conclusion model.

Nick-Hall · October 19, 2022, 6:59pm

The first step would be to record the place as a string. The second step would be to link it to a place record. This might be done at the same time as step 1, but maybe later.
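
A sketch of those two steps, assuming a hypothetical `EvidencePlace` class and a made-up place handle (neither exists in Gramps today):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvidencePlace:
    text: str                           # step 1: the place exactly as the source gives it
    place_handle: Optional[str] = None  # step 2: optional link to a curated place record

    def link(self, handle: str) -> None:
        """Attach a place record without altering the original string."""
        self.place_handle = handle

    @property
    def resolved(self) -> bool:
        return self.place_handle is not None

harlingen = EvidencePlace("Harlingen")   # recorded as found; Friesland or Texas is still open
was_resolved = harlingen.resolved
harlingen.link("P0042")                  # hypothetical handle, added after analysis
```

Because the string is kept unchanged, the evidence record stays usable even if the link is never made or the place table differs between databases.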

cdhorn · October 19, 2022, 9:40pm

That would be a mistake.

A Source is a container for information, a Citation is a reference to information in the container usually in the form of a specific record. They are two different things with different purposes.

Using them for Heirlooms with no intrinsic information value in themselves is overloading them for a different purpose which is something you usually want to avoid.

Repository really is just a type of Place as mentioned in another thread.

cdhorn · October 20, 2022, 4:22am

@Nick-Hall I took a pass at trying to refine some of my thoughts this evening and I kind of envision things along these lines at the moment:

As you can see I’m still hoping I can change your mind about Artifacts at some point…

My thinking is with structured content the xml could be enhanced in such a way as to identify the subjects and the assertions about the subjects so the data lands in the right place. I think the claim basically is the record in that scenario, ie a row on a census sheet, a birth certificate, etc. So something like your Forms Gramplet is perfect for that.

With unstructured content the claim will typically be a sentence or phrase in the material being evaluated. So a Claims editor would need to allow for entering or editing the Claim, allowing the user to enter a subject and then the assertions about that subject, and then the next subject and so on if there are multiples involved.

I picture the assertion fields being used something as follows say for a birth record:

Subject: John Smith

Assertions:
Property          Type          Value        Date         Time        Place
Characteristic    Sex           Male
Experience        Birth                      12 Jan 1874  12:01 PM    Manhattan, New York, USA
Relationship      Son           Linda Smith
Rank              Child Number  4

Most likely we would need to use Attribute instead of Characteristic and Event instead of Experience because that is what people are used to, but I feel those words better define their intended use.
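
The same birth record could be held in a structure along these lines. This is only a sketch; the `Assertion` class and its field names are assumptions taken from the column headings above, not a settled design:

```python
from typing import NamedTuple, Optional

class Assertion(NamedTuple):
    prop: str                    # Characteristic, Experience, Relationship, or Rank
    type: str
    value: Optional[str] = None
    date: Optional[str] = None
    time: Optional[str] = None
    place: Optional[str] = None  # kept as a plain string, as written in the record

claim = {
    "subject": "John Smith",
    "assertions": [
        Assertion("Characteristic", "Sex", value="Male"),
        Assertion("Experience", "Birth", date="12 Jan 1874", time="12:01 PM",
                  place="Manhattan, New York, USA"),
        Assertion("Relationship", "Son", value="Linda Smith"),
        Assertion("Rank", "Child Number", value="4"),
    ],
}

# Filtering by property picks out the event-like assertions.
experiences = [a for a in claim["assertions"] if a.prop == "Experience"]
```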

I just realized that Correlation would not have a list of Assertions, as those will be gotten from the Subjects. I need to think more about the whole Correlation piece; I feel like I may not have that modelled right.

Anyway I am sure there are things I have not considered, and I may be missing some attributes for the objects, but I hope it better shows where my current thinking is with it.

emyoulation · October 20, 2022, 5:36am

How about adding “remotely possible” to the confidence levels?

Also consider using something like the .pst email archive format for the correspondence. Being able to directly read & write to it from an email package would open possibilities. Like, enable the thread to be kept alive, rather than killed & mounted in a log. Likewise the vCard address books are better as a living document… only with the former addresses being shunted to the genealogy tool instead of merely being discarded.

Nick-Hall · October 20, 2022, 11:11am

An alternative would be to use the persona model. In this design, you would have a Persona table and an Experience table.

Experience

E1
Type: Birth
Date: 12 Jan 1874
Time: 12:01 PM
Place: Manhattan, New York, USA

Persona

P1
Name: John Smith
Sex: Male
Role: Primary → E1
Relationship: Son → P2

P2:
Name: Linda Smith
Role: Mother → E1
Relationship: Mother → P1

These objects could then be built into a hierarchy.

In a model such as DeadEnds, this would also be used for conclusion objects. We are proposing to link the top level objects in these tables into our conclusion model.
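
A sketch of the two tables as plain Python dictionaries, using the IDs from the example. The key names and the `participants` helper are hypothetical:

```python
experiences = {
    "E1": {"type": "Birth", "date": "12 Jan 1874", "time": "12:01 PM",
           "place": "Manhattan, New York, USA"},
}
personas = {
    "P1": {"name": "John Smith", "sex": "Male",
           "roles": [("Primary", "E1")], "relationships": [("Son", "P2")]},
    "P2": {"name": "Linda Smith",
           "roles": [("Mother", "E1")], "relationships": [("Mother", "P1")]},
}

def participants(experience_id):
    """Return (role, persona name) pairs for one experience."""
    return [(role, persona["name"])
            for persona in personas.values()
            for role, eid in persona["roles"]
            if eid == experience_id]

birth_participants = participants("E1")
```

Here the roles link personas to experiences, and the relationships link personas to each other, so the birth itself is stored once.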

cdhorn · October 20, 2022, 11:36am

It could be added, but it would be so speculative that you should probably not have a subject for it in the conclusion model in the first place. If you have a copy of Anderson’s book, reference his definition of it on page 37. If not, I advise obtaining a copy and reading it.

These are implementation details. Something like that would best be an add on probably. Not everyone’s mail is stored in .pst files.

emyoulation · October 20, 2022, 11:54am

Very true. Which is why we have a plugin for database engines too. There are doubtlessly Python libraries for reading & writing most of the more popular eMail client formats. We could choose the one that was the best match to our open source license as the default. Then let add-ons evolve.

But, as you pointed out, “implementation detail”

cdhorn · October 20, 2022, 12:10pm

That is an approach you could take, yes.

I think it is not generalized enough, and I feel it is better to keep as much as possible ‘flat’. Although the linkage bundles/dossiers do just that to build up a persona, but in a different manner.

ennoborg · October 20, 2022, 1:34pm

I know what they are, but as far as I’m concerned, the existing model is inadequate, and I would like Gramps to support imports from EverNote, or Joplin, or Zotero, which IMO means that we need to depart from the citation-source-repository model, because that forces us into three layers that are not a good representation of the real world.

What I mean is that when I copy some text from a record on the web, that was transcribed from a civil registry, or a church book, I create a source for the book, and put the page or record number, and the date, in the citation, and the text in a note attached to the citation. And that means that the information is NOT contained in the source object, but in the citation. Also, the citation is not in the citation, which only has a date and page/volume, but distributed among the repository, source, and citation objects. And that’s quite a mess. We’re used to it, because we adapted the GEDCOM model, but it’s still a mess.

When people use a reference manager, like Zotero, all reference data is stored in a single object, which can also hold a snapshot of the web page that is referenced, meaning that it can also act as a (copy of the) source. And one can argue that this is not a good idea, and the source should rather be treated as an attachment, which can also be a picture, or some other media, and you may also need to find a place for a transcription.

And no matter whether you separate these elements or not, they’re not the same as the GEDCOM source, citation, and repository objects, so I think that they should not be used in the evidence model.

I also don’t see much need to replicate Clooz, or Evidentie, or Centurial, meaning that I prefer to avoid the whole administration of claims, assertions, and other stuff like that. When I decide to link evidence to conclusions, my reason simply is ‘just because’, and I don’t need to explain that to anyone.

Or do I?

ennoborg · October 20, 2022, 1:40pm

@Nick-Hall what do you think about sources and citations?

cdhorn · October 20, 2022, 4:49pm

Okay yes I understand your argument.

It would make sense to just have a Source object and then some Catalog hierarchy it can belong to within a Repository. The other models do it that way; I think GEDCOM is the exception. I could be wrong though.

And that is fine, there is no need to use this at all and you could continue to use the lineage linked model. The citations provide the evidence there.

Or you could use this but choose not to document your analysis/rationale for correlating stuff.

The idea is to try to provide another set of tools, whether and how people use them is up to them.

PLegoux · October 20, 2022, 4:53pm

With the GedcomForGeneanet addon developed by @grocanar, I get this citation on Geneanet:

FR-21142, Chanceaux. Registres de catholicité, Registres. 1654-1775; Baptêmes, Mariages, Sépultures - Prêtres de la paroisse - FRAD021, Côte-d’Or. Archives départementales - http://www.archives.cotedor.fr/cms/archives-en-ligne.html - Collection départementale - FRAD021EC 148/009 - Electronic - 1761, fol. 2. Acte de baptême. Legoux, Antoine - Source de qualité très elevée - 13 JUN 1761 - URL - http://www.archinoe.fr/v2/ark:/71137/g3eb274b8d85540ab7e3bc7341082bf50/f4f20cd25bc7f167a49fcf80e2677ed2/373/ZnJhZDAyMV8xNDhfMmUxNDhhcnQwMDFfMDM3My5qcGc= - N° d’image - 373/478 - Consulté le - 2019-10-02

Where these fields come from:

Repository title: FRAD021, Côte-d’Or. Archives départementales
Repository URL: http://www.archives.cotedor.fr/cms/archives-en-ligne.html
Repository URL title: Collection départementale
Source title: FR-21142, Chanceaux. Registres de catholicité, Registres. 1654-1775; Baptêmes, Mariages, Sépultures
Source author: Prêtres de la paroisse
Source reference to repository: FRAD021EC 148/009
Source reference type: Electronic
Citation page: 1761, fol. 2. Acte de baptême. Legoux, Antoine
Citation quality: Source de qualité très elevée
Citation date: 13 JUN 1761
Citation attributes:
- URL: http://www.archinoe.fr/v2/ark:/71137/g3eb274b8d85540ab7e3bc7341082bf50/f4f20cd25bc7f167a49fcf80e2677ed2/373/ZnJhZDAyMV8xNDhfMmUxNDhhcnQwMDFfMDM3My5qcGc=
- N° d’image: 373/478
- Consulté le: 2019-10-02
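
The flattening can be sketched in plain Python as below. This is only an illustration of the field order seen in the Geneanet result, not the addon's actual code; the dictionary keys and the `flatten_citation` function are hypothetical:

```python
def flatten_citation(repository, source, citation):
    """Join repository, source, and citation fields into one ' - ' separated line."""
    parts = [
        source["title"], source["author"],
        repository["title"], repository["url"], repository["url_title"],
        source["call_number"], source["media_type"],
        citation["page"], citation["quality"], citation["date"],
    ]
    # Each citation attribute contributes its label and then its value.
    for key, value in citation["attributes"].items():
        parts += [key, value]
    return " - ".join(parts)

repository = {
    "title": "FRAD021, Côte-d’Or. Archives départementales",
    "url": "http://www.archives.cotedor.fr/cms/archives-en-ligne.html",
    "url_title": "Collection départementale",
}
source = {
    "title": "FR-21142, Chanceaux. Registres de catholicité, Registres. "
             "1654-1775; Baptêmes, Mariages, Sépultures",
    "author": "Prêtres de la paroisse",
    "call_number": "FRAD021EC 148/009",
    "media_type": "Electronic",
}
citation = {
    "page": "1761, fol. 2. Acte de baptême. Legoux, Antoine",
    "quality": "Source de qualité très elevée",
    "date": "13 JUN 1761",
    "attributes": {
        "URL": "http://www.archinoe.fr/v2/ark:/71137/g3eb274b8d85540ab7e3bc7341082bf50/"
               "f4f20cd25bc7f167a49fcf80e2677ed2/373/"
               "ZnJhZDAyMV8xNDhfMmUxNDhhcnQwMDFfMDM3My5qcGc=",
        "N° d’image": "373/478",
        "Consulté le": "2019-10-02",
    },
}

line = flatten_citation(repository, source, citation)
```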

I would appreciate it if an addon or an integrated function in Gramps could return the result I can see on Geneanet (I could probably do it with SuperTool on a citation).
