Possible Research Journal design and ideas

Yes. We would want to be able to export evidence as XML and then import it into another database.

PDF fillable forms have internally ID’d field names & metadata. (Most PDF forms have generic autogenerated field names & tab navigation orders.) There are Python libraries for reading data from & writing data to those fields.

Maybe just generate an XML DOM to associate with the PDF media object’s field & metadata? Then the data could be stored in XML ready for manipulation. And the Crosswalk could be matched to the DOM.
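A minimal sketch of that idea, assuming the form fields have already been read into a plain dictionary by whatever PDF library is used. The field names, the `form` and `field` element names, and both helper functions are made up for illustration; nothing here is an existing Gramps or PDF API.

```python
import xml.etree.ElementTree as ET


def fields_to_xml(form_name, fields):
    """Build an XML element holding one <field> child per form field."""
    root = ET.Element("form", {"name": form_name})
    for name, value in fields.items():
        field = ET.SubElement(root, "field", {"name": name})
        field.text = value
    return root


def xml_to_fields(root):
    """Read the <field> children back into a plain dictionary."""
    return {f.get("name"): (f.text or "") for f in root.findall("field")}


# Hypothetical field names, standing in for whatever a PDF library reports.
fields = {"surname": "Smith", "birth_date": "12 Jan 1874"}
root = fields_to_xml("birth_record", fields)

# Round trip: serialize to text (e.g. for export), then parse it again.
xml_text = ET.tostring(root, encoding="unicode")
restored = xml_to_fields(ET.fromstring(xml_text))
```

A crosswalk could then be a simple mapping from the `name` attributes to whatever the target model calls those fields.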

Yes, that’s exactly what I have in mind. And I just found a nice example in my own tree, when I found a police report from 1944, listing the arrest of a person, and his brother.

The entry is just 5 lines, but it provides a lot of data, like:

  • the date and time of the arrest
  • the place and nature of the offence (they had a hand cart with a lot of wood)
  • birth dates and places
  • a home address
  • their occupations

When you import all of this into the current lineage-linked conclusion model, you need to create a lot of different objects, and a quick count would lead to a dozen or so. What I mean is that for each person you need a person object which stores their name, and an event object with an associated place for their birth data, and another one for their address (residence event?), and yet another one for their profession (occupation event). And then you also need a family object to which you connect both as a child.

You can decrease the number of objects a lot when you introduce a subject object (pun intended) that stores all known facts of these people as attributes, so that you get a single object with first and last name, sex (or gender), birth date and place, current address, occupation, and role, where for the latter you designate one as the leader (or head) and the other as his brother.

With this approach, each evidence person (or subject) can be stored as a single row in a table, and indeed the same can be done for the event, by storing the location as a string inside the event object, instead of linking to our place object(s).
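To make the row-per-subject idea concrete, here is a rough sketch using an in-memory SQLite database. The table and column names are assumptions for illustration only, not an existing Gramps schema, and the names in the rows are placeholders rather than the people in the actual report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per evidence event; the place is kept as a plain string.
conn.execute(
    """CREATE TABLE evidence_event (
           id INTEGER PRIMARY KEY,
           type TEXT, date TEXT, time TEXT,
           place TEXT,          -- place exactly as written in the source
           description TEXT)"""
)

# One row per evidence subject, with all known facts as columns.
conn.execute(
    """CREATE TABLE evidence_subject (
           id INTEGER PRIMARY KEY,
           event_id INTEGER REFERENCES evidence_event(id),
           first_name TEXT, last_name TEXT, sex TEXT,
           birth_date TEXT, birth_place TEXT,
           address TEXT, occupation TEXT, role TEXT)"""
)

cur = conn.execute(
    "INSERT INTO evidence_event (type, date, place, description) VALUES (?, ?, ?, ?)",
    ("Arrest", "1944", "place as written", "hand cart with a lot of wood"),
)
event_id = cur.lastrowid

# Placeholder names; only the roles come from the example above.
conn.executemany(
    "INSERT INTO evidence_subject (event_id, first_name, last_name, role) "
    "VALUES (?, ?, ?, ?)",
    [(event_id, "First", "Example", "Leader"),
     (event_id, "Second", "Example", "Brother")],
)

rows = conn.execute(
    "SELECT first_name, role FROM evidence_subject WHERE event_id = ? ORDER BY id",
    (event_id,),
).fetchall()
```

That is three rows in total (one event, two subjects) instead of a dozen or so linked objects.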

I chose this arrest as an example, because I found that today, but the model is the same for the usual events, like births, baptisms, marriages, deaths, burials, and of course censuses.

Is that what you mean?

Why not link to the place objects? Then, for example, the References gramplet in the Places category view could show the evidence events as well as the conclusion events, and distinguish the two somehow. (Or there could be a separate Evidence References gramplet.)

Because you don’t always know what the actual place is. When in The Netherlands, it’s quite obvious that Harlingen refers to the town in our province of Friesland, but for a Dutchman in Texas, it might be Harlingen, TX. And in a similar way, the string New York may refer to the city, the county, or the state, and you often need more analysis to figure out which it is.

For this reason, I think that a place in a source should always be just a string, as you found it in the source document. The other reason is that I want evidence objects to be as independent as possible, so that I can use them in different databases, regardless of how well the place table has been curated.

I agree that might be a good approach for identifying separate claims in unstructured data.

The actual Source objects remain largely the same and would be used in common across the two submodels.

I am unsure about all that is actually needed in terms of additional database tables to make it work how I envision it. I have ideas, but I thought that when the time came I would experiment a bit to see what might work best. I may try to share something later when I have more time; I need to grab lunch and get back to my day job.

That sounds like the easiest way, modelling-wise, but since it’s one of my hidden pleasures to rock the boat, or ruffle a few feathers:

What about a single source object that has all the attributes that we find in the repository, source and citation? I mention that because most objects in the world outside the GEDCOM (and Gramps) model look like that. It can be an email stored as a file, a picture with metadata, or an object saved with EverNote, OneNote, or Zotero, or any other application that users may like.

Yes.

A person record would store each piece of information as a string along with a type. So you would have an event record for the arrest but not for the person’s occupation. This would just be stored as a string with the type “Occupation”. It could be used to create an event in the conclusion model.

The first step would be to record the place as a string. The second step would be to link it to a place record. This might be done at the same time as step 1, but maybe later.
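The two steps could look something like this. It is only a sketch: `EvidencePlace`, `place_handle` and the handle value `"P0001"` are made-up names, not part of the current Gramps data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvidencePlace:
    text: str                            # step 1: the place exactly as the source gives it
    place_handle: Optional[str] = None   # step 2: set once the place has been identified

    def link(self, handle: str) -> None:
        """Attach the evidence string to a place record in the conclusion model."""
        self.place_handle = handle

    @property
    def is_resolved(self) -> bool:
        return self.place_handle is not None


# Step 1: record the string as found.
harlingen = EvidencePlace("Harlingen")
before = harlingen.is_resolved

# Step 2, possibly much later: link it to a place record (hypothetical handle).
harlingen.link("P0001")
after = harlingen.is_resolved
```

The original string is never overwritten, so the evidence stays usable even when the link is missing or later turns out to be wrong.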

That would be a mistake.

A Source is a container for information, a Citation is a reference to information in the container usually in the form of a specific record. They are two different things with different purposes.

Using them for Heirlooms with no intrinsic information value in themselves is overloading them for a different purpose which is something you usually want to avoid.

Repository really is just a type of Place as mentioned in another thread.

@Nick-Hall I took a pass at trying to refine some of my thoughts this evening and I kind of envision things along these lines at the moment:


As you can see, I’m still hoping I can change your mind about Artifacts at some point.


My thinking is that, with structured content, the XML could be enhanced in such a way as to identify the subjects and the assertions about the subjects, so the data lands in the right place. I think the claim basically is the record in that scenario, i.e. a row on a census sheet, a birth certificate, etc. So something like your Forms Gramplet is perfect for that.

With unstructured content the claim will typically be a sentence or phrase in the material being evaluated. So a Claims editor would need to allow for entering or editing the Claim, letting the user enter a subject and then the assertions about that subject, and then the next subject and so on if there are multiple subjects involved.

I picture the assertion fields being used something as follows say for a birth record:

Subject: John Smith

Assertions:
Property          Type          Value        Date         Time        Place
Characteristic    Sex           Male
Experience        Birth                      12 Jan 1874  12:01 PM    Manhattan, New York, USA
Relationship      Son           Linda Smith
Rank              Child Number  4

Most likely we would need to use Attribute instead of Characteristic and Event instead of Experience because that is what people are used to, but I feel those words better define their intended use.
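As a rough sketch, the table above could map onto two small record types. The class and field names are assumptions for illustration (`prop` stands for the Property column); this is not an agreed design.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assertion:
    prop: str                      # Property column: Characteristic, Experience, ...
    type: str                      # Type column: Sex, Birth, Son, ...
    value: Optional[str] = None
    date: Optional[str] = None
    time: Optional[str] = None
    place: Optional[str] = None    # place as a plain string


@dataclass
class Subject:
    name: str
    assertions: List[Assertion] = field(default_factory=list)


john = Subject("John Smith", [
    Assertion("Characteristic", "Sex", value="Male"),
    Assertion("Experience", "Birth", date="12 Jan 1874", time="12:01 PM",
              place="Manhattan, New York, USA"),
    Assertion("Relationship", "Son", value="Linda Smith"),
    Assertion("Rank", "Child Number", value="4"),
])

# Pick out the assertions that would become events in the conclusion model.
experiences = [a for a in john.assertions if a.prop == "Experience"]
```

Keeping every assertion in the same shape means that one editor grid can handle all four property kinds.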

I just realized that Correlation would not have a list of Assertions, as those will be obtained from the Subjects. I need to think more about the whole Correlation piece; I feel like I may not have that modelled right.

Anyway I am sure there are things I have not considered, and I may be missing some attributes for the objects, but I hope it better shows where my current thinking is with it.

How about adding “remotely possible” to the confidence levels?

Also consider using something like the .pst email archive format for the correspondence. Being able to read & write it directly from an email package would open possibilities, such as keeping the thread alive rather than killing it & mounting it in a log. Likewise, vCard address books are better as a living document, only with the former addresses being shunted to the genealogy tool instead of merely being discarded.

An alternative would be to use the persona model. In this design, you would have a Persona table and an Experience table.

Experience

E1
Type: Birth
Date: 12 Jan 1874
Time: 12:01 PM
Place: Manhattan, New York, USA

Persona

P1
Name: John Smith
Sex: Male
Role: Primary → E1
Relationship: Son → P2

P2
Name: Linda Smith
Role: Mother → E1
Relationship: Mother → P1

These objects could then be built into a hierarchy.
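Here is a small sketch of the two tables from the example above, using dictionaries keyed by the IDs. The class names, the tuple layout for roles and relationships, and the `participants` helper are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Experience:
    id: str
    type: str
    date: Optional[str] = None
    time: Optional[str] = None
    place: Optional[str] = None


@dataclass
class Persona:
    id: str
    name: str
    sex: Optional[str] = None
    # (role, experience id) pairs, e.g. ("Primary", "E1")
    roles: List[Tuple[str, str]] = field(default_factory=list)
    # (relationship, persona id) pairs, e.g. ("Son", "P2")
    relationships: List[Tuple[str, str]] = field(default_factory=list)


experiences = {
    "E1": Experience("E1", "Birth", "12 Jan 1874", "12:01 PM",
                     "Manhattan, New York, USA"),
}
personas = {
    "P1": Persona("P1", "John Smith", "Male", [("Primary", "E1")], [("Son", "P2")]),
    "P2": Persona("P2", "Linda Smith", None, [("Mother", "E1")], [("Mother", "P1")]),
}


def participants(experience_id):
    """Return (name, role) for every persona linked to the given experience."""
    return [(p.name, role)
            for p in personas.values()
            for role, eid in p.roles
            if eid == experience_id]


birth_participants = participants("E1")
```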

In a model such as DeadEnds, this would also be used for conclusion objects. We are proposing to link the top level objects in these tables into our conclusion model.

It could be added, but it would be so speculative that you should probably not have a subject for it in the conclusion model in the first place. If you have a copy of Anderson’s book, see his definition of it on page 37. If not, I advise obtaining a copy and reading it.

These are implementation details. Something like that would probably be best as an add-on. Not everyone’s mail is stored in .pst files.

Very true. Which is why we have a plugin for database engines too. There are doubtless Python libraries for reading & writing most of the more popular email client formats. We could choose the one that is the best match for our open source license as the default. Then let add-ons evolve.

But, as you pointed out, “implementation detail”.

That is an approach you could take, yes.

I think it is not generalized enough, and I feel it is better to keep as much as possible ‘flat’. The linkage bundles/dossiers do build up a persona as well, but in a different manner.

I know what they are, but as far as I’m concerned, the existing model is inadequate, and I would like Gramps to support imports from EverNote, or Joplin, or Zotero, which IMO means that we need to depart from the citation-source-repository model, because that forces us into 3 layers that are not a good representation of the real world.

What I mean is that when I copy some text from a record on the web that was transcribed from a civil registry or a church book, I create a source for the book, put the page or record number and the date in the citation, and put the text in a note attached to the citation. And that means that the information is NOT contained in the source object, but in the citation. Also, the full citation is not in the citation object, which only has a date and page/volume, but is distributed among the repository, source, and citation objects. And that’s quite a mess. We’re used to it, because we adopted the GEDCOM model, but it’s still a mess.

When people use a reference manager, like Zotero, all reference data is stored in a single object, which can also hold a snapshot of the web page that is referenced, meaning that it can also act as a (copy of the) source. And one can argue that this is not a good idea, and the source should rather be treated as an attachment, which can also be a picture, or some other media, and you may also need to find a place for a transcription.

And no matter whether you separate these elements or not, they’re not the same as the GEDCOM source, citation, and repository objects, so I think that they should not be used in the evidence model.

I also don’t see much need to replicate Clooz, or Evidentie, or Centurial, meaning that I prefer to avoid the whole administration of claims, assertions, and other stuff like that. When I decide to link evidence to conclusions, my reason simply is ‘just because’, and I don’t need to explain that to anyone.

Or do I?

@Nick-Hall what do you think about sources and citations?

Okay yes I understand your argument.

It would make sense to just have a Source object and then some Catalog hierarchy it can belong to within a Repository. The other models do it that way; I think GEDCOM is the exception. I could be wrong though.

And that is fine, there is no need to use this at all and you could continue to use the lineage linked model. The citations provide the evidence there.

Or you could use this but choose not to document your analysis/rationale for correlating stuff.

The idea is to try to provide another set of tools, whether and how people use them is up to them.

With the GedcomForGeneanet addon developed by @grocanar I get this citation on Geneanet:

FR-21142, Chanceaux. Registres de catholicité, Registres. 1654-1775; Baptêmes, Mariages, Sépultures - Prêtres de la paroisse - FRAD021, Côte-d’Or. Archives départementales - http://www.archives.cotedor.fr/cms/archives-en-ligne.html - Collection départementale - FRAD021EC 148/009 - Electronic - 1761, fol. 2. Acte de baptême. Legoux, Antoine - Source de qualité très elevée - 13 JUN 1761 - URL - http://www.archinoe.fr/v2/ark:/71137/g3eb274b8d85540ab7e3bc7341082bf50/f4f20cd25bc7f167a49fcf80e2677ed2/373/ZnJhZDAyMV8xNDhfMmUxNDhhcnQwMDFfMDM3My5qcGc= - N° d’image - 373/478 - Consulté le - 2019-10-02

Where these fields come from:

I would appreciate it if an addon or an integrated function in Gramps could return the result I can see on Geneanet (I could probably do it with SuperTool on a citation).
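Something along these lines might be a starting point. It is only a sketch that joins the non-empty fields of three plain dictionaries with " - ", like the Geneanet string does. The dictionary keys, the field order and the `flatten_citation` function are assumptions; it does not use the Gramps or SuperTool API, and the values are placeholders.

```python
def flatten_citation(source, repository, citation, separator=" - "):
    """Join the non-empty values of the three records into one citation string."""
    parts = []
    for record in (source, repository, citation):
        # Skip empty fields so the output has no doubled separators.
        parts.extend(str(value) for value in record.values() if value)
    return separator.join(parts)


# Placeholder records; the real fields would come from the Gramps objects.
source = {"title": "Parish register", "author": ""}
repository = {"name": "Departmental archive"}
citation = {"page": "fol. 2", "date": "13 JUN 1761"}

text = flatten_citation(source, repository, citation)
```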
