Possible Research Journal design and ideas

@StoltHD @Nick-Hall so this is another of many things I have been thinking about lately. At the moment I think I would model the core of it along these lines:

Contact is not shown here, but would basically be Researcher promoted to a top-level table object. I do not think you can use Person records for it, as contacts might also include things like town clerks, funeral homes, genealogical societies and so forth. There would most likely be a way to associate a contact with a Person or Repository, though. The name Contact might not be a good one either; in the DeadEnds model the word Entity is used, and I think in GedcomX the word Agent, but I may misremember.

This is basically a pared-down and then modified version of the GeneTech Administration submodel. I added Session because when I have travelled to do on-site research I have often had multiple objectives for multiple parts of my family, and have at times checked some things for other people. And I added Correspondence because that is relevant as well.

While I think all of these end up being table objects, none of them would get their own category; the category would be Journal or something like that, and then there would likely be a project view, a session view and a correspondence view.

I recognize it would be desirable to link the objective to the research subject(s) it applies to. This was just an initial iteration to get a feel for the data at a higher level. A proof of concept would really need to be put together to better refine things.
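To make the shape of this more concrete, here is a very rough sketch of how these Journal objects might look as simple table-backed records. Everything here is illustrative only; none of these classes or fields exist in Gramps, and a real design would certainly differ.

```python
# Hypothetical sketch of the proposed Journal table objects.
# Class and field names are illustrative, not an actual Gramps API.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contact:
    """A researcher, town clerk, society, funeral home, etc."""
    handle: str
    name: str
    person_handle: Optional[str] = None      # optional link to a Person record
    repository_handle: Optional[str] = None  # optional link to a Repository


@dataclass
class Objective:
    """A single research goal, ideally linked to its research subject(s)."""
    handle: str
    description: str
    subject_handles: List[str] = field(default_factory=list)
    status: str = "open"


@dataclass
class Session:
    """One on-site or online research session; may serve several objectives."""
    handle: str
    date: str
    location: str
    objective_handles: List[str] = field(default_factory=list)
    notes: str = ""


@dataclass
class Correspondence:
    """An exchange with a contact about one or more objectives."""
    handle: str
    contact_handle: str
    date: str
    summary: str
    objective_handles: List[str] = field(default_factory=list)


@dataclass
class Project:
    """Top-level container grouping objectives, sessions and correspondence."""
    handle: str
    title: str
    objective_handles: List[str] = field(default_factory=list)
    session_handles: List[str] = field(default_factory=list)
    correspondence_handles: List[str] = field(default_factory=list)
```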

I’d greatly value both of your thoughts around this. Note this isn’t something I had in mind to tackle in the near future, but it would be good to get feedback on now and who knows maybe someone else will see this and get inspired.


How do you relate that structure to other Gramps objects? Something like that?


Is this project open source? We need to be very careful if we copy part of an existing design.

Please read Howto: Contribute to Gramps if you need a reminder of the requirements for contributions.

See GEPS 015: Repository Research Support. This discusses creating a research plan.

The actual research could then be recorded in a journal as you suggest. In the past this would have been in the form of a written journal. I can probably find out how this was done, but each source consulted would be recorded, along with the results. Perhaps an activity at its lowest level should be a search for one person in a single source?

A final step would involve recording information from the source. Index cards were often used for this purpose.
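As a purely illustrative sketch of that lowest-level activity (nothing like this exists in Gramps today), an entry might record little more than the source searched, the person searched for, and the outcome, with an optional link to whatever information was recorded:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchActivity:
    """Hypothetical lowest-level journal entry: one person searched in one source."""
    source_handle: str                     # the source consulted
    person_searched: str                   # name or handle of the research subject
    date_searched: str
    found: bool
    result_note: str = ""                  # what was found, or why nothing was found
    extract_handle: Optional[str] = None   # later link to recorded information, if any
```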

I desperately hope that you intend to separate our ‘research process’ from our actual research data.

My data in Gramps is intended for sharing. My communication with other researchers is often sensitive with the implicit understanding that parts will remain private. And the tasks can span different Trees.

I have test Trees with pseudo data; collaboration Trees with verified research; and my (currently mangled into uselessness) primary tree for collating theoretical relationships and questionable sources.

As Nick pointed out, some of what is intended exists in other tools. Perhaps working on the API with an eye toward better hooks for integration would have greater return than building a new structure inside the Tree data model?

We already have tools for managing threaded communication that include scheduling, contact managers with Address books (in a format Gramps can already import/export) and reliable archiving functionality. Many email packages cover those bases.

But what they DON’T have is a way to crosslink data in various Gramps Trees and get a report that helps plan/track the research.

I misspelled that: it is the GenTech Administrative submodel. It is a data model, not a project.


I will check this over.

In my mind this is intended for tracking research tasks, not the actual data produced from those tasks.

I saw it sitting in the same database, but that is a good point: tasks and work could span databases depending on how people use them. So perhaps it could be a separate, self-contained database.

Think of it like a side car. Gramps core is Gramps core and it provides a lineage linked model. This adds functionality on the side parallel to that. This might reference objects in that model, but that model would not really know about this one.
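One minimal way to picture that side-car arrangement, purely as an assumption about how it could be wired up: the journal lives in its own store and only holds handles pointing into a Gramps tree, so the core model never needs to know about it.

```python
import sqlite3

# Hypothetical side-car store: a separate SQLite file that references
# objects in a Gramps tree by handle, without the tree knowing about it.
con = sqlite3.connect("research_journal.db")
con.execute(
    """CREATE TABLE IF NOT EXISTS objective (
           handle      TEXT PRIMARY KEY,
           description TEXT,
           tree_id     TEXT,   -- which Gramps tree the subject lives in
           subject     TEXT    -- handle of a Person/Family in that tree
       )"""
)
con.execute(
    "INSERT OR REPLACE INTO objective VALUES (?, ?, ?, ?)",
    ("obj0001", "Locate the 1880 census entry", "my_tree", "some_person_handle"),
)
con.commit()
```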

As I see it, we want to provide as many tools to assist with genealogical research as we can, but we also need to make those things optional and not force them on users.


The article Research Reports for Research Success by Elizabeth Shown Mills is worth reading.

Yes that is useful and gives me more to think about. Thank you.

@StoltHD thank you so much for the long and informative reply.

The half I am missing here and that you refer to would be the evidence based submodel.

I do have a rough model sketched out for that too and trying to implement something like that is my long term goal. This research journal stuff would tie into that but not in the initial iteration.

What I would like to see for Gramps someday is a framework where a couple of different evidence-based methodologies could be supported, so you could choose to use an Evidentia or Centurial style workflow, or the Linkage Bundle/Dossier workflow Anderson presents in his Elements of Genealogical Analysis, if you wanted to. I will probably try to implement the latter before the former, actually.

I am not a developer of Gramps, so I can’t help you with any “sub-models”…

I tried to give some ideas, but the same person started again, so…

@cdhorn When I look at this, and think about Clooz, Evidentia, and Centurial, I think that the evidence-based submodel is more important than anything else, and should be no more complicated than Tom Wetmore’s DeadEnds model as referenced here (note that the 2nd link doesn’t work)

https://archive.fhiso.org/BetterGEDCOM/DeadEnds+Model

and that we must re-use existing Gramps objects as much as possible. We may need to provide an API for this, like @emyoulation wrote, and the whole concept should be no more complex than the use of index cards, as referenced by @Nick-Hall

Anything more complex than that is likely to fail, because most users don’t want to enter data in two places like they have to do in Clooz or Evidentia, which don’t integrate well with other software because they still rely on GEDCOM, and they are closed source. I tried both (although I didn’t try the Clooz 4 referenced by @StoltHD) and they’re far more complex than I like.

My experience with Centurial was so bad that I gave up on it even faster than on Clooz and Evidentia, partly because the program had the slowest GEDCOM import I have ever seen, and also because after waiting more than an hour for the import to finish, I noticed that it hadn’t even imported my citations, so I deemed it unfit for the job. I also just read that the project is currently on hold due to personal circumstances, and the refactoring mentioned on the roadmap is definitely needed to get things right.

As far as I’m concerned, some of the author’s ideas looked better than the ones I found behind Clooz and Evidentia, but it is closed source, so I would never adopt it anyway.

And even though I like to keep things as simple as DeadEnds, I really like these ideas:

And I’ll do almost anything to prevent myself from becoming a bureaucrat.

Yes, I am overthinking and what I had in mind was overcomplicating things.

Again, it was a bad idea to try to help out with some ideas

Please don’t think that.

There are always many ways to do things, all with pros and cons and compromises, and it is a fair question to ask whether things can be done within certain parameters.

Once you have an entrenched data model in a design, accumulated technical debt, and concerns about backwards compatibility, it tends to place bounds on things.


@cdhorn In an earlier message, you mentioned an evidence based model that you were thinking of yourself. Can you elaborate on that a bit?

When I look at this, DeadEnds would be an obvious candidate, because it has everything that I need, and nothing more than that. I also gave Clooz and Evidentia another try today, but they have no added value for me: they require way too much useless input, they’re closed source, and they don’t speak my language. The only program that does speak it is also closed source, and besides, I don’t care much for the so-called Genealogical Proof Standard either. And in my (local) community, I don’t know anyone who really cares about that, or about Clooz or Evidentia, nor about Centurial. People use spreadsheets instead, just like my late dad, and they use those because they’re way easier than any piece of dedicated software on the market.

And it seems to me that spreadsheet-like research data also appeared in the examples shown here earlier. Is that right?

P.S. For me, the research journal and logs can reside in the same database as the tree and the raw data.

If you read through the DeadEnds thread there is a comment from mstransky on 2010-12-02T06:28:17-08:00 that kind of hits the nail on the head. And I think I’ve mentioned it here before as well: I think the actual research data should be kept separate from the conclusions altogether.

So my thinking is you would store both structured and unstructured content outside the conclusional lineage linked model.

A tool like the Forms gramplet wouldn’t update objects in the existing model; it would store the structured data here. This is very much the Clooz approach. Being structured data, the claims are pretty much known because they are inherent in the structure of the document. You know the person name in a census record is the subject, so you know the birth year is a claim about that subject.
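As a rough illustration, a form-driven extract from a census row might be stored like this; the field and claim names are invented just to show the idea:

```python
# Hypothetical extract from a structured source (a census household row).
# Because the form is structured, each field is implicitly a claim about
# the named subject; nothing here touches the conclusional model.
census_row = {
    "source_handle": "S0042",
    "subject": "John Smith",      # the persona named in the record
    "fields": {
        "age": "34",
        "birthplace": "Ohio",
        "occupation": "Farmer",
    },
}

# The same data viewed as individual claims about the subject.
claims = [
    {
        "subject": census_row["subject"],
        "claim": key,
        "value": value,
        "source_handle": census_row["source_handle"],
    }
    for key, value in census_row["fields"].items()
]
```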

You also need to account for unstructured data, of course, which ideally you would have transcribed from a source. There then needs to be a way to identify the claims in the data or the source, or enter them directly, and the subjects they apply to. If I recall, this is discussed a little in the GenTech model somewhere, but it may have been elsewhere. Maybe you can highlight the claim in the transcription to identify it, for example, stuff like that.

Once you know the claims, and the subjects they apply to, you can do a number of things with them and the data. The subject names are just names, but they basically represent the ‘personas’ without having their own dedicated objects. You might use an object representation for working with the data within Gramps, but ideally it would be stored in normal tables in the database, not as pickled or JSON objects, so external tools could potentially be used with it.
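Storing the extracted material in ordinary tables might look something like the following; the table and column names are only a guess at one possible layout, not a proposal for the actual schema:

```python
import sqlite3

# Hypothetical plain-table layout for extracted evidence, readable by
# external tools (no pickled blobs). Names are illustrative only.
con = sqlite3.connect("evidence.db")
con.executescript(
    """
    CREATE TABLE IF NOT EXISTS persona (
        id        INTEGER PRIMARY KEY,
        name      TEXT,        -- the subject name exactly as it appears in the source
        source_id TEXT         -- handle of the source it came from
    );

    CREATE TABLE IF NOT EXISTS claim (
        id         INTEGER PRIMARY KEY,
        persona_id INTEGER REFERENCES persona(id),
        predicate  TEXT,       -- e.g. 'birth_year', 'occupation'
        value      TEXT,
        source_id  TEXT
    );
    """
)
con.commit()
```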

One approach for working with the data might be an Evidentia style interface that walks you through extracting the assertions and documenting the analysis and the resulting conclusion and all.

Another approach would be the ability to assemble the claims into linkage bundles, and those into dossiers. This is the methodology documented in Elements of Genealogical Analysis.

Whatever tools are devised to help work with the data, you would be able to correlate or associate any extracted subjects in this model with a conclusional subject in the lineage linked model. If you later find that a claim did not apply to that conclusional person, you just uncorrelate the data and optionally document why. You never delete the data; it is a repository that grows over time and contains everything that supports what you enter in the conclusional model and the reasoning behind it.
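One way to picture that correlation step, again only as a sketch with assumed table names: a link row ties a persona to a conclusional person by handle, and removing the link just deactivates it and records the reasoning, so the underlying evidence is never deleted.

```python
import sqlite3

# Hypothetical correlation link between an extracted persona and a
# conclusional Person; deactivating the link records the reasoning
# instead of deleting anything.
con = sqlite3.connect("evidence.db")
con.execute(
    """CREATE TABLE IF NOT EXISTS correlation (
           persona_id    INTEGER,
           person_handle TEXT,
           active        INTEGER,
           note          TEXT
       )"""
)

def correlate(persona_id, person_handle, note=""):
    con.execute(
        "INSERT INTO correlation VALUES (?, ?, 1, ?)",
        (persona_id, person_handle, note),
    )

def uncorrelate(persona_id, person_handle, reason):
    con.execute(
        "UPDATE correlation SET active = 0, note = ? "
        "WHERE persona_id = ? AND person_handle = ?",
        (reason, persona_id, person_handle),
    )
```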

As mentioned, I picture this sitting in parallel to the existing lineage linked model. Users who do not want to use any of this would be free to do things as they always have, or they could mix and match. It would be just another set of tools in the toolbox.

The research journal idea would tie into this as the search process could include support for extracting and recording the data and not just keeping some high level notes about the search and what was found and what was not.

In the end you would have three submodels in Gramps, so to speak: the administrative research submodel, the evidence submodel, and the current conclusional lineage linked submodel, but hopefully with a few enhancements like groups and hierarchies for events, groups and sources.

These are my high level thoughts around things and how they might be someday. How well would it all work? Is it really practical? I honestly don’t know.


Keeping the evidence model separate from the conclusions seems like the way to go.

I always envisaged that the Forms would eventually store their data somewhere else. Another method of data entry might be to apply markup to a transcription.

Perhaps we could share the source structure and just create a couple of new tables for evidence people and events? Relationships could be just direct links between people.
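A very small sketch of what that might look like, assuming SQLite-style tables and purely illustrative names, with evidence people pointing at existing source handles and relationships recorded as direct links between evidence people:

```python
import sqlite3

# Hypothetical tables for evidence people and events, reusing the existing
# source structure via source handles; relationships are just direct links
# between evidence people. All names are illustrative.
con = sqlite3.connect("evidence.db")
con.executescript(
    """
    CREATE TABLE IF NOT EXISTS evidence_person (
        id            INTEGER PRIMARY KEY,
        name          TEXT,
        source_handle TEXT   -- points at an existing Source record
    );

    CREATE TABLE IF NOT EXISTS evidence_event (
        id            INTEGER PRIMARY KEY,
        person_id     INTEGER REFERENCES evidence_person(id),
        event_type    TEXT,
        date          TEXT,
        place         TEXT,
        source_handle TEXT
    );

    CREATE TABLE IF NOT EXISTS evidence_relationship (
        person_id INTEGER REFERENCES evidence_person(id),
        other_id  INTEGER REFERENCES evidence_person(id),
        relation  TEXT       -- e.g. 'father', 'spouse'
    );
    """
)
```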


Maybe you could use fillable PDF forms and save the data as JSON/XML blocks associated with a particular form Media Object, and have a crosswalk to a Gramps XML structure for each form. That would work towards an import/merge and export/reports for reading in & pushing out the data.
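A rough sketch of that flow, assuming a third-party library such as pypdf for reading the form fields (not something Gramps uses today) and a purely made-up crosswalk for one form type:

```python
import json
from pypdf import PdfReader  # assumed third-party dependency, not part of Gramps

# Read the filled-in field values from a fillable PDF form.
reader = PdfReader("census_form.pdf")  # hypothetical form file
fields = {
    name: str(f.get("/V", ""))
    for name, f in (reader.get_fields() or {}).items()
}

# Save the raw values as a JSON block associated with the form Media Object.
json_block = json.dumps(fields, indent=2)

# Hypothetical crosswalk from form field names to Gramps XML elements for
# this form type, used to build an importable structure later on.
crosswalk = {
    "head_of_household": "person/name",
    "enumeration_date": "event/dateval",
    "dwelling_place": "event/place",
}
gramps_like = {crosswalk[k]: v for k, v in fields.items() if k in crosswalk}
```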

An added benefit would be the ability to spin out a partially-filled Form that can be emailed to another person (or yourself) and filled out digitally by people who don’t have Gramps (or are on the road with a mobile device that doesn’t work with Gramps).

I already have a small collection of publicly shared fillable PDF forms for genealogy.