A local AI for Gramps?

Hello,

I was wondering if it might not be possible to generate a custom AI model for Gramps?

For example, there are various versions of models based on Mistral (for assistance, generative and repetitive tasks, language, psychology, philosophy, relationships, etc.).

Since Geneatique is advertising its AI integration into its software, I tried to learn more.

What AI does Geneatique use?

Geneatique uses artificial intelligence for several features in its 2025 version. AI is integrated to assist with paleographic transcription of old documents, enabling automatic pre-transcription of visual documents. This includes recognizing and transcribing ancient handwritten scripts, thereby facilitating genealogists’ work [1][2].

Additionally, AI is used for drafting and presenting notices, providing informational texts about places, first names, occupations, etc., which can be linked to specific sections in the genealogical tree [1][3].

Geneatique’s AI engine also helps automatically analyze the genealogical tree to suggest additional information, such as INSEE death records since 1970 [3].

These features are designed to reduce repetitive and tedious tasks, allowing users to focus on deeper aspects of their genealogical research [3][2].

Is it Mistral?

There is no specific information indicating that Geneatique uses Mistral AI. Available sources mention AI integration in Geneatique for tasks such as paleographic transcription and notice drafting, but do not specify the AI provider used [4][5].

Mistral AI is a French startup specializing in language models and AI solutions [6][7]. Although Mistral AI is mentioned as a preference for some users due to its language specialization, there is no direct confirmation that Geneatique uses Mistral AI services [8].

So Geneatique is not your client?

There is no information confirming that Geneatique is a Mistral AI client. Search results provide no details regarding a business relationship between Geneatique and Mistral AI [9].

Does Gramps use Mistral or Pixtral?

There is no information indicating that Gramps uses Mistral or Pixtral. Search results provide no details regarding any relationship between Gramps and these technologies [10][11][12][13][14].

I almost made my system “overheat” while writing the last response! Well, clearly, according to the cited references, the AI wasn’t quite sure which Gramps was being referred to.


Best regards,
Jérôme

(Gramps 5.2.2 - OS derived from Ubuntu 18.04.6 LTS)

:st_martin: I think I’m finally going to make progress in English (USA)! :fireworks:

Mistral Large (small?) 2025-02 :
:us_outlying_islands: Certainly! Here is the translation into American English!


:st_martin: So if I understand correctly, you’re a complement to Transkribus’s transcription work?

Exactly! I can be viewed as a complement to the transcription work performed by Transkribus. Here’s how I can assist you in this context:

  1. Interpretation and Analysis: Once Transkribus has transcribed a handwritten text, I can help you interpret and analyze the content, provide contextual explanations, and clarify ambiguous passages.
  2. Information Research: If you need more information on a topic mentioned in a transcribed document, I can help you research additional details or historical contexts.
  3. Linguistic Assistance: I can help translate passages, explain technical or historical terms, and provide suggestions to improve the clarity of the transcribed text.
  4. Methodological Advice: If you have questions about how to use Transkribus or best practices for document transcription, I can offer advice and recommendations.

In summary, I am here to complement Transkribus’s work by providing interactive assistance and helping you get the most out of your transcriptions. If you have any specific questions or needs, don’t hesitate to ask.

“One of the first examples is the Transkribus project, launched by the University of Innsbruck, which uses AI to transcribe historical documents, but is somewhat outdated today.”

I strongly disagree with the author (Anne-Noëlle) of this promotional article. Transkribus is currently still far superior to most “generalist” AI models at transcribing old and handwritten texts. OpenAI and most Mistral models using ‘Mistral-ocr’ (Pixtral, Mistral Large) tend to invent stories and make too many assumptions. While it is indeed possible to “adjust the temperature” (the level of “creativity”) in these models, inaccuracies (and hallucinations) remain prevalent. However, for a posteriori analysis, these “virtual assistants” do perform well and provide valuable assistance.
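
To illustrate what “adjusting the temperature” means in general, here is a minimal sketch of a temperature-scaled softmax. The scores and the function name are hypothetical, and this is not the code of Mistral or OpenAI, only the usual textbook technique: a low temperature concentrates probability on the top candidate, a high one spreads it out.

```python
import math

def softmax_with_temperature(scores, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate readings of one word.
scores = [2.0, 1.0, 0.5]
cautious = softmax_with_temperature(scores, temperature=0.2)
creative = softmax_with_temperature(scores, temperature=2.0)
```

Even at a low temperature the most probable reading can still be the wrong one, which is consistent with inaccuracies remaining after the adjustment.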

For what it’s worth.
I’m a happy user of Transkribus. I have used Transkribus in my genealogy work, especially for older documents written in Gothic handwriting.
I’m also using Transkribus in a large project at my local archive, where we are transcribing municipal protocols (about 50,000 handwritten pages).
The quality of the transcription depends heavily on the model being used. For this project we have developed our own model, which we retrain twice per year; every new version of the model has a better score. When we started the project, we were counting the number of errors per line; now we count the number of errors per page.

Here is the translation: :united_kingdom: / :united_states: → :france::

For what it’s worth, I am a satisfied Transkribus user. I have used Transkribus in my genealogical research, particularly for old documents written in Gothic script. I also use Transkribus in a large project at the local archives, where we transcribe municipal records (approximately 50,000 handwritten pages). The quality of transcription heavily depends on the model used. For this project, we developed our own model, which we retrain twice a year—each new version of the model achieves a better score. When we started the project, we counted errors per line; now we count errors per page.
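
The “errors per line” versus “errors per page” measure mentioned above can be sketched as a character-level edit distance between a checked reference transcription and the model output. This is only an illustration under my own assumptions: the sample lines are invented, and it is not how Transkribus computes its scores.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def errors_per_line(reference_lines, hypothesis_lines):
    """Number of character errors for each pair of aligned lines."""
    return [edit_distance(r, h) for r, h in zip(reference_lines, hypothesis_lines)]

def errors_per_page(reference_lines, hypothesis_lines):
    """Total number of character errors over the whole page."""
    return sum(errors_per_line(reference_lines, hypothesis_lines))

# Invented two-line "page": manually checked text versus model output.
reference = ["Anno 1771 den 15. August", "Hans Acker"]
hypothesis = ["Anno 1771 den 15 August", "Hans Aker"]
line_errors = errors_per_line(reference, hypothesis)
page_errors = errors_per_page(reference, hypothesis)
```

Tracking the same figure after each retraining gives a simple way to see whether a new model version really improves.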

This aligns with my initial impressions from recent tests comparing a Transkribus model with Mistral models that support OCR.
Thank you for this testimonial.

A brief aside to note that ‘Free’ and ‘Mistral’ seem to follow the same type of strategy (or have the same shareholders): Le Chat Pro is “offered” for 12 months to customers of the operator’s plans!

In my tests, Mistral learns very quickly from its mistakes.

The copy of the response above illustrates this. The AI in question is aware of its limitations in this domain, which is already a good thing.

For use with Gramps, I also looked at the “data” on mobile (here, downloading/uploading is complicated). AIs also seem to struggle with facts, dates, and historical representations. This is still a concern for genealogy: memory is long-term. Mix up a date and a location with an AI, and it may become confused. If, in addition, transcription and translation are required, errors accumulate! In short, the analysis is indeed very fast and well-organized, without obvious errors or spelling mistakes, but isn’t it ultimately the genealogist’s role to initiate research leads or formulate hypotheses?

:united_kingdom: / :united_states:

An interesting ChatGPT experiment was described on Facebook.

There was a “paper saving” technique for handwritten documents in the 18th and 19th centuries called “cross writing”. In this technique, scribes filled a page normally (in portrait orientation), then rotated the page clockwise (to landscape orientation) and wrote a second page of text over the first.

ChatGPT had problems with it. It would be a good stress test for Transkribus.

To be fair, reading cross writing is challenging for a human. It should be an excellent test for any AI.

Actually, Transkribus starts with a bit of an advantage anyway, since we define a suitable model from the outset. With a specific model for this “crossed writing,” even in another language, it will no longer be a stress test.

Hello,
I have just discovered (not tested) an association between Mistral AI and Ancestris, via assistance in entering records.

J.

Indeed, there has been some improvement.

https://console.mistral.ai/build/document-ai/ocr-playground
However, I have doubts about the version or model used by Ancestris. Version 2 was not “sufficient” for the general public and the handwritten documents commonly used in genealogy. Did Ancestris test on version 3?

Even if the writing appears “formatted” (by AI?), this response aligns with my initial impression and related tests (I still don’t understand why their Python client would be necessary when we’re only using the API). Regarding the results, struck-through text is indeed an issue, as are some numbers where digits are “stuck” together. These are common issues in our handwritten documents. We might need to wait for the Mistral OCR 4 version for the gramplet (or tool) to be production-ready.


I am able to test (for free) with a remote URL (for now, as this feature seems recent on their document lab/playground). So, the code looks like:

import base64
import os

# Read the API key from the environment rather than hard-coding it.
api_key = os.environ["MISTRAL_API_KEY"]

# Use their Python client, or call the API directly, for example with curl.
# "document" below is a placeholder for the document object to send.
'''
curl https://api.mistral.ai/v1/ocr \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${MISTRAL_API_KEY}" \
    -d '{
        "model": "mistral-ocr-latest",
        "document": document,
        "include_image_base64": true
        }'
'''

def encode_file(file_path):
    """Return the content of a file as a base64-encoded string."""
    with open(file_path, "rb") as source_file:
        return base64.b64encode(source_file.read()).decode("utf-8")

# Path as posted; point it at a local copy of the image before running.
file_path = "path/to/https://global.discourse-cdn.com/free1/uploads/gramps/original/2X/b/b6210632f20de1269abdcc43cabd3e9ab62413ea.jpeg"
base64_file = encode_file(file_path)
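
As a rough sketch of calling the API without their Python client, the request from the curl call quoted in the code above could be prepared with the standard library alone. The endpoint and model name come from that curl call; the shape of the `document` object (an `image_url` entry carrying a base64 data URI) and the helper name `build_ocr_request` are my assumptions and should be checked against the API reference.

```python
import base64
import json
import urllib.request

OCR_URL = "https://api.mistral.ai/v1/ocr"  # endpoint quoted in the post

def build_ocr_request(image_bytes, api_key, mime_type="image/jpeg"):
    """Prepare (but do not send) a JSON POST request for the OCR endpoint."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    payload = {
        "model": "mistral-ocr-latest",
        # Assumed document shape: verify it before relying on it.
        "document": {
            "type": "image_url",
            "image_url": f"data:{mime_type};base64,{encoded}",
        },
        "include_image_base64": True,
    }
    return urllib.request.Request(
        OCR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Placeholder bytes and a dummy key; nothing is sent over the network here.
request = build_ocr_request(b"not a real image", api_key="dummy-key")
```

Sending it would then be a matter of passing `request` to `urllib.request.urlopen` with a real key and decoding the JSON response.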

United Kingdom, it was got in in Heathbrawe by the 15th Augt about the same time copious shows all which made the potatoes abundant and good. This year is rather to be the same considered. We are already in and we are not in good faith. In 1771 Mr. Tyckfather was 18 months under the co-companyed, the son of John deatts, Ja. M. Taviskese by Olif M. Cuyk, who is here as a man and a man of heart. He sent southward to Mr. Anderson C. 16th Augt. to the 17th Augt. when he came to the house of the son of John, who was a man of heart. He was then in the house of the son of John, who was a man of heart. He was then in the house of the son of John, who was a man of heart. He was then in the house of the son of John, who was a man of heart. He was then in the house of the son of John, who was a man of heart.

The 17th Augt. was the first day of the year 1771. He was the first man to be a man of heart. He was the first man to be a man of heart. He was the first man to be a man of heart. He was the first man to be a man of heart. He was the first man to be a man of heart. He was the first man to be a man of heart. He was the first man to be a man of heart. He was the first man to be a man of heart.

The 17th Augt. was the first day of the year 1771. He was the first man to be a man of heart. He was the first man to be a man of heart. He was the first man to be a man of heart. He was the first man to be a man of heart.

The 17th Augt. was the first day of the year 1771. He was the first man to be a man of heart. He was the first man to be a man of heart.

The 17th Augt. was the first day of the year 1771. He was the first man to be a man of heart. He was the first man to be a man of heart.
:woozy_face:

The result might be limited (I was using the experimental account). Anyway, this AI model (or IA in French!) started by transcribing the vertical lines. Maybe this could be improved with some instructions or a prompt, but currently the default set of options (without extra work on the user side) returns the above text, whether via a simple print() or in JSON and Markdown formats.
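
One cheap way to catch output like the text above before a human reads it is to measure how many sentences merely repeat an earlier one. This is a minimal sketch under my own assumptions (the function name and the naive sentence split are hypothetical); it flags looping output but says nothing about whether the remaining sentences are accurate.

```python
import re
from collections import Counter

def repeated_sentence_ratio(text):
    """Share of sentences that repeat an earlier sentence (0.0 to 1.0)."""
    parts = re.split(r"(?<=[.!?])\s+", text)  # naive split on end punctuation
    sentences = [p.strip().lower() for p in parts if p.strip()]
    if not sentences:
        return 0.0
    counts = Counter(sentences)
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(sentences)

# Excerpt of the OCR output quoted above.
sample = ("The 17th Augt. was the first day of the year 1771. "
          "He was the first man to be a man of heart. "
          "He was the first man to be a man of heart. "
          "He was the first man to be a man of heart.")
ratio = repeated_sentence_ratio(sample)
```

A gramplet could, for instance, warn the user or retry with different options when the ratio exceeds a threshold chosen by experiment.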

There’s a nice April 2025 Facebook posting specifically about AI and cross writing:

Thomas MacEntee
My latest article, “Cross-Writing in American History: Overlapping Letters to Save Paper and Using AI for Transcription” (Genealogy Bargains) - link below and PDF attached. #cross-writing ai #ArtificialIntelligence

There is a limitation (due either to the free account or to the AI model), because the same problem occurs with one of the examples provided with the article cited above (the cross-writing link):

Aug 15. Took me a
me up my house
the Presbyer
to the
I think I can do for
the
I have
some
of the
I have
the
I have
the
I have
the
I have
the
I have
the
I have
the
I have
the 


After a simple rotation (90°), it works better.

prices. You must not think I have entered your up my teaching, project through, I can’t track in the manner, months, I think I can safely do it on muster, I shall if possible make the keeler as I learn the refusal of a situation near Charleston S.C. My friends here see me I shant leave them again, a Wed. too days, he can’t see the necessity, however I am well determined, I hope dearer that I have been relieved the letter I wrote here while in Charleston, I am not using my right arm till the time, I had to make that head way. I could with my left. As you will observe I again have the use of this necessary member, though it is still used. If these gets his promotion as he hopes to do. I think you may hear of his promotion to another high office, the young gentleman has strong ideas of getting married this winter, to Miss Siskin Leonberger, a very beautiful young lady, and thinks she looks as


I have not checked whether all these sentences are correct.

Trained on some Bibles or fantasy books?

Another test that directly concerns us in genealogy. Here is an example of an index generated by different administrative agents across different periods (over approximately a century).


The handwriting to decipher may be from the Prussian period (after 1872)?

Since I found this test too simplistic for our “machine intelligences,” I deliberately sought a document containing struck-through names and numbers (if there had been local place names in dialect, it would have been even better!).

In short, something that the human mind can decipher with a very low error rate, regardless of one’s education, culture, or language.

In the end, neither Transkribus with its French 1 model, nor Mistral OCR 3—which boasts extraordinary capabilities—managed to produce anything usable.

Basically, the idea was to provide a list of identifiers and associated names. I admit, I didn’t spend hours configuring these AIs. Just as I didn’t spend that time manually entering the data transcribed by my own brain.


He was the first man to be a man of heart.

At a glance, the “internal human translation program” (IHTP) of most non-English people (NEP) might return something like “He was the first man of earth!” It looks like a guess over a [1:4] block rather than a pure transcription as a starting point. That may be acceptable while the process is running, but a second pass for verification might fix such mistakes.

Maybe it is a bad idea for an AI designed for OCR to return clues learned in training as an answer before matching them against any set of documents?

Besides the random quality of the output and privacy concerns with the free account, it’s true that one can obtain a pre-transcription instantly. A gramplet doesn’t even need to optimize resources (Gtk buffer & co) since we’re still in the experimentation phase (small data blocks).

On the other hand, with Gramps and JSON models, we can go further, for example, with forms. Mistral AI introduced an annotation feature:

But will it be maintained over time?

After some tests on handwritten indexes, this model still lacks accuracy when the environment is “marginal,” such as in Alsace, the overseas departments (DOM-TOM), and other territories with multiple “cultures.” This becomes an issue when these AIs hallucinate on first names, surnames, places, or professions, attempting to translate them into “langue d’oïl.”

Adam : Odeon
Bangratz : Baugeary
Acker Valentin : Arbas Palestine
etc.
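
A simple safeguard against pairs like those above is to compare each transcribed name with surnames already known for the place (for example, exported from the Gramps tree) and flag anything unknown. The sketch below uses `difflib` from the standard library; the function name, the cutoff and the near-miss “Bangrats” are hypothetical, while the other names come from the list above.

```python
import difflib

def flag_suspect_names(transcribed, known_names, cutoff=0.8):
    """Return (name, closest known name or None) for names not in the list."""
    suspects = []
    for name in transcribed:
        if name in known_names:
            continue  # exact match: nothing to flag
        close = difflib.get_close_matches(name, known_names, n=1, cutoff=cutoff)
        suspects.append((name, close[0] if close else None))
    return suspects

known = ["Adam", "Bangratz", "Acker"]
result = flag_suspect_names(["Adam", "Odeon", "Bangrats", "Baugeary"], known)
```

A name with a close match is probably a small misreading, whereas one with no match at all deserves a look at the original image.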

On the other hand, for help with Python to generate a heat map, there’s nothing to complain about: it works well. However, I doubt this is the kind of help expected by Gramps users.

Looking at the chart (and the distribution of surnames), I’m likely to run into more difficulties.


Indeed, it was usually a daughter who stayed on the farm with her parents and husband (in Alsace, this could mean a new building added to the farm). As a result, surname changes on the same farm are quite common. This kind of flexibility is difficult for AI to accept, just like the changing numbering after a few years (due to construction or demolition). I don’t expect to go back as far as 1570/1599 (the oldest houses in this village). However, I’m more optimistic about tracing successive owner lineages from this 1819 census.

From Gramps, I’m still hesitant to focus entirely on this census and the families in 1819. The surname (the Swedes [~1633] didn’t stay long enough in Alsace) remains another reliable entry point (as long as it’s not too frequent in the corpus), but location and building are also important, since these farms have hardly moved in centuries (though I still have some doubt between 1870 and 1920: the Prussian period).

For now, I’ve mixed several lists (sources) into what the AI calls a library. This is where the surname (with its Germanic sound, soundex/phonex, etc.) causes problems for the model. The instructions are in French, but there are no Martin, Dubois, & co in these lists, even though they’re in France. If cultural bias (a cliché, to say the least) truncates queries from the start, we waste far too much time correcting it.
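
For readers unfamiliar with soundex, here is a minimal sketch of the classic American Soundex code, which groups surnames by consonant sounds. It is my own illustrative implementation, not the one used by Gramps or by any AI model, and it treats accented letters like vowels, which is one reason such codes fit Germanic or dialect surnames poorly.

```python
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}

def soundex(name):
    """Return the 4-character American Soundex code of a surname."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for char in letters[1:]:
        if char in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(char, "")  # vowels and others give ""
        if code and code != previous:
            result += code
        previous = code
    return (result + "000")[:4]  # pad with zeros, keep four characters
```

Two spellings that share a code, such as `soundex("Robert")` and `soundex("Rupert")`, are treated as the same name, so the code helps with spelling variants but not with a transcription that is entirely wrong.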