Gramps Assistant

This might not be of interest to everyone, but it might be to some. Let me know below.

First, I made a new type of plugin: a persistent panel that sits in a column on the far right. Then I made such a plugin called “Gramps Assistant”. It is a chatbot similar to ones you may have seen before, but this one is different.

This is what it looks like:

After selecting the model you want to use (including models you can run locally on your own computer), you can ask it questions about how to do something in Gramps, or have it interact with your data in ways that weren’t possible before.

The first thing I tried was just asking a question posed by @RogerDodger8989:

And Gramps Assistant responded with a very thorough answer (I won’t post it here, but it goes into some detail):

It also did something interesting: it changed views to the Places category.

Now, let’s try another question:

How many Garners were born before 1900?

Instead of using some other chatbot or MCP function to answer the question, it switched views, filled in the data in the Person sidebar filter, and ran the query (effectively clicking on “Find” for you).

You can also enter this:

Edit Elizabeth Garner

And it does a few things, ending with:

It does not actually change any data, but helps you do it.

In summary, it knows the Gramps application, can perform actions in the GUI, and can answer questions about Gramps.

FAQ:

  1. Can you use local models? Yes, via LLM servers like Ollama or LM Studio.
  2. Does it cost money? No, not if you run it locally.
  3. Does it use Python packages that need to be installed? No, it is written from scratch in Python, with no additional requirements.

Would you be interested in trying this?

  • Yes, for informational help, or for help on using the Gramps app
  • No, not for me

Been thinking about something like this for a while, very interesting implementation!

I’m curious, but having searched for ‘ollama’ I’m not sure I’d ever get it working on Windows without idiot-proof, step-by-step instructions.

In the abstract, the steps are the same regardless of OS:

  1. Install the Ollama server.
  2. “Pull” the model that you are interested in (like “llama3.1”): ollama pull llama3.1

That’s it! When you select an Ollama-based model, it just “works”.
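For the curious, here is a rough sketch (not the actual Gramps Assistant code) of what a chat request against a local Ollama server looks like, using only the Python standard library. The URL and payload follow Ollama’s documented /api/chat format; the question text is just an example.

```python
import json
import urllib.request

# Minimal sketch of a chat request against a local Ollama server.
# Assumes Ollama is running on its default port (11434) and that the
# "llama3.1" model has already been pulled with `ollama pull llama3.1`.
OLLAMA_URL = "http://localhost:11434/api/chat"

payload = {
    "model": "llama3.1",
    "messages": [
        {"role": "user", "content": "How do I add a new place in Gramps?"}
    ],
    "stream": False,  # return one complete JSON response instead of a stream
}

request = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    reply = json.loads(response.read())

print(reply["message"]["content"])
```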

Caveats: The smaller the model (in terms of parameter count), the less likely it is to understand how to use the tools. On the other hand, the larger the model, the more powerful a computer you will need (to run, and to get timely results).

I’ve been using Anthropic’s Claude Sonnet 4.5 and it is stunningly good, at a reasonable price, although it is not “locally run” on your computer.

Forgive my ignorance, but despite using Windows since v3 (and prior to that knowing of Windows 1 & 2), I’ve never had need to explore the depths of the command line. In that context, to install Ollama do you mean type ollama pull llama3.1 into a command prompt?

If so, would the command prompt need admin rights?

Also, I’m not curious enough to spend money on an LLM, so what is likely to be the most powerful free option that will run on my local PC, which is a decent machine (Core Ultra 7 CPU, 32GB RAM, RTX 5070 12GB VRAM)?

On Windows it can be easier to run LM Studio, even though it may show a small overlay.

There are just a few BIOS settings you should be aware of — on both Windows and Linux — if you’re using GPUs with more than 4 GB of VRAM.

Boot into your BIOS and make sure the following settings are configured:

CSM (Compatibility Support Module) → Disabled (Legacy Boot → Disabled).
Reason: Modern GPUs require full UEFI initialization. CSM can prevent the GPU from being properly detected.

Above 4G Decoding → Enabled.
Reason: Allows the system to address PCIe devices with large memory regions (GPUs with more than 4 GB VRAM). Required for stable operation.

Optional (but recommended if supported):
Resizable BAR → Enabled.
Reason: Allows the CPU to access the full VRAM range directly, improving performance for LLM workloads.

Some extra info:

Older motherboards (especially pre‑2016) may not support Above 4G Decoding, Resizable BAR, or full UEFI GPU initialization. In those cases, users may experience black screen on boot, GPU not being detected, CUDA/OpenCL errors, or LM Studio not finding the GPU. Checking BIOS capabilities is important.

VRAM requirements for local models:
A simple rule of thumb is that:

  • 3B models need around 4–6 GB VRAM
  • 7B models need around 8–12 GB
  • 13B models need around 16–20 GB
  • 30B models need around 24–48 GB
  • and 70B models need around 48–96 GB.

Anything larger than a 7B model will typically require at least 16 GB VRAM if you want full GPU offload.

And one more thing that is important to understand when running local models:

Quantization matters a lot.
A Q3 model uses noticeably less VRAM and RAM than a Q4 model, and the difference becomes even bigger when you compare Q2, Q3, Q4, Q5, etc.
A lower quantization level (fewer bits per weight) means lower resource usage, but also somewhat lower model quality.
A higher level means better quality, but more VRAM and RAM required.
So choosing the right quantization level for your hardware is important.
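As a back-of-the-envelope illustration, here is a small Python sketch estimating the memory needed for the weights alone at different quantization levels. The effective bits-per-weight values and the 1.2 overhead factor are assumptions; real usage adds the context cache and runtime buffers on top, which is why the rule-of-thumb figures above are higher.

```python
# Very rough estimate of the VRAM needed just for the model weights.
# The bits-per-weight values and the 1.2 overhead factor are assumptions;
# the KV cache and runtime buffers come on top of this.
def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    bytes_for_weights = params_billion * 1e9 * bits_per_weight / 8
    return bytes_for_weights * 1.2 / 1024**3

for label, bits in [("Q3", 3.5), ("Q4", 4.5), ("Q5", 5.5), ("FP16", 16)]:
    print(f"7B at {label}: ~{weight_vram_gb(7, bits):.1f} GB")
# Roughly: Q3 ~3.4 GB, Q4 ~4.4 GB, Q5 ~5.4 GB, FP16 ~15.6 GB
```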

Context size also has a big impact on resource usage.
A model running with a 4k or 8k context window uses far less memory than the same model running with 16k or 32k.
If you don’t need long‑context reasoning, keeping the context window smaller will save a lot of VRAM and system RAM.
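To see why context size matters so much, here is a rough cache-size estimate, assuming a Llama-3.1-8B-style layout (32 layers, 8 KV heads of dimension 128, FP16 cache entries); actual numbers vary by model and by whether the runtime compresses the cache.

```python
# Rough KV-cache size for a Llama-3.1-8B-style model
# (assumed: 32 layers, 8 KV heads, head dim 128, FP16 cache entries).
def kv_cache_gb(context_tokens: int, layers: int = 32, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K and V
    return per_token * context_tokens / 1024**3

for ctx in (4096, 8192, 16384, 32768):
    print(f"{ctx:>5} tokens: ~{kv_cache_gb(ctx):.1f} GB of KV cache")
# ~0.5, ~1.0, ~2.0, ~4.0 GB respectively: the context cache alone can
# rival the memory used by the quantized weights.
```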

CPU fallback:
If the GPU cannot be used, LM Studio (and other local LLM tools) will fall back to CPU, but performance will be significantly slower. CPU‑only inference is usually so slow that the experience becomes impractical, so GPU offloading is strongly recommended.

Running a local LLM is actually quite easy once you keep these simple rules of thumb in mind. And the best part is that it’s completely free — without the artificial restrictions, rate limits, or usage rules that online AI services impose.


PS: Be aware that for Ollama, LM Studio, and all other local LLM servers, these BIOS hardware settings apply if you want to fully utilize your GPU and VRAM. In addition, you must configure VRAM/GPU offloading per model in both Ollama and LM Studio. I had to choose LM Studio because Ollama still does not support my AMD card, so I cannot describe the exact steps in Ollama. In LM Studio, however, it is simply a slider you adjust. LM Studio can also run as a headless server (not required), so you don’t need to keep the application window open while using the model.

Also remember to shut down the application or service after use, so you free up both VRAM and system RAM.

Thanks for the additional details!

I don’t think it matters for Gramps Assistant on any OS which LLM server you use (LM Studio, Ollama, etc.) as long as the model responds on the given URL in the correct format.

No, it doesn’t matter for the “client”. These things only affect how easy it is for the LLM software itself to run and manage models. In LM Studio you simply get a list of models you can download, almost like installing plugins. The actual use of the LLM is the same regardless of which tool you use.
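For example, both LM Studio and Ollama expose an OpenAI-compatible chat endpoint on their default ports, so a client like Gramps Assistant only needs the base URL swapped. A minimal sketch (default ports assumed; this is not the plugin’s actual code):

```python
import json
import urllib.request

# Both LM Studio and Ollama serve an OpenAI-compatible chat endpoint,
# so only the base URL differs between them (default ports assumed).
BASE_URLS = {
    "lmstudio": "http://localhost:1234/v1",
    "ollama": "http://localhost:11434/v1",
}

def chat(base_url: str, model: str, prompt: str) -> str:
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read())
    return body["choices"][0]["message"]["content"]

print(chat(BASE_URLS["ollama"], "llama3.1", "Who founded Gramps?"))
```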

With this spec you should be able to run any 7B model in Q4 without any problem, and you can probably run a 13B model at Q3 as well. The 12GB VRAM on the RTX 5070 gives you enough room for that, as long as you pick a reasonable quantization.

Also spend a moment setting the context window properly, because it has a big impact on memory usage. Try 8k first (8192) — that’s a good balance between performance and capability.

If you want to try 16k, the correct value is 16384.

A 16k context will use more VRAM and RAM than 8k, so start low and increase only if you actually need the longer context.

Many models and LLM tools now come with a default context window of 32k or even higher.
This can dramatically increase memory usage, so it’s important to check and adjust this setting — especially if the model suddenly becomes extremely slow or starts using far more VRAM and RAM than expected.
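For what it’s worth, in LM Studio the context length is a slider in the model’s settings, while in Ollama one way to pin a smaller context is a small Modelfile (an illustrative example; the model tag and name below are just placeholders):

```
FROM llama3.1
PARAMETER num_ctx 8192
```

Saved as Modelfile, running ollama create llama3.1-8k -f Modelfile gives you a variant locked to an 8k context, so the default setting can never surprise you.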

Absolutely! Is this already available?

It could be… it works on master, and will need the ASSISTPANEL PR merged to be available for Gramps 6.1. But I can walk you through manually getting it set up when I have a free few minutes.

Why didn’t I study computer science…?

Hi, we’re just regular Windows users. This is way too complicated for me. I just watch from the sidelines now… :woozy_face:

Woody, it’s really not as difficult as it may seem when reading about it.

Using something like LM Studio is honestly not much more complicated than installing Gramps and a few of the Gramplets you probably already use.

In the simplest form, there are only two things you need to do:

  1. Download and install LM Studio like any other Windows program.
  2. Pick a model you want to use from the built‑in list.

After that, there are basically just two settings you need to adjust for that model:

– “Offload to GPU” (this enables GPU offloading)
– The context cache (the number you set for how much text the model can keep in memory)

That’s it.

As for the BIOS settings I mentioned earlier: those are general best‑practice settings for any system with a GPU that has more than 4 GB of VRAM. They’re not specific to LLMs, and you only need to look at them once. If you’re on a laptop or a machine with shared memory only, you can usually ignore them.


I am sure that when this and other LLM/VLM Gramplets eventually appear in the official Gramplet list, they will also include clear instructions on how to use them. By that time there will probably be more users who can contribute installation notes, recommended models, and general tips as well.

The models themselves are still being upgraded and retrained very quickly, so what is considered “best” today may be outdated in six months.


It might even happen that someone trains a model specifically for genealogy and the Gramps advanced data schema. If that happens, such a model could potentially be used “embedded” with Python, so that it can be downloaded and installed together with the Gramplet as a package. That would make everything even easier for end‑users.

I’ve got some model trains in my garage :locomotive: My sons built the layout many years ago. It still works too. Pity it won’t help here though :grin:

yeh, that’s the wrong type of trains… I had a few of those when I was young too, but they didn’t help me much when I got into computers and software — not even when I moved on to networks.
One would think that something running on rails should work on other types of networks as well…

Thanks for your reassuring words. I’ll give it a try when it’s released. I even managed to get Gramps Web running on my Raspberry Pi (I was 10 years older afterwards).:joy:

Does it make sense to have some standard guidelines for the Gramps assistant?

Maybe one for the Gramps coding guidelines for Agents and another for Genealogy assistance?

Preferably with functionality to pre-process/cache in order to reduce the recurring token cost.

If a model is local, or run with something like transformers natively in Python, there is no “cost” involved — other than your own computer resources.

And when you run things locally, it becomes much easier to use a model that is specialized for the task. For example, you can run a Norallm model if you work with Norwegian or Scandinavian sources and ancestors, a Mistral variant for more general European material, or even one of the new Swiss models.

All open‑source and with no strings attached — quite the opposite of some of the American online options.


But yeh, when these Gramplets start getting published, there really should be proper documentation — not only on how to use them, but also on the cost side, and how to avoid paying for online services when you don’t need to.

Maybe an alternative would be to include some information about good small local models that can run on relatively “normal” computer hardware.
Not everyone has the resources to run the larger models, of course, but the smaller ones are improving rapidly now — and many of them might work surprisingly well for genealogy‑related tasks.

That would be great. I like the idea and the innovative concept. As soon as a short installation document is available, I will try it. To have some guidelines and a risk analysis for AI usage would be helpful.

I don’t understand what you mean. There is no cost for a local LLM.

I recently learned about (but have not yet tried) this “Genealogical Research Assistant”: Open-Genealogy/skills/gra/research-assistant-v8.5-full.md at main · DigitalArchivst/Open-Genealogy · GitHub