AI API for Supertool

Urchello · February 18, 2025, 8:54am

@kku, I have an idea that could significantly expand Supertool’s capabilities. Gramps does not have built-in artificial intelligence, likely due to privacy concerns. However, Supertool is designed mainly for developers, and anyone can use AI to meet their needs without limitations – solving any tasks they want.

Users can select data from the database, generate AI queries, receive recommendations, and even use the results to update database records. In my opinion, this is a very interesting and promising feature. And I believe Supertool is the perfect place for its implementation.

kku · February 18, 2025, 10:22am

I don’t know much about AI. Maybe you can write a prototype.

dsblank · February 18, 2025, 2:25pm

I think a regular Gramps addon could work. I do this for my day job. Let me see if I can put something together.

FYI, @DavidMStraub also has a prototype AI chatbot in gramps-web. But it was designed specifically to allow answering questions about your family tree. Which an addon could do too.

Urchello · February 18, 2025, 2:28pm

I already have a very simple working prototype. But I want to prepare a more useful script to demonstrate how we can use it.

import requests

OPENAI_API_KEY = "sk-..."

url = "https://api.openai.com/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {OPENAI_API_KEY}",
    "Content-Type": "application/json"
}
data = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "Return only 'true' or 'false'."},
        {"role": "user", "content": "Does this description contain a page number at the end? Example: 'Document Page 12'"}
    ],
    "temperature": 0,
    "max_tokens": 5
}

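try:
    # A timeout keeps the script from hanging if the API is unreachable.
    response = requests.post(url, json=data, headers=headers, timeout=30)
    response.raise_for_status()
    print("API Response:", response.json()["choices"][0]["message"]["content"].strip())
except Exception as e:
    print("Error:", e)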

dsblank · February 18, 2025, 2:34pm

Yes, that is a start. I’m imagining a real addon that even lets you use your own model without costing you anything.

DavidMStraub · February 18, 2025, 2:46pm

Yep, see gramps-web-api/gramps_webapi/api/llm/__init__.py at master · gramps-project/gramps-web-api · GitHub

and please also have a look at this discussion:

github.com/gramps-project/gramps-web-api

Teaching the AI assistant to call tools

open 08:33PM - 09 Dec 24 UTC

DavidMStraub

(NB: this is a feature request, but also the start of a discussion - I think we …need some good ideas first.)

Currently, the AI assistant is not very smart as it can only retrieve individual Gramps objects and doesn't know anything about relationships, so you can't even ask it for your grandfather. To solve that, we need to teach it how to call tools/functions.

In approaching that, there are several questions to answer:

- which functions should it call?
- how (if at all) can we make the tool calling not just work in OpenAI models, but also in open source models for people running chat locally?

One challenge I see is that the number of possible functions is quite large:

- retrieve a person by some filter
- retrieve an event by some filter
- find people with a certain relationship
- ...

Although I haven't tried it myself yet, common lore is that for an LLM to identify the right function to call only works well if the number of functions is small, probably below 10.

What I find quite promising is leveraging query languages like [GQL](https://github.com/DavidMStraub/gramps-ql) or @dsblank's [Object QL](https://github.com/dsblank/object-ql), where I suspect the latter is a better choice. What could be done is the following:

1. Create a large number of possible queries that are considered useful for the assistant
2. Describe what the query does
3. Use an LLM to generate questions based on the description of what the query does
4. Use an embedding model to compute vector embeddings for each of the questions and store them with the query

Now, with these embeddings at hand, when the assistant gets a question, it could

1. Calculate the embedding for the question with the same embedding model used for the query language questions
2. Use vector similarity to identify the 5 most likely queries
3. Feed these 5 queries as function calls to the LLM and let it decide which function to use
4. Execute the query recommended by the LLM and feed the results back to the LLM
5. Generate the answer

Funnily enough, this would even be less resource intensive than the retrieval-based answers, since it only needs a vector index of queries that can be computed in advance once and for all.

I don't think I'll have time to work on this myself in the next 2 months or so, but if anyone experiments with this or has other ideas, please share here! 🤖
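The retrieval step described in that issue (embed the question, then pick the most similar stored queries) might be sketched roughly as follows. This is only an illustration: the query names and questions in `QUERY_CATALOGUE` are hypothetical, and `embed` is a toy bag-of-words stand-in for a real embedding model, not what gramps-web-api uses.

```python
import math
import re
from collections import Counter

# Hypothetical catalogue: each stored query is paired with a question it answers.
QUERY_CATALOGUE = [
    ("find_grandparents", "Who are the grandparents of this person?"),
    ("find_siblings", "Who are the brothers and sisters of this person?"),
    ("events_by_place", "Which events happened in this place?"),
    ("people_by_surname", "Which people have this surname?"),
]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# The index is computed once, in advance, as the issue suggests.
INDEX = [(name, embed(question)) for name, question in QUERY_CATALOGUE]


def top_queries(question, k=2):
    """Return the names of the k stored queries most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


candidates = top_queries("Who were the grandparents of my father?")
print(candidates)
```

The shortlisted names would then be offered to the LLM as the only callable functions, which keeps the number of choices small.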

Urchello · February 18, 2025, 5:08pm

Well, I have obtained positive results demonstrating the possibility of using artificial intelligence in conjunction with SuperTool. I pass the father’s first name and the middle names of his children. The AI’s task is to verify whether the children’s patronymics are correct. If they are not, it suggests specific corrections.

Solving this problem using traditional conditional logic is quite difficult because Ukrainian names have many forms, endings, and exceptions to the rules. This is exactly the kind of task that AI is suited for.

In the terminal, I demonstrated how it works for a single person. I have not tested the script on the entire database. This is a demonstration script that requires improvements and additional validations.

Supertool script:

[Gramps SuperTool script file]
version=1

[title]
Check Patronymic Validity

[description]
This script checks whether the patronymic of children matches their father's name.

[category]
People

[initial_statements]
import json
import sys
sys.path.append("/home/my/Documents")
from openai_client import OpenAIConfig, OpenAIClient

# OpenAI API Configuration with optional training data
config = OpenAIConfig(
    api_key="ADD KEY HERE",
    training_examples=[
        {
            "role": "user",
            "content": json.dumps({
                "father": {"gramps_id": "I7204", "name": "Порфирій Савович"},
                "children": [
                    {"gramps_id": "I0861", "name": "Антон Порфирович"},  # ✅ Correct
                    {"gramps_id": "I1118", "name": "Макар Іванович"},  # ❌ Incorrect
                    {"gramps_id": "I2345", "name": "Марія Порфиевна"},  # ❌ Misspelled
                    {"gramps_id": "I9876", "name": "Оксана"}  # ❌ Missing patronymic
                ]
            }, ensure_ascii=False)
        },
        {
            "role": "assistant",
            "content": json.dumps({
                "I0861": {"status": "correct"},
                "I1118": {
                    "status": "incorrect",
                    "explanation": {"current": "Іванович", "suggested": "Порфирович"}
                },
                "I2345": {
                    "status": "incorrect",
                    "explanation": {"current": "Порфиевна", "suggested": "Порфирівна"}
                },
                "I9876": {
                    "status": "incorrect",
                    "explanation": {"current": "No patronymic", "suggested": "Порфирівна"}
                }
            }, ensure_ascii=False)
        }
    ]
)

openai_client = OpenAIClient(config)

def check_patronymic_validity():
    """
    Prepares and sends a request to OpenAI API to validate children's patronymics.
    """
    father_firstname = firstname

    data = {
        "father": {
            "gramps_id": gramps_id,
            "name": father_firstname
        },
        "children": []
    }

    for child in children:
        data["children"].append({
            "gramps_id": child.gramps_id,
            "name": child.firstname
        })

    user_messages = [
        {
            "role": "user",
            "content": json.dumps(data, ensure_ascii=False)
        }
    ]

    # Use assistant training if needed
    result = openai_client.send_request(
        user_messages,
        system_prompt="Identify the father's first name and ignore his patronymic. "
                      "For each child, extract only the patronymic and ignore the first name. "
                      "Compare them using proper Ukrainian patronymic formation rules. "
                      "If incorrect, return 'status': 'incorrect' with 'explanation': "
                      "{\"current\": \"<child's current patronymic>\", \"suggested\": \"<corrected patronymic>\"}. "
                      "If correct, return only 'status': 'correct'.",
        include_training=True  # Toggle training data on/off
    )

    print("Request:", json.dumps(data, indent=4, ensure_ascii=False))
    print("Response:", json.dumps(result, indent=4, ensure_ascii=False))

[statements]
if gender == "M":
    check_patronymic_validity()

[scope]
selected

[unwind_lists]
False

[commit_changes]
False

[summary_only]
True

OpenAI API Client

# openai_client.py
import requests
import json

class OpenAIConfig:
    """Configuration class for OpenAI API."""

    def __init__(
        self,
        api_key: str,
        model: str = "gpt-4o-mini",
        temperature: float = 0,
        max_tokens: int = 100,
        api_url: str = "https://api.openai.com/v1/chat/completions",
        system_prompt: str = "",
        training_examples: list = None
    ):
        """
        Initializes the OpenAI configuration with default or custom parameters.

        :param api_key: OpenAI API key.
        :param model: AI model to use (default: "gpt-4o-mini").
        :param temperature: Sampling temperature (default: 0.0).
        :param max_tokens: Maximum response tokens (default: 100).
        :param api_url: OpenAI API endpoint (default: chat completions endpoint).
        :param system_prompt: Default system prompt (can be overridden per request).
        :param training_examples: Optional assistant training data (default: None).
        """
        self.api_key = api_key
        self.model = model
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.api_url = api_url
        self.system_prompt = system_prompt
        self.training_examples = training_examples or []  # If None, use an empty list


class OpenAIClient:
    """Client for interacting with OpenAI API."""

    def __init__(self, config: OpenAIConfig):
        """
        Initializes the OpenAI client.

        :param config: OpenAIConfig instance containing API settings.
        """
        self.config = config

    def send_request(self, user_messages: list, system_prompt: str = None, include_training: bool = True):
        """
        Sends a chat completion request to OpenAI API.

        :param user_messages: List of messages in OpenAI format (e.g., [{"role": "user", "content": "Your input"}]).
        :param system_prompt: Custom system prompt (if None, default from config is used).
        :param include_training: If True, include assistant training examples.
        :return: Parsed JSON response from OpenAI.
        """
        messages = [{"role": "system", "content": system_prompt or self.config.system_prompt}]
        
        # Add training examples if enabled
        if include_training and self.config.training_examples:
            messages.extend(self.config.training_examples)

        # Add user messages
        messages.extend(user_messages)

        request_data = {
            "model": self.config.model,
            "messages": messages,
            "temperature": self.config.temperature,
            "max_tokens": self.config.max_tokens
        }

        headers = {
            "Authorization": f"Bearer {self.config.api_key}",
            "Content-Type": "application/json"
        }

        try:
            response = requests.post(
                self.config.api_url, json=request_data, headers=headers, timeout=30
            )
            response.raise_for_status()
            response_json = response.json()

            if response_json.get("choices"):
                # The model is expected to reply with a JSON object.
                return json.loads(response_json["choices"][0]["message"]["content"].strip())

        except Exception as e:
            print("API Request Error:", e)

        # No choices returned, or the request or JSON parsing failed.
        return None

Terminal results:

GeorgeWilmes · February 18, 2025, 6:50pm

Interesting! Something similar could be used for Irish names:

“For about two centuries (from the late 1700s through to the early to mid-1900s) the Irish favoured a precise convention for naming their children that can suggest what names to look for in a previous generation. All that’s needed is for one sibling in a family to have used this pattern with accuracy (even if one’s own direct ancestor deviated a little).”

https://irelandxo.com/ireland-xo/news/irish-naming-conventions-and-baptism-traditions

Urchello · February 18, 2025, 7:21pm

Yeah, this is similar to Ukrainian middle names.

emyoulation · February 18, 2025, 7:23pm

But I think the example by @Urchello is dealing with surname data derived from the given names of parents. And there is some “origin” etymology that could be filled in.

The Irish pattern was inheritance of given name to given name. I suppose you could add eponym and namesake associations.

Urchello · February 18, 2025, 7:45pm

I believe I have many duplicate people with the same names and surnames.
Some of them are actually different people, while others are true duplicates. Identifying which ones are duplicates requires a very complex analysis. To make accurate decisions, it is necessary to examine not only the people themselves but also their partners, children, parents, godparents, dates, attributes, events.

I believe that if AI is given such data samples, it could assist with this kind of analysis. Even if it cannot fully resolve the issue due to some missing facts, AI could still make well-founded assumptions, eliminate clearly incorrect options, and significantly reduce the workload.
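As a rough sketch of what "giving AI such data samples" could look like, the two candidate records and their context can be packed into one chat request. Everything here is an assumption for illustration: the record layout, the field names, the sample people, and the wording of the system prompt are all made up, and nothing is read from a real Gramps database.

```python
import json


def build_duplicate_prompt(person_a, person_b):
    """Pack two candidate records and their context into chat messages."""
    system = (
        "You compare two genealogy records that share a name. "
        'Reply with JSON: {"verdict": "same" | "different" | "uncertain", '
        '"reasons": ["..."]}.'
    )
    payload = {"candidate_a": person_a, "candidate_b": person_b}
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": json.dumps(payload, ensure_ascii=False)},
    ]


# Hypothetical sample records; a real script would fill these from the database.
person_a = {
    "gramps_id": "I0001",
    "name": "John Smith",
    "birth": "1850",
    "spouse": "Mary Jones",
    "godparents": ["Peter Brown"],
}
person_b = {
    "gramps_id": "I0002",
    "name": "John Smith",
    "birth": "1851",
    "spouse": "Mary Jones",
    "godparents": [],
}

messages = build_duplicate_prompt(person_a, person_b)
print(json.dumps(messages, indent=2, ensure_ascii=False))
```

The resulting `messages` list has the same shape as the one sent by the `OpenAIClient.send_request` helper earlier in the thread.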

This is how I plan to use artificial intelligence in conjunction with SuperTool in the coming years. I believe that this problem is not unique to me but is also encountered by other researchers.

Gramps could be the first genealogy app to do something like this.

emyoulation · February 18, 2025, 8:00pm

Perplexity is fairly good at refining and fleshing out such concepts. Here is a discussion about “disambiguation of identically named persons” after several rounds. It probably needs to be expanded with further discussion.

perplexity responds:

When discussing genealogy research and distinguishing between individuals with the same birth name, several terms and concepts are used interchangeably with “disambiguation”. These include:

Deconfliction: Resolving conflicts between records that appear to represent the same person.
Name resolution: Determining which records belong to which individual when multiple people share the same name.
Identity resolution: Establishing a person’s unique identity beyond just their name.
Deduplication: Removing duplicate records from databases or family trees.
Name differentiation: Distinguishing between individuals with identical or similar names.
Record linkage: Connecting different records that belong to the same individual, even if names vary slightly.
Entity resolution: Determining whether multiple records refer to the same real-world entity.
Same Name Rule Out methodology: Systematically excluding individuals who share a name but cannot be the person of interest.
Unique identification: Assigning or recognizing unique identifiers for individuals to prevent confusion.
Name disambiguation: The primary term, often used interchangeably with the others mentioned.
Namesake and eponym identification: A crucial aspect of genealogical research, especially when dealing with individuals who share the same name. This process involves several strategies:

a. Contextual Analysis: Examining historical and familial context to distinguish between individuals.
b. Temporal Differentiation: Using birth, marriage, and death dates to distinguish between namesakes across generations.
c. Geographical Distinction: Considering locations associated with each individual.
d. Suffix Usage: Employing suffixes like I, II, III to distinguish between individuals across generations.
e. Occupation and Social Status: Using information about profession or social standing to differentiate.
f. Family Relationships: Analyzing family structure and relationships to identify distinct individuals.
g. Nickname or Middle Name Identification: Looking for variations in how a person’s name is recorded.
h. DNA Evidence: Using genetic testing to differentiate between individuals who share the same name but are not closely related.

These terms and strategies are used in various contexts within genealogical research to address the challenges of namesakes, unrelated individuals with identical names, and duplicate records resulting from data entry errors or import issues. By employing these methods, genealogists can more accurately identify and distinguish between individuals, ensuring the integrity of their family history research.

Urchello · February 18, 2025, 8:04pm

Small example:

Urchello · February 18, 2025, 8:05pm

emyoulation:

a. Contextual Analysis: Examining historical and familial context to distinguish between individuals.
b. Temporal Differentiation: Using birth, marriage, and death dates to distinguish between namesakes across generations.
c. Geographical Distinction: Considering locations associated with each individual.
d. Suffix Usage: Employing suffixes like I, II, III to distinguish between individuals across generations.
e. Occupation and Social Status: Using information about profession or social standing to differentiate.
f. Family Relationships: Analyzing family structure and relationships to identify distinct individuals.
g. Nickname or Middle Name Identification: Looking for variations in how a person’s name is recorded.
h. DNA Evidence: Using genetic testing to differentiate between individuals who share the same name but are not closely related.

Yes, this is a huge job, and AI can do a big part of it. Not all of it, of course.

emyoulation · February 18, 2025, 8:07pm

Agreed. And I noticed something it overlooked in the discussion (an equally large dataset that is good AI fodder): points of correlation.

Perplexity responds

Yes, “points of correlation” suggests another important aspect in genealogical research and disambiguation. This concept refers to the specific elements or data points that align or connect across multiple sources, helping to establish relationships or identify individuals[7].

Points of correlation can include:

Consistent demographic information (e.g., age, birthplace, occupation)
Recurring family members or associates
Matching event dates (birth, marriage, death)
Similar residence locations
Consistent naming patterns

By identifying and analyzing these points of correlation, researchers can build a stronger case for linking records to the same individual or family, even when dealing with common names or limited information[1][5]. This approach is particularly useful in resolving ambiguities and distinguishing between individuals with similar names or characteristics.
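A very small sketch of the idea: list the fields on which two records agree. The field names and sample records are hypothetical, and exact equality is used only to keep the example short; real data would need fuzzier matching on dates and places.

```python
# Hypothetical field names; adapt to whatever data is actually extracted.
FIELDS = ("birth_year", "birth_place", "occupation", "spouse", "residence")


def points_of_correlation(rec_a, rec_b, fields=FIELDS):
    """Return the fields where both records have a value and the values agree."""
    return [
        f for f in fields
        if rec_a.get(f) is not None and rec_a.get(f) == rec_b.get(f)
    ]


record_a = {"birth_year": 1850, "birth_place": "Cork", "occupation": "farmer", "spouse": "Mary"}
record_b = {"birth_year": 1850, "birth_place": "Cork", "occupation": "smith", "spouse": None}

shared = points_of_correlation(record_a, record_b)
print(shared)
```

A count of shared fields could serve as a cheap pre-filter before any record pair is sent to an AI model.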

Citations:
[1] Correlation: When and How « Genealogy Certification: My Personal Journal
[2] Timelines for Analysis & Correlation « Genealogy Certification: My Personal Journal
[3] https://books.byui.edu/fhgen_110_textbook_/chapter_3_genealogical_analysis_and_conclusions
[4] 3 Ways to Advance Your Research with Correlation - Legacy Family Tree Webinars
[5] Eight Key Types of Evidence in Genealogical Research - Kentucky Genealogical Society
[6] https://www.ngsgenealogy.org/wp-content/uploads/Complimentary-NGS-Monthly-Articles/NGS-Monthly-Johnson-Analysis-Correlation-Apr2015.pdf
[7] https://cms-b-assets.familysearch.org/4b/71/a2f6de0e0aa0b1347419c72020d7/course-handout.pdf
[8] https://www.youtube.com/watch?v=Vn7rrAP9xIM

dsblank · February 18, 2025, 10:53pm

I should have started on this a long time ago

Here is the start of a GrampsChat. This should also be a good way for exploring a chatbot for gramps-web as well. There is no reason they need to be two different things.

Currently, I have it hardcoded for openai, but through litellm, it should be easy to configure for any AI model, including your own.

Note that if you don’t restart your conversation often, your context messages will build up and that costs $$$$.
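One common way to limit that growth, short of restarting, is to send only the system prompt plus the most recent messages. The helper below is a hypothetical sketch and is not taken from GrampsChat; the name `trim_history` and the `max_turns` parameter are made up for illustration.

```python
def trim_history(messages, max_turns=6):
    """Keep all system messages plus the last `max_turns` other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-max_turns:] if max_turns > 0 else []
    return system + recent


history = [{"role": "system", "content": "You are a genealogy assistant."}]
for i in range(10):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history, max_turns=4)
print(len(history), "->", len(trimmed))
```

The trade-off is that the model forgets older parts of the conversation, so the cut-off has to be chosen per use case.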

It doesn’t currently load any of your local database, but it could. I’ll package this up soon and make an addon. (It’s pretty ugly right now, and is really just a Gtk wrapper around a chatbot conversation. But let me know if this would be useful to you.)


emyoulation · February 22, 2025, 9:59pm

Dunno if this posting from Reddit’s r/genealogy list is pertinent…

r/Genealogy • about 2025-02-22T18:30Z
sushibait
Professional Genealogist - Willing to help!

GEDCOM Data into a High-Tech Family Graph for use with AI

Free Resource (MIT license)

Hey fellow genealogists! Have you tried using AI in your genealogy work?

I [sushibait] created a free Python script that takes your GEDCOM files and transforms them into structured knowledge-graph data for AI large language models. That means you can leverage modern tools (even LLMs!) to explore your family trees in entirely new ways. The script is totally public domain and you can do what you like with it.

Here’s what makes this script cool:

• It parses GEDCOM files to extract not only individual records (names, birth & death dates) but also family relationships (husband, wife, and children).

• The output is a clear list of “entities” (the people) and “relations” (the connections), making it a breeze to represent your family data in any knowledge graph or graph database.

• It handles most GEDCOM date formats and is designed to be straightforward—just point it at your GEDCOM file, and it does the heavy lifting.
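To give a feel for the "entities" and "relations" output described in these bullets, here is a minimal sketch. It is not the linked script: the function name `gedcom_to_graph`, the output layout, and the sample data are assumptions, and it handles only `INDI`, `NAME`, `FAM`, `HUSB`, `WIFE` and `CHIL` lines, ignoring dates and everything else.

```python
import re

# A GEDCOM line is: level, optional @xref@, tag, optional value.
LINE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s+(.*))?$")


def gedcom_to_graph(text):
    """Return (entities, relations) from a small subset of GEDCOM."""
    entities, relations = {}, []
    current_id, current_tag = None, None
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        level, xref, tag, value = match.groups()
        if level == "0":
            # A new top-level record starts here.
            current_id, current_tag = xref, tag
            if tag == "INDI":
                entities[xref] = {"name": None}
        elif level == "1" and current_tag == "INDI" and tag == "NAME":
            # GEDCOM wraps the surname in slashes: John /Smith/
            entities[current_id]["name"] = (value or "").replace("/", "").strip()
        elif level == "1" and current_tag == "FAM" and tag in ("HUSB", "WIFE", "CHIL"):
            relations.append((current_id, tag, value))
    return entities, relations


SAMPLE = """0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
0 @I2@ INDI
1 NAME Mary /Jones/
0 @I3@ INDI
1 NAME Anna /Smith/
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR"""

entities, relations = gedcom_to_graph(SAMPLE)
print(entities)
print(relations)
```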

I [sushibait] built this with the goal of helping genealogists like us not only preserve our family histories but also discover new insights using the power of modern data structures and AI. Whether you’re a seasoned researcher or just starting out and curious about how tech can enhance your work, I’d love for you to give it a try.

Feel free to ask questions, offer suggestions, or share your own experiences working with GEDCOM files. You can also find more details in the source code comments.

Here’s a link to the github repo with the script: GitHub - sushibait/remotely-useful-stuff

Happy family tree building and data graphing!

P.S. I’d really love your feedback on this—what features would make it even more useful for your genealogical adventures?

Urchello · February 23, 2025, 1:29pm

I haven’t tried it yet, but I think we need to create something similar specifically for Gramps, not just for GEDCOM. I believe GEDCOM loses some data, such as attributes. For example, I save information like caste or person associations. Personally, I use it to reference godparents, witnesses, and other important connections. I think other genealogists also use attributes, associations, and additional data that could be lost in GEDCOM.

But yes, I agree that this is a very good approach for today. We must leverage the power of AI in our research to make it faster and of better quality.

emyoulation · February 25, 2025, 8:36am

The experimental “Pruned” mode of FamilyTreeView (FTV) is currently using Person Filter Rules to get a list of Person objects. It then makes a best guess of which relations should be graphed vs. those that should be “collapsed”.

It seems like a function that returns both Objects and connection paths would be a common need for growing the relationship/network Chart options.

system · March 27, 2025, 8:36am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
