Gramps, AI (Artificial Intelligence) and the Future

Nick-Hall · September 6, 2024, 5:13pm

The Copilot case against GitHub, Microsoft, and OpenAI certainly influenced our decision. I also looked for opinions of open source institutions.

I notice that from the Jul 2024 Register article Coders’ Copilot code-copying copyright claims crumble against GitHub, Microsoft it appears that decisions are going in Copilot’s favour.

In order for us to apply a software license to contributed code we need the contributor to own the copyright. This may be a problem if the AI generates code that substantially borrows from its training code. Such code could be classed as a derivative work, and we would have to abide by the original licence. At the minimum this would involve crediting the original, but could also taint our licence.

In the case of Copilot, there is a filter that help to avoid such situations. A carefully chosen set of code completion tools with approved settings may be a possible route forward. We may also decide to impose additional restrictions such as limiting the size of code snippets, marking AI generated code, crediting original sources etc…

The legal complications will continue to be clarified. We will keep an eye on developments.

emyoulation · September 6, 2024, 5:21pm

Its not about AI writing code, it is about the possible Copyleft licensing implications when using AIs in development.

comeng · September 6, 2024, 5:50pm

Sorry but how can you have features with out writing code and if you do
not write code you can have no features it seems to me to be the old
chicken and egg scenario.
phil

codefarmer · September 7, 2024, 2:45pm

There are some coding assistants which disclose their sources, while others don’t. Some assistants can train on your repos (including private repos) and can even ingest wikis to answer questions. These products aren’t free, but those who need legal protection and protection of their Intellectual Property (IP) use such tools today.

@Nick-Hall Thanks for pointing to the case against Microsoft and OpenAI, it was helpful.

emyoulation · September 9, 2024, 5:38pm

How about the following use of an AI? Does it violate the “no AI in development” guideline?

“As an expert in Python, database design and the Gramps genealogical software, please identify the specific code sections within the Relationships view (gramps/gramps/plugins/view/relview.py at master · gramps-project/gramps · GitHub) that directly retrieve the Families (IDs or handles) of the Active Person, and the Persons (IDs or handles) of those Families.”

That prompt did not give me the data requested and a follow-up prompt was needed.

Purpose:

I wanted to find out why the Relationship view retrieves all the data for Immediate Families (including person name/gender and fallback Birth/Death Event data with date/place) so efficently. While the PersonsInFamilyFilterMatch ( infamilyrule.py ) addon rule takes seconds to execute.

The prompt returned info about the add_family_members() method but the method desired was what directly handles the retrieval and display of person and event data for the relationships view.

codefarmer · September 9, 2024, 8:48pm

Good question. It would be helpful to have clarity on this. Since it hasn’t been posted in this thread yet, here’s the wiki for Contribution rules for Gramps which has clear rules about code committed to the repository, although I don’t see a broad statement like “no AI in development” (or I haven’t found that yet).

Thinking out loud…what if you asked a Gramps developer the same question? That would probably not be prohibited, so my initial thought is that asking a tool (including an AI-based tool) questions seems reasonable, as long as it doesn’t result in generating code which is then committed to the repo.

Thoughts?

emyoulation · September 9, 2024, 8:55pm

One of the difficulties is that AIs tend to anticipate the ultimate objective. So it tends to offer code when asking a code related question.

codefarmer · September 10, 2024, 1:19pm

Engineer the prompt and instruct the AI to not reply with any code?

ennoborg · September 10, 2024, 2:29pm

Worse. No doubt about that. And that’s largely because there’s no I in so-called ‘AI’. It’s a language model, which can create code that looks nice, but there’s no real knowledge behind it.

Examples:

I made several efforts to let ChatGPT write simple text filters to extract info from GEDCOM files, or make small modifications to them, like modifying month names in GEDCOMs created by Ancestry, and it often failed, because it had no idea that in real GEDCOM, the tag is always preceeded by a number. It created code that looked for tags at the start of each line, but that’s not where the tags are. It had some vague knowledge of tags, but not precise enough to write working code.
When I asked ChatGPT to write code to add checksums to my UIDs, it did indeed write code that added checksums, but they were not of the type that is actually used by other software, like PAF.

A human will normally not make such mistakes, because he/she will take time to figure out where the tags actually are, or what type of checksum it really is.

ennoborg · September 10, 2024, 4:31pm

Another example, also from ChatGPT:

I just asked it to create a program that removes all lines with tag _MASTER from a GEDCOM file, which is quite a simple task.

What it wrote was a program that reads all lines from a file, and writes those that do not contain the string ‘_MASTER’. And that’s not correct, because in theory, the string ‘_MASTER’ may appear in a note, or somewhere else.

codefarmer · September 10, 2024, 4:57pm

@ennoborg A general purpose model such as the one used by ChatGPT may not perform as well specialized models such as those used by coding assistants. There are quite a few available if you feel like experimenting.

ennoborg · September 10, 2024, 5:06pm

Well, there is another part that I’m interested in, and that is improving existing code, like our relationship calculators. They are quite slow, compared to the ones in other programs, and it would be nice if AI could study the code, and show where it does redundant things, and where it can be improved in other ways.

Another subject would be something like comparing GEDCOM files, which is quite a nuisance, because a normal text comparison does not ignore irrelevant things like IDs.

ennoborg · September 10, 2024, 5:23pm

I’d love to check some, for personal goals, so if you can give us some links to experiment with, please do.

codefarmer · September 10, 2024, 6:28pm

Sure. There are many options for coding assistants and I’m trying some I found by searching for “best ai coding assistants”. Here’s a quick list in no particular order, which may or may not be free:

GitHub Copilot, Amazon Developer Q, Cody, Replit, Tabnine, Codeium, Cursor, …

Which one you pick might depend on which IDE you want to integrate with, privacy and security considerations, what features it has, etc. First thing I do is disable/opt-out of using my queries and data for training. And needless to say, keep the context of this thread in mind

Please share your discoveries.

Mihle · September 10, 2024, 9:18pm

I would expect ChatGPT to be better on Python than it is on GEDCOM. There is much more information about Python on the internet, and therefore most likely a larger part of its training data (Compared to GEDCOM).

ennoborg · September 10, 2024, 9:49pm

And that’s exactly the problem that I have with AI, and why someone from the University of Cambridge wrote that it produces BS:

What I mean is, that when you ask a human, an intelligent person may tell you that he or she does not have enough information about GEDCOM to write such a program, and/or ask you for more information about the subject (of GEDCOM) before it goes on.

ChatGPT however acts like a little child that has learned that it’s wrong to say no, and tries to please you with garbage, because you seem to be expecting something, whatever that is. And that’s not what I want, because it means that it’s wasting my time. And the cause of this is, that it’s a language model, designed to produce results that look good. and nothing more.

I discovered this myself when I asked specific questions about local subjects not related to programming, where ChatGPT produced garbage. It grabbed texts, and put them together, but had no real clue.

I compare it with that little child, because when I informed it about an error, like not knowing that GEDCOM lines start with a number, it apologized, and produced another piece of code that still was no good.

And the main cause of this is that it’s a language model, that has no clue about the real gaps in its knowledge. And in fact, it has no clue at all.

emyoulation · November 9, 2024, 2:35pm

A disturbing AI event happened on GitHub. The Latta AI project sic’d their bug fixing AI on open source projects with a 10 star rating. (The Gramps-project has 2.2k stars but the Taapeli/isotammi has 10 stars.)

Without invitation, the owners of Latta AI had it evaluate the code and submit PRs for those “10 star” projects. It looks like this was a test balloon before broadening the targets.

It appears as though GitHub suspended the LattaAI account and deleted the PRs.

Here is the disturbing part… the Latta AI site does not identify the owners of the site. But their “opt out” process requires submitting contact information before exposing the process. Feels like a phishing scam.

Does GitHub have a discussion forum where developers and contributors can be apprised of such things?

emyoulation · November 25, 2024, 6:47pm

It appears that GitHub has changed the permissions for AIs to reference repositories in their domain. (Might have been a reaction to the Latta AI overreach.)

Perplexity (after much probing) confirms that reference to https://github.com/gramps-project/ from October 2023 (and before) was previously permitted. That is no longer the case. Where I could previously ask it to point out a line of code where something was done in the source, that no longer works.

Is there an alternative? Can GitHub Copilot be used for this purpose? If so, how can we use that AI without violating Gramps policies on AI.

emyoulation · December 20, 2024, 6:30pm

2 posts were split to a new topic: Guidelines for using AI when documenting Gramps

RenTheRoot · December 28, 2024, 4:30am

When I use AI for any coding assistance (its rare but I like it for certain complex data manipulation or debugging tasks) I like to just take whatever concept it came up with for solving the problem and then correct my code independently. Its a process similar to consulting a stack overflow response for code guidance.

The one thing I think it is extremely useful for is in adding detailed logging code to my programs for testing. I always remove this code once I find the source of an error, but it saves so much time in finding said errors when you don’t have to type every logging message you could want by yourself.

Topic		Replies	Views
Guidelines for using AI when documenting Gramps Development ai , policy	10	262	April 28, 2025
Proposed update to our contribution policy regarding AI Development ai , policy	21	189	May 30, 2025
Update to our code contribution policy Development ai , policy	13	359	October 12, 2025
What are the conclusions on use of AI? Development ai , policy	7	124	September 14, 2025
About Gramps usage and organization to store and research Help	27	894	November 14, 2020

Gramps, AI (Artificial Intelligence) and the Future

Purpose:

Related topics