Gramps, AI (Artificial Intelligence) and the Future

A decision has been made to exclude any code from the Gramps project that has been written with AI assistance.

I accept that this is the project policy at this time. But there ought to be a discussion of where the boundaries lie.

For instance, GitHub is expanding its AI-driven tooling with products like GitHub Copilot. Only a few other tools are explicitly AI-driven at this time, but we can expect more and more tools to have some AI-reliant components. pylint and black are still rules-based, but such tools are likely to move to AI in the future.

The discobot in this forum, its translation tools, and Google Translate are all AI-driven and used in our support. Does that make the project tainted?

If any developer has used a search engine to look for answers to development dilemmas, their development has been AI-assisted.

I can see creating and segregating another “Project” repository for AI-developed or AI-powered addons, but not core code, allowing the purity of our project to remain untainted while still benefiting. (Clearly labeled “AI only” in the copyrights with new licensing exceptions in the derivative works terms for AI … with their own separate drinking fountain and seats in the back of the bus.)

There have been rulings in the USA courts that code entirely developed by AI cannot be copyrighted. However, a human writing the AI prompt (that results in code output) means the code developed is human-directed and the rights can be claimed by that human or their employer.

Naturally, other nations’ courts have issued other rulings.

In 2022, the Software Freedom Conservancy established a committee to explore the copyleft implications of AI-assisted programming (specifically, GitHub Copilot).

This seems to be the height of “hypocrisy”: we have one group arguing for improvements to GRAMPS to allow easier importing of AI-generated transcriptions of original records, and yet we are not using AI to assist grossly overloaded developers. Surely, if clearly identified and tested as AI-generated, such code would greatly help reduce the backlog of bug fixes, etc., not to mention the refresh cycle times for GRAMPS updates.

Also, is AI-generated code any better or worse than human-generated code?
phil

Worse at this time. AI services are extremely susceptible to “hallucinations”.

All AI work should be double-checked.

By the way, my use of segregation policy words in the original post was intentional. The policies that we may have to live under now will be just as unjustifiable to future users as discrimination is to our politically over-corrected society. But until the copyright precedents are set in the courts, we are stuck.

I have been using AI since 1988. I can say that the “I” for intelligence is a bad word.
For the moment, “AI” does not invent new code but reuses old code from the same or other projects.
That is thanks to the “GAFAM” companies, who collect all data (good and bad), and I can only hope that the data is reasonably good.

For me, AI is a “miroir aux alouettes” in French and a “funhouse mirror” in English.

If we want Gramps to have new code and new features that are not dependent on the old ones, we must prohibit the use of “AI”.

I tend to agree with you on this, and it reinforces why I am wary of AI-generated transcriptions.

phil

In my experience, LLMs tuned for coding have come a very long way. They have the potential to improve productivity significantly. No question, generated code should be checked, just as code written by programmers is reviewed in PRs.

Could you elaborate on this? One way I interpret this is, legacy code may not have been written well (e.g. may not have used newer language features), and if AI uses that to generate new code, it would be propagated further. Is that what you mean?

In my limited experience, LLMs geared towards coding, having been trained on various sources and language features, are able to generate code using modern standards. But I may not have understood your point fully.

On the other hand, if we had an LLM that was trained on Gramps code (as suggested by a reviewer on one of my PRs) then the generated code might use Gramps APIs, and possibly avoid duplicated code.

At this point I’m seeing enough potential that I’m open to the possibilities presented by AI, while respecting guidelines set to not use it for code contributed to the repo, as long as we understand why the rules are put in place.

For maintenance, I prefer poorly written but readable code to incomprehensible AI-generated code that will be difficult to maintain.
AI does not invent new code. It always relies on existing code.

We have two cases:
1 - The gramps core.
Nothing replaces the human, even if their code is badly written.
As I said before, AI will never invent new code, structures, or anything else.

2 - The gramps addons.
These are generally used for reports. I can accept that the AI is used in this case. The authors of these addons will know how to maintain them. If the authors do not maintain these modules, they may be deleted.

For all these reasons, I don’t want AI code in Gramps. I speak for myself; I don’t know what other developers think.

I am not a developer but agree with both points. I want to build addons with the assistance of AI and have a great fear of AI-generated code in the core. (But proof-of-concept code tests that require complete rewrites, and specification documents written using AI, seem reasonable.)

And I very, VERY much want some LLM tools and Gramplet addons. Natural Language Parsing of OCR’d Obits to find/link Names, Relationships and Places would be very welcome.

One thing I can agree with is that any and all code (human generated or machine generated) should be reviewed and it is a reasonable expectation (requirement even) that it should be readable and maintainable. I suggest that AI written code is not necessarily poorly written, unreadable, or unmaintainable. Could we find a lousy AI generator that does that? Sure. Are all code assistants that bad? Not by any means.

Just for fun, if we want the assistant to generate poorly written code, it could easily be added to the prompt. “Write spaghetti code as if you were a programmer who doesn’t know Python” :upside_down_face:

It’s not my intention to convince anyone to use AI code assistants, or for Gramps to start allowing that today. Although in the past year, my eyes have opened to exciting possibilities while at the same time, like you and others, I wonder what the future looks like.

The way I see code assistants work is by helping, not replacing me. For example I write the code and then I could ask the assistant to review the code and find bugs in it before I commit the code (we do things like that with pylint and black today). What if it could take my implementation and optimize for speed, or memory, or storage? Does the code have to be a new invention by the AI? No. It would make my life easier if it could complete some of these tasks so I could focus on what is more important.

BTW, it’s encouraging to see that there’s openness to using AI generated code in non-core code. And I state my opinions for discussion, not taking a hard stance here. Would love to hear differing views.

There is also a middle ground between no AI at all and asking ChatGPT to write something for you.

For one, IDEs like IntelliJ are adding “AI”-enhanced code completion. Basically, it suggests one whole line at a time, rather than one function at a time as normal code completion does.

Things like ChatGPT can also be used to ask questions about code and how some things work, or to have things explained in a different way than the documentation or Stack Overflow explains them. Personally, I have had a much more positive experience with that than with getting it to write the code itself.

When we updated our contribution policy in April 2023, we were concerned about the legal consequences of allowing AI generated code. Does the AI own the copyright of the code it writes, so it can assign it to us to be compliant with our licence? Can we be sure that the code will not infringe anyone else’s copyright?

Yes. Code completion and asking for help about general coding concepts seems like it could be acceptable.

There are unanswered questions for sure. Is there any specific project, case, or company on which Gramps is basing its decision on this issue? I’d like to express my interest in efforts to re-evaluate the decision for Gramps in the future.

In the contribution rules, I see the restrictions related to proprietary code; what about LLMs trained on permissively licensed code? What if the LLM was trained only on Gramps source? If Gramps could narrow down restrictions, we might be able to identify LLMs or tools which could still be used, starting with the advanced code completion tools you mentioned in another reply.

It is likely that LLMs will eventually have to have a Curriculum Vitae which lists their baseline training. Plus a professional certification covering which Programming track was trained. With some advanced coursework in Database application design, big data, Python and UX/UI design.

For use in our sector, maybe it would need secondary training in traditional Genealogical and History research.

Then we as users would layer on SME training choices such as Gramps source, Gtk+, Glade, the MantisBT reporting tool and the Gramps MantisBT databases, the Gramps mailing lists, Discourse, GitHub Gramps-project repository commits and PR history, the SourceForge repository, MediaWiki and the Gramps wiki documentation, Sphinx and the Gramps API, Gramps Web, the GNU license and copyleft understanding, … and so on.

There seem to be leanings toward the CV approach instead of universal expertise LLMs. However, the granularity of the SME training is beyond the scope of current AI trends. And it might be too resource intensive for the near future.

Not using AI? Why fight progress? Don’t use computers then! Everything must be done by hand and kept in paper folders!

This, I feel, is not about fighting progress; it is more about how you use GRAMPS.

I want to use it to store the information I have found about my family roots. The enjoyment of this pastime is in the search for records and then the transcription of those records, from originals or digitised copies of originals, into a form where I can find them easily.

I am sure there are as many uses for GRAMPS as there are individuals using it, and many will not be family history projects. So if it is appropriate for you as an individual, or for the GRAMPS community, then use AI; if not, do the other thing.

I suspect Ancestry, MyHeritage, 23andMe, etc. are all working on AI purely as an enhanced marketing tool for the DNA tests. You can just see it coming: “Do a DNA test, create a tree, just your name will do (please include your date of birth), then sit back and wait for your tree to be presented all the way back to the Neanderthal in the 3rd cave on the left.”

Where is the intellectual challenge, or dare I say fun, in that?

phil

No, if I’m not wrong, the original post is about developers using AI to assist them while writing code for Gramps. It’s not about the Gramps features or use.

This isn’t about being a Luddite with regards to a new technology. It is about complying with the spirit of the licensing model.

And it isn’t the first time various licensing models have prevented the project from folding in strongly desired functionality.

A lot of people have wanted Gramps to interface directly with outside tools and services. But most of those require executing a legal agreement about proprietary API confidentiality or data being exchanged.

Since the project is not incorporated, it cannot execute legal contracts. And even if we did incorporate, some of those terms are in direct conflict with the openness of open source.

So the Gramps-compatible addons that ARE available (for APIs like FamilySearch) are offered from OUTSIDE the project, shared by an individual who can execute the contract and abide by the terms.

Another one who hasn’t read or understood the post.
It’s not about Gramps features, it’s about using AI to write code.

Same thing as far as I am concerned: where do you draw the line, even if you ignore issues like copyright and licensing?
phil