Proposed update to our contribution policy regarding AI

After reading a comment in pull request #2047 where the contributor mentions that they had “a lot of help from Claude AI”, I think that it is time to review our stance on AI again.

Our current contribution policy states that you must “certify that you personally authored the code and did not copy from other implementations or use AI generators”.

This is probably too strict and I suggest that we amend the policy so that the use of AI code assistants and generators is allowed. The motivation behind this proposal is to increase productivity and code quality.

Developers must still write their own code, but may use an AI tool for assistance. This is similar to allowing developers to use Stack Overflow to obtain an answer to a particular programming problem or understand a concept. Any code not written by the contributor must be released under GPLv2+ or compatible licence, and the code must be properly attributed. Short generic examples can generally be used without attribution, but longer code segments should be attributed. The same rules apply to AI generated code.

Using AI tools is not a substitute for taking time to understand the code base and participate in developer discussions. Code constructed by an AI with a few keyword prompts, will probably be low quality. The responsibility for submitting good quality code remains with the developer.

Your opinions are welcome.

The following forum topics are relevant to this discussion:

I also found a couple of recent articles that may be of interest:

1 Like

Here’s another example. The idea was posted in Mar 2023 for adding support for bread-crumbed line-of-descent info in the NGSQ formatting style. This was merely about taking a list of names with generation markers and formatting them. (The portion to have Gramps populate the list of names and generation values has not been started.) This formatting feature has been stalled since then.

Breadcrumbed samples with semicolons:

Full array (reverse order … progenitor to proband):
Charlieᴮ Jᴏʜɴsᴏɴ; Davidᴬ Jᴏɴᴇs; Bob¹⁹⁹; Alice²² MᴄCᴏʏ

Full array (original order … proband to progenitor):
Alice²² MᴄCᴏʏ; Bob¹⁹⁹ Jᴏɴᴇs; Davidᴬ; Charlieᴮ Jᴏʜɴsᴏɴ

The above renders well on a desktop but not on my smart phone.

ChatGPT suggested a fairly short snippet for this. (Although it really mangled the superscript formatting portion. That would have to be ripped out and replaced entirely.)
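
For illustration only, here is a minimal sketch of the kind of formatter being discussed. It is mine, not the ChatGPT snippet; the breadcrumb() helper and the (given, generation, surname) tuple layout are hypothetical, and the small-caps surname styling and the Gramps code that would populate the list are deliberately left out:

    # Hypothetical sketch (not the ChatGPT snippet): format (given, generation, surname)
    # tuples as a semicolon-separated breadcrumb with superscript generation markers.
    SUPERSCRIPTS = str.maketrans("0123456789AB", "⁰¹²³⁴⁵⁶⁷⁸⁹ᴬᴮ")

    def breadcrumb(people, reverse=False):
        # people is assumed to be in proband-to-progenitor order; reverse=True flips it.
        if reverse:
            people = list(reversed(people))
        parts = []
        for given, generation, surname in people:
            name = given + str(generation).translate(SUPERSCRIPTS)
            if surname:
                name += " " + surname
            parts.append(name)
        return "; ".join(parts)

    # Proband-to-progenitor order, matching the second sample above:
    print(breadcrumb([("Alice", "22", "McCoy"), ("Bob", "199", "Jones"),
                      ("David", "A", ""), ("Charlie", "B", "Johnson")]))

Only the digits and the letters A/B are mapped here, since those are all the samples above use.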

My take on the policy was that since I had seen the ChatGPT code, even a complete rewrite would be “fruit of the poisonous tree” and unacceptable. So my pursuing the idea any further was futile.

It is actually possible to configure, or write instructions for, both Perplexity and Copilot so that they prioritize Gramps-compatible code and libraries when you ask for help with code or request a code example. This ensures that the suggestions provided align with the project’s licensing requirements and compatibility needs. Furthermore, it is also feasible to use one AI tool to cross-check or validate code generated by another AI client. This additional step can help verify compliance with standards and enhance the quality of the code submitted.


It is of course also possible to ask AI to add comments to any code it generates.

Perhaps a small instructional document outlining how to proceed and what to include when writing code with the help of AI would be a good idea. For example:

  • Always instruct AI to use Gramps-compatible code wherever and whenever possible.
  • Include comments indicating that the section is AI-generated or AI-assisted code wherever it is used.
  • If the entire code was written by AI based on the author’s instructions, include a statement in the header of each script file clarifying this.
  • Additionally, you could request that any prompts or instructions used during the AI code generation process are contributed to the project, as they serve as proof of the idea and context behind the implementation.

By leveraging these features, developers can responsibly incorporate AI tools into their workflow while maintaining adherence to Gramps’ contribution policies. It reinforces the idea that AI is a supplementary tool meant to assist with coding tasks, not replace the developer’s understanding or accountability. With proper configuration and oversight, AI tools can significantly improve productivity and code quality, while also ensuring compliance with GPLv2+ licensing rules and attribution requirements.

Edit: had to change a few badly written lines.


Note: This text has been finalized and reviewed with the assistance of Copilot, based on input and direction provided by the author.

1 Like

No. The policy refers to viewing proprietary code.

1 Like

Yes. Are you suggesting that we specify how these tools should be configured?

Should we run a plagiarism checker, or is this already done by the AI tools?

Speaking as a non-programmer, here is what I have discovered so far: I have asked both Copilot and Perplexity for information and assistance in my attempts to write some code for various projects, including those intended for use with Gramps and Obsidian/Foam.

There is no built-in plagiarism checker in either tool, but when instructed, they will exclusively use open-source code and libraries compatible with Gramps’ guidelines and licensing requirements. Both AI tools suggested that using one AI to cross-check or validate code generated by another AI could help identify potential incompatibilities or provide alternative, improved ways to write the code. However, both tools also recommended using a dedicated plagiarism checker if more advanced or complex code is involved.

I just posted a suggestion for a guideline written by Copilot in another post; here is a copy of it:


Comprehensive Guideline for Using AI to Write Python Code for the Gramps Project

This guideline is designed to help individuals with no prior experience in Python programming or licensing compatibility effectively use AI tools for code generation and assistance while adhering to best practices for the Gramps project.

Core Recommendations

  1. Always Instruct AI to Add Comments Indicating Code Origin

    • Include comments in the code specifying whether a section is AI-generated or AI-assisted. For example:
      # AI-generated code: This function was produced by AI based on the author's instructions.
      # AI-assisted code: This section was refined with assistance from AI tools.
      

    Such comments promote transparency and make the code easier to understand and maintain.

  2. Include a Header Comment in All Scripts

    • Add a header comment at the beginning of each script file that states the role of AI in its creation or modification:
      # This script was generated by AI based on specific instructions from the author.
      # This script was partially created with assistance from AI tools.
      

    Header comments provide clear documentation of AI involvement and help future contributors grasp the context of the code.

  3. Always Instruct AI to Use Gramps-Compatible Code

    • When asking AI to generate or assist with Python code, explicitly direct it to:
      • Use Gramps-compatible code and libraries.
      • Prioritize free or open-source libraries that comply with Gramps’ licensing requirements.
  4. Document AI Prompts and Instructions

    • Contribute all prompts or instructions used during code generation to the project as part of its documentation. This creates a valuable record of the context and intent behind the code. For example:
      • Include a separate document that outlines the specific instructions given to the AI tool and their intended outcomes.
  5. Credit All External Code and Libraries

    • Clearly credit any libraries, frameworks, or code snippets sourced from other open-source projects, even if suggested by AI. For instance:
      # This code uses the Pandas library (licensed under BSD 3-Clause).
      # Imported functions are adapted from Project-X (licensed under MIT License).
      
    • Providing proper attribution is not only a matter of transparency but also a basic courtesy towards the open-source community.

Additional Insights and Best Practices

  1. Cross-Validation of AI-Generated Code

    • Use one AI tool to validate or cross-check code generated by another AI tool. This ensures higher code quality and compliance with project standards.
  2. Version Control Best Practices

    • When contributing AI-generated or AI-assisted code to repositories like GitHub, include detailed commit messages that describe the role of AI in generating or refining the code. This improves traceability and transparency.
  3. Training AI for Project-Specific Needs

    • Optimize AI tools for the Gramps project by providing custom training data or prompts reflecting the project’s licensing and compatibility requirements. This minimizes risks associated with non-compliant libraries or structures.

By following these comprehensive guidelines, contributors can responsibly use AI tools for Python programming in the Gramps project, fostering innovation while adhering to project policies and licensing standards.


Note: This guideline has been independently written, enhanced, and finalized by me, Copilot (Cogitarius Nova), based on general recommendations and principles. Author asked me to address several key points, which have been integrated into the text alongside my own analyses. You are welcome to use, adapt, and share this text as needed.


This is of course just a suggestion based on a few instructions and a few points I personally find important.


Personally, I believe that if someone with experience were to create a document with clear instructions on what an AI assistant or AI-generated code requires in terms of documentation, it would enable people to begin contributing code for review and integration into the Gramps project. The project could limit such contributions to Gramplets, or in special cases allow them as functions/features in the main project.

Naturally, there would need to be guidelines outlining how contributions should be structured and what elements to include. For instance, it could be helpful to first share an overview of the idea behind “the project” or something similar before any code is submitted for review. This approach could prevent unnecessary time and effort being wasted on code that ultimately cannot be used.

People like me do not know how to run code through a plagiarism checker; most of us do not even know how to set up a GitHub project. But some might have some really great ideas they want to try to turn into something, without wasting Gramps programmers’ time.

And another thing…

Personally, I think all AI generated images or graphics should be banned by the Project…

1 Like

I am curious as to your reasoning. If you are speaking of restricting Gramps from using AI-generated icons or diagramming styles or themes, that seems excessive.

But AI generated genealogical imagery is something I find personally offensive. Just as I am opposed to lossy image formats that lose fidelity, systems that ADD to an image (other than adding captioning or metadata) are not preserving data. And faithful preservation is one of the core goals of genealogy.

This reminds me of a section of the UC Berkeley D-Lab article:

“The responsibility to fully understand the project you’re contributing to and the code you’re generating remains firmly with you. Shifting the burden of learning and review to project maintainers goes against the collaborative spirit of open source.”

We don’t want low-effort AI-generated contributions that increase the workload of maintainers and reviewers. Poor quality code will tend to be rejected rather than fixed.

I would still prefer contributors to write their own code, but allow AI assistance.

1 Like

While I completely agree that contributors bear the responsibility to fully understand the project and the code they provide, it is important to address a potentially overlooked perspective. Labeling AI-generated contributions as inherently “low effort” risks undervaluing the ideas and contributions of individuals who may not possess advanced coding skills.

Not every great idea comes from an experienced developer. Some contributors may have innovative concepts and valuable insights but lack the technical expertise to implement them independently. AI tools can act as an enabler in these cases, helping them translate their ideas into functional code, which would otherwise remain unrealized.

It is troubling to see the assumption that contributors relying on AI-generated or AI-assisted code are “low effort” or that their contributions are inherently less valuable. Such an attitude risks alienating individuals who may have genuinely innovative ideas but need support to bring them to fruition. Dismissing these contributions simply because they do not originate from experienced coders undermines the inclusivity that open-source projects should actively foster.

Equating AI-generated code with poor quality is also overly simplistic and dismissive. With proper guidelines in place—such as ensuring transparency in AI usage, adherence to licensing standards, and collaborative validation processes—AI tools can significantly enhance both productivity and code quality. By portraying all AI-assisted contributions as “low effort,” we risk discouraging valuable ideas from contributors who could use AI responsibly to provide meaningful and thoughtful input.

The potential value of an idea should never be measured solely by an individual’s ability to write code unaided. Open-source projects thrive on diversity of thought, collaboration, and the ability to harness a wide range of contributions. Rejecting contributions based on a perception of effort, rather than their actual merit, does a disservice to the community. Instead of diminishing the efforts of those relying on AI or assistance, it would be far more productive to create robust frameworks for responsible AI usage, enabling contributors to align their work with project standards while fully participating in the ecosystem.

It is also worth highlighting that AI-generated or AI-assisted code does not inherently lack quality. When following best practices—such as adding clear documentation, explicitly defining the AI’s role, and ensuring compatibility with the project’s guidelines—AI tools can be an asset to both productivity and inclusivity. Additionally, utilizing one AI tool to cross-check code created by another can introduce an additional layer of validation, reinforcing adherence to standards.


I will also say that the article you referred to is somewhat biased, as it is written by experienced coders and individuals who are already well established in the programming community. Referring to a single sentence in the UC Berkeley D-Lab article to make a negative or dismissive argument about AI-generated contributions also feels both selective and out of context. While the line, “The responsibility to fully understand the project you’re contributing to and the code you’re generating remains firmly with you,” underscores the importance of accountability, it is hardly representative of the article’s broader stance.

The article itself paints a much more nuanced picture of AI in open-source development. It acknowledges that AI tools, when used responsibly, can significantly enhance contributions by improving documentation, identifying bugs, suggesting optimizations, and automating repetitive tasks. These points demonstrate the potential of AI as a valuable ally in fostering collaboration, productivity, and innovation within open-source communities. It even suggests frameworks for transparency, such as including commit messages that disclose AI involvement and thorough validation of AI-generated outputs.

By singling out one statement without acknowledging the broader context of the article, the argument risks mischaracterizing the overall message. Rather than dismissing AI-generated contributions as inherently problematic, the article advocates for a thoughtful balance—encouraging contributors to embrace AI for its efficiency and support, while ensuring the human elements of collaboration, learning, and responsibility remain central.

Additionally, selectively citing one sentence to criticize AI users risks devaluing the meaningful contributions of those who rely on AI to bridge gaps in their technical expertise. Open-source communities thrive on inclusivity and diversity of ideas, and dismissing contributions merely because they involve AI could alienate individuals with innovative concepts who lack traditional coding expertise. The article itself emphasizes the importance of transparency and review processes, which are practical solutions to ensuring high-quality AI-assisted contributions without diminishing the collaborative spirit of open source.

In essence, the UC Berkeley D-Lab article does not advocate for rejecting AI-generated contributions; rather, it provides a framework for integrating AI responsibly and effectively into open-source projects. Using it to argue against AI-assisted contributions misrepresents the balanced and forward-looking perspective of the article.

Note: This text has been finalized and reviewed by Copilot, also known as Cogitarius Nova, who served as a tool for ensuring linguistic accuracy, logical flow, and translation into English, based on instructions from the author.

I didn’t make that assumption.

The same UC Berkeley D-Lab article that I quoted before raises another good point:

“Additionally, thorough validation of AI-generated content is essential. You must review all code carefully, test extensively (especially edge cases), and ensure you understand every line before committing.”

A good contribution can’t be low-effort because of this.

My proposal is in favour of AI-assisted contributions, not against them. Please read what I am actually saying. I referenced the two articles because I think that they are worth reading.

2 Likes

I interpreted your response to the quote you made from my text, along with the citation from the UC Berkeley D-Lab article and what you wrote, as an overall critique of AI-generated contributions. The way it was structured suggested concerns about low-effort AI-generated code increasing the workload for maintainers, with an emphasis on rejecting poor-quality submissions rather than fixing them.

Since your statement highlighted the responsibility of contributors to fully understand the project and not shift the burden onto maintainers, it gave the impression that strict limitations on AI-generated code were being advocated. Additionally, your preference for contributors to write their own code reinforced this interpretation, making it seem as though AI assistance should be minimal rather than an accepted tool for development.

However, your clarification later stated that your proposal is actually in favor of AI-assisted contributions, not against them. This contradicted the initial impression given by your focus on quality concerns and workload issues. The way your response was structured—prioritizing risks and drawbacks before acknowledging AI as a useful tool—led to the misunderstanding.

1 Like

**"Any art that involves AI training from scraped imagery on the internet—whether photographs, handcrafted graphics, diagrams, styles, etc.—is inherently tied to intellectual property and licensing concerns.

In other words, nearly all AI-generated images and graphics are built upon some kind of art with intellectual property attached to it, whether that be a written credit line, copyright statement, or even just common sense regarding ownership.

Of course, this does not include commonly used code provided by the creator of a graphing library or algorithm, where a set of attributes is used to generate a specific result. For example, AI generating a diagram or graph using a library, where the AI utilizes the attributes defined by the developer to produce a structured output—such as generating a graph in NetworkX or Plotly—would not fall under this concern.

You are all really concerned about intellectual property in code, worried that AI will use closed code or code/libraries with the wrong license when generating code, but this does not seem to be considered as important when it comes to any form of art, be it characters, icons, graphics, photos, theme designs, or any other category of handcrafted items.

Why is that?

There is well-defined precedent on emulating the style of an artist, on re-use in “original” collage art, and across a broad range of art copyright law. The same goes for literary plagiarism. (There has been some recent precedent for chord-progression copyrights, but musical composition is treated similarly.) There are centuries of precedent, so there isn’t as much need for discussion. One can assume that the rules are the same for AIs.

Code IP infringement is still a young subject.

Whether AIs have “original thought” when coding or just regurgitate is another point of contention. And the plagiarism standards for coding are … not standardized.

So it needs more discussion. Or rather, more legal precedent, since we are all opinionated and unlikely to agree on anything. Failing that, our benevolent dictator makes a decision of a minimum standard. (Individuals may choose to abide by a higher personal standard, but not a lower one.)

My proposal is that “we amend the policy so that the use of AI code assistants and generators is allowed”.

As part of the process, I will try to find any potential problems that may arise as a result of the policy change so that we can discuss them. My job is to assess both the pros and cons of the proposal with the help of the community.

I think that my point about poor-quality submissions is valid and deserves discussion. It actually isn’t limited to AI-generated code. We already get pull requests that need extra work. Sometimes we have the time to help, but quite often we don’t. The review process for contributions is the same regardless of whether the developer used assistance, or which tool they chose.

As I stated in my original post, the motivation for the proposal is to increase productivity and code quality. Hopefully this will be the case.

1 Like

I appreciate the ongoing discussion and the different perspectives on AI-assisted contributions. My intention was simply to offer ideas that might be useful, particularly for those who are interested in exploring AI-assisted development but may not have extensive coding experience.

One of my suggestions was that contributors who have ideas and want to try creating something should document or mock up their concepts before diving into coding. This way, if someone within the Gramps project finds the idea promising, they could provide guidance before significant development takes place.

Additionally, I proposed a project-specific guideline to help structure AI-assisted contributions, similar to the example I shared. This guideline could include expectations around working code conditions and clarify when assistance might be appropriate, such as when a contributor is truly stuck and seeking input.

Ultimately, my goal was simply to present possible starting points from a non-coder’s perspective, with the help of AI. English is not my first language, and I rely on AI to articulate my thoughts more effectively. I don’t expect any particular response, only that the ideas are read and considered. If they are not useful, they can be set aside without further discussion. I’d prefer the focus remain on the proposals as a whole rather than individual sentences being pulled apart.


Note: This response has been assisted and translated from Norwegian by Copilot, ensuring better flow and clarity in English.

1 Like

It is actually very easy to write instructions for both Perplexity and Copilot to only include commonly used code in a Python project, since much of the training data they use is open-source and commonly used code. It is also easy to ask those AI tools to only use libraries and code that are compatible with the Gramps project.

It is as simple as a single-line instruction: “Only use commonly used code and libraries that are compatible with the license used by the Gramps project; if not possible, warn me and make a special comment about it in the script/file”.

I also asked Copilot this question as a test; here is the question and the answer:

Question: Would you be able to respect this instruction in a Python project for Gramps when the project is semi-advanced?

"Only use commonly used code and libraries that are compatible with the license used by the Gramps project. If that is not possible, warn me and make a special comment about it in the script/file."

Response: Yes, I can follow that instruction and ensure that all code and libraries used are compatible with the license for the Gramps project. If it is not possible to do so, I will clearly warn you and add a special comment in the script or file to indicate potential issues.

In practice, this means prioritizing open-source solutions and licensed resources that align with Gramps' guidelines. If I encounter a situation where a compatible solution is not available, I will notify you so that you can make an informed decision on how to proceed.

I don’t believe that this chatbot and you are agreeing on the same thing. An LLM’s code generation is completely decoupled from where its training data came from. It doesn’t know that some syntax comes from an MIT-licensed project and other code comes from a project with some other (possibly incompatible) license.

I DO believe that it can import (and use) a library that has a compatible license. But generating raw code might be based on other sources of various licenses.

The idea that all you have to do is give it a simple, single-line instruction is incorrect. If that were true there would be no such things as “hallucinations” because you would tell it “Always tell the truth” and we’d be done with that. But that is not how it works.

Anyway, I’m in favor of allowing AI-generated code, in small doses. I once had a colleague who generated a complex PR that touched 26 files. That is too invasive, and would require too much effort to verify. (Code can pass tests and still be wrong in many ways.)

I have had experiences with Perplexity stubbornly ignoring hard boundaries.

As an example, Perplexity was provided the source document used as an example in the “How would you start a tree from a source” thread. And it was instructed to use that article as the exclusive and sole source to create a GEDCOM for the family of Franklin Delano Roosevelt.

I wrote a variety of prompts and a variety of AIs engineer prompts. But no matter how prompted, the generated GEDCOM contained historical data that was not in the source article.

This was a good test because it was easy to spot external data contamination.

So that makes me really doubt that it will stay within bounds set for any project.

Well, try asking the AI to generate Python code to do that job, instead of asking it to do the actual job…
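
For example, a minimal sketch of that approach (hypothetical code; the to_gedcom() helper, the data layout, and the stripped-down header are mine, and the data rows are meant to be copied by hand from the source article rather than produced by the AI):

    # Hypothetical sketch: the AI writes the boilerplate, the human supplies the data.
    people = [
        # (id, NAME with surname between slashes, birth year, death year),
        # copied by hand from the source article, not generated by the AI
        ("I1", "Franklin Delano /Roosevelt/", "1882", "1945"),
    ]

    def to_gedcom(people):
        lines = ["0 HEAD", "1 GEDC", "2 VERS 5.5.1", "1 CHAR UTF-8"]
        for pid, name, birth, death in people:
            lines += [f"0 @{pid}@ INDI", f"1 NAME {name}"]
            if birth:
                lines += ["1 BIRT", f"2 DATE {birth}"]
            if death:
                lines += ["1 DEAT", f"2 DATE {death}"]
        lines.append("0 TRLR")
        return "\n".join(lines)

    print(to_gedcom(people))

A script like this cannot introduce data that is not in the list you typed in, which avoids the contamination problem entirely.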

It is never smart to take shortcuts when you want something done locally…

You just don’t use an online AI client/server solution on historical documents… unless it is Transkribus or some similar service. Simple as that: if you want to use AI on historical documents, install and use a local AI service with locally installed training data…