Update to our code contribution policy

After considering your feedback and reviewing some common policies of other open source projects, this is my suggestion:


Gramps source code is released under the GPLv2 or later license (see 1). In order for code submissions to be accepted, they must conform to this license. Here is some additional guidance:

General

  • Contributors must add their real name and contact information at the top of any file they wish to contribute.

  • The contributor attests that they wrote the code and release the code under the Gramps license.

Third-party code

  • Contributors are responsible for disclosing any copyrighted materials in their code.

  • Third-party code can be used with the permission of the copyright holder and in compliance with the terms of its source license. This license must be compatible with the Gramps license (GPL v2, MIT, public domain, etc.).

  • Third-party code must be properly attributed and a source license included where applicable.

  • When using code from sites such as StackOverflow, a link to the answer used should be included in the commit message. Small code snippets that are non-copyrightable can be included without attribution.

AI generated code

  • AI code assistants and generators can be used, but their use must be disclosed.

  • Contributors should ensure that the terms and conditions of the AI tool do not place any restrictions on the use of the code that is inconsistent with the Gramps license.

  • Contributors should include a statement in the commit message naming the AI tool and version used.

  • Code substantially written by an AI tool should use the “Generated-by:” tag and include an indication of the prompts used to generate the code.

  • When a commit includes assistance from an AI tool, such as a GitHub Copilot review, then the “Co-authored-by:” tag should be used.

  • Contributors should use features of their AI tool that suppress code that is similar to the training data and/or employ code scanning to avoid unintentionally including copyrighted material.

These guidelines may be subject to change as the technology advances and copyright law is clarified.
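
To make the tagging guidance above more concrete, here is a minimal sketch of how a commit message with these tags could be assembled. The helper name `build_commit_message`, the “Prompts used:” line, and the example summary are assumptions for illustration only, not part of the policy; the tool name is taken from the example later in this thread.

```python
def build_commit_message(summary, body="", generated_by=(), co_authored_by=(), prompts=""):
    """Assemble a commit message that ends with Git-style trailer lines.

    Hypothetical helper: the policy only names the tags, not this layout.
    """
    lines = [summary.strip()]
    if body:
        lines += ["", body.strip()]
    if prompts:
        # An indication of the prompts used, as the policy asks for.
        lines += ["", "Prompts used: " + prompts.strip()]
    trailers = []
    if generated_by:
        # Several tools are joined with "; " on a single trailer line.
        trailers.append("Generated-by: " + "; ".join(generated_by))
    trailers += ["Co-authored-by: " + name for name in co_authored_by]
    if trailers:
        # Trailers form the last paragraph, separated by a blank line.
        lines += [""] + trailers
    return "\n".join(lines) + "\n"


message = build_commit_message(
    "Add centrality score to relationship tool",
    body="Adds a helper that counts ancestors, descendants and unions.",
    generated_by=["Codestral 25.08"],
    prompts="asked for an iterative ancestor/descendant count",
)
print(message)
```

Keeping the tags in the final paragraph of the message follows the usual Git trailer convention, so they stay easy to find when reviewing history.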


As usual, comments are welcome.

Hi @Nick-Hall
A thoughtful list, thanks for putting it together. A few questions for clarification:

Gramps source code is released under the GPLv2 or later license (see 1).

Was the reference here supposed to be Project License - Gramps?

Commit history includes author info, and the GitHub profile contains more details about the author. So I am curious about the rationale for requiring a name and contact information at the top of each file; could you elaborate? It may be possible that this could make some contributors back off from submitting code to Gramps.

Question about “AI Tool”: if I use an editor like VS Code with a plugin which hosts LLMs, e.g. Continue or Cline, which I then configure to use Granite, Llama, Codestral, or other models, what is required to be reported? Should versions be reported for the tools and/or the LLMs?

We can help our contributors by investigating which LLMs and/or tools are compatible with Gramps’ license “GPLv2 or later” and which are definitely incompatible. Another thing that would be useful (which I mentioned in another thread) is a set of prompts that make a good starting point for using AI in coding for Gramps.

My opinion is, the simpler the requirements we place on contributors, the better. Everything asked for should serve a documented purpose.

The wiki page that I am updating is the Howto: Contribute to Gramps page. I’ll update my post with the link.

This was in the original policy. The reason is so that the author asserts their copyright.

Looking at policies of other open source projects, it seems to be good practice to be transparent about the tools and models used. This can help when reviewing code, and also for traceability if problems arise in the future.

The OSRF Policy on the Use of Generative Tools in Contributions gives the following example:

Generated-by: GitHub Copilot v3.2; Amazon CodeWhisperer 2024/10 release.

Their guidance is to provide “the fully-qualified name of the tools, including the provider and version/release information.”

I don’t think that a list of compatible tools is practical to include in our official policy. Perhaps our users could provide additional guidance for specific tools.

Do these guidelines also apply to manifests for compiling the flatpak?

Regarding “Contributors must add their real name and contact information at the top of any file they wish to contribute.”

Is this the only way authors assert their copyrights? I looked at a few open source projects, and did not see this being done with any level of consistency. I question the wisdom of adding dozens of lines (growing to hundreds of lines over decades) to the top of the file. With modern source control systems I don’t see this as useful.

Since we’re reviewing our policies, could we look at simplifying along with adding policies?

Thanks for the examples of what information is expected for usage of AI tools; that seems reasonable. I also agree that documenting which tools are compatible doesn’t belong in the policy. It is an implementation detail, so a separate document/wiki with developer knowledge about current tools, language models and prompts would be more appropriate, and yes, developers would keep that updated.

I am glad to see the policy revision.

About a year ago, in July 2024, before the original policy was put forth, I did an experiment using ChatGPT to write a prototype formatter for NEHGR (New England Historical and Genealogical Register) style breadcrumb pedigrees. The idea was to make a variant of the Detailed Descendant Book report to assist genealogists who want to submit articles for the NGS Quarterly.

Then the policy was established, forcing the abandonment of the idea because any code would now be considered “fruit of a poisonous tree.”

With the new policy, the idea can be pursued again.

Yes.

I think that nowadays the copyright owner maintains legal rights even if they don’t assert copyright. This was not the case in the past in the USA and maybe other jurisdictions.

The copyright lines are a visible way for us to acknowledge contributors in the code. They can also provide a quick way to see who last made non-trivial changes to the code and when they did so. I’m happy to review this though.
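
As an illustration of how copyright lines can be read back from a file, here is a minimal sketch that collects them from the leading comment block of a source file. The sample header, the placeholder names, and the helper `copyright_lines` are assumptions for illustration; they are not taken from any actual Gramps file.

```python
import re

# Hypothetical header; the names and years are placeholders.
SAMPLE_SOURCE = '''\
#
# Gramps - a GTK+/GNOME based genealogy program
#
# Copyright (C) 2017  Jane Example
# Copyright (C) 2025  John Placeholder
#
"""Module docstring."""
'''

# Matches "# Copyright (C) <year or year range>  <holder>".
COPYRIGHT_RE = re.compile(r"^#\s*Copyright \(C\)\s+(\d{4}(?:-\d{4})?)\s+(.+?)\s*$")


def copyright_lines(source):
    """Return (years, holder) pairs found in the leading comment block."""
    found = []
    for line in source.splitlines():
        if line.strip() and not line.startswith("#"):
            break  # the header comment block has ended
        match = COPYRIGHT_RE.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


print(copyright_lines(SAMPLE_SOURCE))
```

The last entry gives a rough hint of who most recently made a non-trivial change, although the commit history remains the more precise record.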

I’d like to hear everyone’s opinion on using AI tools which, as a condition of no-cost use, use prompts and responses (and maybe more) for training their AI models and/or improving their tools.

Personally, I am against everything related to AI. It is only the processing of information that consists of tapping into what already exists without taking copyright into account.

There is no intelligence in the AI.

For fun, and to test the formatting tools offered by an AI (Mistral, in this case), I asked it (via too many prompts…) for a simple review of an addon’s code. I usually have little trouble cleaning up comments and polishing my code myself after many experiments, but during exploration the code can quickly get turned upside down. So I ran this test. I saw that Codestral (Mistral AI’s coding model) simply removed some sections (line ranges) during the review, such as:

This program is distributed in the hope that it will be useful,

but WITHOUT ANY WARRANTY; without even the implied warranty of

MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

GNU General Public License for more details.

You should have received a copy of the GNU General Public License

along with this program; if not, write to the Free Software

Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

but it kept:

This program is free software; you can redistribute it and/or modify

it under the terms of the GNU General Public License as published by

the Free Software Foundation; either version 2 of the License, or

(at your option) any later version.

and the copyright lines. So maybe that was my fault: I was not strict enough in my prompts.

Was it fair? Is there any (commercial) logic (intelligence) behind what Codestral did? Maybe, but it is also my fault.

As for help with coding and automatic tasks, in the past I used Geany, which has pycodestyle, pyflakes and pylint support (code checkers and helpers).

So do many AI tools with coding support just provide an alternative interface to this kind of workflow and environment?

Going further, I asked it to help me add threading support (e.g., iterating over individuals). After it helped me write a pool section, using a queue, etc., the addon had thread support, but I was no longer able to maintain this code (or fix any possible issue).

Maybe I should instead generate an alternate view (i.e., in the Relation category) using the TreeView model, that is, move the code designed to run as a tool into a view. I cannot do it myself. Some automated tasks might be useful.

There may be a grey zone, where the starting idea and the solutions should still come from the author.
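
To give an idea of what such pooled iteration can look like, here is a minimal sketch using the standard library's `ThreadPoolExecutor`, which manages the worker queue internally. It is not the code the AI produced for the addon. The `FAKE_PEOPLE` data and the functions `score_person` and `score_all` are stand-ins, and whether a real database backend can safely be read from worker threads is an assumption that would need to be checked.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in data: handle -> number of families.
# A real addon would query the database instead (assumption).
FAKE_PEOPLE = {"I0001": 2, "I0002": 0, "I0003": 1}


def score_person(handle):
    """Hypothetical per-person computation; here it only reads the stand-in dict."""
    return handle, FAKE_PEOPLE[handle]


def score_all(handles, max_workers=4):
    """Run score_person over all handles using a small pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input handles.
        return dict(pool.map(score_person, handles))


scores = score_all(sorted(FAKE_PEOPLE))
print(scores)
```

Keeping the per-person work in one small function makes it easier to fall back to a plain loop if the threaded version turns out to be hard to maintain.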

Le but, c’est le chemin ~Johann Wolfgang von Goethe

The way (or the path!) is the goal ~Johann Wolfgang von Goethe (or Confucius?)

… still valid with AI?

In summary, with this AI coding service, my mindset (way or flow) moved from “do not reinvent the wheel” to blind trust… Sure, I also got more details on how to use these new features, the logic, etc., and in my mother tongue! Unit tests were also proposed. :sunny:

Here is one sample of this new code:

@staticmethod
def calculate_family_network_centrality(db, person_handle):
    """
    Compute a centrality score for an individual in the family network.
    """
    person = db.get_person_from_handle(person_handle)

    # Count the descendants
    descendants = set()
    stack = [person]

    while stack:
        current_person = stack.pop()
        if current_person.get_handle() in descendants:
            continue  # already visited; avoids re-walking shared branches
        descendants.add(current_person.get_handle())

        for family_handle in current_person.get_family_handle_list():
            family = db.get_family_from_handle(family_handle)
            for child_ref in family.get_child_ref_list():
                child = db.get_person_from_handle(child_ref.get_reference_handle())
                stack.append(child)

    # The person is in the set too, so subtract one
    num_descendants = len(descendants) - 1

    # Count the ancestors
    ancestors = set()
    stack = [person]

    while stack:
        current_person = stack.pop()
        if current_person.get_handle() in ancestors:
            continue  # already visited; avoids re-walking shared branches
        ancestors.add(current_person.get_handle())

        for family_handle in current_person.get_parent_family_handle_list():
            family = db.get_family_from_handle(family_handle)
            for parent_handle in (family.get_father_handle(), family.get_mother_handle()):
                if parent_handle:
                    parent = db.get_person_from_handle(parent_handle)
                    stack.append(parent)

    num_ancestors = len(ancestors) - 1

    # Count the unions (families in which the person is a partner)
    num_unions = len(person.get_family_handle_list())

    return num_descendants + num_ancestors + num_unions

//code helper or assistant : Codestral 25.08

OK, it fits fine into the plugin, and the instructions are still mine.

What if I only use such code as a new filter rule and submit it as a new Pull Request? What should the copyright be?

These are a few lines for a feature that could be included as a core function, either via a Relationships module (class) or via a filter rule. There is nothing revolutionary and no “tapping” method here, perhaps just optimization and common Python code. The AI should now know what kind of instructions (prompts) could produce such an “optimization” in relation to the set of core Gramps plugins (or how and where romjerome likes to go!). Except for the while stack and stack.pop(), the style remains mine. It is common, basic clustering logic. Let me know if there is a copyright issue!

@SNoiraud I tried to raise, provoke, and deal with the limits in a draft Pull Request on an addon.

I quickly stopped because it can really go too far… I was able neither to test all the proposals nor to maintain all this code. It was clear that most of the sources were other Gramps plugins or standard Python coding samples (clustering) from documentation, like we can all find without AI. This provides features, but I no longer feel like the author, even though I started the original addon in 2017. There were too many proposals for something very basic, in only a few days (or only a few hours). I tested it, and I dislike having to constantly stop “intrusions” that go against the first instructions. :sun: :sun_behind_small_cloud: :sun_behind_cloud: :sun_behind_rain_cloud: :cloud_with_lightning_and_rain:

That’s not coding help, that’s a

>>> while infinite loop

You are right on this point: such extensive coding support might be a problem.

Update: Codestral AI might be a little rascal… :face_with_raised_eyebrow:

The common thread of the proposal came from the addon itself! By re-using the output of the related filter rule, the addon points to the code (and location) of the Gramps filter rule, which is not far from the logic shared by the AI.

So, I need to add the name of the original author, who was a contributor to Gramps’ code…
