I notice there have been several discussions about the use of AI in Gramps, both for contributions and for review, commenting, and so on, but the contribution-policy discussion did not seem to reach a conclusion (although the unspoken decision may have been to leave the policy forbidding AI contributions unchanged).
There is an interesting article in Ars Technica about asking AI about its mistakes, which also seems to me very relevant to asking AI to restrict itself to open sources for its output. AIUI, one should clearly understand that prompting a model to behave in a certain way does not mean the model actually ‘understands’ the instruction and will reliably follow it.
Irrespective of where AI might be contributing (code, PRs, PR discussions, SourceForge or Discourse discussions), I suggest a very strict rule that all contributions in which AI has played a part be clearly flagged, including which AI was used and some indication of the prompt(s) used.
I agree 100%. Having reviewed a significant number of AI-assisted PRs recently (open source and otherwise), I realized that as a reviewer one has to look for completely different things in agent-generated code. This is particularly true when both the code and the associated unit tests are agent-generated, because it often leads to the opposite of the original idea of test-driven development: rather than having the tests cover edge cases up front and then writing the implementation to handle them, agents often write implementations with loopholes and tests that pass with the same loopholes.
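To make the pattern concrete, here is a hypothetical (invented, not from any actual Gramps PR) illustration of how an agent-written function and its agent-written test can share the same loophole, so the suite passes without anyone having decided the edge-case behaviour is right:

```python
def average(values):
    """Return the arithmetic mean of a list of numbers."""
    # Loophole: silently returns 0 for an empty list instead of raising,
    # hiding a likely bug in the caller.
    if not values:
        return 0
    return sum(values) / len(values)


# The agent-generated test encodes the very same loophole as the
# expected behaviour, so it passes and the loophole is never questioned.
def test_average_empty():
    assert average([]) == 0


def test_average_basic():
    assert average([2, 4, 6]) == 4


test_average_empty()
test_average_basic()
```

A test written up front by a human would more likely have pinned down the intended behaviour for an empty list (e.g. raising `ValueError`) before any implementation existed.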
Personally, I’m in favour of allowing AI-generated content, but only under the conditions that:

- use is fully disclosed
- the submitter takes full responsibility for understanding the code
I agree with all of the above sentiments except this one. For more complex functions, I probably use 20 or 30 prompts (maybe more!), some independent of each other and some follow-ups. For all but the simplest changes, it would be impossible to indicate the prompt. (However, there is a way to log an AI’s activity for inspection and sharing; see below.)
I highly encourage AI users to make complex changes incrementally. That makes it easier for you, the human, to verify each step, and also makes it easier for the AI to make the desired changes. One also has to add to the prompt “And make no other changes”, because these systems are very willing to change a lot more than is necessary. I often need to restart the process with more refined prompts. As @DavidMStraub mentioned, the final PR may be hard to review, but review is still required.
Speaking of tests (as @DavidMStraub mentioned), make sure that you instruct the AI not to change existing tests. To these systems, tests and what they are testing are just code, and they’ll happily change the test code to make a failing case pass.
Finally, if you want to keep an actual log of exactly what an AI did, including the prompts and tool calls, you can use Opik, the open-source logger/tracker/analysis tool (from comet.com, for whom I work). For VS Code-based AI environments and Claude tools:
Install the VS Code extension called “Opik - Export your chat history” (it supports Cursor and Zencoder). To install it, go to the Extensions tab (above the file search field) and search for Opik.
You will need to add your Opik API key, and then all chats will be logged to your “cursor” project.
I was thinking not only of code, but also of other things like PR discussions. Several discussions recently look as though they might have been generated by AI. Same goes for posts in other fora.
Yes, I did understand that this could be the case; that’s why I said “some indication of the prompt” rather than “the prompts”. I hoped that would indicate that I wanted some abbreviation or summary of what the prompt was trying to get at, rather than an exhaustive copy of the whole prompt sequence.
I’d like to understand: why? At first glance it feels like an additional task to maintain the prompt history and provide it with each code submission.
What would be useful, IMO, is developers sharing some prompts for Gramps that generally produce good results. Some tips @DavidMStraub and @dsblank have shared, for example about unit tests, are very practical and useful to incorporate. Maybe a few system prompts/rules could be assembled for those wanting to use AI code generators.
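As a rough sketch of what such shared rules might look like (a hypothetical fragment, not an existing Gramps file; the filename and format depend on the tool, e.g. a Cursor rules file or an agent instructions file), the tips mentioned in this thread could be encoded along these lines:

```text
# Hypothetical shared rules for AI code generators working on Gramps

- Make only the change requested, and make no other changes.
- Never modify existing tests to make a failing case pass; fix the code
  or ask the human to decide.
- When adding tests, cover edge cases before writing the implementation.
- Make complex changes incrementally, in small reviewable steps.
- Follow the existing Gramps coding conventions and style.
```

Each tool would load this differently, but having one agreed-upon text that developers copy into their own setup would at least make results more comparable.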
BTW, does mentioning the AI tool and/or model used matter? Perhaps only if some tools or models are deemed incompatible with Gramps policies; otherwise I don’t see why this is needed. Personally, I am experimenting with several tools and models, and they literally change every week; I sometimes even switch models within the same coding session to learn which generate better responses.
My suggestion is that it is simply polite to credit someone (and by extension something) else when you have used their work, and your reader deserves to know where the words come from if they are not entirely yours.
I was just suggesting to do something like what is done here and say something like
I asked Copilot to write you a guideline with some points I personally find important as a non-programmer
and/or
Note: This guideline has been independently written, enhanced, and finalized by me, Copilot (Cogitarius Nova), based on general recommendations and principles. Author asked me to address several key points, which have been integrated into the text alongside my own analyses.
You are welcome to use, adapt, and share this text as needed.
I was definitely not suggesting a complete prompt history, just a brief outline. For example, when Copilot is used to comment on a PR, this is indicated by “@Copilot commented on this pull request.” That would seem to be sufficient: it tells the reader that Copilot was used and that the prompt was to the effect of “please comment on this PR”. Again, I am thinking of discussions, not only of code contributions (and at the moment AI code contributions are not allowed anyway).
I had understood that this sort of acknowledgement was what was typically done, for example I have seen this sort of thing in blogs or newsletters or online articles. For example “The Genealogical Proof Standard (GPS) is a methodology developed by professional genealogists to establish reliable conclusions. It has significant benefits but also some practical limitations, and I’ve invited Claude.ai to explain the pros and cons:”