Problem/Motivation
The much-talked-about ChatGPT tool is capable of generating very plausible text about a number of subjects, including programming. Its large language model has been trained on 45 terabytes of text data, presumably scraped from the World Wide Web, so if the "answer" is somewhere on the web, odds are good that ChatGPT will come up with it.
The well-respected website StackOverflow has (temporarily?) banned answers generated by ChatGPT; see Temporary policy: Generative AI (e.g., ChatGPT) is banned.
The site moderators on StackOverflow seem to think abuse of ChatGPT to generate answers is severe enough to ban it. I don't know if it will create problems on Drupal.org, but I just wanted to post this to the Site moderator's issue queue for general discussion about whether we need a policy in this area.
Examples:
- https://www.drupal.org/forum/general/general-discussion/2023-05-22/can-i...
- https://www.drupal.org/forum/general/general-discussion/2023-05-04/how-c...
- https://www.drupal.org/forum/support/post-installation/2014-10-12/combin...
- https://www.drupal.org/forum/support/module-development-and-code-questio...
- https://www.drupal.org/forum/support/upgrading-drupal/2023-07-01/migrate...
- #3515646: Add automated <img srcset> generation
- #3521303: LLM-generated modules have been published
- #3489528: Update telephone filter to Drupal 10 & 11
Current de facto policies
Issue Etiquette — Use of AI generated content
Abuse of the Contribution Credit System
Proposed resolution
To be determined.
Comments
Comment #2
gisle
ChatGPT-generated posts are starting to appear on Drupal.org.
There have been several of these reported in issue queues, but they have been summarily deleted as spam.
This morning, I spotted three in the support forums:
To me, it looks like the quality of the answers it generates is below par. These answers are trash and IMHO provide no benefit to the community. It looks like some people are using AI-generated content as a low bar to gain recognition and earn issue credits.
Should we have a policy allowing site moderators to sanction people who use AI to generate the content they post on Drupal.org?
Comment #3
bramdriesen
I think we need a policy for such cases. I'm okay with it being used to, for example, improve the wording of a sentence. But just plainly copy-pasting answers like in the 3 links you posted is a no-go for me.
Here is the policy which Stack Overflow (and all subs) are following: https://meta.stackoverflow.com/questions/421831/temporary-policy-chatgpt...
In a nutshell:
EDIT: Just noticed the policy is already in the OP if you click through on the web page of SO.
Comment #4
john_b commented
Yes, a policy would be good. I can see no motive other than spamming for posting such stuff. Marking it as spam makes sense to me. Can we have an AI bot to test posts for ChatGPT? It usually seems to churn out material (intelligently?) cut and pasted from elsewhere, which should be identifiable.
Comment #5
gisle
Unfortunately, it does not. Academia has used plagiarism checkers for years, and these will discover copy-pasted materials quite reliably. But they don't work with ChatGPT. (Yes, I've tested it.)
GPT is an acronym for "Generative Pre-trained Transformer" and this type of AI works by predicting what word a human would use next when making conversation about a specific topic.
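The "predict the next word" idea can be illustrated with a toy bigram model trained on a tiny made-up corpus. This is only a sketch of the concept; real GPT models use transformer networks over subword tokens, not word-pair counts:

```python
# Toy next-word predictor: a bigram model over a tiny example corpus.
# Illustrative only -- nothing like the scale or architecture of ChatGPT.
from collections import Counter, defaultdict

corpus = (
    "drupal is a content management system . "
    "drupal is written in php . "
    "drupal is open source ."
).split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word after `word`, or None."""
    counts = follows[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("drupal"))  # -> "is" ("is" follows "drupal" in every sentence)
```

The model has no knowledge of what Drupal is; it only knows which words tend to follow which, which is the same limitation gisle describes, scaled down.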
However, ChatGPT output has a certain "style", and I like to believe I've gotten rather good at recognizing it. But what generally characterises ChatGPT's output is how bad it is. Unlike the rule-based AI that we pursued in the seventies, GPT-based models have zero knowledge about the domain. They're just incredibly good at generating letter-perfect and plausible-sounding sentences.
Until there is some reliable automated tool for detecting it, we just have to rely on human moderation, just like we do with spam. (Well, we also use an automated spam detector called "Akismet" – but it gets it wrong too often, and human site moderators often need to sort out both false positives and false negatives.)
If we end up following Stack Overflow and ban all AI-generated content (including ChatGPT output), we may need a mechanism to report such posts (or just overload the present spam flag), and leave it to site moderators to review the reports, remove content, and impose sanctions if they find the report accurate.
For starters, I think posting AI-generated posts should block a user from getting the 'confirmed' user role.
Comment #6
gisle
Moved examples to issue summary.
Comment #7
gisle
Added another one to the issue summary.
The MO for this one was to post AI generated content to build credibility as a contributor, and then go on to post spam comments under case studies.
Comment #8
catch
New examples, some of these have thankfully been deleted or unpublished by gisle already:
https://www.drupal.org/project/drupal/issues/3366843#comment-15158138
https://www.drupal.org/project/drupal/issues/3224941#comment-15155329
These were very long, obviously generated comments, with no relationship to the discussion going on. They could possibly in some cases look more plausible if they were the first comment on a new issue instead of a non-sequitur.
I'm not entirely sure this needs a specific policy yet - these were worthless spam, so we can just ban the spammer; exactly how the text was produced was secondary except for the speed and volume.
Comment #9
gisle
The standard definition of "spam" is content posted for the purpose of advertising. A lot of the AI-generated content is spam, where a chunk of ChatGPT-generated content is simply used as a wrapper for one or more advertising links, to make the post appear legitimate. These posts are unceremoniously deleted, and the poster instantly banned (per our current policy on spam).
However, there's a lot of bad content posted on Drupal.org. Being bad does not make it spam according to the above definition. Some of it is posted with good intentions by community members who happen to lack knowledge, and some of it is AI-generated content posted to create an appearance of contributing to issues (perhaps to earn some recognition or an issue credit). If it is obviously wrong or off-topic, I use my site moderator privileges to unpublish or delete it (depending on how bad it is), but the site moderators need to tread carefully to avoid exercising censorship. I, for one, do not interfere with bad content posted with good intentions.
Classifying "these [as] worthless spam" is a policy. That's not our current policy on AI-generated content, and until there is some consensus that this should be our policy, the site moderators cannot ban these users.
It should also be noted that there is some enthusiasm for AI-generated content in the community; for example, see this related issue: #3336313: Use Ai for assisting Drupal issues to increase rate of development. I don't share their enthusiasm, but how to treat such content and the users posting it is clearly not a cut-and-dried case.
Independent of this issue, Hestenet of the DA recently expanded our guidelines on Issue Etiquette with a section on AI Generated Content. It states the following policy:
My opinion is that this is reasonable, and that we probably now have our policy.
Comment #10
catch
Hmm, I think that's an incomplete definition. Let's say someone really loves the book module. So they post 3,000 nearly identical messages on random core issues about how much they love the book module. They're not advertising anything except for how much they love the book module, but it's still spam - bulk messages posted indiscriminately and unsolicited. Another example would be someone posting exactly the same support question in 15 different slack channels within the space of a minute.
This is true, but we've also had cases like someone marking 50 random issues "won't fix" in a day, which swiftly resulted in a ban. There is a type of behaviour that is this kind of bulk + low-effort posting, which for me is best described as 'spamming'. I think it's fine if there's another term for it; that's just the one I associate with it.
The addition looks basically fine, but I worry that it's going to let through not incorrect but redundant, useless, and lengthy content with a disclaimer that it was generated by AI, which I also don't want to wade through. I guess that can still be covered by 'not relevant' and can always be tightened later.
Comment #11
gisle
IMHO, this is not spam, but instead:
And they're going to be banned for doing that (per our current policy on trash content).
There are people outside of the site moderator team who may decide to ban a user for disruptive behavior (I believe this includes the employees of the DA, some CWG members, and some senior core committers). IIRC, the decision to ban this particular user was not made by the site moderators.
However, users are banned by site moderators for abuse of the credit system (example: #3375893: Block user Harshita Mehna for abuse of the issue queue). This is based on the policy stated in the breakout box "Abuse of the credit system" in our guidance on "Attribution and credit system".
Speaking as a site moderator, I prefer there to be clear and objective policies for deleting content and banning users. When we do this, we engage in acts of censorship. It is important that such acts are based on transparent and recognized policies, and not the personal opinion or bias of the individual site moderator.
As for the lengthy and not relevant missives typically produced by ChatGPT, I just noticed that our automatic anti-spam tool (Akismet) just unpublished one of them, without it containing any advertisement link. I've left it unpublished (instead of deleting it) to see whether its author is going to complain about it not being published.
Comment #12
avpaderno
Yes, spamming also means posting the same content multiple times, but we do not treat people doing that as spammers; otherwise, even the accounts used by people who post the same comment on issues that must be closed (as I do) should be blocked.
We are restrictive about what we call spam because spammers are blocked without warnings, while we warn people who are doing something they should not do, like repeatedly closing old issues. The account could still be temporarily blocked, for example to stop that person from closing issues at the rate of 10 issues per minute, but we also send a message to let the person know what should not be done on drupal.org.
We need a policy because when people do something they should not do, we contact them and give them a link to a page explaining what should not be done. I agree it can be difficult to write a policy about AI-generated content, but that policy does not need to give too many details about why we do not like AI-generated content in issue queues and other places; it could be like the Drupal.org Terms of Service page, which does not say why shared accounts are not allowed to make commits in drupal.org repositories.
Comment #13
catch
OK, I agree with this. There will be occasions where someone comes up with some new horrible behaviour that's not covered, but then it can at least be added retrospectively. The main question for me was whether there needed to be a specific policy for AI-generated content or whether it could be covered by a less-specific one encompassing the same kinds of content, whether human- or AI-generated. The 'trash' policy deals with some of that. I haven't had site moderation permissions since I resigned when a spyware module company was reinstated without discussion in about 2010 or so, so I'm a bit out of touch.
Comment #14
hestenet
As @gisle noted above, I've made a very basic effort to start developing a policy for AI generated content:
In which I updated both:
https://www.drupal.org/docs/develop/issues/issue-procedures-and-etiquett...
And:
https://www.drupal.org/drupalorg/docs/marketplace/abuse-of-the-contribut...
This may be a bit tricky to write a policy for, because (at least for the moment) I've been leaning towards allowing AI generated content if and only if the use of AI is disclosed, and the user posting has reviewed and edited the material for accuracy.
But maybe that's most of what we need:
Users agree to do the above when using AI, or else the account will be temporarily suspended, and the user contacted with a link to the policy pages above.
Users will be unblocked when they have acknowledged the policy materials and agreed to follow them in future. After an additional violation they may be blocked permanently.
Thoughts on what else we should include?
Comment #15
deviantintegral commented
I think it would be great if the policy applied to content in the drupal.org planet feed too. As written, the policies wouldn't prevent sites from using generated content if they so choose, but they would require a baseline of human review, quality, and attribution that would increase the overall quality of content on the feed.
Comment #16
mindaugasd commented
AI is good at writing. @deviantintegral, are there some existing (or predicted) AI-related problems within the Drupal Planet feed?
Comment #17
deviantintegral commented
It may be good at the structure of writing, but hallucinations are still a challenge.
There have been some articles over the past 6 months that have felt like AI-generated content, in that they were filled with generalizations and so on. But I'm not confident enough to link to them. I think a policy would get ahead of things, so that if sites start publishing problematic content there is a guideline to refer authors and publishers to.
Comment #18
gisle
AI is terrible at writing. It is just very good at phrasing and grammar.
The problem with AI-generated content is not just the much-talked-about "hallucinations", but the verbosity and utter triviality of AI-generated utterances (which seem to be getting worse all the time as the AI companies try to get rid of the "hallucinations" by removing any specificity from the output they spew forth).
I think the problem with AI-generated content is the same as the problem with spam. It is not bringing the world down, but having to deal with it is a big waste of time and resources – including my time. My vote is for muting any contributor, including Drupal Planet contributors, who posts AI-generated content.
Comment #19
gisle
Fixed broken link in issue summary.
Comment #20
rolandschuetz commented
In considering our approach to AI-generated content, the emphasis should be on the quality of the text rather than the means by which it was created.
It's important to recognize the distinction between AI that generates content independently and AI that is employed to articulate and polish a user's pre-existing ideas effectively. While the first can be problematic, the second is merely better grammar correction.
Just to prove my point, AI mostly wrote this based on: `Write a comment: Focus on text quality, not how created; Differentiate AI can be used to make up content, or to formulate users' ideas nicely - the second is good.` Having a first draft and then refining it is much easier for me than writing from scratch.
PS: If someone were actually able to provide correct and helpful comments with an LLM and RAG, that would be wonderful. I wasn't able to do this yet.
Comment #21
bramdriesen
You are correct, but I think the issue that started it all here was that some user was posting forum answers by just copy-pasting the question and then posting the (very long) answer of ChatGPT or whatever he used.
Like you say, if it’s to help you write what you’re trying to say (grammar/wording) it’s fine to an extent that it isn’t noticeable. Just like your example.
The tools will become better and more difficult to detect anyway over time. I think we need to keep an eye on this and keep evaluating what is happening in the community.
Comment #22
ressa
There have been a few veiled attempts at SEO link spam recently: posting 5-6 sort-of-correct answers, and also some wrong answers -- seemingly written by an LLM such as ChatGPT -- and then finally a post with a link to a site:
If you check the visuals and source of the two WP websites, they look nearly identical. In this post, one even answers the other.
So I agree with the current policy of not allowing LLM generated content in the forums:
https://www.drupal.org/docs/administering-a-drupal-site/troubleshooting-...
Maybe this rule could be used more?
https://www.drupal.org/docs/administering-a-drupal-site/troubleshooting-...
Drupal RAG
I agree @rolandschuetz, a functioning Drupal RAG providing precise, non-hallucinatory answers would be useful. Akansha Saxena is closest, I think, see Inside the Codebase: A Deep Dive Into Drupal Rag Integration. She's also on Planet Drupal under Akansha Tech Journal.
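The retrieval-augmented generation (RAG) idea discussed above can be sketched as follows. This is purely illustrative: the "retrieval" is naive keyword overlap rather than vector embeddings, the document snippets are made up, and the final LLM call is left as a hypothetical stand-in:

```python
# Minimal RAG sketch: retrieve relevant documentation, then ground the
# model's answer in it. A real Drupal RAG would embed the docs and API
# reference into a vector store; here retrieval is naive word overlap.

docs = {
    "hooks": "Drupal hooks let modules react to events such as entity saves.",
    "routing": "Routes are declared in MODULE.routing.yml and map paths to controllers.",
    "caching": "Render arrays carry cache metadata: tags, contexts and max-age.",
}

def retrieve(question, k=1):
    """Rank document snippets by how many words they share with the question."""
    words = set(question.lower().split())
    scored = sorted(
        docs.items(),
        key=lambda item: len(words & set(item[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def answer(question):
    """Build a prompt grounded in retrieved context.

    In a real system the prompt would be sent to an LLM; `ask_llm` below
    is hypothetical, so we just return the grounded prompt here.
    """
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
    return prompt  # real system: return ask_llm(prompt)

print(answer("How are routes declared in Drupal?"))
```

Grounding the answer in retrieved text is what makes RAG output checkable, and is why it can reduce (though not eliminate) hallucinations compared to a bare LLM.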
Comment #23
ghost of drupal past
There's a much worse problem: people now post AI-generated modules which were clearly never even read by a human. The cherry on the cake is that they have the audacity to opt such code into security coverage. I just reported a security hole in one such module, and I haven't even started looking seriously. These modules should be removed from security coverage, and their publishers need to lose their security coverage privileges IMO, perhaps based on a strike system. I doubt we can stop the proliferation of them, because they will just post this garbage to GitHub otherwise, but no one should be able to force the security team to deal with slop.
Sure, to err is human, and every piece of software has bugs, but not like these.
It will be a thankless, contentious job for sure. Probably a vote by X people with Y roles could help?
Comment #24
catch
I tried to review an MR against Experience Builder recently, and it turned out the entire MR was LLM-generated and had not been reviewed by a human; none of this was disclosed in the issue summary. It wasted more than an hour of my time wondering wtf was going on. See #3515646: Add automated <img srcset> generation.
I think there needs to be a disclosure policy, and when that's repeatedly broken, issues and projects should be treated as spam.
When there's disclosure and the project code is still un-reviewed slop, which is the situation that chx describes, it starts to feel like we need a new issue for the Technical Working Group (which is not properly functional)/DA/Security team. Maybe a new issue in https://www.drupal.org/project/issues/securitydrupalorg to start with?
I have seen some truly awful Drupal modules posted in my time, but LLMs allow these to be written and posted much, much faster than they previously could be.
Comment #25
poker10 commented
Linking a similar core issue.
Comment #26
cmlara
First off: I'm aware of the user and modules referred to in #23, and agree they are poorly written (to the extent that I've added them to an internal list of modules not to utilize and maintainers to avoid).
I am wary of this as an option. Responsible/ethical/coordinated disclosure is a key aspect of security. It would be one thing if D.O. did not use wording such as
"Security issues do not need to be privately reported for the module_name project." With this wording, D.O. is actively encouraging public uncoordinated disclosure for those not 'opted in'. Any action D.O. takes to remove the ability for maintainers to compel private disclosure is an affront to security. D.O. requiring a test, and requiring maintainers to remain in the 'good blessings' of the Security Team, would conflict with security industry norms.
To put this in another light, how would the community feel if I demanded the core team pass an arbitrary test that I personally create, with a drawn-out approval process of months and the ability to revoke based on a condition I decide, before I would cease disclosing vulnerabilities publicly? I presume the community would be very upset, especially in the case of a Drupalgeddon-level vulnerability, and yet that is the standard applied to contrib maintainers today that this suggests extending.
Whatever policy emerges regarding AI-generated code, it should not bring the compromising of security privileges into the equation. Consider LLM code spam, consider LLM code unacceptable for commit due to copyright laws, consider it unwanted and target the user on those grounds; whatever is done, do not make a policy that would compromise the ability to request private disclosure.
Comment #27
ghost of drupal past
My two cents: the security team already has the policy of marking unmaintained projects unsupported. Spraying a probabilistic series of PHP tokens into git is not maintainership.
But, it's up to the security team whether they share my concern of being overrun.
Comment #28
catch
@cmlara I actually agree; in this specific case, revoking git access altogether would be better. I really do not like the 'opt-in approval' process, although it's better than what it replaced, but we do need to be able to prevent people from bulk-publishing inherently flawed code on Drupal.org. I've opened #3521303: LLM-generated modules have been published.
Comment #29
mradcliffe
In the new first-time contributor workshop slides that volkswagenchick developed, we have a single slide titled "Use of AI" with the following scripted notes. This could be a good start for an official policy.
I am not sure why we mention it as required, but it probably should be changed to recommended for now pending policies.
Comment #30
cmlara
That appears to be a direct pull from the Issue Etiquette page.
Hestenet added basic policy in comment #14 to the Credit Abuse policy and added the Etiquette page text.
Discussion on this issue switched to “does the policy need changes” after that post.
Comment #31
mradcliffe
cmlara, thank you. That jarred my memory. I was trying to find where it came from.
Comment #32
cmlara
Updating title to better reflect the status of this issue after #14's creation of policy by D.A. staff.
Comment #33
mradcliffe
I updated the issue summary with links from #14 and added the links to recent issues in the examples item list.
Comment #34
mradcliffe
I added an example of a module porting exercise using ChatGPT, the use of which is disclosed as part of the merge request (with prompts included in the merge request).
Comment #35
hestenet
Adding a related issue for a specific proposal for AI-assisted contribution.
I decided a separate issue was better, since this issue has a lot of discussion of AI-generated comments and other text, and has a broader focus than just the code contribution stuff.
https://www.drupal.org/project/governance/issues/3565917
Comment #36
hestenet
Sharing what is not really a policy change, but an update of the language on the page that felt necessary to post as new people come in (especially with the new Google Summer of Code class).
Comment #37
bircher
I didn't know that we now have this policy.
I would propose that we add a point (phrasing to be improved)
3. You must ask for consent to use AI from a maintainer before posting AI generated content.
I think it would be great to have an opt-in instead of an opt-out of AI generated code and issue contributions. Some modules can declare on their module pages that gen AI is welcome, some may declare that it is not welcome, but I think the default should be to ask first.
Comment #38
bircher
A second point (posted separately so that it can be replied to independently) that I think a lawyer should clarify:
How compatible is AI-generated content (for which, at least in the US, you cannot obtain copyright) with the GPL?
Can you contribute code to a GPL project for which you do not own the copyright?
I think before announcing that AI-generated code is OK, we should be sure that it actually is, or at least be cautious about it.
Comment #39
hestenet
Worked with @breidert and @scottfalconer in the contrib room here at DrupalCon Chicago.
We have synthesized a lot of the discussion across the several different issues that are currently open, and made a further update to the basic policy that I had written previously.
Why am I just updating this directly?
While there are ongoing and important discussions, I'm also using my role to leverage a 'stop the bleeding' attitude towards these policies, to move quickly on policy adjustments that, while they won't solve everything, will help iteratively improve the situation for maintainers.
There are several axes of concern: pragmatic policies for usage requirements and restrictions, ethical policies, and perhaps pragmatic policies for improving the quality of output.
Here is the latest change, now moved to a dedicated page, and linked from the Issue Etiquette policy:
Issue etiquette: https://www.drupal.org/docs/develop/issues/issue-procedures-and-etiquett...
New dedicated page: https://www.drupal.org/docs/develop/issues/issue-procedures-and-etiquett...
Comment #40
breidert commented
This looks good
Comment #41
cmlara
It sounds like the latest version is saying this is not part of the Terms of Service for use of the Drupal site; am I understanding that correctly?
Comment #42
kristen pol
Thanks for this. I've read it and IMO it's clear and provides the required guidance.
Regarding
"perhaps pragmatic policies for improving the quality of output"
We’re trying to provide docs and resources for better AI outputs for contributors here:
https://www.drupal.org/project/ai_best_practices
And there is a new #ai-learners channel in Drupal Slack for people to share their best practices and learnings to improve the quality of contributions.