Active Ξ0.0405

AI detection tools have a false positive rate less than 2%.

By Capt. Sol Hando, SolHando Inc. Posted 18 days ago

Description

Context and Rationale:
AI-generated content detectors, particularly Turnitin’s AI detection system, are now widely used in academic settings. However, concerns have been raised about the risk of false positives: wrongly flagging genuine human-written work as AI-generated. A high false positive rate would undermine the fairness of academic integrity enforcement. A recent post on r/slatestarcodex, with ample anecdotal support in the comments, claimed that these AI detectors have a high false positive rate. (https://www.reddit.com/r/slatestarcodex/comments/1k3op60/turnitins_ai_detection_tool_falsely_flagged_my)

Turnitin claims its AI detection tool has a very low false positive rate (<2%) (https://guides.turnitin.com/hc/en-us/articles/28477544839821-Turnitin-s-AI-writing-detection-capabilities-FAQs#h_01J2HRSX360JM2AXPMPHRCMWP5), though independent large-scale evaluations are limited.

The importance of this conjecture lies in balancing academic integrity enforcement with fairness toward students. If Turnitin’s false positive rate were higher than claimed, it would raise serious ethical concerns about its use as evidence in misconduct proceedings.

https://edintegrity.biomedcentral.com/articles/10.1007/s40979-023-00140-5
https://lawlibguides.sandiego.edu/c.php?g=1443311&p=10721367
https://archive.ph/n9KoB
https://www.jdsupra.com/legalnews/chatgpt-and-ai-detection-tools-the-1617850/
https://www.insidehighered.com/news/tech-innovation/artificial-intelligence/2024/02/09/professors-proceed-caution-using-ai

https://guides.turnitin.com/hc/en-us/articles/28477544839821-Turnitin-s-AI-writing-detection-capabilities-FAQs

Closing Procedure:
If a falsification submission meets the criteria, the bounty will be paid, and this conjecture will be marked falsified.

Falsification Criteria

The conjecture is falsified if either of the following two conditions is met:

Condition 1: Turnitin Falsification
A sample of papers large enough to support statistical significance (32 or more),
written by American high school or college students,
either verifiably written by humans (proof required) or authored prior to January 1, 2020,
each at least 400 words in length,
is analyzed using Turnitin’s official AI detection tool (through a teacher Turnitin account).
If more than 2% of the papers (i.e., more than 1 in 50) are flagged as more than 20% likely AI-generated, the conjecture is falsified.
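
As a rough check on the sample-size reasoning in Condition 1, the sketch below computes how many flagged papers it takes to exceed 2% of a sample, and how likely that count would be if the true false positive rate were exactly 2%. The function names are illustrative, and the binomial model (each paper flagged independently with the same probability) is an assumption, not part of the criteria.

```python
from math import comb


def min_flags_to_exceed(n_papers: int, threshold_percent: int = 2) -> int:
    """Smallest flagged count whose share of n_papers is strictly above the threshold."""
    # Integer arithmetic avoids floating-point edge cases at exactly 2%.
    return next(
        k for k in range(n_papers + 1)
        if k * 100 > threshold_percent * n_papers
    )


def binomial_tail(n_papers: int, k_flagged: int, true_rate: float) -> float:
    """P(X >= k_flagged) for X ~ Binomial(n_papers, true_rate).

    Assumes papers are flagged independently with the same probability.
    """
    return sum(
        comb(n_papers, i) * true_rate**i * (1 - true_rate) ** (n_papers - i)
        for i in range(k_flagged, n_papers + 1)
    )


if __name__ == "__main__":
    n = 32  # minimum sample size named in the criteria
    k = min_flags_to_exceed(n)
    print(f"Flags needed to exceed 2% of {n} papers: {k}")
    print(f"Chance of that many flags at a true 2% rate: {binomial_tail(n, k, 0.02):.3f}")
```

With the minimum sample of 32 papers, a single flagged paper (about 3.1%) already exceeds the 2% threshold, and under this model such a result would occur roughly 48% of the time even if the true rate were exactly 2%.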

Condition 2: Alternate Tools Falsification

Test at least two of the following AI detectors:
ZeroGPT
OpenAI Text Classifier (archived)
Go Winston
DetectGPT
Compilatio

Not all AI detectors are made the same. If you would like to falsify a detector not mentioned above, please message or email me and I'll review the public information on them. If an AI detector does not claim to have a <1% false positive rate, it's not especially interesting to demonstrate that.

Using the same set of papers (32 minimum, matching the criteria above),
If at least two detectors each show a false positive rate greater than 2%, the conjecture is falsified.
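
Condition 2 reduces to counting how many detectors exceed the 2% threshold on the same set of papers. A minimal sketch, assuming results are supplied as flagged-paper counts per detector; the function name and the example counts are hypothetical:

```python
def condition_two_met(
    flag_counts: dict[str, int],
    n_papers: int,
    threshold_percent: int = 2,
    min_detectors: int = 2,
    min_sample: int = 32,
) -> bool:
    """Return True if at least min_detectors exceed the threshold on the same sample."""
    if n_papers < min_sample:
        raise ValueError(f"need at least {min_sample} papers, got {n_papers}")
    if len(flag_counts) < min_detectors:
        raise ValueError(f"need results from at least {min_detectors} detectors")
    # A detector exceeds the threshold when flagged / n_papers > threshold_percent / 100.
    exceeding = [
        name for name, flagged in flag_counts.items()
        if flagged * 100 > threshold_percent * n_papers
    ]
    return len(exceeding) >= min_detectors


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not real measurements.
    example = {"ZeroGPT": 3, "Compilatio": 0}
    print(condition_two_met(example, n_papers=40))
```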

Proof Requirements:
Document Verification:
Each paper must have verifiable evidence of pre-2020 authorship, such as:
File metadata with unaltered creation dates,
Archived public postings (e.g., blog posts, forums),
Submission records to educational institutions,
Email attachments with timestamped records,
Google docs history from pre-2020.

Detection Results:

Screenshots or official output reports from Turnitin or other detectors must be submitted.
For Turnitin, results must explicitly show the AI percentage assessment.
For third-party detectors, output must clearly display AI probability or binary classification.

Summary Report:

Number of papers tested,
Number and percentage flagged as false positives,
Copies of supporting evidence for each paper flagged.
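
The summary figures could be tallied from per-paper results along the following lines. This is only a sketch: the `PaperResult` record and its field names are hypothetical, and the 400-word minimum and 20% flag level are taken from Condition 1.

```python
from dataclasses import dataclass


@dataclass
class PaperResult:
    paper_id: str       # hypothetical identifier for the submitted paper
    word_count: int
    ai_percent: float   # detector's reported AI score, 0 to 100


def summarize(results, min_words: int = 400, flag_above: float = 20.0) -> dict:
    """Tally papers tested and flagged, skipping papers under the word minimum."""
    eligible = [r for r in results if r.word_count >= min_words]
    flagged = [r for r in eligible if r.ai_percent > flag_above]
    tested = len(eligible)
    percent = 100.0 * len(flagged) / tested if tested else 0.0
    return {
        "papers_tested": tested,
        "papers_flagged": len(flagged),
        "flagged_percent": percent,
        # Evidence copies are required for each of these papers.
        "flagged_ids": [r.paper_id for r in flagged],
    }


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        PaperResult("essay-01", 850, 0.0),
        PaperResult("essay-02", 1200, 35.0),
        PaperResult("essay-03", 300, 90.0),  # under 400 words, excluded
    ]
    print(summarize(sample))
```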

Review Period:
This bounty will remain open for 120 days from posting.
Questions can be posted here or emailed to me [at] solhando.com.

Bounty:
If I figure it out, see the associated ETH bounty.

The bounty will be awarded promptly to the first submission meeting the falsification criteria, subject to independent verification of the evidence.

AI Feedback

1. **Brief critique and context**: The conjecture that AI detection tools, specifically Turnitin's, have a false positive rate of less than 2% is critical given their widespread use in academia. A high false positive rate would unfairly penalize students and undermine trust in these tools. The claim relies heavily on Turnitin's own reports, which may not be independently verified, and anecdotal evidence suggests a higher rate of false positives. The lack of large-scale, independent evaluations is a significant gap in the current understanding of these tools' accuracy.

2. **Recent research**: A study by Stark and colleagues (2023) in the International Journal for Educational Integrity highlights the limitations and challenges of AI detection tools, including potential biases and inaccuracies: https://edintegrity.biomedcentral.com/articles/10.1007/s40979-023-00140-5. Another article discusses the need for caution in using AI detection tools due to potential inaccuracies: https://www.insidehighered.com/news/tech-innovation/artificial-intelligence/2024/02/09/professors-proceed-caution-using-ai. A legal analysis also raises concerns about the reliability of these tools in academic settings: https://www.jdsupra.com/legalnews/chatgpt-and-ai-detection-tools-the-1617850/.

3. **Bayesian likelihood of falsification**: 60% likelihood of being falsified within 5 years. Given the reliance on self-reported data by Turnitin and significant anecdotal evidence of higher false positive rates, there is a reasonable probability that independent, large-scale studies could demonstrate a false positive rate exceeding 2%. The absence of extensive third-party evaluations contributes to this uncertainty, and ongoing research and scrutiny are likely to challenge the current claims.

Powered by OpenAI. Feedback may reference recent research and provide a Bayesian estimate of falsification likelihood.

Bounty

Ξ0.0405

Contribute to the bounty for anyone who can successfully refute this conjecture

Contributors

  • Capt. Sol Hando, SolHando Inc.
    Ξ0.0265 Confirmed
  • Anonymous User
    Ξ0.014 Confirmed


Refutations

Rational criticism and counterarguments to this conjecture

No refutations have been submitted yet.


Discussion

monkyyy 14 days ago

> Document Verification:
> Each paper must have verifiable evidence of pre-2020 authorship, such as:

Doesn't this produce a bias and an unnecessary requirement? You should accept evidence of humans live-streaming themselves writing essays.
