How Accurate Are AI Content Detectors: Human vs. AI Writing

Since the widespread adoption of ChatGPT, distinguishing between AI-generated and human-produced content has become increasingly difficult. Amid growing concerns about academic integrity, content authenticity, and the ethical implications of AI in creative processes, let’s explore the accuracy and reliability of leading AI content detectors by testing how consistently they identify AI usage.

TL;DR

  • AI Content Detector Performance Varies: Publicly available AI detection tools vary widely in accuracy, with Winston AI and Sapling leading in AI text detection but struggling to recognize human-written content accurately.
  • Humanization Impact: Tools like Humanizer PRO skew AI content detector results, raising challenges in distinguishing between AI and human-written texts, with mixed success across different AI models.
  • Reliability Concerns: Despite some high accuracy rates, no AI content detector consistently differentiates between human and AI-generated content across various testing scenarios.

Overview of Tested AI Content Detectors

GradSimple selected a variety of publicly available tools to evaluate their reliability in differentiating between AI-generated and human-produced content. This section provides an overview of the AI detectors evaluated and their self-reported or third-party-tested accuracy rates.

Methodology

To investigate the accuracy and reliability of top AI content detectors in distinguishing AI-generated text from human writing, GradSimple devised a comprehensive methodology that employed a variety of writing prompts. These prompts were designed to simulate common scenarios encountered in student work or writing tasks.

Selected Prompts for Testing

  1. Short Essay: What role do animal omens play in either the Iliad or the Odyssey, and how are they represented? Please write a paragraph of approximately 250 words, and use a few examples to back up your position.
  2. Editing Task: Can you please edit the following essay section to make it more grammatically correct and readable? (Essay section has been omitted for sake of brevity).
  3. Finishing a Story: Can you help me fill in what happens next in this story? I need 100 more words. (Story section has been omitted for sake of brevity).
  4. Summarization: Please summarize the first chapter of The Catcher in the Rye. It should be 150-200 words for my class discussion post.
  5. Explaining Concepts: Please explain what a bull market and a bear market is in 200 words or less.

Testing Procedure

Each prompt was run through seven different AI detection tools, and their results were recorded. The main objective was to assess the consistency and accuracy of each detector in various contexts. For every prompt, the following were tested:

  • Unedited responses from GPT-3.5 and GPT-4.0 were run through each detector to evaluate the detectors’ ability to identify AI-generated content across different AI models.
  • Responses from GPT-3.5 and from an engineered GPT-4.0 were passed through CharlyAI’s “Humanizer PRO” GPT and then run through each detector again, testing the software’s ability to detect AI-generated content that had been altered to mimic human writing.
  • As a control, human-written paragraphs answering the same prompts were run through each detector.
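
The procedure above amounts to a test matrix of prompts, text sources, and detectors. A minimal Python sketch of that matrix, and of averaging scores per detector and source, might look like the following. It is not GradSimple's actual workflow: the function names, the short prompt and source labels, and the assumption that each detector reports a single "percent human" score per run are all illustrative. The detector names are the seven that appear in this article, assumed here to be the seven tested.

```python
from itertools import product
from statistics import mean

# Detector names mentioned in this article (assumed to be the seven tested).
DETECTORS = ["Winston AI", "Sapling", "Copyleaks", "GPTZero",
             "Undetectable AI", "Scribbr", "Content at Scale"]
# Short labels for the five prompts (hypothetical names).
PROMPTS = ["short_essay", "editing", "story", "summary", "concepts"]
# Text sources run through each detector (hypothetical labels).
SOURCES = ["gpt35", "gpt4", "gpt35_humanized", "gpt4_humanized", "human"]


def build_test_matrix():
    """Return every (prompt, source, detector) combination to be tested."""
    return list(product(PROMPTS, SOURCES, DETECTORS))


def average_scores(scores):
    """Average 'percent human' scores per (detector, source) across prompts.

    `scores` maps (prompt, source, detector) -> percent human (0-100).
    """
    grouped = {}
    for (_prompt, source, detector), pct_human in scores.items():
        grouped.setdefault((detector, source), []).append(pct_human)
    return {key: mean(values) for key, values in grouped.items()}


if __name__ == "__main__":
    matrix = build_test_matrix()
    print(len(matrix), "detector runs in total")
```

Grouping by detector and source, rather than by detector alone, keeps the human-written control separate from the AI-generated runs, which matters because a tool can do well on one and poorly on the other.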

Results

Results tables (shown as images in the original post), each covering seven of the most popular AI detectors on the market as tested by GradSimple:

  • Accuracy results for the short essay prompt
  • Accuracy results for the editing task prompt
  • Accuracy results for the story-finishing prompt
  • Accuracy results for the summarization prompt
  • Accuracy results for the concept explanation prompt
  • Average accuracy results across all prompts

Key Findings

Do AI Content Detectors Work?

  • Yes, tools like Winston AI, Sapling, Copyleaks, and GPTZero consistently detect content generated by GPT-3.5 and GPT-4.0 that has not been edited or put through a humanizer.

How Accurate Are AI Content Detectors?

  • Accuracy varies across tools, but Winston AI, Sapling, Copyleaks, and GPTZero demonstrated high accuracy rates of up to 100% in detecting unedited, un-humanized text generated by GPT-3.5 and GPT-4.0.

Are Humanizer GPTs Like Humanizer PRO Effective?

  • Yes, the use of Humanizer PRO notably reduced detection effectiveness. In several instances, AI text that was initially rated 0% human was rated 100% human after humanization.
  • Only Winston AI and Sapling demonstrated the ability to detect text generated by GPT-3.5 + Humanizer, and only Winston AI flagged text generated by GPT-4.0 + Humanizer, giving it the lowest human score of any detector at 26.6% human.

How Reliable Are AI Content Detection Tools at Detecting Human Writing?

  • AI content detectors do not reliably distinguish between AI-generated content and human writing. While tools like Undetectable AI, Scribbr, and Content at Scale scored high in identifying human writing, with accuracies of around 80%, all three were among the worst performers at detecting AI-generated text from GPT-3.5 and GPT-4.0, both with and without a humanizer applied.
  • Although Winston AI and Sapling performed the best overall at detecting AI-generated text, both scored among the worst when reviewing human writing, with scores of 51.8% and 20.2%, respectively.
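
One way to make this trade-off concrete is to average a detector's AI-detection rate with its human-recognition rate, a metric commonly called balanced accuracy. The sketch below is not how GradSimple scored the tools. The function name is hypothetical, the human-writing scores are the ones quoted above (treated here as human-recognition rates), and the 100% AI-detection rate is an assumed best case for illustration rather than a measured result.

```python
def balanced_accuracy(ai_detection_rate, human_recognition_rate):
    """Mean of the two rates (both in percent), so neither can mask the other."""
    for rate in (ai_detection_rate, human_recognition_rate):
        if not 0 <= rate <= 100:
            raise ValueError("rates must be percentages between 0 and 100")
    return (ai_detection_rate + human_recognition_rate) / 2


# Human-writing scores quoted in this article.
human_scores = {"Winston AI": 51.8, "Sapling": 20.2}

for name, human_rate in human_scores.items():
    # 100.0 is an assumed best-case AI-detection rate, for illustration only.
    print(name, round(balanced_accuracy(100.0, human_rate), 1))
```

Even with a perfect AI-detection rate, a detector that misjudges a large share of human writing ends up with a mediocre combined score, which is why both rates need to be reported together.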

Publicly available AI content detectors such as the ones we’ve tested here offer widely varying levels of AI detection, and the results of this test suggest they should not be treated as a source of truth.

Our findings also suggest that AI content detection algorithms tend to be biased in one direction, toward labeling text as AI-generated or toward labeling it as human-written, with little middle ground.

With humanizers’ ability to significantly skew detection percentages, and the lack of detection consistency with different prompts, AI content detectors may be able to discern some usage of AI but their results should be taken with a large grain of salt.

AI Content Detectors: Our Final Thoughts

The exploration conducted by GradSimple into the effectiveness of AI content detectors sheds light on the intricate dynamics of content creation and verification in an increasingly digital world. We believe this investigation reveals that while AI detectors are capable to a certain degree, their reliability is heavily influenced by variables such as AI model and the usage of humanization tools.

If you would like to see the prompt responses we used, click here.
