LibGuides: Using and Evaluating AI Tools: AI Detectors

Detecting Artificial Intelligence

The proliferation of AI has resulted in a demand for tools to detect and identify AI-generated content. However, it’s important that we evaluate these tools to better understand their impact on students and the ethical concerns that surround their use.

What is an AI Detector?

AI detectors are algorithms that attempt to “classify” or label inputs as either human or AI generated. This type of algorithm is known as a classifier, and it can be evaluated by its training data and its performance metrics, including accuracy, precision, and recall.

What's in the training sample?

To label inputs as either human or AI, a classifier would need to be trained on both human and AI generated content. However, human-generated content is not a monolith. A reliable classifier of human-generated content should be trained on a diverse sample that reflects differences in genre, linguistic background, educational experience, etc.

Consider an AI detector for student writing. If you're trying to detect AI-generated content in a student paper, would you expect an AI detector trained only on web-published news articles to perform well? Probably not. To detect AI-generated content in a student paper, the training data should include human-generated student papers.

Positive and negative predictions

Generally, a classifier will have a positive and a negative class. For an AI detector, the positive class might be AI generated and the negative class might be human generated. When given an input, the classifier will make either a positive or a negative prediction. Knowing which label is positive and which is negative is helpful for understanding what it means for a classifier to report a false positive. If AI generated is the positive class, then a false positive can be costly because it means the classifier has wrongly labeled human-generated content as AI.

The marketing surrounding AI detectors can sometimes be confusing. Some detectors may claim that having human generated as the positive prediction improves accuracy, while discussing their model's low false positive rate. However, if human generated is the positive class, a false positive means that AI-generated content was mislabeled as human generated, which (at least when detecting AI in student writing) is the less costly error. A low false positive rate therefore says little about the risk to students. Instead, we would want to know the false negative rate, which tells us how often human-generated content is mislabeled as AI.
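The effect of swapping the positive class can be sketched in a few lines of Python. This is only an illustration: the function name `outcome` and the label strings "ai" and "human" are assumptions made for this example, not part of any detector's interface.

```python
def outcome(true_label: str, predicted_label: str, positive_class: str) -> str:
    """Name one prediction relative to whichever class is treated as positive."""
    predicted_positive = predicted_label == positive_class
    correct = predicted_label == true_label
    if predicted_positive:
        return "true positive" if correct else "false positive"
    return "true negative" if correct else "false negative"


# The same mistake: a human-written paper that the detector labels as AI.
print(outcome("human", "ai", positive_class="ai"))     # false positive
print(outcome("human", "ai", positive_class="human"))  # false negative
```

The mistake is identical in both calls. Only its name changes, which is why a reported "false positive rate" cannot be interpreted until you know which class the vendor treats as positive.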

Is Accuracy enough?

With positive and negative classes defined, we can focus on three metrics (Accuracy, Precision, and Recall) to help evaluate the overall reliability of a classifier. For the following definitions, assume that AI is the positive class and human is the negative class.

Accuracy: How often are AI-generated content and human-generated content correctly labeled? Accuracy is calculated by dividing the number of correct predictions by the total number of predictions.
Precision: What proportion of AI-generated labels are correct? We calculate precision by dividing the number of correct positive-class predictions by the sum of correct and incorrect positive-class predictions.
Recall: What proportion of AI-generated content in the sample was correctly labeled? Recall is calculated by dividing the number of correct positive-class predictions by the sum of correct positive-class predictions and incorrect negative-class predictions.
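The three definitions above can be written as short Python functions. This is a minimal sketch that assumes AI is the positive class. The abbreviations tp, fp, tn, and fn (true positives, false positives, true negatives, false negatives) and the counts in the usage lines are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Correct predictions divided by all predictions."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else float("nan")


def precision(tp: int, fp: int) -> float:
    """Correct AI labels divided by all AI labels the detector gave."""
    labeled_ai = tp + fp
    return tp / labeled_ai if labeled_ai else float("nan")


def recall(tp: int, fn: int) -> float:
    """Correct AI labels divided by all content that really was AI generated."""
    actually_ai = tp + fn
    return tp / actually_ai if actually_ai else float("nan")


# Hypothetical counts, for illustration only.
tp, fp, tn, fn = 40, 5, 45, 10
print(accuracy(tp, fp, tn, fn))      # 0.85
print(round(precision(tp, fp), 3))   # 0.889
print(recall(tp, fn))                # 0.8
```

Each function returns "not a number" when its denominator is zero, for example when a detector never labels anything as AI and precision is therefore undefined.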

Often, marketing surrounding AI detectors will provide the Accuracy, but not the Precision and Recall of a model. It’s important to evaluate a classifier on more than just its accuracy, because accuracy can be misleading on its own. Consider a sample of 100 student papers, where 10 are written by a human and 90 are written by AI. If the classifier predicts that all the papers are written by AI, then it has an accuracy of 90%, even though it has mislabeled every human-written paper. While 90% might look like a good accuracy, the class imbalance in the sample (10 to 90) makes accuracy an unreliable metric. Knowing the precision and recall of a model can help to identify these issues. Further, since many AI detectors use proprietary machine-learning methods and promote their Accuracy in marketing materials, there is an incentive to withhold vital information that could help to evaluate the true reliability of their results.
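The 100-paper example can be worked through in Python. This is a sketch under stated assumptions: the function name `evaluate` is made up for this illustration, and the false positive rate and the balanced 50/50 sample are additions to the example above, included only to show how the same detector looks on different data.

```python
def evaluate(true_labels, predicted_labels, positive="ai"):
    """Count the four outcomes and compute the metrics, with AI as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
        # Share of human-written papers wrongly labeled as AI.
        "false_positive_rate": fp / (fp + tn) if fp + tn else float("nan"),
    }


# The sample from the text: 10 human papers, 90 AI papers, everything labeled AI.
imbalanced = evaluate(["human"] * 10 + ["ai"] * 90, ["ai"] * 100)
print(imbalanced)

# The same "label everything AI" detector on a hypothetical balanced sample.
balanced = evaluate(["human"] * 50 + ["ai"] * 50, ["ai"] * 100)
print(balanced)
```

On the imbalanced sample, accuracy is 0.9 while the false positive rate is 1.0, meaning every human-written paper was flagged. On the balanced sample the same detector's accuracy falls to 0.5, which shows how much of the headline figure came from the makeup of the sample and not from the detector.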

Want to learn more about classification and machine learning? Check out Google's Machine Learning Crash Course.

If you are interested in using an AI detector, consider broadening your evaluation to include research that evaluates their results, such as:

Detecting AI content in responses generated by ChatGPT, YouChat, and Chatsonic: The case of five AI content detection tools
By Chaka Chaka and published in the Journal of Applied Learning & Teaching (JALT)
Testing of Detection Tools for AI-generated Text
By Weber-Wulff et al. and published in the International Journal for Educational Integrity
Evade ChatGPT Detectors via a Single Space
By Shuyang Cai and Wanyun Cui
Why doesn’t NYU license an AI detector?
Vanderbilt University: Guidance on AI Detection and Why We’re Disabling Turnitin’s AI Detector

Ethical Concerns

Bias

Again, consider the sample used to train an AI detector. If the sample over- or under-represents certain styles of writing, such as writing by second-language learners or writers with strong regional dialects, the detector may disproportionately classify those writers' work as AI generated.
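One way to check for this kind of bias is to compare how often human-written work is flagged as AI within each group of writers. The sketch below assumes AI is the positive class. The function name, the group names "group_a" and "group_b", and every count are invented for illustration and do not describe any real detector or population.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """records: (group, true_label, predicted_label) tuples.

    Returns, for each group, the share of human-written work labeled as AI.
    """
    human_total = defaultdict(int)
    human_flagged = defaultdict(int)
    for group, true_label, predicted_label in records:
        if true_label == "human":
            human_total[group] += 1
            if predicted_label == "ai":
                human_flagged[group] += 1
    return {group: human_flagged[group] / human_total[group] for group in human_total}


# Made-up records: 20 human-written papers per group.
records = (
    [("group_a", "human", "human")] * 18 + [("group_a", "human", "ai")] * 2
    + [("group_b", "human", "human")] * 14 + [("group_b", "human", "ai")] * 6
)
print(false_positive_rate_by_group(records))  # {'group_a': 0.1, 'group_b': 0.3}
```

In this invented data, overall figures would hide the fact that writers in one group are flagged three times as often as writers in the other, which is why a single published accuracy number cannot rule out bias.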

Privacy

When uploading student work to AI detectors, you can't always be certain how that work will be stored or used by the detector. Be aware that uploading student assignments with personally identifiable information may be a violation of FERPA (the Family Educational Rights and Privacy Act).

Purpose

Consider why you're using an AI detector. Are you only checking papers that you already suspect of using AI-generated content? How do your personal and implicit biases play into decisions around the use and application of AI detectors? Instead, consider designing your assignments to be resistant to AI, or incorporating explicit uses of AI to encourage responsible use.