Generative AI, through large language models, has become a cheap and fast way to produce misleading or fake stories. Producing false or inaccurate output is not always intentional, however, as machine-generated content is subject to “artificial hallucinations”. Determining whether the author is human or not seems pointless, especially since detection methods remain limited. Another perspective is grounded in the tradition of human judgement methods studied and developed in natural language processing to assess the qualitative characteristics of machine-generated content. It consists of evaluating a system’s ability to stick to the facts through an adapted, language-independent metric. In practice, the tool helps humans assess machine-generated text sentence by sentence, rating each sentence as true, partially true, partially false or false. The system then computes an Information Disorder Level (IDL) index ranging from 0 to 10, where a score of 0 indicates a reliable and accurate text. In an experiment on a corpus of generated content, we obtained an average score of 3.9 and a maximum score of 8.3. Beyond research, this tool helps to understand the limits of generative AI and fosters reflection on what factuality is.
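As a rough illustration of how sentence-level ratings could be aggregated into a 0–10 index, here is a minimal Python sketch. The rating weights and the averaging formula below are assumptions made for illustration only; the actual computation is defined in the tool’s repository.

```python
# Hypothetical aggregation of sentence-level ratings into an IDL-style index.
# The weights assigned to each rating are assumptions, not the tool's formula.

RATING_WEIGHTS = {
    "true": 0.0,
    "partially true": 1 / 3,
    "partially false": 2 / 3,
    "false": 1.0,
}

def idl_index(ratings: list[str]) -> float:
    """Aggregate per-sentence ratings into a 0-10 score (0 = reliable and accurate)."""
    if not ratings:
        raise ValueError("at least one sentence rating is required")
    total = sum(RATING_WEIGHTS[r] for r in ratings)
    return round(10 * total / len(ratings), 1)

# Example: a five-sentence text with mixed ratings.
print(idl_index(["true", "partially true", "false", "true", "partially false"]))  # 4.0
```

Under this assumed weighting, a text rated entirely true scores 0 and a text rated entirely false scores 10, matching the range described above.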
This research was presented on December 6, 2023, at the MASCHINE conference “Generative Methods – AI as Collaborator and Companion in the Social Sciences and Humanities” (Aalborg University, Copenhagen).
The assessment tool is available in a standalone version that you can run on your computer: https://github.com/laurence001/idl/
The online version aims to build a training dataset from the assessments (it also allows previously assessed content to be reassessed): http://idlindex.net