AI-driven journalism refers to a range of methods and tools for gathering, verifying, producing, and distributing news. Their potential lies in extending human capabilities and creating new forms of augmented journalism. Although scholars have agreed on the need to embed journalistic values in these systems to make them accountable, less attention has been paid to data quality, even though the accuracy and efficiency of the results of any machine learning task depend on high-quality data. Assessing data quality in the context of AI-driven journalism requires a broader, interdisciplinary approach that draws on both the challenges of data quality in machine learning and the ethical challenges of using machine learning in journalism. To better identify these challenges, we propose a data quality assessment framework to support the collection and pre-processing stages of machine learning. It relies on three of the core principles of ethical journalism (accuracy, fairness, and transparency) and contributes to the shift from model-centric to data-centric AI by focusing on data quality in order to reduce reliance on large, error-prone datasets, make data labelling consistent, and better integrate journalistic knowledge.
Paper presented on August 29 at the Mini-conference on AI Infrastructures, University of Amsterdam.