Kosuke Imai
IBD Amphi
AMU - AMSE
5-9 boulevard Maurice Bourdet
13001 Marseille
Nicolas Clootens : nicolas.clootens[at]univ-amu.fr
Romain Ferrali : romain.ferrali[at]univ-amu.fr
The use of Artificial Intelligence (AI), or more generally data-driven algorithms, has become ubiquitous in today's society. Yet, in many cases, and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions compared to a human-alone or AI-alone system. We introduce a new methodological framework to empirically answer this question with a minimal set of assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded and unconfounded treatment assignment, where the provision of AI-generated recommendations is randomized across cases with humans making final decisions. Under this study design, we show how to compare the performance of three alternative decision-making systems: human-alone, human-with-AI, and AI-alone. Importantly, the AI-alone system includes any individualized treatment assignment, including those that are not used in the study. We also show when to provide a human decision maker with AI recommendations and when the decision maker should follow such recommendations. We apply the proposed methodology to data from our own randomized controlled trial of a pretrial risk assessment instrument. We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Our analysis also shows that both the risk assessment score and a large language model generally perform worse than human decisions.
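As a rough illustration of the study design described in the abstract, the sketch below simulates such an experiment on entirely hypothetical data (the variable names, accuracy levels, and the simple difference-in-means estimator are illustrative assumptions, not the authors' actual estimators). Because provision of the AI recommendation is randomized, comparing the classification accuracy of final decisions across arms estimates the human-alone versus human-with-AI difference.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data:
# Z  = 1 if the judge saw the AI recommendation (randomized), 0 otherwise
# Y0 = baseline potential outcome (e.g., whether the arrestee would
#      fail to appear absent detention); assumptions for illustration only
Z = rng.integers(0, 2, n)
Y0 = rng.integers(0, 2, n)

# Simulated final decisions: judges match the baseline outcome with
# probability 0.7 in both arms (i.e., the AI recommendation is assumed
# not to help in this toy example).
D = np.where(rng.random(n) < 0.7, Y0, 1 - Y0)

# A decision is scored "correct" when it matches the baseline potential
# outcome; the mean of this indicator is the classification accuracy.
correct = (D == Y0).astype(float)

# Randomization of Z makes the arm means unbiased for each system's accuracy.
acc_human_alone = correct[Z == 0].mean()
acc_human_with_ai = correct[Z == 1].mean()
diff = acc_human_with_ai - acc_human_alone
se = np.sqrt(correct[Z == 1].var(ddof=1) / (Z == 1).sum()
             + correct[Z == 0].var(ddof=1) / (Z == 0).sum())

print(f"human-alone accuracy:   {acc_human_alone:.3f}")
print(f"human-with-AI accuracy: {acc_human_with_ai:.3f}")
print(f"difference: {diff:.3f} (SE {se:.3f})")
```

In this simulation the two arms differ only by sampling noise, mirroring the paper's empirical finding that the recommendations did not improve accuracy; the actual framework additionally handles the AI-alone comparison and metrics beyond simple accuracy.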