This paper proposes a statistical framework in which artificial intelligence can assist human decision making. Using observational data, we benchmark the performance of each decision maker against machine predictions and, based on the proposed criteria, replace decision makers whose information processing quality is dominated by the machine predictions. The proposed framework is applicable under both Bayesian principles and frequentist principles of hypothesis testing and confidence set formation. We illustrate the theoretical discussion with an example of birth defect detection, using a large data set of pregnancy outcomes and doctor diagnoses from the Pre-Pregnancy Checkups of reproductive-age couples provided by the Chinese Ministry of Health. Based on doctors' diagnoses, we find that doctors, especially those from rural areas, can be replaced by the machine learning prediction. Statistically, on a testable data set, the overall quality of our algorithm outperforms diagnoses made by doctors alone, with a higher true positive rate and a lower false positive rate. Our example also suggests that decision making with artificial intelligence is more beneficial to poor areas than to developed ones.
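
To make the benchmarking idea concrete, the following is a minimal sketch of one possible reading of "dominated by machine predictions". It assumes that a decision maker is dominated when some threshold on the machine score yields a true positive rate at least as high and a false positive rate at least as low, with at least one strict inequality. This point-estimate check is an illustrative assumption only: it ignores sampling uncertainty and does not implement the Bayesian or frequentist criteria proposed in the paper. The function names (`rates`, `machine_dominates`) and the toy data are hypothetical.

```python
from typing import List, Tuple


def rates(y_true: List[int], decisions: List[int]) -> Tuple[float, float]:
    """Return (false positive rate, true positive rate) for binary decisions.

    Assumes y_true contains at least one positive and one negative case.
    """
    tp = sum(1 for y, d in zip(y_true, decisions) if y == 1 and d == 1)
    fp = sum(1 for y, d in zip(y_true, decisions) if y == 0 and d == 1)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return fp / n_neg, tp / n_pos


def machine_dominates(
    y_true: List[int], doctor_decisions: List[int], machine_scores: List[float]
) -> bool:
    """Check whether any machine threshold weakly improves on both rates
    and strictly improves on at least one (an assumed, simplified criterion)."""
    doc_fpr, doc_tpr = rates(y_true, doctor_decisions)
    for threshold in sorted(set(machine_scores)):
        machine_decisions = [1 if s >= threshold else 0 for s in machine_scores]
        m_fpr, m_tpr = rates(y_true, machine_decisions)
        weakly_better = m_tpr >= doc_tpr and m_fpr <= doc_fpr
        strictly_better = m_tpr > doc_tpr or m_fpr < doc_fpr
        if weakly_better and strictly_better:
            return True
    return False


# Hypothetical toy data: 3 positive cases, 5 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
doctor_decisions = [1, 0, 0, 1, 0, 0, 0, 0]
machine_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

doctor_fpr, doctor_tpr = rates(y_true, doctor_decisions)
dominated = machine_dominates(y_true, doctor_decisions, machine_scores)
```

In this toy example the doctor detects one of three positive cases and raises one false alarm among five negative cases, while a machine threshold of 0.8 detects two positive cases with no false alarms, so the check flags the doctor as dominated. A real application would replace this comparison with a formal test or confidence set that accounts for estimation error.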