In recent years, the use of machine learning to predict personality from digital data has attracted growing interest from organisations, academics, and the public. In turn, a new field of personality computing has developed, which combines machine learning techniques with psychological measures of personality. However, effectively integrating these approaches is challenging: machine learning and psychology are highly disparate fields, with different objectives, methodologies, and perspectives on how research is performed and reported. In this article, we report findings from a systematic review of 178 personality computing studies published before November 2020. We developed a novel set of 10 criteria to evaluate the quality of each study's design and reporting: hypotheses, study rationale, selection of features, algorithm training, ground truth, sampling, the evaluation of algorithm performance (i.e., classification, regression), the performance measures reported, and detail concerning ethics and open science practices. Our findings highlight that a large proportion of studies lack detail on these criteria, raising questions about the validity, reliability, and replicability of their findings. We discuss the implications of this research for practice and recommend directions for future work.