Given significant concerns about fairness and bias in the use of artificial intelligence (AI) and machine learning (ML) for assessing psychological constructs, we provide a conceptual framework for investigating and mitigating machine learning measurement bias (MLMB) from a psychometric perspective. MLMB is defined as differential functioning of the trained ML model between subgroups. MLMB can empirically manifest when a trained ML model produces different predicted score levels for individuals from different subgroups (e.g., race, gender) despite their having the same ground truth level on the underlying construct of interest (e.g., personality), and/or when the model yields differential predictive accuracy across subgroups. Because the development of ML models involves both data and algorithms, data bias and algorithm training bias are both potential sources of MLMB. Data bias can occur in the form of nonequivalence between subgroups in the ground truth, the platform-based construct, behavioral expression, and/or feature computing. Algorithm training bias can occur when algorithms are developed with nonequivalence in the relation between extracted features and ground truth (i.e., algorithm features are differentially used, weighted, or transformed between subgroups). We explain how these potential sources of bias may manifest during ML model development and share initial ideas for mitigating them, recognizing that the development of new statistical and algorithmic procedures will need to follow. We also discuss how this framework brings clarity to MLMB without reducing the complexity of the issue.
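The two empirical manifestations of MLMB described above (different predicted score levels at the same ground truth level, and differential predictive accuracy across subgroups) can be illustrated with a minimal sketch. The data, the subgroup labels, and the deliberately biased "model" below are all hypothetical and are not taken from the framework itself; the sketch simply shows one way each manifestation could be quantified.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth (e.g., a personality score) for two subgroups
# drawn from the same construct distribution.
n = 1000
group = rng.integers(0, 2, size=n)      # subgroup membership: 0 or 1
truth = rng.normal(0.0, 1.0, size=n)

# A hypothetical biased model: its predictions are shifted upward for
# subgroup 1 and are also noisier for subgroup 1.
pred = (truth
        + np.where(group == 1, 0.5, 0.0)                  # level shift
        + rng.normal(0.0, np.where(group == 1, 0.8, 0.3)))  # unequal noise

# Manifestation 1: different predicted levels despite equal ground truth,
# estimated here as the mean residual (pred - truth) per subgroup.
mean_resid = [float(np.mean(pred[group == g] - truth[group == g]))
              for g in (0, 1)]

# Manifestation 2: differential predictive accuracy, here per-subgroup RMSE.
rmse = [float(np.sqrt(np.mean((pred[group == g] - truth[group == g]) ** 2)))
        for g in (0, 1)]

print("mean residual by subgroup:", mean_resid)
print("RMSE by subgroup:", rmse)
```

In this construction, subgroup 1 shows both a positive mean residual (a predicted-score level difference not present in the ground truth) and a larger RMSE (lower predictive accuracy), corresponding to the two forms of MLMB defined in the abstract.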