PSX-A-29 Late-Breaking: Use of machine learning algorithms to predict residual feed intake value and classification groups in commercial beef cattle
Abstract The objective of this study was to explore the potential of machine learning (ML) algorithms to predict residual feed intake (RFI) classification group (high or low RFI) and individual RFI values using performance records and genomic information. A total of 4,145 animals from research and commercial herds with RFI performance records were included in the study, of which 3,899 had genomic information (genotyped using the Illumina Bovine 50K SNP BeadChip). Several R- and Python-based libraries, including Lazy Predict, Scikit-learn, PyCaret, and H2O Flow, were used to test various ML models. Genomic information was subjected to quality control by removing SNPs with an allele frequency less than 0.05 or a call rate lower than 0.95. A total of 42,689 SNPs remained for further analysis and accounted for 34% of the phenotypic variation in RFI (heritability of 0.34 ± 0.07). Subsets of SNPs of different sizes (500, 1K, 5K, 10K, and 15K) were selected based on their contribution to phenotypic variation and then included in the ML models. The GLM Stacked Ensemble model with 15K SNPs performed better than the other models in predicting RFI classification group (R² = 0.54). Regardless of the number of SNPs included, the GLM Stacked Ensemble model also performed better than the other models in predicting individual RFI. This model's performance improved as the number of SNPs increased (MAE = 0.39 for 500 SNPs; 0.31 for 15K SNPs). In the test data set, however, increasing the number of SNPs did not change the model's performance (MAE = 0.39). The results demonstrate the potential of ML, compared with genomic analysis, to improve predictions of feed efficiency in beef cattle without measuring feed intake.
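The quality-control step and the MAE metric described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 0/1/2 genotype coding with `None` for a missing call, and the reading of "allele frequency" as minor allele frequency are all assumptions made for illustration. Only the thresholds (0.05 and 0.95) come from the abstract.

```python
from typing import Dict, List, Optional, Sequence

# Assumed coding: 0, 1, or 2 copies of the counted allele; None = missing call.
Genotype = Optional[int]


def snp_passes_qc(calls: Sequence[Genotype],
                  min_maf: float = 0.05,
                  min_call_rate: float = 0.95) -> bool:
    """Return True if one SNP meets both quality-control thresholds."""
    observed = [g for g in calls if g is not None]
    if not calls or not observed:
        return False  # nothing genotyped, so the SNP cannot be assessed
    call_rate = len(observed) / len(calls)
    # Frequency of the counted allele among the observed (diploid) genotypes.
    freq = sum(observed) / (2 * len(observed))
    # Assumption: the abstract's "allele frequency" is treated as minor allele frequency.
    maf = min(freq, 1.0 - freq)
    return maf >= min_maf and call_rate >= min_call_rate


def filter_snps(genotypes_by_snp: Dict[str, List[Genotype]]) -> List[str]:
    """Return the names of SNPs that survive quality control, in input order."""
    return [name for name, calls in genotypes_by_snp.items()
            if snp_passes_qc(calls)]


def mean_absolute_error(observed: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Mean absolute error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Hypothetical toy genotypes for 20 animals; not data from the study.
    toy = {
        "snp_ok": [0, 1, 2, 1, 0, 1, 2, 1, 0, 1] * 2,
        "snp_rare": [0] * 19 + [1],            # allele frequency below 0.05
        "snp_missing": [1] * 18 + [None] * 2,  # call rate below 0.95
    }
    print(filter_snps(toy))
```

In this sketch a SNP is dropped if it fails either threshold, matching the "or" in the abstract's description. The MAE helper is the standard definition of the metric; how the study split data or computed it per model is not stated in the abstract.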