Machine learning discovery of missing links that mediate alternative branches to plant alkaloids
Abstract
Engineering the microbial production of secondary metabolites is limited by the known reactions of correctly annotated enzymes in sequence databases. To expand the range of biosynthesis pathways, machine learning is herein demonstrated for the discovery of missing link enzymes, using benzylisoquinoline alkaloid production as a model application with potential to revolutionize the paradigm of sustainable biomanufacturing. Bacterial studies use a tetrahydropapaveroline pathway, whereas plants are reported to contain a more stable norcoclaurine pathway, which is exploited in yeast. However, committed aromatic precursors are currently produced by microbial enzymes whose plant counterparts remain elusive. Accordingly, the machine learning enzyme selection algorithm is first applied to clarify the early missing links in plant alkaloid pathways. Characterization of predicted sequences via metabolomics reveals distinct oxidases and carboxy-lyases, which complete a plant gene-only benzylisoquinoline alkaloid pathway from tyrosine. Synergistic application of aryl acetaldehyde-producing enzymes results in enhanced production through hybrid norcoclaurine and tetrahydropapaveroline pathways. Transplantation of features into homologous enzyme templates leads to the highest levels of bacterial norcoclaurine and N-methylcoclaurine. Mechanism-directed isotope tracing patterns confirm alternative flux branches from aromatic precursors to alkaloids. This machine learning-driven workflow can be adapted to numerous pathways.