Identification of Nitrogen Fixation Genes in Lactococcus Isolated from Maize Using Population Genomics and Machine Learning

2020 ◽  
Vol 8 (12) ◽  
pp. 2043
Author(s):  
Shawn M. Higdon ◽  
Bihua C. Huang ◽  
Alan B. Bennett ◽  
Bart C. Weimer

Sierra Mixe maize is a landrace variety from Oaxaca, Mexico, that utilizes nitrogen derived from the atmosphere via an undefined nitrogen fixation mechanism. The diazotrophic microbiota associated with the plant’s mucilaginous aerial root exudate composed of complex carbohydrates was previously identified and characterized by our group where we found 23 lactococci capable of biological nitrogen fixation (BNF) without containing any of the proposed essential genes for this trait (nifHDKENB). To determine the genes in Lactococcus associated with this phenotype, we selected 70 lactococci from the dairy industry that are not known to be diazotrophic to conduct a comparative population genomic analysis. This showed that the diazotrophic lactococcal genomes were distinctly different from the dairy isolates. Examining the pangenome followed by genome-wide association study and machine learning identified genes with the functions needed for BNF in the maize isolates that were absent from the dairy isolates. Many of the putative genes received an ‘unknown’ annotation, which led to the domain analysis of the 135 homologs. This revealed genes with molecular functions needed for BNF, including mucilage carbohydrate catabolism, glycan-mediated host adhesion, iron/siderophore utilization, and oxidation/reduction control. This is the first report of this pathway in this organism to underpin BNF. Consequently, we proposed a model needed for BNF in lactococci that plausibly accounts for BNF in the absence of the nif operon in this organism.
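The comparison described above, finding genes present in the maize isolates but absent from the dairy isolates, can be pictured as a set operation on a gene presence/absence table. The following is a minimal sketch under that assumption, not the authors' pipeline: the function name, isolate names, and gene names are all hypothetical, and the real study additionally used a genome-wide association study and machine learning to select candidates.

```python
def genes_specific_to_group(presence, target, background):
    """Return genes found in every target isolate and in no background isolate.

    presence maps an isolate name to the set of gene-cluster identifiers
    detected in that genome.
    """
    in_all_targets = set.intersection(*(presence[i] for i in target))
    in_any_background = set().union(*(presence[i] for i in background))
    return sorted(in_all_targets - in_any_background)


# Hypothetical toy data: isolate and gene names are made up for illustration.
presence = {
    "maize_1": {"geneA", "geneB", "geneC"},
    "maize_2": {"geneA", "geneB", "geneD"},
    "dairy_1": {"geneA", "geneD"},
    "dairy_2": {"geneA"},
}

specific = genes_specific_to_group(
    presence,
    target=["maize_1", "maize_2"],
    background=["dairy_1", "dairy_2"],
)
print(specific)
```

Requiring presence in every target genome is a strict choice; a real analysis would more likely tolerate some missing calls and test the association statistically.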

2019 ◽  
Author(s):  
DJ Darwin R. Bandoy ◽  
Bart C. Weimer

Highly dimensional data generated from bacterial whole genome sequencing is providing an unprecedented scale of information that requires appropriate statistical frameworks of analysis to infer biological function from bacterial genomic populations. Application of genome wide association study (GWAS) methods is an emerging approach with bacterial population genomics that yields a list of genes associated with a phenotype with an undefined importance among the candidates in the list. Here, we validate the combination of GWAS, machine learning, and pathogenic bacterial population genomics as a novel scheme to identify SNPs and rank allelic variants to determine associations for accurate estimation of disease phenotype. This approach parsed a dataset of 1.2 million SNPs that resulted in a ranked importance of associated alleles of Campylobacter jejuni porA using multiple spatial locations over a 30-year period. We validated this approach using previously proven laboratory experimental alleles from an in vivo guinea pig abortion model. This approach, termed BioML, defined intestinal and extraintestinal groups that have differential allelic variants that cause abortion. Divergent variants containing indels that defeated gene callers were rescued using biological context and knowledge that resulted in defining rare and divergent variants that were maintained in the population over two continents and 30 years. This study defines the capability of machine learning coupled to GWAS and population genomics to simultaneously identify and rank alleles to define their role in abortion, and more broadly infectious disease.


2021 ◽  
Author(s):  
Xinxin Yi ◽  
Jing Liu ◽  
Shengcai Chen ◽  
Hao Wu ◽  
Min Liu ◽  
...  

Cultivated soybean (Glycine max) is an important source of protein and oil. Many elite cultivars with different traits have been developed for different conditions. Each soybean strain has its own genetic diversity, and the availability of more high-quality soybean genomes can enhance comparative genomic analysis for identifying genetic underpinnings for its unique traits. In this study, we constructed a high-quality de novo assembly of an elite soybean cultivar Jidou 17 (JD17) with chromosome contiguity and high accuracy. We annotated 52,840 gene models and reconstructed 74,054 high-quality full-length transcripts. We performed a genome-wide comparative analysis based on the reference genome of JD17 with three published soybeans (WM82, ZH13 and W05), which identified five large inversions and two large translocations specific to JD17, 20,984 - 46,912 PAVs spanning 13.1 - 46.9 Mb in size, and 5 - 53 large PAV clusters larger than 500 kb. Between JD17 and these three genomes, 1,695,741 - 3,664,629 SNPs and 446,689 - 800,489 Indels were identified and annotated. Symbiotic nitrogen fixation (SNF) genes were identified and the effects from these variants were further evaluated. It was found that the coding sequences of 9 nitrogen fixation-related genes were greatly affected. The high-quality genome assembly of JD17 can serve as a valuable reference for soybean functional genomics research.


2020 ◽  
Vol 21 (9) ◽  
pp. 3156 ◽  
Author(s):  
Sajid Shokat ◽  
Deepmala Sehgal ◽  
Prashant Vikram ◽  
Fulai Liu ◽  
Sukhwinder Singh

Terminal drought stress poses a big challenge to sustaining wheat grain production in rain-fed environments. This study aimed to utilize genetically diverse pre-breeding lines for identification of genomic regions associated with agro-physiological traits at terminal stage drought stress in wheat. A panel of 339 pre-breeding lines derived from three-way crosses of ‘exotics × elite × elite’ lines was evaluated in field conditions at Obregon, Mexico for two years under well irrigated as well as drought stress environments. Drought stress was imposed at flowering by skipping the irrigations at the pre- and post-anthesis stages. Results revealed that drought significantly reduced grain yield (Y), spike length (SL), number of grains spike⁻¹ (NGS) and thousand kernel weight (TKW), while kernel abortion (KA) was increased. Population structure analysis in this panel uncovered three sub-populations. Genome wide linkage disequilibrium (LD) decay was observed at 2.5 centimorgan (cM). The haplotypes-based genome wide association study (GWAS) identified significant associations of Y, SL, and TKW on three chromosomes: 4A (HB10.7), 2D (HB6.10) and 3B (HB8.12), respectively. Likewise, associations on chromosomes 6B (HB17.1) and 3A (HB7.11) were found for NGS while on chromosome 3A (HB7.12) for KA. The genomic analysis information generated in the study can be efficiently utilized to improve Y and/or related parameters under terminal stage drought stress through marker-assisted breeding.


2021 ◽  
Author(s):  
Michael Burns ◽  
Jonathan Renk ◽  
David Eickholt ◽  
Amanda Gilbert ◽  
Travis Hattery ◽  
...  

Lack of high throughput phenotyping systems for determining moisture content during the maize nixtamalization cooking process has led to difficulty in breeding for this trait. This study provides a high throughput, quantitative measure of kernel moisture content during nixtamalization based on NIR scanning of uncooked maize kernels. Machine learning was utilized to develop models based on the combination of NIR spectra and moisture content determined from a scaled-down benchtop cook method. A linear support vector machine (SVM) model with a Spearman's rank correlation coefficient of 0.852 between wet lab and predicted values was developed from 100 diverse temperate genotypes grown in replicate across two environments. This model was applied to NIR data from 501 diverse temperate genotypes grown in replicate in five environments. Analysis of variance revealed environment explained the highest percent of the variation (51.5%), followed by genotype (15.6%) and genotype-by-environment interaction (11.2%). A genome-wide association study identified 26 significant loci across five environments that explained between 5.04% and 16.01% (average = 10.41%). However, genome-wide markers explained 10.54% to 45.99% (average = 31.68%) of the variation, indicating the genetic architecture of this trait is likely complex and controlled by many loci of small effect. This study provides a high-throughput method to evaluate moisture content during nixtamalization that is feasible at the scale of a breeding program and provides important information about the factors contributing to variation of this trait for breeders and food companies to make future strategies to improve this important processing trait.
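The model above is scored by Spearman's rank correlation between wet-lab and predicted moisture values. As an illustration of what that statistic measures, here is a minimal standard-library sketch that ranks both series (averaging tied ranks) and then takes the Pearson correlation of the ranks. The function names and the five data points are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values, for illustration only.
measured = [0.40, 0.42, 0.45, 0.47, 0.50]
predicted = [0.41, 0.40, 0.46, 0.49, 0.48]
rho = spearman(measured, predicted)
print(round(rho, 3))
```

Because only the ordering matters, a model can score well on this metric even if its predictions are offset or rescaled, which suits a breeding context where ranking genotypes is the main goal.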


2015 ◽  
Author(s):  
Liya Wang ◽  
Peter Van Buren ◽  
Doreen Ware

Over the past few years, cloud-based platforms have been proposed to address storage, management, and computation of large-scale data, especially in the field of genomics. However, for collaboration efforts involving multiple institutes, data transfer and management, interoperability and standardization among different platforms have imposed new challenges. This paper proposes a distributed bioinformatics platform that can leverage local clusters with remote computational clusters for genomic analysis using the unified bioinformatics workflow. The platform is built with a data server configured with iRODS, a computation cluster authenticated with iPlant Agave system, and web server to interact with the platform. A Genome-Wide Association Study workflow is integrated to validate the feasibility of the proposed approach.


2021 ◽  
Vol 12 ◽  
Author(s):  
Xue Wang ◽  
Jianan Zhao ◽  
Fang Ji ◽  
Han Chang ◽  
Jiao Qin ◽  
...  

Multiple-replicon resistance plasmids have become important carriers of resistance genes in Gram-negative bacteria, and the evolution of multiple-replicon plasmids is still not clear. Here, 56 isolates of Klebsiella isolated from different wild animals and environments between 2018 and 2020 were identified by phenotyping via the micro-broth dilution method and were sequenced and analyzed for bacterial genome-wide association study. Our results revealed that the isolates from non-human sources showed more extensive drug resistance and especially strong resistance to ampicillin (up to 80.36%). The isolates from Malayan pangolin were particularly highly resistant to cephalosporins, chloramphenicol, levofloxacin, and sulfamethoxazole. Genomic analysis showed that the resistance plasmids in these isolates carried many antibiotic resistance genes. Further analysis of 69 plasmids demonstrated that 28 plasmids were multiple-replicon plasmids, mainly carrying beta-lactamase genes such as blaCTX–M–15, blaCTX–M–14, blaCTX–M–55, blaOXA–1, and blaTEM–1. The analysis of plasmids carried by different isolates showed that Klebsiella pneumoniae might be an important multiple-replicon plasmid host. Plasmid skeleton and structure analyses showed that a multiple-replicon plasmid was formed by the fusion of two or more single plasmids, conferring strong adaptability to the antibiotic environment and continuously increasing the ability of drug-resistant isolates to spread around the world. In conclusion, multiple-replicon plasmids are better able to carry resistance genes than non-multiple-replicon plasmids, which may be an important mechanism underlying bacterial responses to environments with high-antibiotic pressure. This phenomenon will be highly significant for exploring bacterial resistance gene transmission and diffusion mechanisms in the future.


2017 ◽  
Author(s):  
Khalil Abudahab ◽  
Joaquín M. Prada ◽  
Zhirong Yang ◽  
Stephen D. Bentley ◽  
Nicholas J. Croucher ◽  
...  

The standard workhorse for genomic analysis of the evolution of bacterial populations is phylogenetic modelling of mutations in the core genome. However, in the current era of population genomics, a notable amount of information about evolutionary and transmission processes in diverse populations can be lost unless the accessory genome is also taken into consideration. Here we introduce PANINI, a computationally scalable method for identifying the neighbours for each isolate in a data set using unsupervised machine learning with stochastic neighbour embedding. PANINI is browser-based and integrates with the Microreact platform for rapid online visualisation and exploration of both core and accessory genome evolutionary signals together with relevant epidemiological, geographic, temporal and other metadata. Several case studies with single- and multi-clone pneumococcal populations are presented to demonstrate its ability to identify biologically important signals from gene content data. PANINI is available at http://panini.wgsa.net/ and code at http://gitlab.com/cgps/panini
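To make the idea of finding neighbours from gene content concrete, the sketch below computes the Jaccard distance between accessory gene sets and reports each isolate's closest neighbour. This is a deliberately simpler stand-in, not PANINI's method (which uses stochastic neighbour embedding), and the function names, isolate names, and gene names are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def nearest_neighbours(accessory):
    """Map each isolate to the other isolate with the most similar gene content."""
    result = {}
    for name, genes in accessory.items():
        candidates = [
            (jaccard_distance(genes, other_genes), other)
            for other, other_genes in accessory.items()
            if other != name
        ]
        # min() picks the smallest distance; ties fall back to name order.
        result[name] = min(candidates)[1]
    return result


# Hypothetical accessory gene content, for illustration only.
accessory = {
    "iso1": {"a", "b", "c"},
    "iso2": {"a", "b", "d"},
    "iso3": {"x", "y"},
    "iso4": {"x", "y", "z"},
}

neighbours = nearest_neighbours(accessory)
print(neighbours)
```

The all-pairs comparison here grows quadratically with the number of isolates, which is one reason scalable embedding methods are attractive for large collections.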


2019 ◽  
Author(s):  
Haofeng Chen ◽  
Chengzhi Jiao ◽  
Ying Wang ◽  
Yuange Wang ◽  
Caihuan Tian ◽  
...  

The evolution of bread wheat (Triticum aestivum) is distinctive in that domestication, natural hybridization, and allopolyploid speciation have all had significant effects on the diversification of its genome. Wheat was spread around the world by humans and has been cultivated in China for ~4,600 years. Here, we report a comprehensive assessment of the evolution of wheat based on the genome-wide resequencing of 120 representative landraces and elite wheat accessions from China and other representative regions. We found substantially higher genetic diversity in the A and B subgenomes than in the D subgenome. Notably, the A and B subgenomes of the modern Chinese elite cultivars were mainly derived from European landraces, while Chinese landraces had a greater contribution to their D subgenomes. The duplicated copies of homoeologous genes from the A, B, and D subgenomes were commonly found to be under different levels of selection. Our genome-wide assessment of the genetic changes associated with wheat breeding in China provides new strategies and practical targets for future breeding.

