Authoritative subspecies diagnosis tool for European honey bees based on ancestry informative SNPs

Abstract. Background: With numerous endemic subspecies representing four of its five evolutionary lineages, Europe holds a large fraction of Apis mellifera genetic diversity. This diversity and the natural distribution range have been altered by anthropogenic factors. The conservation of this natural heritage relies on the availability of accurate tools for subspecies diagnosis. Based on pool-sequence data from 2145 worker bees representing 22 populations sampled across Europe, we employed two highly discriminative approaches (PCA and FST) to select the most informative SNPs for ancestry inference. Results: Using a supervised machine learning (ML) approach and a set of 3896 genotyped individuals, we show that the 4094 selected single nucleotide polymorphisms (SNPs) provide accurate ancestry inference in European honey bees. The best ML model was the Linear Support Vector Classifier (Linear SVC), which correctly assigned most individuals to one of the 14 subspecies or different genetic origins with a mean accuracy of 96.2% ± 0.8 SD. A total of 3.8% of test individuals were misclassified, most probably due to limited differentiation between subspecies caused by close geographical proximity, human interference with the genetic integrity of reference subspecies, or a combination thereof. Conclusions: The diagnostic tool presented here will contribute to sustainable conservation and support breeding activities in order to preserve the genetic heritage of European honey bees.
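
The FST-based part of the SNP selection described above can be illustrated with a minimal sketch. The abstract does not say which FST estimator or weighting was used, so this assumes the textbook form FST = (HT - HS) / HT for a biallelic SNP with equally weighted populations. The function names `fst` and `rank_snps` and the SNP labels are hypothetical.

```python
def fst(allele_freqs):
    """Wright's F_ST for one biallelic SNP.

    allele_freqs: reference-allele frequency in each population.
    Assumption: populations are weighted equally (not taken from the paper).
    """
    n = len(allele_freqs)
    p_bar = sum(allele_freqs) / n
    # Expected heterozygosity of the pooled (total) population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        # SNP is fixed for the same allele everywhere: no differentiation.
        return 0.0
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in allele_freqs) / n
    return (h_t - h_s) / h_t


def rank_snps(freq_table, top_k):
    """Return the top_k SNP names ordered by decreasing F_ST."""
    scores = {snp: fst(freqs) for snp, freqs in freq_table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Made-up frequencies for three SNPs in two populations.
toy_table = {
    "snp_a": [0.5, 0.5],  # identical frequencies, uninformative
    "snp_b": [0.0, 1.0],  # fixed difference, maximally informative
    "snp_c": [0.4, 0.6],  # weak differentiation
}
print(rank_snps(toy_table, top_k=2))
```

A ranking of this kind keeps the SNPs whose allele frequencies differ most between populations, which is the property a downstream classifier such as a linear SVC relies on.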


Comparative machine learning approach for biomarker identification using multiomics data from patients with endometriosis

10.32469/10355/73840 ◽

2019 ◽

Author(s):

Sadia Akter

Keyword(s):

Machine Learning ◽

Molecular Mechanisms ◽

Sequence Data ◽

Supervised Machine Learning ◽

Support Vector ◽

Learning Tools ◽

Next Generation ◽

University Of Missouri ◽

Gynecological Disorder ◽

Ngs Data

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] Endometriosis is a complex and common gynecological disorder, yet a poorly understood disease, affecting about 176 million women worldwide and causing a significant impact on their quality of life as well as an economic burden. Neither a definitive clinical symptom nor a minimally invasive diagnostic method is available, leading to an average diagnostic latency of 10 years. Discovery of relevant biological patterns from microarray expression or next generation sequence (NGS) data has been advanced over the last several decades by applying various machine learning tools. The overall objective of this project was to identify diagnostic molecular mechanisms and biomarkers of endometriosis using a multi-omics approach and various machine learning classifiers. This objective was fulfilled by three related but independent aims: (1) to mine RNA-seq data to discover molecular mechanisms of endometriosis, (2) to discover diagnostic features of endometriosis in the DNA-methylation profile of the endometrium, and (3) to develop innovative machine learning-based differential classification models using whole genome high throughput next generation sequence data. We evaluated how well various supervised machine learning methods, such as decision tree, partial least squares-discriminant analysis, support vector machine, random forest, and a newly developed method called GenomeForest, classify endometriosis samples against control samples when trained on both transcriptomics and methylomics data.


Predictive Modelling of Employee Turnover in Indian IT Industry Using Machine Learning Techniques

Vision The Journal of Business Perspective ◽

10.1177/0972262918821221 ◽

2019 ◽

Vol 23 (1) ◽

pp. 12-21 ◽

Cited By ~ 2

Author(s):

Shikha N. Khera ◽

Divya

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Confusion Matrix ◽

Predictive Modelling ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

It Industry ◽

Knowledge Based ◽

Employee Attrition

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to the companies. The aim of this research is to develop a model to predict employee attrition and provide organizations with opportunities to address any issue and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the Human Resource databases of three IT companies in India, including the employees' employment status (response variable) at the time of collection. Accuracy results from the confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. The results also show that the model performs better in predicting who will leave the firm than in predicting who will not leave the company.
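
How an overall accuracy and the per-class comparison above are read off a binary confusion matrix can be sketched as follows. The counts are invented for illustration and are not the study's data. The function name `confusion_metrics` and the convention that "leave" is the positive class are assumptions.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Summarise a 2x2 confusion matrix for an attrition classifier.

    tp: leavers predicted as leavers    fn: leavers predicted as stayers
    fp: stayers predicted as leavers    tn: stayers predicted as stayers
    """
    total = tp + fn + fp + tn
    return {
        # Share of all employees classified correctly.
        "accuracy": (tp + tn) / total,
        # Share of actual leavers that the model catches.
        "recall_leave": tp / (tp + fn),
        # Share of actual stayers that the model catches.
        "recall_stay": tn / (tn + fp),
    }


# Illustrative counts only (hypothetical, not from the paper).
metrics = confusion_metrics(tp=40, fn=10, fp=20, tn=30)
print(metrics)
```

In this toy matrix the recall for leavers exceeds the recall for stayers, which is the kind of asymmetry the abstract reports, while a single accuracy figure would hide it.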


PSI-40 Two mitochondrial lineages revealed in North American yak

Journal of Animal Science ◽

10.1093/jas/skaa278.833 ◽

2020 ◽

Vol 98 (Supplement_4) ◽

pp. 477-477

Author(s):

Leah K Treffer ◽

Edward S Rice ◽

Anna M Fuller ◽

Samuel Cutler ◽

Jessica L Petersen

Keyword(s):

Sequence Data ◽

Haplotype Network ◽

Ovis Aries ◽

Similar Species ◽

Nucleotide Polymorphisms ◽

Mt Dna ◽

Protein Coding ◽

Sister Clade ◽

Mtdna Sequence ◽

The Impact

Abstract Domestic yak (Bos grunniens) are bovids native to the Asian Qinghai-Tibetan Plateau. Studies of Asian yak have revealed that introgression with domestic cattle has contributed to the evolution of the species. When imported to North America (NA), some hybridization with B. taurus did occur. The objective of this study was to use mitochondrial (mt) DNA sequence data to better understand the mtDNA origin of NA yak and their relationship to Asian yak and related species. The complete mtDNA sequence of 14 individuals (12 NA yak, 1 Tibetan yak, 1 Tibetan B. indicus) was generated and compared with sequences of similar species from GenBank (B. indicus, B. grunniens (Chinese), B. taurus, B. gaurus, B. primigenius, B. frontalis, Bison bison, and Ovis aries). Individuals were aligned to the B. grunniens reference genome (ARS_UNL_BGru_maternal_1.0), which was also included in the analyses. The mtDNA genes were annotated using the ARS-UCD1.2 cattle sequence as a reference. Ten unique NA yak haplotypes were identified, which a haplotype network separated into two clusters. Variation among the NA haplotypes included 93 nonsynonymous single nucleotide polymorphisms. A maximum likelihood tree including all taxa was made using IQtree after the data were partitioned into twenty-two subgroups using PartitionFinder2. Notably, six NA yak haplotypes formed a clade with B. indicus; the other four haplotypes grouped with B. grunniens and fell as a sister clade to bison, gaur and gayal. These data demonstrate two mitochondrial origins of NA yak with genetic variation in protein coding genes. Although these data suggest yak introgression with B. indicus, it appears to date prior to importation into NA. In addition to contributing to our understanding of the species history, these results suggest the two major mtDNA haplotypes in NA yak may functionally differ. Characterization of the impact of these differences on cellular function is currently underway.


A new hybrid record linkage process to make epidemiological databases interoperable: application to the GEMO and GENEPSO studies involving BRCA1 and BRCA2 mutation carriers

BMC Medical Research Methodology ◽

10.1186/s12874-021-01299-6 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Yue Jiao ◽

Fabienne Lesueur ◽

Chloé-Agathe Azencott ◽

Maïté Laurent ◽

Noura Mebirouk ◽

...

Keyword(s):

Record Linkage ◽

Gold Standard ◽

Brca2 Mutation ◽

Epidemiological Studies ◽

Supervised Machine Learning ◽

Training Dataset ◽

Support Vector ◽

Genetic Modifiers ◽

Brca1 And Brca2 ◽

Mutation Carriers

Abstract. Background: Linking independent sources of data describing the same individuals enables innovative epidemiological and health studies but requires a robust record linkage approach. We describe a hybrid record linkage process to link databases from two independent ongoing French national studies, GEMO (Genetic Modifiers of BRCA1 and BRCA2), which focuses on the identification of genetic factors modifying cancer risk of BRCA1 and BRCA2 mutation carriers, and GENEPSO (prospective cohort of BRCAx mutation carriers), which focuses on environmental and lifestyle risk factors. Methods: To identify as many as possible of the individuals participating in the two studies but not registered by a shared identifier, we combined probabilistic record linkage (PRL) and supervised machine learning (ML). This approach (named “PRL + ML”) combined the candidate matches identified by both approaches. We built the ML model using the gold standard on a first version of the two databases as a training dataset. This gold standard was obtained from PRL-derived matches verified by an exhaustive manual review. Results: The Random Forest (RF) algorithm showed the highest recall (0.985) among six widely used ML algorithms: RF, Bagged trees, AdaBoost, Support Vector Machine, Neural Network. Therefore, RF was selected to build the ML model since our goal was to identify the maximum number of true matches. Our combined linkage PRL + ML showed a higher recall (range 0.988–0.992) than either PRL (range 0.916–0.991) or ML (0.981) alone. It identified 1995 individuals participating in both GEMO (6375 participants) and GENEPSO (4925 participants). Conclusions: Our hybrid linkage process represents an efficient tool for linking GEMO and GENEPSO. It may be generalizable to other epidemiological studies involving other databases and registries.
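
The reason a combined PRL + ML linkage cannot have lower recall than either method alone can be sketched with sets. This assumes that "combining the candidate matches" means taking their union, which is an interpretation of the abstract rather than a confirmed detail. The record identifiers and the helpers `combine_matches` and `recall` are hypothetical.

```python
def combine_matches(prl_matches, ml_matches):
    """Union of candidate record pairs proposed by the two methods."""
    return set(prl_matches) | set(ml_matches)


def recall(predicted, gold):
    """Fraction of gold-standard true matches that were recovered."""
    gold = set(gold)
    if not gold:
        raise ValueError("gold standard must contain at least one match")
    return len(set(predicted) & gold) / len(gold)


# Toy pairs of (record id in database 1, record id in database 2).
gold = {("g1", "p1"), ("g2", "p2"), ("g3", "p3"), ("g4", "p4")}
prl = {("g1", "p1"), ("g2", "p2")}
ml = {("g2", "p2"), ("g3", "p3"), ("g9", "p9")}  # includes one false match

combined = combine_matches(prl, ml)
print(recall(prl, gold), recall(ml, gold), recall(combined, gold))
```

The union recovers every true match found by either method, so recall can only stay equal or rise. The cost is that false matches from both methods are pooled as well, so precision has to be checked separately.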


Supervised Machine Learning Methods and Hyperspectral Imaging Techniques Jointly Applied for Brain Cancer Classification

Sensors ◽

10.3390/s21113827 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3827

Author(s):

Gemma Urbanos ◽

Alberto Martín ◽

Guillermo Vázquez ◽

Marta Villanueva ◽

Manuel Villa ◽

...

Keyword(s):

Machine Learning ◽

Blood Vessel ◽

Hyperspectral Imaging ◽

Imaging Techniques ◽

Venous Blood ◽

Healthy Tissue ◽

Supervised Machine Learning ◽

Support Vector ◽

Arterial Blood

Hyperspectral imaging techniques (HSI) do not require contact with patients and are non-ionizing as well as non-invasive. As a consequence, they have been extensively applied in the medical field. HSI is being combined with machine learning (ML) processes to obtain models to assist in diagnosis. In particular, the combination of these techniques has proven to be a reliable aid in the differentiation of healthy and tumor tissue during brain tumor surgery. ML algorithms such as support vector machine (SVM), random forest (RF) and convolutional neural networks (CNN) are used to make predictions and provide in-vivo visualizations that may assist neurosurgeons in being more precise, hence reducing damage to healthy tissue. In this work, thirteen in-vivo hyperspectral images from twelve different patients with high-grade gliomas (grade III and IV) have been selected to train SVM, RF and CNN classifiers. Five different classes have been defined during the experiments: healthy tissue, tumor, venous blood vessel, arterial blood vessel and dura mater. Overall accuracy (OACC) results vary from 60% to 95% depending on the training conditions. Finally, as far as the contribution of each band to the OACC is concerned, the results obtained in this work are 3.81 times greater than those reported in the literature.


Serum neurofilament levels reflect outer retinal layer changes in multiple sclerosis

Therapeutic Advances in Neurological Disorders ◽

10.1177/17562864211003478 ◽

2021 ◽

Vol 14 ◽

pp. 175628642110034

Author(s):

Caspar B. Seitz ◽

Falk Steffen ◽

Muthuraman Muthuraman ◽

Timo Uphaus ◽

Julia Krämer ◽

...

Keyword(s):

Multiple Sclerosis ◽

Optic Neuritis ◽

Previous History ◽

Supervised Machine Learning ◽

Support Vector ◽

Retinal Layer ◽

Neurofilament Light Chain ◽

Neurofilament Light ◽

Retinal Layers ◽

History Of

Background: Serum neurofilament light chain (sNfL) and distinct intra-retinal layers are both promising biomarkers of neuro-axonal injury in multiple sclerosis (MS). We aimed to unravel the association of both markers in early MS, having identified that neurofilament has a distinct immunohistochemical expression pattern among intra-retinal layers. Methods: Three-dimensional (3D) spectral domain macular optical coherence tomography scans and sNfL levels were investigated in 156 early MS patients (female/male: 109/47, mean age: 33.3 ± 9.5 years, mean disease duration: 2.0 ± 3.3 years). Out of the whole cohort, 110 patients had no history of optic neuritis (NHON) and 46 patients had a previous history of optic neuritis (HON). In addition, a subgroup of patients (n = 38) was studied longitudinally over 2 years. Support vector machine analysis was applied to test a regression model for significant changes. Results: In our cohort, HON patients had a thinner outer plexiform layer (OPL) volume compared to NHON patients (B = −0.016, SE = 0.006, p = 0.013). Higher sNfL levels were significantly associated with thinner OPL volumes in HON patients (B = −6.734, SE = 2.514, p = 0.011). This finding was corroborated in the longitudinal subanalysis by the association of higher sNfL levels with OPL atrophy (B = 5.974, SE = 2.420, p = 0.019). sNfL levels were 75.7% accurate at predicting OPL volume in the supervised machine learning analysis. Conclusions: In summary, sNfL levels were a good predictor of future outer retinal thinning in MS. Changes within the neurofilament-rich OPL could be considered as an additional retinal marker linked to MS neurodegeneration.


Financial Context News Sentiment Analysis for the Lithuanian Language

Applied Sciences ◽

10.3390/app11104443 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4443

Author(s):

Rokas Štrimaitis ◽

Pavel Stefanovič ◽

Simona Ramanauskaitė ◽

Asta Slotkienė

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Experimental Investigations ◽

Support Vector ◽

Applied Machine Learning ◽

Bayes Algorithm ◽

Website Content

Financial area analysis is not limited to enterprise performance analysis. It is worth analyzing as wide an area as possible to obtain a full impression of a specific enterprise. News website content is a data source that expresses the public’s opinion on enterprise operations, status, etc. Therefore, it is worth analyzing the text of news portal articles. Sentiment analysis solutions for English texts and for financial texts exist and are accurate, whereas work on the more complex Lithuanian language is mostly concentrated on sentiment analysis of comment texts and does not provide high accuracy. Therefore, in this paper, a supervised machine learning model was implemented to assign sentiment to financial context news gathered from Lithuanian language websites. The analysis was made using three classification algorithms commonly used in the field of sentiment analysis. Hyperparameter optimization using grid search was performed to discover the best parameters of each classifier. All experimental investigations were made using newly collected datasets from four Lithuanian news websites. The results of the applied machine learning algorithms show that the highest accuracy is obtained using a non-balanced dataset, via the multinomial Naive Bayes algorithm (71.1%). The accuracies of the other algorithms were slightly lower: long short-term memory (71%) and a support vector machine (70.4%).
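
The grid search over classifier hyperparameters mentioned above can be sketched with a small multinomial Naive Bayes written from scratch. The abstract does not state which hyperparameters were tuned, so the smoothing strength `alpha` is an assumed example. The token lists, labels, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha):
    """Fit multinomial Naive Bayes with additive smoothing alpha (> 0)."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(docs))
        log_like = {w: math.log((word_counts[label][w] + alpha) / total)
                    for w in vocab}
        model[label] = (log_prior, log_like)
    return model


def predict_nb(model, tokens):
    """Return the most probable label; unseen words are ignored."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[w] for w in tokens if w in log_like)
    return max(model, key=score)


def grid_search(train, valid, alphas):
    """Try every alpha; keep the first one with the best validation accuracy."""
    best = None
    for alpha in alphas:
        model = train_nb(train[0], train[1], alpha)
        preds = [predict_nb(model, doc) for doc in valid[0]]
        acc = sum(p == y for p, y in zip(preds, valid[1])) / len(valid[1])
        if best is None or acc > best[1]:
            best = (alpha, acc)
    return best


# Tiny made-up corpus of tokenised headlines.
train = ([["profit", "growth"], ["loss", "decline"],
          ["profit", "record"], ["loss", "debt"]],
         ["pos", "neg", "pos", "neg"])
valid = ([["growth", "record"], ["debt", "decline"]], ["pos", "neg"])
print(grid_search(train, valid, alphas=[0.1, 1.0]))
```

A real experiment would use cross-validation rather than a single validation split, and would search a separate grid for each classifier.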


Genetic ancestry inference using support vector machines, and the active emergence of a unique American population

European Journal of Human Genetics ◽

10.1038/ejhg.2012.258 ◽

2012 ◽

Vol 21 (5) ◽

pp. 554-562 ◽

Cited By ~ 6

Author(s):

Ryan J Haasl ◽

Catherine A McCarty ◽

Bret A Payseur

Keyword(s):

Support Vector Machines ◽

Genetic Ancestry ◽

Support Vector ◽

American Population ◽

Vector Machines ◽

Ancestry Inference


Nonsynonymous single nucleotide polymorphisms of NHE3 differentially decrease NHE3 transporter activity

AJP Cell Physiology ◽

10.1152/ajpcell.00421.2014 ◽

2015 ◽

Vol 308 (9) ◽

pp. C758-C766 ◽

Cited By ~ 7

Author(s):

Xinjun Cindy Zhu ◽

Rafiquel Sarker ◽

John R. Horton ◽

Molee Chakraborty ◽

Tian-E Chen ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Large Fraction ◽

Regulatory Protein ◽

Nucleotide Polymorphisms ◽

Genetic Determinants ◽

Single Nucleotide ◽

Mutant Proteins ◽

Homologous Protein ◽

Genetic Abnormalities ◽

The Impact

Genetic determinants appear to play a role in susceptibility to chronic diarrhea, but the genetic abnormalities involved have only been identified in a few conditions. The Na+/H+ exchanger 3 (NHE3) accounts for a large fraction of physiologic intestinal Na+ absorption. It is highly regulated through effects on its intracellular COOH-terminal regulatory domain. The impact of genetic variation in the NHE3 gene, such as single nucleotide polymorphisms (SNPs), on transporter activity remains unexplored. From a total of 458 SNPs identified in the entire NHE3 gene, we identified three nonsynonymous mutations (R474Q, V567M, and R799C), which were all in the protein's intracellular COOH-terminal domain. Here we evaluated whether these SNPs affect NHE3 activity by expressing them in a mammalian cell line that is null for all plasma membrane NHEs. These variants significantly reduced basal NHE3 transporter activity through a reduction in intrinsic NHE3 function in variant R474Q, abnormal trafficking in variant V567M, or defects in both intrinsic NHE3 function and trafficking in variant R799C. In addition, variants NHE3 R474Q and R799C failed to respond to acute dexamethasone stimulation, suggesting cells with these mutant proteins might be defective in NHE3 function during postprandial stimulation and perhaps under stressful conditions. Finally, variant R474Q was shown to exhibit an aberrant interaction with calcineurin B homologous protein (CHP), an NHE3 regulatory protein required for basal NHE3 activity. Taken together, these results demonstrate decreased transport activity in three SNPs of NHE3 and provide mechanistic insight into how these SNPs impact NHE3 function.


Optimizing machine learning models for granular NdFeB magnets by very fast simulated annealing

Scientific Reports ◽

10.1038/s41598-021-83315-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Hyeon-Kyu Park ◽

Jae-Hyeok Lee ◽

Jehyun Lee ◽

Sang-Koog Kim

Keyword(s):

Machine Learning ◽

Simulated Annealing ◽

Permanent Magnets ◽

Supervised Machine Learning ◽

Support Vector ◽

Micromagnetic Simulations ◽

Ndfeb Magnets ◽

Average Grain Size ◽

Macroscopic Properties ◽

Very Fast Simulated Annealing

Abstract The macroscopic properties of permanent magnets and the resultant performance required for real implementations are determined by the magnets’ microscopic features. However, earlier micromagnetic simulations and experimental studies required a relatively large amount of work to gain any complete and comprehensive understanding of the relationships between magnets’ macroscopic properties and their microstructures. Here, by means of supervised learning, we predict reliable values of coercivity (μ0Hc) and maximum magnetic energy product (BHmax) of granular NdFeB magnets according to their microstructural attributes (e.g. inter-grain decoupling, average grain size, and misalignment of easy axes) based on numerical datasets obtained from micromagnetic simulations. We conducted several tests of a variety of supervised machine learning (ML) models including kernel ridge regression (KRR), support vector regression (SVR), and artificial neural network (ANN) regression. The hyper-parameters of these models were optimized by a very fast simulated annealing (VFSA) algorithm with an adaptive cooling schedule. In our datasets of 1,000 randomly generated polycrystalline NdFeB cuboids with different microstructural attributes, all of the models yielded similar results in predicting both μ0Hc and BHmax. Furthermore, some outliers, which deteriorated the normality of residuals in the prediction of BHmax, were detected and further analyzed. Based on all of our results, we can conclude that our ML approach combined with micromagnetic simulations provides a robust framework for optimal design of microstructures for high-performance NdFeB magnets.
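
Hyper-parameter tuning by very fast simulated annealing can be sketched roughly as below. This follows the commonly cited VFSA formulation (a heavy-tailed step distribution and a temperature of T0 * exp(-c * k^(1/D)) for D parameters). It is not the authors' implementation, and it does not include their adaptive cooling schedule. The quadratic objective stands in for a model's validation error, and every name and default value is an assumption.

```python
import math
import random


def vfsa_minimize(objective, bounds, n_iter=500, t0=1.0, c=1.0, seed=0):
    """Minimise objective over box bounds with a basic VFSA loop."""
    rng = random.Random(seed)
    dim = len(bounds)
    current = [rng.uniform(lo, hi) for lo, hi in bounds]
    current_val = objective(current)
    best, best_val = list(current), current_val
    for k in range(1, n_iter + 1):
        # Exponential cooling; the floor avoids division by a vanishing value.
        temp = max(t0 * math.exp(-c * k ** (1.0 / dim)), 1e-12)
        candidate = []
        for x, (lo, hi) in zip(current, bounds):
            u = rng.random()
            # Heavy-tailed step in [-1, 1], scaled to the parameter range.
            step = math.copysign(1.0, u - 0.5) * temp * (
                (1.0 + 1.0 / temp) ** abs(2.0 * u - 1.0) - 1.0)
            candidate.append(min(hi, max(lo, x + step * (hi - lo))))
        cand_val = objective(candidate)
        delta = cand_val - current_val
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_val = candidate, cand_val
            if current_val < best_val:
                best, best_val = list(current), current_val
    return best, best_val


def toy_error(params):
    """Stand-in for a validation error with its minimum at (1, -2)."""
    return (params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2


search_bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best_params, best_error = vfsa_minimize(toy_error, search_bounds)
print(best_params, best_error)
```

In a setting like the one described, `objective` would train a KRR, SVR, or ANN model with the proposed hyper-parameters and return its cross-validated error.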
