Identification of Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs

10.21203/rs.3.rs-95706/v1 ◽

2020 ◽

Author(s):

Dongwon Seo ◽

Sunghyun Cho ◽

Prabuddha Manjula ◽

Nuri Choi ◽

Young Kuk Kim ◽

...

Keyword(s):

Machine Learning ◽

SNP Array ◽

Machine Learning Algorithms ◽

Case Group ◽

Machine Learning Classification ◽

Genetic Components ◽

Native Chicken ◽

Marker Combination ◽

A Genome ◽

Minimum Number

Background: A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence with respect to the origin of the population. This would also facilitate the protection of genetic resources, especially in developing countries. Methods: In this study, a total of 283 samples from 20 lines, consisting of Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were used to find the marker combination with the minimum number of markers, using a 600k high-density single nucleotide polymorphism (SNP) array. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group from control chicken groups. To verify the selected markers, a total of 182 samples from 12 lines were used to confirm the change in the accuracy of target chicken breed identification. Results: A total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance between the case and control groups and reduced the number of genetic components, confirming that efficient classification of the groups was possible using a small number of marker sets. In a verification study including additional chicken breeds and samples, the accuracy did not significantly change, and the target chicken group could be clearly distinguished from the other populations. Conclusions: The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to explore the minimum combination of markers that can distinguish varieties among a large number of SNP markers.
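The abstract mentions LD-pruned SNPs without describing the pruning procedure. As a rough illustration only, the sketch below shows one common greedy approach: a SNP is kept only if its squared correlation (r²) with every SNP already kept is below a threshold. The function names, the 0.8 threshold, and the toy genotype dosages are all assumptions made for this example and are not taken from the study.

```python
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as unlinked
    return cov * cov / (var_x * var_y)

def ld_prune(genotypes, threshold=0.8):
    """Greedy pruning: keep a SNP only if r^2 with every kept SNP is below threshold."""
    kept = []
    for name, dosages in genotypes.items():
        if all(r_squared(dosages, genotypes[k]) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy dosages for six hypothetical samples (made-up values).
genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 2],
    "snp_b": [0, 1, 2, 0, 1, 2],  # identical to snp_a, so r^2 = 1 and it is dropped
    "snp_c": [2, 0, 1, 1, 2, 0],  # r^2 with snp_a is 0.25, so it is kept
}
pruned = ld_prune(genotypes)
print(pruned)
```

Dedicated genetics tools typically prune within sliding windows along each chromosome; this sketch compares all pairs, which is only practical for small marker sets.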

Identification of Target Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs

Animals ◽

10.3390/ani11010241 ◽

2021 ◽

Vol 11 (1) ◽

pp. 241

Author(s):

Dongwon Seo ◽

Sunghyun Cho ◽

Prabuddha Manjula ◽

Nuri Choi ◽

Young-Kuk Kim ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Fixation Index ◽

Machine Learning Classification ◽

Genetic Components ◽

Marker Combination ◽

A Genome ◽

Minimum Number ◽

Native Chickens

A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence with respect to the origin of the population. This would facilitate the protection of native genetic resources in the market of each country. In this study, a total of 283 samples from 20 lines, which consisted of Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were analyzed to determine the optimal marker combination comprising the minimum number of markers, using a 600k high-density single nucleotide polymorphism (SNP) array. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group from control chicken groups. In the marker selection process, a total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance and fixation index (Fst) values between the case and control groups, and they reduced the number of genetic components required, confirming that efficient classification of the groups was possible by using a small number of marker sets. In a verification study including additional chicken breeds and samples (12 lines and 182 samples), the accuracy did not significantly change, and the target chicken group could be clearly distinguished from the other populations. The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to determine the optimal marker combination with the minimum number of markers that can distinguish the target population among a large number of SNP markers.
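The fixation index (Fst) mentioned above measures how strongly allele frequencies differ between groups. The abstract does not say which estimator was used, so the following is only a minimal sketch of Wright's classical definition, Fst = (H_T - H_S) / H_T, for a single biallelic SNP and two equally weighted groups. The function name and the example frequencies are assumptions for illustration.

```python
def fst(p_case, p_control):
    """Wright's Fst for one biallelic SNP, given allele frequencies in two
    equally weighted groups (an assumption of this sketch)."""
    # H_S: mean expected heterozygosity within the groups.
    h_s = (2 * p_case * (1 - p_case) + 2 * p_control * (1 - p_control)) / 2
    # H_T: expected heterozygosity of the pooled population.
    p_bar = (p_case + p_control) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # allele fixed in both groups: no differentiation to measure
    return (h_t - h_s) / h_t

print(fst(0.5, 0.5))  # same frequency in both groups
print(fst(0.9, 0.1))  # strongly diverged frequencies
print(fst(1.0, 0.0))  # alternative alleles fixed in each group
```

Under this definition, markers whose allele frequencies differ sharply between the case and control groups have Fst close to 1, which is consistent with the reported increase in Fst for the selected marker sets.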

Towards the Minimum Number of Wearables to Recognize Signer-Independent Italian Sign Language with Machine-Learning Algorithms

IEEE Transactions on Instrumentation and Measurement ◽

10.1109/tim.2021.3109732 ◽

2021 ◽

pp. 1-1

Author(s):

Alexandre Calado ◽

Vito Errico ◽

Giovanni Saggio

Keyword(s):

Machine Learning ◽

Sign Language ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Minimum Number

PSIX-15 Assessment of machine learning algorithms for prediction of Aleutian disease in American mink

Journal of Animal Science ◽

10.1093/jas/skab235.484 ◽

2021 ◽

Vol 99 (Supplement_3) ◽

pp. 264-265

Author(s):

Duy Ngoc Do ◽

Guoyu Hu ◽

Younes Miar

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Models ◽

American Mink ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Enzyme Linked Immunosorbent Assay ◽

Linear Discriminant ◽

Machine Learning Classification

American mink (Neovison vison) is the major source of fur for the fur industry worldwide, and Aleutian disease (AD) causes severe financial losses to the mink industry. Different methods have been used to diagnose AD in mink, but a combination of several methods can be the most appropriate approach for the selection of AD-resilient mink. The iodine agglutination test (IAT) and counterimmunoelectrophoresis (CIEP) are commonly employed in the test-and-remove strategy, while enzyme-linked immunosorbent assay (ELISA) and packed-cell volume (PCV) are complementary methods. However, using multiple methods is expensive, which hinders the correct use of AD tests in selection. This research assessed AD classification based on machine learning algorithms. A total of 1,830 individuals were tested for AD using these tests in an AD-positive mink farm (Canadian Centre for Fur Animal Research, NS, Canada). The accuracy of classification for CIEP was evaluated based on sex information and IAT, ELISA, and PCV test results, using seven machine learning classification algorithms (Random Forest, Artificial Neural Networks, C50Tree, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) implemented with the caret package in R. The accuracy of prediction varied among the methods. Overall, Random Forest was the best-performing algorithm for the current dataset, with an accuracy of 0.89 in the training data and 0.94 in the testing data. Our work demonstrated the utility and relative ease of using machine learning algorithms to assess the CIEP information and consequently reduce the cost of AD tests. However, further work is needed to include production and reproduction information in the models and to extend phenotypic collection to increase the accuracy of current methods.

Learning Predictors from Multidimensional Data with Tensor Factorizations

Aresty Rutgers Undergraduate Research Journal ◽

10.14713/arestyrurj.v1i3.165 ◽

2021 ◽

Vol 1 (3) ◽

Author(s):

Soo Min Kwon ◽

Anand D. Sarwate

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Multidimensional Data ◽

Tensor Structure ◽

Statistical Machine Learning ◽

Machine Learning Classification ◽

Independent Variables ◽

Tensor Factorizations ◽

The Relationship

Statistical machine learning algorithms often involve learning a linear relationship between dependent and independent variables. This relationship is modeled as a vector of numerical values, commonly referred to as weights or predictors. These weights allow us to make predictions, and the quality of these weights influences the accuracy of our predictions. However, when the dependent variable inherently possesses a more complex, multidimensional structure, it becomes increasingly difficult to model the relationship with a vector. In this paper, we address this issue by investigating machine learning classification algorithms with multidimensional (tensor) structure. By imposing tensor factorizations on the predictors, we can better model the relationship, as the predictors would take the form of the data in question. We empirically show that our approach works more efficiently than the traditional machine learning method when the data possesses both an exact and an approximate tensor structure. Additionally, we show that estimating predictors with these factorizations also allows us to solve for fewer parameters, making computation more feasible for multidimensional data.
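To make the parameter saving concrete, here is a small sketch for the two-dimensional case. It assumes a rank-R factorization of the weight matrix, W = Σ_r a_r b_rᵀ, which is one possible tensor factorization and not necessarily the one used in the paper. All names and numbers are illustrative. The score ⟨W, X⟩ can be computed directly from the factors, and the factors need R·(I + J) parameters instead of I·J.

```python
def outer_sum(a_factors, b_factors):
    """Build W = sum_r a_r b_r^T from lists of rank-1 factors."""
    rows, cols = len(a_factors[0]), len(b_factors[0])
    w = [[0.0] * cols for _ in range(rows)]
    for a, b in zip(a_factors, b_factors):
        for i in range(rows):
            for j in range(cols):
                w[i][j] += a[i] * b[j]
    return w

def predict_full(w, x):
    """Linear score <W, X> using the full weight matrix."""
    return sum(w[i][j] * x[i][j] for i in range(len(w)) for j in range(len(w[0])))

def predict_factored(a_factors, b_factors, x):
    """Same score computed from the factors: sum_r a_r^T X b_r."""
    total = 0.0
    for a, b in zip(a_factors, b_factors):
        total += sum(a[i] * x[i][j] * b[j]
                     for i in range(len(a)) for j in range(len(b)))
    return total

def parameter_counts(rows, cols, rank):
    """Return (full, factored) numbers of weights to estimate."""
    return rows * cols, rank * (rows + cols)

a_factors = [[1.0, 2.0, 0.0]]            # one factor of length I = 3
b_factors = [[1.0, -1.0]]                # one factor of length J = 2
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # one 3 x 2 data sample

score_full = predict_full(outer_sum(a_factors, b_factors), x)
score_factored = predict_factored(a_factors, b_factors, x)
print(score_full, score_factored, parameter_counts(100, 100, 2))
```

For a classifier, the sign of the score (or a logistic function of it) would give the predicted class. The saving grows quickly with the data dimensions: a 100 × 100 weight matrix has 10,000 entries, while a rank-2 factorization has 400.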

Predicting Vasovagal Syncope for Paraplegia Patients Using Average Weighted Ensemble Technique

Journal of Mobile Multimedia ◽

10.13052/jmm1550-4646.1817 ◽

2021 ◽

Author(s):

V. Vinodhini ◽

Akula Vishalakshi ◽

G. Naga Chandrika ◽

S. Sankar ◽

Somula Ramasubbareddy

Keyword(s):

Machine Learning ◽

Vasovagal Syncope ◽

Correct Diagnosis ◽

Machine Learning Algorithms ◽

Support Vector ◽

Ensemble Technique ◽

Machine Learning Classification ◽

Severe Fatigue ◽

Artery Disease ◽

Serious Disease

Vasovagal syncope (VVS) refers to fainting caused by a drop in blood flow to the brain, and it is a more serious condition in paraplegia patients. Warning signs include lightheadedness, nausea, severe fatigue, and an elevated heart rate. As a result, it is important to seek care as soon as possible after experiencing syncope. With a correct diagnosis and appropriate care, the majority of patients may avoid complications from syncope. Syncope appears to be a sign of COVID-19 in people with coronary artery disease. Furthermore, a sudden heart attack might result in acute syncope. In some circumstances, machine learning classification techniques may not be precise. For paraplegia patients, predicting vasovagal syncope requires more precise results in order to save their lives. The aim of this paper is to use an ensemble technique to improve the accuracy of conventional machine learning algorithms. An EEG (electroencephalogram) brainwave dataset from Kaggle is used to implement it. The accuracy of the proposed average weighted ensemble technique (AWET) algorithm is 82%. It improves the accuracy by 17% compared to Support Vector Machine, Random Forest, Naive Bayes, and MultiLayer Perceptron classifiers.
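The abstract does not spell out how the ensemble combines its base classifiers. One plausible reading of "average weighted ensemble" is a weighted average of the positive-class probabilities produced by each model. The sketch below illustrates that idea only. The weights (for instance, each model's validation accuracy), the 0.5 threshold, and the probabilities are made-up assumptions and do not reproduce the authors' AWET algorithm.

```python
def weighted_average_ensemble(probabilities, weights):
    """Combine per-model positive-class probabilities with normalized weights."""
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total

def predict_label(probabilities, weights, threshold=0.5):
    """Return 1 if the weighted average probability reaches the threshold."""
    return int(weighted_average_ensemble(probabilities, weights) >= threshold)

# Three hypothetical base models; the first one is trusted most.
probs = [0.9, 0.3, 0.2]
weighted = predict_label(probs, weights=[0.9, 0.5, 0.5])
unweighted = predict_label(probs, weights=[1.0, 1.0, 1.0])
print(weighted, unweighted)
```

In this toy case the weighted vote follows the most trusted model, while the plain average does not, which shows how the choice of weights can change the ensemble's decision.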

Effectiveness of Classification Methods on the Diabetes System

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2021/v12i330287 ◽

2021 ◽

pp. 33-43

Author(s):

Ahmed T. Shawky ◽

Ismail M. Hagag

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Early Stage ◽

Research Paper ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor ◽

Machine Learning Classification ◽

Bayes Algorithm

In today’s world, data mining and classification are considered among the most important techniques, as data is generated by many sources. However, extracting useful knowledge from this data is the real challenge, and this paper addresses it by using machine learning classifiers to draw meaningful results from data. The aim of this research paper is to design a model to detect diabetes in patients with high accuracy. To this end, it uses five machine learning classification algorithms, namely Decision Tree, Support Vector Machine (SVM), Random Forest, Naive Bayes, and K-Nearest Neighbor (K-NN), with the purpose of predicting diabetes at an early stage. Finally, we compared the performance of these algorithms and concluded that the K-NN algorithm achieves the best accuracy (81.16%), followed by the Naive Bayes algorithm (76.06%).
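For readers unfamiliar with K-NN, the best-performing method reported above, the following is a minimal sketch of the general technique: a query point receives the majority label of its k closest training points. The two-feature toy data, the value k = 3, and the function name are assumptions for illustration and are unrelated to the diabetes dataset used in the paper.

```python
from collections import Counter
from math import dist

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points
    (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy, already-scaled features; label 1 marks the hypothetical positive class.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = [0, 0, 0, 1, 1, 1]

low = knn_predict(train_x, train_y, (1.1, 1.0))
high = knn_predict(train_x, train_y, (3.0, 3.0))
print(low, high)
```

Because K-NN relies on distances, features measured on different scales are usually standardized first; otherwise the feature with the largest range dominates the vote.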

Variation in the reporting of elective surgeries and its influence on patient safety indicators

10.1101/2021.05.29.21257635 ◽

2021 ◽

Author(s):

Kenneth John Locey ◽

Thomas A Webb ◽

Sana Farooqui ◽

Bala Hota

Keyword(s):

Machine Learning ◽

Patient Safety ◽

Quantile Regression ◽

Claims Data ◽

Machine Learning Algorithms ◽

Patient Safety Indicators ◽

Hospital Safety ◽

Minimum Number ◽

Related Group ◽

Low Volume

Background: US hospital safety is routinely measured via patient safety indicators (PSIs). Receiving a score for most PSIs requires a minimum number of qualifying cases, which are partly determined by whether the associated diagnosis-related group (DRG) was surgical and whether the surgery was elective. While these criteria can exempt hospitals from PSIs, it remains to be seen whether exemption is driven by low volume, small numbers of DRGs, or perhaps, policies that determine how procedures are classified as elective. Methods: Using Medicare inpatient claims data from 4,069 hospitals between 2015 and 2017, we examined how percentages of elective procedures relate to numbers of surgical claims and surgical DRGs. We used a combination of quantile regression and machine learning based anomaly detection to characterize these relationships and identify outliers. We then used a set of machine learning algorithms to test whether outliers were explained by the DRGs they reported. Results: Average percentages of elective procedures generally decreased from 100% to 60% in relation to the number of surgical claims and the number of DRGs among them. Some providers with high volumes of claims had anomalously low percentages of elective procedures (5% to 40%). These low elective outliers were not explained by the particular surgical DRGs among their claims. However, among hospitals exempted from PSIs, those with the greatest volume of claims were always low elective outliers. Conclusion: Some hospitals with relatively high numbers of surgical claims may have classified procedures as non-elective in a way that ultimately exempted them from certain PSIs.

Three simple steps to improve the interpretability of EEG-SVM studies

10.1101/2021.12.14.472588 ◽

2021 ◽

Author(s):

Coralie Joucla ◽

Damien Gabriel ◽

Emmanuel Haffen ◽

Juan-Pablo Ortega

Keyword(s):

Machine Learning ◽

Model Development ◽

Research Literature ◽

Machine Learning Algorithms ◽

Support Vector ◽

Machine Learning Classification ◽

Diagnosis And Prognosis ◽

EEG Data ◽

Clinical Adoption

Research in machine-learning classification of electroencephalography (EEG) data offers important perspectives for the diagnosis and prognosis of a wide variety of neurological and psychiatric conditions, but the clinical adoption of such systems remains low. We propose here that many of the difficulties in translating EEG-machine learning research to the clinic result from consistent inaccuracies in technical reporting, which severely impair the interpretability of the often-high claims of performance. Taking as an example a major class of machine-learning algorithms used in EEG research, the support-vector machine (SVM), we highlight three important aspects of model development (normalization, hyperparameter optimization and cross-validation) and show that, while these three aspects can make or break the performance of the system, they are left entirely undocumented in a shockingly vast majority of the research literature. Providing a more systematic description of these aspects of model development constitutes three simple steps to improve the interpretability of EEG-SVM research and, in fine, its clinical adoption.
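Normalization and cross-validation interact: if normalization statistics are computed on the whole dataset before splitting, information from the test fold leaks into training. The sketch below shows one way to keep the two steps separate by fitting a z-score on each training fold only. It is an illustration under assumed names and toy values, not the procedure recommended in the paper, and it omits the SVM and hyperparameter search entirely.

```python
from statistics import mean, pstdev

def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, non-shuffled folds."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples)
                     if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

def fit_zscore(values):
    """Estimate normalization parameters from training values only."""
    mu, sigma = mean(values), pstdev(values)
    return mu, (sigma if sigma > 0 else 1.0)

def apply_zscore(values, mu, sigma):
    """Apply previously fitted parameters to any set of values."""
    return [(v - mu) / sigma for v in values]

feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy values, not EEG data
scaled_folds = []
for train_idx, test_idx in kfold_indices(len(feature), 3):
    mu, sigma = fit_zscore([feature[i] for i in train_idx])
    # The test fold is scaled with the training-fold statistics.
    scaled_folds.append(apply_zscore([feature[i] for i in test_idx], mu, sigma))
print(scaled_folds)
```

Hyperparameter optimization raises the same concern: selecting hyperparameters on the data used to report performance tends to inflate the result, so a nested inner split is often used for that step.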

A Comparative Evaluation of Supervised Machine Learning Classification Techniques for Engineering Design Applications

Journal of Mechanical Design ◽

10.1115/1.4044524 ◽

2019 ◽

Vol 141 (12) ◽

Author(s):

Conner Sharpe ◽

Tyler Wiest ◽

Pingfeng Wang ◽

Carolyn Conner Seepersad

Keyword(s):

Machine Learning ◽

Engineering Design ◽

Design Space ◽

Optimization Problems ◽

Machine Learning Algorithms ◽

Training Data ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Design Exploration ◽

Machine Learning Classification

Supervised machine learning techniques have proven to be effective tools for engineering design exploration and optimization applications, in which they are especially useful for mapping promising or feasible regions of the design space. The design space mappings can be used to inform early-stage design exploration, provide reliability assessments, and aid convergence in multiobjective or multilevel problems that require collaborative design teams. However, the accuracy of the mappings can vary based on problem factors such as the number of design variables, presence of discrete variables, multimodality of the underlying response function, and amount of training data available. Additionally, there are several useful machine learning algorithms available, and each has its own set of algorithmic hyperparameters that significantly affect accuracy and computational expense. This work elucidates the use of machine learning for engineering design exploration and optimization problems by investigating the performance of popular classification algorithms on a variety of example engineering optimization problems. The results are synthesized into a set of observations to provide engineers with intuition for applying these techniques to their own problems in the future, as well as recommendations based on problem type to aid engineers in algorithm selection and utilization.

Brain Asymmetry Detection and Machine Learning Classification for Diagnosis of Early Dementia

Sensors ◽

10.3390/s21030778 ◽

2021 ◽

Vol 21 (3) ◽

pp. 778

Author(s):

Nitsa J. Herzog ◽

George D. Magoulas

Keyword(s):

Machine Learning ◽

Alzheimer's Disease ◽

Structural Changes ◽

Low Cost ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Progressive Dementia ◽

Machine Learning Classification

Early identification of degenerative processes in the human brain is considered essential for providing proper care and treatment. This may involve detecting structural and functional cerebral changes such as changes in the degree of asymmetry between the left and right hemispheres. Changes can be detected by computational algorithms and used for the early diagnosis of dementia and its stages (amnestic early mild cognitive impairment (EMCI), Alzheimer’s Disease (AD)), and can help to monitor the progress of the disease. In this vein, the paper proposes a data processing pipeline that can be implemented on commodity hardware. It uses features of brain asymmetries, extracted from MRI of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, for the analysis of structural changes, and machine learning classification of the pathology. The experiments provide promising results, distinguishing between subjects with normal cognition (NC) and patients with early or progressive dementia. The supervised machine learning algorithms and convolutional neural networks tested reach accuracies of 92.5% and 75.0% for NC vs. EMCI, and 93.0% and 90.5% for NC vs. AD, respectively. The proposed pipeline offers a promising low-cost alternative for the classification of dementia and can potentially be useful for other brain degenerative disorders that are accompanied by changes in brain asymmetries.