Identification of Target Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs

Animals ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 241
Author(s):  
Dongwon Seo ◽  
Sunghyun Cho ◽  
Prabuddha Manjula ◽  
Nuri Choi ◽  
Young-Kuk Kim ◽  
...  

A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence with respect to the origin of the population. This would facilitate the protection of native genetic resources in the market of each country. In this study, a total of 283 samples from 20 lines, which consisted of Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were analyzed to determine the optimal marker combination comprising the minimum number of markers, using a 600 k high-density single nucleotide polymorphism (SNP) array. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group for comparison with control chicken groups. During marker selection, a total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance and fixation index (Fst) values between the case and control groups, and they reduced the number of genetic components required, confirming that efficient classification of the groups was possible by using a small number of marker sets. In a verification study including additional chicken breeds and samples (12 lines and 182 samples), the accuracy did not significantly change, and the target chicken group could be clearly distinguished from the other populations. The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to determine the optimal marker combination with the minimum number of markers that can distinguish the target population among a large number of SNP markers.
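The abstract reports that the selected markers raised the fixation index (Fst) between the case and control groups but does not say which estimator was used. As a rough illustration only, the sketch below uses the basic heterozygosity-based definition, Fst = (H_T - H_S) / H_T, for a single biallelic SNP with the two groups weighted equally. The function names and the 0/1/2 genotype coding are assumptions made for this example, not details from the study.

```python
def allele_frequency(genotypes):
    """Alternate-allele frequency for genotypes coded as 0, 1, or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))


def fst_biallelic(case_genotypes, control_genotypes):
    """Heterozygosity-based Fst for one SNP, weighting both groups equally.

    Illustrative estimator only; the abstract does not name the one it used.
    """
    p_case = allele_frequency(case_genotypes)
    p_control = allele_frequency(control_genotypes)
    p_mean = (p_case + p_control) / 2

    # Expected heterozygosity of the pooled population.
    h_total = 2 * p_mean * (1 - p_mean)
    if h_total == 0:
        # The SNP is monomorphic across both groups, so it carries no signal.
        return 0.0

    # Mean expected heterozygosity within the two groups.
    h_within = (2 * p_case * (1 - p_case) + 2 * p_control * (1 - p_control)) / 2
    return (h_total - h_within) / h_total


def rank_snps_by_fst(case_matrix, control_matrix):
    """Return SNP column indices sorted from highest to lowest Fst.

    Each matrix is a list of samples; each sample is a list of genotypes.
    """
    n_snps = len(case_matrix[0])
    scores = []
    for j in range(n_snps):
        case_col = [sample[j] for sample in case_matrix]
        control_col = [sample[j] for sample in control_matrix]
        scores.append((fst_biallelic(case_col, control_col), j))
    return [j for _, j in sorted(scores, key=lambda s: (-s[0], s[1]))]
```

A ranking like this could serve as a first filter before a classifier is trained on the top-scoring markers. The study itself combined GWAS, LD pruning, PCA, and machine learning models rather than relying on Fst alone.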

2020 ◽  
Author(s):  
Dongwon Seo ◽  
Sunghyun Cho ◽  
Prabuddha Manjula ◽  
Nuri Choi ◽  
Young Kuk Kim ◽  
...  

Abstract Background: A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence with respect to the origin of the population. This would also facilitate the protection of genetic resources, especially in developing countries. Methods: In this study, a total of 283 samples from 20 lines, consisting of Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were used to find the minimum marker combination using a 600k high-density single nucleotide polymorphism (SNP) array. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group from control chicken groups. To verify the selected markers, a total of 182 samples from 12 lines were used to confirm the change in the accuracy of target chicken breed identification. Results: A total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance between the case and control groups and reduced the number of genetic components, confirming that efficient classification of the groups was possible using a small number of marker sets. In a verification study including additional chicken breeds and samples, the accuracy did not significantly change, and the target chicken group could be clearly distinguished from the other populations. Conclusions: The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to explore the minimum combination of markers that can distinguish varieties among a large number of SNP markers.


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 264-265
Author(s):  
Duy Ngoc Do ◽  
Guoyu Hu ◽  
Younes Miar

Abstract American mink (Neovison vison) is the major source of fur for the fur industry worldwide, and Aleutian disease (AD) causes severe financial losses to the mink industry. Different methods have been used to diagnose AD in mink, but a combination of several methods can be the most appropriate approach for selecting AD-resilient mink. The iodine agglutination test (IAT) and counterimmunoelectrophoresis (CIEP) are commonly employed in test-and-remove strategies, while enzyme-linked immunosorbent assay (ELISA) and packed-cell volume (PCV) methods are complementary. However, using multiple methods is expensive, which hinders the correct use of AD tests in selection. This research assessed AD classification based on machine learning algorithms. Aleutian disease was tested in 1,830 individuals using these tests on an AD-positive mink farm (Canadian Centre for Fur Animal Research, NS, Canada). The accuracy of classification for CIEP was evaluated based on sex information and on IAT, ELISA, and PCV test results, implemented in seven machine learning classification algorithms (Random Forest, Artificial Neural Networks, C50Tree, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) using the caret package in R. The accuracy of prediction varied among the methods. Overall, Random Forest was the best-performing algorithm for the current dataset, with an accuracy of 0.89 in the training data and 0.94 in the testing data. Our work demonstrated the utility and relative ease of using machine learning algorithms to assess the CIEP information, thereby reducing the cost of AD tests. However, further work should include production and reproduction information in the models and extend phenotype collection to increase the accuracy of current methods.


2021 ◽  
Vol 13 (19) ◽  
pp. 3906
Author(s):  
Laura Crocetti ◽  
Matthias Schartner ◽  
Benedikt Soja

Global navigation satellite systems (GNSS) provide globally distributed station coordinate time series that can be used for a variety of applications, such as the definition of a terrestrial reference frame. A reliable estimation of the coordinate time series trends gives valuable information about station movements during the measured time period. Detecting discontinuities of various origins in such time series is crucial for accurate and robust velocity estimation. At present, there is no fully automated standard method for detecting discontinuities. Instead, discontinuity-catalogues are frequently used, which provide information about when a device was changed or an earthquake occurred. However, these catalogues are known to be incomplete. This study investigates the suitability of fully data-driven machine learning classification algorithms for detecting discontinuities caused by earthquakes in station coordinate time series without the need for external information. For this study, Japan was selected as a testing area. Ten different machine learning algorithms were tested. Random Forest achieved the best performance, with an F1 score of 0.77, a recall of 0.78, and a precision of 0.76. Overall, 525 of 565 recorded earthquakes in the test data were correctly classified. Splitting the time series into chunks of 21 days led to the best performance. Furthermore, it was beneficial to combine the three (normalized) components of the GNSS solution into one sample, and adding the value range as an additional feature improved the result. Thus, this work demonstrates how machine learning algorithms can be used to detect discontinuities in GNSS time series.
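The sample construction described in this abstract (21-day chunks, three normalized components combined into one sample, value range added as a feature) can be sketched as follows. The abstract does not specify the normalization, whether chunks overlap, or whether the range is taken per component, so min-max scaling, non-overlapping chunks, and one range value per component are assumptions made for this example. The function names are hypothetical as well.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] and also return the original value range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat chunk has no variation to scale.
        return [0.0] * len(values), 0.0
    return [(v - lo) / span for v in values], span


def build_samples(east, north, up, chunk_len=21):
    """Turn three daily coordinate series into fixed-length feature vectors.

    Each sample holds the normalized east, north, and up values of one chunk,
    followed by the value range of each component. The range restores the
    amplitude information that normalization removes.
    """
    n_days = min(len(east), len(north), len(up))
    samples = []
    # Non-overlapping chunks; a trailing partial chunk is dropped.
    for start in range(0, n_days - chunk_len + 1, chunk_len):
        features, ranges = [], []
        for component in (east, north, up):
            chunk = component[start:start + chunk_len]
            normalized, span = min_max_normalize(chunk)
            features.extend(normalized)
            ranges.append(span)
        samples.append(features + ranges)
    return samples
```

With the default chunk length, each sample has 3 × 21 + 3 = 66 features. A step in any component, such as an offset caused by an earthquake, shows up both in the normalized shape and in a large range value, which is what a classifier such as Random Forest could then learn from.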


Author(s):  
Munder Abdulatef Al-Hashem ◽  
Ali Mohammad Alqudah ◽  
Qasem Qananwah

Knowledge extraction in the healthcare field is a very challenging task because of problems such as noisy and imbalanced datasets. Such datasets are obtained from clinical studies, where uncertainty and variability are common. Recently, a large number of machine learning algorithms have been considered and evaluated to check their validity for use in the medical field. Usually, classification algorithms are compared against medical experts who specialize in diagnosing certain diseases, and classifiers are evaluated methodically by applying performance metrics. The performance metrics include accuracy, sensitivity, and specificity, computed from the confusion matrix of each algorithm. We used eight well-known machine learning algorithms and evaluated their performance on six medical datasets. Based on the experimental results, we conclude that the XGBoost and K-Nearest Neighbor classifiers were the best overall across the datasets used and can be used for diagnosing various diseases.
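The abstract ties accuracy, sensitivity, and specificity to the confusion matrix. For a binary classifier these follow directly from the four confusion-matrix counts, as in the minimal sketch below. The function name and the example counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn count diseased cases predicted positive/negative;
    tn/fp count healthy cases predicted negative/positive.
    """
    total = tp + fp + tn + fn
    positives = tp + fn  # all truly diseased cases
    negatives = tn + fp  # all truly healthy cases
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity (recall): share of diseased cases that were detected.
        "sensitivity": tp / positives if positives else 0.0,
        # Specificity: share of healthy cases that were correctly cleared.
        "specificity": tn / negatives if negatives else 0.0,
    }
```

On imbalanced medical datasets, accuracy alone can look high even when few diseased cases are detected, which is why sensitivity and specificity are usually reported alongside it.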


2021 ◽  
Vol 2095 (1) ◽  
pp. 012058
Author(s):  
Xiaoyu Xian ◽  
Haichuan Tang ◽  
Yin Tian ◽  
Qi Liu ◽  
Yuming Fan

Abstract This paper addresses electric motor fault diagnosis using supervised machine learning classification. A total of 15 distinct fault types are classified, and multi-label strategies are used to classify concurrent faults. We explored, developed, and compared the performance of different types of binary (fault/non-fault), multi-class (fault type), and multi-label (single fault versus combination fault) classifiers. To evaluate the effectiveness of fault identification and classification, we used different supervised machine learning methods, including random forest classification, support vector machines, and neural network classification. Through experiments, we compared these methods over 4 classification regimes and summarized the most suitable machine learning algorithms for different aspects of health diagnosis in the traction motor domain.


2020 ◽  
Vol 15 ◽  
Author(s):  
Shivani Aggarwal ◽  
Kavita Pandey

Background: Polycystic ovary syndrome, commonly known as PCOS, affects up to 18% of women of reproductive age. PCOS is the most commonly occurring hormone-related disorder. Symptoms of PCOS include irregular periods, increased facial and body hair growth, weight gain, darkening of the skin, diabetes, and trouble conceiving (infertility). Patients suffering from PCOS also have a range of metabolic abnormalities. These abnormalities can lead to disorders that increase the risk of insulin resistance, type 2 diabetes, and impaired glucose tolerance (a sign of prediabetes). Family members of women suffering from PCOS are also at higher risk of developing the same metabolic abnormalities. Obesity and overweight status contribute to insulin resistance in PCOS. Objective: Several new technologies are now available to diagnose PCOS, one of which is machine learning, because such algorithms can adapt when exposed to new data. These algorithms learn from past experience to produce reliable and repeatable decisions. In this article, machine learning algorithms are used to identify the important features for diagnosing PCOS. Methods: Several classification algorithms, such as Support Vector Machine (SVM), Logistic Regression, Gradient Boosting, Random Forest, Decision Tree, and K-Nearest Neighbor (KNN), use well-organized test datasets to classify large numbers of records. Initially, a dataset of 541 instances and 41 attributes was taken to apply the prediction models, and manual feature selection was performed on it. Results: After feature selection, a set of 12 attributes was identified that plays a crucial role in diagnosing PCOS. Conclusion: Several research efforts are progressing toward diagnosing PCOS, but until now the relevant features had not been identified.


2019 ◽  
Vol 133 (10) ◽  
pp. 875-878 ◽  
Author(s):  
J W Moor ◽  
V Paleri ◽  
J Edwards

Abstract Background: Machine learning algorithms could potentially be used to classify patients referred on the two-week wait pathway for suspected head and neck cancer. Patients could be classified into ‘predicted cancer’ or ‘predicted non-cancer’ groups. Methods: A variety of machine learning algorithms were assessed using the clinical data of 5082 patients. These patients had previously been referred via the two-week wait pathway for suspected head and neck cancer to two separate tertiary referral centres in the UK. Outcomes from machine learning classification were analysed in comparison to known clinical diagnoses. Results: Variational logistic regression was the most clinically useful technique of those chosen to perform the analysis and patient classification; the proportion of patients correctly classified as having ‘non-cancer’ was 25.8 per cent, with a false negative rate of 1 out of 1000. Conclusion: Machine learning algorithms can accurately and effectively classify patients referred with suspected head and neck cancer symptoms.


Diabetes is a major health problem that has reached alarming levels; today approximately half a billion people worldwide are living with diabetes. Diabetes is a condition that impairs the body’s ability to process glucose in the blood, otherwise known as blood sugar. It is a metabolic disease that causes high blood sugar. The hormone insulin moves sugar from the blood into the cells to be stored for energy. With diabetes, the body either doesn’t make enough insulin or can’t effectively use the insulin it does make. The aim of this research is to design a method or prototype that can detect or predict diabetes in patients with high precision. Therefore, different machine learning classification algorithms, namely decision tree, support vector machine, Naïve Bayes, and k-NN, are used in this work for the prediction of diabetes. Two databases are used for experimentation. The first was created from hospital data on 82 patients, and the second is the readily available Pima Indian Diabetes database. The performance of the different machine learning algorithms is evaluated on measures such as precision, recall, F-measure, and accuracy. The objective of this research is to study the accuracy of different machine learning algorithms and hence identify a set of suitable algorithms for the prediction of diabetes in further research work.


2018 ◽  
Vol 1 (1) ◽  
Author(s):  
Jingwen Sun ◽  
Weixing Du ◽  
Niancai Shi

The kNN algorithm is a well-known pattern recognition method and one of the best text classification algorithms. It is also one of the simplest machine learning classification algorithms. In this paper, we summarize the kNN algorithm and related literature; introduce the idea, principle, implementation steps, and implementation code of the kNN algorithm in detail; and analyze the advantages and disadvantages of the algorithm and its various improvement schemes. This paper also introduces the development of the kNN algorithm and the important published papers. Finally, applications of the kNN algorithm are introduced, with an emphasis on its implementation in text classification.
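The paper's own implementation code is not reproduced in this abstract. As a generic illustration of the idea it summarizes, the sketch below classifies a query point by majority vote among its k nearest training points, using Euclidean distance. The function name, the choice of distance, and the tie-breaking behaviour are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(training_data, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbors.

    `training_data` is a list of (feature_vector, label) pairs. Distances are
    Euclidean. If several labels tie for the most votes, the label of the
    closest tied neighbor wins, because Counter keeps insertion order.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")

    # Sort training points by distance to the query and keep the k closest.
    neighbors = sorted(
        training_data,
        key=lambda pair: math.dist(pair[0], query),
    )[:k]

    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For text classification, the feature vectors would typically be document representations such as term-frequency vectors, and a similarity measure suited to text (for example cosine similarity) is often used in place of Euclidean distance.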

