Application of Machine Learning in Animal Disease Analysis and Prediction

2020 ◽  
Vol 15 ◽  
Author(s):  
Shuwen Zhang ◽  
Qiang Su ◽  
Qin Chen

Abstract: Major animal diseases pose a serious threat to animal husbandry and to human health. With deepening globalization and increasingly abundant data resources, using big data to analyze and predict animal diseases is becoming more and more important. Machine learning focuses on enabling computers to learn from data and to use what they have learned for analysis and prediction. This paper first introduces the animal epidemic situation and machine learning, then briefly reviews the application of machine learning to animal disease analysis and prediction. Machine learning is mainly divided into supervised and unsupervised learning. The supervised methods covered are support vector machines, naive Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning, and AdaBoost. The unsupervised methods covered are the expectation-maximization algorithm, principal component analysis, hierarchical clustering, and MaxEnt. The discussion aims to give readers a clearer picture of machine learning and of its application prospects in animal disease research.

2019 ◽  
Vol 6 (2) ◽  
pp. 226-235
Author(s):  
Muhammad Rangga Aziz Nasution ◽  
Mardhiya Hayaty

Machine learning, a branch of computer science, has become a trend in recent years. It uses data and algorithms to build models that capture patterns in a dataset, and it also studies how a trained model can predict outputs based on those patterns. Two kinds of machine learning methods can be used for sentiment analysis: supervised learning and unsupervised learning. This study compares two supervised classification algorithms, K-Nearest Neighbor and Support Vector Machine, by building a model with each algorithm on sentiment text data. The comparison determines which algorithm is better in terms of accuracy and processing time. In accuracy, Support Vector Machine performed better, with 89.70% without K-Fold Cross Validation and 88.76% with K-Fold Cross Validation. In processing time, K-Nearest Neighbor performed better, with 0.0160 s without K-Fold Cross Validation and 0.1505 s with K-Fold Cross Validation.
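The comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the study's code: the tiny corpus, the TF-IDF features, the hyperparameters and the helper name `compare_knn_svm` are all hypothetical, and the timings it reports will not match the published figures.

```python
import time

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus; the study's sentiment dataset is not reproduced here.
POSITIVE = [
    "great product and fast delivery",
    "i really love this phone",
    "works very well and highly recommended",
    "excellent quality and very happy",
    "good service and friendly staff",
    "amazing experience and will buy again",
    "the app is fast and easy to use",
    "very satisfied with the purchase",
    "love the design and the quality",
    "happy with the fast service",
]
NEGATIVE = [
    "terrible product and slow delivery",
    "i really hate this phone",
    "does not work and very disappointed",
    "poor quality and very unhappy",
    "bad service and rude staff",
    "awful experience and never buying again",
    "the app is slow and hard to use",
    "very disappointed with the purchase",
    "hate the design and the quality",
    "unhappy with the slow service",
]
TEXTS = POSITIVE + NEGATIVE
LABELS = [1] * len(POSITIVE) + [0] * len(NEGATIVE)


def compare_knn_svm(texts, labels, folds=5, seed=0):
    """Return accuracy and wall-clock time for KNN and SVM, with and without k-fold CV."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.3, stratify=labels, random_state=seed
    )
    # Assumed feature extraction: TF-IDF bag of words inside each pipeline.
    models = {
        "knn": make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=3)),
        "svm": make_pipeline(TfidfVectorizer(), SVC(kernel="linear")),
    }
    results = {}
    for name, model in models.items():
        # Single train/test split (no cross-validation).
        start = time.perf_counter()
        holdout_accuracy = model.fit(x_train, y_train).score(x_test, y_test)
        holdout_seconds = time.perf_counter() - start

        # Stratified k-fold cross-validation on the full corpus.
        start = time.perf_counter()
        cv_accuracy = cross_val_score(model, texts, labels, cv=folds).mean()
        cv_seconds = time.perf_counter() - start

        results[name] = {
            "holdout_accuracy": holdout_accuracy,
            "holdout_seconds": holdout_seconds,
            "cv_accuracy": cv_accuracy,
            "cv_seconds": cv_seconds,
        }
    return results


if __name__ == "__main__":
    for name, metrics in compare_knn_svm(TEXTS, LABELS).items():
        print(name, metrics)
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary from being fitted on held-out text in either evaluation mode.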


The healthcare industry holds a vast and rapidly growing volume of patient data. Researchers have been using this data to help the industry improve how major diseases are handled, and to find ways of informing patients in time about symptoms so that major hazards can be avoided. Diabetes is one such disease, and it is growing at an alarming rate. It can cause severe damage, including blurred vision, myopia, burning extremities, and kidney and heart failure. It occurs when blood sugar levels reach a certain threshold, or when the body does not have enough insulin to regulate them. Patients affected by diabetes must therefore be informed so that proper treatment can be taken to control it, which makes early prediction and classification of diabetes significant. This work uses machine learning algorithms to improve the accuracy of diabetes prediction. A dataset obtained as the output of the K-means clustering algorithm was fed to an ensemble model with principal component analysis and K-means clustering. The ensemble method produced only eight incorrectly classified instances, the lowest among the methods compared. Its results were compared with those of random forest, support vector machine, decision tree, multilayer perceptron, and naïve Bayes classifiers applied to the same dataset. The experiments also showed that ensemble classifier models performed better than the base classifiers alone. All methods were run using 10-fold cross-validation.
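A rough sketch of this kind of evaluation, comparing a voting ensemble with its base classifiers under 10-fold cross-validation after PCA, might look like the following. Everything here is an assumption for illustration: the data are synthetic, the base learners and the hard-voting scheme are arbitrary choices, the K-means preprocessing step is omitted because the abstract does not specify it, and `evaluate_ensemble` is a hypothetical name.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def evaluate_ensemble(n_samples=300, seed=0):
    """Mean 10-fold CV accuracy for three base classifiers and their voting ensemble."""
    # Synthetic stand-in for a diabetes dataset (8 numeric features, binary label).
    X, y = make_classification(
        n_samples=n_samples, n_features=8, n_informative=5, random_state=seed
    )
    base = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "naive_bayes": GaussianNB(),
        "svm": SVC(),
    }
    # Hard voting: each base classifier casts one vote per sample.
    ensemble = VotingClassifier(estimators=list(base.items()), voting="hard")
    models = dict(base, ensemble=ensemble)

    scores = {}
    for name, model in models.items():
        # Scaling and PCA are refitted inside every fold to avoid leakage.
        pipeline = make_pipeline(StandardScaler(), PCA(n_components=5), model)
        scores[name] = cross_val_score(pipeline, X, y, cv=10).mean()
    return scores


if __name__ == "__main__":
    for name, score in evaluate_ensemble().items():
        print(f"{name}: {score:.3f}")
```

On synthetic data the ensemble is not guaranteed to beat every base classifier; the sketch only shows how such a comparison can be set up.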


Author(s):  
Wilfried Wöber ◽  
Papius Tibihika ◽  
Cristina Olaverri-Monreal ◽  
Lars Mehnen ◽  
Peter Sykacek ◽  
...  

For computer-vision-based approaches such as image classification (Krizhevsky et al. 2012), object detection (Ren et al. 2015) or pixel-wise weed classification (Milioto et al. 2017), machine learning is used for both feature extraction and processing (e.g. classification or regression). Historically, feature extraction (e.g. PCA; Ch. 12.1 in Bishop 2006) and processing were sequential and independent tasks (Wöber et al. 2013). Since the rise in 2012 (Krizhevsky et al. 2012) of convolutional neural networks (LeCun et al. 1989), a deep machine learning approach optimized for images, feature extraction for image analysis has become an automated procedure. A convolutional neural network uses a deep architecture of artificial neurons (Goodfellow 2016) for both feature extraction and processing. The parameters of the network are adjusted based on prior information such as image classes and on supervised learning procedures; this is known as the learning process. At the same time, geometric morphometrics (Tibihika et al. 2018, Cadrin and Friedland 1999) are used in biodiversity research for association analysis. These approaches use deterministic two-dimensional locations on digital images (landmarks; Mitteroecker et al. 2013), where each position corresponds to a biologically relevant region of interest. Because this methodology is based on scientific results and compresses image content into deterministic landmarks, no uncertainty regarding the landmark positions is taken into account, which leads to information loss (Pearl 1988). Both the reduction of this loss and the detection of novel knowledge can be achieved using machine learning. Supervised learning methods (e.g. neural networks or support vector machines; Ch. 5 and 6 in Bishop 2006) map data onto prior information (e.g. labels). This increases the performance of classification or regression but affects the latent representation of the data itself. Unsupervised learning (e.g. latent variable models) uses assumptions about data structure to extract latent representations without prior information. These representations do not have to be useful for data processing such as classification; for that reason, the choice between supervised learning, unsupervised learning and combinations of both must be made carefully, according to the application and data. In this work, we discuss unsupervised learning algorithms in terms of explainability, performance and theoretical restrictions in the context of known deep learning restrictions (Marcus 2018, Szegedy et al. 2014, Su et al. 2017). We analyse extracted features based on multiple image datasets and discuss shortcomings and performance for processing (e.g. reconstruction error or complexity measurement (Pincus 1997)) using principal component analysis (Wöber et al. 2013), independent component analysis (Stone 2004), deep neural networks (autoencoders; Ch. 14 in Goodfellow 2016) and Gaussian process latent variable models (Titsias and Lawrence 2010, Lawrence 2005).
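One of the evaluation criteria mentioned above, reconstruction error, can be illustrated for principal component analysis with a short sketch. The data below are random vectors standing in for flattened images, and `pca_reconstruction_errors` is a hypothetical helper, so this shows the general idea only and not the authors' pipeline.

```python
import numpy as np


def pca_reconstruction_errors(X, max_components):
    """Mean squared reconstruction error when keeping 1..max_components components."""
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    errors = []
    for k in range(1, max_components + 1):
        basis = vt[:k]                      # shape (k, n_features)
        latent = centered @ basis.T         # unsupervised latent representation
        reconstructed = latent @ basis + mean
        errors.append(float(np.mean((X - reconstructed) ** 2)))
    return errors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed stand-in for image data: 50 samples with 6 features each.
    data = rng.normal(size=(50, 6))
    for k, err in enumerate(pca_reconstruction_errors(data, 6), start=1):
        print(f"{k} components: MSE = {err:.6f}")
```

The error cannot increase as components are added, and it vanishes once all directions are kept, which is why a low reconstruction error alone says little about whether the latent representation is useful for classification.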


Supervised and unsupervised machine learning methods have been applied successfully to real-world problems in many domains. Indian music is built on the Raga structure; a Raga is a melodic framework for composition and improvisation. Identifying and indexing the Raga of Indian music data will improve the efficiency and accuracy of retrieval expected by e-learners, composers and classical music listeners. Identifying the Raga is a very difficult task for a naïve user, which makes machine learning algorithms a promising approach. The paper demonstrates K-means and Agglomerative clustering as unsupervised learning methods, and implements K Nearest Neighbor, Decision Tree, Support Vector Machine and Naïve Bayes classifiers as supervised learning methods. The data are split 70:30 into training and testing sets. Pitch Class Distribution features are extracted by identifying the pitch of every frame in an audio signal using the autocorrelation method. A comparison of these algorithms shows that the supervised learning methods outperformed the unsupervised ones.
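The feature extraction step can be sketched as below: estimate a pitch per frame from the autocorrelation peak, fold it into one of twelve pitch classes relative to a tonic, and normalize the counts. The frame size, hop size, search range, tonic frequency and both function names are assumptions made for this illustration; the paper's actual settings are not given in the abstract.

```python
import numpy as np


def frame_pitch_autocorr(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of one frame from its autocorrelation peak."""
    frame = frame - frame.mean()
    # Keep non-negative lags only: corr[l] is the autocorrelation at lag l.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(corr) - 1)
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return sample_rate / lag


def pitch_class_distribution(signal, sample_rate, frame_len=2048, hop=1024,
                             tonic_hz=261.63):
    """Normalized 12-bin histogram of per-frame pitch classes relative to tonic_hz."""
    counts = np.zeros(12)
    for start in range(0, len(signal) - frame_len + 1, hop):
        f0 = frame_pitch_autocorr(signal[start:start + frame_len], sample_rate)
        semitones = int(round(12 * np.log2(f0 / tonic_hz)))
        counts[semitones % 12] += 1
    if counts.sum() == 0:
        raise ValueError("signal is shorter than one frame")
    return counts / counts.sum()


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz test tone
    print(pitch_class_distribution(tone, sr))
```

The resulting 12-dimensional vector is the kind of feature that could then be passed to either the clustering or the classification methods listed above.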


2020 ◽  
Vol 7 (2) ◽  
pp. 156
Author(s):  
Endang Retnoningsih ◽  
Rully Pramudita

Abstract: Machine learning is a system that can learn on its own to make decisions without being repeatedly programmed by humans, so that computers become smarter by learning from the data they have. By learning technique, supervised learning uses a dataset (training data) that is already labeled, while unsupervised learning draws conclusions from the dataset itself, without labels. The dataset is the input that machine learning uses to produce a correct analysis. The problem addressed is the iris flower (Iris tectorum), which comes in various colors and whose sepals and petals indicate the species; an appropriate method is needed to group the flowers into the species iris-setosa, iris-versicolor or iris-virginica. The solution uses Python, which provides the algorithms and libraries needed to build machine learning programs. For supervised learning the KNN classifier was chosen, and for unsupervised learning the DBSCAN clustering algorithm. The results show that Python provides a complete set of libraries (NumPy, Pandas, matplotlib, sklearn) for machine learning programming: the supervised KNN algorithm is called with from sklearn import neighbors, and the unsupervised DBSCAN algorithm with from sklearn.cluster import DBSCAN. Python produces output consistent with the input dataset, yielding decisions in the form of classification and clustering.
Keywords: DBSCAN, KNN, machine learning, Python.
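The two imports named in the abstract can be put together in a minimal sketch like the one below. It uses the iris dataset bundled with scikit-learn; the split ratio, `n_neighbors`, `eps`, `min_samples`, the feature scaling and the function name `run_iris_demo` are assumptions chosen for illustration, not values reported by the authors.

```python
from sklearn import neighbors
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def run_iris_demo(seed=0):
    """Classify iris with KNN (supervised) and cluster it with DBSCAN (unsupervised)."""
    iris = load_iris()
    X, y = iris.data, iris.target

    # Supervised: KNN learns from labeled training data and is scored on held-out data.
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    knn = neighbors.KNeighborsClassifier(n_neighbors=5).fit(x_train, y_train)
    knn_accuracy = knn.score(x_test, y_test)

    # Unsupervised: DBSCAN sees only the measurements, never the species labels.
    x_scaled = StandardScaler().fit_transform(X)
    cluster_labels = DBSCAN(eps=0.6, min_samples=5).fit_predict(x_scaled)
    n_clusters = len(set(cluster_labels) - {-1})  # label -1 marks noise points
    return knn_accuracy, cluster_labels, n_clusters


if __name__ == "__main__":
    accuracy, labels, n_clusters = run_iris_demo()
    print(f"KNN test accuracy: {accuracy:.3f}")
    print(f"DBSCAN clusters found: {n_clusters}, noise points: {(labels == -1).sum()}")
```

DBSCAN chooses the number of clusters from the density of the data, so it need not recover exactly three species groups.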


Author(s):  
Hyeuk Kim

Unsupervised learning divides data into several groups: observations in the same group have similar characteristics, and observations in different groups have different characteristics. In this paper, we cluster data by partitioning around medoids, which has some advantages over k-means clustering. We apply it to baseball players in the Korea Baseball League. We also apply principal component analysis to the data and plot the result using two components as axes. Through this procedure, we interpret the meaning of the clustering graphically. The combination of partitioning around medoids and principal component analysis can be applied to any other data, and the approach makes it easy to identify the characteristics of each group.
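As a rough illustration of the combination described above, the sketch below clusters data with a simplified medoid-swap search and projects the points onto two principal components for plotting. It is not the full PAM algorithm: it starts from random medoids instead of the BUILD phase and accepts any improving swap. The data are synthetic, and the names `pam` and `pca_2d` are hypothetical.

```python
import numpy as np


def pam(X, k, max_iter=100, seed=0):
    """Simplified partitioning around medoids; returns (medoid indices, labels, cost)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Pairwise Euclidean distances between all observations.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = rng.choice(n, size=k, replace=False)

    def total_cost(candidates):
        # Sum of distances from every point to its nearest medoid.
        return dist[:, candidates].min(axis=1).sum()

    cost = total_cost(medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = candidate
                trial_cost = total_cost(trial)
                if trial_cost < cost:  # keep any swap that lowers the cost
                    medoids, cost, improved = trial, trial_cost, True
        if not improved:
            break
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels, cost


def pca_2d(X):
    """Project the data onto its first two principal components."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed stand-in for player statistics: two groups in four dimensions.
    data = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(10, 1, (20, 4))])
    medoids, labels, cost = pam(data, k=2)
    coords = pca_2d(data)
    print("medoid rows:", medoids, "cost:", round(float(cost), 2))
    print("first projected point:", coords[0])
```

Because medoids are actual observations, each cluster can be summarized by a real data point, which is one of the practical advantages over k-means centroids.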


2021 ◽  
Vol 13 (3) ◽  
pp. 67
Author(s):  
Eric Hitimana ◽  
Gaurav Bajpai ◽  
Richard Musabe ◽  
Louis Sibomana ◽  
Jayavel Kayalvizhi

Many countries worldwide face challenges in implementing building incident prevention measures for fire disasters. The most critical issues are the localization, identification, and detection of room occupants. The Internet of Things (IoT), combined with machine learning, has been shown to make buildings smarter by providing real-time data acquisition through sensors and actuators for prediction mechanisms. This paper proposes an IoT framework that captures indoor environmental parameters as multivariate time-series data for occupancy detection. The Long Short-Term Memory (LSTM) deep learning algorithm is used to infer the presence of human beings. An experiment is conducted in an office room, using the multivariate time series as predictors in a regression forecasting problem. The results demonstrate that the developed system can obtain, process, and store environmental information. The collected information was applied to the LSTM algorithm and compared with other machine learning algorithms: Support Vector Machine, Naïve Bayes Network, and Multilayer Perceptron Feed-Forward Network. The outcomes, based on parametric calibration, demonstrate that LSTM performs better in the context of the proposed application.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Nasser Assery ◽  
Yuan (Dorothy) Xiaohong ◽  
Qu Xiuli ◽  
Roy Kaushik ◽  
Sultan Almalki

Purpose: This study proposes an unsupervised learning model to evaluate the credibility of disaster-related Twitter data and presents a performance comparison with commonly used supervised machine learning models.
Design/methodology/approach: First, historical tweets on two recent hurricane events are collected via the Twitter API. Then a credibility scoring system is implemented in which tweet features are analyzed to give each tweet a credibility score and a credibility label. After that, supervised classification is performed using various classification algorithms, and their performances are compared.
Findings: The proposed unsupervised learning model could enhance emergency response by providing a fast way to determine the credibility of disaster-related tweets. Additionally, the comparison of the supervised classification models reveals that the Random Forest classifier performs significantly better than the SVM and Logistic Regression classifiers in classifying the credibility of disaster-related tweets.
Originality/value: In this paper, an unsupervised 10-point scoring model is proposed to evaluate tweet credibility based on user-based and content-based features. This technique could be used to evaluate the credibility of disaster-related tweets on future hurricanes and has the potential to enhance emergency response during critical events. The comparative study of different supervised learning methods has revealed effective supervised learning methods for evaluating the credibility of Twitter data.


Author(s):  
Ke Li ◽  
Yalei Wu ◽  
Shimin Song ◽  
Yi sun ◽  
Jun Wang ◽  
...  

Measuring spacecraft electrical characteristics and classifying them with multiple labels generally involves processing a large amount of unlabeled test data, high-dimensional feature redundancy, time-consuming computation, and slow identification. In this paper, an offline fuzzy c-means (FCM) clustering algorithm and an online recognition approach based on the approximate weighted proximal support vector machine (WPSVM) are proposed to reduce the feature size and speed up the classification of spacecraft electrical characteristics. In addition, principal component feature extraction is applied to the complex signals for the feature selection process. A threshold-based data contribution approach is further applied to resolve the component selection problem of principal component analysis (PCA), which effectively guarantees the validity and consistency of the data. Experimental results indicate that the proposed approach obtains better fault diagnosis results on spacecraft electrical characteristics data, improves identification accuracy, and shortens computing time.
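One common way to choose the number of principal components with a threshold is to keep the smallest number whose cumulative explained-variance ratio reaches that threshold. The sketch below assumes this is what the threshold-based selection refers to, which the abstract does not confirm; the 0.95 threshold, the synthetic data and the function name `select_components` are likewise illustrative assumptions.

```python
import numpy as np


def select_components(X, threshold=0.95):
    """Keep the fewest principal components whose cumulative variance ratio >= threshold."""
    centered = X - X.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    cumulative = np.cumsum(explained)
    # First index whose cumulative ratio reaches the threshold, converted to a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    k = min(k, len(singular_values))  # guard against floating-point shortfall
    reduced = centered @ vt[:k].T
    return k, reduced, cumulative


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed stand-in for redundant high-dimensional measurements:
    # 10 observed features driven by 2 underlying factors plus small noise.
    factors = rng.normal(size=(100, 2))
    mixing = rng.normal(size=(2, 10))
    data = factors @ mixing + 1e-3 * rng.normal(size=(100, 10))
    k, reduced, cumulative = select_components(data)
    print("components kept:", k, "reduced shape:", reduced.shape)
```

The reduced feature matrix is what would then be handed to the clustering and classification stages.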


Author(s):  
Noor Asyikin Sulaiman ◽  
Md Pauzi Abdullah ◽  
Hayati Abdullah ◽  
Muhammad Noorazlan Shah Zainudin ◽  
Azdiana Md Yusop

An air conditioning system is complex and consumes the most energy in a building. Any fault in system operation, such as a faulty cooling tower fan, a compressor failure or a stuck damper, can lead to energy wastage and a reduction in the system's coefficient of performance (COP). Because of this complexity, detecting such faults is hard and requires exhaustive inspections. This paper consists of two parts: i) investigating the impact of different air conditioning system faults on COP, and ii) analysing the performance of machine learning algorithms in classifying those faults. Three supervised learning classifier models were developed: deep learning, support vector machine (SVM) and multi-layer perceptron (MLP). The performance of each classifier was investigated across six different classes of faults. Results showed that different faults have different negative impacts on the COP. The three supervised learning classifier models were also able to classify all faults with more than 94% accuracy, and MLP produced the highest accuracy and precision of the three.

