Combining expert knowledge and machine-learning to classify herd types in livestock systems

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jonas Brock ◽  
Martin Lange ◽  
Jamie A. Tratalos ◽  
Simon J. More ◽  
David A. Graham ◽  
...  

Abstract: A detailed understanding of herd types is needed for animal disease control and surveillance activities, to inform epidemiological study design and interpretation, and to guide effective policy decision-making. In this paper, we present a new approach to classify herd types in livestock systems by combining expert knowledge and a machine-learning algorithm called self-organising maps (SOMs). This approach is applied to the cattle sector in Ireland, where a detailed understanding of herd types can assist with ongoing discussions on control and surveillance for endemic cattle diseases. To our knowledge, this is the first time that the SOM algorithm has been used to differentiate livestock systems. In compliance with European Union (EU) requirements, relevant data in the Irish livestock register include the birth, movements and disposal of each individual bovine, and also the sex and breed of each bovine and its dam. In total, 17 herd types were identified in Ireland using 9 variables. We provide a data-driven classification tree using decisions derived from the Irish livestock registration data. Because of the visual capabilities of the SOM algorithm, the interpretation of results is relatively straightforward, and we believe our approach, with adaptation, can be used to classify herd types in any other livestock system.
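To make the SOM idea in this abstract concrete, the following is a minimal sketch of a self-organising map in plain Python. It is not the authors' implementation: the two herd features, the 3x3 grid, and the decay schedules for the learning rate and neighbourhood radius are all illustrative assumptions, since the abstract does not name the 9 variables or the map configuration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate of the node whose weights are closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows=3, cols=3, epochs=100, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM and return {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the herds
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)  # learning rate decays linearly to zero
            sigma = sigma0 * (1.0 - frac) + 0.1  # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma**2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull node towards the herd
            step += 1
    return weights


# Hypothetical herds: (share of dairy-breed animals, share of male animals).
dairy_like = [[0.90, 0.10], [0.95, 0.05], [0.85, 0.15]]
beef_like = [[0.10, 0.80], [0.15, 0.75], [0.05, 0.85]]

som = train_som(dairy_like + beef_like)
for herd in dairy_like + beef_like:
    print(herd, "->", best_matching_unit(som, herd))
```

Each herd is assigned to the map node it most resembles. In a workflow like the one described, experts would presumably inspect which nodes group together before naming herd types, but that labelling step is outside this sketch.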

2020 ◽  
Author(s):  
Yutao Lu ◽  
Juan Wang ◽  
Miao Liu ◽  
Kaixuan Zhang ◽  
Guan Gui ◽  
...  

The ever-increasing amount of data in cellular networks makes it challenging for network operators to monitor the quality of experience (QoE). Traditional hard-decision methods based on key quality indicators (KQIs) struggle with QoE anomaly detection at big-data scale. To solve this problem, we propose a KQI-based QoE anomaly detection framework that uses a semi-supervised machine learning algorithm, the iterative positive sample aided one-class support vector machine (IPS-OCSVM). The proposed method has four steps, of which the key step combines machine learning with the network operator's expert knowledge using OCSVM. The IPS-OCSVM framework performs QoE anomaly detection through soft decisions and can easily fine-tune its anomaly detection ability on demand. Moreover, we prove that fluctuations in the expert-defined KQI thresholds have a limited impact on the anomaly detection results. Finally, experimental results are given to validate the proposed IPS-OCSVM framework for QoE anomaly detection in cellular networks.


2019 ◽  
Vol 11 (11) ◽  
pp. 1279 ◽  
Author(s):  
Pramaditya Wicaksono ◽  
Prama Ardha Aryaguna ◽  
Wahyu Lazuardi

This research aimed to develop a benthic habitat mapping model using machine-learning classification algorithms and to test the applicability of the model in different areas. We integrated in situ benthic habitat data and image processing of a WorldView-2 (WV2) image to parameterise the machine-learning algorithms, namely Random Forest (RF), Classification Tree Analysis (CTA), and Support Vector Machine (SVM). The classification inputs are sunglint-free bands, water column corrected bands, Principal Component (PC) bands, bathymetry, and the slope of underwater topography. Kemujan Island was used in developing the model, while Karimunjawa, Menjangan Besar, and Menjangan Kecil Islands served as test areas. The results indicated that RF was more accurate than the other classification algorithms, based on the accuracy statistics and the spatial distribution of benthic habitats. The maximum accuracy of RF was 94.17% (4 classes) and 88.54% (14 classes). The accuracies from RF, CTA, and SVM were consistent across different input bands for each classification scheme. Applying the RF model to classify benthic habitats in other areas showed that the more general classification scheme is preferable, in order to avoid several issues regarding benthic habitat variations. The results also established the possibility of mapping benthic habitats without the use of training areas.


Author(s):  
Ram D. Joshi ◽  
Chandra K. Dhakal

Diabetes mellitus is one of the most common human diseases worldwide and may cause several health-related complications. It is responsible for considerable morbidity, mortality, and economic loss. A timely diagnosis and prediction of this disease could provide patients with an opportunity to take the appropriate preventive and treatment strategies. To improve the understanding of risk factors, we predict type 2 diabetes for Pima Indian women utilizing a logistic regression model and a decision tree, a machine learning algorithm. Our analysis finds five main predictors of type 2 diabetes: glucose, pregnancy, body mass index (BMI), diabetes pedigree function, and age. We further explore a classification tree to complement and validate our analysis. The six-fold classification tree indicates glucose, BMI, and age are important factors, while the ten-node tree implies glucose, BMI, pregnancy, diabetes pedigree function, and age as the significant predictors. Our preferred specification yields a prediction accuracy of 78.26% and a cross-validation error rate of 21.74%. We argue that our model can be applied to make a reasonable prediction of type 2 diabetes, and could potentially be used to complement existing preventive measures to curb the incidence of diabetes and reduce associated costs.
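As a rough illustration of how a classification tree might pick a cut-off on a predictor such as glucose, the sketch below searches for the threshold that minimises weighted Gini impurity. The abstract does not state the splitting criterion, so Gini impurity is an assumption here, and the glucose readings and labels are invented for illustration only.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best single split on one feature."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_impurity = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between identical values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [lab for val, lab in pairs if val <= threshold]
        right = [lab for val, lab in pairs if val > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Hypothetical glucose readings and diabetes labels (1 = diabetic); not real data.
glucose = [85, 90, 100, 140, 150, 160]
diabetic = [0, 0, 0, 1, 1, 1]
print(best_split(glucose, diabetic))
```

A full tree would repeat this search over every predictor and recurse into each branch. The single-feature version only shows the core step.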


Author(s):  
Fiorella Mete ◽  
David J. Corr ◽  
Michael P. Wilbur ◽  
Ying Chen

Collecting information on heavy trucks and monitoring the bridges which they regularly cross is important for many facets of infrastructure management. In this paper, a two-step algorithm is developed using bridge and truck data, by sequentially deploying unsupervised and supervised machine learning techniques. Longitudinal clustering of bridge data, concerning strain waveforms, is adopted to perform the first step of the algorithm, while image visual inspection and classification tree methods are applied to truck data concurrently in the second step. Both bridge and truck traffic must be monitored for a limited, yet significant, amount of time to calibrate the algorithm, which is then used to build a classification framework. The framework provides the same benefits as two data collection systems while only one needs to be operative. Depending on which monitoring system remains available, the framework enables the use of bridge data to identify the profile of the truck which generated it, or to estimate bridge response given the truck’s information. As a result, the present study aims to provide decision-makers with an effective way to monitor the whole bridge-traffic system, bridge managers to plan effective maintenance, and policymakers to develop ad hoc regulations.


2021 ◽  
Vol 13 (14) ◽  
pp. 7602
Author(s):  
Guofeng Ma ◽  
Xuhui Pan

Reducing energy consumption while maintaining building comfort, especially visual comfort, has recently become a popular topic. Existing research on visual comfort lacks a standard for selecting indicators. Moreover, few studies address individual visual preference while considering the interaction between the internal and external environment. In this paper, we ranked common visual indicators by the cloud model combined with the failure mode and effect analysis (FMEA) and hierarchical technique for order of preference by similarity to ideal solution (TOPSIS). Unsatisfied vertical illuminance, daylight glare index, luminance ratio, and shadow position are the top four indicators. Based on these indicators, we also built an individual visual comfort model from five categories of personalized data obtained from the experiment and trained it with four machine learning algorithms. The results show that random forest has the best prediction performance and support vector machine is second. Gaussian mixed model and classification tree perform worst in terms of stability and accuracy. In addition, this study also programmed a BIM plug-in integrating environmental data and personal preference data to predict appropriate vertical illuminance for a specific occupant. Thus, managers can adjust the intensity of artificial light in the office by increasing or decreasing the height of table lamps, saving energy and improving occupant comfort. This novel model will serve as a paradigm for selecting visual indicators and allow indoor spaces to be tailored to individual visual preferences.


Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1298
Author(s):  
Ido Tam ◽  
Meir Kalech ◽  
Lior Rokach ◽  
Eyal Madar ◽  
Jacob Bortman ◽  
...  

Bearing spall detection and predicting its size are great challenges. Model-based simulation is a well-known traditional approach to physically model the influence of the spall on the bearing. Building a physical model is challenging due to the bearing complexity and the expert knowledge required to build such a model. Obviously, building a partial physical model for some of the spall sizes is easier. In this paper, we propose a machine-learning algorithm, called Probability-Based Forest, that uses a partial physical model. First, the behavior of some of the spall sizes is physically modeled and a simulator based on this model generates scenarios for these spall sizes in different conditions. Then, the machine-learning algorithm is trained on these scenarios to generate a prediction model of spall sizes, even for those that have not been modeled by the physical model. Feature extraction is a key factor in the success of this approach. We extract features using two traditional approaches, statistical and physical, and an additional new approach: Time Series FeatuRe Extraction based on Scalable Hypothesis tests (TSFRESH). Experimental evaluation with a well-known physical model shows that our approach achieves high accuracy, even in cases that have not been modeled by the physical model. Also, we show that the TSFRESH feature-extraction approach achieves the highest accuracy.


Author(s):  
Rui Fu ◽  
Nicholas Mitsakakis ◽  
Michael Chaiton

Aim: The popularity of electronic cigarettes (i.e. e-cigarettes) is soaring in Canada. Understanding person-level correlates of current e-cigarette use (vaping) is crucial to guide tobacco policy, but prior studies have not fully identified these correlates due to model overfitting caused by multicollinearity. This study addressed this issue by using a classification tree, a machine learning algorithm. Methods: This population-based cross-sectional study used the Canadian Tobacco, Alcohol, and Drugs Survey (CTADS) from 2017 that targeted residents aged 15 or older. Forty-six person-level characteristics were first screened in a logistic mixed-effects regression procedure for their strength in predicting vaper type (current vs. former vaper) among people who reported having ever vaped. A 9:1 ratio was used to randomly split the data into a training set and a validation set. A classification tree model was developed using the cross-validation method on the training set using the selected predictors and assessed on the validation set using sensitivity, specificity and accuracy. Results: Of the 3,059 people with an experience of vaping, the average age was 24.4 years (standard deviation = 11.0), with 41.9% of them being female and 8.5% of them being aboriginal. There were 556 (18.2%) current vapers. The classification tree model performed relatively well and suggested attraction to e-cigarette flavors was the most important correlate of current vaping, followed by young age (< 18) and believing vaping to be less harmful to oneself than cigarette smoking. Conclusions: People who vape due to flavors have a very high risk of becoming current vapers. The findings of this study provide evidence that supports the ongoing ban on flavored vaping products in the US and suggest a similar regulatory intervention may be effective in Canada.
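The evaluation protocol described in the Methods (a random 9:1 split, then sensitivity, specificity and accuracy on the validation set) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the split uses simple shuffling rather than whatever procedure the authors used, and the toy labels are invented.

```python
import random


def split_9_to_1(records, seed=0):
    """Randomly split records into a 90% training set and a 10% validation set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 9 // 10
    return shuffled[:cut], shuffled[cut:]


def confusion_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy; assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "sensitivity": tp / (tp + fn),  # share of current vapers correctly flagged
        "specificity": tn / (tn + fp),  # share of former vapers correctly flagged
        "accuracy": (tp + tn) / len(pairs),
    }


# 3,059 respondents as in the abstract; integer IDs stand in for the survey rows.
train, valid = split_9_to_1(range(3059))
print(len(train), len(valid))

# Invented labels (1 = current vaper) and predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(confusion_metrics(y_true, y_pred))
```

With only 18.2% current vapers, accuracy alone can look good for a model that rarely predicts the minority class, which is one reason to report sensitivity and specificity alongside it.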


2021 ◽  
Vol 5 (2) ◽  
pp. 447-455
Author(s):  
Aminat Yusuf ◽  
Oyelola Akande

Despite the popularity and utility of most machine learning techniques, expert knowledge is required to guide the choice of a suitable technique and settings for a specific problem. Without expert input, the procedures are vulnerable to poor parameter settings. Many machine learning techniques are offered with default configurations. However, since different classification problems require different techniques, selecting the appropriate technique and tuning its settings are vital tasks that improve the reliability and accuracy of predictions. This study performs grid search parameter tuning on five selected machine learning techniques applied to hepatitis disease data. Performance is compared side by side with the default settings. The experimental results for the five tuned techniques show that the configurations suggested in our work yield predictions of considerably higher quality than the default settings. The results show that tuning the parameters of the Support Vector Machine via grid search yields the best accuracy, 90%, and competitive performance in terms of precision, recall, accuracy and Area Under the Curve. We present combinations of parameter settings for each technique by identifying ranges of values for each setting that give good hepatitis disease prediction outcomes.
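Grid search itself is simple to sketch: evaluate every combination of candidate settings and keep the best-scoring one. The example below is a generic illustration, not the study's code. The parameter names `C` and `gamma` are typical SVM settings assumed here, and `toy_score` is a made-up stand-in for a cross-validated accuracy estimate on real data.

```python
import itertools
import math


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, -math.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:  # strict '>' keeps the first of any tied settings
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(C, gamma):
    """Hypothetical score surface that peaks at C=10, gamma=0.01 (not real results)."""
    return -((math.log10(C) - 1) ** 2 + (math.log10(gamma) + 2) ** 2)


grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

In practice the scoring function would train the model with the given settings and return a validation metric. The cost grows with the product of the grid sizes, which is why the ranges of values matter.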


2018 ◽  
Author(s):  
C.H.B. van Niftrik ◽  
F. van der Wouden ◽  
V. Staartjes ◽  
J. Fierstra ◽  
M. Stienen ◽  
...  
