Combining Correlation-Based Feature and Machine Learning for Sensory Evaluation of Saigon Beer

2020 ◽  
Vol 11 (2) ◽  
pp. 71-85
Author(s):  
Nhat-Vinh Lu ◽  
Trong-Nhan Vuong ◽  
Duy-Tai Dinh

Sensory evaluation plays an important role in the food and consumer goods industry. In recent years, the application of machine learning techniques to support food sensory evaluation has become popular. Many different machine learning methods have been applied and produced positive results in this field. In this article, the authors propose a new method to support sensory evaluation on multiple criteria based on the use of a correlation-based feature selection technique, combined with machine learning methods such as linear regression, multilayer perceptron, support vector machine, and random forest. Experimental results are based on considering the correlation between physicochemical components and sensory factors on the Saigon beer dataset.
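The idea of filtering physicochemical features by their correlation with a sensory score can be sketched as follows. This is a minimal illustration of a simple Pearson correlation filter, not the authors' selection procedure; the feature names, the data values, and the 0.5 threshold are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)

def select_features(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]

# Hypothetical measurements for five beer samples (illustrative values only).
features = {
    "alcohol": [4.0, 4.5, 5.0, 5.5, 6.0],
    "bitterness": [20, 18, 15, 13, 10],
    "noise": [1, 3, 1, 3, 1],
}
sensory_score = [6.0, 6.5, 7.1, 7.4, 8.0]

selected = select_features(features, sensory_score)
print(selected)
```

The selected columns would then be passed to whichever regressor is being evaluated (linear regression, multilayer perceptron, support vector machine, or random forest).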

2015 ◽  
Author(s):  
Ming Zhong ◽  
Jared Schuetter ◽  
Srikanta Mishra ◽  
Randy F. LaFollette

Data mining for production optimization in unconventional reservoirs brings together data from multiple sources with varying levels of aggregation, detail, and quality. Tens of variables are typically included in data sets to be analyzed. There are many statistical and machine learning techniques that can be used to analyze data and summarize the results. These methods were developed to work extremely well in certain scenarios but can be terrible choices in others. The analyst may or may not be trained and experienced in using those methods. The question for both the analyst and the consumer of data mining analyses is, “What difference does the method make in the final interpreted result of an analysis?” The objective of this study was to compare and review the relative utility of several univariate and multivariate statistical and machine learning methods in predicting the production quality of Permian Basin Wolfcamp Shale wells. The data set for the study was restricted to wells completed in and producing from the Wolfcamp. Data categories used in the study included the well location and assorted metrics capturing various aspects of the well architecture, well completion, stimulation, and production. All of this information was publicly available. Data variables were scrutinized and corrected for inconsistent units and were sanity checked for out-of-bounds and other “bad data” problems. After the quality control effort was completed, the test data set was distributed among the statistical team for application of an agreed upon set of statistical and machine learning methods. Methods included standard univariate and multivariate linear regression as well as advanced machine learning techniques such as Support Vector Machine, Random Forests, and Boosted Regression Trees. The strengths, limitations, implementation, and study results of each of the methods tested are discussed and compared to those of the other methods.
Consistent with other data mining studies, univariate linear methods are shown to be much less robust than multivariate non-linear methods, which tend to produce more reliable results. The practical importance is that when tens to hundreds of millions of dollars are at stake in the development of shale reservoirs, operators should have the confidence that their decisions are statistically sound. The work presented here shows that methods do matter, and useful insights can be derived regarding complex geosystem behavior by geoscientists, engineers, and statisticians working together.


2013 ◽  
Vol 760-762 ◽  
pp. 2037-2041
Author(s):  
Yi Pan ◽  
Jun Hua Zou ◽  
Shuai Yuan

As customer reviews on the Internet grow in number, summarizing them automatically becomes increasingly valuable. Sentiment classification aims to predict the semantic orientation of customer reviews, positive or negative. In this paper, we present a framework for sentiment classification and empirically study its performance with different features, term weighting methods, and machine learning methods. The experimental results suggest that weighting features by binary occurrence is more suitable for Naïve Bayes, whereas tfidf-c gives the best performance with the support vector machine. We also find that sentiment terms are not suitable as features for approaches based on machine learning methods.
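The difference between binary-occurrence weighting and a tf-idf style weighting can be illustrated with a short sketch. The toy corpus and function names below are hypothetical, and the tf-idf formula shown is the plain textbook form (term frequency times log inverse document frequency), not necessarily the tfidf-c variant evaluated in the paper.

```python
from math import log

def binary_weights(doc_tokens, vocab):
    """1 if the term occurs in the document at all, else 0."""
    present = set(doc_tokens)
    return [1 if term in present else 0 for term in vocab]

def tfidf_weights(doc_tokens, vocab, corpus):
    """Raw term frequency multiplied by log(N / document frequency)."""
    n_docs = len(corpus)
    weights = []
    for term in vocab:
        tf = doc_tokens.count(term)
        df = sum(1 for doc in corpus if term in doc)
        idf = log(n_docs / df) if df else 0.0
        weights.append(tf * idf)
    return weights

# Hypothetical tokenized reviews (illustrative only).
corpus = [
    ["good", "good", "battery"],
    ["bad", "battery"],
    ["good", "screen"],
]
vocab = sorted({term for doc in corpus for term in doc})

binary_vec = binary_weights(corpus[0], vocab)
tfidf_vec = tfidf_weights(corpus[0], vocab, corpus)
print(vocab)
print(binary_vec)
print(tfidf_vec)
```

Binary weighting ignores how often a term repeats, while tf-idf rewards repetition and discounts terms that appear in many documents.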


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tomoaki Mameno ◽  
Masahiro Wada ◽  
Kazunori Nozaki ◽  
Toshihito Takahashi ◽  
Yoshitaka Tsujioka ◽  
...  

The purpose of this retrospective cohort study was to create a model for predicting the onset of peri-implantitis by using machine learning methods and to clarify interactions between risk indicators. This study evaluated 254 implants, 127 with and 127 without peri-implantitis, from among 1408 implants with at least 4 years in function. Demographic data and parameters known to be risk factors for the development of peri-implantitis were analyzed with three models: logistic regression, support vector machines, and random forests (RF). As a result, RF had the highest performance in predicting the onset of peri-implantitis (AUC: 0.71, accuracy: 0.70, precision: 0.72, recall: 0.66, and f1-score: 0.69). The factor that had the most influence on prediction was implant functional time, followed by oral hygiene. In addition, PCR of more than 50% to 60%, smoking more than 3 cigarettes/day, KMW less than 2 mm, and the presence of less than two occlusal supports tended to be associated with an increased risk of peri-implantitis. Moreover, these risk indicators were not independent and had complex effects on each other. The results of this study suggest that peri-implantitis onset was predicted in 70% of cases by RF, which allows consideration of nonlinear relational data with complex interactions.
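The reported f1-score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the rounded values quoted in the abstract, so it is a consistency check rather than a reproduction of the study's computation; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Rounded values quoted in the abstract for the random forest model.
reported_precision = 0.72
reported_recall = 0.66

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))
```

Because the inputs are already rounded to two decimals, agreement to two decimals is the most this check can show.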


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2012 ◽  
Author(s):  
Hashem Koohy

In the era of explosion in biological data, machine learning techniques are becoming more popular in life sciences, including biology and medicine. This research note examines the rise and fall of the most commonly used machine learning techniques in life sciences over the past three decades.


Author(s):  
Hesham M. Al-Ammal

Detection of anomalies in a given data set is a vital step in several cybersecurity applications, including intrusion detection, fraud, and social network analysis. Many of these techniques detect anomalies by examining graph-based data. Analyzing graphs makes it possible to capture relationships and communities as well as anomalies. The advantage of using graphs is that many real-life situations can be easily modeled by a graph that captures their structure and inter-dependencies. Although anomaly detection in graphs dates back to the 1990s, recent research has used machine learning methods for anomaly detection over graphs. This chapter concentrates on static graphs (both labeled and unlabeled) and summarizes some of these recent studies in machine learning for anomaly detection in graphs. These include methods such as support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter reflects on the successes and challenges of using these methods in the context of graph-based anomaly detection.


Quora, an online question-answering platform, has many duplicate questions, i.e., questions that convey the same meaning. Since it is open to all users, anyone can pose a question any number of times, which increases the count of duplicate questions. This paper uses a dataset comprising question pairs (taken from the Quora website) in different columns, with an indication of whether each pair of questions is a duplicate or not. Traditional comparison methods like Sequence matcher perform a letter-by-letter comparison without understanding the contextual information; hence they give lower accuracy. Machine learning methods predict the similarity using features extracted from the context. Both the traditional methods and the machine learning methods were compared in this study. The features for the machine learning methods are extracted using the Bag of Words models Count-Vectorizer and TFIDF-Vectorizer. Among the traditional comparison methods, Sequence matcher gave the highest accuracy of 65.29%. Among the machine learning methods, XGBoost gave the highest accuracy: 80.89% when Count-Vectorizer is used and 80.12% when TFIDF-Vectorizer is used.
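The contrast between character-level matching and word-count features can be shown with a small sketch. It uses Python's standard `difflib.SequenceMatcher` for the character comparison and a hand-rolled bag-of-words cosine similarity as a stand-in for count-based features. The study fed such features to classifiers like XGBoost; the direct cosine score and the example questions here are assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt

def char_similarity(a, b):
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def bag_of_words_cosine(a, b):
    """Cosine similarity between word-count vectors; word order is ignored."""
    counts_a = Counter(a.lower().split())
    counts_b = Counter(b.lower().split())
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical question pair: same words, different order.
q1 = "how do i learn python"
q2 = "python how do i learn"

char_score = char_similarity(q1, q2)
bow_score = bag_of_words_cosine(q1, q2)
print(char_score, bow_score)
```

For a reordered question, the bag-of-words score treats the pair as identical while the character-level ratio penalizes the reordering, which is one reason the two families of methods can disagree.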


Author(s):  
Michael M. Richter

In this article we present relations between complex business processes and machine learning techniques. The processes considered here are mostly related to planning. Planning takes place in preparing many decisions, and it is often confronted with a rapidly changing context that constitutes an open world. The underlying structure and preconditions of the processes are quite often not known, and hence the processes are regarded as stochastic. One can only observe the processes. Such observations deliver data, and these data contain some knowledge about the processes in a hidden form. As a consequence, machine learning methods are involved here. The idea is to give business persons an overview of quite different machine learning techniques so that they can select suitable ones. We provide a number of examples of business processes that we use for illustration.


Author(s):  
Anne E Thessen

The natural sciences, such as ecology and earth science, study complex interactions between biotic and abiotic systems in order to infer understanding and make predictions. Machine-learning-based methods have an advantage over traditional statistical methods in studying these systems because the former do not impose unrealistic assumptions (such as linearity), are capable of inferring missing data, and can reduce long-term expert annotation burden. Thus, a wider adoption of machine learning methods in ecology and earth science has the potential to greatly accelerate the pace and quality of science. Despite these advantages, machine learning techniques have not had widespread adoption in ecology and earth science. This is largely due to 1) a lack of communication and collaboration between the machine learning research community and natural scientists, 2) a lack of easily accessible tools and services, and 3) the requirement for a robust training and test data set. These impediments can be overcome through financial support for collaborative work and the development of tools and services facilitating ML use. Natural scientists who have not yet used machine learning methods can be introduced to these techniques through Random Forest, a method that is easy to implement and performs well. This manuscript will 1) briefly describe several popular ML methods and their application to ecology and earth science, 2) discuss why ML methods are underutilized in natural science, and 3) propose solutions for barriers preventing wider ML adoption.


2020 ◽  
Vol 122 (14) ◽  
pp. 1-30
Author(s):  
James Soland ◽  
Benjamin Domingue ◽  
David Lang

Background/Context: Early warning indicators (EWI) are often used by states and districts to identify students who are not on track to finish high school, and provide supports/interventions to increase the odds the student will graduate. While EWI are diverse in terms of the academic behaviors they capture, research suggests that indicators like course failures, chronic absenteeism, and suspensions can help identify students in need of additional supports. In parallel with the expansion of administrative data that have made early versions of EWI possible, new machine learning methods have been developed. These methods are data-driven and often designed to sift through thousands of variables with the purpose of identifying the best predictors of a given outcome. While applications of machine learning techniques to identify students at-risk of high school dropout have obvious appeal, few studies consider the benefits and limitations of applying those models in an EWI context, especially as they relate to questions of fairness and equity.

Focus of Study: In this study, we will provide applied examples of how machine learning can be used to support EWI selection. The purpose is to articulate the broad risks and benefits of using machine learning methods to identify students who may be at risk of dropping out. We focus on dropping out given its salience in the EWI literature, but also anticipate generating insights that will be germane to EWI used for a variety of outcomes.

Research Design: We explore these issues by using several hypothetical examples of how ML techniques might be used to identify EWI. For example, we show results from decision tree algorithms used to identify predictors of dropout that use simulated data.

Conclusions/Recommendations: Generally, we argue that machine learning techniques have several potential benefits in the EWI context. For example, some related methods can help create clear decision rules for which students are a dropout risk, and their predictive accuracy can be higher than for more traditional, regression-based models. At the same time, these methods often require additional statistical and data management expertise to be used appropriately. Further, the black-box nature of machine learning algorithms could invite their users to interpret results through the lens of preexisting biases about students and educational settings.


2018 ◽  
Vol 7 (3) ◽  
pp. 1019 ◽  
Author(s):  
Mr Santosh A. Shinde ◽  
Dr P. Raja Rajeswari

Humans are considered the most intelligent species on earth and are inherently health conscious. Over the centuries, mankind has developed various proven healthcare systems. To automate the process and predict diseases more accurately, machine learning methods are gaining popularity in the research community. Machine learning methods build intelligence into a machine so that it can perform better in the future using learned experience. Applying machine learning methods to electronic health record datasets could provide valuable information and prediction of health risks. The aims of this review paper are four-fold: i) to serve as a guideline for researchers who are new to machine learning and want to contribute to it, ii) to provide a state-of-the-art survey of machine learning, iii) to review the application of machine learning techniques in health prediction, and iv) to provide further research directions for health prediction systems using machine learning.

