Clustering in Wineinformatics with Attribute Selection to Increase Uniqueness of Clusters

Jared McCune; Alex Riley; Bernard Chen

doi:10.3390/fermentation7010027

Wineinformatics: Using the Full Power of the Computational Wine Wheel to Understand 21st Century Bordeaux Wines from the Reviews

Beverages ◽

10.3390/beverages7010003 ◽

2021 ◽

Vol 7 (1) ◽

pp. 3

Author(s):

Zeqing Dong ◽

Travis Atkison ◽

Bernard Chen

Keyword(s):

21st Century ◽

Data Science ◽

Naive Bayes ◽

Science Research ◽

Naïve Bayes ◽

Classification Algorithm ◽

Wine Quality ◽

Research Results ◽

Full Power ◽

Wine Region

Although wine has been produced for several thousand years, the ancient beverage has remained popular and has become even more affordable in modern times. Among all wine-making regions, Bordeaux, France is probably one of the most prestigious in history. Since hundreds of wines are produced in Bordeaux each year, humans are unlikely to be able to examine all wines across multiple vintages to define the characteristics of outstanding 21st century Bordeaux wines. Wineinformatics is a newly proposed data science research area, with wine as its application domain, that uses computers to process large amounts of wine data. The goal of this paper is to build a high-quality computational model on wine reviews processed by the full power of the Computational Wine Wheel to understand 21st century Bordeaux wines. On top of the 985 binary attributes generated from the Computational Wine Wheel in our previous research, we add 14 continuous attributes derived from CATEGORY and 34 derived from SUBCATEGORY to the All Bordeaux (14,349 wines) and the 1855 Bordeaux (1359 wines) datasets. We believe that successfully merging the original binary attributes and the new continuous attributes can provide more insight for Naïve Bayes and Support Vector Machine (SVM) classifiers when building a model for wine grade category prediction. The experimental results suggest that, for the All Bordeaux dataset, with the additional 14 attributes retrieved from CATEGORY, the Naïve Bayes classification algorithm was able to outperform the existing research results by increasing accuracy by 2.15%, precision by 8.72%, and the F-score by 1.48%. For the 1855 Bordeaux dataset, with the additional attributes retrieved from the CATEGORY and SUBCATEGORY, the SVM classification algorithm was able to outperform the existing research results by increasing accuracy by 5%, precision by 2.85%, recall by 5.56%, and the F-score by 4.07%. 
The improvements demonstrated in the research show that attributes retrieved from the CATEGORY and SUBCATEGORY have the power to provide more information to classifiers for superior model generation. The model built in this research can better distinguish outstanding and classic 21st century Bordeaux wines. This paper provides new directions in Wineinformatics for technical research in data science, such as regression and multi-target classification, and for domain-specific research, including wine region terroir analysis, wine quality prediction, and weather impact examination.
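The merging of binary and continuous attributes described in this abstract can be illustrated with a minimal sketch. The abstract does not state how the CATEGORY attributes are computed, so the sketch assumes they are simple counts of the wine-wheel attributes present in each category. The attribute names, the mapping, and the function `merge_features` are hypothetical and are not taken from the Computational Wine Wheel.

```python
from collections import Counter

# Hypothetical mapping from normalized review attributes to a CATEGORY.
# The real Computational Wine Wheel has 985 attributes and 14 categories.
ATTRIBUTE_TO_CATEGORY = {
    "BLACKBERRY": "FRUITY",
    "CHERRY": "FRUITY",
    "VANILLA": "WOOD",
    "TOAST": "WOOD",
    "EARTH": "EARTHY",
}
ATTRIBUTES = sorted(ATTRIBUTE_TO_CATEGORY)                 # fixed column order
CATEGORIES = sorted(set(ATTRIBUTE_TO_CATEGORY.values()))   # fixed column order


def merge_features(present):
    """Return binary attribute flags followed by per-category counts.

    `present` is the set of attributes extracted from one review.
    Assumption: each continuous attribute is the number of binary
    attributes from that category found in the review.
    """
    binary = [1 if a in present else 0 for a in ATTRIBUTES]
    counts = Counter(
        ATTRIBUTE_TO_CATEGORY[a] for a in present if a in ATTRIBUTE_TO_CATEGORY
    )
    continuous = [counts[c] for c in CATEGORIES]
    return binary + continuous


row = merge_features({"BLACKBERRY", "CHERRY", "VANILLA"})
print(row)
```

A vector built this way could then be passed to any classifier that accepts mixed binary and numeric columns; in practice the count columns may need scaling before use with an SVM.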


Federated Learning in a Medical Context: A Systematic Literature Review

ACM Transactions on Internet Technology ◽

10.1145/3412357 ◽

2021 ◽

Vol 21 (2) ◽

pp. 1-31

Author(s):

Bjarne Pfitzner ◽

Nico Steckhan ◽

Bert Arnrich

Keyword(s):

Machine Learning ◽

Literature Review ◽

Systematic Literature Review ◽

Data Privacy ◽

Research Area ◽

Learning Models ◽

Related Data ◽

Private Data ◽

Large Databases ◽

Machine Learning Models

Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients’ anonymity. However, data is required for research and training machine learning models that could help gain insight into complex correlations or personalised treatments that may otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. So it would be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on the sharing of machine learning models, instead of the raw data itself. That means private data never leaves the site or device it was collected on. Federated learning is an emerging research area, and many domains have been identified for the application of those methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability for confidential healthcare datasets.
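The idea of sharing models instead of raw data can be sketched as follows. This is a simplified illustration of one common aggregation scheme, sample-weighted averaging of model parameters, and is not taken from the review itself. The function name `federated_average` and the example numbers are hypothetical.

```python
def federated_average(site_updates):
    """Combine locally trained parameters without seeing any raw data.

    `site_updates` is a list of (weights, n_samples) pairs, one per site.
    Each site trains on its own private data and reports only its model
    weights and how many samples it used. The result is the average of
    the weights, with each site weighted by its sample count.
    """
    total = sum(n for _, n in site_updates)
    dim = len(site_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in site_updates) / total
        for i in range(dim)
    ]


# Two hypothetical hospitals report two-parameter models.
global_weights = federated_average([
    ([1.0, 2.0], 1),   # small site: 1 sample
    ([3.0, 4.0], 3),   # larger site: 3 samples
])
print(global_weights)
```

In a full system the averaged weights would be sent back to the sites for another round of local training; that loop is omitted here.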


Data Science & Engineering into Food Science: A novel Big Data Platform for Low Molecular Weight Gelators’ Behavioral Analysis

Journal of Computer Science and Technology ◽

10.24215/16666038.20.e08 ◽

2020 ◽

Vol 20 (2) ◽

pp. e08

Author(s):

Verónica Cuello ◽

Gonzalo Zarza ◽

Maria Corradini ◽

Michael Rogers

Keyword(s):

Molecular Weight ◽

Big Data ◽

Data Science ◽

Data Entry ◽

Science Research ◽

Food Science ◽

Redundant Data ◽

Data Platform ◽

Assembly Behavior ◽

Low Molecular Weight Gelators

The objective of this article is to introduce a comprehensive end-to-end solution aimed at enabling the application of state-of-the-art Data Science and Analytic methodologies to a food science related problem. The problem refers to the automation of load, homogenization, complex processing and real-time accessibility to low molecular-weight gelators (LMWGs) data to gain insights into their assembly behavior, i.e. whether a gel can be mixed with an appropriate solvent or not. Most of the work within the field of Colloidal and Food Science in relation to LMWGs has centered on identifying adequate solvents that can generate stable gels and evaluating how the LMWG characteristics can affect gelation. As a result, extensive databases have been methodically and manually registered, storing results from different laboratory experiments. The complexity of those databases, and the errors caused by manual data entry, can interfere with the analysis and visualization of relations and patterns, limiting the utility of the experimental work. For these reasons, we have proposed a scalable and flexible Big Data solution to enable the unification, homogenization and availability of the data through the application of tools and methodologies. This approach helps optimize data acquisition during LMWG research and reduce redundant data processing and analysis, while also enabling researchers to explore a wider range of testing conditions and push forward the frontier in Food Science research.


Data Wrangling using Python

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1427.0982s1119 ◽

2019 ◽

Vol 8 (2S11) ◽

pp. 3491-3495

Keyword(s):

Data Analytics ◽

Data Science ◽

Research Work ◽

Research Area ◽

Mapping Data ◽

Data Format ◽

Data Analyst ◽

Data Engineering ◽

Data Scientist ◽

Term Data

The term Data Engineering has not become as popular as Data Science or Data Analytics, mainly because the importance of this concept is usually experienced only while working with data as a Data Scientist or Data Analyst. Though the author is neither of these, as an academician with an urge to learn, the topic ‘Data Engineering’ and one of its major sub-topics, ‘Data Wrangling’, drew attention while working with Python, and this paper is a small step toward explaining the experience of handling data with the wrangling concept in Python. Data Wrangling, earlier referred to as Data Munging (when done by hand), is the method of transforming and mapping data from one available data format into another format to make it more appropriate and valuable for a variety of related purposes such as analytics. Data wrangling is the modern name used for data pre-processing, rather than Munging. The Python library used for the research work shown here is Pandas. Though the major research area is ‘Application of Data Analytics on Academic Data using Python’, this paper focuses on a small preliminary topic of that research work, namely Data Wrangling using Python (Pandas library).


The R package “eseis” – a software toolbox for environmental seismology

Earth Surface Dynamics ◽

10.5194/esurf-6-669-2018 ◽

2018 ◽

Vol 6 (3) ◽

pp. 669-686 ◽

Cited By ~ 8

Author(s):

Michael Dietze

Keyword(s):

Data Science ◽

Earth Science ◽

Science Research ◽

Worked Examples ◽

R Package ◽

Research Field ◽

Data Sets ◽

Research Fields ◽

Scientific Disciplines ◽

Analysis Environment

Abstract. Environmental seismology is the study of the seismic signals emitted by Earth surface processes. This emerging research field is at the intersection of seismology, geomorphology, hydrology, meteorology, and further Earth science disciplines. It amalgamates a wide variety of methods from across these disciplines and ultimately fuses them in a common analysis environment. This overarching scope of environmental seismology requires a coherent yet integrative software which is accepted by many of the involved scientific disciplines. The statistical software R has gained paramount importance in the majority of data science research fields. R has well-justified advantages over other mostly commercial software, which makes it the ideal language to base a comprehensive analysis toolbox on. The article introduces the avenues and needs of environmental seismology, and how these are met by the R package eseis. The conceptual structure, example data sets, and available functions are demonstrated. Worked examples illustrate possible applications of the package and in-depth descriptions of the flexible use of the functions. The package has a registered DOI, is available under the GPL licence on the Comprehensive R Archive Network (CRAN), and is maintained on GitHub.


Lean Data Science Research Life Cycle: A Concept for Data Analysis Software Development

Communications in Computer and Information Science - Knowledge-Based Software Engineering ◽

10.1007/978-3-319-11854-3_61 ◽

2014 ◽

pp. 708-716 ◽

Cited By ~ 2

Author(s):

Maxim Shcherbakov ◽

Nataliya Shcherbakova ◽

Adriaan Brebels ◽

Timur Janovsky ◽

Valery Kamaev

Keyword(s):

Life Cycle ◽

Data Analysis ◽

Software Development ◽

Data Science ◽

Science Research ◽

Analysis Software ◽

Data Analysis Software


Intelligence in Web Technology

Handbook of Research on Computational Intelligence for Engineering, Science, and Business ◽

10.4018/978-1-4666-2518-1.ch029 ◽

2013 ◽

pp. 739-757

Author(s):

Sourav Maitra ◽

A. C. Mondal

Keyword(s):

Semantic Web ◽

Science Research ◽

Research Area ◽

Web Technology ◽

Clear Understanding ◽

Web Technologies ◽

Research Areas ◽

Computer Science Research ◽

E Learning ◽

New Research

End users now start their days with the Internet. Research on web technologies and intelligence has therefore become one of the fastest-growing needs in computer science research. Many other research areas, such as e-marketing, e-learning, e-governance, and search technologies, would benefit greatly if intelligence could be added to the Web. The objective of this chapter is to create a clear understanding of Web technology research and highlight the ways to implement the Semantic Web. The chapter also discusses the tools and technologies that can be applied to develop the Semantic Web. This new research area needs care because data are sometimes open. Thus, software engineering issues are also a focus.


Data Science Tools Application for Business Processes Modelling in Aviation

Advances in Computer and Electrical Engineering - Cases on Modern Computer Systems in Aviation ◽

10.4018/978-1-5225-7588-7.ch006 ◽

2019 ◽

pp. 176-190

Author(s):

Maryna Nehrey ◽

Taras Hnot

Keyword(s):

Decision Making ◽

Logistic Regression ◽

Unsupervised Learning ◽

Decision Trees ◽

Supervised Learning ◽

Data Science ◽

Business Processes ◽

Business Decision ◽

Logistic Regression Models ◽

Using Data

Successful business involves making decisions under uncertainty using a lot of information. Modern modeling approaches based on data science algorithms are a necessity for the effective management of business processes in aviation. Data science involves principles, processes, and techniques for understanding business processes through the analysis of data. The main goal of this chapter is to improve decision making using data science algorithms. The chapter describes several frequently used algorithms: linear and logistic regression models and decision trees as classical examples of supervised learning, and k-means and hierarchical clustering as examples of unsupervised learning. Applying data science algorithms makes possible a deep analysis and understanding of business processes in aviation, helps structure problems, and systematizes business processes. Business process modeling based on data science algorithms enables us to substantiate solutions and even automate business decision making.
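Of the algorithms listed in this abstract, k-means is the simplest to sketch. The following is a minimal standard-library illustration of the usual assign-then-update loop, not the chapter's own implementation. The function name `kmeans`, the sample points, and the starting centroids are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of floats with caller-supplied start centroids.

    Each iteration assigns every point to its nearest centroid (squared
    Euclidean distance) and then moves each centroid to the mean of its
    assigned points. An empty cluster keeps its previous centroid.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical two-feature observations forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Fixed starting centroids keep the sketch deterministic; real use would normally choose them randomly or with a seeding heuristic and stop once assignments no longer change.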


Data Science Tools Application for Business Processes Modelling in Aviation

Research Anthology on Reliability and Safety in Aviation Systems, Spacecraft, and Air Transport ◽

10.4018/978-1-7998-5357-2.ch024 ◽

2021 ◽

pp. 617-631

Author(s):

Maryna Nehrey ◽

Taras Hnot

Keyword(s):

Decision Making ◽

Logistic Regression ◽

Unsupervised Learning ◽

Decision Trees ◽

Supervised Learning ◽

Data Science ◽

Business Processes ◽

Business Decision ◽

Logistic Regression Models ◽

Using Data

Successful business involves making decisions under uncertainty using a lot of information. Modern modeling approaches based on data science algorithms are a necessity for the effective management of business processes in aviation. Data science involves principles, processes, and techniques for understanding business processes through the analysis of data. The main goal of this chapter is to improve decision making using data science algorithms. The chapter describes several frequently used algorithms: linear and logistic regression models and decision trees as classical examples of supervised learning, and k-means and hierarchical clustering as examples of unsupervised learning. Applying data science algorithms makes possible a deep analysis and understanding of business processes in aviation, helps structure problems, and systematizes business processes. Business process modeling based on data science algorithms enables us to substantiate solutions and even automate business decision making.


Supervised and Unsupervised Learning for Data Science

10.1007/978-3-030-22475-2 ◽

2020 ◽

Keyword(s):

Unsupervised Learning ◽

Data Science ◽

Supervised And Unsupervised Learning
