Clustering in Wineinformatics with Attribute Selection to Increase Uniqueness of Clusters

Fermentation ◽  
2021 ◽  
Vol 7 (1) ◽  
pp. 27
Author(s):  
Jared McCune ◽  
Alex Riley ◽  
Bernard Chen

Wineinformatics is a new data science research area that focuses on large amounts of wine-related data. Most current Wineinformatics research focuses on supervised learning to predict wine quality, price, region, and weather. In this research, unsupervised learning using K-means clustering with an optimal K search and a filtration process is studied on a Bordeaux-specific dataset to form clusters and find representative wines in each cluster. The 14,349 wines of the 21st century Bordeaux dataset are clustered into 43 and 13 clusters, with detailed analysis of the number of wines, dominant wine characteristics, average wine grades, and representative wines in each cluster. Similar results are also generated and presented for 435 elite wines (wines that scored 95 points or above on a 100-point scale). The information generated from this research can help wine vendors make a selection given the limited number of wines they can realistically offer, help connoisseurs study wines in a target region, vintage, or price range with a representative short list, and help wine consumers get recommendations. Future studies can adopt the same process to analyze and find representative wines in different winemaking regions or countries, vintages, or pivot points. This paper opens a new door for unsupervised learning research in Wineinformatics.
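The clustering idea in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the wine names and five binary attributes are made up, the farthest-point initialisation is chosen only to keep the run deterministic, the mean silhouette score stands in for the unspecified "optimal K search", the filtration process is omitted, and the "representative wine" is assumed to be the cluster member closest to its centroid.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(points, k):
    """Deterministic farthest-point initialisation (an assumption, not from the paper)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, max_iters=100):
    """Lloyd's K-means; returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each wine goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def silhouette(points, labels):
    """Mean silhouette score; singleton clusters contribute 0."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(sq_dist(p, q) ** 0.5)
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                  # mean distance inside own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())    # nearest other cluster
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)

def representatives(names, points, centroids, labels):
    """Pick, for each cluster, the member closest to the centroid."""
    reps = {}
    for c, centroid in enumerate(centroids):
        members = [(sq_dist(p, centroid), n)
                   for n, p, lab in zip(names, points, labels) if lab == c]
        if members:
            reps[c] = min(members)[1]
    return reps

# Hypothetical wines described by five made-up binary attributes.
wines = {
    "wine_a": [1, 1, 1, 0, 0],
    "wine_b": [1, 1, 0, 0, 0],
    "wine_c": [1, 0, 1, 0, 0],
    "wine_d": [0, 0, 0, 1, 1],
    "wine_e": [0, 0, 1, 1, 1],
    "wine_f": [0, 0, 0, 1, 0],
}
names, points = list(wines), list(wines.values())

# Search over candidate K values and keep the one with the best silhouette.
best_k, best_score = None, float("-inf")
for k in (2, 3):
    _, candidate_labels = kmeans(points, k)
    score = silhouette(points, candidate_labels)
    if score > best_score:
        best_k, best_score = k, score

centroids, labels = kmeans(points, best_k)
reps = representatives(names, points, centroids, labels)
print(best_k, reps)
```

On the real dataset the attribute vectors would come from the processed wine reviews, and a library implementation of K-means would be the practical choice at 14,349 wines.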

Beverages ◽  
2021 ◽  
Vol 7 (1) ◽  
pp. 3
Author(s):  
Zeqing Dong ◽  
Travis Atkison ◽  
Bernard Chen

Although wine has been produced for several thousands of years, the ancient beverage has remained popular and even more affordable in modern times. Among all wine making regions, Bordeaux, France is probably one of the most prestigious wine areas in history. Since hundreds of wines are produced from Bordeaux each year, humans are not likely to be able to examine all wines across multiple vintages to define the characteristics of outstanding 21st century Bordeaux wines. Wineinformatics is a newly proposed data science research area with an application domain in wine, processing large amounts of wine data by computer. The goal of this paper is to build a high-quality computational model on wine reviews processed by the full power of the Computational Wine Wheel to understand 21st century Bordeaux wines. On top of the 985 binary attributes generated from the Computational Wine Wheel in our previous research, we add attributes by using CATEGORY and SUBCATEGORY to obtain an additional 14 and 34 continuous attributes for the All Bordeaux (14,349 wines) and the 1855 Bordeaux (1359 wines) datasets. We believe that successfully merging the original binary attributes and the new continuous attributes can provide more insights for Naïve Bayes and Support Vector Machine (SVM) classifiers to build the model for wine grade category prediction. The experimental results suggest that, for the All Bordeaux dataset, with the additional 14 attributes retrieved from CATEGORY, the Naïve Bayes classification algorithm was able to outperform the existing research results by increasing accuracy by 2.15%, precision by 8.72%, and the F-score by 1.48%. For the 1855 Bordeaux dataset, with the additional attributes retrieved from CATEGORY and SUBCATEGORY, the SVM classification algorithm was able to outperform the existing research results by increasing accuracy by 5%, precision by 2.85%, recall by 5.56%, and the F-score by 4.07%. The improvements demonstrated in the research show that attributes retrieved from CATEGORY and SUBCATEGORY have the power to provide more information to classifiers for superior model generation. The model built in this research can better distinguish outstanding and class 21st century Bordeaux wines. This paper provides new directions in Wineinformatics for technical research in data science, such as regression, multi-target, classification, and domain-specific research, including wine region terroir analysis, wine quality prediction, and weather impact examination.
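For readers unfamiliar with the four reported measures, the following sketch shows how accuracy, precision, recall, and the F-score are computed for a two-class grade prediction. The class labels "90+" and "89-" and the toy predictions are assumptions made for illustration; they are not taken from the paper's data.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}

# Made-up ground truth and predictions for eight wines.
y_true = ["90+", "90+", "90+", "89-", "89-", "89-", "89-", "89-"]
y_pred = ["90+", "90+", "89-", "90+", "89-", "89-", "89-", "89-"]

metrics = binary_metrics(y_true, y_pred, positive="90+")
print(metrics)
```

Here two of the three "90+" wines are found (recall 2/3), two of the three "90+" predictions are correct (precision 2/3), and six of eight wines are classified correctly (accuracy 0.75).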


2021 ◽  
Vol 21 (2) ◽  
pp. 1-31
Author(s):  
Bjarne Pfitzner ◽  
Nico Steckhan ◽  
Bert Arnrich

Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients’ anonymity. However, data is required for research and training machine learning models that could help gain insight into complex correlations or personalised treatments that may otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. So it would be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on the sharing of machine learning models, instead of the raw data itself. That means private data never leaves the site or device it was collected on. Federated learning is an emerging research area, and many domains have been identified for the application of those methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability for confidential healthcare datasets.
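The model-sharing idea can be sketched in a few lines. The example below assumes a sample-weighted average of locally fitted parameters (in the style of federated averaging) as the aggregation rule; the review covers many schemes and this is only one of them. The one-parameter linear model, the site names, and the data are hypothetical.

```python
def local_fit(samples):
    """Fit y = w * x by least squares on one site's private data.

    Only the fitted weight and the sample count leave the site,
    never the raw (x, y) pairs.
    """
    sxy = sum(x * y for x, y in samples)
    sxx = sum(x * x for x, _ in samples)
    return [sxy / sxx], len(samples)

def federated_average(site_updates):
    """Combine site models, weighting each by its number of samples."""
    total = sum(n for _, n in site_updates)
    dim = len(site_updates[0][0])
    return [sum(w[i] * n for w, n in site_updates) / total for i in range(dim)]

# Hypothetical private datasets held at two sites.
site_a = [(1.0, 2.0), (2.0, 4.0)]               # local slope 2.0
site_b = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # local slope 3.0

updates = [local_fit(site_a), local_fit(site_b)]
global_model = federated_average(updates)
print(global_model)
```

In a real deployment this exchange would repeat over many rounds, with the server sending the averaged model back to the sites for further local training.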


2020 ◽  
Vol 20 (2) ◽  
pp. e08
Author(s):  
Verónica Cuello ◽  
Gonzalo Zarza ◽  
Maria Corradini ◽  
Michael Rogers

The objective of this article is to introduce a comprehensive end-to-end solution aimed at enabling the application of state-of-the-art Data Science and Analytic methodologies to a food science related problem. The problem refers to the automation of load, homogenization, complex processing and real-time accessibility to low molecular-weight gelators (LMWGs) data to gain insights into their assembly behavior, i.e. whether a gel can be mixed with an appropriate solvent or not. Most of the work within the field of Colloidal and Food Science in relation to LMWGs has centered on identifying adequate solvents that can generate stable gels and evaluating how the LMWG characteristics can affect gelation. As a result, extensive databases have been methodically and manually registered, storing results from different laboratory experiments. The complexity of those databases, and the errors caused by manual data entry, can interfere with the analysis and visualization of relations and patterns, limiting the utility of the experimental work. For these reasons, we have proposed a scalable and flexible Big Data solution to enable the unification, homogenization and availability of the data through the application of tools and methodologies. This approach helps optimize data acquisition during LMWG research and reduce redundant data processing and analysis, while also enabling researchers to explore a wider range of testing conditions and push forward the frontier in Food Science research.


2019 ◽  
Vol 8 (2S11) ◽  
pp. 3491-3495

The term Data Engineering has not become as popular as terms like Data Science or Data Analytics, mainly because the importance of the concept is usually appreciated only when working with data hands-on as a Data Scientist or Data Analyst. The author is neither of these but, as an academician with an urge to learn, was drawn to Data Engineering and to one of its major subtopics, Data Wrangling, while working with Python; this paper is a small step toward explaining the experience of handling data with wrangling concepts in Python. Data Wrangling, earlier referred to as Data Munging (when done by hand or manually), is the method of transforming and mapping data from one available format into another with the aim of making it more appropriate and valuable for a variety of related purposes such as analytics. Data wrangling is the modern name for data pre-processing, replacing Munging. The Python library used for the research work shown here is Pandas. Though the major research area is ‘Application of Data Analytics on Academic Data using Python’, this paper focuses on a small preliminary topic of that research work, namely Data wrangling using Python (Pandas Library).
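A typical wrangling pass of the kind this abstract describes might look like the sketch below. The column names and the four student records are invented for illustration and are not from the paper; only widely used Pandas calls are shown, and filling a missing mark with the column mean is just one possible choice.

```python
import pandas as pd

# Made-up raw academic records with inconsistent naming, casing,
# a duplicated row, numbers stored as text, and a missing mark.
raw = pd.DataFrame({
    "Student Name": [" asha ", "Ben", "Ben", "Chen"],
    "Marks": ["78", "91", "91", None],
    "Dept": ["cs", "CS", "CS", "math"],
})

tidy = (
    raw.rename(columns={"Student Name": "student", "Marks": "marks", "Dept": "dept"})
       .drop_duplicates()                      # remove the repeated "Ben" row
       .assign(
           student=lambda d: d["student"].str.strip().str.title(),
           dept=lambda d: d["dept"].str.upper(),
           marks=lambda d: pd.to_numeric(d["marks"], errors="coerce"),
       )
       .reset_index(drop=True)
)

# Fill the missing mark with the mean of the marks that are present.
tidy["marks"] = tidy["marks"].fillna(tidy["marks"].mean())

# The reshaped table is now ready for analytics, e.g. a per-department mean.
summary = tidy.groupby("dept")["marks"].mean()
print(tidy)
print(summary)
```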


2018 ◽  
Vol 6 (3) ◽  
pp. 669-686 ◽  
Author(s):  
Michael Dietze

Abstract. Environmental seismology is the study of the seismic signals emitted by Earth surface processes. This emerging research field is at the intersection of seismology, geomorphology, hydrology, meteorology, and further Earth science disciplines. It amalgamates a wide variety of methods from across these disciplines and ultimately fuses them in a common analysis environment. This overarching scope of environmental seismology requires coherent yet integrative software that is accepted by many of the involved scientific disciplines. The statistical software R has gained paramount importance in the majority of data science research fields. R has well-justified advantages over other, mostly commercial software, which makes it the ideal language on which to base a comprehensive analysis toolbox. The article introduces the avenues and needs of environmental seismology, and how these are met by the R package eseis. The conceptual structure, example data sets, and available functions are demonstrated. Worked examples illustrate possible applications of the package and give in-depth descriptions of the flexible use of the functions. The package has a registered DOI, is available under the GPL licence on the Comprehensive R Archive Network (CRAN), and is maintained on GitHub.


Author(s):  
Sourav Maitra ◽  
A. C. Mondal

End users now start their days with the Internet. As a result, research on web technologies and web intelligence has become one of the most pressing needs in computer science. Many other areas, such as e-marketing, e-learning, e-governance, and search technologies, would benefit greatly if intelligence could be added to the Web. The objective of this chapter is to create a clear understanding of Web technology research and highlight the ways to implement the Semantic Web. The chapter also discusses the tools and technologies that can be applied to develop the Semantic Web. This new research area needs care because the data are sometimes open. Thus, software engineering issues are also a focus.


Author(s):  
Maryna Nehrey ◽  
Taras Hnot

Successful business involves making decisions under uncertainty using a lot of information. Modern modeling approaches based on data science algorithms are a necessity for the effective management of business processes in aviation. Data science involves principles, processes, and techniques for understanding business processes through the analysis of data. The main goal of this chapter is to improve decision making using data science algorithms. The chapter describes sets of frequently used algorithms: linear and logistic regression models and decision trees as classical examples of supervised learning, and k-means and hierarchical clustering as examples of unsupervised learning. Applying data science algorithms allows deep analysis and understanding of business processes in aviation, helps structure problems, and systematizes business processes. Business process modeling based on data science algorithms enables us to substantiate solutions and even automate the processes of business decision making.


