Utilization of text mining as a big data analysis tool for food science and nutrition

Summary: The oil-and-gas industry is entering an era of “big data” because of the huge number of wells drilled during the rapid development of unconventional oil-and-gas reservoirs over the past decade. The massive amount of data generated gives the industry an opportunity to use data-analysis tools to make informed decisions. The main challenge is that effective and efficient data-analysis tools are not yet being applied to extract useful information for decision making from the enormous amount of data available. In developing tight shale reservoirs, it is critical to have an optimal drilling strategy that minimizes the risk of drilling in areas that would result in low-yield wells. The objective of this study is to develop an effective data-analysis tool capable of dealing with big and complicated data sets to identify hot zones in tight shale reservoirs with the potential to yield highly productive wells. The proposed tool is built on nonparametric smoothing models, which are superior to traditional multiple-linear-regression (MLR) models in both predictive power and the ability to deal with nonlinear, higher-order variable interactions. The tool handles one response variable and multiple predictor variables. To validate it, we used two real data sets: one with 249 tight oil horizontal wells from the Middle Bakken and the other with 2,064 shale gas horizontal wells from the Marcellus Shale. Results from the two case studies revealed that our tool not only achieves much better predictive power than the traditional MLR models in identifying hot zones in tight shale reservoirs but also provides guidance on developing optimal drilling and completion strategies (e.g., well length and depth, amount of proppant and water injected).
Comparing results from the two data sets, we found that with the big data set (2,064 Marcellus wells) and only four predictor variables, our tool achieves model performance similar to that with the small data set (249 Bakken wells) and six predictor variables. This implies that, for big data sets, the tool can still be very effective in identifying hot zones that would yield highly productive wells even when few predictor variables are available. The data sets available to this study contain very limited completion, geological, and petrophysical information. The results demonstrate that the data-analysis tool is powerful and flexible enough to take advantage of additional engineering and geology data, allowing operators to gain insight into the impact of these factors on well performance.
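The summary above contrasts nonparametric smoothing with multiple linear regression but does not say which smoother the authors used. As a rough illustration only, the sketch below assumes a Nadaraya-Watson kernel smoother with a Gaussian kernel, a single predictor, and synthetic noise-free data; the function names, the bandwidth, and the data are all hypothetical and are not taken from the study.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def kernel_smooth(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate at x0: a Gaussian-weighted average of ys."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def mse(predicted, actual):
    """Mean squared error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# Synthetic nonlinear response (an assumption for illustration, not well data).
xs = [i * 0.1 for i in range(63)]  # 0.0 to 6.2
ys = [math.sin(x) for x in xs]

intercept, slope = fit_line(xs, ys)
linear_pred = [intercept + slope * x for x in xs]
smooth_pred = [kernel_smooth(xs, ys, x, bandwidth=0.3) for x in xs]

# In-sample errors only; a real comparison would use held-out wells.
linear_mse = mse(linear_pred, ys)
smooth_mse = mse(smooth_pred, ys)
print(f"linear MSE = {linear_mse:.4f}, kernel MSE = {smooth_mse:.4f}")
```

On a curved response like this one, the straight-line fit cannot follow the shape while the local average can, which is the kind of gap the summary attributes to nonlinear effects. The study's tool handles several predictors at once, which this one-variable sketch does not attempt.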

A Study on the Trends of Cosmetics through Big Data Analysis - Focusing on text mining and semantic network analysis -

Journal of The Korean Society of Illustration Research ◽

10.37379/jksir.2021.66.8 ◽

2021 ◽

Vol 66 ◽

pp. 85-95

Author(s):

Hee Suk Lim ◽

Jae Wook Shin

Keyword(s):

Big Data ◽

Data Analysis ◽

Network Analysis ◽

Text Mining ◽

Semantic Network ◽

Big Data Analysis ◽

Semantic Network Analysis

Big Data Analysis for Dance Studies Using Text Mining

The Journal of Dance Society for Documentation & History ◽

10.26861/sddh.2016.42.191 ◽

2016 ◽

Vol 42 ◽

pp. 191-212

Author(s):

Jungmin Lee ◽

Eunja Jun ◽

Jungmin Chae

Keyword(s):

Big Data ◽

Data Analysis ◽

Text Mining ◽

Big Data Analysis ◽

Dance Studies

A study on social big data analysis using text clustering

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.12.11023 ◽

2018 ◽

Vol 7 (2.12) ◽

pp. 1

Author(s):

Jin Hee Ku ◽

Yoon Su Jeong

Keyword(s):

Big Data ◽

Data Analysis ◽

Text Mining ◽

Word Association ◽

Text Clustering ◽

Big Data Analysis ◽

Subject Analysis ◽

Clustering Model ◽

Social Big Data ◽

Cluster Dendrogram

Background/Objectives: As big data is adopted in more fields, social big data analysis of social media is growing rapidly. This study proposes a method that applies text clustering to analyze, by related topic, texts extracted from social big data through text mining. Methods/Statistical analysis: R was used for data collection and analysis, and the social big data was collected from Twitter. Clustering models applicable to related-subject analysis of Twitter text were compared, one was selected, and text clustering was performed. Text clustering was carried out by generating a corpus, grouping similar entities from the term-document matrix, and removing sparse words; the result was analyzed through a cluster dendrogram. Findings: Text clustering eases the difficulty of analyzing word association and subject that arises with text mining methods such as word clouds. In particular, for related-topic analysis of social big data, the hierarchical clustering model based on cosine similarity was more suitable than the non-hierarchical model for identifying which terms in the tweets are associated with each other. In addition, the cluster dendrogram was found to be effective for analyzing text context because it repeatedly groups similar texts during visualization. Improvements/Applications: This study can be used to confirm the ideas and opinions of various participants using social big data, and to analyze more precisely the complex relationship between the prediction of social problems and the phenomenon.
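The pipeline described in the abstract (corpus, term-document matrix, sparse-term removal, hierarchical clustering on cosine similarity) can be sketched roughly as follows. The study used R; this is a plain Python stand-in under several assumptions: the mini-corpus is invented, sparse terms are defined by a hypothetical `min_docs` threshold, and average linkage is chosen because the abstract does not name a linkage rule.

```python
import math
from collections import Counter

# Hypothetical mini-corpus standing in for collected tweets.
docs = [
    "big data analysis with text mining",
    "text mining of social big data",
    "dance studies dance history",
    "dance history performance",
    "social data analysis on twitter",
]

def term_document_matrix(documents):
    """Return {term: [count in doc 0, count in doc 1, ...]}."""
    counts = [Counter(d.split()) for d in documents]
    vocab = sorted(set().union(*counts))
    return {term: [c[term] for c in counts] for term in vocab}

def remove_sparse_terms(tdm, min_docs):
    """Keep terms that occur in at least min_docs documents."""
    return {t: row for t, row in tdm.items()
            if sum(1 for v in row if v) >= min_docs}

def cosine_distance(a, b):
    """1 - cosine similarity; rows are nonzero after sparse-term removal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def hierarchical_clusters(tdm, n_clusters):
    """Average-linkage agglomerative clustering of terms."""
    clusters = [[term] for term in tdm]

    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(cosine_distance(tdm[a], tdm[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        candidates = [(i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))]
        # Merge the closest pair; a dendrogram would draw these merges in order.
        i, j = min(candidates, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

tdm = remove_sparse_terms(term_document_matrix(docs), min_docs=2)
clusters = hierarchical_clusters(tdm, n_clusters=2)
for cluster in clusters:
    print(cluster)
```

Stopping at two clusters corresponds to cutting the dendrogram at one height. On this toy corpus the dance-related terms never co-occur with the data-related terms, so they end up in separate groups.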
