Structure and Dynamics of Many-Particle Systems: Big Data Sets and Data Analysis

Author(s):  
Wolfram Schommers
2018 ◽  
Vol 20 (1) ◽  
Author(s):  
Tiko Iyamu

Background: Over the years, big data analytics has been carried out statically, in a programmed way, which does not allow for the translation of data sets from a subjective perspective. This approach limits understanding of why and how data sets manifest themselves in the various forms that they do. This has a negative impact on the accuracy, redundancy and usefulness of data sets, which in turn affects the value of operations and the competitive effectiveness of an organisation. Also, the current single approach lacks the detailed examination of data sets that big data deserve in order to improve purposefulness and usefulness. Objective: The purpose of this study was to propose a multilevel approach to big data analysis. This includes examining how a sociotechnical theory, the actor network theory (ANT), can be used complementarily with analytic tools for big data analysis. Method: In the study, qualitative methods were employed from an interpretivist perspective. Results: From the findings, a framework that offers big data analytics at two levels, micro- (strategic) and macro- (operational), was developed. Based on the framework, a model was developed that can be used to guide the analysis of heterogeneous data sets that exist within networks. Conclusion: The multilevel approach ensures a fully detailed analysis, which is intended to increase accuracy, reduce redundancy and put the manipulation and manifestation of data sets into perspective for improved organisational competitiveness.


SPE Journal ◽  
2017 ◽  
Vol 23 (03) ◽  
pp. 719-736 ◽  
Author(s):  
Quan Cai ◽  
Wei Yu ◽  
Hwa Chi Liang ◽  
Jenn-Tai Liang ◽  
Suojin Wang ◽  
...  

Summary: The oil-and-gas industry is entering an era of “big data” because of the huge number of wells drilled with the rapid development of unconventional oil-and-gas reservoirs during the past decade. The massive amount of data generated presents a great opportunity for the industry to use data-analysis tools to make informed decisions. The main challenge is the lack of effective and efficient data-analysis tools for analyzing the enormous amount of available data and extracting information useful for decision making. In developing tight shale reservoirs, it is critical to have an optimal drilling strategy that minimizes the risk of drilling in areas that would result in low-yield wells. The objective of this study is to develop an effective data-analysis tool capable of dealing with big and complicated data sets to identify hot zones in tight shale reservoirs with the potential to yield highly productive wells. The proposed tool is based on nonparametric smoothing models, which are superior to traditional multiple-linear-regression (MLR) models both in predictive power and in the ability to deal with nonlinear, higher-order variable interactions. This data-analysis tool can handle one response variable and multiple predictor variables. To validate our tool, we used two real data sets: one with 249 tight oil horizontal wells from the Middle Bakken and the other with 2,064 shale gas horizontal wells from the Marcellus Shale. Results from the two case studies revealed that our tool not only achieves much better predictive power than traditional MLR models in identifying hot zones in tight shale reservoirs but also provides guidance on developing optimal drilling and completion strategies (e.g., well length and depth, amount of proppant and water injected).
By comparing results from the two data sets, we found that, with the big data set (2,064 Marcellus wells) and only four predictor variables, our tool achieves model performance similar to that obtained with the small data set (249 Bakken wells) and six predictor variables. This implies that, for big data sets, our tool can still be very effective in identifying hot zones that would yield highly productive wells even when only a limited number of predictor variables is available. The data sets that we had access to in this study contain very limited completion, geological and petrophysical information. Results from this study demonstrate that the data-analysis tool is powerful and flexible enough to take advantage of any additional engineering and geology data, allowing operators to gain insight into the impact of these factors on well performance.
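
The contrast between nonparametric smoothing and MLR can be illustrated with a minimal sketch. The abstract does not say which smoothing model the authors use, so this sketch makes several assumptions. It uses a Nadaraya-Watson kernel smoother with a Gaussian kernel as a representative nonparametric smoother. It uses a single predictor for brevity, whereas the paper's tool handles several. It runs on synthetic data with a quadratic response rather than well data. The names `fit_linear`, `fit_kernel_smoother` and `bandwidth` are hypothetical.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least-squares line y = a + b*x (a one-predictor stand-in for MLR)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_kernel_smoother(xs, ys, bandwidth):
    """Nadaraya-Watson estimator: a Gaussian-kernel weighted average of the observed ys."""
    def predict(x):
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
        return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)
    return predict

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic, made-up data with a nonlinear (quadratic) response; not well data.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]

linear = fit_linear(xs, ys)
smooth = fit_kernel_smoother(xs, ys, bandwidth=0.3)
print(f"linear MSE:   {mse(linear, xs, ys):.3f}")
print(f"smoother MSE: {mse(smooth, xs, ys):.3f}")
```

The straight line cannot bend to follow the curvature, so its error stays near the variance of the response. The smoother adapts locally, and its main error comes from boundary bias at the ends of the range. The bandwidth trades bias against variance and would normally be tuned, for example by cross-validation.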


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yixue Zhu ◽  
Boyue Chai

With the development of increasingly advanced information and electronic technology, especially physical information systems, cloud computing systems and social services, big data will become widely visible, creating benefits for people while also posing huge challenges. In addition, with the advent of the big data era, data sets keep growing in scale. Traditional data analysis methods can no longer handle large-scale data sets. Digging out the information hidden behind big data, especially in the field of e-commerce, has become a key factor in competition among enterprises. We use a support vector machine method based on parallel computing to analyze the data. First, the training samples are divided into several working subsets through the self-organizing map (SOM) neural network classification method. Each working subset is then trained, and the training results of the working subsets are finally merged, so that massive data prediction and analysis problems can be handled quickly. This paper argues that big data offers flexible expansion and a quality assessment system, so it is meaningful to replace the double-sidedness of quality assessment with big data. Finally, considering the excellent performance of parallel support vector machines in data mining and analysis, we apply this method to the big data analysis of e-commerce. The research results show that parallel support vector machines can solve the problem of processing large-scale data sets. When dirty data problems emerge, the effective rate is increased by at least 70%.
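
The split, train and merge pattern of a parallel SVM can be sketched in Python under stated assumptions. None of the following choices is confirmed as the authors' method, and all function names are hypothetical:

- Simple round-robin splitting replaces the SOM partitioning.
- Each working subset trains a linear SVM with the Pegasos stochastic subgradient method.
- The subset models are merged by averaging their weight vectors (a parameter-mixing scheme).

Threads are used here only to show the structure. In CPython, real CPU parallelism would need processes or a cluster framework.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def pegasos(samples, lam=0.01, iterations=5000, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) with the Pegasos subgradient method.

    samples: list of (features, label) pairs with label in {-1, +1}. A constant 1.0
    feature is appended so that the (regularized) bias is learned as an ordinary weight.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(samples[0][0]) + 1)
    for t in range(1, iterations + 1):
        x, y = samples[rng.randrange(len(samples))]
        x = list(x) + [1.0]
        eta = 1.0 / (lam * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))  # uses w before the update
        w = [(1 - eta * lam) * wi for wi in w]  # step on the L2 term
        if margin < 1:  # hinge loss is active: move toward the sample
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def parallel_svm(samples, n_subsets=4):
    """Split into working subsets, train each one concurrently, then merge the models.

    Assumptions: round-robin splitting (not SOM) and merging by averaging weight vectors.
    """
    subsets = [samples[k::n_subsets] for k in range(n_subsets)]
    with ThreadPoolExecutor(max_workers=n_subsets) as pool:
        models = list(pool.map(lambda sub, k: pegasos(sub, seed=k), subsets, range(n_subsets)))
    return [sum(ws) / len(models) for ws in zip(*models)]

def predict(w, x):
    """Sign of the linear score, with the constant bias feature appended."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Made-up, linearly separable toy data: label is the sign of x0 + x1.
rng = random.Random(42)
data = []
while len(data) < 400:
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    s = x[0] + x[1]
    if abs(s) > 0.2:  # keep a margin so the toy problem is cleanly separable
        data.append((x, 1 if s > 0 else -1))

w = parallel_svm(data)
accuracy = sum(predict(w, x) == y for x, y in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

Averaging weights is one of the simplest merge rules. Cascade-style schemes instead merge the support vectors of the subsets and retrain, which is usually more accurate but costlier.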


2021 ◽  
Vol 2 (4) ◽  
pp. 1-22
Author(s):  
Jing Rui Chen ◽  
P. S. Joseph Ng

Griffith AI&BD is a technology company that uses a big data platform and artificial intelligence technology to produce products for schools. The company focuses on education support and data analysis assistance systems for primary and secondary schools, and on campus artificial intelligence products for the compulsory education stage in the Chinese market. Using big data, machine learning and data mining, its systems are distributed across campuses. They enable anyone to sign up to join a large data-processing grid and to access big-data-driven learning support and matching. This support helps students broaden their knowledge across a variety of disciplines and progress in their learning. The company aims to improve the learning process on the basis of large student data sets and to combine AI technology with the development of AI electronic devices, providing schools with the best learning experience to survive in a competitive world.


Author(s):  
A. Sheik Abdullah ◽  
R. Suganya ◽  
S. Selvakumar ◽  
S. Rajaram

Classification is considered one of the data analysis techniques that can be used across many applications. A classification model predicts categorical (discrete) class labels. Clustering mainly deals with grouping variables based on similar characteristics. Classification models are evaluated by comparing the predicted values with the known target values in a set of test data. Data classification has many applications in business modeling, marketing analysis, credit risk analysis, biomedical engineering and drug response modeling. Extending data analysis and classification to big data provides insight through the exploration, processing and management of large data sets. This chapter deals with various techniques and methodologies for the classification problem in the data analysis process and their methodological impact on big data.
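
Evaluating a classifier by comparing predicted labels with the known target labels of test data can be sketched as follows. The k-nearest-neighbour classifier, the toy data and the function names are illustrative assumptions, not techniques taken from the chapter.

```python
import math
from collections import Counter

def knn_predict(train, point, k=3):
    """Predict a categorical label by majority vote among the k nearest training points."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def evaluate(train, test, k=3):
    """Compare predicted labels with the known target labels of a held-out test set."""
    confusion = Counter()  # keys are (true_label, predicted_label)
    for features, true_label in test:
        confusion[(true_label, knn_predict(train, features, k))] += 1
    correct = sum(n for (true, pred), n in confusion.items() if true == pred)
    return correct / len(test), confusion

# Made-up toy data: two well-separated groups labelled "a" and "b".
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]
# The third test point is labelled "b" but lies inside group "a",
# so the classifier disagrees with it (as with a mislabelled record).
test = [((1.1, 1.0), "a"), ((5.1, 5.0), "b"), ((1.0, 1.2), "b"), ((4.9, 4.8), "b")]

accuracy, confusion = evaluate(train, test)
print(accuracy)  # 0.75
print(dict(confusion))
```

The confusion counts show more than accuracy alone. Here the single error is a "b" case predicted as "a", which matters when the costs of the two kinds of mistake differ (e.g. in credit risk analysis).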


Author(s):  
Son Nguyen ◽  
Anthony Park

This chapter compares the performance of multiple Big Data techniques for time series forecasting with that of traditional time series models on three Big Data sets. The traditional time series models, Autoregressive Integrated Moving Average (ARIMA) and exponential smoothing models, are used as baselines against machine learning methods for Big Data analysis. These Big Data techniques include regression trees, Support Vector Machines (SVM), Multilayer Perceptrons (MLP), Recurrent Neural Networks (RNN) and long short-term memory neural networks (LSTM). Across the three time series data sets used (unemployment rate, bike rentals and transportation), this study finds that LSTM neural networks performed best. The study concludes that Big Data machine learning algorithms applied to time series can outperform traditional time series models. The computations in this work are done in Python, one of the most popular open-source platforms for data science and Big Data analysis.
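
Machine learning models such as regression trees, SVMs and MLPs are commonly applied to forecasting by reframing the series as a supervised problem with lagged values as inputs. Below is a minimal sketch of that framing next to a simple exponential smoothing baseline. Several details are assumptions, not taken from the chapter: the function names, the use of simple (non-trend, non-seasonal) exponential smoothing, the naive last-value forecast and the mean absolute error metric.

```python
def simple_exponential_smoothing(series, alpha):
    """Smooth the whole series and return the final level as the one-step-ahead forecast."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def make_lagged(series, n_lags):
    """Supervised framing: each row of X holds n_lags past values; y holds the next value."""
    X = [series[i:i + n_lags] for i in range(len(series) - n_lags)]
    y = series[n_lags:]
    return X, y

def mean_absolute_error(actual, predicted):
    """Average absolute forecast error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

print(simple_exponential_smoothing([10.0, 12.0, 14.0], alpha=0.5))  # 12.5

X, y = make_lagged([1, 2, 3, 4, 5], n_lags=2)
print(X)  # [[1, 2], [2, 3], [3, 4]]
print(y)  # [3, 4, 5]

# Naive baseline on the lagged framing: predict the most recent lag.
print(mean_absolute_error(y, [row[-1] for row in X]))  # 1.0
```

Any regressor that accepts a feature matrix can then be fitted on `X` and `y`. Recurrent models such as LSTMs consume the same windows as sequences rather than flat feature rows.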


Author(s):  
Arpit Kumar Sharma ◽  
Arvind Dhaka ◽  
Amita Nandal ◽  
Kumar Swastik ◽  
Sunita Kumari

The meaning of the term “big data” can be inferred from the name itself (i.e., collections of large structured or unstructured data sets). In addition to their huge quantity, these data sets are so complex that they cannot be analyzed using conventional data handling software and hardware tools. If processed judiciously, big data can be a huge advantage for the industries using it. Because of this usefulness, studies are being conducted to create methods for handling big data. Knowledge extraction from big data is very important; without it, there is no purpose in accumulating such volumes of data. Cloud computing is a powerful tool that provides a platform for the storage and computation of massive amounts of data.


2022 ◽  
pp. 590-621
Author(s):  
Obinna Chimaobi Okechukwu

In this chapter, a discussion is presented on the latest tools and techniques available for Big Data visualization. These tools, techniques and methods need to be understood properly to analyze Big Data. Big Data is a whole new paradigm in which huge sets of data are generated and analyzed, characterized by volume, velocity and variety. Conventional data analysis methods are incapable of processing data of this size; hence, it is fundamentally important to be familiar with new tools and techniques capable of processing these data sets. This chapter illustrates tools available for analysts to process and present Big Data sets in ways that support appropriate decisions. Some of these tools (e.g., Tableau, RapidMiner, RStudio) have phenomenal capabilities to visualize processed data in ways traditional tools cannot. The chapter also explains the differences between these tools and their utilities based on scenarios.


Author(s):  
Miguel Figueres-Esteban ◽  
Peter Hughes ◽  
Coen van Gulijk

In the big data era, large and complex data sets will exceed scientists’ capacity to make sense of them in the traditional way. New approaches in data analysis, supported by computer science, will be necessary to address the problems that emerge with the rise of big data. The analysis of the Close Call database, which is a text-based database for near-miss reporting on the GB railways, provides a test case. The traditional analysis of Close Calls is time consuming and prone to differences in interpretation. This paper investigates the use of visual analytics techniques, based on network text analysis, to conduct data analysis and extract safety knowledge from 500 randomly selected Close Call records relating to worker slips, trips and falls. The results demonstrate a straightforward, yet effective, way to identify hazardous conditions without having to read each report individually. This opens up new ways to perform data analysis in safety science.
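
One simple form of network text analysis builds a word co-occurrence graph in which terms appearing in the same report are linked. Below is a minimal sketch of that idea. It assumes whitespace tokenization, a tiny hypothetical stop-word list and weighted degree as the centrality measure. The records are invented, not drawn from the Close Call database, and the paper's actual pipeline may differ.

```python
from collections import Counter
from itertools import combinations

STOPWORDS = {"on", "over", "the", "a", "of", "and"}  # illustrative list only

def cooccurrence_network(records):
    """Weighted co-occurrence graph: an edge links two terms found in the same record."""
    edges = Counter()  # keys are alphabetically ordered term pairs
    for text in records:
        terms = sorted({w for w in text.lower().split() if w not in STOPWORDS})
        for pair in combinations(terms, 2):
            edges[pair] += 1
    return edges

def weighted_degree(edges):
    """Sum of edge weights per term; high values flag terms central to many reports."""
    degree = Counter()
    for (u, v), weight in edges.items():
        degree[u] += weight
        degree[v] += weight
    return degree

# Invented example records, not taken from the Close Call database.
records = [
    "slipped on wet platform",
    "tripped over cable on platform",
    "worker slipped on wet ramp",
]
edges = cooccurrence_network(records)
print(edges[("slipped", "wet")])  # 2
print(weighted_degree(edges).most_common(3))
```

In a visual-analytics setting, the weighted graph would be drawn with node size or edge thickness tied to these weights. Clusters such as "slipped", "wet" and "platform" then point to recurring hazardous conditions without reading each report.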


Author(s):  
Richard Earl

Topology remains a large, active research area in mathematics. Unsurprisingly, its character has changed over the last century: there is considerably less current interest in general topology, but whole new areas have emerged, such as topological data analysis, which helps analyze big data sets. The Epilogue concludes that the interfaces of topology with other areas have remained rich and numerous, and it can be hard to tell where topology stops and geometry, algebra, analysis or physics begin. Often that richness comes from studying structures that have interconnected flavours of algebra, geometry and topology. Sometimes, though, a result of a seemingly entirely algebraic nature, say, can be proved by purely topological means.
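
As a small taste of topological data analysis, the sketch below computes 0-dimensional persistent homology: the scales at which connected components of a point cloud merge as a distance threshold grows. It is a standard union-find construction written for illustration, equivalent to single-linkage clustering, and not an example from the Epilogue. The function name is hypothetical.

```python
import math
from itertools import combinations

def zero_dim_persistence(points):
    """Death scales of connected components (H0 persistence) in a Vietoris-Rips filtration.

    Every point is born at scale 0. When a growing distance threshold joins two
    components, one of them dies at that scale. One component never dies (not listed).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components
            parent[ri] = rj
            deaths.append(d)
    return deaths

# Two made-up clusters on a line: {0, 1} and {5, 6}.
print(zero_dim_persistence([(0.0,), (1.0,), (5.0,), (6.0,)]))  # [1.0, 1.0, 4.0]
```

The short bars (dying at 1.0) are noise-scale merges within each cluster. The long bar (dying at 4.0) signals two well-separated clusters, which is the kind of shape information that topological data analysis extracts from big data sets.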

