Computational science: shifting the focus from tools to models

F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 101 ◽  
Author(s):  
Konrad Hinsen

Computational techniques have revolutionized many aspects of scientific research over the last few decades. Experimentalists use computation for data analysis, processing ever bigger data sets. Theoreticians compute predictions from ever more complex models. However, traditional articles do not permit the publication of big data sets or complex models. As a consequence, these crucial pieces of information no longer enter the scientific record. Moreover, they have become prisoners of scientific software: many models exist only as software implementations, and the data are often stored in proprietary formats defined by the software. In this article, I argue that this emphasis on software tools over models and data is detrimental to science in the long term, and I propose a means by which this can be reversed.

2018 ◽  
Vol 20 (1) ◽  
Author(s):  
Tiko Iyamu

Background: Over the years, big data analytics has been statically carried out in a programmed way, which does not allow for translation of data sets from a subjective perspective. This approach affects an understanding of why and how data sets manifest themselves into various forms in the way that they do. This has a negative impact on the accuracy, redundancy and usefulness of data sets, which in turn affects the value of operations and the competitive effectiveness of an organisation. Also, the current single approach lacks a detailed examination of data sets, which big data deserve in order to improve purposefulness and usefulness. Objective: The purpose of this study was to propose a multilevel approach to big data analysis. This includes examining how a sociotechnical theory, the actor network theory (ANT), can be complementarily used with analytic tools for big data analysis. Method: In the study, the qualitative methods were employed from the interpretivist approach perspective. Results: From the findings, a framework that offers big data analytics at two levels, micro- (strategic) and macro- (operational) levels, was developed. Based on the framework, a model was developed, which can be used to guide the analysis of heterogeneous data sets that exist within networks. Conclusion: The multilevel approach ensures a fully detailed analysis, which is intended to increase accuracy, reduce redundancy and put the manipulation and manifestation of data sets into perspectives for improved organisations’ competitiveness.


2015 ◽  
Vol 87 (3) ◽  
pp. 815-835 ◽  
Author(s):  
Pavel Baltiiski ◽  
Ilia Iliev ◽  
Boian Kehaiov ◽  
Vladimir Poulkov ◽  
Todor Cooklev

SPE Journal ◽  
2017 ◽  
Vol 23 (03) ◽  
pp. 719-736 ◽  
Author(s):  
Quan Cai ◽  
Wei Yu ◽  
Hwa Chi Liang ◽  
Jenn-Tai Liang ◽  
Suojin Wang ◽  
...  

Summary: The oil-and-gas industry is entering an era of “big data” because of the huge number of wells drilled with the rapid development of unconventional oil-and-gas reservoirs during the past decade. The massive amount of data generated presents a great opportunity for the industry to use data-analysis tools to help make informed decisions. The main challenge is the lack of the application of effective and efficient data-analysis tools to analyze and extract useful information for the decision-making process from the enormous amount of data available. In developing tight shale reservoirs, it is critical to have an optimal drilling strategy, thereby minimizing the risk of drilling in areas that would result in low-yield wells. The objective of this study is to develop an effective data-analysis tool capable of dealing with big and complicated data sets to identify hot zones in tight shale reservoirs with the potential to yield highly productive wells. The proposed tool is developed on the basis of nonparametric smoothing models, which are superior to the traditional multiple-linear-regression (MLR) models in both the predictive power and the ability to deal with nonlinear, higher-order variable interactions. This data-analysis tool is capable of handling one response variable and multiple predictor variables. To validate our tool, we used two real data sets, one with 249 tight oil horizontal wells from the Middle Bakken and the other with 2,064 shale gas horizontal wells from the Marcellus Shale. Results from the two case studies revealed that our tool not only can achieve much better predictive power than the traditional MLR models in identifying hot zones in the tight shale reservoirs but also can provide guidance on developing the optimal drilling and completion strategies (e.g., well length and depth, amount of proppant and water injected).
By comparing results from the two data sets, we found that our tool can achieve model performance with the big data set (2,064 Marcellus wells) with only four predictor variables that is similar to that with the small data set (249 Bakken wells) with six predictor variables. This implies that, for big data sets, even with a limited number of available predictor variables, our tool can still be very effective in identifying hot zones that would yield highly productive wells. The data sets that we have access to in this study contain very limited completion, geological, and petrophysical information. Results from this study clearly demonstrated that the data-analysis tool is certainly powerful and flexible enough to take advantage of any additional engineering and geology data to allow the operators to gain insights on the impact of these factors on well performance.
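The contrast between linear regression and nonparametric smoothing drawn in this summary can be illustrated with a minimal sketch. The summary does not say which smoother the authors used, so the Nadaraya-Watson kernel estimator below is an assumption chosen for brevity. The data are synthetic (a sine curve), not well data, the example uses a single predictor rather than several, and all function names are hypothetical.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares with one predictor; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_kernel_smoother(xs, ys, bandwidth):
    """Nadaraya-Watson estimator with a Gaussian kernel (assumed smoother)."""
    def predict(x):
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
        return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    return predict

def mse(predict, xs, ys):
    """Mean squared prediction error on a held-out set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic nonlinear response: even grid points for training, odd for testing.
train_x = [i * 0.1 for i in range(0, 63, 2)]
train_y = [math.sin(x) for x in train_x]
test_x = [i * 0.1 for i in range(1, 62, 2)]
test_y = [math.sin(x) for x in test_x]

linear = fit_linear(train_x, train_y)
smoother = fit_kernel_smoother(train_x, train_y, bandwidth=0.3)

linear_mse = mse(linear, test_x, test_y)
smoother_mse = mse(smoother, test_x, test_y)
print(f"linear MSE: {linear_mse:.4f}, smoother MSE: {smoother_mse:.4f}")
```

On this nonlinear toy response the smoother tracks the curve while the straight line cannot. This illustrates only the general point about nonlinearity and does not reproduce the paper's results.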


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yixue Zhu ◽  
Boyue Chai

With the development of increasingly advanced information technology and electronic technology, especially with regard to physical information systems, cloud computing systems, and social services, big data is becoming widely visible, creating benefits for people while also posing huge challenges. In addition, with the advent of the era of big data, data sets are growing ever larger. Traditional data analysis methods can no longer handle large-scale data sets, and mining the hidden information behind big data, especially in the field of e-commerce, has become a key factor in competition among enterprises. We use a support vector machine method based on parallel computing to analyze the data. First, the training samples are divided into several working subsets through the self-organizing map (SOM) neural network classification method. The training results of each working set are then merged, so as to quickly deal with the problem of massive data prediction and analysis. This paper proposes that big data offers flexibility of expansion and a quality assessment system, so it is meaningful to replace the double-sidedness of quality assessment with big data. Finally, considering the excellent performance of parallel support vector machines in data mining and analysis, we apply this method to the big data analysis of e-commerce. The research results show that parallel support vector machines can solve the problem of processing large-scale data sets. With respect to dirty-data problems, the effective rate increased by at least 70%.
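The pipeline outlined in this abstract (split the training samples into working subsets, train on each subset, merge the results) can be sketched roughly as follows. Several parts are assumptions, not the authors' method: a round-robin split stands in for the SOM partitioning step, the per-subset learner is a linear SVM without a bias term trained by subgradient descent on the hinge loss, and the merge step simply averages the weight vectors. The data are synthetic and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def train_linear_svm(samples, epochs=100, eta=0.01, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss (no bias term)."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in samples:
            margin = y * (w[0] * x[0] + w[1] * x[1])
            # Shrink weights (regularisation), then step on margin violations.
            w = [wi * (1.0 - eta * lam) for wi in w]
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def merge_models(models):
    """Assumed merge rule: average the per-subset weight vectors."""
    return [sum(ws) / len(models) for ws in zip(*models)]

def predict(w, x):
    return 1 if w[0] * x[0] + w[1] * x[1] >= 0 else -1

# Synthetic, linearly separable data: two clusters mirrored about the origin.
offsets = [-0.5, 0.0, 0.5]
positives = [((2 + dx, 2 + dy), 1) for dx in offsets for dy in offsets]
negatives = [((-2 + dx, -2 + dy), -1) for dx in offsets for dy in offsets]
data = [sample for pair in zip(positives, negatives) for sample in pair]

# Round-robin split into working subsets (stand-in for the SOM step).
n_subsets = 3
subsets = [data[i::n_subsets] for i in range(n_subsets)]

# Train one model per working subset concurrently, then merge.
with ThreadPoolExecutor(max_workers=n_subsets) as pool:
    models = list(pool.map(train_linear_svm, subsets))
merged = merge_models(models)

accuracy = sum(predict(merged, x) == y for x, y in data) / len(data)
print(f"merged weights: {merged}, accuracy: {accuracy:.2f}")
```

Threads are used here only to keep the sketch short. For CPU-bound training on real data, separate processes or machines would be the more natural choice, and the merge rule would need to match whatever the per-subset learners actually produce.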


2021 ◽  
Vol 2 (4) ◽  
pp. 1-22
Author(s):  
Jing Rui Chen ◽  
P. S. Joseph Ng

Griffith AI&BD is a technology company that uses a big data platform and artificial intelligence technology to produce products for schools. The company focuses on primary and secondary school education support, data analysis assistance systems, and campus artificial intelligence products for the compulsory education stage in the Chinese market. Through big data, machine learning and data mining, systems scattered across campus and distributed systems enable anyone to sign up to join a large data processing grid and to access learning support, big data analysis and matching, helping students expand their knowledge in a variety of disciplines. The aim is to improve the learning process based on large data sets of students, to combine AI technology to develop AI electronic devices, and to provide schools with the best learning experience to survive in a competitive world.


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 71 ◽  
Author(s):  
Zeeshan Ahmed ◽  
Saman Zeeshan ◽  
Thomas Dandekar

Software design and sustainable software engineering are essential for the long-term development of bioinformatics software. Typical challenges in an academic environment are short-term contracts, island solutions, pragmatic approaches and loose documentation. Upcoming new challenges are big data, complex data sets, software compatibility and rapid changes in data representation. Our approach to cope with these challenges consists of iterative intertwined cycles of development (“Butterfly” paradigm) for key steps in scientific software engineering. User feedback is valued as well as software planning in a sustainable and interoperable way. Tool usage should be easy and intuitive. A middleware supports a user-friendly Graphical User Interface (GUI) as well as a database/tool development independently. We validated the approach of our own software development and compared the different design paradigms in various software solutions.


Author(s):  
A. Sheik Abdullah ◽  
R. Suganya ◽  
S. Selvakumar ◽  
S. Rajaram

Classification is one of the data analysis techniques that can be used across many applications. A classification model predicts categorical class labels. Clustering mainly deals with grouping variables based upon similar characteristics. Classification models are tested by comparing the predicted values with the known target values in a set of test data. Data classification has many applications in business modeling, marketing analysis, credit risk analysis, biomedical engineering and drug response modeling. Extending data analysis and classification provides insight into big data, with an exploration of processing and managing large data sets. This chapter deals with various techniques and methodologies that correspond to the classification problem in the data analysis process, and with their methodological impacts on big data.
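The evaluation step mentioned in this abstract, comparing predicted values with the known target values in a set of test data, can be sketched in a few lines. The labels and predictions below are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label matches the known target."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def confusion_counts(y_true, y_pred):
    """Count each (known label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))

# Hypothetical credit-risk test set: known targets and a model's predictions.
y_true = ["low", "low", "high", "high", "high", "low"]
y_pred = ["low", "high", "high", "high", "low", "low"]

acc = accuracy(y_true, y_pred)
counts = confusion_counts(y_true, y_pred)

print(f"accuracy: {acc:.3f}")
for (known, predicted), n in sorted(counts.items()):
    print(f"known={known:<4} predicted={predicted:<4} count={n}")
```

The pair counts are the cells of a confusion matrix. Looking at them alongside the overall accuracy shows which classes the model confuses, which a single accuracy figure hides.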

