A Review on Classification of Data Imbalance using BigData

2021 ◽  
Vol 13 (03) ◽  
pp. 09-22
Author(s):  
Ramasubramanian ◽  
Hariharan Shanmugasundaram

Classification is a data mining function that assigns items in a collection to target categories in order to support more accurate predictions and analysis. Classification with supervised learning aims to identify the class to which a new data item belongs. With the advancement of technology and the growth of real-time data generated by sources such as the Internet, IoT, and social media, processing has become more demanding and more challenging. One such challenge is data imbalance. In an imbalanced dataset, the majority classes dominate the minority classes, which biases machine learning classifiers towards the majority classes, and most classification algorithms then predict the majority class for all test data. In this paper, the authors analyze data imbalance models using big data and classification algorithms.
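The majority-class bias described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the 95/5 label split, the helper names, and the choice of random oversampling as a remedy are all assumptions made for illustration.

```python
import random
from collections import Counter


def recall(y_true, y_pred, positive):
    """Fraction of actual `positive` items that were predicted as `positive`."""
    actual = sum(1 for t in y_true if t == positive)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return hits / actual if actual else 0.0


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority items until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical imbalanced dataset: 95 majority items (label 0), 5 minority items (label 1).
labels = [0] * 95 + [1] * 5
samples = list(range(100))

# A degenerate classifier that always predicts the majority class.
majority_label = Counter(labels).most_common(1)[0][0]
predictions = [majority_label] * len(labels)

accuracy = sum(t == p for t, p in zip(labels, predictions)) / len(labels)
minority_recall = recall(labels, predictions, positive=1)

# Rebalance the training data before fitting a real classifier.
balanced_samples, balanced_labels = random_oversample(samples, labels)
balanced_counts = Counter(balanced_labels)
```

Here the always-majority classifier reaches high accuracy while never detecting a single minority item, which is why accuracy alone is a poor measure on imbalanced data.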

Author(s):  
Atheer Alahmed ◽  
Amal Alrasheedi ◽  
Maha Alharbi ◽  
Norah Alrebdi ◽  
Marwan Aleasa ◽  
...  

2021 ◽  
Author(s):  
Nagaraju Reddicharla ◽  
Subba Ramarao Rachapudi ◽  
Indra Utama ◽  
Furqan Ahmed Khan ◽  
Prabhker Reddy Vanam ◽  
...  

Abstract Well testing is a vital process in reservoir performance monitoring. As a field matures and the well stock grows, testing becomes a tedious job in terms of resources (MPFM and test separators), and this affects production quota delivery. In addition, test data validation and approval follow a business process that needs up to 10 days to accept or reject a well test. Almost 10,000 well tests were conducted, and statistically around 10 to 15% of them were rejected per year. The objective of the paper is to develop a methodology that reduces well test rejections and raises a timely flag for operator intervention to recommence the well test. The case study was applied in a mature field that has been producing for 40 years and has a good volume of historical well test data available. This paper discusses the development of a data-driven well test data analyzer and optimizer, supported by artificial intelligence (AI), for wells being tested using MPFM, in a two-stage approach. The motivating idea is to ingest historical data, real-time data, and the well model performance curve, and to prescribe the quality of the well test data so that a flag can be provided to the operator in real time. The ML prediction results help testing operations and can reduce the test acceptance turnaround time drastically, from 10 days to hours. In the second layer, an unsupervised model built on historical data helps identify the parameters that contribute to rejection of a well test, for example duration of testing, choke size, and GOR. The outcome of the modeling will be incorporated into updates of the well test procedure and testing philosophy. This approach is under evaluation in one of the assets of ADNOC Onshore. The results are expected to reduce well test rejections by at least 5%, which would further optimize the resources required and improve the back allocation process.
Furthermore, real-time flagging of test quality will help reduce the validation cycle from 10 days to hours, improving the well testing cycle. This methodology improves integrated reservoir management compliance with well testing requirements in assets where resources are limited. It is envisioned to be integrated with a full-field digital oil field implementation. This is a novel application of machine learning and artificial intelligence to well testing. It maximizes the use of real-time data to create an advisory system that improves test data quality monitoring and timely decision-making to reduce well test rejections.


2014 ◽  
Vol 599-601 ◽  
pp. 1487-1490 ◽  
Author(s):  
Li Kun Zheng ◽  
Kun Feng ◽  
Xiao Qing Xiao ◽  
Wei Qiao Song

This paper discusses the application of massive real-time data mining technology to equipment safety state evaluation in power plants, and the realization of comprehensive quantitative equipment assessment and early warning of potential failures through mining, analysis, and modeling of massive amounts of real-time data from power equipment. In addition to introducing the foundational technology, the paper verifies it with an application case at the power supply side remote diagnosis center of the Guangdong electric institute.


2021 ◽  
Author(s):  
Marco Aceves-Fernandez

Abstract Dealing with electroencephalogram (EEG) signals is often not easy. The lack of predictability and the complexity of such non-stationary, noisy, and high-dimensional signals are challenging. Cross Recurrence Plots (CRP) have been used extensively to detect subtle changes in signals even when noise is embedded in the signal. In this contribution, a total of 121 children performed visual attention experiments, and a proposed methodology using CRP and a Welch Power Spectral Distribution was used to classify them into those who have ADHD and the control group. Additional tools are presented to determine to what extent the proposed methodology classifies accurately and avoids misclassifications, thus demonstrating that it is feasible for classifying EEG signals from subjects with ADHD. Lastly, the results were compared with a baseline machine learning method to show experimentally that the methodology is consistent and the results repeatable.
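As a rough illustration of the cross recurrence idea mentioned in this abstract, the sketch below builds a binary cross recurrence matrix for two scalar series. It is a simplification and not the author's pipeline: it assumes no time-delay embedding, uses absolute difference as the distance, and the threshold `eps` and the sample values are hypothetical.

```python
def cross_recurrence_matrix(x, y, eps):
    """Entry [i][j] is 1 when x[i] and y[j] lie within eps of each other, else 0."""
    return [[1 if abs(xi - yj) <= eps else 0 for yj in y] for xi in x]


def recurrence_rate(matrix):
    """Share of matrix entries equal to 1 (density of recurrence points)."""
    total = sum(len(row) for row in matrix)
    return sum(sum(row) for row in matrix) / total if total else 0.0


# Hypothetical short series standing in for two signal segments.
x = [0.0, 1.0, 2.0]
y = [0.1, 1.9]

crp = cross_recurrence_matrix(x, y, eps=0.2)
rate = recurrence_rate(crp)
```

Summary statistics such as the recurrence rate are one way to turn the plot into numeric features for a classifier; which features the paper actually uses is not stated in the abstract.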


Author(s):  
Sridharan Chandrasekaran ◽  
G. Suresh Kumar

Rate of Penetration (ROP) is one of the important factors influencing drilling efficiency. Since cost recovery is an important bottom line in the drilling industry, optimizing ROP is essential to minimize drilling operational and capital costs. Traditional empirical models do not adapt to new lithology changes, and hence their predictive accuracy is low and subjective. With advances in big data technologies, the cost of real-time data storage has fallen and the availability of real-time data has improved. This study shows that optimization methods together with data models have immense potential for predicting ROP from real-time measurements on the rig. A machine-learning-based data model is developed using the real-time operational parameters recorded while drilling offset vertical wells. Data pre-processing and feature engineering methods transform the raw data into processed data so that the model learns effectively from the inputs. A multilayer backpropagation neural network is developed, cross-validated, and compared with field measurements and empirical models.
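The pre-processing and cross-validation steps named in this abstract might look like the following minimal sketch. The abstract does not say which scaling method or fold scheme was used, so min-max scaling, contiguous folds, and the sample feature values are assumptions; the neural network itself is omitted.

```python
def min_max_fit(column):
    """Return the (min, max) of a feature column, learned from training data only."""
    return min(column), max(column)


def min_max_apply(column, lo, hi):
    """Scale values into [0, 1] using previously fitted bounds."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop


# Hypothetical values for one drilling feature (for example, a weight-on-bit reading).
feature = [10.0, 20.0, 30.0]
lo, hi = min_max_fit(feature)
scaled = min_max_apply(feature, lo, hi)

# Three folds over ten samples; a model would be trained and scored once per fold.
folds = list(k_fold_indices(10, 3))
```

Fitting the scaling bounds on the training indices of each fold, rather than on the full dataset, keeps information from the held-out fold from leaking into training.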

