The Art of Deploying Data Mining and Machine Learning in Developing and Managing Deepwater Turbidite Gas Assets

Mapping Intimacies ◽

10.2118/205652-ms ◽

2021 ◽

Author(s):

Edo Pratama

Keyword(s):

Machine Learning ◽

Data Mining ◽

Best Practices ◽

Field Data ◽

Query Language ◽

Field Performance ◽

Lessons Learned ◽

Support Vector ◽

Gas Field ◽

Gas Fields

Abstract Many oil and gas operators have challenges in deepwater turbidite gas asset's reservoir management plan (RMP) readiness due to lack of experience and very limited analog field data. The objective of this article is to demonstrate how data analytics workflow, comprising of data mining and machine learning-based global deepwater turbidite gas field benchmarking and lessons learned, to identify field performance and mitigate subsurface challenges in developing and managing deepwater turbidite gas assets. To mine turbidite field data from around the world, a customized R script was constructed using optical character recognition, regular expression (regex), rule-based logic to extract subsurface and surface data attributes from unstructured data sources. All extracted contents were transformed into a properly structured query language (SQL) database relational format for the cleansing process. Having established the turbidite assets repository, exploratory data analysis (EDA) was then employed to discover insight datasets. To analyze the field performance, the number of wells needed to deplete the field was identified using support vector regression, subsequently, K-means clustering was used to classify the reservoirs productivity. The results of field benchmarking analysis from EDA are deployed in a fit-for-purpose dashboard application, which provides an elegant and powerful framework for data management and analytics purposes. The analytic dashboard which was developed to visualize EDA findings will be presented in this article. The productivity of deepwater turbidite gas reservoirs has been classified based on the maximum gas flow rate and estimated ultimate recovery per well. This result help in identifying the high-rate, high-ultimate-recovery (HRHU) reservoirs of a deepwater turbidite gas field. The regex pattern for subsurface challenges specifically as related to reservoir uncertainties and associated risks, including operational challenges in developing and managing deepwater turbidite gas fields were identified through word cloud recognition. Key subsurface challenges were then categorized and statistically ranked, finally, a decomposition tree was used to identify the issues, impacts, and mitigation plan for dealing with identified risks based on best practices from a global project point of view. Deployment of this novel workflow provides insight for better decision-making and can be a prudent complementary tool for de-risking subsurface uncertainties in developing and managing deepwater turbidite gas assets. The findings from this study can be used to develop the framework that captures current best-practices in the formulation and execution of a RMP including monitoring and benchmark of asset performance in deepwater turbidite gas fields.

Download Full-text

Detection of FAKE NEWS on SOCIAL MEDIA using CLASSIFICATION Data Mining Techniques

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1637.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 3132-3138

Keyword(s):

Machine Learning ◽

Data Mining ◽

Social Media ◽

Information Exchange ◽

Learning Algorithm ◽

Daily Life ◽

Support Vector ◽

Machine Learning Algorithm ◽

Fake News ◽

Other Information

In today’s world social media is one of the most important tool for communication that helps people to interact with each other and share their thoughts, knowledge or any other information. Some of the most popular social media websites are Facebook, Twitter, Whatsapp and Wechat etc. Since, it has a large impact on people’s daily life it can be used a source for any fake or misinformation. So it is important that any information presented on social media should be evaluated for its genuineness and originality in terms of the probability of correctness and reliability to trust the information exchange. In this work we have identified the features that can be helpful in predicting whether a given Tweet is Rumor or Information. Two machine learning algorithm are executed using WEKA tool for the classification that is Decision Tree and Support Vector Machine.

Download Full-text

Integration of synthetic minority oversampling technique for imbalanced class

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v13.i1.pp102-108 ◽

2019 ◽

Vol 13 (1) ◽

pp. 102

Author(s):

Noviyanti Santoso ◽

Wahyu Wibowo ◽

Hilda Hikmawati

Keyword(s):

Machine Learning ◽

Data Mining ◽

Support Vector Machine ◽

Class Imbalance ◽

Original Data ◽

Support Vector ◽

Classification Methods ◽

Problematic Issue ◽

Imbalanced Class ◽

F Measure

In the data mining, a class imbalance is a problematic issue to look for the solutions. It probably because machine learning is constructed by using algorithms with assuming the number of instances in each balanced class, so when using a class imbalance, it is possible that the prediction results are not appropriate. They are solutions offered to solve class imbalance issues, including oversampling, undersampling, and synthetic minority oversampling technique (SMOTE). Both oversampling and undersampling have its disadvantages, so SMOTE is an alternative to overcome it. By integrating SMOTE in the data mining classification method such as Naive Bayes, Support Vector Machine (SVM), and Random Forest (RF) is expected to improve the performance of accuracy. In this research, it was found that the data of SMOTE gave better accuracy than the original data. In addition to the three classification methods used, RF gives the highest average AUC, F-measure, and G-means score.

Download Full-text

Investigating the Physics of Tokamak Global Stability with Interpretable Machine Learning Tools

Applied Sciences ◽

10.3390/app10196683 ◽

2020 ◽

Vol 10 (19) ◽

pp. 6683

Author(s):

Andrea Murari ◽

Emmanuele Peluso ◽

Michele Lungaroni ◽

Riccardo Rossi ◽

Michela Gelfusa ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Independent Learning ◽

Support Vector ◽

Learning Tools ◽

Feedback Systems ◽

Theoretical Understanding ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Mining Tools

The inadequacies of basic physics models for disruption prediction have induced the community to increasingly rely on data mining tools. In the last decade, it has been shown how machine learning predictors can achieve a much better performance than those obtained with manually identified thresholds or empirical descriptions of the plasma stability limits. The main criticisms of these techniques focus therefore on two different but interrelated issues: poor “physics fidelity” and limited interpretability. Insufficient “physics fidelity” refers to the fact that the mathematical models of most data mining tools do not reflect the physics of the underlying phenomena. Moreover, they implement a black box approach to learning, which results in very poor interpretability of their outputs. To overcome or at least mitigate these limitations, a general methodology has been devised and tested, with the objective of combining the predictive capability of machine learning tools with the expression of the operational boundary in terms of traditional equations more suited to understanding the underlying physics. The proposed approach relies on the application of machine learning classifiers (such as Support Vector Machines or Classification Trees) and Symbolic Regression via Genetic Programming directly to experimental databases. The results are very encouraging. The obtained equations of the boundary between the safe and disruptive regions of the operational space present almost the same performance as the machine learning classifiers, based on completely independent learning techniques. Moreover, these models possess significantly better predictive power than traditional representations, such as the Hugill or the beta limit. More importantly, they are realistic and intuitive mathematical formulas, which are well suited to supporting theoretical understanding and to benchmarking empirical models. They can also be deployed easily and efficiently in real-time feedback systems.

Download Full-text

Heart Disease Prediction Using Machine Learning

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-1131 ◽

2021 ◽

pp. 267-276

Author(s):

Baban. U. Rindhe ◽

Nikita Ahire ◽

Rupali Patil ◽

Shweta Gagare ◽

Manisha Darade

Keyword(s):

Machine Learning ◽

Data Mining ◽

Heart Disease ◽

Heart Diseases ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Whole Body ◽

Support Vector ◽

Learning Techniques

Heart-related diseases or Cardiovascular Diseases (CVDs) are the main reason for a huge number of death in the world over the last few decades and has emerged as the most life-threatening disease, not only in India but in the whole world. So, there is a need fora reliable, accurate, and feasible system to diagnose such diseases in time for proper treatment. Machine Learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. Many researchers, in recent times, have been using several machine learning techniques to help the health care industry and the professionals in the diagnosis of heart-related diseases. Heart is the next major organ comparing to the brain which has more priority in the Human body. It pumps the blood and supplies it to all organs of the whole body. Prediction of occurrences of heart diseases in the medical field is significant work. Data analytics is useful for prediction from more information and it helps the medical center to predict various diseases. A huge amount of patient-related data is maintained on monthly basis. The stored data can be useful for the source of predicting the occurrence of future diseases. Some of the data mining and machine learning techniques are used to predict heart diseases, such as Artificial Neural Network (ANN), Random Forest,and Support Vector Machine (SVM).Prediction and diagnosingof heart disease become a challenging factor faced by doctors and hospitals both in India and abroad. To reduce the large scale of deaths from heart diseases, a quick and efficient detection technique is to be discovered. Data mining techniques and machine learning algorithms play a very important role in this area. The researchers accelerating their research works to develop software with thehelp of machine learning algorithms which can help doctors to decide both prediction and diagnosing of heart disease. The main objective of this research project is to predict the heart disease of a patient using machine learning algorithms.

Download Full-text

Teknik Resampling untuk Mengatasi Ketidakseimbangan Kelas pada Klasifikasi Penyakit Diabetes Menggunakan C4.5, Random Forest, dan SVM

Techno Com ◽

10.33633/tc.v20i3.4762 ◽

2021 ◽

Vol 20 (3) ◽

pp. 352-361

Author(s):

Wahyu Nugraha ◽

Raja Sabaruddin

Keyword(s):

Machine Learning ◽

Data Mining ◽

Random Forest ◽

Area Under Curve ◽

Support Vector ◽

Pima Indians ◽

R Language ◽

Level Data ◽

Vector Machines ◽

Under Sampling

Penderita diabetes di seluruh dunia terus mengalami peningkatan dengan angka kematian sebesar 4,6 juta pada tahun 2011 dan diperkirakan akan terus meningkat secara global menjadi 552 juta pada tahun 2030. Pencegahan Penyakit diabetes mungkin dapat dilakukan secara efektif dengan cara mendeteksinya sejak dini. Data mining dan machine learning terus dikembangkan agar menjadi alat yang handal dalam membangun model komputasi untuk mengidentifikasi penyakit diabetes pada tahap awal. Namun, masalah yang sering dihadapi dalam menganalisis penyakit diabetes ialah masalah ketidakseimbangan class. Kelas yang tidak seimbang membuat model pembelajaran akan sulit melakukan prediksi karena model pembelajaran didominasi oleh instance kelas mayoritas sehingga mengabaikan prediksi kelas minoritas. Pada penelitian ini kami mencoba menganalisa dan mencoba mengatasi masalah ketidakseimbangan kelas dengan menggunakan pendekatan level data yaitu teknik resampling data. Eksperimen ini menggunakan R language dengan library ROSE (version 0.0-4). Dataset Pima Indians dipilih pada penelitian ini karena merupakan salah satu dataset yang mengalami ketidakseimbangan kelas. Model pengklasifikasian pada penelitian ini menggunakan algoritma decision tree C4.5, RF (Random Forest), dan SVM (Support Vector Machines). Dari hasil eksperimen yang dilakukan model klasifikasi SVM dengan teknik resampling yang menggabungkan over dan under-sampling menjadi model yang memiliki performa terbaik dengan nilai AUC (Area Under Curve) sebesar 0.80

Download Full-text

Improvement of Support Vector Machine Algorithm in Big Data Background

Mathematical Problems in Engineering ◽

10.1155/2021/5594899 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Babacar Gaye ◽

Dezheng Zhang ◽

Aziguli Wulamu

Keyword(s):

Machine Learning ◽

Data Mining ◽

Big Data ◽

Time Complexity ◽

Dual Problem ◽

Learning Algorithm ◽

Rapid Development ◽

Machine Learning Algorithms ◽

Support Vector ◽

Original Space

With the rapid development of the Internet and the rapid development of big data analysis technology, data mining has played a positive role in promoting industry and academia. Classification is an important problem in data mining. This paper explores the background and theory of support vector machines (SVM) in data mining classification algorithms and analyzes and summarizes the research status of various improved methods of SVM. According to the scale and characteristics of the data, different solution spaces are selected, and the solution of the dual problem is transformed into the classification surface of the original space to improve the algorithm speed. Research Process. Incorporating fuzzy membership into multicore learning, it is found that the time complexity of the original problem is determined by the dimension, and the time complexity of the dual problem is determined by the quantity, and the dimension and quantity constitute the scale of the data, so it can be based on the scale of the data Features Choose different solution spaces. The algorithm speed can be improved by transforming the solution of the dual problem into the classification surface of the original space. Conclusion. By improving the calculation rate of traditional machine learning algorithms, it is concluded that the accuracy of the fitting prediction between the predicted data and the actual value is as high as 98%, which can make the traditional machine learning algorithm meet the requirements of the big data era. It can be widely used in the context of big data.

Download Full-text

Design and analysis of an efficient machine learning based hybrid recommendation system with enhanced density-based spatial clustering for digital e-learning applications

Complex & Intelligent Systems ◽

10.1007/s40747-021-00509-4 ◽

2021 ◽

Author(s):

S. Bhaskaran ◽

Raja Marappan

Keyword(s):

Machine Learning ◽

Data Mining ◽

Decision Making ◽

Support Vector Machine ◽

Absolute Error ◽

Support Vector ◽

E Learning ◽

Public Datasets ◽

Hybrid Recommender ◽

New Strategies

AbstractA decision-making system is one of the most important tools in data mining. The data mining field has become a forum where it is necessary to utilize users' interactions, decision-making processes and overall experience. Nowadays, e-learning is indeed a progressive method to provide online education in long-lasting terms, contrasting to the customary head-to-head process of educating with culture. Through e-learning, an ever-increasing number of learners have profited from different programs. Notwithstanding, the highly assorted variety of the students on the internet presents new difficulties to the conservative one-estimate fit-all learning systems, in which a solitary arrangement of learning assets is specified to the learners. The problems and limitations in well-known recommender systems are much variations in the expected absolute error, consuming more query processing time, and providing less accuracy in the final recommendation. The main objectives of this research are the design and analysis of a new transductive support vector machine-based hybrid personalized hybrid recommender for the machine learning public data sets. The learning experience has been achieved through the habits of the learners. This research designs some of the new strategies that are experimented with to improve the performance of a hybrid recommender. The modified one-source denoising approach is designed to preprocess the learner dataset. The modified anarchic society optimization strategy is designed to improve the performance measurements. The enhanced and generalized sequential pattern strategy is proposed to mine the sequential pattern of learners. The enhanced transductive support vector machine is developed to evaluate the extracted habits and interests. These new strategies analyze the confidential rate of learners and provide the best recommendation to the learners. The proposed generalized model is simulated on public datasets for machine learning such as movies, music, books, food, merchandise, healthcare, dating, scholarly paper, and open university learning recommendation. The experimental analysis concludes that the enhanced clustering strategy discovers clusters that are based on random size. The proposed recommendation strategies achieve better significant performance over the methods in terms of expected absolute error, accuracy, ranking score, recall, and precision measurements. The accuracy of the proposed datasets lies between 82 and 98%. The MAE metric lies between 5 and 19.2% for the simulated public datasets. The simulation results prove the proposed generalized recommender has a great strength to improve the quality and performance.

Download Full-text

An Integrated Approach to Map Tubular Degradation in ONWJ Field

Proc. Indon. Petrol. Assoc., Digital Technical Conference, 2020 ◽

10.29118/ipa20-e-278 ◽

2020 ◽

Author(s):

F. Sajjad

Keyword(s):

Oil And Gas ◽

Field Performance ◽

Integrated Approach ◽

Lessons Learned ◽

Oil And Gas Fields ◽

Preventive Actions ◽

Infill Drilling ◽

Actual Field ◽

And Performance ◽

Gas Fields

Tubular engineering is essential for production operations, especially in mature oil and gas fields. The complex interaction between hydrocarbon and non-hydrocarbon components will eventually result in tubulars deteriorating into poor condition and performance. 1500 well examples are located in field X, Indonesia, in which 70% of them have been producing for more than 30 years, indicating the existence of tubular thinning and deformation. The degradation is slowly developed until severe alterations are observed on the tubing body. The situation from the aforementioned wells is complicated since tubular deformation inhibits the flow as well as increasing the risk of wellbore collapse and complications during sidetracking, infill drilling, workover, and other production enhancement measures. These wells are subjected to costly remedial measures and often result in unsuccessful recovery efforts. The authors present the degree of tubular degradation and its effect to overall field performance and the possibility of tubular failure. Current field practices do not encourage a thorough tubular assessment during early life of the wells, which create complex problems at a later stage. Eventually, the study indicates that proper planning and preventive actions should be performed gradually before tubular degradation becomes severe. This paper presents a field experience-based model that is useful in developing new areas from the perspective of well and facilities integrity, so that the degradation-related issues can be recognized earlier. We used multiple case studies with actual field data to identify the dominant mechanism for tubular degradation. The case study presented a model that is capable to describe the extent of tubular degradation in offshore, mature wells that are prone to stress from its surroundings. Lessons learned from these failures encourages us to conduct a comprehensive study on tubular degradation. It is performed to model the incorporation of multiple degradation mechanisms on tubular performance.

Download Full-text

Machine Learning in Higher Education

Handbook of Research on Emerging Trends and Applications of Machine Learning - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-9643-1.ch002 ◽

2020 ◽

pp. 27-46

Author(s):

Garima Jaiswal ◽

Arun Sharma ◽

Reeti Sarup

Keyword(s):

Higher Education ◽

Machine Learning ◽

Data Mining ◽

At Risk ◽

Educational Data Mining ◽

Training Data ◽

Pedagogical Practices ◽

Dropping Out ◽

Support Vector ◽

Class Labels

Machine learning aims to give computers the ability to automatically learn from data. It can enable computers to make intelligent decisions by recognizing complex patterns from data. Through data mining, humongous amounts of data can be explored and analyzed to extract useful information and find interesting patterns. Classification, a supervised learning technique, can be beneficial in predicting class labels for test data by referring the already labeled classes from available training data set. In this chapter, educational data mining techniques are applied over a student dataset to analyze the multifarious factors causing alarmingly high number of dropouts. This work focuses on predicting students at risk of dropping out using five classification algorithms, namely, K-NN, naive Bayes, decision tree, random forest, and support vector machine. This can assist in improving pedagogical practices in order to enhance the performance of students predicted at risk of dropping out, thus reducing the dropout rates in higher education.

Download Full-text

Intellectual Data Mining in Socio-Geographic Research

Общественные науки и современность ◽

10.31857/s086904990017878-7 ◽

2021 ◽

pp. 150

Author(s):

Viktor Blanutsa

Keyword(s):

Neural Network ◽

Machine Learning ◽

Data Mining ◽

Social Geography ◽

Geospatial Data ◽

Semantic Search ◽

Bibliographic Databases ◽

Support Vector ◽

World Science ◽

Territorial Organization

In social geography, aimed at understanding the territorial organization of society, various methods are used, including data mining. However, there is no generalization of the experience of using such methods in world science. Therefore, the purpose of this article is to analyze the global array of scientific articles on this issue to identify priorities, algorithms and thematic areas with their capabilities and limitations. Using the author's method of semantic search based on machine learning, about two hundred articles published in the last two decades have been identified in eight bibliographic databases. Their generalization made it possible to identify chronological and chorological priorities, as well as to establish that a limited number of algorithms had been used for the geospatial data mining, which can be combined into groups of neural network, evolutionary, decision trees, swarm intelligence and support vector methods. These algorithms were used in five thematic areas (spatial-urban, regional-typological, area-based, geo-indicative and territorial-connective). The main features and limitations in each direction are given.

Download Full-text