The Knowledge Discovery of β-Thalassemia Using Principal Components Analysis: PCA and Machine Learning Techniques

Author(s):  
Patcharaporn Paokanta
2021
Author(s):
Ahmed AlSaihati
Salaheldin Elkatatny
Hani Gamal
Abdulazeez Abdulraheem

Abstract Mathematical equations, based on conservation of mass and momentum, are used to determine the equivalent circulating density (ECD) at different depths in the wellbore. However, such equations do not consider important factors that have an influence on the ECD, such as: (i) bottom-hole temperature, (ii) pipe rotation and eccentricity, and (iii) wellbore roughness. Thus, discrepancies between calculated ECDs and actual ones have been reported in the literature. This paper aims to explore how artificial intelligence (AI) and machine learning (ML) could provide accurate real-time prediction of the ECD, offering better insight into and management of downhole wellbore conditions. For this purpose, a supervised ML algorithm, the support vector machine (SVM), based on principal components analysis (PCA), was developed. Actual field data of Well-1, including drilling surface parameters and ECDs measured by downhole sensors, were collected to develop a classical SVM model. The dataset was split with an 80/20 training-testing ratio. A sensitivity analysis over different SVM parameters, such as the regularization parameter C, gamma, and kernel type (linear, radial basis function "RBF"), was performed. The performance of the model was assessed in terms of root mean square error (RMSE) and coefficient of determination (R2). Afterward, PCA was applied to the dataset of Well-1 to develop an SVM model using the transformed dataset in PCA space. The performance of the model with different numbers of principal components was evaluated. The results showed that the classical SVM with the linear kernel predicted the ECD with an RMSE of 0.53 and R2 of 0.97 on the training set, while the RMSE and R2 were 0.56 and 0.97, respectively, on the testing set. The PCA-based SVM model, with the linear kernel and four principal components (capturing 93.53% of the variation in the dataset), predicted the ECD with an RMSE of 0.79 and R2 of 0.95 on the testing set.
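A minimal sketch of the kind of PCA-based SVM regression pipeline the abstract describes, using scikit-learn on synthetic data; the real Well-1 features, targets, and tuned hyperparameters are assumptions here, not the authors' actual setup:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the Well-1 surface-drilling features and ECD target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                     # e.g. WOB, RPM, flow rate, SPP, ...
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=500)  # pseudo-ECD

# 80/20 training-testing split, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# PCA-based SVM: scale, project onto four principal components,
# then fit a linear-kernel support vector regressor.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=4),
                      SVR(kernel="linear", C=1.0))
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
rmse = mean_squared_error(y_te, pred) ** 0.5      # root mean square error
r2 = r2_score(y_te, pred)                         # coefficient of determination
print(f"RMSE={rmse:.2f}  R2={r2:.2f}")
```

Swapping `PCA(n_components=4)` for other component counts, or the kernel for `"rbf"`, reproduces the sensitivity analysis the abstract outlines.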


2017
Vol 7 (1.1)
pp. 143
Author(s):
J. Deepika
T. Senthil
C. Rajan
A. Surendar

With the rapid development of technology and automation, the history of computing has been continually rewritten. The technology movement shifted from large mainframes to PCs to the cloud as the volume of available data grew. This has happened due to the advent of many tools and practices that elevated the next generation of computing. A large number of techniques have been developed to automate such computing, and research has moved toward training computers to behave in ways similar to human intelligence. Here the diversity of machine learning comes into play for knowledge discovery. Machine Learning (ML) is applied in many areas such as medicine, marketing, telecommunications, stock trading, health care, and so on. This paper reviews the foundations of machine learning algorithms and their types and flavors, together with R code and Python scripts for each machine learning technique.
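To make the "types and flavors" concrete, a minimal sketch contrasting the supervised and unsupervised flavors on a standard toy dataset; the dataset and model choices are illustrative assumptions, not the review's own examples:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Supervised flavor: learn a mapping from features to known labels.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)

# Unsupervised flavor: discover structure with no labels at all.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(f"supervised accuracy={acc:.2f}, "
      f"cluster sizes={sorted(np.bincount(km.labels_).tolist())}")
```

The same fit/predict pattern carries over to most scikit-learn estimators, which is what makes side-by-side surveys of algorithms straightforward in both Python and R.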


2019
Vol 5
pp. 237802311881872
Author(s):  
Ryan Compton

Sociological research typically involves exploring theoretical relationships, but the emergence of “big data” enables alternative approaches. This work shows the promise of data-driven machine-learning techniques involving feature engineering and predictive model optimization to address a sociological data challenge. The author’s group develops improved generalizable models to identify at-risk families. Principal-components analysis and decision tree modeling are used to predict six main dependent variables in the Fragile Families Challenge, successfully modeling one binary variable but none of the continuous dependent variables in the diagnostic data set. This indicates that some binary dependent variables are more predictable from a reduced set of uncorrelated independent variables, while continuous dependent variables demand more model complexity.
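A minimal sketch of the PCA-plus-decision-tree approach for a binary outcome, on synthetic data standing in for a wide, correlated survey-style feature matrix; the feature counts, tree depth, and component count are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a survey-style feature matrix with a binary outcome.
X, y = make_classification(n_samples=600, n_features=40, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Reduce the many correlated variables to a handful of uncorrelated
# principal components, then fit a shallow decision tree on them.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=10),
                      DecisionTreeClassifier(max_depth=4, random_state=0))
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
print(f"binary-outcome accuracy: {acc:.2f}")
```

Continuous targets would swap the classifier for a regressor, where, as the abstract notes, a reduced uncorrelated feature set tends to leave more signal unexplained.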


Author(s):  
Ibrahim Obeidat
Nabhan Hamadneh
Mouhammd Alkasassbeh
Mohammad Almseidin
Mazen Ibrahim AlZubi

Abstract— Network security engineers work to keep services available at all times by handling intruder attacks. An Intrusion Detection System (IDS) is one of the available mechanisms used to sense and classify abnormal actions. The IDS must therefore always be kept up to date with the latest intruder attack signatures to preserve the confidentiality, integrity, and availability of services. The speed of the IDS is a very important issue, as is learning new attacks. This research illustrates how the Knowledge Discovery and Data Mining (KDD, also Knowledge Discovery in Databases) dataset is very handy for testing and evaluating different machine learning techniques. It mainly focuses on the KDD preprocessing step in order to prepare a decent and fair experimental dataset. The J48, Random Forest, Random Tree, MLP, Naïve Bayes, and Bayes Network classifiers were chosen for this study. The experiments show that the Random Forest classifier achieved the highest accuracy rate in detecting and classifying all KDD dataset attack types (DOS, R2L, U2R, and PROBE).
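A minimal sketch of this kind of classifier comparison on synthetic data standing in for preprocessed KDD records; the class labels, feature shapes, and scikit-learn analogues of the Weka classifiers (e.g. a CART decision tree in place of J48) are assumptions of this sketch:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for preprocessed KDD records: five classes standing in
# for normal traffic plus the DOS, R2L, U2R, and PROBE attack families.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),  # J48 analogue
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "NaiveBayes": GaussianNB(),
}
# Fit each classifier and record its test-set accuracy.
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Per-attack-type metrics (precision, recall per class) would be read off `sklearn.metrics.classification_report` in the same pipeline.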


IEEE Access
2019
Vol 7
pp. 157741-157755
Author(s):  
Hammam M. Abdelaal
Abdelmoty M. Ahmed
Wade Ghribi
Hassan A. Youness Alansary

2021
pp. 1-17
Author(s):  
Zeinab Shahbazi
Yung-Cheol Byun

Understanding real-world short texts has become an essential task in recent research. Document deduction analysis and latent coherent topics are important aspects of this process. Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (PLSA) have been suggested for modeling large collections of documents. The main problems in this type of context are limited information, word relationships, sparsity, and knowledge extraction. Knowledge discovery and machine learning techniques integrated with topic modeling are proposed to overcome these issues. Knowledge discovery is applied through hidden-information extraction to enlarge the dataset suitable for further analysis. The integrated machine learning techniques, an Artificial Neural Network (ANN) and a Long Short-Term Memory (LSTM) network, are applied to anticipate topic movements. The LSTM layers are fed with latent topic distributions learned from a pre-trained Latent Dirichlet Allocation (LDA) model. We survey the techniques applied in short-text topic modeling and propose three categories, based on the Dirichlet multinomial mixture, global word co-occurrences, and self-aggregation, with a representative design and an analysis of each category’s performance on different tasks. Finally, the proposed system is evaluated against state-of-the-art methods on real-world datasets, compared with long-document topic modeling algorithms, and used to build a classification framework that incorporates the extracted knowledge into the machine learning pipeline.
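A minimal sketch of the first half of this pipeline, producing the per-document latent topic distributions that would be fed to the LSTM layers; the tiny corpus and topic count are illustrative assumptions, and the LSTM itself is omitted:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Tiny corpus of short texts (illustrative only).
docs = [
    "network attack detection system",
    "intrusion detection network traffic",
    "topic model short text sparsity",
    "latent topic distribution short text",
]

# Bag-of-words counts, then a two-topic LDA fit.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)  # rows: per-document topic distributions

# Each row sums to 1; a time-ordered sequence of such rows is the kind of
# input an LSTM would consume to anticipate topic movements.
print(np.round(theta, 2))
```

For new documents, `lda.transform` yields the same style of distribution without refitting, which is what "pre-trained LDA" feeding the downstream network amounts to.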

