attribute selection
Recently Published Documents



2022 ◽  
pp. 1-90
Author(s):  
David Lubo-Robles ◽  
Deepak Devegowda ◽  
Vikram Jayaram ◽  
Heather Bedle ◽  
Kurt J. Marfurt ◽  
...  

During the past two decades, geoscientists have used machine learning to produce a more quantitative reservoir characterization and to discover hidden patterns in their data. However, as the complexity of these models increases, the sensitivity of their results to the choice of input data becomes more challenging to assess. Measuring how the model uses the input data to perform either a classification or regression task provides an understanding of the data-to-geology relationships, which indicates how confident we can be in the prediction. To provide such insight, the ML community has developed the Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) tools. In this study, we train a random forest architecture using a suite of seismic attributes as input to differentiate between mass transport deposits (MTDs), salt, and conformal siliciclastic sediments in a Gulf of Mexico dataset. We apply SHAP to understand how the model uses the input seismic attributes to identify target seismic facies and examine how variations in the input, such as adding band-limited random noise or applying a Kuwahara filter, impact the model’s predictions. During our global analysis, we find that the attribute importance is dynamic and changes based on the quality of the seismic attributes and the seismic facies analyzed. For our data volume and target facies, attributes measuring changes in dip and energy show the largest importance for all cases in our sensitivity analysis. We note that to discriminate between the seismic facies, the ML architecture learns a “set of rules” in multi-attribute space and that overlap between MTDs, salt, and conformal sediments might exist depending on the seismic attribute analyzed. Finally, using SHAP at a voxel scale, we understand why certain areas of interest were misclassified by the algorithm and perform an in-context interpretation to analyze how changes in the geology impact the model’s predictions.
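To make the idea of per-attribute contributions concrete, the following is a minimal sketch of exact Shapley values computed by brute-force enumeration of feature coalitions. It is not the SHAP library and not the authors' workflow: the toy model, the attribute names (dip and energy come from the abstract; coherence is added purely for illustration), and the all-zero baseline are assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating all coalitions.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    """Hypothetical stand-in for a trained model (not a random forest)."""
    dip, energy, coherence = features
    # coherence is deliberately ignored, so its contribution should be zero
    return 2.0 * dip + 1.0 * energy + 0.5 * dip * energy


sample = [1.0, 2.0, 3.0]    # made-up attribute values at one voxel
baseline = [0.0, 0.0, 0.0]  # assumed reference input
phi = shapley_values(toy_model, sample, baseline)
print(dict(zip(["dip", "energy", "coherence"], phi)))
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that SHAP explanations rely on.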


Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 3026
Author(s):  
Tehseen Akhtar ◽  
Syed Omer Gilani ◽  
Zohaib Mushtaq ◽  
Saad Arif ◽  
Mohsin Jamil ◽  
...  

Thyroid disease is characterized by abnormal development of glandular tissue on the periphery of the thyroid gland. Thyroid disease occurs when this gland produces an abnormally high or low level of hormones, with hyperthyroidism (active thyroid gland) and hypothyroidism (inactive thyroid gland) being the two most common types. The purpose of this work was to create an efficient homogeneous ensemble of ensembles in conjunction with numerous feature-selection methodologies for the improved detection of thyroid disorder. The dataset employed is based on real-time thyroid information obtained from the District Head Quarter (DHQ) teaching hospital, Dera Ghazi (DG) Khan, Pakistan. Following the necessary preprocessing steps, three types of attribute-selection strategies were used: Select From Model (SFM), Select K-Best (SKB), and Recursive Feature Elimination (RFE). Decision Tree (DT), Gradient Boosting (GB), Logistic Regression (LR), and Random Forest (RF) classifiers were used as promising feature estimators. The homogeneous ensembling activated the bagging- and boosting-based classifiers, which were then combined by the Voting ensemble using both soft and hard voting. Accuracy, sensitivity, mean square error, Hamming loss, and other performance assessment metrics were adopted. The experimental results indicate the optimum applicability of the proposed strategy for improved thyroid ailment identification. All of the employed approaches achieved 100% accuracy with a small feature set. In terms of accuracy and computational cost, the presented findings outperformed similar benchmark models in their domain.
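The difference between hard and soft voting mentioned above can be sketched in a few lines. This is an illustrative toy, not the authors' pipeline: the class names and the probabilities for the three classifiers are invented.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Majority vote over class labels; ties go to the label seen first."""
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average per-class probabilities across classifiers, pick the largest."""
    classes = list(probabilities[0])
    mean = {c: sum(p[c] for p in probabilities) / len(probabilities)
            for c in classes}
    return max(mean, key=mean.get)


# Hypothetical class probabilities from three classifiers for one patient
probs = [
    {"normal": 0.55, "hyper": 0.45},
    {"normal": 0.60, "hyper": 0.40},
    {"normal": 0.10, "hyper": 0.90},
]
labels = [max(p, key=p.get) for p in probs]  # each classifier's top class

hard = hard_vote(labels)
soft = soft_vote(probs)
print(hard, soft)
```

In this made-up case the two schemes disagree: two weakly confident classifiers win the hard vote, while one strongly confident classifier dominates the averaged probabilities.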


2021 ◽  
Vol 11 (23) ◽  
pp. 11400
Author(s):  
Andra-Maria Mircea-Vicoveanu ◽  
Elena Rezuș ◽  
Florin Leon ◽  
Silvia Curteanu

This study is based on the consideration that patients with rheumatoid arthritis and ankylosing spondylitis undergoing biological therapy have a higher risk of developing tuberculosis. The QuantiFERON-TB Gold test result was the output of the models, and a series of features related to the patients and their treatments were chosen as inputs. The distribution of patients by gender and by biological therapy, at the time of inclusion in the study and at the end of the study, is reported for both rheumatoid arthritis and ankylosing spondylitis. A series of classification algorithms (random forest, nearest neighbor, k-nearest neighbors, C4.5 decision trees, non-nested generalized exemplars, and support vector machines) and attribute selection algorithms (ReliefF, InfoGain, and correlation-based feature selection) were successfully applied. Useful information was obtained regarding the influence of biological and classical treatments on tuberculosis risk, and most of it agreed with medical studies.
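As an illustration of how an InfoGain-style attribute ranking works, the sketch below computes information gain for categorical attributes and sorts them. The records are made-up toy values with no clinical meaning, and the attribute names are placeholders; the study's actual tooling and data are not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())


def info_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records (four patients), invented for illustration only
records = {
    "therapy": ["bio", "bio", "classic", "classic"],
    "gender": ["f", "m", "f", "m"],
}
test_result = ["pos", "pos", "neg", "neg"]

gains = {name: info_gain(values, test_result)
         for name, values in records.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking, gains)
```

An attribute that splits the classes perfectly receives the full label entropy as its gain, while an attribute that leaves each group as mixed as the whole set receives zero.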


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Malik Bader Alazzam ◽  
Fawaz Alassery ◽  
Ahmed Almulihi

When compared to other types of skin cancer, melanoma is the deadliest. However, those who are diagnosed early on have a better prognosis. To provide a supplementary opinion to experts, various methods of spontaneous melanoma recognition and diagnosis have been investigated by different researchers. Because of the imbalance between classes, building models from existing information has proven difficult. In this study, machine learning algorithms paired with imbalanced basis training approaches are evaluated for their performance on the melanoma diagnosis challenge. Patterns of skin lesions were extracted from 200 dermoscopic photos using the VGG16, VGG19, Inception, and ResNet convolutional neural network architectures with the ABCD rule. After employing attribute selection with GS and balancing the training data using the Synthetic Minority Oversampling Technique and the Edited Nearest Neighbor rule, the random forest classifier had a sensitivity of nearly 93% and a kappa index (k-index) of 78%.
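For readers unfamiliar with the two reported metrics, the sketch below shows how sensitivity and Cohen's kappa are derived from a binary confusion matrix. The counts are hypothetical and are not taken from the study; it is also an assumption that the reported kappa index is Cohen's kappa.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were detected."""
    return tp / (tp + fn)


def cohen_kappa(tp, fp, fn, tn):
    """Agreement between predictions and truth, corrected for chance."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Chance agreement from the marginal totals of predictions and truth
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical confusion-matrix counts, for illustration only
tp, fp, fn, tn = 40, 10, 10, 40

sens = sensitivity(tp, fn)
kappa = cohen_kappa(tp, fp, fn, tn)
print(f"sensitivity={sens:.2f}, kappa={kappa:.2f}")
```

Because kappa discounts agreement expected by chance, it is usually lower than raw accuracy, which is why it is often reported alongside sensitivity on imbalanced data.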


Author(s):  
Pengfei Zhang ◽  
Tianrui Li ◽  
Zhong Yuan ◽  
Chuan Luo ◽  
Guoqiang Wang ◽  
...  

Author(s):  
Wilson Chango ◽  
Rebeca Cerezo ◽  
Miguel Sanchez-Santillan ◽  
Roger Azevedo ◽  
Cristóbal Romero

The aim of this study was to predict university students’ learning performance using different sources of performance and multimodal data from an Intelligent Tutoring System. We collected and preprocessed data from 40 students from different multimodal sources: learning strategies from system logs, emotions from videos of facial expressions, allocation and fixations of attention from eye tracking, and performance on posttests of domain knowledge. Our objective was to test whether the prediction could be improved by using attribute selection and classification ensembles. We carried out three experiments by applying six classification algorithms to numerical and discretized preprocessed multimodal data. The results show that the best predictions were produced using ensembles and selecting the best attributes approach with numerical data.

