scholarly journals Penerapan Metode Pembelajaran Menggunakan Ekstraksi Fitur dan Algoritma Klasifikasi untuk Identifikasi Pengenalan Iris

2021 ◽  
Vol 7 (2) ◽  
pp. 151-157
Author(s):  
Rahmat Hidayat ◽  
Sarifah Agustiani ◽  
Siti Khotimatul Wildah ◽  
Ali Mustopa ◽  
Rizky Ade Safitri

Iris mata terletak di antara kornea mata dan lensa mata, yang berfungsi untuk mengontrol intensitas atau jumlah cahaya yang masuk dengan cara melebarkan dan mengecilkan pupil. Setiap orang memiliki iris yang berbeda dan memiliki stabilitas sepanjang hidup, kecuali terjadi kerusakan yang tidak disengaja pada iris seperti terjadi kecelakaan. Tujuan dari penelitian ini adalah untuk melakukan pengklasifikasian dan identifikasi pengenalan citra iris dengan menggunakan metode pembelajaran atau machine learning. Metode yang diusulkan dalam penelitian ini adalah penerapan ekstraksi fitur seperti HOG, Hu-Moments, dan Haralick dengan algoritma klasifikasi yang terdiri dari LR, LDA, KNN, RF, CART, NB, dan SVM. Berdasarkan hasil pengujian yang telah dilakukan dalam mengklasifikasikan iris dapat disimpulkan bahwa penggunaan ekstraksi fitur sangat berpengaruh pada nilai akurasi yang dihasilkan. Dalam hal ini nilai akurasi terbaik diperoleh dari penggabungan ekstraksi fitur HOG dan haralick pada algoritma Random Forest dengan nilai akurasi sebesar 81.38℅.

Author(s):  
Jun Pei ◽  
Zheng Zheng ◽  
Hyunji Kim ◽  
Lin Song ◽  
Sarah Walworth ◽  
...  

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relevant importance for each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance for each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF6) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept and, the resultant RF models were tested on CASF-2013.5 In a comparison of the performance of our RF models against 29 scoring functions, we found our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificial designed potential models to address the importance of the GARF potential in the RF models: (1) a scrambled probability function set, which was obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which share the same peak positions with GARF but have fixed peak heights. The results of accuracy comparison from RF models based on the scrambled, uniform, and original GARF potential clearly showed that the peak positions in the GARF potential are important while the well depths are not. <br>


2020 ◽  
Vol 15 (2) ◽  
pp. 121-134 ◽  
Author(s):  
Eunmi Kwon ◽  
Myeongji Cho ◽  
Hayeon Kim ◽  
Hyeon S. Son

Background: The host tropism determinants of influenza virus, which cause changes in the host range and increase the likelihood of interaction with specific hosts, are critical for understanding the infection and propagation of the virus in diverse host species. Methods: Six types of protein sequences of influenza viral strains isolated from three classes of hosts (avian, human, and swine) were obtained. Random forest, naïve Bayes classification, and knearest neighbor algorithms were used for host classification. The Java language was used for sequence analysis programming and identifying host-specific position markers. Results: A machine learning technique was explored to derive the physicochemical properties of amino acids used in host classification and prediction. HA protein was found to play the most important role in determining host tropism of the influenza virus, and the random forest method yielded the highest accuracy in host prediction. Conserved amino acids that exhibited host-specific differences were also selected and verified, and they were found to be useful position markers for host classification. Finally, ANOVA analysis and post-hoc testing revealed that the physicochemical properties of amino acids, comprising protein sequences combined with position markers, differed significantly among hosts. Conclusion: The host tropism determinants and position markers described in this study can be used in related research to classify, identify, and predict the hosts of influenza viruses that are currently susceptible or likely to be infected in the future.


2018 ◽  
Author(s):  
Liyan Pan ◽  
Guangjian Liu ◽  
Xiaojian Mao ◽  
Huixian Li ◽  
Jiexin Zhang ◽  
...  

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis—gonadotropin-releasing hormone (GnRH)–stimulation test or GnRH analogue (GnRHa)–stimulation test—is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP–related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured sensitivity, specificity, and area under receiver operating characteristic (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.


Author(s):  
Farrikh Alzami ◽  
Erika Devi Udayanti ◽  
Dwi Puji Prabowo ◽  
Rama Aria Megantara

Sentiment analysis in terms of polarity classification is very important in everyday life, with the existence of polarity, many people can find out whether the respected document has positive or negative sentiment so that it can help in choosing and making decisions. Sentiment analysis usually done manually. Therefore, an automatic sentiment analysis classification process is needed. However, it is rare to find studies that discuss extraction features and which learning models are suitable for unstructured sentiment analysis types with the Amazon food review case. This research explores some extraction features such as Word Bags, TF-IDF, Word2Vector, as well as a combination of TF-IDF and Word2Vector with several machine learning models such as Random Forest, SVM, KNN and Naïve Bayes to find out a combination of feature extraction and learning models that can help add variety to the analysis of polarity sentiments. By assisting with document preparation such as html tags and punctuation and special characters, using snowball stemming, TF-IDF results obtained with SVM are suitable for obtaining a polarity classification in unstructured sentiment analysis for the case of Amazon food review with a performance result of 87,3 percent.


Energies ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 1809
Author(s):  
Mohammed El Amine Senoussaoui ◽  
Mostefa Brahami ◽  
Issouf Fofana

Machine learning is widely used as a panacea in many engineering applications including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on some aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: the oils that can be maintained in service, the oils that should be reconditioned or filtered, the oils that should be reclaimed, and the oils that must be discarded. From the two algorithms, random forest exhibited a better performance and high accuracy with only a small amount of data. Good performance was achieved through not only the application of the proposed algorithm but also the approach of data preprocessing. Before feeding the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were again retransformed by conducting the principal component analysis and were passed through the CFsSubset filter. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of the datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.


2020 ◽  
Author(s):  
Anurag Sohane ◽  
Ravinder Agarwal

Abstract Various simulation type tools and conventional algorithms are being used to determine knee muscle forces of human during dynamic movement. These all may be good for clinical uses, but have some drawbacks, such as higher computational times, muscle redundancy and less cost-effective solution. Recently, there has been an interest to develop supervised learning-based prediction model for the computationally demanding process. The present research work is used to develop a cost-effective and efficient machine learning (ML) based models to predict knee muscle force for clinical interventions for the given input parameter like height, mass and angle. A dataset of 500 human musculoskeletal, have been trained and tested using four different ML models to predict knee muscle force. This dataset has obtained from anybody modeling software using AnyPyTools, where human musculoskeletal has been utilized to perform squatting movement during inverse dynamic analysis. The result based on the datasets predicts that the random forest ML model outperforms than the other selected models: neural network, generalized linear model, decision tree in terms of mean square error (MSE), coefficient of determination (R2), and Correlation (r). The MSE of predicted vs actual muscle forces obtained from the random forest model for Biceps Femoris, Rectus Femoris, Vastus Medialis, Vastus Lateralis are 19.92, 9.06, 5.97, 5.46, Correlation are 0.94, 0.92, 0.92, 0.94 and R2 are 0.88, 0.84, 0.84 and 0.89 for the test dataset, respectively.


2021 ◽  
Vol 13 (5) ◽  
pp. 1021
Author(s):  
Hu Ding ◽  
Jiaming Na ◽  
Shangjing Jiang ◽  
Jie Zhu ◽  
Kai Liu ◽  
...  

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. As a result of the importance of the contextual information for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies are widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three different commonly-used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.


Energies ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4595
Author(s):  
Parisa Asadi ◽  
Lauren E. Beckingham

X-ray CT imaging provides a 3D view of a sample and is a powerful tool for investigating the internal features of porous rock. Reliable phase segmentation in these images is highly necessary but, like any other digital rock imaging technique, is time-consuming, labor-intensive, and subjective. Combining 3D X-ray CT imaging with machine learning methods that can simultaneously consider several extracted features in addition to color attenuation, is a promising and powerful method for reliable phase segmentation. Machine learning-based phase segmentation of X-ray CT images enables faster data collection and interpretation than traditional methods. This study investigates the performance of several filtering techniques with three machine learning methods and a deep learning method to assess the potential for reliable feature extraction and pixel-level phase segmentation of X-ray CT images. Features were first extracted from images using well-known filters and from the second convolutional layer of the pre-trained VGG16 architecture. Then, K-means clustering, Random Forest, and Feed Forward Artificial Neural Network methods, as well as the modified U-Net model, were applied to the extracted input features. The models’ performances were then compared and contrasted to determine the influence of the machine learning method and input features on reliable phase segmentation. The results showed considering more dimensionality has promising results and all classification algorithms result in high accuracy ranging from 0.87 to 0.94. Feature-based Random Forest demonstrated the best performance among the machine learning models, with an accuracy of 0.88 for Mancos and 0.94 for Marcellus. The U-Net model with the linear combination of focal and dice loss also performed well with an accuracy of 0.91 and 0.93 for Mancos and Marcellus, respectively. In general, considering more features provided promising and reliable segmentation results that are valuable for analyzing the composition of dense samples, such as shales, which are significant unconventional reservoirs in oil recovery.


2020 ◽  
Vol 13 (1) ◽  
pp. 10
Author(s):  
Andrea Sulova ◽  
Jamal Jokar Arsanjani

Recent studies have suggested that due to climate change, the number of wildfires across the globe have been increasing and continue to grow even more. The recent massive wildfires, which hit Australia during the 2019–2020 summer season, raised questions to what extent the risk of wildfires can be linked to various climate, environmental, topographical, and social factors and how to predict fire occurrences to take preventive measures. Hence, the main objective of this study was to develop an automatized and cloud-based workflow for generating a training dataset of fire events at a continental level using freely available remote sensing data with a reasonable computational expense for injecting into machine learning models. As a result, a data-driven model was set up in Google Earth Engine platform, which is publicly accessible and open for further adjustments. The training dataset was applied to different machine learning algorithms, i.e., Random Forest, Naïve Bayes, and Classification and Regression Tree. The findings show that Random Forest outperformed other algorithms and hence it was used further to explore the driving factors using variable importance analysis. The study indicates the probability of fire occurrences across Australia as well as identifies the potential driving factors of Australian wildfires for the 2019–2020 summer season. The methodical approach and achieved results and drawn conclusions can be of great importance to policymakers, environmentalists, and climate change researchers, among others.


Sign in / Sign up

Export Citation Format

Share Document