Traditional Chinese Medicine (TCM) Diagnosis Model Building Based on Multi-label Classification

In this study, we propose a TCM diagnosis model that performs multi-label classification and gives a clear diagnosis, together with the basis for diagnosis and differentiation, when the symptoms correspond to multiple diseases or syndromes. The model is implemented in three steps. First, a machine learning algorithm is chosen to train the TCM diagnosis model; the features of the training data are symptoms and the labels are diseases or syndromes. Second, given a number α (α > 1, α ∈ Z+), the model outputs the α diagnoses with the highest probability for the input symptoms as candidate diagnoses. Finally, rules of differential diagnosis determine which candidate diagnoses are retained, thereby completing the multi-label classification. On our test dataset, with 10-fold cross-validation, the single-label classification achieved an average accuracy of 0.882, an average precision of 0.974, an average recall of 1.000, and an average F1 score of 0.967; the multi-label classification achieved an average accuracy of 0.706, an average micro precision of 0.934, an average micro recall of 0.941, and an average Hamming loss of 0.060. These results suggest that the model has good potential for supporting decision making in clinical diagnosis and treatment.
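The candidate-selection step and the multi-label metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the label names (`syndrome_a` and so on), and the toy probabilities are assumptions made for illustration, and the differential-diagnosis rules are not shown because the abstract does not specify them.

```python
from typing import Dict, List, Sequence, Set, Tuple


def top_alpha(probs: Dict[str, float], alpha: int) -> List[str]:
    """Return the alpha labels with the highest predicted probability."""
    if alpha < 2:
        raise ValueError("alpha must be an integer greater than 1")
    ranked = sorted(probs, key=lambda label: probs[label], reverse=True)
    return ranked[:alpha]


def hamming_loss(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]],
                 n_labels: int) -> float:
    """Fraction of (sample, label) pairs on which truth and prediction disagree."""
    mismatches = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return mismatches / (len(y_true) * n_labels)


def micro_precision_recall(y_true: Sequence[Set[str]],
                           y_pred: Sequence[Set[str]]) -> Tuple[float, float]:
    """Micro-averaged precision and recall, pooled over all samples and labels."""
    true_positives = sum(len(t & p) for t, p in zip(y_true, y_pred))
    predicted = sum(len(p) for p in y_pred)
    actual = sum(len(t) for t in y_true)
    return true_positives / predicted, true_positives / actual


# Hypothetical model output for one patient (illustrative values only).
probs = {"syndrome_a": 0.55, "syndrome_b": 0.30, "syndrome_c": 0.15}
candidates = top_alpha(probs, alpha=2)

# Hypothetical ground truth and final predictions for two patients.
y_true = [{"syndrome_a"}, {"syndrome_a", "syndrome_b"}]
y_pred = [{"syndrome_a", "syndrome_b"}, {"syndrome_a"}]
loss = hamming_loss(y_true, y_pred, n_labels=3)
precision, recall = micro_precision_recall(y_true, y_pred)
```

In this toy case each patient has one label wrong out of three, so the Hamming loss is 2/6; a lower value means fewer per-label mistakes.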

A Novel Method for Gender and Age Detection Based on EEG Brain Signals

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/5/10 ◽

2021 ◽

Vol 18 (5) ◽

Author(s):

Haitham Issa ◽

Sali Issa ◽

Wahab Shah

Keyword(s):

Cross Validation ◽

Image Feature ◽

Emotional States ◽

Time Frequency ◽

Brain Signals ◽

Average Accuracy ◽

Gender And Age ◽

Novel Method ◽

Fold Cross Validation ◽

Validation Strategy

This paper presents a new gender and age classification system based on electroencephalography (EEG) brain signals. First, the Continuous Wavelet Transform (CWT) technique is used to obtain the time-frequency information of only one EEG electrode for eight distinct emotional states, instead of the ordinary neutral or relaxed states. Then, sequential steps are implemented to extract the improved grayscale image feature. For system evaluation, a three-fold cross-validation strategy is applied to construct four different classifiers. The experimental test shows that the proposed extracted feature with a Convolutional Neural Network (CNN) classifier improves the performance of both gender and age classification, achieving average accuracies of 96.3% and 89% for gender and age classification, respectively. Moreover, the ability to predict human gender and age across different emotional states is confirmed in practice.

The impact of indexing approaches on Arabic text classification

Journal of Information Science ◽

10.1177/0165551515625030 ◽

2016 ◽

Vol 43 (2) ◽

pp. 159-173 ◽

Cited By ~ 10

Author(s):

Amer Al-Badarneh ◽

Emad Al-Shawakfa ◽

Basel Bani-Ismail ◽

Khaleel Al-Rababah ◽

Safwan Shatnawi

Keyword(s):

Cross Validation ◽

Arabic Text ◽

Word Form ◽

Bayes Classifier ◽

Stem Form ◽

Average Accuracy ◽

Arabic Text Classification ◽

The Impact ◽

And Storage ◽

Fold Cross Validation

This paper investigates the impact of using different indexing approaches (full-word, stem, and root) when classifying Arabic text. In this study, the naïve Bayes classifier is used to construct the multinomial classification models and is evaluated using stratified k-fold cross-validation (k ranges from 2 to 10). The study also uses a corpus of 1000 normalized Arabic documents. The results of one experiment show that significant accuracy improvements occur when the full-word form is used in most k-folds. Further experiments show that the classifier achieves its highest accuracy with eight folds, that is, a 7/8–1/8 train–test ratio, regardless of the indexing approach used. The overall results show that the classifier achieves a maximum micro-average accuracy of 99.36% using either the full-word form or the stem form. This proves that the stem is the better choice when classifying Arabic text, because it makes the corpus dataset smaller, which improves both processing time and storage utilization while achieving the highest level of accuracy.
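To make the stratified k-fold setup concrete, here is a minimal Python sketch of how 1000 documents could be split into eight folds with similar class proportions, so that each round trains on 7/8 of the corpus and tests on 1/8. The class names and the even class balance are assumptions made for illustration (the abstract does not state the corpus categories), and this sketch does not shuffle the documents as a real evaluation normally would.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int) -> List[List[int]]:
    """Deal document indices into k folds, class by class, in round-robin order."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0  # keeps running across classes so fold sizes stay balanced
    for indices in by_class.values():
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


# Hypothetical corpus: 1000 documents in two equally sized classes.
labels = ["class_a"] * 500 + ["class_b"] * 500
folds = stratified_folds(labels, k=8)

# In round i, fold i is the test set and the remaining folds form the training set.
test_indices = folds[0]
train_indices = [i for fold in folds[1:] for i in fold]
```

With 1000 documents and k = 8, every fold holds 125 documents, which gives the 875/125 split behind the 7/8–1/8 ratio.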

Soil Mapping Based on the Integration of the Similarity-Based Approach and Random Forests

Land ◽

10.3390/land9060174 ◽

2020 ◽

Vol 9 (6) ◽

pp. 174

Author(s):

Desheng Wang ◽

A-Xing Zhu

Keyword(s):

Random Forests ◽

Cross Validation ◽

Soil Mapping ◽

High Quality ◽

Integrated Method ◽

Average Accuracy ◽

Soil Information ◽

Fold Cross Validation

Digital soil mapping (DSM) is currently the primary framework for predicting the spatial variation of soil information (soil type or soil properties). Random forests and similarity-based methods have been used widely in DSM. However, the accuracy of the similarity-based approach is limited, and the performance of random forests is affected by the quality of the feature set. The objective of this study was to present a method for soil mapping by integrating the similarity-based approach and the random forests method. The Heshan area (Heilongjiang province, China) was selected as the case study for mapping soil subgroups. The results of the regular validation samples showed that the overall accuracy of the integrated method (71.79%) is higher than that of a similarity-based approach (58.97%) and random forests (66.67%). The results of the 5-fold cross-validation showed that the overall accuracy of the integrated method, similarity-based approach, and random forests range from 55% to 72.73%, 43.48% to 69.57%, and 54.17% to 70.83%, with an average accuracy of 66.61%, 57.39%, and 59.62%, respectively. These results suggest that the proposed method can produce a high-quality covariate set and achieve a better performance than either the random forests or similarity-based approach alone.

Using artificial intelligence to assist radiologists in distinguishing COVID-19 from other pulmonary infections

Journal of X-Ray Science and Technology ◽

10.3233/xst-200735 ◽

2020 ◽

pp. 1-17

Author(s):

Yanhong Yang ◽

Fleming Y.M. Lure ◽

Hengyuan Miao ◽

Ziqi Zhang ◽

Stefan Jaeger ◽

...

Keyword(s):

Deep Learning ◽

Learning Algorithm ◽

Model Development ◽

Training Data ◽

Test Accuracy ◽

Pulmonary Infections ◽

Deep Learning Algorithm ◽

Average Accuracy ◽

Independent Test ◽

Comparable Performance

Background: Accurate and rapid diagnosis of coronavirus disease (COVID-19) is crucial for timely quarantine and treatment. Purpose: In this study, a deep learning algorithm-based AI model using a ResUNet network was developed to evaluate the performance of radiologists with and without AI assistance in distinguishing COVID-19 infected pneumonia patients from patients with other pulmonary infections on CT scans. Methods: For model development and validation, a total of 694 cases with 111,066 CT slides were retrospectively collected as training data and independent test data. Among them, 118 are confirmed COVID-19 infected pneumonia cases and 576 are other pulmonary infection cases (e.g. tuberculosis cases, common pneumonia cases and non-COVID-19 viral pneumonia cases). The cases were divided into training and testing datasets. The independent test was performed by evaluating and comparing the performance of three radiologists with different years of practice experience in distinguishing COVID-19 infected pneumonia cases with and without AI assistance. Results: Our final model achieved an overall test accuracy of 0.914 with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.903, with a sensitivity and specificity of 0.918 and 0.909, respectively. The deep learning-based model then improved the radiologists' performance in distinguishing COVID-19 from other pulmonary infections, raising average accuracy from 0.941 to 0.951 and average sensitivity from 0.895 to 0.942 compared with radiologists working without AI assistance. Conclusion: The deep learning algorithm-based AI model developed in this study successfully improved radiologists' performance in distinguishing COVID-19 from other pulmonary infections using chest CT images.
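For readers less familiar with the reported metrics, the following Python sketch shows how accuracy, sensitivity, and specificity are derived from a binary confusion matrix. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data, and the type and function names are assumptions.

```python
from typing import NamedTuple


class Confusion(NamedTuple):
    """Binary confusion-matrix counts, with COVID-19 as the positive class."""
    tp: int  # COVID-19 cases correctly flagged
    fn: int  # COVID-19 cases missed
    tn: int  # other infections correctly identified
    fp: int  # other infections wrongly flagged as COVID-19


def sensitivity(c: Confusion) -> float:
    """True positive rate: share of positive cases that were detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """True negative rate: share of negative cases that were cleared."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    """Share of all cases that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Hypothetical counts for illustration only.
counts = Confusion(tp=45, fn=5, tn=90, fp=10)
sens = sensitivity(counts)
spec = specificity(counts)
acc = accuracy(counts)
```

Because the negative class is usually larger in this kind of dataset, accuracy alone can hide missed positives, which is why sensitivity is reported alongside it.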

Implementasi Algoritma C5.0 Untuk Menganalisa Gejala Prioritas Pada Anak Yang Mengalami Bullying

Repositor ◽

10.22219/repositor.v2i8.410 ◽

2020 ◽

Vol 2 (8) ◽

Author(s):

Nabillah Annisa Rahmayanti ◽

Yufis Azhar ◽

Gita Indah Marthasari

Keyword(s):

Feature Selection ◽

Cross Validation ◽

Evaluation Method ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Training Data ◽

Victims Of Bullying ◽

Fold Cross Validation ◽

Selection Of

Abstract: Bullying often occurs among children, especially teenagers, and worries parents. The rise in bullying cases in this country has even led to loss of life. This can be prevented by recognizing the symptoms of a child who is being bullied. When a child cannot express his or her complaints, parents and teachers at school find it difficult to understand what is happening, and the cause may be that the child is being bullied by friends. The researchers therefore aim to produce a set of selected features using the C5.0 algorithm. Using the selected features can ease the work of filling out the questionnaire and shorten the time needed to determine whether a child is being bullied, based on the symptoms covered by each question in the questionnaire. To support the data in this study, the researchers used a questionnaire to collect answers to questions about the symptoms of children who are victims of bullying. The respondents' answers are processed into a dataset, which is then divided into training data and test data and analyzed using the C5.0 algorithm. The evaluation method used in this study is 10-fold cross-validation, and accuracy is assessed using a confusion matrix. The study also compares C5.0 with several other classification algorithms, namely Naive Bayes and KNN, to see how accurate the C5.0 algorithm is in feature selection. The test results show that the C5.0 algorithm is capable of feature selection and achieves better accuracy than the Naive Bayes and KNN algorithms, with an accuracy of 92.77% before feature selection and 93.33% after feature selection.

Urine biomarker: novel approach to hepatocellular carcinoma screening

10.1101/2020.11.21.20236125 ◽

2020 ◽

Author(s):

Amy K Kim ◽

James P. Hamilton ◽

Selena Y. Lin ◽

Ting-Tsung Chang ◽

Hie-Won Hann ◽

...

Keyword(s):

Hepatocellular Carcinoma ◽

Cross Validation ◽

Learning Algorithm ◽

Early Stage ◽

High Risk Patient ◽

Circulating Tumor Dna ◽

Urine Samples ◽

Detection Rates ◽

Non Invasive ◽

Fold Cross Validation

Background & Aims: Continued limitations in hepatocellular carcinoma (HCC) screening have led to late diagnosis with poor survival, despite well-defined high-risk patient populations. Our aim is to develop a non-invasive urine circulating tumor DNA (ctDNA) biomarker panel for HCC screening to aid in early detection. Methods: Candidate ctDNA biomarkers were prescreened in urine samples obtained from HCC, cirrhosis, and hepatitis patients. Then, 609 urine samples from patients with HCC, cirrhosis, or chronic hepatitis B were collected from five academic medical centers and evaluated by serum alpha-fetoprotein (AFP) and the urine ctDNA panel using logistic regression, a Two-Step machine learning algorithm, and iterated 10-fold cross-validation. Results: Mutated TP53, and methylated RASSF1a and GSTP1, were selected for the urine ctDNA panel. The sensitivity of AFP alone (9.8 ng/mL cut-off) to detect HCC was 71% by Two-Step. The combination of ctDNA and AFP increased the sensitivity to 81% at a specificity of 90%. The AUROCs for the combination of ctDNA and AFP vs. AFP alone were 0.925 (95% CI, 0.924-0.925) and 0.877 (95% CI, 0.876-0.877), respectively. Notably, among the patients with AFP <20 ng/mL, the combination panel correctly identified 64% of HCC cases. The panel outperformed AFP alone in early-stage HCC (BCLC A), with 80% sensitivity and 90% specificity. In an iterated 10-fold cross-validation analysis, the AUROC for the combination panel was 0.898 (95% CI, 0.895-0.901). Conclusions: The combination of urine ctDNA and serum AFP can increase HCC detection rates, including in patients with low AFP. Given the ease of collection, a urine ctDNA panel could be a potential non-invasive HCC screening test.

CGENet: A Deep Graph Model for COVID-19 Detection Based on Chest CT

Biology ◽

10.3390/biology11010033 ◽

2021 ◽

Vol 11 (1) ◽

pp. 33

Author(s):

Si-Yuan Lu ◽

Zheng Zhang ◽

Yu-Dong Zhang ◽

Shui-Hua Wang

Keyword(s):

Extreme Learning Machine ◽

Cross Validation ◽

Graph Model ◽

Chest Ct ◽

Selection Algorithm ◽

Efficient Tool ◽

K Nearest Neighbors ◽

Average Accuracy ◽

Learning Machine ◽

Fold Cross Validation

Accurate and timely diagnosis of COVID-19 is indispensable to control its spread. This study proposes a novel explainable COVID-19 diagnosis system called CGENet based on graph embedding and an extreme learning machine for chest CT images. We put forward an optimal backbone selection algorithm to select the best backbone for the CGENet based on transfer learning. Then, we introduced graph theory into the ResNet-18 based on the k-nearest neighbors. Finally, an extreme learning machine was trained as the classifier of the CGENet. The proposed CGENet was evaluated on a large publicly-available COVID-19 dataset and produced an average accuracy of 97.78% based on 5-fold cross-validation. In addition, we utilized the Grad-CAM maps to present a visual explanation of the CGENet based on COVID-19 samples. In all, the proposed CGENet can be an effective and efficient tool to assist COVID-19 diagnosis.

CASE BASE REASONING UNTUK MENENTUKAN KEBUTUHAN BAHAN BANGUNAN RUMAH

SINTECH (Science and Information Technology) Journal ◽

10.31598/sintechjournal.v2i1.224 ◽

2018 ◽

Vol 1 (2) ◽

pp. 70-75

Author(s):

Abdul Rozaq

Keyword(s):

Test Data ◽

Building Materials ◽

Cross Validation ◽

Nearest Neighbor ◽

Training Data ◽

Consultation Process ◽

Case Base ◽

Case Base Reasoning ◽

House Building ◽

Fold Cross Validation

Building materials are an important factor in building a house, and consumers or developers need to estimate the funds required to build one. To address this problem, a case-based reasoning (CBR) approach is used, a method capable of reasoning about or solving a new problem on the basis of existing cases. The system built in this study is a CBR system for determining the building material needs of a house. In the consultation process, a new case is entered and compared with the old cases, and the similarity value is calculated using the nearest neighbor method. The first test, in which test data were entered and compared with each type of house, obtained an accuracy of 83.6%. The second test used K-fold cross-validation with K = 25 on 200 data items, so that in each fold the data were divided into 192 training items and 8 test items. With K-fold cross-validation, the CBR system produced an accuracy of 85.71%.

Convolutional Neural Networks for automatic image quality control and EARL compliance of PET images

10.21203/rs.3.rs-964263/v1 ◽

2021 ◽

Author(s):

Elisabeth Pfaehler ◽

Daniela Euba ◽

Andreas Rinscheid ◽

Otto S. Hoekstra ◽

Josee Zijlstra ◽

...

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Training Data ◽

Independent Dataset ◽

Pet Ct ◽

Image Quality Control ◽

The Cross ◽

The Impact ◽

Pet Scanners ◽

Fold Cross Validation

Background: Machine learning studies require a large number of images often obtained on different PET scanners. When merging these images, the use of harmonized images following EARL-standards is essential. However, when including retrospective images, EARL accreditation might not have been in place. The aim of this study was to develop a convolutional neural network (CNN) that can identify retrospectively if an image is EARL compliant and if it is meeting older or newer EARL-standards. Materials and Methods: 96 PET images acquired on three PET/CT systems were included in the study. All images were reconstructed with the locally clinically preferred, EARL1, and EARL2 compliant reconstruction protocols. After image pre-processing, one CNN was trained to separate clinical and EARL compliant reconstructions. A second CNN was optimized to identify EARL1 and EARL2 compliant images. The accuracy of both CNNs was assessed using 5-fold cross validation. The CNNs were validated on 24 images acquired on a PET scanner not included in the training data. To assess the impact of image noise on the CNN decision, the 24 images were reconstructed with different scan durations. Results: In the cross-validation, the first CNN classified all images correctly. When identifying EARL1 and EARL2 compliant images, the second CNN identified 100% EARL1 compliant and 85% EARL2 compliant images correctly. The accuracy in the independent dataset was comparable to the cross-validation accuracy. The scan duration had almost no impact on the results. Conclusion: The two CNNs trained in this study can be used to retrospectively include images in a multi-center setting by e.g. adding additional smoothing. This method is especially important for machine learning studies where the harmonization of images from different PET systems is essential.

Hierarchy-Based File Fragment Classification

Machine Learning and Knowledge Extraction ◽

10.3390/make2030012 ◽

2020 ◽

Vol 2 (3) ◽

pp. 216-232

Author(s):

Manish Bhatt ◽

Avdesh Mishra ◽

Md Wasi Ul Kabir ◽

S. E. Blake-Gatto ◽

Rishav Rajendra ◽

...

Keyword(s):

Cross Validation ◽

Hierarchical Classification ◽

Future Research ◽

Support Vector ◽

Challenging Problem ◽

Fine Grain ◽

Average Accuracy ◽

Vector Machines ◽

Essential Problem ◽

Fold Cross Validation

File fragment classification is an essential problem in digital forensics. Although several attempts have been made to solve this challenging problem, a general solution has not been found. In this work, we propose a hierarchical machine-learning-based approach with optimized support vector machines (SVM) as the base classifiers for file fragment classification. This approach consists of more general classifiers at the top level and more specialized fine-grain classifiers at the lower levels of the hierarchy. We also propose a primitive taxonomy for file types that can be used to perform hierarchical classification. We evaluate our model on a dataset of 14 file types, with 1000 fragments of 512 bytes from each file type, derived from the govdocs1 corpus, a subset of the publicly available Digital Corpora. Our experiment shows results comparable to the present literature, with an average accuracy of 67.78% and an F1-measure of 65% using 10-fold cross-validation. We then improve on the hierarchy and find better results, with an increase in the F1-measure of 1%. Finally, we present our assessment and observations, then conclude the paper by discussing the scope of future research.
