Pulmonary Nodule Classification in Thoracic CT Images using Random Forest Algorithm

In this paper, thoracic pulmonary nodules are automatically classified from computed tomography (CT) images. The nodules fall into two categories: benign and malignant. Benign nodules cause little or no harm, whereas malignant nodules, if not detected in time, can cause severe damage and even death. Early-stage detection of lung cancer is therefore critical. The analysis proceeds in four steps. First, a noise-free CT image is obtained through preprocessing. Second, an improved Random Walker algorithm performs region-based segmentation, generating foreground and background seeds. Third, important features of the segments are extracted; these can be intensity, texture, and geometry based. Finally, an improved Random Forest method generates classification trees comprising the different class labels. Using the Random Forest algorithm, we predict the class label that corresponds to a particular nodule type and the stage of cancer it has developed.
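As a concrete illustration of the four steps, the sketch below chains a denoising filter, seed-based random-walker segmentation, region feature extraction, and a Random Forest. It uses scikit-image's standard random_walker rather than the paper's improved variant, and the seed thresholds and feature list are illustrative assumptions, not the authors' method.

```python
# Minimal sketch of the four-step pipeline, under assumed seed heuristics.
import numpy as np
from skimage.filters import median
from skimage.measure import regionprops
from skimage.segmentation import random_walker
from sklearn.ensemble import RandomForestClassifier

def preprocess(ct_slice):
    """Step 1: noise-reduced CT slice (median filter as a simple stand-in)."""
    return median(ct_slice)

def segment(denoised):
    """Step 2: region-based segmentation from foreground/background seeds."""
    seeds = np.zeros(denoised.shape, dtype=np.uint8)
    seeds[denoised < np.percentile(denoised, 20)] = 1   # background seeds (assumed threshold)
    seeds[denoised > np.percentile(denoised, 95)] = 2   # foreground (nodule) seeds (assumed)
    return random_walker(denoised, seeds, beta=130)

def extract_features(denoised, labels):
    """Step 3: intensity-, texture- and geometry-based features per segment."""
    mask = (labels == 2).astype(int)
    return [[r.mean_intensity, r.area, r.eccentricity, r.solidity]
            for r in regionprops(mask, intensity_image=denoised)]

# Step 4: a Random Forest over labeled nodule features (training data assumed available).
# clf = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
# predicted_class = clf.predict(extract_features(denoised, labels))
```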

2021, Vol 0 (0)
Author(s): Alina Trifan, José Luis Oliveira

Abstract: With the continuous increase in the use of social networks, social mining is steadily becoming a powerful component of digital phenotyping. In this paper we explore social mining for the classification of self-diagnosed depressed users of Reddit as a social network. We conduct a cross-evaluation study based on two public datasets in order to understand the impact of transfer learning when the data source is virtually the same. We further complement these results with a transfer learning experiment in post-partum depression classification, using a corpus we collected for this purpose. Our findings show that transfer learning in social mining might still be at an early stage in computational research, and we thoroughly discuss its implications.
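As an illustration of the cross-evaluation setup, the sketch below trains on one corpus and tests on the other, in both directions. The TF-IDF + logistic regression baseline is an assumption for illustration; the abstract does not specify the paper's actual models or features.

```python
# Hedged sketch of a cross-dataset (transfer) evaluation: fit on one
# self-reported depression corpus, score on the other.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def cross_evaluate(train_texts, train_labels, test_texts, test_labels):
    vec = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
    X_train = vec.fit_transform(train_texts)
    X_test = vec.transform(test_texts)              # reuse the training vocabulary
    clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
    return f1_score(test_labels, clf.predict(X_test))

# Running both directions (A -> B and B -> A) exposes how well a model
# transfers when the data source is "virtually the same".
```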


2021, Vol 4 (1), pp. 14
Author(s): Husna Afanyn Khoirunissa, Amanda Rizky Widyaningrum, Annisa Priliya Ayu Maharani

A bank is a business entity that deals with money: accepting deposits from customers, providing funds for withdrawals, collecting checks on customers' orders, granting credit, and investing surplus deposits until they are required for repayment. The purpose of this research is to determine the influence of age, gender, country, customer credit score, the number of bank products used by the customer, and the customer's activity as a bank member on the decision to keep using or to close a bank account. The data in this research cover 10,000 respondents originating from France, Spain, and Germany. The method used is data mining, with an early preprocessing stage to clean the data of outliers and missing values and a feature selection stage to select important attributes. Classification is then performed using three methods: Random Forest, Logistic Regression, and Multilayer Perceptron. The results of this research show that the Multilayer Perceptron with 10-fold cross validation is the best model, with 85.5373% accuracy.

Keywords: bank customer, random forest, logistic regression, multilayer perceptron
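The comparison described above maps directly onto a standard scikit-learn workflow. The sketch below scores the three named classifiers with 10-fold cross validation; synthetic data stands in for the 10,000-customer dataset, and the hyperparameters are assumptions.

```python
# Sketch of the three-model comparison with 10-fold cross validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Placeholder data: real features would be age, gender, country, credit
# score, number of products, and member activity; the label is churn.
X, y = make_classification(n_samples=10000, n_features=6, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Multilayer Perceptron": MLPClassifier(max_iter=500, random_state=0),
}

for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    print(f"{name}: {acc:.4%}")
```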


2021, Vol 11, pp. 52
Author(s): Akitoshi Inoue, Tucker F. Johnson, Benjamin A. Voss, Yong S. Lee, Shuai Leng, ...

Objectives: The objective of the study was to estimate the impact of high matrix image reconstruction on chest computed tomography (CT) compared to standard image reconstruction. Materials and Methods: This retrospective study included patients with interstitial or parenchymal lung disease, airway disease, and pulmonary nodules who underwent chest CT. Chest CT images were reconstructed using a high matrix (1024 × 1024) or a standard matrix (512 × 512), with all other parameters matched. Two radiologists, blinded to the reconstruction technique, independently examined each lung, viewing the image sets side by side and rating the conspicuity of imaging findings on a 5-point relative conspicuity scale. The presence of pulmonary nodules and confidence in the classification of internal attenuation were also graded. Overall image quality and subjective noise/artifacts were assessed. Results: Thirty-four patients with 68 lungs were evaluated. Relative conspicuity scores were significantly higher with high matrix image reconstruction for all imaging findings indicative of idiopathic lung fibrosis (peripheral airway visualization, interlobular septal thickening, intralobular reticular opacity, and end-stage fibrotic change; P ≤ 0.001), as well as for emphysema, mosaic attenuation, and fourth-order bronchi, for both readers (P ≤ 0.001). High matrix reconstruction did not improve confidence in the presence or classification of internal nodule attenuation for either reader. Overall image quality increased with high matrix image reconstruction for both readers (P < 0.001), whereas subjective noise/artifacts did not. Conclusion: High matrix image reconstruction significantly improves the conspicuity of imaging findings reflecting interstitial lung disease and may be useful for diagnosis or treatment response assessment.
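For paired ordinal ratings of this kind, one plausible analysis is a Wilcoxon signed-rank test per finding and reader. The abstract does not state which test the authors used, so the sketch below, with toy scores, is only an assumed illustration of comparing paired 5-point ratings.

```python
# Illustrative paired comparison of reader scores (toy data, not the study's).
from scipy.stats import wilcoxon

# One conspicuity rating per lung for one reader and one finding.
high_matrix = [4, 5, 4, 3, 5, 4, 4, 5, 5, 3]
standard    = [3, 4, 3, 3, 4, 3, 4, 4, 3, 2]

stat, p = wilcoxon(high_matrix, standard)
print(f"Wilcoxon statistic={stat}, p={p:.4f}")
```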


2021
Author(s): Anton Korosov, Hugo Boulze, Julien Brajard

A new algorithm for classification of sea ice types on Sentinel-1 Synthetic Aperture Radar (SAR) data using a convolutional neural network (CNN) is presented. The CNN is trained on reference ice charts produced by human experts and compared with an existing machine learning algorithm based on texture features and a random forest classifier. The CNN is trained on a dataset from winter 2020 for retrieval of four classes: ice free, young ice, first-year ice and old ice. The overall accuracy of our classification is 91.6%; accuracy is somewhat lower for young ice (76%) and first-year ice (84%). Our algorithm outperforms the existing random forest product for each ice type. It has also proved to be more efficient in computing time and less sensitive to noise in the SAR data.

Our study demonstrates that a CNN can be successfully applied to the classification of sea ice types in SAR data. The algorithm is applied to small sub-images extracted from a SAR image after preprocessing, including thermal noise removal. Validation shows that the errors are mostly attributable to the coarse resolution of the ice charts or to misclassification of the training data by the human experts.

Several sensitivity experiments were conducted to test the impact of the CNN architecture, hyperparameters, training parameters and data preprocessing on accuracy. A CNN with three convolutional layers, two max-pool layers and three hidden dense layers, applied to 50 × 50 pixel sub-images, achieved the best results. It was also shown that a CNN can be applied to SAR data without thermal noise removal at the preprocessing step; understandably, the classification accuracy decreases to 89%, but it remains reasonable.

The main advantages of the new algorithm are its ability to classify several ice types, its higher classification accuracy for each ice type, and its higher processing speed compared to previous studies. The relative simplicity of the algorithm (both texture analysis and classification are performed by the CNN) is also a benefit. In addition to providing ice type labels, the algorithm derives the probability of belonging to each class. The uncertainty of the method can be derived from these probabilities and used in the assimilation of ice type in numerical models.

Given the high accuracy and processing speed, the CNN-based algorithm is included in the Copernicus Marine Environment Monitoring Service (CMEMS) for operational sea ice type retrieval for generating ice charts in the Arctic Ocean. It has already been released as open source software and is available on GitHub: https://github.com/nansencenter/s1_icetype_cnn.
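Following the stated architecture (three convolutional layers, two max-pool layers, three hidden dense layers, 50 × 50 inputs, four classes), a minimal Keras sketch might look as follows. Filter counts, kernel sizes, activations, and the single input channel are assumptions not given in the abstract; the released repository linked above contains the actual implementation.

```python
# Assumed instantiation of the described CNN for sea ice type retrieval.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(50, 50, 1)),            # single SAR channel assumed
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(4, activation="softmax"),      # ice free, young, first-year, old
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The softmax output also yields the per-class probabilities mentioned above, from which the method's uncertainty can be derived.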


2022, Vol 9 (1), pp. 27
Author(s): Inês Vigo, Luis Coelho, Sara Reis

Background: Alzheimer’s disease (AD) is of paramount importance due to its rising prevalence, its impact on patients and society, and the related healthcare costs. However, current diagnostic techniques are not designed for frequent mass screening, delaying therapeutic intervention and worsening prognoses. For detecting AD at an early stage, ideally a pre-clinical one, speech analysis emerges as a simple, low-cost, non-invasive procedure. Objectives: In this work, our objective is to conduct a systematic review of speech-based detection and classification of Alzheimer’s disease, with the purpose of identifying the most effective algorithms and best practices. Methods: A systematic literature search was performed from January 2015 up to May 2020 using ScienceDirect, PubMed and DBLP. Articles were screened by title, abstract and full text as needed. A complementary manual search among the references of the included papers was also performed. Inclusion criteria and search strategies were defined a priori. Results: We were able to identify the main resources that can support the development of decision support systems for AD, to list speech features that correlate with the linguistic and acoustic footprint of the disease, to recognize the data models that can provide robust results, and to observe the performance indicators that were reported. Discussion: A computational system combining the adequate elements, based on the identified best practices, can point to a whole new diagnostic approach, leading to better insights into AD symptoms and its disease patterns and creating conditions to promote a longer life span as well as improved patient quality of life. The clinically relevant results identified here can be used to establish a reference system and help define research guidelines for future developments.
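To make the speech-analysis pipeline concrete, the sketch below extracts a small fixed-length acoustic feature vector from a recording. MFCC statistics are only one example of the acoustic footprint features surveyed by such reviews; the specific feature sets vary across the studies and are not listed in this abstract.

```python
# Example of the kind of acoustic features a speech-based AD detector
# might compute; the feature choice here is an illustrative assumption.
import librosa
import numpy as np

def acoustic_features(wav_path):
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # Summary statistics turn the variable-length signal into a fixed vector
    # suitable for a downstream classifier.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```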


2010, Vol 21 (1), pp. 140-144
Author(s): Olga Lucía Torres-Vargas, José Manuel Barat-Baviera, Marta Aliño

The composition of fresh and frozen meat products, especially fat content, is one of the parameters usually taken into account for purposes such as classifying fresh and frozen meat pieces for fresh consumption or for processing, fixing prices, and judging adequacy for processing. For this reason, various non-destructive methods are used to determine meat composition, at least approximately, ranging from visual methods to more complex magnetic resonance imaging and computed tomography image analysis. The aim of this study was to use density measurement as a non-destructive, easy and cheap method to classify fresh and frozen meat over a wide range of fat contents. The results obtained showed a significant relationship between the density and the fat content of fresh and frozen meat.
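The density-fat relationship lends itself to a simple calibration: fit the relationship on reference samples, then predict fat content from a density measurement. The sketch below uses placeholder numbers (fat is less dense than lean tissue, so fat content rises as density falls), not data from the study.

```python
# Hedged sketch of density-based fat estimation; values are placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

density = np.array([[1.06], [1.04], [1.02], [0.99], [0.97]])  # g/cm^3 (placeholder)
fat_pct = np.array([5.0, 12.0, 20.0, 30.0, 38.0])             # % fat (placeholder)

model = LinearRegression().fit(density, fat_pct)
print(f"Estimated fat content at 1.00 g/cm^3: {model.predict([[1.00]])[0]:.1f}%")
```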


2021, Vol 2021, pp. 1-13
Author(s): Sadegh Ilbeigipour, Amir Albadvi, Elham Akhondzadeh Noughabi

One of the major causes of death in the world is cardiac arrhythmia. In healthcare, physicians use a patient’s electrocardiogram (ECG) records, which reflect the electrical activity of the patient’s heart, to detect arrhythmias. The problem is that symptoms do not always appear, and the physician may be mistaken in the diagnosis. Therefore, patients need continuous monitoring through real-time ECG analysis to detect arrhythmias in a timely manner and prevent incidents that threaten the patient’s life. In this research, we used the Structured Streaming module built on top of the open-source Apache Spark platform, for the first time, to implement a machine learning pipeline for real-time cardiac arrhythmia detection, and we evaluated the impact of this new module on classification performance metrics and on the delay in arrhythmia detection. The ECG data were collected from the MIT-BIH database for the detection of three class labels: normal beats, RBBB beats, and atrial fibrillation arrhythmias. We developed three multiclass classifiers, decision tree, random forest, and logistic regression, of which the random forest classifier showed the best classification performance. The results show improvements over previous results in the classification model’s performance metrics and a significant decrease in pipeline runtime, while using more class labels than previous studies.
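The streaming half of such a pipeline can be sketched with Spark Structured Streaming: an offline-trained Spark ML model is applied to a stream of incoming beat features. The Kafka source, topic name, model path, and 10-column schema below are assumptions for illustration; the paper's actual feature columns are not listed in the abstract.

```python
# Hedged sketch of real-time ECG scoring with Spark Structured Streaming.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StructField, StructType
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("ecg-arrhythmia-stream").getOrCreate()

# Assumed schema: ten numeric beat features per record.
schema = StructType([StructField(f"f{i}", DoubleType()) for i in range(10)])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "ecg-beats")            # hypothetical topic
       .load())

beats = (raw.select(from_json(col("value").cast("string"), schema).alias("b"))
            .select("b.*"))

# Pipeline (feature assembler + random forest) trained offline on MIT-BIH
# and saved beforehand; the path is a placeholder.
model = PipelineModel.load("/models/ecg_rf")

query = (model.transform(beats)
         .select("prediction")
         .writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```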

