Predicting psoriasis using routine laboratory tests with random forest

Psoriasis is a chronic inflammatory skin disease that affects approximately 125 million people worldwide. Its impact on physical and emotional health-related quality of life is comparable to that of other major illnesses. Accurate prediction of psoriasis using biomarkers from routine laboratory tests has important practical value. Our goal is to derive a powerful predictive model for psoriasis based only on routine hospital tests. We collected a data set of 466 psoriasis patients and 520 healthy controls with 81 variables drawn only from routine laboratory tests, such as age, total cholesterol, HDL cholesterol, blood pressure, albumin, and platelet distribution width. In this study, the Boruta feature selection method was applied to select the most relevant features, with which a Random Forest model was constructed. The model was tested with 30 repetitions of 10-fold cross-validation and yielded an average accuracy of 86.9%. Boruta selected 26 notable features, of which 15 are confirmed by previous studies; the rest warrant further investigation. The experimental results demonstrate that the machine learning approach has good potential for predictive modeling of psoriasis given information only from routine hospital tests.
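The evaluation protocol in this abstract (30 repetitions of 10-fold cross-validation) can be sketched as follows. This is a minimal illustration of how the splits might be generated, not the authors' code: the function names `repeated_kfold` and `mean_accuracy`, the seed, and the unstratified shuffling are all assumptions, and Boruta and the Random Forest themselves are not implemented here.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=30, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper: the abstract does not say whether the folds
    were stratified, so this sketch uses plain random partitions.
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random partition for each repetition
        for fold in range(n_folds):
            test = idx[fold::n_folds]  # every n_folds-th shuffled index
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield rep, fold, train, test


def mean_accuracy(per_split_accuracies):
    """Average the accuracies collected over all splits."""
    return sum(per_split_accuracies) / len(per_split_accuracies)


N_SAMPLES = 466 + 520  # patients + healthy controls, from the abstract
splits = list(repeated_kfold(N_SAMPLES))
print(len(splits))  # 300 train/test splits (30 repetitions x 10 folds)
```

In use, a model would be fitted on each `train` index list and scored on the matching `test` list, with the 300 scores then passed to `mean_accuracy`.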


Evaluating regional variability in the use of the most commonly requested laboratory tests in primary care in Spain: data from the multi-center national scale REDCONLAB initiative

LaboratoriumsMedizin ◽ 10.1515/labmed-2016-0077 ◽ 2017 ◽ Vol 41 (2)

Author(s): Maria Salinas ◽ Maite Lopez-Garrigos ◽ Emilio Flores ◽ Carlos Leiva-Salinas

Keyword(s): Primary Care ◽ Uric Acid ◽ Regional Differences ◽ Laboratory Tests ◽ HDL Cholesterol ◽ Regional Variability ◽ Data Set ◽ Autonomous Communities ◽ Glutamyl Transpeptidase ◽ Common Laboratory

Abstract. Background: The aim was to study the regional variability in requests for the ten most frequently ordered laboratory tests in primary care in Spain. Methods: Spain is divided into autonomous communities (AACC), the first-level health care divisions. Every AACC is divided into health departments (HDs), and a laboratory serves the needs of every HD's inhabitants. Laboratories from different HDs participated in the study. They reported the number of requests during 2014 for the ten laboratory tests most commonly requested in primary care according to prior evidence: alanine aminotransferase (ALT), aspartate aminotransferase (AST), total cholesterol, creatinine, γ-glutamyl transpeptidase (GGT), glucose, HDL-cholesterol, triglycerides, uric acid and urinalysis. Test-utilization rates were calculated as tests per 1000 inhabitants. Laboratories were grouped by AACC, and the results for each region were compared using the coefficient of quartile dispersion (CQD), calculated from the first (Q1) and third (Q3) quartiles of each data set as (Q3−Q1)/(Q3+Q1). Results: One hundred and ten laboratories participated, corresponding to 27,798,262 inhabitants (59.8% of the Spanish population) from 15 AACC. In total, 82,710,869 tests were requested. AST, GGT and uric acid showed the greatest variation. Conclusions: There were significant regional differences in how the most common laboratory tests were ordered in Spain.
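The two quantities defined in the Methods, the test-utilization rate per 1000 inhabitants and the coefficient of quartile dispersion (Q3−Q1)/(Q3+Q1), can be sketched as below. The per-laboratory rates are made-up illustrative values, the function names are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since the abstract does not say which one the authors used.

```python
import statistics


def rate_per_1000(tests, inhabitants):
    """Test-utilization rate: tests requested per 1000 inhabitants."""
    return 1000 * tests / inhabitants


def cqd(values):
    """Coefficient of quartile dispersion: (Q3 - Q1) / (Q3 + Q1)."""
    # Assumption: inclusive quartiles; other conventions give
    # slightly different Q1 and Q3 on small samples.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)


# Hypothetical per-laboratory request rates for one test in one region.
rates = [100.0, 200.0, 300.0, 400.0, 500.0]
print(round(cqd(rates), 3))  # 0.333 (Q1 = 200, Q3 = 400)

# Aggregate rate for all ten tests combined, using the totals
# reported in the Results.
overall = rate_per_1000(82_710_869, 27_798_262)
```

A CQD near 0 means the laboratories in a region request the test at similar rates; larger values indicate more spread between Q1 and Q3 relative to their sum.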


Thyroid dysfunction diagnosis from routine laboratory tests based on machine learning

10.1101/2021.03.30.21254605 ◽ 2021

Author(s): Min Hu ◽ Chikashi Asami ◽ Hiroshi Iwakura ◽ Yasuyo Nakajima ◽ Ryousuke Sema ◽ ...

Keyword(s): Machine Learning ◽ Laboratory Tests ◽ Thyroid Dysfunction ◽ Cross Validation ◽ Laboratory Finding ◽ Machine Learning Algorithms ◽ Classification Model ◽ Routine Laboratory ◽ Health Records ◽ Medical Checkup

Abstract. Approximately 2.4 million patients in Japan need treatment for thyroid disease, including Graves’ disease and Hashimoto’s disease. However, only 450,000 of them are receiving treatment, and many patients with thyroid dysfunction remain largely overlooked. In this retrospective study, we aimed to screen patients with hyperthyroidism and hypothyroidism who would greatly benefit from prompt medical treatment. We examined routine laboratory finding data and machine learning algorithms to investigate whether such accurate and robust screening is possible to prevent overlooking and misdiagnosing thyroid dysfunction. We succeeded in developing a machine learning method to construct a classification model for detecting hyperthyroidism and hypothyroidism in patients using 11 routine laboratory tests. We collected electronic health records and medical checkup data from four hospitals in Japan. As a result of cross-validation and external evaluation, we achieved a high classification accuracy for the hyperthyroidism and hypothyroidism models.


Can routine laboratory tests discriminate 2019 novel coronavirus infected pneumonia from other community-acquired pneumonia?

10.1101/2020.02.25.20024711 ◽ 2020 ◽ Cited By ~ 1

Author(s): Yunbao Pan ◽ Guangming Ye ◽ Xiantao Zeng ◽ Guohong Liu ◽ Xiaojiao Zeng ◽ ...

Keyword(s): Blood Cell ◽ Laboratory Tests ◽ Clinical Laboratory ◽ ROC Curves ◽ Community Acquired Pneumonia ◽ Routine Laboratory ◽ Discriminative Ability ◽ Distribution Width ◽ Discriminatory Ability ◽ Novel Coronavirus

Abstract. Background: The clinical presentation of 2019 Novel Coronavirus (2019-nCov) infected pneumonia (NCIP) resembles that of other etiologies of community-acquired pneumonia (CAP). We aimed to identify clinical laboratory features that distinguish NCIP from CAP. Methods: We compared the hematological and biochemical features of 84 patients with NCIP at hospital admission with those of 316 patients with CAP. Parameters independently predictive of NCIP were identified by multivariate logistic regression. Receiver operating characteristic (ROC) curves were generated, and the area under the ROC curve (AUC) was measured to evaluate discriminative ability. Results: Most hematological and biochemical indexes of patients with NCIP differed significantly from those of patients with CAP. Nine laboratory parameters were identified as highly predictive of a diagnosis of NCIP by multivariate analysis. The AUCs demonstrated good discriminatory ability for red cell distribution width (RDW), with an AUC of 0.88, and hemoglobin (HGB), with an AUC of 0.82. Red blood cell (RBC), albumin (ALB), eosinophil (EO), hematocrit (HCT), alkaline phosphatase (ALP), and white blood cell (WBC) had fair discriminatory ability. Combinations of any two parameters performed better than RDW alone. Conclusions: Routine laboratory examinations may be helpful for the diagnosis of NCIP. Application of laboratory tests may help to optimize the use of isolation rooms for patients who present with unexplained febrile respiratory illnesses.
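The AUC used here to rate each laboratory parameter equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. A minimal sketch of that pairwise definition follows. The function name and the marker values are hypothetical, and treating higher values as indicative of NCIP is an assumption made only for illustration.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is the pairwise
    (Mann-Whitney) form of the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical marker values, not data from the study.
ncip_values = [14.1, 13.2]
cap_values = [12.5, 13.6]
print(auc(ncip_values, cap_values))  # 0.75 (3 of 4 pairs ranked correctly)
```

An AUC of 0.5 corresponds to a marker that ranks cases no better than chance, and 1.0 to perfect separation; for a marker that is lower in the positive group, the scores would be negated first.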


ILRC : A Hybrid Biomarker Discovery Algorithm Based on Improved L1 Regularization and Clustering in Microarray Data

10.21203/rs.3.rs-590641/v1 ◽ 2021

Author(s): Kun Yu ◽ Weidong Xie ◽ Linjie Wang ◽ Wei Li

Keyword(s): Feature Selection ◽ Microarray Data ◽ Biomarker Discovery ◽ Cleft Lip And Palate ◽ Feature Selection Method ◽ Disease Diagnosis ◽ Selection Method ◽ Classification Model ◽ Data Set ◽ L1 Regularization

Abstract. Background: Finding significant genes or proteins in gene chip data for disease diagnosis and drug development is an important task, and the challenge comes from the curse of dimensionality. It is of great significance to use machine learning methods to find important features in the data and build an accurate classification model. Results: The proposed method proved superior to a published advanced hybrid feature selection method and to traditional feature selection methods on different public microarray data sets. In addition, results on a cleft lip and palate data set with known biomarkers, provided by the cooperating hospital, show that our method preferentially selects these biomarkers compared with other methods. Method: In this paper, a feature selection algorithm, ILRC, based on clustering and improved L1 regularization is proposed. In this method, the features are first clustered, and the redundant features in the sub-clusters are deleted. Then all the remaining features are iteratively evaluated using ILR, and the final result is output according to the cumulative weight reordering. Conclusion: The proposed method can effectively remove redundant features. The algorithm's output has high stability and classification accuracy and can select potential biomarkers.


ANALYSIS OF THE INFLUENCE OF MACHINE LEARNING ALGORITHM PARAMETERS ON THE RESULTS OF TRAFFIC CLASSIFICATION IN REAL TIME

T-Comm ◽ 10.36724/2072-8735-2021-15-9-24-35 ◽ 2021 ◽ Vol 15 (9) ◽ pp. 24-35

Author(s): Irina A. Krasnova

Keyword(s): Machine Learning ◽ Random Forest ◽ Real Time ◽ Experimental Studies ◽ Machine Learning Algorithms ◽ Classification Model ◽ Traffic Classification ◽ Data Set ◽ Minimum Number ◽ The Impact

The paper analyzes the impact of setting the parameters of Machine Learning algorithms on the results of traffic classification in real-time. The Random Forest and XGBoost algorithms are considered. A brief description of the work of both methods and methods for evaluating the results of classification is given. Experimental studies are conducted on a database obtained on a real network, separately for TCP and UDP flows. In order for the results of the study to be used in real time, a special feature matrix is created based on the first 15 packets of the flow. The main parameters of the Random Forest (RF) algorithm for configuration are the number of trees, the partition criterion used, the maximum number of features for constructing the partition function, the depth of the tree, and the minimum number of samples in the node and in the leaf. For XGBoost, the number of trees, the depth of the tree, the minimum number of samples in the leaf, for features, and the percentage of samples needed to build the tree are taken. Increasing the number of trees leads to an increase in accuracy to a certain value, but as shown in the article, it is important to make sure that the model is not overfitted. To combat overfitting, the remaining parameters of the trees are used. In the data set under study, by eliminating overfitting, it was possible to achieve an increase in classification accuracy for individual applications by 11-12% for Random Forest and by 12-19% for XGBoost. The results show that setting the parameters is a very important step in building a traffic classification model, because it helps to combat overfitting and significantly increases the accuracy of the algorithm’s predictions. In addition, it was shown that if the parameters are properly configured, XGBoost, which is not very popular in traffic classification works, becomes a competitive algorithm and shows better results compared to the widespread Random Forest.


Multi-Label Classification Based on Random Forest Algorithm for Non-Intrusive Load Monitoring System

Processes ◽ 10.3390/pr7060337 ◽ 2019 ◽ Vol 7 (6) ◽ pp. 337 ◽ Cited By ~ 11

Author(s): Xin Wu ◽ Yuchen Gao ◽ Dian Jiao

Keyword(s): Random Forest ◽ Learning Algorithm ◽ Classification Model ◽ Load Identification ◽ Data Set ◽ Classification Feature ◽ Public Data ◽ Energy Disaggregation ◽ Load Monitoring ◽ Energy Consumption Patterns

Non-intrusive load monitoring (NILM) is an effective method to optimize energy consumption patterns. Since the concept of NILM was proposed, extensive research has focused on energy disaggregation or load identification. The traditional method is to disaggregate mixed signals and then identify the independent loads. This paper proposes a multi-label classification method using Random Forest (RF) as the learning algorithm for non-intrusive load identification. Multi-label classification determines which categories a data sample belongs to, so it can identify the operation states of independent loads from mixed signals without disaggregation. The experiments are conducted in a real environment and on a public data set. Several basic electrical features are selected as classification features to build the classification model. These features are also compared by their feature importance to select the most suitable ones for classification. The classification accuracy and F-score of the proposed method reach 0.97 and 0.98, respectively.
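In a multi-label setting like the one described, each sample carries a binary vector with one entry per load (on or off), and an F-score can be computed over all label decisions. The sketch below shows one common variant, the micro-averaged F-score; the abstract does not say which averaging the authors used, so this choice, the function name `micro_f_score`, and the example vectors are assumptions for illustration only.

```python
def micro_f_score(y_true, y_pred):
    """Micro-averaged F-score over all (sample, label) decisions.

    y_true and y_pred are lists of equal-length 0/1 vectors, one
    vector per sample and one entry per load.
    """
    tp = fp = fn = 0
    for true_vec, pred_vec in zip(y_true, y_pred):
        for t, p in zip(true_vec, pred_vec):
            if t == 1 and p == 1:
                tp += 1  # load is on and predicted on
            elif t == 0 and p == 1:
                fp += 1  # load is off but predicted on
            elif t == 1 and p == 0:
                fn += 1  # load is on but predicted off
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical on/off states for three loads in two samples.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
score = micro_f_score(y_true, y_pred)  # precision = recall = 2/3 here
```

Because every load is a separate label, a single mixed signal can be assigned several active loads at once, which is what lets the approach skip the disaggregation step.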


Recursive Feature Elimination with Ridge Regression (L2) Machine Learning Hybrid Feature Selection Algorithm for Diabetic Prediction using Random Forest Classifer.

10.21203/rs.3.rs-742641/v1 ◽ 2021

Author(s): K venkatachalam ◽ P Prabhu ◽ B saravana Balaji ◽ Mohamed Abouhawwash ◽ R Rajadevi

Keyword(s): Machine Learning ◽ Feature Selection ◽ Random Forest ◽ Ridge Regression ◽ Feature Selection Method ◽ Selection Method ◽ Recursive Feature Elimination ◽ Selection Algorithm ◽ Feature Selection Algorithm ◽ Data Set

Abstract. Diabetes, a condition in which the body cannot properly metabolize glucose, is becoming increasingly common. Predicting which patients have diabetes is an important research area, and many researchers have proposed data mining and machine learning techniques for it. Feature selection is a key preprocessing step in prediction: using only the features relevant to the disease improves prediction accuracy. Selecting the right features from the whole feature set is a complicated process, and many researchers are concentrating on it to produce predictive models with high accuracy. In this work, the wrapper-based feature selection method Recursive Feature Elimination (RFE) is combined with Ridge regression (L2) to form a hybrid L2-regularized feature selection algorithm that addresses the overfitting problem of the data set. Overfitting is a major problem in feature selection: the model does not fit new data well because the training data are small. Ridge regression is mainly used to overcome the overfitting problem. Once the features are selected using the proposed method, a random forest classifier classifies the data based on the selected features. The proposed work is evaluated on the PIDD data set, and the results are compared with existing algorithms to assess the accuracy of the proposed algorithm. According to the results, the proposed algorithm predicts diabetes with higher accuracy than the other existing algorithms.


ILRC : A Hybrid Biomarker Discovering Algorithm Based on Improved L1 Regularization and Clustering in Microarray Data

10.21203/rs.3.rs-572610/v1 ◽ 2021

Author(s): Kun Yu ◽ Weidong Xie ◽ Linjie Wang ◽ Wei Li

Keyword(s): Feature Selection ◽ Microarray Data ◽ Cleft Lip ◽ Cleft Lip And Palate ◽ Feature Selection Method ◽ Disease Diagnosis ◽ Selection Method ◽ Classification Model ◽ Data Set ◽ L1 Regularization

Abstract. Background: Finding significant genes or proteins in gene chip data for disease diagnosis and drug development is an important task, and the challenge comes from the curse of dimensionality. It is of great significance to use machine learning methods to find important features in the data and build an accurate classification model. Results: The proposed method proved superior to a published advanced hybrid feature selection method and to traditional feature selection methods on different public microarray data sets. In addition, results on a cleft lip and palate data set with known biomarkers, provided by the cooperating hospital, show that our method preferentially selects these biomarkers compared with other methods. Method: In this paper, a feature selection algorithm, ILRC, based on clustering and improved L1 regularization is proposed. In this method, the features are first clustered, and the redundant features in the sub-clusters are deleted. Then all the remaining features are iteratively evaluated using ILR, and the final result is output according to the cumulative weight reordering. Conclusion: The proposed method can effectively remove redundant features. The algorithm’s output has high stability and classification accuracy and can select potential biomarkers.


Rapid Screening Methods for Bleeding Disorders. A Three Year Survey

Thrombosis and Haemostasis ◽ 10.1055/s-0038-1654846 ◽ 1964 ◽ Vol 11 (02) ◽ pp. 506-512 ◽ Cited By ~ 1

Author(s): V. A Lovric ◽ J Margolis

Keyword(s): Laboratory Tests ◽ Capillary Blood ◽ Screening Tests ◽ Rapid Screening ◽ Screening Methods ◽ Bleeding Disorders ◽ Routine Laboratory ◽ Clotting Time ◽ Kaolin Clotting Time ◽ In Infants And Children

Summary. An adaptation of “kaolin clotting time” and prothrombin time for use on haemolysed capillary blood provided simple and sensitive screening tests suitable for use in infants and children. A survey of three years’ experience shows that these are reliable routine laboratory tests for detection of latent coagulation disorders.


Random Forest Refinement of Pairwise Potentials for Protein-ligand Decoy Detection

10.26434/chemrxiv.8047820.v1 ◽ 2019 ◽ Cited By ~ 1

Author(s): Jun Pei ◽ Zheng Zheng ◽ Hyunji Kim ◽ Lin Song ◽ Sarah Walworth ◽ ...

Keyword(s): Machine Learning ◽ Random Forest ◽ Probability Function ◽ Pair Potential ◽ Scoring Function ◽ Stable Structure ◽ Scoring Functions ◽ Atom Pair ◽ Data Set ◽ Atom Pairs

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF6) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013 [5]. In a comparison of the performance of our RF models against 29 scoring functions, we found that our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to assess the importance of the GARF potential in the RF models: (1) a scrambled probability function set, obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. The accuracy comparison of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.
