Random forest age estimation model based on length of left hand bone for Asian population

Author(s):  
Mohd Faaizie Darmawan ◽  
Ahmad Firdaus Zainal Abidin ◽  
Shahreen Kasim ◽  
Tole Sutikno ◽  
Rahmat Budiarto

In forensic anthropology, age estimation is used to help identify the age of a living person or of a deceased body. However, existing estimation models are typically suitable only for the specific population on which they were developed. These models are also commonly subject to inter- and intra-observer variability because they rely on qualitative data, which makes age estimation dependent on forensic experts. This study proposes an age estimation model that uses the lengths of the bones of the left hand of Asian subjects ranging in age from newborn to 18 years. A soft computing model, Random Forest (RF), is used to develop the estimation model, and the results are compared with the Artificial Neural Network (ANN) and Support Vector Machine (SVM) models developed in previous case studies. The performance measures used in this study and in the previous case studies are R-squared and mean squared error (MSE). The results show that the RF model is comparable with the ANN and SVM models. For male subjects, the RF model performs better than ANN but worse than SVM. For female subjects, the RF model outperforms both ANN and SVM. Overall, the RF model is the most suitable of the three for estimating age in female subjects and the second best for male subjects. The application of this model is nonetheless restricted to experimental purposes or forensic practice.
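
As a reference point for the two performance measures, here is a minimal Python sketch of how R-squared and MSE are conventionally computed. The ages and predictions are invented for illustration and are not data from the study; the function names are likewise illustrative.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)             # total variation
    return 1.0 - ss_res / ss_tot


# Hypothetical chronological ages (years) and model estimates.
ages = [2.0, 5.0, 9.0, 12.0, 16.0]
predicted = [2.5, 4.5, 9.5, 11.0, 16.0]
print(mse(ages, predicted), r_squared(ages, predicted))
```

For these made-up values the MSE is 0.35 (in squared years). A higher R-squared and a lower MSE both indicate a closer fit between estimated and actual age.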

Diabetes is a common disease among people around the world. It causes many anomalies in the body and commits patients to long-term medication. Diabetes has traditionally been detected through demanding medical tests, which delay patients in learning their results. Data mining and machine learning approaches, however, are at the forefront of supporting the health care domain in making effective predictions in this regard. This paper describes the prediction of Type 2 Diabetes Mellitus using classification models. A suitable secondary dataset was used to build the classification models, and the most suitable model was selected using standard performance measures. To this end, Random Forest, Support Vector Machine, Naïve Bayes, and Artificial Neural Network models were built. Based on the performance measures, Random Forest was identified as the most suitable classifier, with an accuracy of 90% and recall and precision values of 0.90.
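
The accuracy, precision, and recall reported above can all be derived from a confusion matrix. The sketch below assumes a binary task with diabetic patients as the positive class; the abstract does not say whether its 0.90 figures refer to one class or are averaged over classes, and the labels here are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed cases
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels: 1 = Type 2 diabetes, 0 = no diabetes.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Here 8 of the 10 predictions are correct, with one missed case and one false alarm, so accuracy, precision, and recall are each 0.8.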


Water ◽  
2018 ◽  
Vol 10 (11) ◽  
pp. 1618 ◽  
Author(s):  
Dan Ma ◽  
Hongyu Duan ◽  
Xin Cai ◽  
Zhenhua Li ◽  
Qiang Li ◽  
...  

Water inrush hazards can be effectively reduced by a reasonable and accurate soft-measuring method for the water inrush quantity from the mine floor, which is important for safe mining. However, the relationship between water outbursts from coal seam floors and factors such as geological structure, hydrogeology, aquifers, water pressure, water-resisting strata, mining damage, and faults is highly nonlinear. It is therefore difficult to establish a suitable model for forecasting the water inrush quantity from the mine floor with traditional methods. Modeling methods developed in other fields can provide adequate models of rock behavior during water inrush. In this study, a new forecasting system for water inrush prediction is proposed that hybridizes a genetic algorithm (GA) with the support vector machine (SVM) algorithm and determines the model structure and its related parameters simultaneously. Exploiting the GA's powerful global optimization, implicit parallelism, and high stability, the penalty coefficient, insensitivity coefficient, and kernel function parameter of the SVM model are automatically set to approximately optimal values within the parameter search space. These characteristics greatly improve the accuracy and usable range of the SVM model. Test results show that the GA is effective at finding optimal parameters for an SVM model, and the GA-optimized SVM (GA-SVM) outperforms the plain SVM model. The GA-SVM enables the prediction of water inrush and offers a promising solution to this predictive problem for relevant industries.
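
The GA step described above can be sketched as follows. This is an illustrative genetic algorithm with elitism, tournament selection, uniform crossover, and Gaussian mutation over the three SVM parameters (penalty C, insensitivity epsilon, kernel gamma). The search bounds, operator choices, and the `validation_error` objective are all assumptions, not the authors' settings. In a real GA-SVM, `validation_error` would train an SVM regressor with the candidate parameters and return its validation error; here it is a toy function so that the sketch runs on its own.

```python
import random

# Assumed search ranges (not from the paper) for C, epsilon and gamma.
BOUNDS = [(0.1, 100.0), (0.001, 1.0), (0.001, 10.0)]


def validation_error(params):
    """Hypothetical stand-in objective, minimised at C=10, epsilon=0.1, gamma=0.5.
    A real GA-SVM would train an SVM with these parameters and return its error."""
    c, eps, gamma = params
    return (c - 10.0) ** 2 / 100.0 + (eps - 0.1) ** 2 + (gamma - 0.5) ** 2


def genetic_search(objective, bounds, pop_size=20, generations=50,
                   mutation_rate=0.2, elite=2, seed=0):
    """Minimise `objective` inside the box `bounds`; returns (best, history)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []  # best objective value seen in each generation
    for _ in range(generations):
        pop.sort(key=objective)  # lower error = fitter
        history.append(objective(pop[0]))
        next_gen = [list(p) for p in pop[:elite]]  # elitism: best survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of three random individuals.
            p1 = min(rng.sample(pop, 3), key=objective)
            p2 = min(rng.sample(pop, 3), key=objective)
            # Uniform crossover: each gene is copied from either parent.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Gaussian mutation scaled to each range, clipped back into bounds.
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    step = rng.gauss(0.0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i] + step))
            next_gen.append(child)
        pop = next_gen
    best = min(pop, key=objective)
    return best, history


best, history = genetic_search(validation_error, BOUNDS)
print("best (C, epsilon, gamma):", best, "error:", validation_error(best))
```

Because the elite individuals are copied forward unchanged, the best error per generation can never increase, which is one simple way to give a GA the stability the abstract mentions.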


2021 ◽  
Vol 13 (18) ◽  
pp. 3573
Author(s):  
Chunfang Kong ◽  
Yiping Tian ◽  
Xiaogang Ma ◽  
Zhengping Weng ◽  
Zhiting Zhang ◽  
...  

In view of the increasingly frequent occurrence of serious landslide disasters in eastern Guangxi, the current study adopted support vector machines (SVM), particle swarm optimization support vector machines (PSO-SVM), random forest (RF), and particle swarm optimization random forest (PSO-RF) methods to assess landslide susceptibility in Zhaoping County. To this end, 10 landslide-related variables were prepared, including digital elevation model (DEM)-derived, meteorology-derived, Landsat8-derived, geology-derived, and human activity factors. Of the 345 landslide locations identified, 70% were used to train the models and the remaining 30% were used for model verification. The four models were run and landslide susceptibility evaluation maps were produced. Receiver operating characteristic (ROC) curves, statistical analysis, and field investigation were then used to test and verify the efficiency of these models. Analysis and comparison of the results showed that all four models performed well for landslide susceptibility evaluation, with area under the curve (AUC) values of the ROC curves ranging from 0.863 to 0.934. The PSO-RF model achieved the highest accuracy, followed by the PSO-SVM model, the RF model, and the SVM model. The results also showed that the PSO algorithm improves both the SVM and RF models. Furthermore, the landslide models developed in the present study are promising methods that could be transferred to other regions for landslide susceptibility evaluation, and the evaluation results can inform disaster reduction and prevention in Zhaoping County of eastern Guangxi.
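
The AUC values quoted above summarize each ROC curve as a single number: the probability that a randomly chosen landslide location receives a higher susceptibility score than a randomly chosen non-landslide location. A minimal sketch of that pairwise (Mann-Whitney) computation follows. The labels and scores are invented for illustration, and the study may have computed AUC by integrating the ROC curve instead, which gives the same value.

```python
def roc_auc(labels, scores):
    """AUC via the Mann-Whitney statistic: the fraction of (positive, negative)
    pairs in which the positive example scores higher; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical mapping units: 1 = landslide, 0 = no landslide,
# scores = model-predicted susceptibility.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))
```

Here 8 of the 9 landslide/non-landslide pairs are ranked correctly, so the AUC is 8/9 (about 0.889). The double loop is quadratic; for large susceptibility rasters a rank-based version that sorts the scores once is the usual choice.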


Author(s):  
Beaulah Jeyavathana Rajendran ◽  
Kanimozhi K. V.

Tuberculosis is a hazardous infectious disease characterized by the formation of tubercles in the tissues. It mainly affects the lungs but can also affect other parts of the body. The disease can be readily diagnosed by radiologists. The main objective of this chapter is to use modified particle swarm optimization to select the best solution, which is regarded as the optimal feature descriptor. Five stages are used to detect tuberculosis: image pre-processing, lung segmentation, feature extraction, feature selection, and classification; these stages are used in medical image processing to identify tuberculosis. In the feature extraction stage, the gray-level co-occurrence matrix (GLCM) approach is used to extract features, and the optimal features are selected from the extracted feature sets by random forest. Finally, a support vector machine classifier is used for image classification. Experiments were conducted and intermediate results obtained, and the proposed system achieves better classification accuracy than the existing method.
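
GLCM features come from counting how often pairs of gray levels occur at a fixed pixel offset. Below is a minimal sketch for a single horizontal offset with two common Haralick-style features (contrast and homogeneity). The 4x4 image, the four gray levels, and the choice of offset are illustrative assumptions; the chapter does not state which offsets or GLCM features it used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalised GLCM for a single pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Contrast: sum over (i, j) of P(i, j) * (i - j)**2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(p):
    """Homogeneity (inverse difference): sum of P(i, j) / (1 + (i - j)**2)."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


# Hypothetical 4x4 image already quantised to gray levels 0..3.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = glcm(image, levels=4)
features = [contrast(p), homogeneity(p)]  # one feature vector per image
print(features)
```

In a pipeline like the one described, a vector such as `features` (usually computed over several offsets and angles) is what the feature-selection stage would then filter before classification.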


2021 ◽  
Author(s):  
Jihong Dong ◽  
Wenting Dai ◽  
Jiren Xu ◽  
Songnian Li

This study examined surface soils in the Liuxin mining area of Xuzhou and explored the relationship between heavy metal content and spectral data by establishing quantitative models using Multivariable Linear Regression (MLR), Generalized Regression Neural Network (GRNN), and Sequential Minimal Optimization for Support Vector Machine (SMO-SVM) methods. The results are as follows: (1) the spectral inversion models established with MLR, GRNN, and SMO-SVM all give satisfactory estimates; even the MLR model, which performs worst, achieves an R² of more than 0.46, suggesting that the stress-sensitive bands of heavy metal pollution contain sufficient effective spectral information; (2) the GRNN model simulates data from small samples more effectively than the MLR model, and the R² values between the contents of the five heavy metals estimated by the GRNN model and the measured values are approximately 0.7; (3) the stability and accuracy of spectral estimation with the SMO-SVM model are clearly better than those of the GRNN and MLR models, and among the five heavy metals, the estimate for cadmium (Cd) is the best, with an R² of 0.8628; (4) when the optimal model is used to invert the Cd content of wheat planted on reclaimed mine soil, the R² and RMSE between the measured and estimated values are 0.6683 and 0.0489, respectively. This suggests that using the SMO-SVM model to estimate heavy metal contents in wheat samples is feasible.
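
A GRNN prediction is, in essence, a Gaussian-kernel weighted average of the training targets, with a single smoothing parameter `sigma`. The sketch below illustrates that idea; the band reflectances, metal contents, and `sigma` value are hypothetical and not taken from the study.

```python
import math


def grnn_predict(train_x, train_y, x, sigma=1.0):
    """GRNN output: Gaussian-kernel-weighted average of the training targets.

    sigma is the smoothing parameter: small values approach nearest-neighbour
    behaviour, large values approach the global mean of train_y.
    """
    weights = []
    for xi in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))  # squared Euclidean distance
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Hypothetical training pairs: reflectance at two bands -> heavy metal content.
train_x = [[0.10, 0.20], [0.15, 0.25], [0.30, 0.40], [0.35, 0.45]]
train_y = [0.20, 0.25, 0.45, 0.50]
print(grnn_predict(train_x, train_y, [0.12, 0.22], sigma=0.05))
```

Because there is no iterative training, only the stored samples and `sigma`, a GRNN can be fitted to small datasets easily, which is consistent with the abstract's observation about small samples.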


2020 ◽  
Author(s):  
Junyan Wang ◽  
Chunyan Wang ◽  
Lihong Fu ◽  
Qian Wang ◽  
Guangping Fu ◽  
...  

In forensic science, accurate estimation of the age of a victim or suspect can help investigators narrow a search and aid in solving a crime. Aging is a complex process associated with various forms of molecular regulation at the DNA and RNA levels. Recent studies have shown that circular RNAs (circRNAs) are globally upregulated during aging in multiple organisms, such as mice and C. elegans, because of their ability to resist degradation by exoribonucleases. In the current study, we investigated the potential of circRNAs for age prediction. We identified more than 40,000 circRNAs in the blood of thirteen unrelated healthy Chinese individuals aged 20 to 62 years from their circRNA-seq profiles. Three methods were applied to select candidate age-related circRNAs: false discovery rate, lasso regression, and support vector machine. The analysis uncovered a strong bias toward circRNA upregulation during aging in human blood. A total of 28 circRNAs were chosen for further validation by RT-qPCR in 50 healthy unrelated subjects aged between 19 and 72 years, and 7 age-related circRNAs were finally selected for the age prediction models. Several algorithms, including multivariate linear regression (MLR), regression tree, bagging regression, random forest regression (RFR), and support vector regression (SVR), were compared based on root mean square error (RMSE) and mean absolute error (MAE) values. Among the five modeling methods, RFR performed best, with an RMSE of 5.072 years and an MAE of 4.065 years (R² = 0.902). In this preliminary study, we used circRNAs for the first time as novel age-related biomarkers for developing forensic age estimation models.
We propose that using circRNAs to obtain additional clues for forensic investigations and as aging indicators for age prediction is a promising field of interest.

Author summary

In forensic investigations, estimating the age of biological evidence recovered from crime scenes can provide additional information, such as the chronological age or appearance of a culprit, which can give valuable investigative leads, especially when no eyewitness is available. Hence, an accurate model for predicting age from body fluids commonly found at crime scenes, such as blood, can be of vital importance. Various molecular markers at the DNA or RNA level have been found to be upregulated or downregulated over a person's lifetime. Although some biomarkers have been shown to be associated with aging and have been used to predict age, disadvantages such as low sensitivity, limited prediction accuracy, instability, and susceptibility to disease or immune state limit their applicability in age estimation. Here, we used a novel biomarker, circular RNA (circRNA), to generate highly accurate age prediction models. We propose that circRNA is more suitable for degraded forensic samples because of its unique molecular structure. This preliminary research offers a new direction for exploring potential biomarkers for age prediction.
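
The two error measures used to rank the age prediction models can be computed as in the following sketch. The ages and predictions are hypothetical, chosen only to show how the two measures differ.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the ages (years)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the errors, ignoring sign."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical chronological ages and model predictions (years).
ages = [25, 40, 55, 70]
predicted = [28, 37, 55, 62]
print(f"RMSE = {rmse(ages, predicted):.3f} years, MAE = {mae(ages, predicted):.3f} years")
```

This prints `RMSE = 4.528 years, MAE = 3.500 years`. RMSE is never smaller than MAE, and the gap widens when a few predictions are far off (here the 8-year miss), which is consistent with the reported RMSE of 5.072 years exceeding the MAE of 4.065 years.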


2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Ahmet Çınar ◽  
Seda Arslan Tuncer

White blood cells (WBCs), which form the basis of the immune system, protect the body from foreign invaders and infectious diseases. The number and structural features of WBCs provide important information about a person's health, and the ratio of WBC subtypes and any observable deformations are good indicators in the diagnostic process. Recognizing lymphocytes, neutrophils, eosinophils, basophils, and monocytes is critical. In this article, a deep-learning-based hybrid CNN (convolutional neural network) model is proposed for classifying eosinophil, lymphocyte, monocyte, and neutrophil WBCs. The model is based on the pretrained Alexnet and Googlenet architectures: the feature vectors from the last pooling layer of each CNN are merged, and the resulting feature vector is classified by a Support Vector Machine. To assess the benefit of the proposed method, classification was also performed with pretrained Alexnet and Googlenet alone for comparison; the hybrid Alexnet-Googlenet-SVM model provides higher accuracy than either. The proposed method was tested with WBC images from the Kaggle and LISC databases, achieving an accuracy of 99.73% and an F1-score of 0.99 on the first and an accuracy of 98.23% and an F1-score of 0.98 on the second.
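
The fusion step described above (merging pooled features from two networks, then classifying with an SVM) can be illustrated on toy data. In this sketch, tiny hand-made vectors stand in for the Alexnet and Googlenet pooling outputs, and the classifier is a plain linear soft-margin SVM trained by sub-gradient descent on the hinge loss. The kernel, solver, and all parameter values are assumptions, since the abstract does not specify them.

```python
def fuse(features_a, features_b):
    """Early fusion: concatenate two pooled feature vectors into one."""
    return list(features_a) + list(features_b)


def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM fitted by sub-gradient descent on the hinge loss.

    ys must contain +1 / -1 labels. Returns (weights, bias).
    """
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # L2 regularisation shrinkage
            if margin < 1.0:  # hinge loss active: move the boundary towards y
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Sign of the linear decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical pooled features from two backbones ("net A" and "net B").
net_a = [[1.0, 0.9], [0.9, 1.0], [-1.0, -0.8], [-0.8, -1.0]]
net_b = [[0.8, 1.0], [1.0, 0.8], [-0.9, -1.0], [-1.0, -0.9]]
labels = [1, 1, -1, -1]
X = [fuse(fa, fb) for fa, fb in zip(net_a, net_b)]
w, b = train_linear_svm(X, labels)
print([predict(w, b, x) for x in X])
```

For the four WBC classes in the article, a binary SVM like this would be extended with a one-vs-rest or one-vs-one scheme; the binary case is kept here for brevity.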


2021 ◽  
Author(s):  
Aayushi Rathore ◽  
Anu Saini ◽  
Navjot Kaur ◽  
Aparna Singh ◽  
Ojasvi Dutta ◽  
...  

Sepsis is a severe infectious disease with high mortality. It occurs when chemicals released into the bloodstream to fight an infection trigger inflammation throughout the body, which can cause a cascade of changes that damage multiple organ systems, leading to organ failure and even death. Antiseptics are used to reduce the risk of sepsis or infection, a process known as antisepsis. Antiseptic peptides (ASPs) show properties similar to those of anti-Gram-negative peptides, anti-Gram-positive peptides, and many others. Machine learning algorithms are useful for screening and identifying therapeutic peptides and can thus provide initial filters or build confidence before time-consuming and laborious experimental approaches are used. In this study, various machine learning algorithms, namely Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbour (KNN), and Logistic Regression (LR), were evaluated for the prediction of ASPs. The characteristic physicochemical features of ASPs were also explored for use in machine learning. Both manual and automatic feature selection were employed to achieve the best performance from the machine learning algorithms. Five-fold cross-validation and independent dataset validation identified RF as the best model for predicting ASPs. Our RF model achieved an accuracy of 97% and a Matthews correlation coefficient (MCC) of 0.93, indicating a robust model. To our knowledge, this is the first attempt to build a machine learning classifier for the prediction of ASPs.
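
The evaluation above relies on 5-fold cross-validation and the Matthews correlation coefficient. A minimal sketch of both follows; the confusion-matrix counts are hypothetical, and the contiguous folds assume the data have already been shuffled (or stratified), which the abstract does not describe.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous folds over n samples.

    Shuffle (or stratify) the samples beforehand in real use.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Hypothetical counts pooled over five folds: 48 ASPs found, 2 missed,
# 49 non-ASPs rejected, 1 false alarm.
print(round(mcc(tp=48, tn=49, fp=1, fn=2), 3))
```

These made-up counts give 97% accuracy and an MCC of about 0.94. Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are unbalanced.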


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 1185
Author(s):  
Yen-Siang Leow ◽  
Kok-Why Ng ◽  
Yih-Jian Yoong ◽  
Seng-Beng Ng

Background: Thalassemia is a hereditary blood disease in which abnormal red blood cells (RBCs) carry insufficient oxygen throughout the body. Conventional methods of thalassemia detection, such as the complete blood count (CBC) test and peripheral blood smear images, still have many weaknesses. Methods: This paper proposes a hybrid segmentation method that combines adaptive thresholding and the Canny edge detector to segment the RBCs. Morphological operations are then performed to remove leftover fragments. Shape and texture features are extracted using the segmented masks and the gray level co-occurrence matrix. Data-imbalance treatment is applied to address the imbalanced distribution of cell-type classes. In the data resampling layer, the synthetic minority oversampling technique (SMOTE), adaptive synthetic sampling (ADASYN), and random oversampling (ROS) are applied and evaluated using a decision tree and logistic regression. In the classification layer, the decision tree, random forest classifier, and support vector machine (SVM) are assessed and compared to find the best classifier. Results: The proposed method outperforms the other methods in the image segmentation layer, with a structural similarity index measure (SSIM) of 89.88%. In the data resampling layer, ADASYN is employed because it is more accurate than SMOTE and ROS. The random forest classifier is chosen for the classification layer because it is more accurate than the decision tree and SVM. Conclusions: The proposed method is tested on the latest erythrocyteIDB3 dataset and addresses the problem of imbalanced data caused by under-represented cell classes.
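
SMOTE, the first of the three resampling methods named above, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows only that basic SMOTE step; ADASYN, which the authors ultimately chose, builds on it by generating more synthetic samples around minority points that are harder to learn (those with many majority-class neighbours). The neighbour count `k`, the seed, and the feature values are illustrative assumptions.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of base (squared Euclidean), excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(minority[j], base)),
        )[:k]
        partner = minority[rng.choice(neighbours)]
        gap = rng.random()  # random position along the segment base -> partner
        synthetic.append([a + gap * (b - a) for a, b in zip(base, partner)])
    return synthetic


# Hypothetical shape/texture feature vectors for an under-represented cell class.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_samples = smote(minority, n_new=6)
print(new_samples)
```

Because each synthetic point lies on a segment between two real minority samples, oversampling fills in the minority region rather than duplicating points, which is the main difference from random oversampling (ROS).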


2021 ◽  
Author(s):  
Mohd. Faaizie Bin Darmawan ◽  
Mohd Zamri Osman ◽  
Dewi Nasien
