Klasifikasi Kualitas Biji Kopi Menggunakan MultilayerPerceptron Berbasis Fitur Warna LCH

Coffee is one of Indonesia's foreign exchange earners and plays an important role in the development of the plantation industry. In previous studies, coffee bean quality research has been carried out using the ANN method using color features. RGB and GLCM. However, the results carried out in the study only had an accuracy value of up to 47%. Therefore, this study aims to improve the performance of coffee bean quality classification using four machine learning methods and 7 color features. From the results obtained, it shows that MultilayerPerceptron is better starting with RGB color with an accuracy of 38% split ratio 90:10. HSV has an accuracy of 57% split ratio 90:10. CMYK has an accuracy of 63% split ratio 90:10. LAB has a 58% curation split ratio of 90:10. The YUV type has an accuracy of 58% split ratio 90:10. Furthermore, the HSI color type has an accuracy of 42% split ratio 90:10. The HCL color type has an accuracy of 65% split ratio 90:10 and LCH has an accuracy of 78% split ratio 90:10. In testing, it can be concluded that the MultilayerPerceptron method is better than other methods for the coffee bean classification process.

Download Full-text

A Comparative Study of Machine Learning Methods for Automatic Classification of Academic and Vocational Guidance Questions

International Journal of Interactive Mobile Technologies (iJIM) ◽

10.3991/ijim.v14i08.13005 ◽

2020 ◽

Vol 14 (08) ◽

pp. 43 ◽

Cited By ~ 1

Author(s):

Omar Zahour ◽

El Habib Benlahmar ◽

Ahmed Eddaouim ◽

Oumaima Hourrane

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Automatic Classification ◽

Vocational Guidance ◽

Machine Learning Algorithms ◽

Holland's Theory ◽

Machine Learning Methods ◽

The Right ◽

Better Than

Academic and vocational guidance is a particularly important issue today, as it strongly determines the chances of successful integration into the labor market, which has become increasingly difficult. Families have understood this because they are interested, often with concern, in the orientation of their child. In this context, it is very important to consider the interests, trades, skills, and personality of each student to make the right decision and build a strong career path. This paper deals with the problematic of educational and vocational guidance by providing a comparative study of the results of four machine-learning algorithms. The algorithms we used are for the automatic classification of school orientation questions and four categories based on John L. Holland's Theory of RIASEC typology. The results of this study show that neural networks work better than the other three algorithms in terms of the automatic classification of these questions. In this sense, our model allows us to automatically generate questions in this domain. This model can serve practitioners and researchers in E-Orientation for further research because the algorithms give us good results.

Download Full-text

Maximizing the Reusability of Public Gene Expression Data by Predicting Missing Metadata

10.1101/792382 ◽

2019 ◽

Author(s):

Pei-Yau Lung ◽

Xiaodong Pang ◽

Yan Li ◽

Jinfeng Zhang

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Expression Data ◽

Missing Values ◽

Expression Data ◽

New Approach ◽

Machine Learning Methods ◽

Differential Gene ◽

Missing Variables ◽

Better Than

AbstractReusability is part of the FAIR data principle, which aims to make data Findable, Accessible, Interoperable, and Reusable. One of the current efforts to increase the reusability of public genomics data has been to focus on the inclusion of quality metadata associated with the data. When necessary metadata are missing, most researchers will consider the data useless. In this study, we develop a framework to predict the missing metadata of gene expression datasets to maximize their reusability. We propose a new metric called Proportion of Cases Accurately Predicted (PCAP), which is optimized in our specifically-designed machine learning pipeline. The new approach performed better than pipelines using commonly used metrics such as F1-score in terms of maximizing the reusability of data with missing values. We also found that different variables might need to be predicted using different machine learning methods and/or different data processing protocols. Using differential gene expression analysis as an example, we show that when missing variables are accurately predicted, the corresponding gene expression data can be reliably used in downstream analyses.

Download Full-text

Research on Classification Method of Maize Seed Defect Based on Machine Vision

Journal of Sensors ◽

10.1155/2019/2716975 ◽

2019 ◽

Vol 2019 ◽

pp. 1-9

Author(s):

Sheng Huang ◽

Xiaofei Fan ◽

Lei Sun ◽

Yanlu Shen ◽

Xuesong Suo

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Heat Map ◽

Deep Learning Algorithm ◽

Quality Classification ◽

Visualization Technology ◽

Better Than

Traditionally, the classification of seed defects mainly relies on the characteristics of color, shape, and texture. This method requires repeated extraction of a large amount of feature information, which is not efficiently used in detection. In recent years, deep learning has performed well in the field of image recognition. We introduced convolutional neural networks (CNNs) and transfer learning into the quality classification of seeds and compared them with traditional machine learning algorithms. Experiments showed that deep learning algorithm was significantly better than the machine learning algorithm with an accuracy of 95% (GoogLeNet) vs. 79.2% (SURF+SVM). We used three classifiers in GoogLeNet to demonstrate that network accuracy increases as the depth of the network increases. We used the visualization technology to obtain the feature map of each layer of the network in CNNs and used the heat map to represent the probability distribution of the inference results. As an end-to-end network, CNNs can be easily applied for automated seed manufacturing.

Download Full-text

Comparisons of forecasting for Survival Outcome for Head and Neck Squamous Cell Carcinoma by using Six Machine Learning Models Based on Multi-Omics

10.21203/rs.3.rs-1100398/v1 ◽

2021 ◽

Author(s):

Liying Mo ◽

Yuangang Su ◽

Jianhui Yuan ◽

Zhiwei Xiao ◽

Ziyan Zhang ◽

...

Keyword(s):

Machine Learning ◽

Squamous Cell Carcinoma ◽

Cell Carcinoma ◽

Head And Neck ◽

Squamous Cell ◽

Survival Outcome ◽

Learning Models ◽

Machine Learning Methods ◽

Machine Learning Models ◽

Better Than

Abstract Background: Machine learning methods showed excellent predictive ability in a wide range of fields. For the survival of head and neck squamous cell carcinoma (HNSC), its multi-omics influence is crucial. This study attempts to establish a variety of machine learning multi-omics models to predict the survival of HNSC and find the most suitable machine learning prediction method. Results: For omics of HNSC, the results of the six models all showed that the performance of multi-omics was better than each single-omic alone. Results were presented which showed that the BN model played a good prediction performance (area under the curve [AUC] 0.8250) in HNSC multi-omics data. The other machine learning models RF (AUC = 0.8002), NN (AUC = 0.7200), and GLM (AUC = 0.7145) also showed high predictive performance except for DT(AUC = 0.5149) and SVM(AUC = 0.6981). And the results of a vitro qPCR were consistent with the Random forest algorithm. Conclusion: Machine learning methods could better forecast the survival outcome of HNSC. Meanwhile, this study found that the Bayesian network was the most superior. Moreover, the forecast result of multi-omics was better than single-omic alone in HNSC.

Download Full-text

Biological Monitoring: a Comparison between Bayesian, Neural and Machine Learning Methods of Water Quality Classification

Environmental Software Systems ◽

10.1007/978-0-387-34951-0_20 ◽

1996 ◽

pp. 229-240 ◽

Cited By ~ 7

Author(s):

W. J. Walley ◽

S. Džeroski

Keyword(s):

Machine Learning ◽

Water Quality ◽

Biological Monitoring ◽

Learning Methods ◽

Machine Learning Methods ◽

Quality Classification ◽

Water Quality Classification

Download Full-text

Comparative Analysis of Random Forests with Statistical and Machine Learning Methods in Predicting Fault-Prone Classes

Cross-Disciplinary Applications of Artificial Intelligence and Pattern Recognition - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-61350-429-1.ch023 ◽

2012 ◽

pp. 428-449 ◽

Cited By ~ 1

Author(s):

Ruchika Malhotra ◽

Arvinder Kaur ◽

Yogesh Singh

Keyword(s):

Machine Learning ◽

Random Forests ◽

Data Sets ◽

Classification Problems ◽

Learning Methods ◽

Machine Learning Methods ◽

Code Metrics ◽

Machine Learning Models ◽

Better Than

There are available metrics for predicting fault prone classes, which may help software organizations for planning and performing testing activities. This may be possible due to proper allocation of resources on fault prone parts of the design and code of the software. Hence, importance and usefulness of such metrics is understandable, but empirical validation of these metrics is always a great challenge. Random Forest (RF) algorithm has been successfully applied for solving regression and classification problems in many applications. In this work, the authors predict faulty classes/modules using object oriented metrics and static code metrics. This chapter evaluates the capability of RF algorithm and compares its performance with nine statistical and machine learning methods in predicting fault prone software classes. The authors applied RF on six case studies based on open source, commercial software and NASA data sets. The results indicate that the prediction performance of RF is generally better than statistical and machine learning models. Further, the classification of faulty classes/modules using the RF method is better than the other methods in most of the data sets.

Download Full-text

A Machine Learning Algorithm to Improve Risk Assessment for Patients with Sickle Cell Disease

Blood ◽

10.1182/blood-2019-125846 ◽

2019 ◽

Vol 134 (Supplement_1) ◽

pp. 893-893

Author(s):

Vandana Sachdev ◽

Yuan Gu ◽

James Nichols ◽

Wen Li ◽

Stanislav Sidenko ◽

...

Keyword(s):

Machine Learning ◽

Sickle Cell Disease ◽

Sickle Cell ◽

Compound Heterozygosity ◽

Cell Disease ◽

Learning Methods ◽

Predictors Of Mortality ◽

Machine Learning Methods ◽

C Statistic ◽

Better Than

Sickle cell disease (SCD) is a clinical syndrome that encompasses several different genotypes, the 3 most common being homozygosity for the bS allele (HbSS), compound heterozygosity of HbS and HbC (HbSC), and compound heterozygosity of HbS and HbSb thalassemia (HbSb+ or HbSb0 thalassemia). Generally, patients with HbSS and HbSb0 thalassemia genotypes have the most severe clinical manifestations, while patients with HbSC and HbSβ+-thalassemia are thought to be less severe. Within each of these genotypic groups, however, there are also substantial phenotypic differences. This heterogeneity makes it difficult to quantify the severity of the disease process and to guide therapeutics. As more intensive, high risk and costly treatments such as hematopoietic stem cell transplant and gene therapy are developing, the ability to assess patients at highest risk of early mortality becomes increasingly important. Integrating varied clinical, laboratory, and imaging markers for personalized risk prediction has been difficult, however, newer machine learning methods for outcome prediction take a more agnostic approach than traditional statistical methods and can detect complex, non-linear relationships in the data. In this study, we sought to apply machine learning methods to a well-characterized cohort of SCD patients followed at the National Institutes of Health in order to identify clinically meaningful subgroups of patients at highest risk of mortality. Between 2006 and 2017, 601 patients (age 35±13 years, 51% female) underwent echocardiogram, standard laboratory markers and hemoglobin electrophoresis resulting in 61 candidate variables. Among these patients, 488 had HbSS, 12 HbSb0 thalassemia, 80 HbSC, 20 HbSb+ thalassemia. All-cause mortality was ascertained by proxy interview, through medical records, and through the CDC National Death Index. Average follow-up time was 5 years and 130 patients were deceased. A random survival forest (RSF) algorithm followed by nested model selection and AIC Cox regression analysis identified 13 predictors of mortality (estimated right ventricular systolic pressure, peak tricuspid regurgitant (TR) velocity, mitral E velocity, septal and posterior wall thickness, IVC diameter, right atrial area, BUN, alkaline phosphatase, N-terminal-pro brain natriuretic peptide (BNP), creatinine, potassium and bicarbonate). This model performed better than individual clinical and laboratory variables with a C-statistic of 0.822 (genotype 0.524, eGFR 0.624, NT-proBNP 0.686, TR velocity 0.703). K-means clustering grouped all patients into 3 main clusters with significant survival differences. Survival at 8 years for the entire group was 70%; for individual clusters, survival was 43% for cluster 1, 72% for cluster 2, and 88% for cluster 3 (Figure 1A). Since TR velocity is recognized as one of the most specific independent predictors of mortality, we compared our results with this parameter. There was a better stratification of mortality risk using the 7 strongest parameters from RSF compared with TR velocity alone (Figure 1B), particularly for longer term outcomes. In this cohort of 601 patients with SCD, machine learning methods were used to show the heterogeneity of this disorder and the ability to detect phenotypic clusters with different mortality profiles. Although there are many individual predictors of mortality, few methods other than assessment by an expert clinician can integrate all known variables in deeply phenotyped patients. RSF and cluster analysis was used in this cohort to analyze a large amount of data in order to identify seven variables that could stratify patients into groups with significantly different outcomes. The specificity of this approach was high (c-statistic 0.822) and better than that of individual markers of end-organ involvement. Disclosures No relevant conflicts of interest to declare.

Download Full-text

Maximizing the reusability of gene expression data by predicting missing metadata

PLoS Computational Biology ◽

10.1371/journal.pcbi.1007450 ◽

2020 ◽

Vol 16 (11) ◽

pp. e1007450

Author(s):

Pei-Yau Lung ◽

Dongrui Zhong ◽

Xiaodong Pang ◽

Yan Li ◽

Jinfeng Zhang

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Expression Data ◽

Missing Values ◽

Expression Data ◽

New Approach ◽

Machine Learning Methods ◽

Differential Gene ◽

Missing Variables ◽

Better Than

Reusability is part of the FAIR data principle, which aims to make data Findable, Accessible, Interoperable, and Reusable. One of the current efforts to increase the reusability of public genomics data has been to focus on the inclusion of quality metadata associated with the data. When necessary metadata are missing, most researchers will consider the data useless. In this study, we developed a framework to predict the missing metadata of gene expression datasets to maximize their reusability. We found that when using predicted data to conduct other analyses, it is not optimal to use all the predicted data. Instead, one should only use the subset of data, which can be predicted accurately. We proposed a new metric called Proportion of Cases Accurately Predicted (PCAP), which is optimized in our specifically-designed machine learning pipeline. The new approach performed better than pipelines using commonly used metrics such as F1-score in terms of maximizing the reusability of data with missing values. We also found that different variables might need to be predicted using different machine learning methods and/or different data processing protocols. Using differential gene expression analysis as an example, we showed that when missing variables are accurately predicted, the corresponding gene expression data can be reliably used in downstream analyses.

Download Full-text

Corn Nitrogen Nutrition Index Prediction Improved by Integrating Genetic, Environmental, and Management Factors with Active Canopy Sensing Using Machine Learning

Remote Sensing ◽

10.3390/rs14020394 ◽

2022 ◽

Vol 14 (2) ◽

pp. 394

Author(s):

Dan Li ◽

Yuxin Miao ◽

Curtis J. Ransom ◽

G. Mac Bean ◽

Newell R. Kitchen ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Vegetation Index ◽

Sensor Data ◽

Support Vector ◽

Random Forest Regression ◽

Machine Learning Methods ◽

Developmental Growth ◽

N Status ◽

Better Than

Accurate nitrogen (N) diagnosis early in the growing season across diverse soil, weather, and management conditions is challenging. Strategies using multi-source data are hypothesized to perform significantly better than approaches using crop sensing information alone. The objective of this study was to evaluate, across diverse environments, the potential for integrating genetic (e.g., comparative relative maturity and growing degree units to key developmental growth stages), environmental (e.g., soil and weather), and management (e.g., seeding rate, irrigation, previous crop, and preplant N rate) information with active canopy sensor data for improved corn N nutrition index (NNI) prediction using machine learning methods. Thirteen site-year corn (Zea mays L.) N rate experiments involving eight N treatments conducted in four US Midwest states in 2015 and 2016 were used for this study. A proximal RapidSCAN CS-45 active canopy sensor was used to collect corn canopy reflectance data around the V9 developmental growth stage. The utility of vegetation indices and ancillary data for predicting corn aboveground biomass, plant N concentration, plant N uptake, and NNI was evaluated using singular variable regression and machine learning methods. The results indicated that when the genetic, environmental, and management data were used together with the active canopy sensor data, corn N status indicators could be more reliably predicted either using support vector regression (R2 = 0.74–0.90 for prediction) or random forest regression models (R2 = 0.84–0.93 for prediction), as compared with using the best-performing single vegetation index or using a normalized difference vegetation index (NDVI) and normalized difference red edge (NDRE) together (R2 < 0.30). The N diagnostic accuracy based on the NNI was 87% using the data fusion approach with random forest regression (kappa statistic = 0.75), which was better than the result of a support vector regression model using the same inputs. The NDRE index was consistently ranked as the most important variable for predicting all the four corn N status indicators, followed by the preplant N rate. It is concluded that incorporating genetic, environmental, and management information with canopy sensing data can significantly improve in-season corn N status prediction and diagnosis across diverse soil and weather conditions.

Download Full-text

Development and validation of prognosis model of mortality risk in patients with COVID-19

Epidemiology and Infection ◽

10.1017/s0950268820001727 ◽

2020 ◽

Vol 148 ◽

Cited By ~ 2

Author(s):

Xuedi Ma ◽

Michael Ng ◽

Shuang Xu ◽

Zhouming Xu ◽

Hui Qiu ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Clinical Features ◽

Mortality Risk ◽

Operating Characteristics ◽

Multivariate Logistic Regression ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods ◽

Better Than

Abstract This study aimed to identify clinical features for prognosing mortality risk using machine-learning methods in patients with coronavirus disease 2019 (COVID-19). A retrospective study of the inpatients with COVID-19 admitted from 15 January to 15 March 2020 in Wuhan is reported. The data of symptoms, comorbidity, demographic, vital sign, CT scans results and laboratory test results on admission were collected. Machine-learning methods (Random Forest and XGboost) were used to rank clinical features for mortality risk. Multivariate logistic regression models were applied to identify clinical features with statistical significance. The predictors of mortality were lactate dehydrogenase (LDH), C-reactive protein (CRP) and age based on 500 bootstrapped samples. A multivariate logistic regression model was formed to predict mortality 292 in-sample patients with area under the receiver operating characteristics (AUROC) of 0.9521, which was better than CURB-65 (AUROC of 0.8501) and the machine-learning-based model (AUROC of 0.4530). An out-sample data set of 13 patients was further tested to show our model (AUROC of 0.6061) was also better than CURB-65 (AUROC of 0.4608) and the machine-learning-based model (AUROC of 0.2292). LDH, CRP and age can be used to identify severe patients with COVID-19 on hospital admission.

Download Full-text