Convolutional ensembles for Arabic Handwritten Character and Digit Recognition

A learning algorithm is proposed for Arabic handwritten character and digit recognition. The architecture consists of an ensemble of different Convolutional Neural Networks. The proposed training algorithm combines adaptive gradient descent in the first epochs with regular stochastic gradient descent in the last epochs to facilitate convergence. Two validation strategies are tested: Monte Carlo Cross-Validation and K-fold Cross-Validation. Hyper-parameters were tuned on the MADbase digits dataset. State-of-the-art validation and testing classification accuracies were achieved, with average values of 99.74% and 99.47%, respectively. The same algorithm was then trained and tested on the AHCD character dataset, also yielding state-of-the-art validation and testing classification accuracies of 98.60% and 98.42%, respectively.
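The two-phase training schedule described above, an adaptive optimizer in the first epochs followed by plain stochastic gradient descent, can be sketched on a toy problem. This is a minimal sketch, not the authors' implementation: it assumes Adam as the adaptive method (the abstract does not name one) and fits a single weight by least squares; `train_two_phase`, `switch_epoch` and the learning rates are hypothetical.

```python
import math

def train_two_phase(xs, ys, epochs=200, switch_epoch=100, adam_lr=0.05, sgd_lr=0.01):
    """Fit y ~ w * x by least squares: Adam for the first `switch_epoch` epochs, plain SGD afterwards."""
    w = 0.0
    m, v, t = 0.0, 0.0, 0              # Adam first/second moment estimates and step counter
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for epoch in range(epochs):
        for x, y in zip(xs, ys):       # one update per sample (stochastic gradient)
            grad = 2.0 * (w * x - y) * x   # derivative of (w*x - y)^2 with respect to w
            if epoch < switch_epoch:
                # Phase 1: adaptive (Adam) step with bias-corrected moments
                t += 1
                m = beta1 * m + (1 - beta1) * grad
                v = beta2 * v + (1 - beta2) * grad * grad
                m_hat = m / (1 - beta1 ** t)
                v_hat = v / (1 - beta2 ** t)
                w -= adam_lr * m_hat / (math.sqrt(v_hat) + eps)
            else:
                # Phase 2: plain SGD step with a fixed learning rate
                w -= sgd_lr * grad
    return w
```

For example, `train_two_phase([1, 2, 3, 4], [2, 4, 6, 8])` returns a value very close to 2.0. Switching to plain SGD late in training is one way to keep the fast early progress of adaptive steps while finishing with the simpler, fixed-rate updates of SGD.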

Comparison of Arabica Coffee Quality Prediction Using the SGD, Random Forest and Naive Bayes Algorithms

EDUMATIC Jurnal Pendidikan Informatika ◽

10.29408/edumatic.v4i2.2202 ◽

2020 ◽

Vol 4 (2) ◽

pp. 1-9

Author(s):

Veronica Sari ◽

Feranandah Firdausi ◽

Yufis Azhar ◽

...

Keyword(s):

Random Forest ◽

Gradient Descent ◽

Cross Validation ◽

Naive Bayes ◽

Area Under The Curve ◽

Naïve Bayes ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Quality Institute ◽

Fold Cross Validation

Classification is a data mining technique for assigning data to groups based on their similarity to labeled sample data. The dataset used in this study is the Coffee Quality Institute coffee dataset, obtained from the GitHub platform. Its attributes are Aroma, Aftertaste, Flavor, Acidity, Balance, Body, Uniformity, Sweetness, Clean Cup, and Cupper Points. Three classification methods are compared: Stochastic Gradient Descent, Random Forest and Naive Bayes. The aim of this study is to determine which algorithm predicts coffee quality most effectively on this dataset. The prediction results are then evaluated using K-Fold Cross Validation and the Area Under the Curve (AUC). The results show that Stochastic Gradient Descent obtained the best accuracy of the three methods, 98%, which increased to 99% when evaluated with K-fold Cross Validation and AUC.
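K-fold cross-validation, as used above, can be sketched without any library. This is a minimal sketch: the `fit`/`predict` callables and the majority-class placeholder model are hypothetical and stand in for the SGD, Random Forest and Naive Bayes classifiers compared in the study.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds

def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 folds, score accuracy on the held-out fold, and average over the k rounds."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k

# Hypothetical placeholder model: always predicts the most frequent training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, x):
    return model
```

Every sample lands in exactly one test fold, so each observation is used for evaluation once and for training k-1 times.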

DNA sequences performs as natural language processing by exploiting deep learning algorithm for the identification of N4-methylcytosine

Scientific Reports ◽

10.1038/s41598-020-80430-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Abdul Wahab ◽

Hilal Tayara ◽

Zhenyu Xuan ◽

Kil To Chong

Keyword(s):

Deep Learning ◽

Language Processing ◽

Dna Sequences ◽

Area Under Curve ◽

Cross Validation ◽

Learning Algorithm ◽

State Of The Art ◽

Deep Learning Algorithm ◽

Fold Cross Validation ◽

Genome Dataset

N4-methylcytosine (4mC) is a biochemical modification of DNA that affects genetic processes such as gene expression, genomic imprinting, chromosome stability, and cell development without altering the DNA nucleotide sequence. The proposed computational model, 4mCNLP-Deep, uses word embeddings as the vector representation of DNA sequences and a deep-learning CNN to predict 4mC and non-4mC sites in the C. elegans genome dataset. Several experimental settings, such as different k-mer corpora and k-fold cross-validation schemes, were explored to obtain the best performance. Using a 3-mer corpus with word2vec and 3-fold cross-validation, 4mCNLP-Deep outperforms the state-of-the-art predictor on five evaluation metrics: Accuracy (ACC) of 0.9354, Matthews correlation coefficient (MCC) of 0.8608, Specificity (Sp) of 0.89.96, Sensitivity (Sn) of 0.9563, and Area Under the Curve (AUC) of 0.9731, improvements of 1.1%, 0.6%, 0.58%, 0.77%, and 4.89%, respectively. Finally, an online web server, http://nsclbio.jbnu.ac.kr/tools/4mCNLP-Deep/, was developed so that experimental researchers can obtain results easily.
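The "3-mer corpus" idea treats each DNA sequence as a sentence whose words are overlapping k-mers, which can then be fed to a word-embedding model such as word2vec. The sketch below assumes stride-1 overlapping k-mers and only builds the corpus and vocabulary; the helper names are hypothetical and the embedding and CNN stages are not shown.

```python
from itertools import product

def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1); each k-mer acts as a 'word'."""
    seq = seq.upper()
    if k < 1 or k > len(seq):
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_corpus(sequences, k=3):
    """Turn each sequence into a space-separated 'sentence' of k-mers for a word-embedding model."""
    return [" ".join(kmer_tokens(s, k)) for s in sequences]

def kmer_vocabulary(k=3):
    """Map every possible k-mer over {A, C, G, T} to an integer id (4**k entries)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
```

For instance, `kmer_tokens("ACGTAC")` yields `["ACG", "CGT", "GTA", "TAC"]`, and the 3-mer vocabulary has 4³ = 64 entries.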

Regularized Instance Embedding for Deep Multi-Instance Learning

Applied Sciences ◽

10.3390/app10010064 ◽

2019 ◽

Vol 10 (1) ◽

pp. 64

Author(s):

Yi Lin ◽

Honggang Zhang

Keyword(s):

Neural Network ◽

Big Data ◽

Supervised Learning ◽

Regularization Method ◽

Gradient Descent ◽

State Of The Art ◽

Stochastic Gradient Descent ◽

Learning Framework ◽

Weakly Supervised ◽

The Cost

In the era of Big Data, multi-instance learning, a weakly supervised learning framework, has many applications because it helps reduce the cost of data labeling. Because of this weakly supervised setting, learning effective instance representations (embeddings) is challenging. To address this issue, we propose an instance-embedding regularizer that boosts the performance of both instance- and bag-embedding learning in a unified fashion. Specifically, the crux of the regularizer is to maximize the correlation between instance-embedding similarities and the underlying instance-label similarities. The embedding-learning framework was implemented as a neural network and optimized end-to-end using stochastic gradient descent. Experiments on various applications show that the proposed instance-embedding regularization method is highly effective, achieving state-of-the-art performance.
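The correlation idea behind the regularizer can be illustrated with a small sketch. It is not the authors' formulation: it assumes instance-level labels (or pseudo-labels) are available, uses cosine similarity between embeddings and a 0/1 same-label indicator as label similarity, and returns the negative Pearson correlation so that minimizing it maximizes the correlation. All names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def embedding_regularizer(embeddings, labels):
    """Negative correlation between pairwise embedding similarity and pairwise label similarity."""
    emb_sim, lab_sim = [], []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            emb_sim.append(cosine(embeddings[i], embeddings[j]))
            lab_sim.append(1.0 if labels[i] == labels[j] else 0.0)
    return -pearson(emb_sim, lab_sim)
```

Both similarity lists must have non-zero variance and the embeddings must be non-zero vectors, otherwise the divisions fail. In a real model this term would be added to the task loss and differentiated by the training framework.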

Contactless Li-Ion Battery Voltage Detection by Using Walabot and Machine Learning

Volume 9: 15th IEEE/ASME International Conference on Mechatronic and Embedded Systems and Applications ◽

10.1115/detc2019-97668 ◽

2019 ◽

Author(s):

Yanan Wang ◽

Haoyu Niu ◽

Tiebiao Zhao ◽

Xiaozhong Liao ◽

Lei Dong ◽

...

Keyword(s):

Machine Learning ◽

Gradient Descent ◽

Learning Algorithm ◽

Three Dimensional ◽

Lithium Ion ◽

Principal Component ◽

Stochastic Gradient Descent ◽

Li Ion Battery ◽

Linear Discriminant ◽

Li Ion

This paper proposes a contactless voltage classification method for lithium-ion batteries (LIBs). Using Walabot, a three-dimensional radio-frequency sensor, voltage data of LIBs can be collected without contact. Three machine learning algorithms, namely principal component analysis (PCA), linear discriminant analysis (LDA), and stochastic gradient descent (SGD) classifiers, were then employed for data processing. Experiments and comparisons were conducted to verify the proposed method. The colormaps of the results and the prediction accuracies suggest that LDA may be the most suitable for LIB voltage classification.
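A linear classifier trained with stochastic gradient descent, one of the three methods listed above, can be sketched as follows. This sketch assumes a hinge loss with a small L2 penalty and labels in {-1, +1}; the hyperparameters and toy data are hypothetical, and the Walabot feature extraction, PCA and LDA steps are not shown.

```python
import random

def train_sgd_hinge(X, y, epochs=50, lr=0.1, alpha=1e-4, seed=0):
    """Linear classifier (labels in {-1, +1}) trained by SGD on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)                 # visit samples in a new random order each epoch
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Subgradient step on alpha/2 * ||w||^2 + max(0, 1 - margin)
            w = [wj - lr * alpha * wj for wj in w]
            if margin < 1:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict_label(w, b, x):
    """Return +1 if the point lies on the positive side of the hyperplane, else -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Only samples inside the margin (margin < 1) move the hyperplane, which is what makes this the hinge-loss (linear SVM) variant of an SGD classifier.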

A novel computational model for predicting potential LncRNA-disease associations based on both direct and indirect features of LncRNA-disease pairs

BMC Bioinformatics ◽

10.1186/s12859-020-03906-7 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Yubin Xiao ◽

Zheng Xiao ◽

Xiang Feng ◽

Zhiping Chen ◽

Linai Kuang ◽

...

Keyword(s):

Computational Model ◽

Cross Validation ◽

State Of The Art ◽

Prediction Methods ◽

Good Prediction ◽

Average Case ◽

Comparison Results ◽

Disease Associations ◽

Fold Cross Validation

Background: Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and identifying the relationships between lncRNAs and diseases is useful for diagnosis and treatment. Because traditional biological experiments are costly and time-consuming, researchers have in recent years proposed a growing number of computational methods to infer potential lncRNA-disease associations. However, these state-of-the-art prediction methods still have various limitations. Results: In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. The major novelty of FVTLDA lies in integrating direct and indirect features related to lncRNA-disease associations, such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which ensures that FVTLDA can predict diseases without known related lncRNAs and lncRNAs without known related diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease associations nor requires any negative samples, so it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single-model prediction techniques, we combine FVTLDA with Multiple Linear Regression (MLR) and with an Artificial Neural Network (ANN) for data analysis. Simulation results show that FVTLDA with MLR achieves reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-fold cross-validation (5-fold CV), 10-fold cross-validation (10-fold CV) and leave-one-out cross-validation (LOOCV), respectively, while FVTLDA with ANN achieves reliable AUCs of 0.8766, 0.8830 and 0.8807 in 5-fold CV, 10-fold CV, and LOOCV, respectively.
Furthermore, in case studies of gastric cancer, leukemia and lung cancer, 8, 8 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with ANN, have been verified by recent literature. Compared with the representative prediction model KATZLDA, FVTLDA with MLR and FVTLDA with ANN achieve average case-study contrast scores of 0.8429 and 0.8515, respectively, both notably higher than the 0.6375 achieved by KATZLDA. Conclusion: The simulation results show that FVTLDA has good prediction performance and can be a useful supplement to future bioinformatics research.
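AUC values like those above can be computed directly from prediction scores through the rank (Mann-Whitney) interpretation of the ROC curve: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting one half. This minimal sketch uses an O(P·N) pairwise loop for clarity; it is not the authors' evaluation code.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative), ties counted as 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For scores `[0.9, 0.8, 0.35, 0.4, 0.1]` with labels `[1, 1, 1, 0, 0]`, five of the six positive/negative pairs are ordered correctly, giving an AUC of 5/6. Under k-fold CV or LOOCV, the scores would be the held-out predictions.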

Global observation-based climatology of precipitation occurrence and peak intensity

10.5194/egusphere-egu2020-7837 ◽

2020 ◽

Author(s):

Hylke Beck ◽

Seth Westra ◽

Eric Wood

Keyword(s):

Land Surface ◽

Regression Models ◽

Cross Validation ◽

Climate Models ◽

Daily Precipitation ◽

State Of The Art ◽

Coefficient Of Determination ◽

Peak Intensity ◽

Uncertainty Estimates ◽

Fold Cross Validation

We introduce a unique set of global observation-based climatologies of daily precipitation (P) occurrence (related to the lower tail of the P distribution) and peak intensity (related to the upper tail of the P distribution). The climatologies were produced using Random Forest (RF) regression models trained with an unprecedented collection of daily P observations from 93,138 stations worldwide. Five-fold cross-validation was used to evaluate the generalizability of the approach and to quantify uncertainty globally. The RF models were found to provide highly satisfactory performance, yielding cross-validation coefficient of determination (R²) values from 0.74 for the 15-year return-period daily P intensity to 0.86 for the >0.5 mm d⁻¹ daily P occurrence. The performance of the RF models was consistently superior to that of state-of-the-art reanalysis (ERA5) and satellite (IMERG) products. The highest P intensities over land were found along the western equatorial coast of Africa, in India, and along coastal areas of Southeast Asia. Using a 0.5 mm d⁻¹ threshold, P was estimated to occur on 23.2% of days on average over the global land surface (excluding Antarctica). The climatologies, including uncertainty estimates, will be released as the Precipitation DISTribution (PDIST) dataset via www.gloh2o.org/pdist. We expect the dataset to be useful for numerous purposes, such as the evaluation of climate models, the bias correction of gridded P datasets, and the design of hydraulic structures in poorly gauged regions.
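The cross-validation R² reported above follows the usual definition R² = 1 - SS_res / SS_tot. The sketch below computes it for paired observations and predictions; in a cross-validation setting `predicted` would hold out-of-fold predictions, and how the authors pooled the folds is not assumed here.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual sum of squares) / (total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

For observations `[1, 2, 3, 4]` and predictions `[1.1, 1.9, 3.2, 3.8]`, SS_res = 0.10 and SS_tot = 5.0, so R² = 0.98. R² can be negative when the predictions are worse than simply predicting the mean.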

On the Performance Improvement of Devanagari Handwritten Character Recognition

Applied Computational Intelligence and Soft Computing ◽

10.1155/2015/193868 ◽

2015 ◽

Vol 2015 ◽

pp. 1-12 ◽

Cited By ~ 4

Author(s):

Pratibha Singh ◽

Ajay Verma ◽

Narendra S. Chaudhari

Keyword(s):

Performance Improvement ◽

Character Recognition ◽

Gradient Descent ◽

Stochastic Gradient Descent ◽

Weight Decay ◽

Pixel Intensity ◽

Handwritten Character ◽

Extraction Algorithm ◽

Numeral Recognition ◽

Gradient Feature

This paper applies minibatch stochastic gradient descent (SGD) based learning to a Multilayer Perceptron for isolated Devanagari handwritten character/numeral recognition. Minibatching reduces the variance of the gradient estimate and often makes better use of the hierarchical memory organization of modern computers. L2 weight decay is added to minibatch SGD to avoid overfitting. Experiments are first conducted using raw pixel intensity values as features, and then using the proposed flexible zone-based gradient feature extraction algorithm. The results are promising on most of the standard Devanagari character/numeral datasets.
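The update rule behind minibatch SGD with L2 weight decay is w ← w - η(g_B + λw), where g_B is the gradient averaged over a minibatch B. The sketch below applies it to a least-squares linear model rather than the paper's Multilayer Perceptron; the batch size, learning rate η (`lr`), decay λ (`weight_decay`) and data are hypothetical.

```python
import random

def minibatch_sgd_l2(X, y, batch_size=2, lr=0.05, weight_decay=1e-3, epochs=200, seed=0):
    """Least-squares linear model trained with minibatch SGD plus L2 weight decay."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)                                # new minibatch composition each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            grad = [0.0] * len(w)
            for i in batch:
                err = sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
                for j, xj in enumerate(X[i]):
                    grad[j] += 2.0 * err * xj / len(batch)   # mean gradient of squared error
            # Weight decay adds lambda * w to the averaged minibatch gradient
            w = [wj - lr * (gj + weight_decay * wj) for wj, gj in zip(w, grad)]
    return w
```

On data generated exactly by w = (1, 2), a small `weight_decay` recovers weights close to (1, 2), while a large one pulls the weights toward zero, which is the regularizing effect the abstract relies on to limit overfitting.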

A Novel Computational Model for Predicting Potential LncRNA-Disease Associations based on Both Direct and Indirect Features of LncRNA-Disease Pairs

10.21203/rs.2.18937/v3 ◽

2020 ◽

Author(s):

Yubin Xiao ◽

Zheng Xiao ◽

Xiang Feng ◽

Zhiping Chen ◽

Linai Kuang ◽

...

Keyword(s):

Computational Model ◽

Cross Validation ◽

State Of The Art ◽

Prediction Methods ◽

Good Prediction ◽

Average Case ◽

Comparison Results ◽

Disease Associations ◽

Fold Cross Validation

Background: Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and identifying the relationships between lncRNAs and diseases is useful for diagnosis and treatment. Because traditional biological experiments are costly and time-consuming, researchers have in recent years proposed a growing number of computational methods to infer potential lncRNA-disease associations. However, these state-of-the-art prediction methods still have various limitations. Results: In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. The major novelty of FVTLDA lies in integrating direct and indirect features related to lncRNA-disease associations, such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which ensures that FVTLDA can predict diseases without known related lncRNAs and lncRNAs without known related diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease associations nor requires any negative samples, so it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single-model prediction techniques, we combine FVTLDA with Multiple Linear Regression (MLR) and with an Artificial Neural Network (ANN) for data analysis. Simulation results show that FVTLDA with MLR achieves reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-fold cross-validation (5-fold CV), 10-fold cross-validation (10-fold CV) and leave-one-out cross-validation (LOOCV), respectively, while FVTLDA with ANN achieves reliable AUCs of 0.8766, 0.8830 and 0.8807 in 5-fold CV, 10-fold CV, and LOOCV, respectively.
Furthermore, in case studies of gastric cancer, leukemia and lung cancer, 8, 8 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with ANN, have been verified by recent literature. Compared with the representative prediction model KATZLDA, FVTLDA with MLR and FVTLDA with ANN achieve average case-study contrast scores of 0.8429 and 0.8515, respectively, both notably higher than the 0.6375 achieved by KATZLDA. Conclusion: The simulation results show that FVTLDA has good prediction performance and can be a useful supplement to future bioinformatics research.

Urine biomarker: novel approach to hepatocellular carcinoma screening

10.1101/2020.11.21.20236125 ◽

2020 ◽

Author(s):

Amy K Kim ◽

James P. Hamilton ◽

Selena Y. Lin ◽

Ting-Tsung Chang ◽

Hie-Won Hann ◽

...

Keyword(s):

Hepatocellular Carcinoma ◽

Cross Validation ◽

Learning Algorithm ◽

Early Stage ◽

High Risk Patient ◽

Circulating Tumor Dna ◽

Urine Samples ◽

Detection Rates ◽

Non Invasive ◽

Fold Cross Validation

Background & Aims: Continued limitations in hepatocellular carcinoma (HCC) screening have led to late diagnosis and poor survival, despite well-defined high-risk patient populations. Our aim is to develop a non-invasive urine circulating tumor DNA (ctDNA) biomarker panel for HCC screening to aid early detection. Methods: Candidate ctDNA biomarkers were prescreened in urine samples obtained from HCC, cirrhosis, and hepatitis patients. Then, 609 urine samples from patients with HCC, cirrhosis, or chronic hepatitis B were collected at five academic medical centers and evaluated with serum alpha-fetoprotein (AFP) and the urine ctDNA panel using logistic regression, a Two-Step machine learning algorithm, and iterated 10-fold cross-validation. Results: Mutated TP53 and methylated RASSF1a and GSTP1 were selected for the urine ctDNA panel. The sensitivity of AFP alone (9.8 ng/mL cut-off) to detect HCC was 71% by Two-Step. Combining ctDNA with AFP increased the sensitivity to 81% at a specificity of 90%. The AUROCs for the combination of ctDNA and AFP and for AFP alone were 0.925 (95% CI, 0.924-0.925) and 0.877 (95% CI, 0.876-0.877), respectively. Notably, among patients with AFP <20 ng/mL, the combination panel correctly identified 64% of HCC cases. The panel outperformed AFP alone in early-stage HCC (BCLC A), with 80% sensitivity and 90% specificity. In an iterated 10-fold cross-validation analysis, the AUROC for the combination panel was 0.898 (95% CI, 0.895-0.901). Conclusions: The combination of urine ctDNA and serum AFP can increase HCC detection rates, including in patients with low AFP. Given the ease of collection, a urine ctDNA panel could serve as a non-invasive HCC screening test.
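Reporting sensitivity at a fixed specificity (such as 81% sensitivity at 90% specificity above) amounts to choosing a score cut-off. A minimal sketch, assuming higher scores indicate disease and using hypothetical scores and labels; it does not reproduce the study's logistic regression or Two-Step algorithm.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive (1 = disease, 0 = control)."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 1)
    fn = sum(1 for s, l in zip(scores, labels) if s < cutoff and l == 1)
    tn = sum(1 for s, l in zip(scores, labels) if s < cutoff and l == 0)
    fp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 0)
    return tp / (tp + fn), tn / (tn + fp)

def cutoff_for_specificity(scores, labels, target_spec=0.90):
    """Lowest observed cut-off whose specificity reaches target_spec, with its (sensitivity, specificity)."""
    for c in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, c)
        if spec >= target_spec:
            return c, sens, spec
    return None   # no observed score achieves the target specificity
```

Scanning cut-offs upward works because specificity can only rise and sensitivity can only fall as the cut-off increases, so the first cut-off that meets the target gives the highest sensitivity at that specificity. Both classes must be present, otherwise the ratios are undefined.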

Asymptotics of Reinforcement Learning with Neural Networks

Stochastic Systems ◽

10.1287/stsy.2021.0072 ◽

2021 ◽

Author(s):

Justin Sirignano ◽

Konstantinos Spiliopoulos

Keyword(s):

Differential Equation ◽

Neural Networks ◽

Stationary Solution ◽

Gradient Descent ◽

Learning Algorithm ◽

Single Layer ◽

Stochastic Gradient Descent ◽

Distributed Data ◽

Limiting Behavior ◽

Q Learning

We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution that is the solution of the Bellman equation, thus giving the optimal control for the problem. In addition, we study the convergence of the limit differential equation to the stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks when trained on independent and identically distributed data with stochastic gradient descent under the widely used Xavier initialization.
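The link between Q-learning and the Bellman equation can be illustrated on a tiny problem. This sketch makes strong simplifying assumptions: a lookup table instead of the single-layer neural network analysed in the paper, a constant step size, uniformly random exploration, and a hypothetical two-state deterministic MDP. It only shows that the Q-learning iterates settle at the fixed point of the Bellman optimality operator, computed independently by value iteration.

```python
import random

def q_learning(P, R, gamma=0.5, alpha=0.1, steps=20000, seed=0):
    """Tabular Q-learning with uniformly random exploration on a deterministic MDP.
    P[s][a] is the next state and R[s][a] the reward for taking action a in state s."""
    rng = random.Random(seed)
    n_s, n_a = len(P), len(P[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(n_a)                 # explore uniformly at random
        s2, r = P[s][a], R[s][a]
        target = r + gamma * max(Q[s2])        # one-step Bellman target
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2
    return Q

def bellman_fixed_point(P, R, gamma=0.5, iters=200):
    """Q* by repeatedly applying the Bellman optimality operator (value iteration on Q)."""
    n_s, n_a = len(P), len(P[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        Q = [[R[s][a] + gamma * max(Q[P[s][a]]) for a in range(n_a)] for s in range(n_s)]
    return Q

# Hypothetical example MDP: action 0 moves to state 0, action 1 moves to state 1.
# P = [[0, 1], [0, 1]]; R = [[0.0, 1.0], [2.0, 0.0]]
```

For this example MDP, the optimal policy alternates between the two states, and both functions arrive at Q* = [[4/3, 8/3], [10/3, 5/3]].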
