Learning Sparse Neural Networks for Better Generalization

Author(s):  
Shiwei Liu

Deep neural networks perform well on test data when they are highly overparameterized, but overparameterization also makes them costly to train and deploy. As a leading approach to this problem, sparse neural networks have been widely used to significantly reduce network size, making training and deployment more efficient without compromising performance. Recently, sparse neural networks, either compressed from a pre-trained model or trained from scratch, have been observed to generalize as well as or even better than their dense counterparts. However, conventional techniques for finding well-fitted sparse sub-networks are expensive, and the mechanisms underlying this phenomenon are far from clear. To tackle these problems, this Ph.D. research aims to study the generalization of sparse neural networks and to propose more efficient approaches that yield sparse neural networks with generalization bounds.
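The abstract does not say how a sparse network is compressed from a pre-trained one. One widely used baseline is magnitude pruning, sketched below purely as an illustration; it is not claimed to be the author's method. The function name `magnitude_prune`, the `sparsity` parameter, and the toy weights are all hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Illustrative baseline only; not the method proposed in the abstract.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


# Hypothetical weights of a tiny dense layer.
dense = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
sparse = magnitude_prune(dense, sparsity=0.5)
```

In practice the pruned network is usually fine-tuned afterwards, which is part of the expense the abstract refers to.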

2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Chenghao Cai ◽  
Yanyan Xu ◽  
Dengfeng Ke ◽  
Kaile Su

We propose multistate activation functions (MSAFs) for deep neural networks (DNNs). These MSAFs are new kinds of activation functions capable of representing more than two states, including the N-order MSAFs and the symmetrical MSAF. DNNs with these MSAFs can be trained via conventional Stochastic Gradient Descent (SGD) as well as mean-normalised SGD. We also discuss how these MSAFs perform on classification problems. Experimental results on the TIMIT corpus reveal that, on speech recognition tasks, DNNs with MSAFs perform better than conventional DNNs, achieving a relative improvement of 5.60% in phoneme error rate. Further experiments reveal that mean-normalised SGD facilitates the training of DNNs with MSAFs, especially with large training sets. The models can also be trained directly without pretraining when the training set is sufficiently large, which results in a considerable relative improvement of 5.82% in word error rate.
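The abstract does not give the functional form of an MSAF. As a rough illustration of how an activation can represent more than two states, the sketch below sums logistic functions shifted by different offsets, which produces several plateaus. This construction, the name `multistate_activation`, and the offsets `(0.0, 8.0)` are assumptions for illustration, not the paper's definition.

```python
import math


def logistic(x):
    """Numerically stable logistic (sigmoid) function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def multistate_activation(x, offsets=(0.0, 8.0)):
    """Sum of shifted logistic functions (assumed form, for illustration).

    With two well-separated offsets the output has three plateaus,
    near 0, 1 and 2, instead of the two plateaus of a single logistic.
    """
    return sum(logistic(x - r) for r in offsets)


low = multistate_activation(-20.0)   # near the lowest state
mid = multistate_activation(4.0)     # on the middle plateau
high = multistate_activation(30.0)   # near the highest state
```

Because the function is a sum of smooth logistic terms, it stays differentiable everywhere, which is consistent with the abstract's statement that such networks can be trained with ordinary SGD.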


Molecules ◽  
2021 ◽  
Vol 26 (5) ◽  
pp. 1285
Author(s):  
Alfonso T. García-Sosa

Substances that can modify the androgen receptor pathway in humans and animals are entering the environment and food chain with the proven ability to disrupt hormonal systems, leading to toxicity and adverse effects on reproduction, brain development, and prostate cancer, among others. State-of-the-art databases with experimental data on the effects of chemicals in humans, chimps, and rats have been used to build machine-learning classifiers and regressors and to evaluate these on independent sets. Different featurizations, algorithms, and protein structures lead to different results, with deep neural networks (DNNs) on user-defined physicochemically relevant features developed for this work outperforming graph convolutional, random forest, and large featurizations. The results show that these user-provided structure-, ligand-, and statistically based features and specific DNNs provided the best results as determined by AUC (0.87), MCC (0.47), and other metrics, and by the interpretability and chemical meaning of the descriptors/features. In addition, the same features in the DNN method performed better than in a multivariate logistic model: validation MCC = 0.468 and training MCC = 0.868 for the present work, compared to evaluation set MCC = 0.2036 and training set MCC = 0.5364 for the multivariate logistic regression on the full, unbalanced set. Techniques of this type may improve AR and toxicity description and prediction, improving assessment and design of compounds. Source code and data are available on GitHub.
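The Matthews correlation coefficient (MCC) reported above is computed from the four cells of a binary confusion matrix. The sketch below uses the standard formula; the confusion-matrix counts are invented for illustration and are not taken from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical unbalanced set: 70 positives, 930 negatives.
tp, tn, fp, fn = 40, 900, 30, 30
accuracy = (tp + tn) / (tp + tn + fp + fn)
score = mcc(tp, tn, fp, fn)
```

On these made-up counts accuracy is 0.94 while MCC is only about 0.54, which shows why MCC is a more demanding metric than accuracy on unbalanced sets such as the one mentioned in the abstract.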


2020 ◽  
Author(s):  
Abhinav Sagar ◽  
J Dheeba

In this work, we address the problem of skin cancer classification using convolutional neural networks. Many cancer cases are misdiagnosed early on as something else, leading to severe consequences including the death of the patient. There are also cases in which patients have some other condition and doctors suspect skin cancer, which leads to unnecessary time and money spent on further diagnosis. We address both problems using deep neural networks and a transfer learning architecture. We used the publicly available ISIC databases for both training and testing our model. Our work achieves an accuracy of 0.935, precision of 0.94, recall of 0.77, F1 score of 0.85, and ROC-AUC of 0.861, which is better than previous state-of-the-art approaches.
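The F1 score is the harmonic mean of precision and recall, so the reported figures can be checked against each other. The short sketch below does this using only the precision and recall quoted in the abstract; the function name `f1_score` is ours.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.94, 0.77)
```

The result rounds to 0.85, which agrees with the F1 score the abstract reports.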


Author(s):  
Zhongxiao Wang ◽  
Lei Zhang ◽  
Min Zhao ◽  
Ying Wang ◽  
Huihui Bai ◽  
...  

Background: Bacterial vaginosis (BV) is caused by the excessive and imbalanced growth of bacteria in the vagina and affects 30-50% of women during their lives. Gram staining followed by Nugent scoring (NS) based on bacterial morphotypes under the microscope has been considered the gold standard for BV diagnosis, but it is often labor-intensive and time-consuming, and its results vary from person to person. Methods: We developed and optimized a convolutional neural network (CNN) model and evaluated its ability to automatically identify and classify three categories of Nugent scores from microscope images. The CNN model was first established with a panel of microscopic images with Nugent scores determined by experts. The model was trained by minimizing the cross-entropy loss function and optimized using a momentum optimizer. Separate test sets of images collected from three hospitals were evaluated by the CNN model. Results: The CNN model consisted of 25 convolutional layers, 2 pooling layers, and a fully connected layer. The model obtained 82.4% sensitivity and 96.6% specificity on the 5,815 validation images when altered vaginal flora and BV were considered the positive samples, which was better than the top-level technologists and obstetricians in China. The model generalized well: it obtained 75.1% accuracy on three categories of Nugent scores on the independent test set of 1,082 images, which was 6.6% higher than the average of three technologists, who hold a bachelor's degree in medicine and are eligible to make diagnostic decisions. When three technologists ran one specimen in triplicate, the precision of three categories of Nugent scores was 54.0%. 103 samples diagnosed by two technologists on different days showed a repeatability of 90.3%. Conclusion: The CNN model outperformed human healthcare practitioners in accuracy and stability for diagnosing three categories of Nugent scores. The deep learning model may offer translational applications in automating the diagnosis of bacterial vaginosis with proper supporting hardware.
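The training procedure mentioned in the Methods (minimizing cross-entropy with a momentum optimizer) can be illustrated on a toy three-class problem. This is a minimal sketch, not the authors' CNN: it optimizes three logits directly for a single example, and the learning rate `lr=0.1`, momentum `mu=0.9`, and the helper names are assumptions.

```python
import math


def softmax(logits):
    """Softmax with max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


def momentum_step(params, grads, velocity, lr=0.1, mu=0.9):
    """Classical momentum: v <- mu*v - lr*g, then p <- p + v."""
    new_velocity = [mu * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Toy problem: one example whose true class is 2 (of three categories).
logits = [0.0, 0.0, 0.0]
velocity = [0.0, 0.0, 0.0]
label = 2
losses = []
for _ in range(20):
    probs = softmax(logits)
    losses.append(cross_entropy(probs, label))
    # Gradient of cross-entropy w.r.t. logits is (probs - one_hot).
    grads = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    logits, velocity = momentum_step(logits, grads, velocity)
```

The first loss equals ln 3 because the initial prediction is uniform over three classes, and the loss then falls as the velocity accumulates in a consistent direction.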


Author(s):  
Ye Xu ◽  
Xun Yuan

Background: Forecasting time-series stock data is important in finance-related work. Stock data usually have multiple features, such as opening price, closing price and so on. Traditional forecasting methods, however, are mainly applied to one feature (the closing price) or to a few, such as four or five features. The massive information hidden in multi-feature data is not thoroughly discovered and used. Objective: To find a method that makes use of all the information in multi-feature data and yields a forecast model. Method: LSTM-based models are introduced in this paper. For comparison, three models are used: a single LSTM model, a hybrid LSTM-CNN model, and a traditional ARIMA model. Results: Experiments with the different models are performed on stock data with 50 and 230 features, respectively. Results show that, on 50-feature data, the MSE of the single LSTM model is 2.4% lower than that of the ARIMA model, and the MSE of the LSTM-CNN model is 12.57% lower than that of the single LSTM model. On 230-feature data, the LSTM-CNN model is found to improve forecast accuracy by 23.41%. Conclusion: In this paper, we use three different models (ARIMA, single LSTM, and the LSTM-CNN hybrid) to forecast the rise and fall of multi-feature stock data. The single LSTM model is better than the traditional ARIMA model on average, and the LSTM-CNN hybrid model is better than the single LSTM model on 50-feature stock data. Moreover, we use the LSTM-CNN model to perform experiments on stock data with 50 and 230 features, respectively, and find that the results of the same model on 230-feature data are better than those on 50-feature data. Our work shows that the LSTM-CNN hybrid model is better than the other models and that experiments on stock data with more features can result in better outcomes. We will do more work on hybrid models next.


2018 ◽  
Vol 9 (1) ◽  
pp. 54 ◽  
Author(s):  
Xinyue Wan ◽  
Bofeng Zhang ◽  
Guobing Zou ◽  
Furong Chang

In recent years, although deep neural networks have yielded immense success in solving various recognition and classification problems, their use in recommender systems has received relatively little attention. Meanwhile, the inherent sparsity of data is still a challenging problem for deep neural networks. In this paper, we first propose a new CIDAE (Continuous Imputation Denoising Autoencoder) model based on the Denoising Autoencoder to alleviate the problem of data sparsity. CIDAE performs regular continuous imputation on the missing parts of the original data and trains with the imputed data as the desired output. Then, we optimize the existing advanced NeuMF (Neural Matrix Factorization) model, which combines matrix factorization and a multi-layer perceptron. By optimizing the training process of NeuMF, we improve its accuracy and robustness. Finally, we fuse CIDAE and the optimized NeuMF, drawing on the idea of ensemble learning. We name the fused model the I-NMF (Imputation-Neural Matrix Factorization) model. I-NMF can not only alleviate the problem of data sparsity, but also fully exploit the ability of deep neural networks to learn latent features. Our experimental results show that I-NMF performs better than the state-of-the-art methods on the public MovieLens datasets.


Author(s):  
Vasily D. Derbentsev ◽  
Vitalii S. Bezkorovainyi ◽  
Iryna V. Luniak

This study investigates forecasting changes in short-term currency trends using deep learning models, a topic relevant to the scientific community as well as to traders and investors. The purpose of this study is to build a model for forecasting the direction of change in the prices of currency quotes based on deep neural networks. The developed architecture was based on the gated recurrent unit (GRU), which is a modification of the Long Short-Term Memory (LSTM) model but is simpler in terms of the number of parameters and learning time. Forecasts of the dynamics of quotations of the euro/dollar currency pair and of the most capitalised cryptocurrency pair, Bitcoin/dollar, were calculated using daily, four-hour and hourly datasets. The binary classification results (forecasts of the direction of trend change) for daily and hourly quotations turned out to be generally better than those of time series models or neural network models of other architectures (in particular, multilayer perceptron or LSTM models). According to the study results, the highest classification accuracy was obtained by the model of daily quotations, for both euro/dollar (about 72%) and Bitcoin/dollar (about 69%). For four-hour and hourly time series, the classification accuracy decreased, which can be explained both by the increased impact of "market noise" and by probable overfitting. Computer simulation has demonstrated that the models predict a rising trend better than a declining one. The study confirmed the prospects for applying deep learning models to short-term forecasting of time series of currency quotes. The developed models proved to be effective for both fiat currencies and cryptocurrencies. The proposed system of models based on deep neural networks can be used as a basis for developing an automated trading system in the foreign exchange market.
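The claim that the recurrent unit used here is simpler than Long Short-Term Memory in parameter count can be made concrete, assuming the unit is the standard gated recurrent unit (GRU). In the textbook formulation with one bias vector per block, an LSTM layer has four weight blocks and a GRU layer has three. The layer sizes below are hypothetical, and real frameworks may count biases slightly differently.

```python
def block_params(input_size, hidden_size):
    """Parameters of one gate block: input weights, recurrent weights, bias."""
    return hidden_size * (input_size + hidden_size) + hidden_size


def lstm_params(input_size, hidden_size):
    """LSTM: input, forget and output gates plus the candidate cell (4 blocks)."""
    return 4 * block_params(input_size, hidden_size)


def gru_params(input_size, hidden_size):
    """GRU: update and reset gates plus the candidate state (3 blocks)."""
    return 3 * block_params(input_size, hidden_size)


# Hypothetical layer: 5 input features per time step, 64 hidden units.
n_lstm = lstm_params(5, 64)
n_gru = gru_params(5, 64)
ratio = n_gru / n_lstm
```

Under these assumptions the GRU layer has exactly three quarters of the parameters of an LSTM layer of the same size, which is one reason it tends to train faster.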


Author(s):  
Alfonso T. García-Sosa

Substances that can modify the androgen receptor pathway in humans and animals are entering the environment and food chain with the proven ability to disrupt hormonal systems, leading to toxicity and adverse effects on reproduction, brain development, and prostate cancer, among others. State-of-the-art databases with experimental data on the effects of chemicals in humans, chimps, and rats have been used to build machine learning classifiers and regressors and to evaluate these on independent sets. Different featurizations, algorithms, and protein structures lead to different results, with deep neural networks (DNNs) on user-defined physicochemically-relevant features developed for this work outperforming graph convolutional, random forest, and large featurizations. The results show that these user-provided structure-, ligand-, and statistically-based features and specific DNNs provided the best results as determined by AUC (0.87), MCC (0.47), and other metrics, and by the interpretability and chemical meaning of the descriptors/features. In addition, the same features in the DNN method performed better than in a multivariate logistic model: validation MCC = 0.468 and training MCC = 0.868 for the present work, compared to evaluation set MCC = 0.2036 and training set MCC = 0.5364 for the multivariate logistic regression on the full, unbalanced set. Techniques of this type may improve AR and toxicity description and prediction, improving assessment and design of compounds. Source code and data are available at https://github.com/AlfonsoTGarcia-Sosa/ML



