Deep learning: from speech recognition to language and multimodal processing

While artificial neural networks have existed for over half a century, it was not until 2010 that a deep form of such networks made a significant impact on speech recognition. This invited paper is based on my keynote talk at the Interspeech conference in Singapore in September 2014. It first briefly reviews earlier studies on (shallow) neural networks and on (deep) generative models relevant to the introduction of deep neural networks (DNNs) to speech recognition several years ago, and then reflects on the historical path to this transformative success. It highlights the role of well-timed academic-industrial collaboration, the advances in big data and big compute, and the seamless integration of application-domain knowledge of speech with the general principles of deep learning. Next, it gives an overview of the sweeping achievements of deep learning in speech recognition since that initial success. These achievements, summarized here into six major areas, have led to across-the-board, industry-wide deployment of deep learning in speech recognition systems. The paper then selectively reviews and analyzes more challenging applications of deep learning: natural language and multimodal processing. Examples include machine translation, knowledge base completion, information retrieval, and automatic image captioning, where fresh ideas from deep learning, continuous-space embedding in particular, are shown to be revolutionizing these application areas, albeit at a less rapid pace than for speech and image recognition. Finally, a number of key issues in deep learning are discussed, and future directions are analyzed for perceptual tasks such as speech, image, and video, as well as for cognitive tasks involving natural language.
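
The abstract credits continuous-space embedding for much of the progress in language and multimodal processing. The core idea is that each item (a word, an entity, an image region) is mapped to a real-valued vector, so that related items lie close together and similarity becomes a geometric measurement. A minimal sketch, assuming tiny hand-set vectors (the words and values are hypothetical, not taken from the paper), shows how such similarity is commonly measured with cosine similarity:

```python
import math

# Hypothetical 3-dimensional embeddings, hand-set for illustration only;
# real systems learn vectors with hundreds of dimensions from data.
EMBEDDINGS = {
    "speech": [0.9, 0.1, 0.0],
    "audio": [0.8, 0.2, 0.1],
    "image": [0.1, 0.9, 0.2],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word):
    """Return the other vocabulary item whose embedding is most similar."""
    query = EMBEDDINGS[word]
    others = (w for w in EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine_similarity(query, EMBEDDINGS[w]))


print(nearest("speech"))  # → audio
```

Because similarity is computed in a shared vector space, the same comparison works whether the vectors come from text, speech, or images. This is what makes embeddings a natural bridge for multimodal tasks such as image captioning.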

Off-the-shelf deep learning is not enough, and requires parsimony, Bayesianity, and causality

npj Computational Materials ◽

10.1038/s41524-020-00487-0 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Rama K. Vasudevan ◽

Maxim Ziatdinov ◽

Lukas Vlcek ◽

Sergei V. Kalinin

Keyword(s):

Neural Networks ◽

Computer Vision ◽

Deep Learning ◽

Bayesian Methods ◽

Deep Neural Networks ◽

Applied Research ◽

Modern Science ◽

Generative Models ◽

Knowledge Development ◽

Physical Constraints

Deep neural networks (‘deep learning’) have emerged as a technology of choice for problems in speech recognition, computer vision, finance, and other fields. However, adopting deep learning in physical domains brings substantial challenges, which stem from the correlative nature of deep learning methods as opposed to the causal, hypothesis-driven nature of modern science. We argue that there is a path forward for fundamental and applied research. It consists of the broad adoption of Bayesian methods that incorporate prior knowledge; the development of solutions with built-in physical constraints, parsimonious structural descriptors, and generative models; and, ultimately, the adoption of causal models.
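
The paper argues for Bayesian methods that incorporate prior knowledge. As a minimal illustration of what "incorporating a prior" means (a textbook model, not the authors' method), the conjugate normal-normal model below combines a prior belief about an unknown quantity with noisy measurements of known noise variance. All numbers are hypothetical.

```python
def gaussian_posterior(prior_mean, prior_var, observations, noise_var):
    """Posterior over an unknown mean under a Gaussian prior and Gaussian
    measurement noise with known variance (conjugate normal-normal model)."""
    n = len(observations)
    # Precisions (inverse variances) add: prior belief plus n measurements.
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    # Posterior mean is a precision-weighted average of prior mean and data.
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var


# Hypothetical prior knowledge: the quantity is near 0 (variance 1).
# Two noisy measurements both read 2.0 (noise variance 1).
post_mean, post_var = gaussian_posterior(0.0, 1.0, [2.0, 2.0], 1.0)
print(post_mean, post_var)  # mean 4/3, variance 1/3
```

With only two measurements, the estimate stays between the prior (0) and the data (2). As more observations arrive, the data term dominates. Physically motivated priors help most in exactly this small-data regime.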

Speech Assistance for Persons With Speech Impediments Using Artificial Neural Networks

Volume 3: Biomedical and Biotechnology Engineering ◽

10.1115/imece2017-71027 ◽

2017 ◽

Author(s):

Ramy Mounir ◽

Redwan Alqasemi ◽

Rajiv Dubey

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Deep Learning ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Challenging Problem ◽

Speech Impairment ◽

Recognition Model ◽

Wide Range ◽

Speech Variability

This work focuses on enabling individuals with speech impairments to use speech-to-text software to recognize and dictate their speech. Automatic speech recognition (ASR) is a challenging problem because of the wide range of speech variability, including differences in accent, pronunciation, speed, and volume. Training an end-to-end speech recognition model on impaired speech is especially difficult for two reasons: sufficiently large datasets are lacking, and a speech-disorder pattern is hard to generalize across all users with speech impediments. This work highlights the different deep learning techniques used to achieve ASR and how they can be modified to recognize and dictate speech from individuals with speech impediments.
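
The abstract does not name a specific end-to-end technique. One widely used option in deep-learning ASR is connectionist temporal classification (CTC), in which the network emits one label per audio frame, including a special "blank" label. As an assumption-labeled sketch (not the authors' code), greedy CTC decoding turns those per-frame predictions into text. The `_` blank symbol and the frame labels are hypothetical.

```python
BLANK = "_"  # hypothetical blank symbol; real systems reserve an output index


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse a per-frame best-path label sequence into a transcript:
    merge consecutive repeats, then drop blank symbols."""
    output = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame and is
        # not blank; a blank between two identical labels keeps both.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return "".join(output)


# Best label per audio frame, e.g. the argmax of a network's outputs.
frames = ["h", "h", "_", "e", "l", "l", "_", "l", "o", "o"]
print(ctc_greedy_decode(frames))  # → hello
```

Because CTC needs no frame-level alignment, it suits the small, atypical datasets discussed above. Adapting such a model to impaired speech typically means fine-tuning the network that produces the frame labels, while the decoding step stays the same.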

Assistant robot through deep learning

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i1.pp1053-1062 ◽

2020 ◽

Vol 10 (1) ◽

pp. 1053-1062

Author(s):

Robinson Jiménez-Moreno ◽

Javier Orlando Pinzón-Arenas ◽

César Giovany Pachón-Suescún

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Speech Recognition ◽

Real Time ◽

Convolutional Neural Networks ◽

Spanish Language ◽

Assistive Robotics ◽

The One

This article presents work in assistive robotics in which a robot must place a tool in a user's hand after the user has verbally requested it by name. Three convolutional neural networks are trained for this purpose. The first recognizes a group of tools and achieved 98% accuracy in identifying the tools defined for the application: a scalpel, a screwdriver, and scissors. The second performs speech recognition; trained on the tool names in Spanish, it reached 97.5% validation accuracy in recognizing the words. The third recognizes the user's hand by classifying two gestures, open hand and closed hand, and achieved 96.25% accuracy. In real-time tests with these networks, each tool was delivered with 100% accuracy. That is, the robot correctly identified what the user requested, correctly recognized each tool, and delivered the requested one when the user opened their hand, with an average execution time of 45 seconds.
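
In the described pipeline, the outputs of the three CNNs (spoken word, visible tools, hand gesture) jointly decide when a tool is handed over. A hypothetical sketch of that decision step follows. It assumes the classifiers have already produced labels. The Spanish label strings, the `select_action` function, and its "hold"/"deliver" return values are illustrative, not from the paper.

```python
from typing import List, Optional

# Hypothetical speech-classifier labels mapped to vision-classifier labels;
# the abstract only names the three tools and says the words are Spanish.
SPOKEN_TO_TOOL = {
    "bisturí": "scalpel",
    "destornillador": "screwdriver",
    "tijeras": "scissors",
}


def select_action(spoken_word: str, detected_tools: List[str], hand_state: str) -> Optional[str]:
    """Combine the three networks' outputs: the speech label, the tools the
    vision network sees, and the hand gesture ("open" or "closed")."""
    tool = SPOKEN_TO_TOOL.get(spoken_word)
    if tool is None or tool not in detected_tools:
        return None  # unknown word, or the requested tool is not in view
    if hand_state != "open":
        return f"hold {tool}"  # tool is grasped; wait for an open hand
    return f"deliver {tool}"


print(select_action("tijeras", ["scalpel", "scissors"], "open"))
```

Keeping the three classifiers independent and combining their labels only at this final step means each network can be retrained separately, for example with new tool names, without touching the others.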

Envision Foundational of Convolution Neural Network

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f8804.0410621 ◽

2021 ◽

Vol 10 (6) ◽

pp. 54-60

Author(s):

M Venkata Krishna Reddy* ◽

Pradeep S.

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Computer Graphics ◽

Convolutional Neural Networks ◽

Generative Models ◽

Activation Functions ◽

Machine Learning Classification ◽

Network Intrusion ◽

Recent Advances

Bearing fault diagnosis using deep learning techniques coupled with handcrafted feature extraction: A comparative study

Journal of Vibration and Control ◽

10.1177/1077546320929141 ◽

2020 ◽

pp. 107754632092914

Author(s):

Mohammed Alabsi ◽

Yabin Liao ◽

Ala-Addin Nabulsi

Keyword(s):

Neural Networks ◽

Feature Extraction ◽

Deep Learning ◽

Comparative Study ◽

Domain Knowledge ◽

Deep Neural Networks ◽

Performance Limits ◽

Data Repositories ◽

Learning Techniques ◽

Wide Range

Deep learning has grown tremendously over the past decade, setting new performance limits in a wide range of applications, including computer vision, speech recognition, and machinery health monitoring. With abundant instrumentation data and high computational power available, deep learning continues to prove itself an efficient tool for extracting micropatterns from big machinery data repositories. This paper presents a comparative study of the feature extraction capabilities of stacked autoencoders, considering the use of expert domain knowledge. The Case Western Reserve University bearing dataset was used, and a classifier was trained and tested to extract and visualize features from 12 different failure classes. Four different deep neural network structures, based on different preprocessing of the raw data, were studied. Results indicated that integrating domain knowledge with deep learning techniques improved feature extraction capabilities and reduced network size and computational requirements, without the need for exhaustive tuning and modification of the network architecture.
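
The abstract credits expert domain knowledge but does not list the specific features used. As an assumption, the sketch below computes several standard handcrafted time-domain features for vibration signals: RMS, peak, crest factor, and kurtosis. Features like these are commonly fed to a classifier or autoencoder alongside, or instead of, raw samples. They are not necessarily the authors' feature set.

```python
import math


def time_domain_features(signal):
    """Common handcrafted features for vibration-based bearing diagnosis.
    Assumes a non-constant signal (nonzero variance and nonzero RMS)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance = sum(c * c for c in centered) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    # Kurtosis: fourth standardized moment. Gaussian noise gives about 3;
    # impulsive fault signatures tend to push it higher.
    kurtosis = (sum(c ** 4 for c in centered) / n) / variance ** 2
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,  # peak relative to overall energy
        "kurtosis": kurtosis,
    }


# A hypothetical short window with a single impulse.
print(time_domain_features([0.0, 0.0, 0.0, 4.0]))
```

Condensing each window into a handful of physically meaningful numbers is one way domain knowledge can shrink the input, and with it the network size, which is consistent with the reduction in computational requirements reported above.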

Neural Language Modeling for Molecule Generation

10.26434/chemrxiv.14700831 ◽

2021 ◽

Author(s):

Sanjar Adilov

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Natural Language Processing ◽

Drug Design ◽

Natural Language ◽

Language Processing ◽

De Novo ◽

Language Modeling ◽

Machine Learning Methods

Generative neural networks have shown promising results in de novo drug design. Recent studies suggest that an efficient way to produce novel molecules matching target properties is to model SMILES sequences with deep learning, much as language models are built in natural language processing. In this paper, we survey various machine learning methods for SMILES-based language modeling and present benchmarking results on a standardized subset of the ChEMBL database.
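
The abstract frames molecule generation as language modeling over SMILES strings. The sketch below swaps the neural model for a simple count-based bigram model. It shows two steps any such model shares: splitting SMILES into tokens, and estimating next-token probabilities. The regex is a simplified assumption that does not cover every SMILES feature (for example, two-letter organic-subset atoms beyond Br and Cl). The toy corpus is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Simplified tokenizer: bracket atoms, two-letter halogens, ring-closure
# labels, bonds and branches each become one token. Characters that match
# no alternative are silently skipped by findall.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFI]|[bcnops]|[=#\-+()/\\.:~@*]|\d")


def tokenize(smiles):
    return SMILES_TOKEN.findall(smiles)


def train_bigram(corpus):
    """Count next-token frequencies, with <s> and </s> marking string ends."""
    counts = defaultdict(Counter)
    for smiles in corpus:
        tokens = ["<s>"] + tokenize(smiles) + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Maximum-likelihood estimate of P(next token | previous token)."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}


corpus = ["CCO", "CCN", "CC(=O)O"]  # hypothetical toy corpus
counts = train_bigram(corpus)
print(tokenize("CC(=O)O"))
print(next_token_probs(counts, "C"))  # P('C' | 'C') is 3/6 here
```

A neural language model replaces the count table with a network that conditions on the whole prefix, not just the previous token. Generation works the same way: sample a token, append it, repeat until `</s>`.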

Neural Language Modeling for Molecule Generation

10.26434/chemrxiv.14700831.v1 ◽

2021 ◽

Author(s):

Sanjar Adilov

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Natural Language Processing ◽

Drug Design ◽

Natural Language ◽

Language Processing ◽

De Novo ◽

Language Modeling ◽

Machine Learning Methods

Graph and Neural Network-Based Intelligent Conversation System

Nature-Inspired Algorithms for Big Data Frameworks - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-5852-1.ch014 ◽

2019 ◽

pp. 339-357 ◽

Cited By ~ 1

Author(s):

Anuja Arora ◽

Aman Srivastava ◽

Shivam Bansal

Keyword(s):

Deep Learning ◽

Natural Language ◽

System Approach ◽

Domain Knowledge ◽

Research Work ◽

Graph Model ◽

Machine Intelligence ◽

Knowledge Graph ◽

Feature Engineering ◽

Graph Based Model

The conventional approach to building a chatbot system chains a sequence of complex algorithms, and the productivity of such systems depends on the order and coherence of those algorithms. This research work introduces and showcases a deep learning-based conversation system. The proposed intelligent conversation model conceptually combines a graph model with a neural conversational model, using the neural conversational model over a knowledge graph model in a hybrid manner. The graph-based model answers questions written in natural language by resolving their intent in the knowledge graph. The neural conversational model generates answers based on the conversation content and the order of the conversation sequence. NLP is used in the graph model, and the neural conversational model relies on natural language understanding and machine intelligence. The neural conversational model uses the seq2seq framework because it requires less feature engineering and no domain knowledge. The results achieved with the authors' approach are competitive with those of the graph model used alone.
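
The abstract describes a hybrid in which a knowledge-graph model answers questions whose intent it can resolve, while a neural conversational model handles the conversation. One simple way to combine the two is to query the graph first and fall back to the neural model, sketched below under stated assumptions. The graph contents, the keyword-based intent matching, and the `respond` routing rule are hypothetical. The seq2seq model is passed in as a stub function.

```python
from typing import Callable, Optional

# Hypothetical knowledge graph stored as (entity, relation) -> value.
KNOWLEDGE_GRAPH = {
    ("library", "opening_hours"): "9 am to 5 pm",
    ("library", "location"): "Building A",
}

# Hypothetical keyword-based intent detection: relation -> trigger words.
INTENT_KEYWORDS = {
    "opening_hours": {"open", "hours", "close"},
    "location": {"where", "located", "location"},
}


def graph_answer(question: str) -> Optional[str]:
    """Answer from the graph if both an entity and an intent are recognized."""
    words = set(question.lower().strip("?").split())
    for (entity, relation), value in KNOWLEDGE_GRAPH.items():
        if entity in words and words & INTENT_KEYWORDS[relation]:
            return value
    return None


def respond(question: str, neural_model: Callable[[str], str]) -> str:
    """Hybrid policy: prefer the graph's factual answer, otherwise fall back
    to the generative (e.g. seq2seq) conversational model."""
    answer = graph_answer(question)
    return answer if answer is not None else neural_model(question)


def seq2seq_stub(question: str) -> str:
    return "Could you tell me more?"  # placeholder for a trained model


print(respond("When does the library open?", seq2seq_stub))
print(respond("Hello there", seq2seq_stub))
```

The graph supplies precise, verifiable answers when it can resolve the intent. The neural model keeps the conversation going otherwise, which matches the division of labor the abstract describes.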

Multiple-Weight Recurrent Neural Networks

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/205 ◽

2017 ◽

Cited By ~ 2

Author(s):

Zhu Cao ◽

Linlin Wang ◽

Gerard de Melo

Keyword(s):

Neural Networks ◽

Natural Language Processing ◽

Speech Recognition ◽

Natural Language ◽

Language Processing ◽

Recurrent Neural Networks ◽

Experimental Results ◽

Great Success ◽

Human Ability

Recurrent neural networks (RNNs) have enjoyed great success in speech recognition, natural language processing, and other areas. Many variants of RNNs have been proposed, including vanilla RNNs, LSTMs, and GRUs. However, current architectures are not particularly adept at tasks involving multi-faceted content. In this work, we address this problem by proposing Multiple-Weight RNNs and LSTMs, which rely on multiple weight matrices in an attempt to mimic the human ability to switch between contexts. We present a framework for adapting RNN-based models and analyze the properties of this approach. Our detailed experimental results show that our model outperforms previous work across a range of tasks and datasets.
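
The abstract does not give the cell equations. The step below is a hedged sketch of the general idea: several recurrent weight matrices, plus a mechanism for switching between them. It mixes K recurrent matrices with softmax weights computed from the current input and the previous state. This gating form, and the names `W_x`, `U_list`, `G`, and `b`, are assumptions for illustration, not necessarily the authors' formulation.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    peak = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def multi_weight_rnn_step(x, h, W_x, U_list, G, b):
    """One recurrent step with K candidate recurrent matrices U_1..U_K.
    Mixing weights alpha = softmax(G [x; h]) let the cell softly switch
    between 'contexts' (an assumed gating form)."""
    alpha = softmax(matvec(G, x + h))  # K weights summing to 1; x + h concatenates
    recurrent = [0.0] * len(h)
    for a_k, U_k in zip(alpha, U_list):
        recurrent = [r + a_k * u for r, u in zip(recurrent, matvec(U_k, h))]
    pre = [wx + r + bi for wx, r, bi in zip(matvec(W_x, x), recurrent, b)]
    return [math.tanh(p) for p in pre], alpha


# Hypothetical sizes: 1-dim input, 2-dim state, K = 2 recurrent matrices.
U_a = [[0.5, 0.0], [0.0, 0.5]]
U_b = [[0.0, -0.5], [0.5, 0.0]]
G = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]  # shape K x (dim_x + dim_h)
h_next, alpha = multi_weight_rnn_step([1.0], [0.2, 0.4], [[1.0], [0.0]], [U_a, U_b], G, [0.0, 0.0])
print(h_next, alpha)
```

If all `U_k` are identical, or the mixing weights collapse onto one matrix, the cell reduces to a vanilla RNN. The extra matrices only matter when the gate learns to favor different ones in different contexts.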

Deep learning in clinical natural language processing: a methodical review

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocz200 ◽

2019 ◽

Vol 27 (3) ◽

pp. 457-470 ◽

Cited By ~ 25

Author(s):

Stephen Wu ◽

Kirk Roberts ◽

Surabhi Datta ◽

Jingcheng Du ◽

Zongcheng Ji ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Computational Linguistics ◽

Language Processing ◽

Recurrent Neural Networks ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity

Objective: This article methodically reviews the literature on deep learning (DL) for natural language processing (NLP) in the clinical domain, providing quantitative analysis to answer 3 research questions concerning the methods, scope, and context of current research.

Materials and Methods: We searched MEDLINE, EMBASE, Scopus, the Association for Computing Machinery Digital Library, and the Association for Computational Linguistics Anthology for articles using DL-based approaches to NLP problems in electronic health records. After screening 1,737 articles, we collected data on 25 variables across 212 papers.

Results: Publications on DL in clinical NLP more than doubled each year through 2018. Recurrent neural networks (60.8%) and word2vec embeddings (74.1%) were the most popular methods; the information extraction tasks of text classification, named entity recognition, and relation extraction were dominant (89.2%). However, there was a “long tail” of other methods and specific tasks. Most contributions were methodological variants or applications, but 20.8% were new methods of some kind. The earliest adopters were in the NLP community, but the medical informatics community was the most prolific.

Discussion: Our analysis shows growing acceptance of deep learning as a baseline for NLP research, and of DL-based NLP in the medical community. A number of common associations were substantiated (eg, the preference for recurrent neural networks in sequence-labeling named entity recognition), while others were surprisingly nuanced (eg, the scarcity of French-language clinical NLP with deep learning).

Conclusion: Deep learning has not yet fully penetrated clinical NLP, but its use is growing rapidly. This review highlighted both the popular and the unique trends in this active field.