RANDOM MULTI-MODAL DEEP LEARNING IN THE PROBLEM OF IMAGE RECOGNITION

Author(s):  
А.И. Паршин ◽  
М.Н. Аралов ◽  
В.Ф. Барабанов ◽  
Н.И. Гребенникова

The image recognition task is one of the most difficult in machine learning, demanding deep knowledge as well as substantial time and computational resources from the researcher. For nonlinear and complex data, various deep neural network architectures are used, but choosing among them remains a difficult problem. The main architectures in widespread use are convolutional neural networks (CNN), recurrent neural networks (RNN), and deep neural networks (DNN). Long short-term memory (LSTM) networks and gated recurrent unit (GRU) networks were developed on the basis of RNNs. Each neural network architecture has its own structure, its own configurable and trainable parameters, and its own advantages and disadvantages. Combining different types of neural networks can significantly improve prediction quality in various machine learning problems. Because choosing the optimal network architecture and its parameters is extremely difficult, the paper considers one method of constructing neural network architectures from a combination of convolutional, recurrent, and deep neural networks. Such architectures are shown to outperform classical machine learning algorithms.

Author(s):  
E. Yu. Shchetinin

The recognition of human emotions is one of the most relevant and rapidly developing areas of modern speech technologies, and the recognition of emotions in speech (RER) is the part most in demand. In this paper, we propose a computer model of emotion recognition based on an ensemble of a bidirectional recurrent neural network with LSTM memory cells and the deep convolutional neural network ResNet18. Computer studies are carried out on the RAVDESS database of emotional human speech. RAVDESS is a data set containing 7356 files. Entries are labelled with the following emotions: 0 – neutral, 1 – calm, 2 – happiness, 3 – sadness, 4 – anger, 5 – fear, 6 – disgust, 7 – surprise. In total, the database contains 16 classes (8 emotions divided into male and female) for a total of 1440 samples (speech only). To train machine learning algorithms and deep neural networks to recognize emotions, the audio recordings must be pre-processed to extract the characteristic features of each emotion. This was done using Mel-frequency cepstral coefficients, chroma coefficients, and characteristics of the frequency spectrum of the recordings. Various neural network models for emotion recognition are studied on these data, and machine learning algorithms are used for comparison. The following models were trained during the experiments: logistic regression (LR), a support vector machine (SVM) classifier, decision tree (DT), random forest (RF), gradient boosting over trees (XGBoost), a convolutional neural network CNN (ResNet18), a recurrent neural network RNN, and an ensemble of convolutional and recurrent networks (Stacked CNN-RNN). The results show that the neural networks achieved much higher accuracy in recognizing and classifying emotions than the machine learning algorithms used. Of the three neural network models presented, the CNN + BLSTM ensemble showed the highest accuracy.


IoT ◽  
2021 ◽  
Vol 2 (2) ◽  
pp. 222-235
Author(s):  
Guillaume Coiffier ◽  
Ghouthi Boukli Hacene ◽  
Vincent Gripon

Deep neural networks are state-of-the-art in a large number of machine learning challenges. However, to reach the best performance they require a huge pool of parameters. Indeed, typical deep convolutional architectures present an increasing number of feature maps as we go deeper in the network, whereas the spatial resolution of inputs is decreased through downsampling operations. This means that most of the parameters lie in the final layers, while a large portion of the computations is performed by a small fraction of the total parameters in the first layers. In an effort to use every parameter of a network to its maximum, we propose a new convolutional neural network architecture called ThriftyNet. In ThriftyNet, only one convolutional layer is defined and used recursively, leading to maximal parameter factorization. In addition, normalization, non-linearities, downsampling, and shortcuts ensure sufficient expressivity of the model. ThriftyNet achieves competitive performance on a tiny parameter budget, exceeding 91% accuracy on CIFAR-10 with fewer than 40 k parameters in total, 74.3% on CIFAR-100 with fewer than 600 k parameters, and 67.1% on ImageNet ILSVRC 2012 with no more than 4.15 M parameters. However, the proposed method typically requires more computations than existing counterparts.
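The parameter saving from reusing one convolutional layer can be illustrated with simple counting. The sketch below is not the ThriftyNet implementation; the channel count, kernel size, and number of iterations are hypothetical values chosen only for illustration. It compares a conventional stack, where every depth step owns its own filters, with a single layer applied repeatedly.

```python
def conv_params(in_channels, out_channels, kernel_size, bias=False):
    """Number of trainable values in one 2-D convolutional layer."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
CHANNELS = 64      # feature maps, kept constant so the layer can be reapplied
KERNEL = 3         # 3x3 filters
ITERATIONS = 20    # how many times the layer is applied

# Conventional stack: each of the ITERATIONS layers has its own filters.
stacked = ITERATIONS * conv_params(CHANNELS, CHANNELS, KERNEL)

# Recursive use: one set of filters is shared by every iteration.
shared = conv_params(CHANNELS, CHANNELS, KERNEL)

print("stacked:", stacked)
print("shared: ", shared)
print("ratio:  ", stacked // shared)
```

With these assumed sizes the shared layer needs a factor of `ITERATIONS` fewer convolution weights, while the number of multiply-accumulate operations stays the same, which is consistent with the abstract's remark that the method does not reduce computation.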


2020 ◽  
Vol 24 (1) ◽  
pp. 130-143
Author(s):  
D. I. Konarev ◽  
A. A. Gulamov

Purpose of research. The current task is to monitor ships using video surveillance cameras installed along the canal. This is important for the information and communication support of navigation on the Moscow Canal. The main subtask is the direct recognition of ships in an image or video, for which a neural network is a promising approach. Methods. Various neural networks are described. Images of ships are the input data for the network. The training sample uses the CIFAR-10 dataset. The network is built and trained using the Keras and TensorFlow machine learning libraries. Results. The application of convolutional neural networks to image recognition problems is described, along with the advantages of this architecture when working with images. The choice of the Python language for the neural network implementation is justified. The main machine learning libraries used, TensorFlow and Keras, are described. An experiment was conducted on the Google Colaboratory service to train convolutional neural networks with different architectures. The effectiveness of the architectures was evaluated as the percentage of correctly recognized images in the test sample. Conclusions are drawn about how the parameters of a convolutional neural network influence its effectiveness. Conclusion. The network with a single convolutional layer in each cascade showed insufficient results, so three-cascade networks with two and three convolutional layers in each cascade were used. Increasing the number of feature maps has the greatest impact on image recognition accuracy. Increasing the number of cascades has a less noticeable effect, and increasing the number of convolutional layers in each cascade does not always increase the accuracy of the network. A three-cascade network with two convolutional layers in each cascade and 128 feature maps was identified as the optimal architecture under the described conditions. An operability check of this architecture on random images of ships confirmed that the choice was correct.
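To make the reported architecture concrete, the following sketch traces layer output shapes and parameter counts for a three-cascade network with two convolutional layers per cascade and 128 feature maps on a 32×32×3 CIFAR-10 input. The abstract does not state the kernel size, padding, pooling, or whether all layers use 128 feature maps; the 3×3 kernels, "same" padding, 2×2 max pooling after each cascade, and a constant width of 128 are assumptions made here for illustration, and the helper `build_cascades` is hypothetical.

```python
def build_cascades(input_shape=(32, 32, 3), cascades=3, convs_per_cascade=2,
                   feature_maps=128, kernel=3, pool=2):
    """Return (layer name, output shape, parameter count) for each layer."""
    height, width, channels = input_shape
    layers = []
    for c in range(1, cascades + 1):
        for k in range(1, convs_per_cascade + 1):
            # 'same' padding keeps height and width; weights plus one bias per filter.
            params = kernel * kernel * channels * feature_maps + feature_maps
            channels = feature_maps
            layers.append((f"cascade{c}_conv{k}", (height, width, channels), params))
        # Max pooling halves the spatial size and has no trainable parameters.
        height, width = height // pool, width // pool
        layers.append((f"cascade{c}_pool", (height, width, channels), 0))
    return layers

layers = build_cascades()
for name, shape, params in layers:
    print(f"{name:16s} {shape!s:16s} {params}")

total_params = sum(p for _, _, p in layers)
print("convolutional parameters:", total_params)
```

Changing `cascades`, `convs_per_cascade`, or `feature_maps` shows how each of the three parameters discussed in the abstract affects model size; the classifier head that would follow the last pooling layer is left out.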


2021 ◽  
Author(s):  
Muhammad Zubair

Electrocardiogram (ECG) is the graphical representation of heart function. ECG signals are important for detecting heart abnormalities. These signals are frequently contaminated by artifacts from various sources. It is essential to reduce these artifacts and to improve accuracy as well as reliability in order to obtain better results related to heart function. The most common disturbing artifact in ECG signals is the motion artifact (MA). In this paper, we propose a new concept of how machine learning algorithms can be used for de-noising ECG signals. Toward this goal, a unique combination of a recurrent neural network (RNN) and a deep neural network (DNN) is used to efficiently remove MA. The proposed algorithm is validated using ECG records obtained from the MIT-BIH Arrhythmia Database. To eliminate MA with the proposed method, we used the Adam optimization algorithm to train and fit the contaminated ECG data in the RNN and DNN models. Performance evaluation results in terms of SNR and RRMSE show that the proposed algorithm outperforms other existing MA removal methods without significantly distorting the morphologies of ECG signals.
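The two evaluation metrics named in the abstract can be sketched as follows. The abstract does not give the exact formulas, so the definitions below are commonly used forms and are assumptions here: SNR as the ratio of clean-signal energy to residual energy in decibels, and RRMSE as the root of the residual energy relative to the clean-signal energy. The function names and the toy signals are illustrative only.

```python
import math

def snr_db(clean, estimate):
    """Signal-to-noise ratio in decibels of an estimate against the clean signal."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_power / noise_power)

def rrmse(clean, estimate):
    """Relative root-mean-square error: residual energy over clean-signal energy."""
    residual = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    reference = sum(c * c for c in clean)
    return math.sqrt(residual / reference)

# Toy data standing in for a clean ECG segment and a de-noised estimate.
clean = [0.0, 1.0, 0.0, -1.0]
denoised = [0.1, 0.9, -0.1, -0.9]

print("SNR (dB):", round(snr_db(clean, denoised), 2))
print("RRMSE:   ", round(rrmse(clean, denoised), 4))
```

A higher SNR and a lower RRMSE both indicate that the de-noised signal is closer to the clean reference. Both functions divide by an energy term, so they are undefined for an all-zero reference, and `snr_db` is undefined for a perfect reconstruction.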


Author(s):  
V. V. Kniaz ◽  
V. S. Gorbatsevich ◽  
V. A. Mizginov

Deep convolutional neural networks have dramatically changed the landscape of modern computer vision. Methods based on deep neural networks now show the best performance among image recognition and object detection algorithms. While the refinement of network architectures has received a lot of scholarly attention, from a practical point of view the preparation of a large image dataset for the successful training of a neural network has become one of the major challenges. This challenge is particularly pronounced for image recognition at wavelengths outside the visible spectrum. For example, no infrared or radar image datasets large enough for the successful training of a deep neural network are publicly available to date. Recent advances in deep neural networks show that they are also capable of arbitrary image transformations such as super-resolution image generation, grayscale image colorisation, and imitation of the style of a given artist. Thus a natural question arises: how can deep neural networks be used to augment existing large image datasets? This paper focuses on the development of the Thermalnet deep convolutional neural network for augmenting existing large visible image datasets with synthetic thermal images. The Thermalnet network architecture is inspired by colorisation deep neural networks.


2016 ◽  
Vol 807 ◽  
pp. 155-166 ◽  
Author(s):  
Julia Ling ◽  
Andrew Kurzawski ◽  
Jeremy Templeton

There exists significant demand for improved Reynolds-averaged Navier–Stokes (RANS) turbulence models that are informed by and can represent a richer set of turbulence physics. This paper presents a method of using deep neural networks to learn a model for the Reynolds stress anisotropy tensor from high-fidelity simulation data. A novel neural network architecture is proposed which uses a multiplicative layer with an invariant tensor basis to embed Galilean invariance into the predicted anisotropy tensor. It is demonstrated that this neural network architecture provides improved prediction accuracy compared with a generic neural network architecture that does not embed this invariance property. The Reynolds stress anisotropy predictions of this invariant neural network are propagated through to the velocity field for two test cases. For both test cases, significant improvement versus baseline RANS linear eddy viscosity and nonlinear eddy viscosity models is demonstrated.
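The abstract does not write out the tensor basis expansion, so the following is a sketch of the general form such a model takes; the symbols are assumptions introduced here. The predicted anisotropy tensor $b$ is a sum of basis tensors $T^{(n)}$, built from the mean strain-rate and rotation-rate tensors, weighted by scalar coefficients $g^{(n)}$ that the network learns as functions of scalar invariants $\lambda_1, \dots, \lambda_M$:

```latex
b_{ij} \;=\; \sum_{n=1}^{N} g^{(n)}\!\left(\lambda_1, \dots, \lambda_M\right) T^{(n)}_{ij}
```

In this form, the multiplicative layer mentioned in the abstract corresponds to the product of the learned coefficients with the basis tensors. Because the $\lambda_m$ are unchanged by a rotation $Q$ of the coordinate frame while each basis tensor transforms as $T^{(n)} \to Q\,T^{(n)}Q^{\mathsf T}$, the prediction transforms as $b \to Q\,b\,Q^{\mathsf T}$, which is how the invariance property is built into the architecture rather than learned from data.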


2022 ◽  
pp. 1559-1575
Author(s):  
Mário Pereira Véstias

Machine learning is the study of algorithms and models for computing systems to do tasks based on pattern identification and inference. When it is difficult or infeasible to develop an algorithm to do a particular task, machine learning algorithms can provide an output based on previous training data. A well-known machine learning model is deep learning. The most recent deep learning models are based on artificial neural networks (ANN). There exist several types of artificial neural networks including the feedforward neural network, the Kohonen self-organizing neural network, the recurrent neural network, the convolutional neural network, the modular neural network, among others. This article focuses on convolutional neural networks with a description of the model, the training and inference processes and its applicability. It will also give an overview of the most used CNN models and what to expect from the next generation of CNN models.


Information ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 98 ◽  
Author(s):  
Tariq Ahmad ◽  
Allan Ramsay ◽  
Hanady Ahmed

Assigning sentiment labels to documents is, at first sight, a standard multi-label classification task. Many approaches have been used for this task, but the current state-of-the-art solutions use deep neural networks (DNNs). As such, it seems likely that standard machine learning algorithms, such as these, will provide an effective approach. We describe an alternative approach, involving the use of probabilities to construct a weighted lexicon of sentiment terms, then modifying the lexicon and calculating optimal thresholds for each class. We show that this approach outperforms the use of DNNs and other standard algorithms. We believe that DNNs are not a universal panacea and that paying attention to the nature of the data that you are trying to learn from can be more important than trying out ever more powerful general purpose machine learning algorithms.
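One possible reading of the lexicon-and-threshold idea is sketched below. This is not the authors' procedure: the abstract gives no formulas, so the term weight (the fraction of training documents containing a term that carry a given label), the document score (the mean weight of its tokens), the threshold search (maximising F1 on the training data), and the toy data are all assumptions made for illustration. The lexicon modification step mentioned in the abstract is omitted.

```python
from collections import defaultdict

def build_lexicon(docs):
    """docs: list of (tokens, set_of_labels). Returns {label: {term: P(label | term)}}."""
    term_count = defaultdict(int)
    term_label_count = defaultdict(lambda: defaultdict(int))
    for tokens, labels in docs:
        for term in set(tokens):
            term_count[term] += 1
            for label in labels:
                term_label_count[label][term] += 1
    return {label: {t: n / term_count[t] for t, n in counts.items()}
            for label, counts in term_label_count.items()}

def score(tokens, weights):
    """Mean lexicon weight of the tokens; unseen terms contribute zero."""
    return sum(weights.get(t, 0.0) for t in tokens) / max(len(tokens), 1)

def f1(predicted, actual):
    """F1 score for two equal-length lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(docs, label, weights):
    """Pick the score cut-off that maximises F1 for one label on the given documents."""
    scores = [score(tokens, weights) for tokens, _ in docs]
    actual = [label in labels for _, labels in docs]
    candidates = sorted(set(scores))
    return max(candidates, key=lambda th: f1([s >= th for s in scores], actual))

# Toy multi-label training data (hypothetical).
train = [
    (["great", "happy", "day"], {"joy"}),
    (["happy", "but", "worried"], {"joy", "fear"}),
    (["scared", "worried"], {"fear"}),
    (["dull", "day"], set()),
]
lexicon = build_lexicon(train)
thresholds = {label: best_threshold(train, label, w) for label, w in lexicon.items()}

def predict(tokens):
    """Assign every label whose score reaches that label's own threshold."""
    return {label for label, w in lexicon.items() if score(tokens, w) >= thresholds[label]}

print(predict(["happy", "great"]))
print(predict(["scared", "worried"]))
```

Keeping a separate threshold per class lets frequent and rare labels be cut at different score levels, which is the part of the approach the abstract emphasises.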

