Music generation and human voice conversion based on LSTM

2021 ◽  
Vol 336 ◽  
pp. 06015
Author(s):  
Guangwei Li ◽  
Shuxue Ding ◽  
Yujie Li ◽  
Kangkang Zhang

Music is closely related to human life and is an important way for people to express their feelings. Deep neural networks have played a significant role in the field of music processing, and many different neural network models implement deep learning for audio processing. General neural networks, however, suffer from problems such as complex operation and slow computing speed. In this paper, we introduce Long Short-Term Memory (LSTM), a recurrent neural network, to realize end-to-end training. The network structure is simple and can generate better audio sequences after training. After music generation, human voice conversion is important for music understanding and for adding lyrics to pure music. We propose an audio segmentation technique that cuts the human voice into fixed-length segments. Different notes are classified from piano music without considering the scale and are associated with the different human voice segments obtained. Finally, through this transformation, the generated piano music can be expressed through human voice output. Experimental results demonstrate that the proposed scheme can successfully obtain a human voice from pure piano music generated by LSTM.
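The segmentation and note-to-voice association described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sample data, the note labels, and the choice to drop a short trailing remainder are all assumptions made for illustration.

```python
def segment_fixed_length(samples, segment_len):
    """Split a sequence of audio samples into consecutive fixed-length segments.

    A trailing remainder shorter than segment_len is dropped (an assumption;
    the abstract does not say how the tail is handled).
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    n_full = len(samples) // segment_len
    return [samples[i * segment_len:(i + 1) * segment_len] for i in range(n_full)]


def render_with_voice(note_sequence, voice_segments_by_note):
    """Concatenate the voice segment associated with each generated note."""
    output = []
    for note in note_sequence:
        output.extend(voice_segments_by_note[note])
    return output


# Hypothetical toy data: 10 "samples" of voice, cut into segments of length 3.
voice = list(range(10))
segments = segment_fixed_length(voice, 3)

# Hypothetical association of note labels with voice segments.
voice_by_note = {"C": segments[0], "D": segments[1], "E": segments[2]}
rendered = render_with_voice(["E", "C", "C"], voice_by_note)
```

In a real pipeline the note sequence would come from the trained LSTM and the segments from recorded voice; here both are stand-ins.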

Author(s):  
Zahra A. Shirazi ◽  
Camila P. E. de Souza ◽  
Rasha Kashef ◽  
Felipe F. Rodrigues

Artificial neural networks (ANNs) are composed of nodes that are joined to each other through weighted connections. Deep learning, as an extension of ANNs, is a neural network model composed of different categories of layers: an input layer, hidden layers, and an output layer. Input data is fed into the first (input) layer, but the main processing of a neural network model is done within the hidden layers, which range from a single hidden layer to multiple ones. The structure of the hidden layers depends on the type of model, and different models are applied depending on the type of input data. For example, convolutional neural networks are the most appropriate for image data. On the other hand, for text, sequential, and time series data, recurrent neural networks or long short-term memory models are the better choices. This chapter summarizes the state-of-the-art deep learning methods applied to the healthcare industry.


2021 ◽  
Author(s):  
Haiyue Wu ◽  
Aihua Huang ◽  
John W. Sutherland

Predictive maintenance (PdM) is an advanced technique to predict the time to failure (TTF) of a system. PdM collects sensor data on the health of a system, processes the information using data analytics, and then establishes data-driven models that can forecast system failure. Deep neural networks are increasingly being used as these data-driven models owing to their high predictive accuracy and efficiency. However, deep neural networks are often criticized as being “black boxes”: owing to their multi-layered and non-linear structure, they provide little insight into the underlying physics of the system being monitored, and their predictions are nontransparent and untraceable. To address this issue, the layer-wise relevance propagation (LRP) technique is applied to analyze a long short-term memory (LSTM) recurrent neural network (RNN) model. The proposed method is demonstrated and validated for a bearing health monitoring study based on vibration data. The obtained LRP results provide insights into how the model “learns” from the input data and demonstrate the distribution of contribution/relevance to the neural network classification in the input space. In addition, comparisons are made with gradient-based sensitivity analysis to show the power of LRP in interpreting RNN models. LRP is shown to have promising potential in interpreting deep neural network models and improving model accuracy and efficiency for PdM.
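To make the idea of relevance redistribution concrete, the following is a minimal sketch of the LRP epsilon rule for a single bias-free dense layer. It is not the authors' LSTM-specific procedure, which the abstract does not detail; the function name, the layer shape, and the toy numbers are assumptions.

```python
def lrp_epsilon_dense(x, weights, relevance_out, eps=1e-9):
    """Redistribute output relevance to the inputs of a bias-free dense layer.

    x: list of n_in input activations
    weights: n_in x n_out nested list; weights[i][j] connects input i to output j
    relevance_out: list of n_out relevance scores
    """
    n_in, n_out = len(x), len(relevance_out)
    # Pre-activations z_j = sum_i x_i * w_ij
    z = [sum(x[i] * weights[i][j] for i in range(n_in)) for j in range(n_out)]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # The stabiliser keeps the denominator away from zero.
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(n_in):
            # Input i receives a share proportional to its contribution x_i * w_ij.
            relevance_in[i] += x[i] * weights[i][j] / denom * relevance_out[j]
    return relevance_in


# Hypothetical toy layer with two inputs and two outputs.
x = [1.0, 2.0]
w = [[0.5, -1.0],
     [0.25, 1.0]]
r_in = lrp_epsilon_dense(x, w, relevance_out=[1.0, 1.0])
```

With no bias and a very small `eps`, the total relevance is approximately conserved from outputs to inputs, which is the property that lets per-input scores be read as contributions to the prediction.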


Author(s):  
Muhammad Faheem Mushtaq ◽  
Urooj Akram ◽  
Muhammad Aamir ◽  
Haseeb Ali ◽  
Muhammad Zulqarnain

Predicting time series is important because many prediction problems, such as health, climate change, and weather prediction, include a time component. Various techniques have been developed over many years to improve forecasting accuracy. This paper presents a review of neural network models applied to the prediction of physical time series. Neural networks (NNs) have emerged as an effective tool for time series forecasting. Moreover, to resolve the problems related to time series data, there is a need for a network with a single layer of trainable weights, namely the Higher Order Neural Network (HONN), which can perform nonlinear input-output mapping. Developers are therefore focusing on HONNs, which have recently been considered for broadly expanding the input representation space. The functional mapping ability of the HONN model has been demonstrated on several time series problems, and it shows more benefits than conventional Artificial Neural Networks (ANNs). The goal of this research is to make the reader aware of HONNs for physical time series prediction and to highlight some benefits and challenges of using them.
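One common way to realise a higher-order unit is to expand the inputs with product terms and then apply a single layer of trainable weights. The sketch below assumes that second-order formulation; the review may cover other HONN variants, and the function names and hand-picked weights are illustrative only.

```python
def second_order_features(x):
    """Return the inputs followed by all products x_i * x_j with i <= j."""
    products = [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
    return list(x) + products


def honn_predict(x, weights, bias=0.0):
    """Apply a single layer of trainable weights to the expanded inputs."""
    features = second_order_features(x)
    if len(weights) != len(features):
        raise ValueError("one weight per expanded feature is required")
    return bias + sum(w * f for w, f in zip(weights, features))


# Expanded features for two inputs: x1, x2, x1*x1, x1*x2, x2*x2.
# With these hand-picked weights the unit represents y = x1 * x2 exactly,
# which a single linear unit on the raw inputs cannot do.
weights = [0.0, 0.0, 0.0, 1.0, 0.0]
y = honn_predict([2.0, 3.0], weights)
```

The nonlinearity comes entirely from the product terms, so training still only adjusts one layer of weights.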


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2852
Author(s):  
Parvathaneni Naga Srinivasu ◽  
Jalluri Gnana SivaSai ◽  
Muhammad Fazal Ijaz ◽  
Akash Kumar Bhoi ◽  
Wonjoon Kim ◽  
...  

Deep learning models are efficient at learning the features that assist in understanding complex patterns precisely. This study proposes a computerized process for classifying skin disease through deep learning based on MobileNet V2 and Long Short-Term Memory (LSTM). The MobileNet V2 model proved to be efficient, with better accuracy, and can work on lightweight computational devices. The proposed model is efficient in maintaining stateful information for precise predictions. A grey-level co-occurrence matrix is used for assessing the progress of diseased growth. The performance has been compared against other state-of-the-art models such as Fine-Tuned Neural Networks (FTNN), Convolutional Neural Network (CNN), Very Deep Convolutional Networks for Large-Scale Image Recognition developed by the Visual Geometry Group (VGG), and a convolutional neural network architecture expanded with a few changes. The HAM10000 dataset is used, and the proposed method outperformed the other methods with more than 85% accuracy. It recognizes the affected region much faster, with almost 2× fewer computations than the conventional MobileNet model, which results in minimal computational effort. Furthermore, a mobile application is designed for instant and proper action. It helps patients and dermatologists identify the type of disease from an image of the affected region at the initial stage of the skin disease. These findings suggest that the proposed system can help general practitioners efficiently and effectively diagnose skin conditions, thereby reducing further complications and morbidity.
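A grey-level co-occurrence matrix counts how often a pixel with grey level a has a neighbour with grey level b at a fixed offset. The following is a minimal sketch of that computation and of the standard contrast statistic derived from it; the tiny image, the horizontal offset, and the function names are assumptions, and the abstract does not say which GLCM statistics the study actually uses.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of grey levels (a, b) at pixel offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose neighbour falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """GLCM contrast: sum of p(a, b) * (a - b)^2 over the normalised matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(
        count / total * (a - b) ** 2
        for a, row in enumerate(matrix)
        for b, count in enumerate(row)
    )


# Hypothetical 2x3 image with three grey levels (0, 1, 2).
image = [[0, 0, 1],
         [1, 2, 2]]
m = glcm(image, levels=3)   # each pixel paired with its right-hand neighbour
c = contrast(m)
```

Tracking such texture statistics over successive images is one plausible way to quantify change in a lesion, though the study's exact procedure is not given here.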


2018 ◽  
Vol 6 (11) ◽  
pp. 216-216 ◽  
Author(s):  
Zhongheng Zhang ◽  
Marcus W. Beck ◽  
David A. Winkler ◽  
Bin Huang ◽  
...  

Author(s):  
Yuheng Hu ◽  
Yili Hong

Residents often rely on newspapers and television to gather hyperlocal news for community awareness and engagement. More recently, social media have emerged as an increasingly important source of hyperlocal news. Thus far, the literature on using social media to create desirable societal benefits, such as civic awareness and engagement, is still in its infancy. One key challenge in this research stream is to distill information from noisy social media data streams for community members in a timely and accurate manner. In this work, we develop SHEDR (social media–based hyperlocal event detection and recommendation), an end-to-end neural event detection and recommendation framework with a particular use case for Twitter to facilitate residents’ information seeking of hyperlocal events. The key model innovation in SHEDR lies in the design of the hyperlocal event detector and the event recommender. First, we harness the power of two popular deep neural network models, the convolutional neural network (CNN) and long short-term memory (LSTM), in a novel joint CNN-LSTM model to characterize spatiotemporal dependencies for capturing unusualness in a region of interest, which is classified as a hyperlocal event. Next, we develop a neural pairwise ranking algorithm for recommending detected hyperlocal events to residents based on their interests. To alleviate the sparsity issue and improve personalization, our algorithm incorporates several types of contextual information covering topic, social, and geographical proximities. We perform comprehensive evaluations based on two large-scale data sets comprising geotagged tweets covering Seattle and Chicago. We demonstrate the effectiveness of our framework in comparison with several state-of-the-art approaches. We show that our hyperlocal event detection and recommendation models consistently and significantly outperform other approaches in terms of precision, recall, and F-1 scores.
Summary of Contribution: In this paper, we focus on a novel and important, yet largely underexplored application of computing: how to improve civic engagement in local neighborhoods via local news sharing and consumption based on social media feeds. To address this question, we propose two new computational and data-driven methods: (1) a deep learning–based hyperlocal event detection algorithm that scans spatially and temporally to detect hyperlocal events from geotagged Twitter feeds; and (2) a personalized deep learning–based hyperlocal event recommender system that systematically integrates several contextual cues such as topical, geographical, and social proximity to recommend the detected hyperlocal events to potential users. We conduct a series of experiments to examine our proposed models. The outcomes demonstrate that our algorithms are significantly better than the state-of-the-art models and can provide users with more relevant information about the local neighborhoods that they live in, which in turn may boost their community engagement.
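Pairwise ranking trains a scorer so that an item a user engaged with scores higher than one they did not. The sketch below shows a pairwise logistic loss, one widely used objective of this kind; the abstract does not state which loss SHEDR uses, so the choice of loss and the function name are assumptions.

```python
import math


def pairwise_logistic_loss(score_pos, score_neg):
    """Return -log(sigmoid(score_pos - score_neg)) in a numerically stable form."""
    diff = score_pos - score_neg
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); branch to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical scores for a relevant and an irrelevant event.
well_ordered = pairwise_logistic_loss(2.0, 0.0)   # relevant event scored higher
tied = pairwise_logistic_loss(0.0, 0.0)           # no preference expressed
mis_ordered = pairwise_logistic_loss(0.0, 2.0)    # irrelevant event scored higher
```

The loss shrinks as the relevant event's score pulls ahead, so minimising it over many (relevant, irrelevant) pairs pushes the model toward correct orderings rather than toward calibrated absolute scores.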


2021 ◽  
Vol 1 (1) ◽  
pp. 19-29
Author(s):  
Zhe Chu ◽  
Mengkai Hu ◽  
Xiangyu Chen

Recently, deep learning has been successfully applied to robotic grasp detection. Many end-to-end detection approaches based on convolutional neural networks (CNNs) have been proposed. However, end-to-end approaches impose strict requirements on the dataset used for training the neural network models, and these requirements are hard to meet in practical use. Therefore, we propose a two-stage approach that uses a particle swarm optimizer (PSO) candidate estimator and a CNN to detect the most likely grasp. Our approach achieved an accuracy of 92.8% on the Cornell Grasp Dataset, which places it among the leading existing approaches, and it is able to run at real-time speeds. With a small change to the approach, we can predict multiple grasps per object simultaneously, so that an object can be grasped in a variety of ways.
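For readers unfamiliar with particle swarm optimization, the following is a minimal generic sketch that minimises a toy function. It is not the authors' grasp candidate estimator: the objective, the bounds, the swarm size, and the coefficient values (commonly used defaults) are all assumptions.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=30, n_iters=200, seed=0):
    """Minimise objective over a box using a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    w, c1, c2 = 0.729, 1.49445, 1.49445  # inertia and attraction coefficients
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle toward its own best and the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in p)


best, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

In a two-stage grasp pipeline, the objective would instead score candidate grasp parameters, and the best candidates would be passed to the CNN for final evaluation.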


Author(s):  
Tahani Aljohani ◽  
Alexandra I. Cristea

Massive Open Online Courses (MOOCs) have become universal learning resources, and the COVID-19 pandemic is rendering these platforms even more necessary. In this paper, we seek to improve Learner Profiling (LP), i.e. estimating the demographic characteristics of learners in MOOC platforms. We focus on examining models that show promise elsewhere but have never been examined in the LP area (deep learning models), based on effective textual representations. As an LP characteristic, we first predict the employment status of learners. We compare sequential and parallel ensemble deep learning architectures based on Convolutional Neural Networks and Recurrent Neural Networks, obtaining a high average accuracy of 96.3% for our best method. Next, we predict the gender of learners based on syntactic knowledge from the text. We compare different tree-structured Long Short-Term Memory models (as state-of-the-art candidates) and provide our novel version of a Bi-directional composition function for existing architectures. In addition, we evaluate 18 different combinations of word-level encoding and sentence-level encoding functions. Based on these results, we show that our Bi-directional model outperforms all other models and that the highest accuracy among our models is obtained by the combination of the FeedForward Neural Network and the Stack-augmented Parser-Interpreter Neural Network (82.60% prediction accuracy). We argue that the prediction models we recommend for both demographic characteristics examined in this study can achieve high accuracy. This is also the first time a sound methodological approach toward improving accuracy for learner demographics classification on MOOCs has been proposed.


2020 ◽  
Vol 49 (4) ◽  
pp. 482-494
Author(s):  
Jurgita Kapočiūtė-Dzikienė ◽  
Senait Gebremichael Tesfagergish

Deep Neural Networks (DNNs) have proven to be especially successful in the area of Natural Language Processing (NLP) and Part-Of-Speech (POS) tagging, which is the process of mapping words to their corresponding POS labels depending on the context. Despite recent developments in language technologies, low-resourced languages (such as the East African language Tigrinya) have received too little attention. We investigate the effectiveness of Deep Learning (DL) solutions for the low-resourced Tigrinya language of the Northern-Ethiopic branch. We selected Tigrinya as the testbed example and tested state-of-the-art DL approaches, seeking to build the most accurate POS tagger. We evaluated DNN classifiers (Feed Forward Neural Network (FFNN), Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), and Convolutional Neural Network (CNN)) on top of neural word2vec word embeddings with a small training corpus known as the Nagaoka Tigrinya Corpus. To determine the best DNN classifier type, its architecture, and its hyper-parameter set, both manual and automatic hyper-parameter tuning were performed. The BiLSTM method proved to be the most suitable for our task: it achieved the highest accuracy, 92%, which is 65% above the random baseline.


10.14311/1121 ◽  
2009 ◽  
Vol 49 (2) ◽  
Author(s):  
M. Chvalina

This article analyses the existing possibilities for using Standard Statistical Methods and Artificial Intelligence Methods for a short-term forecast and simulation of demand in the field of telecommunications. The most widespread methods are based on Time Series Analysis. Nowadays, approaches based on Artificial Intelligence Methods, including Neural Networks, are booming. Separate approaches will be used in the study of Demand Modelling in Telecommunications, and the results of these models will be compared with actual guaranteed values. Then we will examine the quality of Neural Network models. 

