scholarly journals Audio Tagging Using CNN Based Audio Neural Networks for Massive Data Processing

Author(s):  
J. Samuel Manoharan

Sound event detection, speech emotion classification, music classification, acoustic scene classification, audio tagging and several other audio pattern recognition applications are largely dependent on the growing machine learning technology. The audio pattern recognition issues are also addressed by neural networks in recent days. The existing systems operate within limited durations on specific datasets. Pretrained systems with large datasets in natural language processing and computer vision applications over the recent years perform well in several tasks. However, audio pattern recognition research with large-scale datasets is limited in the current scenario. In this paper, a large-scale audio dataset is used for training a pre-trained audio neural network. Several audio related tasks are performed by transferring this audio neural network. Several convolution neural networks are used for modeling the proposed audio neural network. The computational complexity and performance of this system are analyzed. The waveform and leg-mel spectrogram are used as input features in this architecture. During audio tagging, the proposed system outperforms the existing systems with a mean average of 0.45. The performance of the proposed model is demonstrated by applying the audio neural network to five specific audio pattern recognition tasks.

In Recent Years, Social Emotion In Recent Years Acquires Natural Language Processing Researchers’ Attention, Because Of Analyzing User-Generated Emotional Documents On The Web. But, These Emotions Has Noisy Instance Mixed And It Is Great Dispute To Acquire The Textual Meaning Of Short Messages. Definition: In General, Large-Scale Datasets Will Have Many Noisy Data, Which Can’t Be Used Readily And Also It Is Costly, Because Of Ambiguity Of Various Informal Expressions In User-Generated Comments. It Is Very Tedious One To Recognize The Similar User Documents From The Entire Social Media Text Message. Furthermore, Online Comments Are Characteristically Categorized By A Sparse Feature Space, Which Makes The Respective Emotion Classification Task A Complex One. Methodology: Three Major Contributions Were Done In This Work In Order To Rectify These Problems, They Are: Development Of A Novel Mutation Bat Optimization Based Sparse Encoding (MBO-SC) Which Transforming The Sparse Low-Level Features Into Dense HighLevel Features, Was The 1st Contribution, Next Is, An Enhanced Weight Based Convolutional Neural Network (EWCNN) To Target-Specific Layer. It Influences The Semantically EWCNN Classifier To Include Semantic Domain Knowledge Into The Neural Network To Bootstrap Its Inference Power And Interpretability. Fuzzy Clustering Algorithm Is Proposed To Minimize The Similarity Among Two Documents. Uses: It Is Quite Constructive In Recommending Products, Collecting Public Opinions, And Predicting Election Results. Proposed Work Is Distinguished With The Existing Methods, With The Metrics Such As: Precision, Recall, Sensitivity, Specificity, FMeasure And Accuracy. From The Experimental Result It Is Confirmed That The Quality Of Learned Semantic Vectors And The Performance Of Social Emotion Classification Can Be Enhanced By Proposed Models.


Author(s):  
Khaled F. Hussain ◽  
Mohamed Yousef Bassyouni ◽  
Erol Gelenbe

Artificial Neural Networks (ANNs)-based techniques have dominated state-of-the-art results in most problems related to computer vision, audio recognition, and natural language processing in the past few years, resulting in strong industrial adoption from all leading technology companies worldwide. One of the major obstacles that have historically delayed large-scale adoption of ANNs is the huge computational and power costs associated with training and testing (deploying) them. In the mean-time, Neuromorphic Computing platforms have recently achieved remarkable performance running the bio-realistic Spiking Neural Networks at high throughput and very low power consumption making them a natural alternative to ANNs. Here, we propose using the Random Neural Network, a spiking neural network with both theoretical and practical appealing properties, as a general purpose classifier that can match the classification power of ANNs on a number of tasks while enjoying all the features of being a spiking neural network. This is demonstrated on a number of real-world classification datasets.


2021 ◽  
Vol 40 (3) ◽  
pp. 1-13
Author(s):  
Lumin Yang ◽  
Jiajie Zhuang ◽  
Hongbo Fu ◽  
Xiangzhi Wei ◽  
Kun Zhou ◽  
...  

We introduce SketchGNN , a convolutional graph neural network for semantic segmentation and labeling of freehand vector sketches. We treat an input stroke-based sketch as a graph with nodes representing the sampled points along input strokes and edges encoding the stroke structure information. To predict the per-node labels, our SketchGNN uses graph convolution and a static-dynamic branching network architecture to extract the features at three levels, i.e., point-level, stroke-level, and sketch-level. SketchGNN significantly improves the accuracy of the state-of-the-art methods for semantic sketch segmentation (by 11.2% in the pixel-based metric and 18.2% in the component-based metric over a large-scale challenging SPG dataset) and has magnitudes fewer parameters than both image-based and sequence-based methods.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2852
Author(s):  
Parvathaneni Naga Srinivasu ◽  
Jalluri Gnana SivaSai ◽  
Muhammad Fazal Ijaz ◽  
Akash Kumar Bhoi ◽  
Wonjoon Kim ◽  
...  

Deep learning models are efficient in learning the features that assist in understanding complex patterns precisely. This study proposed a computerized process of classifying skin disease through deep learning based MobileNet V2 and Long Short Term Memory (LSTM). The MobileNet V2 model proved to be efficient with a better accuracy that can work on lightweight computational devices. The proposed model is efficient in maintaining stateful information for precise predictions. A grey-level co-occurrence matrix is used for assessing the progress of diseased growth. The performance has been compared against other state-of-the-art models such as Fine-Tuned Neural Networks (FTNN), Convolutional Neural Network (CNN), Very Deep Convolutional Networks for Large-Scale Image Recognition developed by Visual Geometry Group (VGG), and convolutional neural network architecture that expanded with few changes. The HAM10000 dataset is used and the proposed method has outperformed other methods with more than 85% accuracy. Its robustness in recognizing the affected region much faster with almost 2× lesser computations than the conventional MobileNet model results in minimal computational efforts. Furthermore, a mobile application is designed for instant and proper action. It helps the patient and dermatologists identify the type of disease from the affected region’s image at the initial stage of the skin disease. These findings suggest that the proposed system can help general practitioners efficiently and effectively diagnose skin conditions, thereby reducing further complications and morbidity.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Changming Wu ◽  
Heshan Yu ◽  
Seokhyeong Lee ◽  
Ruoming Peng ◽  
Ichiro Takeuchi ◽  
...  

AbstractNeuromorphic photonics has recently emerged as a promising hardware accelerator, with significant potential speed and energy advantages over digital electronics for machine learning algorithms, such as neural networks of various types. Integrated photonic networks are particularly powerful in performing analog computing of matrix-vector multiplication (MVM) as they afford unparalleled speed and bandwidth density for data transmission. Incorporating nonvolatile phase-change materials in integrated photonic devices enables indispensable programming and in-memory computing capabilities for on-chip optical computing. Here, we demonstrate a multimode photonic computing core consisting of an array of programable mode converters based on on-waveguide metasurfaces made of phase-change materials. The programmable converters utilize the refractive index change of the phase-change material Ge2Sb2Te5 during phase transition to control the waveguide spatial modes with a very high precision of up to 64 levels in modal contrast. This contrast is used to represent the matrix elements, with 6-bit resolution and both positive and negative values, to perform MVM computation in neural network algorithms. We demonstrate a prototypical optical convolutional neural network that can perform image processing and recognition tasks with high accuracy. With a broad operation bandwidth and a compact device footprint, the demonstrated multimode photonic core is promising toward large-scale photonic neural networks with ultrahigh computation throughputs.


2017 ◽  
Vol 56 (05) ◽  
pp. 377-389 ◽  
Author(s):  
Xingyu Zhang ◽  
Joyce Kim ◽  
Rachel E. Patzer ◽  
Stephen R. Pitts ◽  
Aaron Patzer ◽  
...  

SummaryObjective: To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage with and without the addition of natural language processing elements.Methods: Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network models (MLNN) which included natural language processing (NLP) and principal component analysis from the patient’s reason for visit. Ten-fold cross validation was used to test the predictive capacity of each model and receiver operating curves (AUC) were then calculated for each model.Results: Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason for visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated AUC of 0.742 (95% CI 0.7310.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN.Conclusions: The predictive accuracy of hospital admission or transfer for patients who presented to ED triage overall was good, and was improved with the inclusion of free text data from a patient’s reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.


2020 ◽  
Vol 49 (4) ◽  
pp. 482-494
Author(s):  
Jurgita Kapočiūtė-Dzikienė ◽  
Senait Gebremichael Tesfagergish

Deep Neural Networks (DNNs) have proven to be especially successful in the area of Natural Language Processing (NLP) and Part-Of-Speech (POS) tagging—which is the process of mapping words to their corresponding POS labels depending on the context. Despite recent development of language technologies, low-resourced languages (such as an East African Tigrinya language), have received too little attention. We investigate the effectiveness of Deep Learning (DL) solutions for the low-resourced Tigrinya language of the Northern-Ethiopic branch. We have selected Tigrinya as the testbed example and have tested state-of-the-art DL approaches seeking to build the most accurate POS tagger. We have evaluated DNN classifiers (Feed Forward Neural Network – FFNN, Long Short-Term Memory method – LSTM, Bidirectional LSTM, and Convolutional Neural Network – CNN) on a top of neural word2vec word embeddings with a small training corpus known as Nagaoka Tigrinya Corpus. To determine the best DNN classifier type, its architecture and hyper-parameter set both manual and automatic hyper-parameter tuning has been performed. BiLSTM method was proved to be the most suitable for our solving task: it achieved the highest accuracy equal to 92% that is 65% above the random baseline.


2021 ◽  
Vol 3 (4) ◽  
pp. 922-945
Author(s):  
Shaw-Hwa Lo ◽  
Yiqiao Yin

Text classification is a fundamental language task in Natural Language Processing. A variety of sequential models are capable of making good predictions, yet there is a lack of connection between language semantics and prediction results. This paper proposes a novel influence score (I-score), a greedy search algorithm, called Backward Dropping Algorithm (BDA), and a novel feature engineering technique called the “dagger technique”. First, the paper proposes to use the novel influence score (I-score) to detect and search for the important language semantics in text documents that are useful for making good predictions in text classification tasks. Next, a greedy search algorithm, called the Backward Dropping Algorithm, is proposed to handle long-term dependencies in the dataset. Moreover, the paper proposes a novel engineering technique called the “dagger technique” that fully preserves the relationship between the explanatory variable and the response variable. The proposed techniques can be further generalized into any feed-forward Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs), and any neural network. A real-world application on the Internet Movie Database (IMDB) is used and the proposed methods are applied to improve prediction performance with an 81% error reduction compared to other popular peers if I-score and “dagger technique” are not implemented.


2003 ◽  
Vol 15 (3) ◽  
pp. 278-285
Author(s):  
Daigo Misaki ◽  
◽  
Shigeru Aomura ◽  
Noriyuki Aoyama

We discuss effective pattern recognition for contour images by hierarchical feature extraction. When pattern recognition is done for an unlimited object, it is effective to see the object in a perspective manner at the beginning and next to see in detail. General features are used for rough classification and local features are used for a more detailed classification. D-P matching is applied for classification of a typical contour image of individual class, which contains selected points called ""landmark""s, and rough classification is done. Features between these landmarks are analyzed and used as input data of neural networks for more detailed classification. We apply this to an illustrated referenced book of insects in which much information is classified hierarchically to verify the proposed method. By introducing landmarks, a neural network can be used effectively for pattern recognition of contour images.


2013 ◽  
Vol 7 (1) ◽  
pp. 49-62 ◽  
Author(s):  
Vijaykumar Sutariya ◽  
Anastasia Groshev ◽  
Prabodh Sadana ◽  
Deepak Bhatia ◽  
Yashwant Pathak

Artificial neural networks (ANNs) technology models the pattern recognition capabilities of the neural networks of the brain. Similarly to a single neuron in the brain, artificial neuron unit receives inputs from many external sources, processes them, and makes decisions. Interestingly, ANN simulates the biological nervous system and draws on analogues of adaptive biological neurons. ANNs do not require rigidly structured experimental designs and can map functions using historical or incomplete data, which makes them a powerful tool for simulation of various non-linear systems.ANNs have many applications in various fields, including engineering, psychology, medicinal chemistry and pharmaceutical research. Because of their capacity for making predictions, pattern recognition, and modeling, ANNs have been very useful in many aspects of pharmaceutical research including modeling of the brain neural network, analytical data analysis, drug modeling, protein structure and function, dosage optimization and manufacturing, pharmacokinetics and pharmacodynamics modeling, and in vitro in vivo correlations. This review discusses the applications of ANNs in drug delivery and pharmacological research.


Sign in / Sign up

Export Citation Format

Share Document