scholarly journals Spatial position constraint for unsupervised learning of speech representations

2021 ◽  
Vol 7 ◽  
pp. e650
Author(s):  
Mohammad Ali Humayun ◽  
Hayati Yassin ◽  
Pg Emeroylariffion Abas

The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims at utilizing unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks, whereby deep auto-encoder variants have been most successful in finding such representations. This paper proposes a novel mechanism to incorporate geometric position of speech samples within the global structure of an unlabelled feature set. Regression to the geometric position is also added as an additional constraint for the representation learning auto-encoder. The representation learnt by the proposed model has been evaluated over a supervised classification task for limited vocabulary keyword spotting, with the proposed representation outperforming the commonly used cepstral features by about 9% in terms of classification accuracy, despite using a limited amount of labels during supervision. Furthermore, a small keyword dataset has been collected for Kadazan, an indigenous, low-resourced Southeast Asian language. Analysis for the Kadazan dataset also confirms the superiority of the proposed representation for limited annotation. The results are significant as they confirm that the proposed method can learn unsupervised speech representations effectively for classification tasks with scarce labelled data.

2021 ◽  
Vol 14 (1) ◽  
pp. 171
Author(s):  
Qingyan Wang ◽  
Meng Chen ◽  
Junping Zhang ◽  
Shouqiang Kang ◽  
Yujing Wang

Hyperspectral image (HSI) data classification often faces the problem of the scarcity of labeled samples, which is considered to be one of the major challenges in the field of remote sensing. Although active deep networks have been successfully applied in semi-supervised classification tasks to address this problem, their performance inevitably meets the bottleneck due to the limitation of labeling cost. To address the aforementioned issue, this paper proposes a semi-supervised classification method for hyperspectral images that improves active deep learning. Specifically, the proposed model introduces the random multi-graph algorithm and replaces the expert mark in active learning with the anchor graph algorithm, which can label a considerable amount of unlabeled data precisely and automatically. In this way, a large number of pseudo-labeling samples would be added to the training subsets such that the model could be fine-tuned and the generalization performance could be improved without extra efforts for data manual labeling. Experiments based on three standard HSIs demonstrate that the proposed model can get better performance than other conventional methods, and they also outperform other studied algorithms in the case of a small training set.


2021 ◽  
Vol 25 (3) ◽  
pp. 711-738
Author(s):  
Phu Pham ◽  
Phuc Do

Link prediction on heterogeneous information network (HIN) is considered as a challenge problem due to the complexity and diversity in types of nodes and links. Currently, there are remained challenges of meta-path-based link prediction in HIN. Previous works of link prediction in HIN via network embedding approach are mainly focused on exploiting features of node rather than existing relations in forms of meta-paths between nodes. In fact, predicting the existence of new links between non-linked nodes is absolutely inconvincible. Moreover, recent HIN-based embedding models also lack of thorough evaluations on the topic similarity between text-based nodes along given meta-paths. To tackle these challenges, in this paper, we proposed a novel approach of topic-driven multiple meta-path-based HIN representation learning framework, namely W-MMP2Vec. Our model leverages the quality of node representations by combining multiple meta-paths as well as calculating the topic similarity weight for each meta-path during the processes of network embedding learning in content-based HINs. To validate our approach, we apply W-TMP2Vec model in solving several link prediction tasks in both content-based and non-content-based HINs (DBLP, IMDB and BlogCatalog). The experimental outputs demonstrate the effectiveness of proposed model which outperforms recent state-of-the-art HIN representation learning models.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2648
Author(s):  
Muhammad Aamir ◽  
Tariq Ali ◽  
Muhammad Irfan ◽  
Ahmad Shaf ◽  
Muhammad Zeeshan Azam ◽  
...  

Natural disasters not only disturb the human ecological system but also destroy the properties and critical infrastructures of human societies and even lead to permanent change in the ecosystem. Disaster can be caused by naturally occurring events such as earthquakes, cyclones, floods, and wildfires. Many deep learning techniques have been applied by various researchers to detect and classify natural disasters to overcome losses in ecosystems, but detection of natural disasters still faces issues due to the complex and imbalanced structures of images. To tackle this problem, we propose a multilayered deep convolutional neural network. The proposed model works in two blocks: Block-I convolutional neural network (B-I CNN), for detection and occurrence of disasters, and Block-II convolutional neural network (B-II CNN), for classification of natural disaster intensity types with different filters and parameters. The model is tested on 4428 natural images and performance is calculated and expressed as different statistical values: sensitivity (SE), 97.54%; specificity (SP), 98.22%; accuracy rate (AR), 99.92%; precision (PRE), 97.79%; and F1-score (F1), 97.97%. The overall accuracy for the whole model is 99.92%, which is competitive and comparable with state-of-the-art algorithms.


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4486
Author(s):  
Niall O’Mahony ◽  
Sean Campbell ◽  
Lenka Krpalkova ◽  
Anderson Carvalho ◽  
Joseph Walsh ◽  
...  

Fine-grained change detection in sensor data is very challenging for artificial intelligence though it is critically important in practice. It is the process of identifying differences in the state of an object or phenomenon where the differences are class-specific and are difficult to generalise. As a result, many recent technologies that leverage big data and deep learning struggle with this task. This review focuses on the state-of-the-art methods, applications, and challenges of representation learning for fine-grained change detection. Our research focuses on methods of harnessing the latent metric space of representation learning techniques as an interim output for hybrid human-machine intelligence. We review methods for transforming and projecting embedding space such that significant changes can be communicated more effectively and a more comprehensive interpretation of underlying relationships in sensor data is facilitated. We conduct this research in our work towards developing a method for aligning the axes of latent embedding space with meaningful real-world metrics so that the reasoning behind the detection of change in relation to past observations may be revealed and adjusted. This is an important topic in many fields concerned with producing more meaningful and explainable outputs from deep learning and also for providing means for knowledge injection and model calibration in order to maintain user confidence.


2021 ◽  
Vol 11 (11) ◽  
pp. 4753
Author(s):  
Gen Ye ◽  
Chen Du ◽  
Tong Lin ◽  
Yan Yan ◽  
Jack Jiang

(1) Background: Deep learning has become ubiquitous due to its impressive performance in various domains, such as varied as computer vision, natural language and speech processing, and game-playing. In this work, we investigated the performance of recent deep learning approaches on the laryngopharyngeal reflux (LPR) diagnosis task. (2) Methods: Our dataset is composed of 114 subjects with 37 pH-positive cases and 77 control cases. In contrast to prior work based on either reflux finding score (RFS) or pH monitoring, we directly take laryngoscope images as inputs to neural networks, as laryngoscopy is the most common and simple diagnostic method. The diagnosis task is formulated as a binary classification problem. We first tested a powerful backbone network that incorporates residual modules, attention mechanism and data augmentation. Furthermore, recent methods in transfer learning and few-shot learning were investigated. (3) Results: On our dataset, the performance is the best test classification accuracy is 73.4%, while the best AUC value is 76.2%. (4) Conclusions: This study demonstrates that deep learning techniques can be applied to classify LPR images automatically. Although the number of pH-positive images used for training is limited, deep network can still be capable of learning discriminant features with the advantage of technique.


Author(s):  
Jinfang Zeng ◽  
Youming Li ◽  
Yu Zhang ◽  
Da Chen

Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. To date, a variety of signal processing and machine learning techniques have been applied to ESC task, including matrix factorization, dictionary learning, wavelet filterbanks and deep neural networks. It is observed that features extracted from deeper networks tend to achieve higher performance than those extracted from shallow networks. However, in ESC task, only the deep convolutional neural networks (CNNs) which contain several layers are used and the residual networks are ignored, which lead to degradation in the performance. Meanwhile, a possible explanation for the limited exploration of CNNs and the difficulty to improve on simpler models is the relative scarcity of labeled data for ESC. In this paper, a residual network called EnvResNet for the ESC task is proposed. In addition, we propose to use audio data augmentation to overcome the problem of data scarcity. The experiments will be performed on the ESC-50 database. Combined with data augmentation, the proposed model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches in terms of classification accuracy.


2020 ◽  
Author(s):  
Dongyu Xue ◽  
Han Zhang ◽  
Dongling Xiao ◽  
Yukang Gong ◽  
Guohui Chuai ◽  
...  

AbstractIn silico modelling and analysis of small molecules substantially accelerates the process of drug development. Representing and understanding molecules is the fundamental step for various in silico molecular analysis tasks. Traditionally, these molecular analysis tasks have been investigated individually and separately. In this study, we presented X-MOL, which applies large-scale pre-training technology on 1.1 billion molecules for molecular understanding and representation, and then, carefully designed fine-tuning was performed to accommodate diverse downstream molecular analysis tasks, including molecular property prediction, chemical reaction analysis, drug-drug interaction prediction, de novo generation of molecules and molecule optimization. As a result, X-MOL was proven to achieve state-of-the-art results on all these molecular analysis tasks with good model interpretation ability. Collectively, taking advantage of super large-scale pre-training data and super-computing power, our study practically demonstrated the utility of the idea of “mass makes miracles” in molecular representation learning and downstream in silico molecular analysis, indicating the great potential of using large-scale unlabelled data with carefully designed pre-training and fine-tuning strategies to unify existing molecular analysis tasks and substantially enhance the performance of each task.


Author(s):  
Sebastijan Dumancic ◽  
Hendrik Blockeel

The goal of unsupervised representation learning is to extract a new representation of data, such that solving many different tasks becomes easier. Existing methods typically focus on vectorized data and offer little support for relational data, which additionally describes relationships among instances. In this work we introduce an approach for relational unsupervised representation learning. Viewing a relational dataset as a hypergraph, new features are obtained by clustering vertices and hyperedges. To find a representation suited for many relational learning tasks, a wide range of similarities between relational objects is considered, e.g. feature and structural similarities. We experimentally evaluate the proposed approach and show that models learned on such latent representations perform better, have lower complexity, and outperform the existing approaches on classification tasks.


2022 ◽  
Vol 16 (4) ◽  
pp. 1-16
Author(s):  
Fereshteh Jafariakinabad ◽  
Kien A. Hua

The syntactic structure of sentences in a document substantially informs about its authorial writing style. Sentence representation learning has been widely explored in recent years and it has been shown that it improves the generalization of different downstream tasks across many domains. Even though utilizing probing methods in several studies suggests that these learned contextual representations implicitly encode some amount of syntax, explicit syntactic information further improves the performance of deep neural models in the domain of authorship attribution. These observations have motivated us to investigate the explicit representation learning of syntactic structure of sentences. In this article, we propose a self-supervised framework for learning structural representations of sentences. The self-supervised network contains two components; a lexical sub-network and a syntactic sub-network which take the sequence of words and their corresponding structural labels as the input, respectively. Due to the n -to-1 mapping of words to their structural labels, each word will be embedded into a vector representation which mainly carries structural information. We evaluate the learned structural representations of sentences using different probing tasks, and subsequently utilize them in the authorship attribution task. Our experimental results indicate that the structural embeddings significantly improve the classification tasks when concatenated with the existing pre-trained word embeddings.


2020 ◽  
Vol 3 (1) ◽  
pp. 445-454
Author(s):  
Celal Buğra Kaya ◽  
Alperen Yılmaz ◽  
Gizem Nur Uzun ◽  
Zeynep Hilal Kilimci

Pattern classification is related with the automatic finding of regularities in dataset through the utilization of various learning techniques. Thus, the classification of the objects into a set of categories or classes is provided. This study is undertaken to evaluate deep learning methodologies to the classification of stock patterns. In order to classify patterns that are obtained from stock charts, convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long-short term memory networks (LSTMs) are employed. To demonstrate the efficiency of proposed model in categorizing patterns, hand-crafted image dataset is constructed from stock charts in Istanbul Stock Exchange and NASDAQ Stock Exchange. Experimental results show that the usage of convolutional neural networks exhibits superior classification success in recognizing patterns compared to the other deep learning methodologies.


Sign in / Sign up

Export Citation Format

Share Document