Multi-Condition Training for Unknown Environment Adaptation in Robust ASR Under Real Conditions

10.14311/1105 ◽  
2009 ◽  
Vol 49 (2) ◽  
Author(s):  
J. Rajnoha

Automatic speech recognition (ASR) systems frequently work in noisy environments. As they are often trained on clean speech data, noise reduction or adaptation techniques are applied to decrease the influence of background disturbance, even in the case of unknown conditions. Speech data mixed with noise recordings from a particular environment are often used for model adaptation. This paper analyses the improvement in recognition performance within such adaptation when multi-condition training data from a real environment is used for training the initial models. Although the quality of such models can decrease with the presence of noise in the training material, they are assumed to include initial information about noise and consequently to support the adaptation procedure. Experimental results show a significant improvement from the proposed training method in a robust ASR task under unknown noisy conditions. Compared with clean speech training data, word error rate decreased by 29 % and 14 % for the non-adapted and adapted systems, respectively.

2020 ◽  
Vol 12 (2) ◽  
pp. 110-115
Author(s):  
Branislav Popović ◽  
Edvin Pakoci ◽  
Darko Pekar

In automatic speech recognition systems, the training data used for system development and the data actually obtained from the users of the system sometimes significantly differ in practice. However, other, more similar data may be available. Transfer learning can help to exploit such similar data for training in order to boost the automatic speech recognizer's performance for a certain domain. This paper presents a few applications of transfer learning in the context of speech recognition, specifically for the Serbian language. Several methods are proposed, with the goal of optimizing system performance on a specific part of the existing speech database for Serbian, or in a noisy environment. The experimental results evaluated on a test set from the desired domain show significant improvement in both word error rate and character error rate.
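As an illustration of the two metrics named in the abstract above, the following is a minimal sketch of how word error rate and character error rate are commonly computed from a Levenshtein edit distance. It is not taken from the paper; the function names and the whitespace tokenization are assumptions made for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # distance from reference prefix of length i to an empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance over whitespace-separated words, divided by the reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """Edit distance over characters, divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Both rates are normalized by the length of the reference, so a hypothesis with many insertions can score above 1.0.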


2014 ◽  
Vol 7 (3) ◽  
pp. 15-31
Author(s):  
Hiroyuki Segi ◽  
Kazuo Onoe ◽  
Shoei Sato ◽  
Akio Kobayashi ◽  
Akio Ando

Tied-mixture HMMs have been proposed as the acoustic model for large-vocabulary continuous speech recognition and have yielded promising results. They share base distributions and provide more flexibility in choosing the degree of tying than state-clustered HMMs. However, it is unclear which of the two acoustic models is superior to the other under the same training data. Moreover, the LBG algorithm and the EM algorithm, which are the usual training methods for HMMs, have not been compared. Therefore, in this paper, the recognition performance of the respective HMMs and the respective training methods is compared under the same conditions. It was found that the number of parameters and the word error rate for both HMMs are equivalent when the number of codebooks is sufficiently large. It was also found that the training method using the LBG algorithm achieves a 90% reduction in training time compared to the training method using the EM algorithm, without degradation of recognition accuracy.


Author(s):  
Endang Sumarti ◽  
Harun Ahmad Sangaji ◽  
Yahmun Yahmun

This research is motivated by the desire to find out the responses of the trainees to the quality of classroom action research (PTK) training conducted by IKIP Budi Utomo Malang. For that reason, three research questions were proposed. First, how do the trainees respond to the trainers’ competency? Second, how well do the trainees understand the training material? Third, how do the trainees respond to the quality of the training? Data were collected using questionnaires and analyzed using descriptive techniques, and the results are as follows. First, the trainees’ responses to the trainers’ competency are in the Good category. This is evidenced by the average of the trainees’ answers lying in the range above 70% of the interval scale, concerning skills competency, material delivery, giving instructions, and trainees’ responses. Second, the trainees’ understanding of the training material is quite sufficient, because it is in the range of 40% of the interval scale. This is caused by many factors, even though the competence of the trainers has met the standards of feasibility. Third, the quality of the training falls in the Good category, because 71% of 100 trainees rated it Good. This shows that the performance of the training has reached a feasible standard.


Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 2778 ◽  
Author(s):  
Kristina Yordanova ◽  
Frank Krüger

Providing ground truth is essential for activity recognition and behaviour analysis as it is needed for providing training data in methods of supervised learning, for providing context information for knowledge-based methods, and for quantifying the recognition performance. Semantic annotation extends simple symbolic labelling by assigning semantic meaning to the label, enabling further reasoning. In this paper, we present a novel approach to semantic annotation by means of plan operators. We provide a step-by-step description of the workflow for manually creating the ground truth annotation. To validate our approach, we create a semantic annotation of the Carnegie Mellon University (CMU) grand challenge dataset, which is often cited but, due to missing and incomplete annotation, almost never used. We show that it is possible to derive hidden properties, behavioural routines, and changes in initial and goal conditions in the annotated dataset. We evaluate the quality of the annotation by calculating the interrater reliability between two annotators who labelled the dataset. The results show very good agreement (Cohen’s κ of 0.8) between the annotators. The produced annotation and the semantic models are publicly available, in order to enable further usage of the CMU grand challenge dataset.
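The interrater reliability measure mentioned in the abstract above, Cohen's κ, compares the observed agreement between two annotators with the agreement expected by chance. A minimal sketch follows, assuming each annotator assigns exactly one label per item; the function name and the example labels are hypothetical and do not come from the CMU dataset or the authors' tooling.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who each labelled the same items once."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical activity labels from two annotators.
annotator_1 = ["walk", "walk", "cook", "cook"]
annotator_2 = ["walk", "walk", "cook", "walk"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

For these example labels, the observed agreement is 0.75 and the chance agreement is 0.5, which gives κ = 0.5.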


Author(s):  
Sayoni Das ◽  
Harry M Scholes ◽  
Neeladri Sen ◽  
Christine Orengo

Motivation: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein–protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). Results: FunSite’s prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite’s performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. Availability and implementation: https://github.com/UCL/cath-funsite-predictor. Contact: [email protected] or [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 69 (4) ◽  
pp. 297-306
Author(s):  
Julius Krause ◽  
Maurice Günder ◽  
Daniel Schulz ◽  
Robin Gruna

The selection of training data determines the quality of a chemometric calibration model. In order to cover the entire parameter space of known influencing parameters, an experimental design is usually created. Nevertheless, even with a carefully prepared Design of Experiments (DoE), redundant reference analyses are often performed during the analysis of agricultural products. Because the number of possible reference analyses is usually very limited, the presented active learning approaches are intended to provide a tool for better selection of training samples.


Author(s):  
Raj Dabre ◽  
Atsushi Fujita

In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in the encoder and decoder. While the addition of each new layer improves the sequence generation quality, this also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all layers, thereby leading to a recurrently stacked sequence-to-sequence model. We report on an extensive case study on neural machine translation (NMT) using our proposed method, experimenting with a variety of datasets. We empirically show that the translation quality of a model that recurrently stacks a single layer 6 times, despite its significantly fewer parameters, approaches that of a model that stacks 6 different layers. We also show how our method can benefit from a prevalent way of improving NMT, i.e., extending training data with pseudo-parallel corpora generated by back-translation. We then analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not. Finally, we explore the limits of parameter sharing, where we share even the parameters between the encoder and decoder in addition to recurrent stacking of layers.
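The parameter-sharing idea in the abstract above can be illustrated with a toy example: applying one layer six times keeps the depth of a six-layer stack while storing the parameters of a single layer. The sketch below uses a plain feed-forward layer with a ReLU activation as a stand-in; it is not the authors' NMT architecture, and all names and dimensions are assumptions chosen for illustration.

```python
import random


def make_layer(dim, rng):
    """Create one toy feed-forward layer: a dim x dim weight matrix and a bias vector."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
    bias = [0.0] * dim
    return weights, bias


def apply_layer(layer, x):
    """Affine transform followed by ReLU."""
    weights, bias = layer
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def build_stack(dim, depth, share, seed=0):
    """Return a list of `depth` layers; with share=True every entry is the same layer object."""
    rng = random.Random(seed)
    if share:
        layer = make_layer(dim, rng)
        return [layer] * depth  # recurrent stacking: one layer reused at every depth
    return [make_layer(dim, rng) for _ in range(depth)]


def count_parameters(layers):
    """Count parameters once per distinct layer object, so shared layers are not double counted."""
    unique = {id(layer): layer for layer in layers}
    return sum(len(w) * len(w[0]) + len(b) for w, b in unique.values())


def forward(layers, x):
    """Pass the input through every layer in order."""
    for layer in layers:
        x = apply_layer(layer, x)
    return x


vanilla = build_stack(dim=4, depth=6, share=False)
recurrent = build_stack(dim=4, depth=6, share=True)
vanilla_params = count_parameters(vanilla)      # 6 * (4*4 + 4) = 120
recurrent_params = count_parameters(recurrent)  # 1 * (4*4 + 4) = 20
```

Both stacks perform six layer applications per forward pass, so sharing reduces the stored parameters but not the computation.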


Sensors ◽  
2021 ◽  
Vol 21 (18) ◽  
pp. 6168
Author(s):  
Piotr Łuczak ◽  
Przemysław Kucharski ◽  
Tomasz Jaworski ◽  
Izabela Perenc ◽  
Krzysztof Ślot ◽  
...  

The presented paper proposes a hybrid neural architecture that enables intelligent data analysis efficacy to be boosted in smart sensor devices, which are typically resource-constrained and application-specific. The postulated concept integrates prior knowledge with learning from examples, thus allowing sensor devices to be used for the successful execution of machine learning even when the volume of training data is highly limited, using compact underlying hardware. The proposed architecture comprises two interacting functional modules arranged in a homogeneous, multiple-layer architecture. The first module, referred to as the knowledge sub-network, implements knowledge in the Conjunctive Normal Form through a three-layer structure composed of novel types of learnable units, called L-neurons. In contrast, the second module is a fully-connected conventional three-layer, feed-forward neural network, and it is referred to as a conventional neural sub-network. We show that the proposed hybrid structure successfully combines knowledge and learning, providing high recognition performance even for very limited training datasets, while also benefiting from an abundance of data, as it occurs for purely neural structures. In addition, since the proposed L-neurons can learn (through classical backpropagation), we show that the architecture is also capable of repairing its knowledge.


MADRASAH ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 74-87
Author(s):  
Syarifah Salmah ◽  
Rahmad Rahmad

Indonesia is one of the countries that respects the human rights of its citizens. Based on the fundamental constitutional mandate that education is the right of every citizen without exception, one of the indicators is that educational institutions must open opportunities for every citizen. This study aims to evaluate the existing educational facilities in the city of Banjarmasin, specifically some private Islamic elementary schools (MIS). The method in this research is descriptive qualitative. This method aims to describe the situation of the selected object as a whole and thoroughly. Accessibility for people with disabilities still cannot be seen in some of the MIS chosen as the objects of this study. The study finds that none of the research objects is friendly to people with disabilities, for example because of conventional school steps. Although the legal framework is complete, the implementation of these laws still encounters problems. Accessibility for people with disabilities is one indicator of a child-friendly school, so hopefully this research will serve as initial information for stakeholders in the Ministry of Religion in improving the quality of essential Islamic education services.

