Data Driven Predictive Model to Compact a Production Stop-on-Fail Test Set for an Electronic Device

Author(s):  
Ana Hinojosa ◽  
Stoyan Stoyanov
2014 ◽  
Vol 12 (3) ◽  
pp. 365-376 ◽  
Author(s):  
Teodora Harsa ◽  
Alexandra Harsa ◽  
Beata Szefler

A novel QSAR approach, based on correlation weighting and alignment over a hypermolecule that mimics the investigated correlational space, was applied to a set of 40 caffeine derivatives downloaded from the PubChem database. The best models describing the log P and LD50 values of this set were validated against an external test set and in a new predictive model using clusters of similarity.


Author(s):  
MIRJAM SEPESY MAUČEC ◽  
TOMAŽ ROTOVNIK ◽  
ZDRAVKO KAČIČ ◽  
JANEZ BREST

This paper presents the results of a study on modeling the highly inflected Slovenian language. We focus on creating a language model for a large-vocabulary speech recognition system. A new data-driven method is proposed for the induction of inflectional morphology into language modeling. The research focuses on data sparsity, which results from the complex morphology of the language. The idea of using subword units is examined: words are segmented into two subword units, stems and endings, without any prior knowledge of the language. The subword units should fit into the framework of probabilistic language models. We do not seek a morphologically correct decomposition of words; instead, we search for the decomposition that yields the minimum entropy of the training corpus. This entropy is approximated using N-gram models. Despite some seemingly oversimplified assumptions, the subword models improve the applicability of the language models to a sparse training corpus. The experiments used the VEČER newswire text corpus as the training corpus. The test set was taken from the SNABI speech database, because the final models were evaluated in speech recognition experiments on that database. Two different subword-based models are proposed and examined experimentally. The experiments demonstrate that subword-based models that considerably reduce the out-of-vocabulary (OOV) rate improve the speech recognition word error rate (WER) compared with standard word-based models, even though they increase test-set perplexity. Subword-based models with improved perplexity, but a much smaller reduction in OOV rate, do not improve speech recognition results.
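The link between subword units and OOV rate can be illustrated with a minimal Python sketch. It is not the authors' method: the paper induces stems and endings from data by minimizing corpus entropy, whereas the fixed ending list `TOY_ENDINGS`, the helper names, and the English toy words below are all assumptions made only for illustration.

```python
from typing import Callable, Iterable, List

# Hypothetical ending list, used only for this illustration.
# The paper induces endings from data; it does not use a fixed list.
TOY_ENDINGS = ("ing", "ed")


def split_stem_ending(word: str) -> List[str]:
    """Split a word into [stem, +ending] if it carries a known toy ending."""
    for ending in TOY_ENDINGS:
        if word.endswith(ending) and len(word) > len(ending):
            return [word[: -len(ending)], "+" + ending]
    return [word]


def oov_rate(
    train_words: Iterable[str],
    test_words: Iterable[str],
    segment: Callable[[str], List[str]] = lambda w: [w],
) -> float:
    """Fraction of test units that never occur among the training units."""
    vocab = {unit for word in train_words for unit in segment(word)}
    test_units = [unit for word in test_words for unit in segment(word)]
    if not test_units:
        return 0.0
    unseen = sum(unit not in vocab for unit in test_units)
    return unseen / len(test_units)


train = ["walked", "talking"]
test = ["walking", "talked"]

word_oov = oov_rate(train, test)                        # whole words as units
subword_oov = oov_rate(train, test, split_stem_ending)  # stems and endings as units
print(word_oov, subword_oov)
```

In this toy case every test word is unseen as a whole word, while all of its stems and endings occur in the training data, so the unit-level OOV rate drops from 1.0 to 0.0.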


This paper describes how bootstrapping was used to extend the Urdu Noisy Text dependency treebank. To overcome the bottleneck of manually annotating a corpus for a new domain of user-generated text, MaltParser, an open-source, data-driven dependency parser, was trained on the 500-tweet Urdu Noisy Text Dependency Treebank and then used to bootstrap the treebank semi-automatically. A total of four bootstrapping iterations were performed. In each iteration, 300 Urdu tweets were automatically tagged, and the performance of the parser model was evaluated against the development set. Of these 300 pre-tagged tweets, 75 were randomly selected for manual correction and then added to the training set for parser retraining. At the end of the last iteration, parser performance was evaluated against the test set. The final supervised bootstrapping model obtains an LA of 72.1%, a UAS of 75.7%, and an LAS of 64.9%, which is a significant improvement over the baseline scores of 69.8% LA, 74% UAS, and 62.9% LAS.
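The three reported metrics can be sketched in Python under their usual definitions, which the abstract itself does not spell out: UAS counts tokens with the correct head, LA counts tokens with the correct dependency label, and LAS counts tokens with both correct. The function name, the `(head, label)` token representation, and the toy parse below are assumptions for illustration, not the evaluation script used in the paper.

```python
from typing import Dict, List, Tuple

Token = Tuple[int, str]  # (head index, dependency label); hypothetical representation


def attachment_scores(gold: List[Token], pred: List[Token]) -> Dict[str, float]:
    """Compute UAS, LA and LAS over aligned gold and predicted tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    n = len(gold)
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))   # correct head
    label_ok = sum(g[1] == p[1] for g, p in zip(gold, pred))  # correct label
    both_ok = sum(g == p for g, p in zip(gold, pred))         # head and label correct
    return {"UAS": head_ok / n, "LA": label_ok / n, "LAS": both_ok / n}


# Toy four-token sentence; head 0 denotes the artificial root.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "amod")]

scores = attachment_scores(gold, pred)
print(scores)
```

Here the third token has the right head but the wrong label, and the fourth has the right label but the wrong head, so UAS and LA are both 0.75 while LAS is 0.5. LAS can never exceed either of the other two scores, which matches the ordering of the figures reported above.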


Author(s):  
Aerambamoorthy Thavaneswaran ◽  
Ruppa K Thulasiram ◽  
Zimo Zhu ◽  
Mohammed Erfanul Hoque ◽  
Nalini Ravishanker

2019 ◽  
Vol 11 (20) ◽  
pp. 5702 ◽  
Author(s):  
Lee ◽  
Choi ◽  
Choi ◽  
Kim

Clothing condition was selected as a key human-subject-relevant parameter that changes dynamically with the user’s preferences and with climate conditions. While environmental components are relatively easy to measure with sensor devices, the clothing insulation value (clo) is almost impossible to estimate visually because it varies across building occupants even when they share constant thermal conditions in the same room. Therefore, in this study we developed a data-driven model to estimate the clothing insulation value as a function of skin and clothing surface temperatures. We conducted a series of environmental chamber tests with 20 participants. A portion of the collected data was used as a training dataset to establish a data-driven model based on advanced computational algorithms. With practical application in mind, we minimized the number of sensing points for data collection and adopted a wearable device for the user’s convenience. The results showed that the developed predictive model achieved an accuracy of 88.04%, and that accuracy was higher when predicting high clo values than low ones. In addition, the accuracy was affected by the user’s body mass index. This research therefore confirms that it is possible to develop a data-driven predictive model of a user’s clo value from his/her physiological and ambient environmental information; an additional study with a larger dataset, from chamber experiments with more test participants, is required to improve prediction accuracy.

