match accuracy
Recently Published Documents

TOTAL DOCUMENTS: 14 (five years: 9)
H-INDEX: 2 (five years: 1)

2021, Vol 11 (24), pp. 12116
Author(s): Shanza Abbas, Muhammad Umair Khan, Scott Uk-Jin Lee, Asad Abbas

Natural language interfaces to databases (NLIDB) have been a research topic for a decade. Significant data collections are available in the form of databases, and to utilize them for research purposes, a system that can translate a natural language query into a structured one can make a huge difference. Efforts toward such systems have been made with pipelining methods for more than a decade: natural language processing techniques integrated with data science methods have been researched as pipelined NLIDB systems. With significant advancements in machine learning and natural language processing, NLIDB with deep learning has emerged as a new research trend in this area. Deep learning has shown potential for rapid growth and improvement in text-to-SQL tasks. In deep learning NLIDB, closing the semantic gap when predicting users' intended columns has arisen as one of the critical and fundamental problems in this research field. Contributions toward this issue have consisted of preprocessing feature inputs and encoding schema elements before they reach the target model, so that they have greater impact on it. Notwithstanding the significant work contributed towards this problem, it remains one of the critical issues in developing NLIDB. Working towards closing the semantic gap between user intention and predicted columns, we present an approach for deep learning text-to-SQL tasks that includes column occurrence scores from previous queries as an additional input feature. Because overall exact match accuracy depends significantly on column prediction, improving column prediction accuracy also improves the overall score. For this purpose, we extract query fragments from previous queries and compute column occurrence and co-occurrence scores. These scores are processed as input features for the encoder-decoder-based text-to-SQL model and contribute, as a factor, the probability that columns and tables have already been used together in the query history. We experimented with our approach on the currently popular text-to-SQL dataset Spider, a complex dataset containing multiple databases along with query-question pairs and schema information. We compared our exact match accuracy with that of a base model using its training and test splits. Our approach outperformed the base model's accuracy, and accuracy was further boosted in experiments with the pretrained language model BERT.
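A minimal sketch of how such column occurrence and co-occurrence scores might be derived from a query history, assuming simple token-level matching of schema column names; the helper and data below are illustrative, not the paper's implementation.

```python
# Hypothetical sketch: deriving column occurrence / co-occurrence scores
# from previously seen SQL queries, as an extra input feature for a
# text-to-SQL encoder. Names and data are illustrative only.
from collections import Counter
from itertools import combinations
import re

def column_scores(previous_queries, schema_columns):
    """Return per-column occurrence scores and pairwise co-occurrence scores,
    normalized by the number of previous queries."""
    occ = Counter()
    co_occ = Counter()
    for sql in previous_queries:
        tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql.lower()))
        used = [c for c in schema_columns if c.lower() in tokens]
        occ.update(used)
        co_occ.update(combinations(sorted(used), 2))
    n = max(len(previous_queries), 1)
    occ_scores = {c: occ[c] / n for c in schema_columns}
    co_scores = {pair: cnt / n for pair, cnt in co_occ.items()}
    return occ_scores, co_scores

history = ["SELECT name FROM singer WHERE age > 30",
           "SELECT name, country FROM singer"]
print(column_scores(history, ["name", "age", "country"]))
```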


2021, Vol 2021 (29), pp. 118-122
Author(s): Peter Morovič, Ján Morovič, Sergio Etchebehere

Managing color on a particular imaging system is a well-understood challenge with a wealth of existing models, methods, and techniques. In the case of printing systems, these tend to operate in the context of a single substrate, and managing color on every additional substrate is approached as a separate, detached problem. While such a mind-set works reasonably well in general, it breaks down when it comes to printing onto pre-colored textiles, such as pre-dyed fabrics. The present paper therefore introduces a family of approaches that support the use of multiple pre-colored textiles on a given printing system and allow for a balance between characterization effort and color match accuracy. This, in turn, provides solutions that can fit a variety of practical working patterns to maximize overall efficiency and performance.


2021, Vol 11 (16), pp. 7758
Author(s): Jihye Yoo, Yeongbong Jin, Bonggyun Ko, Min-Soo Kim

Cardiovascular diseases are the leading cause of death globally. The ECG is the most commonly used tool for diagnosing cardiovascular diseases, and there have recently been a number of attempts to analyze ECGs with deep learning. In this study, we propose a method for performing multi-label classification on standard ECG data (12-lead, 10 s duration). We used the ResNet model, which performs residual learning, as the base classification model, and tried to improve performance through SE-ResNet, which adds squeeze-and-excitation blocks to the plain ResNet. The experiments showed that the squeeze-and-excitation blocks induced an overall performance improvement. In addition, the random k-labelsets (RAKEL) algorithm was applied to improve performance on the multi-label classification problem. The model that applied soft voting through the RAKEL algorithm to SE-ResNet-34 showed the best performance, with average performances over the number of label divisions k of 0.99%, 88.49%, 92.43%, 90.54%, and 93.40% for exact match, accuracy, F1-score, precision, and recall, respectively.
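For reference, a minimal PyTorch-style sketch of a squeeze-and-excitation block of the kind added to a plain ResNet to form SE-ResNet, here written for 1D (time-series) feature maps; the channel count and reduction ratio are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of a squeeze-and-excitation (SE) block for 1D ECG features.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(1)          # squeeze: global average over time
        self.fc = nn.Sequential(                      # excitation: per-channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                             # x: (batch, channels, time)
        b, c, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1)
        return x * w                                  # re-weight feature maps channel-wise

x = torch.randn(4, 64, 1000)                          # e.g. features from 12-lead ECG
print(SEBlock(64)(x).shape)                           # torch.Size([4, 64, 1000])
```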


Author(s): Pushpinder Kaur Brar, Balpreet Kaur Kang, Rozy Rasool, Sanjay Kumar Sahoo

Background: Exposure of Apis mellifera to neonicotinoid insecticides is one of the factors attributed to the recent decline in A. mellifera populations, resulting in economic and ecological losses due to the loss of pollination services. Honeybees can be exposed to neonicotinoids such as imidacloprid directly in the field at the time of application, as well as when consuming pollen and nectar from treated plants. In addition, some metabolites of imidacloprid are more toxic than the parent compound, so the fate of imidacloprid and its metabolites in the commodities to which honeybees are exposed needs to be investigated.
Objective: To validate a QuEChERS method for the estimation of imidacloprid and its metabolites in cotton flower, pollen, nectariferous tissue, and honey using HPLC.
Methods: The QuEChERS method was validated in terms of selectivity, linearity, LOD, LOQ, matrix match, accuracy, and precision. Residues were estimated by HPLC.
Results: Recoveries of imidacloprid and its metabolites from cotton flower, nectariferous tissue, pollen, and honey samples were in the range of 80.42-99.83%. The LOQ for imidacloprid and its metabolites was 0.01 µg/g. Acceptable precision (RSD < 20%) was obtained.
Conclusion: The method allows simple and fast extraction of imidacloprid and its metabolites from cotton flower, pollen, nectariferous tissue, and honey.
Highlights: An accurate, simple, and sensitive analytical method was validated for imidacloprid and its metabolites. The method was validated according to the SANTE/12682/2019 guidelines.
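As an illustration of the figures of merit reported above (recovery and precision expressed as RSD), a short calculation on made-up replicate concentrations; the acceptance threshold RSD < 20% follows the abstract.

```python
# Illustrative calculation of recovery (%) and relative standard deviation (RSD, %)
# for a spiked sample. The concentrations below are example values, not study data.
import statistics

def recovery_and_rsd(measured, spiked):
    """measured: replicate concentrations found (µg/g); spiked: amount added (µg/g)."""
    recoveries = [100.0 * m / spiked for m in measured]
    mean_rec = statistics.mean(recoveries)
    rsd = 100.0 * statistics.stdev(recoveries) / mean_rec
    return mean_rec, rsd

mean_rec, rsd = recovery_and_rsd([0.0091, 0.0095, 0.0089], spiked=0.01)
print(f"mean recovery = {mean_rec:.1f}%, RSD = {rsd:.1f}%")  # acceptance: RSD < 20%
```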


Author(s): Hankook Lee, Sungsoo Ahn, Seung-Woo Seo, You Young Song, Eunho Yang, ...

Retrosynthesis, the goal of which is to find a set of reactants for synthesizing a target product, is an emerging research area of deep learning. While existing approaches have shown promising results, they currently lack the ability to consider the availability (e.g., stability or purchasability) of reactants or to generalize to unseen reaction templates (i.e., chemical reaction rules). In this paper, we propose a new approach that mitigates these issues by reformulating retrosynthesis as a problem of selecting reactants from a candidate set of commercially available molecules. To this end, we design an efficient reactant selection framework, named RetCL (retrosynthesis via contrastive learning), that enumerates all candidate molecules and ranks them by selection scores computed with graph neural networks. To learn the score functions, we also propose a novel contrastive training scheme with hard negative mining. Extensive experiments demonstrate the benefits of the proposed selection-based approach. For example, when all 671k reactants in the USPTO database are given as candidates, RetCL achieves a top-1 exact match accuracy of 71.3% on the USPTO-50k benchmark, whereas a recent transformer-based approach achieves 59.6%. We also demonstrate that RetCL generalizes well to unseen templates in various settings, in contrast to template-based approaches.
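A rough sketch of the contrastive idea summarized above: candidate reactants are scored against a product embedding and trained with a softmax-style loss over the positive reactant and the hardest-scoring negatives. The encoders, shapes, and hyperparameters are placeholders, not the authors' RetCL implementation.

```python
# Contrastive scoring with hard negative mining (illustrative, not RetCL itself).
import torch
import torch.nn.functional as F

def contrastive_loss(product_emb, positive_emb, negative_embs, k_hard=16, tau=0.1):
    """product_emb: (d,), positive_emb: (d,), negative_embs: (n, d)."""
    pos_score = F.cosine_similarity(product_emb, positive_emb, dim=0) / tau
    neg_scores = F.cosine_similarity(product_emb.unsqueeze(0), negative_embs, dim=1) / tau
    hard_negs = neg_scores.topk(min(k_hard, neg_scores.numel())).values   # keep hardest negatives
    logits = torch.cat([pos_score.unsqueeze(0), hard_negs])
    # index 0 is the true reactant; softmax cross-entropy pushes it above the negatives
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

d = 128
loss = contrastive_loss(torch.randn(d), torch.randn(d), torch.randn(1000, d))
print(float(loss))
```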


2021, pp. 152692482110246
Author(s): Manish Suryapalam, Mohammed Kashem, Huaqing Zhao, Norihisa Shigemura, Yoshiya Toyoda, ...

Purpose: A difference in the lower-body to upper-body ratio between individuals of similar height could lead to inadequately matched transplants. There has been a perception in clinical circles that body ratio varies between people of different races, and investigating this supposition would prove useful in increasing transplant match accuracy. The purpose of this investigation was to derive an equation with a greater correlation to lung length than height alone. Methods: Lung transplantation donor data for 480 adult patients were obtained and divided by ethnicity: Caucasian, African American, and Hispanic. Height, weight, age, sex, and right and left lung length were evaluated for significance. The R2 value of the multiple linear regression of these variables vs. lung length was determined and tested in a separate dataset of 100 patients. Results: Only the distribution of height differed significantly between the 3 ethnicities (P = 0.041). None of the ANCOVAs were significant (P < 0.05) or near significant (P < 0.10). For the model with the strongest correlation with lung length, height had a linear fit, weight had a cubic fit, and age had a logistic fit. Multiple regression models were successfully created for the right lung (R2 = 0.202) and left lung (R2 = 0.213). Independent testing showed correlations of 0.131 and 0.136, respectively. Conclusion: Using demographic information from the donor and recipient as a proxy for estimating lung size should serve only as a rough guide, because of its weak correlation with lung length. For greater accuracy, donor-recipient matching should be individualized by taking donor and recipient chest X-rays and/or TLC into consideration.
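A toy sketch of the kind of regression described in the Methods (lung length regressed on a linear height term, a cubic weight term, and a transformed age term), using synthetic data; the log transform of age stands in for the paper's logistic fit and is an assumption, as are all numbers below.

```python
# Synthetic multiple regression of lung length on height, weight^3, and log(age).
import numpy as np

rng = np.random.default_rng(0)
n = 480
height = rng.normal(170, 10, n)                       # cm
weight = rng.normal(75, 15, n)                        # kg
age = rng.uniform(18, 70, n)                          # years
lung_length = 0.15 * height + rng.normal(0, 2, n)     # synthetic response (cm)

X = np.column_stack([np.ones(n), height, weight**3, np.log(age)])
beta, *_ = np.linalg.lstsq(X, lung_length, rcond=None)
pred = X @ beta
r2 = 1 - np.sum((lung_length - pred) ** 2) / np.sum((lung_length - lung_length.mean()) ** 2)
print(f"R^2 = {r2:.3f}")
```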


2020, Vol 7 (4), pp. 755
Author(s): Arif Bijaksana Putra Negara, Hafiz Muhardi, Evi Fathiyah Muniyati

Pause information is one of the supporting factors of quality speech produced by a Text-to-Speech system. Previous research attempted to predict pauses in Pontianak Malay using other methods, but did not obtain good results. This study aims to predict pauses in Pontianak Malay sentences using Hidden Markov Model (HMM) tools based on part of speech (PoS). The HMM calculates the probability of each possible pause sequence. The research uses recorded speech from speakers reading 500 Pontianak Malay sentences and a new PoS set developed from several existing PoS sets. The output is the Pontianak Malay sentence along with its pause prediction. Pause indices are categorized into 5 categories: "0" indicates no pause, "1" a short pause, "2" a long pause, "," the comma punctuation, and "." the end of the sentence. The predictions are then evaluated with a pause match accuracy test over a full sentence and with precision, recall, and f-measure. The pause phrases tested are the pause phrase 1+2 and the pause phrase 2. The test compares the results of bigram and trigram models. Based on these tests, the trigram model is better at predicting speech pauses in Pontianak Malay sentences.
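As a rough illustration of the two evaluations described above (full-sentence pause match accuracy, and precision/recall/f-measure over pause labels), a small sketch under the assumption that each sentence is represented as a list of pause indices; the example data are invented.

```python
# Full-sentence pause match accuracy plus precision / recall / F-measure.
def full_sentence_match_accuracy(pred_sentences, gold_sentences):
    hits = sum(p == g for p, g in zip(pred_sentences, gold_sentences))
    return hits / len(gold_sentences)

def prf(pred_sentences, gold_sentences, pause_labels={"1", "2"}):
    tp = fp = fn = 0
    for pred, gold in zip(pred_sentences, gold_sentences):
        for p, g in zip(pred, gold):
            if p in pause_labels and g in pause_labels:
                tp += 1
            elif p in pause_labels:
                fp += 1
            elif g in pause_labels:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [["0", "1", "2", "."], ["0", "0", "1", "."]]
pred = [["0", "1", "2", "."], ["0", "1", "0", "."]]
print(full_sentence_match_accuracy(pred, gold), prf(pred, gold))
```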


Author(s): Barbara May Bernhardt, D. Ignatova, W. Amoako, N. Aspinall, S. Marinova-Todd, ...

Previous research on Bulgarian consonant acquisition reports earlier acquisition of stops, nasals, and glides than of fricatives, affricates, and liquids. The current study expands the investigation of Bulgarian consonant acquisition. The primary objective was to identify characteristics of protracted versus typical phonological development (PPD versus TD) relative to consonant match (accuracy) levels and mismatch patterns. A native speaker audio-recorded and transcribed single-word productions (110-word list) of sixty 3- to 5-year-olds (30 TD, 30 PPD). Two further transcribers confirmed the transcriptions, using acoustic analysis for disambiguation. The data generally confirmed previous findings regarding the order of consonant acquisition. Factors characteristic of PPD in comparison with TD were: lower match levels, especially at age 3 for onsets in unstressed syllables; later mastery of laterals; and a greater proportion and range of mismatch patterns, including deletion and more than one feature mismatch per segment (e.g., Manner and Place). The paper concludes with clinical and research implications.


2019
Author(s): Bryan Shilowich, Irving Biederman

Voice recognition is a fundamental pathway to person individuation, although it is typically overshadowed by its visual counterpart, face recognition. There have been no large-scale, parametric studies investigating voice recognition performance as a function of cognitive variables in concert with voice parameters. Using celebrity voice clips of varying lengths (1-4 s), paired with similar-sounding, unfamiliar voice foils, the present study investigated three key voice parameters distinguishing targets from foils -- fundamental frequency, f0 (pitch); subharmonic-to-harmonic ratio, SHR (creakiness); and syllabic rate -- in concert with the cognitive variables of voice familiarity and judged voice distinctiveness, as they contributed to recognition accuracy at varying clip lengths. All the variables had robust effects in clips as short as 1 s. Objective measures of distinctiveness, quantified as the distance of each target voice from that target's sex-based mean for each parameter, showed that sensitivity to distinctiveness increased with familiarity. This effect was most evident on foil trials; at clip lengths of one second and above, f0 and SHR distinctiveness showed no discernible effect on match trials. Speaking-rate distinctiveness improved match accuracy, an effect seen only with high familiarity. Recognition accuracy improved with the number of parameters that differed by an amount larger than the median, both in the target-to-foil and target-to-mean voice comparisons. A linear regression model of these three voice parameters, clip length, and subjective measures of distinctiveness and familiarity accounted for 36.7% of the variance in recognition accuracy.
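A toy sketch of the objective distinctiveness measure described above (the distance of a target voice from its sex-based mean on each parameter), with invented speaker values; this is not the study's data or code.

```python
# Distinctiveness as per-parameter distance from the same-sex mean.
import numpy as np

voices = {                       # speaker -> (sex, f0 in Hz, SHR, syllables/sec)
    "A": ("f", 210.0, 0.12, 4.1),
    "B": ("f", 190.0, 0.30, 5.0),
    "C": ("m", 120.0, 0.20, 4.5),
    "D": ("m", 100.0, 0.05, 3.8),
}

def distinctiveness(speaker):
    sex, *params = voices[speaker]
    same_sex = np.array([p for s, *p in voices.values() if s == sex])
    return np.abs(np.array(params) - same_sex.mean(axis=0))   # distance per parameter

print(distinctiveness("A"))      # e.g. [10.   0.09  0.45]
```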


2018
Author(s): Shoko Wakamiya, Mizuki Morita, Yoshinobu Kano, Tomoko Ohkuma, Eiji Aramaki

BACKGROUND: The amount of medical and clinical-related information on the Web is increasing. Among the different types of information available, social media-based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research exploiting social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical natural language processing for Web document (MedWeb) task provides pseudo-Twitter messages in a cross-language, multi-label corpus covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Participants classify each tweet into one of two categories: those containing a patient's symptom and those that do not. OBJECTIVE: This study aimed to present the results of the groups participating in the Japanese, English, and Chinese subtasks, along with discussion, to clarify the issues that need to be resolved in the field of medical NLP. METHODS: In total, 8 groups (19 systems) participated in the Japanese subtask, 4 groups (12 systems) in the English subtask, and 2 groups (6 systems) in the Chinese subtask. In addition, 2 baseline systems were constructed for each subtask. The performance of the participant and baseline systems was assessed using exact match accuracy, F-measure based on precision and recall, and Hamming loss. RESULTS: The best system achieved an exact match accuracy of 0.880, an F-measure of 0.920, and a Hamming loss of 0.019. The averages of exact match accuracy, F-measure, and Hamming loss for the Japanese subtask were 0.720, 0.820, and 0.051; those for the English subtask were 0.770, 0.850, and 0.037; and those for the Chinese subtask were 0.810, 0.880, and 0.032, respectively. CONCLUSIONS: This paper presented and discussed the performance of the systems participating in the NTCIR-13 MedWeb task. As the MedWeb task setting can be formalized as the factualization of text, achievements on this task could be directly applied to practical clinical applications.
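For clarity, a minimal sketch of the three multi-label evaluation measures named above (exact match accuracy, example-based F-measure, and Hamming loss), applied to toy 8-symptom label vectors rather than MedWeb data.

```python
# Multi-label metrics on binary label matrices of shape (n_examples, n_labels).
import numpy as np

def exact_match(y_true, y_pred):
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

def hamming_loss(y_true, y_pred):
    return float(np.mean(y_true != y_pred))

def example_f1(y_true, y_pred):
    scores = []
    for t, p in zip(y_true, y_pred):
        tp = np.sum(t & p)
        denom = np.sum(t) + np.sum(p)
        scores.append(2 * tp / denom if denom else 1.0)
    return float(np.mean(scores))

y_true = np.array([[1, 0, 0, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0]])
y_pred = np.array([[1, 0, 0, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0, 0, 0]])
print(exact_match(y_true, y_pred), example_f1(y_true, y_pred), hamming_loss(y_true, y_pred))
```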

