structured representation
Recently Published Documents


Total documents: 112 (five years: 33)
H-index: 12 (five years: 3)

2021 ◽  
Vol 12 (1) ◽  
pp. 369
Author(s):  
Da Ma ◽  
Xingyu Chen ◽  
Ruisheng Cao ◽  
Zhi Chen ◽  
Lu Chen ◽  
...  

Generating natural language descriptions for a structured representation (e.g., a graph) is an important yet challenging task. In this work, we focus on SQL-to-text, a task that maps a SQL query to the corresponding natural language question. Previous work represents SQL as a sparse graph and uses a graph-to-sequence model to generate questions, where each node can only communicate with its k-hop neighbors. Such a model degrades when adapted to more complex SQL queries, because it cannot capture long-term dependencies and lacks SQL-specific relations. To tackle this problem, we propose a relation-aware graph transformer (RGT) that considers both the SQL structure and various relations simultaneously. Specifically, an abstract SQL syntax tree is constructed for each SQL query to provide the underlying relations. We also customize self-attention and cross-attention strategies to encode the relations in the SQL tree. Experiments on the WikiSQL and Spider benchmarks demonstrate that our approach yields improvements over strong baselines.
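
To illustrate how a syntax tree can supply relations between every pair of nodes, rather than only between k-hop neighbors, here is a minimal sketch. The tree, the node names, and the label set ("self", "ancestor", "descendant", "sibling", "other") are assumptions made for illustration; the abstract does not specify the relation types RGT actually uses.

```python
from itertools import product

# Hypothetical hand-built tree for: SELECT name FROM singer WHERE age > 30
# Each entry maps a node to its parent (None marks the root).
PARENT = {
    "SELECT": None,
    "name": "SELECT",
    "FROM": "SELECT",
    "singer": "FROM",
    "WHERE": "SELECT",
    ">": "WHERE",
    "age": ">",
    "30": ">",
}


def ancestors(node):
    """Return the chain of ancestors from the node's parent up to the root."""
    chain = []
    while PARENT[node] is not None:
        node = PARENT[node]
        chain.append(node)
    return chain


def relation(a, b):
    """Label the relation from node a to node b (illustrative label set)."""
    if a == b:
        return "self"
    if b in ancestors(a):
        return "ancestor"
    if a in ancestors(b):
        return "descendant"
    if PARENT[a] == PARENT[b]:
        return "sibling"
    return "other"


# Dense relation table: every pair of nodes gets a label, however far apart.
RELATIONS = {(a, b): relation(a, b) for a, b in product(PARENT, repeat=2)}
```

A relation-aware attention layer could then look up a learned embedding for each label in such a table, which is one plausible way to let distant nodes interact directly.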


AI ◽  
2021 ◽  
Vol 2 (4) ◽  
pp. 738-755
Author(s):  
Jingxiu Huang ◽  
Qingtang Liu ◽  
Yunxiang Zheng ◽  
Linjing Wu

Natural language understanding technologies play an essential role in automatically solving math word problems. In machine understanding of Chinese math word problems, comma disambiguation, a class-imbalanced binary classification problem, is a valuable step in transforming the problem statement into a structured representation. To address this problem, we applied the synthetic minority oversampling technique (SMOTE) and random forests to comma classification after jointly optimizing their hyperparameters. We propose a strict measure to evaluate the performance of deployed comma classification models on comma disambiguation in math word problems. To verify the effectiveness of random forest classifiers with SMOTE on comma disambiguation, we conducted two-stage experiments on two datasets with a collection of evaluation measures. Experimental results showed that random forest classifiers were significantly superior to baseline methods in Chinese comma disambiguation. SMOTE with hyperparameter settings optimized for the categorical distribution of each dataset is preferable to SMOTE with its default values. For practitioners, we suggest that the hyperparameters of a classification model be optimized again after the parameter settings of SMOTE have been changed.
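
The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors, can be sketched in plain Python. This is a simplified illustration, not the authors' pipeline or any library's implementation; the function name `smote_sketch`, the toy data, and the parameter values are assumptions.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((x - y) ** 2 for x, y in zip(base, p)))
        # Pick one of the k nearest neighbours and interpolate towards it.
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append(
            tuple(x + gap * (y - x) for x, y in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D feature vectors for the minority class.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_new=6)
```

Here `k` and `n_new` play the role of the SMOTE settings the abstract recommends tuning, since changing them alters the training distribution the classifier sees.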


Electronics ◽  
2021 ◽  
Vol 10 (20) ◽  
pp. 2469
Author(s):  
Te Zeng ◽  
Francis C. M. Lau

We present a novel reinforcement learning architecture that learns a structured representation for use in symbolic melody harmonization. Probabilistic models are predominant in melody harmonization tasks, most of which only treat melody notes as independent observations and do not take note of substructures in the melodic sequence. To fill this gap, we add substructure discovery as a crucial step in automatic chord generation. The proposed method consists of a structured representation module that generates hierarchical structures for the symbolic melodies, a policy module that learns to break a melody into segments (whose boundaries concur with chord changes) and phrases (the subunits in segments), and a harmonization module that generates chord sequences for each segment. We formulate the structure discovery process as a sequential decision problem with a policy gradient RL method selecting the boundary of each segment or phrase to obtain an optimized structure. We conduct experiments on our preprocessed HookTheory Lead Sheet Dataset, which has 17,979 melody/chord pairs. The results demonstrate that our proposed method can learn task-specific representations and, thus, yield competitive results compared with state-of-the-art baselines.


Author(s):  
Andrea Seveso ◽  
Fabio Mercorio ◽  
Mario Mezzanzanica

Taxonomies provide a structured representation of semantic relations between lexical terms, acting as the backbone of many applications. The research proposed herein addresses the topic of taxonomy enrichment using a "human-in-the-loop" semi-supervised approach. I will be investigating possible ways to extend and enrich a taxonomy using corpora of unstructured text data. The objective is to develop a methodological framework potentially applicable to any domain.


2021 ◽  
Author(s):  
Xinyao LI ◽  
Linlin ZHANG ◽  
Xuehua BI ◽  
Ying ZHANG ◽  
Guanglei YU ◽  
...  

Abstract. Objective: Classifying coronary heart disease (CHD) is important for physicians' clinical decision support. Customizing personalized predictive models for patients requires selecting, from an existing medical database, the patient group that most closely resembles the indexed patients. In this study, we introduce a new concept: using patient similarity to classify patients with CHD. Materials and methods: We built a structured representation of CHD patients, obtained the multidimensional attribute distance matrix between patient pairs by calculating the patients' multidimensional attribute distances, and used machine learning (ML) models to predict the similarity between patient pairs, so that clinical outcomes for indexed patients are predicted from matched similar patients. Results: The new measure shows marked improvements over traditional classification measures. LightGBM is the top-performing ML model. The best model achieved 88.52% accuracy. Conclusion: Medical applications of ML supported by similarity analytics are a promising way to reduce physician workload and to achieve the goal of “precision medicine”.
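
As a rough illustration of a multidimensional attribute distance between patient pairs, the sketch below computes one distance per attribute for every pair. The patient records, attribute names, and distance definitions are hypothetical; the abstract does not state which attributes or distance functions the study uses.

```python
from itertools import combinations

# Hypothetical structured patient records.
PATIENTS = {
    "p1": {"age": 61, "sex": "M", "smoker": True},
    "p2": {"age": 58, "sex": "F", "smoker": True},
    "p3": {"age": 45, "sex": "M", "smoker": False},
}


def attribute_distances(a, b):
    """Return one distance per attribute instead of a single scalar."""
    return {
        "age": abs(a["age"] - b["age"]),                  # numeric gap
        "sex": 0 if a["sex"] == b["sex"] else 1,          # categorical mismatch
        "smoker": 0 if a["smoker"] == b["smoker"] else 1,  # boolean mismatch
    }


# One distance vector per unordered patient pair.
PAIR_DISTANCES = {
    (i, j): attribute_distances(PATIENTS[i], PATIENTS[j])
    for i, j in combinations(sorted(PATIENTS), 2)
}
```

Vectors of this kind could serve as input features for a model that predicts whether two patients are similar.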


2021 ◽  
Author(s):  
Jingyuan Yang ◽  
Jie Li ◽  
Leida Li ◽  
Xiumei Wang ◽  
Xinbo Gao

SoftwareX ◽  
2021 ◽  
Vol 14 ◽  
pp. 100694
Author(s):  
Filipe Assunção ◽  
Nuno Lourenço ◽  
Bernardete Ribeiro ◽  
Penousal Machado

Author(s):  
Irena Parvanova ◽  
Joseph Finkelstein

Introduction of core outcome sets (COS) facilitates evidence synthesis, transparency in outcome reporting, and standardization in clinical research. However, development of a COS may be a time-consuming and expensive process. Publicly available repositories, such as ClinicalTrials.gov (CTG), provide access to a vast collection of clinical trial characteristics, including primary and secondary outcomes, which can be analyzed using a comprehensive set of tools. With the growing number of COVID-19 clinical trials, COS development may provide a crucial means to standardize, aggregate, share, and analyze diverse research results in a harmonized way. This study aimed at an initial assessment of the utility of CTG analytics for identifying a COVID-19 COS. At the time of this study, January 2021, we analyzed 120 ongoing NIH-funded COVID-19 clinical trials initiated in 2020 to inform COVID-19 COS development by evaluating and ranking clinical trial outcomes based on their structured representation in CTG. Using this approach, a COS comprising 25 major clinical outcomes was identified, with mortality, mental health status, and COVID-19 antibodies at the top of the list. We concluded that CTG analytics can be instrumental for COVID-19 COS development and that further analysis is warranted, including a broader set of international trials combined with a more granular approach and ontology-driven pipelines for outcome extraction and curation.
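
Ranking outcomes by how many trials report them can be sketched as a simple frequency count. The trial records below are invented placeholders, not data from the study, and the normalization (trimming and lowercasing) is an assumed stand-in for whatever curation the authors applied.

```python
from collections import Counter

# Hypothetical trial records with free-text outcome labels.
TRIALS = [
    {"id": "trial-A", "outcomes": ["Mortality", "Mental health status"]},
    {"id": "trial-B", "outcomes": ["mortality ", "COVID-19 antibodies"]},
    {"id": "trial-C", "outcomes": ["Mortality", "mental health status", "Mortality"]},
]


def rank_outcomes(trials):
    """Rank outcomes by the number of trials that report them."""
    counts = Counter()
    for trial in trials:
        # Normalise labels and count each outcome at most once per trial.
        counts.update({o.strip().lower() for o in trial["outcomes"]})
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


RANKING = rank_outcomes(TRIALS)
```

The top entries of such a ranking would be the candidates for a core outcome set, subject to expert review.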


2021 ◽  
Author(s):  
Mullai Murugan ◽  
Lawrence J. Babb ◽  
Casey Overby Taylor ◽  
Luke V. Rasmussen ◽  
Robert R. Freimuth ◽  
...  

Abstract. Structured representation of clinical genetic results is necessary for advancing precision medicine. The Electronic Medical Records and Genomics (eMERGE) Network’s Phase III program initially used a commercially developed XML message format for standardized and structured representation of genetic results for electronic health record (EHR) integration. In a desire to move towards a standard representation, the network created a new standardized format based upon Health Level Seven Fast Healthcare Interoperability Resources (HL7 FHIR), to represent clinical genomics results. These new standards improve the utility of HL7 FHIR as an international healthcare interoperability standard for management of genetic data from patients. This work advances the establishment of standards that are being designed for broad adoption in the current health information technology landscape.

