An Abstract System for Converting and Recovering Texts like Structured Information

Author(s):  
Edgardo Samuel Barraza Verdesoto ◽  
Richard de Jesus Gil Herrera ◽  
Marlly Yaneth Rojas Ortiz

Abstract: This paper introduces an abstract system for converting texts into structured information. The proposed architecture incorporates several strategies based on scientific models of how the brain records and recovers memories, together with approaches that convert texts into structured data. The applications of this proposal are vast because, in general, information that can be expressed as text, such as reports, emails, and web content, is considered unstructured, and hence SQL-based repositories cannot deal efficiently with this kind of data. The model on which this proposal is based divides a sentence into clusters of words, which in turn are transformed into members of a taxonomy of algebraic structures. The algebraic structures must satisfy the properties of Abelian groups. Methodologically, an incremental prototyping approach has been applied to develop a satisfactory architecture that can be adapted to any language. A special case, dealing with the Spanish language, is studied. The developed abstract system is a framework that permits implementing applications that convert unstructured textual information into structured information, which can be useful in contexts such as Natural Language Generation, Data Mining, and the dynamic generation of theories, among others.
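The abstract's requirement that the algebraic structures satisfy Abelian-group axioms can be made concrete. The sketch below is illustrative only (not the authors' taxonomy): it models word clusters as subsets of a toy vocabulary under symmetric difference, which forms an Abelian group in which the empty set is the identity and every element is its own inverse.

```python
from itertools import combinations

# Illustrative stand-in for "clusters of words as members of an algebraic
# structure": all subsets of a toy vocabulary with symmetric difference.
vocab = ("report", "email", "web")
elements = [frozenset(c) for r in range(len(vocab) + 1)
            for c in combinations(vocab, r)]
op = lambda a, b: a ^ b          # symmetric difference of two word sets
identity = frozenset()

closed = all(op(a, b) in elements for a in elements for b in elements)
commutative = all(op(a, b) == op(b, a) for a in elements for b in elements)
associative = all(op(op(a, b), c) == op(a, op(b, c))
                  for a in elements for b in elements for c in elements)
has_identity = all(op(a, identity) == a for a in elements)
has_inverses = all(any(op(a, b) == identity for b in elements) for a in elements)
abelian_group = all([closed, commutative, associative, has_identity, has_inverses])
```

Each boolean checks one group axiom over the whole carrier set, so `abelian_group` verifies the Abelian-group property exhaustively for this small example.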

2021 ◽  
Vol 11 (9) ◽  
pp. 3867
Author(s):  
Zhewei Liu ◽  
Zijia Zhang ◽  
Yaoming Cai ◽  
Yilin Miao ◽  
Zhikun Chen

Extreme Learning Machine (ELM) is characterized by simplicity, generalization ability, and computational efficiency. However, previous ELMs fail to consider the inherent high-order relationships among data points, leaving them ineffective on structured data and poorly robust to noisy data. This paper presents a novel semi-supervised ELM, termed Hypergraph Convolutional ELM (HGCELM), which uses hypergraph convolution to extend ELM into the non-Euclidean domain. The method inherits all the advantages of ELM and consists of a random hypergraph convolutional layer followed by a hypergraph convolutional regression layer, enabling it to model complex intraclass variations. We show that the traditional ELM is a special case of the HGCELM model in the regular Euclidean domain. Extensive experimental results show that HGCELM remarkably outperforms eight competitive methods on 26 classification benchmarks.
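The two-layer design described above can be sketched in a few lines of NumPy. This is a hedged toy version of the idea (random sizes and data, not the authors' implementation): a hypergraph propagation operator built from the incidence matrix, a random untrained convolutional layer, and a closed-form ridge-regression readout in ELM style.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h, c = 12, 5, 8, 3                        # nodes, features, hidden units, classes
X = rng.normal(size=(n, d))
H = (rng.random((n, 4)) > 0.5).astype(float)    # node-hyperedge incidence matrix
H[H.sum(axis=1) == 0, 0] = 1.0                  # ensure every node joins some edge
H[0, H.sum(axis=0) == 0] = 1.0                  # ensure every edge has some node

Dv = np.diag(1.0 / np.sqrt(H.sum(axis=1)))      # node degrees^{-1/2}
De = np.diag(1.0 / H.sum(axis=0))               # edge degrees^{-1}
A = Dv @ H @ De @ H.T @ Dv                      # hypergraph propagation operator

W = rng.normal(size=(d, h))                     # random conv weights, never trained
Hidden = np.tanh(A @ X @ W)                     # random hypergraph convolutional layer

Y = np.eye(c)[rng.integers(0, c, size=n)]       # one-hot labels
# Closed-form ELM readout: ridge regression on the propagated hidden features.
beta = np.linalg.solve(Hidden.T @ Hidden + 1e-2 * np.eye(h), Hidden.T @ Y)
pred = (A @ Hidden @ beta).argmax(axis=1)       # hypergraph conv regression layer
```

Setting `A` to the identity matrix recovers a plain ELM, which mirrors the paper's claim that ELM is a special case of HGCELM in the Euclidean domain.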


2020 ◽  
Vol 8 (6) ◽  
pp. 3281-3287

Text is an extremely rich source of information. Every second, people send and receive hundreds of millions of messages. NLP comprises various tasks, including machine learning, information extraction, information retrieval, automatic text summarization, question answering, parsing, sentiment analysis, natural language understanding, and natural language generation. Information extraction is an important task used to find structured information in unstructured or semi-structured text. This paper presents a methodology for extracting relations between biomedical entities using spaCy. The framework consists of the following phases: data creation, loading and converting the data into a spaCy object, preprocessing, defining the pattern, and extracting the relations. The dataset, downloaded from the NCBI database, contains only sentences. The created model was evaluated with performance measures such as precision, recall, and F-measure, achieving 87% accuracy in retrieving entity relations.
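The define-a-pattern-and-extract step can be illustrated without the spaCy dependency. The sketch below is a simplified stand-in (the entity gazetteer and relation verbs are invented for illustration): it matches an ENTITY ... relation-verb ... ENTITY pattern over tokens and emits (subject, relation, object) triples, which is the shape of output the paper's pipeline produces.

```python
# Invented for illustration: a tiny gazetteer and verb list.
ENTITIES = {"TP53", "apoptosis", "Aspirin", "COX-2"}
RELATIONS = {"regulates", "inhibits"}

def extract_relations(sentence):
    """Match ENTITY ... relation-verb ... ENTITY and return triples."""
    tokens = sentence.rstrip(".").split()
    triples = []
    for i, tok in enumerate(tokens):
        if tok in RELATIONS:
            # Nearest entity to the left is the subject, to the right the object.
            subj = next((t for t in reversed(tokens[:i]) if t in ENTITIES), None)
            obj = next((t for t in tokens[i + 1:] if t in ENTITIES), None)
            if subj and obj:
                triples.append((subj, tok, obj))
    return triples

result = extract_relations("TP53 directly regulates apoptosis.")
# → [("TP53", "regulates", "apoptosis")]
```

In the actual pipeline, spaCy's rule-based matcher plays the role of this loop, with patterns defined over token attributes rather than raw strings.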


NeuroSci ◽  
2020 ◽  
Vol 1 (1) ◽  
pp. 24-43
Author(s):  
Tatsuya Daikoku

Statistical learning is an innate function of the brain and is considered essential for producing and comprehending structured information such as music. Within the framework of statistical learning, the brain can calculate the transitional probabilities of sequences such as speech and music, and predict a future state using learned statistics. This paper computationally examines whether and how statistical learning and knowledge partially contribute to musical representation in jazz improvisation. The results represent the time-course variations in a musician’s statistical knowledge. Furthermore, the findings show that improvisational musical representation might be susceptible to higher- but not lower-order statistical knowledge (i.e., knowledge of higher-order transitional probability). The evidence also demonstrates the individuality of improvisation for each improviser, which in part depends on statistical knowledge. Thus, this study suggests that statistical properties in jazz improvisation underlie the individuality of musical representation.
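The central quantity here, the n-th order transitional probability P(next item | previous n items), is easy to estimate by counting. The sketch below uses a toy pitch sequence (not the paper's jazz corpus) to show how first- and higher-order estimates differ, mirroring the higher- versus lower-order distinction the study draws.

```python
from collections import Counter, defaultdict

def transition_probs(seq, order=1):
    """Estimate P(next | previous `order` items) from a sequence by counting."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        ctx = tuple(seq[i:i + order])
        counts[ctx][seq[i + order]] += 1
    return {ctx: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for ctx, c in counts.items()}

melody = ["C", "E", "G", "C", "E", "G", "C", "D"]
p1 = transition_probs(melody, order=1)   # after "C": E twice, D once
p2 = transition_probs(melody, order=2)   # after "C E": always G
```

Here `p1[("C",)]` is {"E": 2/3, "D": 1/3}, while the second-order model is certain that "C E" is followed by "G": longer contexts yield sharper predictions, at the cost of sparser statistics.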


2019 ◽  
Vol 108 (2) ◽  
pp. 262-277 ◽  
Author(s):  
ANDREW D. BROOKE-TAYLOR ◽  
SHEILA K. MILLER

We show that the isomorphism problems for left distributive algebras, racks, quandles and kei are as complex as possible in the sense of Borel reducibility. These algebraic structures are important for their connections with the theory of knots, links and braids. In particular, Joyce showed that a quandle can be associated with any knot, and this serves as a complete invariant for tame knots. However, such a classification of tame knots heuristically seemed to be unsatisfactory, due to the apparent difficulty of the quandle isomorphism problem. Our result confirms this view, showing that, from a set-theoretic perspective, classifying tame knots by quandles replaces one problem with (a special case of) a much harder problem.
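For readers unfamiliar with the structures involved, the quandle axioms can be verified exhaustively on a small example. The sketch below (illustrative only, unrelated to the Borel-reducibility argument) checks the three axioms for the dihedral quandle on Z_5, one of the standard families of quandles used as knot invariants.

```python
# Dihedral quandle R_5 on Z_5: x ▷ y = 2y - x (mod 5).
n = 5
op = lambda x, y: (2 * y - x) % n
E = range(n)

idempotent = all(op(x, x) == x for x in E)                    # x ▷ x = x
right_bijective = all(len({op(x, y) for x in E}) == n         # x ↦ x ▷ y is a bijection
                      for y in E)
self_distributive = all(op(op(x, y), z) == op(op(x, z), op(y, z))
                        for x in E for y in E for z in E)     # (x▷y)▷z = (x▷z)▷(y▷z)
is_quandle = idempotent and right_bijective and self_distributive
```

These axioms mirror the Reidemeister moves on knot diagrams, which is why Joyce's knot quandle is an invariant; the paper's point is that deciding when two such structures are isomorphic is itself maximally hard.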


2019 ◽  
Vol 375 (1791) ◽  
pp. 20190304 ◽  
Author(s):  
Ryan Calmus ◽  
Benjamin Wilson ◽  
Yukiko Kikuchi ◽  
Christopher I. Petkov

Understanding how the brain forms representations of structured information distributed in time is a challenging endeavour for the neuroscientific community, requiring computationally and neurobiologically informed approaches. The neural mechanisms for segmenting continuous streams of sensory input and establishing representations of dependencies remain largely unknown, as do the transformations and computations occurring between the brain regions involved in these aspects of sequence processing. We propose a blueprint for a neurobiologically informed and informing computational model of sequence processing (entitled: Vector-symbolic Sequencing of Binding INstantiating Dependencies, or VS-BIND). This model is designed to support the transformation of serially ordered elements in sensory sequences into structured representations of bound dependencies, readily operates on multiple timescales, and encodes or decodes sequences with respect to chunked items wherever dependencies occur in time. The model integrates established vector symbolic additive and conjunctive binding operators with neurobiologically plausible oscillatory dynamics, and is compatible with modern spiking neural network simulation methods. We show that the model is capable of simulating previous findings from structured sequence processing tasks that engage fronto-temporal regions, specifying mechanistic roles for regions such as prefrontal areas 44/45 and the frontal operculum during interactions with sensory representations in temporal cortex. Finally, we are able to make predictions based on the configuration of the model alone that underscore the importance of serial position information, which requires input from time-sensitive cells, known to reside in the hippocampus and dorsolateral prefrontal cortex. This article is part of the theme issue ‘Towards mechanistic models of meaning composition’.
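The additive and conjunctive binding operators mentioned above can be demonstrated with a generic holographic-reduced-representation sketch (this is a standard vector-symbolic illustration, not the VS-BIND model itself): circular convolution binds a position vector to an item, superposition adds the bound pairs, and circular correlation approximately unbinds an item from its position.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 512
def vec():
    return rng.normal(0, 1 / np.sqrt(d), d)   # random unit-variance symbol vector

def bind(a, b):     # conjunctive binding: circular convolution via FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):   # approximate inverse: circular correlation
    return np.real(np.fft.ifft(np.fft.fft(c) * np.conj(np.fft.fft(a))))

pos1, pos2, A, B = vec(), vec(), vec(), vec()
seq = bind(pos1, A) + bind(pos2, B)           # "A first, B second" as one vector

probe = unbind(seq, pos2)                     # query: what was in position 2?
sims = {name: float(probe @ v / (np.linalg.norm(probe) * np.linalg.norm(v)))
        for name, v in [("A", A), ("B", B)]}
```

The probe is noisy but correlates far more strongly with B than with A, which is how bound dependencies can be recovered from a single superposed representation.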


2020 ◽  
Vol 45 (4) ◽  
pp. 737-763 ◽  
Author(s):  
Anirban Laha ◽  
Parag Jain ◽  
Abhijit Mishra ◽  
Karthik Sankaranarayanan

We present a framework for generating natural language description from structured data such as tables; the problem comes under the category of data-to-text natural language generation (NLG). Modern data-to-text NLG systems typically use end-to-end statistical and neural architectures that learn from a limited amount of task-specific labeled data, and therefore exhibit limited scalability, domain-adaptability, and interpretability. Unlike these systems, ours is a modular, pipeline-based approach, and does not require task-specific parallel data. Rather, it relies on monolingual corpora and basic off-the-shelf NLP tools. This makes our system more scalable and easily adaptable to newer domains. Our system utilizes a three-staged pipeline that: (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent, and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain data set curated for paragraph description from tables reveal the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other popular data sets covering diverse data types such as knowledge graphs and key-value maps.
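The three stages can be walked through on a toy key-value record. Everything in this sketch, including the record, the predicates, and the pronoun rule, is invented for illustration; it only shows the shape of the pipeline, not the paper's modules.

```python
record = {"name": "Marie Curie", "field": "physics", "award": "the Nobel Prize"}

# (i) canonicalize entries into (subject, predicate, object) tuples
triples = [(record["name"], "worked in", record["field"]),
           (record["name"], "won", record["award"])]

# (ii) generate one simple sentence per atomic entry
sentences = [f"{s} {p} {o}." for s, p, o in triples]

# (iii) compound the sentences; replace the repeated subject with a pronoun,
# a crude stand-in for the co-reference replacement module
rest = [s.replace(record["name"], "She") for s in sentences[1:]]
paragraph = " ".join(sentences[:1] + rest)
# → "Marie Curie worked in physics. She won the Nobel Prize."
```

Because each stage is independent of the others, swapping in a new domain mostly means changing stage (i), which is the source of the scalability claim.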


2015 ◽  
Vol 29 (2) ◽  
pp. 381-396 ◽  
Author(s):  
Miklos A. Vasarhelyi ◽  
Alexander Kogan ◽  
Brad M. Tuttle

SYNOPSIS This paper discusses an overall framework of Big Data in accounting, setting the stage for the ensuing collection of essays that presents the ongoing evolution of corporate data into Big Data, ranging from the structured data contained in modern ERPs to loosely connected unstructured and semi-structured information from the environment. These essays focus on the sources, uses, and challenges of Big Data in accounting (measurement) and auditing (assurance). They consider the changing nature of accounting records and the incorporation of nontraditional sources of data into the accounting and auditing domains, as well as the need for changes in the accounting and auditing standards, and the new opportunities for audit analytics enabled by Big Data. Additionally, the papers discuss the interaction of Big Data and traditional sources of data, as well as Big Data's impact on audit judgment and behavioral research. Both accounting academics and accounting practitioners will benefit from learning about the significant potential benefits of Big Data and the inevitable challenges and obstacles in the way of its utilization. Advanced accounting students would also benefit from exposure to these emerging issues to enhance their future career development.


2021 ◽  
Vol 66 (4) ◽  
pp. 69-80
Author(s):  
Mihai Enăchescu
Continuity and Discontinuity in the Transmission of Spanish Inherited Words Competed by Arabisms: oliva and aceituna, olio and aceite, olivo and aceituno. The loss and replacement of Arabisms by Latin loanwords was a frequent phenomenon between the sixteenth and the seventeenth centuries; the opposite movement, the replacement of an inherited word by an Arabism, is far less frequent. Oliva, an inherited word, competes with the Arabism aceituna; currently the common name for the fruit in the Hispanic world is aceituna, and oliva is restricted to the phrase aceite de oliva or to the name of a colour. Similarly, the inherited word olio was replaced by aceite and, in its specialized meaning, was eliminated by the learned word óleo, its etymological doublet. On the other hand, olivo prevails over aceituno and represents a special case of continuity in this lexical family. The research will be carried out in two directions: first, I will analyse the old academic dictionaries and other specialized dictionaries and glossaries from the fifteenth to the twentieth centuries; second, I will conduct a corpus analysis based on the diachronic corpora available for the Spanish language. This study will try to answer the how? and why? of these neological movements of vocabulary. Keywords: inherited words, Arabisms, oliva, aceituna, lexical substitution


2009 ◽  
Vol 05 (01) ◽  
pp. 123-134 ◽  
Author(s):  
YUTA KAKIMOTO ◽  
KAZUYUKI AIHARA

Binocular rivalry is a perceptual alternation that occurs when different visual images are presented to each eye. Despite intensive study, the mechanism of binocular rivalry remains unclear. In multistable binocular rivalry, a special case of binocular rivalry, it is known that the perceptual alternation between paired patterns is more frequent than that between unpaired patterns. This result suggests that perceptual transition in binocular rivalry is not a simple random process, and that memories stored in the brain can play an important role in the perceptual transition. In this study, we propose a hierarchical chaotic neural network model for multistable binocular rivalry and show that our model reproduces some characteristic features observed in multistable binocular rivalry.
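The alternation phenomenon itself is often illustrated with a minimal mutual-inhibition rate model with slow adaptation. The sketch below is that standard toy model, not the authors' hierarchical chaotic network; two populations, one per eye's image, take turns dominating as the winner's adaptation builds up.

```python
import numpy as np

dt, steps = 0.01, 4000                  # 40 s of simulated time, Euler steps
tau, tau_a = 0.02, 1.0                  # rate / adaptation time constants (s)
beta, phi, I = 1.5, 1.0, 1.2            # inhibition, adaptation strength, input
r = np.array([0.1, 0.0])                # firing rates of the two populations
a = np.zeros(2)                         # slow adaptation variables
dominant = []
for _ in range(steps):
    drive = I - beta * r[::-1] - phi * a          # cross-inhibition + self-adaptation
    r += dt / tau * (-r + np.clip(drive, 0.0, None))
    a += dt / tau_a * (-a + r)
    dominant.append(int(r[1] > r[0]))             # which population currently wins
switches = sum(x != y for x, y in zip(dominant, dominant[1:]))
```

In this deterministic model the switch times are regular; the paper's point is that chaotic dynamics and stored memories shape the statistics of such transitions beyond what a simple model like this captures.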


2011 ◽  
Vol 23 (1) ◽  
pp. 160-182 ◽  
Author(s):  
S. M. M. Martens ◽  
J. M. Mooij ◽  
N. J. Hill ◽  
J. Farquhar ◽  
B. Schölkopf

We present a graphical model framework for decoding in the visual ERP-based speller system. The proposed framework allows researchers to build generative models from which the decoding rules are obtained in a straightforward manner. We suggest two models for generating brain signals conditioned on the stimulus events. Both models incorporate letter frequency information but assume different dependencies between brain signals and stimulus events. For both models, we derive decoding rules and perform a discriminative training. We show on real visual speller data how decoding performance improves by incorporating letter frequency information and using a more realistic graphical model for the dependencies between the brain signals and the stimulus events. Furthermore, we discuss how the standard approach to decoding can be seen as a special case of the graphical model framework. The letter also gives more insight into the discriminative approach for decoding in the visual speller system.
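The benefit of letter frequency information can be shown with a toy Bayesian decoding step (all numbers invented for illustration, and much simpler than the paper's generative models): the posterior combines a letter-frequency prior with the likelihood of the observed brain signals, instead of relying on the likelihood alone.

```python
import numpy as np

letters = ["E", "T", "Q"]
prior = np.array([0.60, 0.35, 0.05])    # hypothetical letter-frequency prior
loglik = np.array([-2.2, -2.0, -1.9])   # hypothetical log P(signals | letter attended)

posterior = np.log(prior) + loglik      # unnormalized log-posterior per letter
map_letter = letters[int(np.argmax(posterior))]
ml_letter = letters[int(np.argmax(loglik))]
```

Here maximum likelihood alone would pick the rare letter "Q", while the prior shifts the decision to the far more frequent "E", which is exactly the kind of correction the frequency-aware graphical model provides when the brain-signal evidence is weak.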

