scholarly journals MODEL OF LINGUISTIC ONTOLOGY WITH FUZZY SEMANTIC RELATIONS GENERATED ON BASIS OF WIKIPEDIA

2016 ◽  
Vol 2016 (1) ◽  
pp. 134-139
Author(s):  
Дмитрий Кравцов ◽  
Dmitriy Kravtsov ◽  
Евгений Леонов ◽  
Evgeniy Leonov

Applying ontological knowledge can considerably improve the quality of solutions to natural language processing problems. A number of researchers use Wikipedia as a basis for building such resources. This paper reports a method for formalizing Wikipedia structures and the linguistic ontology used in the authors' system for building a linguistic ontology of a specified subject field from Wikipedia. Articles and the links connecting them are used to form a weighted ontology graph in which nodes correspond to concepts and edges correspond to fuzzy semantic relations between them. Links receive different weights depending on which information unit of the page they appear in. The graph of relations makes it possible to estimate numerically the degree of semantic proximity of two arbitrary concepts, and different measures of semantic proximity can be used for this purpose. Recursive measures have considerable computational complexity while giving only an insignificant improvement in quality on test problems compared with non-recursive local measures such as the Dice measure, which is unacceptable for a sufficiently large ontology. For these reasons the weighted Dice measure is chosen as the basic measure for the system under development.
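
As a rough illustration of a local, non-recursive proximity measure on a weighted graph, the sketch below computes one common weighted variant of the Dice coefficient. The abstract does not give the authors' exact formula, so the min-based overlap, the function name `weighted_dice`, and the toy graph are all assumptions made for illustration.

```python
def weighted_dice(graph, a, b):
    """Weighted Dice proximity of nodes a and b.

    graph maps each node to a dict {neighbor: link weight}.
    Assumed formula (not taken from the paper):
        2 * sum(min(w_a(n), w_b(n)) over shared neighbors n)
        / (sum of a's weights + sum of b's weights)
    """
    na = graph.get(a, {})
    nb = graph.get(b, {})
    total = sum(na.values()) + sum(nb.values())
    if total == 0:
        return 0.0
    shared = sum(min(na[n], nb[n]) for n in na.keys() & nb.keys())
    return 2.0 * shared / total


# Toy graph with hypothetical link weights.
toy_graph = {
    "cat": {"mammal": 1.0, "pet": 0.5},
    "dog": {"mammal": 1.0, "pet": 0.5, "wolf": 0.25},
    "car": {"vehicle": 1.0},
}

print(weighted_dice(toy_graph, "cat", "dog"))
print(weighted_dice(toy_graph, "cat", "car"))
```

The measure only inspects the direct neighbors of the two nodes, which is what keeps it cheap compared with recursive measures on a large ontology.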

2020 ◽  
Author(s):  
Vadim V. Korolev ◽  
Artem Mitrofanov ◽  
Kirill Karpov ◽  
Valery Tkachenko

The main advantage of modern natural language processing methods is the ability to turn an amorphous human-readable task into a strict mathematical form. This makes it possible to extract chemical data and insights from articles and to find new semantic relations. We propose a universal engine for processing chemical and biological texts. We successfully tested it on various use cases and applied it to the search for a therapeutic agent for COVID-19 by analyzing the PubMed archive.


2021 ◽  
Vol 48 (4) ◽  
pp. 41-44
Author(s):  
Dena Markudova ◽  
Martino Trevisan ◽  
Paolo Garza ◽  
Michela Meo ◽  
Maurizio M. Munafo ◽  
...  

With the spread of broadband Internet, Real-Time Communication (RTC) platforms have become increasingly popular and have transformed the way people communicate. Thus, it is fundamental that the network adopts traffic management policies that ensure appropriate Quality of Experience to users of RTC applications. A key step for this is the identification of the applications behind RTC traffic, which in turn makes it possible to allocate adequate resources and make decisions based on the specific application's requirements. In this paper, we introduce a machine learning-based system for identifying the traffic of RTC applications. It builds on the domains contacted before starting a call and leverages techniques from Natural Language Processing (NLP) to build meaningful features. Our system works in real time and is robust to the peculiarities of the RTP implementations of different applications, since it uses only control traffic. Experimental results show that our approach classifies 5 well-known meeting applications with an F1 score of 0.89.
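
One plausible way to turn contacted domains into NLP-style features is to treat each domain name as a short text and count its tokens. The abstract does not describe the feature pipeline in detail, so the tokenization rule, the function names, and the `example.com` domains below are hypothetical; this is only a minimal sketch.

```python
import re
from collections import Counter


def domain_tokens(domain):
    """Split a domain name into lowercase tokens on dots and hyphens."""
    return [t for t in re.split(r"[.\-]", domain.lower()) if t]


def bag_of_tokens(domains):
    """Count tokens over all domains contacted before a call."""
    counts = Counter()
    for d in domains:
        counts.update(domain_tokens(d))
    return counts


# Hypothetical domains observed shortly before a call starts.
contacted = ["meet.example.com", "stun.meet.example.net", "cdn-media.example.com"]
features = bag_of_tokens(contacted)
print(features.most_common(3))
```

The resulting token counts could then feed any standard classifier; which classifier the authors use is not stated in the abstract.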


Proceedings ◽  
2021 ◽  
Vol 77 (1) ◽  
pp. 17
Author(s):  
Andrea Giussani

In the last decade, advances in statistical modeling and computer science have boosted the production of machine-produced content in different fields: from language to image generation, the quality of the generated outputs is remarkably high, sometimes better than that of outputs produced by a human being. Modern technological advances such as OpenAI’s GPT-2 (and recently GPT-3) permit automated systems to dramatically alter reality with synthetic outputs so that humans are not able to distinguish the real copy from its counterfeits. An example is given by an article entirely written by GPT-2, but many other examples exist. In the field of computer vision, Nvidia’s Generative Adversarial Network, commonly known as StyleGAN (Karras et al. 2018), has become the de facto reference point for the production of a huge amount of fake human face portraits; additionally, recent algorithms were developed to create both musical scores and mathematical formulas. This presentation aims to introduce participants to the state-of-the-art results in this field: we will cover both GANs and language modeling with recent applications. The novelty here is that we apply a transformer-based machine learning technique, namely RoBERTa (Liu et al. 2019), to the detection of human-produced versus machine-produced text in the context of fake news detection. RoBERTa is a recent algorithm based on the well-known Bidirectional Encoder Representations from Transformers algorithm, known as BERT (Devlin et al. 2018); this is a bidirectional transformer for natural language processing developed by Google and pre-trained on a huge amount of unlabeled textual data to learn embeddings. We will then use these representations as input to our classifier to distinguish real from machine-produced text. The application is demonstrated in the presentation.


Author(s):  
J. Matthew Brennan ◽  
Angela Lowenstern ◽  
Paige Sheridan ◽  
Isabel J. Boero ◽  
Vinod H. Thourani ◽  
...  

Background Patients with symptomatic severe aortic stenosis (ssAS) have a high mortality risk and compromised quality of life. Surgical/transcatheter aortic valve replacement (AVR) is a Class I recommendation, but it is unclear if this recommendation is uniformly applied. We determined the impact of managing cardiologists on the likelihood of ssAS treatment. Methods and Results Using natural language processing of Optum electronic health records, we identified 26 438 patients with newly diagnosed ssAS (2011–2016). Multilevel, multivariable Fine‐Gray competing risk models clustered by cardiologists were used to determine the impact of cardiologists on the likelihood of 1‐year AVR treatment. Within 1 year of diagnosis, 35.6% of patients with ssAS received an AVR; however, rates varied widely among managing cardiologists (0%, lowest quartile; 100%, highest quartile [median, 29.6%; 25th–75th percentiles, 13.3%–47.0%]). The odds of receiving AVR varied >2‐fold depending on the cardiologist (median odds ratio for AVR, 2.25; 95% CI, 2.14–2.36). Compared with patients with ssAS of cardiologists with the highest treatment rates, those treated by cardiologists with the lowest AVR rates experienced significantly higher 1‐year mortality (lowest quartile, adjusted hazard ratio, 1.22, 95% CI, 1.13–1.33). Conclusions Overall AVR rates for ssAS were low, highlighting a potential challenge for ssAS management in the United States. Cardiologist AVR use varied substantially; patients treated by cardiologists with lower AVR rates had higher mortality rates than those treated by cardiologists with higher AVR rates.


2018 ◽  
Vol 25 (6) ◽  
pp. 726-733
Author(s):  
Maria S. Karyaeva ◽  
Pavel I. Braslavski ◽  
Valery A. Sokolov

The ability to identify semantic relations between words has made the word2vec model widely used in NLP tasks. The idea of word2vec is based on a simple rule: two words are more similar if they occur in similar contexts. Each word can be represented as a vector, so words whose vectors lie closest together can be interpreted as similar. This makes it possible to establish semantic relations (synonymy, hypernymy, hyponymy and others) by automatic extraction. Extracting semantic relations by hand is considered a time-consuming and biased task that requires the help of experts. Unfortunately, the word2vec model provides an associative list of words that does not consist of related words only. In this paper, we show some additional criteria that may be applicable to solve this problem. Observations and experiments with well-known characteristics, such as word frequency and position in the associative list, might be useful for improving the extraction of semantic relations for the Russian language using word embeddings. In the experiments, a word2vec model trained on Flibusta is used, and pairs from Wiktionary serve as examples of semantic relationships. Semantically related words are applicable to thesauri, ontologies and intelligent systems for natural language processing.
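
The idea of filtering an associative list by position and word frequency can be sketched as follows. The toy vectors, the frequency table, and the thresholds `top_n` and `min_freq` are hypothetical values chosen for illustration, not the criteria or settings used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def candidates(word, vectors, freq, top_n=2, min_freq=5):
    """Return associates of `word` that pass two simple filters.

    1. Position: keep only the top_n nearest neighbors by cosine similarity.
    2. Frequency: drop neighbors seen fewer than min_freq times in the corpus.
    """
    ranked = sorted(
        (w for w in vectors if w != word),
        key=lambda w: cosine(vectors[word], vectors[w]),
        reverse=True,
    )
    return [w for w in ranked[:top_n] if freq.get(w, 0) >= min_freq]


# Toy two-dimensional embeddings and corpus frequencies (made up).
vectors = {"car": [1.0, 0.0], "auto": [0.9, 0.1], "truck": [0.8, 0.3], "tree": [0.0, 1.0]}
freq = {"car": 100, "auto": 3, "truck": 40, "tree": 80}

print(candidates("car", vectors, freq))
```

In practice the vectors would come from a trained embedding model rather than a hand-written dictionary.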


Author(s):  
Kaan Ant ◽  
Ugur Sogukpinar ◽  
Mehmet Fatif Amasyali

Databases containing semantic relationships between words are increasingly used to make natural language processing more effective. Unlike the bag-of-words approach, the suggested semantic spaces give the distances between words, but they do not express the relation types. In this study, it is shown how semantic spaces can be used to find the type of relationship, and the approach is compared with the template method. According to results obtained on a very large scale, semantic spaces are more successful for the is_a and opposite relations, while the template approach is more successful for the at_location, made_of and non-relational types.


2019 ◽  
Vol 34 (4) ◽  
pp. 295-310
Author(s):  
Huyen T M Nguyen ◽  
Hung V Nguyen ◽  
Quyen T Ngo ◽  
Luong X Vu ◽  
Vu Mai Tran ◽  
...  

Sentiment analysis is a natural language processing (NLP) task of identifying or extracting the sentiment content of a text unit. This task has been an active research topic since the early 2000s. During the last two editions of the VLSP workshop series, a shared task on Sentiment Analysis (SA) for Vietnamese was organized in order to provide an objective evaluation of the performance (quality) of sentiment analysis tools, to encourage the development of Vietnamese sentiment analysis systems, and to provide benchmark datasets for this task. The first campaign in 2016 focused only on sentiment polarity classification, with a dataset containing reviews of electronic products. The second campaign in 2018 addressed the problem of Aspect Based Sentiment Analysis (ABSA) for Vietnamese by providing two datasets containing reviews in the restaurant and hotel domains. These data are accessible for research purposes via the VLSP website vlsp.org.vn/resources. This paper describes the datasets that were built as well as the evaluation results of the systems participating in these campaigns.


2016 ◽  
Vol 6 (3) ◽  
pp. 258
Author(s):  
Gabriela Mariel Zunino

In order to promote the practical application of psycholinguistic data in educational fields, and expecting that this transfer will enhance the development of both the pedagogical field and research in experimental psycholinguistics, we present two experiments that analyse the production of semantic relations in discourse, especially the causality/countercausality dimension. We found that the pattern of causal advantage is cross-cutting and consistent in subjects with different levels of formal education, so it could be a suitable scaffold for developing other aspects of discourse comprehension and production. We compare our results with previous findings about discourse comprehension and interpret the data in the framework of educational processes. Using empirical evidence about language processing in educational fields makes it possible not only to review specific issues such as the characteristics of teaching materials, but also to improve the educational process in a comprehensive way, making it possible to adapt different approaches to populations with different characteristics.


Vector representations for language have been shown to be useful in a number of Natural Language Processing tasks. In this paper, we investigate the effectiveness of word vector representations for the problem of Sentiment Analysis. In particular, we target three sub-tasks, namely sentiment word extraction, detection of the polarity of sentiment words, and text sentiment prediction. We investigate the effectiveness of vector representations over different text data and evaluate the quality of domain-dependent vectors. Vector representations have been used to compute various vector-based features, and systematic experiments have been conducted to demonstrate their effectiveness. Using simple vector-based features can achieve better results for text sentiment analysis of APP.
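
One of the simplest vector-based features for text sentiment prediction is the average of the word vectors in a text. The abstract does not list the features actually used, so the sketch below, with its function name `mean_vector` and made-up two-dimensional vectors, is only an assumed example of the general idea.

```python
def mean_vector(tokens, vectors):
    """Average the vectors of the tokens found in `vectors`.

    Returns None when no token has a known vector.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy word vectors (hypothetical values, two dimensions for readability).
toy_vectors = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "phone": [0.0, 1.0]}

text_feature = mean_vector(["good", "phone", "xyz"], toy_vectors)
print(text_feature)
```

The averaged vector can be passed to any classifier as a fixed-length representation of the text; out-of-vocabulary tokens such as "xyz" are simply skipped.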


10.2196/20443 ◽  
2020 ◽  
Vol 22 (7) ◽  
pp. e20443
Author(s):  
Xiaoying Li ◽  
Xin Lin ◽  
Huiling Ren ◽  
Jinjing Guo

Background Licensed drugs may cause unexpected adverse reactions in patients, resulting in morbidity, risk of mortality, therapy disruptions, and prolonged hospital stays. Officially approved drug package inserts list the adverse reactions identified from randomized controlled clinical trials with high evidence levels and worldwide postmarketing surveillance. Formal representation of the adverse drug reaction (ADR) enclosed in semistructured package inserts will enable deep recognition of side effects and rational drug use, substantially reduce morbidity, and decrease societal costs. Objective This paper aims to present an ontological organization of traceable ADR information extracted from licensed package inserts. In addition, it will provide machine-understandable knowledge for bioinformatics analysis, semantic retrieval, and intelligent clinical applications. Methods Based on the essential content of package inserts, a generic ADR ontology model is proposed from two dimensions (and nine subdimensions), covering the ADR information and medication instructions. This is followed by a customized natural language processing method programmed with Python to retrieve the relevant information enclosed in package inserts. After the biocuration and identification of retrieved data from the package insert, an ADR ontology is automatically built for further bioinformatic analysis. Results We collected 165 package inserts of quinolone drugs from the National Medical Products Administration and other drug databases in China, and built a specialized ADR ontology containing 2879 classes and 15,711 semantic relations. For each quinolone drug, the reported ADR information and medication instructions have been logically represented and formally organized in an ADR ontology. To demonstrate its usage, the source data were further bioinformatically analyzed. For example, the number of drug-ADR triples and major ADRs associated with each active ingredient were recorded. 
The 10 ADRs most frequently observed among quinolones were identified and categorized based on the 18 categories defined in the proposal. The occurrence frequency, severity, and ADR mitigation method explicitly stated in package inserts were also analyzed, as well as the top 5 specific populations with contraindications for quinolone drugs. Conclusions Ontological representation and organization using officially approved information from drug package inserts enables the identification and bioinformatic analysis of adverse reactions caused by a specific drug with regard to predefined ADR ontology classes and semantic relations. The resulting ontology-based ADR knowledge source classifies drug-specific adverse reactions, and supports a better understanding of ADRs and safer prescription of medications.
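
The kind of counting described above (drug-ADR triples per active ingredient and the most frequent ADRs) can be sketched with plain tuples. The triple layout, the predicate name `has_adr`, and the placeholder identifiers `drug_A` and `adr_1` are hypothetical and do not come from the ontology built in the paper.

```python
from collections import Counter

# Toy (drug, predicate, ADR) triples with placeholder names.
triples = [
    ("drug_A", "has_adr", "adr_1"),
    ("drug_A", "has_adr", "adr_2"),
    ("drug_B", "has_adr", "adr_1"),
    ("drug_B", "has_adr", "adr_1"),  # same statement repeated in a second source
]

# Deduplicate so that a repeated statement is counted once.
unique_triples = set(triples)

# Number of distinct drug-ADR triples per drug.
triples_per_drug = Counter(drug for drug, _, _ in unique_triples)

# Number of drugs for which each ADR is reported.
adr_frequency = Counter(adr for _, _, adr in unique_triples)

print(triples_per_drug)
print(adr_frequency.most_common(1))
```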

