Ranking of documents of topical corpus according to their mutual relevance in the problem of estimating of affinity of a text to the sense standard

2021 ◽  
Vol 2052 (1) ◽  
pp. 012027
Author(s):  
D V Mikhaylov ◽  
G M Emelyanov

Abstract The paper is devoted to the problem of the unity and integrity of the image of a semantic pattern (i.e., a sense standard) revealed phrase by phrase for some text within a topical collection. A phrase here corresponds to an extended natural-language sentence. Affinity to the standard is estimated by classifying the words of each phrase in a text according to their TF-IDF values relative to a text corpus whose documents are pre-selected by an expert. The essence of the problem is that, for each phrase, maximal affinity to the sense standard is achieved with respect to an individual corpus document; it is therefore necessary to estimate the mutual relevance of such documents across the different phrases of the analyzed text. Based on the distances between the TF-IDF vectors for the words of a separate phrase obtained relative to different corpus documents, a significance estimate for each such document is introduced in order to choose a pair of mutually relevant documents.
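The scoring idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' exact formulation: the function names, the smoothed IDF, and the use of the total TF-IDF as the per-document significance estimate are assumptions made for the sketch.

```python
import math
from collections import Counter

def tfidf_vector(phrase_words, doc, corpus):
    """TF-IDF values of the phrase's words, computed relative to one corpus document."""
    tf = Counter(doc)
    n_docs = len(corpus)
    vec = []
    for w in phrase_words:
        df = sum(1 for d in corpus if w in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF (assumption)
        vec.append((tf[w] / len(doc)) * idf)
    return vec

def euclidean(u, v):
    """Distance between two TF-IDF vectors of the same phrase."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def doc_distances(phrase_words, corpus):
    """Pairwise distances between the phrase's TF-IDF vectors taken
    relative to each pair of corpus documents."""
    vecs = [tfidf_vector(phrase_words, d, corpus) for d in corpus]
    return {(i, j): euclidean(vecs[i], vecs[j])
            for i in range(len(vecs)) for j in range(i + 1, len(vecs))}

def best_document(phrase_words, corpus):
    """Index of the document with the highest total TF-IDF for the phrase,
    used here as a simple significance estimate (an assumption)."""
    vecs = [tfidf_vector(phrase_words, d, corpus) for d in corpus]
    return max(range(len(corpus)), key=lambda i: sum(vecs[i]))
```

Each phrase thus selects its best-matching document, and the pairwise distances give the raw material for estimating mutual relevance across phrases.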

2021 ◽  
pp. 1-13
Author(s):  
Lamiae Benhayoun ◽  
Daniel Lang

BACKGROUND: The renewed advent of Artificial Intelligence (AI) is inducing profound changes in the classic categories of technology professions and is creating the need for new specific skills. OBJECTIVE: To identify the skills gaps between academic training on AI in French engineering and business schools and the requirements of the labour market. METHOD: Extraction of AI training contents from the schools' websites and scraping of a job-advertisement website, followed by analysis based on a text-mining approach using Python code for Natural Language Processing. RESULTS: Categorization of occupations related to AI and characterization of three classes of skills for the AI market: technical, soft, and interdisciplinary. The skills gaps concern some professional certifications, the mastery of specific tools, research abilities, and awareness of the ethical and regulatory dimensions of AI. CONCLUSIONS: A deep analysis using Natural Language Processing algorithms provides a better understanding of the AI capability components at the individual and organizational levels, and can help shape educational programs to respond to AI market requirements.
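The core comparison in the METHOD step can be illustrated with a toy frequency-gap sketch: terms that are common in job advertisements but rare in curricula surface as candidate skills gaps. The tokenizer, threshold, and example terms below are illustrative assumptions; the study's actual NLP pipeline is richer.

```python
import re
from collections import Counter

def term_freqs(texts):
    """Relative frequency of each lowercase token across a list of texts."""
    tokens = [t for text in texts
              for t in re.findall(r"[a-zA-Z][a-zA-Z+#.-]*", text.lower())]
    total = len(tokens)
    return {w: c / total for w, c in Counter(tokens).items()}

def skill_gaps(job_ads, curricula, threshold=2.0):
    """Terms over-represented in job ads relative to academic contents,
    ranked by the over-representation ratio."""
    jobs, courses = term_freqs(job_ads), term_freqs(curricula)
    gaps = {}
    for term, f in jobs.items():
        ratio = f / courses.get(term, 1e-9)  # tiny floor for unseen terms
        if ratio >= threshold:
            gaps[term] = ratio
    return sorted(gaps, key=gaps.get, reverse=True)
```

Terms appearing in both corpora at similar rates drop out; terms absent from the curricula rank highest.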


2019 ◽  
Vol 17 (2) ◽  
pp. 138-152
Author(s):  
I. S. Postanogov ◽  
I. A. Turova

In this paper we discuss how to support the process of creating tools which transform natural language (NL) queries into SPARQL queries (hereinafter referred to as transformation tools). In the introduction, we describe the relevance of the task of understanding natural language queries in information systems, as well as the advantages of using ontologies as a means of representing knowledge for solving this problem. This ontology-based data access approach can also be used in systems which provide a natural language interface to databases. Based on an analysis of the problems related to integrating and testing existing transformation tools, as well as to creating and testing one's own transformation modules, we propose the concept of a software platform that simplifies these tasks. The platform architecture satisfies the requirements for ease of connecting third-party transformation tools, reusing individual modules, and integrating the resulting transformation tools into other systems, including testing systems. The building blocks of the created transformation systems are individual transformation modules packaged in Docker containers. Program access to each module is carried out using gRPC. Modules loaded into the platform can be built into the transformation pipeline automatically, or manually using the integrated third-party SciVi data flow diagram editor. Compatibility of individual modules is controlled by automatic analysis of their application programming interfaces. The resulting pipeline is combined, according to the specified data flow, into a single multi-container application that can be integrated into other systems and tested on extendable test suites. The expected and actual results of the query transformation are available for viewing in graphical form in a previously developed visualization tool.
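The pipeline-assembly and compatibility-checking ideas can be sketched in-process. In the platform each module would run in its own Docker container behind a gRPC interface; here, as a simplifying assumption, modules are plain callables with declared input/output types, and the type check stands in for the automatic API-compatibility analysis.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Module:
    """One transformation stage with a declared interface."""
    name: str
    in_type: str
    out_type: str
    fn: Callable[[str], str]

def build_pipeline(modules: List[Module]) -> Callable[[str], str]:
    """Chain modules into one transformer, refusing incompatible neighbours."""
    for a, b in zip(modules, modules[1:]):
        if a.out_type != b.in_type:
            raise TypeError(f"{a.name} -> {b.name}: {a.out_type} != {b.in_type}")
    def run(query: str) -> str:
        for m in modules:
            query = m.fn(query)
        return query
    return run
```

A toy NL-to-SPARQL chain shows the usage (the modules themselves are placeholders, not real transformation tools):

```python
normalize = Module("normalize", "nl", "nl_norm", str.lower)
to_sparql = Module("to_sparql", "nl_norm", "sparql",
                   lambda q: f'SELECT ?x WHERE {{ ?x rdfs:label "{q}" }}')
pipe = build_pipeline([normalize, to_sparql])
pipe("Moscow")  # -> 'SELECT ?x WHERE { ?x rdfs:label "moscow" }'
```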


Author(s):  
Shalin Hai-Jew

People go to virtual immersive spaces online to socialize through their human-embodied avatars. Through the “passing stranger” phenomenon, many form fast relationships and share intimate information on the assumption that they will never deal with the individual again. Others, though, pursue longer-term relationships from the virtual into Real Life (RL). Many do not realize that they are interacting with artificial intelligence (AI) bots with natural language capabilities. This chapter models some implications of malicious AI natural-language bots in immersive virtual worlds (as socio-technical spaces). For simplicity, the interaction is referred to as one-on-one, but that is not to assume that various combinations of malicious bots, or bots that are occasionally human-embodied, may not be deployed for the same deceptive purposes.


2014 ◽  
Vol 64 (2) ◽  
Author(s):  
Veronika Ries

Within the scope of my investigation of the language use and language attitudes of People of German Descent from the USSR, I regularly find various language contact phenomena, such as viel bliny habn=wir gbackt (engl.: 'we cooked lots of pancakes') (cf. Ries 2011). The aim of the analysis is to examine both language use, with regard to different forms of language contact, and the language attitudes of the observed speakers. To analyse both of these aspects and synthesize them, different types of data are required. The research is based on two data types: everyday conversations and interviews. In addition, the individual speakers' biographies are a key part of the analysis, because they allow one to draw conclusions about language attitudes and use. This qualitative research is based on morpho-syntactic and interactional linguistic analysis of authentic spoken data. The data come from a corpus compiled and edited by myself; being a member of the examined group allowed me to build up an authentic corpus. Natural language use is analysed from the perspective of different language contact phenomena and the potential functions of language alternations. One central issue is: how do speakers use the languages available to them, German and Russian? Structural characteristics such as code-switching, and the discursive motives for these phenomena, are discussed as results, together with the socio-cultural background of the individual speakers. Within the scope of this article I present, as an example, the data and results of one speaker.


2019 ◽  
Vol 22 (3) ◽  
Author(s):  
Dildre Georgiana Vasques ◽  
Paulo Sérgio Martins ◽  
Solange Oliveira Rezende

The discovery of knowledge in textual databases is an approach that basically seeks implicit relationships between different concepts in different documents written in natural language, in order to identify new useful knowledge. To assist in this process, this approach can count on the help of Text Mining techniques. Despite all the progress made, researchers in this area must still deal with the large number of false relationships generated by most of the available processes. A statistical and verbal-semantic approach that supports the understanding of the logic between relationships may bridge this gap. Thus, the objective of this work is to support the user in the identification of implicit relationships between concepts present in different texts, considering the causal relationships between concepts in the texts. To this end, this work proposes a hybrid approach for the discovery of implicit knowledge present in a text corpus, using analysis based on association rules together with metrics from complex networks and verbal semantics. Through a case study, a set of texts from alternative medicine was selected, and the different extractions showed that the proposed approach facilitates the identification of implicit knowledge by the user.
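The skeleton of implicit-relationship discovery can be sketched as Swanson-style closed discovery over co-occurrence links: two concepts never mentioned together become a candidate hidden relationship when they share an intermediate concept. This is a minimal sketch of the idea only; the proposed hybrid approach adds association-rule mining, complex-network metrics, and verbal semantics on top of it.

```python
from itertools import combinations
from collections import defaultdict

def cooccurrence(docs):
    """Count, per concept pair, the documents mentioning both concepts."""
    pairs = defaultdict(int)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

def implicit_links(docs, min_support=1):
    """A-C concept pairs never co-mentioned directly but connected
    through at least one shared intermediate concept B."""
    pairs = {p for p, n in cooccurrence(docs).items() if n >= min_support}
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    hidden = set()
    for a in neighbours:
        for c in neighbours:
            if a < c and c not in neighbours[a] and neighbours[a] & neighbours[c]:
                hidden.add((a, c))
    return hidden
```

On Swanson's classic example, documents linking fish oil to blood viscosity and blood viscosity to Raynaud's syndrome would surface the fish-oil/Raynaud pair as implicit knowledge.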


2021 ◽  
Vol 15 (1) ◽  
pp. 72-82
Author(s):  
Alexey Andreevich Arzamazov

The article comprehensively examines the poetry of Andrei Tarkhanov, a classic of Mansi literature. The place of this national literature in the system of the vast interliterary community of the peoples of the circumpolar zone is determined, and the genetic connection of the Mansi artistic tradition with mythological ideas and folklore is emphasized. The most common, creatively significant themes and problems are identified: the author's great interest in the themes of time, history, and the cultural and civilizational identity of Russia is shown; the individual author's artistic panorama of the post-Soviet era is analyzed; and the socio-psychological contexts of the author's / lyrical subject's experience of modernity are considered. The plots of Tarkhanov's appeal to Russian literature are studied, with attention paid to the personological aspect. The focus of the reading falls on the poet's numerous attempts to comprehend Christian narratives in poetry and to delve into the axiology and ritual practices of Orthodoxy. In interpreting the Mansi poet's text corpus, we take into account his linguistic choice, which significantly narrows the associative and semantic links with “his” Mansi literature. In general, the realities and paradoxes of A. Tarkhanov's artistic worldview are identified, as manifested at the figurative-symbolic level and in the author's choice of themes, situations of poetic actualization, and aesthetic guidelines. In his inner nature, Andrei Tarkhanov was less a Mansi poet than the legendary Yuvan Shestalov. It seems that he wanted to break out of the Mansi “vicious circle” and at the same time understood that a creative break with his roots and ethnocultural origins would significantly affect the quality and originality of his texts. He constantly had to balance between the Russian and Mansi worlds and their artistic mentalities.


The importance of incorporating Natural Language Processing (NLP) techniques into clinical informatics research has been increasingly recognized over recent years and has led to transformative advances. Typically, clinical NLP systems are developed and evaluated on word-, sentence-, or document-level annotations that model specific attributes and features, such as document content (e.g., patient status, or report type), document section types (e.g., current medications, past medical history, or discharge summary), named entities and concepts (e.g., diagnoses, symptoms, or treatments), or semantic attributes (e.g., negation, severity, or temporality). While some NLP tasks consider predictions at the individual or group user level, these tasks still constitute a minority. Here we give a broad summary and outline of the challenging issues involved in defining appropriate intrinsic and extrinsic evaluation methods for NLP research that will be used in clinical outcomes research, and vice versa. A particular focus is placed on mental health research, an area still relatively understudied by the clinical NLP research community, but where NLP methods are of notable relevance. Recent advances in clinical NLP method development have been substantial, but we propose that more emphasis should be placed on rigorous evaluation for the field to advance further. To enable this, we provide actionable suggestions, including a minimal protocol that could be applied when reporting clinical NLP method development and its evaluation.
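A concrete instance of intrinsic evaluation at the annotation level is per-label precision, recall, and F1 over gold versus predicted labels. The sketch below is a generic illustration of that standard computation, not a protocol from the article.

```python
from collections import Counter

def per_class_f1(gold, pred):
    """Precision, recall and F1 per label, a basic intrinsic evaluation
    for token- or document-level annotations."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but p was wrong
            fn[g] += 1  # gold g was missed
    scores = {}
    for label in set(gold) | set(pred):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = {"precision": prec, "recall": rec, "f1": f1}
    return scores
```

Extrinsic evaluation, by contrast, would measure how such predictions change a downstream clinical-outcomes analysis, which is exactly where the article argues more rigour is needed.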


Author(s):  
Olga Acosta ◽  
Gerardo Sierra ◽  
César Aguilar

The automatic extraction of hyponymy-hypernymy relations from text corpora is an important task in Natural Language Processing. This chapter proposes a method for automatically extracting a set of hyponym-hyperonym pairs from a medical corpus in Spanish, expressed in analytical definitions. This kind of definition is composed of a term (the hyponym), a genus term (the hyperonym), and one or more differentiae, that is, a set of particular features proper to the defined term, e.g.: conjunctivitis is an infection of the conjunctiva of the eye. Definitions are obtained from definitional contexts, from which sequences of term and genus term are then extracted. The most frequent hyperonyms are then used to filter relevant definitions. Additionally, using a bootstrapping technique, new hyponym candidates are extracted from the corpus, based on the previously detected set of hyponyms and hyperonyms.
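The extract-then-filter steps can be sketched with a single Spanish copular pattern ("<term> es un/una <genus>"). The regex, the example sentences, and the retention of leading articles in the term are simplifying assumptions; the chapter extracts definitional contexts with richer grammars before applying the frequency filter and bootstrapping.

```python
import re
from collections import Counter

# Analytical definitions of the form "<term> es un|una <genus> ..." (illustrative pattern).
DEF_PATTERN = re.compile(r"^(?P<term>\w[\w ]*?) es (?:un|una) (?P<genus>\w+)",
                         re.IGNORECASE)

def extract_pairs(sentences):
    """Return (hyponym, hyperonym) pairs found in analytical definitions."""
    pairs = []
    for s in sentences:
        m = DEF_PATTERN.match(s.strip())
        if m:
            pairs.append((m.group("term").lower(), m.group("genus").lower()))
    return pairs

def frequent_hyperonyms(pairs, min_count=2):
    """Keep only pairs whose hyperonym occurs at least min_count times,
    mirroring the frequency filter described above."""
    counts = Counter(h for _, h in pairs)
    return [p for p in pairs if counts[p[1]] >= min_count]
```

For the running example, "La conjuntivitis es una infección de la conjuntiva del ojo" yields the pair (conjuntivitis, infección); hyperonyms seen repeatedly across definitions survive the filter and seed the bootstrapping step.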

