A NEW APPROACH FOR TEXT SIMILARITY USING ARTICLES

Author(s):  
ELSAYED ATLAM

Conventional approaches to text analysis and information retrieval, which measure document similarity by considering all the information in a text, are relatively inefficient for processing large text collections in heterogeneous subject areas. Previous research showed that evidence from passages can improve retrieval results, but it also raised questions about how passages are defined, how they can be ranked efficiently, and what their proper role is in long structured documents. Moreover, the frequency of "the" in important sentences can be used to summarize a text efficiently. We previously proposed an approach that extracts sentences containing the article "the" under a set of restrictive rules in order to obtain effective passages. Building on that work, this paper presents a new Passage SIMilarity (P-SIM) measure between documents, computed on the effective passages extracted using the article "the". Our results show that this method is more efficient than traditional methods. Recall and Precision reach 92.6% and 97.5% respectively, depending on the extracted passages, and improve significantly, by 38.3% and 44.2%, over the traditional method. The proposed methods are applied to 3,990 articles from a large tagged corpus.
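The general idea of comparing documents only on passages that contain "the" can be illustrated with a minimal sketch. This is not the authors' P-SIM measure: the abstract does not state the extraction rules or the similarity formula, so the sentence splitter, the plain "contains the word 'the'" filter, and the cosine similarity over term counts are all assumptions made for illustration, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive sentence splitter (assumption): break after ., ! or ? plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_passages(text):
    # Hypothetical filter: keep sentences that contain the article "the".
    # The paper applies additional restrictive rules that the abstract does not give.
    return [s for s in split_sentences(text)
            if re.search(r"\bthe\b", s, re.IGNORECASE)]


def term_vector(sentences):
    # Bag-of-words counts over the kept sentences, ignoring "the" itself.
    tokens = re.findall(r"[a-z0-9]+", " ".join(sentences).lower())
    return Counter(t for t in tokens if t != "the")


def cosine(a, b):
    # Standard cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def passage_similarity(doc_a, doc_b):
    # Compare two documents using only their extracted passages.
    return cosine(term_vector(extract_passages(doc_a)),
                  term_vector(extract_passages(doc_b)))
```

Because only the filtered sentences are vectorized, sentences without "the" do not affect the score, which is where any efficiency gain over whole-document comparison would come from in a sketch like this.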

1991
Vol 30 (04)
pp. 275-283
Author(s):  
P. M. Pietrzyk

Abstract: Much information about patients is stored in free text. Hence, the computerized processing of medical language data has long been a well-known goal of medical informatics, and it has given rise to different paradigms. In Göttingen, a Medical Text Analysis System for German (abbr. MediTAS) has been under development for some time, in an attempt to combine and extend these paradigms. This article concentrates on the automated syntax analysis of German medical utterances. The investigated text material consists of 8,790 distinct utterances extracted from the summary sections of about 18,400 cytopathological findings reports. The parsing is based upon a new approach called Left-Associative Grammar (LAG), developed by Hausser. By considerably extending the LAG approach, most of the grammatical constructions occurring in the text material could be covered.


2012
Vol 2012
pp. 1-8
Author(s):  
Anis Zouaghi
Mounir Zrigui
Georges Antoniadis
Laroussi Merhbene

We propose a new approach for determining the appropriate sense of Arabic words. To that end, we propose an algorithm based on information retrieval measures that identifies the context of use closest to the sentence containing the word to be disambiguated. The contexts of use are sets of sentences that each indicate a particular sense of the ambiguous word. These contexts are generated using the words that define the senses of the ambiguous words, an exact string-matching algorithm, and the corpus. We use measures employed in the domain of information retrieval (Harman, Croft, and Okapi), combined with the Lesk algorithm, to select the correct sense among those proposed.
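The step of choosing the sense whose contexts of use are closest to the input sentence can be sketched with a simplified Lesk-style overlap count. This is only an illustration under stated assumptions: the paper scores contexts with the Harman, Croft, and Okapi measures, whereas this sketch substitutes a raw count of shared words, and the function names, stopword set, and English sample data are hypothetical.

```python
import re


def tokens(text):
    # Lowercased word set; \w is Unicode-aware, so Arabic text is tokenized too.
    return set(re.findall(r"\w+", text.lower()))


def simplified_lesk(sentence, sense_contexts, stopwords=frozenset()):
    # sense_contexts maps a sense label to a list of example sentences
    # (the "contexts of use" for that sense). Hypothetical data structure.
    context = tokens(sentence) - stopwords
    best_sense, best_overlap = None, -1
    for sense, examples in sense_contexts.items():
        signature = set()
        for example in examples:
            signature |= tokens(example)
        # Raw word overlap stands in for the IR measures used in the paper.
        overlap = len(context & (signature - stopwords))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Toy usage with made-up English contexts for the word "bank".
contexts = {
    "river": ["the bank of the river was muddy", "fishing from the river bank"],
    "finance": ["she deposited money at the bank", "the bank approved the loan"],
}
stop = frozenset({"the", "from", "he", "at", "of", "was", "she"})
chosen = simplified_lesk("he withdrew money from the bank", contexts, stop)
```

Replacing the overlap count with a weighted retrieval score would bring the sketch closer to the described method, at the cost of needing corpus statistics.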


Author(s):  
Rohan Nanda ◽  
Llio Humphreys ◽  
Lorenzo Grossio ◽  
Adebayo Kolawole John

This paper presents a multilingual legal information retrieval system for mapping recitals to articles in European Union (EU) directives and to normative provisions in national legislation. Such a system could be useful for the purposive interpretation of norms. Previous work on mapping recitals and normative provisions was limited to EU legislation in English and to a single lexical text similarity technique. In this paper, we develop state-of-the-art text similarity models to investigate the interplay between directive recitals, directive (sub-)articles and provisions of national implementing measures (NIMs) on a multilingual corpus (from Ireland, Italy and Luxembourg). Our results indicate that directive recitals do not have a direct influence on NIM provisions, but they sometimes contain additional information that is not present in the transposed directive sub-article and can therefore facilitate purposive interpretation.


2019
Vol 16 (5)
pp. 572-579
Author(s):  
E. A. Maksimov ◽  
E. P. Chelyabinsk

Introduction. The traction power of a car is used to determine its traction and speed properties. The purpose of the paper is to refine the calculation of the car's traction power. Materials and methods. The authors used a methodology for the refined calculation of the car's traction power. Results. The authors carried out a comparative analysis of the refined and traditional methods for calculating traction power. As a result, the authors obtained a refined equation for calculating the traction power that takes into account the elastic modulus, the width of the contact track, the free radius of the wheel, the deflection of the tire and the tangential friction forces in the contact zone. The largest discrepancy between the vehicle's traction power curve calculated by the updated methodology and the curve calculated by the traditional method was 26.8%. Discussion and conclusions. The results of the research are useful to specialists at automobile and transport enterprises and to master's students at universities for comparing the traction and speed properties of various car types.


1985
Vol 17 (1)
pp. 9-22
Author(s):  
Rembrand B.R.C. Zenner ◽  
Rita M.M. De Caluwe ◽  
Etienne E. Kerre

2018
Vol 10 (1)
pp. 25-33
Author(s):  
Khaireddine Bacha

The automatic processing of the Arabic language is a growing discipline, with more and more research and technologies examining the specificities of this language and proposing the tools needed to develop its automatic processing. Older rooting techniques have limits that weaken the root extraction process. In this article, the author proposes a new approach to rooting based on two finite state automata, with the aim of minimizing the error rate and the ambiguity that usually result from the removal of affixes. The author is currently focusing on developing and improving the rooting technique while trying to overcome the various problems encountered. The author is also compiling an evaluation corpus that will make it possible to evaluate the approach and compare it with others.

