Retrieval of source documents in a text reuse system

2020 ◽  
Vol 8 (2) ◽  
pp. 140-149
Author(s):  
Nathaniel Clarence Haryanto ◽  
Lucia Dwi Krisnawati ◽  
Antonius Rachmat Chrismanto

The architecture of the text-reuse detection system consists of three main modules: source retrieval, text analysis, and knowledge-based postprocessing. Each module plays an important role in the accuracy of the detection outputs. This research therefore focuses on developing the source retrieval system for cases where the source documents have been obfuscated at different levels. Two steps of term weighting were applied to retrieve such documents. The first was local-word weighting, applied to the test (reused) documents to select queries per text segment. The second was tf-idf term weighting, applied for indexing all documents in the corpus and as the basis for computing cosine similarity between the per-segment queries and the documents in the corpus. A two-step filtering technique was then applied to obtain the source document candidates. On artificial cases of text reuse, the system achieves identical precision and recall rates of 0.967, while the recall rate for simulated cases of reused text is 0.66.
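The tf-idf indexing and cosine-similarity ranking mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the smoothed idf formula, the function names (`build_tfidf`, `rank_sources`), and the toy corpus are assumptions made for illustration, and the local-word weighting and two-step filtering are not modeled.

```python
import math
from collections import Counter


def tfidf_vector(tokens, idf):
    """Weight each known term by raw term frequency times idf."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def build_tfidf(docs):
    """Index tokenized documents; returns (document vectors, idf table)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf (an assumption): terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [tfidf_vector(doc, idf) for doc in docs], idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sources(query_tokens, vectors, idf):
    """Return (document index, score) pairs, best match first."""
    q = tfidf_vector(query_tokens, idf)
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus; each document is already tokenized.
corpus = [
    ["text", "reuse", "detection"],
    ["glucose", "radar", "sensor"],
    ["source", "retrieval", "text"],
]
vectors, idf = build_tfidf(corpus)
ranking = rank_sources(["glucose", "sensor"], vectors, idf)
```

In this sketch a query built from one text segment is scored against every indexed document, and the top-ranked indices would be the candidates handed to a later filtering stage.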

2018 ◽  
Vol 14 (3) ◽  
pp. 69-89 ◽  
Author(s):  
Caiquan Xiong ◽  
Xuan Li ◽  
Yuan Li ◽  
Gang Liu

In an Online Argumentation Platform, a large number of speech messages are produced. Finding similar speech texts and extracting their common summary is of great significance for improving the efficiency of argumentation and promoting consensus building. In this article, a method of speech text analysis is proposed. First, a heuristic clustering algorithm is used to cluster the speech texts and obtain sets of similar texts. Then, an improved TextRank algorithm is used to extract a multi-document summary, and the summary results are fed back to the experts (i.e., the participants). The multi-document summarization method is based on TextRank and takes into account the position of sentences in paragraphs, the weight of the key sentence, and the length of the sentence. Finally, a prototype system is developed to verify the validity of the method using four evaluation parameters: recall rate, accuracy rate, F-measure, and user feedback. The experimental results show that the method performs well in the system.
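As a point of reference for the summarization step, the following is a minimal sketch of plain TextRank sentence scoring, not the improved variant proposed in the article. The position, key-sentence, and length adjustments described in the abstract are not modeled, and the function names and example sentences are assumptions for illustration.

```python
import math


def sentence_similarity(a, b):
    """Word overlap normalized by log sentence lengths, as in basic TextRank."""
    words_a, words_b = set(a), set(b)
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom == 0.0:  # both sentences contain a single distinct word
        return 0.0
    return len(words_a & words_b) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Score tokenized sentences by iterating the weighted PageRank update."""
    n = len(sentences)
    weights = [
        [sentence_similarity(sentences[i], sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbor j passes on a share of its score proportional to the edge weight.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0.0
            )
            updated.append((1.0 - damping) + damping * rank)
        scores = updated
    return scores


# Hypothetical tokenized sentences from one cluster of speech texts.
sentences = [
    ["clustering", "groups", "similar", "speech", "texts"],
    ["summary", "extracted", "from", "similar", "speech", "texts"],
    ["radar", "measures", "glucose"],
]
scores = textrank(sentences)
```

The highest-scoring sentences would be selected for the summary; the third sentence shares no words with the others, so it keeps only the baseline score of `1 - damping`.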


Author(s):  
Mikhail Tarasov

The article deals with the construction of narrative text. The study thoroughly analyzes cognitive models that can serve as the basis of this process. First, the author examines the theory of rhetorical commonplaces. The article shows that this theory is suitable for constructing a rhetorical text, but not a narrative one. The second model discussed is the concept model. The article argues that this model is most convenient for text analysis, but not for text formation. Marvin Minsky's frame theory is analyzed in detail. It is stated that the theory of frames and individual narrative concepts, in particular those formulated by R. Barthes, have much in common. It is concluded that the theory of frames can be regarded as the ontological basis of the scientific description of narrative. In addition, the article briefly discusses the cognitive models of R. Quillian and R. Langacker. Their essence is to distinguish the main and secondary content in the text. The possibility of using these models in text analysis and synthesis is demonstrated by their conceptual similarity with G.Y. Solganik’s analysis of the novel by L. Tolstoy. Special attention is paid to the theory of R. Abelson. It is argued that the proposed hierarchy of cognitive structures has a generalizing character and is adequate to the text. The article gives an example based on an analysis of a local narrative figure undertaken by V.V. Vinogradov, and indicates that this figure can be described within Abelson's theory. By comparing different cognitive models and narrative concepts, the article formulates the sequence of stages in the analysis and synthesis of text units found at different levels. The first stage is the analysis of narrative figures. The second is the analysis of episodes, which are associations of narrative figures. The third is the analysis of the plot structures of the text. It is proposed to consider text units as realizations of cognitive structures. It is argued that the cognitive approach to narrative provides a holistic, detailed, and adequate description of it.


2018 ◽  
Vol 10 (3) ◽  
pp. 10-29 ◽  
Author(s):  
George Shaker ◽  
Karly Smith ◽  
Ala Eldin Omer ◽  
Shuo Liu ◽  
Clement Csech ◽  
...  

This article discusses recent developments in the authors' experiments using Google's Soli alpha kit to develop a non-invasive blood glucose detection system. The Soli system (co-developed by Google and Infineon) is a 60 GHz mm-wave radar that promises a small, mobile, and wearable platform intended for gesture recognition. The authors have retrofitted the setup for the system, and their experiments outline a proof-of-concept prototype that detects changes in the dielectric properties of solutions with different levels of glucose and distinguishes between different concentrations. Preliminary results indicated that mm-waves are suitable for glucose detection in biological media at concentrations similar to the blood glucose concentrations of diabetic patients. The authors discuss improving the repeatability and scalability of the system, other systems of glucose detection, and potential user constraints of implementation.


2020 ◽  
Vol 26 (4) ◽  
pp. 496-507
Author(s):  
Kheir Daouadi ◽  
Rim Rebaï ◽  
Ikram Amous

Bot detection on Twitter currently attracts the attention of researchers around the world, and different bot detection approaches have been proposed as a result of these research efforts. Four of the main challenges in this context are the diversity of the types of content propagated throughout Twitter, the problems inherent to the text, the lack of sufficient labeled datasets, and the fact that current bot detection approaches are not sufficient to detect bot activities accurately. We propose Twitterbot+, a bot detection system that leverages a minimal number of language-independent features extracted from a single tweet, with temporal enrichment of previously labeled datasets. We conducted experiments on three benchmark datasets with standard evaluation scenarios, and the results demonstrate the efficiency of Twitterbot+ against the state of the art, yielding promising accuracy (>95%). Our proposal is suitable for accurate, real-time use in a Twitter data collection step as an initial filtering technique to improve the quality of research data.


2021 ◽  
Vol 15 (1) ◽  
pp. 81-92
Author(s):  
Linyang Yan ◽  
Sun-Woo Ko

Introduction: Traffic accidents occur easily in tunnels because of their special environment, and the consequences are very serious. Existing vehicle accident detection systems and CCTV systems suffer from low detection rates. Methods: A method is proposed that uses Mel Frequency Cepstrum Coefficients (MFCC) to extract sound features and a deep neural network (DNN) to learn those features, in order to distinguish accident sounds from non-accident sounds. Results and Discussion: The experimental results show that the method can effectively classify accident and non-accident sounds, and that the recall rate can exceed 78% with appropriately set neural network parameters. Conclusion: The method proposed in this research can be used to detect tunnel accidents, so that accidents can be detected in time and greater disasters avoided.


2021 ◽  
Vol 11 (22) ◽  
pp. 10976
Author(s):  
Rana Almohaini ◽  
Iman Almomani ◽  
Aala AlKhayer

Android ransomware is one of the most threatening attacks and is increasing at an alarming rate. Ransomware attacks usually target Android users by either locking their devices or encrypting their data files and then demanding payment to unlock the devices or recover the files. Existing solutions for detecting ransomware mainly use static analysis, and only a limited number of approaches apply dynamic analysis specifically for ransomware detection. Furthermore, these approaches either perform poorly or often fail in the presence of code obfuscation techniques or of benign applications that use cryptography methods in their API usage. Additionally, most of them are unable to detect ransomware attacks at early stages. Therefore, this paper proposes a hybrid detection system that effectively utilizes both static and dynamic analyses to detect ransomware with high accuracy. For the static analysis, the proposed hybrid system considered more than 70 state-of-the-art antivirus engines. For the dynamic analysis, this research explored the existing dynamic tools and conducted an in-depth comparative study to find a suitable tool to integrate for detecting ransomware whenever needed. To evaluate the performance of the proposed hybrid system, we analyzed over one hundred ransomware samples statically and dynamically. These samples originated from 10 different ransomware families. The experimental results revealed that static analysis achieved roughly half the detection accuracy, ranging around 40–55%, compared with dynamic analysis, which reached a 100% accuracy rate. Moreover, this research reports some of the API classes, methods, and permissions most used in these ransomware apps. Finally, some case studies are highlighted, including apps that failed to run and crypto-ransomware patterns.


2021 ◽  
Vol 15 ◽  
Author(s):  
Xin Li ◽  
Yonggang Li ◽  
Renchao Wu ◽  
Can Zhou ◽  
Hongqiu Zhu

This paper is concerned with the problem of short-circuit detection in infrared images for metal electrorefining using an improved Faster Region-based Convolutional Neural Network (Faster R-CNN). To address the problem of insufficient labeled data, a framework for automatically generating labeled infrared images is proposed. After a discussion of the factors that affect sample diversity, background, object shape, and gray-scale distribution are established as the three key variables for synthesis. Raw infrared images without faults are used as backgrounds. By simulating the other two key variables on the background, different classes of objects are synthesized. To improve the detection rate of small-scale targets, an attention module is introduced in the network to fuse the semantic segmentation results of U-Net and the synthetic dataset. In this way, the Faster R-CNN gains a rich representation ability for small-scale objects in the infrared images. Parameter tuning and transfer learning strategies are also applied to improve the detection precision. The detection system is trained only on the synthetic dataset and tested on actual images. Extensive experiments on different infrared datasets demonstrate the effectiveness of the synthesis methods. The synthetically trained network obtains a mAP of 0.826, and its recall rate for small latent short circuits is superior to that of Faster R-CNN and U-Net, effectively avoiding missed short-circuit detections.


Sensors ◽  
2019 ◽  
Vol 19 (8) ◽  
pp. 1937 ◽  
Author(s):  
Adam Stawiarski ◽  
Aleksander Muc

In this paper, the elastic wave propagation method was used for damage detection in thin structures. The effectiveness and accuracy of a system based on the wave propagation phenomenon depend on the number and location of the sensors. The use of piezoelectric (PZT) transducers makes it possible to build a low-cost damage detection system that can be used in structural health monitoring (SHM) of metallic and composite structures. Different numbers and locations of transducers were considered in the numerical and experimental analysis of the wave propagation phenomenon. The relation between the sensor configuration and the damage detection capability was demonstrated. The main assumptions and requirements of SHM systems of different levels were discussed with reference to the damage detection expectations. The importance of the damage detection system constituents (number of sensors, location, or damage index) at different levels of analysis was verified and discussed to emphasize that in many practical applications introducing complicated procedures and sophisticated data processing techniques does not improve the damage detection efficiency. Finally, the necessity of appropriately formulating SHM system requirements and expectations was underlined, in order to improve the effectiveness of the detection methods at particular levels of analysis and thus the safety of the monitored structures.


Author(s):  
Hari K. Sarma ◽  
Ramana V. Grandhi ◽  
Ronald F. Taylor

This paper discusses the framework of a knowledge-based expert system environment to design aerospace structures under structural and aerodynamic constraints using ASTROS (Automated Structural Optimization program). ASTROS is a synthesis tool built around the NASTRAN finite element program. The knowledge base capabilities are discussed for synthesizing in statics, normal mode, steady and unsteady aerodynamic disciplines. A description of the two ASTROS advisor modules, the Editor/Bulk Data generator and Post-processor, is included. Experiences and issues involved in hierarchical representation of knowledge as menu options at different levels of abstraction are presented. Illustrative examples of the advisor in designing airframe structures are also included.


2011 ◽  
Vol 204-210 ◽  
pp. 2171-2175
Author(s):  
Zi Yu Liu ◽  
Dong Li Zhang ◽  
Xue Hui Li

Domain ontology can effectively organize the knowledge of a domain and make it easier to share and reuse. A domain ontology can be built on a thesaurus and thematic words and then used to index document knowledge. On this basis, this paper designs a semantic retrieval system for document knowledge based on domain ontology. The system consists of four main components: ontology query, semantic precomputation for documents and concept similarity, semantic extended search, and reasoning search. Finally, an experiment is carried out in the high-speed railway domain. The experimental results show that the developed semantic retrieval system achieves satisfactory recall and precision.
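Recall and precision, the measures reported for this retrieval system and several others on this page, can be computed from the set of retrieved documents and the set of relevant ones. The sketch below uses the standard set-based definitions; the function name and the document identifiers are hypothetical and do not come from any of the papers.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical identifiers: 2 of the 4 retrieved documents are relevant,
# and 2 of the 3 relevant documents were retrieved.
precision, recall, f1 = precision_recall_f1(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5"],
)
```

Here precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7.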

