topic segmentation
Recently Published Documents


TOTAL DOCUMENTS: 86 (FIVE YEARS 6)

H-INDEX: 13 (FIVE YEARS 0)

Author(s):  
Volodymyr Sokol ◽  
Vitalii Krykun ◽  
Mariia Bilova ◽  
Ivan Perepelytsya ◽  
Volodymyr Pustovarov ◽  
...  

The demand for information systems that simplify and accelerate work has greatly increased with the rapid informatization of society and all its branches. This has prompted the emergence of more and more companies involved in developing software products and information systems in general. To ensure the systematization, processing, and use of their knowledge, these companies use knowledge management systems. One of the main tasks of IT companies is the continuous training of personnel, which requires exporting content from the company's knowledge management system to the learning management system. The main goal of the research is to choose an algorithm that solves the problem of marking up the text of articles similar to those used in the knowledge management systems of IT companies. To achieve this goal, it is necessary to compare various topic segmentation methods on a dataset of computer science texts. Inspec is one such dataset; it is used for keyword extraction, and in this research it was adapted to the structure of the datasets used for the topic segmentation problem. The TextTiling and TextSeg methods were compared on some well-known data science metrics and on specific metrics that relate to the topic segmentation problem. A new generalized metric was also introduced to compare results for the topic segmentation problem. All software implementations of the algorithms were written in the Python programming language and represent a set of interrelated functions. The results show the advantages of the TextSeg method over TextTiling on both classical data science metrics and special metrics developed for the topic segmentation task. From all the metrics, including the introduced one, it can be concluded that the TextSeg algorithm performs better than the TextTiling algorithm on the adapted Inspec test dataset.
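The abstract above does not name the segmentation-specific metrics it used. As an illustration only, the sketch below implements two metrics that are widely used for this task, Pk and WindowDiff, in plain Python. The function names, the 0/1 boundary encoding, and the default window size (half the average reference segment length) are assumptions made for this example, not details taken from the paper.

```python
def segment_ids(boundaries):
    """Map a 0/1 boundary list to one segment id per text unit.

    Assumed encoding: boundaries[i] == 1 means a segment ends after unit i.
    """
    ids, current = [], 0
    for b in boundaries:
        ids.append(current)
        if b:
            current += 1
    return ids


def default_window(reference):
    """Half the average reference segment length, at least 1 (common convention)."""
    num_segments = segment_ids(reference)[-1] + 1
    return max(1, len(reference) // (2 * num_segments))


def pk(reference, hypothesis, k=None):
    """Pk: share of probe pairs (i, i + k) on which the two segmentations
    disagree about whether both units lie in the same segment."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of units")
    k = k or default_window(reference)
    ref, hyp = segment_ids(reference), segment_ids(hypothesis)
    probes = len(reference) - k
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k]) for i in range(probes)
    )
    return errors / probes


def window_diff(reference, hypothesis, k=None):
    """WindowDiff: share of windows in which the number of boundaries differs."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of units")
    k = k or default_window(reference)
    ref, hyp = segment_ids(reference), segment_ids(hypothesis)
    probes = len(reference) - k
    errors = sum(
        (ref[i + k] - ref[i]) != (hyp[i + k] - hyp[i]) for i in range(probes)
    )
    return errors / probes


# Hypothetical example: 8 sentences, one true boundary after the 4th.
reference = [0, 0, 0, 1, 0, 0, 0, 0]
no_boundaries = [0] * 8
print(pk(reference, reference), pk(reference, no_boundaries))
print(window_diff(reference, no_boundaries))
```

Both scores are error rates, so lower is better and 0 means the hypothesis matches the reference exactly. Comparing two methods such as TextTiling and TextSeg would amount to computing these scores for each method's output against the same reference segmentation.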


2021 ◽  
pp. 517-531
Author(s):  
Minghui Wang ◽  
Ping Xue ◽  
Ying Li ◽  
Zhonghai Wu

2018 ◽  
Vol 24 (6) ◽  
pp. 921-946
Author(s):  
PEDRO MOTA ◽  
MAXINE ESKENAZI ◽  
LUÍSA COHEUR

Abstract: Research on topic segmentation has recently focused on segmenting documents by taking advantage of documents covering the same topics. In order to properly evaluate such approaches, a dataset of related documents is needed. However, existing datasets are limited in the number of related documents per domain. In addition, most of the available datasets do not consider documents from different media sources (PowerPoints, videos, etc.), which pose specific challenges to segmentation. We fill this gap with the MUltimedia SEgmentation Dataset (MUSED), a collection of manually segmented documents from different media sources, in seven different domains, with an average of twenty related documents per domain. In this paper, we describe the process of building MUSED. A multi-annotator study is carried out to determine whether agreement among human judges can be observed and to characterize their disagreement patterns. In addition, we use MUSED to compare state-of-the-art topic segmentation techniques, including the ones that take advantage of related documents. Moreover, we study the impact of having documents from different media sources in the dataset. To the best of our knowledge, MUSED is the first dataset that allows a straightforward evaluation of both single- and multi-document topic segmentation techniques, as well as a study of how these behave in the presence of documents from different media sources. Results show that some techniques are, indeed, sensitive to different media sources, and also that current multi-document segmentation models do not outperform previous models, pointing to a research line that needs to be boosted.


2018 ◽  
Author(s):  
Eduardo R. Soares ◽  
Eduardo Barrére

Nowadays, video lectures are a very popular way to transmit knowledge, and because of that, there are many repositories with large catalogs of such videos on the web. Despite all the benefits that this high availability of video lectures brings, some problems also emerge from this scenario. One of these problems is that it is very difficult to find relevant content associated with those videos. Often, students must watch the entire video lecture to find the point of interest, and sometimes these points are not found. For that reason, the proposal of this master's project is to investigate and propose a novel framework based on early fusion of low- and high-level audio features, enriched with external knowledge from open databases, for automatic topic segmentation in video lectures. We have performed preliminary experiments on two sets of video lectures using the current state of our work. The results obtained were very satisfactory, which shows the potential of our proposal.
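The abstract above does not describe its feature extractors or fusion details. In general, "early fusion" means combining feature groups into a single vector before any model sees them. The sketch below shows one minimal way to do that, assuming each audio window already has a low-level and a high-level feature vector; the function names, the z-score normalization step, and the sample values are all hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column so no feature group dominates by scale."""
    columns = list(zip(*rows))
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def early_fusion(low_level, high_level):
    """Concatenate the two normalized feature groups, window by window."""
    if len(low_level) != len(high_level):
        raise ValueError("both feature groups must cover the same windows")
    return [
        low + high
        for low, high in zip(zscore_columns(low_level), zscore_columns(high_level))
    ]


# Hypothetical features for two audio windows.
low = [[1.0, 10.0], [3.0, 30.0]]   # e.g. two low-level descriptors per window
high = [[0.0], [4.0]]              # e.g. one high-level descriptor per window
print(early_fusion(low, high))
```

The fused vectors would then feed a single downstream segmentation model, in contrast to late fusion, where separate models per feature group are combined at the decision stage.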

