TOPIC SEGMENTATION METHODS COMPARISON ON COMPUTER SCIENCE TEXTS

Author(s):  
Volodymyr Sokol ◽  
Vitalii Krykun ◽  
Mariia Bilova ◽  
Ivan Perepelytsya ◽  
Volodymyr Pustovarov ◽  
...  

The demand for information systems that simplify and accelerate work has greatly increased in the context of the rapid informatization of society and all its branches. This provokes the emergence of more and more companies involved in the development of software products and information systems in general. To ensure the systematization, processing, and use of this knowledge, knowledge management systems are used. One of the main tasks of IT companies is the continuous training of personnel, which requires exporting content from the company's knowledge management system to a learning management system. The main goal of the research is to choose an algorithm for marking up the text of articles close to those used in the knowledge management systems of IT companies. To achieve this goal, various topic segmentation methods must be compared on a dataset of computer science texts. Inspec is one such dataset, originally used for keyword extraction; in this research it has been adapted to the structure of the datasets used for the topic segmentation problem. The TextTiling and TextSeg methods were compared on several well-known data science metrics and on metrics specific to the topic segmentation problem, and a new generalized metric was introduced to compare the results. All software implementations of the algorithms were written in the Python programming language as a set of interrelated functions. The results show the advantages of the TextSeg method over TextTiling under both the classical data science metrics and the special metrics developed for the topic segmentation task. From all the metrics, including the introduced one, it can be concluded that the TextSeg algorithm performs better than the TextTiling algorithm on the adapted Inspec test data set.
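The abstract compares segmenters with metrics specific to topic segmentation. The standard one is Beeferman's Pk; the abstract does not give its exact formulation, so the sketch below is a minimal, conventional implementation assuming segmentations encoded as boundary strings ('1' marks a boundary after a position), not the paper's own code.

```python
def pk(reference, hypothesis, k=None):
    """Beeferman's Pk: the rate at which a sliding window of width k
    disagrees about whether its two endpoints lie in the same segment.
    Inputs are strings like '0100100', where '1' marks a boundary
    after that position."""
    if k is None:
        # conventional choice: half the mean reference segment length
        k = max(1, round(len(reference) / (reference.count('1') + 1) / 2))
    errors = 0
    n = len(reference) - k
    for i in range(n):
        # is there a boundary inside the window in each segmentation?
        ref_same = '1' not in reference[i:i + k]
        hyp_same = '1' not in hypothesis[i:i + k]
        if ref_same != hyp_same:
            errors += 1
    return errors / n
```

Lower is better: identical segmentations score 0.0, and a hypothesis that misses every boundary is penalized in proportion to how often the window straddles a true boundary.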

Author(s):  
Yang-Hui He

Calabi-Yau spaces, or Kähler spaces admitting zero Ricci curvature, have played a pivotal role in theoretical physics and pure mathematics for the last half century. In physics, they constituted the first and natural solution to compactification of superstring theory to our 4-dimensional universe, primarily due to one of their equivalent definitions being the admittance of covariantly constant spinors. Since the mid-1980s, physicists and mathematicians have joined forces in creating explicit examples of Calabi-Yau spaces, compiling databases of formidable size, including the complete intersection (CICY) data set, the weighted hypersurfaces data set, the elliptic-fibration data set, the Kreuzer-Skarke toric hypersurface data set, generalized CICYs, etc., totaling at least on the order of 10^10 manifolds. These all contribute to the vast string landscape, the multitude of possible vacuum solutions to string compactification. More recently, this collaboration has been enriched by computer science and data science, the former in bench-marking the complexity of the algorithms in computing geometric quantities, and the latter in applying techniques such as machine learning in extracting unexpected information. These endeavours, inspired by the physics of the string landscape, have rendered the investigation of Calabi-Yau spaces one of the most exciting and interdisciplinary fields.


2014 ◽  
Vol 28 (31) ◽  
pp. 1430021 ◽  
Author(s):  
J. K. Freericks ◽  
B. K. Nikolić ◽  
O. Frieder

Generating big data pervades much of physics. But some problems, which we call extreme data problems, are too large to be treated within big data science. The nonequilibrium quantum many-body problem on a lattice is just such a problem, where the Hilbert space grows exponentially with system size and rapidly becomes too large to fit on any computer (and can be effectively thought of as an infinite-sized data set). Nevertheless, much progress has been made with computational methods on this problem, which serve as a paradigm for how one can approach and attack extreme data problems. In addition, viewing these physics problems from a computer-science perspective leads to new approaches for solving them more accurately and for longer times. We review a number of these different ideas here.
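The exponential growth the abstract describes can be made concrete with a toy calculation. Assuming a spin-1/2 lattice (so the Hilbert space dimension is 2^N for N sites, a standard illustrative case not spelled out in the abstract), the memory needed just to store one state vector in complex double precision is:

```python
# Toy illustration of "too large to fit on any computer":
# a spin-1/2 lattice of N sites has Hilbert space dimension 2**N,
# and one state vector needs 16 bytes (complex128) per amplitude.
def state_vector_bytes(n_sites: int) -> int:
    dim = 2 ** n_sites          # exponential growth of the basis
    return dim * 16             # complex128 = 16 bytes per amplitude

for n in (10, 30, 50):
    gib = state_vector_bytes(n) / 2**30
    print(f"N={n:2d}: dim = 2^{n}, one state vector ~ {gib:.3g} GiB")
```

At N = 30 a single vector already needs 16 GiB; by N = 50 it needs about 16 million GiB, which is why exact treatment is hopeless and the approximate methods the review surveys are required.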


2020 ◽  
Author(s):  
JAYDIP DATTA

With reference to earlier works such as MATHEMATICAL STATISTICS: AN APPLICATION BASED STATISTICS, December 2019, DOI: 10.13140/RG.2.2.32537.57446; DATA STRUCTURE & MANAGEMENT SYSTEM: A REVIEW, December 2019, DOI: 10.13140/RG.2.2.36453.96488; and OPTIMISATION: A VIEW FROM INDUSTRIAL ECONOMICS, January 2020, DOI: 10.13140/RG.2.2.35662.61764, this work highlights the following features of general graduate engineering courses.


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between business and data science. With the involvement of data science, the business goal is to extract valuable insights from the available data. A large part of Indian cinema is Bollywood, a multi-million-dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop, applying machine learning techniques for classification and prediction. The first step in building a classifier or prediction model is the learning stage, in which a training data set is used to train the model with some technique or algorithm; the rules generated in this stage form the model and are used to predict future trends in different types of organizations. Methods: Classification and prediction techniques including Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN are applied in search of efficient and effective results. All of these can be run through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: The rules generated in the learning stage from the training data set form the models whose predictions are then compared. Conclusion: A comparative analysis is performed on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Using advertisement propaganda, production houses can plan the best time to release a movie according to the predicted success rate to gain higher benefits.
Discussion: Data mining is the process of discovering patterns in large data sets, along with the relationships among them, in order to solve business problems and predict forthcoming trends. Such prediction can help production houses with advertisement propaganda and with planning their costs, and by assuring these factors they can make a movie more profitable.
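The comparative workflow the abstract describes (train each listed classifier, then compare accuracy and confusion matrices) can be sketched as follows. The paper's Bollywood feature set is not reproduced here, so synthetic five-class data stands in for it; this is an illustrative scikit-learn sketch, not the authors' implementation.

```python
# Sketch of the comparison loop: train each classifier named in the
# abstract on a synthetic 5-class data set (standing in for the
# Blockbuster/Superhit/Hit/Average/Flop labels), then report
# accuracy and the confusion matrix for each.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_classes=5,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"{name}: accuracy = {accuracy_score(y_te, pred):.3f}")
    print(confusion_matrix(y_te, pred))
```

On the real data, the same loop would simply take the engineered movie features and labels in place of the synthetic `X, y`.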


Examples of the value that can be created and captured through crowdsourcing go back to at least 1714, when the UK used crowdsourcing to solve the Longitude Problem, obtaining a solution that would enable the UK to become the dominant maritime force of its time. Today, Wikipedia uses crowds to provide entries for the world's largest and free encyclopedia. Partly fueled by the value that can be created and captured through crowdsourcing, interest in researching the phenomenon has been remarkable. For example, the Best Paper Awards in 2012 for a record-setting three journals—the Academy of Management Review, Journal of Product Innovation Management, and Academy of Management Perspectives—were about crowdsourcing. In spite of the interest in crowdsourcing—or perhaps because of it—research on the phenomenon has been conducted in different research silos within the fields of management (from strategy to finance to operations to information systems), biology, communications, computer science, economics, and political science, among others. In these silos, crowdsourcing takes names such as broadcast search, innovation tournaments, crowdfunding, community innovation, distributed innovation, collective intelligence, open source, crowdpower, and even open innovation. The book aims to assemble papers from as many of these silos as possible, since the ultimate potential of crowdsourcing research is likely to be attained only by bridging them. The papers provide a systematic overview of the research on crowdsourcing from different fields based on a more encompassing definition of the concept, its difference for innovation, and its value for both the private and public sectors.


1997 ◽  
Vol 25 (4) ◽  
pp. 38-47 ◽  
Author(s):  
Mary J. Granger ◽  
Elizabeth S. Adams ◽  
Christina Björkman ◽  
Don Gotterbarn ◽  
Diana D. Juettner ◽  
...  

Burns ◽  
2015 ◽  
Vol 41 (5) ◽  
pp. 1092-1099 ◽  
Author(s):  
Maryam Ahmadi ◽  
Jahanpour Alipour ◽  
Ali Mohammadi ◽  
Farid Khorami

2020 ◽  
Vol 8 ◽  
Author(s):  
Devasis Bassu ◽  
Peter W. Jones ◽  
Linda Ness ◽  
David Shallcross

Abstract In this paper, we present a theoretical foundation for a representation of a data set as a measure in a very large hierarchically parametrized family of positive measures, whose parameters can be computed explicitly (rather than estimated by optimization), and illustrate its applicability to a wide range of data types. The preprocessing step then consists of representing data sets as simple measures. The theoretical foundation consists of a dyadic product formula representation lemma, and a visualization theorem. We also define an additive multiscale noise model that can be used to sample from dyadic measures and a more general multiplicative multiscale noise model that can be used to perturb continuous functions, Borel measures, and dyadic measures. The first two results are based on theorems in [15, 3, 1]. The representation uses the very simple concept of a dyadic tree and hence is widely applicable, easily understood, and easily computed. Since the data sample is represented as a measure, subsequent analysis can exploit statistical and measure theoretic concepts and theories. Because the representation uses the very simple concept of a dyadic tree defined on the universe of a data set, and the parameters are simply and explicitly computable and easily interpretable and visualizable, we hope that this approach will be broadly useful to mathematicians, statisticians, and computer scientists who are intrigued by or involved in data science, including its mathematical foundations.
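The abstract emphasizes that the dyadic parameters are "computed explicitly (rather than estimated by optimization)" on a dyadic tree over the data's universe. A minimal sketch of that idea, assuming data normalized into [0, 1) and using illustrative names rather than the paper's notation, just counts the fraction of sample mass in each dyadic interval of a depth-d binary tree:

```python
# Minimal sketch: represent a sample in [0, 1) as a dyadic measure.
# Each node (level, j) of a depth-d binary tree holds the fraction of
# total mass in the dyadic interval [j/2^level, (j+1)/2^level); the
# parameters are computed directly by counting, not by optimization.
# (Names and indexing are illustrative, not the paper's notation.)
def dyadic_measure(points, depth):
    n = len(points)
    tree = {}
    for level in range(depth + 1):
        bins = 2 ** level
        counts = [0] * bins
        for x in points:
            counts[min(int(x * bins), bins - 1)] += 1
        for j, c in enumerate(counts):
            tree[(level, j)] = c / n   # mu of the dyadic interval
    return tree

mu = dyadic_measure([0.1, 0.2, 0.6, 0.9], depth=2)
print(mu[(0, 0)])   # total mass
print(mu[(1, 0)])   # mass of [0, 1/2)
```

Because the representation is just a dict keyed by tree node, subsequent analysis can treat the sample as a measure (e.g. compare two samples node by node), which is the reuse of measure-theoretic machinery the abstract advertises.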


2021 ◽  
Vol 19 (10) ◽  
pp. 2001-2010
Author(s):  
Yurii P. KISHKOVICH

Subject. This article discusses the use of information systems in various spheres of the Russian economy. Objectives. The article aims to assess the prospects for the use of cloud-based ITSM systems by Russian medium-sized and large companies. Methods. For the study, I used a comparative analysis. Results. The article finds that a cloud-based ITSM system significantly reduces the time of business processes and saves money. Conclusions. The practical application of cloud-based ITSM systems significantly expands the opportunities and capabilities of medium-sized and large enterprises and contributes to improving their financial performance. The enterprise's management system is an important factor on which the implementation of these systems depends.

