TOPIC SEGMENTATION METHODS COMPARISON ON COMPUTER SCIENCE TEXTS

Author(s):  
Volodymyr Sokol ◽  
Vitalii Krykun ◽  
Mariia Bilova ◽  
Ivan Perepelytsya ◽  
Volodymyr Pustovarov ◽  
...  

The demand for information systems that simplify and accelerate work has greatly increased in the context of the rapid informatization of society and all its branches. This provokes the emergence of more and more companies involved in the development of software products and information systems in general. To ensure the systematization, processing, and use of this knowledge, knowledge management systems are used. One of the main tasks of IT companies is the continuous training of personnel, which requires exporting content from the company's knowledge management system to a learning management system. The main goal of the research is to choose an algorithm for marking up the text of articles close to those used in the knowledge management systems of IT companies. To achieve this goal, various topic segmentation methods must be compared on a dataset of computer science texts. Inspec is one such dataset, originally used for keyword extraction; in this research it has been adapted to the structure of the datasets used for the topic segmentation problem. The TextTiling and TextSeg methods were compared on several well-known data science metrics and on metrics specific to the topic segmentation problem, and a new generalized metric was introduced to compare the results. All software implementations of the algorithms were written in the Python programming language as a set of interrelated functions. The results show the advantages of the TextSeg method over TextTiling under both the classical data science metrics and the special metrics developed for the topic segmentation task. From all the metrics, including the introduced one, it can be concluded that the TextSeg algorithm performs better than the TextTiling algorithm on the adapted Inspec test data set.
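The abstract compares segmenters with metrics specific to topic segmentation. The standard one is Beeferman's Pk; the abstract does not give its exact formulation, so the sketch below is a minimal, conventional implementation assuming segmentations encoded as boundary strings ('1' marks a boundary after a position), not the paper's own code.

```python
def pk(reference, hypothesis, k=None):
    """Beeferman's Pk: the rate at which a sliding window of width k
    disagrees about whether its two endpoints lie in the same segment.
    Inputs are strings like '0100100', where '1' marks a boundary
    after that position."""
    if k is None:
        # conventional choice: half the mean reference segment length
        k = max(1, round(len(reference) / (reference.count('1') + 1) / 2))
    errors = 0
    n = len(reference) - k
    for i in range(n):
        # is there a boundary inside the window in each segmentation?
        ref_same = '1' not in reference[i:i + k]
        hyp_same = '1' not in hypothesis[i:i + k]
        if ref_same != hyp_same:
            errors += 1
    return errors / n
```

Lower is better: identical segmentations score 0.0, and a hypothesis that misses every boundary is penalized in proportion to how often the window straddles a true boundary.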

Author(s):  
Yang-Hui He

Calabi-Yau spaces, or Kähler spaces admitting zero Ricci curvature, have played a pivotal role in theoretical physics and pure mathematics for the last half century. In physics, they constituted the first and natural solution to compactification of superstring theory to our 4-dimensional universe, primarily due to one of their equivalent definitions being the admittance of covariantly constant spinors. Since the mid-1980s, physicists and mathematicians have joined forces in creating explicit examples of Calabi-Yau spaces, compiling databases of formidable size, including the complete intersection (CICY) data set, the weighted hypersurfaces data set, the elliptic-fibration data set, the Kreuzer-Skarke toric hypersurface data set, generalized CICYs, etc., totaling at least on the order of 10^10 manifolds. These all contribute to the vast string landscape, the multitude of possible vacuum solutions to string compactification. More recently, this collaboration has been enriched by computer science and data science, the former in bench-marking the complexity of the algorithms in computing geometric quantities, and the latter in applying techniques such as machine learning in extracting unexpected information. These endeavours, inspired by the physics of the string landscape, have rendered the investigation of Calabi-Yau spaces one of the most exciting and interdisciplinary fields.


2014 ◽  
Vol 28 (31) ◽  
pp. 1430021 ◽  
Author(s):  
J. K. Freericks ◽  
B. K. Nikolić ◽  
O. Frieder

Generating big data pervades much of physics. But some problems, which we call extreme data problems, are too large to be treated within big data science. The nonequilibrium quantum many-body problem on a lattice is just such a problem, where the Hilbert space grows exponentially with system size and rapidly becomes too large to fit on any computer (and can be effectively thought of as an infinite-sized data set). Nevertheless, much progress has been made with computational methods on this problem, which serve as a paradigm for how one can approach and attack extreme data problems. In addition, viewing these physics problems from a computer-science perspective leads to new approaches for solving them more accurately and for longer times. We review a number of these different ideas here.
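The exponential growth the abstract describes can be made concrete with a toy calculation. Assuming a spin-1/2 lattice (so the Hilbert space dimension is 2^N for N sites, a standard illustrative case not spelled out in the abstract), the memory needed just to store one state vector in complex double precision is:

```python
# Toy illustration of "too large to fit on any computer":
# a spin-1/2 lattice of N sites has Hilbert space dimension 2**N,
# and one state vector needs 16 bytes (complex128) per amplitude.
def state_vector_bytes(n_sites: int) -> int:
    dim = 2 ** n_sites          # exponential growth of the basis
    return dim * 16             # complex128 = 16 bytes per amplitude

for n in (10, 30, 50):
    gib = state_vector_bytes(n) / 2**30
    print(f"N={n:2d}: dim = 2^{n}, one state vector ~ {gib:.3g} GiB")
```

At N = 30 a single vector already needs 16 GiB; by N = 50 it needs about 16 million GiB, which is why exact treatment is hopeless and the approximate methods the review surveys are required.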


2020 ◽  
Author(s):  
JAYDIP DATTA

With reference to earlier works such as MATHEMATICAL STATISTICS: AN APPLICATION BASED STATISTICS, December 2019, DOI: 10.13140/RG.2.2.32537.57446; DATA STRUCTURE & MANAGEMENT SYSTEM: A REVIEW, December 2019, DOI: 10.13140/RG.2.2.36453.96488; and OPTIMISATION: A VIEW FROM INDUSTRIAL ECONOMICS, January 2020, DOI: 10.13140/RG.2.2.35662.61764, this work highlights the following features of general graduate engineering courses.


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between business and data science. With the involvement of data science, the business goal is to extract valuable insights from the available data. A large part of Indian cinema is Bollywood, a multi-million-dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop, applying machine learning techniques for classification and prediction. The first step in building a classifier or prediction model is the learning stage, in which a training data set is used to train the model with some technique or algorithm; the rules generated in this stage form the model and are used to predict future trends in different types of organizations. Methods: Classification and prediction techniques including Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN are applied in search of efficient and effective results. All of these can be run through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: The rules generated in the learning stage from the training data set form the models whose predictions are then compared. Conclusion: A comparative analysis is performed on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Using advertisement propaganda, production houses can plan the best time to release a movie according to the predicted success rate to gain higher benefits.
Discussion: Data mining is the process of discovering patterns in large data sets, along with the relationships among them, in order to solve business problems and predict forthcoming trends. Such prediction can help production houses with advertisement propaganda and with planning their costs, and by assuring these factors they can make a movie more profitable.
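The comparative workflow the abstract describes (train each listed classifier, then compare accuracy and confusion matrices) can be sketched as follows. The paper's Bollywood feature set is not reproduced here, so synthetic five-class data stands in for it; this is an illustrative scikit-learn sketch, not the authors' implementation.

```python
# Sketch of the comparison loop: train each classifier named in the
# abstract on a synthetic 5-class data set (standing in for the
# Blockbuster/Superhit/Hit/Average/Flop labels), then report
# accuracy and the confusion matrix for each.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_classes=5,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"{name}: accuracy = {accuracy_score(y_te, pred):.3f}")
    print(confusion_matrix(y_te, pred))
```

On the real data, the same loop would simply take the engineered movie features and labels in place of the synthetic `X, y`.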


Examples of the value that can be created and captured through crowdsourcing go back to at least 1714, when the UK used crowdsourcing to solve the Longitude Problem, obtaining a solution that would enable the UK to become the dominant maritime force of its time. Today, Wikipedia uses crowds to provide entries for the world's largest and free encyclopedia. Partly fueled by the value that can be created and captured through crowdsourcing, interest in researching the phenomenon has been remarkable. For example, the Best Paper Awards in 2012 for a record-setting three journals—the Academy of Management Review, Journal of Product Innovation Management, and Academy of Management Perspectives—were about crowdsourcing. In spite of the interest in crowdsourcing—or perhaps because of it—research on the phenomenon has been conducted in different research silos within the fields of management (from strategy to finance to operations to information systems), biology, communications, computer science, economics, and political science, among others. In these silos, crowdsourcing takes names such as broadcast search, innovation tournaments, crowdfunding, community innovation, distributed innovation, collective intelligence, open source, crowdpower, and even open innovation. The book aims to assemble papers from as many of these silos as possible, since the ultimate potential of crowdsourcing research is likely to be attained only by bridging them. The papers provide a systematic overview of the research on crowdsourcing from different fields based on a more encompassing definition of the concept, its difference for innovation, and its value for both the private and public sectors.


1997 ◽  
Vol 25 (4) ◽  
pp. 38-47 ◽  
Author(s):  
Mary J. Granger ◽  
Elizabeth S. Adams ◽  
Christina Björkman ◽  
Don Gotterbarn ◽  
Diana D. Juettner ◽  
...  

Burns ◽  
2015 ◽  
Vol 41 (5) ◽  
pp. 1092-1099 ◽  
Author(s):  
Maryam Ahmadi ◽  
Jahanpour Alipour ◽  
Ali Mohammadi ◽  
Farid Khorami

2020 ◽  
Vol 8 ◽  
Author(s):  
Devasis Bassu ◽  
Peter W. Jones ◽  
Linda Ness ◽  
David Shallcross

Abstract In this paper, we present a theoretical foundation for a representation of a data set as a measure in a very large hierarchically parametrized family of positive measures, whose parameters can be computed explicitly (rather than estimated by optimization), and illustrate its applicability to a wide range of data types. The preprocessing step then consists of representing data sets as simple measures. The theoretical foundation consists of a dyadic product formula representation lemma, and a visualization theorem. We also define an additive multiscale noise model that can be used to sample from dyadic measures and a more general multiplicative multiscale noise model that can be used to perturb continuous functions, Borel measures, and dyadic measures. The first two results are based on theorems in [15, 3, 1]. The representation uses the very simple concept of a dyadic tree and hence is widely applicable, easily understood, and easily computed. Since the data sample is represented as a measure, subsequent analysis can exploit statistical and measure theoretic concepts and theories. Because the representation uses the very simple concept of a dyadic tree defined on the universe of a data set, and the parameters are simply and explicitly computable and easily interpretable and visualizable, we hope that this approach will be broadly useful to mathematicians, statisticians, and computer scientists who are intrigued by or involved in data science, including its mathematical foundations.
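The abstract emphasizes that the dyadic parameters are "computed explicitly (rather than estimated by optimization)" on a dyadic tree over the data's universe. A minimal sketch of that idea, assuming data normalized into [0, 1) and using illustrative names rather than the paper's notation, just counts the fraction of sample mass in each dyadic interval of a depth-d binary tree:

```python
# Minimal sketch: represent a sample in [0, 1) as a dyadic measure.
# Each node (level, j) of a depth-d binary tree holds the fraction of
# total mass in the dyadic interval [j/2^level, (j+1)/2^level); the
# parameters are computed directly by counting, not by optimization.
# (Names and indexing are illustrative, not the paper's notation.)
def dyadic_measure(points, depth):
    n = len(points)
    tree = {}
    for level in range(depth + 1):
        bins = 2 ** level
        counts = [0] * bins
        for x in points:
            counts[min(int(x * bins), bins - 1)] += 1
        for j, c in enumerate(counts):
            tree[(level, j)] = c / n   # mu of the dyadic interval
    return tree

mu = dyadic_measure([0.1, 0.2, 0.6, 0.9], depth=2)
print(mu[(0, 0)])   # total mass
print(mu[(1, 0)])   # mass of [0, 1/2)
```

Because the representation is just a dict keyed by tree node, subsequent analysis can treat the sample as a measure (e.g. compare two samples node by node), which is the reuse of measure-theoretic machinery the abstract advertises.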


2021 ◽  
Vol 19 (10) ◽  
pp. 2001-2010
Author(s):  
Yurii P. KISHKOVICH

Subject. This article discusses the use of information systems in various spheres of the Russian economy. Objectives. The article aims to assess the prospects for the use of cloud-based ITSM systems by Russian medium-sized and large companies. Methods. For the study, I used a comparative analysis. Results. The article finds that a cloud-based ITSM system significantly reduces the time of business processes and saves money. Conclusions. The practical application of cloud-based ITSM systems significantly expands the opportunities and capabilities of medium-sized and large enterprises and contributes to improving their financial performance. The enterprise's management system is an important factor on which the implementation of these systems depends.

