INCREASING QUALITY OF CLASSIFYING OBJECTS USING NEW METRICS OF CLUSTERING

VESTNIK OF ASTRAKHAN STATE TECHNICAL UNIVERSITY SERIES MANAGEMENT COMPUTER SCIENCE AND INFORMATICS ◽

10.24143/2072-9502-2019-4-106-114 ◽

2019 ◽

pp. 106-114

Author(s):

Iscandar Maratovich Azhmukhamedov ◽

Raisa Yurevna Demina

Keyword(s):

Machine Learning ◽

Rand Index ◽

The Other ◽

Similarity Matrix ◽

Clustering Method ◽

Subject Areas ◽

Learning Set ◽

Original Picture ◽

Object Similarity

The article addresses one of the main problems of machine learning: clustering objects. Clustering is widely used in subject areas such as marketing, sociology, and psychology. Clustering algorithms are, as a rule, based on a metric that reflects the distance between objects. In some cases, however, a distance between objects is not practical to use. In certain situations one object can be said to be similar to another while the second is not similar to the first; an original picture and its copy may serve as an example. For such cases the work proposes a measure of object similarity that shows how many features of one object are contained in another. A similarity matrix is built on this measure, and its analysis reveals clusters of mutually similar objects. When the proposed clustering method was tested, the Rand index (the proportion of object pairs correctly grouped together or kept apart) was 0.93. An algorithm is also proposed that forms a set of objects that are entirely different from one another. A set formed in this way can later serve as a learning set for classifiers and increase their recognition accuracy.
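The asymmetric measure and the Rand index mentioned in this abstract can be illustrated with a minimal sketch. It assumes each object is a set of features and that the similarity of A to B is the share of A's features also found in B; the function names and toy data are hypothetical and are not taken from the article.

```python
from itertools import combinations


def similarity(a: set, b: set) -> float:
    """Share of a's features that are also present in b (asymmetric)."""
    return len(a & b) / len(a) if a else 0.0


def similarity_matrix(objects):
    """Entry [i][j] is the similarity of object i to object j."""
    return [[similarity(a, b) for b in objects] for a in objects]


def rand_index(labels_true, labels_pred):
    """Proportion of object pairs on which two labelings agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy data (assumed): the copy lacks one feature of the original.
original = {"sky", "tree", "house", "signature"}
copy = {"sky", "tree", "house"}
matrix = similarity_matrix([original, copy])
# matrix[1][0] == 1.0: every feature of the copy is in the original
# matrix[0][1] == 0.75: the original has a feature the copy lacks
```

Because the matrix is not symmetric, a cluster of mutually similar objects would require both `matrix[i][j]` and `matrix[j][i]` to be high; the threshold for "high" is a design choice the abstract does not specify.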

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.21203/rs.3.rs-91905/v1 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Filip Ferdinand ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Prediction Accuracy ◽

Data Science ◽

State Of The Art ◽

Hybrid Models ◽

The Other ◽

Learning Models ◽

Comprehensive Review

This paper presents the state of the art of data science in economics. Advances in data science are investigated through a novel taxonomy of applications and methods, in three classes: deep learning models, ensemble models, and hybrid models. Application domains include the stock market, marketing, e-commerce, corporate banking, and cryptocurrency. The PRISMA method, a systematic literature review methodology, is used to ensure the quality of the survey. The findings reveal a trend toward hybrid models: more than 51% of the reviewed articles applied a hybrid model. In addition, based on the RMSE accuracy metric, hybrid models had higher prediction accuracy than other algorithms. The trend is nevertheless expected to move toward advances in deep learning models.
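For readers unfamiliar with the RMSE metric named above, here is a minimal sketch of the standard root-mean-square error. The function name and sample values are illustrative assumptions; the abstract does not give any data. A lower RMSE corresponds to higher prediction accuracy.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Assumed toy values: model_a tracks the targets more closely than model_b.
targets = [10.0, 12.0, 14.0]
model_a = [10.5, 11.5, 14.0]
model_b = [8.0, 15.0, 11.0]
better = "model_a" if rmse(targets, model_a) < rmse(targets, model_b) else "model_b"
```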

Comparison of synchronous and asynchronous parallelization of extreme surrogate-assisted multi-objective evolutionary algorithm

Natural Computing ◽

10.1007/s11047-020-09806-2 ◽

2020 ◽

Author(s):

Tomohiro Harada ◽

Misaki Kaidan ◽

Ruck Thawonmas

Keyword(s):

Machine Learning ◽

Evolutionary Algorithm ◽

Optimization Problems ◽

Computing Time ◽

The Other ◽

Multi Objective Optimization ◽

Multi Objective ◽

Evaluation Time ◽

Surrogate Function

This paper investigates the integration of a surrogate-assisted multi-objective evolutionary algorithm (MOEA) with a parallel computation scheme to reduce the computing time needed to obtain optimal solutions in evolutionary algorithms (EAs). A surrogate-assisted MOEA solves multi-objective optimization problems while estimating the evaluation of solutions with a surrogate function, which is produced by a machine learning model. This paper uses an extreme learning surrogate-assisted MOEA/D (ELMOEA/D), which combines a well-known MOEA, MOEA/D, with a machine learning technique, the extreme learning machine (ELM). A parallel MOEA, in turn, evaluates solutions in parallel on multiple computing nodes to accelerate the optimization process. We consider a synchronous and an asynchronous parallel MOEA as master-slave parallelization schemes for ELMOEA/D. We carry out an experiment on multi-objective optimization problems to compare the synchronous parallel ELMOEA/D with the asynchronous parallel ELMOEA/D. In the experiment, we simulate two settings for the evaluation time of solutions: in one, the evaluation time follows a normal distribution with different variances; in the other, the evaluation time correlates with the objective function value. We compare the quality of the solutions obtained by the parallel ELMOEA/D variants within a given computing time. The experimental results show that parallelizing ELMOEA/D significantly reduces the computational time. In addition, ELMOEA/D with the asynchronous parallelization scheme obtains higher-quality solutions more quickly than the synchronous parallel ELMOEA/D.
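The difference between the two schemes can be sketched with a toy timing model. This is only an illustration under simplifying assumptions, not the paper's experiment: it ignores the evolutionary algorithm and the surrogate entirely and only models how long a fixed list of evaluations takes. In the synchronous case the master waits for a whole batch; in the asynchronous case it dispatches the next solution as soon as any worker is idle. All names and values are hypothetical.

```python
import heapq


def synchronous_time(eval_times, n_workers):
    """Total time when each batch waits for its slowest evaluation."""
    return sum(
        max(eval_times[i:i + n_workers])
        for i in range(0, len(eval_times), n_workers)
    )


def asynchronous_time(eval_times, n_workers):
    """Total time when the next solution goes to the first idle worker."""
    free_at = [0.0] * n_workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for t in eval_times:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + t)
    return max(free_at)


# Assumed evaluation times with high variance, on two workers.
times = [1, 4, 1, 4]
sync_total = synchronous_time(times, 2)    # 8: each batch costs 4
async_total = asynchronous_time(times, 2)  # 6.0: less idle time
```

When all evaluation times are equal the two totals coincide, which is consistent with the abstract's choice to vary the evaluation-time distribution in its experiments.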

Practical Web Spam Lifelong Machine Learning System with Automatic Adjustment to Current Lifecycle Phase

Security and Communication Networks ◽

10.1155/2019/6587020 ◽

2019 ◽

Vol 2019 ◽

pp. 1-16 ◽

Cited By ~ 1

Author(s):

Marcin Luckner

Keyword(s):

Machine Learning ◽

Real Data ◽

Recognition System ◽

Learning System ◽

Machine Learning Techniques ◽

External Data ◽

Recognition Quality ◽

Web Spam ◽

Learning Set

Machine learning techniques are a standard approach in spam detection. Their quality depends on the quality of the learning set, and when the set is out of date, the quality of classification falls rapidly. The most popular public web spam dataset that can be used to train a spam detector, WEBSPAM-UK2007, is over ten years old. There is therefore a place for a lifelong machine learning system that can replace detectors based on a static learning set. In this paper, we propose a novel web spam recognition system. The system automatically rebuilds the learning set to avoid classification based on outdated data. Using built-in automatic selection of the active classifier, the system very quickly attains productive accuracy despite a limited learning set. Moreover, the system automatically rebuilds the learning set using external data from spam traps and popular web services. A test on real data from Quora, Reddit, and Stack Overflow demonstrated high recognition quality. Both the average accuracy and the F-measure obtained were 0.98 and 0.96 for the semiautomatic and fully automatic modes, respectively.
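The two quality measures reported above can be computed from confusion-matrix counts. The sketch below assumes the balanced F-measure (F1); the abstract does not say which variant was used, and the counts are made up for illustration.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def f_measure(tp, fp, fn):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Assumed counts: 8 spam pages caught, 2 false alarms,
# 9 legitimate pages passed, 1 spam page missed.
acc = accuracy(tp=8, fp=2, tn=9, fn=1)  # 17 of 20 correct
f1 = f_measure(tp=8, fp=2, fn=1)        # 16 / 19
```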

Data Science in Economics: Comprehensive Review of Advanced Machine Learning and Deep Learning Methods

10.20944/preprints202010.0263.v1 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Filip Ferdinand ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Prediction Accuracy ◽

Data Science ◽

State Of The Art ◽

Hybrid Models ◽

The Other ◽

Learning Models ◽

Comprehensive Review

This paper presents the state of the art of data science in economics. Advances in data science are investigated through a novel taxonomy of applications and methods, in three classes: deep learning models, ensemble models, and hybrid models. Application domains include the stock market, marketing, e-commerce, corporate banking, and cryptocurrency. The PRISMA method, a systematic literature review methodology, is used to ensure the quality of the survey. The findings reveal a trend toward hybrid models: more than 51% of the reviewed articles applied a hybrid model. In addition, based on the RMSE accuracy metric, hybrid models had higher prediction accuracy than other algorithms. The trend is nevertheless expected to move toward advances in deep learning models.

Fast Enhanced Exemplar-Based Clustering for Incomplete EEG Signals

Computational and Mathematical Methods in Medicine ◽

10.1155/2020/4147807 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11

Author(s):

Anqi Bi ◽

Wenhao Ying ◽

Lu Zhao

Keyword(s):

Machine Learning ◽

Incomplete Data ◽

Clustering Algorithm ◽

Target Function ◽

The Other ◽

Eeg Signal ◽

Similarity Matrix ◽

Pairwise Similarity ◽

Eeg Signals ◽

Brain Science

The diagnosis and treatment of epilepsy is a significant direction for both machine learning and brain science. This paper proposes a fast enhanced exemplar-based clustering (FEEC) method for incomplete EEG signals. The algorithm first compresses the potential exemplar list and reduces the pairwise similarity matrix. After processing the most complete data in the first stage, FEEC extends the few incomplete data into the exemplar list. A new compressed similarity matrix is then constructed, and the scale of this matrix is greatly reduced. Finally, FEEC optimizes the new target function with the enhanced α-expansion move method. In addition, owing to the pairwise relationship, FEEC improves the generalization of the algorithm. The performance of the proposed clustering algorithm is comprehensively verified against other exemplar-based models in experiments on two datasets.

K-Means Algorithm for Clustering Afaan Oromo Text Documents using Python Tools

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a2284.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1279-1282

Keyword(s):

Machine Learning ◽

Cluster Analysis ◽

Clustering Analysis ◽

Low Cost ◽

Research Work ◽

The Other ◽

Time Requirement ◽

Text Documents ◽

News Agencies

With the advancement of technology and the proliferation of computers in the country, the volume of Afaan Oromo news documents is growing, which makes it difficult for news agencies to organize such large collections of documents manually. To address this problem, this research uses unsupervised machine learning tools in Python to cluster Afaan Oromo news documents at low cost and with the best possible clustering quality. The work focuses on k-means cluster analysis, which produced better results than the other cluster analysis methods considered, in terms of both time requirement and the quality of the clusters produced.
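A minimal sketch of k-means on text documents follows, using only the Python standard library. It is not the authors' pipeline: the abstract does not name the tools or features used, so the term-count vectors, the fixed initial centroids, and the English toy documents (standing in for Afaan Oromo news items) are all assumptions.

```python
from collections import Counter


def vectorize(docs):
    """Turn each document into a term-count vector over a shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([float(counts[word]) for word in vocab])
    return vectors


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def k_means(vectors, initial_centroids, max_iter=100):
    """Lloyd's algorithm: assign to the nearest centroid, then recompute means."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Assumed toy corpus: two sports items and two politics items.
docs = [
    "football match goal",
    "football team goal",
    "election vote parliament",
    "election vote minister",
]
vectors = vectorize(docs)
labels = k_means(vectors, initial_centroids=[vectors[0], vectors[2]])
# labels == [0, 0, 1, 1]
```

Fixing the initial centroids keeps the toy run deterministic; in practice k-means is usually restarted from several random initializations, and raw term counts are often replaced by weighted features such as TF-IDF.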

Immunoferritin localization of intracellular antigens in positively stained frozen sections

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s042482010008170x ◽

1978 ◽

Vol 36 (2) ◽

pp. 168-169

Author(s):

K. T. Tokuyasu

Keyword(s):

Surface Tension ◽

Water Surface ◽

Thin Layers ◽

The Other ◽

Positive Staining ◽

Frozen Sections ◽

The Past ◽

Ultrathin Frozen Sections ◽

Definition Of

During past investigations of immunoferritin localization of intracellular antigens in ultrathin frozen sections, we found that the degree of negative staining required to delineate ultrastructural details was often too dense for the recognition of ferritin particles. The quality of positive staining of ultrathin frozen sections, on the other hand, has generally been far inferior to that attainable in conventional plastic embedded sections, particularly in the definition of membranes. As we discussed before, a main cause of this difficulty seemed to be the vulnerability of frozen sections to the damaging effects of air-water surface tension at the time of drying of the sections. Indeed, we found that the quality of positive staining is greatly improved when positively stained frozen sections are protected against the effects of surface tension by embedding them in thin layers of mechanically stable materials at the time of drying (unpublished).

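PENERAPAN iLEARNING SURVEY (iSur) DALAM MENINGKATKAN KUALITAS SISTEM INFORMASI SELAMA PROSES PEMBELAJARAN DI PERGURUAN TINGGI RAHARJA [Implementation of iLearning Survey (iSur) in Improving Information System Quality During the Learning Process at Perguruan Tinggi Raharja]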

CCIT Journal ◽

10.33050/ccit.v7i3.258 ◽

2014 ◽

Vol 7 (3) ◽

pp. 335-354

Author(s):

Untung Rahardja ◽

Muhamad Yusup ◽

Ana Nurmaliana

Keyword(s):

Higher Education ◽

Student Satisfaction ◽

Teaching And Learning ◽

Online Survey ◽

The Other ◽

Learning Activities ◽

Faculty Performance ◽

Satisfaction Measurement ◽

Valid Information

Accuracy and reliability determine the quality of information: the more accurate and reliable the information, the better its quality. The same holds for a survey: the better the survey, the more accurate the information it provides. Measuring student satisfaction with teaching and learning activities is important to the quality of lectures, because it provides feedback on the assessed variables and guides future improvement. Higher Education Prog has likewise measured student satisfaction through a questionnaire distributed at the end of the semester for each class lecture. However, 7 (seven) problems were identified in the process of distributing the questionnaire. These problems can be resolved in 3 (three) ways, one of which is the iLearning Survey (iSur) system, which provides students with an online survey that can be accessed anywhere and at any time. The implementation presents a prototype of iSur. It can be concluded that the iSur system can help maximize the decisions taken by Higher Education Prog. With iSur, questions and evaluation forms are submitted to students and to the other colleges in order to assess how far the campus has developed and how faculty perform in teaching their classes, and iSur can serve as a medium of valid information for assessing activities throughout the college.

Determining the location of postal centers in B&H using machine learning clustering method and GIS

2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO) ◽

10.23919/mipro48935.2020.9245087 ◽

2020 ◽

Author(s):

Amel Kosovac ◽

Ermin Muharemovic ◽

Muhamed Begovic ◽

Edvin Simic

Keyword(s):

Machine Learning ◽

Clustering Method

A Literature Review Study of Software Defect Prediction using Machine Learning Techniques

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v6i6.286 ◽

2018 ◽

Vol 6 (6) ◽

pp. 300 ◽

Cited By ~ 3

Author(s):

Feidu Akmel ◽

Ermiyas Birihanu ◽

Bahir Siraj

Keyword(s):

Machine Learning ◽

Software Metrics ◽

Quality Standard ◽

Machine Learning Techniques ◽

Software Systems ◽

Health Care Insurance ◽

Software Defect ◽

Learning Techniques ◽

Software Product

Software systems are software products or applications that support business domains such as manufacturing, aviation, health care, and insurance. Software quality is a measure of how software is designed and how well the software conforms to that design. Some of the variables considered for software quality are correctness, product quality, scalability, completeness, and absence of bugs. However, the quality standard used by one organization differs from that used by another, so it is better to apply software metrics to measure the quality of software. Attributes gathered from source code through software metrics can serve as input to a software defect predictor. Software defects are errors introduced by software developers and stakeholders. Finally, this study reviews the applications of machine learning to software defects gathered from previous research works.
