BULNER: BUg Localization with word embeddings and NEtwork Regularization

Bug localization (BL) from bug reports is a strategic activity in the software maintenance process. Because BL is costly and tedious, information retrieval-based and machine learning-based BL techniques could aid software engineers. We propose a method for BUg Localization with word embeddings and Network Regularization (BULNER). Preliminary results suggest that BULNER performs better than two state-of-the-art methods.

Improvement in bug localization based on kernel extreme learning machine

Journal of Communications Technology Electronics and Computer Science ◽

10.22385/jctecs.v5i0.77 ◽

2016 ◽

Vol 5 ◽

pp. 1

Author(s):

Marzie Rahmati ◽

Mohammad Ali Zare Chahooki

Keyword(s):

Machine Learning ◽

Information Retrieval ◽

Extreme Learning Machine ◽

Bug Localization ◽

Learning Methods ◽

Bug Reports ◽

Bug Report ◽

Mozilla Firefox ◽

Learning Machine ◽

Information Retrieval Methods

Bug localization uses bug reports received from users, developers and testers to locate buggy files. Since finding a buggy file among thousands of files is time-consuming and tedious for developers, various methods based on information retrieval have been suggested to automate this process. In addition to information retrieval methods, machine learning methods are also used for bug localization. The machine learning-based approach improves the description of bug reports and program code by representing them as feature vectors. Learning based on the Extreme Learning Machine (ELM) has recently proved effective in many areas. This paper shows the effectiveness of the non-linear kernel ELM in bug localization. Furthermore, the effectiveness of different kernels in ELM compared to other kernel-based learning methods is analyzed. The experimental results on the Mozilla Firefox dataset show the effectiveness of kernel ELM for bug localization in software projects.

Multi-hop assortativities for network classification

Journal of Complex Networks ◽

10.1093/comnet/cny034 ◽

2018 ◽

Vol 7 (4) ◽

pp. 603-622 ◽

Cited By ~ 1

Author(s):

Leonardo Gutiérrez-Gómez ◽

Jean-Charles Delvenne

Keyword(s):

Machine Learning ◽

Scientific Collaboration ◽

State Of The Art ◽

Medical Engineering ◽

Research Field ◽

Classification Task ◽

Collaboration Network ◽

Structural Patterns ◽

Art Methods

Several social, medical, engineering and biological challenges rely on discovering the functionality of networks from their structure and node metadata, when it is available. For example, in chemoinformatics one might want to detect whether a molecule is toxic based on structure and atomic types, or discover the research field of a scientific collaboration network. Existing techniques rely on counting or measuring structural patterns that are known to show large variations from network to network, such as the number of triangles, or the assortativity of node metadata. We introduce the concept of multi-hop assortativity, which captures the similarity of the nodes situated at the extremities of a randomly selected path of a given length. We show that multi-hop assortativity unifies various existing concepts and offers a versatile family of ‘fingerprints’ to characterize networks. These fingerprints in turn make it possible to recover the functionalities of a network with the help of the machine learning toolbox. Our method is evaluated empirically on established social and chemoinformatic network benchmarks. Results reveal that our assortativity-based features are competitive, providing highly accurate results and often outperforming state-of-the-art methods for the network classification task.
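As a rough illustration of the idea in this abstract, the sketch below computes a categorical assortativity over the two endpoints of a random walk of a given number of hops. It assumes an undirected graph given as a dense adjacency matrix with no isolated nodes, a walk started from the degree-proportional stationary distribution, and Newman-style normalization. The function name `multi_hop_assortativity` and these modelling choices are assumptions made for illustration and may differ from the paper's exact definition.

```python
def multi_hop_assortativity(adj, labels, hops):
    """Assortativity of node labels at the two ends of a `hops`-step random walk.

    Assumptions (not taken from the paper): undirected graph, every node has
    degree > 0, and the labels take at least two distinct values.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    total = sum(degree)  # equals twice the number of edges

    # Row-stochastic transition matrix of the simple random walk.
    step = [[adj[i][j] / degree[i] for j in range(n)] for i in range(n)]

    # walk = step ** hops, computed by repeated multiplication.
    walk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(hops):
        walk = [[sum(walk[i][k] * step[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]

    # Joint distribution of (start label, end label) under a stationary start.
    classes = sorted(set(labels))
    index = {c: k for k, c in enumerate(classes)}
    size = len(classes)
    joint = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            joint[index[labels[i]]][index[labels[j]]] += degree[i] / total * walk[i][j]

    # Newman-style normalization: observed agreement versus chance agreement.
    observed = sum(joint[k][k] for k in range(size))
    start = [sum(row) for row in joint]
    end = [sum(joint[r][c] for r in range(size)) for c in range(size)]
    chance = sum(s * e for s, e in zip(start, end))
    return (observed - chance) / (1 - chance)
```

With `hops=1` this reduces to the usual edge-based categorical assortativity. A vector of values for several hop counts could then serve as a feature vector for a classifier, which is one possible reading of the "fingerprints" mentioned above.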

Comparative Quality Estimation for Machine Translation Observations on Machine Learning and Features

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0029 ◽

2017 ◽

Vol 108 (1) ◽

pp. 307-318 ◽

Cited By ~ 1

Author(s):

Eleftherios Avramidis

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Machine Translation ◽

State Of The Art ◽

Linear Method ◽

The State ◽

Quality Estimation ◽

Art Methods ◽

Improved Performance

A deeper analysis on Comparative Quality Estimation is presented by extending the state-of-the-art methods with adequacy and grammatical features from other Quality Estimation tasks. The previously used linear method, unable to cope with the augmented features, is replaced with a boosting classifier assisted by feature selection. The methods indicated show improved performance for 6 language pairs, when applied on the output from MT systems developed over 7 years. The improved models compete better with reference-aware metrics. Notable conclusions are reached through the examination of the contribution of the features in the models, whereas it is possible to identify common MT errors that are captured by the features. Many grammatical/fluency features have a good contribution, few adequacy features have some contribution, whereas source complexity features are of no use. The importance of many fluency and adequacy features is language-specific.

Comparing Deep-Learning Architectures and Traditional Machine-Learning Approaches for Satire Identification in Spanish Tweets

Mathematics ◽

10.3390/math8112075 ◽

2020 ◽

Vol 8 (11) ◽

pp. 2075

Author(s):

Óscar Apolinario-Arzube ◽

José Antonio García-Díaz ◽

José Medina-Moreira ◽

Harry Luna-Aveiga ◽

Rafael Valencia-García

Keyword(s):

Machine Learning ◽

Deep Learning ◽

User Interfaces ◽

State Of The Art ◽

Learning Approaches ◽

Word Embeddings ◽

Linguistic Features ◽

Intended Meaning ◽

Language User ◽

Learning Architectures

Automatic satire identification can help to identify texts in which the intended meaning differs from the literal meaning, improving tasks such as sentiment analysis, fake news detection or natural-language user interfaces. Typically, satire identification is performed by training a supervised classifier for finding linguistic clues that can determine whether a text is satirical or not. For this, the state-of-the-art relies on neural networks fed with word embeddings that are capable of learning interesting characteristics regarding the way humans communicate. However, as far as our knowledge goes, there are no comprehensive studies that evaluate these techniques in Spanish in the satire identification domain. Consequently, in this work we evaluate several deep-learning architectures with Spanish pre-trained word-embeddings and compare the results with strong baselines based on term-counting features. This evaluation is performed with two datasets that contain satirical and non-satirical tweets written in two Spanish variants: European Spanish and Mexican Spanish. Our experimentation revealed that term-counting features achieved similar results to deep-learning approaches based on word-embeddings, both outperforming previous results based on linguistic features. Our results suggest that term-counting features and traditional machine learning models provide competitive results regarding automatic satire identification, slightly outperforming state-of-the-art models.
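The term-counting baseline mentioned in this abstract can be pictured with a minimal bag-of-words sketch. The abstract does not say how the authors tokenized or weighted terms, so the tokenizer and the helper names `term_counts` and `to_vector` are hypothetical choices made only for illustration.

```python
import re
from collections import Counter


def term_counts(text):
    """Count lower-cased word tokens in a text (a simple, assumed tokenizer)."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


def to_vector(counts, vocabulary):
    """Project term counts onto a fixed vocabulary, giving one feature per term."""
    return [counts.get(term, 0) for term in vocabulary]
```

Vectors built this way could be fed to a traditional classifier, in contrast to the dense word embeddings used by the deep-learning architectures.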

A Topological Method for Comparing Document Semantics

10.5121/csit.2020.101411 ◽

2020 ◽

Author(s):

Yuqi Kong ◽

Fanchao Meng ◽

Ben Carterette

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Language Processing ◽

State Of The Art ◽

Vector Space Model ◽

The Other ◽

Space Model ◽

Topological Persistence ◽

Art Methods ◽

Novel Algorithm

Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. On the one hand, tools for this task are still rare. On the other hand, most relevant methods are devised from the statistical or vector space model perspectives, but almost none from a topological perspective. In this paper, we hope to offer a different approach. A novel algorithm based on topological persistence for comparing semantic similarity between two documents is proposed. Our experiments are conducted on a document dataset with human judges’ results. A collection of state-of-the-art methods is selected for comparison. The experimental results show that our algorithm can produce highly human-consistent results, and also beats most state-of-the-art methods, though it ties with NLTK.

Bug Reports and Deep Learning Models

International Journal of Computer Science and Mobile Computing ◽

10.47760/ijcsmc.2021.v10i12.003 ◽

2021 ◽

Vol 10 (12) ◽

pp. 21-26

Author(s):

Som Gupta ◽

Sanjai Kumar Gupta

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Research Area ◽

Learning Approaches ◽

Bug Localization ◽

Future Directions ◽

Bug Reports ◽

Bug Report ◽

The Future

Deep learning is one of the emerging and trending research areas of machine learning in various domains. The paper describes the deep learning approaches applied to the domain of bug reports. The paper classifies the tasks performed when mining bug reports into Bug Report Classification, Bug Localization, Bug Report Summarization and Duplicate Bug Report Detection. The paper systematically discusses the deep learning approaches used for these tasks, and the future directions in this field of research.

Machine Learning-Based State-of-the-Art Methods for the Classification of RNA-Seq Data

Lecture Notes in Computational Vision and Biomechanics - Classification in BioApps ◽

10.1007/978-3-319-65981-7_6 ◽

2017 ◽

pp. 133-172 ◽

Cited By ~ 7

Author(s):

Almas Jabeen ◽

Nadeem Ahmad ◽

Khalid Raza

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Rna Seq ◽

Art Methods

Study of Information Retrieval and Machine Learning-Based Software Bug Localization Models

Advances in Computing and Intelligent Systems - Algorithms for Intelligent Systems ◽

10.1007/978-981-15-0222-4_47 ◽

2020 ◽

pp. 503-510

Author(s):

Tamanna ◽

Om Prakash Sangwan

Keyword(s):

Machine Learning ◽

Information Retrieval ◽

Bug Localization ◽

Software Bug

A Method for Recommending Bug Fixer Using Community Q&A Information

MATEC Web of Conferences ◽

10.1051/matecconf/201817303031 ◽

2018 ◽

Vol 173 ◽

pp. 03031

Author(s):

Qingjie Wei ◽

Jiao Liu ◽

Jun Chen

Keyword(s):

Machine Learning ◽

Information Retrieval ◽

Open Source Software ◽

Text Classification ◽

Classification Problem ◽

Software Projects ◽

Bug Fixing ◽

Bug Report ◽

Information Retrieval Methods ◽

Time Aware

Assigning a bug report to the most suitable fixer is a very time-consuming task in large open source software projects, so an effective method for recommending bug fixers is needed. Most research in this area translates the task into a text classification problem and uses machine learning or information retrieval methods to recommend the bug fixer. These methods are complex and overly dependent on the fixers’ prior bug-fixing activities. In this paper, we propose a more effective bug fixer recommendation method that uses community Q&A platforms (such as Stack Overflow) to measure the fixers’ expertise and uses the fixed bugs to measure the time-awareness of the fixers’ fixing work. The experimental results show that the proposed method is more accurate than most current restoration methods.

Predict COVID-19 Spreading With C-SMOTE

Business Information Systems ◽

10.52825/bis.v1i.45 ◽

2021 ◽

pp. 27-38

Author(s):

Alessio Bernardo ◽

Emanuele Della Valle

Keyword(s):

Machine Learning ◽

State Of The Art ◽

High Impact ◽

Statistical Evidence ◽

The Other ◽

Classification Algorithms ◽

Minority Class ◽

Art Methods ◽

Concept Drifts

Data continuously gathered while monitoring the spread of the COVID-19 pandemic form an unbounded flow of data. Accurately forecasting whether infections will increase or decrease has a high impact, but it is challenging because the pandemic spreads and contracts periodically. Technically, the flow of data is said to be imbalanced and subject to concept drifts, because signs of decrease are the minority class during the spreading periods and become the majority class in the contraction periods, and vice versa for signs of increase. In this paper, we propose a case study applying the Continuous Synthetic Minority Oversampling Technique (C-SMOTE), a novel meta-strategy to pipeline with Streaming Machine Learning (SML) classification algorithms, to forecast the COVID-19 pandemic trend. Benchmarking SML pipelines that use C-SMOTE against state-of-the-art methods on a COVID-19 dataset, we bring statistical evidence that models learned using C-SMOTE are better.
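The abstract does not describe the internals of C-SMOTE, so the sketch below shows only the classic batch SMOTE interpolation step that the name refers to: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch` and its parameters are hypothetical, and the streaming and drift-handling parts of C-SMOTE are not modelled here.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic points from a list of minority-class tuples.

    Plain batch SMOTE interpolation (an assumption for illustration), not the
    streaming C-SMOTE strategy. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position between base and its neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic
```

Because each synthetic point is a convex combination of two real minority samples, it always stays within the region spanned by the minority class.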
