A Study on Machine Learning for Imbalanced Datasets with Answer Validation of Question Answering

Author(s): Min-Yuh Day, Cheng-Chia Tsai
2021, pp. 1-12
Author(s): Melesio Crespo-Sanchez, Ivan Lopez-Arevalo, Edwin Aldana-Bobadilla, Alejandro Molina-Villegas

In the last few years, text analysis has become a keystone in several domains for solving real-world problems such as machine translation, spam detection, and question answering, to mention a few. Many of these tasks can be approached with machine learning algorithms, most of which take as input a transformation of the text into feature vectors that abstract its content. Most recent vector representations focus on the semantic component of text; however, we consider that also taking the lexical and syntactic components into account could benefit learning tasks. In this work, we propose a content spectral-based text representation applicable to machine learning algorithms for text analysis. This representation integrates the spectra of the lexical, syntactic, and semantic components of text, derived from their feature vectors, into an abstract image that can be processed by both text and image learning algorithms. To demonstrate the merit of our proposal, we tested it on text classification and reading complexity score prediction tasks, obtaining promising results.
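A minimal sketch of the general idea, stacking the three component feature vectors into an image-like matrix; the function name, fixed width, and normalization are illustrative assumptions, not the authors' construction.

```python
import numpy as np

def spectral_text_image(lexical_vec, syntactic_vec, semantic_vec, width=64):
    """Stack three component 'spectra' into a small 2-D array that can be
    fed to either a vector-based or an image-based learning algorithm."""
    channels = []
    for vec in (lexical_vec, syntactic_vec, semantic_vec):
        v = np.asarray(vec, dtype=np.float32)
        # pad or truncate each spectrum to a common width
        v = np.pad(v, (0, max(0, width - v.size)))[:width]
        # min-max normalize so all channels share a comparable scale
        span = v.max() - v.min()
        channels.append((v - v.min()) / span if span > 0 else v)
    # rows = components, columns = spectrum bins -> a 3 x width "image"
    return np.stack(channels, axis=0)

# toy usage with random stand-ins for the three feature vectors
rng = np.random.default_rng(0)
img = spectral_text_image(rng.random(50), rng.random(80), rng.random(64))
print(img.shape)  # (3, 64)
```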


2018, Vol 18 (3-4), pp. 623-637
Author(s): Arindam Mitra, Chitta Baral

Abstract: Over the years, the Artificial Intelligence (AI) community has produced several datasets that have given machine learning algorithms the opportunity to learn various skills across various domains. However, a subclass of these algorithms aimed at learning logic programs, namely Inductive Logic Programming algorithms, has often failed at the task due to the vastness of these datasets. This has limited the usability of knowledge representation and reasoning techniques in the development of AI systems. In this research, we address this scalability issue for algorithms that learn answer set programs. We present a sound and complete algorithm which takes the input in a slightly different manner and performs an efficient and more user-controlled search for a solution. We show via experiments that our algorithm can learn from two popular datasets from the machine learning community, namely bAbI (a question answering dataset) and MNIST (a dataset for handwritten digit recognition), which to the best of our knowledge was not previously possible. The system is publicly available at https://goo.gl/KdWAcV.


2016, Vol 7 (2), pp. 43-71
Author(s): Sangeeta Lal, Neetu Sardana, Ashish Sureka

Logging is an important yet difficult decision for OSS developers. Machine-learning models are useful in improving several steps of OSS development, including logging. Several recent studies propose machine-learning models to predict logged code constructs, but the prediction performance of these models is limited by the class-imbalance problem, since logged code constructs are far outnumbered by non-logged ones. No previous study analyzes the class-imbalance problem for logged code construct prediction. The authors first analyze the performance of J48, RF, and SVM classifiers for predicting logged catch-blocks and if-blocks on imbalanced datasets. Second, the authors propose LogIm, an ensemble and threshold-based machine-learning model. Third, the authors evaluate the performance of LogIm on three open-source projects. On average, LogIm improves the performance of the baseline classifiers J48, RF, and SVM by 7.38%, 9.24%, and 4.6% for catch-block logging prediction, and by 12.11%, 14.95%, and 19.13% for if-block logging prediction.
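A minimal sketch of an averaged-probability ensemble with a tuned decision threshold, in the spirit of a threshold-based model such as LogIm; the classifier choices, toy data, and threshold search are assumptions for illustration, not the authors' implementation (scikit-learn's DecisionTreeClassifier stands in for Weka's J48).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy data: roughly 5% positive ("logged") examples.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

members = [
    DecisionTreeClassifier(random_state=0),        # stand-in for J48
    RandomForestClassifier(n_estimators=100, random_state=0),
    SVC(probability=True, random_state=0),
]
# Average the positive-class probabilities of the ensemble members.
probs = np.mean([m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in members], axis=0)

# Move the decision threshold instead of using the default 0.5
# (tuned on held-out data here for brevity; a separate validation split is cleaner).
thresholds = np.linspace(0.05, 0.95, 19)
best = max(thresholds, key=lambda t: f1_score(y_te, probs >= t))
print(f"best threshold={best:.2f}, F1={f1_score(y_te, probs >= best):.3f}")
```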


2020, Vol 10 (3), pp. 20-34
Author(s): Lawrence Master

There are many applications for ranking, including page search, question answering, recommender systems, sentiment analysis, and collaborative filtering, to name a few. In the past several years, machine learning and information retrieval techniques have been used to develop ranking algorithms, and several listwise approaches to learning to rank have emerged. We propose two new methods, GeneticListMLE++ and GeneticListNet++, which build on the original ListMLE and ListNet algorithms. Our methods substantially improve on the original ListMLE and ListNet ranking approaches by incorporating genetic optimization of hyperparameters, a nonlinear neural network ranking model, and a regularization technique.
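A minimal sketch of the ListMLE listwise loss (the Plackett-Luce log-likelihood of the ground-truth ordering) that these methods build on; the tiny scoring network and tensor shapes are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

def listmle_loss(scores: torch.Tensor, relevance: torch.Tensor) -> torch.Tensor:
    """scores, relevance: shape (list_size,) for a single query."""
    # Order documents by descending ground-truth relevance.
    order = torch.argsort(relevance, descending=True)
    s = scores[order]
    # -log P(ordering) = sum_i [ logsumexp(s_i, ..., s_n) - s_i ]
    rev_logcumsum = torch.logcumsumexp(s.flip(0), dim=0).flip(0)
    return (rev_logcumsum - s).sum()

# Toy usage: a small nonlinear scorer over 10 documents with 5 features each.
scorer = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 1))
docs = torch.randn(10, 5)
relevance = torch.randint(0, 5, (10,)).float()
loss = listmle_loss(scorer(docs).squeeze(-1), relevance)
loss.backward()
print(loss.item())
```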


Author(s): Antonio Juárez-González, Alberto Téllez-Valero, Claudia Denicia-Carral, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda
