A stacked ensemble for the detection of COVID-19 with high recall and accuracy

Author(s):  
Ebenezer Jangam ◽  
Chandra Sekhara Rao Annavarapu
2021 ◽  
Vol 19 (12) ◽  
pp. 2105-2112
Author(s):  
Matheus Vinicius Todescato ◽  
Jean Hilger ◽  
Guilherme Dal Bianco

2020 ◽  
Vol 38 (3) ◽  
pp. 1-35 ◽  
Author(s):  
Jie Zou ◽  
Evangelos Kanoulas

2013 ◽  
Vol 753-755 ◽  
pp. 3018-3024 ◽  
Author(s):  
Fen Gyu Yang ◽  
Ying Chen ◽  
Ye Zhang

As more data are collected across applications, record linkage must now handle millions of records. Traditional methods face a major performance challenge at this scale. Parallel computing frameworks such as MapReduce have become an efficient and practical way to address the problem. In this paper, we propose a practical 3-phase MapReduce approach that performs blocking, filtering, and linking in three consecutive processes on a Hadoop cluster. Experiments show that our approach runs efficiently and effectively while maintaining high recall compared with the traditional method.
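The blocking, filtering, and linking phases named in the abstract above can be illustrated with a minimal single-process sketch. This is not the paper's Hadoop implementation: the blocking key (a name prefix), the length-difference filter, the similarity measure (`difflib.SequenceMatcher`), and the 0.85 threshold are all assumptions chosen only to make the three phases concrete.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking(records, key_len=3):
    """Phase 1: group records by a cheap blocking key (assumed: lowercase name prefix)."""
    blocks = defaultdict(list)
    for rec_id, name in records:
        blocks[name.strip().lower()[:key_len]].append((rec_id, name))
    return blocks


def filtering(blocks, max_len_diff=3):
    """Phase 2: form candidate pairs inside each block, dropping pairs
    whose lengths differ too much to be plausible matches (assumed filter)."""
    for block in blocks.values():
        for (id_a, a), (id_b, b) in combinations(block, 2):
            if abs(len(a) - len(b)) <= max_len_diff:
                yield (id_a, a), (id_b, b)


def linking(candidates, threshold=0.85):
    """Phase 3: keep candidate pairs whose string similarity reaches the threshold."""
    links = []
    for (id_a, a), (id_b, b) in candidates:
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            links.append((id_a, id_b))
    return links


# Hypothetical toy records: (id, name)
records = [
    (1, "Jonathan Smith"),
    (2, "Jonathon Smith"),
    (3, "Jon Smith"),
    (4, "Maria Garcia"),
    (5, "Maria Garcia"),
]

links = linking(filtering(blocking(records)))
print(links)
```

In a MapReduce setting each phase would typically be its own job, with the blocking key serving as the shuffle key. The sketch also shows where recall can be lost: any true match that lands in a different block or is dropped by the filter never reaches the linking phase.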


Author(s):  
Nuthan Munaiah ◽  
Steven Kroh ◽  
Craig Cabrey ◽  
Meiyappan Nagappan

Software forges like GitHub host millions of repositories. Software engineering researchers have been able to take advantage of such a large corpus of potential study subjects with the help of tools like GHTorrent and Boa. However, the simplicity in querying comes with a caveat: there are limited means of separating the signal (e.g. repositories containing engineered software projects) from the noise (e.g. repositories containing homework assignments). The proportion of noise in a random sample of repositories could skew the study and may lead researchers to unrealistic, potentially inaccurate conclusions. We argue that it is imperative to have the ability to sieve out the noise in such large repository forges. We propose a framework, and present a reference implementation of the framework as a tool called reaper, to enable researchers to select GitHub repositories that contain evidence of an engineered software project. We identify software engineering practices (called dimensions) and propose means for validating their existence in a GitHub repository. We used reaper to measure the dimensions of 1,994,977 GitHub repositories. We then used the data set to train classifiers capable of predicting whether a given GitHub repository contains an engineered software project. The performance of the classifiers was evaluated using a set of 200 repositories with known ground truth classification. We also compared the performance of the classifiers to other approaches to classification (e.g. number of GitHub stargazers) and found our classifiers to outperform existing approaches. We found the stargazers-based classifier to exhibit high precision (96%) but a much lower recall (27%). On the other hand, our best classifier exhibited a high precision (82%) and a high recall (83%). The stargazers-based criterion offers precision but fails to recall a significant portion of the population.
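One way to compare the two precision/recall pairs reported above is to combine each into a single F1 score (the harmonic mean of precision and recall). The abstract does not report F1; the values below are derived here from its stated percentages purely as an illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as stated in the abstract
reported = {
    "stargazers-based": (0.96, 0.27),
    "best classifier": (0.82, 0.83),
}

f1 = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, score in f1.items():
    print(f"{name}: F1 = {score:.2f}")
# stargazers-based: F1 = 0.42
# best classifier: F1 = 0.82
```

Because the harmonic mean is dominated by the smaller of its two inputs, the stargazers-based classifier's low recall pulls its F1 far below that of the balanced classifier despite its higher precision.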


2019 ◽  
Vol 23 (1) ◽  
pp. 1-26 ◽  
Author(s):  
Haotian Zhang ◽  
Gordon V. Cormack ◽  
Maura R. Grossman ◽  
Mark D. Smucker
