error removal
Recently Published Documents


TOTAL DOCUMENTS: 54 (last five years: 15)
H-INDEX: 13 (last five years: 3)

Author(s): Yanan Du, Haiqiang Fu, Lin Liu, Guangcai Feng, Xing Peng, et al.
2021, Vol. 42 (20), pp. 7863-7879
Author(s): Ram Narayan Patro, Subhashree Subudhi, Pradyut Kumar Biswal, Fabio Dell’Acqua, Harish Kumar Sahoo
2021, Vol. 14 (6), pp. 1040-1052
Author(s): Haibo Wang, Chaoyi Ma, Olufemi O. Odegbile, Shigang Chen, Jih-Kwon Peir

Measuring flow spread in real time from large, high-rate data streams has numerous practical applications, where a data stream is modeled as a sequence of data items from different flows and the spread of a flow is the number of distinct items in the flow. Past decades have witnessed tremendous performance improvements in single-flow spread estimation. However, when dealing with numerous flows in a data stream, it remains a significant challenge to measure per-flow spread accurately while keeping the memory footprint small. The goal of this paper is to introduce new multi-flow spread estimation designs that incur much smaller processing overhead and query overhead than the state of the art, yet achieve significantly better estimation accuracy. We formally analyze the performance of these new designs. We implement them in both hardware and software, and use real-world data traces to evaluate them against the state of the art. The experimental results show that our best sketch significantly improves on the best existing work in estimation accuracy, data-item processing throughput, and online query throughput.
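As background for the abstract above, single-flow spread estimation (the well-studied setting the paper builds on for the multi-flow case) can be illustrated with a minimal linear-counting sketch. This is a generic textbook technique, not the paper's design; the flow and item names are invented for the example:

```python
import hashlib
import math

class LinearCounter:
    """Estimate a flow's spread (number of distinct items) with an
    m-bit bitmap: spread_hat = -m * ln(fraction of zero bits)."""

    def __init__(self, m=1024):
        self.m = m
        self.bits = bytearray(m)

    def add(self, item):
        # Deterministic hash so the sketch is reproducible.
        h = int.from_bytes(hashlib.blake2b(item.encode()).digest()[:8], "big")
        self.bits[h % self.m] = 1

    def estimate(self):
        zeros = self.bits.count(0)
        if zeros == 0:
            return float(self.m)  # bitmap saturated; lower bound only
        return -self.m * math.log(zeros / self.m)

# A stream of (flow_id, item) pairs; 3000 items but only 300 distinct.
counters = {}
for i in range(3000):
    flow_id, item = "flowA", f"item{i % 300}"
    counters.setdefault(flow_id, LinearCounter()).add(item)

est = counters["flowA"].estimate()  # close to the true spread of 300
```

Keeping one full counter per flow, as above, is exactly the memory cost the paper's shared-memory sketches are designed to avoid.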


2021, pp. 92-105
Author(s): Zhuolin Zheng, Yinzhang Ding, Xiaotian Tang, Yu Cai, Dongxiao Li, et al.

2020, Vol. 12 (16), pp. 2559
Author(s): Yuzhen Zhang, Shunlin Liang

Many advanced satellite estimation methods have been developed, but global forest aboveground biomass (AGB) products remain largely uncertain. In this study, we explored data fusion techniques to generate a global forest AGB map for the 2000s at 0.01-degree resolution with improved accuracy by integrating ten existing local or global maps. We proposed an error removal and simple averaging algorithm, which is efficient and makes no assumptions about the data and their associated errors, to integrate the ten forest AGB maps. We first compiled global reference AGB from in situ measurements and from high-resolution AGB data originally derived from field data and airborne lidar data, and determined the error of each forest AGB map at the pixels with corresponding reference AGB values. Based on the errors determined from the reference data, the pixel-by-pixel errors associated with each of the ten AGB datasets were estimated from multiple predictors (e.g., leaf area index, forest canopy height, forest cover, land surface elevation, slope, temperature, and precipitation) using the random forest algorithm. The estimated pixel-by-pixel errors were then removed from the corresponding forest AGB datasets, and the calibrated datasets were finally combined by simple averaging to produce the fused global forest AGB map. Cross-validation against the reference AGB data gave an R-squared of 0.61 and a root mean square error (RMSE) of 53.68 Mg/ha, better than the accuracies reported in the literature (R-squared of 0.56 and RMSE above 80 Mg/ha). Intercomparison with previous studies also showed that the fused AGB estimates were much closer to the reference values. By integrating existing forest AGB datasets into a global map with better accuracy, this study provides improved benchmarks of global forest carbon stocks and moves our understanding of the global terrestrial carbon cycle one step forward.
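The error removal and simple averaging idea described above can be sketched on synthetic data: learn each map's error at reference pixels with a random forest, predict and subtract that error everywhere, then average the calibrated maps. All data, predictor counts, and bias magnitudes below are invented for illustration, not the study's actual inputs:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_ref, n_all = 200, 1000

# Hypothetical per-pixel predictors (stand-ins for LAI, canopy
# height, elevation, etc.) and a synthetic "true" AGB surface.
X = rng.normal(size=(n_all, 3))
true_agb = 100 + 30 * X[:, 0]

# Two synthetic AGB "maps", each with a predictor-dependent bias.
maps = [true_agb + 20 * X[:, 1] + rng.normal(0, 5, n_all),
        true_agb - 15 * X[:, 2] + rng.normal(0, 5, n_all)]

# Reference AGB is known only at a subset of pixels.
ref_idx = rng.choice(n_all, n_ref, replace=False)

calibrated = []
for m in maps:
    # Error removal: fit each map's error at reference pixels,
    # predict it at every pixel, then subtract it.
    err_model = RandomForestRegressor(n_estimators=100, random_state=0)
    err_model.fit(X[ref_idx], (m - true_agb)[ref_idx])
    calibrated.append(m - err_model.predict(X))

# Simple averaging of the calibrated maps.
fused = np.mean(calibrated, axis=0)
rmse_before = np.sqrt(np.mean((np.mean(maps, axis=0) - true_agb) ** 2))
rmse_after = np.sqrt(np.mean((fused - true_agb) ** 2))
```

Because the biases here are deliberately predictor-dependent, the random forest can recover and remove most of them, so the fused RMSE drops well below that of naive averaging.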


DNA Research, 2020, Vol. 27 (3)
Author(s): Rei Kajitani, Dai Yoshimura, Yoshitoshi Ogura, Yasuhiro Gotoh, Tetsuya Hayashi, et al.

De novo assembly of short DNA reads remains an essential technology, especially for large-scale projects and high-resolution variant analyses in epidemiology. However, existing tools often lack the accuracy required to compare closely related strains. To facilitate such studies on bacterial genomes, we developed Platanus_B, a de novo assembler that employs iterations of multiple error-removal algorithms. Benchmarks demonstrated the superior accuracy and high contiguity of Platanus_B, as well as its ability to enhance the hybrid assembly of short and nanopore long reads. Although hybrid strategies for short and long reads were effective in achieving near full-length genomes, we found that short-read-only assemblies generated with Platanus_B were sufficient to obtain ≥90% of exact coding sequences in most cases. In addition, while nanopore long-read-only assemblies lacked fine-scale accuracy, including short reads was effective in improving it. Platanus_B can therefore be used for comprehensive genomic surveillance of bacterial pathogens and high-resolution phylogenomic analyses of a wide range of bacteria.
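Platanus_B's actual error-removal algorithms are not described in the abstract, but the general k-mer-spectrum idea behind iterative short-read error removal can be sketched as a toy: k-mers seen often are trusted, reads containing rare k-mers are dropped, and the spectrum is recomputed each round. The genome, reads, k, and coverage threshold below are all illustrative:

```python
from collections import Counter

def solid_kmers(reads, k, min_cov):
    """k-mers observed at least min_cov times across all reads are
    assumed to be error-free ('solid')."""
    counts = Counter(r[i:i + k] for r in reads
                     for i in range(len(r) - k + 1))
    return {km for km, c in counts.items() if c >= min_cov}

def remove_errors(reads, k=5, min_cov=2, rounds=2):
    """Iteratively drop reads containing any non-solid k-mer; each
    round's removals make the next round's spectrum cleaner."""
    for _ in range(rounds):
        solid = solid_kmers(reads, k, min_cov)
        reads = [r for r in reads
                 if all(r[i:i + k] in solid
                        for i in range(len(r) - k + 1))]
    return reads

genome = "ACGTACGGTTCAGCATTGCA"
# Clean 10 bp reads at 3x coverage, plus one read with a G->A error.
reads = [genome[i:i + 10] for i in range(0, 11, 2)] * 3
reads.append("ACGTACAGTT")
clean = remove_errors(reads)  # the erroneous read is filtered out
```

Real assemblers correct rather than discard where possible and combine several such passes; this sketch only shows why iterating helps.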


2020, Vol. 9 (4), pp. 940-952
Author(s): Jia Zhang, Yefei Wang, Baihui Chai, Jichao Wang, Lulu Li, et al.

2020, Vol. 19, pp. 117693512091795
Author(s): Zeinab Sajjadnia, Raof Khayami, Mohammad Reza Moosavi

In recent years, due to an increase in the incidence of different cancers, various data sources have become available in this field. Consequently, many researchers have become interested in discovering useful knowledge from the available data to support faster decision-making by doctors and reduce the negative consequences of such diseases. Data mining includes a set of useful techniques for knowledge discovery: detecting hidden patterns and finding unknown relations. However, these techniques face several challenges with real-world data. In particular, dealing with inconsistencies, errors, noise, and missing values requires appropriate preprocessing and data preparation procedures. In this article, we investigate the impact of preprocessing on providing high-quality data for classification techniques. A wide range of preprocessing and data preparation methods are studied, and a set of preprocessing steps was leveraged to obtain appropriate classification results. The preprocessing is done on a real-world breast cancer dataset from the Reza Radiation Oncology Center in Mashhad, which has numerous features and a large percentage of null values, and the results are reported in this article. To evaluate the impact of the preprocessing steps on the results of classification algorithms, the case study was divided into the following 3 experiments:

1. Breast cancer recurrence prediction without data preprocessing
2. Breast cancer recurrence prediction with error removal
3. Breast cancer recurrence prediction with error removal and filling of null values

Then, in each experiment, dimensionality reduction techniques are used to select a suitable subset of features for the problem at hand. Breast cancer recurrence prediction models are constructed using 3 widely used classification algorithms, namely, naïve Bayes, k-nearest neighbor, and sequential minimal optimization. The experiments are evaluated in terms of accuracy, sensitivity, F-measure, precision, and G-mean. Our results show that recurrence prediction improves significantly after data preprocessing, especially in terms of sensitivity, F-measure, precision, and G-mean.
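A preprocessing-then-classify setup of the kind described above (error removal followed by null-value imputation, then naïve Bayes) can be sketched with scikit-learn on synthetic data; the records, sentinel error code, and missingness rate below are hypothetical, not the Mashhad dataset:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)

# Hypothetical patient records: 5 numeric features, binary recurrence label.
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Inject the two data-quality problems that preprocessing must handle:
X[rng.random(X.shape) < 0.2] = np.nan   # missing values
X[:10, 2] = -999                        # out-of-range error codes

# Error removal: drop rows carrying the sentinel error value.
ok = ~np.any(X == -999, axis=1)
X, y = X[ok], y[ok]

# Fill null values (mean imputation), then classify with naive Bayes;
# the imputer runs inside the pipeline so each CV fold is imputed
# from its own training split only.
model = make_pipeline(SimpleImputer(strategy="mean"), GaussianNB())
f1 = cross_val_score(model, X, y, cv=5, scoring="f1").mean()
```

Swapping `GaussianNB` for a k-nearest-neighbor or SMO-style SVM classifier reproduces the article's three-model comparison on the same pipeline.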


2020, Vol. 22 (7), pp. 3789-3799
Author(s): Jorge Vargas, Peter Ufondu, Tunna Baruah, Yoh Yamamoto, Koblar A. Jackson, et al.

Removing self-interaction errors in density functional approximations results in significantly improved vertical detachment energies of water anions and is essential for obtaining orbital energies consistent with electron binding energies.
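As background (the abstract does not restate the correction scheme, though this author group typically works with Fermi–Löwdin orbital SIC), the standard Perdew–Zunger self-interaction correction subtracts, orbital by orbital, the spurious self-Coulomb and self-exchange-correlation energy from a density functional approximation (DFA):

```latex
E^{\mathrm{SIC}} \;=\; E^{\mathrm{DFA}}[\rho_\uparrow, \rho_\downarrow]
\;-\; \sum_{i\sigma}^{\mathrm{occ}}
\Big( U[\rho_{i\sigma}] \;+\; E_{xc}^{\mathrm{DFA}}[\rho_{i\sigma}, 0] \Big)
```

For any one-electron density the bracketed terms cancel exactly, which is why removing them shifts orbital energies toward physical electron binding energies, as the abstract reports for water anions.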

