sample weighting
Recently Published Documents


Total documents: 42 (five years: 14)

H-index: 6 (five years: 1)

Electronics ◽  
2021 ◽  
Vol 10 (19) ◽  
pp. 2443
Author(s):  
Tan  Yue ◽  
Yong Li ◽  
Zonghai Hu

The structure of a document contains rich information such as logical relations in context, hierarchy, affiliation, dependence, and applicability. It will greatly affect the accuracy of document information processing, particularly of legal documents and business contracts. Therefore, intelligent document structural analysis is important to information extraction and data mining. However, unlike the well-studied field of text semantic analysis, current work in document structural analysis is still scarce. In this paper, we propose an intelligent document structural analysis framework through data pre-processing, feature engineering, and structural classification with a dynamic sample weighting algorithm. As a typical application, we collect more than 11,000 insurance document content samples and carry out the machine learning experiments to check the efficiency of our framework. Meanwhile, to address the sample imbalance problem in the hierarchy classification task, a dynamic sample weighting algorithm is incorporated into our Dynamic Weighting Structural Analysis (DWSA) framework, in which the weights of different category tags according to the structural levels are iterated dynamically in training. Our results show that the DWSA has significantly improved the comprehensive accuracy and the classification F1-score of each category. The comprehensive accuracy is as high as 94.68% (3.36% absolute improvement) and the Macro F1-score is 88.29% (5.1% absolute improvement).
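
The abstract does not spell out how the category weights are iterated, so the following is only a minimal sketch of one plausible scheme, not the DWSA algorithm itself. It assumes a hypothetical update rule in which a class's weight grows when its recall in the previous epoch was low; the function name `update_class_weights`, the step size `lr`, and the example labels are all illustrative.

```python
from collections import Counter

def update_class_weights(y_true, y_pred, weights, lr=0.5):
    """Return new per-class weights after one epoch (hypothetical update rule)."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    new = {}
    for c in support:
        recall = hits[c] / support[c]
        # Classes that were recalled poorly get a larger weight next epoch.
        new[c] = weights.get(c, 1.0) * (1.0 + lr * (1.0 - recall))
    # Renormalize so the mean weight stays at 1.
    mean_w = sum(new.values()) / len(new)
    return {c: w / mean_w for c, w in new.items()}

# Made-up labels for three structural levels.
y_true = ["title", "body", "body", "body", "clause", "clause"]
y_pred = ["body", "body", "body", "body", "clause", "body"]
weights = update_class_weights(y_true, y_pred, {})
```

In a training loop, the returned dictionary would be fed back into the loss for the next epoch, so rare levels that keep being misclassified gradually gain influence.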


2021 ◽  
Vol 13 (16) ◽  
pp. 3233
Author(s):  
Pawel Slowak ◽  
Piotr Kaniewski

This paper presents a solution to the problem of simultaneous localization and mapping (SLAM), developed from a particle filter and using a monocular camera as its main sensor. It implements a novel sample-weighting idea, based on sorting particles into sets and separating those sets with an importance-factor offset. The grouping criterion for samples is the number of landmarks correctly matched by a given particle. This stratifies the samples and amplifies the differences between their weights. The proposed system is designed for a UAV navigating outdoors with a downward-pointed camera. The proposed method is evaluated against other sample-weighting approaches using simulated and real-world data. The experiments show that the developed SLAM solution is more accurate and robust than other particle-filter methods, and that it allows a smaller number of particles to be used, lowering the overall computational complexity.
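
The grouping-plus-offset idea can be illustrated with a short sketch. This is an assumption about how such a weighting might look, not the authors' implementation: likelihoods are scaled into [0, 1], and each group (identified by its matched-landmark count) is shifted by its rank times a hypothetical `offset`. With an offset above 1, every particle in a higher group outweighs every particle in a lower one.

```python
def stratified_weights(likelihoods, matched_counts, offset=1.5):
    """Normalized particle weights, stratified by matched-landmark count.

    likelihoods: per-particle measurement likelihoods (non-negative).
    matched_counts: landmarks each particle matched correctly.
    offset: hypothetical importance-factor gap between adjacent groups.
    """
    top = max(likelihoods) or 1.0
    levels = sorted(set(matched_counts))
    rank = {m: i for i, m in enumerate(levels)}
    # Scale likelihoods into [0, 1], then shift each group by rank * offset.
    raw = [lik / top + offset * rank[m]
           for lik, m in zip(likelihoods, matched_counts)]
    total = sum(raw)
    if total == 0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]

# Made-up values: particle 2 matched the most landmarks.
w = stratified_weights([0.2, 0.8, 0.4, 0.1], [3, 3, 5, 4])
```

Here particle 1 has the highest likelihood but sits in the lowest group, so it still ranks below particles 2 and 3.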


2021 ◽  
Author(s):  
Michael Steininger ◽  
Konstantin Kobs ◽  
Padraig Davidson ◽  
Anna Krause ◽  
Andreas Hotho

In many real-world settings, imbalanced data impedes the performance of learning algorithms such as neural networks, mostly for rare cases. This is especially problematic for tasks focusing on these rare occurrences. For example, when estimating precipitation, extreme rainfall events are scarce but important considering their potential consequences. While there are numerous well-studied solutions for classification settings, most of them cannot be applied to regression easily. Of the few solutions for regression tasks, barely any have explored cost-sensitive learning, which is known to have advantages compared to sampling-based methods in classification tasks. In this work, we propose a sample weighting approach for imbalanced regression datasets called DenseWeight and a cost-sensitive learning approach for neural network regression with imbalanced data called DenseLoss, which is based on our weighting scheme. DenseWeight weights data points according to the rarity of their target values, estimated through kernel density estimation (KDE). DenseLoss adjusts each data point’s influence on the loss according to DenseWeight, giving rare data points more influence on model training compared to common data points. We show on multiple differently distributed datasets that DenseLoss significantly improves model performance for rare data points through its density-based weighting scheme. Additionally, we compare DenseLoss to the state-of-the-art method SMOGN, finding that our method mostly yields better performance. Our approach provides more control over model training, as it enables us to actively decide on the trade-off between focusing on common or rare cases through a single hyperparameter, allowing the training of better models for rare data points.
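
A density-based weighting of this kind can be sketched in a few lines. The abstract only states that weights follow target rarity via KDE and that a single hyperparameter controls the trade-off, so the exact functional form below (min-max normalized density, a linear ramp controlled by `alpha`, a floor `eps`, and mean normalization) is an assumption, as is the fixed `bandwidth`.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a function estimating the density of `points` with Gaussian kernels."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(y):
        return norm * sum(math.exp(-0.5 * ((y - p) / bandwidth) ** 2)
                          for p in points)
    return density

def density_weights(targets, alpha=1.0, bandwidth=1.0, eps=1e-6):
    """Weight each target inversely to its estimated density (assumed form)."""
    density = gaussian_kde(targets, bandwidth)
    dens = [density(y) for y in targets]
    lo, hi = min(dens), max(dens)
    span = (hi - lo) or 1.0
    # Min-max normalize densities to [0, 1]; rare targets end up near 0.
    scaled = [(d - lo) / span for d in dens]
    # alpha = 0 gives uniform weights; larger alpha favors rare targets.
    raw = [max(1.0 - alpha * s, eps) for s in scaled]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]

# Made-up targets: five common values and one rare extreme.
targets = [1.0, 1.1, 0.9, 1.2, 1.0, 8.0]
weights = density_weights(targets, alpha=1.0)
```

Multiplying each sample's loss term by its weight would then give the rare target at 8.0 the largest influence on training.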


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3273
Author(s):  
Ehsan Othman ◽  
Philipp Werner ◽  
Frerk Saxen ◽  
Ayoub Al-Hamadi ◽  
Sascha Gruss ◽  
...  

Prior work on automated methods demonstrated that it is possible to recognize pain intensity from frontal faces in videos, while there is an assumption that humans are very adept at this task compared to machines. In this paper, we investigate whether such an assumption is correct by comparing the results achieved by two human observers with the results achieved by a Random Forest classifier (RFc) baseline model (called RFc-BL) and by three proposed automated models. The first proposed model is a Random Forest classifying descriptors of Action Unit (AU) time series; the second is a modified MobileNetV2 CNN classifying face images that combine three points in time; and the third is a custom deep network combining two CNN branches using the same input as for MobileNetV2 plus knowledge of the RFc. We conduct experiments with the X-ITE phasic pain database, which comprises videotaped responses to heat and electrical pain stimuli, each of three intensities. Distinguishing these six stimulation types plus no stimulation was the main 7-class classification task for the human observers and automated approaches. Further, we conducted reduced 5-class and 3-class classification experiments and applied multi-task learning and a newly suggested sample weighting method. Experimental results show that the pain assessments of the human observers are significantly better than guessing and better than the automatic baseline approach (RFc-BL) by about 1%. However, human performance is quite poor, because pain that is ethically allowed to be induced in experimental studies often does not show up in facial reactions. We discovered that downweighting such samples during training improves the performance for all samples. The proposed RFc and two-CNNs models (using the proposed sample weighting) significantly outperformed the human observers by about 6% and 7%, respectively.
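
The abstract does not describe the sample weighting method in detail. As a generic illustration of what downweighting samples in a training loss means, the sketch below computes a weighted mean negative log-likelihood; the choice of 0.25 as the weight for a sample assumed to show no facial reaction is purely hypothetical.

```python
import math

def weighted_log_loss(probs_true_class, sample_weights):
    """Weighted mean negative log-likelihood of the true class."""
    total_w = sum(sample_weights)
    return sum(-w * math.log(p)
               for p, w in zip(probs_true_class, sample_weights)) / total_w

# Made-up predicted probabilities for the true class of three samples.
probs = [0.9, 0.8, 0.2]
uniform = weighted_log_loss(probs, [1.0, 1.0, 1.0])
# Third sample is assumed to show no visible reaction and is downweighted.
downweighted = weighted_log_loss(probs, [1.0, 1.0, 0.25])
```

With the reduced weight, the hard-to-read third sample contributes less to the loss, so the model is pushed less strongly toward fitting it.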


2021 ◽  
Vol 31 (2) ◽  
Author(s):  
Shane Barratt ◽  
Guillermo Angeris ◽  
Stephen Boyd

2020 ◽  
Vol 189 (7) ◽  
pp. 717-725 ◽  
Author(s):  
Marnie Downes ◽  
John B Carlin

Multilevel regression and poststratification (MRP) is a model-based approach for estimating a population parameter of interest, generally from large-scale surveys. It has been shown to be effective in highly selected samples, which is particularly relevant to investigators of large-scale population health and epidemiologic surveys facing increasing difficulties in recruiting representative samples of participants. We aimed to further examine the accuracy and precision of MRP in a context where census data provided reasonable proxies for true population quantities of interest. We considered 2 outcomes from the baseline wave of the Ten to Men study (Australia, 2013–2014) and obtained relevant population data from the 2011 Australian Census. MRP was found to achieve generally superior performance relative to conventional survey weighting methods for the population as a whole and for population subsets of varying sizes. MRP resulted in less variability among estimates across population subsets relative to sample weighting, and there was some evidence of small gains in precision when using MRP, particularly for smaller population subsets. These findings offer further support for MRP as a promising analytical approach for addressing participation bias in the estimation of population descriptive quantities from large-scale health surveys and cohort studies.
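
The poststratification step can be shown with a small worked example. This sketch uses raw cell means as a stand-in for the multilevel regression estimates that MRP would supply, and the age groups, responses, and census counts are made up for illustration.

```python
def poststratify(cell_estimates, population_counts):
    """Population-weighted average of per-cell estimates."""
    total = sum(population_counts.values())
    return sum(cell_estimates[c] * n
               for c, n in population_counts.items()) / total

# Made-up binary outcomes; young respondents are over-represented.
sample = {"18-34": [1, 1, 0, 1], "35-54": [0, 1], "55+": [0]}
# Made-up census counts for the same cells.
census = {"18-34": 300, "35-54": 400, "55+": 300}

# Stand-in for the multilevel model: the raw mean of each cell.
cell_means = {c: sum(v) / len(v) for c, v in sample.items()}
naive = (sum(sum(v) for v in sample.values())
         / sum(len(v) for v in sample.values()))
adjusted = poststratify(cell_means, census)
```

The unadjusted sample mean is 4/7, while reweighting the cell means by census counts gives 0.425, because the over-represented youngest group no longer dominates. In MRP proper, the model partially pools sparse cells such as the single-respondent "55+" group instead of trusting their raw means.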

