Method to collect ground truth data for walking speed in real-world environments: description and validation

2019 ◽  
Author(s):  
Gerhard Aigner ◽  
Bernd Grimm ◽  
Christian Lederer ◽  
Martin Daumer

Background. Physical activity (PA) is increasingly being recognized as a major factor related to the development or prevention of many diseases, as an intervention to cure or delay disease, and for patient assessment in diagnostics, as a clinical outcome measure or clinical trial endpoint. Thus, wearable sensors and signal algorithms to monitor PA in the free-living (real-world) environment are becoming popular in medicine and clinical research. This is especially true for walking speed, a parameter of PA behaviour with increasing evidence to serve as a patient outcome and clinical trial endpoint in many diseases. The development and validation of sensor signal algorithms for PA classification, in particular walking, and for deriving specific PA parameters, such as real-world walking speed, depend on the availability of large reference data sets with ground truth values. In this study, a novel, reliable, scalable (high-throughput), user-friendly device and method to generate such ground truth data for real-world walking speed, other physical activity types and further gait-related parameters in a real-world environment is described and validated. Methods. A surveyor’s wheel was instrumented with a rotating 3D accelerometer (actibelt). A signal processing algorithm is described to derive distance and speed values. In addition, a high-resolution camera was attached via an active gimbal to video record context and detail. Validation was performed in three main parts: 1) walking distance measurement was compared to the wheel’s built-in mechanical counter, 2) walking speed measurement was analysed on a treadmill at various speed settings, and 3) speed measurement accuracy was analysed by an independent calibration laboratory (accredited by DAkkS) applying standardised test procedures. Results. The mean relative error for distance measurements between our method and the built-in counter was 0.12%. Comparison of the speed values algorithmically extracted from accelerometry data with the true treadmill speed revealed a mean adjusted absolute error of 0.01 m/s (relative error: 0.71%). The calibration laboratory found a mean relative error between values algorithmically extracted from accelerometry data and the laboratory gold standard of 0.36% (min/max: 0.17-0.64%), which is below the resolution of the laboratory. An official certificate was issued. Discussion. Error values were an order of magnitude smaller than any clinically important difference for walking speed. Conclusion. Besides its high accuracy, the presented method can be deployed in a real-world setting and can be integrated into the digital data flow.
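The abstract describes a signal processing algorithm that derives distance and speed from a wheel-mounted, rotating 3D accelerometer but does not give its details. The sketch below is a minimal illustration of the general idea only: count wheel rotations from the gravity-induced oscillation in the rotating sensor frame and multiply by the wheel circumference. The circumference value, the sign-change rotation counter and all names are assumptions for illustration, not the actibelt algorithm.

```python
import numpy as np

WHEEL_CIRCUMFERENCE_M = 1.0   # assumed circumference; a real surveyor's wheel would be calibrated

def count_rotations(acc_axis):
    """Estimate the number of wheel rotations from one accelerometer axis.

    For a sensor rotating with the wheel, gravity shows up as a roughly
    sinusoidal component at the rotation frequency, so two sign changes of
    the mean-removed signal correspond to one full turn.
    """
    centred = acc_axis - np.mean(acc_axis)
    sign_changes = np.count_nonzero(np.diff(np.sign(centred)))
    return sign_changes / 2.0

def distance_and_speed(acc_axis, fs):
    """Return (distance in m, mean speed in m/s) for a recording sampled at fs Hz."""
    distance = count_rotations(acc_axis) * WHEEL_CIRCUMFERENCE_M
    duration_s = len(acc_axis) / fs
    return distance, distance / duration_s

# Synthetic example: 60 s at 100 Hz with ~1.4 wheel rotations per second
fs = 100
t = np.arange(0, 60, 1 / fs)
acc = 9.81 * np.sin(2 * np.pi * 1.4 * t)
print(distance_and_speed(acc, fs))   # ~84 m and ~1.4 m/s for a 1 m circumference
```

The validation figures reported in the Results (e.g. the 0.12% relative distance error) are then obtained by comparing such derived distances and speeds against the wheel's mechanical counter, the treadmill setting, or the calibration laboratory's reference.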


2021 ◽  
Vol 14 (6) ◽  
pp. 997-1005
Author(s):  
Sandeep Tata ◽  
Navneet Potti ◽  
James B. Wendt ◽  
Lauro Beltrão Costa ◽  
Marc Najork ◽  
...  

Extracting structured information from templatic documents is an important problem with the potential to automate many real-world business workflows such as payment, procurement, and payroll. The core challenge is that such documents can be laid out in a virtually infinite variety of ways. A good solution to this problem is one that generalizes well not only to known templates, such as invoices from a known vendor, but also to unseen ones. We developed a system called Glean to tackle this problem. Given a target schema for a document type and some labeled documents of that type, Glean uses machine learning to automatically extract structured information from other documents of that type. In this paper, we describe the overall architecture of Glean and discuss three key data management challenges: 1) managing the quality of ground truth data, 2) generating training data for the machine learning model using labeled documents, and 3) building tools that help a developer rapidly build and improve a model for a given document type. Through empirical studies on a real-world dataset, we show that these data management techniques allow us to train a model that is over 5 F1 points better than the exact same model architecture without the techniques we describe. We argue that for such information-extraction problems, designing abstractions that carefully manage the training data is at least as important as choosing a good model architecture.
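The abstract describes Glean's data management challenges only at a high level. As a hedged illustration of the second challenge (generating training data from labeled documents), the sketch below turns field-level labels into positive and negative candidate examples. The function name, matching logic and toy invoice text are assumptions for illustration, not Glean's actual pipeline.

```python
from typing import Dict, List, Tuple
import re

def generate_training_examples(text: str,
                               labels: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Turn a labeled document into (field, candidate_span, is_positive) examples.

    Very simplified version of the idea of deriving model training data from
    field-level labels: every occurrence of the labeled value becomes a positive
    candidate, and every other token becomes a naive negative candidate.
    """
    examples = []
    for fld, value in labels.items():
        for match in re.finditer(re.escape(value), text):
            examples.append((fld, match.group(), 1))       # positive span
        for token in set(text.split()):
            if token != value:
                examples.append((fld, token, 0))            # naive negative
    return examples

doc_text = "Invoice INV-1234 Total: 532.10 USD Date: 2021-03-05"
labels = {"total_amount": "532.10", "invoice_date": "2021-03-05"}
print(len(generate_training_examples(doc_text, labels)))    # number of examples
```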


2020 ◽  
Vol 34 (07) ◽  
pp. 11661-11668 ◽  
Author(s):  
Yunfei Liu ◽  
Feng Lu

Many real-world vision tasks, such as reflection removal from a transparent surface and intrinsic image decomposition, can be modeled as single-image layer separation. However, this problem is highly ill-posed, requiring accurately aligned and hard-to-collect triplet data to train the CNN models. To address this problem, this paper proposes an unsupervised method that requires no ground truth data triplet in training. At the core of the method are two assumptions about data distributions in the latent spaces of different layers, from which a novel unsupervised layer separation pipeline can be derived. The method is then constructed within a GAN framework with self-supervision and cycle-consistency constraints. Experimental results demonstrate that it outperforms existing unsupervised methods on both synthetic and real-world tasks. The method also shows its ability to solve a more challenging multi-layer separation task.
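The abstract mentions a GAN-based pipeline with self-supervision and cycle-consistency constraints without giving its losses. The sketch below shows a generic cycle-consistency term of the kind such pipelines commonly use; the separator/mixer stand-ins and the L1 reconstruction loss are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(mixed, separator, mixer):
    """Generic cycle-consistency term for single-image layer separation.

    separator: maps a mixed image to (layer1, layer2)
    mixer:     recombines the two layers into a mixed image
    The real pipeline in the paper also uses adversarial and
    self-supervision terms that are not shown here.
    """
    layer1, layer2 = separator(mixed)
    reconstructed = mixer(layer1, layer2)
    return F.l1_loss(reconstructed, mixed)

# Toy usage with stand-ins for the networks
mixed = torch.rand(1, 3, 64, 64)
separator = lambda x: (0.6 * x, 0.4 * x)   # pretend decomposition into two layers
mixer = lambda a, b: a + b                 # simple additive mixing model
print(cycle_consistency_loss(mixed, separator, mixer).item())  # ~0 for this toy pair
```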


Information ◽  
2018 ◽  
Vol 9 (8) ◽  
pp. 189 ◽  
Author(s):  
Jose Paredes ◽  
Gerardo Simari ◽  
Maria Martinez ◽  
Marcelo Falappa

In traditional databases, the entity resolution problem (which is also known as deduplication) refers to the task of mapping multiple manifestations of virtual objects to their corresponding real-world entities. When addressing this problem, in both theory and practice, it is widely assumed that such sets of virtual objects appear as the result of clerical errors, transliterations, missing or updated attributes, abbreviations, and so forth. In this paper, we address this problem under the assumption that this situation is caused by malicious actors operating in domains in which they do not wish to be identified, such as hacker forums and markets in which the participants are motivated to remain semi-anonymous (though they wish to keep their true identities secret, they find it useful for customers to identify their products and services). We are therefore in the presence of a different, and even more challenging, problem that we refer to as adversarial deduplication. In this paper, we study this problem via examples that arise from real-world data on malicious hacker forums and markets arising from collaborations with a cyber threat intelligence company focusing on understanding this kind of behavior. We argue that it is very difficult—if not impossible—to find ground truth data on which to build solutions to this problem, and develop a set of preliminary experiments based on training machine learning classifiers that leverage text analysis to detect potential cases of duplicate entities. Our results are encouraging as a first step towards building tools that human analysts can use to enhance their capabilities towards fighting cyber threats.
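As a rough, hedged illustration of the preliminary experiments described above (training machine learning classifiers that leverage text analysis to flag potential duplicate entities), the sketch below builds character n-gram features over pairs of posts and fits a logistic regression. The toy posts, the feature construction and the classifier choice are assumptions, not the authors' setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
import numpy as np

# Toy post pairs: label 1 if both posts are believed to come from the same
# underlying actor (real ground truth is scarce, as the paper argues).
pairs = [
    ("selling fresh dumps, pm me on jabber", "fresh dumps available, jabber only", 1),
    ("selling fresh dumps, pm me on jabber", "looking for a reliable VPN provider", 0),
    ("exploit kit updated, new evasion module", "updated my exploit kit, better evasion", 1),
    ("exploit kit updated, new evasion module", "how do I reset my forum password?", 0),
]

vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
vec.fit([p for a, b, _ in pairs for p in (a, b)])

def pair_features(a: str, b: str) -> np.ndarray:
    va = vec.transform([a]).toarray()[0]
    vb = vec.transform([b]).toarray()[0]
    return np.abs(va - vb)                     # element-wise difference as pair features

X = np.stack([pair_features(a, b) for a, b, _ in pairs])
y = np.array([lbl for _, _, lbl in pairs])
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X))                          # sanity check on the toy pairs
```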


2020 ◽  
Vol 2020 (17) ◽  
pp. 36-1-36-7
Author(s):  
Umamaheswaran RAMAN KUMAR ◽  
Inge COUDRON ◽  
Steven PUTTEMANS ◽  
Patrick VANDEWALLE

Applications ranging from simple visualization to complex design require 3D models of indoor environments. This has given rise to advancements in the field of automated reconstruction of such models. In this paper, we review several state-of-the-art metrics proposed for geometric comparison of 3D models of building interiors. We evaluate their performance on a real-world dataset and propose one tailored metric which can be used to assess the quality of the reconstructed model. In addition, the proposed metric can also be easily visualized to highlight the regions or structures where the reconstruction failed. To demonstrate the versatility of the proposed metric, we conducted experiments on various interior models by comparison with ground truth data created by expert Blender artists. The results of the experiments were then used to improve the reconstruction pipeline.
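The tailored metric itself is not specified in the abstract. As a point of reference, the sketch below implements one common baseline for geometric comparison of a reconstruction against ground truth: a symmetric chamfer-style distance between sampled point clouds. The sampling and the toy offset example are illustrative assumptions, not the paper's metric.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(points_a: np.ndarray, points_b: np.ndarray) -> float:
    """Symmetric mean nearest-neighbour distance between two point clouds,
    a common baseline for comparing a reconstructed model with ground truth."""
    d_ab, _ = cKDTree(points_b).query(points_a)   # each point in A to its nearest point in B
    d_ba, _ = cKDTree(points_a).query(points_b)   # each point in B to its nearest point in A
    return 0.5 * (d_ab.mean() + d_ba.mean())

# Toy usage: a reconstruction slightly offset from the ground-truth point cloud
ground_truth = np.random.rand(1000, 3)
reconstruction = ground_truth + np.array([0.01, 0.0, 0.0])
print(chamfer_distance(reconstruction, ground_truth))   # ~0.01
```

The per-point nearest-neighbour distances from such a computation can also be mapped back onto the model to highlight regions where the reconstruction failed, in the spirit of the visualization described above.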


2021 ◽  
Vol 26 (4) ◽  
pp. 484-506
Author(s):  
Toby Howison ◽  
Simon Hauser ◽  
Josie Hughes ◽  
Fumiya Iida

We introduce the framework of reality-assisted evolution to summarize a growing trend towards combining model-based and model-free approaches to improve the design of physically embodied soft robots. In silico, data-driven models build, adapt, and improve representations of the target system using real-world experimental data. By simulating huge numbers of virtual robots using these data-driven models, optimization algorithms can illuminate multiple design candidates for transference to the real world. In reality, large-scale physical experimentation facilitates the fabrication, testing, and analysis of multiple candidate designs. Automated assembly and reconfigurable modular systems enable significantly higher numbers of real-world design evaluations than previously possible. Large volumes of ground-truth data gathered via physical experimentation can be returned to the virtual environment to improve data-driven models and guide optimization. Grounding the design process in physical experimentation ensures that the complexity of virtual robot designs does not outpace the model limitations or available fabrication technologies. We outline key developments in the design of physically embodied soft robots in the framework of reality-assisted evolution.
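The framework is described above in prose; the sketch below compresses it into a toy optimization loop: cheap in-silico evaluation of many virtual designs on a surrogate model, physical evaluation of a few top candidates, and feedback of the measurements into the surrogate. Everything in it, including the one-dimensional "designs" and the toy surrogate, is a placeholder for illustration only.

```python
import random

def reality_assisted_evolution(generations=3, n_virtual=100, n_physical=5):
    """Schematic reality-assisted loop with toy stand-ins for the surrogate
    model and the physical experiment; real systems use learned models and
    fabricated soft robots in place of these lambdas."""
    surrogate = lambda design: -abs(design - 0.3)        # toy data-driven model
    physical_test = lambda design: -abs(design - 0.35)   # toy real-world response
    archive = []                                         # ground-truth (design, result) pairs
    for _ in range(generations):
        # In silico: evaluate many virtual designs cheaply on the surrogate.
        candidates = [random.random() for _ in range(n_virtual)]
        best = sorted(candidates, key=surrogate, reverse=True)[:n_physical]
        # In reality: fabricate and test only the most promising candidates.
        measurements = [(d, physical_test(d)) for d in best]
        archive.extend(measurements)
        # Feed the real-world data back (here: recentre the toy surrogate).
        best_real = max(archive, key=lambda m: m[1])[0]
        surrogate = lambda design, c=best_real: -abs(design - c)
    return max(archive, key=lambda m: m[1])

print(reality_assisted_evolution())   # best physically evaluated (design, result) pair
```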


2011 ◽  
Vol 18 (0) ◽  
Author(s):  
Sebastien J. Hotte ◽  
G.A. Bjarnason ◽  
D.Y.C. Heng ◽  
M.A.S. Jewett ◽  
A. Kapoor ◽  
...  

2021 ◽  
Vol 13 (10) ◽  
pp. 1966
Author(s):  
Christopher W Smith ◽  
Santosh K Panda ◽  
Uma S Bhatt ◽  
Franz J Meyer ◽  
Anushree Badola ◽  
...  

In recent years, there have been rapid improvements in both remote sensing methods and satellite image availability that have the potential to massively improve burn severity assessments of the Alaskan boreal forest. In this study, we utilized recent pre- and post-fire Sentinel-2 satellite imagery of the 2019 Nugget Creek and Shovel Creek burn scars located in Interior Alaska to both assess burn severity across the burn scars and test the effectiveness of several remote sensing methods for generating accurate map products: Normalized Difference Vegetation Index (NDVI), Normalized Burn Ratio (NBR), and Random Forest (RF) and Support Vector Machine (SVM) supervised classification. We used 52 Composite Burn Index (CBI) plots from the Shovel Creek burn scar and 28 from the Nugget Creek burn scar for training classifiers and product validation. For the Shovel Creek burn scar, the RF and SVM machine learning (ML) classification methods outperformed the traditional spectral indices that use linear regression to separate burn severity classes (RF and SVM accuracy, 83.33%, versus NBR accuracy, 73.08%). However, for the Nugget Creek burn scar, the NDVI product (accuracy: 96%) outperformed the other indices and ML classifiers. In this study, we demonstrated that the ML classifiers can be very effective for reliable mapping of burn severity in the Alaskan boreal forest when sufficient ground truth data are available. Since the performance of ML classifiers depends on the quantity of ground truth data, the ML classification methods are better suited for assessing burn severity when sufficient ground truth data are available, whereas the traditional spectral indices are better suited when ground truth data are limited. We also looked at the relationship between burn severity, fuel type, and topography (aspect and slope) and found that the relationship is site-dependent.
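The spectral indices named above have standard band-ratio definitions (NDVI from the red and near-infrared bands, NBR from the near-infrared and shortwave-infrared bands). The sketch below states those formulas for Sentinel-2-style band arrays and fits a Random Forest on toy plot-level features standing in for the CBI plots; the band assignments and the synthetic training data are illustrative assumptions, not the study's processing chain.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    # Normalized Difference Vegetation Index; for Sentinel-2, NIR = B8 and red = B4.
    return (nir - red) / (nir + red + 1e-9)

def nbr(nir: np.ndarray, swir: np.ndarray) -> np.ndarray:
    # Normalized Burn Ratio; for Sentinel-2, NIR = B8 and SWIR = B12.
    return (nir - swir) / (nir + swir + 1e-9)

# Toy plot-level features [dNDVI, dNBR] (pre-fire minus post-fire index values)
# with severity classes 0..2 standing in for CBI-derived labels.
rng = np.random.default_rng(0)
class_centres = [(0.05, 0.05), (0.25, 0.30), (0.50, 0.60)]
X = np.vstack([rng.normal(loc=c, scale=0.05, size=(30, 2)) for c in class_centres])
y = np.repeat([0, 1, 2], 30)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(clf.score(X, y))   # resubstitution accuracy on the toy plots
```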

