Reality-Assisted Evolution of Soft Robots through Large-Scale Physical Experimentation: A Review

We introduce the framework of reality-assisted evolution to summarize a growing trend towards combining model-based and model-free approaches to improve the design of physically embodied soft robots. In silico, data-driven models build, adapt, and improve representations of the target system using real-world experimental data. By simulating huge numbers of virtual robots using these data-driven models, optimization algorithms can illuminate multiple design candidates for transference to the real world. In reality, large-scale physical experimentation facilitates the fabrication, testing, and analysis of multiple candidate designs. Automated assembly and reconfigurable modular systems enable significantly higher numbers of real-world design evaluations than previously possible. Large volumes of ground-truth data gathered via physical experimentation can be returned to the virtual environment to improve data-driven models and guide optimization. Grounding the design process in physical experimentation ensures that the complexity of virtual robot designs does not outpace the model limitations or available fabrication technologies. We outline key developments in the design of physically embodied soft robots in the framework of reality-assisted evolution.

Download Full-text

Glean

Proceedings of the VLDB Endowment ◽

10.14778/3447689.3447703 ◽

2021 ◽

Vol 14 (6) ◽

pp. 997-1005

Author(s):

Sandeep Tata ◽

Navneet Potti ◽

James B. Wendt ◽

Lauro Beltrão Costa ◽

Marc Najork ◽

...

Keyword(s):

Machine Learning ◽

Data Management ◽

Real World ◽

Empirical Studies ◽

Ground Truth ◽

Training Data ◽

Ground Truth Data ◽

Document Type ◽

Machine Learning Model ◽

Structured Information

Extracting structured information from templatic documents is an important problem with the potential to automate many real-world business workflows such as payment, procurement, and payroll. The core challenge is that such documents can be laid out in virtually infinitely different ways. A good solution to this problem is one that generalizes well not only to known templates such as invoices from a known vendor, but also to unseen ones. We developed a system called Glean to tackle this problem. Given a target schema for a document type and some labeled documents of that type, Glean uses machine learning to automatically extract structured information from other documents of that type. In this paper, we describe the overall architecture of Glean, and discuss three key data management challenges : 1) managing the quality of ground truth data, 2) generating training data for the machine learning model using labeled documents, and 3) building tools that help a developer rapidly build and improve a model for a given document type. Through empirical studies on a real-world dataset, we show that these data management techniques allow us to train a model that is over 5 F1 points better than the exact same model architecture without the techniques we describe. We argue that for such information-extraction problems, designing abstractions that carefully manage the training data is at least as important as choosing a good model architecture.

Download Full-text

Logo-2K+: A Large-Scale Logo Dataset for Scalable Logo Classification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6085 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6194-6201

Author(s):

Jing Wang ◽

Weiqing Min ◽

Sujuan Hou ◽

Shengnan Ma ◽

Yuanjie Zheng ◽

...

Keyword(s):

Image Recognition ◽

Real World ◽

Large Scale ◽

Data Augmentation ◽

Ground Truth ◽

Classification Task ◽

The Real ◽

Product Recommendation ◽

Contextual Advertising ◽

Benchmark Datasets

Logo classification has gained increasing attention for its various applications, such as copyright infringement detection, product recommendation and contextual advertising. Compared with other types of object images, the real-world logo images have larger variety in logo appearance and more complexity in their background. Therefore, recognizing the logo from images is challenging. To support efforts towards scalable logo classification task, we have curated a dataset, Logo-2K+, a new large-scale publicly available real-world logo dataset with 2,341 categories and 167,140 images. Compared with existing popular logo datasets, such as FlickrLogos-32 and LOGO-Net, Logo-2K+ has more comprehensive coverage of logo categories and larger quantity of logo images. Moreover, we propose a Discriminative Region Navigation and Augmentation Network (DRNA-Net), which is capable of discovering more informative logo regions and augmenting these image regions for logo classification. DRNA-Net consists of four sub-networks: the navigator sub-network first selected informative logo-relevant regions guided by the teacher sub-network, which can evaluate its confidence belonging to the ground-truth logo class. The data augmentation sub-network then augments the selected regions via both region cropping and region dropping. Finally, the scrutinizer sub-network fuses features from augmented regions and the whole image for logo classification. Comprehensive experiments on Logo-2K+ and other three existing benchmark datasets demonstrate the effectiveness of proposed method. Logo-2K+ and the proposed strong baseline DRNA-Net are expected to further the development of scalable logo image recognition, and the Logo-2K+ dataset can be found at https://github.com/msn199959/Logo-2k-plus-Dataset.

Download Full-text

A Product Feature Inference Model for Mining Implicit Customer Preferences Within Large Scale Social Media Networks

Volume 1B: 35th Computers and Information in Engineering Conference ◽

10.1115/detc2015-47225 ◽

2015 ◽

Cited By ~ 6

Author(s):

Suppawong Tuarob ◽

Conrad S. Tucker

Keyword(s):

Social Media ◽

Large Scale ◽

Ground Truth ◽

Inference Model ◽

Underlying Assumption ◽

Ground Truth Data ◽

Social Media Networks ◽

Customer Preferences ◽

Product Features ◽

Online Sources

The acquisition and mining of product feature data from online sources such as customer review websites and large scale social media networks is an emerging area of research. In many existing design methodologies that acquire product feature preferences form online sources, the underlying assumption is that product features expressed by customers are explicitly stated and readily observable to be mined using product feature extraction tools. In many scenarios however, product feature preferences expressed by customers are implicit in nature and do not directly map to engineering design targets. For example, a customer may implicitly state “wow I have to squint to read this on the screen”, when the explicit product feature may be a larger screen. The authors of this work propose an inference model that automatically assigns the most probable explicit product feature desired by a customer, given an implicit preference expressed. The algorithm iteratively refines its inference model by presenting a hypothesis and using ground truth data, determining its statistical validity. A case study involving smartphone product features expressed through Twitter networks is presented to demonstrate the effectiveness of the proposed methodology.

Download Full-text

SHYBRID: A graphical tool for generating hybrid ground-truth spiking data for evaluating spike sorting performance

10.1101/734061 ◽

2019 ◽

Cited By ~ 4

Author(s):

Jasper Wouters ◽

Fabian Kloosterman ◽

Alexander Bertrand

Keyword(s):

Ground Truth ◽

Neural Recording ◽

Data Driven ◽

Sorting Algorithm ◽

Spike Sorting ◽

Recording Device ◽

Ground Truth Data ◽

Sorting Algorithms ◽

Graphical Tool ◽

Fine Tune

AbstractSpike sorting is the process of retrieving the spike times of individual neurons that are present in an extracellular neural recording. Over the last decades, many spike sorting algorithms have been published. In an effort to guide a user towards a specific spike sorting algorithm, given a specific recording setting (i.e., brain region and recording device), we provide an open-source graphical tool for the generation of hybrid ground-truth data in Python. Hybrid ground-truth data is a data-driven modelling paradigm in which spikes from a single unit are moved to a different location on the recording probe, thereby generating a virtual unit of which the spike times are known. The tool enables a user to efficiently generate hybrid ground-truth datasets and make informed decisions between spike sorting algorithms, fine-tune the algorithm parameters towards the used recording setting, or get a deeper understanding of those algorithms.

Download Full-text

Separate in Latent Space: Unsupervised Single Image Layer Separation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6835 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11661-11668 ◽

Cited By ~ 3

Author(s):

Yunfei Liu ◽

Feng Lu

Keyword(s):

Real World ◽

Ground Truth ◽

Single Image ◽

Ground Truth Data ◽

Layer Separation ◽

World Vision ◽

Image Layer ◽

Latent Space ◽

Consistency Constraints ◽

Ill Posed

Many real world vision tasks, such as reflection removal from a transparent surface and intrinsic image decomposition, can be modeled as single image layer separation. However, this problem is highly ill-posed, requiring accurately aligned and hard to collect triplet data to train the CNN models. To address this problem, this paper proposes an unsupervised method that requires no ground truth data triplet in training. At the core of the method are two assumptions about data distributions in the latent spaces of different layers, based on which a novel unsupervised layer separation pipeline can be derived. Then the method can be constructed based on the GANs framework with self-supervision and cycle consistency constraints, etc. Experimental results demonstrate its successfulness in outperforming existing unsupervised methods in both synthetic and real world tasks. The method also shows its ability to solve a more challenging multi-layer separation task.

Download Full-text

Measuring Flood Discharge

Oxford Research Encyclopedia of Natural Hazard Science ◽

10.1093/acrefore/9780199389407.013.121 ◽

2017 ◽

Cited By ~ 1

Author(s):

Marian Muste ◽

Ton Hoitink

Keyword(s):

Real Time ◽

Water Resource Management ◽

Large Scale ◽

Water Cycle ◽

Ground Truth ◽

Flood Frequency ◽

Main Concern ◽

Essential Difference ◽

Discharge Data ◽

Ground Truth Data

With a continuous global increase in flood frequency and intensity, there is an immediate need for new science-based solutions for flood mitigation, resilience, and adaptation that can be quickly deployed in any flood-prone area. An integral part of these solutions is the availability of river discharge measurements delivered in real time with high spatiotemporal density and over large-scale areas. Stream stages and the associated discharges are the most perceivable variables of the water cycle and the ones that eventually determine the levels of hazard during floods. Consequently, the availability of discharge records (a.k.a. streamflows) is paramount for flood-risk management because they provide actionable information for organizing the activities before, during, and after floods, and they supply the data for planning and designing floodplain infrastructure. Moreover, the discharge records represent the ground-truth data for developing and continuously improving the accuracy of the hydrologic models used for forecasting streamflows. Acquiring discharge data for streams is critically important not only for flood forecasting and monitoring but also for many other practical uses, such as monitoring water abstractions for supporting decisions in various socioeconomic activities (from agriculture to industry, transportation, and recreation) and for ensuring healthy ecological flows. All these activities require knowledge of past, current, and future flows in rivers and streams. Given its importance, an ability to measure the flow in channels has preoccupied water users for millennia. Starting with the simplest volumetric methods to estimate flows, the measurement of discharge has evolved through continued innovation to sophisticated methods so that today we can continuously acquire and communicate the data in real time. There is no essential difference between the instruments and methods used to acquire streamflow data during normal conditions versus during floods. The measurements during floods are, however, complex, hazardous, and of limited accuracy compared with those acquired during normal flows. The essential differences in the configuration and operation of the instruments and methods for discharge estimation stem from the type of measurements they acquire—that is, discrete and autonomous measurements (i.e., measurements that can be taken any time any place) and those acquired continuously (i.e., estimates based on indirect methods developed for fixed locations). Regardless of the measurement situation and approach, the main concern of the data providers for flooding (as well as for other areas of water resource management) is the timely delivery of accurate discharge data at flood-prone locations across river basins.

Download Full-text

Comparative Study of Data-driven Solar Coronal Field Models Using a Flux Emergence Simulation as a Ground-truth Data Set

The Astrophysical Journal ◽

10.3847/1538-4357/ab6b1f ◽

2020 ◽

Vol 890 (2) ◽

pp. 103 ◽

Cited By ~ 5

Author(s):

Shin Toriumi ◽

Shinsuke Takasao ◽

Mark C. M. Cheung ◽

Chaowei Jiang ◽

Yang Guo ◽

...

Keyword(s):

Comparative Study ◽

Ground Truth ◽

Data Driven ◽

Flux Emergence ◽

Data Set ◽

Ground Truth Data ◽

Coronal Field ◽

Solar Coronal

Download Full-text

Assessing the tropical forest cover change in northern parts of Sonitpur and Udalguri District of Assam, India

Scientific Reports ◽

10.1038/s41598-021-90595-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ranjit Mahato ◽

Gibji Nimasow ◽

Oyi Dai Nimasow ◽

Dhoni Bushi

Keyword(s):

Protected Areas ◽

National Park ◽

Large Scale ◽

Forest Cover ◽

Ground Truth ◽

Forest Cover Change ◽

Wildlife Sanctuary ◽

Remotely Sensed Data ◽

Ground Truth Data ◽

Increasing Trends

AbstractSonitpur and Udalguri district of Assam possess rich tropical forests with equally important faunal species. The Nameri National Park, Sonai-Rupai Wildlife Sanctuary, and other Reserved Forests are areas of attraction for tourists and wildlife lovers. However, these protected areas are reportedly facing the problem of encroachment and large-scale deforestation. Therefore, this study attempts to estimate the forest cover change in the area through integrating the remotely sensed data of 1990, 2000, 2010, and 2020 with the Geographic Information System. The Maximum Likelihood algorithm-based supervised classification shows acceptable agreement between the classified image and the ground truth data with an overall accuracy of about 96% and a Kappa coefficient of 0.95. The results reveal a forest cover loss of 7.47% from 1990 to 2000 and 7.11% from 2000 to 2010. However, there was a slight gain of 2.34% in forest cover from 2010 to 2020. The net change of forest to non-forest was 195.17 km2 in the last forty years. The forest transition map shows a declining trend of forest remained forest till 2010 and a slight increase after that. There was a considerable decline in the forest to non-forest (11.94% to 3.50%) from 2000–2010 to 2010–2020. Further, a perceptible gain was also observed in the non-forest to the forest during the last four decades. The overlay analysis of forest cover maps show an area of 460.76 km2 (28.89%) as forest (unchanged), 764.21 km2 (47.91%) as non-forest (unchanged), 282.67 km2 (17.72%) as deforestation and 87.50 km2 (5.48%) as afforestation. The study found hotspots of deforestation in the closest areas of National Park, Wildlife Sanctuary, and Reserved Forests due to encroachments for human habitation, agriculture, and timber/fuelwood extractions. Therefore, the study suggests an early declaration of these protected areas as Eco-Sensitive Zone to control the increasing trends of deforestation.

Download Full-text

CellSense

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies ◽

10.1145/3478087 ◽

2021 ◽

Vol 5 (3) ◽

pp. 1-22

Author(s):

Zhihan Fang ◽

Yu Yang ◽

Guang Yang ◽

Yikuan Xian ◽

Fan Zhang ◽

...

Keyword(s):

Cellular Network ◽

Large Scale ◽

Human Mobility ◽

Ground Truth ◽

Mobility Patterns ◽

Billing Data ◽

Ground Truth Data ◽

Mobility Modeling ◽

High Penetration ◽

Recovery System

Data from the cellular network have been proved as one of the most promising way to understand large-scale human mobility for various ubiquitous computing applications due to the high penetration of cellphones and low collection cost. Existing mobility models driven by cellular network data suffer from sparse spatial-temporal observations because user locations are recorded with cellphone activities, e.g., calls, text, or internet access. In this paper, we design a human mobility recovery system called CellSense to take the sparse cellular billing data (CBR) as input and outputs dense continuous records to recover the sensing gap when using cellular networks as sensing systems to sense the human mobility. There is limited work on this kind of recovery systems at large scale because even though it is straightforward to design a recovery system based on regression models, it is very challenging to evaluate these models at large scale due to the lack of the ground truth data. In this paper, we explore a new opportunity based on the upgrade of cellular infrastructures to obtain cellular network signaling data as the ground truth data, which log the interaction between cellphones and cellular towers at signal levels (e.g., attaching, detaching, paging) even without billable activities. Based on the signaling data, we design a system CellSense for human mobility recovery by integrating collective mobility patterns with individual mobility modeling, which achieves the 35.3% improvement over the state-of-the-art models. The key application of our recovery model is to take regular sparse CBR data that a researcher already has, and to recover the missing data due to sensing gaps of CBR data to produce a dense cellular data for them to train a machine learning model for their use cases, e.g., next location prediction.

Download Full-text

Resolution and Accuracy of an Airborne Scanning Laser System for Beach Surveys

Journal of Atmospheric and Oceanic Technology ◽

10.1175/jtech-d-12-00174.1 ◽

2013 ◽

Vol 30 (10) ◽

pp. 2452-2464 ◽

Cited By ~ 12

Author(s):

J. H. Middleton ◽

C. G. Cooke ◽

E. T. Kearney ◽

P. J. Mumford ◽

M. A. Mole ◽

...

Keyword(s):

Large Scale ◽

Laser System ◽

Ground Truth ◽

Satellite System ◽

Airborne Lidar ◽

Scanning Laser ◽

Lidar Data ◽

Ground Truth Data ◽

Short Period ◽

Airborne Lidar Data

Abstract Airborne scanning laser technology provides an effective method to systematically survey surface topography and changes in that topography with time. In this paper, the authors describe the capability of a rapid-response lidar system in which airborne observations are utilized to describe results from a set of surveys of Narrabeen–Collaroy Beach, Sydney, New South Wales, Australia, over a short period of time during which significant erosion and deposition of the subaerial beach occurred. The airborne lidar data were obtained using a Riegl Q240i lidar coupled with a NovAtel SPAN-CPT integrated Global Navigation Satellite System (GNSS) and inertial unit and flown at various altitudes. A set of the airborne lidar data is compared with ground-truth data acquired from the beach using a GNSS/real-time kinematic (RTK) system mounted on an all-terrain vehicle. The comparison shows consistency between systems, with the airborne lidar data being less than 0.02 m different from the ground-truth data when four surveys are undertaken, provided a method of removing outliers—developed here and designated as “weaving”—is used. The combination of airborne lidar data with ground-truth data provides an excellent method of obtaining high-quality topographic data. Using the results from this analysis, it is shown that airborne lidar data alone produce results that can be used for ongoing large-scale surveys of beaches with reliable accuracy, and that the enhanced accuracy resulting from multiple airborne surveys can be assessed quantitatively.

Download Full-text