Surprisingly Popular Voting Recovers Rankings, Surprisingly!

Author(s):  
Hadi Hosseini ◽  
Debmalya Mandal ◽  
Nisarg Shah ◽  
Kevin Shi

The wisdom of the crowd has long been the de facto approach for eliciting information from individuals or experts in order to predict the ground truth. However, classical democratic approaches for aggregating individual votes only work when the opinion of the majority of the crowd is relatively accurate. A clever recent approach, surprisingly popular voting, elicits additional information from the individuals, namely their predictions of other individuals' votes, and provably recovers the ground truth even when the experts are in the minority. This approach works well when the goal is to pick the correct option from a small list, but when the goal is to recover a true ranking of the alternatives, a direct application of the approach requires eliciting too much information. We explore practical techniques for extending the surprisingly popular algorithm to ranked voting through partial votes and predictions, and for designing robust aggregation rules. We experimentally demonstrate that even a little prediction information helps surprisingly popular voting outperform classical approaches.
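To make the mechanism concrete, here is a minimal Python sketch of the surprisingly popular rule for a single question, assuming each respondent supplies a vote plus a predicted vote share for every option; the data and names are illustrative, not the authors' ranked-voting extension:

```python
# A minimal sketch of the surprisingly popular (SP) rule for picking one
# option from a small list. All inputs below are illustrative.

from collections import Counter

def surprisingly_popular(votes, predictions):
    """votes: list of chosen options, one per respondent.
    predictions: list of dicts, each mapping option -> the respondent's
    predicted fraction of the crowd voting for that option."""
    n = len(votes)
    actual = {opt: c / n for opt, c in Counter(votes).items()}
    options = set(actual) | {o for p in predictions for o in p}
    # Average predicted vote share per option.
    predicted = {
        opt: sum(p.get(opt, 0.0) for p in predictions) / len(predictions)
        for opt in options
    }
    # The SP answer is the option whose actual share exceeds its
    # predicted share by the largest margin.
    return max(options, key=lambda o: actual.get(o, 0.0) - predicted[o])

votes = ["no", "no", "yes"]                      # majority says "no"
predictions = [
    {"yes": 0.2, "no": 0.8},
    {"yes": 0.1, "no": 0.9},
    {"yes": 0.3, "no": 0.7},
]
print(surprisingly_popular(votes, predictions))  # -> "yes"
```

The key design point is the margin criterion: an option wins not because it is popular, but because it is more popular than the crowd predicted, which is what lets a well-informed minority prevail.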


Geosciences ◽  
2018 ◽  
Vol 8 (12) ◽  
pp. 455 ◽  
Author(s):  
Timo Gaida ◽  
Tengku Tengku Ali ◽  
Mirjam Snellen ◽  
Alireza Amiri-Simkooei ◽  
Thaiënne van Dijk ◽  
...  

Multi-frequency backscatter data collected from multibeam echosounders (MBESs) is increasingly becoming available. The ability to collect data at multiple frequencies at the same time is expected to allow for better discrimination between seabed sediments. We propose an extension of the Bayesian method for seabed classification to multi-frequency backscatter. By combining the information retrieved at single frequencies, we produce a multispectral acoustic classification map, which allows us to distinguish more seabed environments. In this study, we use three triple-frequency (100, 200, and 400 kHz) backscatter datasets acquired with an R2Sonic 2026 in the Bedford Basin, Canada, in 2016 and 2017, and in Patricia Bay, Canada, in 2016. The results are threefold: (1) combining 100 and 400 kHz, in general, reveals the most additional information about the seabed; (2) the use of multiple frequencies allows for better acoustic discrimination of seabed sediments than single-frequency data; and (3) the optimal frequency selection for acoustic sediment classification depends on the local seabed. However, the benefit of using multiple frequencies cannot be clearly quantified based on the existing ground-truth data. Still, a qualitative comparison and a geological interpretation indicate an improved discrimination between different seabed environments using multi-frequency backscatter.
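As a rough illustration of the combination step, the sketch below merges per-frequency class maps into one multispectral map by treating each unique tuple of single-frequency classes as a new class; the single-frequency Bayesian classifier itself is assumed to exist elsewhere, and all names and data here are hypothetical:

```python
# Illustrative sketch of combining single-frequency classification results
# into a multispectral acoustic class map.

import numpy as np

def combine_frequencies(class_maps):
    """class_maps: dict frequency_kHz -> 2D integer array of acoustic classes.
    Returns a 2D array of multispectral class IDs plus a tuple legend."""
    freqs = sorted(class_maps)
    stacked = np.stack([class_maps[f] for f in freqs], axis=-1)
    # Each unique combination of per-frequency classes becomes one
    # multispectral class, allowing finer seabed discrimination.
    flat = stacked.reshape(-1, len(freqs))
    combos, ids = np.unique(flat, axis=0, return_inverse=True)
    legend = {i: dict(zip(freqs, combo)) for i, combo in enumerate(combos)}
    return ids.reshape(stacked.shape[:-1]), legend

maps = {100: np.random.randint(0, 3, (4, 4)),
        200: np.random.randint(0, 3, (4, 4)),
        400: np.random.randint(0, 3, (4, 4))}
multi, legend = combine_frequencies(maps)
```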



1987 ◽  
Vol 9 ◽  
pp. 253
Author(s):  
N. Young ◽  
I. Goodwin

Ground surveys of the ice sheet in Wilkes Land, Antarctica, have been made on oversnow traverses operating out of Casey. Data collected include surface elevation, accumulation rate, snow temperature, and physical characteristics of the snow cover. By the nature of the surveys, the data are mostly restricted to line profiles. In some regions, aerial surveys of surface topography have been made over a grid network. Satellite imagery and remote sensing are two means of extrapolating the results from measurements along lines to an areal presentation. They are also the only source of data over large areas of the continent. Landsat images in the visible and near-infrared wavelengths clearly depict many of the large- and small-scale features of the surface. The intensity of the reflected radiation varies with the aspect and magnitude of the surface slope to reveal the surface topography. The multi-channel nature of the Landsat data is exploited to distinguish between different surface types through their different spectral signatures, e.g. bare ice, glaze, and snow. Additional information on surface type can be gained at a coarser scale from other satellite-borne sensors such as ESMR and SMMR. Textural enhancement of the Landsat images reveals the surface micro-relief. Features in the enhanced images are compared to ground-truth data from the traverse surveys to produce a classification of surface types across the images and to determine the magnitude of the surface topography and micro-relief observed. The images can then be used to monitor changes over time.
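The spectral-signature discrimination described here can be illustrated with a minimal nearest-signature classifier; the reflectance values below are hypothetical placeholders, not calibrated Landsat signatures:

```python
# A minimal sketch of classifying surface types from multi-channel imagery
# by nearest spectral signature. Signature values are hypothetical.

import numpy as np

signatures = {                 # mean reflectance per channel (illustrative)
    "bare ice": [0.35, 0.30, 0.25],
    "glaze":    [0.55, 0.50, 0.40],
    "snow":     [0.90, 0.85, 0.75],
}

def classify_pixel(pixel):
    """Assign the surface type whose signature is closest in Euclidean
    distance to the observed multi-channel pixel."""
    p = np.asarray(pixel)
    return min(signatures,
               key=lambda s: np.linalg.norm(p - np.asarray(signatures[s])))

print(classify_pixel([0.88, 0.83, 0.70]))  # -> "snow"
```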



2015 ◽  
Vol 54 (9) ◽  
pp. 1861-1870 ◽  
Author(s):  
Jeffrey C. Snyder ◽  
Alexander V. Ryzhkov

Abstract. Although radial velocity data from Doppler radars can partially resolve some tornadoes, particularly large tornadoes near the radar, most tornadoes are not explicitly resolved by radar owing to inadequate spatiotemporal resolution. In addition, it can be difficult to determine which mesocyclones typically observed on radar are associated with tornadoes. Since debris lofted by tornadoes has scattering characteristics that are distinct from those of hydrometeors, the additional information provided by polarimetric weather radars can aid in identifying debris from tornadoes; the polarimetric tornadic debris signature (TDS) provides what is nearly “ground truth” that a tornado is ongoing (or has recently occurred). This paper outlines a modification to the hydrometeor classification algorithm used with the operational Weather Surveillance Radar-1988 Doppler (WSR-88D) network in the United States to include a TDS category. Examples of automated TDS classification are provided for several recent cases that were observed in the United States.
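As a hedged illustration of what a TDS category keys on, the rule below flags radar gates with high reflectivity, low correlation coefficient, and near-zero differential reflectivity inside a velocity couplet; the thresholds are rough indicators from the literature, not the operational WSR-88D algorithm's values:

```python
# An illustrative rule-based check for a tornadic debris signature (TDS).
# Thresholds are rough, commonly cited indicators, chosen for illustration.

def is_debris(z_dbz, zdr_db, rho_hv, in_velocity_couplet):
    """Return True if a radar gate plausibly contains lofted debris."""
    return (
        z_dbz > 40.0              # strong reflectivity from lofted objects
        and rho_hv < 0.8          # non-meteorological: low correlation coefficient
        and abs(zdr_db) < 1.0     # randomly oriented scatterers: ZDR near zero
        and in_velocity_couplet   # collocated with rotation in Doppler velocity
    )

print(is_debris(55.0, 0.2, 0.65, True))  # -> True
print(is_debris(55.0, 3.5, 0.98, True))  # -> False (hydrometeor-like)
```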



Metabolites ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. 183
Author(s):  
Ramtin Hosseini ◽  
Neda Hassanpour ◽  
Li-Ping Liu ◽  
Soha Hassoun

Motivation: Untargeted metabolomics comprehensively characterizes small molecules and elucidates activities of biochemical pathways within a biological sample. Despite computational advances, interpreting collected measurements and determining their biological role remains a challenge. Results: To interpret measurements, we present an inference-based approach, termed Probabilistic modeling for Untargeted Metabolomics Analysis (PUMA). Our approach captures metabolomics measurements and the biological network for the biological sample under study in a generative model and uses stochastic sampling to compute posterior probability distributions. PUMA predicts the likelihood of pathways being active, and then derives probabilistic annotations, which assign chemical identities to measurements. Unlike prior pathway analysis tools that analyze differentially active pathways, PUMA defines a pathway as active if the likelihood that the pathway generated the observed measurements is above a particular (user-defined) threshold. Due to the lack of “ground truth” metabolomics datasets, where all measurements are annotated and pathway activities are known, PUMA is validated on synthetic datasets that are designed to mimic cellular processes. PUMA, on average, outperforms pathway enrichment analysis by 8%. PUMA is applied to two case studies. PUMA suggests many biologically meaningful pathways as active. Annotation results were in agreement with those obtained using other tools that utilize additional information in the form of spectral signatures. Importantly, PUMA annotates many measurements, suggesting 23 chemical identities for metabolites that were previously only identified as isomers, and a significant number of additional putative annotations over spectral database lookups. For an experimentally validated 50-compound dataset, annotations using PUMA yielded 0.833 precision and 0.676 recall.
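The generative reasoning can be illustrated on a toy single-pathway model: activity has a Bernoulli prior, an active pathway emits its member metabolites with high probability, and Bayes' rule yields the posterior. PUMA itself samples over a full metabolic network; the probabilities below are illustrative only:

```python
# A toy sketch of generative pathway-activity inference. On this
# single-pathway model the posterior is exact; PUMA uses stochastic
# sampling over a whole network. All probabilities are illustrative.

def posterior_active(members_observed, members_missing,
                     prior=0.5, p_emit=0.9, p_noise=0.05):
    """Posterior P(pathway active | observations) for one pathway whose
    member metabolites are observed or missing independently."""
    like_active = (p_emit ** members_observed) * ((1 - p_emit) ** members_missing)
    like_inactive = (p_noise ** members_observed) * ((1 - p_noise) ** members_missing)
    joint_active = prior * like_active
    joint_inactive = (1 - prior) * like_inactive
    return joint_active / (joint_active + joint_inactive)

# 4 of a pathway's 5 metabolites measured: activity is highly probable.
print(posterior_active(members_observed=4, members_missing=1))  # ~0.9999
```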



2005 ◽  
Vol 17 (11) ◽  
pp. 2482-2507 ◽  
Author(s):  
Qi Zhao ◽  
David J. Miller

The goal of semisupervised clustering/mixture modeling is to learn the underlying groups comprising a given data set when there is also some form of instance-level supervision available, usually in the form of labels or pairwise sample constraints. Most prior work with constraints assumes the number of classes is known, with each learned cluster assumed to be a class and, hence, subject to the given class constraints. When the number of classes is unknown or when the one-cluster-per-class assumption is not valid, the use of constraints may actually be deleterious to learning the ground-truth data groups. We address this by (1) allowing allocation of multiple mixture components to individual classes and (2) estimating both the number of components and the number of classes. We also address new class discovery, with components void of constraints treated as putative unknown classes. For both real-world and synthetic data, our method is shown to accurately estimate the number of classes and to compare favorably with the recent approach of Shental, Bar-Hillel, Hertz, and Weinshall (2003).
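A rough sketch of the central idea (not the authors' algorithm) is to fit more mixture components than there are classes and then map components to classes using whatever labels are available, with unlabeled components treated as putative new classes; scikit-learn is used here purely for brevity:

```python
# Sketch: many-components-per-class mixture modeling with partial labels.
# This simplification uses hard labels instead of pairwise constraints.

import numpy as np
from sklearn.mixture import GaussianMixture

def components_to_classes(X, y, n_components):
    """X: (n_samples, n_features) array; y: integer class label per
    sample, or -1 if unlabeled. Returns the fitted mixture and a
    component -> class mapping (None marks a putative new class)."""
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(X)
    comp = gmm.predict(X)
    mapping = {}
    for k in range(n_components):
        labels = y[(comp == k) & (y >= 0)]
        # Majority label if any labeled samples land here, else "new class".
        mapping[k] = int(np.bincount(labels).argmax()) if labels.size else None
    return gmm, mapping
```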



2018 ◽  
Author(s):  
Charlie Beirnaert ◽  
Laura Peeters ◽  
Pieter Meysman ◽  
Wout Bittremieux ◽  
Kenn Foubert ◽  
...  

Abstract. Data analysis for metabolomics is undergoing rapid progress thanks to the proliferation of novel tools and the standardization of existing workflows. However, as datasets and experiments continue to increase in size and complexity, standardized workflows are often not sufficient. In addition, as the ground truth for metabolomics experiments is intrinsically unknown, there is no way to critically evaluate the performance of tools. Here, we investigate the problem of dynamic multi-class metabolomics experiments using a simulated dataset with a known ground truth and evaluate the performance of tinderesting, a new and intuitive tool based on gathering expert knowledge to be used in machine learning, comparing it to EDGE, a statistical method for sequence data. This paper presents three novel outcomes. First, we present a way to simulate dynamic metabolomics data with a known ground truth based on ordinary differential equations. This method is made available through the MetaboLouise R package. Second, we show that the EDGE tool, originally developed for genomics data analysis, is highly performant in analyzing dynamic case vs. control metabolomics data. Last, we introduce the tinderesting method for analysing more complex dynamic metabolomics experiments, which performs on par with statistical methods. This tool consists of a Shiny app for collecting expert knowledge, which in turn is used to train a machine learning model to emulate the decision process of the expert. This approach does not replace traditional data analysis workflows for metabolomics, but can provide additional information, improved performance or easier interpretation of results. The advantage is that the tool is agnostic to the complexity of the experiment, and thus is easier to use in advanced setups. All code for the presented analysis, MetaboLouise and tinderesting is freely available.
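The ODE-based simulation idea can be sketched in a few lines. MetaboLouise is an R package; the Python analogue below, with a hypothetical two-metabolite network and lognormal measurement noise, is only illustrative:

```python
# A minimal sketch of simulating dynamic metabolomics data with a known
# ground truth via ordinary differential equations. Metabolite A is fed
# at a constant rate and converted to B, which degrades.

import numpy as np
from scipy.integrate import odeint

def network(state, t, feed, k_convert, k_degrade):
    a, b = state
    da = feed - k_convert * a            # influx minus conversion A -> B
    db = k_convert * a - k_degrade * b   # production minus degradation
    return [da, db]

t = np.linspace(0, 50, 200)
truth = odeint(network, y0=[1.0, 0.0], t=t, args=(0.5, 0.2, 0.1))
# Multiplicative noise mimics measurement variability; the noise-free
# trajectories remain available as the ground truth for evaluation.
noisy = truth * np.random.lognormal(mean=0.0, sigma=0.1, size=truth.shape)
```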



2018 ◽  
Vol 1 ◽  
pp. 1-7
Author(s):  
Robert Hecht ◽  
Matthias Kalla ◽  
Tobias Krüger

Human settlements are mainly formed by buildings with their different characteristics and usage. Despite the importance of buildings for the economy and society, complete regional or even national figures on the entire building stock and its spatial distribution are still hardly available. Available digital topographic data sets, created by National Mapping Agencies or mapped voluntarily by a crowd via Volunteered Geographic Information (VGI) platforms (e.g. OpenStreetMap), contain building footprint information but often lack additional information on building type, usage, age or number of floors. For this reason, predictive modeling is becoming increasingly important in this context. The capabilities of machine learning allow for the prediction of building types and other building characteristics and thus the efficient classification and description of the entire building stock of cities and regions. However, such data-driven approaches always require a sufficient amount of ground-truth (reference) information for training and validation. The collection of reference data is usually cost-intensive and time-consuming. Experience from other disciplines has shown that crowdsourcing offers a way to support the process of obtaining ground-truth data. Therefore, this paper presents the results of an experimental study assessing the accuracy of non-expert annotations of street view images collected from an internet crowd. The findings provide the basis for a future integration of a crowdsourcing component into the process of land use mapping, particularly automatic building classification.
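A minimal sketch of the crowdsourcing step might aggregate non-expert labels per image by majority vote and keep only confident labels for training; the data and the agreement threshold below are hypothetical:

```python
# Illustrative aggregation of crowd annotations of building types.

from collections import Counter

def aggregate(annotations, min_agreement=0.6):
    """annotations: dict image_id -> list of crowd labels.
    Returns image_id -> label for images where the top label reaches the
    agreement threshold; the rest are left for expert review."""
    consensus = {}
    for image_id, labels in annotations.items():
        top, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            consensus[image_id] = top
    return consensus

crowd = {"img_01": ["residential"] * 4 + ["commercial"],
         "img_02": ["commercial", "industrial", "residential"]}
print(aggregate(crowd))  # -> {'img_01': 'residential'}
```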



2020 ◽  
Vol 12 (18) ◽  
pp. 2941
Author(s):  
Mikel Galar ◽  
Rubén Sesma ◽  
Christian Ayala ◽  
Lourdes Albizua ◽  
Carlos Aranda

Earth observation data is becoming more accessible and affordable thanks to the Copernicus programme and its Sentinel missions. Every location worldwide can be freely monitored approximately every 5 days using the multi-spectral images provided by Sentinel-2. The spatial resolution of these images for the RGBN (RGB + near-infrared) bands is 10 m, which is more than enough for many tasks but falls short for many others. For this reason, if their spatial resolution could be enhanced without additional costs, any posterior analyses based on these images would benefit. Previous works have mainly focused on increasing the resolution of the lower-resolution bands of Sentinel-2 (20 m and 60 m) to 10 m resolution. In these cases, super-resolution is supported by bands captured at finer resolutions (RGBN at 10 m). In contrast, this paper focuses on the problem of increasing the spatial resolution of the 10 m bands to either 5 m or 2.5 m resolution, without having additional information available. This problem is known as single-image super-resolution. For standard images, deep learning techniques have become the de facto standard for learning the mapping from lower- to higher-resolution images due to their learning capacity. However, super-resolution models learned for standard images do not work well with satellite images, and hence a specific model for this problem needs to be learned. The main challenge that this paper aims to solve is how to train a super-resolution model for Sentinel-2 images when no ground truth exists (Sentinel-2 images at 5 m or 2.5 m). Our proposal consists of using a reference satellite with a high similarity to Sentinel-2 in terms of spectral bands, but with higher spatial resolution, to create image pairs at both the source and target resolutions. This way, we can train a state-of-the-art convolutional neural network to recover details not present in the original RGBN bands. An exhaustive experimental study is carried out to validate our proposal, including a comparison with the most extended strategy for super-resolving Sentinel-2, which consists of learning a model to super-resolve from an under-sampled version at either 40 m or 20 m to the original 10 m resolution and then applying this model to super-resolve from 10 m to 5 m or 2.5 m. Finally, we also show that the spectral radiometry of the native bands is maintained when super-resolving images, in such a way that they can be used for any subsequent processing as if they were images acquired by Sentinel-2.
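A condensed sketch of this training setup follows, assuming PyTorch and a toy degradation model (2x average pooling) in place of the paper's exact sensor pairing and architecture:

```python
# Sketch: build (low, high) resolution pairs from higher-resolution
# reference imagery, then fit a small CNN to map low to high. The tiny
# model and the degradation are illustrative only.

import torch
import torch.nn as nn
import torch.nn.functional as F

def make_pair(high_res):                       # high_res: (N, 4, H, W) RGBN tensor
    """Degrade reference imagery to the source resolution to get inputs."""
    low = F.avg_pool2d(high_res, kernel_size=2)  # crude 2x degradation
    return low, high_res

class TinySR(nn.Module):
    def __init__(self, channels=4, scale=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels * scale**2, 3, padding=1),
        )
        self.shuffle = nn.PixelShuffle(scale)    # sub-pixel upsampling

    def forward(self, x):
        return self.shuffle(self.body(x))

model = TinySR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ref = torch.rand(8, 4, 64, 64)                 # stand-in for reference imagery
low, high = make_pair(ref)
loss = F.l1_loss(model(low), high)             # L1 loss is common for SR
loss.backward(); opt.step()
```

Once trained on the reference pairs, the same model is applied to real 10 m Sentinel-2 inputs to produce the finer-resolution outputs for which no ground truth exists.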



Author(s):  
M. Galar ◽  
R. Sesma ◽  
C. Ayala ◽  
L. Albizua ◽  
C. Aranda

Abstract. The Copernicus programme, via its Sentinel missions, is making Earth observation more accessible and affordable for everybody. Sentinel-2 images provide multi-spectral information every 5 days for each location. However, the maximum spatial resolution of its bands is 10 m, for the RGB and near-infrared bands. Increasing the spatial resolution of Sentinel-2 images without additional costs would make any posterior analysis more accurate. Most approaches to super-resolution for Sentinel-2 have focused on obtaining 10 m resolution images for the bands at lower resolutions (20 m and 60 m), taking advantage of the information provided by the bands of finer resolution (10 m). In contrast, our focus is on increasing the resolution of the 10 m bands, that is, super-resolving the 10 m bands to 2.5 m resolution, where no additional information is available. This problem is known as single-image super-resolution, and deep learning-based approaches have become the state of the art for it on standard images. Obviously, models learned for standard images do not translate well to satellite images. Hence, the problem is how to train a deep learning model for super-resolving Sentinel-2 images when no ground truth exists (Sentinel-2 images at 2.5 m). We propose a methodology for learning convolutional neural networks for Sentinel-2 image super-resolution that makes use of images from other sensors having high similarity with Sentinel-2 in terms of spectral bands, but greater spatial resolution. Our proposal is tested with a state-of-the-art neural network, showing that it can be useful for learning to increase the spatial resolution of the RGB and near-infrared bands of Sentinel-2.


