data record
Recently Published Documents


TOTAL DOCUMENTS

564
(FIVE YEARS 189)

H-INDEX

41
(FIVE YEARS 7)

2022 ◽  
Vol 14 (1) ◽  
pp. 1-27
Author(s):  
Khalid Belhajjame

Workflows have been adopted in several scientific fields as a tool for the specification and execution of scientific experiments. In addition to automating the execution of experiments, workflow systems often include capabilities to record provenance information, which contains, among other things, data records used and generated by the workflow as a whole but also by its component modules. It is widely recognized that provenance information can be useful for the interpretation, verification, and re-use of workflow results, justifying its sharing and publication among scientists. However, workflow execution in some branches of science can manipulate sensitive datasets that contain information about individuals. To address this problem, we investigate, in this article, the problem of anonymizing the provenance of workflows. In doing so, we consider a popular class of workflows in which component modules use and generate collections of data records as a result of their invocation, as opposed to a single data record. The solution we propose offers guarantees of confidentiality without compromising lineage information, which provides transparency as to the relationships between the data records used and generated by the workflow modules. We provide algorithmic solutions that show how the provenance of a single module and an entire workflow can be anonymized and present the results of experiments that we conducted for their evaluation.


2022 ◽  
Author(s):  
Gérard Ancellet ◽  
Sophie Godin-Beekmann ◽  
Herman G. J. Smit ◽  
Ryan M. Stauffer ◽  
Roeland Van Malderen ◽  
...  

Abstract. The Observatoire de Haute Provence (OHP) weekly Electrochemical Concentration Cell (ECC) ozonesonde data have been homogenized for the time period 1991–2020 according to the recommendations of the Ozonesonde Data Quality Assessment (O3S-DQA) panel. The benefit of the ECC homogenization has been assessed using comparisons with ground-based instruments also measuring ozone at the same station (lidar, surface measurements) and with collocated satellite observations of the O3 vertical profile by the Microwave Limb Sounder (MLS). The major differences between uncorrected and homogenized ECC are related to a change of ozonesonde type in 1997, removal of the pressure dependency of the ECC background current and correction of internal ozonesonde temperature. The 3–4 ppbv positive bias between ECC and lidar in the troposphere is corrected with the homogenization. The ECC 30-year trends of the seasonally adjusted ozone concentrations are also significantly improved both in the troposphere and the stratosphere when the ECC concentrations are homogenized, as shown by the ECC/lidar or ECC/surface ozone trend comparisons. A −0.29 % per year negative trend of the normalization factor (NT) calculated using independent measurements of the total ozone column (TOC) at OHP disappears after homogenization of the ECC. There is, however, a remaining −5 % negative bias in the TOC which is likely related to an underestimate of the ECC concentrations in the stratosphere above 50 hPa as shown by direct comparison with the OHP lidar and MLS. The reason for this bias is still unclear, but a possible explanation might be related to freezing or evaporation of the sonde solution in the stratosphere. Both the comparisons with lidar and satellite observations suggest that homogenization increases the negative bias of the ECC up to 10 % above 28 km.


2021 ◽  
Vol 19 (43) ◽  
pp. 208-228
Author(s):  
Manuela Fetter Nicoletti ◽  
João Guilherme Barone Reis e Silva

Amid the countless cancellations of cultural events around the globe, the dynamics of the film festival circuit and its representations took new courses and different perspectives. With these symbolic power relations in view, the article presents a brief data record of the performance of film festivals and the possible adaptations in their organization during 2020 and throughout the global pandemic, which accelerated the digitization of some structuring processes of international cinematographic circulation. Ultimately, it brings notions of cultural diplomacy to bear on international film festivals, in order to transpose theoretical concepts to contemporary practice and thereby examine the influences and consequences of virtualization for the subjectivities and significance of diplomacy and otherness among interconnected identities in the current global community.


2021 ◽  
Author(s):  
Uwe Pfeifroth ◽  
Jaqueline Drücke ◽  
Jörg Trentmann ◽  
Rainer Hollmann

The EUMETSAT Satellite Application Facility on Climate Monitoring (CM SAF) generates and distributes high quality long-term climate data records (CDR) of energy and water cycle parameters, which are freely available. In 2022, a new version of the “Surface Solar Radiation data set – Heliosat” will be released: SARAH-3. Like the previous editions, the SARAH-3 climate data record is based on satellite observations from the first and second METEOSAT generations and provides various surface radiation parameters, including global radiation, direct radiation, sunshine duration, photosynthetically active radiation and others. SARAH-3 covers the time period 1983 to 2020 and offers 30-minute instantaneous data as well as daily and monthly means on a regular 0.05° × 0.05° lon/lat grid. In this presentation, an overview of the SARAH climate data record and its applications will be given. A focus will be on the SARAH-3 developments and validation with surface reference observations. Further, SARAH-3 will be used for a first analysis of the climate variability and potential trends of global radiation in Europe during the last decades. The data record reveals an increasing trend of surface solar radiation in Europe during the last decades, with decadal and regional variability superimposed.
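As a rough illustration of what a trend in a surface radiation record means, the sketch below fits an ordinary least-squares line to a series of annual means. The abstract does not state which trend estimator is used, so the linear fit, the function name `ols_slope`, and the numbers in the usage note are all assumptions made for illustration, not SARAH-3 data or the CM SAF method.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys against xs (e.g. W/m^2 per year).

    Hypothetical helper: a plain linear fit, chosen only as the simplest
    possible trend estimator.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    return sxy / sxx
```

For example, with made-up annual means of 100, 102, 104 and 106 for the years 1983 to 1986, `ols_slope([1983, 1984, 1985, 1986], [100.0, 102.0, 104.0, 106.0])` gives a slope of 2 units per year. A real analysis would also need to account for the decadal and regional variability that the abstract mentions.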


2021 ◽  
Author(s):  
Christian Borger ◽  
Steffen Beirle ◽  
Thomas Wagner

Abstract. We present a long-term data set of 1° × 1° monthly mean total column water vapour (TCWV) based on global measurements of the Ozone Monitoring Instrument (OMI) covering the time range from January 2005 to December 2020. In comparison to the retrieval algorithm of Borger et al. (2020), several modifications and filters have been applied accounting for instrumental issues (such as OMI's "row-anomaly") or the inferior quality of solar reference spectra. For instance, to overcome the problems of low quality reference spectra, the daily solar irradiance spectrum is replaced by an annually varying mean Earthshine radiance obtained in December over Antarctica. For the TCWV data set, only measurements are taken into account for which the effective cloud fraction < 20 %, the AMF > 0.1, the ground pixel is snow- and ice-free, and the OMI row is not affected by the "row-anomaly" over the complete time range of the data set. The individual TCWV measurements are then gridded to a regular 1° × 1° lattice, from which the monthly means are calculated. In a comprehensive validation study we demonstrate that the OMI TCWV data set is in good agreement with the reference data sets of ERA5, RSS SSM/I, and ESA CCI Water Vapour CDR-2: over ocean, ordinary least squares (OLS) as well as orthogonal distance regressions (ODR) indicate slopes close to unity with very small offsets and high correlation coefficients of around 0.98. However, over land, distinctive positive deviations are obtained, especially within the tropics, with relative deviations of approximately +10 % likely caused by uncertainties in the retrieval input data (surface albedo, cloud information) due to frequent cloud contamination in these regions. Nevertheless, a temporal stability analysis proves that the OMI TCWV data set is consistent with the temporal changes of the reference data sets and shows no significant deviation trends.
Since the TCWV retrieval can be easily applied to further satellite missions, additional TCWV data sets can be created from past missions such as GOME-1 or SCIAMACHY, which, under consideration of systematic differences (e.g. due to different observation times), can be combined with the OMI TCWV data set in order to create a data record that would cover a time span from 1995 to the present. Moreover, the TCWV retrieval will also work for all future missions dedicated to NO2, such as Sentinel-5 on MetOp-SG. The MPIC OMI total column water vapour (TCWV) climate data record is available at https://doi.org/10.5281/zenodo.5776718 (Borger et al., 2021b).
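The filtering and gridding steps described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' processing chain: the `Pixel` record, its field names, and the choice to average all passing measurements of a month directly within each cell are hypothetical. Only the thresholds (effective cloud fraction below 20 %, AMF above 0.1, snow- and ice-free ground pixel, row never affected by the row anomaly) come from the abstract.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Pixel:
    """One TCWV measurement (hypothetical record layout)."""
    lat: float                # degrees north
    lon: float                # degrees east
    month: str                # e.g. "2005-01"
    tcwv: float               # total column water vapour
    cloud_fraction: float     # effective cloud fraction, 0..1
    amf: float                # air mass factor
    snow_or_ice: bool         # ground pixel covered by snow or ice
    row_ever_anomalous: bool  # row affected by the row anomaly at any time


def passes_filters(p):
    """Apply the four selection criteria quoted in the abstract."""
    return (
        p.cloud_fraction < 0.20
        and p.amf > 0.1
        and not p.snow_or_ice
        and not p.row_ever_anomalous
    )


def monthly_grid_means(pixels):
    """Average the accepted measurements per month and 1 x 1 degree cell.

    Cells are keyed by (month, floor(lat), floor(lon)), i.e. by the
    south-west corner of the cell (an assumed convention).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for p in pixels:
        if not passes_filters(p):
            continue
        cell = sums[(p.month, math.floor(p.lat), math.floor(p.lon))]
        cell[0] += p.tcwv
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Two accepted measurements of 30 and 40 that fall in the same cell and month yield a cell mean of 35, while a measurement with a cloud fraction of 0.5 is dropped before gridding.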


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Wenke Xiao ◽  
Lijia Jing ◽  
Yaxin Xu ◽  
Shichao Zheng ◽  
Yanxiong Gan ◽  
...  

The amount of medical text data is increasing dramatically. Medical text data record the progress of medicine and contain a large amount of medical knowledge. As natural language, they are characterized by semistructured, high-dimensional, high-volume semantic content and cannot participate directly in arithmetic operations. Therefore, how to extract useful knowledge or information from the total available data is a very important task. Various data mining techniques can extract valuable knowledge or information from data. In the current study, we reviewed different approaches to medical text data mining. The advantages and shortcomings of each technique with respect to the different processes of medical text data mining were analyzed. We also explored the applications of algorithms for providing insights to users and enabling them to use the resources for specific challenges in medical text data. Further, the main challenges in medical text data mining were discussed. The findings of this paper can help researchers choose reasonable techniques for mining medical text data and present to them the main challenges in medical text data mining.


2021 ◽  
Vol 35 (6) ◽  
pp. 926-942
Author(s):  
Ling Sun ◽  
Hong Qiu ◽  
Ronghua Wu ◽  
Jing Wang ◽  
Liyang Zhang ◽  
...  

2021 ◽  
Vol 2022 (1) ◽  
pp. 460-480
Author(s):  
Bogdan Kulynych ◽  
Mohammad Yaghini ◽  
Giovanni Cherubin ◽  
Michael Veale ◽  
Carmela Troncoso

Abstract. A membership inference attack (MIA) against a machine-learning model enables an attacker to determine whether a given data record was part of the model’s training data or not. In this paper, we provide an in-depth study of the phenomenon of disparate vulnerability against MIAs: unequal success rate of MIAs against different population subgroups. We first establish necessary and sufficient conditions for MIAs to be prevented, both on average and for population subgroups, using a notion of distributional generalization. Second, we derive connections of disparate vulnerability to algorithmic fairness and to differential privacy. We show that fairness can only prevent disparate vulnerability against limited classes of adversaries. Differential privacy bounds disparate vulnerability but can significantly reduce the accuracy of the model. We show that estimating disparate vulnerability by naïvely applying existing attacks can lead to overestimation. We then establish which attacks are suitable for estimating disparate vulnerability, and provide a statistical framework for doing so reliably. We conduct experiments on synthetic and real-world data finding significant evidence of disparate vulnerability in realistic settings.
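To make the notion of disparate vulnerability concrete, the sketch below runs a simple loss-threshold membership inference attack and reports its accuracy per population subgroup. It is an assumption-laden illustration and not the paper's framework: the loss-threshold attack is a generic textbook attack swapped in for the example, all function names are hypothetical, and this is exactly the kind of naive per-subgroup estimate that the abstract warns can overestimate disparate vulnerability.

```python
from collections import defaultdict


def guess_member(loss, threshold):
    """Loss-threshold attack: guess 'member' when the model's loss is low."""
    return loss < threshold


def attack_accuracy_by_group(records, threshold):
    """Per-subgroup fraction of correct membership guesses.

    records: iterable of (group, loss, is_member) tuples, where loss is
    the target model's loss on the record (hypothetical input layout).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, loss, is_member in records:
        totals[group] += 1
        if guess_member(loss, threshold) == is_member:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def accuracy_gap(accuracy_by_group):
    """Largest difference in attack accuracy between any two subgroups."""
    values = list(accuracy_by_group.values())
    return max(values) - min(values)
```

With a threshold of 0.5 and the records `("a", 0.1, True)`, `("a", 0.9, False)`, `("b", 0.1, False)`, `("b", 0.9, False)`, the attack is always right on subgroup `a` and right half of the time on subgroup `b`, so the accuracy gap is 0.5. Such a gap measured on a small sample says little on its own, which is why the paper calls for a statistical framework to estimate it reliably.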


2021 ◽  
Vol 13 (22) ◽  
pp. 4622
Author(s):  
Wolfgang Wagner ◽  
Bernhard Bauer-Marschallinger ◽  
Claudio Navacchi ◽  
Felix Reuß ◽  
Senmao Cao ◽  
...  

The Sentinel-1 Synthetic Aperture Radar (SAR) satellites allow global monitoring of the Earth’s land surface with unprecedented spatio-temporal coverage. Yet, implementing large-scale monitoring capabilities is a challenging task given the large volume of data from Sentinel-1 and the complex algorithms needed to convert the SAR intensity data into higher-level geophysical data products. While on-demand processing solutions have been proposed to cope with the petabyte-scale data volumes, in practice many applications require preprocessed datacubes that permit fast access to multi-year time series and image stacks. To serve near-real-time as well as offline land monitoring applications, we have created a Sentinel-1 backscatter datacube for all continents (except Antarctica) that is constantly being updated and maintained to ensure consistency and completeness of the data record over time. In this technical note, we present the technical specifications of the datacube, means of access and analysis capabilities, and its use in scientific and operational applications.

