DWD Geoportal – Converging open data, metadata and documentation in a user-friendly way

2021 ◽  
Author(s):  
Björn Reetz ◽  
Hella Riede ◽  
Dirk Fuchs ◽  
Renate Hagedorn

<p>Since 2017, Open Data has been part of the DWD data distribution strategy. Starting with a small selection of meteorological products, the number of available datasets has grown continuously over recent years. From the start, users have been able to access datasets anonymously via the website https://opendata.dwd.de to download file-based meteorological products. Free access and the variety of products have been welcomed by the general public as well as by private meteorological service providers. The more datasets are provided in a directory structure, however, the more tedious it becomes to find and select among all available data. Metadata and documentation were also available, but on separate public websites. This turned out to be an issue, especially for new users of DWD's open data.</p><p>To help users explore the available datasets and quickly decide on their suitability for a given use case, the Open Data team at DWD is developing a geoportal. It enables free-text search together with combined access to data, metadata, and descriptions, along with interactive previews via OGC WMS.</p><p>Cloud technology is a suitable way forward for hosting the geoportal along with the data in its operational state. Benefits are expected from the easy integration of rich APIs with the geoportal, and from the flexible and fast deployment and scaling of optional or prototypical services such as WMS-based previews. Flexibility is also mandatory to respond to fluctuating user demand, which depends on the time of day and on critical weather situations; this is supported by containerization. The growing overall volume of meteorological data at DWD may make it necessary to allow customers to bring their code to the data – for on-demand processing including slicing and interpolation – instead of transferring files to every customer. Shared cloud instances are the ideal interface for this purpose.</p><p>The contribution will outline a prototype version of the new geoportal and discuss further steps for launching it to the public.</p>
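The WMS-based previews mentioned above follow the standard OGC WMS GetMap request pattern. As a minimal sketch, a preview URL can be assembled as below; note that the endpoint and layer name are hypothetical placeholders, not the geoportal's actual service.

```python
from urllib.parse import urlencode

# Hypothetical WMS endpoint for illustration only -- the geoportal's
# real service URL and layer names are not given in this abstract.
WMS_BASE = "https://example-geoportal.dwd.de/wms"

def wms_getmap_url(layer, bbox, width=512, height=512,
                   crs="EPSG:4326", fmt="image/png"):
    """Build an OGC WMS 1.3.0 GetMap request URL for a quick preview.

    bbox is (minx, miny, maxx, maxy); with WMS 1.3.0 and EPSG:4326
    the axis order is latitude first.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": crs,
        "BBOX": ",".join(f"{c:g}" for c in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{WMS_BASE}?{urlencode(params)}"

# Example: preview a (hypothetical) radar layer over Germany.
url = wms_getmap_url("precipitation_radar", (47.0, 5.0, 55.0, 16.0))
```

Fetching such a URL returns a rendered map image, which is all a portal needs for an interactive preview pane.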

Author(s):  
Денис Валерьевич Сикулер

The article reviews 10 Internet resources that can be used to find data for a variety of tasks related to machine learning and artificial intelligence. Both widely known sites (e.g., Kaggle, Registry of Open Data on AWS) and less popular or narrowly specialized resources (e.g., The Big Bad NLP Database, Common Crawl) are examined. All of the resources provide free access to data; in most cases, registration is not even required. For each resource, the characteristics and particularities of searching for and obtaining datasets are described. The following sites are covered: Kaggle, Google Research, Microsoft Research Open Data, Registry of Open Data on AWS, Harvard Dataverse Repository, Zenodo, the Open Data portal of the Russian Federation, World Bank, The Big Bad NLP Database, and Common Crawl.
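Several of the repositories listed, such as Zenodo, expose public REST APIs in addition to their web interfaces. As a minimal sketch of programmatic dataset search, assuming Zenodo's documented `/api/records` endpoint, a query URL can be built like this:

```python
from urllib.parse import urlencode

def zenodo_search_url(query, page=1, size=10):
    """Build a query URL for Zenodo's public records search API.

    The /api/records endpoint and its q/page/size parameters follow
    Zenodo's REST API documentation.
    """
    params = {"q": query, "page": page, "size": size}
    return "https://zenodo.org/api/records?" + urlencode(params)

# Example: search for machine learning datasets.
url = zenodo_search_url("machine learning dataset")
```

An HTTP GET on this URL returns a JSON page of matching records; no registration is required for public records, in line with the free-access point made above.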


2021 ◽  
Author(s):  
Frank Kratzenstein ◽  
Frank Kaspar

<p>In recent years, the DWD has significantly expanded free access to its climate observations. A first step was a simple FTP site offering downloads of archives in different data categories, e.g. national and international station-based meteorological data, derived parameters, gridded products, and special categories such as phenological data. The data are based on the DWD's observation systems for Germany as well as on the DWD's international activities.</p><p>Based on the consistent implementation of OGC standards, interactive and user-friendly access to the data has been created with the development of the DWD climate portal.</p><p>In addition to browsing, previewing, running basic analyses, and downloading the data, the available OGC services enable users to set up their own services on top of the DWD data. Along with the free and extended access to the data and services, users' demands on the availability, quality, and detail of the metadata have also increased significantly. Maintaining metadata and linking it to the open data and services remains a challenge. However, INSPIRE and WIGOS are paving the way toward a unified solution and overcoming these problems.</p><p>Another challenging requirement was to provide users with interactive access to long time series from gridded products. To accomplish this, we moved away from the previous file-based approach and now store the raster data as a georaster in an Oracle database. This design allows combined analysis of raster and station data not only in the climate data portal but also in the central climate database.</p><p>The presentation will provide a technical and functional overview of the DWD climate data portal.</p>
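To illustrate the access pattern behind interactive time-series retrieval from gridded products (a toy sketch only; the portal's actual georaster storage in Oracle works differently), one can extract the series at the grid cell nearest a requested point:

```python
import numpy as np

# Toy illustration: a gridded product as a (time, lat, lon) array, from
# which a point time series is served. The grid and values are synthetic.
def extract_time_series(grid, lats, lons, lat, lon):
    """Return the time series at the grid cell nearest to (lat, lon)."""
    i = int(np.abs(lats - lat).argmin())  # nearest latitude row
    j = int(np.abs(lons - lon).argmin())  # nearest longitude column
    return grid[:, i, j]

# Toy product: 12 monthly temperature fields on a 1-degree grid.
lats = np.linspace(47.0, 55.0, 9)
lons = np.linspace(5.0, 16.0, 12)
grid = np.random.default_rng(0).normal(8.0, 5.0, size=(12, 9, 12))

series = extract_time_series(grid, lats, lons, 52.52, 13.41)  # near Berlin
```

Storing the rasters in a database rather than as files lets such point extractions run as indexed queries instead of opening one file per time step, which is what makes the long-series access interactive.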


2012 ◽  
Vol 8 (2) ◽  
Author(s):  
Simon McGinnes ◽  
Kasturi Muthu Elandy

This paper explores the proposition that IT-driven provision of open data may have unanticipated consequences. Transparency is normally considered desirable: knowledge is "power", the "oxygen of democracy", and so on. Accordingly, there has been a trend towards greater freedom of information, with citizens given access to an increasing diversity of datasets. For many years, governments have produced one particular type of data specifically for public consumption: performance data, such as hospital waiting list statistics, figures on crime, and school performance league tables. Having more information is usually considered beneficial, particularly when little is available. But when the information supply becomes plentiful, it is not clear that benefits continue to accrue in a simple way. Some apparently negative repercussions of publishing performance data are being observed. For example, in education the use of league tables seems unable to correct performance problems in some schools, and may even depress performance. Similar effects are observed in other spheres. Data reporting a decreasing threat of crime may be linked with a widespread sense of heightened danger. In the private sector, publication of CEO salaries seems to have fuelled rampant salary inflation. These effects stem from the cumulative impact of individual behaviours when people respond en masse to information. Individuals react according to their environment, which includes data, creating a complex system with potentially unpredictable and non-intuitive behaviour. We may hope that increased access to data will create net benefits, but evidence suggests that we cannot assume this will always be true. This paper reviews the results of research into this phenomenon from theoretical and empirical perspectives. Results indicate that the publication of performance data can affect the behaviour of service providers, the media, and service consumers, and that the effects are heavily situation-dependent and by no means universally benign. An agenda for further research is outlined, which may help to guide the formulation of policies regarding the publication of government performance data in particular and open data provision in general.


Author(s):  
Diego Alvares ◽  
Marcus Guidoti ◽  
Felipe Simoes ◽  
Carolina Sokolowicz ◽  
Donat Agosti

Plazi is a Swiss non-governmental organization dedicated to liberating data imprisoned in flat, dead-end formats such as PDFs. In the process, the data therein is annotated and exported in various formats, following field-specific standards, which facilitates free access and reuse by other service providers and end users. This data mining and enhancement process allows for the rediscovery of the known biodiversity, since knowledge about known taxa is published in an ever-growing corpus of papers, chapters, and books that is inaccessible to state-of-the-art service providers such as the Global Biodiversity Information Facility (GBIF). The data liberated by Plazi focuses on taxonomic treatments, which carry the unit of knowledge about a taxon concept in a given publication and can be considered the building block of taxonomic science. Although these extracted taxonomic treatments can be found in Plazi's TreatmentBank and the Biodiversity Literature Repository (BLR), hosted in the European Organization for Nuclear Research (CERN) digital repository Zenodo, data included in treatments (e.g., material citations and treatment citations) can also be found in other applications, such as Plazi's Synospecies, Zenodeo, and GBIF. Plazi's efforts result in more Findable, Accessible, Interoperable, and Reusable (FAIR) biodiversity literature, improving, enhancing, and enabling access to the data included therein as digital accessible data, otherwise almost unreachable. The Biodiversity Heritage Library (BHL), on the other hand, provides a pivotal service by digitizing heritage literature, as well as current literature for which BHL negotiates permission, and provides free access to otherwise inaccessible sources. In 2021, BHL and Plazi signed a Statement of Collaboration, aiming to combine the efforts of both institutions to contribute even further to FAIR-ifying biodiversity literature and data.
In a collaborative demonstration project, we selected the earliest volumes and issues of the Revue Suisse de Zoologie for a pilot study combining the efforts of both BHL and Plazi. The corpus comprises eight volumes (tomes), 24 issues (numbers), and 98 papers, including a total of over 5000 pages and 200 images. To process this material, BHL assigned Crossref Digital Object Identifiers (DOIs) to these already digitally accessible publications. Plazi created a template to be used in GoldenGate-Imagine, indicating key parameters for tailored data mining of these articles and customized to the journal's graphic layout at that time. We then proceeded with quality control steps to provide fit-for-use data for BLR and GBIF, ensuring that the data was correctly annotated and eliminating potential data transit blockages at Plazi's built-in data gatekeeper. The data was subsequently reused by GBIF. Finally, we present a summary of the results obtained, highlighting the key publication attributes mentioned above (pages, images), but also including a drill-down into the different taxonomic groups, countries, and collections of origin of the studied material, and more. All the data is available via the Plazi statistics, the Biodiversity Literature Repository website and community at Zenodo, the Zenodeo API, and GBIF, where the data is being reused.
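To give a flavor of the structured annotation applied to flat text, here is a toy sketch that parses an invented material-citation layout into named fields. The layout, field names, and example citation are all made up for illustration; real treatments vary widely and Plazi's actual pipeline (GoldenGate-Imagine with journal-specific templates) is far richer.

```python
import re

# Invented citation layout: "Country; locality; day Mon year; collector."
# Real material citations are much more heterogeneous than this.
CITATION_RE = re.compile(
    r"(?P<country>[A-Z][a-z]+);\s*"
    r"(?P<locality>[^;]+);\s*"
    r"(?P<date>\d{1,2}\s[A-Z][a-z]{2}\s\d{4});\s*"
    r"(?P<collector>[^;]+)\."
)

def parse_material_citation(text):
    """Return a dict of structured fields, or None if no match."""
    m = CITATION_RE.search(text)
    return m.groupdict() if m else None

citation = "Switzerland; Geneva, left bank of the Rhone; 3 Jun 1907; E. Andre."
record = parse_material_citation(citation)
```

Once fields like country, date, and collector are explicit, they can be mapped onto community standards and reused downstream, which is exactly what makes the liberated data findable and interoperable.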


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Sebastiano Piccolroaz ◽  
Bieito Fernández-Castro ◽  
Marco Toffolon ◽  
Henk A. Dijkstra

Abstract
A multi-site, year-round dataset comprising a total of 606 high-resolution turbulence microstructure profiles of shear and temperature gradient in the upper 100 m of the water column is made available for Lake Garda (Italy). Concurrent meteorological data were measured from the fieldwork boat at the location of the turbulence measurements. During the fieldwork campaign (March 2017 – June 2018), four different sites were sampled on a monthly basis, following a standardized protocol in terms of the time of day and locations of the measurements. Additional monitoring activity included a 24-h campaign and sampling at other sites. Turbulence quantities were estimated, quality-checked, and merged with water quality and meteorological data to produce a unique turbulence atlas for a lake. The dataset is open to a wide range of possible applications, including research on the variability of turbulent mixing across seasons and sites (demersal vs pelagic zones) and driven by different factors (lake-valley breezes vs buoyancy-driven convection), validation of hydrodynamic lake models, as well as technical studies on the use of shear and temperature microstructure sensors.
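A key turbulence quantity derived from microstructure shear profiles is the dissipation rate of turbulent kinetic energy, commonly estimated under the isotropy assumption as ε = 7.5 ν ⟨(∂u/∂z)²⟩. The sketch below applies this textbook relation to a synthetic shear segment; it is a generic illustration, not necessarily the exact processing used for the Lake Garda dataset.

```python
import numpy as np

# Standard isotropic estimate of TKE dissipation from microstructure
# shear: eps = 7.5 * nu * <(du/dz)^2>.
NU = 1.3e-6  # kinematic viscosity of water, m^2/s (temperature-dependent)

def dissipation_rate(shear, nu=NU):
    """Estimate epsilon (W/kg) from a shear (du/dz) segment in 1/s."""
    return 7.5 * nu * np.mean(shear**2)

# Synthetic shear segment with an rms of about 1e-2 1/s, a typical
# order of magnitude for weakly turbulent lake interiors.
rng = np.random.default_rng(1)
shear = rng.normal(0.0, 1e-2, size=1024)
eps = dissipation_rate(shear)
```

In practice ε is computed per depth bin from despiked, high-pass-filtered shear, which is where the quality-checking mentioned in the abstract comes in.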


2021 ◽  
Vol 27 (Supplement_1) ◽  
pp. S16-S17
Author(s):  
Stefan Holubar ◽  
Amy Lightner ◽  
Taha Qazi ◽  
Erica Savage ◽  
Justin Ream ◽  
...  

Abstract
Background: Ileal pouch-anal anastomosis (IPAA) is a technically demanding procedure. Intraoperatively, great care must be taken to ensure a straight superior mesenteric axis. Rarely, twisted pouches are inadvertently constructed, resulting in deviations from expected pouch function, i.e., patients readily able to open their bowels on average 7 times per 24 hours without pain. Twisted pouches may result in symptoms classified as pouch dysfunction. Herein we describe our quaternary pouch referral center experience with twisted pouch syndrome (TPS).
Methods: We performed a retrospective review of our prospectively maintained pouch registry from 1995 to 2020. Patients were identified using a free-text search of redo IPAA operative reports for variations of the term "twist". We defined twisted pouch syndrome as intraoperative findings of twisting of the pouch as the primary pathology. Data are presented as frequency (proportion) or median (interquartile range).
Results: Over 25 years, we identified 29 patients with confirmed TPS who underwent a redo pouch procedure by 10 surgeons. Overall, 65% were female, with a median BMI of 21.2 (19.5–26) kg/m2. The duration from the index IPAA to the redo procedure was 4 (2–8) years; all (100%) were referral cases constructed elsewhere. Original diagnoses included ulcerative colitis (90%), FAP (10%), and lack of interstitial cells of Cajal in 1 patient (10%). All patients presented with symptoms of pouch dysfunction, including erratic bowel habits (96%) with urgency/frequency, abdominal/pelvic/rectal pain (92%), and obstructive symptoms (88%). Most (75%) had been treated for chronic pouchitis with antibiotics or biologics, and 46% had undergone 1 or more additional surgeries. Prior to the redo IPAA procedure, patients underwent a thorough workup: 100% pouchoscopy, 96% GGE, 93% EUA, 88% MRI, 73% manometry, and 42% defecography. TPS was diagnosed by pouchoscopy in 15%, by imaging in 10%, and intraoperatively in 75%, at re-diversion (20%) or revision/redo IPAA (55%). In terms of surgical intervention, 85% were initially re-diverted. A total of 18 (62%) underwent pouch revision, and 10 (38%) required redo IPAA. Short-term outcomes: length of stay 7.5 (5–9) days, any complication 48%, readmission 11%, reoperation 3.4%, zero mortalities. After a median follow-up of 50 (28–60) months, 2 patients never had loop ileostomy closure, 1 had pouch excision, and 1 a Kock pouch, yielding an overall pouch survival rate of 86%.
Conclusions: Twisted pouch syndrome presents with pouch dysfunction manifested by erratic bowel habits, unexplained pain, and obstructive (defecation) symptoms. This syndrome may also mimic chronic pouchitis. Despite a thorough workup that may suggest a mechanical problem, many patients may not be diagnosed until the time of redo pouch surgery. Redo surgery for twisted pouch syndrome results in long-term pouch survival for the majority.


Author(s):  
Avi Rosenfeld ◽  
Claudia V. Goldman ◽  
Gal A. Kaminka ◽  
Sarit Kraus

2021 ◽  
Author(s):  
Georg Pistotnik ◽  
Hannes Rieder ◽  
Simon Hölzl ◽  
Rainer Kaltenberger ◽  
Thomas Krennert ◽  
...  

<p>Development, verification, and feedback of impact-based weather warnings require novel data and methods. Unlike meteorological data, impact information is often qualitative and subjective, and therefore needs some form of quantification and objectification. It is also inherently incomplete: an absence of reports does not automatically imply an absence of impacts.<br>Reconciling impact information with conventional meteorological data demands a paradigm shift. We designed and implemented a verification scheme around a backbone of weather-related fire brigade operations and eyewitness reports at ZAMG, the national meteorological service of Austria. Meteorological stations, radar, and derived gridded data are conceptualized as a backstop to mitigate impact voids (possibly arising from a lack of vulnerability, a lack of exposure, or simply a lack of reporting), but are no longer the primary basis.<br>Operation data from fire brigade units across Austria are stored by civil protection authorities at the federal state level and copied to ZAMG servers in real time. Their crucial information is condensed into a few components: time, place, a keyword (from a predefined list of operations), and an optional free-text field. This compact information is cross-checked against meteorological data to single out weather-related operations, which are then assigned to event types (rain, wind, snow, ice, or thunderstorm) and categorized into three intensity levels („remarkable”, „severe”, and „extreme”) according to an elaborated criteria catalogue. This quality management and refinement is performed in a three-stage procedure so that the dataset can serve different time scales and applications:<br>„First guess” based on automatic filtering: available in real time and used for an immediate adjustment of active warnings, if necessary;<br>„Educated guess” based on a semi-manual plausibility check: available in a timely manner (ideally within a day) and used for an evaluation of the latest warnings (including possible implications for follow-up warnings);<br>Final classification based on thorough manual quality control: available days to weeks later and used for objective verification.<br>Eyewitnesses can report weather events and their impacts in real time via a reporting app implemented at ZAMG (wettermelden.at). Reports from sources of differing trustworthiness are funneled into a standardized API. Observations from the general public are treated like a „first guess”, those from trained observers like an „educated guess”, and are merged with the refined fire brigade data at the corresponding stages.<br>The weather event types are synchronized with our warning parameters to allow an objective verification of impact-based warnings. We illustrate our measures to convert these point-wise impact data into spatial impact information, to circumvent artifacts due to varying population density, and to include the “safety net” of conventional meteorological data. Yellow, orange, and red warnings are thereby translated into probabilities for certain scenarios, which are meaningful and intuitive for the general public and for civil protection authorities.</p>
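The automatic "first guess" stage described above amounts to mapping an operation's keyword onto a weather event type. A minimal sketch of that mapping is shown below; the keyword list is invented for illustration, and the real criteria catalogue is considerably richer (it also uses time, place, and the free-text field).

```python
# Hypothetical keyword-to-event-type mapping for the automatic
# "first guess" filtering step. ZAMG's actual predefined operation
# list and criteria catalogue are not reproduced here.
EVENT_TYPES = {
    "flooded basement": "rain",
    "fallen tree": "wind",
    "roof damage": "wind",
    "snow load": "snow",
    "icy road accident": "ice",
    "lightning strike": "thunderstorm",
}

def first_guess(keyword):
    """Return the weather event type for an operation keyword, if any.

    Returns None for operations that are not weather-related, which
    the later semi-manual and manual stages would also confirm.
    """
    return EVENT_TYPES.get(keyword.strip().lower())

event = first_guess("Fallen tree")
```

The later stages refine rather than replace this output: the semi-manual plausibility check and the final manual quality control can reclassify or discard a record, which is why the three stages suit different latency requirements.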


2018 ◽  
Vol 12 (2) ◽  
pp. 179-220
Author(s):  
Jozef Andraško ◽  
Matúš Mesarčík

New technologies have irreversibly changed the nature of the traditional way of exercising the right to free access to information. In the current information society, the information held by public authorities is not just a tool for controlling the public administration and increasing its transparency. Information has become an asset that individuals and legal entities also seek to use for business purposes. Public sector information (PSI), particularly in the form of open data, creates new opportunities for developing and improving the performance of public administration. In that regard, the authors analyze the term "open data" and its legal framework from the perspective of European Union law, the Slovak legal order, and the Czech legal order. Furthermore, the authors focus on the relation between the open data regime, the public sector information re-use regime, and the free access to information regime. The new data protection regime represented by the General Data Protection Regulation poses several challenges for the processing of public sector information in the form of open data. The article highlights the most important challenges of the new regime: compliance with purpose specification, selection of a legal ground for processing, and other important issues.


Author(s):  
G. P. Chuiko ◽  
I. O. Shyian ◽  
D. A. Galyak

Since 1999, PhysioNet (http://physionet.org/) has offered free access via the web to large collections of recorded physiologic signals and medical databases, as well as associated open-source software. The intention of this scientific resource is to stimulate current research and new investigations in the study of cardiovascular and other complex biomedical signals. Today, the PhysioBank archives include records obtained from healthy individuals and from patients with different diagnoses, recorded under various conditions; they cover sudden cardiac death, congestive heart failure, neurological disorders, epilepsy, and many other conditions. The PhysioToolkit software package is valuable for physiological signal processing and analysis, for the creation of new databases, for the interactive display and characterization of signals, and for the simulation of physiological and other signals. Nonetheless, to use PhysioToolkit successfully, a researcher must be comfortable with the Unix operating system and its specialized commands. It therefore makes sense to convert the necessary signals into a user-friendly computer algebra system. This paper describes the interface elements of the PhysioNet scientific web resource, simple methods for converting binary medical data files to text format, and the import of the resulting digital signals into the computer algebra system Maple 17.
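The binary-to-text conversion step described above can be illustrated with a toy decoder. The real PhysioBank signal formats (e.g., WFDB format 212) are more involved; here we assume, purely for illustration, plain little-endian 16-bit integer samples with a known gain and sampling rate, emitted as CSV that a system like Maple can import directly.

```python
import struct

# Toy sketch: decode little-endian 16-bit samples into 'time,value'
# CSV lines. Gain (ADC units per mV) and sampling rate are assumed
# known from the record header; the values below are placeholders.
def binary_to_csv(raw, gain=200.0, fs=360.0):
    """Decode 16-bit LE samples from raw bytes and return CSV text."""
    n = len(raw) // 2
    samples = struct.unpack(f"<{n}h", raw[: 2 * n])
    lines = ["time_s,amplitude_mV"]
    for i, s in enumerate(samples):
        lines.append(f"{i / fs:.6f},{s / gain:.4f}")
    return "\n".join(lines)

# Example: three synthetic samples (0, +1 mV, -1 mV at gain 200).
raw = struct.pack("<3h", 0, 200, -200)
csv_text = binary_to_csv(raw)
```

A text table like this sidesteps the Unix tooling entirely: the CSV can be read into Maple (or any other environment) with its standard import facilities.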

