Earthquake loss alerts to save victims

Author(s):  
Max Wyss ◽  
Philippe Roset ◽  
Stavros Tolis ◽  
Michel Speiser

Large earthquakes are unavoidable because globally the plate motions accumulate stress, which leads to ruptures of the crustal rocks hundreds of kilometers long. In developed areas, this causes buildings to collapse, injuring and killing occupants. Potential rescuers are never well informed about the extent of an earthquake disaster because communication along the rupture is interrupted. We have documented that the underestimation of fatality numbers lasts for at least the crucial first few days, often for weeks. For earthquakes that cause thousands of casualties, the extent of underestimation is usually an order of magnitude. To reduce this uncertainty about whether help is required, and how much, we have assembled a data set and constructed algorithms to estimate the number of fatalities and injured within an hour of any earthquake worldwide in the computer tool QLARM. Our estimates of the population and the makeup of the built environment come from government and internet sources. For large earthquakes, the hypocenter and magnitude are calculated and distributed by the GEOFON group at the Geoforschungszentrum (GFZ) in Potsdam, Germany, and by the US Geological Survey (USGS) in Golden, USA, within 6 to 10 minutes. Based on this information, the QLARM operator responds with an estimate of the number of casualties within 30 minutes of the earthquake, on average. These estimates are available to anyone by email alerts without charge. Since 2003, the QLARM operator has issued more than 1,000 casualty alerts, at any time of day, pro bono. The USGS delivers a similar service called PAGER, which is based on different data sets and algorithms. The two loss estimates are usually close, which should give governments and news organizations confidence that these alerts are to be taken seriously. The QLARM research group also publishes research results estimating the likely numbers of future casualties in repeats of historical large earthquakes. In such efforts the QLARM group has discovered that, contrary to the general assumption, the rural population suffers an order of magnitude more in very large earthquakes than the urban population. It is also clear that the poorer segments of the population in cities and the countryside suffer more than the affluent members of society because the former's houses are weaker and collapse more readily. To be even more useful, a worldwide data set of hospitals and schools is needed in order to provide first responders with locations and likely damage to these critical facilities. Crucially, reliable school location data would enable first responders to focus rescue efforts on schoolchildren, who die beneath the rubble of their schools in the hundreds to thousands in large earthquakes. Unfortunately, such data are not available from official sources in most developing countries, and we are not aware of good alternatives. The data on schools in open data platforms such as OpenStreetMap are sporadic. UNICEF runs a global school mapping initiative, but we have been unable to obtain their assistance to date.
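
To make the estimation chain concrete, the following is a minimal, purely illustrative sketch of the generic "magnitude and distance → shaking intensity → building damage → casualties" logic that rapid loss tools of this kind follow. It is not the QLARM algorithm; the attenuation form, vulnerability classes, and all coefficients are invented for demonstration.

```python
# Illustrative sketch only -- not the QLARM algorithm. All numbers are made up.
import math

def intensity(magnitude, epicentral_distance_km):
    """Very rough macroseismic intensity from an assumed attenuation form."""
    return 1.5 * magnitude - 3.0 * math.log10(max(epicentral_distance_km, 1.0)) + 2.0

def fatality_rate(mmi, vulnerability_class):
    """Assumed fraction of occupants killed, given intensity and building class."""
    threshold = {"weak_masonry": 7.0, "reinforced_concrete": 8.5}[vulnerability_class]
    steepness = {"weak_masonry": 0.9, "reinforced_concrete": 0.6}[vulnerability_class]
    return 0.1 / (1.0 + math.exp(-steepness * (mmi - threshold)))  # capped at 10%

def estimate_fatalities(magnitude, settlements):
    """settlements: list of (distance_km, population, vulnerability_class)."""
    total = 0.0
    for distance_km, population, v_class in settlements:
        total += population * fatality_rate(intensity(magnitude, distance_km), v_class)
    return round(total)

# Example: a hypothetical M7.5 event near two invented settlements.
print(estimate_fatalities(7.5, [(15, 50_000, "weak_masonry"),
                                (60, 500_000, "reinforced_concrete")]))
```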

2017 ◽  
Vol 44 (2) ◽  
pp. 203-229 ◽  
Author(s):  
Javier D Fernández ◽  
Miguel A Martínez-Prieto ◽  
Pablo de la Fuente Redondo ◽  
Claudio Gutiérrez

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
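
As a rough illustration of the kind of structural metrics involved, the sketch below computes triple counts, distinct subjects and predicates, and mean subject out-degree with rdflib. It is not the authors' metric suite, and "data.ttl" is a placeholder for any RDF file.

```python
# Simple structural metrics over an RDF graph (illustrative only).
from collections import Counter
from rdflib import Graph

g = Graph()
g.parse("data.ttl", format="turtle")  # placeholder file name

subject_degree = Counter()   # out-degree of each subject
predicate_usage = Counter()  # how often each predicate occurs

for s, p, o in g:
    subject_degree[s] += 1
    predicate_usage[p] += 1

triples = len(g)
print(f"triples: {triples}")
print(f"distinct subjects: {len(subject_degree)}, distinct predicates: {len(predicate_usage)}")
# A crude redundancy indicator: average number of triples per subject.
print(f"mean subject out-degree: {triples / max(len(subject_degree), 1):.2f}")
```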


Author(s):  
Liah Shonhe

The main focus of the study was to explore the practices of open data sharing in the agricultural sector, including establishing the research outputs concerning open data in agriculture. The study adopted a desktop research methodology based on a literature review and bibliographic data from the WoS database. Bibliometric indicators discussed include yearly productivity, the most prolific authors, and the most active countries. Study findings revealed that research activity in the field of agriculture and open access is very low. There were 36 OA articles, and only 6 publications had an open data badge. Most researchers do not yet embrace the need to openly publish their data sets despite the availability of numerous open data repositories. Unfortunately, most African countries are still lagging behind in the management of agricultural open data. The study therefore recommends that researchers publish their research data sets as OA. African countries need to put more effort into establishing open data repositories and implementing the necessary policies to facilitate OA.
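
For readers unfamiliar with these indicators, the short sketch below shows how yearly productivity and most prolific authors can be computed from a Web of Science style export. The file name and column names are assumptions, not the study's actual workflow.

```python
# Basic bibliometric indicators from a hypothetical WoS export (illustrative).
import pandas as pd

records = pd.read_csv("wos_export.csv")               # assumed file name
records["Authors"] = records["Authors"].str.split("; ")  # assumed column format

# Yearly productivity: number of publications per publication year.
print(records.groupby("Publication Year").size())

# Most prolific authors: count each author once per publication.
print(records.explode("Authors")["Authors"].value_counts().head(10))
```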


Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 879 ◽  
Author(s):  
Uwe Köckemann ◽  
Marjan Alirezaie ◽  
Jennifer Renoux ◽  
Nicolas Tsiftes ◽  
Mobyen Uddin Ahmed ◽  
...  

As research in smart homes and activity recognition increases, it is of ever greater importance to have benchmark systems and data upon which researchers can compare methods. While synthetic data can be useful for certain method developments, real data sets that are open and shared are equally important. This paper presents the E-care@home system, its installation in a real home setting, and a series of data sets that were collected using the E-care@home system. Our first contribution, the E-care@home system, is a collection of software modules for data collection, labeling, and various reasoning tasks such as activity recognition, person counting, and configuration planning. It supports a heterogeneous set of sensors that can be extended easily and connects collected sensor data to higher-level Artificial Intelligence (AI) reasoning modules. Our second contribution is a series of open data sets which can be used to recognize activities of daily living. In addition to these data sets, we describe the technical infrastructure that we have developed to collect the data and the physical environment. Each data set is annotated with ground-truth information, making it relevant for researchers interested in benchmarking different algorithms for activity recognition.
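
A brief sketch of how such an annotated data set could be used to benchmark an activity recognition algorithm is given below. The file name and column names ("sensor_*" features, "activity" label) are assumptions rather than the actual E-care@home schema.

```python
# Benchmarking a simple activity classifier on an annotated sensor data set
# (illustrative; the data file and columns are hypothetical).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

data = pd.read_csv("ecare_home_annotated.csv")  # hypothetical export
X = data.filter(like="sensor_")                 # per-sensor feature columns
y = data["activity"]                            # ground-truth activity label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```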


BMJ Open ◽  
2016 ◽  
Vol 6 (10) ◽  
pp. e011784 ◽  
Author(s):  
Anisa Rowhani-Farid ◽  
Adrian G Barnett

Objective: To quantify data sharing trends and data sharing policy compliance at the British Medical Journal (BMJ) by analysing the rate of data sharing practices, and to investigate attitudes towards and barriers to data sharing. Design: Observational study. Setting: The BMJ research archive. Participants: 160 randomly sampled BMJ research articles from 2009 to 2015, excluding meta-analyses and systematic reviews. Main outcome measures: Percentages of research articles that indicated the availability of their raw data sets in their data sharing statements, and of those that easily made their data sets available on request. Results: Three articles contained the data within the article itself. 50 of the remaining 157 articles (32%) indicated the availability of their data sets. 12 used publicly available data and the remaining 38 were sent email requests for access to their data sets. Only 1 publicly available data set could be accessed and only 6 out of 38 authors shared their data via email. Thus only 7/157 research articles shared their data sets, 4.5% (95% CI 1.8% to 9%). For the 21 clinical trials bound by the BMJ data sharing policy, the percentage shared was 24% (8% to 47%). Conclusions: Despite the BMJ's strong data sharing policy, sharing rates are low. Possible explanations for the low rates include the wording of the BMJ data sharing policy, which leaves room for individual interpretation and possible loopholes; our email requests ending up in researchers' spam folders; and researchers not being rewarded for sharing their data. It might be time for a more effective data sharing policy and better incentives for health and medical researchers to share their data.
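
As a quick check of the headline figure, an exact binomial confidence interval for 7 shared data sets out of 157 articles can be computed as follows. The authors' exact method is not stated in the abstract, but this calculation should closely reproduce the reported 4.5% (1.8% to 9%).

```python
# Exact (Clopper-Pearson) confidence interval for 7 shared out of 157 articles.
# Requires scipy >= 1.7.
from scipy.stats import binomtest

result = binomtest(k=7, n=157)
ci = result.proportion_ci(confidence_level=0.95, method="exact")
print(f"shared: {7/157:.1%}  95% CI: {ci.low:.1%} to {ci.high:.1%}")
```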


2020 ◽  
pp. 20200375
Author(s):  
Min-Suk Heo ◽  
Jo-Eun Kim ◽  
Jae-Joon Hwang ◽  
Sang-Sun Han ◽  
Jin-Soo Kim ◽  
...  

Artificial intelligence, which has been actively applied in a broad range of industries in recent years, is an active area of interest for many researchers. Dentistry is no exception to this trend, and the applications of artificial intelligence are particularly promising in the field of oral and maxillofacial (OMF) radiology. Recent research on artificial intelligence in OMF radiology has mainly used convolutional neural networks, which can perform image classification, detection, segmentation, registration, generation, and refinement. Artificial intelligence systems in this field have been developed for the purposes of radiographic diagnosis, image analysis, forensic dentistry, and image quality improvement. Tremendous amounts of data are needed to achieve good results, and the involvement of OMF radiologists is essential for making accurate and consistent data sets, which is a time-consuming task. For artificial intelligence to be widely used in actual clinical practice, many problems remain to be solved, such as building large, finely labeled open data sets, understanding the judgment criteria of artificial intelligence, and countering DICOM hacking threats that use artificial intelligence. If solutions to these problems are presented as artificial intelligence develops, it will develop further and is expected to play an important role in the development of automatic diagnosis systems, the establishment of treatment plans, and the fabrication of treatment tools. OMF radiologists, as professionals who thoroughly understand the characteristics of radiographic images, will play a very important role in the development of artificial intelligence applications in this field.
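
For orientation, the sketch below shows the general shape of a convolutional image classifier of the kind the review refers to. It is a generic toy model, not any specific published system; the input size, number of classes, and layer sizes are illustrative only.

```python
# A tiny, generic convolutional classifier for grayscale radiographs (toy example).
import torch
import torch.nn as nn

class TinyRadiographCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x):
        x = self.features(x)            # (N, 32, 56, 56) for 224x224 input
        return self.classifier(x.flatten(1))

model = TinyRadiographCNN()
dummy_batch = torch.randn(4, 1, 224, 224)  # batch of grayscale images
print(model(dummy_batch).shape)            # torch.Size([4, 2])
```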


Author(s):  
Ricardo Oliveira ◽  
Rafael Moreno

Federal, state, and local government agencies in the USA are investing heavily in the dissemination of the Open Data sets each of them produces. The main driver behind this thrust is to increase agencies' transparency and accountability, as well as to improve citizens' awareness. However, not all Open Data sets are easy to access and integrate with other Open Data sets, even those available from the same agency. The City and County of Denver Open Data Portal distributes several types of geospatial datasets; one of them is the city parcel layer, containing 224,256 records. Although this data layer contains many pieces of information, it is incomplete for some custom purposes. Open-source software was used to first collect data from diverse City of Denver Open Data sets, then upload them to a repository in the cloud, where they were processed using a PostgreSQL installation in the cloud and Python scripts. Our method was able to extract non-spatial information from a 'not-ready-to-download' source that could then be combined with the initial data set to enhance its potential use.
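
A condensed sketch of the described workflow, collecting an open data CSV, loading it into a cloud PostgreSQL instance, and joining it with a parcel table, might look as follows. The URL, connection string, table names, and join key are placeholders, not Denver's actual schema.

```python
# Collect, load, and join open data with a parcel table (all names are placeholders).
import io
import pandas as pd
import requests
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@cloud-host:5432/denver")

# 1. Collect a non-spatial open data set (placeholder URL).
csv_text = requests.get("https://example.org/denver/permits.csv", timeout=60).text
permits = pd.read_csv(io.StringIO(csv_text))

# 2. Upload it to the cloud database.
permits.to_sql("permits", engine, if_exists="replace", index=False)

# 3. Enrich the parcel layer by joining on an assumed shared key.
query = """
    SELECT p.parcel_id, pr.permit_type, pr.issue_date
    FROM parcels AS p
    LEFT JOIN permits AS pr ON pr.parcel_id = p.parcel_id
"""
enriched = pd.read_sql(query, engine)
print(len(enriched))
```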


2012 ◽  
Vol 163 (4) ◽  
pp. 119-129
Author(s):  
Fabian Kostadinov ◽  
Renato Lemm ◽  
Oliver Thees

A software tool for the estimation of wood harvesting productivity using the kNN method. For operational planning and management of wood harvests it is important to have access to reliable information on time consumption and costs. To estimate these efficiently and reliably, appropriate methods and calculation tools are needed. The present article investigates whether the method of the k nearest neighbours (kNN) is appropriate for this purpose. The kNN algorithm is first explained and then applied to two data sets containing wood harvesting figures, “combined cable crane and processor” and “skidder”, to determine the estimation accuracy of the method. It is shown that the kNN method's estimation accuracy lies within the same order of magnitude as that of a multiple linear regression. Advantages of the kNN method are that it is easy to understand and to visualize, and that estimation models do not become outdated, since new data sets can be taken into account continuously. The kNN Workbook has been developed by the Swiss Federal Institute for Forest, Snow and Landscape Research (WSL). It is a software tool with which any data set can be analysed in practice using the kNN method. This tool is also presented in the article.
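
A minimal sketch of kNN-based productivity estimation compared against multiple linear regression, in the spirit of the study, is shown below. The features and data are synthetic, and the code is not the kNN Workbook itself.

```python
# Compare kNN regression and multiple linear regression on synthetic
# "time consumption" data (illustrative only; features are invented).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(0.1, 2.0, n),   # mean stem volume (m^3), invented
    rng.uniform(0, 60, n),      # terrain slope (%), invented
    rng.uniform(50, 800, n),    # extraction distance (m), invented
])
y = 5 + 8 * X[:, 0] + 0.05 * X[:, 1] + 0.01 * X[:, 2] + rng.normal(0, 1, n)

models = {
    "kNN (k=5)": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "multiple linear regression": LinearRegression(),
}
for name, model in models.items():
    rmse = -cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    print(f"{name}: mean RMSE = {rmse.mean():.2f}")
```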


2020 ◽  
Author(s):  
Ying Zhao ◽  
Charles C. Zhou

SARS-CoV-2 is the deadly novel virus that has caused a worldwide pandemic, with drastic loss of human life and economic activity. An open data set called the COVID-19 Open Research Dataset (CORD-19) contains a large set of full-text scientific literature on SARS-CoV-2. Nextstrain maintains a database of SARS-CoV-2 viral genomes collected since 12/3/2019. We applied a unique information mining method named lexical link analysis (LLA) to answer the call to action and help the science community answer high-priority scientific questions related to SARS-CoV-2. We first text-mined CORD-19. We also data-mined the Nextstrain database. Finally, we linked the two databases. The linked databases and information can be used to discover insights and help the research community address high-priority questions related to SARS-CoV-2's genetics, tests, and prevention. Significance statement: In this paper, we show how to apply a unique information mining method, lexical link analysis (LLA), to link unstructured (CORD-19) and structured (Nextstrain) data sets to relevant publications, integrate text and data mining into a single platform to discover insights that can be visualized and validated, and answer high-priority questions about the genetics, incubation, treatment, symptoms, and prevention of COVID-19.
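
A highly simplified illustration of the word-pair linking idea behind lexical link analysis is sketched below: co-occurring word pairs within each document are counted, and the strongest pairs form the links of a network. This is a toy example, not the authors' LLA implementation.

```python
# Toy word-pair co-occurrence "links" over a few made-up abstracts.
import itertools
import re
from collections import Counter

abstracts = [
    "spike protein mutation affects antibody binding",
    "antibody binding predicts vaccine efficacy",
    "spike protein mutation observed in new strain",
]

def tokens(text):
    return sorted(set(re.findall(r"[a-z]+", text.lower())))

links = Counter()
for doc in abstracts:
    for a, b in itertools.combinations(tokens(doc), 2):
        links[(a, b)] += 1

# The most frequently co-occurring word pairs form the links of the network.
for (a, b), count in links.most_common(5):
    print(f"{a} -- {b}: {count}")
```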


2019 ◽  
Author(s):  
Attila Lengyel ◽  
David W. Roberts ◽  
Zoltán Botta-Dukát

Aims: To introduce REMOS, a new iterative reallocation method (with two variants) for vegetation classification, and to compare its performance with OPTSIL. We test (1) how effectively REMOS and OPTSIL maximize mean silhouette width and minimize the number of negative silhouette widths when run on classifications with different structure; (2) how these three methods differ in runtime with different sample sizes; and (3) whether classifications by the three reallocation methods differ in the number of diagnostic species, a surrogate for interpretability. Study area: Simulation; example data sets from grasslands in Hungary and forests in Wyoming and Utah, USA. Methods: We classified random subsets of simulated data with the flexible-beta algorithm for different values of beta. These classifications were subsequently optimized by REMOS and OPTSIL and compared for mean silhouette width and the proportion of negative silhouette widths. Then, we classified three vegetation data sets of different sizes into two to ten clusters, optimized them with the reallocation methods, and compared their runtimes, mean silhouette widths, numbers of negative silhouette widths, and numbers of diagnostic species. Results: In terms of mean silhouette width, OPTSIL performed best when the initial classifications already had high mean silhouette width. The REMOS algorithms had slightly lower mean silhouette width than what was maximally achievable with OPTSIL, but their efficiency was consistent across different initial classifications; thus REMOS was significantly superior to OPTSIL when the initial classification had low mean silhouette width. REMOS resulted in zero or a negligible number of negative silhouette widths across all classifications. OPTSIL performed similarly when the initial classification was effective but could not reach as low a proportion of misclassified objects when the initial classification was inefficient. The REMOS algorithms were typically more than an order of magnitude faster to calculate than OPTSIL. There was no clear difference between REMOS and OPTSIL in the number of diagnostic species. Conclusions: REMOS algorithms may be preferable to OPTSIL when (1) the primary objective is to reduce or eliminate negative silhouette widths in a classification, (2) the initial classification has low mean silhouette width, or (3) the time efficiency of the algorithm is important because of the size of the data set or the high number of clusters.
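
To illustrate the underlying idea, the sketch below computes silhouette widths for a clustering and iteratively reallocates objects with negative silhouette width to their nearest neighbouring cluster. It is a simplified REMOS-like step for intuition only, not the published REMOS or OPTSIL algorithm.

```python
# Silhouette-based reallocation of misclassified objects (simplified illustration).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_samples, silhouette_score, pairwise_distances

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=2.5, random_state=1)
labels = AgglomerativeClustering(n_clusters=4).fit_predict(X)
D = pairwise_distances(X)

for _ in range(10):  # a few reallocation passes
    sil = silhouette_samples(D, labels, metric="precomputed")
    negative = np.where(sil < 0)[0]
    if len(negative) == 0:
        break
    for i in negative:
        # Mean distance from object i to every other cluster; move to the closest.
        means = {c: D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i]}
        labels[i] = min(means, key=means.get)

print("mean silhouette width:", silhouette_score(D, labels, metric="precomputed"))
print("negative silhouettes:", int((silhouette_samples(D, labels, metric="precomputed") < 0).sum()))
```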

