Doppler radar rainfall prediction and gauge data

Abstract. Objective: The data presented here combine multiple rain gauge sets with like-type Doppler radar data from multiple radar sites to produce populations of ordered pairs. Publications spanning decades, and specific to Doppler radar sites, contain graphs of these data pairs: Doppler radar precipitation estimates versus rain gauge precipitation readings. Data description: Taken from multiple sources, the data set represents several radar sites and rain gauge sites combined for 8830 data points. The data is relevant to various applications in hydrometeorology and engineering as well as weather forecasting. Further, the importance of accurate radar precipitation estimates continues to increase, necessitating the incorporation of as much data as possible.
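
The data set described above is essentially a pooled list of ordered pairs (radar estimate, gauge reading). A minimal Python sketch of that structure, and of summary statistics commonly reported for such pairs (mean bias, RMSE, and Pearson correlation), might look like the following. The function names and sample values are hypothetical and illustrative only; they are not taken from the data set or from the authors' procedure.

```python
import math

# Hypothetical (radar_estimate_mm, gauge_reading_mm) pairs from two sources.
# Values are illustrative only and do not come from the published data set.
source_a = [(10.0, 12.0), (5.0, 4.0)]
source_b = [(20.0, 18.0), (0.5, 1.0)]


def pool_pairs(*sources):
    """Concatenate ordered pairs from several sources into one population."""
    pooled = []
    for pairs in sources:
        pooled.extend(pairs)
    return pooled


def pair_statistics(pairs):
    """Mean bias (radar - gauge), RMSE and Pearson r for (radar, gauge) pairs."""
    n = len(pairs)
    radar = [r for r, _ in pairs]
    gauge = [g for _, g in pairs]
    diffs = [r - g for r, g in pairs]
    mean_r = sum(radar) / n
    mean_g = sum(gauge) / n
    cov = sum((r - mean_r) * (g - mean_g) for r, g in pairs)
    var_r = sum((r - mean_r) ** 2 for r in radar)
    var_g = sum((g - mean_g) ** 2 for g in gauge)
    return {
        "n": n,
        "bias": sum(diffs) / n,
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
        "r": cov / math.sqrt(var_r * var_g),
    }


stats = pair_statistics(pool_pairs(source_a, source_b))
print(stats)
```

A bias near zero does not by itself mean good agreement for individual events; the RMSE reflects the scatter of the pairs around the 1:1 line.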

Dual-Polarization radar rainfall prediction and rain gauge data

BMC Research Notes · 10.1186/s13104-021-05693-7 · 2021 · Vol 14 (1)

Author(s): Tyson H. Walsh, Jesse W. Lansford, T. V. Hromadka, Prasada Rao

Keyword(s): Weather Forecasting, Rain Gauge, Dual Polarization, Multiple Sources, Data Set, Radar Rainfall, Rainfall Prediction, Rain Gauges, Precipitation Estimates, Data Points

Abstract. Objective: Reported rainfall data from multiple rain gauges, together with the corresponding estimates from Dual-Polarization (Dual-Pol) radar, are presented here. The ordered set of data pairs was collected from multiple peer-reviewed publications spanning the last decade. Data description: Taken from multiple sources, the data set represents several radar sites and rain gauge sites combined for 12,734 data points. The data is relevant to various applications in hydrometeorology and engineering as well as weather forecasting. Further, the importance of accuracy in radar precipitation estimates continues to increase, necessitating the incorporation of as much data as possible.

Using Ancillary Information from Radar-based Observations and Rain Gauges to Identify Error and Bias

Journal of Hydrometeorology · 10.1175/jhm-d-20-0193.1 · 2021

Author(s): Brian R. Nelson, Olivier P. Prat, Ronald Leeper

Keyword(s): Quality Control, Air Temperature, Quality Indicator, Stage IV, Rain Gauge, Data Sets, Data Set, Precipitation Estimates, Precipitation Type, Ancillary Information

Abstract. Ancillary information that exists within rain gauge and radar-based data sets provides opportunities to identify error and bias between the two observing platforms better than error and bias statistics computed without ancillary information. These variables include precipitation type identification, air temperature, and radar quality. Two NEXRAD-based data sets are used for reference: the National Centers for Environmental Prediction (NCEP) Stage IV and the NOAA NEXRAD Reanalysis (NNR) gridded data sets. The NCEP Stage IV data set is available at 4 km, hourly resolution and includes radar-gauge bias-adjusted precipitation estimates. The NNR data set is available at 1 km resolution at 5-minute and hourly time intervals and includes several variables, such as reflectivity, radar-only estimates, a precipitation flag, a radar quality indicator, and radar-gauge bias-adjusted precipitation estimates. The NNR data product provides additional information for quality control, such as identification of precipitation type, identification of storm type, and the Z-R relation. Other quality control measures are part of the NNR data product development. In addition, some of the variables are available at the 5-minute scale. We compare the radar-based estimates with rain gauge observations from the U.S. Climate Reference Network (USCRN). The USCRN network is available at the 5-minute scale and includes observations of air temperature, wind, and soil moisture, among others. We present statistical comparisons of rain gauge observations with radar-based estimates, segmenting the information by precipitation type, air temperature, and radar quality indicator.
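
The central idea of this abstract, computing radar-gauge error statistics separately for subsets defined by ancillary variables, can be sketched as a group-by over paired records. The field names (`gauge`, `radar`, `ptype`, `air_temp_c`, `rqi`), the 0 °C temperature split, and the 0.5 radar quality threshold below are hypothetical choices for illustration, not the NNR or USCRN schemas or the authors' thresholds.

```python
from collections import defaultdict

# Hypothetical paired records: gauge observation and radar-based estimate (mm)
# plus ancillary variables. Field names and values are illustrative only.
records = [
    {"gauge": 2.0, "radar": 2.4, "ptype": "rain", "air_temp_c": 12.0, "rqi": 0.9},
    {"gauge": 1.0, "radar": 0.8, "ptype": "rain", "air_temp_c": 8.0, "rqi": 0.8},
    {"gauge": 1.5, "radar": 0.6, "ptype": "snow", "air_temp_c": -3.0, "rqi": 0.7},
    {"gauge": 0.5, "radar": 1.1, "ptype": "rain", "air_temp_c": 3.0, "rqi": 0.3},
]


def segment_key(rec, temp_split_c=0.0, rqi_threshold=0.5):
    """Assign a record to a (precip type, temperature class, quality class) segment."""
    temp_class = "above_split" if rec["air_temp_c"] > temp_split_c else "at_or_below_split"
    quality = "high_rqi" if rec["rqi"] >= rqi_threshold else "low_rqi"
    return (rec["ptype"], temp_class, quality)


def segmented_stats(records, **thresholds):
    """Mean error and mean absolute error (radar - gauge) per segment."""
    groups = defaultdict(list)
    for rec in records:
        groups[segment_key(rec, **thresholds)].append(rec["radar"] - rec["gauge"])
    result = {}
    for key, errors in groups.items():
        n = len(errors)
        result[key] = {
            "n": n,
            "mean_error": sum(errors) / n,
            "mae": sum(abs(e) for e in errors) / n,
        }
    return result


for key, value in sorted(segmented_stats(records).items()):
    print(key, value)
```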

Long-Term Rainfall Forecast Model Based on The TabNet and LightGbm Algorithm

10.21203/rs.3.rs-107107/v1 · 2020

Author(s): Tianyu Xu, Yongchuan Yu, Jianzhuo Yan, Hongxia Xu

Keyword(s): Prediction Model, Feature Fusion, Forecast Model, Data Sets, Good Prediction, Data Set, Rainfall Prediction, Improve Model, Probability Prediction

Abstract. Because of unbalanced data sets and distribution differences in long-term rainfall prediction, current rainfall prediction models generalize poorly and cannot achieve good prediction results in real scenarios. This study uses multiple atmospheric parameters (such as temperature, humidity, and atmospheric pressure) to establish a TabNet-LightGbm rainfall probability prediction model. The research uses feature engineering (such as generating descriptive statistical features and feature fusion) to improve model accuracy, the Borderline-SMOTE algorithm to mitigate data set imbalance, and adversarial validation to reduce distribution differences. The experiment uses 5 years of precipitation data from 26 stations in the Beijing-Tianjin-Hebei region of China to verify the proposed rainfall prediction model; the test set consists of predicting the rainfall at each station over one month. The experimental results show that the model performs well, with an AUC greater than 92%. The proposed method further improves the accuracy of rainfall prediction and provides a reference for data mining tasks.
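
The abstract names the Borderline-SMOTE algorithm for handling class imbalance. Its idea is to create synthetic minority-class samples only near the class boundary: a minority sample is treated as "in danger" when at least half, but not all, of its m nearest neighbours belong to the majority class, and new points are interpolated between such samples and their nearest minority neighbours. The pure-Python sketch below follows that Borderline-SMOTE1 variant under our own assumptions (Euclidean distance, hypothetical parameter names `m`, `k`, `n_per_sample`); it is not the authors' implementation, and in practice a maintained implementation such as `BorderlineSMOTE` in the imbalanced-learn package would normally be used.

```python
import math
import random


def borderline_smote(minority, majority, m=5, k=3, n_per_sample=1, seed=0):
    """Minimal Borderline-SMOTE1 sketch.

    Returns (danger_indices, synthetic_points). A minority sample is "in danger"
    when at least half, but not all, of its m nearest neighbours (over both
    classes) are majority samples; only those samples are oversampled.
    """
    rng = random.Random(seed)
    # Minority samples come first, so minority index i is also index i here.
    labelled = [(x, 1) for x in minority] + [(x, 0) for x in majority]

    danger = []
    for i, x in enumerate(minority):
        neighbours = sorted(
            (j for j in range(len(labelled)) if j != i),
            key=lambda j: math.dist(x, labelled[j][0]),
        )[:m]
        n_majority = sum(1 for j in neighbours if labelled[j][1] == 0)
        if m / 2 <= n_majority < m:  # all-majority neighbours => treated as noise
            danger.append(i)

    synthetic = []
    for i in danger:
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        nn = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        if not nn:
            continue
        for _ in range(n_per_sample):
            z = minority[rng.choice(nn)]
            gap = rng.random()  # uniform in [0, 1)
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, z)))
    return danger, synthetic


# Hypothetical 2-D feature vectors; only (1.0, 1.0) sits on the class boundary.
minority = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (1.0, 1.0)]
majority = [(1.1, 1.0), (1.0, 1.1), (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
danger, new_points = borderline_smote(minority, majority, m=3, k=2, n_per_sample=2)
print(danger, new_points)
```

Oversampling only the boundary samples, rather than every minority sample as plain SMOTE does, is what distinguishes the borderline variant.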

First-Year Evaluation of GPM Rainfall over the Netherlands: IMERG Day 1 Final Run (V03D)

Journal of Hydrometeorology · 10.1175/jhm-d-16-0087.1 · 2016 · Vol 17 (11) · pp. 2799-2814 · Cited by ~46

Author(s): M. F. Rios Gaona, A. Overeem, H. Leijnse, R. Uijlenhoet

Keyword(s): The Netherlands, Land Surface, Tropical Rainfall Measuring Mission, Ground Truth, Rain Gauge, First Year, Radar Rainfall, High Resolution Data, Precipitation Estimates, Global Precipitation

Abstract. The Global Precipitation Measurement (GPM) mission is the successor to the Tropical Rainfall Measuring Mission (TRMM), which orbited Earth for ~17 years. With its Core Observatory launched on 27 February 2014, GPM offers global precipitation estimates between 60°N and 60°S at 0.1° × 0.1° resolution every 30 min. Unlike during the TRMM era, the Netherlands is now within the coverage provided by GPM. Here the first year of GPM rainfall retrievals from the 30-min gridded Integrated Multisatellite Retrievals for GPM (IMERG) product Day 1 Final Run (V03D) is assessed. This product is compared against gauge-adjusted radar rainfall maps over the land surface of the Netherlands at 30-min, 24-h, monthly, and yearly scales. These radar rainfall maps are considered to be ground truth. The first year of IMERG operations is evaluated through time series, scatterplots, empirical exceedance probabilities, and various statistical indicators. In general, IMERG tends to slightly underestimate countrywide rainfall depths (by 2%). Nevertheless, this relative underestimation is small enough to propose IMERG as a reliable source of precipitation data, especially for areas where rain gauge networks or ground-based radars do not offer high-resolution data of this type and availability. The potential of GPM for rainfall estimation in a midlatitude country is confirmed.
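
Two of the evaluation tools mentioned in this abstract, the relative bias of the satellite product against the reference and empirical exceedance probabilities, are simple to compute. The sketch below uses the rank-based estimate P(X >= x_(i)) = i/n for the i-th largest of n values; other plotting positions (for example i/(n+1)) are also common, and which one the authors used is not stated here. Function names and sample numbers are hypothetical.

```python
def relative_bias_pct(estimate, reference):
    """Relative bias (%) of total estimated depth versus total reference depth."""
    return 100.0 * (sum(estimate) - sum(reference)) / sum(reference)


def empirical_exceedance(values):
    """Return (value, P(X >= value)) pairs, using rank i / n for the i-th largest.

    Ties are not merged; each tied value simply gets its own rank.
    """
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


# Hypothetical 30-min accumulations (mm) for the same cells and times.
satellite = [9.8, 0.0, 4.9]
radar = [10.0, 0.0, 5.0]
print(relative_bias_pct(satellite, radar))  # about -2, i.e. a 2% underestimate
print(empirical_exceedance([3.0, 1.0, 2.0]))
```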

Evaluation of Satellite and Reanalysis Precipitation Products Using GIS for All Basins in Turkey

Advances in Meteorology · 10.1155/2019/4820136 · 2019 · Vol 2019 · pp. 1-11 · Cited by ~2

Author(s): Ahmet Irvem, Mustafa Ozbuldu

Keyword(s): Rain Gauge, Coefficient Of Determination, Data Sets, The West, Data Set, Precipitation Product, Distance Weighting, Areal Precipitation, Average Annual Precipitation, Inverse Distance

The use of satellite and reanalysis precipitation products as supplementary data sources is steadily rising in hydrometeorological applications, especially in data-sparse areas. However, the accuracy of these data sets is often lacking, especially in Turkey. This study evaluates the accuracy of a satellite precipitation product (TRMM 3B42V7) and a reanalysis precipitation product (NCEP-CFSR) against rain gauge observations for the 1998–2010 period. Average annual precipitation for the 25 basins in Turkey was calculated using rain gauge precipitation data from 225 stations. The inverse distance weighting (IDW) method was used in GIS to calculate areal precipitation for each basin. According to the statistical analysis, the TRMM product gave satisfactory coefficients of determination (R2 > 0.88). However, R2 for the CFSR data set ranges from 0.35 for the Eastern Black Sea basin to 0.93 for the West Mediterranean basin. RMSE was calculated to be 95.679 mm and 128.097 mm for the TRMM and CFSR data, respectively. The Nash-Sutcliffe efficiency (NSE) results for the TRMM data showed very good performance for 6 basins, while the percent bias (PBias) values showed very good performance for 7 basins. The NSE results for the CFSR data showed very good performance for 3 basins, while the PBias values showed very good performance for 6 basins.
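
The basin-averaging and scoring steps described above can be sketched as follows. The IDW estimate at a point is a weighted mean of station values with weights 1/d^p; we assume p = 2, a common default, since the exponent used in the study is not stated here. Basin areal precipitation is taken as the mean IDW estimate over grid-cell centres inside the basin, one simple way to do in code what a GIS does over a raster. NSE follows its standard definition; for PBias we assume the convention 100 * sum(obs - sim) / sum(obs), under which positive values indicate underestimation, but sign conventions differ between sources. All names and numbers are hypothetical.

```python
import math


def idw(x, y, stations, power=2.0):
    """Inverse distance weighted value at (x, y) from (sx, sy, value) stations."""
    num = den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # the target point coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def areal_precipitation(cells, stations, power=2.0):
    """Basin average: mean IDW estimate over the basin's grid-cell centres."""
    return sum(idw(x, y, stations, power) for x, y in cells) / len(cells)


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def pbias(sim, obs):
    """Percent bias, 100 * sum(obs - sim) / sum(obs); positive = underestimation."""
    return 100.0 * sum(o - s for s, o in zip(sim, obs)) / sum(obs)


# Hypothetical stations (x km, y km, annual precipitation mm) and basin cells.
stations = [(0.0, 0.0, 400.0), (2.0, 0.0, 600.0)]
cells = [(0.5, 0.0), (1.0, 0.0), (1.5, 0.0)]
print(areal_precipitation(cells, stations))
```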

Tailoring data source distributions for fairness-aware data integration

Proceedings of the VLDB Endowment · 10.14778/3476249.3476299 · 2021 · Vol 14 (11) · pp. 2519-2532

Author(s): Fatemeh Nargesian, Abolfazl Asudeh, H. V. Jagadish

Keyword(s): Optimal Solution, Cost Effective, Data Sources, Data Sets, Multiple Sources, Data Set, Demographic Groups, Reward Function, Effective Manner, Data Source

Data scientists often develop data sets for analysis by drawing upon sources of data available to them. A major challenge is to ensure that the data set used for analysis has an appropriate representation of relevant (demographic) groups, that is, that it meets the desired distribution requirements. Whether data is collected through some experiment or obtained from some data provider, the data from any single source may not meet the desired distribution requirements. Therefore, a union of data from multiple sources is often required. In this paper, we study how to acquire such data in the most cost-effective manner, for typical cost functions observed in practice. We present an optimal solution for binary groups when the underlying distributions of data sources are known and all data sources have equal costs. For the generic case with unequal costs, we design an approximation algorithm that performs well in practice. When the underlying distributions are unknown, we develop an exploration-exploitation based strategy with a reward function that captures the cost and approximations of group distributions in each data source. Besides theoretical analysis, we conduct comprehensive experiments that confirm the effectiveness of our algorithms.

An Investigation of the Convergence of Average Peak Accelerations for High-Speed Planing Craft

10.5957/smc-2021-063 · 2021

Author(s): Michael R. Riley, Heidi P. Murphy, Brock W. Aron

Keyword(s): High Speed, Cumulative Distribution, Data Sets, Peak Acceleration, Multiple Sources, Data Set, Distribution Shape, Acceleration Data, Rough Water, The Stability

This paper summarizes the results of an investigation of the convergence of average peak accelerations as more and more peaks are recorded during rough-water trials of small high-speed craft. Existing guidance from multiple sources suggests that recording more peaks is better, but how many more, and what engineering rationale should substantiate the answer? To address the question, simplified equations and numerous examples of peak acceleration data sets are presented. The results demonstrate that convergence of the average of the highest 10 percent of peaks (A1/10), of the average of the highest 1 percent of peaks (A1/100), and of their ratio means that the shape of the cumulative distribution of the data set becomes more stable as the number of peak acceleration data points increases. A simple percent difference criterion is presented for quantifying the stability of the cumulative distribution shape.
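
The statistics named in this abstract are straightforward to compute from a list of recorded peaks. The sketch below computes A1/10 and A1/100 as means of the highest 10% and 1% of peaks and tracks them, and their ratio, as the record grows. The convergence test (percent difference between successive estimates compared against a tolerance) is only one plausible reading of the paper's "simple percent difference criterion"; the step size, tolerance, exponential test data, and function names are hypothetical.

```python
import random


def average_of_highest(peaks, denominator):
    """Mean of the highest 1/denominator of peaks (10 -> A1/10, 100 -> A1/100)."""
    ordered = sorted(peaks, reverse=True)
    count = max(1, len(ordered) // denominator)  # keep at least one peak
    return sum(ordered[:count]) / count


def percent_difference(new, old):
    """Absolute percent difference of `new` relative to `old`."""
    return 100.0 * abs(new - old) / abs(old)


def convergence_history(peaks, denominator, step):
    """A_{1/denominator} recomputed each time `step` more peaks are recorded."""
    return [average_of_highest(peaks[:n], denominator)
            for n in range(step, len(peaks) + 1, step)]


def has_converged(history, tolerance_pct):
    """True when the last two successive estimates differ by less than tolerance_pct."""
    if len(history) < 2:
        return False
    return percent_difference(history[-1], history[-2]) < tolerance_pct


# Hypothetical peak accelerations (g), drawn from an exponential distribution.
rng = random.Random(1)
peaks = [rng.expovariate(1.0) for _ in range(2000)]
a10 = convergence_history(peaks, 10, 200)
a100 = convergence_history(peaks, 100, 200)
ratio = [h100 / h10 for h100, h10 in zip(a100, a10)]
print(a10[-1], a100[-1], ratio[-1], has_converged(a10, 2.0))
```

Using an integer denominator with floor division avoids the floating-point rounding that `int(len(peaks) * 0.1)` can introduce when choosing how many peaks to average.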

Automatic detection of volcanic eruptions in Doppler radar observations using a neural network approach

10.5194/egusphere-egu2020-11123 · 2020

Author(s): Matthias Hort, Daniel Uhle, Fabio Venegas, Lea Scharff, Jan Walda, ...

Keyword(s): Neural Network, Doppler Radar, Volcanic Eruptions, Radar Data, Visual Observation, Data Sets, Network Approach, Neural Network Approach, Data Set, The Impact

Immediate detection of volcanic eruptions is essential when trying to mitigate the impact on the health of people living in the vicinity of a volcano or the impact on infrastructure and aviation. Eruption detection is most often done either by visual observation or by the analysis of acoustic data. While visual observation is often difficult due to environmental conditions, infrasound data usually provide the onset of an event. Doppler radar data, although not available for many volcanoes, provide information on the dynamics of the eruption and the amount of material released. Eruptions can easily be detected in these data by visual analysis; here we present a neural network approach for the automatic detection of eruptions in Doppler radar data. We use data recorded at Colima volcano in Mexico in 2014/2015 and a data set recorded at Turrialba volcano between 2017 and 2019. In a first step, we picked eruptions, rain, and typical noise in both data sets; these picks were then used to train two networks (training data set), and the performance of the networks was tested using a separate test data set. The accuracy for classifying the different types of signals was between 95 and 98% for both data sets, which we consider quite successful. In the case of the Turrialba data set, eruptions were picked based on OVSICORI observations. When the trained network was used to classify our complete Turrialba data set, an additional 40 eruptions were found that were not in the OVSICORI catalogue.

In most cases, data from the instruments are transmitted to an observatory by radio, so the amount of data is an issue. We therefore tested how much the data could be reduced while still allowing an eruption to be detected successfully. We also kept the network as small as possible so that it could ideally run on a small computer (e.g. a Raspberry Pi) for on-site eruption detection; then only the information that an eruption has been detected needs to be transmitted.

HARMONIZING CIGAR SURVEY DATA ACROSS TCORS, CTP, AND PATH STUDIES: THE CIGAR COLLABORATIVE RESEARCH (CCR) GROUP

Nicotine & Tobacco Research · 10.1093/ntr/ntz201 · 2019

Author(s): Howard Fishbein, Dan Bauer, Qilu Yu, Robin Mermelstein, Dina Jones, ...

Keyword(s): Survey Data, Tobacco Product, Data Sets, Multiple Sources, Data Set, Use Patterns, Regulatory Policies, Effective Interventions, Youth And Young Adults, Degree Of Confidence

Abstract. Introduction: Cigars are a popular tobacco product of choice for youth and young adults. Despite growing interest in cigar research, gaps in the available literature limit the ability to set evidence-based policies. Small research samples, and the heterogeneity of cigar types when a single question about use is asked, make analyzing the data difficult. Given the authority granted to the Food and Drug Administration (FDA) in 2016 to regulate cigars, and the popularity of cigars, data that better capture the use of and preference for cigars will help the FDA set appropriate regulatory policies. Methods: We harmonized cigar survey data previously collected by five independent tobacco regulatory science survey research projects. Data-supplying participants included 3 TCORS, 1 CTP grantee, and data from PATH's public use data set. Results: Analyzing 92 data variables across five studies and applying a rigorous data harmonization protocol, we report findings on 24 key cigar use variables. The step-by-step harmonization protocol is presented. Selected findings with strict reproducibility across all 5 studies reveal that youth aged 17-19 years are at highest risk for cigar initiation; findings with relative reproducibility show that males are more likely than females to try cigars, but with significant differences in magnitude across studies; and areas of inconsistent reproducibility are revealed when evaluating brand preferences. Conclusion: Harmonizing data from multiple sources gives a broader view of the robustness and generalizability of survey data than data from a single source. These observations highlight the need to look for the highest degree of reproducibility among and across data sources to inform policy. Implications: Harmonizing data from discrete data sets provides insights into cigar initiation and use, and the approach is presented together with its opportunities, challenges, and solutions. Comparing observational data from PATH and four independent research studies provides a best-practices approach and an example of data synthesis for the tobacco research community. The data set of 5 studies offers a look at the degree of confidence in analyzing harmonized survey results. Variable conclusions underline the need to strive for the highest degree of reproducibility, to best understand the behaviors of cigar users and to allow future development of the most effective interventions to alter tobacco use patterns.

Using Integrative Data Analysis to Investigate School Climate Across Multiple Informants

Educational and Psychological Measurement · 10.1177/0013164419885999 · 2019 · Vol 80 (4) · pp. 617-637 · Cited by ~3

Author(s): Kathleen V. McGrath, Elizabeth A. Leighton, Mihaela Ene, Christine DiStefano, Diane M. Monrad

Keyword(s): Data Analysis, School Climate, Complex Model, Multiple Informants, Data Sets, Multiple Perspectives, Multiple Sources, Data Set, Integrative Data Analysis, Practical Applications

Survey research frequently involves the collection of data from multiple informants. Results, however, are usually analyzed by informant group, potentially ignoring important relationships across groups. When the same construct(s) are measured, integrative data analysis (IDA) allows pooling of data from multiple sources into one data set to examine information from multiple perspectives within the same analysis. Here, the IDA procedure is demonstrated via the examination of pooled data from student and teacher school climate surveys. This study contributes to the sparse literature regarding IDA applications in the social sciences, specifically in education. It also lays the groundwork for future educational researchers interested in the practical applications of the IDA framework to empirical data sets with complex model structures.
