Evaluation of a Model for Predicting the Drift of Iceberg Ensembles

1988 ◽  
Vol 110 (2) ◽  
pp. 172-179 ◽  
Author(s):  
H. El-Tahan ◽  
S. Venkatesh ◽  
M. El-Tahan

This paper describes the evaluation of a model for predicting the drift of iceberg ensembles. The model was developed in preparation for providing an iceberg forecasting service off the Canadian east coast north of about 45°N. It was envisaged that 1–5 day forecasts of iceberg ensemble drift would be available. Following a critical examination of all available data, 10 data sets containing up to 404 icebergs in the Grand Banks area off Newfoundland were selected for detailed study. The winds measured in the vicinity of the study area, as well as the detailed current system developed by the International Ice Patrol, were used as inputs to the model. A discussion of the accuracy and limitations of the input data is presented. Qualitative and quantitative criteria were used to evaluate model performance. Applying these criteria to the results of the computer simulations shows that the model provides good predictions, although the degree of predictive success varied from one data set to another. The study demonstrated the validity of the assumption of random positioning of icebergs within a grid block, especially for ensembles with large numbers of icebergs. It was also found that an “average” iceberg size can be used to represent all icebergs. Finally, the study showed that achieving improved results will require accounting for iceberg deterioration (complete melting), especially during the summer months.
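The random-positioning assumption mentioned above can be illustrated with a minimal sketch (not taken from the paper): icebergs assigned to a grid block are given uniformly distributed positions within that block, an approximation that works best when the ensemble is large. Function and variable names are illustrative only.

```python
import random

def scatter_icebergs(block_lat0, block_lon0, block_size_deg, n_icebergs, seed=None):
    """Assign uniformly random positions to icebergs within one grid block.

    block_lat0, block_lon0 : south-west corner of the block (degrees)
    block_size_deg         : block edge length (degrees)
    n_icebergs             : number of icebergs assigned to the block
    """
    rng = random.Random(seed)
    return [
        (block_lat0 + rng.random() * block_size_deg,
         block_lon0 + rng.random() * block_size_deg)
        for _ in range(n_icebergs)
    ]

# Example: 404 icebergs scattered in a 1-degree block near the Grand Banks
positions = scatter_icebergs(46.0, -49.0, 1.0, 404, seed=1)
```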

2016 ◽  
Vol 39 (11) ◽  
pp. 1477-1501 ◽  
Author(s):  
Victoria Goode ◽  
Nancy Crego ◽  
Michael P. Cary ◽  
Deirdre Thornlow ◽  
Elizabeth Merwin

Researchers need to evaluate the strengths and weaknesses of candidate data sets when choosing a secondary data set for a health care study. This research method review informs the reader of the major issues investigators must consider when incorporating secondary data into their repertoire of potential research designs, and shows the range of approaches investigators may take to answer nursing research questions in a variety of contexts. The researcher requires expertise in locating and judging data sets, as well as the complex data management skills needed to handle large numbers of records. Key considerations, such as a firm grasp of the research question supported by the conceptual framework and the selection of appropriate databases, guide the researcher in delineating the unit of analysis. Other, more complex issues to consider when conducting secondary data research include data access, management and security, and complex variable construction.


2019 ◽  
Vol 491 (3) ◽  
pp. 3290-3317 ◽  
Author(s):  
Oliver H E Philcox ◽  
Daniel J Eisenstein ◽  
Ross O’Connell ◽  
Alexander Wiegand

ABSTRACT To make use of clustering statistics from large cosmological surveys, accurate and precise covariance matrices are needed. We present a new code to estimate large-scale galaxy two-point correlation function (2PCF) covariances in arbitrary survey geometries that, due to new sampling techniques, runs ∼10⁴ times faster than previous codes, computing finely binned covariance matrices with negligible noise in less than 100 CPU-hours. As in previous works, non-Gaussianity is approximated via a small rescaling of shot noise in the theoretical model, calibrated by comparing jackknife survey covariances to an associated jackknife model. The flexible code, RascalC, has been publicly released, and automatically takes care of all necessary pre- and post-processing, requiring only a single input data set (without a prior 2PCF model). Deviations between large-scale model covariances from a mock survey and those from a large suite of mocks are found to be indistinguishable from noise. In addition, the choice of input mock is shown to be irrelevant for desired noise levels below ∼10⁵ mocks. Coupled with its generalization to multitracer data sets, this shows the algorithm to be an excellent tool for analysis, reducing the need for large numbers of mock simulations to be computed.
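The jackknife calibration step referred to above compares an empirical jackknife covariance of the binned 2PCF with a model counterpart. A generic delete-one jackknife covariance estimate is sketched below; this is not the RascalC implementation, and the array names and mock data are illustrative only.

```python
import numpy as np

def jackknife_covariance(xi_jack):
    """Delete-one jackknife covariance of a binned 2PCF.

    xi_jack : array of shape (n_regions, n_bins), where row j is the 2PCF
              measured with jackknife region j removed.
    Returns the (n_bins, n_bins) covariance matrix.
    """
    n = xi_jack.shape[0]
    mean = xi_jack.mean(axis=0)
    diff = xi_jack - mean
    # Standard delete-one jackknife prefactor (n - 1) / n
    return (n - 1) / n * diff.T @ diff

# Example with synthetic data: 50 jackknife regions, 20 radial bins
rng = np.random.default_rng(0)
xi_jack = rng.normal(size=(50, 20))
cov = jackknife_covariance(xi_jack)
```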


2017 ◽  
Vol 113 (9/10) ◽  
Author(s):  
Douw G. Breed ◽  
Tanja Verster

To demonstrate the benefit of segmentation in linear predictive modelling, we applied different modelling techniques to six data sets from different industry disciplines on which predictive models can be developed. We compared the performance achieved by first segmenting the data (using unsupervised, semi-supervised, as well as supervised methods) and then fitting a linear modelling technique against the performance of popular non-linear modelling techniques on the same data sets. A total of eight modelling techniques were compared. We show that no single modelling technique always outperforms the others on these data sets; depending on the characteristics of the data set, one technique may outperform another. Specifically, on the direct marketing data set from a local South African bank, gradient boosting performed best. We also show that segmenting the data improves the performance of the linear modelling technique in the predictive modelling context on all data sets considered. Of the three segmentation methods considered, semi-supervised segmentation appears the most promising.
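The segment-then-fit idea described above can be sketched generically: first cluster the observations (an unsupervised segmentation), then fit a separate linear model in each segment and route new observations to their segment's model. This sketch uses scikit-learn on synthetic data purely for illustration; it is not the authors' code, and the names are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Unsupervised segmentation: partition the feature space into segments
segmenter = KMeans(n_clusters=3, random_state=0).fit(X)
segments = segmenter.labels_

# Fit one linear (logistic regression) model per segment
models = {}
for s in np.unique(segments):
    mask = segments == s
    models[s] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])

def predict_proba(x_new):
    """Score a new observation with the model belonging to its segment."""
    s = segmenter.predict(x_new.reshape(1, -1))[0]
    return models[s].predict_proba(x_new.reshape(1, -1))[0, 1]

print(predict_proba(X[0]))
```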


2014 ◽  
Vol 7 (5) ◽  
pp. 1395-1427 ◽  
Author(s):  
B. Hassler ◽  
I. Petropavlovskikh ◽  
J. Staehelin ◽  
T. August ◽  
P. K. Bhartia ◽  
...  

Abstract. Peak stratospheric chlorofluorocarbon (CFC) and other ozone-depleting substance (ODS) concentrations were reached in the mid- to late 1990s. Detection and attribution of the expected recovery of the stratospheric ozone layer in an atmosphere with reduced ODSs, as well as efforts to understand the evolution of stratospheric ozone in the presence of increasing greenhouse gases, are key current research topics. These require a critical examination of ozone changes with accurate knowledge of the spatial (geographical and vertical) and temporal ozone response. For such an examination, it is vital that the quality of the measurements used be as high as possible and that measurement uncertainties be well quantified. In preparation for the 2014 United Nations Environment Programme (UNEP)/World Meteorological Organization (WMO) Scientific Assessment of Ozone Depletion, the SPARC/IO3C/IGACO-O3/NDACC (SI2N) Initiative was designed to study and document changes in the global ozone profile distribution. This requires assessing long-term ozone profile data sets with regard to measurement stability and uncertainty characteristics. The ultimate goal is to establish their suitability for estimating long-term ozone trends in order to contribute to ozone recovery studies. Some of the data sets have been improved as part of this initiative, with updated versions now available. This summary presents an overview of the stratospheric ozone profile measurement data sets (ground- and satellite-based) available for ozone recovery studies. Here we document measurement techniques, spatial and temporal coverage, vertical resolution, native units and measurement uncertainties. In addition, the latest data versions are briefly described (including data version updates as well as details of multiple retrievals when available for a given satellite instrument). Archive location information for each data set is also given.


2019 ◽  
Vol 115 (3/4) ◽  
Author(s):  
Douw G. Breed ◽  
Tanja Verster

Segmentation of data for the purpose of enhancing predictive modelling is a well-established practice in the banking industry. Unsupervised and supervised approaches are the two main types of segmentation, and examples of improved performance of predictive models exist for both. However, each focuses on a single aspect – either target separation or independent variable distribution – and combining them may deliver better results. This combined approach is called semi-supervised segmentation. Our objective was to explore four new semi-supervised segmentation techniques that may offer alternative strengths. We applied these techniques to six data sets from different domains and compared the model performance achieved. The original semi-supervised segmentation technique was the best for two of the data sets (as measured by the improvement in validation set Gini), but the new techniques outperformed it on the other four data sets. Significance: We propose four newly developed semi-supervised segmentation techniques that can be used as additional tools for segmenting data before fitting a logistic regression. In all comparisons, using semi-supervised segmentation before fitting a logistic regression improved the modelling performance (as measured by the Gini coefficient on the validation data set) compared with an unsegmented logistic regression.
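The Gini coefficient used above as the performance measure is related to the area under the ROC curve by Gini = 2·AUC − 1. A minimal sketch of computing it on a held-out validation set (using scikit-learn; variable names and data are illustrative) is:

```python
from sklearn.metrics import roc_auc_score

def gini(y_true, y_score):
    """Gini coefficient of a binary classifier: Gini = 2 * AUC - 1."""
    return 2.0 * roc_auc_score(y_true, y_score) - 1.0

# Example: predicted probabilities on a held-out validation set
y_valid = [0, 1, 1, 0, 1, 0]
p_valid = [0.2, 0.8, 0.6, 0.3, 0.9, 0.4]
print(gini(y_valid, p_valid))  # 1.0 for a perfect ranking, ~0.0 for a random one
```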


2017 ◽  
Vol 10 (9) ◽  
pp. 3359-3373 ◽  
Author(s):  
Valentin Duflot ◽  
Jean-Luc Baray ◽  
Guillaume Payen ◽  
Nicolas Marquestaut ◽  
Francoise Posny ◽  
...  

Abstract. Given the importance of ozone (O3) in the troposphere and lower stratosphere in the tropics, a DIAL (differential absorption lidar) tropospheric O3 lidar system (LIO3TUR) was developed and installed at the Université de la Réunion campus site (close to the sea) on Reunion Island (southern tropics) in 1998. From 1998 to 2010, it acquired 427 O3 profiles from the lower to the upper troposphere and has been central to several studies. In 2012, the system was moved up to the new Maïdo Observatory facility (2160 m a.m.s.l. – metres above mean sea level), where it started operation in February 2013. The current system (LIO3T) configuration generates a 266 nm beam, obtained as the fourth harmonic of a Nd:YAG laser, which is sent into a Raman cell filled with deuterium (using helium as buffer gas), generating the 289 and 316 nm beams that enable the use of the DIAL method for O3 profile measurements. The optimal range of the present system is 6–19 km a.m.s.l., depending on the instrumental and atmospheric conditions. For a 1 h integration time, the vertical resolution varies from 0.7 km at 6 km a.m.s.l. to 1.3 km at 19 km a.m.s.l., and the mean uncertainty within the 6–19 km range is between 6 and 13 %. Comparisons with eight electrochemical concentration cell (ECC) sondes simultaneously launched from the Maïdo Observatory show good agreement between data sets, with a 6.8 % mean absolute relative difference (D) between 6 and 17 km a.m.s.l. (LIO3T lower than ECC). Comparisons with 37 ECC sondes launched from the nearby Gillot site during the daytime in a ±24 h window around lidar shooting give a D of 9.4 % between 6 and 19 km a.m.s.l. (LIO3T lower than ECC). Comparisons with 11 ground-based Network for the Detection of Atmospheric Composition Change (NDACC) Fourier transform infrared (FTIR) spectrometer measurements acquired during the daytime in a ±24 h window around lidar shooting show good agreement between data sets, with a D of 11.8 % for the 8.5–16 km partial column (LIO3T higher than FTIR), and comparisons with 39 simultaneous Infrared Atmospheric Sounding Interferometer (IASI) observations over Reunion Island show good agreement between data sets, with a D of 11.3 % for the 6–16 km partial column (LIO3T higher than IASI). The ECC, LIO3TUR and LIO3T O3 monthly climatologies all exhibit the same range of values and patterns. In particular, the Southern Hemisphere biomass burning seasonal enhancement and the ozonopause altitude decrease in late austral winter–spring, as well as the signature of deep convection bringing O3-poor boundary layer air masses up to the middle–upper troposphere in late austral summer, are clearly visible in all data sets.
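The DIAL retrieval referred to above derives the O3 number density from the range derivative of the log ratio of the weakly absorbed "off" return (316 nm) to the strongly absorbed "on" return (289 nm). A simplified sketch of this relation, omitting aerosol and Rayleigh correction terms and using made-up inputs, is given below; it is not the LIO3T processing chain.

```python
import numpy as np

def dial_number_density(z, p_on, p_off, sigma_on, sigma_off):
    """Simplified DIAL retrieval of absorber number density.

    z         : altitude grid (m)
    p_on      : backscattered power at the strongly absorbed wavelength (e.g. 289 nm)
    p_off     : backscattered power at the weakly absorbed wavelength (e.g. 316 nm)
    sigma_on, sigma_off : absorption cross sections (m^2) at the two wavelengths

    Implements n(z) = 1 / (2 * (sigma_on - sigma_off)) * d/dz ln(p_off / p_on),
    with aerosol and Rayleigh extinction corrections omitted.
    """
    log_ratio = np.log(p_off / p_on)
    return np.gradient(log_ratio, z) / (2.0 * (sigma_on - sigma_off))
```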


Author(s):  
N. Ram Mohan ◽  
N. Praveen Kumar

Analyzing cyber incident data sets is an important method for deepening our understanding of how the threat landscape evolves. This is a relatively new research topic, and many studies remain to be done. In this paper, we report a statistical analysis of a breach incident data set covering 12 years (2005–2017) of cyber hacking activities, including malware attacks. We show that, in contrast to findings reported in the literature, both hacking breach incident inter-arrival times and breach sizes should be modelled by stochastic processes rather than by distributions, because they exhibit autocorrelations. We then propose particular stochastic process models to fit the inter-arrival times and the breach sizes, respectively, and show that these models can predict both quantities. To gain deeper insight into the evolution of hacking breach incidents, we conduct both qualitative and quantitative trend analyses of the data set. We draw a set of cyber security insights, including that the threat of cyber hacks is indeed getting worse in terms of frequency, but not in terms of the magnitude of the damage.
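The autocorrelation evidence mentioned above, which motivates modelling inter-arrival times and breach sizes as stochastic processes rather than i.i.d. draws, can be checked with a simple lag-k sample autocorrelation. The sketch below is generic and uses synthetic data; it is not the authors' analysis and the real data set is not reproduced here.

```python
import numpy as np

def sample_acf(x, max_lag=10):
    """Sample autocorrelation of a series at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return [np.sum(x[:-k] * x[k:]) / denom for k in range(1, max_lag + 1)]

# Synthetic inter-arrival times (days) between breach incidents
rng = np.random.default_rng(42)
inter_arrival = rng.exponential(scale=5.0, size=500)
print(sample_acf(inter_arrival, max_lag=5))  # values near zero indicate no autocorrelation
```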


2014 ◽  
Vol 52 (4) ◽  
pp. 737-754 ◽  
Author(s):  
Margit Raich ◽  
Julia Müller ◽  
Dagmar Abfalter

Purpose – The purpose of this paper is to provide insightful evidence of phenomena in organization and management theory. Textual data sets consist of two different elements, namely qualitative and quantitative aspects. Researchers often combine methods to harness both aspects; however, they frequently do so in a comparative, convergent, or sequential way.

Design/methodology/approach – The paper illustrates and discusses a hybrid textual data analysis approach employing the qualitative software application GABEK-WinRelan in a case study of an Austrian retail bank.

Findings – The paper argues that a hybrid analysis method, fully intertwining qualitative and quantitative analysis simultaneously on the same textual data set, can deliver new insight into more facets of a data set.

Originality/value – A hybrid approach is not a universally applicable solution to research and management problems. Rather, this paper aims at triggering and intensifying scientific discussion about stronger integration of qualitative and quantitative data and analysis methods in management research.


2013 ◽  
Vol 17 (11) ◽  
pp. 4323-4337 ◽  
Author(s):  
M. A. Sunyer ◽  
H. J. D. Sørup ◽  
O. B. Christensen ◽  
H. Madsen ◽  
D. Rosbjerg ◽  
...  

Abstract. In recent years, there has been an increase in the number of climate studies addressing changes in extreme precipitation. A common step in these studies is the assessment of climate model performance, which is often measured by comparing climate model output with observational data. In the majority of such studies the characteristics and uncertainties of the observational data are neglected. This study addresses the influence of using different observational data sets to assess climate model performance. Four different data sets covering Denmark, based on different gauge systems and comprising both networks of point measurements and gridded data sets, are considered. Additionally, the influence of using different performance indices and metrics is addressed. A set of indices ranging from mean to extreme precipitation properties is calculated for all the data sets. For each observational data set, the regional climate models (RCMs) are ranked according to their performance using two different metrics, one based on the error in representing the indices and one based on the spatial pattern. Compared with indices of mean precipitation, extreme precipitation indices are highly dependent on the spatial resolution of the observations. The spatial pattern also differs between the observational data sets. These differences have a clear impact on the ranking of the climate models, which depends strongly on the observational data set, the index and the metric used. The results highlight the need to be aware of the properties of the chosen observational data in order to avoid overconfident and misleading conclusions about climate model performance.
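The ranking step described above, in which each RCM is scored by its error in reproducing an index computed from an observational data set, can be sketched generically as follows. The index and metric definitions, names, and synthetic data are illustrative assumptions, not those used in the study.

```python
import numpy as np

def extreme_index(precip, q=0.99):
    """Illustrative extreme-precipitation index: the 99th percentile of daily totals."""
    return np.quantile(precip, q)

def rank_models(obs_series, model_series):
    """Rank climate models by absolute error in the index relative to observations.

    obs_series   : 1-D array of observed daily precipitation
    model_series : dict mapping model name -> 1-D array of simulated precipitation
    """
    target = extreme_index(obs_series)
    errors = {name: abs(extreme_index(sim) - target) for name, sim in model_series.items()}
    return sorted(errors, key=errors.get)  # best (smallest error) first

# Synthetic example; with a different 'obs_series' the ranking can change.
rng = np.random.default_rng(1)
obs = rng.gamma(0.8, 5.0, 3650)
models = {"RCM_A": rng.gamma(0.8, 5.5, 3650), "RCM_B": rng.gamma(0.8, 4.0, 3650)}
print(rank_models(obs, models))
```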


2019 ◽  
Vol 43 (1) ◽  
pp. 1-29
Author(s):  
Alice Bee Kasakoff

This article highlights the usefulness of family trees for visualizing and understanding changing patterns of kin dispersion over time. Such spatial patterns are important in gauging how families influence outcomes such as health and social mobility. The article describes how rapidly growing families, originally from England, dispersed over the US North and established hubs where they originally settled that lasted hundreds of years, even as they repeated the process moving west. Fathers lived much closer to their adult sons in 1850 than they do today, and many more had an adult son within a radius of 30 miles. Big Data from genealogical websites is now available to map large numbers of families. Comparing one such data set with the US Census of 1880 shows that the native-born population is well represented, but the foreign-born and African Americans are under-represented in these data sets. Pedigrees become less and less representative the further back in time they go because they include only lines that have survived into the present. Despite these and other limitations, Big Data makes it possible to study family spatial dispersion going back many generations and to map past spatial connections in a wider variety of historical contexts and at a scale never before possible.

