binned data
Recently Published Documents



2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Masood Tadi ◽  
Irina Kortchemski

Purpose: This paper demonstrates a dynamic cointegration-based pairs trading strategy, including an optimal look-back window framework, in the cryptocurrency market and evaluates its return and risk under three different scenarios. Design/methodology/approach: This study uses the Engle-Granger methodology, the Kapetanios-Snell-Shin test and the Johansen test as cointegration tests in different scenarios. It calibrates the mean-reversion speed of the Ornstein-Uhlenbeck process to obtain the half-life used for the asset selection phase and look-back window estimation. Findings: After accounting for the main limitations of the market microstructure, the strategy outperforms the naive buy-and-hold approach on the Bitmex exchange. Another significant finding is that using a large collection of cryptocurrency coins to formulate the model’s spread improves the risk-adjusted profitability of the pairs trading strategy. In addition, the strategy’s maximum drawdown is reasonably low, which makes it practical to deploy. The results also indicate that one class of coins offers better potential arbitrage opportunities than others. Originality/value: This research has some noticeable advantages over similar studies in the cryptocurrency market. The first is the accuracy of the data: minute-binned data create the signals in the formation period. In addition, to backtest the strategy during the trading period, this study simulates the trading signals using best bid/ask quotes and market trades. An order is counted as executed only when the asset size is available at its quoted price (with a gap of one or more periods after signal generation), which makes the backtesting much more realistic.
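
The half-life calibration mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the spread is sampled at a fixed interval and follows a discretised Ornstein-Uhlenbeck process, so that a least-squares AR(1) fit x[t+1] = a + b*x[t] gives the mean-reversion speed theta = -ln(b)/dt and the half-life ln(2)/theta. The function name `ou_half_life` and the sample series are hypothetical.

```python
import math

def ou_half_life(spread, dt=1.0):
    """Estimate the OU half-life from an AR(1) least-squares fit.

    Assumes x[t+1] = a + b * x[t] + noise with 0 < b < 1 (hypothetical helper).
    """
    x = spread[:-1]
    y = spread[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    b = cov / var  # AR(1) slope
    if not 0.0 < b < 1.0:
        raise ValueError("no mean reversion detected (slope outside (0, 1))")
    theta = -math.log(b) / dt  # mean-reversion speed
    return math.log(2.0) / theta

# Illustrative noise-free spread that decays by 10% per bar.
spread = [10.0 * 0.9 ** t for t in range(50)]
half_life = ou_half_life(spread)
```

A half-life measured in bars could then serve as a candidate look-back window, under the assumption that the window should scale with how quickly the spread reverts.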


2020 ◽  
Author(s):  
Molly M. King

Researchers often need to work with categorical income data. While the typical nonparametric (including midpoint) and parametric estimation methods used to estimate summary statistics both have advantages, they all carry assumptions which cause them to deviate in important ways from real world distributions of income. The method introduced here, Random Empirical Distribution Imputation (REDI), imputes discrete observations using binned income data, while also calculating summary statistics. REDI achieves this through random cold-deck imputation from a real world reference dataset (here, the Current Population Survey ASEC). This imputation method reconciles bins between datasets or across years and handles top incomes. REDI has other advantages of computing an income distribution that is nonparametric, bin consistent, area- and variance-preserving, continuous, and computationally fast. I provide proof of concept using two years of the American Community Survey.
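
The idea of random cold-deck imputation from a reference dataset can be sketched as below. This is a simplified illustration under stated assumptions, not the REDI implementation: each binned respondent receives a randomly drawn reference income that falls inside the same bin, and the open-ended top bin is represented with an infinite upper edge. The function name, bin labels and all numbers are hypothetical.

```python
import random

def impute_from_bins(binned, bins, reference, seed=0):
    """Draw one reference income per respondent from the matching bin.

    binned: list of bin labels, one per respondent.
    bins: dict mapping label -> (lower, upper), lower inclusive, upper exclusive.
    reference: list of continuous incomes from a reference survey.
    """
    rng = random.Random(seed)
    # Pool the reference incomes (the donors) by bin.
    donors = {
        label: [v for v in reference if lo <= v < hi]
        for label, (lo, hi) in bins.items()
    }
    imputed = []
    for label in binned:
        pool = donors[label]
        if not pool:
            raise ValueError(f"no reference incomes fall in bin {label!r}")
        imputed.append(rng.choice(pool))
    return imputed

# Hypothetical bins and data; the top bin is open-ended.
bins = {"low": (0, 25_000), "mid": (25_000, 75_000), "top": (75_000, float("inf"))}
reference = [8_000, 19_500, 31_000, 48_000, 66_000, 90_000, 240_000]
binned = ["low", "mid", "mid", "top", "low"]
imputed = impute_from_bins(binned, bins, reference)
```

Because every imputed value comes from the respondent's own bin, the result stays consistent with the original binned counts.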


2020 ◽  
Vol 25 (2) ◽  
pp. 209-218
Author(s):  
Maximilian Ruffert ◽  
Victoria L.G. Todd ◽  
Ian B. Todd

C-PODs are used for Passive Acoustic Monitoring (PAM) of harbour porpoises (Phocoena phocoena) at an offshore open sea location in the German North Sea. Diel patterns of echolocation click trains are extracted from minimum inter-click interval (minICI) data by binning. The aim of this study is to reassess and refine minICI ranges of click train data with particular consideration to the binning widths. Emphasis is also placed on choosing an appropriate visualisation of these binned data. Key ecological results include presence of higher train rates during the day with intermediate minICI values defined by the range 6-28 ms and a higher train rate with short minICI values 1.25-2.00 ms at night. This indicates an increase in porpoise feeding behaviour, or change of style, at night. Click trains with long minICI values > 35 ms occur at an equal rate throughout both diel phases, suggesting a more routine behaviour, such as navigation. These results, including previously overlooked patterns within historical echolocation data, could be revealed only by a judicious choice of binning widths. The classification methodology can be used to analyse echolocation trains from a variety of species and can be applied to any PAM data with the relevant click parameters.
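
The binning of click trains by minICI range and diel phase can be illustrated with a short sketch. The three ranges are taken from the abstract (1.25-2.00 ms, 6-28 ms, > 35 ms); treating the closed ranges as inclusive, the "unclassified" label, the function name and the sample trains are assumptions made for illustration only.

```python
from collections import Counter

def classify_minici(value_ms):
    """Assign a click train to a minICI class using the ranges in the abstract."""
    if 1.25 <= value_ms <= 2.00:
        return "short"
    if 6.0 <= value_ms <= 28.0:
        return "intermediate"
    if value_ms > 35.0:
        return "long"
    return "unclassified"  # falls between the reported ranges

# Hypothetical click trains: (diel phase, minICI in ms).
trains = [
    ("day", 10.0), ("day", 20.0), ("day", 1.5), ("day", 40.0), ("day", 4.0),
    ("night", 1.3), ("night", 1.9), ("night", 12.0), ("night", 50.0),
]

# Count trains per (phase, class) pair.
counts = Counter((phase, classify_minici(ici)) for phase, ici in trains)
```

Changing the range boundaries in `classify_minici` changes the counts directly, which is why the choice of binning widths matters for the ecological interpretation.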


2019 ◽  
Author(s):  
Patrick Mungufeni ◽  
Claudia Stolle ◽  
Sripathi Samireddipalle ◽  
Yenca Migoya-Orué ◽  
Yong Ha Kim

Abstract. This study developed a model of Total Electron Content (TEC) over the African region. The TEC data were derived from radio occultation measurements made by the Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) satellites. Geomagnetically quiet time (Kp  −20 nT) data during the years 2008–2011 and 2013–2017 were binned according to local time, season, solar flux level, geographic longitude, and dip latitude. Cubic B-splines were fitted to the binned data to obtain the model. The model was validated using TEC data from the years 2012 and 2018. The validation showed that the model approximates the observed TEC data with a root mean squared error of 4.8 TECU. Moreover, the modeled TEC data correlated highly with the observed TEC data (r = 0.93). Our model is the first attempt to predict TEC over the entire African region using extensive COSMIC TEC measurements. Owing to the extensive input data and the good modeling technique, we were able to reproduce well-known features such as the local time, seasonal, solar activity, and spatial variations of TEC over the African region.


2019 ◽  
Vol 97 (9) ◽  
pp. 3832-3844 ◽  
Author(s):  
Amir Aliakbari ◽  
Alireza Ehsani ◽  
Rasoul Vaez Torshizi ◽  
Peter Løvendahl ◽  
Hadi Esfandyari ◽  
...  

Abstract. In recent years, metabolomics has been used to clarify the biology underlying biological samples. In the field of animal breeding, investigating the magnitude of genetic control on the metabolomic profiles of animals and their relationships with quantitative traits adds valuable information to animal improvement schemes. In this study, we analyzed metabolomic features (MFs) extracted from the metabolomic profiles of 843 male Holstein calves. The metabolomic profiles were obtained using nuclear magnetic resonance (NMR) spectroscopy. We investigated 2 alternative methods to control for peak shifts in the NMR spectra, binning and aligning, to determine which approach was the most efficient for assessing genetic variance. A series of univariate analyses was implemented to elucidate the heritability of each MF. Furthermore, records on BW and ADG from 154 to 294 d of age (ADG154–294), 294 to 336 d of age (ADG294–336), and 154 to 336 d of age (ADG154–336) were used in a series of bivariate analyses to establish the genetic and phenotypic correlations with MFs. Bivariate analyses were only performed for MFs that had a heritability significantly different from zero. The heritabilities obtained in the univariate analyses for the MFs in the binned data set were low (<0.2). In contrast, in the aligned data set, we obtained moderate heritability (0.2 to 0.5) for 3.5% of MFs and high heritability (more than 0.5) for 1% of MFs. The bivariate analyses showed that ~12%, ~3%, ~9%, and ~9% of MFs had significant additive genetic correlations with BW, ADG154–294, ADG294–336, and ADG154–336, respectively. In all of the bivariate analyses, the percentage of significant additive genetic correlations was higher than the percentage of significant phenotypic correlations of the corresponding trait. Our results provided insights into the influence of the underlying genetic mechanisms on MFs. Further investigations in this field are needed for better understanding of the genetic relationship among the MFs and quantitative traits.
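
To show how binning can absorb small peak shifts, here is a minimal sketch of fixed-width spectral binning. It is an illustration under assumptions, not the pipeline used in the study: intensities are summed into equal-width chemical-shift bins, and the bin width, axis range and spectra are hypothetical.

```python
def bin_spectrum(ppm, intensity, start, width, n_bins):
    """Sum intensities into n_bins fixed-width bins beginning at `start`."""
    binned = [0.0] * n_bins
    for p, y in zip(ppm, intensity):
        index = int((p - start) / width)
        if 0 <= index < n_bins:  # ignore points outside the binned range
            binned[index] += y
    return binned

# Two hypothetical spectra whose main peak is slightly shifted (0.62 vs 0.68 ppm).
sample_a = bin_spectrum([0.10, 0.62, 1.20], [1.0, 9.0, 2.0], start=0.0, width=0.5, n_bins=3)
sample_b = bin_spectrum([0.10, 0.68, 1.20], [1.0, 9.0, 2.0], start=0.0, width=0.5, n_bins=3)
```

Both samples yield the same binned feature vector because the shifted peak stays within one bin. The trade-off is resolution: every peak that falls in the same bin is merged into a single feature, which alignment avoids.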


Galaxies ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 62
Author(s):  
Bernd Schleicher ◽  
Axel Arbet-Engels ◽  
Dominik Baack ◽  
Matteo Balbo ◽  
Adrian Biland ◽  
...  

Active Galactic Nuclei emit radiation over the whole electromagnetic spectrum up to TeV energies. Blazars are one subtype, with their jets pointing towards the observer. One of their typical features is extreme variability on timescales from minutes to years. The fractional variability is an often-used parameter for investigating the degree of variability of a light curve. Different detection methods and sensitivities of the instruments result in differently binned data and light curves with gaps. As these differences can influence the physics interpretation of the broadband variability, their effects on the fractional variability need to be studied. In this paper, we study the systematic effects of completeness in time coverage and the sampling rate. Using public data from instruments monitoring blazars in various energy ranges, we study the variability of the bright TeV blazars Mrk 421 and Mrk 501 over the electromagnetic spectrum, taking into account the systematic effects, and compare our findings with previous results. Especially in the TeV range, the fractional variability is higher than in previous studies, which can be explained by the much longer (seven years compared to a few weeks) and more complete data sample.
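
The dependence of the fractional variability on binning can be sketched as follows. The abstract does not state a formula, so this sketch assumes the commonly used definition F_var = sqrt((S^2 - <sigma_err^2>) / <x>^2), where S^2 is the sample variance of the fluxes and <sigma_err^2> is the mean squared measurement error. The rebinning helper and the light curve are hypothetical.

```python
import math

def fractional_variability(flux, flux_err):
    """F_var = sqrt((S^2 - mean squared error) / mean^2); NaN if the excess is negative."""
    n = len(flux)
    mean = sum(flux) / n
    sample_var = sum((f - mean) ** 2 for f in flux) / (n - 1)
    mean_sq_err = sum(e ** 2 for e in flux_err) / n
    excess = sample_var - mean_sq_err
    if excess < 0:
        return float("nan")
    return math.sqrt(excess) / mean

def rebin(flux, flux_err, k):
    """Average consecutive groups of k points; an incomplete trailing group is dropped."""
    out_flux, out_err = [], []
    for i in range(0, len(flux) - k + 1, k):
        out_flux.append(sum(flux[i:i + k]) / k)
        # Error on the mean of k independent measurements.
        out_err.append(math.sqrt(sum(e ** 2 for e in flux_err[i:i + k])) / k)
    return out_flux, out_err

# Hypothetical light curve with constant measurement errors.
flux = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
err = [0.5] * 6
fvar_raw = fractional_variability(flux, err)
fvar_binned = fractional_variability(*rebin(flux, err, 2))
```

The two values differ even though they describe the same source, which illustrates why light curves with different binning cannot be compared without accounting for the sampling.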


2019 ◽  
Vol 12 (1) ◽  
pp. 31 ◽  
Author(s):  
Thomas Fischer ◽  
Christopher Krauss ◽  
Alexander Deinert

Machine learning research has gained momentum, also in finance. Consequently, initial machine-learning-based statistical arbitrage strategies have emerged in the U.S. equities markets in the academic literature, see e.g., Takeuchi and Lee (2013); Moritz and Zimmermann (2014); Krauss et al. (2017). With our paper, we ask how such a statistical arbitrage approach would fare in the cryptocurrency space on minute-binned data. Specifically, we train a random forest on lagged returns of 40 cryptocurrency coins, with the objective to predict whether a coin outperforms the cross-sectional median of all 40 coins over the subsequent 120 min. We buy the coins with the top-3 predictions and short-sell the coins with the flop-3 predictions, only to reverse the positions after 120 min. During the out-of-sample period of our backtest, ranging from 18 June 2018 to 17 September 2018, and after more than 100,000 trades, we find statistically and economically significant returns of 7.1 bps per day, after transaction costs of 15 bps per half-turn. While this finding poses a challenge to the semi-strong form of market efficiency, we critically discuss it in light of limits to arbitrage, focusing on total volume constraints of the presented intraday strategy.
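
The target construction and the top-3/flop-3 selection described in this abstract can be sketched without the model itself. This is an illustration under assumptions: the random forest is replaced by a hypothetical dictionary of predicted probabilities, and the coin names and returns are made up.

```python
from statistics import median

def outperformance_labels(future_returns):
    """Label 1 if a coin's future return beats the cross-sectional median, else 0."""
    med = median(future_returns.values())
    return {coin: int(r > med) for coin, r in future_returns.items()}

def select_positions(predictions, k=3):
    """Return (long, short): the k highest and k lowest predicted probabilities."""
    if 2 * k > len(predictions):
        raise ValueError("need at least 2 * k coins")
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    return ranked[:k], ranked[-k:]

# Hypothetical 120-minute forward returns used to build training labels.
future_returns = {"A": 0.05, "B": 0.03, "C": 0.01, "D": 0.00,
                  "E": -0.01, "F": -0.02, "G": -0.04, "H": 0.02}
labels = outperformance_labels(future_returns)

# Hypothetical model outputs: probability of beating the median.
predictions = {"A": 0.9, "B": 0.8, "C": 0.6, "D": 0.5,
               "E": 0.4, "F": 0.3, "G": 0.1, "H": 0.7}
long_coins, short_coins = select_positions(predictions, k=3)
```

In a full backtest the labels would be computed only from data available at training time, and positions would be reversed after the holding period, as the abstract describes.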


2019 ◽  
Vol 214 ◽  
pp. 06035
Author(s):  
Benjamin Edward Krikler ◽  
Olivier Davignon ◽  
Lukasz Kreczko ◽  
Jacob Linacre ◽  
Emmanuel Olatunji Olaiya ◽  
...  

Binned data frames are a generalisation of multi-dimensional histograms, represented in a tabular format with one category per row containing the labels, bin contents, uncertainties and so on. Pandas is an industry-standard tool, which provides a data frame implementation complete with routines for data frame manipulation, persistency, visualisation, and easy access to “big data” scientific libraries and machine learning tools. FAST (the Faster Analysis Software Taskforce) has developed a generic approach for typical binned HEP analyses, driving the summary of ROOT Trees to multiple binned DataFrames with a yaml-based analysis description. Using Continuous Integration to run subsets of the analysis, we can monitor and test changes to the analysis itself, and deploy documentation automatically. This report describes this approach using examples from a public CMS tutorial and details the benefit over traditional methods.
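
A binned data frame of the kind described here can be sketched with plain pandas. This is not the FAST tooling or its yaml interface. It only assumes that each row of the result should hold the bin labels, the event count, the sum of weights and a statistical uncertainty taken as the square root of the sum of squared weights. Column names, bin edges and events are hypothetical.

```python
import pandas as pd

# Hypothetical per-event table: a category, a continuous variable and a weight.
events = pd.DataFrame({
    "channel": ["mu", "mu", "mu", "e", "e", "e"],
    "pt": [12.0, 15.0, 35.0, 27.0, 61.0, 70.0],
    "weight": [1.0, 0.5, 1.0, 2.0, 1.5, 1.5],
})

# Assign each event to a pt bin; intervals are closed on the left.
edges = [0, 20, 40, 80]
events["pt_bin"] = pd.cut(events["pt"], bins=edges, right=False)
events["weight_sq"] = events["weight"] ** 2

# One row per populated (channel, pt_bin) combination.
binned = (
    events.groupby(["channel", "pt_bin"], observed=True)
    .agg(n=("weight", "size"), sumw=("weight", "sum"), sumw2=("weight_sq", "sum"))
    .reset_index()
)
binned["uncertainty"] = binned["sumw2"] ** 0.5
```

Keeping the histogram as a table means further steps such as merging channels, rebinning or plotting can reuse ordinary data frame operations.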

