Discriminating Between Second-Order Model With/Without Interaction Based on Central Tendency Estimation

2021 ◽  
Vol 4 (3) ◽  
pp. 47-63
Author(s):  
Owhondah P.S. ◽  
Enegesele D. ◽  
Biu O.E. ◽  
Wokoma D.S.A.

The study deals with discriminating between second-order models with and without interaction, centered on measures of central tendency, using the ordinary least squares (OLS) method to estimate the model parameters. The paper considered two data sets of different sample sizes (small and large). The small sample used the unemployment rate as the response and the inflation rate and exchange rate as predictors from 2007 to 2018, while the large sample consisted of flow-rate data on hydrate formation for a Niger Delta deep offshore field. The R², AIC, SBC, and SSE were computed for both data sets to test the adequacy of the models. The results show that all three models are similar for the smaller data set, while for the large data set the second-order model centered on the median, with or without interaction, is the best based on the number of significant parameters. The model selection criterion values (R², AIC, SBC, and SSE) were found to be equal for the models centered on the median and mode for both the large and small data sets. However, the models centered on the median and mode, with or without interaction, were better than the model centered on the mean for the large data set. This study shows that second-order regression models centered on the median and mode are better than the model centered on the mean for the large data set, while all models are similar for the smaller data set. Hence, second-order regression models centered on the median or mode, with or without interaction, are preferable to the second-order regression model centered on the mean.
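As an illustration of the modelling approach described above, the following sketch fits a second-order model in two predictors, centered on a chosen measure of central tendency, by OLS and reports the selection criteria used in the study (SSE, R², AIC, SBC). The variable names, the simulated data, and the exact form of the information criteria (computed up to an additive constant) are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch (not the authors' code): fit a second-order model in two
# predictors, centered on a chosen measure of central tendency, by OLS, and
# report SSE, R^2, AIC and SBC.
import numpy as np

def fit_second_order(y, x1, x2, center=np.median, interaction=True):
    c1, c2 = center(x1), center(x2)
    z1, z2 = x1 - c1, x2 - c2
    cols = [np.ones_like(z1), z1, z2, z1**2, z2**2]
    if interaction:
        cols.append(z1 * z2)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sse = float(resid @ resid)
    r2 = 1.0 - sse / float(((y - y.mean()) ** 2).sum())
    aic = n * np.log(sse / n) + 2 * k          # up to an additive constant
    sbc = n * np.log(sse / n) + k * np.log(n)  # Schwarz/Bayesian criterion
    return beta, {"SSE": sse, "R2": r2, "AIC": aic, "SBC": sbc}

# toy usage with simulated data
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=50), rng.normal(size=50)
y = 2 + 0.5 * x1 - 0.3 * x2 + 0.2 * x1**2 + rng.normal(scale=0.1, size=50)
_, fit_stats = fit_second_order(y, x1, x2, center=np.median, interaction=False)
print(fit_stats)
```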

2019 ◽  
Vol 8 (2S11) ◽  
pp. 3523-3526

This paper describes an efficient algorithm for classification in large data sets. While many classification algorithms exist, they are not well suited to larger contents and diverse data sets. Various ELM algorithms are available in the literature for working with large data sets. However, the existing algorithms use a fixed activation function, which can lead to deficiencies when working with large data. In this paper, we propose a novel ELM that employs a sigmoid activation function. The experimental evaluations demonstrate that our ELM-S algorithm performs better than ELM, SVM, and other state-of-the-art algorithms on large data sets.
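For readers unfamiliar with the technique, the following is a minimal sketch of an extreme learning machine with a sigmoid hidden layer: random input weights and biases, and a closed-form least-squares solution for the output weights. It illustrates the general idea only and is not the ELM-S implementation evaluated in the paper.

```python
# Minimal extreme learning machine (ELM) sketch with a sigmoid hidden layer.
import numpy as np

class SigmoidELM:
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        # random, untrained input weights and biases
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))   # sigmoid activations
        T = np.eye(y.max() + 1)[y]                          # one-hot targets (labels 0..k-1)
        self.beta = np.linalg.pinv(H) @ T                   # least-squares output weights
        return self

    def predict(self, X):
        H = 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))
        return np.argmax(H @ self.beta, axis=1)

# toy usage with random data
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = (X[:, 0] > 0).astype(int)
print((SigmoidELM().fit(X, y).predict(X) == y).mean())
```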


Author(s):  
Brian Hoeschen ◽  
Darcy Bullock ◽  
Mark Schlappi

Historically, stopped delay was used to characterize the operation of intersection movements because it was relatively easy to measure. During the past decade, the traffic engineering community has moved away from using stopped delay and now uses control delay. That measurement is more precise but quite difficult to extract from large data sets if strict definitions are used to derive the data. This paper evaluates two procedures for estimating control delay. The first is based on a historical approximation that control delay is 30% larger than stopped delay. The second is new and based on segment delay. The procedures are applied to a diverse data set collected in Phoenix, Arizona, and compared with control delay calculated by using the formal definition. The new approximation was observed to be better than the historical stopped delay procedure; it provided an accurate prediction of control delay. Because it is an approximation, this methodology would be most appropriately applied to large data sets collected from travel time studies for ranking and prioritizing intersections for further analysis.
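A hedged sketch of the two approximations compared above: the historical rule that control delay is roughly 30% larger than stopped delay, and a segment-delay-based estimate formed here as segment travel time minus free-flow travel time. The function names and the exact form of the segment-delay calculation are assumptions, not the paper's definitions.

```python
# Two simple ways to approximate control delay (seconds), as discussed above.
def control_delay_from_stopped(stopped_delay_s: float) -> float:
    # historical rule of thumb: control delay ~ 30% larger than stopped delay
    return 1.30 * stopped_delay_s

def control_delay_from_segment(travel_time_s: float, free_flow_time_s: float) -> float:
    # segment-delay-based estimate: extra time relative to free-flow travel
    return max(travel_time_s - free_flow_time_s, 0.0)

print(control_delay_from_stopped(20.0))        # 26.0 s
print(control_delay_from_segment(55.0, 32.0))  # 23.0 s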


2020 ◽  
Vol 11 (4) ◽  
pp. 1013-1031
Author(s):  
Fabian von Trentini ◽  
Emma E. Aalbers ◽  
Erich M. Fischer ◽  
Ralf Ludwig

Abstract. For sectors like agriculture, hydrology and ecology, increasing interannual variability (IAV) can have larger impacts than changes in the mean state, whereas decreasing IAV in winter implies that the coldest seasons warm more than the mean. IAV is difficult to reliably quantify in single realizations of climate (observations and single-model realizations) as they are too short and represent a combination of external forcing and IAV. Single-model initial-condition large ensembles (SMILEs) are powerful tools to overcome this problem, as they provide many realizations of past and future climate and thus a larger sample size to robustly evaluate and quantify changes in IAV. We use three SMILE-based regional climate models (CanESM-CRCM, ECEARTH-RACMO and CESM-CCLM) to investigate downscaled changes in IAV of summer and winter temperature and precipitation, the number of heat waves, and the maximum length of dry periods over Europe. An evaluation against the observational data set E-OBS reveals that all models reproduce observational IAV reasonably well, although both under- and overestimation of observational IAV occur in all models in a few cases. We further demonstrate that SMILEs are essential to robustly quantify changes in IAV since some individual realizations show significant IAV changes, whereas others do not. Thus, a large sample size, i.e., information from all members of SMILEs, is needed to robustly quantify the significance of IAV changes. Projected IAV changes in temperature over Europe are in line with existing literature: increasing variability in summer and stable to decreasing variability in winter. Here, we further show that summer and winter precipitation, as well as the two summer extreme indicators, mostly show these seasonal changes as well.
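To make the quantity concrete, the short sketch below shows one common way to estimate IAV from a SMILE: compute the interannual standard deviation per member after removing an estimate of the forced signal (here, the ensemble mean). The array layout, the use of the ensemble mean as the forced response, and the simulated numbers are assumptions for illustration, not the exact protocol of the paper.

```python
# Illustrative IAV estimate per ensemble member for a seasonal series.
import numpy as np

def iav_per_member(seasonal_series: np.ndarray) -> np.ndarray:
    """seasonal_series: array of shape (members, years)."""
    forced = seasonal_series.mean(axis=0)       # ensemble mean ~ forced response
    internal = seasonal_series - forced         # residual internal variability
    return internal.std(axis=1, ddof=1)         # interannual std dev per member

rng = np.random.default_rng(1)
fake = rng.normal(loc=15.0, scale=1.2, size=(50, 30))   # 50 members, 30 summers
print(iav_per_member(fake).round(2))
```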


Paleobiology ◽  
1987 ◽  
Vol 13 (1) ◽  
pp. 100-107 ◽  
Author(s):  
Carl F. Koch

Few paleontological studies of species distribution in time and space have adequately considered the effects of sample size. Most species occur very infrequently, and therefore sample size effects may be large relative to the faunal patterns reported. Examination of 10 carefully compiled large data sets (each more than 1,000 occurrences) reveals that the species-occurrence frequency distribution of each fits the log series distribution well and therefore sample size effects can be predicted. Results show that, if the materials used in assembling a large data set are resampled, as many as 25% of the species will not be found a second time even if both samples are of the same size. If the two samples are of unequal size, then the larger sample may have as many as 70% unique species and the smaller sample no unique species. The implications of these values are important to studies of species richness, origination, and extinction patterns, and biogeographic phenomena such as endemism or province boundaries. I provide graphs showing the predicted sample size effects for a range of data set size, species richness, and relative data size. For data sets that do not fit the log series distribution well, I provide example calculations and equations which are usable without a large computer. If these graphs or equations are not used, then I suggest that species which occur infrequently be eliminated from consideration. Studies in which sample size effects are not considered should include sample size information in sufficient detail that other workers might make their own evaluation of observed faunal patterns.
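As orientation on the kind of prediction involved, Fisher's log series implies that the expected number of species in a sample of N occurrences is S = α ln(1 + N/α); solving for α from an observed (S, N) pair lets one project richness to other sample sizes, which is the sample-size effect discussed above. The sketch below does exactly that with made-up numbers; it is a standard log-series relationship, not a reproduction of the paper's graphs or equations.

```python
# Fit the log-series parameter alpha from observed richness S and sample size N,
# then predict expected richness at a different sample size.
import numpy as np
from scipy.optimize import brentq

def fit_alpha(S: float, N: float) -> float:
    # alpha solves S = alpha * ln(1 + N / alpha)
    return brentq(lambda a: a * np.log(1.0 + N / a) - S, 1e-6, N)

def expected_species(alpha: float, N: float) -> float:
    return alpha * np.log(1.0 + N / alpha)

alpha = fit_alpha(S=250, N=10_000)      # e.g. 250 species among 10,000 occurrences
print(round(alpha, 1), round(expected_species(alpha, 5_000), 1))
```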


Author(s):  
MICHEL BRUYNOOGHE

The clustering of large data sets is of great interest in fields such as pattern recognition, numerical taxonomy, and image or speech processing. The traditional Ascendant Hierarchical Clustering (AHC) algorithm cannot be run for sets of more than a few thousand elements. The reducible neighborhoods clustering algorithm, which is presented in this paper, overcomes the limits of the traditional hierarchical clustering algorithm by generating an exact hierarchy on a large data set. The theoretical justification of this algorithm is the so-called Bruynooghe reducibility principle, which lays down the condition under which the exact hierarchy may be constructed locally, by carrying out aggregations in restricted regions of the representation space. As for the Day and Edelsbrunner algorithm, the maximum theoretical time complexity of the reducible neighborhoods clustering algorithm is O(n² log n), regardless of the chosen clustering strategy. However, the reducible neighborhoods clustering algorithm uses the original data table, and its practical performance is far better than that of the Day and Edelsbrunner algorithm, thus allowing the hierarchical clustering of large data sets, i.e. those composed of more than 10 000 objects.
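The reducibility property mentioned above is also what makes nearest-neighbour-chain agglomeration exact for common linkage criteria (Ward, single, complete, average), which is how modern libraries build exact hierarchies on tens of thousands of points. The snippet below is only a scale illustration using SciPy's agglomerative clustering; it is not Bruynooghe's reducible neighborhoods algorithm.

```python
# Exact agglomerative hierarchy on a moderately large synthetic data set.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 8))           # synthetic "large" data set
Z = linkage(X, method="ward")             # exact hierarchy (reducible linkage)
labels = fcluster(Z, t=50, criterion="maxclust")
print(Z.shape, labels.max())
```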


2021 ◽  
Vol 99 (Supplement_1) ◽  
pp. 218-219
Author(s):  
Andres Fernando T Russi ◽  
Mike D Tokach ◽  
Jason C Woodworth ◽  
Joel M DeRouchey ◽  
Robert D Goodband ◽  
...  

Abstract The swine industry has been constantly evolving to select animals with improved performance traits and to minimize variation in body weight (BW) in order to meet packer specifications. Therefore, understanding variation presents an opportunity for producers to find strategies that could help reduce, manage, or deal with variation of pigs in a barn. A systematic review and meta-analysis was conducted by collecting data from multiple studies and available data sets in order to develop prediction equations for the coefficient of variation (CV) and standard deviation (SD) as a function of BW. Information regarding BW variation from 16 papers was recorded to provide approximately 204 data points. Together, these data included 117,268 individually weighed pigs with sample sizes that ranged from 104 to 4,108 pigs. A random-effects model with study as a random effect was developed. Observations were weighted using sample size as an estimate of precision in the analysis, so that larger data sets accounted for increased accuracy in the model. Regression equations were developed using the nlme package of R to determine the relationship between BW and its variation. Polynomial regression analysis was conducted separately for each variation measurement. When CV was reported in the data set, SD was calculated, and vice versa. The resulting prediction equations were: CV (%) = 20.04 − 0.135 × BW + 0.00043 × BW², R² = 0.79; SD = 0.41 + 0.150 × BW − 0.00041 × BW², R² = 0.95. These equations suggest that there is evidence for a decreasing quadratic relationship between the mean CV of a population and the BW of pigs, whereby the rate of decrease becomes smaller as mean pig BW increases from birth to market. Conversely, the rate of increase of the SD of a population of pigs becomes smaller as mean pig BW increases from birth to market.
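For convenience, the two reported prediction equations can be transcribed directly into code; BW is body weight in the units used by the authors (assumed here to be kg).

```python
# Direct transcription of the prediction equations reported above.
def predicted_cv(bw: float) -> float:
    """Coefficient of variation (%) of BW within a population."""
    return 20.04 - 0.135 * bw + 0.00043 * bw**2

def predicted_sd(bw: float) -> float:
    """Standard deviation of BW within a population."""
    return 0.41 + 0.150 * bw - 0.00041 * bw**2

for bw in (25, 50, 100, 130):
    print(bw, round(predicted_cv(bw), 2), round(predicted_sd(bw), 2))
```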


Author(s):  
Lior Shamir

Abstract Several recent observations using large data sets of galaxies showed a non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey. The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that the two data sets exhibit a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to a cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\rm o},\delta=47^{\rm o})$ and is well within the $1\sigma$ error range of the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^{\rm o},\delta=61^{\rm o})$.
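The dipole fit referred to above can be illustrated with a toy calculation: for a candidate axis, each galaxy contributes the cosine of its angular distance from that axis, and the dipole amplitude is the least-squares slope of the observed spin signs (±1) against those cosines. The random toy data and the amplitude estimator are assumptions for illustration; this is not the author's analysis pipeline.

```python
# Toy dipole (cosine-dependence) fit to galaxy spin directions; angles in radians.
import numpy as np

def angular_cos(ra, dec, ra0, dec0):
    # cosine of the angle between sky positions and a candidate dipole axis
    return (np.sin(dec) * np.sin(dec0)
            + np.cos(dec) * np.cos(dec0) * np.cos(ra - ra0))

def dipole_amplitude(ra, dec, spin, ra0, dec0):
    x = angular_cos(ra, dec, ra0, dec0)
    # least-squares slope of spin signs (+/-1) on cos(theta); size measures asymmetry
    return float(np.sum(x * spin) / np.sum(x * x))

rng = np.random.default_rng(2)
ra = rng.uniform(0, 2 * np.pi, 8700)
dec = np.arcsin(rng.uniform(-1, 1, 8700))     # uniform on the sphere
spin = rng.choice([-1, 1], size=8700)
print(dipole_amplitude(ra, dec, spin, np.radians(78), np.radians(47)))
```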


2020 ◽  
Vol 6 ◽  
Author(s):  
Jaime de Miguel Rodríguez ◽  
Maria Eugenia Villafañe ◽  
Luka Piškorec ◽  
Fernando Sancho Caparrini

Abstract This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features on a given building type. In the experiments that are described in this paper, more than 150 k input samples belonging to two building types have been processed during the training of a VAE model. The main contribution of this paper has been to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.
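Because the generation step rests on decoding interpolated latent locations, a minimal VAE sketch may help fix ideas: encode two samples, interpolate linearly between their latent means, and decode the intermediate points as hybrid outputs. The flattened 1024-dimensional "connectivity map" input, the layer sizes, and the use of PyTorch are assumptions for illustration only and do not reflect the authors' architecture.

```python
# Minimal VAE with latent-space interpolation (illustrative only).
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, n_in=1024, n_hidden=256, n_latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU())
        self.mu = nn.Linear(n_hidden, n_latent)
        self.logvar = nn.Linear(n_hidden, n_latent)
        self.dec = nn.Sequential(nn.Linear(n_latent, n_hidden), nn.ReLU(),
                                 nn.Linear(n_hidden, n_in), nn.Sigmoid())

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def decode(self, z):
        return self.dec(z)

def interpolate(model, x_a, x_b, steps=5):
    # decode points on the straight line between the two latent means
    z_a, _ = model.encode(x_a)
    z_b, _ = model.encode(x_b)
    ts = torch.linspace(0, 1, steps).unsqueeze(1)
    return model.decode((1 - ts) * z_a + ts * z_b)

model = VAE()
x_a, x_b = torch.rand(1, 1024), torch.rand(1, 1024)
print(interpolate(model, x_a, x_b).shape)   # torch.Size([5, 1024])
```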


1995 ◽  
Vol 3 (3) ◽  
pp. 133-142 ◽  
Author(s):  
M. Hana ◽  
W.F. McClure ◽  
T.B. Whitaker ◽  
M. White ◽  
D.R. Bahler

Two artificial neural network models were used to estimate the nicotine in tobacco: (i) a back-propagation network and (ii) a linear network. The back-propagation network consisted of an input layer, an output layer and one hidden layer. The linear network consisted of an input layer and an output layer. Both networks used the generalised delta rule for learning. The performance of both networks was compared to the multiple linear regression (MLR) method of calibration. The nicotine content in tobacco samples was estimated for two different data sets. Data set A contained 110 near infrared (NIR) spectra, each consisting of reflected energy at eight wavelengths. Data set B consisted of 200 NIR spectra, with each spectrum having 840 spectral data points. The fast Fourier transform was applied to data set B in order to compress each spectrum into 13 Fourier coefficients. For data set A, the MLR model gave the best results, followed by the back-propagation network and then the linear network. The true performance of the MLR model was better than that of the back-propagation and linear networks by 14.0% and 18.1%, respectively. For data set B, the back-propagation network gave the best result, followed by MLR and the linear network. Both the linear network and MLR models gave almost the same results. The true performance of the back-propagation network model was better than that of the MLR and linear network models by 35.14%.
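The Fourier compression step can be sketched as follows: take the FFT of each 840-point spectrum and keep the lowest 13 coefficients as features. Whether the original work retained magnitudes, real/imaginary pairs, or applied a particular normalisation is not stated in the abstract, so the choice of magnitudes below is an assumption.

```python
# Compress each NIR spectrum to its lowest-frequency Fourier coefficients.
import numpy as np

def compress_spectrum(spectrum: np.ndarray, n_coeff: int = 13) -> np.ndarray:
    coeffs = np.fft.rfft(spectrum)          # real-input FFT
    return np.abs(coeffs[:n_coeff])         # keep the lowest-frequency coefficients

rng = np.random.default_rng(3)
spectra = rng.random((200, 840))            # 200 spectra, 840 points each (as in data set B)
features = np.vstack([compress_spectrum(s) for s in spectra])
print(features.shape)                       # (200, 13)
```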


2006 ◽  
Vol 39 (2) ◽  
pp. 262-266 ◽  
Author(s):  
R. J. Davies

Synchrotron sources offer high-brilliance X-ray beams which are ideal for spatially and time-resolved studies. Large amounts of wide- and small-angle X-ray scattering data can now be generated rapidly, for example, during routine scanning experiments. Consequently, the analysis of the large data sets produced has become a complex and pressing issue. Even relatively simple analyses become difficult when a single data set can contain many thousands of individual diffraction patterns. This article reports on a new software application for the automated analysis of scattering intensity profiles. It is capable of batch-processing thousands of individual data files without user intervention. Diffraction data can be fitted using a combination of background functions and non-linear peak functions. To complement the batch-wise operation mode, the software includes several specialist algorithms to ensure that the results obtained are reliable. These include peak-tracking, artefact removal, function elimination and spread-estimate fitting. In addition to non-linear fitting, the software can also calculate integrated intensities and selected orientation parameters.
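As a rough sketch of this kind of automated profile fitting, the code below fits a Gaussian peak on a linear background to each 1D intensity profile in a batch, with a simple guard so that a failed fit does not halt the run. It is a generic illustration using SciPy, not the application's actual fitting engine or its peak/background function set.

```python
# Batch fitting of a Gaussian peak plus linear background to intensity profiles.
import numpy as np
from scipy.optimize import curve_fit

def peak_model(x, amp, centre, width, slope, offset):
    return amp * np.exp(-0.5 * ((x - centre) / width) ** 2) + slope * x + offset

def fit_profiles(x, profiles):
    results = []
    for y in profiles:
        p0 = [y.max() - y.min(), x[np.argmax(y)], (x[-1] - x[0]) / 10, 0.0, y.min()]
        try:
            popt, _ = curve_fit(peak_model, x, y, p0=p0, maxfev=5000)
            results.append(popt)
        except RuntimeError:              # fit did not converge; record a placeholder
            results.append([np.nan] * 5)
    return np.array(results)

x = np.linspace(0, 10, 500)
rng = np.random.default_rng(4)
batch = [peak_model(x, 5, 4 + 0.1 * i, 0.3, 0.02, 1) + rng.normal(0, 0.05, x.size)
         for i in range(100)]
print(fit_profiles(x, batch).shape)       # (100, 5)
```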

