The Generality of the Underlying Dimensions of the ODDI Continuing Learning Inventory

1989 ◽  
Vol 40 (1) ◽  
pp. 43-51 ◽  
Author(s):  
Jack E. Six

The purpose of this study was to determine whether the three empirically derived factors of the Oddi Continuing Learning Inventory (OCLI) remain stable across study samples. Data analysis involved generating pairs of factor scores for one data set by using the factor-score coefficients of two other data sets and then correlating the pairs of factor scores. High positive correlations (r ≥ .93) indicated that the three derived factors in the rotated factor structure matrix matched those reported by Oddi (1984). The results suggest that the underlying dimensions of the OCLI do not break down under different study conditions.
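The cross-sample procedure described here can be illustrated with a short sketch. The item counts, sample size, and coefficient matrices below are placeholders rather than the study's data: standardized item responses from one sample are scored with factor-score coefficient matrices estimated from two other samples, and the matched factor scores are correlated.

```python
import numpy as np

# Hypothetical inputs: z-scored OCLI item responses for one sample (n x p)
# and factor-score coefficient matrices (p x 3) estimated from two other samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 24))               # placeholder item data
W_a = rng.standard_normal((24, 3))               # coefficients from sample A
W_b = W_a + 0.05 * rng.standard_normal((24, 3))  # coefficients from sample B

scores_a = X @ W_a                               # factor scores via sample-A coefficients
scores_b = X @ W_b                               # factor scores via sample-B coefficients

# Correlate each matched pair of factor scores; high r suggests stable factors.
for k in range(3):
    r = np.corrcoef(scores_a[:, k], scores_b[:, k])[0, 1]
    print(f"Factor {k + 1}: r = {r:.3f}")
```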

Endocrinology ◽  
2019 ◽  
Vol 160 (10) ◽  
pp. 2395-2400 ◽  
Author(s):  
David J Handelsman ◽  
Lam P Ly

Abstract Hormone assay results below the assay detection limit (DL) can introduce bias into quantitative analysis. Although complex maximum likelihood estimation methods exist, they are not widely used, whereas simple substitution methods are often used ad hoc to replace the undetectable (UD) results with numeric values to facilitate data analysis with the full data set. However, the bias of substitution methods for steroid measurements has not been reported. Using a large data set (n = 2896) of serum testosterone (T), DHT, and estradiol (E2) concentrations from healthy men, we created modified data sets with increasing proportions of UD samples (≤40%) to which we applied five different substitution methods (deleting UD samples as missing, or substituting UD samples with DL, DL/√2, DL/2, or 0) to calculate univariate descriptive statistics (mean, SD) or bivariate correlations. For all three steroids and for univariate as well as bivariate statistics, bias increased progressively with increasing proportion of UD samples. Bias was worst when UD samples were deleted or substituted with 0 and least when UD samples were substituted with DL/√2, whereas the other methods (DL or DL/2) displayed intermediate bias. Similar findings were replicated in randomly drawn small subsets of 25, 50, and 100. Hence, we propose that in steroid hormone data with ≤40% UD samples, substituting UD values with DL/√2 is a simple, versatile, and reasonably accurate method to minimize left-censoring bias, allowing for data analysis with the full data set.
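A minimal sketch of the substitution strategies compared above, applied to simulated data; the distribution parameters, detection limit, and sample size are assumptions for illustration, not values from the study.

```python
import numpy as np

def substitute_below_dl(values, dl, method="dl_sqrt2"):
    """Handle below-detection-limit (UD) results before descriptive statistics.

    Methods mirror those compared in the abstract: delete UD samples, or
    substitute them with DL, DL/sqrt(2), DL/2, or 0.
    """
    v = np.asarray(values, dtype=float)
    below = v < dl
    if method == "delete":
        return v[~below]
    fills = {"dl": dl, "dl_sqrt2": dl / np.sqrt(2), "dl_half": dl / 2, "zero": 0.0}
    out = v.copy()
    out[below] = fills[method]
    return out

# Example: simulated testosterone-like concentrations with an assumed DL
# chosen so that roughly 20% of samples are undetectable.
rng = np.random.default_rng(1)
t = rng.lognormal(mean=2.7, sigma=0.4, size=500)
dl = np.percentile(t, 20)
for m in ("delete", "dl", "dl_sqrt2", "dl_half", "zero"):
    filled = substitute_below_dl(t, dl, m)
    print(f"{m:>9}: mean = {filled.mean():6.2f}, SD = {filled.std(ddof=1):5.2f}")
```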


SPE Journal ◽  
2017 ◽  
Vol 23 (03) ◽  
pp. 719-736 ◽  
Author(s):  
Quan Cai ◽  
Wei Yu ◽  
Hwa Chi Liang ◽  
Jenn-Tai Liang ◽  
Suojin Wang ◽  
...  

Summary The oil-and-gas industry is entering an era of “big data” because of the huge number of wells drilled with the rapid development of unconventional oil-and-gas reservoirs during the past decade. The massive amount of data generated presents a great opportunity for the industry to use data-analysis tools to help make informed decisions. The main challenge is the lack of the application of effective and efficient data-analysis tools to analyze and extract useful information for the decision-making process from the enormous amount of data available. In developing tight shale reservoirs, it is critical to have an optimal drilling strategy that minimizes the risk of drilling in areas that would result in low-yield wells. The objective of this study is to develop an effective data-analysis tool capable of dealing with big and complicated data sets to identify hot zones in tight shale reservoirs with the potential to yield highly productive wells. The proposed tool is developed on the basis of nonparametric smoothing models, which are superior to the traditional multiple-linear-regression (MLR) models in both predictive power and the ability to deal with nonlinear, higher-order variable interactions. This data-analysis tool is capable of handling one response variable and multiple predictor variables. To validate our tool, we used two real data sets—one with 249 tight oil horizontal wells from the Middle Bakken and the other with 2,064 shale gas horizontal wells from the Marcellus Shale. Results from the two case studies revealed that our tool not only achieves much better predictive power than the traditional MLR models in identifying hot zones in tight shale reservoirs but also provides guidance on developing optimal drilling and completion strategies (e.g., well length and depth, amount of proppant and water injected). By comparing results from the two data sets, we found that, with the big data set (2,064 Marcellus wells) and only four predictor variables, our tool can achieve model performance similar to that obtained with the small data set (249 Bakken wells) and six predictor variables. This implies that, for big data sets, even with a limited number of available predictor variables, our tool can still be very effective in identifying hot zones that would yield highly productive wells. The data sets that we have access to in this study contain very limited completion, geological, and petrophysical information. Results from this study demonstrated that the data-analysis tool is powerful and flexible enough to take advantage of any additional engineering and geology data, allowing operators to gain insights on the impact of these factors on well performance.
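The comparison between nonparametric smoothing and MLR can be sketched as follows. The paper's specific smoothing models are not reproduced here; this illustration substitutes a k-nearest-neighbors smoother from scikit-learn, and the well variables and response are synthetic placeholders rather than the Bakken or Marcellus data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical well data: four predictors (e.g., lateral length, proppant mass,
# water volume, depth) and a production response with nonlinear interactions.
rng = np.random.default_rng(2)
X = rng.uniform(size=(300, 4))
y = 5 * np.sin(2 * np.pi * X[:, 0]) + 3 * X[:, 1] * X[:, 2] + rng.normal(0, 0.5, 300)

mlr = LinearRegression()
smoother = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=15))

# Cross-validated fit: the nonparametric smoother captures the nonlinear
# structure that a linear model misses.
for name, model in [("MLR", mlr), ("nonparametric smoother", smoother)]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {r2:.2f}")
```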


2003 ◽  
Vol 3 (4) ◽  
pp. 3625-3657
Author(s):  
M. Seifert ◽  
J. Ström ◽  
R. Krejci ◽  
A. Minikin ◽  
A. Petzold ◽  
...  

Abstract. In situ measurements of the partitioning of aerosol particles within cirrus clouds were used to investigate aerosol-cloud interactions in ice clouds. The number density of interstitial aerosol particles (non-activated particles in between the cirrus crystals) was compared to the number density of cirrus crystal residuals. The data were obtained during the two INCA (Interhemispheric Differences in Cirrus Properties from Anthropogenic Emissions) campaigns, performed in the Southern Hemisphere (SH) and Northern Hemisphere (NH) midlatitudes. Different aerosol-cirrus interactions can be linked to the different stages of the cirrus lifecycle. Cloud formation is linked to positive correlations between the number density of interstitial aerosol (Nint) and crystal residuals (Ncvi), whereas the correlations are smaller or even negative in a dissolving cloud. Unlike warm clouds, where the number density of cloud droplets is positively related to the aerosol number density, we observed a rather complex relationship when expressing Ncvi as a function of Nint for forming clouds. The data sets are similar in that they both show local maxima in the Nint range 100 to 200 cm−3, with the SH maximum shifted towards the higher value. For lower number densities, Nint and Ncvi are positively related. The slopes emerging from the data suggest that a tenfold increase in the aerosol number density corresponds to a 3- to 4-fold increase in the crystal number density. As Nint increases beyond ca. 100 to 200 cm−3, the mean crystal number density decreases at about the same rate for both data sets. For much higher aerosol number densities, only present in the NH data set, the mean Ncvi remains low. The situation for dissolving clouds suggests two alternative interactions between aerosols and cirrus: either evaporating clouds are associated with a source of aerosol particles, or air pollution (high aerosol number density) retards evaporation rates.
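The reported scaling on the forming-cloud branch corresponds to a power-law slope in log-log space of roughly log10(3) ≈ 0.48 to log10(4) ≈ 0.60. The short sketch below fits such a slope to synthetic Nint/Ncvi values; the assumed slope of 0.55 and the scatter are illustrative only, not the campaign data.

```python
import numpy as np

# A tenfold rise in interstitial aerosol (Nint) producing a 3- to 4-fold rise in
# crystal residuals (Ncvi) corresponds to a log-log slope of log10(3)..log10(4).
rng = np.random.default_rng(3)
n_int = np.logspace(0, 2.3, 50)                                 # ~1 to 200 cm^-3
n_cvi = 0.5 * n_int**0.55 * rng.lognormal(0, 0.1, n_int.size)   # assumed slope 0.55

slope, intercept = np.polyfit(np.log10(n_int), np.log10(n_cvi), 1)
print(f"fitted slope = {slope:.2f}; tenfold Nint increase -> "
      f"{10**slope:.1f}-fold Ncvi increase")
```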


1986 ◽  
Vol 34 (4) ◽  
pp. 535 ◽  
Author(s):  
RH Crozier ◽  
P Pamilo ◽  
RW Taylor ◽  
YC Crozier

Genic and morphological variation were compared for 17 putative Rhytidoponera species and a species of the related genus Heteroponera, by use of an allozyme data set and one based on morphometric, surface sculpture, and pilosity characters. Each data set was considered in three versions: the raw data, principal factor scores normalized to the appropriate eigenvectors, and these scores range-coded. The agreement between these data sets, and similar sets derived from published vertebrate studies, was gauged by means of correlation coefficients between distance matrices based on them, calculated by a jack-knife procedure. In all cases, the raw allozyme data sets gave the highest correlation with the morphological sets, but none of the treatments of the morphological data was clearly superior in this regard to the others. For the ant data, congruence between the two types of data was also examined by comparing the branching orders of dendrograms (Wagner and REML), by a new test employing distributions based on the differences between randomly generated branching orders and a reference dendrogram. According to this test, the morphological dendrogram based on range-coded principal-factor scores was significantly more similar to that derived from the raw allozyme data than were those based on the other two treatments of the data. Differences in chromosome number do not correlate well with genic and morphological ones, which indicates that the speed of karyotype change in this genus has been highly variable. Some OTUs showed duplicate-locus expression for IDH, and clustering in the allozyme-based dendrograms occurred on the basis of IDH duplicate-locus expression pattern. The two 'victoriae' populations studied cluster closely with the metallica group on the morphology-based dendrograms, in agreement with conventional views that 'victoriae' is very close to 'metallica', but diverge markedly when allozymes are considered, which indicates that the morphological resemblance is probably due to convergence. The large genetic distance between these 'victoriae' populations indicates the likely presence of sibling species. R. 'tasmaniensis' populations, in contrast, cluster strongly with 'metallica' in both morphology- and allozyme-based dendrograms. The marked divergence of scabra from other large species in the allozyme-based dendrograms indicates that its large body size has been derived independently.
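The matrix-correlation step can be sketched as follows. This is not the authors' exact procedure; it is a generic leave-one-taxon-out jackknife of a Pearson correlation between two distance matrices, with synthetic allozyme and morphometric character tables standing in for the published data.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def matrix_correlation(d1, d2):
    """Pearson correlation between the off-diagonal elements of two distance matrices."""
    iu = np.triu_indices_from(d1, k=1)
    return np.corrcoef(d1[iu], d2[iu])[0, 1]

def jackknife_matrix_correlation(x_allozyme, x_morph):
    """Leave-one-taxon-out jackknife of the distance-matrix correlation."""
    d1, d2 = squareform(pdist(x_allozyme)), squareform(pdist(x_morph))
    n = d1.shape[0]
    full = matrix_correlation(d1, d2)
    pseudo = []
    for i in range(n):
        keep = np.delete(np.arange(n), i)
        sub = matrix_correlation(d1[np.ix_(keep, keep)], d2[np.ix_(keep, keep)])
        pseudo.append(n * full - (n - 1) * sub)      # jackknife pseudo-values
    pseudo = np.array(pseudo)
    return full, pseudo.std(ddof=1) / np.sqrt(n)

# Hypothetical data: 18 OTUs scored on allozyme and morphological characters.
rng = np.random.default_rng(4)
allozyme = rng.normal(size=(18, 12))
morph = allozyme[:, :6] + rng.normal(0, 0.5, size=(18, 6))
r, se = jackknife_matrix_correlation(allozyme, morph)
print(f"matrix correlation r = {r:.2f} (jackknife SE = {se:.2f})")
```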


2020 ◽  
Vol 26 (6) ◽  
pp. 576-586 ◽  
Author(s):  
Andrew M. Kiselica ◽  
Troy A. Webber ◽  
Jared F. Benge

Abstract. Objective: The goals of this study were to (1) specify the factor structure of the Uniform Dataset 3.0 neuropsychological battery (UDS3NB) in cognitively unimpaired older adults, (2) establish measurement invariance for this model, and (3) create a normative calculator for factor scores. Methods: Data from 2520 cognitively intact older adults were submitted to confirmatory factor analyses and invariance testing across sex, age, and education. Additionally, a subsample of this dataset was used to examine invariance over time using 1-year follow-up data (n = 1061). With the establishment of metric invariance of the UDS3NB measures, factor scores could be extracted uniformly for the entire normative sample. Finally, a calculator was created for deriving demographically adjusted factor scores. Results: A higher-order model of cognition yielded the best fit to the data, χ2(47) = 385.18, p < .001, comparative fit index = .962, Tucker-Lewis index = .947, root mean square error of approximation = .054, and standardized root mean residual = .036. This model included a higher-order general cognitive abilities factor, as well as lower-order processing speed/executive, visual, attention, language, and memory factors. Age, sex, and education were significantly associated with factor score performance, evidencing a need for demographic correction when interpreting factor scores. A user-friendly Excel calculator was created to accomplish this goal and is available in the online supplementary materials. Conclusions: The UDS3NB is best characterized by a higher-order factor structure. Factor scores demonstrate at least metric invariance across time and demographic groups. Methods for calculating these factor scores are provided.
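The demographic adjustment behind such a normative calculator is commonly implemented as regression-based norming. The sketch below illustrates the idea with simulated values; the predictors, coefficients, and the "memory factor" here are placeholders, not the published calculator's coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simulated normative sample: age, sex, education, and one factor score.
rng = np.random.default_rng(5)
n = 2520
age = rng.uniform(60, 90, n)
sex = rng.integers(0, 2, n)            # 0 = male, 1 = female (assumed coding)
educ = rng.uniform(8, 20, n)
memory_factor = 0.5 - 0.02 * (age - 70) + 0.03 * (educ - 16) + rng.normal(0, 1, n)

# Regression-based norming: predict the expected score from demographics,
# then express an observed score as a residual-based z-score.
X = np.column_stack([age, sex, educ])
norm_model = LinearRegression().fit(X, memory_factor)
resid_sd = np.std(memory_factor - norm_model.predict(X), ddof=X.shape[1] + 1)

def adjusted_z(score, age, sex, educ):
    """Demographically adjusted z-score: (observed - predicted) / residual SD."""
    expected = norm_model.predict([[age, sex, educ]])[0]
    return (score - expected) / resid_sd

print(f"adjusted z = {adjusted_z(0.2, 82, 1, 12):.2f}")
```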


2019 ◽  
Author(s):  
Y-h. Taguchi

Abstract. Multiomics data analysis is a central issue in genomics science. Despite this, there are no well-defined methods that can integrate multiomics data sets, which are formatted as matrices of different sizes. In this paper, I propose the use of tensor-decomposition-based unsupervised feature extraction as a data mining tool for multiomics data sets. It can successfully integrate miRNA expression, mRNA expression, and proteome data, which were used as a demonstration example for DIABLO, a recently proposed advanced method for the integrated analysis of multiomics data sets.
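The core operation, tensor decomposition, can be sketched generically. This is not the author's exact tensor construction (the real omics layers have different feature dimensions); the sketch below simply shows a higher-order SVD of a three-way samples × features × layers array, from which sample and feature loadings are read off as unsupervised features.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding of a 3-way tensor into a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd(tensor, ranks):
    """Higher-order SVD: factor matrices from the unfoldings plus a core tensor."""
    factors = [np.linalg.svd(unfold(tensor, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = tensor
    for m, u in enumerate(factors):
        # Mode-m product of the core with u^T.
        core = np.moveaxis(np.tensordot(u.T, np.moveaxis(core, m, 0), axes=1), 0, m)
    return core, factors

# Hypothetical integrated tensor: samples x features x omics layers.
rng = np.random.default_rng(6)
x = rng.normal(size=(50, 200, 3))
core, (u_samples, u_features, u_omics) = hosvd(x, ranks=(5, 5, 3))
print(u_samples.shape, u_features.shape, u_omics.shape)   # (50, 5) (200, 5) (3, 3)
```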


2021 ◽  
Vol 13 (20) ◽  
pp. 11459
Author(s):  
Szu-Chuang Li ◽  
Yi-Wen Chen ◽  
Yennun Huang

The development of big data analysis technologies has changed how organizations work. Tech giants, such as Google and Facebook, are well positioned because they possess not only big data sets but also the in-house capability to analyze them. For small and medium-sized enterprises (SMEs), which have limited resources, capacity, and a relatively small collection of data, the ability to conduct data analysis collaboratively is key. Personal data protection regulations have become stricter due to incidents of private data being leaked, making it more difficult for SMEs to perform interorganizational data analysis. This problem can be resolved by anonymizing the data such that reidentifying an individual is no longer a concern or by deploying technical procedures that enable interorganizational data analysis without the exchange of actual data, such as data deidentification, data synthesis, and federated learning. Herein, we compared the technical options and their compliance with personal data protection regulations from several countries and regions. Using the EU’s GDPR (General Data Protection Regulation) as the main point of reference, we also reviewed technical studies, legislative studies, related regulations, and government-sponsored reports from various countries and regions. Aligning the technical descriptions with the government regulations and guidelines revealed that the solutions are compliant with the personal data protection regulations. Current regulations require “reasonable” privacy preservation efforts from data controllers; potential attackers are not assumed to be experts with knowledge of the target data set. This means that relevant requirements can be fulfilled without considerably sacrificing data utility. However, the potential existence of an extremely knowledgeable adversary when the stakes of data leakage are high still needs to be considered carefully.


2015 ◽  
Vol 8 (11) ◽  
pp. 12383-12431
Author(s):  
F. Mercier ◽  
A. Chazottes ◽  
L. Barthès ◽  
C. Mallet

Abstract. This paper presents a novel approach for retrieving vertical raindrop size distribution (DSD) profiles and vertical winds during light rain events. It consists of coupling K-band Doppler spectra and ground disdrometer measurements (raindrop fluxes) in a 2-D numerical model propagating the DSD from the clouds to the ground level. The coupling is made via a 4D-VAR data assimilation algorithm. The model is, up to now, limited to the fall of droplets under gravity, modulated by the effects of vertical winds. Since evaporation, coalescence/break-up, and horizontal air motion are not taken into account, we limit the study to light, stratiform rain events in which these phenomena appear negligible. We first use simulated data sets (a data assimilation twin experiment) to show that the algorithm is able to retrieve the DSD profiles and vertical winds. It also demonstrates the ability of the algorithm to deal with atmospheric turbulence (broadening of the Doppler spectra) and instrumental noise. The method is then applied to a real case study from south-west France in autumn 2013. The data set, collected during a long, quiet event (6 h duration, rain rate between 2 and 7 mm h−1), comes from an optical disdrometer and a 24 GHz vertically pointing Doppler radar. We show that the algorithm is able to explain the observations and supplies DSD and vertical wind profiles that are realistic for such a rain event. A perspective for this study is to apply it to extended data sets for a more thorough validation. Other data sets would also help to parameterize additional phenomena in the model (evaporation, coalescence/break-up), to apply the algorithm to convective rain, and to evaluate the adequacy of the model's parameterization.
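As a rough illustration of the variational coupling, the toy sketch below minimizes a 4D-VAR-style cost function (a background term plus misfits to observations at several times) for a simple linear propagator. The propagator, noise levels, and state dimension are assumptions for illustration; the paper's actual forward model propagates DSDs under gravity and vertical wind.

```python
import numpy as np
from scipy.optimize import minimize

# Toy variational assimilation: estimate an initial state x0 that, propagated
# by a simple linear model, matches observations at later times.
rng = np.random.default_rng(7)
n, n_steps = 20, 5
M = np.eye(n, k=1) * 0.3 + np.eye(n) * 0.7           # assumed linear propagator
x_true = rng.normal(size=n)
obs = [np.linalg.matrix_power(M, t) @ x_true + rng.normal(0, 0.05, n)
       for t in range(1, n_steps + 1)]
x_b = np.zeros(n)                                     # background (prior) state

def cost(x0, sigma_b=1.0, sigma_o=0.05):
    j_b = np.sum((x0 - x_b) ** 2) / sigma_b**2        # background term
    j_o = sum(np.sum((np.linalg.matrix_power(M, t + 1) @ x0 - y) ** 2) / sigma_o**2
              for t, y in enumerate(obs))             # observation terms
    return 0.5 * (j_b + j_o)

result = minimize(cost, x_b, method="L-BFGS-B")
print("RMSE of retrieved state:", np.sqrt(np.mean((result.x - x_true) ** 2)))
```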


2021 ◽  
Author(s):  
Or Mordechay Bialik ◽  
Emilia Jarochowska ◽  
Michal Grossowicz

Ordination is a family of multivariate exploratory data analysis methods. With the advent of high-throughput data acquisition protocols, community databases, and multiproxy studies, the use of ordination in Earth sciences has snowballed. As data management and analytical tools expand, this growing body of knowledge opens new possibilities for meta-analyses and data mining across studies. This requires the analyses to be chosen adequately for the character of Earth science data, including pre-treatment consistent with the precision and accuracy of the variables, as well as appropriate documentation. To investigate the current situation in Earth sciences, we surveyed 174 ordination analyses in 163 publications in the fields of geochemistry, sedimentology, and palaeoenvironmental reconstruction and monitoring. We focussed on studies using Principal Component Analysis (PCA), Non-Metric Multidimensional Scaling (NMDS) and Detrended Correspondence Analysis (DCA).

PCA was the most ubiquitous type of analysis (84%), with the other two accounting for ca. 12% each. Of 128 uses of PCA, only 5 included a test for normality, and most of these tests were not applied or documented correctly. Common problems include: (1) not providing information on the dimensions of the analysed matrix (16% of cases); (2) using a larger number of variables than observations (24 cases); (3) not documenting the distance metric used in NMDS (55% of cases); and (4) lack of information on the software used (38% of cases). The majority (53%) of surveyed studies did not provide the data used for analysis at all, and a further 35% provided data sets in a format that does not allow immediate, error-free reuse, e.g. as a data table directly in the article text or in a PDF appendix. The “golden standard” of placing a curated data set in an open access repository was followed by only 6 (3%) of the analyses. Among analyses which reported using code-based statistical environments such as R Software, SAS or SPSS, none provided the code that would allow reproducing the analyses.

Geochemical and Earth science data sets require expert knowledge which should support analytical decisions and interpretations. Data analysis skills attract students to Earth sciences study programmes and offer a viable research alternative when field- or lab-based work is limited. However, many study curricula and publishing processes have not yet endorsed this methodological progress, leading to situations where mentors, reviewers and editors cannot offer quality assurance for the use of ordination methods. We provide a review of solutions and annotated R Software code for PCA, NMDS and DCA of geochemical data sets in the freeware R Software environment, encouraging the community to reuse and further develop a reproducible ordination workflow.
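As a minimal illustration of the documentation the survey calls for (matrix dimensions, scaling choices, observations exceeding variables, software version), the sketch below runs a PCA on a synthetic geochemical table. The authors' own supplement provides annotated R code, so this Python stand-in is illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical geochemical table: 40 samples (rows) x 10 element concentrations
# (columns). Documenting the dimensions, the scaling applied, and the software
# used is exactly the metadata the survey found missing in many analyses.
rng = np.random.default_rng(8)
data = rng.lognormal(mean=0, sigma=0.5, size=(40, 10))

assert data.shape[0] > data.shape[1], "need more observations than variables"
scaled = StandardScaler().fit_transform(np.log10(data))   # log-transform + scaling documented
pca = PCA(n_components=3).fit(scaled)
print("matrix dimensions:", data.shape)
print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 2))
```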


Geophysics ◽  
2021 ◽  
pp. 1-103
Author(s):  
Jiho Park ◽  
Jihun Choi ◽  
Soon Jee Seol ◽  
Joongmoo Byun ◽  
Young Kim

Deep learning (DL) methods have recently been introduced for seismic signal processing, and many researchers have adopted these novel techniques to construct DL models for seismic data reconstruction. The performance of DL-based methods depends heavily on what is learned from the training data. We focus on constructing a DL model that well reflects the features of the target data sets. The main goal is to integrate DL with an intuitive data analysis approach that compares similar patterns prior to the DL training stage. We have developed a sequential method consisting of two stages: (i) analyzing the training and target data sets simultaneously to determine a target-informed training set and (ii) training the DL model with this training data set to effectively interpolate the seismic data. Here, we introduce convolutional autoencoder t-distributed stochastic neighbor embedding (CAE t-SNE) analysis, which can provide insight into the results of interpolation through the analysis of both the training and target data sets prior to DL model training. The proposed method was tested with synthetic and field data. Dense seismic gathers (e.g., common-shot gathers; CSGs) were used as a labeled training data set, and relatively sparse seismic gathers (e.g., common-receiver gathers; CRGs) were reconstructed in both cases. The reconstructed results and SNRs demonstrated that the training data can be efficiently selected using CAE t-SNE analysis and that the spatial aliasing of CRGs was successfully alleviated by the DL model trained with these data, which contain the target features. These results imply that data analysis for selecting a target-informed training set is very important for successful DL interpolation. Additionally, the proposed analysis method can also be applied to investigate the similarities between training and target data sets for other DL-based seismic data reconstruction tasks.
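A compact sketch of the CAE t-SNE idea, not the authors' architecture or hyperparameters: small seismic patches (random placeholders here) are encoded with a convolutional autoencoder, and the latent codes of candidate training gathers and target gathers are embedded with t-SNE to inspect how well they overlap before training the interpolation model.

```python
import torch
import torch.nn as nn
from sklearn.manifold import TSNE

class CAE(nn.Module):
    """Minimal convolutional autoencoder for 32x32 single-channel patches."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(16, 8, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 3, stride=2, padding=1, output_padding=1))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

# Hypothetical patches: 64 training (CSG-like) and 64 target (CRG-like) patches.
patches = torch.randn(128, 1, 32, 32)
model = CAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(20):                               # brief reconstruction training
    recon, _ = model(patches)
    loss = nn.functional.mse_loss(recon, patches)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Embed the latent codes in 2-D to compare training vs target patches.
with torch.no_grad():
    _, latent = model(patches)
embedding = TSNE(n_components=2, perplexity=30).fit_transform(latent.flatten(1).numpy())
print(embedding.shape)                            # (128, 2)
```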

