A tribenzylidenemethane–tantalum compound: some experiences with 'inversion twinning'

1996
Vol 52 (3)
pp. 465-470
Author(s):  
W. P. Schaefer ◽  
R. E. Marsh ◽  
G. Rodriguez ◽  
G. C. Bazan

The six-electron-donating ligand tribenzylidenemethandiide has been used to form a tantalum (group 5) mimic, (η5-cyclopentadienyl)(η4-tribenzylidenemethandiide)dimethyltantalum, of a group 4 bent metallocene. The material crystallizes with two molecules in the asymmetric unit in quite different packing arrangements, although the overall structures of the two are similar. The Cp and methyl ligands are disordered about a threefold axis. Crystal data: [Ta{(C7H6)3C}(C5H5)(CH3)2], trigonal P31c, with a = 12.681 (3), c = 16.124 (5) Å, V = 2245.5 (7) Å3, T = 293 K, Z = 4, Mr = 558.47, Dx = 1.65 g cm−3, F(000) = 1104, Mo Kα, λ = 0.71073 Å, μ = 4.91 mm−1, R = 0.020 for 1319 reflections with Fo > 4σ(Fo); S = 2.18. Because of crystal decay, three separate crystals were needed for a full data set. These polar (but achiral) crystals showed apparently differing amounts of inversion twinning, leading to problems in accurately merging the three data sets and refining the structure. These problems are discussed briefly.
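For context on the 'inversion twinning' of the title: in a polar, non-centrosymmetric crystal, domains related by inversion can coexist, and each measured intensity is then a weighted sum of contributions from the two domains. A minimal statement of the standard two-domain model, with a twin fraction x refined per data set (background knowledge, not taken verbatim from the paper):

```latex
% Inversion (racemic) twinning: the observed intensity mixes a reflection
% with its Friedel mate according to the twin fraction x (Flack parameter).
\begin{equation}
  I_{\mathrm{obs}}(hkl) = (1 - x)\,\lvert F(hkl)\rvert^{2}
                        + x\,\lvert F(\bar{h}\bar{k}\bar{l})\rvert^{2},
  \qquad 0 \le x \le 1.
\end{equation}
```

If the three crystals carry different twin fractions x, their intensities are not strictly comparable reflection by reflection, which is what complicates merging the three partial data sets into one.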

Endocrinology
2019
Vol 160 (10)
pp. 2395-2400
Author(s):  
David J Handelsman ◽  
Lam P Ly

Abstract Hormone assay results below the assay detection limit (DL) can introduce bias into quantitative analysis. Although complex maximum likelihood estimation methods exist, they are not widely used, whereas simple substitution methods are often applied ad hoc to replace undetectable (UD) results with numeric values so that the full data set can be analyzed. However, the bias of substitution methods for steroid measurements has not been reported. Using a large data set (n = 2896) of serum testosterone (T), DHT, and estradiol (E2) concentrations from healthy men, we created modified data sets with increasing proportions of UD samples (≤40%) to which we applied five different substitution methods (deleting UD samples as missing, or substituting UD samples with DL, DL/√2, DL/2, or 0) to calculate univariate descriptive statistics (mean, SD) or bivariate correlations. For all three steroids and for univariate as well as bivariate statistics, bias increased progressively with increasing proportion of UD samples. Bias was worst when UD samples were deleted or substituted with 0 and least when UD samples were substituted with DL/√2, whereas the other methods (DL or DL/2) displayed intermediate bias. Similar findings were replicated in randomly drawn small subsets of n = 25, 50, and 100. Hence, we propose that in steroid hormone data with ≤40% UD samples, substituting UD with DL/√2 is a simple, versatile, and reasonably accurate method to minimize left-censoring bias, allowing for data analysis with the full data set.
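A minimal simulation of the comparison described above; the log-normal distribution, sample size, and 20% censoring fraction are assumptions for illustration, not the authors' data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "hormone" concentrations with an assumed detection limit (DL);
# values below DL are left-censored (undetectable, UD).
true = rng.lognormal(mean=1.0, sigma=0.6, size=2000)
DL = np.quantile(true, 0.20)          # censor the lowest ~20% of samples
censored = true < DL

def summarize(values, label):
    print(f"{label:>9}: mean={values.mean():.3f}  sd={values.std(ddof=1):.3f}")

summarize(true, "true")

# The five substitution methods compared in the abstract: delete UD samples,
# or replace them with DL, DL/sqrt(2), DL/2, or 0.
methods = {
    "deleted": true[~censored],
    "DL": np.where(censored, DL, true),
    "DL/sqrt2": np.where(censored, DL / np.sqrt(2), true),
    "DL/2": np.where(censored, DL / 2, true),
    "zero": np.where(censored, 0.0, true),
}
for label, values in methods.items():
    summarize(values, label)
```

Running this mirrors the ordering reported above: deletion and zero-substitution distort the mean in opposite directions, while DL/√2 typically lands closest to the uncensored summary statistics.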


Author(s):  
Zhiguo Bao ◽  
Shuyu Wang

For hedge funds, return prediction has always been a fundamental and important problem. A good return prediction model usually determines the performance of a quantitative investment strategy directly. However, model performance is influenced by market style: even models trained on the same data set perform differently under different market styles. Traditional approaches attempt to train a single universal linear or nonlinear model on the data set to cope with all market styles. However, a linear model has limited fitting ability and is insufficient to handle the hundreds of features in a hedge fund feature pool, while a nonlinear model risks overfitting. At the same time, changes in market style can make certain features valid or invalid, and a single linear or nonlinear model cannot accommodate this. This thesis proposes a method based on reinforcement learning that automatically discriminates market styles and selects, from sub-models pre-trained on different categories of features, the model that best fits the current market style to predict stock returns. Compared with the traditional approach of training a return prediction model directly on the full data set, experiments show that the proposed method performs better, achieving a higher Sharpe ratio and annualized return.
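The abstract does not spell out the reinforcement learning machinery. As one hedged illustration of the core idea, a bandit-style rule can select among pre-trained sub-models by rewarding low prediction error; all sub-models, features, and the simulated style change below are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sub-models pre-trained on different feature categories;
# each maps today's feature vector to a predicted stock return.
sub_models = [
    lambda x: 0.8 * x[0],               # e.g. momentum-style features
    lambda x: -0.8 * x[0],              # e.g. reversal-style features
    lambda x: 0.3 * x[0] + 0.3 * x[1],  # e.g. mixed features
]

q = np.zeros(len(sub_models))   # running reward estimate per sub-model
counts = np.zeros(len(sub_models), dtype=int)
epsilon, step = 0.1, 0.05       # exploration rate, learning rate

for t in range(1000):
    x = rng.normal(size=2)
    regime = 1.0 if t < 500 else -1.0            # simulated market-style change
    realized = regime * 0.8 * x[0] + rng.normal(scale=0.1)

    # Epsilon-greedy selection: mostly exploit the best sub-model so far.
    k = rng.integers(len(sub_models)) if rng.random() < epsilon else int(np.argmax(q))
    reward = -abs(sub_models[k](x) - realized)   # negative prediction error
    q[k] += step * (reward - q[k])               # constant step tracks regime shifts
    counts[k] += 1

print("times each sub-model was selected:", counts)
```

The constant learning rate lets the selector forget stale rewards, so after the simulated style flip at t = 500 it migrates from the momentum-style sub-model to the reversal-style one.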


2016
Vol 9 (1)
pp. 60-69
Author(s):  
Robert M. Zink

It is sometimes said that scientists are entitled to their own opinions but not their own set of facts. This suggests that application of the scientific method ought to lead to a single conclusion from a given set of data. However, sometimes scientists have conflicting opinions about which analytical methods are most appropriate or which subsets of existing data are most relevant, resulting in different conclusions. Thus, scientists might actually lay claim to different sets of facts. However, if a contrary conclusion is reached by selecting a subset of data, this conclusion should be carefully scrutinized to determine whether consideration of the full data set leads to different conclusions. This is important because conservation agencies are required to consider all of the best available data and make a decision based on them. Therefore, exploring reasons why different conclusions are reached from the same body of data has relevance for management of species. The purpose of this paper was to explore how two groups of researchers can examine the same data and reach opposite conclusions in the case of the taxonomy of the endangered subspecies Southwestern Willow Flycatcher (Empidonax traillii extimus). It was shown that use of subsets of data and characters rather than reliance on entire data sets can explain conflicting conclusions. It was recommended that agencies tasked with making conservation decisions rely on analyses that include all relevant molecular, ecological, behavioral, and morphological data, which in this case show that the subspecies is not valid, and hence its listing is likely not warranted.


Blood
2006
Vol 108 (11)
pp. 5415-5415
Author(s):  
Alexander H. Schmidt ◽  
Andrea Stahr ◽  
Daniel Baier ◽  
Gerhard Ehninger ◽  
Claudia Rutt

Abstract In strategic stem cell donor registry planning, it is of special importance to decide how to type newly registered donors. This question refers both to the selection of HLA loci and to the resolution (low, intermediate, or high) of HLA typings. In principle, high-resolution typings of all transplant-relevant loci are preferable. However, cost considerations generally lead to incomplete typings (only selected HLA loci with low or intermediate typing resolution) in practice. Here, we present results of a project in which newly recruited donors are typed for the HLA-A, -B, -C, and -DRB1 loci with high resolution by sequencing. Efficiency of these typings is measured by subsequent requests for confirmatory typings (CTs) and stem cell donations. Results for donors who were included in the project (Donor Group A) are compared to requests for donors with other, less complete typing levels: HLA-A and HLA-B at intermediate resolution, HLA-DRB1 at high resolution (Group B); HLA-A, -B, -C, and -DRB1 at intermediate resolution (Group C); HLA-A, -B, and -DRB1 at intermediate resolution (Group D). All data are taken from the donor file of the DKMS German Bone Marrow Donor Center. Since the four groups differ considerably regarding their age and sex distributions, calculations are also carried through for restricted data sets that include only male donors up to age 25. Results are shown in Table 1. Donors of Groups A and B have similar CT request frequencies of 5.90 and 5.92 requests per 100 donors per year in the restricted data sets, respectively. These frequencies significantly exceed the corresponding frequencies of the other groups with less complete typing levels. For donation requests, the frequency is significantly higher for Group A than for Group B (restricted data sets): 1.45 vs 1.02 requests per 100 donors per year (p<0.05). Obviously, the additional HLA information for Group A donors leads to a higher ratio between donations and CT requests. Again, figures are much lower for Groups C and D. These results are based on a high number of requests even for the restricted data sets, namely between 44 and 90 donation requests and between 227 and 619 CT requests per group. Our results show that full (HLA-A, -B, -C, and -DRB1) high-resolution typings at donor recruitment lead to significantly higher probabilities for donation requests. Donor centers and registries should carefully take these higher probabilities into account when they consider full high-resolution typings for newly recruited donors. However, the final decision regarding the typing strategy at recruitment must also depend on the individual cost structure of a donor center or registry. The presented results are based on a donor file that consists mainly (≈99%) of Caucasian donors. Whether these results also apply to other, more heterogeneous donor pools should be the subject of further analyses.

Table 1: Requests per 100 donors per year by donor group

                         CT requests                      Donation requests
Donor Group    Full data set   Only male donors ≤25   Full data set   Only male donors ≤25
A                   5.14               5.90                1.45               1.45
B                   4.60               5.92                0.84               1.02
C                   2.50               3.03                0.58               0.67
D                   2.36               2.80                0.38               0.48


2021
Author(s):  
Ruben van de Vijver ◽  
Emmanuel Uwambayinema ◽  
Yu-Ying Chuang

How do speakers comprehend and produce complex words? In the theory of the Discriminative Lexicon, this is hypothesized to be the result of mapping the phonology of whole word forms onto their semantics and vice versa, without recourse to morphemes. This raises the question of whether the hypothesis also holds in highly agglutinative languages, which are often seen as exemplifying the compositional nature of morphology. On the one hand, one could expect the hypothesis to hold for agglutinative languages as well, since it remains unclear whether speakers are able to isolate the morphemes they would need for compositional processing. On the other hand, agglutinative languages have so many different words that it is not obvious how speakers can use their knowledge of words to comprehend and produce them. In this paper, we investigate the comprehension and production of verbs in Kinyarwanda, an agglutinative Bantu language, by means of computational modeling within the Discriminative Lexicon, a theory of the mental lexicon that is grounded in word-and-paradigm morphology, distributional semantics, and error-driven learning, draws on insights from psycholinguistic theories, and is implemented mathematically and computationally as a shallow, two-layered network. To this end, we compiled a data set of 11,528 verb forms, annotating each verb form for its meaning and grammatical functions; additionally, we extracted from this full data set 573 verbs whose meanings are based on word embeddings. To assess comprehension and production of Kinyarwanda verbs, we fed both data sets into the Linear Discriminative Learning algorithm, a two-layered, fully connected network in which one layer represents phonological form and the other represents meaning. Comprehension is modeled as a mapping from phonology to meaning, and production as a mapping from meaning to phonology. Both comprehension and production are learned with high accuracy on all data and on held-out data, both for the full data set with manually annotated semantic features and for the data set with meanings derived from word embeddings. Our findings support the central hypotheses of the Discriminative Lexicon: words are stored as wholes; meanings result from the distribution of words in utterances; and comprehension and production can be successfully modeled as mappings from form to meaning and vice versa, implemented in a shallow two-layered network whose mappings are learned by minimizing errors.
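A minimal sketch of the kind of linear mapping the Linear Discriminative Learning algorithm estimates. The toy form matrix C and meaning matrix S below are random stand-ins; the real model builds C from phonological cues (e.g., triphones) and S from annotated features or word embeddings:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-ins: C holds form vectors (rows = words, cols = phonological
# cues), S holds meaning vectors (rows = words, cols = semantic dimensions).
n_words, n_cues, n_dims = 200, 50, 20
C = rng.integers(0, 2, size=(n_words, n_cues)).astype(float)
S = rng.normal(size=(n_words, n_dims))

# Comprehension: a single linear map F from form to meaning, estimated in
# closed form by least squares (the endstate of error-driven learning).
F, *_ = np.linalg.lstsq(C, S, rcond=None)
S_hat = C @ F

# Production: the reverse map G from meaning to form.
G, *_ = np.linalg.lstsq(S, C, rcond=None)
C_hat = S @ G

# Accuracy: a word counts as "comprehended" if its predicted meaning vector
# correlates most strongly with its own gold-standard row in S.
corr = np.corrcoef(S_hat, S)[:n_words, n_words:]
accuracy = np.mean(np.argmax(corr, axis=1) == np.arange(n_words))
print(f"comprehension accuracy on training data: {accuracy:.2%}")
```

The two least-squares solutions are exactly the "shallow, two-layered network" of the abstract: no hidden layer, no morphemes, just whole-word forms mapped linearly onto whole-word meanings and back.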


2021
Vol 12 (1)
pp. 1-11
Author(s):  
Kishore Sugali ◽  
Chris Sprunger ◽  
Venkata N Inukollu

Artificial Intelligence and Machine Learning have been around for a long time. In recent years, there has been a surge in popularity of applications integrating AI and ML technology. As with traditional development, software testing is a critical component of a successful AI/ML application. However, the development methodology used in AI/ML differs significantly from traditional development, and these distinctions give rise to various software testing challenges. The emphasis of this paper is on the challenge of effectively splitting the data into training and testing data sets. By applying a k-Means clustering strategy to the data set followed by a decision tree, we can significantly increase the likelihood that the training data set represents the domain of the full data set, and thus avoid training a model that is likely to fail because it has learned only a subset of the full data domain.
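One hedged reading of the splitting step (only the cluster-then-split part; the paper's decision-tree stage and real data are not reproduced, and the feature matrix and cluster count below are invented):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))              # placeholder feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # placeholder labels

# Cluster the full data set, then stratify the train/test split by cluster
# membership so every region of the data domain appears in both sets.
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=clusters, random_state=0
)
print("train/test sizes:", len(X_train), len(X_test))
```

Stratifying on cluster labels rather than on class labels is what guards against a training set that covers only part of the feature space.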


2018
Vol 34 (S1)
pp. 138-139
Author(s):  
Yi-Sheng Chao ◽  
Chao-Jung Wu

Introduction: Principal component analysis (PCA) is used for dimension reduction and data summary. However, principal components (PCs) cannot be easily interpreted. To interpret PCs, this study compares two methods of approximating them. One uses the PCA loadings to understand how input variables are projected onto PCs. The other uses forward-stepwise regression to determine the proportions of PC variance explained by input variables. Methods: Two data sets derived from the Canadian Health Measures Survey (CHMS) were used to test the concept of PC approximation: a spirometry subset with the measures from the first trial of spirometry, and a full data set that contained representative variables. Variables were centered and scaled. PCA was conducted with 282 and 23 variables, respectively. PCs were approximated with the two methods. Results: The first PC (PC1) explained 12.1 percent and 50.3 percent of total variance in the respective data sets. The leading variables explained 89.6 percent and 79.0 percent of the variance of PC1 in the respective data sets. It required one and two variables, respectively, to explain more than 80 percent of the variance of PC1. In the full data set, measures related to physical development were the leading variables approximating PC1, and lung function variables were the leading ones approximating PC2. The leading variables approximating PC1 of the spirometry subset were forced expiratory volume in 0.5 s (FEV0.5)/forced vital capacity (FVC) (percent) and FEV1/FVC (percent). Conclusions: Approximating PCs with input variables was highly feasible and helpful for the interpretation of PCs, especially the first PCs. This method is also useful for identifying major or unique sources of variance in data sets. The variables related to physical development account for the most variation in the full data set. The leading variable in the spirometry subset, FEV0.5/FVC (percent), is not well studied with respect to its clinical use.
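A hedged sketch of the forward-stepwise approximation; the synthetic one-factor data stands in for the survey variables, while the variable count and the 80 percent threshold follow the abstract:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Synthetic data with one dominant latent factor, so PC1 is meaningful.
latent = rng.normal(size=(500, 1))
loadings = rng.normal(size=(1, 23))
X = latent @ loadings + 0.5 * rng.normal(size=(500, 23))
X = (X - X.mean(axis=0)) / X.std(axis=0)        # center and scale

pc1 = PCA(n_components=1).fit_transform(X).ravel()

# Forward-stepwise regression: greedily add the input variable that raises
# R^2 most, until >=80% of PC1's variance is explained.
selected, remaining = [], list(range(X.shape[1]))
while remaining:
    r2 = [LinearRegression().fit(X[:, selected + [j]], pc1)
                            .score(X[:, selected + [j]], pc1)
          for j in remaining]
    best = int(np.argmax(r2))
    selected.append(remaining.pop(best))
    if r2[best] >= 0.80:
        break

print(f"variables needed for >=80% of PC1 variance: {selected}")
```

The number of variables the loop consumes before crossing the threshold is exactly the "one and two variables" figure the abstract reports for its two data sets.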


2020
Author(s):  
Thu Nguyen ◽  
Kim L Phan ◽  
Dale F Kreitler ◽  
Lawrence C Andrews ◽  
Sandra B Gabelli ◽  
...  

Abstract One often observes small but measurable differences in diffraction data measured from different crystals of a single protein. These differences might reflect structural differences in the protein and potentially the natural dynamism of the molecule in solution. Partitioning these mixed-state data into single-state clusters is a critical step in extracting information about the dynamic behavior of proteins from hundreds or thousands of single-crystal data sets. Mixed-state data can be obtained deliberately (through intentional perturbation) or inadvertently (while attempting to measure highly redundant single-crystal data). State changes may be expressed as changes in morphology, so that a subset of the polystates may be observed as polymorphs. After mixed-state data are deliberately or inadvertently measured, the challenge is to sort the data into clusters that may represent relevant biological polystates. Here we address this problem using a simple multi-factor clustering approach that classifies each data set using independent observables in order to assign it to the correct location in conformation space. We illustrate this method using two independent observables (unit cell constants and intensities) to cluster mixed-state data from chymotrypsinogen (ChTg) crystals. We observe that the data populate an arc of the reaction trajectory as ChTg is converted into chymotrypsin.
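One hedged sketch of such a multi-factor scheme; the synthetic cell constants, intensity vectors, equal weighting, and cluster count below are all assumptions, not the paper's actual metrics:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(7)
n_sets = 40

# Placeholder observables per diffraction data set: six unit-cell constants
# (a, b, c, alpha, beta, gamma) and a vector of scaled reflection intensities.
cells = rng.normal(loc=[66.0, 66.0, 97.0, 90.0, 90.0, 120.0],
                   scale=0.2, size=(n_sets, 6))
intensities = rng.normal(size=(n_sets, 500))

# Multi-factor clustering: compute one distance matrix per observable,
# normalize each, and combine them before hierarchical clustering.
d_cell = pdist(cells)
d_int = pdist(intensities, metric="correlation")
d_combined = d_cell / d_cell.max() + d_int / d_int.max()

labels = fcluster(linkage(d_combined, method="average"),
                  t=2, criterion="maxclust")
print("cluster assignments:", labels)
```

Combining normalized distances keeps one observable from dominating: a data set must look similar in both unit-cell space and intensity space to land in the same cluster.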


2010
Vol 298 (2)
pp. E229-E236
Author(s):  
Pooja Singal ◽  
Ranganath Muniyappa ◽  
Robin Chisholm ◽  
Gail Hall ◽  
Hui Chen ◽  
...  

After a constant insulin infusion is initiated, determination of steady-state conditions for glucose infusion rates (GIR) typically requires ≥3 h. The glucose infusion follows a simple time-dependent rise, reaching a plateau at steady state. We hypothesized that nonlinear fitting of abbreviated data sets consisting of only the early portion of the clamp study can provide accurate estimates of steady-state GIR. Data sets from two independent laboratories were used to develop and validate this approach. Accuracy of the predicted steady-state GDR was assessed using regression analysis and Bland-Altman plots, and precision was compared by applying a calibration model. In the development data set (n = 88 glucose clamp studies), fitting the full data set with a simple monoexponential model predicted reference GDR values with good accuracy (difference between the two methods −0.37 mg·kg−1·min−1) and precision [root mean square error (RMSE) = 1.11], validating the modeling procedure. Fitting data from the first 180 or 120 min predicted final GDRs with comparable accuracy but with progressively reduced precision [fitGDR-180 RMSE = 1.27 (P = NS vs. fitGDR-full); fitGDR-120 RMSE = 1.56 (P < 0.001)]. Similar results were obtained with the validation data set (n = 183 glucose clamp studies), confirming the generalizability of this approach. The modeling approach also derives kinetic parameters that are not available from standard approaches to clamp data analysis. We conclude that fitting a monoexponential curve to abbreviated clamp data produces steady-state GDR values that accurately predict the GDR values obtained from the full data sets, albeit with reduced precision. This approach may help reduce the resources required for undertaking clamp studies.
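A hedged sketch of the fitting idea, using a generic monoexponential rise and synthetic clamp data (the steady-state value, rate constant, noise level, and sampling grid below are invented, not taken from the study):

```python
import numpy as np
from scipy.optimize import curve_fit

def gir_model(t, gir_ss, k):
    """Monoexponential rise of GIR toward its steady-state value gir_ss."""
    return gir_ss * (1.0 - np.exp(-k * t))

rng = np.random.default_rng(5)
t = np.arange(0.0, 181.0, 5.0)                    # minutes since infusion start
gir_obs = gir_model(t, 10.0, 0.025) + rng.normal(0.0, 0.4, t.size)

# Fit only the first 120 min, then read off the extrapolated steady state.
early = t <= 120
(gir_ss_hat, k_hat), _ = curve_fit(gir_model, t[early], gir_obs[early],
                                   p0=(8.0, 0.02))
print(f"steady-state GIR from 120-min fit: {gir_ss_hat:.2f} (true 10.00)")
```

Because the plateau is a fitted parameter rather than a measured endpoint, the clamp can be stopped early at the cost of some precision, which matches the RMSE pattern reported above.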


2020
Vol 37 (10)
pp. 3061-3075
Author(s):  
Veronika Boskova ◽  
Tanja Stadler

Abstract Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a computational challenge for currently available Bayesian phylogenetic and phylodynamic methods. Through simulations, we explore how large data sets with duplicate sequences affect the speed and accuracy of phylogenetic and phylodynamic analysis within BEAST 2. We show that using unique sequences only leads to biases, and using a random subset of sequences yields imprecise parameter estimates. To overcome these shortcomings, we introduce PIQMEE, a BEAST 2 add-on that produces reliable parameter estimates from full data sets with increased computational efficiency as compared with the currently available methods within BEAST 2. The principle behind PIQMEE is to resolve the tree structure of the unique sequences only, while simultaneously estimating the branching times of the duplicate sequences. Distinguishing between unique and duplicate sequences allows our method to perform well even for very large data sets. Although the classic method converges poorly for data sets of 6,000 sequences when allowed to run for 7 days, our method converges in slightly more than 1 day. In fact, PIQMEE can handle data sets of around 21,000 sequences with 20 unique sequences in 14 days. Finally, we apply the method to a real, within-host HIV sequencing data set with several thousand sequences per patient.

