Current status of genomic evaluation

2020 ◽  
Vol 98 (4) ◽  
Author(s):  
Ignacy Misztal ◽  
Daniela Lourenco ◽  
Andres Legarra

Abstract Early application of genomic selection relied on SNP estimation with phenotypes or de-regressed proofs (DRP). Chips of 50k SNP seemed sufficient for an accurate estimation of SNP effects. Genomic estimated breeding values (GEBV) were composed of an index with parent average, direct genomic value, and deduction of a parental index to eliminate double counting. Use of SNP selection or weighting increased accuracy with small data sets but had minimal to no impact with large data sets. Efforts to include potentially causative SNP derived from sequence data or high-density chips showed limited or no gain in accuracy. After the implementation of genomic selection, EBV by BLUP became biased because of genomic preselection; DRP computed from EBV required adjustments, and the creation of DRP for females is difficult and subject to double counting. Genomic selection was greatly simplified by single-step genomic BLUP (ssGBLUP). This method, based on combining genomic and pedigree relationships, automatically creates an index with all sources of information, can use any combination of male and female genotypes, and accounts for preselection. To avoid biases, especially under strong selection, ssGBLUP requires that pedigree and genomic relationships are compatible. Because the inversion of the genomic relationship matrix (G) becomes costly with more than 100k genotyped animals, large data computations in ssGBLUP were solved by exploiting limited dimensionality of genomic data due to limited effective population size. With such dimensionality ranging from 4k in chickens to about 15k in cattle, the inverse of G can be created directly (e.g., by the algorithm for proven and young) at a linear cost. Due to its simplicity and accuracy, ssGBLUP is routinely used for genomic selection by the major chicken, pig, and beef industries. Single step can be used to derive SNP effects for indirect prediction and for genome-wide association studies, including computations of the P-values. Alternative single-step formulations exist that use SNP effects for genotyped or for all animals. Although genomics is the new standard in breeding and genetics, there are still some problems that need to be solved. These include new validation procedures that are unaffected by selection, parameter estimation that accounts for all the genomic data used in selection, and strategies to address reduction in genetic variances after genomic selection was implemented.
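
The core of ssGBLUP is the combined pedigree-genomic relationship matrix H, whose inverse has a simple block structure. The following is a minimal numpy sketch of that well-known construction, not the authors' production code; the function name, the arguments, and the optional tau/omega scaling factors are illustrative, and a real evaluation would never form these dense matrices explicitly for millions of animals.

```python
import numpy as np

def h_inverse(A_inv, A22, G, geno_idx, tau=1.0, omega=1.0):
    """Build H^-1 by combining pedigree and genomic relationships (sketch).

    A_inv      : inverse of the full pedigree relationship matrix A
    A22        : pedigree relationships among the genotyped animals
    G          : genomic relationship matrix, already blended/scaled to be
                 compatible with A22 (needed to avoid bias under selection)
    geno_idx   : positions of the genotyped animals within A
    tau, omega : optional scaling factors sometimes used in practice
    """
    H_inv = A_inv.copy()
    # only the genotyped block of A^-1 is modified
    delta = tau * np.linalg.inv(G) - omega * np.linalg.inv(A22)
    H_inv[np.ix_(geno_idx, geno_idx)] += delta
    return H_inv
```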


2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 52-53
Author(s):  
Ignacy Misztal

Abstract Early application of genomic selection relied on SNP estimation with phenotypes or de-regressed proofs (DRP). Chips of 50k SNP seemed sufficient. The estimated breeding value was an index with parent average and a deduction to eliminate double counting. Use of SNP selection or weighting increased accuracy with small data sets but gave little or no gain with large data sets. Use of DRP with female information required ad hoc modifications. As BLUP is biased by genomic selection, use of DRP under genomic selection required adjustments. Efforts to include potentially causative SNP derived from sequence analysis showed limited or no gain. Genomic selection was greatly simplified by single-step GBLUP (ssGBLUP) because the procedure automatically creates the index, can use any combination of male and female genotypes, and accounts for preselection. ssGBLUP requires careful scaling for compatibility between pedigree and genomic relationships to avoid biases, especially under strong selection. Large data computations in ssGBLUP were solved by exploiting the limited dimensionality of SNP data due to limited effective population size. With such dimensionality ranging from 4k in chickens to about 15k in Holsteins, the inverse of the GRM can be created directly (e.g., by the APY algorithm) at a linear cost. Due to its simplicity and accuracy, ssGBLUP is routinely used for genomic selection by major companies in the chicken, pig, and beef industries. ssGBLUP can be used to derive SNP effects for indirect prediction and for GWAS, including computation of P-values. An alternative single-step formulation, called ssBR, exists that uses SNP effects instead of the GRM. As BLUP is affected by preselection, there is a need for new validation procedures unaffected by selection and for parameter estimation that accounts for all the genomic data used in selection. Another issue is the reduction in genetic variances due to the Bulmer effect.
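
The APY algorithm mentioned above exploits the limited dimensionality of genomic information by conditioning all "noncore" animals on a core subset, so only a core-sized matrix is ever inverted. The sketch below is a dense, illustrative numpy version, assuming G is already built and core_idx/noncore_idx partition the genotyped animals; production implementations never form the noncore block of G explicitly, which is what makes the cost linear.

```python
import numpy as np

def apy_g_inverse(G, core_idx, noncore_idx):
    """Inverse of the GRM via the APY algorithm (illustrative sketch).

    The conditional (co)variance of noncore animals given the core is
    approximated as diagonal; only Gcc (core x core) is inverted.
    The result is ordered core animals first, then noncore animals.
    """
    Gcc = G[np.ix_(core_idx, core_idx)]
    Gcn = G[np.ix_(core_idx, noncore_idx)]
    Gcc_inv = np.linalg.inv(Gcc)                    # cubic only in core size
    P = Gcc_inv @ Gcn                               # c x n
    # diagonal of the noncore conditional variances: g_jj - g_jc Gcc^-1 g_cj
    m = np.diag(G)[noncore_idx] - np.einsum('ij,ij->j', Gcn, P)
    M_inv = 1.0 / m
    # assemble the blocks of G^-1 (APY)
    top_left = Gcc_inv + (P * M_inv) @ P.T
    top_right = -P * M_inv
    return np.block([[top_left, top_right],
                     [top_right.T, np.diag(M_inv)]])
```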



2013 ◽  
Vol 2013 ◽  
pp. 1-11
Author(s):  
Dewang Chen ◽  
Long Chen

In order to obtain a reasonable trade-off between low-cost, low-accuracy Global Positioning System (GPS) receivers and the requirements of high-precision digital maps for modern railways, we use the concept of constraint K-segment principal curves (CKPCS) together with expert knowledge of railways to propose three practical CKPCS generation algorithms with reduced computational complexity, making them more suitable for engineering applications. The three algorithms are named ALLopt, MPMopt, and DCopt: ALLopt uses global optimization, while MPMopt and DCopt apply local optimization with different initial solutions. We compare the three algorithms in terms of average projection error, stability, and their fitness for simple and complex simulated trajectories with noisy data. We find that ALLopt only works well for simple curves and small data sets, whereas the other two algorithms work better for complex curves and large data sets. Moreover, MPMopt runs faster than DCopt, but DCopt can work better for some curves with crossing points. The three algorithms are also applied to generating GPS digital maps for two railway GPS data sets measured on the Qinghai-Tibet Railway (QTR), with results similar to those for the synthetic data. Because a railway trajectory is relatively simple and straight, we conclude that MPMopt works best when both computation speed and the quality of the generated CKPCS are considered. MPMopt can be used to obtain a set of key points representing a large amount of GPS data, greatly reducing data storage requirements and increasing positioning speed for real-time digital map applications.
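
The algorithms are compared partly by average projection error, i.e. the mean distance from GPS fixes to the fitted K-segment curve. A minimal sketch of that error metric (not of the ALLopt/MPMopt/DCopt fitting procedures themselves) might look as follows, assuming the points have already been projected into a planar coordinate system:

```python
import numpy as np

def avg_projection_error(points, vertices):
    """Average distance from 2-D GPS points to a piecewise-linear curve.

    points   : (n, 2) array of GPS fixes in a local planar frame
    vertices : (k+1, 2) array of segment endpoints of the K-segment curve
    """
    a = vertices[:-1]                               # segment starts, (k, 2)
    b = vertices[1:]                                # segment ends,   (k, 2)
    ab = b - a
    denom = np.maximum(np.einsum('kd,kd->k', ab, ab), 1e-12)
    errors = []
    for p in points:
        # parameter of the closest point on each segment, clipped to [0, 1]
        t = np.clip(np.einsum('kd,kd->k', p - a, ab) / denom, 0.0, 1.0)
        proj = a + t[:, None] * ab
        d = np.linalg.norm(p - proj, axis=1)
        errors.append(d.min())                      # distance to the curve
    return float(np.mean(errors))
```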



2020 ◽  
pp. 1-11
Author(s):  
Erjia Yan ◽  
Zheng Chen ◽  
Kai Li

Citation sentiment plays an important role in citation analysis and scholarly communication research, but prior citation sentiment studies have used small data sets and relied largely on manual annotation. This paper uses a large data set of PubMed Central (PMC) full-text publications and analyzes citation sentiment in more than 32 million citances within PMC, revealing citation sentiment patterns at the journal and discipline levels. It finds a weak relationship between a journal’s citation impact (as measured by CiteScore) and the average sentiment score of citances to its publications. When journals are aggregated into quartiles based on citation impact, we find that journals in higher quartiles are cited more favorably than those in lower quartiles. Further, social science journals are found to be cited with higher sentiment, followed by engineering and natural science journals, and then biomedical journals. This result may be attributed to disciplinary discourse patterns in which social science researchers tend to use more subjective terms to describe others’ work than do natural science or biomedical researchers.
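
The journal-level analysis described here is essentially an aggregation of per-citance sentiment scores followed by a comparison with CiteScore. Below is a hedged pandas sketch of that aggregation step only; the column names, the quartile labelling convention, and the upstream sentiment scoring of 32 million citances are assumptions, not the authors' pipeline.

```python
import pandas as pd

def journal_sentiment_summary(citances: pd.DataFrame, citescores: pd.DataFrame):
    """Relate average citance sentiment to journal citation impact.

    citances   : columns ['cited_journal', 'sentiment'] (one row per citance)
    citescores : columns ['journal', 'citescore']
    """
    per_journal = (citances.groupby('cited_journal', as_index=False)['sentiment']
                            .mean()
                            .rename(columns={'cited_journal': 'journal',
                                             'sentiment': 'mean_sentiment'}))
    merged = per_journal.merge(citescores, on='journal', how='inner')
    corr = merged['mean_sentiment'].corr(merged['citescore'])   # expected to be weak
    # Q1 = highest-impact quartile; pd.qcut labels bins in ascending CiteScore order
    merged['quartile'] = pd.qcut(merged['citescore'], 4,
                                 labels=['Q4', 'Q3', 'Q2', 'Q1'])
    by_quartile = merged.groupby('quartile', observed=True)['mean_sentiment'].mean()
    return merged, corr, by_quartile
```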



2008 ◽  
Vol 20 (2) ◽  
pp. 374-382 ◽  
Author(s):  
Tobias Glasmachers ◽  
Christian Igel

Iterative learning algorithms that approximate the solution of support vector machines (SVMs) have two potential advantages. First, they allow online and active learning. Second, for large data sets, computing the exact SVM solution may be too time-consuming, and an efficient approximation can be preferable. The powerful LASVM iteratively approaches the exact SVM solution using sequential minimal optimization (SMO). It allows efficient online and active learning. Here, this algorithm is considerably improved in speed and accuracy by replacing the working set selection in the SMO steps. A second-order working set selection strategy, which greedily aims at maximizing the progress in each single step, is incorporated.
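
The improvement described here replaces LASVM's most-violating-pair selection with a second-order criterion that estimates the gain of each candidate SMO step. A simplified numpy sketch of that selection rule is shown below; the feasibility masks and sign conventions are condensed relative to the actual algorithm, and a real implementation works from a kernel cache rather than a full kernel matrix.

```python
import numpy as np

def second_order_working_set(grad, K, up_mask, down_mask, eps=1e-12):
    """Pick the SMO working pair (i, j) by second-order gain (sketch).

    grad      : gradient of the dual objective for each example
    K         : kernel matrix (a kernel cache in a real implementation)
    up_mask   : examples whose dual variable may still increase
    down_mask : examples whose dual variable may still decrease

    i is the most violating 'up' example; j is the 'down' example that
    maximizes the estimated progress of a single SMO step,
    (g_i - g_j)^2 / (K_ii + K_jj - 2 K_ij), rather than only the
    first-order violation.
    """
    up = np.where(up_mask)[0]
    i = up[np.argmax(grad[up])]

    down = np.where(down_mask)[0]
    diff = grad[i] - grad[down]
    curv = K[i, i] + K[down, down] - 2.0 * K[i, down]
    gain = np.where(diff > 0, diff**2 / np.maximum(curv, eps), -np.inf)
    j = down[np.argmax(gain)]
    return i, j
```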



2018 ◽  
Author(s):  
Omer Weissbrod ◽  
Jonathan Flint ◽  
Saharon Rosset

Abstract Methods that estimate heritability and genetic correlations from genome-wide association studies have proven to be powerful tools for investigating the genetic architecture of common diseases and exposing unexpected relationships between disorders. Many relevant studies employ a case-control design, yet most methods are primarily geared towards analyzing quantitative traits. Here we investigate the validity of three common methods for estimating genetic heritability and genetic correlation. We find that the Phenotype-Correlation-Genotype-Correlation (PCGC) approach is the only method that can estimate both quantities accurately in the presence of important non-genetic risk factors, such as age and sex. We extend PCGC to work with summary statistics that take the case-control sampling into account, and demonstrate that our new method, PCGC-s, accurately estimates both heritability and genetic correlations and can be applied to large data sets without requiring individual-level genotypic or phenotypic information. Finally, we use PCGC-s to estimate the genetic correlation between schizophrenia and bipolar disorder, and demonstrate that previous estimates are biased due to incorrect handling of sex as a strong risk factor. PCGC-s is available at https://github.com/omerwe/PCGCs.
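
At its core, PCGC regresses products of phenotypes on genetic correlations; the corrections for covariates, case-control ascertainment, and the liability scale are what distinguish it from plain Haseman-Elston regression. The deliberately simplified sketch below shows only that core regression (it is not the PCGC-s method itself, which works from summary statistics).

```python
import numpy as np

def pcgc_slope(y, G):
    """Core of a PCGC-style regression (simplified sketch).

    y : standardized phenotypes (mean 0, variance 1), length n
    G : n x n genetic relationship matrix estimated from SNPs

    Regresses the products y_i * y_j on the genetic correlations G_ij over
    all pairs i < j. Without covariates and without case-control
    ascertainment, this slope is essentially the Haseman-Elston estimate of
    the observed-scale heritability; PCGC and PCGC-s add the risk-factor,
    ascertainment, and liability-scale corrections described in the paper.
    """
    iu = np.triu_indices_from(G, k=1)
    x = G[iu]                                   # pairwise genetic correlations
    z = np.outer(y, y)[iu]                      # pairwise phenotype products
    return float(np.dot(x, z) / np.dot(x, x))   # least-squares slope through the origin
```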



Author(s):  
Kim Wallin

The standard Master Curve (MC) deals only with materials assumed to be homogeneous, but MC analysis methods for inhomogeneous materials have also been developed. The bi-modal and multi-modal analysis methods, in particular, are becoming increasingly standard. Their drawback is that they are generally reliable only with sufficiently large data sets (number of valid tests, r ≥ 15–20). Here, the possibility of using the multi-modal analysis method with smaller data sets is assessed, and a new procedure to conservatively account for possible inhomogeneities is proposed.
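
For context, the standard homogeneous Master Curve (as standardized in ASTM E1921) describes fracture toughness scatter with a three-parameter Weibull distribution and a fixed temperature dependence of the median toughness; the bi-modal and multi-modal analyses discussed here generalize that description to mixtures of such distributions. A brief reminder of the standard form, stated here from the standard rather than from this paper:

```latex
% Cumulative failure probability of a 1T specimen at temperature T:
P_f = 1 - \exp\!\left[-\left(\frac{K_{Jc} - K_{\min}}{K_{0} - K_{\min}}\right)^{4}\right],
\qquad K_{\min} = 20~\mathrm{MPa\sqrt{m}},
% with the median toughness following the fixed temperature dependence
K_{Jc(\mathrm{med})} = 30 + 70\,\exp\!\left[0.019\,(T - T_{0})\right]~\mathrm{MPa\sqrt{m}} .
```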



Author(s):  
Gary Smith ◽  
Jay Cordes

Patterns are inevitable and we should not be surprised by them. Streaks, clusters, and correlations are the norm, not the exception. In a large number of coin flips, there are likely to be coincidental clusters of heads and tails. In nationwide data on cancer, crime, or test scores, there are likely to be flukey clusters. When the data are separated into smaller geographic units like cities, the most extreme results are likely to be found in the smallest cities. In athletic competitions between well-matched teams, the outcome of a small number of games is almost meaningless. Our challenge is to overcome our inherited inclination to think that all patterns are meaningful; for example, thinking that clustering in large data sets or differences among small data sets must be something real that needs to be explained. Often, it is just meaningless happenstance.
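
A quick simulation illustrates the point about coincidental clusters in coin flips; the figures in the comments are what one would typically observe from such a simulation, not results reported by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)

def longest_run(flips):
    """Length of the longest streak of identical outcomes in a 0/1 sequence."""
    best = run = 1
    for prev, cur in zip(flips[:-1], flips[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

# In 1,000 fair coin flips the longest streak of heads or tails is usually
# around 10 -- long enough to look "meaningful" even though it is pure chance.
streaks = [longest_run(rng.integers(0, 2, size=1000)) for _ in range(200)]
print(round(float(np.mean(streaks)), 1), max(streaks))
```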



2008 ◽  
Vol 130 (2) ◽  
Author(s):  
Stuart Holdsworth

The European Creep Collaborative Committee (ECCC) approach to creep data assessment has now been established for almost ten years. The methodology covers the analysis of rupture strength and ductility, creep strain, and stress relaxation data, for a range of material conditions. This paper reviews the concepts and procedures involved. The original approach was devised to determine data sheets for use by committees responsible for the preparation of National and International Design and Product Standards, and the methods developed for data quality evaluation and data analysis were therefore intentionally rigorous. The focus was clearly on the determination of long-time property values from the largest possible data sets involving a significant number of observations in the mechanism regime for which predictions were required. More recently, the emphasis has changed. There is now an increasing requirement for full property descriptions from very short times to very long and hence the need for much more flexible model representations than were previously required. There continues to be a requirement for reliable long-time predictions from relatively small data sets comprising relatively short duration tests, in particular, to exploit new alloy developments at the earliest practical opportunity. In such circumstances, it is not feasible to apply the same degree of rigor adopted for large data set assessment. Current developments are reviewed.



2019 ◽  
Vol 88 (1) ◽  
pp. 247-280 ◽  
Author(s):  
Gary J. Doherty ◽  
Michele Petruzzelli ◽  
Emma Beddowes ◽  
Saif S. Ahmad ◽  
Carlos Caldas ◽  
...  

The complexity of human cancer underlies its devastating clinical consequences. Drugs designed to target the genetic alterations that drive cancer have improved the outcome for many patients, but not the majority of them. Here, we review the genomic landscape of cancer, how genomic data can provide much more than a sum of its parts, and the approaches developed to identify and validate genomic alterations with potential therapeutic value. We highlight notable successes and pitfalls in predicting the value of potential therapeutic targets and discuss the use of multi-omic data to better understand cancer dependencies and drug sensitivity. We discuss how integrated approaches to collecting, curating, and sharing these large data sets might improve the identification and prioritization of cancer vulnerabilities as well as patient stratification within clinical trials. Finally, we outline how future approaches might improve the efficiency and speed of translating genomic data into clinically effective therapies and how the use of unbiased genome-wide information can identify novel predictive biomarkers that can be either simple or complex.



1981 ◽  
Vol 35 (1) ◽  
pp. 35-42 ◽  
Author(s):  
J. D. Algeo ◽  
M. B. Denton

A numerical method for evaluating the inverted Abel integral employing cubic spline approximations is described along with a modification of the procedure of Cremers and Birkebak, and an extension of the Barr method. The accuracy of the computations is evaluated at several noise levels and with varying resolution of the input data. The cubic spline method is found to be useful only at very low noise levels, but capable of providing good results with small data sets. The Barr method is computationally the simplest, and is adequate when large data sets are available. For noisy data, the method of Cremers and Birkebak gave the best results.
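
For reference, the transform pair being evaluated numerically is the standard Abel inversion for a radially symmetric source f(r) observed as a lateral intensity profile F(y), with R the radius beyond which f vanishes; the cubic spline, Cremers and Birkebak, and Barr methods differ in how they represent F and evaluate this singular integral.

```latex
% Forward Abel transform of a radially symmetric distribution f(r):
F(y) = 2 \int_{y}^{R} \frac{f(r)\, r}{\sqrt{r^{2} - y^{2}}}\, \mathrm{d}r ,
% and the inverted Abel integral evaluated by the methods above:
f(r) = -\frac{1}{\pi} \int_{r}^{R} \frac{\mathrm{d}F/\mathrm{d}y}{\sqrt{y^{2} - r^{2}}}\, \mathrm{d}y .
```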


