Current status of genomic selection

2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 52-53
Author(s):  
Ignacy Misztal

Abstract Early application of genomic selection relied on SNP estimation with phenotypes or de-regressed proofs (DRP). Chips of 50k SNP seemed sufficient. The estimated breeding value was an index combining the direct genomic value and the parent average, with a deduction to eliminate double counting. SNP selection or weighting increased accuracy with small data sets but added little or nothing with large data sets. Use of DRP with female information required ad hoc modifications. Because BLUP is biased by genomic selection, use of DRP under genomic selection required adjustments. Efforts to include potentially causative SNP derived from sequence analysis showed limited or no gain. Genomic selection was greatly simplified by single-step GBLUP (ssGBLUP), because the procedure automatically creates the index, can use any combination of male and female genotypes, and accounts for preselection. ssGBLUP requires careful scaling for compatibility between pedigree and genomic relationships to avoid biases, especially under strong selection. Large-data computations in ssGBLUP were solved by exploiting the limited dimensionality of SNP data due to limited effective population size. With that dimensionality ranging from 4k in chickens to about 15k in Holsteins, the inverse of the genomic relationship matrix (GRM) can be created directly (e.g., by the APY algorithm) at a linear cost. Because of its simplicity and accuracy, ssGBLUP is routinely used for genomic selection by major chicken, pig, and beef companies. ssGBLUP can also be used to derive SNP effects for indirect prediction and for GWAS, including computation of P-values. An alternative single-step method, ssBR, uses SNP effects instead of the GRM. Because BLUP is affected by preselection, there is a need for new validation procedures unaffected by selection and for parameter estimation that accounts for all the genomic data used in selection. Another issue is the reduction of variances due to the Bulmer effect.
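As a rough illustration of how ssGBLUP combines pedigree and genomic information, the sketch below builds the single-step H-inverse by adding (G⁻¹ − A22⁻¹) to the block of genotyped animals. It is a minimal dense-matrix sketch with assumed inputs and function names; production software uses sparse pedigree inverses, blends G with A22 for compatibility, and forms the G-inverse with APY.

```python
import numpy as np

def h_inverse(A_inv, G, A22, genotyped_idx):
    """Minimal sketch of the single-step H-inverse: start from the pedigree
    relationship inverse A_inv and add (G^-1 - A22^-1) to the rows/columns of
    genotyped animals. Small dense matrices only; real implementations use
    sparse A_inv, blending/rescaling of G, and the APY G-inverse."""
    H_inv = A_inv.copy()
    delta = np.linalg.inv(G) - np.linalg.inv(A22)
    block = np.ix_(genotyped_idx, genotyped_idx)
    H_inv[block] += delta
    return H_inv
```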


2020 ◽  
Vol 98 (4) ◽  
Author(s):  
Ignacy Misztal ◽  
Daniela Lourenco ◽  
Andres Legarra

Abstract Early application of genomic selection relied on SNP estimation with phenotypes or de-regressed proofs (DRP). Chips of 50k SNP seemed sufficient for an accurate estimation of SNP effects. Genomic estimated breeding values (GEBV) were composed of an index with parent average, direct genomic value, and deduction of a parental index to eliminate double counting. SNP selection or weighting increased accuracy with small data sets but had minimal to no impact with large data sets. Efforts to include potentially causative SNP derived from sequence data or high-density chips showed limited or no gain in accuracy. After the implementation of genomic selection, EBV by BLUP became biased because of genomic preselection, DRP computed from EBV required adjustments, and the creation of DRP for females was difficult and subject to double counting. Genomic selection was greatly simplified by single-step genomic BLUP (ssGBLUP). This method, based on combining genomic and pedigree relationships, automatically creates an index with all sources of information, can use any combination of male and female genotypes, and accounts for preselection. To avoid biases, especially under strong selection, ssGBLUP requires that pedigree and genomic relationships are compatible. Because inversion of the genomic relationship matrix (G) becomes costly with more than 100k genotyped animals, large-data computations in ssGBLUP were solved by exploiting the limited dimensionality of genomic data due to limited effective population size. With such dimensionality ranging from 4k in chickens to about 15k in cattle, the inverse of G can be created directly (e.g., by the algorithm for proven and young) at a linear cost. Due to its simplicity and accuracy, ssGBLUP is routinely used for genomic selection by the major chicken, pig, and beef industries. Single-step GBLUP can also be used to derive SNP effects for indirect prediction and for genome-wide association studies, including computation of P-values. Alternative single-step formulations exist that use SNP effects for genotyped animals or for all animals. Although genomics is now the standard in breeding and genetics, some problems remain to be solved: new validation procedures that are unaffected by selection, parameter estimation that accounts for all the genomic data used in selection, and strategies to address the reduction in genetic variances after genomic selection is implemented.
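The following sketch illustrates the idea behind the algorithm for proven and young (APY): only a core block of G is inverted, and the remaining (noncore) animals are treated as conditionally independent given the core. The inputs and the dense construction are illustrative assumptions; real implementations never form the full dense inverse, which is what makes the cost linear in the number of noncore animals.

```python
import numpy as np

def apy_g_inverse(G, core):
    """Sketch of the APY G-inverse: invert only the core block of G and model
    noncore animals as conditionally independent given the core.
    `core` is a boolean mask over genotyped animals (an assumed input)."""
    c, n = np.where(core)[0], np.where(~core)[0]
    Gcc_inv = np.linalg.inv(G[np.ix_(c, c)])
    Gcn = G[np.ix_(c, n)]
    # Diagonal conditional variance of each noncore animal given the core
    m = np.diag(G[np.ix_(n, n)]) - np.einsum('ij,jk,ki->i', Gcn.T, Gcc_inv, Gcn)
    M_inv = np.diag(1.0 / m)
    P = Gcc_inv @ Gcn                     # core-by-noncore projection
    top_left = Gcc_inv + P @ M_inv @ P.T
    top_right = -P @ M_inv
    G_inv = np.block([[top_left, top_right],
                      [top_right.T, M_inv]])
    # Rows/columns of the result are ordered core first, then noncore
    return G_inv, np.concatenate([c, n])
```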



2021 ◽  
Author(s):  
Timo Kersten ◽  
Viktor Leis ◽  
Thomas Neumann

Abstract Although compiling queries to efficient machine code has become a common approach for query execution, a number of newly created database system projects still refrain from using compilation. It is sometimes claimed that the intricacies of code generation make compilation-based engines too complex. Also, a major barrier for adoption, especially for interactive ad hoc queries, is long compilation time. In this paper, we examine all stages of compiling query execution engines and show how to reduce compilation overhead. We incorporate the lessons learned from a decade of generating code in HyPer into a design that manages complexity and yields high speed. First, we introduce a code generation framework that establishes abstractions to manage complexity, yet generates code in a single fast pass. Second, we present a program representation whose data structures are tuned to support fast code generation and compilation. Third, we introduce a new compiler backend that is optimized for minimal compile time, and simultaneously, yields superior execution performance to competing approaches, e.g., Volcano-style or bytecode interpretation. We implemented these optimizations in our database system Umbra to show that it is possible to unite fast compilation and fast execution. Indeed, Umbra achieves unprecedentedly low query latencies. On small data sets, it is even faster than interpreter engines like DuckDB and PostgreSQL. At the same time, on large data sets, its throughput is on par with the state-of-the-art compiling system HyPer.
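As a toy illustration of the compilation approach discussed above (not Umbra's actual framework or IR), the sketch below generates the source of a tight scan-filter-aggregate loop in a single pass and compiles it with Python's exec; a real engine would emit a low-level IR or machine code and fuse whole pipelines the same way.

```python
def compile_filter_sum(predicate_src: str, column: str):
    """Toy data-centric code generation: emit a tight loop for
    SELECT SUM(column) WHERE predicate, then compile it with exec().
    Names and the query shape are illustrative assumptions."""
    src = (
        "def query(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if {predicate_src}:\n"
        f"            total += row['{column}']\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["query"]

# Example: sum 'amount' over rows where quantity > 10
q = compile_filter_sum("row['quantity'] > 10", "amount")
print(q([{"quantity": 12, "amount": 3.5}, {"quantity": 5, "amount": 1.0}]))
```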



2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 8-9
Author(s):  
Zahra Karimi ◽  
Brian Sullivan ◽  
Mohsen Jafarikia

Abstract Previous studies have shown that the accuracy of the genomic estimated breeding value (GEBV) as a predictor of future performance is higher than that of the traditional estimated breeding value (EBV). The purpose of this study was to estimate the potential advantage of selection on GEBV for litter size (LS) compared with selection on EBV in the Canadian swine dam-line breeds. The study included 236 Landrace and 210 Yorkshire gilts born in 2017 that had their first farrowing after 2017. GEBV and EBV for LS were calculated with data available at the end of 2017 (GEBV2017 and EBV2017, respectively). De-regressed EBV for LS in July 2019 (dEBV2019) was used as an adjusted phenotype. The average dEBV2019 for the top 40% of sows based on GEBV2017 was compared with the average dEBV2019 for the top 40% of sows based on EBV2017. The standard error of the estimated difference for each breed was obtained by comparing the average dEBV2019 for repeated random samples of two sets of 40% of the gilts. Compared with the top 40% ranked on EBV2017, ranking on GEBV2017 yielded an extra 0.45 (±0.29) and 0.37 (±0.25) piglets born per litter in Landrace and Yorkshire replacement gilts, respectively. The estimated Type I errors of the GEBV2017 gain over EBV2017 were 6% and 7% in Landrace and Yorkshire, respectively. Selecting both replacement boars and replacement gilts on GEBV instead of EBV could translate into an increased annual genetic gain of 0.3 extra piglets per litter, which would more than double the rate of gain observed from typical EBV-based selection. The permutation test used for validation in this study appears effective with relatively small data sets and could be applied to other traits, other species, and other prediction methods.
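A minimal sketch of the comparison and permutation-style validation described above, with assumed array inputs: the mean adjusted phenotype of the top 40% ranked on GEBV2017 is contrasted with the top 40% ranked on EBV2017, and a standard error and an empirical Type I error are derived from repeatedly drawing two random 40% subsets.

```python
import numpy as np

def top40_gain(debv2019, gebv2017, ebv2017, n_perm=10000, seed=0):
    """Difference in mean adjusted phenotype (dEBV2019) between the top 40%
    of gilts ranked on GEBV2017 and the top 40% ranked on EBV2017, with a
    standard error from repeated random pairs of 40% subsets. Inputs and the
    exact resampling scheme are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n = len(debv2019)
    k = int(0.4 * n)
    top_g = np.argsort(gebv2017)[::-1][:k]
    top_e = np.argsort(ebv2017)[::-1][:k]
    gain = debv2019[top_g].mean() - debv2019[top_e].mean()
    diffs = np.empty(n_perm)
    for r in range(n_perm):
        perm = rng.permutation(n)
        diffs[r] = debv2019[perm[:k]].mean() - debv2019[perm[k:2 * k]].mean()
    se = diffs.std(ddof=1)
    p_one_sided = (diffs >= gain).mean()   # analogous to the reported Type I error
    return gain, se, p_one_sided
```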



2013 ◽  
Vol 2013 ◽  
pp. 1-11
Author(s):  
Dewang Chen ◽  
Long Chen

In order to obtain a decent trade-off between low-cost, low-accuracy Global Positioning System (GPS) receivers and the requirements of high-precision digital maps for modern railways, we use the concept of constraint K-segment principal curves (CKPCS) together with expert knowledge on railways to propose three practical CKPCS generation algorithms with reduced computational complexity, making them more suitable for engineering applications. The three algorithms are named ALLopt, MPMopt, and DCopt: ALLopt uses global optimization, whereas MPMopt and DCopt apply local optimization with different initial solutions. We compare the three algorithms in terms of average projection error, stability, and fitness for simple and complex simulated trajectories with noisy data. We find that ALLopt works well only for simple curves and small data sets, whereas the other two algorithms work better for complex curves and large data sets. Moreover, MPMopt runs faster than DCopt, but DCopt can work better for some curves with cross points. The three algorithms are also applied to generating GPS digital maps for two railway GPS data sets measured on the Qinghai-Tibet Railway (QTR), with results similar to those for the synthetic data. Because a railway trajectory is relatively simple and straight, we conclude that MPMopt works best when both computation speed and the quality of the generated CKPCS are considered. MPMopt can be used to obtain a set of key points that represent a large amount of GPS data; hence, it can greatly reduce data storage requirements and increase positioning speed for real-time digital map applications.
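For the evaluation metric mentioned above, here is a simple sketch of the average projection error: each GPS point is projected onto the nearest segment of a piecewise-linear curve and the distances are averaged. This is a brute-force illustration of the metric only, not of the CKPCS fitting algorithms themselves.

```python
import numpy as np

def mean_projection_error(points, vertices):
    """Average distance from GPS points (N x 2) to a piecewise-linear curve
    defined by `vertices` ((K+1) x 2). Brute-force point-to-segment
    projection; inputs are assumed planar coordinates."""
    a, b = vertices[:-1], vertices[1:]          # segment endpoints
    ab = b - a
    errors = []
    for p in points:
        t = np.clip(np.einsum('ij,ij->i', p - a, ab) /
                    np.einsum('ij,ij->i', ab, ab), 0.0, 1.0)
        proj = a + t[:, None] * ab              # nearest point on each segment
        errors.append(np.min(np.linalg.norm(p - proj, axis=1)))
    return float(np.mean(errors))
```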



2020 ◽  
pp. 1-11
Author(s):  
Erjia Yan ◽  
Zheng Chen ◽  
Kai Li

Citation sentiment plays an important role in citation analysis and scholarly communication research, but prior citation sentiment studies have used small data sets and relied largely on manual annotation. This paper uses a large data set of PubMed Central (PMC) full-text publications and analyzes citation sentiment in more than 32 million citances within PMC, revealing citation sentiment patterns at the journal and discipline levels. We find a weak relationship between a journal’s citation impact (as measured by CiteScore) and the average sentiment score of citances to its publications. When journals are aggregated into quartiles based on citation impact, journals in higher quartiles are cited more favorably than those in lower quartiles. Further, social science journals are cited with the most positive sentiment, followed by engineering and natural science journals, and then biomedical journals. This result may be attributed to disciplinary discourse patterns in which social science researchers tend to use more subjective terms to describe others’ work than do natural science or biomedical researchers.
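A hedged pandas sketch of the journal-level aggregation described above, with assumed column names (cited_journal, sentiment_score, journal, citescore): citance sentiment is averaged per cited journal, journals are binned into CiteScore quartiles, and quartile means are compared.

```python
import pandas as pd

def sentiment_by_quartile(citances: pd.DataFrame, journals: pd.DataFrame) -> pd.Series:
    """Average citance sentiment per journal, then compare CiteScore quartiles.
    Column names are illustrative assumptions, not the paper's actual schema."""
    per_journal = (citances.groupby("cited_journal")["sentiment_score"]
                           .mean()
                           .rename("mean_sentiment")
                           .reset_index())
    merged = per_journal.merge(journals[["journal", "citescore"]],
                               left_on="cited_journal", right_on="journal")
    # Q1 = highest-impact quartile, Q4 = lowest
    merged["quartile"] = pd.qcut(merged["citescore"], 4,
                                 labels=["Q4", "Q3", "Q2", "Q1"])
    return merged.groupby("quartile", observed=True)["mean_sentiment"].mean()
```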



Endocrinology ◽  
2019 ◽  
Vol 160 (10) ◽  
pp. 2395-2400 ◽  
Author(s):  
David J Handelsman ◽  
Lam P Ly

Abstract Hormone assay results below the assay detection limit (DL) can introduce bias into quantitative analysis. Although complex maximum likelihood estimation methods exist, they are not widely used, whereas simple substitution methods are often used ad hoc to replace the undetectable (UD) results with numeric values to facilitate data analysis with the full data set. However, the bias of substitution methods for steroid measurements is not reported. Using a large data set (n = 2896) of serum testosterone (T), DHT, and estradiol (E2) concentrations from healthy men, we created modified data sets with increasing proportions of UD samples (≤40%) to which we applied five different substitution methods (deleting UD samples as missing, or substituting UD samples with DL, DL/√2, DL/2, or 0) to calculate univariate descriptive statistics (mean, SD) or bivariate correlations. For all three steroids and for univariate as well as bivariate statistics, bias increased progressively with increasing proportion of UD samples. Bias was worst when UD samples were deleted or substituted with 0 and least when UD samples were substituted with DL/√2, whereas the other methods (DL or DL/2) displayed intermediate bias. Similar findings were replicated in randomly drawn small subsets of 25, 50, and 100 samples. Hence, we propose that in steroid hormone data with ≤40% UD samples, substituting UD with DL/√2 is a simple, versatile, and reasonably accurate method to minimize left-censoring bias, allowing for data analysis with the full data set.
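A minimal sketch of the substitution strategies compared above, assuming undetectable results are coded as NaN and the detection limit is known; per the abstract, DL/√2 substitution was the least biased option.

```python
import numpy as np

def substitute_undetectable(values, dl, method="dl_sqrt2"):
    """Replace results below the detection limit (DL) with a numeric value,
    mirroring the five approaches compared in the abstract.
    `values` codes undetectable samples as np.nan (an assumed convention)."""
    ud = np.isnan(values)
    out = values.copy()
    if method == "drop":
        return out[~ud]                 # delete undetectable samples
    repl = {"dl": dl, "dl_sqrt2": dl / np.sqrt(2), "dl_2": dl / 2, "zero": 0.0}[method]
    out[ud] = repl
    return out

# Example: mean and SD under the DL/sqrt(2) substitution
x = np.array([0.8, np.nan, 1.4, np.nan, 2.1])
filled = substitute_undetectable(x, dl=0.5, method="dl_sqrt2")
print(filled.mean(), filled.std(ddof=1))
```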



2008 ◽  
Vol 20 (2) ◽  
pp. 374-382 ◽  
Author(s):  
Tobias Glasmachers ◽  
Christian Igel

Iterative learning algorithms that approximate the solution of support vector machines (SVMs) have two potential advantages. First, they allow online and active learning. Second, for large data sets, computing the exact SVM solution may be too time-consuming, and an efficient approximation can be preferable. The powerful LASVM iteratively approaches the exact SVM solution using sequential minimal optimization (SMO). It allows efficient online and active learning. Here, this algorithm is considerably improved in speed and accuracy by replacing the working set selection in the SMO steps. A second-order working set selection strategy, which greedily aims at maximizing the progress in each single step, is incorporated.
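A simplified sketch of second-order working-set selection for SMO, in the spirit of the strategy described above: the first index is the most violating sample, and the partner is chosen to maximize the predicted gain (g_i − g_j)² / (2(K_ii + K_jj − 2K_ij)). The gradient vector, feasibility masks, and kernel matrix are assumed inputs; real solvers add curvature safeguards, caching, and shrinking.

```python
import numpy as np

def second_order_working_set(grad, K, upper_mask, lower_mask, tau=1e-12):
    """Pick (i, j) for one SMO step: i is the most violating index among
    samples that can move up; j maximizes the second-order gain among
    samples that can move down. Simplified illustration only."""
    up = np.where(upper_mask)[0]
    i = up[np.argmax(grad[up])]
    cand = np.where(lower_mask & (grad < grad[i]))[0]
    if cand.size == 0:
        return None                                   # optimality reached
    diff = grad[i] - grad[cand]
    curv = np.maximum(K[i, i] + K[cand, cand] - 2.0 * K[i, cand], tau)
    j = cand[np.argmax(diff * diff / (2.0 * curv))]
    return i, j
```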



PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254594
Author(s):  
Leonardo X. Espín ◽  
Anders J. Asp ◽  
James K. Trevathan ◽  
Kip A. Ludwig ◽  
J. Luis Lujan

Modern techniques for estimating basal levels of electroactive neurotransmitters rely on the measurement of oxidative charges. This requires time integration of oxidation currents at certain intervals. Unfortunately, the selection of integration intervals relies on ad-hoc visual identification of peaks on the oxidation currents, which introduces sources of error and precludes the development of automated procedures necessary for analysis and quantification of neurotransmitter levels in large data sets. In an effort to improve charge quantification techniques, here we present novel methods for automatic selection of integration boundaries. Our results show that these methods allow quantification of oxidation reactions both in vitro and in vivo and of multiple analytes in vitro.
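As a stand-in for the automatic boundary selection described above (the paper's actual methods are not reproduced here), the sketch below locates the main oxidation peak, walks outward until the current returns to a baseline, and integrates the enclosed charge with the trapezoid rule.

```python
import numpy as np

def oxidation_charge(t, current, baseline=0.0):
    """Illustrative automatic integration-boundary selection: find the largest
    peak in the oxidation current, expand left and right until the trace drops
    back to `baseline`, and integrate the enclosed area (charge).
    A simplified assumption-laden sketch, not the published method."""
    peak = int(np.argmax(current))
    left = peak
    while left > 0 and current[left - 1] > baseline:
        left -= 1
    right = peak
    while right < len(current) - 1 and current[right + 1] > baseline:
        right += 1
    seg_t = t[left:right + 1]
    seg_i = current[left:right + 1] - baseline
    # Trapezoid rule over the selected window
    return float(np.sum(0.5 * (seg_i[1:] + seg_i[:-1]) * np.diff(seg_t)))
```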



Author(s):  
Kim Wallin

The standard Master Curve (MC) deals only with materials assumed to be homogeneous, but MC analysis methods for inhomogeneous materials have also been developed. The bi-modal and multi-modal analysis methods in particular are becoming increasingly standard. Their drawback is that they are generally reliable only with sufficiently large data sets (number of valid tests, r ≥ 15–20). Here, the possibility of using the multi-modal analysis method with smaller data sets is assessed, and a new procedure to conservatively account for possible inhomogeneities is proposed.
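For context, a compact sketch of the standard (homogeneous) Master Curve fit that the bi-modal and multi-modal methods generalize: the reference temperature T0 is estimated by maximizing a Weibull likelihood (shape 4, K_min = 20 MPa·m^0.5), here via a simple grid search assuming uncensored, size-adjusted K_Jc data. The censored-data estimating equation of the standard procedure and the mixture likelihoods of the inhomogeneous methods are not reproduced.

```python
import numpy as np

def estimate_T0(K_jc, T_test, K_min=20.0):
    """Grid-search maximum-likelihood estimate of T0 for the homogeneous
    Master Curve: K_Jc ~ Weibull(shape 4) with scale K0 = 31 + 77*exp(0.019*(T - T0)).
    Uncensored, size-adjusted data are assumed; illustrative only."""
    best_T0, best_ll = None, -np.inf
    for T0 in np.arange(-150.0, 100.0, 0.5):
        K0 = 31.0 + 77.0 * np.exp(0.019 * (T_test - T0))   # Weibull scale per test temperature
        z = (K_jc - K_min) / (K0 - K_min)
        ll = np.sum(np.log(4.0) + 3.0 * np.log(K_jc - K_min)
                    - 4.0 * np.log(K0 - K_min) - z ** 4)
        if ll > best_ll:
            best_T0, best_ll = T0, ll
    return best_T0
```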



2021 ◽  
Author(s):  
Hideharu Yonebayashi ◽  
Atsushi Kobayashi ◽  
Susumu Hirano ◽  
Masami Okawara ◽  
Takao Iwata

Abstract As part of a laboratory Health, Safety and Environment (HSE) management system, working environment control is applied to eliminate various occupational hazards for workers. This control is a continuous effort in our petroleum R&D laboratory as part of the working environment management system. Within this system, workplace inspection is a regular HSE activity. Although traditional and well established, workplace inspection has been continuously improved and optimized in terms of inspection design, selection of inspection members, the checklist, and feedback. To make these continual improvement practices more practical and effective, workplace features such as the laboratory-specific environment and ad hoc research programs have been incorporated into the inspection design. All findings are summarized immediately after every inspection, and the types of risks hidden in the findings and the necessary corrective actions are then discussed. Findings, risks, and corrective measures are all swiftly shared with all employees in the workplace. The checklist format has been optimized both for easier recording by inspectors and for correct feedback to the responsible personnel so that the right countermeasures can be taken. The paper analyzes a large data set of workplace inspection results from the past 10 years. The analysis reveals that hazardous sources have decreased in recent years because of the maturity of the HSE culture in our laboratory. A combined cycle of inspection activity and data analysis is useful for understanding the current status of working environment control and for planning further updates. This paper discusses a practical example of a laboratory HSE management system at both detailed and high levels. Furthermore, the potential of using artificial intelligence and deep learning for future workplace inspection is discussed. This forward-looking discussion helps refresh employees' traditional mindset.


