331 Efficient quality control methods for genomic and pedigree data used in routine genomic evaluation

2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 50-51
Author(s):  
Yutaka Masuda ◽  
Andres Legarra ◽  
Ignacio Aguilar ◽  
Ignacy Misztal

Abstract Quality control and consistency tests on genotypes and historical pedigree data are applied in routine genomic evaluation and in academic research. Quality control takes longer as more genotypes become available, and this step is a bottleneck in the routine evaluation pipeline. For efficient quality control, we have developed several algorithms and a computer program, QCF90, that supports large-scale data on biallelic single nucleotide polymorphisms (SNPs). The program is designed to detect unsatisfactory genomic markers and individuals in terms of call rate, marker allele frequencies, duplicate samples, and Mendelian inconsistency in large genomic data sets with pedigrees that include millions of individuals. Duplicated genotypes can be detected using a subset of markers. Each SNP genotype is packed into a 2-bit representation in memory, which enables bitwise operations with parallel computing to perform quality control efficiently. The software optionally checks pedigree information for inconsistencies. We compared QCF90 with PREGSF90, a preceding program, in terms of memory usage and computing time using a data set including 200,000 genotyped individuals, 50,000 SNP markers per individual, and 216,500 pedigree individuals. Using only 1 thread, QCF90 was more than 6 times faster than PREGSF90 in total running time (307 s vs 2075 s) and used over 30 times less memory (2 GB vs 75 GB). QCF90 became faster as more threads were used. A check for duplicated genotypes took 159 s with 16 threads when 5,000 genotypes were compared with 200,000 genotypes using 2,500 SNP markers. The new tool is useful in routine genomic evaluation and in academic research in which both genotypes and pedigree information are used. The QCF90 executable is available at http://nce.ads.uga.edu with a user manual.
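The abstract states only that genotypes are packed into 2 bits and processed with bitwise operations; QCF90's internals are not described. As a minimal sketch of that general idea (not QCF90's implementation), the Python below assumes a hypothetical coding of 0, 1, 2 copies of the counted allele with 3 for a missing call, packs one sample into a Python integer, and derives call rate, duplicate concordance, a parent-offspring opposing-homozygote count, and per-SNP allele frequency with bit masks. All function names are illustrative.

```python
# Sketch of 2-bit genotype packing with bitwise QC checks.
# Assumed coding (hypothetical, not QCF90's format): 0, 1, 2 = copies of the
# counted allele, 3 = missing. SNP i occupies bits 2i (low) and 2i+1 (high).
MISSING = 3


def pack(genotypes):
    """Pack a sequence of genotype codes into one Python integer."""
    packed = 0
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, MISSING):
            raise ValueError(f"invalid genotype code {g!r} at SNP {i}")
        packed |= g << (2 * i)
    return packed


def low_mask(n_snps):
    """Integer with only the low bit of each 2-bit field set."""
    return int("01" * n_snps, 2) if n_snps else 0


def popcount(x):
    # Portable bit count; int.bit_count() exists from Python 3.10 onward.
    return bin(x).count("1")


def is_code(packed, code, low):
    """Flag (at each field's low bit) every field equal to `code` (0..3)."""
    hi = packed >> 1                       # high bit of field i moved to bit 2i
    lo_ok = packed if code & 1 else ~packed
    hi_ok = hi if code & 2 else ~hi
    return lo_ok & hi_ok & low             # masking with `low` keeps the result non-negative


def call_rate(packed, n_snps):
    """Fraction of SNPs with a non-missing call for one sample."""
    return 1 - popcount(is_code(packed, MISSING, low_mask(n_snps))) / n_snps


def concordance(a, b, n_snps):
    """Share of SNPs called in both samples with identical codes (duplicate check)."""
    low = low_mask(n_snps)
    both_called = low & ~(is_code(a, MISSING, low) | is_code(b, MISSING, low))
    x = a ^ b
    differs = (x | (x >> 1)) & low         # flag fields where either bit differs
    n_compared = popcount(both_called)
    if n_compared == 0:
        return float("nan")
    return 1 - popcount(differs & both_called) / n_compared


def opposing_homozygotes(parent, child, n_snps):
    """Count SNPs where parent and offspring are opposite homozygotes (0 vs 2),
    a simple Mendelian inconsistency check."""
    low = low_mask(n_snps)
    zp, tp = is_code(parent, 0, low), is_code(parent, 2, low)
    zc, tc = is_code(child, 0, low), is_code(child, 2, low)
    return popcount((zp & tc) | (tp & zc))


def allele_frequency(samples, snp):
    """Frequency of the counted allele at one SNP across packed samples, ignoring missing calls."""
    codes = [(s >> (2 * snp)) & 3 for s in samples]
    called = [c for c in codes if c != MISSING]
    return sum(called) / (2 * len(called)) if called else float("nan")
```

For example, with `g1 = pack([0, 1, 2, 3, 2])` and `g2 = pack([0, 1, 2, 2, 1])`, `call_rate(g1, 5)` is 0.8 and `concordance(g1, g2, 5)` is 0.75, because SNP 3 is skipped (missing in `g1`) and SNP 4 differs. A duplicate screen would flag pairs whose concordance exceeds some threshold; the threshold and the marker subset are choices this sketch leaves open.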

Animals ◽  
2020 ◽  
Vol 10 (8) ◽  
pp. 1310
Author(s):  
Enrico Mancin ◽  
Michela Ablondi ◽  
Roberto Mantovani ◽  
Giuseppe Pigozzi ◽  
Alberto Sabbioni ◽  
...  

This study aimed to investigate the genetic diversity of the Italian Heavy Horse breed from pedigree and genomic data. Pedigree information for 64,917 individuals was used to assess the inbreeding level, effective population size (Ne), and effective numbers of founders and ancestors (fa/fe). Genotypic information from SNP markers was available for 267 individuals of both sexes; it allowed genomic inbreeding to be estimated by two methods (observed versus expected homozygosity, and runs of homozygosity [ROH]) and the breed's genomic structure and possible selection signatures to be studied. Pedigree and genomic inbreeding were strongly correlated (0.65 on average). Inbreeding increased over time, except in periods when the base population was enlarged, during which Ne also increased. As the fa/fe values showed, no recent bottlenecks occurred. Observed homozygosity was on average lower than expected, probably because French Breton stallions were used to support the breed's genetic variability. Highly homozygous regions suggested that inbreeding increased in different periods. Two subpopulations were distinguished, probably owing to breeders' differing use of French animals. Few selection signatures were found at the population level, with possible associations with disease resistance. The relatively low inbreeding rate suggests that, despite the small size of the breed, conservation actions are not yet required.
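The abstract names the two genomic inbreeding estimators without giving formulas. A minimal sketch, assuming the common definitions rather than the study's exact procedure: F_HOM as the method-of-moments coefficient (O − E)/(N − E), where O is the observed number of homozygous SNPs, N the number of called SNPs and E = Σ(1 − 2p(1 − p)); F_ROH as total ROH length divided by the genome length covered; and Ne from the rate of inbreeding per generation as Ne = 1/(2ΔF). Input layouts and function names are hypothetical.

```python
def f_hom(genotypes, allele_freqs):
    """Method-of-moments inbreeding from observed vs expected homozygosity.

    genotypes: 0/1/2 allele counts for one animal, None for missing calls.
    allele_freqs: population frequency p of the counted allele at each SNP.
    """
    observed = expected = n = 0
    for g, p in zip(genotypes, allele_freqs):
        if g is None:
            continue
        n += 1
        observed += g != 1                  # 0 or 2 counts as homozygous
        expected += 1 - 2 * p * (1 - p)     # Hardy-Weinberg homozygosity
    if n - expected == 0:
        raise ValueError("no informative SNPs")
    return (observed - expected) / (n - expected)


def f_roh(roh_segments, genome_length):
    """Fraction of the genome covered by ROH; segments as (start, end) in the same units."""
    covered = sum(end - start for start, end in roh_segments)
    return covered / genome_length


def ne_from_delta_f(delta_f):
    """Effective population size from the per-generation rate of inbreeding."""
    if delta_f <= 0:
        raise ValueError("rate of inbreeding must be positive")
    return 1 / (2 * delta_f)
```

A negative F_HOM means fewer homozygotes than expected, which is the pattern the abstract reports and attributes to the use of French Breton stallions.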


Author(s):  
Wanshan Ning ◽  
Peiran Jiang ◽  
Yaping Guo ◽  
Chenwei Wang ◽  
Xiaodan Tan ◽  
...  

Abstract As an important reversible lipid modification, S-palmitoylation mainly occurs at specific cysteine residues in proteins, participates in regulating various biological processes, and is associated with human diseases. Besides experimental assays, computational prediction of S-palmitoylation sites can efficiently generate helpful candidates for further experimental consideration. Here, we reviewed the current progress in the development of S-palmitoylation site predictors, as well as the training data sets, informative features, and algorithms used in these tools. Then, we compiled a benchmark data set containing 3098 known S-palmitoylation sites identified from small- or large-scale experiments, and developed a new method named data quality discrimination (DQD) to assign distinct data quality weights (DQWs) to the two types of sites. Besides DQD and our previous methods, we encoded sequence similarity values into images, constructed a deep learning framework of convolutional neural networks (CNNs), and developed a novel algorithm, graphic presentation system (GPS) 6.0. We further integrated nine additional types of sequence-based and structural features, implemented parallel CNNs (pCNNs), and designed a new predictor called GPS-Palm. Compared with other existing tools, GPS-Palm showed a >31.3% improvement in the area under the curve (AUC) value (0.855 versus 0.651) for general prediction of S-palmitoylation sites. We also produced two species-specific predictors, with AUC values of 0.900 and 0.897 for predicting human- and mouse-specific sites, respectively. GPS-Palm is free for academic research at http://gpspalm.biocuckoo.cn/.
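The comparison above rests on AUC values. As a sketch of how such a number can be computed (not GPS-Palm's evaluation code), the AUC equals the probability that a randomly chosen true site outscores a randomly chosen non-site, with ties counted as one half. The helper that extracts cysteine-centred peptide windows, its flank length, and the `*` padding character are hypothetical choices for illustration only.

```python
def cysteine_windows(seq, flank=10, pad="*"):
    """Return (position, peptide) pairs centred on each cysteine, padded at the termini."""
    padded = pad * flank + seq + pad * flank
    return [(i, padded[i:i + 2 * flank + 1]) for i, aa in enumerate(seq) if aa == "C"]


def auc(scores_pos, scores_neg):
    """ROC AUC as the Mann-Whitney probability that a positive outscores a negative."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5                 # ties count half
    return wins / (len(scores_pos) * len(scores_neg))


def relative_improvement(new, old):
    """Relative gain of `new` over `old`, e.g. 0.313 for a 31.3% improvement."""
    return (new - old) / old
```

With the AUC values reported above, `relative_improvement(0.855, 0.651)` is about 0.313, matching the stated 31.3% improvement. The pairwise loop is quadratic; a rank-based formula gives the same value faster for large benchmark sets.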


2019 ◽  
Author(s):  
Boglárka Vincze ◽  
Márta Varga ◽  
András Gáspárdy ◽  
Orsolya Kutasi ◽  
Petra Zenke ◽  
...  

Abstract Equine grass sickness (also known as dysautonomia) is a life-threatening polyneuropathic disease of horses with approximately 80% mortality. Since its first description over a hundred years ago, several factors (including phenotypic, environmental, management, climatic, and intestinal microbiome factors) have been associated with increased risk of dysautonomia. Despite extensive research, however, its causative factors have not yet been identified. A retrospective, pedigree- and phenotype-based genetic epidemiological study was performed to analyze the associations between disease occurrence and kinship in a large-scale Hungarian stud. The pedigree data set, containing 1233 horses with 49 affected animals, was used in the analysis. The first finding was that the proportion of affected animals among the descendants of some stallions is unexpectedly high, with a maximum of 25% of a stallion's descendants affected. Animals with affected siblings have higher odds of being a case (OR: 1.27, 95% CI: 1.01-1.57, p=0.033). Among males in the affected population, the odds of dysautonomia are higher than in females (OR: 1.76, 95% CI: 0.95-3.29, p=0.057). Significant familial clustering was observed among the affected animals (GIF p=0.001). Further subgroups with significant (p<0.001) aggregation among close relatives were identified using kinship-based methods. Our analysis of the data and the observed higher disease frequency in males suggest that X-linked recessive inheritance may be a causal factor in dysautonomia. This is the first study to provide ancestry data and to suggest a genetic contribution to the likely multifactorial causes of the disease.
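The abstract reports odds ratios with 95% confidence intervals but does not say how they were computed. For a single 2x2 table, one standard option is the Woolf (logit) interval, sketched below as an assumption rather than the study's method; the cell counts in the usage line are hypothetical, not the study's data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio with a Woolf (logit) confidence interval for a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    z = 1.959964 gives a two-sided 95% interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; consider a continuity correction or an exact method")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Hypothetical counts, e.g. affected/unaffected males vs affected/unaffected females.
print(odds_ratio_ci(10, 20, 5, 40))
```

The interval is symmetric on the log scale, so its bounds multiply to OR²; an interval that includes 1, as for the sex comparison above (0.95-3.29), is consistent with a p-value above 0.05.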


2016 ◽  
Author(s):  
Joseph N. Paulson ◽  
Cho-Yi Chen ◽  
Camila M. Lopes-Ramos ◽  
Marieke L Kuijjer ◽  
John Platig ◽  
...  

Abstract Although ultrahigh-throughput RNA-Sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-Seq studies profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are generating ever-larger data sets comprising RNA-Seq data from hundreds or thousands of samples, often collected at multiple centers and from diverse tissues. These complex data sets present significant analytical challenges due to batch and tissue effects, but they provide the opportunity to revisit the assumptions and methods that we use to preprocess, normalize, and filter RNA-Seq data, which are critical first steps for any subsequent analysis. We find that analysis of large RNA-Seq data sets requires both careful quality control and accounting for sparsity due to the heterogeneity intrinsic in multi-group studies. An R package instantiating our method for large-scale RNA-Seq normalization and preprocessing, YARN, is available at bioconductor.org/packages/yarn.

Highlights
Overview of assumptions used in preprocessing and normalization
Pipeline for preprocessing, quality control, and normalization of large heterogeneous data
A Bioconductor package for the YARN pipeline and easy manipulation of count data
Preprocessed GTEx data set using the YARN pipeline available as a resource
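Accounting for sparsity in multi-group data can be illustrated with a group-aware expression filter. The sketch below is a generic illustration, not YARN's R functions: it assumes a gene should be kept if, within at least one group (for example a tissue), a minimum number of samples reach a counts-per-million threshold. Both thresholds are hypothetical defaults.

```python
def cpm(counts):
    """Counts per million for one sample (list of gene counts)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has zero total counts")
    return [c * 1e6 / total for c in counts]


def group_aware_filter(count_matrix, groups, min_cpm=1.0, min_samples=3):
    """Return indices of genes to keep.

    count_matrix: list of samples, each a list of gene counts in the same gene order.
    groups: group (e.g. tissue) label for each sample.
    A gene is kept if at least `min_samples` samples of some single group have
    CPM >= `min_cpm`, so genes expressed in only one tissue are not discarded
    as sparse just because most samples in the whole study lack them.
    """
    cpms = [cpm(sample) for sample in count_matrix]
    n_genes = len(count_matrix[0])
    keep = []
    for g in range(n_genes):
        expressed_per_group = {}
        for sample_cpm, label in zip(cpms, groups):
            if sample_cpm[g] >= min_cpm:
                expressed_per_group[label] = expressed_per_group.get(label, 0) + 1
        if any(n >= min_samples for n in expressed_per_group.values()):
            keep.append(g)
    return keep
```

A global filter requiring expression in most samples would drop a gene expressed in only one tissue; counting within each group keeps it, which is the kind of heterogeneity the abstract warns about.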


1966 ◽  
Vol 05 (02) ◽  
pp. 67-74 ◽  
Author(s):  
W. I. Lourie ◽  
W. Haenszel

Quality control of data collected in the United States by the Cancer End Results Program, using punchcards prepared by participating registries in accordance with a Uniform Punchcard Code, is discussed. Existing arrangements decentralize responsibility for editing and related data processing to the local registries, with tabulating and statistical services centralized in the End Results Section, National Cancer Institute. The most recent deck of punchcards represented over 600,000 cancer patients; approximately 50,000 newly diagnosed cases are added annually. Mechanical editing and inspection of punchcards and field audits are the principal tools for quality control. Mechanical editing of the punchcards includes testing for blank entries and detecting inadmissible or inconsistent codes. Highly improbable codes are subjected to special scrutiny. Field audits include drawing a 1-10 percent random sample of the punchcards submitted by a registry; the charts are then reabstracted and recoded by an NCI staff member, and differences between the punchcard and the results of the independent review are noted.
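The two quality control tools described above, mechanical editing and field-audit sampling, can be sketched in modern terms. The field names, admissible code sets, and the date-consistency rule below are hypothetical and are not taken from the Uniform Punchcard Code; the sampling helper simply enforces the 1-10 percent range mentioned in the text.

```python
import random

# Hypothetical admissible codes per field (illustration only).
ALLOWED_CODES = {"sex": {"1", "2"}, "stage": {"1", "2", "3", "9"}}


def edit_record(record):
    """Mechanical edit of one record (dict of field -> code string).

    Flags blank entries, inadmissible codes, and one example inconsistency.
    """
    problems = []
    for field, allowed in ALLOWED_CODES.items():
        code = record.get(field, "").strip()
        if not code:
            problems.append(f"{field}: blank")
        elif code not in allowed:
            problems.append(f"{field}: inadmissible code {code!r}")
    dx = record.get("year_dx", "").strip()
    death = record.get("year_death", "").strip()
    if not dx:
        problems.append("year_dx: blank")
    elif death and dx.isdigit() and death.isdigit() and int(death) < int(dx):
        # Hypothetical consistency rule: death cannot precede diagnosis.
        problems.append("year_death before year_dx: inconsistent")
    return problems


def audit_sample(records, fraction, seed=None):
    """Draw a simple random sample of records for independent reabstraction."""
    if not 0.01 <= fraction <= 0.10:
        raise ValueError("fraction outside the 1-10 percent range")
    k = max(1, round(len(records) * fraction))
    return random.Random(seed).sample(records, k)
```

Records returned by `edit_record` with a non-empty problem list would go back to the registry, while `audit_sample` supplies the charts to be recoded and compared with the submitted cards.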


2009 ◽  
Vol 28 (11) ◽  
pp. 2737-2740
Author(s):  
Xiao ZHANG ◽  
Shan WANG ◽  
Na LIAN

Author(s):  
Eun-Young Mun ◽  
Anne E. Ray

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples, together with systematic study-level missing data, are significant barriers to IDA and, more broadly, to large-scale research synthesis. The authors' experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, also indicates that IDA investigations require a wide range of expertise and considerable resources, and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.


Author(s):  
Jeasik Cho

This book provides the qualitative research community with some insight into how to evaluate the quality of qualitative research. This topic has gained little attention during the past few decades. We, qualitative researchers, read journal articles, serve on masters' and doctoral committees, and also make decisions on whether conference proposals, manuscripts, or large-scale grant proposals should be accepted or rejected. It is assumed that various perspectives or criteria, depending on various paradigms, theories, or fields of discipline, have been used in assessing the quality of qualitative research. Nonetheless, until now, no textbook has been specifically devoted to exploring theories, practices, and reflections associated with the evaluation of qualitative research. This book constructs a typology of evaluating qualitative research, examines actual information from websites and qualitative journal editors, and reflects on some challenges that are currently encountered by the qualitative research community. Review guidelines from many different kinds of journals, along with available assessment tools, are collected and analyzed. Consequently, core criteria that stand out among these evaluation tools are presented. Readers are invited to join the author to confidently proclaim: "Fortunately, there are commonly agreed, bold standards for evaluating the goodness of qualitative research in the academic research community. These standards are a part of what is generally called 'scientific research.'"


2020 ◽  
Vol 47 (3) ◽  
pp. 547-560 ◽  
Author(s):  
Darush Yazdanfar ◽  
Peter Öhman

Purpose
The purpose of this study is to empirically investigate determinants of financial distress among small and medium-sized enterprises (SMEs) during the global financial crisis and post-crisis periods.

Design/methodology/approach
Several statistical methods, including multiple binary logistic regression, were used to analyse a longitudinal cross-sectional panel data set of 3,865 Swedish SMEs operating in five industries over the 2008–2015 period.

Findings
The results suggest that financial distress is influenced by macroeconomic conditions (i.e. the global financial crisis) and, in particular, by various firm-specific characteristics (i.e. performance, financial leverage and financial distress in the previous year). However, firm size and industry affiliation have no significant relationship with financial distress.

Research limitations
Due to data availability, this study is limited to a sample of Swedish SMEs in five industries covering eight years. Further research could examine the generalizability of these findings by investigating firms operating in other industries and other countries.

Originality/value
This study is the first to examine determinants of financial distress among SMEs operating in Sweden using data from a large-scale longitudinal cross-sectional database.
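The study's main tool is multiple binary logistic regression. As a self-contained sketch of that technique (not the authors' model specification or data), the code below fits a logistic regression by Newton-Raphson on simulated firms. The predictors (performance, leverage, prior-year distress) mirror the variable types named in the findings, but the simulated data and the "true" coefficients are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear or separated predictors?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iterations=25, tol=1e-10):
    """Binary logistic regression by Newton-Raphson; returns [intercept, b1, ..., bk]."""
    rows = [[1.0] + list(r) for r in X]       # prepend intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k                       # X'(y - p)
        info = [[0.0] * k for _ in range(k)]   # X'WX, W = p(1 - p)
        for r, yi in zip(rows, y):
            p = sigmoid(sum(b * x for b, x in zip(beta, r)))
            w = p * (1 - p)
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += w * r[i] * r[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def simulate_firms(n, seed=0):
    """Hypothetical firms: standardized performance, leverage, prior-year distress (0/1)."""
    rng = random.Random(seed)
    true_beta = [-2.0, -1.5, 2.0, 1.5]        # illustrative values only
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1), rng.gauss(0, 1), 1.0 if rng.random() < 0.2 else 0.0]
        eta = true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], row))
        X.append(row)
        y.append(1 if rng.random() < sigmoid(eta) else 0)
    return X, y
```

With `X, y = simulate_firms(4000, seed=1)`, `fit_logistic(X, y)` recovers coefficients close to the simulated ones: a negative sign for performance and positive signs for leverage and prior distress. In practice a statistics package would also report standard errors and significance tests.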


2020 ◽  
Vol 72 (1) ◽  
Author(s):  
Chao Xiong ◽  
Claudia Stolle ◽  
Patrick Alken ◽  
Jan Rauberg

Abstract In this study, we have derived field-aligned currents (FACs) from magnetometers onboard the Defense Meteorological Satellite Program (DMSP) satellites. The magnetic latitude versus local time distribution of FACs from DMSP shows dependences on the intensity and orientation of the interplanetary magnetic field (IMF) By and Bz components that are comparable with previous findings, which confirms the reliability of the DMSP FAC data set. With simultaneous measurements of precipitating particles from DMSP, we further investigate the relation between large-scale FACs and precipitating particles. Our results show that precipitating electron and ion fluxes both increase in magnitude and extend to lower latitudes for enhanced southward IMF Bz, similar to the behavior of FACs. Under weak northward and southward Bz conditions, the locations of the R2 current maxima, on both the dusk and dawn sides and in both hemispheres, are found to be close to the maxima of the particle energy fluxes, while for the same IMF conditions, R1 currents are displaced farther from the respective particle flux peaks. The largest displacement (about 3.5°) is found between the downward R1 current and the ion flux peak on the dawn side. Our results suggest that there exist systematic differences in the locations of electron/ion precipitation and large-scale upward/downward FACs. As outlined by the statistical means of these two parameters, the FAC peaks enclose the particle energy flux peaks in an auroral band on both the dusk and dawn sides. Our comparisons also found that particle precipitation at dawn and dusk and in both hemispheres maximizes near the mean R2 current peaks. The particle precipitation flux maxima closer to the R1 current peaks are lower in magnitude. This is opposite to the known feature that R1 currents are on average stronger than R2 currents.
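The abstract does not give the FAC derivation. A common single-satellite approach, assumed here rather than taken from the paper, treats the current as an infinite sheet crossed by the spacecraft, so the current density is J = (1/μ0) ΔB_cross / Δx with along-track distance Δx = v Δt. The sketch omits field-inclination corrections and leaves the sign convention (which direction counts as upward) to the caller; the 7.5 km/s speed in the usage note is a hypothetical value. A second helper computes a peak latitude so that FAC and particle-flux peaks can be compared, as in the displacement analysis above.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m (classical value)


def fac_from_track(b_cross_nt, dt_s, speed_km_s):
    """Sheet-current FAC density (uA/m^2) between consecutive magnetometer samples.

    b_cross_nt: cross-track magnetic perturbation in nT, sampled every dt_s seconds.
    Assumes an infinite current sheet varying only along track: J = (1/mu0) dB/dx.
    """
    dx_m = speed_km_s * 1e3 * dt_s            # along-track distance per sample
    out = []
    for b0, b1 in zip(b_cross_nt, b_cross_nt[1:]):
        db_tesla = (b1 - b0) * 1e-9
        out.append(db_tesla / (MU0 * dx_m) * 1e6)  # A/m^2 -> uA/m^2
    return out


def peak_latitude(latitudes, values):
    """Latitude at which |values| is largest (e.g. a FAC or energy-flux peak)."""
    i = max(range(len(values)), key=lambda k: abs(values[k]))
    return latitudes[i]
```

For instance, a 100 nT change over one second at 7.5 km/s gives about 10.6 µA/m². Subtracting the `peak_latitude` of a particle energy-flux profile from that of a FAC profile gives a displacement of the kind reported above (about 3.5° at most).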

