Assessing the effect of phenotyping scoring systems and SNP calling and filtering methods on detection of QTL associated with reaction of Brassica napus to Sclerotinia sclerotiorum

Author(s):  
Fereshteh Shahoveisi ◽  
Atena Oladzad ◽  
Luis E. del Rio Mendoza ◽  
Seyedali Hosseinirad ◽  
Susan Ruud ◽  
...  

The polyploid nature of canola (Brassica napus) represents a challenge for the accurate identification of single nucleotide polymorphisms (SNPs) and the detection of quantitative trait loci (QTL). In this study, combinations of eight phenotyping scoring systems and six SNP calling and filtering parameters were evaluated for their efficiency in detecting QTL associated with response to Sclerotinia stem rot, caused by Sclerotinia sclerotiorum, in two doubled haploid (DH) canola mapping populations. Most QTL were detected in the lesion length, relative area under the disease progress curve (rAUDPC) for lesion length, and binomial plant mortality data sets. Binomial data derived from lesion size were less efficient in QTL detection. Inclusion of additional phenotypic data sets in the analysis increased the number of significant QTL by 2.3-fold; however, the continuous data sets were more efficient. Of the two filtering parameters used to analyze genotyping by sequencing (GBS) data, imputation of missing data increased QTL detection in one population with a high level of missing data but not in the other. Inclusion of segregation-distorted SNPs increased QTL detection but did not significantly affect their R² values. Twelve of the 16 detected QTL were on chromosomes A02 and C01, and the rest were on A07, A09, and C03. Marker A02-7594120, associated with a QTL on chromosome A02, was detected in both populations. Results of this study suggest that the impact of genotypic variant calling and filtering parameters may be population dependent, whereas deriving additional phenotyping scoring systems such as rAUDPC and binary mortality data sets may improve QTL detection efficiency.
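The rAUDPC score mentioned above can be illustrated with a minimal Python sketch. The abstract does not state the formula the authors used, so this assumes the common trapezoidal AUDPC, divided by the largest area possible over the same period. The function names, the sample lesion lengths, and the `max_score` ceiling are all hypothetical.

```python
def audpc(times, scores):
    """Trapezoidal area under the disease progress curve."""
    if len(times) != len(scores) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    area = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Mean of two consecutive scores times the interval between them.
        area += (scores[i] + scores[i + 1]) / 2.0 * dt
    return area


def relative_audpc(times, scores, max_score):
    """AUDPC divided by the largest area possible over the same period.

    max_score is an assumed ceiling for the trait (hypothetical here).
    """
    return audpc(times, scores) / (max_score * (times[-1] - times[0]))


# Hypothetical lesion lengths (mm) at 3, 5, and 7 days after inoculation.
days = [3, 5, 7]
lesion_mm = [10.0, 30.0, 50.0]
print(audpc(days, lesion_mm))                  # 120.0
print(relative_audpc(days, lesion_mm, 100.0))  # 0.3
```

Dividing by the maximum possible area puts lines scored over different time windows on a common 0 to 1 scale, which is one reason a relative score can be convenient as a continuous trait.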

Author(s):  
Daniel M Portik ◽  
John J Wiens

Alignment is a crucial issue in molecular phylogenetics because different alignment methods can potentially yield very different topologies for individual genes. But it is unclear if the choice of alignment methods remains important in phylogenomic analyses, which incorporate data from hundreds or thousands of genes. For example, problematic biases in alignment might be multiplied across many loci, whereas alignment errors in individual genes might become irrelevant. The issue of alignment trimming (i.e., removing poorly aligned regions or missing data from individual genes) is also poorly explored. Here, we test the impact of 12 different combinations of alignment and trimming methods on phylogenomic analyses. We compare these methods using published phylogenomic data from ultraconserved elements (UCEs) from squamate reptiles (lizards and snakes), birds, and tetrapods. We compare the properties of alignments generated by different alignment and trimming methods (e.g., length, informative sites, missing data). We also test whether these data sets can recover well-established clades when analyzed with concatenated (RAxML) and species-tree methods (ASTRAL-III), using the full data (~5000 loci) and subsampled data sets (10% and 1% of loci). We show that different alignment and trimming methods can significantly impact various aspects of phylogenomic data sets (e.g., length, informative sites). However, these different methods generally had little impact on the recovery and support values for well-established clades, even across very different numbers of loci. Nevertheless, our results suggest several “best practices” for alignment and trimming. Intriguingly, the choice of phylogenetic methods impacted the phylogenetic results most strongly, with concatenated analyses recovering significantly more well-established clades (with stronger support) than the species-tree analyses.
[Alignment; concatenated analysis; phylogenomics; sequence length heterogeneity; species-tree analysis; trimming]


2021 ◽  
Author(s):  
Trenton J. Davis ◽  
Tarek R. Firzli ◽  
Emily A. Higgins Keppler ◽  
Matt Richardson ◽  
Heather D. Bean

Missing data is a significant issue in metabolomics that is often neglected when conducting data pre-processing, particularly when it comes to imputation. This can have serious implications for downstream statistical analyses and lead to misleading or uninterpretable inferences. In this study, we aim to identify the primary types of missingness that affect untargeted metabolomics data and compare strategies for imputation using two real-world comprehensive two-dimensional gas chromatography (GC×GC) data sets. We also present these goals in the context of experimental replication, whereby imputation is conducted in a within-replicate-based fashion (the first description and evaluation of this strategy), and introduce an R package, MetabImpute, to carry out these analyses. Our results indicate that, in these two data sets, missingness was most likely of the missing at-random (MAR) and missing not-at-random (MNAR) types as opposed to missing completely at-random (MCAR). Gibbs sampler imputation and Random Forest gave the best results when imputing MAR and MNAR compared against single-value imputation (zero, minimum, mean, median, and half-minimum) and other more sophisticated approaches (Bayesian principal components analysis and quantile regression imputation for left-censored data). When samples are replicated, within-replicate imputation approaches led to an increase in the reproducibility of peak quantification compared to imputation that ignores replication, suggesting that imputing with respect to replication may preserve potentially important features in downstream analyses for biomarker discovery.
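To make the idea of imputing within replicates concrete, here is a minimal Python sketch of half-minimum imputation applied separately to each replicate group. It is not the MetabImpute API and not necessarily the authors' procedure. The function name, the grouping scheme, and the rule of leaving fully missing groups untouched are assumptions made for illustration.

```python
from collections import defaultdict


def half_min_within_replicates(values, groups):
    """Fill None with half the smallest observed value in the same group.

    Groups with no observed value at all are left as None (an assumed rule).
    """
    observed = defaultdict(list)
    for value, group in zip(values, groups):
        if value is not None:
            observed[group].append(value)

    filled = []
    for value, group in zip(values, groups):
        if value is None and observed[group]:
            filled.append(min(observed[group]) / 2.0)
        else:
            filled.append(value)
    return filled


# Hypothetical peak areas for one metabolite across three replicate groups.
peaks = [8.0, None, 6.0, None, None, 20.0]
groups = ["A", "A", "A", "B", "B", "C"]
print(half_min_within_replicates(peaks, groups))
# [8.0, 3.0, 6.0, None, None, 20.0]
```

In this sketch, group B stays entirely missing rather than being filled from other samples. That is one way a within-replicate approach could avoid creating a signal that none of the replicates supports.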


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9993
Author(s):  
Christian Niederwanger ◽  
Thomas Varga ◽  
Tobias Hell ◽  
Daniel Stuerzel ◽  
Jennifer Prem ◽  
...  

Background: Scores can assess the severity and course of disease and predict outcome in an objective manner. This information is needed for proper risk assessment and stratification. Furthermore, scoring systems support optimal patient care and resource management and are gaining in importance in terms of artificial intelligence. Objective: This study evaluated and compared the prognostic ability of various common pediatric scoring systems (PRISM, PRISM III, PRISM IV, PIM, PIM2, PIM3, PELOD, PELOD 2) in order to determine which is the most applicable score for pediatric sepsis patients in terms of timing of disease survey and insensitivity to missing data. Methods: We retrospectively examined data from 398 patients under 18 years of age who were diagnosed with sepsis. Scores were assessed at ICU admission and re-evaluated on the day of peak C-reactive protein. The scores were compared for their ability to predict mortality in this specific patient population and for their impairment due to missing data. Results: PIM (AUC 0.76 (0.68–0.76)), PIM2 (AUC 0.78 (0.72–0.78)) and PIM3 (AUC 0.76 (0.68–0.76)) scores together with PRISM III (AUC 0.75 (0.68–0.75)) and PELOD 2 (AUC 0.75 (0.66–0.75)) are the most suitable scores for determining patient prognosis at ICU admission. Once sepsis is pronounced, PELOD 2 (AUC 0.84 (0.77–0.91)) and PRISM IV (AUC 0.8 (0.72–0.88)) become significantly better in their performance and count among the best prognostic scores for use at this time together with PRISM III (AUC 0.81 (0.73–0.89)). PELOD 2 is good for monitoring and, like the PIM scores, is also largely insensitive to missing values. Conclusion: Overall, PIM scores show comparatively good performance, are stable as far as timing of the disease survey is concerned, and are also relatively stable in terms of missing parameters. PELOD 2 is best suited for monitoring the clinical course.
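The AUC values reported above measure how well a score separates patients who died from those who survived. As a minimal sketch, the Python below computes AUC from its rank interpretation: the share of (non-survivor, survivor) pairs in which the non-survivor has the higher score, with ties counted as half. The patient scores and outcomes are hypothetical, and the study's own software and confidence-interval method are not shown in the abstract.

```python
def auc(scores, died):
    """AUC as the probability that a non-survivor outscores a survivor."""
    positives = [s for s, d in zip(scores, died) if d]
    negatives = [s for s, d in zip(scores, died) if not d]
    if not positives or not negatives:
        raise ValueError("need at least one patient in each outcome group")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical severity scores and outcomes (1 = died, 0 = survived).
scores = [2, 5, 7, 3, 9, 5]
died = [0, 0, 1, 0, 1, 1]
print(round(auc(scores, died), 3))  # 0.944
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, and 1.0 to perfect separation. On that scale, the gap between values such as 0.75 and 0.84 is a difference in how often the score ranks a pair of patients correctly.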


2006 ◽  
Vol 57 (10) ◽  
pp. 1131 ◽  
Author(s):  
C. X. Li ◽  
Hua Li ◽  
K. Sivasithamparam ◽  
T. D. Fu ◽  
Y. C. Li ◽  
...  

Sclerotinia stem rot, caused by Sclerotinia sclerotiorum, has become one of the most serious disease problems in oilseed rape-growing areas in Australia. Sources of resistance to this disease have been sought worldwide. In this study, germplasm comprising 42 Brassica napus and 12 Brassica juncea accessions from China and Australia was screened for resistance to Sclerotinia stem rot under Western Australian field conditions. Resistance was confirmed in some germplasm from China, and new sources of resistance were identified in germplasm from Australia. Furthermore, our study found that the severity of stem lesions was related to stem diameter and to the percentage of host plants that were dead. Both stem lesion length and percentage of plant death were at their lowest when the stem diameter was approximately 10 mm. Smaller or greater stem diameters resulted in both increased stem lesion length and increased plant death. Stem diameter may be a useful parameter in breeding cultivars of oilseed Brassicas with Sclerotinia resistance.


Genes ◽  
2020 ◽  
Vol 11 (5) ◽  
pp. 581
Author(s):  
Jiayi Jiang ◽  
Xueli Liao ◽  
Xiaoyun Jin ◽  
Li Tan ◽  
Qifeng Lu ◽  
...  

Arabidopsis thaliana MYB43 (AtMYB43) is suggested to be involved in cell wall lignification. PtrMYB152, the Populus orthologue of AtMYB43, is a transcriptional activator of lignin biosynthesis and vessel wall deposition. In this research, MYB43 genes from Brassica napus (rapeseed) and its parental species B. rapa and B. oleracea were molecularly characterized; they were predominantly expressed in stem and other vascular organs and were responsive to Sclerotinia sclerotiorum infection. The BnMYB43 family was silenced by RNAi, and the transgenic rapeseed lines showed retarded growth and development, with smaller organs, reduced lodging resistance, fewer siliques and lower yield potential. The thickness of the xylem layer decreased by 28%; the numbers of sclerenchymatous cells, vessels, interfascicular fibers, sieve tubes and pith cells in the whole cross section of the stem decreased by 28%, 59%, 48%, 34% and 21% in these lines, respectively. The contents of cellulose and lignin decreased by 17.49% and 16.21%, respectively, while the pectin content increased by 71.92% in stems of RNAi lines. When inoculated with S. sclerotiorum, the lesion length was drastically decreased by 52.10% in the stems of transgenic plants compared with WT, implying a great increase in disease resistance. Correspondingly, changes in the gene expression patterns of lignin biosynthesis, cellulose biosynthesis, pectin biosynthesis, cell cycle, SA- and JA-signals, and defensive pathways were in accordance with the above phenotypic modifications. These results show that BnMYB43, being a growth-defense trade-off participant, positively regulates vascular lignification, plant morphology and yield potential, but negatively affects resistance to S. sclerotiorum. Moreover, this lignification activator influences cell biogenesis of both lignified and non-lignified tissues of the whole vascular organ.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yahya Albalawi ◽  
Jim Buckley ◽  
Nikola S. Nikolov

This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another for testing the generality of our models only. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier with F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best traditional classifier we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.
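Because the abstract reports both F1 score and accuracy, a short Python sketch may help show how the two are derived from the same confusion matrix and why they can differ. The counts below are hypothetical and are not taken from the paper. The function name is likewise an illustration only.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) for the positive class from confusion counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical test set: 40 health-related tweets among 200.
acc, f1 = accuracy_and_f1(tp=30, fp=10, fn=10, tn=150)
print(round(acc, 3), round(f1, 3))  # 0.9 0.75
```

Accuracy counts the many correctly rejected negatives, whereas F1 ignores true negatives altogether. On an imbalanced test set the two metrics can therefore rank classifiers differently.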


2021 ◽  
pp. 1-41
Author(s):  
W. Walker Hanlon ◽  
Casper Worm Hansen ◽  
Jake Kantor

Using novel weekly mortality data for London spanning 1866-1965, we analyze the changing relationship between temperature and mortality as the city developed. Our main results show that warm weeks led to elevated mortality in the late nineteenth century, mainly due to infant deaths from digestive diseases. However, this pattern largely disappeared after WWI as infant digestive diseases became less prevalent. The resulting change in the temperature-mortality relationship meant that thousands of heat-related deaths (equal to 0.9-1.4 percent of all deaths) were averted. These findings show that improving the disease environment can dramatically alter the impact of high temperature on mortality.


2021 ◽  
Vol 45 (3) ◽  
pp. 159-177
Author(s):  
Chen-Wei Liu

Missing not at random (MNAR) modeling for non-ignorable missing responses usually assumes that the latent variable distribution is a bivariate normal distribution. Such an assumption is rarely verified and is often employed as a standard in practice. Recent studies of “complete” item responses (i.e., no missing data) have shown that ignoring the nonnormal distribution of a unidimensional latent variable, especially a skewed or bimodal one, can yield biased estimates and misleading conclusions. However, dealing with a bivariate nonnormal latent variable distribution in the presence of MNAR data has not been investigated. This article proposes to extend unidimensional empirical histogram and Davidian curve methods to simultaneously deal with nonnormal latent variable distribution and MNAR data. A simulation study is carried out to demonstrate the consequence of ignoring bivariate nonnormal distribution on parameter estimates, followed by an empirical analysis of “don’t know” item responses. The results presented in this article show that examining the assumption of bivariate nonnormal latent variable distribution should be considered as a routine for MNAR data to minimize the impact of nonnormality on parameter estimates.


2021 ◽  
pp. 000276422110216
Author(s):  
Kazimierz M. Slomczynski ◽  
Irina Tomescu-Dubrow ◽  
Ilona Wysmulek

This article proposes a new approach to analyze protest participation measured in surveys of uneven quality. Because single international survey projects cover only a fraction of the world’s nations in specific periods, researchers increasingly turn to ex-post harmonization of different survey data sets not a priori designed as comparable. However, very few scholars systematically examine the impact of the survey data quality on substantive results. We argue that the variation in source data, especially deviations from standards of survey documentation, data processing, and computer files—proposed by methodologists of Total Survey Error, Survey Quality Monitoring, and Fitness for Intended Use—is important for analyzing protest behavior. In particular, we apply the Survey Data Recycling framework to investigate the extent to which indicators of attending demonstrations and signing petitions in 1,184 national survey projects are associated with measures of data quality, controlling for variability in the questionnaire items. We demonstrate that the null hypothesis of no impact of measures of survey quality on indicators of protest participation must be rejected. Measures of survey documentation, data processing, and computer records, taken together, explain over 5% of the intersurvey variance in the proportions of the populations attending demonstrations or signing petitions.

