Evaluation of F-Measure and Feature Analysis of C5.0 Implementation on Single Nucleotide Polymorphism Calling

Author(s):  
Lailan Sahrina Hasibuan ◽  
Sita Nabila ◽  
Nurul Hudachair ◽  
Muhammad Abrar Istiadi

Data in molecular biology have grown rapidly since the introduction in 2000 of Next-Generation Sequencing (NGS), the latest technology for sequencing DNA with high throughput. A Single Nucleotide Polymorphism (SNP) is a DNA-based marker that can be used to identify an organism specifically. SNPs are commonly exploited to optimize parent selection when producing high-quality seed in plant breeding. This paper discusses SNP calling on NGS data of cultivated soybean (Glycine max (L.) Merr.) using C5.0, an improved version of the rule-based algorithm C4.5. The evaluation showed that C5.0 outperforms another rule-based algorithm, CART, on F-measure: 0.63 for C5.0 versus 0.58 for CART. In addition, C5.0 is robust to imbalanced training datasets up to a ratio of 1:17, but its performance suffers on large training datasets. C5.0's performance may be improved by applying bagging or another ensemble technique, just as CART was improved by applying bagging in the final decision. It is also important to use appropriate features to represent SNP candidates. Based on the information gain from C5.0, this paper recommends error probability, homopolymer left, mismatch alt, and mean nearby qual as features for SNP calling.
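As an illustration of why F-measure, not accuracy, is the metric reported for an imbalanced SNP-calling task, here is a minimal Python sketch. The function name `f_measure` and all counts are hypothetical and chosen only to mirror a 1:17 class ratio; they are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score computed from confusion-matrix counts.

    tp, fp, fn are true positives, false positives, and false negatives.
    beta = 1 gives the harmonic mean of precision and recall.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined),
        # so the score is conventionally taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical dataset with a 1:17 ratio: 100 true SNPs, 1700 non-SNPs.
n_pos, n_neg = 100, 1700

# A degenerate caller that labels every candidate "non-SNP".
accuracy_all_negative = n_neg / (n_pos + n_neg)
f_all_negative = f_measure(tp=0, fp=0, fn=n_pos)

# A hypothetical caller that finds 60 of the 100 SNPs with 40 false alarms.
f_example = f_measure(tp=60, fp=40, fn=40)
```

The degenerate caller reaches high accuracy simply because negatives dominate, while its F-measure is zero. This is why F-measure is the more informative score when true SNPs are rare among the candidates.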

2008 ◽  
Vol 71 (12) ◽  
pp. 2559-2566 ◽  
Author(s):  
SARA LOMONACO ◽  
YI CHEN ◽  
STEPHEN J. KNABEL

Previous molecular subtyping studies have defined four epidemic clones (ECs) of Listeria monocytogenes (ECI, ECII, ECIII, and ECIV). Partial sequences of eight virulence genes were previously shown to be identical within individual ECs of L. monocytogenes. The present study was conducted to determine whether the sequences of other virulence genes and virulence gene regions are also conserved within these ECs. Six additional virulence genes (bsh, hly, inlJ, lplA1, pgdA, and srtA) and three additional virulence gene regions of actA, inlA, and inlB were selected based on their role in L. monocytogenes virulence, and intragenic regions of each gene were sequenced. Sequencing was performed on a diverse set of 44 to 48 L. monocytogenes strains. Results demonstrated that the sequenced regions of the nine virulence genes were identical within each of the ECs, and 257 new single nucleotide polymorphisms (SNPs) were identified. ECIII (lineage II) was easily distinguishable from the other ECs, as 238 SNPs were observed in ECIII due to its significant evolutionary divergence from lineage I. With regard to the other ECs, 5 SNPs represented an informative set, since these SNPs were able to differentiate specific ECs from all other unrelated strains used in this study. This study confirms our previous finding that virulence gene sequences are highly conserved within individual ECs and contain stable SNPs that can be used to differentiate ECs of L. monocytogenes very accurately from each other and from other diverse strains.


Crop Science ◽  
2020 ◽  
Vol 60 (6) ◽  
pp. 3066-3082
Author(s):  
Amanda Avelar Oliveira ◽  
Lauro José Moreira Guimarães ◽  
Claudia Teixeira Guimarães ◽  
Paulo Evaristo de Oliveira Guimarães ◽  
Marcos de Oliveira Pinto ◽  
...  

2013 ◽  
Vol 72 (12) ◽  
pp. 2032-2038 ◽  
Author(s):  
F David Carmona ◽  
M Carmen Cénit ◽  
Lina-Marcela Diaz-Gallo ◽  
Jasper C A Broen ◽  
Carmen P Simeón ◽  
...  

Objective: To evaluate whether the systemic sclerosis (SSc)-associated IRAK1 non-synonymous single-nucleotide polymorphism rs1059702 is responsible for the Xq28 association with SSc or whether there are other independent signals in the nearby methyl-CpG-binding protein 2 gene (MECP2). Methods: We analysed a total of 3065 women with SSc and 2630 unaffected controls from five independent Caucasian cohorts. Four tag single-nucleotide polymorphisms of MECP2 (rs3027935, rs17435, rs5987201 and rs5945175) and the IRAK1 variant rs1059702 were genotyped using TaqMan predesigned assays. A meta-analysis including all cohorts was performed to test the overall effect of these Xq28 polymorphisms on SSc. Results: IRAK1 rs1059702 and MECP2 rs17435 were associated specifically with diffuse cutaneous SSc (PFDR=4.12×10−3, OR=1.27, 95% CI 1.09 to 1.47, and PFDR=5.26×10−4, OR=1.30, 95% CI 1.14 to 1.48, respectively), but conditional logistic regression analysis showed that the association of IRAK1 rs1059702 with this subtype was explained by that of MECP2 rs17435. On the other hand, IRAK1 rs1059702 was consistently associated with presence of pulmonary fibrosis (PF), because statistical significance was observed when comparing SSc patients PF+ versus controls (PFDR=0.039, OR=1.30, 95% CI 1.07 to 1.58) and SSc patients PF+ versus SSc patients PF− (p=0.025, OR=1.26, 95% CI 1.03 to 1.55). Conclusions: Our data clearly suggest the existence of two independent signals within the Xq28 region, one located in IRAK1 related to PF and another in MECP2 related to diffuse cutaneous SSc, indicating that both genes may have an impact on the clinical outcome of the disease.


2020 ◽  
Vol 100 (4) ◽  
pp. 650-656
Author(s):  
M. Babicz ◽  
M. Pastwa ◽  
A. Kozubska-Sobocińska ◽  
B. Danielak-Czech ◽  
E. Skrzypczak ◽  
...  

The objective of the study was to analyse the association of growth hormone (GH1) and C-reactive protein (CRP) loci polymorphisms with reproductive traits in native Pulawska gilts and sows. In the GH1 locus, two mutations were identified: one in the second intron [single-nucleotide polymorphism (SNP) = rs340429823 in.742C>T, MspI], and one in the second exon (SNP = rs340087546 c.566G>A, HaeII). In the CRP locus, two mutations were found in exon 2 (SNP = rs340175625, NM_213844.2: c.1271A>G, BstNI; and SNP = rs80928546, NM_213844.2: c.788C>T, HinfI). Analysis of sexual activity showed that the intensity of external estrus signs differed significantly (P ≤ 0.05) and was most pronounced in gilts with the CT (GH1_MspI) genotype during the second estrus. In the case of the CRP gene, statistically significant differences (P ≤ 0.05) were found in the duration of farrowing. The longest farrowings were reported for the GG (CRP_HinfI) and TT (CRP_BstNI) genotypes and the shortest for the AA (CRP_HinfI) and CC (CRP_BstNI) genotypes. The most numerous first litters were produced by sows with the AA genotype (CRP_HinfI), with significant differences (P ≤ 0.05) between the AA and GG genotypes. In turn, the CC homozygotes (CRP_BstNI) differed significantly (P ≤ 0.05) from the other genotype groups in the number of piglets born and reared to day 21 in the second litters.


2020 ◽  
Vol 25 (39) ◽  
Author(s):  
Katharina Ziegler ◽  
Philipp Steininger ◽  
Renate Ziegler ◽  
Jörg Steinmann ◽  
Klaus Korn ◽  
...  

We found that a single nucleotide polymorphism (SNP) in the nucleoprotein gene of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) from a patient interfered with detection in a widely used commercial assay. Some 0.2% of the isolates in the EpiCoV database contain this SNP. Although SARS-CoV-2 was still detected by the other probe in the assay, this underlines the necessity of targeting two independent essential regions of a pathogen for reliable detection.

