Parsing Expression Grammars and Their Induction Algorithm

2020 ◽  
Vol 10 (23) ◽  
pp. 8747
Author(s):  
Wojciech Wieczorek ◽  
Olgierd Unold ◽  
Łukasz Strąk

Grammatical inference (GI), i.e., the task of finding a rule that lies behind given words, can be used in the analysis of amyloidogenic sequence fragments, which are essential in studies of neurodegenerative diseases. In this paper, we developed a new method that generates non-circular parsing expression grammars (PEGs) and compared it with other GI algorithms on sequences from a real dataset. The main contribution of this paper is a genetic programming-based algorithm for the induction of parsing expression grammars from a finite sample. The induction method has been tested on a real bioinformatics dataset, and its classification performance has been compared with that of existing grammatical inference methods. The evaluation of the generated PEG on an amyloidogenic dataset revealed its accuracy in predicting amyloid segments. We show that the new grammatical inference algorithm achieves the best ACC (accuracy), AUC (area under the ROC curve), and MCC (Matthews correlation coefficient) scores in comparison with five other automata or grammar learning methods.
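PEGs differ from context-free grammars mainly through ordered (prioritized) choice and syntactic predicates, which make every parse deterministic. The sketch below is a minimal recursive-descent PEG matcher, assuming a hypothetical nested-tuple encoding of expressions; it is not the authors' representation and does not implement the genetic programming induction step. It only illustrates how a candidate PEG accepts or rejects a sequence, which is the decision a grammar-based classifier makes for each fragment.

```python
from typing import Optional

# Hypothetical encoding of PEG expressions as nested tuples:
#   ("lit", "A")          match a literal string
#   ("any",)              match any single character
#   ("seq", e1, e2, ...)  match e1, then e2, ...
#   ("alt", e1, e2, ...)  ordered choice: the first alternative that succeeds wins
#   ("star", e)           zero or more repetitions, greedy, never backtracks
#   ("not", e)            negative lookahead: succeeds without consuming iff e fails
Expr = tuple


def match(e: Expr, s: str, pos: int) -> Optional[int]:
    """Return the position after matching e at s[pos:], or None on failure."""
    kind = e[0]
    if kind == "lit":
        return pos + len(e[1]) if s.startswith(e[1], pos) else None
    if kind == "any":
        return pos + 1 if pos < len(s) else None
    if kind == "seq":
        for sub in e[1:]:
            pos = match(sub, s, pos)
            if pos is None:
                return None
        return pos
    if kind == "alt":
        for sub in e[1:]:
            end = match(sub, s, pos)
            if end is not None:
                return end  # commit to the first successful alternative
        return None
    if kind == "star":
        while True:
            end = match(e[1], s, pos)
            if end is None or end == pos:  # stop on failure or an empty match
                return pos
            pos = end
    if kind == "not":
        return pos if match(e[1], s, pos) is None else None
    raise ValueError(f"unknown expression kind: {kind}")


def accepts(e: Expr, s: str) -> bool:
    """A word is accepted only if the expression consumes all of it."""
    return match(e, s, 0) == len(s)
```

For example, with `hydrophobic = ("alt", ("lit", "V"), ("lit", "I"), ("lit", "L"), ("lit", "F"))`, the grammar `("seq", hydrophobic, ("star", hydrophobic))` accepts "VIL" but rejects "VKL". Ordered choice also means `("alt", ("lit", "a"), ("lit", "ab"))` rejects "ab": the first alternative commits after consuming "a", which is the key semantic difference from a context-free union.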

2019 ◽  
Vol 3 (2) ◽  
pp. 11-18
Author(s):  
George Mweshi

Extracting useful and novel information from the large amounts of data collected has become a necessity for corporations wishing to maintain a competitive advantage. One of the biggest issues in handling such large datasets is the curse of dimensionality: as the dimension of the data increases, the performance of the data mining algorithms used to mine it deteriorates. This deterioration is mainly caused by the large search space created by irrelevant, noisy and redundant features in the data. Feature selection is one of several techniques that can be used to remove these unnecessary features. It reduces both the dimension of the data and the search space, which in turn increases the efficiency and accuracy of the mining algorithms. In this paper, we investigate the ability of Genetic Programming (GP), an evolutionary search strategy capable of automatically finding solutions in large, complex search spaces, to perform feature selection. We implement a basic GP algorithm and perform feature selection on five benchmark classification datasets from the UCI repository. To test the competitiveness and feasibility of the GP approach, we examine the classification performance of four classifiers (J48, Naive Bayes, PART and Random Forests) using the GP-selected features, all the original features, and the features selected by other commonly used techniques, namely principal component analysis, information gain, ReliefF and CFS. The experimental results show that not only does GP select a smaller subset of the original features, but classifiers using the GP-selected features also achieve better classification performance than those using all the original features. Furthermore, compared with the other well-known feature selection techniques, GP achieves very competitive results.
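A common way for GP to perform feature selection is implicit: only the features that appear as terminals in the best evolved program are kept, and every feature the program never references is dropped. Assuming a hypothetical nested-tuple representation of GP trees (not necessarily the one used in the paper), the selected subset can be read off a tree and used to project the dataset as follows:

```python
from typing import List, Sequence, Set

# Hypothetical GP tree encoding:
#   ("x", i)              terminal referring to feature column i
#   ("const", value)      ephemeral constant terminal
#   (op, child1, ...)     function node, e.g. op in {"+", "-", "*", "max"}
Node = tuple


def used_features(tree: Node) -> Set[int]:
    """Collect the indices of all feature terminals referenced by a GP tree."""
    if tree[0] == "x":
        return {tree[1]}
    if tree[0] == "const":
        return set()
    selected: Set[int] = set()
    for child in tree[1:]:
        selected |= used_features(child)
    return selected


def project(rows: Sequence[Sequence[float]], features: Set[int]) -> List[List[float]]:
    """Keep only the selected feature columns (in index order) of each row."""
    cols = sorted(features)
    return [[row[i] for i in cols] for row in rows]
```

For the tree `("+", ("*", ("x", 3), ("const", 0.5)), ("max", ("x", 0), ("x", 3)))`, `used_features` returns `{0, 3}`, so a downstream classifier such as J48 would see only columns 0 and 3.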


2020 ◽  
Author(s):  
Harith Al-Sahaf ◽  
A Song ◽  
K Neshatian ◽  
Mengjie Zhang

Image classification is a complex but important task, especially in machine vision and image analysis applications such as remote sensing and face recognition. One of the challenges in image classification is finding an optimal set of features for a particular task, because the choice of features has a direct impact on classification performance. However, the goodness of a feature is highly problem-dependent, and domain knowledge is often required. To address these issues, we introduce a Genetic Programming (GP) based image classification method, Two-Tier GP, which operates directly on raw pixels rather than on features. The first tier of a classifier automatically defines features from the raw image input, while the second tier makes the decision. Compared with conventional feature-based image classification methods, Two-Tier GP achieved better accuracy on a range of different tasks. Furthermore, when using the features defined by the first tier of these Two-Tier GP classifiers, conventional classification methods obtained higher accuracy than when classifying on manually designed features. Analysis of the evolved Two-Tier image classifiers shows that the programs capture genuine features and that the mechanism by which they achieve high accuracy can be revealed. The Two-Tier GP method has clear advantages in image classification, such as high accuracy, good interpretability and the removal of an explicit feature extraction process. © 2012 IEEE.
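The two-tier structure can be illustrated with a hand-written program: the first tier turns raw pixels into features by aggregating over image regions, and the second tier combines those features arithmetically, with the sign of the output deciding the class. In the actual method both tiers are evolved by GP; the regions, the mean and standard-deviation aggregators, and the decision rule below are illustrative assumptions only.

```python
import statistics
from typing import List, Tuple

Image = List[List[float]]        # grayscale image as rows of pixel intensities
Rect = Tuple[int, int, int, int]  # (top, left, height, width)


def region(img: Image, top: int, left: int, height: int, width: int) -> List[float]:
    """Flatten the pixels of a rectangular region."""
    return [img[r][c] for r in range(top, top + height) for c in range(left, left + width)]


# Tier 1: aggregation functions reduce a raw-pixel region to a single feature.
def agg_mean(img: Image, rect: Rect) -> float:
    return statistics.fmean(region(img, *rect))


def agg_std(img: Image, rect: Rect) -> float:
    return statistics.pstdev(region(img, *rect))


# Tier 2: arithmetic over tier-1 features; the sign of the output gives the class.
def classify(img: Image) -> int:
    f1 = agg_mean(img, (0, 0, 2, 2))  # hypothetical top-left 2x2 region
    f2 = agg_mean(img, (2, 2, 2, 2))  # hypothetical bottom-right 2x2 region
    f3 = agg_std(img, (0, 0, 4, 4))   # whole 4x4 image
    output = (f1 - f2) + 0.1 * f3
    return 1 if output >= 0 else 0
```

Because each tier-1 node names an explicit region and aggregator, an evolved program of this shape can be read back as "compare the brightness of these two areas", which is the kind of interpretability the abstract refers to.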


Author(s):  
Altaf Ahmad Bhat ◽  
Anjum Shamim ◽  
Sabeeha Gul ◽  
Rukaya Akther ◽  
Iqra Bhat

Background: Neonatal sepsis continues to be a major cause of morbidity and mortality in India, but it is treatable if diagnosed in time. Objectives: The present study was undertaken to evaluate and highlight the importance of procalcitonin vs. CRP in the early detection of neonatal sepsis. Materials and Methods: This prospective study enrolled 150 neonates who had maternal risk factors and were clinically suspected of infection (study group). Abnormal total leukocyte count, abnormal total polymorphonuclear neutrophil (PMN) count, elevated immature PMN count, elevated immature-to-total (I:T) PMN ratio, platelet count ≤150,000/mm³, and pronounced degenerative or toxic changes in PMN were recorded by pathologists who were blinded to the clinical status of the baby in the NICU. Blood culture was taken as the gold standard for septicemia. The perinatal history, clinical profile and laboratory data were recorded and correlated in each case. Each hematological parameter was assessed for its individual performance and also against culture-proven sepsis. Sensitivity, specificity, and positive and negative predictive values (PPV, NPV) were calculated for each parameter and for different gestational ages, along with P values for the different parameters. Results: Among the 150 babies evaluated for sepsis in the NICU over a period of one year, procalcitonin was observed to be a better early marker of neonatal sepsis than CRP. Procalcitonin had a sensitivity of 97%, specificity of 59%, PPV of 70% and NPV of 99.9%, with an area under the ROC curve of 0.915 (p = 0.02). CRP had a sensitivity of 75%, specificity of 75%, PPV of 86% and NPV of 99%, with an area under the ROC curve of 0.769 (p = 0.61). Conclusion: The sensitivities of the screening tests, namely C-reactive protein and procalcitonin, were found to be satisfactory in identifying neonatal sepsis. Compared with the other tests, procalcitonin appears to be a simple and feasible diagnostic tool, although costly.
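Sensitivity, specificity, PPV and NPV all come from the 2x2 table of screening-test results against the blood-culture gold standard. The helper below is a generic sketch of those standard definitions; the counts in the usage note are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # test positive, culture positive
    fp: int  # test positive, culture negative
    fn: int  # test negative, culture positive
    tn: int  # test negative, culture negative

    def sensitivity(self) -> float:
        """Fraction of culture-positive neonates the test flags."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Fraction of culture-negative neonates the test clears."""
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        """Probability of sepsis given a positive test."""
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        """Probability of no sepsis given a negative test."""
        return self.tn / (self.tn + self.fn)
```

With hypothetical counts `Confusion(tp=45, fp=15, fn=5, tn=85)`, sensitivity is 0.90, specificity 0.85, PPV 0.75 and NPV about 0.94. Unlike sensitivity and specificity, PPV and NPV also depend on how common sepsis is in the studied population.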


2007 ◽  
Vol 135 (1-2) ◽  
pp. 31-37 ◽  
Author(s):  
Zorana Penezic ◽  
Milos Zarkovic ◽  
Svetlana Vujovic ◽  
Miomira Ivovic ◽  
Biljana Beleslin ◽  
...  

Introduction: Diagnosis and differential diagnosis of Cushing's syndrome (CS) remain a considerable challenge in endocrinology. For more than 20 years, CRH has been widely used as a differential diagnostic test. Following CRH administration, the majority of patients with an ACTH-secreting pituitary adenoma show a significant rise in plasma cortisol and ACTH, whereas those with ectopic ACTH secretion characteristically do not. Objective: The aim of our study was to assess the value of the CRH test for the differential diagnosis of CS using the ROC (receiver operating characteristic) curve method. Method: A total of 30 patients with CS verified by pathological examination and postoperative testing were evaluated. The CRH test was performed as part of the diagnostic procedures. An ACTH-secreting pituitary adenoma was found in 18, ectopic ACTH secretion in 3 and a cortisol-secreting adrenal adenoma in 9 of the patients with CS. Cortisol and ACTH were determined by commercial RIA at -15, 0, 15, 30, 45, 60, 90 and 120 min relative to i.v. administration of 100 μg of ovine CRH. Statistical data processing was done by ROC curve analysis. Due to their small number, the patients with ectopic ACTH secretion were excluded from the ROC curve evaluation. Results: In the evaluated subgroups, basal cortisol was 1147.3±464.3 vs. 1589.8±296.3 vs. 839.2±405.6 nmol/L; maximal stimulated cortisol was 1680.3±735.5 vs. 1749.0±386.6 vs. 906.1±335.0 nmol/L; and the maximal increase as a percentage of basal cortisol was 49.1±36.9 vs. 9.0±7.6 vs. 16.7±37.3%. Likewise, basal ACTH was 100.9±85.0 vs. 138.0±123.7 vs. 4.8±4.3 pg/mL, and maximal stimulated ACTH was 203.8±160.1 vs. 288.0±189.5 vs. 7.4±9.2 pg/mL. For cortisol, the area under the ROC curve was 0.815±0.083 (95% CI 0.652-0.978). At a cut-off of a 20% cortisol increase, test sensitivity was 83% and specificity 78%. For ACTH, the area under the ROC curve was 0.637±0.142 (95% CI 0.359-0.916). At a cut-off of a 30% ACTH increase, test sensitivity was 70% and specificity 57%. Conclusion: Determination of cortisol and ACTH levels in the CRH test remains a reliable tool in the differential diagnosis of Cushing's syndrome.
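The cut-offs above are expressed as the maximal increase over basal, as a percentage of the basal value. The sketch below computes that quantity from a series of CRH-test samples and applies a percentage cut-off. Taking basal as the mean of the pre-injection samples (t ≤ 0) and the stimulated value as the maximum after injection is an assumption, and the sample values in the usage note are hypothetical.

```python
from typing import Dict


def percent_increase(samples: Dict[int, float]) -> float:
    """Maximal increase over basal, as a percentage of basal.

    samples maps time in minutes relative to CRH injection to a hormone level.
    Assumption: basal = mean of samples at t <= 0; stimulated = max at t > 0.
    """
    pre = [v for t, v in samples.items() if t <= 0]
    post = [v for t, v in samples.items() if t > 0]
    basal = sum(pre) / len(pre)
    return (max(post) - basal) * 100.0 / basal


def responds(samples: Dict[int, float], cutoff_percent: float) -> bool:
    """Call the test positive when the increase reaches the cut-off."""
    return percent_increase(samples) >= cutoff_percent
```

For hypothetical cortisol values `{-15: 500, 0: 500, 15: 600, 30: 650, 45: 640, 60: 620, 90: 580, 120: 550}`, the increase is 30%, so the response counts as positive at the 20% cortisol cut-off. Sweeping the cut-off across all observed increases and recording sensitivity and specificity at each value traces out the ROC curve.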


2020 ◽  
Vol 58 (05) ◽  
pp. 439-444 ◽  
Author(s):  
Anne Kerstin Thomann ◽  
Lucas-Alexander Schulte ◽  
Anna-Maria Globig ◽  
Peter Hoffmann ◽  
Thomas Klag ◽  
...  

Background and aim: The role of therapeutic drug monitoring (TDM) in ustekinumab (UST) therapy for Crohn's disease (CD) has not been established, as only a few studies have analyzed the relationship between UST serum concentrations and clinical outcome. In this pilot study, we retrospectively examined the potential of UST concentrations (cUST) 8 weeks after induction (cUSTw8) to predict clinical response at week 16. Methods: Serum samples and clinical data from patients (n = 72) with moderately to severely active CD who received intravenous induction with UST were retrospectively analyzed. cUST were quantified using liquid chromatography-tandem mass spectrometry (LC-MS/MS). A receiver operating characteristic (ROC) curve and the area under the ROC curve (AUROC) were computed to analyze the predictive potential of cUSTw8 for clinical response at week 16 and to determine the minimal therapeutic UST trough concentration. Results: Forty-four patients (61%) achieved clinical response to UST therapy at week 16. cUSTw8 was moderately effective in predicting clinical response, with a minimal therapeutic cUSTw8 of 2.0 mg/l (AUC 0.72, p = 0.001). Conclusion: Trough concentrations of UST 8 weeks after induction predict clinical response to therapy at week 16 with moderate sensitivity and specificity. TDM using LC-MS/MS could prove beneficial in personalized UST therapy of patients with CD by identifying individuals with subtherapeutic concentrations who might benefit from dose escalation.
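The abstract reports an AUC and a minimal therapeutic concentration derived from the ROC curve, but does not say how the threshold was chosen. A common choice is the cut-off that maximizes Youden's index, J = sensitivity + specificity - 1. The sketch below computes the empirical AUC (the probability that a randomly chosen responder has a higher concentration than a randomly chosen non-responder, with ties counting one half) and the Youden-optimal cut-off. Using Youden's index here is an assumption, and the concentrations in the usage note are hypothetical.

```python
from typing import List, Tuple


def auc(pos: List[float], neg: List[float]) -> float:
    """Empirical AUC in Mann-Whitney form: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos: List[float], neg: List[float]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A concentration >= cutoff is called positive (predicted responder).
    """
    best = (float("nan"), -1.0)
    for c in sorted(set(pos) | set(neg)):
        sens = sum(p >= c for p in pos) / len(pos)
        spec = sum(n < c for n in neg) / len(neg)
        j = sens + spec - 1.0
        if j > best[1]:
            best = (c, j)
    return best
```

With hypothetical week-8 concentrations of `[1.5, 2.4, 3.1, 4.0]` mg/l for responders and `[0.8, 1.2, 2.2, 1.9]` mg/l for non-responders, the AUC is 0.875 and the Youden-optimal cut-off is 2.4 mg/l (J = 0.75).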


2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Liyun Su ◽  
Li Deng ◽  
Wanlin Zhu ◽  
Shengli Zhao

Weak pulse signals submerged in chaotic noise are common in communications, seismic monitoring and the detection of targets in ocean clutter, and they are very difficult to detect and extract. Building on the threshold autoregressive model, a linear form for the pulse signal, Markov chain Monte Carlo (MCMC) and the profile least squares (PrLS) algorithm, we propose a phase threshold autoregressive (PTAR) model for detecting weak pulse signals in chaotic noise and a double-layer threshold autoregressive (DLTAR) model for extracting them. First, the phase space is reconstructed from the noisy chaotic observations according to Takens' delay embedding theorem, the PTAR model is built to detect weak pulse signals, and its parameters are estimated with the MCMC algorithm; the resulting one-step prediction error is then used to detect weak signals adaptively through a hypothesis test. Second, a linear form for the pulse signal is fused with the PTAR model to build the DLTAR model. The DLTAR model has two kinds of parameters that affect each other, so the PrLS algorithm is applied to estimate them and ultimately extract the weak pulse signals. Finally, the accuracy rate (Acc), the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are used as detection performance indices, while the mean square error (MSE), mean absolute percentage error (MAPE) and relative error (Re) are used as extraction accuracy indices. The proposed scheme requires no prior knowledge of the chaotic noise or the weak pulse signals, and simulation results show that the PTAR-DLTAR model is highly effective for detecting and extracting weak pulse signals under chaotic interference. In particular, at very low signal-to-interference ratios (SIR), it can still detect and extract weak pulse signals, and it compares favorably with support vector machine (SVM)-class and neural network models.
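Two building blocks of the detection step are phase-space reconstruction by delay embedding and the one-step prediction error of a threshold autoregressive model. The sketch below shows both for a simple two-regime threshold AR predictor applied to delay vectors. The embedding dimension, delay, threshold and coefficients are hypothetical placeholders, and the paper's PTAR specification, MCMC parameter estimation and hypothesis test are not reproduced.

```python
from typing import List, Sequence


def delay_embed(x: Sequence[float], dim: int, tau: int) -> List[List[float]]:
    """Delay vectors [x[t], x[t - tau], ..., x[t - (dim - 1) * tau]] for each valid t."""
    start = (dim - 1) * tau
    return [[x[t - k * tau] for k in range(dim)] for t in range(start, len(x))]


def tar_predict(vec: Sequence[float], threshold: float,
                low: Sequence[float], high: Sequence[float]) -> float:
    """Two-regime threshold AR prediction of the next sample.

    The regime is chosen by comparing the most recent value vec[0] with the
    threshold; each coefficient list is [intercept, a1, ..., a_dim].
    """
    coef = low if vec[0] <= threshold else high
    return coef[0] + sum(a * v for a, v in zip(coef[1:], vec))


def one_step_errors(x: Sequence[float], dim: int, tau: int, threshold: float,
                    low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Residuals x[t + 1] minus the prediction from the delay vector ending at t."""
    vectors = delay_embed(x, dim, tau)
    start = (dim - 1) * tau
    errors = []
    for i, vec in enumerate(vectors[:-1]):  # the last vector has no next sample
        t = start + i
        errors.append(x[t + 1] - tar_predict(vec, threshold, low, high))
    return errors
```

The intuition behind using prediction errors for detection is that a model fitted to the chaotic background predicts it well, so a pulse superimposed on it shows up as unusually large residuals, which a hypothesis test on the error sequence can then flag.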


2011 ◽  
Vol 19 (1) ◽  
pp. 137-166 ◽  
Author(s):  
Andrew R. McIntyre ◽  
Malcolm I. Heywood

Intuitively, population-based algorithms such as genetic programming provide a natural environment for supporting solutions that learn to decompose the overall task between multiple individuals, or a team. This work presents a framework for evolving teams without having to prespecify the number of cooperating individuals. To do so, each individual evolves a mapping to a distribution of outcomes that, following clustering, establishes the parameterization of a (Gaussian) local membership function. This gives individuals the opportunity to represent subsets of tasks, where the overall task is classification in the supervised learning domain. Thus, rather than each team member representing an entire class, individuals are free to identify unique subsets of the overall classification task. The framework is supported by techniques from evolutionary multiobjective optimization (EMO) and Pareto competitive coevolution. EMO establishes the basis for encouraging individuals to provide accurate yet nonoverlapping behaviors, whereas competitive coevolution provides the mechanism for scaling to potentially large, unbalanced datasets. Benchmarking is performed against recent examples of nonlinear SVM classifiers over 12 UCI datasets with between 150 and 200,000 training instances. Solutions from the proposed coevolutionary multiobjective GP framework appear to provide a good balance between classification performance and model complexity, especially as the dataset instance count increases.
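Each individual maps an exemplar to a scalar output, and the cluster of outputs it claims parameterizes a Gaussian local membership function. A minimal sketch, assuming each team member has already been reduced to a program plus a (mean, standard deviation, class label) triple, and assuming the team labels an exemplar with the class of the member whose membership is highest. The clustering step and the multiobjective coevolution are omitted, and all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Member:
    program: Callable[[Sequence[float]], float]  # evolved mapping: exemplar -> scalar
    mu: float     # centre of the output cluster this member claims
    sigma: float  # spread of that cluster
    label: str    # class this member votes for

    def membership(self, x: Sequence[float]) -> float:
        """Gaussian membership of the program output in this member's cluster."""
        y = self.program(x)
        return math.exp(-((y - self.mu) ** 2) / (2.0 * self.sigma ** 2))


def team_predict(team: List[Member], x: Sequence[float]) -> str:
    """Label an exemplar by the member with the strongest membership response."""
    return max(team, key=lambda m: m.membership(x)).label
```

Because membership is local, a member only "claims" exemplars whose outputs fall near its own cluster, so several members may cover different regions of the same class without any of them having to represent the entire class.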


2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Liping Pan ◽  
Fei Liu ◽  
Jinli Zhang ◽  
Xinting Yang ◽  
Shiqi Zheng ◽  
...  

The aim of this study was to examine the performance of T-SPOT.TB on cerebrospinal fluid (CSF) and peripheral blood (PB) in the diagnosis of tuberculous meningitis (TBM) in China. Of 100 patients with presumed TBM prospectively enrolled from Sep 2012 to Oct 2014, 53 had TBM (21 definite and 32 probable cases) and 37 were non-TBM cases; the other 10 patients were excluded from analysis due to inconclusive diagnosis, insufficient CSF samples, or incomplete follow-up. T-SPOT.TB on CSF and PB and routine laboratory tests of CSF were performed simultaneously. Receiver operating characteristic (ROC) curves and cut-off values for CSF T-SPOT.TB and routine CSF parameters were established by comparing the TBM and non-TBM groups. The areas under the ROC curve (AUC) of T-SPOT.TB on CSF and PB were 0.81 and 0.89, respectively, higher than those of the routine CSF parameters (AUC 0.67–0.77). Although the sensitivity of CSF T-SPOT.TB was lower than that of PB T-SPOT.TB (60.8% versus 90.6%, P < 0.001), the specificity of CSF T-SPOT.TB was significantly higher than that of PB T-SPOT.TB (97.2% versus 75.7%, P = 0.007). These results indicate that the diagnostic accuracies of PB and CSF T-SPOT.TB are higher than those of routine laboratory tests. Furthermore, the higher specificity of CSF T-SPOT.TB makes it a useful rule-in test for the rapid diagnosis of TBM.
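The rule-in argument can be made concrete with the positive likelihood ratio, LR+ = sensitivity / (1 - specificity), which measures how much a positive result raises the odds of disease. Likelihood ratios are not reported in the abstract; the sketch below simply derives them from the stated sensitivities and specificities.

```python
def positive_lr(sensitivity: float, specificity: float) -> float:
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


# Sensitivities and specificities as reported in the abstract.
csf_lr = positive_lr(0.608, 0.972)  # about 21.7
pb_lr = positive_lr(0.906, 0.757)   # about 3.7
```

A positive CSF result multiplies the pre-test odds of TBM by roughly 22, compared with roughly 4 for a positive PB result, which is consistent with reading CSF T-SPOT.TB as the better rule-in test despite its lower sensitivity.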

