SNP Variable Selection by Generalized Graph Domination

2018 ◽  
Author(s):  
Shuzhen Sun ◽  
Zhuqi Miao ◽  
Blaise Ratcliffe ◽  
Polly Campbell ◽  
Bret Pasch ◽  
...  

Abstract High-throughput sequencing technology has revolutionized both medical and biological research by generating exceedingly large numbers of genetic variants. The resulting datasets share a number of common characteristics that might lead to poor generalization capacity. Concerns include noise accumulated due to the large number of predictors, sparse information regarding the p ≫ n problem, and overfitting and model mis-identification resulting from spurious collinearity. Additionally, complex correlation patterns are present among variables. As a consequence, reliable variable selection techniques play a pivotal role in predictive analysis, generalization capability, and robustness in clustering, as well as interpretability of the derived models. The k-dominating set, a parameterized graph-theoretic generalization model, was used to model SNP (single nucleotide polymorphism) data as a similarity network and to search for representative SNP variables. In particular, each SNP was represented as a vertex in the graph, and (dis)similarity measures such as correlation coefficients or pairwise linkage disequilibrium were estimated to describe the relationship between each pair of SNPs; a pair of vertices is adjacent, i.e. joined by an edge, if the pairwise similarity measure exceeds a user-specified threshold. A minimum k-dominating set in the SNP graph was then defined as the smallest subset such that every SNP excluded from the subset has at least k neighbors among the selected ones. The strength of k-dominating set selection in identifying independent variables, and in culling representative variables that are highly correlated with others, was demonstrated on a simulated dataset. The advantages of k-dominating set variable selection were also illustrated in two applications: pedigree reconstruction using SNP profiles of 1,372 Douglas-fir trees, and species delineation for 226 grasshopper mouse samples.
C++ source code that implements SNP-SELECT and uses the Gurobi™ optimization solver for the k-dominating set variable selection is available (https://github.com/transgenomicsosu/SNP-SELECT).
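The selection rule described in this abstract can be sketched in a few lines. The following is a minimal illustration only, not the SNP-SELECT implementation: it swaps the exact Gurobi-based optimization for a simple greedy heuristic, so the returned set is k-dominating but not guaranteed to be minimum. The function names, the similarity matrix, and the threshold value are all hypothetical.

```python
from itertools import combinations


def build_graph(similarity, threshold):
    """Join two variables by an edge when their similarity exceeds the threshold."""
    n = len(similarity)
    adjacency = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if similarity[u][v] > threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def is_k_dominating(adjacency, selected, k):
    """Every vertex outside `selected` must have at least k selected neighbors."""
    return all(
        len(adjacency[v] & selected) >= k
        for v in adjacency
        if v not in selected
    )


def greedy_k_dominating_set(adjacency, k):
    """Greedy heuristic (an assumption of this sketch, not the exact solver)."""
    selected = set()

    def need(v):
        # How many more selected neighbors an unselected vertex still requires.
        return max(0, k - len(adjacency[v] & selected))

    while not is_k_dominating(adjacency, selected, k):
        candidates = sorted(v for v in adjacency if v not in selected)

        def gain(v):
            # Selecting v clears its own deficit and helps each deficient neighbor.
            helped = sum(
                1 for u in adjacency[v] if u not in selected and need(u) > 0
            )
            return need(v) + helped

        selected.add(max(candidates, key=gain))
    return selected


# Hypothetical similarity matrix: variables 0-2 are mutually similar, 3 is independent.
similarity = [
    [1.0, 0.9, 0.85, 0.1],
    [0.9, 1.0, 0.95, 0.0],
    [0.85, 0.95, 1.0, 0.2],
    [0.1, 0.0, 0.2, 1.0],
]
graph = build_graph(similarity, threshold=0.8)
print(sorted(greedy_k_dominating_set(graph, k=1)))  # [0, 3]
```

In this toy example one variable stands in for the correlated group and the isolated variable is retained, which mirrors the two behaviors the abstract highlights: culling representatives of correlated variables and keeping independent ones.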


The present study explored the relationship between spot and futures coffee prices. Correlation and regression analyses were carried out based on monthly observations of International Coffee Organization (ICO) indicator prices of the four groups (Colombian Milds, Other Milds, Brazilian Naturals, and Robustas) representing Spot markets and the averages of 2nd and 3rd positions of the Intercontinental Exchange (ICE) New York for Arabica and ICE Europe for Robusta representing the Futures market for the period 1990 to 2019. The study also used the monthly average prices paid to coffee growers in India from 1990 to 2019. The estimated correlation coefficients indicated that the Futures prices and Spot prices of coffee are highly correlated. Further, estimated regression coefficients revealed a very strong relationship between Futures prices and Spot prices for all four ICO group indicator prices. Hence, the ICE New York (Arabica) and ICE Europe (Robusta) coffee futures prices are very closely related to Spot prices. The estimated regression coefficients between Futures prices and the price paid to coffee growers in India confirmed the positive relationship, but the wider dispersion of prices around the trend line indicates a lower degree of correlation between the price paid to growers in India and Futures market prices during the study period.



2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tatsuhiko Hoshino ◽  
Ryohei Nakao ◽  
Hideyuki Doi ◽  
Toshifumi Minamoto

Abstract The combination of high-throughput sequencing technology and environmental DNA (eDNA) analysis has the potential to be a powerful tool for comprehensive, non-invasive monitoring of species in the environment. To understand the correlation between the abundance of eDNA and that of species in natural environments, we have to obtain quantitative eDNA data, usually via individual assays for each species. The recently developed quantitative sequencing (qSeq) technique enables simultaneous phylogenetic identification and quantification of individual species by counting random tags added to the 5′ end of the target sequence during the first DNA synthesis. Here, we applied qSeq to eDNA analysis to test its effectiveness in biodiversity monitoring. eDNA was extracted from water samples taken over 4 days from aquaria containing five fish species (Hemigrammocypris neglectus, Candidia temminckii, Oryzias latipes, Rhinogobius flumineus, and Misgurnus anguillicaudatus), and quantified by qSeq and microfluidic digital PCR (dPCR) using a TaqMan probe. The eDNA abundance quantified by qSeq was consistent with that quantified by dPCR for each fish species at each sampling time. The correlation coefficients between qSeq and dPCR were 0.643, 0.859, and 0.786 for H. neglectus, O. latipes, and M. anguillicaudatus, respectively, indicating that qSeq accurately quantifies fish eDNA.
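The idea of quantifying by counting random tags can be illustrated with a toy sketch. It assumes each sequencing read has already been assigned to a species and that its random tag has been extracted; the `(species, tag)` input format and the function name are hypothetical, and a real pipeline would likely add error handling and statistical corrections that are omitted here.

```python
from collections import defaultdict


def count_tagged_molecules(reads):
    """Count distinct random tags per species.

    `reads` is an iterable of (species, tag) pairs (a hypothetical format).
    Reads that share a tag are treated as copies of one original molecule.
    """
    tags_by_species = defaultdict(set)
    for species, tag in reads:
        tags_by_species[species].add(tag)
    return {species: len(tags) for species, tags in tags_by_species.items()}


# Made-up reads for illustration only.
reads = [
    ("O. latipes", "ACGT"),
    ("O. latipes", "ACGT"),  # duplicate of the same tagged molecule
    ("O. latipes", "TTGA"),
    ("H. neglectus", "GGCC"),
]
print(count_tagged_molecules(reads))
```

Counting distinct tags rather than raw reads is what makes the estimate insensitive to how many times each tagged molecule was later amplified.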



2020 ◽  
Author(s):  
Silvia Morbelli ◽  
Dario Arnaldi ◽  
Eugenia Cella ◽  
Stefano Raffa ◽  
Isabella Donegani ◽  
...  

Abstract Purpose. Our aim was the head-to-head comparison between two automatic tools for semi-quantification of striatal dopamine transporter (DAT) specific-to-non-displaceable ratio (SBR) brain SPECT values in a naturalistic cohort of patients. Procedures. We analyzed consecutive scans from one hundred and fifty-one outpatients submitted to brain DAT SPECT for a suspected parkinsonism. Images were post-processed using a commercial (Datquant®) and a free (BasGanV2) software tool. Expert reading was the gold standard. A subset of patients with pathological or borderline scans was evaluated with the clinical Unified Parkinson’s disease rating scale, motor part (MDS-UPDRS-III). Results. SBR, putamen-to-caudate (P/C) ratio, and both P and C asymmetries were highly correlated between the two tools, with Pearson’s ‘r’ correlation coefficients ranging from .706 to .887. Correlation coefficients with the MDS-UPDRS-III score were higher with caudate than with putamen SBR values with both tools, and in general higher with BasGanV2 than with Datquant®. Datquant® correspondence with expert reading was 84.1% (94.0% by additionally considering the P/C ratio as a further index). BasGanV2 correspondence with expert reading was 80.8% (86.1% by additionally considering the P/C ratio). Conclusions. Both Datquant® and BasGanV2 work reasonably well and similarly to one another in semi-quantification of DAT SPECT. Both tools have their own strengths and pitfalls that must be known in detail by users in order to obtain the best help in visual reading and reporting of DAT SPECT.



2018 ◽  
Author(s):  
Gao Wang ◽  
Abhishek Sarkar ◽  
Peter Carbonetto ◽  
Matthew Stephens

We introduce a simple new approach to variable selection in linear regression, with a particular focus on quantifying uncertainty in which variables should be selected. The approach is based on a new model, the “Sum of Single Effects” (SuSiE) model, which comes from writing the sparse vector of regression coefficients as a sum of “single-effect” vectors, each with one non-zero element. We also introduce a corresponding new fitting procedure, Iterative Bayesian Stepwise Selection (IBSS), which is a Bayesian analogue of stepwise selection methods. IBSS shares the computational simplicity and speed of traditional stepwise methods, but instead of selecting a single variable at each step, IBSS computes a distribution on variables that captures uncertainty in which variable to select. We provide a formal justification of this intuitive algorithm by showing that it optimizes a variational approximation to the posterior distribution under the SuSiE model. Further, this approximate posterior distribution naturally yields convenient novel summaries of uncertainty in variable selection, providing a Credible Set of variables for each selection. Our methods are particularly well-suited to settings where variables are highly correlated and detectable effects are sparse, both of which are characteristics of genetic fine-mapping applications. We demonstrate through numerical experiments that our methods outperform existing methods for this task, and illustrate their application to fine-mapping genetic variants influencing alternative splicing in human cell-lines. We also discuss the potential and challenges for applying these methods to generic variable selection problems.
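The "sum of single effects" construction can be made concrete with a short sketch. It only builds a sparse coefficient vector as a sum of single-effect vectors, each with exactly one non-zero element; it does not implement the IBSS fitting procedure. Choosing the non-zero position uniformly at random and drawing the effect size from a normal distribution are assumptions of this illustration, and all function and parameter names are hypothetical.

```python
import random


def single_effect_vector(p, effect_sd, rng):
    """Return a length-p vector with exactly one non-zero entry."""
    vec = [0.0] * p
    index = rng.randrange(p)                # which variable carries the effect
    vec[index] = rng.gauss(0.0, effect_sd)  # its effect size (assumed normal)
    return vec


def sum_of_single_effects(p, num_effects, effect_sd, seed=0):
    """Build a sparse coefficient vector as a sum of single-effect vectors."""
    rng = random.Random(seed)
    components = [
        single_effect_vector(p, effect_sd, rng) for _ in range(num_effects)
    ]
    # Element-wise sum across the single-effect vectors.
    b = [sum(column) for column in zip(*components)]
    return components, b


components, b = sum_of_single_effects(p=10, num_effects=3, effect_sd=1.0, seed=1)
print(b)
```

Because each component contributes one non-zero element, the summed vector has at most `num_effects` non-zero coefficients, which is the sparsity structure the abstract describes.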



2019 ◽  
Author(s):  
Marco Bardus ◽  
Nathalie Awada ◽  
Lilian A Ghandour ◽  
Elie-Jacques Fares ◽  
Tarek Gherbal ◽  
...  

BACKGROUND With thousands of health apps in app stores globally, it is crucial to systematically and thoroughly evaluate the quality of these apps due to their potential influence on health decisions and outcomes. The Mobile App Rating Scale (MARS) is the only currently available tool that provides a comprehensive, multidimensional evaluation of app quality, which has been used to compare medical apps from American and European app stores in various areas, available in English, Italian, Spanish, and German. However, this tool is not available in Arabic. OBJECTIVE This study aimed to translate and adapt MARS to Arabic and validate the tool with a sample of health apps aimed at managing or preventing obesity and associated disorders. METHODS We followed a well-established and defined “universalist” process of cross-cultural adaptation using a mixed methods approach. Early translations of the tool, accompanied by confirmation of the contents by two rounds of separate discussions, were included and culminated in a final version, which was then back-translated into English. Two trained researchers piloted the MARS in Arabic (MARS-Ar) with a sample of 10 weight management apps obtained from Google Play and the App Store. Interrater reliability was established using intraclass correlation coefficients (ICCs). After reliability was ascertained, the two researchers independently evaluated an additional set of 56 apps. RESULTS MARS-Ar was highly aligned with the original English version. The ICCs for MARS-Ar (0.836, 95% CI 0.817-0.853) and MARS English (0.838, 95% CI 0.819-0.855) were good. The MARS-Ar subscales were highly correlated with the original counterparts (P<.001). The lowest correlation was observed in the area of usability (r=0.685), followed by aesthetics (r=0.827), information quality (r=0.854), engagement (r=0.894), and total app quality (r=0.897). Subjective quality was also highly correlated (r=0.820).
CONCLUSIONS MARS-Ar is a valid instrument to assess app quality among trained Arabic-speaking users of health and fitness apps. Researchers and public health professionals in the Arab world can use the overall MARS score and its subscales to reliably evaluate the quality of weight management apps. Further research is necessary to test the MARS-Ar on apps addressing various health issues, such as attention or anxiety prevention, or sexual and reproductive health.



2021 ◽  
Author(s):  
Subodh K. Srivastava ◽  
Leandra M. Knight ◽  
Mark K. Nakhla ◽  
Z. Gloria Abad

Phytophthora is one of the most important genera of plant pathogens with many members causing high economic losses world-wide. To build robust molecular identification systems, it is very important to have information from well-authenticated specimens and preferably the ex-type specimens. The reference genomes of well-authenticated specimens form a critical foundation for genetics, biological research, and diagnostic applications. In this study, we describe four draft Phytophthora genome resources for the ex-type of P. citricola BL34 (P0716 WPC) (118 contigs for 50 Mb), and well-authenticated specimens of P. syringae BL57G (P10330 WPC) (591 contigs for 75 Mb), P. hibernalis BL41G (P3822 WPC) (404 contigs for 84 Mb), and P. nicotianae BL162 (P6303 WPC) (3984 contigs for 108 Mb) generated with MinION long-read High-Throughput Sequencing (HTS) technology (Oxford Nanopore Technologies, ONT). Using the quality reads we assembled high coverage genomes of P. citricola with 291X coverage and 16,662 annotated genes; P. nicotianae with 205X coverage and 29,271 annotated genes; P. syringae with 76X coverage and 23,331 annotated genes; and P. hibernalis with 42X coverage and 21,762 annotated genes. With the availability of genome sequences and their annotations, we predict that these draft genomes will be useful for various basic and applied research including diagnostics to protect global agriculture.



2021 ◽  
Vol 12 ◽  
Author(s):  
Rubén Mollá-Albaladejo ◽  
Juan A. Sánchez-Alcañiz

Among individuals, behavioral differences result from the well-known interplay of nature and nurture. Minute differences in the genetic code can lead to differential gene expression and function, dramatically affecting developmental processes and adult behavior. Environmental factors, epigenetic modifications, and gene expression and function are responsible for generating stochastic behaviors. In the last decade, the advent of high-throughput sequencing has facilitated studying the genetic basis of behavior and individuality. We can now study the genomes of multiple individuals and infer which genetic variations might be responsible for the observed behavior. In addition, the development of high-throughput behavioral paradigms, where multiple isogenic animals can be analyzed in various environmental conditions, has again facilitated the study of the influence of genetic and environmental variations in animal personality. Mainly, Drosophila melanogaster has been the focus of a great effort to understand how inter-individual behavioral differences emerge. The possibility of using large numbers of animals, isogenic populations, and the possibility of modifying neuronal function has made it an ideal model to search for the origins of individuality. In the present review, we will focus on the recent findings that try to shed light on the emergence of individuality with a particular interest in D. melanogaster.



PEDIATRICS ◽  
1956 ◽  
Vol 17 (4) ◽  
pp. 510-523
Author(s):  
M. F. Trulson ◽  
C. Collazos ◽  
D. M. Hegsted

One hundred nine school children from 2 rural areas in the coastal area of Peru were measured and weighed and roentgenograms of the hand and wrist were obtained. Three-fourths of the children were below Stuart's tenth percentile in height. Roughly, a third of the children were below the tenth percentile in weight. Fifteen per cent of the girls and 30 per cent of the boys were above the fiftieth percentile in weight. Forty to forty-five per cent of the children were in the stocky to obese channels of the Wetzel grid; 5 to 10 per cent would be classified as fair to poor, and roughly half would be considered average. Developmental age (Wetzel) was 7.5 ± 15.6 months less than chronological age for boys, 10.5 ± 11.3 months less for girls. A third of the boys and 15 per cent of the girls were advanced in Wetzel developmental age. It was apparent that the heavier children were generally advanced in Wetzel developmental age. Roentgenograms of the hand and wrist were assessed by comparing the films to the Greulich-Pyle Standards. Skeletal age was -11.3 ± 12.7 months for boys and -7.1 ± 9.8 for girls. Eighteen per cent of the population were advanced in skeletal age. Boys were more retarded than girls in skeletal age. The correlation and partial correlation coefficients for all combinations of the 4 measurements (retardation in weight, retardation in height, retardation in skeletal age and retardation in developmental age) were calculated. The various pairs were all rather highly correlated, this being particularly true of weight and Wetzel developmental age. The partial correlation coefficients show, however, that skeletal age was not closely correlated with any of the other 3 measurements. Height and developmental age were negatively correlated to a significant degree, and developmental age and weight were so closely related that they appear to be measures of the same characteristic in this population. 
Individual dietary histories are not available from these children, but it is known that the diets in the area are considerably below recommended levels in certain nutrients. Whether dietary deficiencies are factors in the apparently abnormal developmental patterns, or whether the patterns are truly abnormal for the Peruvian child or indicate an adverse effect on health, remains to be shown. It is pointed out that there are probably advantages in studies upon growth and development in different areas of the world where a variety of dietary or environmental factors may have specific effects.



Author(s):  
Vadim Zverovich

Here, a graph-theoretic approach is applied to some problems in networks, for example in wireless sensor networks (WSNs) where some sensor nodes should be selected to behave as a backbone/dominating set to support routing communications in an efficient and fault-tolerant way. Four different types of multiple domination (k-, k-tuple, α- and α-rate domination) are considered and recent upper bounds for cardinality of these types of dominating sets are discussed. Randomized algorithms are presented for finding multiple dominating sets whose expected size satisfies the upper bounds. Limited packings in networks are studied, in particular the k-limited packing number. One possible application of limited packings is a secure facility location problem when there is a need to place as many resources as possible in a given network subject to some security constraints. The last section is devoted to two general frameworks for multiple domination: <r,s>-domination and parametric domination. Finally, different threshold functions for multiple domination are considered.
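Two of the multiple-domination notions named above can be contrasted with a small checker. The sketch assumes the usual textbook definitions, which the abstract itself does not spell out: a set D is k-dominating if every vertex outside D has at least k neighbors in D, and k-tuple dominating if every vertex, including those in D, has at least k members of D in its closed neighborhood. The function names and the example graph are hypothetical, and the randomized algorithms mentioned in the abstract are not reproduced.

```python
def is_k_dominating(adjacency, dominating, k):
    """Every vertex outside the set has at least k neighbors inside it."""
    return all(
        len(adjacency[v] & dominating) >= k
        for v in adjacency
        if v not in dominating
    )


def is_k_tuple_dominating(adjacency, dominating, k):
    """Every vertex has at least k set members in its closed neighborhood."""
    return all(
        len((adjacency[v] | {v}) & dominating) >= k
        for v in adjacency
    )


# Hypothetical three-node path 0 - 1 - 2, e.g. three sensors in a line.
path = {0: {1}, 1: {0, 2}, 2: {1}}

print(is_k_dominating(path, {0, 2}, 2))        # True
print(is_k_tuple_dominating(path, {0, 2}, 2))  # False
```

On this path the two end nodes 2-dominate the middle node, but they do not form a 2-tuple dominating set because each end node sees only itself from the set, which shows why the two notions need separate bounds.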



2019 ◽  
Vol 21 (Supplement_6) ◽  
pp. vi165-vi166 ◽  
Author(s):  
Reuben R Shamir ◽  
Ze’ev Bomzon

Abstract Tumor treating fields (TTFields) is an FDA-approved therapy for the treatment of glioblastoma multiforme (GBM) and malignant pleural mesothelioma (MPM), and is currently being investigated for additional tumor types. TTFields are delivered to the tumor through transducer arrays (TAs) placed on the patient’s shaved scalp. The positions of the TAs are associated with treatment outcomes via simulations of the electric fields. Therefore, we are currently developing a method for recommending optimal placement of TAs. A key step to achieve this goal is to correctly segment the head into tissues of similar electrical properties. Visual inspection of segmentation quality is invaluable but time-consuming. Automatic quality assessment can assist in automatic refinement of the segmentation parameters, suggest flaw points to the user and indicate whether the segmentation method is of sufficient accuracy for TTFields simulation. As a first step in this direction, we identified a set of features that are relevant to atlas-based segmentation and show that these are significantly correlated (p < 0.05) with a similarity measure between gold-standard and automatically computed segmentations. Furthermore, we incorporated these features in a decision tree regressor to predict the similarity of the gold-standard and computed segmentations of 20 TTFields patients using a leave-one-out approach. The predicted similarity measures were highly correlated with the actual ones (average absolute difference 3% (SD = 3%); r = 0.92, p < 0.001). We conclude that automatic quality estimation of segmentations is feasible by incorporating segmentation-relevant features with statistical and machine learning methods, such as a decision tree regressor.


