Feature Trees: Theory and Applications from Large‐scale Virtual Screening to Data Analysis

Author(s):  
Matthias Rarey ◽  
Sally Hindle ◽  
Patrick Maaß ◽  
Günther Metz ◽  
Christian Rummey ◽  
...  
2020 ◽  
Vol 27 (38) ◽  
pp. 6523-6535 ◽  
Author(s):  
Antreas Afantitis ◽  
Andreas Tsoumanis ◽  
Georgia Melagraki

Drug discovery and (nano)material design projects demand the in silico analysis of large datasets of compounds with their corresponding properties/activities, as well as the retrieval and virtual screening of more structures in an effort to identify new potent hits. This is a demanding procedure for which various tools must be combined with different input and output formats. To automate the required data analysis, we have developed tools that facilitate a variety of important tasks for constructing workflows that simplify the handling, processing and modeling of cheminformatics data and provide time- and cost-efficient solutions that are reproducible and easier to maintain. We therefore present a toolbox of >25 processing modules, Enalos+ nodes, that provide very useful operations within the KNIME platform for users interested in the nanoinformatics and cheminformatics analysis of chemical and biological data. With a user-friendly interface, Enalos+ nodes provide a broad range of important functionalities, including data mining and retrieval from large available databases and tools for robust and predictive model development and validation. Enalos+ nodes are available through KNIME as add-ins and offer valuable tools for extracting useful information and analyzing experimental and virtual screening results in a chem- or nanoinformatics framework. In addition, in an effort to (i) allow big data analysis through Enalos+ KNIME nodes, (ii) accelerate time-demanding computations performed within Enalos+ KNIME nodes and (iii) propose new time- and cost-efficient nodes integrated within the Enalos+ toolbox, we have investigated and verified the advantage of GPU calculations within the Enalos+ nodes. Demonstration data sets, tutorials and educational videos allow the user to easily understand the functions of the nodes that can be applied for in silico analysis of data.


Author(s):  
Eun-Young Mun ◽  
Anne E. Ray

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples and systematic study-level missing data are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on the authors’ experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, it is also recognized that IDA investigations require a wide range of expertise and considerable resources and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1670
Author(s):  
Waheeb Abu-Ulbeh ◽  
Maryam Altalhi ◽  
Laith Abualigah ◽  
Abdulwahab Ali Almazroi ◽  
Putra Sumari ◽  
...  

Cyberstalking is a growing anti-social problem being transformed on a large scale and in various forms. Cyberstalking detection has become increasingly popular in recent years and has technically been investigated by many researchers. However, cyberstalking victimization, an essential part of cyberstalking, has empirically received less attention from the research community. This paper attempts to address this gap and develop a model to understand and estimate the prevalence of cyberstalking victimization. The model is built on routine activities and lifestyle exposure theories and includes eight hypotheses. The data were collected from 757 respondents in Jordanian universities. The paper takes a quantitative approach and uses structural equation modeling for data analysis. The results revealed a modest prevalence range that depends on the cyberstalking type. The results also indicated that proximity to motivated offenders, suitable targets, and digital guardians significantly influences cyberstalking victimization. The outcome of moderation hypothesis testing demonstrated that age and residence have a significant effect on cyberstalking victimization. The proposed model is an essential element for assessing cyberstalking victimization among societies and provides a valuable understanding of its prevalence. This can assist researchers and practitioners in future research on cyberstalking victimization.


Author(s):  
Nicolas Fischer ◽  
Ean-Jeong Seo ◽  
Sara Abdelfatah ◽  
Edmond Fleischer ◽  
Anette Klinger ◽  
...  

Summary. Introduction: Differentiation therapy is a promising strategy for cancer treatment. The translationally controlled tumor protein (TCTP) is an encouraging target in this context. To date, this field of research is still in its infancy, which motivated us to perform a large-scale screening for the identification of novel ligands of TCTP. We studied the binding mode and the effect of TCTP blockade on the cell cycle in different cancer cell lines. Methods: Based on the ZINC database, we performed virtual screening of 2,556,750 compounds to analyze the binding of small molecules to TCTP. The in silico results were confirmed by microscale thermophoresis. The effect of the new ligand molecules was investigated on cancer cell survival, flow cytometric cell cycle analysis and protein expression by Western blotting and co-immunoprecipitation in MOLT-4, MDA-MB-231, SK-OV-3 and MCF-7 cells. Results: Large-scale virtual screening by PyRx combined with molecular docking by AutoDock4 revealed five candidate compounds. By microscale thermophoresis, ZINC10157406 (6-(4-fluorophenyl)-2-[(8-methoxy-4-methyl-2-quinazolinyl)amino]-4(3H)-pyrimidinone) was identified as a TCTP ligand with a KD of 0.87 ± 0.38. ZINC10157406 revealed growth inhibitory effects and caused G0/G1 cell cycle arrest in MOLT-4, SK-OV-3 and MCF-7 cells. ZINC10157406 (2 × IC50) downregulated TCTP expression by 86.70 ± 0.44% and upregulated p53 expression by 177.60 ± 12.46%. We validated by co-immunoprecipitation that ZINC10157406 binds to the p53 interaction site of TCTP and replaces p53. Discussion: ZINC10157406 was identified as a potent ligand of TCTP by in silico and in vitro methods. The compound bound to TCTP with a considerably higher affinity than artesunate, a known TCTP inhibitor. We were able to demonstrate the effect of TCTP blockade at the p53 binding site, i.e. expression of TCTP decreased, whereas p53 expression increased. This effect was accompanied by a dose-dependent decrease of CDK2, CDK4, CDK, cyclin D1 and cyclin D3, causing a G0/G1 cell cycle arrest in MOLT-4, SK-OV-3 and MCF-7 cells. Our findings should stimulate further research on TCTP-specific small molecules for differentiation therapy in oncology.


1983 ◽  
Vol 38 ◽  
pp. 1-9
Author(s):  
Herbert F. Weisberg

We are now entering a new era of computing in political science. The first era was marked by punched-card technology. Initially, the most sophisticated analyses possible were frequency counts and tables produced on a counter-sorter, a machine that specialized in chewing up data cards. By the early 1960s, batch processing on large mainframe computers became the predominant mode of data analysis, with turnaround time of up to a week. By the late 1960s, turnaround time was cut down to a matter of a few minutes and OSIRIS and then SPSS (and more recently SAS) were developed as general-purpose data analysis packages for the social sciences. Even today, use of these packages in batch mode remains one of the most efficient means of processing large-scale data analysis.


mSphere ◽  
2017 ◽  
Vol 2 (5) ◽  
Author(s):  
Gaorui Bian ◽  
Gregory B. Gloor ◽  
Aihua Gong ◽  
Changsheng Jia ◽  
Wei Zhang ◽  
...  

ABSTRACT We report the large-scale use of compositional data analysis to establish a baseline microbiota composition in an extremely healthy cohort of the Chinese population. This baseline will serve for comparison for future cohorts with chronic or acute disease. In addition to the expected difference in the microbiota of children and adults, we found that the microbiota of the elderly in this population was similar in almost all respects to that of healthy people in the same population who are scores of years younger. We speculate that this similarity is a consequence of an active healthy lifestyle and diet, although cause and effect cannot be ascribed in this (or any other) cross-sectional design. One surprising result was that the gut microbiota of persons in their 20s was distinct from those of other age cohorts, and this result was replicated, suggesting that it is a reproducible finding and distinct from those of other populations. The microbiota of the aged is variously described as being more or less diverse than that of younger cohorts, but the comparison groups used and the definitions of the aged population differ between experiments. The differences are often described by null hypothesis statistical tests, which are notoriously irreproducible when dealing with large multivariate samples. We collected and examined the gut microbiota of a cross-sectional cohort of more than 1,000 very healthy Chinese individuals who spanned ages from 3 to over 100 years. The analysis of 16S rRNA gene sequencing results used a compositional data analysis paradigm coupled with measures of effect size, where ordination, differential abundance, and correlation can be explored and analyzed in a unified and reproducible framework. Our analysis showed several surprising results compared to other cohorts. First, the overall microbiota composition of the healthy aged group was similar to that of people decades younger. 
Second, the major differences between groups in the gut microbiota profiles were found before age 20. Third, the gut microbiota differed little between individuals from the ages of 30 to >100. Fourth, the gut microbiota of males appeared to be more variable than that of females. Taken together, the present findings suggest that the microbiota of the healthy aged in this cross-sectional study differ little from that of the healthy young in the same population, although the minor variations that do exist depend upon the comparison cohort.
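
As a rough illustration of what a compositional data analysis step can look like, the sketch below applies the centered log-ratio (CLR) transform to one sample of taxon counts. The abstract does not say which transform or software the authors used, so the choice of CLR, the function name `clr`, the pseudocount of 0.5, and the example counts are all assumptions made for illustration only.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's counts.

    The pseudocount is an illustrative assumption: it gives zero
    counts a defined logarithm. It is not taken from the abstract.
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log equals dividing by the geometric mean.
    return [value - mean_log for value in logs]


# Hypothetical read counts for four taxa in a single sample.
sample = [120, 30, 0, 850]
transformed = clr(sample)
```

CLR values sum to zero within a sample and, when no pseudocount is added, do not change if every count is multiplied by the same factor, which is why sequencing depth drops out of the comparison.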

