SMRT: Randomized Data Transformation for Cancer Subtyping and Big Data Analysis

2021 ◽  
Vol 11 ◽  
Author(s):  
Hung Nguyen ◽  
Duc Tran ◽  
Bang Tran ◽  
Monikrishna Roy ◽  
Adam Cassell ◽  
...  

Cancer is an umbrella term that includes a range of disorders, from those that are fast-growing and lethal to indolent lesions with low or delayed potential for progression to death. The treatment options, as well as treatment success, are highly dependent on the correct subtyping of individual patients. With the advancement of high-throughput platforms, we have the opportunity to differentiate among cancer subtypes from a holistic perspective that takes into consideration phenomena at different molecular levels (mRNA, methylation, etc.). This demands powerful integrative methods to leverage large multi-omics datasets for better subtyping. Here we introduce Subtyping Multi-omics using a Randomized Transformation (SMRT), a new method for multi-omics integration and cancer subtyping. SMRT offers the following advantages over existing approaches: (i) the scalable analysis pipeline allows researchers to integrate multi-omics data and analyze hundreds of thousands of samples in minutes, (ii) the ability to integrate data types with different numbers of patients, (iii) the ability to analyze un-matched data of different types, and (iv) the ability to offer users a convenient data analysis pipeline through a web application. We also improve the efficiency of our ensemble-based perturbation clustering to support analysis on machines with memory constraints. In an extensive analysis, we compare SMRT with eight state-of-the-art subtyping methods using 37 TCGA and two METABRIC datasets comprising a total of almost 12,000 patient samples from 28 different types of cancer. We also performed a number of simulation studies. We demonstrate that SMRT outperforms other methods in identifying subtypes with significantly different survival profiles. In addition, SMRT is extremely fast, being able to analyze hundreds of thousands of samples in minutes. The web application is available at http://SMRT.tinnguyen-lab.com. 
The R package will be deposited to CRAN as part of our PINSPlus software suite.

2021 ◽  
Author(s):  
Cecilia Noecker ◽  
Alexander Eng ◽  
Elhanan Borenstein

Motivation: Recent technological developments have facilitated an expansion of microbiome-metabolome studies, in which a set of microbiome samples are assayed using both genomic and metabolomic technologies to characterize the composition of microbial taxa and the concentrations of various metabolites. A common goal of many of these studies is to identify microbial features (species or genes) that contribute to differences in metabolite levels across samples. Previous work indicated that integrating these datasets with reference knowledge on microbial metabolic capacities may enable more precise and confident inference of such microbe-metabolite links. Results: We present MIMOSA2, an R package and web application for model-based integrative analysis of microbiome-metabolome datasets. MIMOSA2 uses reference databases to construct a community metabolic model based on microbiome data and uses this model to predict differences in metabolite levels across samples. These predictions are compared with metabolomics data to identify putative microbiome-governed metabolites and specific taxonomic contributors to metabolite variation. MIMOSA2 supports various input data types and can be customized to incorporate user-defined metabolic pathways. We demonstrate MIMOSA2's ability to identify ground truth microbial mechanisms in simulation datasets, and compare its results with experimentally inferred mechanisms in a dataset describing honeybee gut microbiota. Overall, MIMOSA2 combines reference databases, a validated statistical framework, and a user-friendly interface to facilitate modeling and evaluating relationships between members of the microbiota and their metabolic products. Availability and Implementation: MIMOSA2 is implemented in R under the GNU General Public License v3.0 and is freely available as a web server and R package from www.borensteinlab.com/software_MIMOSA2.html.


Author(s):  
A. S. Glotov ◽  
P. Yu. Kozyulina ◽  
E. S. Vashukova ◽  
R. A. Illarionov ◽  
N. O. Yurkina ◽  
...  

Aim. To study changes in the level of piRNA in plasma and serum of pregnant women at different stages of gestation. Material and Methods. A total of 42 samples of plasma and blood serum were obtained from seven women with physiological singleton pregnancy without obstetric and gynecological pathology. The study was carried out at three time points corresponding to 8–13, 18–25, and 30–35 weeks of pregnancy, respectively. To assess the spectrum and levels of piRNA by the NGS method, whole genome sequencing of small RNAs was carried out. Sequencing data analysis was performed using the GeneGlobe Data Analysis Center web application. Differential expression was assessed using the DESeq2 R package. Results and Discussion. The piRNA contents among all small RNAs were 2.29%, 2.61%, and 4.16% in plasma and 7.29%, 7.02%, and 10.82% in serum during the first, second, and third trimesters, respectively. The contents of the following piRNAs increased in blood plasma from the first to the third trimester: piR 000765, piR 020326, piR 019825, piR 020497, piR 015026, piR 001312, and piR 017716. The study showed that the levels of piR 000765, piR 020326, piR 019825, piR 015026, piR 020497, piR 001312, piR 017716, and piR 004153 were significantly higher in serum compared with the corresponding values in plasma, whereas the content of only one molecule, piR 018849, was higher in plasma. Conclusion. This pilot work created a basis for understanding the processes of piRNA expression in plasma and serum of pregnant women and can become the foundation for the search for biomarkers of various complications in pregnancy.


2018 ◽  
Vol 3 (2) ◽  
pp. 30-38 ◽  
Author(s):  
Niels Hendrik Bech ◽  
Daniel Haverkamp

In this review, we bring to the attention of the reader three relatively unknown types of hip impingement. We explain the concept of low anterior inferior iliac spine (AIIS) impingement, also known as sub-spine impingement, ischio-femoral impingement (IFI) and pelvi-trochanteric impingement. For each type of impingement, we performed a search of relevant literature. We searched the PubMed, Medline (Ovid) and Embase databases from 1960 to March 2016. For each different type of impingement, a different search strategy was conducted. In total, 19 studies were included and described. No data analysis was performed since there was not much comparable data between studies. An overview of symptoms, clinical tests and possible surgical treatment options for the three different types of extra-articular impingement is provided. Several disorders around the hip can cause similar complaints. Therefore, we plead for a standardized classification. In young and athletic patients, in particular, there is much to gain if hip impingement is diagnosed early. Cite this article: EFORT Open Rev 2018;3:30-38. DOI: 10.1302/2058-5241.3.160068


2006 ◽  
pp. 115-127
Author(s):  
T Natkhov

The article considers recent trends in the development of the insurance market in Russia. Based on an analysis of statistical data, the most urgent problems of the insurance sector are formulated. The basic characteristics of different types of insurance are described, and measures to improve the institution of insurance in the medium term are proposed.


Author(s):  
Franco Stellari ◽  
Peilin Song

In this paper, the development of advanced emission data analysis methodologies for IC debugging and characterization is discussed. Techniques for automated layout to emission registration and data segmentations are proposed and demonstrated using both 22 nm and 14 nm SOI test chips. In particular, gate level registration accuracy is leveraged to compare the emission of different types of gates and quickly create variability maps automatically.


2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 1405.1-1406
Author(s):  
F. Morton ◽  
J. Nijjar ◽  
C. Goodyear ◽  
D. Porter

Background: The American College of Rheumatology (ACR) and the European League Against Rheumatism (EULAR) individually and collaboratively have produced/recommended diagnostic classification, response and functional status criteria for a range of different rheumatic diseases. While there are a number of different resources available for performing these calculations individually, currently there are no tools available that we are aware of to easily calculate these values for whole patient cohorts. Objectives: To develop a new software tool, which will enable both data analysts and also researchers and clinicians without programming skills to calculate ACR/EULAR related measures for a number of different rheumatic diseases. Methods: Criteria that had been developed by ACR and/or EULAR that had been approved for the diagnostic classification, measurement of treatment response and functional status in patients with rheumatoid arthritis were identified. Methods were created using the R programming language to allow the calculation of these criteria, which were incorporated into an R package. Additionally, an R/Shiny web application was developed to enable the calculations to be performed via a web browser using data presented as CSV or Microsoft Excel files. Results: acreular is a freely available, open source R package (downloadable from https://github.com/fragla/acreular) that facilitates the calculation of ACR/EULAR related RA measures for whole patient cohorts. Measures, such as the ACR/EULAR (2010) RA classification criteria, can be determined using precalculated values for each component (small/large joint counts, duration in days, normal/abnormal acute-phase reactants, negative/low/high serology classification) or by providing “raw” data (small/large joint counts, onset/assessment dates, ESR/CRP and CCP/RF laboratory values). Other measures, including EULAR response and ACR20/50/70 response, can also be calculated by providing the required information. 
The accompanying web application is included as part of the R package but is also externally hosted at https://fragla.shinyapps.io/shiny-acreular. This enables researchers and clinicians without any programming skills to easily calculate these measures by uploading either a Microsoft Excel or CSV file containing their data. Furthermore, the web application allows the incorporation of additional study covariates, enabling the automatic calculation of multigroup comparative statistics and the visualisation of the data through a number of different plots, both of which can be downloaded. Figure 1. The Data tab following the upload of data. Criteria are calculated by selecting the appropriate checkbox. Figure 2. A density plot of DAS28 scores grouped by ACR/EULAR 2010 RA classification. Statistical analysis has been performed and shows a significant difference in DAS28 score between the two groups. Conclusion: The acreular R package facilitates the easy calculation of ACR/EULAR RA related disease measures for whole patient cohorts. Calculations can be performed either from within R or by using the accompanying web application, which also enables the graphical visualisation of data and the calculation of comparative statistics. We plan to further develop the package by adding additional RA related criteria and by adding ACR/EULAR related measures for other rheumatic disorders. Disclosure of Interests: Fraser Morton: None declared, Jagtar Nijjar Shareholder of: GlaxoSmithKline plc, Consultant of: Janssen Pharmaceuticals UK, Employee of: GlaxoSmithKline plc, Paid instructor for: Janssen Pharmaceuticals UK, Speakers bureau: Janssen Pharmaceuticals UK, AbbVie, Carl Goodyear: None declared, Duncan Porter: None declared
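
To make the "precalculated component" route concrete, the sketch below scores the ACR/EULAR 2010 RA classification criteria from the four components the abstract lists (joint counts, serology class, acute-phase reactants, symptom duration in days). It is written in Python for illustration only and is not the acreular API: the function names, argument names and the `SEROLOGY_SCORE` table are hypothetical. The point values follow the published 2010 criteria as commonly stated (a total of 6 or more out of 10 classifies as definite RA), and the sketch assumes the entry condition of at least one joint with clinical synovitis not better explained by another disease has already been checked.

```python
# Illustrative scoring of the ACR/EULAR 2010 RA classification criteria.
# All names here are hypothetical; this is not the acreular package's interface.

SEROLOGY_SCORE = {"negative": 0, "low": 2, "high": 3}  # RF/ACPA class


def joint_score(small, large):
    """Joint involvement points from small and large joint counts."""
    if small + large > 10 and small >= 1:
        return 5  # more than 10 joints, at least one of them small
    if small >= 4:
        return 3  # 4-10 small joints
    if small >= 1:
        return 2  # 1-3 small joints
    if large >= 2:
        return 1  # 2-10 large joints
    return 0      # at most one large joint


def ra_2010_score(small, large, serology, abnormal_apr, duration_days):
    """Total score (0-10) from the four precalculated components."""
    return (
        joint_score(small, large)
        + SEROLOGY_SCORE[serology]
        + (1 if abnormal_apr else 0)         # abnormal CRP or ESR
        + (1 if duration_days >= 42 else 0)  # symptoms for 6 weeks or longer
    )


def is_definite_ra(score):
    """A total of 6 or more classifies the patient as definite RA."""
    return score >= 6


# Hypothetical cohort rows: (small, large, serology, abnormal_apr, duration_days)
cohort = [(4, 0, "high", True, 60), (0, 1, "negative", False, 10)]
scores = [ra_2010_score(*row) for row in cohort]
```

Applying the function row by row, as in the last two lines, mirrors the whole-cohort use case the abstract describes; the cohort values are invented for the example.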


Author(s):  
Ying Wang ◽  
Yiding Liu ◽  
Minna Xia

Big data is characterized by multiple sources and heterogeneity. In this study, a hybrid forest fire analysis platform is built on the Hadoop and Spark big data platforms. The platform combines big data analysis and processing technology and draws on research results from different technical fields, such as forest fire monitoring. In this system, Hadoop's HDFS is used to store all kinds of data, the Spark module provides various big data analysis methods, and visualization tools such as ECharts, ArcGIS and Unity3D are used to visualize the analysis results. Finally, an experiment on forest fire point detection is designed to corroborate the feasibility and effectiveness of the platform and to provide guidance for follow-up research and for establishing a big data platform for forest fire monitoring and visualized early warning. However, the experiment has two shortcomings: more data types should have been selected, and compatibility would be better if the original data were converted to XML format. These problems are expected to be addressed in follow-up research.


2019 ◽  
Vol 24 (3) ◽  
pp. 213-223 ◽  
Author(s):  
Raimo Franke ◽  
Bettina Hinkelmann ◽  
Verena Fetz ◽  
Theresia Stradal ◽  
Florenz Sasse ◽  
...  

Mode of action (MoA) identification of bioactive compounds is very often a challenging and time-consuming task. We used a label-free kinetic profiling method based on an impedance readout to monitor the time-dependent cellular response profiles for the interaction of bioactive natural products and other small molecules with mammalian cells. Such approaches have been rarely used so far due to the lack of data mining tools to properly capture the characteristics of the impedance curves. We developed a data analysis pipeline for the xCELLigence Real-Time Cell Analysis detection platform to process the data, assess and score their reproducibility, and provide rank-based MoA predictions for a reference set of 60 bioactive compounds. The method can reveal additional, previously unknown targets, as exemplified by the identification of tubulin-destabilizing activities of the RNA synthesis inhibitor actinomycin D and the effects on DNA replication of vioprolide A. The data analysis pipeline is based on the statistical programming language R and is available to the scientific community through a GitHub repository.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Luis F. Iglesias-Martinez ◽  
Barbara De Kegel ◽  
Walter Kolch

Reconstructing gene regulatory networks is crucial to understand biological processes and holds potential for developing personalized treatment. Yet, it is still an open problem as state-of-the-art algorithms are often not able to process large amounts of data within reasonable time. Furthermore, many of the existing methods predict numerous false positives and have limited capabilities to integrate other sources of information, such as previously known interactions. Here we introduce KBoost, an algorithm that uses kernel PCA regression, boosting and Bayesian model averaging for fast and accurate reconstruction of gene regulatory networks. We have benchmarked KBoost against other high performing algorithms using three different datasets. The results show that our method compares favorably to other methods across datasets. We have also applied KBoost to a large cohort of close to 2000 breast cancer patients and 24,000 genes in less than 2 h on standard hardware. Our results show that molecularly defined breast cancer subtypes also feature differences in their GRNs. An implementation of KBoost in the form of an R package is available at: https://github.com/Luisiglm/KBoost and as a Bioconductor software package.


2000 ◽  
Vol 57 (3) ◽  
pp. 616-627 ◽  
Author(s):  
Louis W Botsford ◽  
Charles M Paulsen

We assessed covariability among a number of spawning populations of spring-summer run chinook salmon (Oncorhynchus tshawytscha) in the Columbia River basin by computing correlations among several different types of spawner and recruit data. We accounted for intraseries correlation explicitly in judging the significance of correlations. To reduce the errors involved in computing effective degrees of freedom, we computed a generic effective degrees of freedom for each data type. In spite of the fact that several of these stocks have declined, covariability among locations using several different combinations of spawner and recruitment data indicated no basinwide covariability. There was, however, significant covariability among index populations within the three main subbasins: the Snake River, the mid-Columbia River, and the John Day River. This covariability was much stronger and more consistent in data types reflecting survival (e.g., the natural logarithm of recruits per spawner) than in data reflecting abundance (e.g., spawning escapement). We also tested a measure of survival that did not require knowing the age structure of spawners, the ratio of spawners in one year to spawners 4 years earlier. It displayed a similar spatial pattern.
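
The two survival-type measures named in the abstract can be written down in a few lines. The sketch below, in Python, computes the natural logarithm of recruits per spawner and the ratio of spawners in one year to spawners 4 years earlier. The function names and the example series are hypothetical and are not taken from the paper; the sketch also leaves out the correlation tests and the effective degrees of freedom adjustment the authors describe.

```python
import math


def log_recruits_per_spawner(recruits, spawners):
    """ln(R/S) for each brood year; both lists are aligned by brood year."""
    return [math.log(r / s) for r, s in zip(recruits, spawners)]


def spawner_ratio(spawners, lag=4):
    """Spawners in year t divided by spawners in year t - lag.

    This index needs no age-structure data; the first `lag` years
    have no earlier value to compare against and are dropped.
    """
    return [spawners[t] / spawners[t - lag] for t in range(lag, len(spawners))]


# Hypothetical annual counts for one index population (not data from the paper).
spawners = [100.0, 120.0, 80.0, 60.0, 50.0, 90.0]
recruits = [150.0, 60.0, 80.0, 30.0, 100.0, 45.0]

survival_index = log_recruits_per_spawner(recruits, spawners)
ratio_index = spawner_ratio(spawners)
```

With six years of counts and a 4-year lag, `ratio_index` holds two values (years 5 and 6 relative to years 1 and 2), which shows how the lagged ratio shortens the usable series.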

