MacroSEQUEST: Efficient Candidate-Centric Searching and High-Resolution Correlation Analysis for Large-Scale Proteomics Data Sets

2010 ◽  
Vol 82 (16) ◽  
pp. 6821-6829 ◽  
Author(s):  
Brendan K. Faherty ◽  
Scott A. Gerber


2021 ◽
Author(s):  
Andrew J Kavran ◽  
Aaron Clauset

Abstract. Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation. Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “filtered” to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may first be decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy by up to 43% compared with using unfiltered data. Conclusions: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters for applications in systems biology.
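A minimal sketch of the core idea, assuming an undirected NetworkX graph whose edges carry a sign attribute of +1 for correlated and -1 for anti-correlated measurement pairs; each node's noisy value is replaced by the sign-aware mean over itself and its neighbours. The attribute name and the simple mean filter are illustrative assumptions, not the paper's exact formulation.

import networkx as nx

def network_filter(G, values, sign_attr="sign"):
    """Replace each node's noisy measurement with the sign-aware mean of its own
    value and its network neighbours' values. Anti-correlated neighbours
    (edge sign -1) contribute with their sign flipped."""
    filtered = {}
    for node in G.nodes:
        total, count = values[node], 1
        for nbr in G.neighbors(node):
            sign = G[node][nbr].get(sign_attr, 1)
            total += sign * values[nbr]
            count += 1
        filtered[node] = total / count
    return filtered

# toy example: "b" is correlated with "a" and anti-correlated with "c"
G = nx.Graph()
G.add_edge("a", "b", sign=1)
G.add_edge("b", "c", sign=-1)
print(network_filter(G, {"a": 1.2, "b": 0.9, "c": -1.1}))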


2014 ◽  
Vol 11 (6) ◽  
pp. 6139-6166 ◽  
Author(s):  
T. R. Marthews ◽  
S. J. Dadson ◽  
B. Lehner ◽  
S. Abele ◽  
N. Gedney

Abstract. Modelling land surface water flow is of critical importance for simulating land-surface fluxes, predicting runoff and water table dynamics, and for many other applications of Land Surface Models. Many approaches are based on the popular hydrology model TOPMODEL, and the most important parameter of this model is the well-known topographic index. Here we present new, high-resolution parameter maps of the topographic index for all ice-free land pixels, calculated from hydrologically conditioned HydroSHEDS data sets using the GA2 algorithm. At 15 arcsec resolution, these layers are 4× finer than the resolution of the previously best-available topographic index layers, the Compound Topographic Index of HYDRO1k (CTI). For the largest river catchments on each continent, we found that CTI values were up to 20% higher than our revised values, e.g. in the Amazon. We found the highest catchment means were for the Murray-Darling and Nelson-Saskatchewan rather than for the Amazon and St. Lawrence, as found from the CTI. We believe these new index layers represent the most robust existing global-scale topographic index values and hope that they will be widely used in land surface modelling applications in the future.
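For reference, the TOPMODEL topographic index at a pixel is ln(a / tan β), where a is the upslope contributing area per unit contour length and β is the local slope angle. A minimal per-pixel sketch follows; the actual GA2-based layers involve far more careful flow routing and hydrological conditioning, and the clamping of near-flat slopes here is an illustrative assumption.

import numpy as np

def topographic_index(upslope_area, slope_deg, min_slope_deg=0.01):
    """Per-pixel TOPMODEL topographic index ln(a / tan(beta)).

    upslope_area : specific upslope contributing area per unit contour length (m)
    slope_deg    : local slope angle in degrees, clamped to min_slope_deg so that
                   flat pixels do not cause a division by zero
    """
    tan_beta = np.tan(np.radians(np.maximum(slope_deg, min_slope_deg)))
    return np.log(upslope_area / tan_beta)

# toy 2x2 grid: contributing areas (m) and slopes (degrees)
a = np.array([[50.0, 500.0], [5000.0, 200.0]])
beta = np.array([[5.0, 1.0], [0.2, 10.0]])
print(topographic_index(a, beta))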


2016 ◽  
Vol 9 (6) ◽  
pp. 1187-1213 ◽  
Author(s):  
Petra Schneidhofer ◽  
Erich Nau ◽  
Alois Hinterleitner ◽  
Agata Lugmayr ◽  
Jan Bill ◽  
...  

2018 ◽  
Author(s):  
Li Chen ◽  
Bai Zhang ◽  
Michael Schnaubelt ◽  
Punit Shah ◽  
Paul Aiyetan ◽  
...  

Abstract. Rapid development and wide adoption of mass spectrometry-based proteomics technologies have empowered scientists to study proteins and their modifications in complex samples on a large scale. This progress has also created unprecedented challenges for individual labs to store, manage and analyze proteomics data, both in the cost of proprietary software and high-performance computing and in the long processing times that discourage the on-the-fly changes of data-processing settings required in exploratory and discovery analysis. We developed an open-source, cloud computing-based pipeline, MS-PyCloud, with graphical user interface (GUI) support, for LC-MS/MS data analysis. The major components of this pipeline include data file integrity validation, MS/MS database search for spectral assignment, false discovery rate estimation, protein inference, determination of protein post-translational modifications, and quantitation of specific (modified) peptides and proteins. To ensure the transparency and reproducibility of data analysis, MS-PyCloud includes open-source software tools with comprehensive testing and versioning for spectrum assignments. Leveraging public cloud computing infrastructure via Amazon Web Services (AWS), MS-PyCloud scales seamlessly based on analysis demand to achieve fast and efficient performance. Application of the pipeline to the analysis of large-scale iTRAQ/TMT LC-MS/MS data sets demonstrated the effectiveness and high performance of MS-PyCloud. The software can be downloaded at: https://bitbucket.org/mschnau/ms-pycloud/downloads/
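The stages listed above form a linear workflow; the sketch below only illustrates that flow with stubbed, hypothetical helper functions and file names, not MS-PyCloud's actual API (see the BitBucket repository for the real interface).

# Every function and file name below is a placeholder illustrating the stage order,
# not part of MS-PyCloud's real interface.

def check_file_integrity(path):          return True                   # e.g. checksum validation
def database_search(files, fasta):       return [{"spectrum": f, "q_value": 0.003} for f in files]
def filter_by_fdr(psms, fdr):            return [p for p in psms if p["q_value"] <= fdr]
def infer_proteins(psms):                return {"PROTEIN_GROUP_1": psms}
def localize_modifications(psms):        return {}                     # PTM site assignment
def quantify(proteins, mods, labeling):  return {k: len(v) for k, v in proteins.items()}

def run_pipeline(raw_files, fasta_db, labeling="TMT"):
    validated = [f for f in raw_files if check_file_integrity(f)]      # data file integrity validation
    psms = database_search(validated, fasta_db)                        # MS/MS spectrum assignment
    psms = filter_by_fdr(psms, fdr=0.01)                               # false discovery rate estimation
    proteins = infer_proteins(psms)                                    # protein inference
    mods = localize_modifications(psms)                                # post-translational modifications
    return quantify(proteins, mods, labeling=labeling)                 # iTRAQ/TMT quantitation

print(run_pipeline(["sample_01.mzML", "sample_02.mzML"], "human.fasta"))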


1987 ◽  
Vol 121 ◽  
pp. 337-339
Author(s):  
J. P. Mücket ◽  
V. Müller

For six published high-resolution QSO spectra, a correlation analysis of unidentified absorption lines is performed. The two-point correlation functions typically show some quasi-periodic structure. The results allow for the interpretation that the absorbing clouds lie in sheet-like structures, as predicted by the pancake theory.
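A minimal sketch of the kind of pair-separation statistic behind a two-point analysis, assuming only a list of line positions (e.g. velocities in km/s); quasi-periodic clustering of the lines appears as regularly spaced peaks in the histogram. The binning and toy data are illustrative assumptions, not the estimator applied to the six published spectra.

import numpy as np

def pair_separation_histogram(line_positions, bin_width=50.0, max_sep=2000.0):
    """Histogram of all pairwise separations between absorption-line positions.
    Regularly spaced peaks indicate quasi-periodic clustering of the lines."""
    pos = np.sort(np.asarray(line_positions, dtype=float))
    seps = np.abs(pos[:, None] - pos[None, :])[np.triu_indices(len(pos), k=1)]
    bins = np.arange(0.0, max_sep + bin_width, bin_width)
    counts, _ = np.histogram(seps, bins=bins)
    return counts, bins

# toy example: lines on a roughly periodic grid (spacing ~250 km/s) plus jitter
rng = np.random.default_rng(0)
lines = np.arange(0.0, 2000.0, 250.0) + rng.normal(0.0, 10.0, 8)
print(pair_separation_histogram(lines)[0])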


2008 ◽  
Vol 08 (02) ◽  
pp. 243-263 ◽  
Author(s):  
BENJAMIN A. AHLBORN ◽  
OLIVER KREYLOS ◽  
SOHAIL SHAFII ◽  
BERND HAMANN ◽  
OLIVER G. STAADT

We introduce a system that adds a foveal inset to large-scale projection displays. The effective resolution of the foveal inset projection is higher than the original display resolution, allowing the user to see more details and finer features in large data sets. The foveal inset is generated by projecting a high-resolution image onto a mirror mounted on a pan-tilt unit that is controlled by the user with a laser pointer. Our implementation is based on Chromium and supports many OpenGL applications without modifications. We present experimental results using high-resolution image data from medical imaging and aerial photography.


2021 ◽  
Vol 13 (4) ◽  
pp. 692
Author(s):  
Yuwei Jin ◽  
Wenbo Xu ◽  
Ce Zhang ◽  
Xin Luo ◽  
Haitao Jia

Convolutional Neural Networks (CNNs), such as U-Net, have shown competitive performance in the automatic extraction of buildings from Very High-Resolution (VHR) aerial images. However, due to unstable multi-scale context aggregation, insufficient combination of multi-level features, and a lack of consideration of the semantic boundary, most existing CNNs produce incomplete segmentations for large buildings and highly uncertain predictions at building boundaries. This paper presents a novel network with a special boundary-aware loss embedded, called the Boundary-Aware Refined Network (BARNet), to address these gaps. The distinctive components of the proposed BARNet are the gated-attention refined fusion unit, the denser atrous spatial pyramid pooling module, and the boundary-aware loss. The performance of the BARNet is tested on two popular data sets that include various urban scenes and diverse patterns of buildings. Experimental results demonstrate that the proposed method outperforms several state-of-the-art approaches in both visual interpretation and quantitative evaluations.
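To illustrate the general idea of a boundary-aware loss (not BARNet's specific formulation), the sketch below up-weights pixels near ground-truth building boundaries in a binary cross-entropy term, deriving the boundary band as the dilated mask minus the eroded mask via max pooling; the kernel size and weight factor are illustrative assumptions.

import torch
import torch.nn.functional as F

def boundary_weighted_bce(logits, target, boundary_weight=5.0):
    """Binary cross-entropy that up-weights pixels near building boundaries.
    The boundary band is (dilated mask - eroded mask), computed with max pooling."""
    dilated = F.max_pool2d(target, kernel_size=3, stride=1, padding=1)
    eroded = -F.max_pool2d(-target, kernel_size=3, stride=1, padding=1)
    boundary = (dilated - eroded).clamp(0.0, 1.0)        # 1 on/near edges, 0 elsewhere
    weights = 1.0 + boundary_weight * boundary
    return F.binary_cross_entropy_with_logits(logits, target, weight=weights)

# toy example: a 1x1x8x8 prediction against a square building mask
target = torch.zeros(1, 1, 8, 8)
target[:, :, 2:6, 2:6] = 1.0
logits = torch.randn(1, 1, 8, 8)
print(boundary_weighted_bce(logits, target).item())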


2021 ◽  
Vol 15 ◽  
pp. 117793222110359
Author(s):  
Saraswati Koppad ◽  
Annappa B ◽  
Georgios V Gkoutos ◽  
Animesh Acharjee

High-throughput experiments enable researchers to explore complex multifactorial diseases through large-scale analysis of omics data. Challenges for such high-dimensional data sets include storage, analysis, and sharing. Recent innovations in computational technologies and approaches, especially in cloud computing, offer a promising, low-cost, and highly flexible solution in the bioinformatics domain. Cloud computing is proving increasingly useful for molecular modeling, omics data analytics (e.g., RNA sequencing, metabolomics, or proteomics data sets), and the integration, analysis, and interpretation of phenotypic data. We review the adoption of advanced cloud-based and big data technologies for processing and analyzing omics data and provide insights into state-of-the-art cloud bioinformatics applications.

