Protamine Characterization by Top-Down Proteomics: Boosting Proteoform Identification with DBSCAN

Proteomes ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 21
Author(s):  
Gianluca Arauz-Garofalo ◽  
Meritxell Jodar ◽  
Mar Vilanova ◽  
Alberto de la Iglesia Rodriguez ◽  
Judit Castillo ◽  
...  

Protamines replace histones as the main nuclear protein in the sperm cells of many species and play a crucial role in compacting the paternal genome. Human spermatozoa contain protamine 1 (P1) and the family of protamine 2 (P2) proteins. Alterations in protamine post-translational modifications (PTMs) or in the P1/P2 ratio may be associated with male infertility. Top-down proteomics enables large-scale analysis of intact proteoforms derived from alternative splicing, missense or nonsense genetic variants, or PTMs. In contrast to current gold standard techniques, top-down proteomics permits a more in-depth analysis of protamine PTMs and proteoforms, thereby opening up new perspectives to unravel their impact on male fertility. We report on the analysis of two normozoospermic semen samples by top-down proteomics. We discuss the difficulties encountered in the data analysis and propose solutions, as this step is one of the current bottlenecks in top-down proteomics with the bioinformatics tools currently available. Our data analysis strategy combines two software packages, ProSight PD (PS) and TopPIC suite (TP), with a clustering algorithm to decipher protamine proteoforms. We identified up to 32 protamine proteoforms at different levels of characterization. This in-depth analysis of the protamine proteoform landscape of normozoospermic individuals represents the first step towards the future study of sperm pathological conditions, opening up the potential for personalized diagnosis of male infertility.
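The title names DBSCAN as the clustering algorithm, but the abstract does not say which features are clustered. The following is a minimal sketch of density-based clustering, assuming each search hit is described by a deconvoluted mass (Da) and a retention time (min). The feature choice, the values in `hits`, and the `eps` and `min_samples` settings are all illustrative assumptions, not the authors' pipeline.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


# Hypothetical (mass in Da, retention time in min) pairs, invented for illustration.
hits = [
    (6674.1, 12.0), (6674.3, 12.1), (6674.2, 11.9),
    (7035.8, 15.0), (7036.0, 15.2), (7035.9, 15.1),
    (8200.0, 20.0),
]
labels = dbscan(hits, eps=0.5, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In this toy input, hits with near-identical mass and retention time fall into the same cluster, while the isolated hit is labelled as noise. With real data the two features would normally be rescaled first, so that a single `eps` is meaningful across both axes.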

2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S221-S221
Author(s):  
David Melzer ◽  
Luigi Ferrucci

Abstract Great progress has been made recently in identifying the genetic and likely biological mechanisms of aging traits in humans, thanks to the very large UK Biobank cohort of 500,000 persons. This symposium will discuss new results for known genetic loci influencing aging, including in-depth analysis of the APOE e2 “protective” allele and aging outcomes, and will highlight new pathways for follow-up. Identifying genetic variants associated with frailty (accumulation of deficits) is implicating specific pathways with possible causal effects on premature frailty, with particular emphasis on neurological traits. Large-scale analysis of physical frailty – here parametrized as muscle weakness – is shedding light on specific mechanisms that are divergent from those associated with the Rockwood-like analysis, with evidence for sex-specific pathways. At the end of this symposium, the audience should be able to better understand the processes that potentially drive aging and frailty in older adults, and the possibilities for future diagnostic and treatment modalities for delaying or reversing aging and frailty in older adults.


2018 ◽  
Vol 30 (5) ◽  
pp. 554-571 ◽  
Author(s):  
Maria Vincenza Ciasullo ◽  
Orlando Troisi ◽  
Francesca Loia ◽  
Gennaro Maione

Purpose The purpose of this paper is to provide a better understanding of the reasons why people use or do not use carpooling. A further aim is to collect and analyze empirical evidence concerning the advantages and disadvantages of carpooling. Design/methodology/approach A large-scale text analytics study was conducted: people’s opinions were collected from Twitter by means of a dedicated web crawler named “Twitter4J.” After mining, the collected data were subjected to sentiment analysis using “SentiWordNet.” Findings The big data analysis identified the 12 concepts most frequently used by Twitter users about carpooling: seven advantages (economic efficiency, environmental efficiency, comfort, traffic, socialization, reliability, curiosity) and five disadvantages (lack of effectiveness, lack of flexibility, lack of privacy, danger, lack of trust). Research limitations/implications Although the sample is particularly large (10 percent of the data flow published on Twitter from all over the world in about one year), the automated collection of people’s comments prevented a more in-depth analysis of users’ thoughts and opinions. Practical implications The research findings may help entrepreneurs, managers and policy makers to understand which variables to leverage and which actions to take in order to benefit from the potential advantages that carpooling offers. Originality/value The work drew on skills from three different areas, i.e., business management, computing science and statistics, which were synergistically integrated to customize, implement and use two IT tools capable of automatically identifying, selecting, collecting, categorizing and analyzing people’s tweets about carpooling.


Genes ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 1670
Author(s):  
Hyundoo Jeong ◽  
Sungtae Shin ◽  
Hong-Gi Yeom

Single-cell sequencing provides novel means to interpret the transcriptomic profiles of individual cells. In-depth analysis of single-cell sequencing data requires effective computational methods that accurately predict single-cell clusters, because single-cell sequencing techniques only provide the transcriptomic profile of each cell. Although an accurate estimate of the cell-to-cell similarity is an essential first step towards reliable single-cell clustering results, obtaining an accurate similarity measurement is challenging because it depends heavily on the genes selected for similarity evaluation, and the optimal set of genes for accurate similarity estimation is typically unknown. Moreover, due to technical limitations, single-cell sequencing data include a large number of artificial zeros, and this technical noise makes it difficult to develop effective single-cell clustering algorithms. Here, we describe a novel single-cell clustering algorithm that can accurately predict single-cell clusters in large-scale single-cell sequencing data by effectively reducing the zero-inflated noise and accurately estimating the cell-to-cell similarities. First, we construct an ensemble similarity network based on different similarity estimates, and reduce the artificial noise using a random walk with restart framework. Finally, starting from a large number of small but highly consistent clusters, we iteratively merge the pair of clusters with the maximum similarity until the predicted number of clusters is reached. Extensive performance evaluation shows that the proposed single-cell clustering algorithm yields accurate single-cell clustering results and can help decipher the key messages underlying complex biological mechanisms.
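The abstract mentions a random walk with restart on a similarity network without giving details. As a rough illustration of that general technique only, the sketch below iterates the standard update p ← (1 − r)·W·p + r·e on a toy four-cell similarity matrix. The matrix values, the restart probability `restart=0.3`, and the function name are assumptions for illustration and are not taken from the paper.

```python
def random_walk_with_restart(weights, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at `seed`.

    weights: square list-of-lists of non-negative similarities.
    """
    n = len(weights)
    # Column-normalise so each column holds transition probabilities.
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    trans = [
        [weights[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(trans[i][j] * p[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy similarity network: cells 0 and 1 are similar, cells 2 and 3 are similar,
# and the two pairs are only weakly linked (values invented for illustration).
similarity = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
]
scores = random_walk_with_restart(similarity, seed=0)
print([round(s, 3) for s in scores])
```

Starting from cell 0, the walker spends most of its time on cell 0 and its close neighbour cell 1, so the resulting scores act as a smoothed similarity that is less sensitive to individual weak or missing edges.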


2019 ◽  
Vol 71 (3) ◽  
pp. 310-324
Author(s):  
Dirk Lewandowski ◽  
Sebastian Sünkler

Purpose The purpose of this paper is to describe a new method to improve the analysis of search engine results by considering the provider level as well as the domain level. This approach is tested by conducting a study using queries on the topic of insurance comparisons. Design/methodology/approach The authors conducted an empirical study that analyses the results of search queries aimed at comparing insurance companies. The authors used a self-developed software system that automatically queries commercial search engines and automatically extracts the content of the returned result pages for further data analysis. The data analysis was carried out using the KNIME Analytics Platform. Findings Google’s top search results are served by only a few providers that frequently appear in these results. The authors show that some providers operate several domains on the same topic and that these domains appear for the same queries in the result lists. Research limitations/implications The authors demonstrate the feasibility of this approach and draw conclusions for further investigations from the empirical study. However, the study is a limited use case based on a limited number of search queries. Originality/value The proposed method allows large-scale analysis of the composition of the top results from commercial search engines. It allows using valid empirical data to determine what users actually see on the search engine result pages.
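The difference between domain-level and provider-level analysis can be sketched as two counts over the same result list. This is a minimal illustration, assuming a hand-made mapping from domains to providers; the URLs, the `provider_of` mapping and the function names are hypothetical and are not the authors' software.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Host name of a URL, lower-cased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def count_results(result_urls, provider_of):
    """Count search results per domain and per provider."""
    domains = [domain_of(u) for u in result_urls]
    domain_counts = Counter(domains)
    # Domains missing from the mapping are counted under their own name.
    provider_counts = Counter(provider_of.get(d, d) for d in domains)
    return domain_counts, provider_counts


# Hypothetical result list and domain-to-provider mapping.
urls = [
    "https://www.compare-a.example/car",
    "https://insure-a.example/car",
    "https://compare-b.example/car",
]
provider_of = {
    "compare-a.example": "Provider A",
    "insure-a.example": "Provider A",
    "compare-b.example": "Provider B",
}
domain_counts, provider_counts = count_results(urls, provider_of)
print(domain_counts)
print(provider_counts)
```

Here every domain appears once, yet one provider accounts for two of the three results, which is the kind of concentration that a purely domain-level count would hide.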


Author(s):  
Gabriele Scalia

Abstract Over the last few years, machine learning has revolutionized countless areas and fields. Nowadays, AI holds promise for analyzing data, extracting knowledge, and driving discovery across many scientific domains such as chemistry, biology, and genomics. However, the specific challenges posed by scientific data demand that machine learning techniques be adapted to new requirements. We investigate machine learning-driven scientific data analysis, focusing on a set of key requirements. These include the management of uncertainty for complex data and models, the estimation of system properties starting from low-volume and imprecise collected data, support for scientific model development through large-scale analysis of experimental data, and the machine learning-driven integration of complementary experimental technologies.


2018 ◽  
Vol 3 (1) ◽  
pp. 001
Author(s):  
Zulhendra Zulhendra ◽  
Gunadi Widi Nurcahyo ◽  
Julius Santony

This study applies a data mining technique, K-Means clustering. Data mining supports the analysis of large data sets; here it enables Indocomputer to understand and group its service data based on customer complaints, using the Weka software. The K-Means clustering algorithm is used to predict or classify complaints about hardware damage at Indocomputer Payakumbuh. The analysis also identifies which laptop brands are most often brought in for service at Indocomputer Payakumbuh, which can serve as one recommendation to consumers choosing a laptop.
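The study itself uses Weka, and the abstract does not describe how the service records are encoded. As a generic illustration of K-Means only, the sketch below alternates the assignment and update steps on made-up two-feature records (device age in years, number of earlier repairs). The features, the data and the choice of initial centroids are assumptions.

```python
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Plain K-Means: returns (labels, centroids) after convergence."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical service records: (device age in years, earlier repairs).
records = [(1, 0), (1, 1), (2, 1), (6, 4), (7, 5), (7, 4)]
labels, centers = kmeans(records, centroids=[records[0], records[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The initial centroids are passed in explicitly so that the result is reproducible; tools such as Weka normally pick them from a random seed instead.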

