protein dataset
Recently Published Documents


TOTAL DOCUMENTS

18
(FIVE YEARS 9)

H-INDEX

3
(FIVE YEARS 1)

2021 ◽  
Vol 19 (3) ◽  
pp. e27
Author(s):  
Pierre Larmande ◽  
Yusha Liu ◽  
Xinzhi Yao ◽  
Jingbo Xia

Due to the rapid evolution of high-throughput technologies, a tremendous amount of data is being produced in the biological domain, which poses a challenging task for information extraction and natural language understanding. Biological named entity recognition (NER) and named entity normalisation (NEN) are two common tasks aiming at identifying and linking biologically important entities such as genes or gene products mentioned in the literature to biological databases. In this paper, we present an updated version of OryzaGP, a gene and protein dataset for rice species created to help natural language processing (NLP) tools in processing NER and NEN tasks. To create the dataset, we selected more than 15,000 abstracts associated with articles previously curated for rice genes. We developed four dictionaries of gene and protein names associated with database identifiers. We used these dictionaries to annotate the dataset. We also annotated the dataset using pre-trained NLP models. Finally, we analysed the annotation results and discussed how to improve OryzaGP.
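
The dictionary-based annotation step described in this abstract can be illustrated with a minimal sketch. It assumes a simple case-insensitive, whole-word lookup; the abstract does not specify the matching rules OryzaGP uses, and the names and identifiers below are hypothetical placeholders rather than entries from the four dictionaries.

```python
import re

# Hypothetical mini-dictionary: surface form -> database identifier.
# These are placeholders, not real OryzaGP dictionary entries.
GENE_DICT = {
    "GENE1": "DB:0001",
    "protein kinase X": "DB:0002",
}


def annotate(text, dictionary):
    """Return (start, end, matched_text, identifier) for each dictionary hit."""
    # Put longer names first so multi-word names win over their substrings.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    # Case-insensitive lookup from matched text back to its identifier.
    lookup = {name.lower(): ident for name, ident in dictionary.items()}
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


hits = annotate("Expression of gene1 depends on protein kinase X.", GENE_DICT)
for start, end, mention, identifier in hits:
    print(start, end, mention, identifier)
```

Each hit pairs a character span (the NER part) with a database identifier (the NEN part). Plain string matching of this kind misses spelling variants and ambiguous names, so it is only a rough illustration of dictionary annotation in general.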


2021 ◽  
Vol 8 ◽  
Author(s):  
Pin Huang ◽  
Haoming Xing ◽  
Xun Zou ◽  
Qi Han ◽  
Ke Liu ◽  
...  

We propose a method based on neural networks to accurately predict hydration sites in proteins. In our approach, high-quality data of protein structures are used to parametrize our neural network model, which is a differentiable score function that can evaluate an arbitrary position in 3D structures on proteins and predict the nearest water molecule that is not present. The score function is further integrated into our water placement algorithm to generate explicit hydration sites. In experiments on the OppA protein dataset used in previous studies and our selection of protein structures, our method achieves the highest model quality in terms of F1 score, compared to several previous studies.


2021 ◽  
Vol 7 (7) ◽  
pp. 560
Author(s):  
Sofia Dimou ◽  
Xenia Georgiou ◽  
Eleana Sarantidi ◽  
George Diallinas ◽  
Athanasios K. Anagnostopoulos

Solute and ion transporters are proteins essential for cell nutrition, detoxification, signaling, homeostasis and drug resistance. Being polytopic transmembrane proteins, they are co-translationally inserted and folded into the endoplasmic reticulum (ER) of eukaryotic cells and subsequently sorted to their final membrane destination via vesicular secretion. During their trafficking and in response to physiological/stress signals or prolonged activity, transporters undergo multiple quality control processes and regulated turnover. Consequently, transporters interact dynamically and transiently with multiple proteins. To further dissect the trafficking and turnover mechanisms underlying transporter subcellular biology, we herein describe a novel mass spectrometry-based proteomic protocol adapted to conditions allowing for maximal identification of proteins related to N source uptake in A. nidulans. Our analysis led to identification of 5690 proteins, which to our knowledge constitutes the largest protein dataset identified by omics-based approaches in Aspergilli. Importantly, we detected possibly all major proteins involved in basic cellular functions, giving particular emphasis to factors essential for membrane cargo trafficking and turnover. Our protocol is easily reproducible and highly efficient for unearthing the full A. nidulans proteome. The protein list delivered herein will form the basis for downstream systematic approaches and identification of protein–protein interactions in living fungal cells.


Data in Brief ◽  
2021 ◽  
Vol 35 ◽  
pp. 106871
Author(s):  
A.L. Rusanov ◽  
D.D. Romashin ◽  
V.G. Zgoda ◽  
T.V. Butkova ◽  
N.G. Luzgina

2021 ◽  
Author(s):  
Quenisha Baldwin ◽  
Eleni Panagiotou

Protein folding, the process by which proteins attain the 3-dimensional conformation necessary for their function, remains an important unsolved problem in biology. A major gap in our understanding is how local properties of proteins relate to their global properties. In this manuscript, we use the Writhe and Torsion to introduce a new local topological/geometrical free energy that can be associated with 4 consecutive residues along the protein backbone. By analyzing a culled protein dataset from the PDB, we show that high local topological free energy conformations are independent of sequence and may be involved in the rate-limiting step in protein folding. By analyzing a set of 2-state single-domain proteins, we find that the total local topological free energy of these proteins correlates with the experimentally observed folding rates reported in [29].
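
As a rough illustration of a local geometric quantity defined on 4 consecutive residues, the sketch below computes the standard signed dihedral angle for every window of four backbone points. This is an assumption made for illustration: the abstract does not give the authors' exact definitions or normalisation of the Torsion, and the Writhe and the free energy itself are not computed here. The function names and coordinates are hypothetical.

```python
import math


def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four consecutive points."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    # atan2 form of the dihedral: numerically stable near 0 and pi.
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.atan2(y, x)


def local_torsions(coords):
    """Dihedral angle for every window of 4 consecutive points in a chain."""
    return [dihedral(*coords[i:i + 4]) for i in range(len(coords) - 3)]


# Hypothetical coordinates standing in for consecutive backbone atoms.
chain = [(1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1), (1, 1, 1)]
print(local_torsions(chain))
```

A chain of n points yields n - 3 values, one per window of four consecutive residues.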


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0239154
Author(s):  
Pablo Mier ◽  
Miguel A. Andrade-Navarro

Background: Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition deviates from the composition expected proteome-wide, and they do not follow the rules of structural folding that prevail in globular regions. One way to characterize these regions is by assessing the repeatability of a sequence, that is, calculating the local propensity of a region to be part of a repeat. Results: We combine two local measures of low complexity, repeatability (using the RES algorithm) and fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual cases of proteins with extreme compositions. We apply a representation called ‘low complexity triangle’ as a proof-of-concept to represent the measured low complexity values. Results show that proteomes have distinct signatures in the low complexity triangle, and that these signatures are associated with complexity features of the sequences. We developed a web tool called LCT (http://cbdm-01.zdv.uni-mainz.de/~munoz/lct/) to allow users to calculate the low complexity triangle of a given protein or region of interest. Conclusions: The low complexity triangle proves to be a suitable procedure to represent the general low complexity of a sequence or protein dataset. Homorepeats, direpeats, compositionally biased regions and globular regions occupy characteristic positions in the triangle. The described pipeline can be used to characterize LCRs and may help in quantifying the content of degenerated tandem repeats in proteins and proteomes.
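
One of the two measures named in this abstract, the fraction of the most frequent amino acid, is simple enough to sketch. The window size, function names and example sequences below are assumptions chosen for illustration; the repeatability measure (the RES algorithm) is not reproduced here, and this is not the LCT tool's implementation.

```python
from collections import Counter


def top_residue_fraction(region):
    """Return (residue, fraction) for the most frequent residue in a region."""
    if not region:
        raise ValueError("region must not be empty")
    residue, count = Counter(region).most_common(1)[0]
    return residue, count / len(region)


def windowed_fractions(sequence, window=20):
    """Fraction of the most frequent residue in each sliding window.

    The default window length is an arbitrary choice for this sketch.
    """
    return [
        top_residue_fraction(sequence[i:i + window])[1]
        for i in range(len(sequence) - window + 1)
    ]


# A Q-rich stretch scores close to 1, a mixed stretch scores lower.
print(top_residue_fraction("QQQQQQQQAQ"))
print(windowed_fractions("AAAAGGGG", window=4))
```

Values near 1 indicate a region dominated by a single residue, such as a homorepeat, while lower values indicate a more balanced composition.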


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Salvatore Pisanu ◽  
Carla Cacciotto ◽  
Daniela Pagnozzi ◽  
Giulia Maria Grazia Puggioni ◽  
Sergio Uzzau ◽  
...  

Subclinical mastitis by Staphylococcus aureus (SAU) and by non-aureus staphylococci (NAS) is a major issue in the water buffalo. To understand its impact on milk, 6 quarter samples with >3,000,000 cells/mL (3 SAU-positive and 3 NAS-positive) and 6 culture-negative quarter samples with <50,000 cells/mL were investigated by shotgun proteomics and label-free quantitation. A total of 1530 proteins were identified, of which 152 were significantly changed. SAU had the greater impact, with 162 vs 127 differential proteins and higher abundance changes (P < 0.0005). The 119 increased proteins had mostly structural (n = 43, 28.29%) or innate immune defence functions (n = 39, 25.66%) and included vimentin, cathelicidins, histones, S100 and neutrophil granule proteins, haptoglobin, and lysozyme. The 33 decreased proteins were mainly involved in lipid metabolism (n = 13, 59.10%) and included butyrophilin, xanthine dehydrogenase/oxidase, and lipid biosynthetic enzymes. STRING analysis also showed the same biological processes to be significantly affected. Cathelicidins were the most increased family, as confirmed by western immunoblotting, with a stronger reactivity in SAU mastitis. S100A8 and haptoglobin were also validated by western immunoblotting. In conclusion, we generated a detailed buffalo milk protein dataset and defined the changes occurring in SAU and NAS mastitis, with potential for improving detection (ProteomeXchange identifier PXD012355).


PLoS ONE ◽  
2018 ◽  
Vol 13 (6) ◽  
pp. e0198170
Author(s):  
Mohammad Uzzal Hossain ◽  
Taimur Md. Omar ◽  
Iftekhar Alam ◽  
Keshob Chandra Das ◽  
A. K. M. Mohiuddin ◽  
...  
