Monophyletic clustering and characterization of protein families

2007 ◽  
Vol 4 (3) ◽  
pp. 89-100 ◽  
Author(s):  
Jian Zhang ◽  
Zhiyuan Zhao ◽  
Jennifer Evershed ◽  
Guoying Li

Summary A protein family contains sequences that are evolutionarily related. Generally, this is reflected by sequence similarity. There have been many attempts to organize the set of protein families into evolutionarily homogeneous clusters using various clustering methods. How do we characterize these clusters? How can we cluster protein families using these characterizations? In this work, these questions were addressed using a concept called group-wide co-evolution, which was exemplified with real and simulated protein family data. The results show that the trend of a group of monophyletic proteins might be characterized by a normal distribution, while the strength and variability of this trend can be described by the sample mean and variance of the observed correlation coefficients after a suitable transformation. To exploit this property, we have developed a monophyletic clustering method called monophyletic k-medoids clustering. A software package written in R has been made available at http://www.kent.ac.uk/ims/personal/jz .
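
The summary above does not name the "suitable transformation". A common choice for making correlation coefficients approximately normal is the Fisher z-transform, and the sketch below assumes it purely for illustration; it is not taken from the authors' R package. The function names and the example correlation values are hypothetical.

```python
import math
from statistics import mean, variance

def fisher_z(r):
    """Fisher z-transform of a correlation coefficient r in (-1, 1)."""
    return math.atanh(r)

def summarize_correlations(rs):
    """Return (sample mean, sample variance) of the transformed correlations."""
    zs = [fisher_z(r) for r in rs]
    return mean(zs), variance(zs)

# Hypothetical pairwise correlation coefficients within one candidate group.
example_rs = [0.62, 0.71, 0.55, 0.68, 0.60]

z_mean, z_var = summarize_correlations(example_rs)
# Back-transform the mean to the correlation scale for readability.
r_typical = math.tanh(z_mean)
print(f"mean z = {z_mean:.3f}, var z = {z_var:.4f}, typical r = {r_typical:.3f}")
```

Under this assumption, the mean of the transformed values plays the role of the "strength" of the trend and their variance the role of its "variability".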

2020 ◽  
Vol 17 (3) ◽  
pp. 241-254
Author(s):  
Yaqiong Zhang ◽  
Zhiping Jia ◽  
Yunyang Liu ◽  
Xinwen Zhou ◽  
Yi Kong

Background: Deinagkistrodon acutus (D. acutus) and Bungarus multicinctus (B. multicinctus) have been used as traditional medicines in China for hundreds of years. The venoms of these two species are highly toxic to victims. Objective: The objective of this study is to reveal the profile of venom proteins and peptides of D. acutus and B. multicinctus. Method: Ultrafiltration, SDS-PAGE coupled with in-gel tryptic digestion and Liquid Chromatography-Electrospray Ionization-Tandem Mass Spectrometry (LC-ESI-MS/MS) were used to characterize proteins and peptides of the venoms of D. acutus and B. multicinctus. Results: In the D. acutus venom, 67 proteins (16 protein families) were identified, and snake venom metalloproteinases (SVMPs, 38.0%) and snake venom C-type lectins (snaclecs, 36.7%) were the dominant proteins. In the B. multicinctus venom, 47 proteins (15 protein families) were identified, and three-finger toxins (3FTxs, 36.3%) and Kunitz-type Serine Protease Inhibitors (KSPIs, 32.8%) were the major components. In addition, both venoms contained small amounts of other proteins, such as Snake Venom Serine Proteinases (SVSPs), Phospholipases A2 (PLA2s), Cysteine-Rich Secreted Proteins (CRISPs), 5'-nucleotidases (5'NUCs), Phospholipases B (PLBs), Phosphodiesterases (PDEs), Phospholipase A2 Inhibitors (PLIs), Dipeptidyl Peptidases IV (DPP IVs), L-amino Acid Oxidases (LAAOs) and Angiotensin-Converting Enzymes (ACEs). Each venom also had its unique proteins: Nerve Growth Factors (NGFs) and Hyaluronidases (HYs) in D. acutus, and Cobra Venom Factors (CVFs) in B. multicinctus. In the peptidomics analysis, 1543 and 250 peptides were identified in the venoms of D. acutus and B. multicinctus, respectively. Some peptides showed high similarity with neuropeptides, ACE inhibitory peptides, Bradykinin-Potentiating Peptides (BPPs), LAAOs and movement-related peptides. Conclusion: Characterization of venom proteins and peptides of D. acutus and B. multicinctus will be helpful for the treatment of envenomation and drug discovery.


Antioxidants ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 750
Author(s):  
Leyre Notario-Barandiaran ◽  
Eva-María Navarrete-Muñoz ◽  
Desirée Valera-Gran ◽  
Elena Hernández-Álvarez ◽  
Encarnación Donoso-Navarro ◽  
...  

Reliable tools to evaluate diet are needed, particularly in life periods such as adolescence in which a rapid rate of growth and development occurs. We assessed the biochemical validity of a self-administered food frequency questionnaire (FFQ) in a sample of Spanish male adolescents using carotenoid and vitamin E and D data. We analyzed data from 122 male adolescents aged 15–17 years of the INMA-Granada birth cohort study. Adolescents answered a 104-item FFQ and provided a non-fasting blood sample. Mean daily nutrient intakes and serum concentrations were estimated for the main carotenoids (lutein–zeaxanthin, β-cryptoxanthin, lycopene, α-carotene and β-carotene) and vitamins E and D, and intakes were also estimated for fruit and vegetables. Pearson correlation coefficients (r) and the percentage of agreement (same or adjacent quintiles) between serum vitamin concentrations and energy-adjusted intakes were estimated. Statistically significant correlation coefficients were observed for total carotenoids (r = 0.40) and specific carotenoids, with the highest correlation observed for lutein–zeaxanthin (r = 0.42) and the lowest for β-carotene (r = 0.23). The correlation coefficient between fruit and vegetable intake and serum carotenoids was 0.29 (higher for vegetable intake, r = 0.33, than for fruit intake, r = 0.19). Low correlations were observed for vitamins E and D. The average percentage of agreement for carotenoids was 55.8%, and it was lower for vitamins E and D (50% and 41%, respectively). The FFQ may be an acceptable tool for dietary assessment among male adolescents in Spain.
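
The two validity measures named above, the Pearson correlation coefficient and the percentage of agreement within the same or adjacent quintiles, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, quintiles are assigned by simple rank with ties broken by input order, and the energy adjustment of intakes mentioned in the abstract is not reproduced.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def quintile_ranks(values):
    """Assign each value a quintile 0..4 by rank (ties broken by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    quintiles = [0] * n
    for rank, i in enumerate(order):
        quintiles[i] = rank * 5 // n
    return quintiles

def agreement_pct(x, y):
    """Percentage of subjects classified in the same or an adjacent quintile."""
    qx, qy = quintile_ranks(x), quintile_ranks(y)
    hits = sum(abs(a - b) <= 1 for a, b in zip(qx, qy))
    return 100.0 * hits / len(x)

# Hypothetical intake and serum values for ten subjects (not study data).
intake = [2.1, 3.4, 1.8, 4.0, 2.9, 3.1, 5.2, 2.4, 3.8, 4.6]
serum = [0.20, 0.31, 0.25, 0.33, 0.22, 0.35, 0.41, 0.19, 0.30, 0.37]
print(round(pearson_r(intake, serum), 2), agreement_pct(intake, serum))
```

Agreement within adjacent quintiles is a coarser measure than r: it only asks whether the questionnaire and the biomarker rank a subject similarly.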


2021 ◽  
Vol 10 (3) ◽  
pp. 161
Author(s):  
Hao-xuan Chen ◽  
Fei Tao ◽  
Pei-long Ma ◽  
Li-na Gao ◽  
Tong Zhou

Spatial analysis is an important means of mining floating car trajectory information, and clustering and density analysis are among its most common methods. The choice of clustering method affects the accuracy and time efficiency of the analysis results. Therefore, clarifying the principles and characteristics of each method is the primary prerequisite for problem solving. Four representative spatial analysis methods, KMeans, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Clustering by Fast Search and Find of Density Peaks (CFSFDP), and Kernel Density Estimation (KDE), are taken as examples and applied to the hotspot spatiotemporal mining problem of taxi trajectories. Quantitative analysis and experimental verification show that the DBSCAN and KDE algorithms have strong hotspot discovery capabilities, but the shape of the heat regions found by DBSCAN is relatively more robust. DBSCAN and CFSFDP can achieve high spatial accuracy in calculating the entrance and exit positions of a Point of Interest (POI). KDE and DBSCAN are more suitable for the classification of the heat index. When the dataset scale is similar, KMeans has the highest operating efficiency, while CFSFDP and KDE are inferior. This paper addresses, to a certain extent, the lack of a scientific basis for selecting spatial analysis methods in current research. Its conclusions can provide technical support and act as a reference for selecting methods to solve the taxi trajectory mining problem.
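
To make the density-based idea concrete, here is a minimal pure-Python sketch of the textbook DBSCAN procedure, one of the four methods compared above. It is not the implementation used in the paper. The points are hypothetical planar coordinates; real taxi trajectories in latitude/longitude would first need a projection or a great-circle distance, and the parameter values `eps` and `min_pts` are illustrative assumptions.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, p in enumerate(points) if math.dist(points[i], p) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical pick-up locations: two dense hotspots and one isolated point.
pickups = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
print(dbscan(pickups, eps=1.5, min_pts=3))
```

Unlike KMeans, this procedure does not need the number of clusters in advance and leaves isolated points unassigned, which is one reason it suits hotspot discovery.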


2015 ◽  
Vol 17 (5) ◽  
pp. 719-732
Author(s):  
Dulakshi Santhusitha Kumari Karunasingha ◽  
Shie-Yui Liong

A simple clustering method is proposed for extracting representative subsets from lengthy data sets. The main purpose of the extracted subset is to build prediction models (in the form of approximating functional relationships) without using the entire large data set. Such smaller subsets of data are often required in the exploratory analysis stages of studies that involve resource-consuming investigations. A few recent studies have used a subtractive clustering method (SCM) for such data extraction, in the absence of clustering methods for function approximation. SCM, however, requires several parameters to be specified. This study proposes a clustering method that requires only a single parameter to be specified, yet is shown to be as effective as the SCM. A method to find suitable values for the parameter is also proposed. Because it has only a single parameter, the proposed clustering method is shown to be orders of magnitude more efficient to use than SCM. The effectiveness of the proposed method is demonstrated on phase space prediction of three univariate time series and prediction of two multivariate data sets. Some drawbacks of SCM when applied to data extraction are identified, and the proposed method is shown to be a solution for them.


2001 ◽  
Vol 183 (14) ◽  
pp. 4167-4175 ◽  
Author(s):  
David W. Hunnicutt ◽  
Mark J. McBride

ABSTRACT Cells of Flavobacterium johnsoniae move over surfaces by a process known as gliding motility. The mechanism of this form of motility is not known. Cells of F. johnsoniae propel latex spheres along their surfaces, which is thought to be a manifestation of the motility machinery. Three of the genes that are required for F. johnsoniae gliding motility, gldA, gldB, and ftsX, have recently been described. Tn4351 mutagenesis was used to identify another gene, gldD, that is needed for gliding. Tn4351-induced gldD mutants formed nonspreading colonies, and cells failed to glide. They also lacked the ability to propel latex spheres and were resistant to bacteriophages that infect wild-type cells. Introduction of wild-type gldD into the mutants restored motility, ability to propel latex spheres, and sensitivity to bacteriophage infection. gldD codes for a cytoplasmic membrane protein that does not exhibit strong sequence similarity to proteins of known function. gldE, which lies immediately upstream of gldD, encodes another cytoplasmic membrane protein that may be involved in gliding motility. Overexpression of gldE partially suppressed the motility defects of a gldB point mutant, suggesting that GldB and GldE may interact. GldE exhibits sequence similarity to Borrelia burgdorferi TlyC and Salmonella enterica serovar Typhimurium CorC.


2018 ◽  
Vol 6 (2) ◽  
Author(s):  
Elly Muningsih - AMIK BSI Yogyakarta

Abstract ~ The K-Means method is one of the most widely used clustering methods in data clustering research, while the K-Medoids method is an efficient method for processing small data sets. This study aims to compare the two clustering methods by grouping customers into 3 clusters according to their characteristics, namely very potential (loyal) customers, potential customers and non-potential customers. The methods used in this study are the K-Means clustering method and the K-Medoids method. The data used are online sales transactions. The clustering methods are tested using a Fuzzy RFM (Recency, Frequency and Monetary) model in which the average (mean) of the three values is taken. The tests show that the K-Means method is better than the K-Medoids method, with an accuracy value of 90.47%. The data processing shows that cluster 1 has 16 members (customers), cluster 2 has 11 members and cluster 3 has 15 members. Keywords: clustering, K-Means method, K-Medoids method, customer, Fuzzy RFM model.
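
For readers unfamiliar with K-Medoids, one of the two methods compared above, the sketch below shows a simple alternating variant: assign each point to its nearest medoid, then move each medoid to the cluster member with the smallest total distance to the others. It is an illustration only, not the study's procedure. The RFM score triples are hypothetical, the fuzzy membership step is omitted, and the initial medoids are assumed to be distinct points so that no cluster becomes empty.

```python
import math

def assign(points, medoids):
    """Index (into medoids) of the nearest medoid for every point."""
    return [min(range(len(medoids)),
                key=lambda k: math.dist(p, points[medoids[k]]))
            for p in points]

def k_medoids(points, init_medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(init_medoids)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for k in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            # New medoid: the member with the smallest total distance to its cluster.
            best = min(members,
                       key=lambda c: sum(math.dist(points[c], points[m])
                                         for m in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)

# Hypothetical (recency, frequency, monetary) scores for nine customers.
rfm = [(1, 1, 1), (1, 2, 1), (2, 1, 1),
       (5, 5, 5), (5, 6, 5), (6, 5, 5),
       (9, 9, 9), (9, 10, 9), (10, 9, 9)]
medoids, labels = k_medoids(rfm, init_medoids=[1, 4, 7])
print(medoids, labels)
```

Because each medoid is an actual customer record rather than an averaged centroid, the cluster representatives stay interpretable, which is the usual argument for K-Medoids on small data sets.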

