Broccoli: combining phylogenetic and network analyses for orthology assignment

Author(s):  
Romain Derelle ◽  
Hervé Philippe ◽  
John K. Colbourne

Abstract: Orthology assignment is a key step of comparative genomic studies, for which many bioinformatic tools have been developed. However, all gene clustering pipelines are based on the analysis of protein distances, which are subject to many artefacts. In this paper we introduce Broccoli, a user-friendly pipeline designed to infer, with high precision, orthologous groups and pairs of proteins using a phylogeny-based approach. Briefly, Broccoli performs ultra-fast phylogenetic analyses on most proteins and builds a network of orthologous relationships. Orthologous groups are then identified from the network using a parameter-free machine learning algorithm. Broccoli is also able to detect chimeric proteins resulting from gene-fusion events and to assign these proteins to the corresponding orthologous groups. Tested on two benchmark datasets, Broccoli outperforms current orthology pipelines. In addition, Broccoli is scalable, with runtimes similar to those of recent distance-based pipelines. Given its high level of performance and efficiency, this new pipeline represents a suitable choice for comparative genomic studies. Broccoli is freely available at https://github.com/rderelle/Broccoli.

2020 ◽  
Vol 37 (11) ◽  
pp. 3389-3396 ◽  
Author(s):  
Romain Derelle ◽  
Hervé Philippe ◽  
John K Colbourne

Abstract: Orthology assignment is a key step of comparative genomic studies, for which many bioinformatic tools have been developed. However, all gene clustering pipelines are based on the analysis of protein distances, which are subject to many artifacts. In this article, we introduce Broccoli, a user-friendly pipeline designed to infer, with high precision, orthologous groups and pairs of proteins using a phylogeny-based approach. Briefly, Broccoli performs ultrafast phylogenetic analyses on most proteins and builds a network of orthologous relationships. Orthologous groups are then identified from the network using a parameter-free machine learning algorithm. Broccoli is also able to detect chimeric proteins resulting from gene-fusion events and to assign these proteins to the corresponding orthologous groups. Tested on two benchmark data sets, Broccoli outperforms current orthology pipelines. In addition, Broccoli is scalable, with runtimes similar to those of recent distance-based pipelines. Given its high level of performance and efficiency, this new pipeline represents a suitable choice for comparative genomic studies. Broccoli is freely available at https://github.com/rderelle/Broccoli.
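
The step of identifying groups from a network of orthologous relationships can be illustrated with a small sketch. The abstract does not name the algorithm, so label propagation is used here only as one example of a community-detection method that needs no tuning parameter. The function name, the protein identifiers, the edge weights, and the deterministic tie-breaking rule are all assumptions made for illustration, not Broccoli's actual code.

```python
from collections import defaultdict


def label_propagation(edges, max_rounds=100):
    """Group nodes of a weighted undirected graph by label propagation.

    edges: iterable of (node_a, node_b, weight) tuples (hypothetical input format).
    Returns a sorted list of sorted groups.
    """
    neighbors = defaultdict(dict)
    for a, b, w in edges:
        neighbors[a][b] = w
        neighbors[b][a] = w

    # Every node starts in its own group.
    labels = {node: node for node in neighbors}

    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):  # fixed order keeps the sketch deterministic
            score = defaultdict(float)
            for other, w in neighbors[node].items():
                score[labels[other]] += w
            # Adopt the label with the largest total weight; ties go to the
            # alphabetically smallest label (an assumption, real tools often randomize).
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break

    groups = defaultdict(set)
    for node, lab in labels.items():
        groups[lab].add(node)
    return sorted(sorted(g) for g in groups.values())


# Two tightly connected sets of proteins joined by one weak link.
edges = [
    ("A1", "A2", 1.0), ("A1", "A3", 1.0), ("A2", "A3", 1.0),
    ("B1", "B2", 1.0), ("B1", "B3", 1.0), ("B2", "B3", 1.0),
    ("A3", "B1", 0.2),
]
print(label_propagation(edges))
```

In this toy network the weak link between A3 and B1 is outweighed by the links inside each triangle, so the two sets end up in separate groups.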


2020 ◽  
Vol 64 (7) ◽  
Author(s):  
Takahiro Shirakawa ◽  
Tsuyoshi Sekizuka ◽  
Makoto Kuroda ◽  
Satowa Suzuki ◽  
Manao Ozawa ◽  
...  

ABSTRACT The off-label use of third-generation cephalosporins (3GCs) during in ovo vaccination or vaccination of newly hatched chicks has been a common practice worldwide. CMY-2-producing Escherichia coli strains have been disseminated in broiler chicken production. The objective of this study was to determine the epidemiological linkage of blaCMY-2-positive plasmids among broilers both within and outside Japan, because the grandparent stock and parent stock were imported into Japan. We examined the whole-genome sequences of 132 3GC-resistant E. coli isolates collected from healthy broilers during 2002 to 2014. The predominant 3GC resistance gene was blaCMY-2, which was detected in the plasmids of 87 (65.9%) isolates. The main plasmid replicon types were IncI1-Iγ (n = 21; 24.1%), IncI (n = 12; 13.8%), IncB/O/K/Z (n = 28; 32.2%), and IncC (n = 22; 25.3%). Those plasmids were subjected to gene clustering, network analyses, and plasmid multilocus sequence typing (pMLST). The chromosomal DNA of isolates was subjected to MLST and single-nucleotide variant (SNV)-based phylogenetic analysis. MLST and SNV-based phylogenetic analysis revealed high diversity of E. coli isolates. The sequence type 429 (ST429) cluster harboring blaCMY-2-positive IncB/O/K/Z was closely related to isolates from broilers in Germany harboring blaCMY-2-positive IncB/O/K/Z. pST55-IncI, pST12-IncI1-Iγ, and pST3-IncC were prevalent in western Japan. pST12-IncI1-Iγ and pST3-IncC were closely related to plasmids detected in E. coli isolates from chickens in North America, whereas 26 IncB/O/K/Z types were related to those in Europe. These data will be useful to reveal the whole picture of transmission of CMY-2-producing bacteria inside and outside Japan.


Genetika ◽  
2021 ◽  
Vol 53 (1) ◽  
pp. 195-208
Author(s):  
Himani Sharma ◽  
Parul Sharma ◽  
Rajnish Sharma

Extensive use of simple sequence repeat (SSR) markers is facilitated when loci are transferable across species, and even across closely related genera, because this overcomes the high cost and effort of marker development, which are major constraints. In the present study, apple and pear genomic microsatellite primer pairs were used to amplify SSR loci in apple, pear, quince and loquat genotypes. Previously reported SSRs were first screened among apple and pear genotypes in a polymorphism survey and selected if they amplified successfully, yielding at least one polymerase chain reaction (PCR) product of the approximate size expected for a homologous locus; the selected markers were then tested for transferability across other temperate pome fruit crops. The highest transferability of apple and pear SSRs, 61.53% and 73.33%, was observed in the closely related quince and apple genotypes, respectively. This indicated that primer binding sites between the two closely related genera, Malus and Pyrus, are fairly well conserved. The maximum transferability rates across all tested genotypes were 93.33% and 80.00% for primers CH05D11 (apple) and TSUenh016 (pear), respectively. The transferability of markers is based on genomic similarity and can reflect genome collinearity and even evolutionary relationships between species. This high level of transferability of apple and pear SSRs to other temperate pome fruit crops indicates their promise for future molecular screening, map construction, and comparative genomic studies.


2018 ◽  
Vol 10 (3) ◽  
pp. 29-45
Author(s):  
Xu Yuan ◽  
Hua Zhong ◽  
Zhikui Chen ◽  
Fangming Zhong ◽  
Yueming Hu

This article describes how, with the rapid increase of multimedia content on the Internet, the need for effective cross-modal retrieval has attracted much attention recently. Many related works focus only on the semantic mapping of modalities in linear space and use low-level hand-crafted features to represent each modality, ignoring the latent semantic correlations of modalities in non-linear space and the extraction of high-level modality features. To solve these issues, the authors first use convolutional neural networks and a topic model to obtain high-level semantic features of the various modalities. Subsequently, they propose a supervised learning algorithm based on kernel partial least squares that can capture semantic correlations across modalities. Finally, the joint model of the different modalities is learned from the training set. Extensive experiments are conducted on three benchmark datasets: Wikipedia, Pascal and MIRFlickr. The results show that the proposed approach achieves better retrieval performance than several state-of-the-art approaches.


2019 ◽  
Vol 26 (34) ◽  
pp. 6207-6221 ◽  
Author(s):  
Innocenzo Rainero ◽  
Alessandro Vacca ◽  
Flora Govone ◽  
Annalisa Gai ◽  
Lorenzo Pinessi ◽  
...  

Migraine is a common, chronic neurovascular disorder caused by a complex interaction between genetic and environmental risk factors. In the last two decades, the molecular genetics of migraine has been intensively investigated. In a few cases, migraine is transmitted as a monogenic disorder, and the disease phenotype cosegregates with mutations in different genes such as CACNA1A, ATP1A2, SCN1A, KCNK18, and NOTCH3. In the common forms of migraine, candidate gene studies as well as genome-wide association studies have shown that a large number of genetic variants may increase the risk of developing migraine. To date, few studies have investigated the genotype-phenotype correlation in patients with migraine. The purpose of this review was to discuss recent studies investigating the relationship between different genetic variants and the clinical characteristics of migraine. Analysis of genotype-phenotype correlations in migraineurs is complicated by several confounding factors and, to date, only polymorphisms of the MTHFR gene have been shown to have an effect on migraine phenotype. Additional genomic studies and network analyses are needed to clarify the complex pathways underlying migraine and its clinical phenotypes.


2021 ◽  
Vol 22 (11) ◽  
pp. 5723
Author(s):  
Yuan-Yuan Xu ◽  
Sheng-Rui Liu ◽  
Zhi-Meng Gan ◽  
Ren-Fang Zeng ◽  
Jin-Zhi Zhang ◽  
...  

A high-density genetic linkage map is essential for genetic and genomic studies including QTL mapping, genome assembly, and comparative genomic analysis. Here, we constructed a citrus high-density linkage map using SSR and SNP markers, which are evenly distributed across the citrus genome. The integrated linkage map contains 4163 markers with an average distance of 1.12 cM. The female and male linkage maps contain 1478 and 2976 markers with genetic lengths of 1093.90 cM and 1227.03 cM, respectively. Meanwhile, a genetic map comparison demonstrates that the linear order of common markers is highly conserved between the clementine mandarin and Poncirus trifoliata. Based on this high-density integrated citrus genetic map and two years of deciduous phenotypic data, two loci conferring leaf abscission phenotypic variation were detected on scaffold 1 (including 36 genes) and scaffold 8 (including 107 genes) using association analysis. Moreover, the expression patterns of 30 candidate genes were investigated under cold stress conditions because cold temperature is closely linked with the deciduous trait. The developed high-density genetic map will facilitate QTL mapping and genomic studies, and the localization of the leaf abscission deciduous trait will be valuable for understanding the mechanism of this deciduous trait and citrus breeding.


Genes ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 483
Author(s):  
Wen-Juan Ma ◽  
Paris Veltsos

Frogs are ideal organisms for studying sex chromosome evolution because of their diversity in sex chromosome differentiation and sex-determination systems. We review 222 anuran frogs, spanning ~220 Myr of divergence, with characterized sex chromosomes, and discuss their evolution, phylogenetic distribution and transitions between homomorphic and heteromorphic states, as well as between sex-determination systems. Most (~75%) anurans have homomorphic sex chromosomes, with XY systems being three times more common than ZW systems. Most remaining anurans (~25%) have heteromorphic sex chromosomes, with XY and ZW systems almost equally represented. There are Y-autosome fusions in 11 species, and no W-/Z-/X-autosome fusions are known. The phylogeny represents at least 19 transitions between sex-determination systems and at least 16 cases of independent evolution of heteromorphic sex chromosomes from homomorphy, the likely ancestral state. Five lineages mostly have heteromorphic sex chromosomes, which might have evolved due to demographic and sexual selection attributes of those lineages. Males do not recombine over most of their genome, regardless of which is the heterogametic sex. Nevertheless, telomere-restricted recombination between ZW chromosomes has evolved at least once. More comparative genomic studies are needed to understand the evolutionary trajectories of sex chromosomes among frog lineages, especially in the ZW systems.


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 656
Author(s):  
Xavier Larriva-Novo ◽  
Víctor A. Villagrá ◽  
Mario Vega-Barbas ◽  
Diego Rivera ◽  
Mario Sanz Rodrigo

Security is now essential in IoT networks because of the large amount of data they handle. These systems are vulnerable to several cybersecurity attacks, which are increasing in number and sophistication. For this reason, new intrusion detection techniques that are as accurate as possible in these scenarios have to be developed. Intrusion detection systems based on machine learning algorithms have already shown high performance in terms of accuracy. This research proposes the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm. The evaluation uses two benchmark datasets, UGR16 and UNSW-NB15, and one of the most widely used datasets, KDD99. The preprocessing techniques were evaluated in accordance with scalar and normalization functions. All of these preprocessing models were applied to different sets of characteristics based on a categorization composed of four groups of features: basic connection features, content characteristics, statistical characteristics and, finally, a group composed of traffic-based features and connection direction-based traffic characteristics. The objective of this research is to evaluate this categorization by using various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic and several preprocessing techniques, the accuracy can be enhanced by up to 45%. Preprocessing a specific group of characteristics yields greater accuracy, allowing the machine learning algorithm to correctly classify the parameters related to possible attacks.
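
As a rough illustration of what scaling and normalization of a traffic feature can look like, the sketch below applies min-max scaling and z-score standardization to one numeric column. The abstract does not specify which functions the authors evaluated, so both functions and the sample values (hypothetical byte counts) are assumptions chosen as common examples.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical bytes-per-connection values for one feature.
bytes_per_connection = [10, 20, 30]
print(min_max_scale(bytes_per_connection))
print(z_score(bytes_per_connection))
```

Applying different functions to different feature groups, as the abstract describes, would amount to choosing one such function per group before training.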


Mathematics ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. 830
Author(s):  
Seokho Kang

k-nearest neighbor (kNN) is a widely used learning algorithm for supervised learning tasks. In practice, the main challenge when using kNN is its high sensitivity to its hyperparameter setting, including the number of nearest neighbors k, the distance function, and the weighting function. To improve the robustness to hyperparameters, this study presents a novel kNN learning method based on a graph neural network, named kNNGNN. Given training data, the method learns a task-specific kNN rule in an end-to-end fashion by means of a graph neural network that takes the kNN graph of an instance to predict the label of the instance. The distance and weighting functions are implicitly embedded within the graph neural network. For a query instance, the prediction is obtained by performing a kNN search from the training data to create a kNN graph and passing it through the graph neural network. The effectiveness of the proposed method is demonstrated using various benchmark datasets for classification and regression tasks.
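
The sensitivity to hyperparameters mentioned above can be seen in a minimal sketch of a plain kNN classifier. This is standard kNN, not the kNNGNN method; the function names, the toy training points, and the query are assumptions made for illustration.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def uniform_weight(distance):
    """Every neighbor counts equally."""
    return 1.0


def inverse_distance_weight(distance):
    """Closer neighbors count more; the small constant avoids division by zero."""
    return 1.0 / (distance + 1e-9)


def knn_predict(train, query, k=3, distance=euclidean, weight=uniform_weight):
    """Predict the label of query from the k nearest (point, label) pairs in train."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = defaultdict(float)
    for point, label in nearest:
        votes[label] += weight(distance(point, query))
    return max(votes, key=votes.get)


train = [
    ((0.0, 0.0), "a"),
    ((1.0, 1.0), "b"),
    ((1.2, 1.0), "b"),
    ((5.0, 5.0), "a"),
]
query = (0.2, 0.2)

print(knn_predict(train, query, k=1))
print(knn_predict(train, query, k=3))
print(knn_predict(train, query, k=3, weight=inverse_distance_weight))
```

On this toy data the prediction changes with k and with the weighting function, which is the kind of dependence a learned kNN rule aims to reduce.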

