AnnoFly: annotating Drosophila embryonic images based on an attention-enhanced RNN model

2019 ◽  
Vol 35 (16) ◽  
pp. 2834-2842 ◽  
Author(s):  
Yang Yang ◽  
Mingyu Zhou ◽  
Qingwei Fang ◽  
Hong-Bin Shen

Abstract Motivation In the post-genomic era, image-based transcriptomics has received considerable attention, because visualizing the distribution of gene expression reveals spatial and temporal expression patterns, which are important for understanding biological mechanisms. The Berkeley Drosophila Genome Project has collected a large-scale spatial gene expression database for studying Drosophila embryogenesis. Given the expression images, annotating them for the study of Drosophila embryonic development is the next urgent task. To speed up the labor-intensive labeling work, automatic tools are highly desirable. However, conventional image annotation tools are not applicable here, because the labeling is at the gene level rather than the image level: each gene is represented by a bag of multiple related images (a multi-instance setting), and image quality varies with image orientation and experimental batch. Moreover, different local regions of an image correspond to different controlled vocabulary (CV) annotation terms, i.e. an image has multiple labels. Designing an accurate annotation tool in such a multi-instance multi-label scenario is very challenging. Results To address these challenges, we develop a new annotator for fruit fly embryonic images, called AnnoFly. Driven by an attention-enhanced RNN model, it can weight images of different qualities so as to focus on the most informative image patterns. We assess the new model on three standard datasets. The experimental results reveal that the attention-based model provides a transparent approach for identifying the important images for labeling, and that it substantially improves accuracy compared with existing annotation methods, including both single-instance and multi-instance learning methods. Availability and implementation http://www.csbio.sjtu.edu.cn/bioinf/annofly/ Supplementary information Supplementary data are available at Bioinformatics online.
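The idea of weighting the images in a gene's bag can be illustrated with plain attention pooling. This is a minimal sketch, not AnnoFly's model: it omits the RNN entirely, and the function name `attention_pool`, the parameters `V` and `w`, and the random feature vectors are all assumptions made for illustration (in a real model the parameters would be learned).

```python
import numpy as np

def attention_pool(bag, V, w):
    """Collapse a bag of image feature vectors into one gene-level vector.

    bag: (n_images, d) array, one row per image of the gene.
    V:   (d, h) projection matrix (hypothetical, would be learned).
    w:   (h,) scoring vector (hypothetical, would be learned).
    """
    scores = np.tanh(bag @ V) @ w            # one relevance score per image
    scores = scores - scores.max()           # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ bag, weights            # weighted average and the weights

# Toy bag: 4 images, 8 features each (random stand-ins for CNN features).
rng = np.random.default_rng(0)
bag = rng.normal(size=(4, 8))
V = rng.normal(size=(8, 5))
w = rng.normal(size=5)
gene_vector, weights = attention_pool(bag, V, w)
```

The `weights` vector is what makes such a model inspectable: a high weight marks an image that contributed strongly to the gene-level representation.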

Genetics ◽  
1996 ◽  
Vol 144 (4) ◽  
pp. 1681-1692 ◽  
Author(s):  
Norbert Perrimon ◽  
Anne Lanjuin ◽  
Charles Arnold ◽  
Elizabeth Noll

Screens for zygotic lethal mutations that are associated with specific maternal effect lethal phenotypes have only been conducted for the X chromosome. To identify loci on the autosomes, which represent four-fifths of the Drosophila genome, we have used the autosomal “FLP-DFS” technique to screen a collection of 496 P element-induced mutations established by the Berkeley Drosophila Genome Project. We have identified 64 new loci whose gene products are required for proper egg formation or normal embryonic development.


Author(s):  
Cynthia Z Ma ◽  
Michael R Brent

Abstract Motivation The activity of a transcription factor (TF) in a sample of cells is the extent to which it is exerting its regulatory potential. Many methods of inferring TF activity from gene expression data have been described, but due to the lack of appropriate large-scale datasets, systematic and objective validation has not been possible until now. Results We systematically evaluate and optimize the approach to TF activity inference in which a gene expression matrix is factored into a condition-independent matrix of control strengths and a condition-dependent matrix of TF activity levels. We find that expression data in which the activities of individual TFs have been perturbed are both necessary and sufficient for obtaining good performance. To a considerable extent, control strengths inferred using expression data from one growth condition carry over to other conditions, so the control strength matrices derived here can be used by others. Finally, we apply these methods to gain insight into the upstream factors that regulate the activities of yeast TFs Gcr2, Gln3, Gcn4 and Msn2. Availability and implementation Evaluation code and data are available at https://doi.org/10.5281/zenodo.4050573. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
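The factorization described above can be written as E ≈ C·A, with E the genes × samples expression matrix, C the condition-independent control strengths (genes × TFs) and A the condition-dependent TF activities (TFs × samples). The sketch below assumes C is already known and recovers A by ordinary least squares on synthetic, noise-free data; the matrix sizes and variable names are illustrative assumptions, and the authors' actual optimization procedure is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_tfs, n_samples = 50, 3, 6

# Synthetic ground truth (illustrative only).
C_true = rng.normal(size=(n_genes, n_tfs))      # control strengths
A_true = rng.normal(size=(n_tfs, n_samples))    # TF activity levels
E = C_true @ A_true                             # noise-free expression matrix

# With C fixed, each sample's activity vector is a least-squares solution.
A_hat, residuals, rank, _ = np.linalg.lstsq(C_true, E, rcond=None)
```

With real data both factors are unknown and the problem is only identifiable under extra constraints, which is consistent with the abstract's finding that TF perturbation data are needed for good performance.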


Author(s):  
Anastasiya Belyaeva ◽  
Chandler Squires ◽  
Caroline Uhler

Abstract Summary Designing interventions to control gene regulation necessitates modeling a gene regulatory network by a causal graph. Currently, large-scale gene expression datasets from different conditions, cell types, disease states, and developmental time points are being collected. However, application of classical causal inference algorithms to infer gene regulatory networks based on such data is still challenging, requiring high sample sizes and computational resources. Here, we describe an algorithm that efficiently learns the differences in gene regulatory mechanisms between different conditions. Our difference causal inference (DCI) algorithm infers changes (i.e. edges that appeared, disappeared, or changed weight) between two causal graphs given gene expression data from the two conditions. This algorithm is efficient in its use of samples and computation since it infers the differences between causal graphs directly without estimating each possibly large causal graph separately. We provide a user-friendly Python implementation of DCI and also enable the user to learn the most robust difference causal graph across different tuning parameters via stability selection. Finally, we show how to apply DCI to single-cell RNA-seq data from different conditions and cell states, and we also validate our algorithm by predicting the effects of interventions. Availability and implementation Python package freely available at http://uhlerlab.github.io/causaldag/dci. Supplementary information Supplementary data are available at Bioinformatics online.
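To make the three kinds of change concrete, the sketch below classifies edges given two weighted adjacency matrices that are assumed to be known. This only illustrates what a difference graph contains; it is not the DCI algorithm, whose point is to infer these differences directly from data without estimating either graph. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def classify_edge_changes(B1, B2, tol=1e-8):
    """Label each edge i -> j that differs between two weighted graphs."""
    changes = {}
    n = B1.shape[0]
    for i in range(n):
        for j in range(n):
            a, b = B1[i, j], B2[i, j]
            in1, in2 = abs(a) > tol, abs(b) > tol
            if in1 and not in2:
                changes[(i, j)] = "disappeared"
            elif in2 and not in1:
                changes[(i, j)] = "appeared"
            elif in1 and in2 and abs(a - b) > tol:
                changes[(i, j)] = "changed weight"
    return changes

# Condition 1: edges 0->1 (0.8) and 1->2 (0.5).
B1 = np.array([[0.0, 0.8, 0.0],
               [0.0, 0.0, 0.5],
               [0.0, 0.0, 0.0]])
# Condition 2: edge 0->1 is gone, 0->2 is new, 1->2 is stronger.
B2 = np.array([[0.0, 0.0, 0.3],
               [0.0, 0.0, 0.9],
               [0.0, 0.0, 0.0]])
changes = classify_edge_changes(B1, B2)
```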


2012 ◽  
Vol 13 (1) ◽  
pp. 107 ◽  
Author(s):  
Lei Yuan ◽  
Alexander Woodard ◽  
Shuiwang Ji ◽  
Yuan Jiang ◽  
Zhi-Hua Zhou ◽  
...  

2019 ◽  
Vol 35 (19) ◽  
pp. 3672-3678 ◽  
Author(s):  
Nafiseh Saberian ◽  
Azam Peyvandipour ◽  
Michele Donato ◽  
Sahar Ansari ◽  
Sorin Draghici

Abstract Motivation Drug repurposing is a potential alternative to the classical drug discovery pipeline. Repurposing involves finding novel indications for already approved drugs. In this work, we present a novel machine learning-based method for drug repurposing. This method explores the anti-similarity between drugs and a disease to uncover new uses for the drugs. More specifically, our proposed method takes into account three sources of information: (i) large-scale gene expression profiles corresponding to human cell lines treated with small molecules, (ii) the gene expression profile of a human disease and (iii) the known relationships between Food and Drug Administration (FDA)-approved drugs and diseases. Using these data, our proposed method learns a similarity metric through a supervised machine learning-based algorithm such that a disease and its associated FDA-approved drugs have a smaller distance than the other disease-drug pairs. Results We validated our framework by showing that the proposed method, which incorporates a distance metric learning technique, can retrieve FDA-approved drugs for their approved indications. Once validated, we used our approach to identify a few strong candidates for repurposing. Availability and implementation The R scripts are available on demand from the authors. Supplementary information Supplementary data are available at Bioinformatics online.
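Anti-similarity can be illustrated by ranking drugs on how strongly their expression signature opposes the disease signature. The sketch below uses plain Pearson correlation as a stand-in for the learned metric described in the abstract, so it is not the authors' method; the drug names, signatures and function name are invented for illustration.

```python
import numpy as np

def rank_by_anti_similarity(disease, drug_profiles):
    """Sort drugs from most anti-correlated to most correlated with the disease."""
    scores = {
        name: float(np.corrcoef(disease, profile)[0, 1])
        for name, profile in drug_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])

# Toy log fold-change signatures over the same five genes (made-up values).
disease = np.array([2.0, 1.5, -1.0, -2.0, 0.5])
drugs = {
    "drug_A": -disease,                               # reverses the disease signature
    "drug_B": 0.5 * disease,                          # mimics the disease signature
    "drug_C": np.array([0.1, -0.3, 0.2, 0.0, -0.1]),  # largely unrelated
}
ranking = rank_by_anti_similarity(disease, drugs)
```

A supervised metric would replace the fixed correlation with a distance tuned so that known drug-disease pairs end up close together.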


2020 ◽  
Vol 36 (15) ◽  
pp. 4363-4365 ◽  
Author(s):  
Leslie Solorzano ◽  
Gabriele Partel ◽  
Carolina Wählby

Abstract Motivation Visual assessment of scanned tissue samples and associated molecular markers, such as gene expression, requires easy interactive inspection at multiple resolutions. This requires smart handling of image pyramids and efficient distribution of different types of data across several levels of detail. Results We present TissUUmaps, enabling fast visualization and exploration of millions of data points overlaying a tissue sample. TissUUmaps can be used either as a web service or locally on any computer, and regions of interest as well as local statistics can be extracted and shared among users. Availability and implementation TissUUmaps is available on GitHub at github.com/wahlby-lab/TissUUmaps. Several demos and video tutorials are available at http://tissuumaps.research.it.uu.se/howto.html. Supplementary information Supplementary data are available at Bioinformatics online.
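An image pyramid stores the same image at successively halved resolutions so that a viewer only loads the level matching the current zoom. The sketch below shows the generic idea and is not taken from the TissUUmaps code base; the 256-pixel tile size and both function names are assumptions.

```python
import math

def pyramid_levels(width, height, tile=256):
    """Return the (width, height) of each level, halving until one tile suffices."""
    levels = [(width, height)]
    while max(levels[-1]) > tile:
        w, h = levels[-1]
        levels.append((math.ceil(w / 2), math.ceil(h / 2)))
    return levels

def level_for_scale(levels, scale):
    """Pick the coarsest level that still has enough pixels for an on-screen scale.

    scale is the displayed size relative to full resolution (0 < scale <= 1).
    Level 0 is full resolution; each further level halves both dimensions.
    """
    k = int(math.floor(math.log2(1 / scale)))
    return min(max(k, 0), len(levels) - 1)

levels = pyramid_levels(4096, 2048)
quarter_zoom_level = level_for_scale(levels, 0.25)
```

Point overlays can be thinned in the same way, showing fewer markers at coarse levels and all of them only when zoomed in.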


2019 ◽  
Vol 36 (4) ◽  
pp. 1143-1149 ◽  
Author(s):  
Juan Xie ◽  
Anjun Ma ◽  
Yu Zhang ◽  
Bingqiang Liu ◽  
Sha Cao ◽  
...  

Abstract Motivation The biclustering of large-scale gene expression data holds promising potential for detecting condition-specific functional gene modules (i.e. biclusters). However, existing methods do not adequately address the comprehensive detection of all significant bicluster structures and have limited power when applied to expression data generated by RNA-Sequencing (RNA-Seq), especially single-cell RNA-Seq (scRNA-Seq) data, where massive numbers of zero and low expression values are observed. Results We present a new biclustering algorithm, QUalitative BIClustering algorithm Version 2 (QUBIC2), which is empowered by: (i) a novel left-truncated mixture of Gaussian model for an accurate assessment of multimodality in zero-enriched expression data, (ii) a fast and efficient dropouts-saving expansion strategy for optimizing functional gene modules using information divergence and (iii) a rigorous statistical test for the significance of all the identified biclusters in any organism, including those without substantial functional annotations. QUBIC2 demonstrated considerably improved performance in detecting biclusters compared to five other widely used algorithms on various benchmark datasets from E. coli, human and simulated data. QUBIC2 also showcased robust and superior performance on gene expression data generated by microarray, bulk RNA-Seq and scRNA-Seq. Availability and implementation The source code of QUBIC2 is freely available at https://github.com/OSU-BMBL/QUBIC2. Supplementary information Supplementary data are available at Bioinformatics online.


Genetics ◽  
1999 ◽  
Vol 153 (1) ◽  
pp. 135-177 ◽  
Author(s):  
Allan C Spradling ◽  
Dianne Stern ◽  
Amy Beaton ◽  
E Jay Rhem ◽  
Todd Laverty ◽  
...  

Abstract A fundamental goal of genetics and functional genomics is to identify and mutate every gene in model organisms such as Drosophila melanogaster. The Berkeley Drosophila Genome Project (BDGP) gene disruption project generates single P-element insertion strains that each mutate unique genomic open reading frames. Such strains strongly facilitate further genetic and molecular studies of the disrupted loci, but it has remained unclear if P elements can be used to mutate all Drosophila genes. We now report that the primary collection has grown to contain 1045 strains that disrupt more than 25% of the estimated 3600 Drosophila genes that are essential for adult viability. Of these P insertions, 67% have been verified by genetic tests to cause the associated recessive mutant phenotypes, and the validity of most of the remaining lines is predicted on statistical grounds. Sequences flanking >920 insertions have been determined to exactly position them in the genome and to identify 376 potentially affected transcripts from collections of EST sequences. Strains in the BDGP collection are available from the Bloomington Stock Center and have already assisted the research community in characterizing >250 Drosophila genes. The likely identity of 131 additional genes in the collection is reported here. Our results show that Drosophila genes have a wide range of sensitivity to inactivation by P elements, and provide a rationale for greatly expanding the BDGP primary collection based entirely on insertion site sequencing. We predict that this approach can bring >85% of all Drosophila open reading frames under experimental control.


Genetics ◽  
2001 ◽  
Vol 159 (2) ◽  
pp. 609-622 ◽
Author(s):  
Jon D Schnorr ◽  
Robert Holdcraft ◽  
Brett Chevalier ◽  
Celeste A Berg

Abstract Little is known about the genes that interact with Ras signaling pathways to regulate morphogenesis. The synthesis of dorsal eggshell structures in Drosophila melanogaster requires multiple rounds of Ras signaling followed by dramatic epithelial sheet movements. We took advantage of this process to identify genes that link patterning and morphogenesis; we screened lethal mutations on the second chromosome for those that could enhance a weak Ras1 eggshell phenotype. Of 1618 lethal P-element mutations tested, 13 showed significant enhancement, resulting in forked and fused dorsal appendages. Our genetic and molecular analyses together with information from the Berkeley Drosophila Genome Project reveal that 11 of these lines carry mutations in previously characterized genes. Three mutations disrupt the known Ras1 cell signaling components Star, Egfr, and Blistered, while one mutation disrupts Sec61β, implicated in ligand secretion. Seven lines represent cell signaling and cytoskeletal components that are new to the Ras1 pathway; these are Chickadee (Profilin), Tec29, Dreadlocks, POSH, Peanut, Smt3, and MESK2, a suppressor of dominant-negative Ksr. A twelfth insertion disrupts two genes, Nrk, a “neurospecific” receptor tyrosine kinase, and Tpp, which encodes a neuropeptidase. These results suggest that Ras1 signaling during oogenesis involves novel components that may be intimately associated with additional signaling processes and with the reorganization of the cytoskeleton. To determine whether these Ras1 Enhancers function upstream or downstream of the Egf receptor, four mutations were tested for their ability to suppress an activated Egfr construct (λtop) expressed in oogenesis exclusively in the follicle cells. Mutations in Star and l(2)43Bb had no significant effect upon the λtop eggshell defect whereas smt3 and dock alleles significantly suppressed the λtop phenotype.


2018 ◽  
Author(s):  
Brandon Monier ◽  
Adam McDermaid ◽  
Jing Zhao ◽  
Anne Fennell ◽  
Qin Ma

Abstract Motivation Next-generation sequencing has made far more large-scale genomic and transcriptomic data available. Studies with RNA-sequencing (RNA-seq) data typically involve generating gene expression profiles that can be analyzed further, often through differential gene expression (DGE) analysis. This process enables comparison across samples of two or more factor levels. A recurring issue with DGE analyses is the complicated nature of the comparisons to be made, in which a variety of factor combinations, pairwise comparisons, and main or blocked main effects need to be tested. Results Here we present IRIS-DGE, a server-based DGE analysis tool developed using Shiny. It provides a straightforward, user-friendly platform for performing comprehensive DGE analysis, along with analyses that help design hypotheses and determine key genomic features. IRIS-DGE integrates the three most commonly used R-based DGE tools to determine differentially expressed genes (DEGs) and includes numerous methods for performing preliminary analysis on user-provided gene expression information. Additionally, this tool integrates a variety of visualizations, in a highly interactive manner, for improved interpretation of preliminary and DGE analyses. Availability IRIS-DGE is freely available at http://bmbl.sdstate.edu/IRIS/. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

