An Ambiguity Aware Treebank Search Tool

Author(s):  
Marcin Woliński ◽  
Andrzej Zaborowski
2019 ◽  
Vol 14 (2) ◽  
pp. 157-163
Author(s):  
Majid Hajibaba ◽  
Mohsen Sharifi ◽  
Saeid Gorgin

Background: One of the pivotal challenges in today's genomic research is the fast processing of voluminous data, such as those produced by high-throughput Next-Generation Sequencing technologies. BLAST (Basic Local Alignment Search Tool), a long-established and renowned tool in bioinformatics, has been shown to be incredibly slow in this regard. Objective: To improve the performance of BLAST on voluminous data, we have applied a novel memory-aware technique to BLAST for faster parallel processing. Method: We have used a master-worker model alongside a memory-aware technique in which the master partitions the whole data into equal chunks, one chunk for each worker, and each worker then further splits and formats its allocated chunk according to the size of its memory. Each worker searches every split of its data, one by one, against a list of queries. Results: We have chosen a list of queries with different lengths to run insensitive searches in a huge database called UniProtKB/TrEMBL. Our experiments show a 20 percent improvement in performance when workers used our proposed memory-aware technique compared to when they were not memory aware. Comparatively, experiments show an even higher performance improvement, approximately 50 percent, when we applied our memory-aware technique to mpiBLAST. Conclusion: We have shown that memory-awareness in formatting bulky databases, when running BLAST, can improve performance significantly while preventing unexpected crashes in low-memory environments. Even though distributed computing attempts to reduce search time by partitioning and distributing database portions, our memory-aware technique additionally alleviates the negative effects of page faults on performance.
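The partitioning scheme described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the byte-based memory budget, and the `search` callback are all hypothetical, and a real worker would format each fragment as a BLAST database rather than keep sequences in a Python list.

```python
from typing import Callable, List, Sequence, Tuple


def partition_equally(records: Sequence[str], n_workers: int) -> List[List[str]]:
    """Master step: split the data into n_workers chunks of near-equal size."""
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(len(records), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        chunks.append(list(records[start:start + size]))
        start += size
    return chunks


def split_by_memory(chunk: Sequence[str], memory_budget_bytes: int) -> List[List[str]]:
    """Worker step: split a chunk into fragments that fit an assumed memory budget.

    A single record larger than the budget still gets a fragment of its own.
    """
    if memory_budget_bytes <= 0:
        raise ValueError("memory_budget_bytes must be positive")
    fragments: List[List[str]] = []
    current: List[str] = []
    used = 0
    for record in chunk:
        size = len(record.encode("utf-8"))
        if current and used + size > memory_budget_bytes:
            fragments.append(current)
            current, used = [], 0
        current.append(record)
        used += size
    if current:
        fragments.append(current)
    return fragments


def search_fragments(
    fragments: Sequence[Sequence[str]],
    queries: Sequence[str],
    search: Callable[[str, Sequence[str]], List[Tuple[str, str]]],
) -> List[Tuple[str, str]]:
    """Search each fragment in turn against the whole query list."""
    hits: List[Tuple[str, str]] = []
    for fragment in fragments:
        for query in queries:
            hits.extend(search(query, fragment))
    return hits


def substring_search(query: str, fragment: Sequence[str]) -> List[Tuple[str, str]]:
    """Stand-in for a real alignment search: exact substring match only."""
    return [(query, record) for record in fragment if query in record]
```

Processing one fragment at a time keeps the working set within the worker's memory, which is the property the abstract credits with reducing page faults.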


Author(s):  
Liron Pantanowitz ◽  
Pamela Michelow ◽  
Scott Hazelhurst ◽  
Shivam Kalra ◽  
Charles Choi ◽  
...  

Context.— Pathologists may encounter extraneous pieces of tissue (tissue floaters) on glass slides because of specimen cross-contamination. Troubleshooting this problem, including performing molecular tests for tissue identification if available, is time consuming and often does not satisfactorily resolve the problem. Objective.— To demonstrate the feasibility of using an image search tool to resolve the tissue floater conundrum. Design.— A glass slide was produced containing 2 separate hematoxylin and eosin (H&E)-stained tissue floaters. This fabricated slide was digitized along with the 2 slides containing the original tumors used to create these floaters. These slides were then embedded into a dataset of 2325 whole slide images comprising a wide variety of H&E-stained diagnostic entities. Digital slides were broken up into patches and the patch features converted into barcodes for indexing and easy retrieval. A deep learning-based image search tool was employed to extract features from patches via barcodes, hence enabling image matching to each tissue floater. Results.— There was a very high likelihood of finding a correct tumor match for the queried tissue floater when searching the digital database. Search results repeatedly yielded a correct match within the top 3 retrieved images. The retrieval accuracy improved when greater proportions of the floater were selected. Each search completed within several milliseconds. Conclusions.— Using an image search tool offers pathologists an additional method to rapidly resolve the tissue floater conundrum, especially in laboratories that have transitioned to fully digital primary diagnosis.
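The barcode indexing step can be sketched as follows. The abstract does not specify how features are binarized, so this example assumes one simple encoding (a bit is set when a feature value is larger than the one before it) and ranks candidates by Hamming distance; the function names and the in-memory dictionary index are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def to_barcode(features: Sequence[float]) -> int:
    """Binarize a feature vector: each bit is 1 when the next value is larger.

    This encoding is an assumption made for illustration only.
    """
    bits = 0
    for previous, current in zip(features, features[1:]):
        bits = (bits << 1) | (1 if current > previous else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two barcodes differ."""
    return bin(a ^ b).count("1")


def top_matches(
    query_features: Sequence[float], index: Dict[str, int], k: int = 3
) -> List[Tuple[str, int]]:
    """Return the k indexed patches whose barcodes are closest to the query."""
    query_code = to_barcode(query_features)
    ranked = sorted((hamming(query_code, code), name) for name, code in index.items())
    return [(name, distance) for distance, name in ranked[:k]]
```

Comparing compact binary codes with XOR and a bit count is cheap, which is consistent with the millisecond search times the abstract reports.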


Author(s):  
Bruno Leite Rodrigues ◽  
Glaucilene da Silva Costa ◽  
Paloma Helena Fernandes Shimabukuro

Abstract The pathogens that cause leishmaniases are transmitted to vertebrate hosts by the bite of female sand flies (Diptera: Psychodidae), which makes the identification of their bloodmeal sources an important step for the control and epidemiology of these diseases. In Brazil, the state of Roraima has a great diversity of sand flies, vertebrate hosts, and protozoan Leishmania, but little is known about the host blood-feeding preferences of sand flies. Thus, we evaluated the bloodmeal sources of sand flies collected from their sylvatic habitats in Parque Nacional do Viruá, Roraima. Fieldwork was carried out between 13th and 18th August 2019 using CDC light traps. Sand flies were slide-mounted and morphologically identified using the head and last segments of the abdomen. Engorged females had their DNA extracted, followed by amplification and sequencing of the cytochrome b (cytb) molecular marker for vertebrates. Sequences were analyzed and compared with those from GenBank using the BLASTn search tool, and a phylogenetic tree was reconstructed to demonstrate the clustering pattern of these sequences. A total of 1,209 sand flies were identified, comprising 20 species, of which the most abundant were Psychodopygus ayrozai (Barretto and Coutinho) (42.10%) and Psychodopygus chagasi (Costa Lima) (26.22%). Bloodmeal source identification was successfully performed for 34 sand flies, which confirmed four vertebrate species, the most abundant being the armadillo Dasypus novemcinctus Linnaeus, 1758 (Cingulata: Dasypodidae).


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Dimitri Boeckaerts ◽  
Michiel Stock ◽  
Bjorn Criel ◽  
Hans Gerstmans ◽  
Bernard De Baets ◽  
...  

Abstract Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.
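For readers unfamiliar with the metric, the sketch below computes average precision, one common estimator of the area under the precision-recall curve. The abstract does not say which estimator the authors used, so this is an assumption for illustration; it handles a single binary class and breaks score ties by input order.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for binary labels (1 = positive) and predicted scores.

    Precision is sampled at the rank of every positive example and averaged,
    which approximates the area under the precision-recall curve.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    # Rank examples from the highest score to the lowest (stable for ties).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_positives
```

Unlike accuracy, this score depends only on how well positives are ranked above negatives, which makes it suited to imbalanced host-prediction data.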


2020 ◽  
Vol 19 (1) ◽  
Author(s):  
Ommer Mohammed Dafalla ◽  
Mohammed Alzahrani ◽  
Ahmed Sahli ◽  
Mohammed Abdulla Al Helal ◽  
Mohammad Mohammad Alhazmi ◽  
...  

Abstract Background Artemisinin-based combination therapy (ACT) is recommended as first-line treatment of Plasmodium falciparum, to reduce morbidity and mortality in all countries where malaria is endemic. Polymorphism in portions of the P. falciparum gene encoding kelch (K13)-propeller domains is associated with delayed parasite clearance after ACT. Of about 124 different non-synonymous mutations, 46 have been identified in Southeast Asia (SEA), 62 in sub-Saharan Africa (SSA) and 16 in both regions. This is the first study designed to analyse the prevalence of polymorphism in the P. falciparum k13-propeller domain in the Jazan region of southwest Saudi Arabia, where malaria is endemic. Methods One hundred and forty P. falciparum samples were collected from the Jazan region of southwest Saudi Arabia at three different times: 20 samples in 2011, 40 samples in 2016 and 80 samples in 2020 after the implementation of ACT. Plasmodium falciparum kelch13 (k13) gene DNA was extracted, amplified, sequenced, and analysed using the basic local alignment search tool (BLAST). Results This study obtained 51 non-synonymous (NS) mutations in three time groups, divided as follows: 6 single nucleotide polymorphisms (SNPs) (11.8%) in samples collected in 2011 only, 3 (5.9%) in 2011 and 2016, 5 (9.8%) in 2011 and 2020, 5 (9.8%) in 2016 only, 8 (15.7%) in 2016 and 2020, 14 (27.5%) in 2020 and 10 (19.6%) in all the groups. BLAST revealed that the 2011 isolates were genetically closer to African isolates (53.3%) than to Asian ones (46.7%). Interestingly, this proportion changed completely in 2020, when isolates were closer to Asian isolates (81.6%) than to African ones (18.4%). Conclusions Despite the diversity of the identified mutations in the k13-propeller gene, these data did not report widespread artemisinin-resistant polymorphisms in the Jazan region where these samples were collected. Such a process would be expected to increase frequencies of mutations associated with resistance to ACT.
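The group percentages in the Results can be cross-checked with a short calculation. The counts are taken from the abstract; the dictionary keys are shorthand labels introduced here, and the rounding to one decimal place is an assumption about how the reported figures were produced.

```python
# Counts of non-synonymous mutations per time group, as reported in the abstract.
mutation_counts = {
    "2011 only": 6,
    "2011 and 2016": 3,
    "2011 and 2020": 5,
    "2016 only": 5,
    "2016 and 2020": 8,
    "2020": 14,
    "all groups": 10,
}

total = sum(mutation_counts.values())

# Share of each group as a percentage of all mutations, to one decimal place.
percentages = {
    group: round(100 * count / total, 1) for group, count in mutation_counts.items()
}

for group, share in percentages.items():
    print(f"{group}: {mutation_counts[group]} of {total} ({share}%)")
```

The counts sum to the 51 mutations stated in the abstract, and the computed shares match the reported percentages.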


2007 ◽  
Vol 7 (4) ◽  
pp. 463-468 ◽  
Author(s):  
Niyaz Ahmed ◽  
Ahmed A. Majeed ◽  
Irshad Ahmed ◽  
M. Abid Hussain ◽  
Ayesha Alvi ◽  
...  

2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Binbin Xie ◽  
Yiran Li ◽  
Rongjie Zhao ◽  
Yuzi Xu ◽  
Yuhui Wu ◽  
...  

Chemoresistance is a significant factor associated with poor outcomes in osteosarcoma patients. The present study aims to identify chemoresistance-regulated gene signatures and microRNAs (miRNAs) in the Gene Expression Omnibus (GEO) database. The results of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis included positive regulation of transcription, DNA-templated, tryptophan metabolism, and the like. Differentially expressed genes (DEGs) were then uploaded to the Search Tool for the Retrieval of Interacting Genes (STRING) to construct protein-protein interaction (PPI) networks, and 9 hub genes were screened, such as fucosyltransferase 3 (Lewis blood group) (FUT3), whose expression was high in chemoresistant samples but was associated with a better prognosis in osteosarcoma patients. Furthermore, the connection between DEGs and differentially expressed miRNAs (DEMs) was explored. GEO2R was utilized to screen out DEGs and DEMs. A total of 668 DEGs and 5 DEMs, differentiating samples from patients with poor and good responses to chemotherapy, were extracted from GSE7437 and GSE30934. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) was used to perform GO and KEGG pathway enrichment analysis to identify potential pathways and functional annotations linked with osteosarcoma chemoresistance. The present study may provide a deeper understanding of the regulatory genes of osteosarcoma chemoresistance and identify potential therapeutic targets for osteosarcoma.


2010 ◽  
Vol 2010 ◽  
pp. 1-12 ◽  
Author(s):  
Vesa A. Korhonen ◽  
Ritva Pyykkönen

We discuss how a short-range wireless communication service implemented for modern mobile communication devices can provide additional value for both the consumer and the service/product provider. When used as an information search tool, such systems allow services and products to be promoted at the location where they are available. For the customer, it may provide a “digitally augmented vision”, an enhanced view of the current environment. With data filtering and search rules, this may provide a self-manageable context, in which the user's own personal environment and preferences regarding the features available in the current surroundings work together with a direct connection to web-based social media. A preliminary design for such a service is provided. The conclusion is that the method can generate additional revenue for the company and improve the customers' buying process. In addition to marketing, the principles described here are also applicable to other forms of human interaction.

