CaSe4SR: Using category sequence graph to augment session-based recommendation

Genotyping common, large structural variations in 5,202 genomes using pangenomes, the Giraffe mapper, and the vg toolkit

We introduce Giraffe, a pangenome short read mapper that can efficiently map to a collection of haplotypes threaded through a sequence graph. Giraffe, part of the variation graph toolkit (vg)1, maps reads to thousands of human genomes at around the same speed BWA-MEM2 maps reads to a single reference genome, while maintaining comparable accuracy to VG-MAP, vg’s original mapper. We have developed efficient genotyping pipelines using Giraffe. We demonstrate improvements in genotyping for single nucleotide variations (SNVs), insertions and deletions (indels) and structural variations (SVs) genome-wide. We use Giraffe to genotype and phase 167 thousand structural variations ascertained from long read studies in 5,202 human genomes sequenced with short reads, including the complete 1000 Genomes Project dataset, at an average cost of $1.50 per sample. We determine the frequency of these variations in diverse human populations, characterize their complex allelic variations and identify thousands of expression quantitative trait loci (eQTLs) driven by these variations.


ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions

Bioinformatics ◽ 10.1093/bioinformatics/btz431 ◽ 2019 ◽ Vol 35 (22) ◽ pp. 4754-4756 ◽ Cited By ~ 29

Author(s): Egor Dolzhenko ◽ Viraj Deshpande ◽ Felix Schlesinger ◽ Peter Krusche ◽ Roman Petrovski ◽ ...

Keyword(s): Tandem Repeat ◽ Broad Class ◽ Source Code ◽ Computational Method ◽ Supplementary Information ◽ DNA Repeats ◽ Supplementary Data ◽ Sequence Graph ◽ Version 2.0 ◽ Short Tandem

Summary: We describe a novel computational method for genotyping repeats using sequence graphs. This method addresses the long-standing need to accurately genotype medically important loci containing repeats adjacent to other variants or imperfect DNA repeats such as polyalanine repeats. Here we introduce a new version of our repeat genotyping software, ExpansionHunter, that uses this method to perform targeted genotyping of a broad class of such loci.

Availability and implementation: ExpansionHunter is implemented in C++ and is available under the Apache License Version 2.0. The source code, documentation, and Linux/macOS binaries are available at https://github.com/Illumina/ExpansionHunter/.

Supplementary information: Supplementary data are available at Bioinformatics online.


A Novel Sequence Graph-Based Approach to Find Academic Research Trends

International Journal of Web Portals ◽ 10.4018/ijwp.2020010104 ◽ 2020 ◽ Vol 12 (1) ◽ pp. 45-56

Author(s): Soumya George ◽ M. Sudheep Elayidom ◽ T. Santhanakrishnan

Keyword(s): Computer Science ◽ Efficient Method ◽ Academic Research ◽ Subject Area ◽ Research Trends ◽ Challenges And Opportunities ◽ Sequence Graph ◽ Academic Publications ◽ Subject Areas

Research trends are dynamic and change over time. They indicate the latest innovations, the current areas of research, and the latest technologies and developments in each field. They also support future innovations and developments by highlighting current challenges and opportunities. This article proposes an efficient method to find research trends in each field of research of any subject area by using the graph-based subject classification of published papers. This methodology can be used to find research trends at any point in time, based on the publication year of academic publications. A study of the change in research trends in three subject areas (physics, mathematics, and computer science) has been conducted based on a total of 4500 publications since 2004.
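As a rough illustration of the final step described in this abstract (reading trends off papers that have already been classified by subject and publication year), the following is a minimal sketch. It does not reproduce the article's graph-based classifier; the records, topic labels, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def trends_by_year(papers):
    """Map each publication year to a Counter of topic labels."""
    counts = defaultdict(Counter)
    for year, topic in papers:
        counts[year][topic] += 1
    return counts

def top_topics(papers, year, k=1):
    """Return the k most frequent topics for one year (ties broken alphabetically)."""
    counts = trends_by_year(papers)[year]
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, _ in ranked[:k]]

# Hypothetical records: (publication year, topic label assigned by some classifier).
papers = [
    (2004, "databases"), (2004, "databases"), (2004, "networks"),
    (2019, "deep learning"), (2019, "deep learning"), (2019, "networks"),
]

print(top_topics(papers, 2004))
print(top_topics(papers, 2019, k=2))
```

Comparing the ranked topics of two years gives a simple view of how a subject area has shifted between them.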


Dynamics and an efficient malware detection system using opcode sequence graph generation and ml algorithm

E3S Web of Conferences ◽ 10.1051/e3sconf/202018401009 ◽ 2020 ◽ Vol 184 ◽ pp. 01009

Author(s): Bharathi Panduri ◽ Madhurika Vummenthala ◽ Spoorthi Jonnalagadda ◽ Garwandha Ashwini ◽ Naruvadi Nagamani ◽ ...

Keyword(s): Detection System ◽ Future Research ◽ Support Vector ◽ Web Page ◽ Code Injection ◽ Mission Success ◽ Sequence Graph ◽ N Gram ◽ IoT Devices ◽ Garbage Code

The IoT (Internet of Things) consists, for the most part, of a wide range of Internet-connected devices and nodes. In the context of military and defence systems (where it is called IoBT, the Internet of Battlefield Things), these devices could be personnel-wearable battle outfits, tracking devices, cameras, clinical devices, etc. The integrity and safety of these devices are critical to mission success, and it is of utmost importance to keep them secure. One typical way of attacking these devices is through malware, whose aim could be to compromise the device and/or breach its communications. Because of their value, IoBT devices and nodes are generally a more significant target for cyber criminals than ordinary IoT devices. In this paper we attempt to create a deep-learning-based procedure to detect, classify, and track such malware in IoBT through the sequence of operational codes (OpCodes). This is achieved by transforming the OpCodes into a vector space, upon which a deep eigenspace learning technique is applied to differentiate between harmful and safe applications. For robust classification, support vector machine and n-gram sequencing algorithms are proposed. Moreover, we evaluate the quality of the proposed approach in malware recognition and its robustness against garbage code injection attacks. These results are presented on a web page that has separate components and levels of accessibility for user and admin credentials. To track the prevalence of various malware on the network, counts and trends of different malicious opcodes are displayed for both user and admin. The proposed approach will therefore benefit users, especially those who want to communicate confidential information within the network or who want to know whether a message is secure. This has also been made accessible for malware testing, which should benefit future research.
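The step of turning an opcode sequence into a vector can be sketched with plain n-gram counting. This is only an illustration under stated assumptions: the opcode trace below is made up, the function names are hypothetical, and the paper's deep eigenspace learning and support vector machine stages are not reproduced here.

```python
from collections import Counter

def opcode_ngrams(opcodes, n=2):
    """Count every run of n consecutive opcodes in a trace."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))

def to_vector(counts, vocabulary):
    """Turn n-gram counts into relative frequencies over a fixed vocabulary."""
    total = sum(counts.values()) or 1  # avoid division by zero for short traces
    return [counts.get(gram, 0) / total for gram in vocabulary]

# Hypothetical disassembled opcode trace of one application.
trace = ["push", "mov", "call", "mov", "call", "ret"]

counts = opcode_ngrams(trace, n=2)
vocabulary = sorted(counts)          # in practice, built from the whole training set
vector = to_vector(counts, vocabulary)

for gram, value in zip(vocabulary, vector):
    print(gram, round(value, 2))
```

Vectors of this kind, computed over a shared vocabulary, are the sort of input a classifier such as a support vector machine could then be trained on.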


Protein function Motif extraction based on single function category sequence alignment in yeast

The International Conference on Electrical Engineering ◽ 10.21608/iceeng.2012.32694 ◽ 2012 ◽ Vol 8 (8) ◽ pp. 1-9

Author(s): Khaled Ahmed

Keyword(s): Sequence Alignment ◽ Protein Function ◽ Single Function ◽ Motif Extraction ◽ Category Sequence


Incorporating convolutional neural networks and sequence graph transform for identifying multilabel protein Lysine PTM sites

Chemometrics and Intelligent Laboratory Systems ◽ 10.1016/j.chemolab.2020.104171 ◽ 2020 ◽ Vol 206 ◽ pp. 104171 ◽ Cited By ~ 1

Author(s): Jo Nie Sua ◽ Si Yi Lim ◽ Mulyadi Halim Yulius ◽ Xingtong Su ◽ Edward Kien Yee Yapp ◽ ...

Keyword(s): Neural Networks ◽ Convolutional Neural Networks ◽ Sequence Graph ◽ Graph Transform


Distance indexing and seed clustering in sequence graphs

Bioinformatics ◽ 10.1093/bioinformatics/btaa446 ◽ 2020 ◽ Vol 36 (Supplement_1) ◽ pp. i146-i153

Author(s): Xian Chang ◽ Jordan Eizenga ◽ Adam M Novak ◽ Jouni Sirén ◽ Benedict Paten

Keyword(s): Genetic Variation ◽ Minimum Distance ◽ Read Mapping ◽ Mapping Algorithms ◽ Graph Representations ◽ Sequence Graph ◽ Standard Linear ◽ New Generation ◽ Linear Genomes

Motivation: Graph representations of genomes are capable of expressing more genetic variation and can therefore better represent a population than standard linear genomes. However, due to the greater complexity of genome graphs relative to linear genomes, some functions that are trivial on linear genomes become much more difficult in genome graphs. Calculating distance is one such function that is simple in a linear genome but complicated in a graph context. In read mapping algorithms such distance calculations are fundamental to determining if seed alignments could belong to the same mapping.

Results: We have developed an algorithm for quickly calculating the minimum distance between positions on a sequence graph using a minimum distance index. We have also developed an algorithm that uses the distance index to cluster seeds on a graph. We demonstrate that our implementations of these algorithms are efficient and practical to use for a new generation of mapping algorithms based upon genome graphs.

Availability and implementation: Our algorithms have been implemented as part of the vg toolkit and are available at https://github.com/vgteam/vg.
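To make the problem concrete, the sketch below computes the minimum distance between two positions on a small sequence graph with a plain Dijkstra search. It is a naive baseline for illustration, not the paper's minimum distance index and not the vg API. It assumes a directed, forward-strand-only graph, positions given as (node, offset) pairs, and hypothetical node names and sequences.

```python
import heapq

def min_distance(seqs, edges, start, end):
    """Minimum number of bases walked from position start to position end.

    seqs maps node id -> sequence, edges maps node id -> list of successor ids,
    and a position is a (node id, 0-based offset) pair. Returns None when end
    cannot be reached from start.
    """
    (a, off_a), (b, off_b) = start, end
    if a == b and off_b >= off_a:
        return off_b - off_a          # both positions lie on the same node
    # Distance needed to reach the first base of each successor of the start node.
    heap = [(len(seqs[a]) - off_a, nxt) for nxt in edges.get(a, [])]
    heapq.heapify(heap)
    settled = set()
    while heap:
        dist, node = heapq.heappop(heap)
        if node == b:
            return dist + off_b       # first arrival at the target node is the shortest
        if node in settled:
            continue
        settled.add(node)
        for nxt in edges.get(node, []):
            if nxt not in settled:
                heapq.heappush(heap, (dist + len(seqs[node]), nxt))
    return None

# Hypothetical bubble: n1 branches into a short allele (n2) and a long allele (n3).
seqs = {"n1": "GAT", "n2": "T", "n3": "ACAC", "n4": "CA"}
edges = {"n1": ["n2", "n3"], "n2": ["n4"], "n3": ["n4"]}

print(min_distance(seqs, edges, ("n1", 1), ("n4", 1)))
```

Every query here searches the graph from scratch, which is the cost a precomputed distance index is meant to avoid when a mapper issues many such queries while clustering seeds.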


Part of Speech Tagging Using Part of Speech Sequence Graph

Annals of Data Science ◽ 10.1007/s40745-021-00359-4 ◽ 2021

Author(s): Pejman Gholami-Dastgerdi ◽ Mohammad-Reza Feizi-Derakhshi

Keyword(s): Part Of Speech Tagging ◽ Part Of Speech ◽ Sequence Graph ◽ Speech Tagging


PEPRF: Identification of Essential Proteins by Integrating Topological Features of PPI Network and Sequence-based Features via Random Forest

Current Bioinformatics ◽ 10.2174/1574893616666210617162258 ◽ 2021 ◽ Vol 16

Author(s): Chuanyan Wu ◽ Bentao Lin ◽ Kai Shi ◽ Qingju Zhang ◽ Rui Gao ◽ ...

Keyword(s): Random Forest ◽ Feature Selection Method ◽ Selection Method ◽ PPI Network ◽ Efficient Tool ◽ Essential Proteins ◽ Protein Protein Interaction ◽ Topological Features ◽ Sequence Graph ◽ Experimental Approaches

Background: Essential proteins play an important role in the process of life and can be identified by experimental methods and computational approaches. Experimental approaches identify essential proteins with high accuracy but are time- and resource-consuming. Objective: Herein, we present a computational model (PEPRF) to identify essential proteins based on machine learning. Methods: Different features of proteins were extracted. Topological features were extracted from the Protein-Protein Interaction (PPI) network. From the protein sequence, graph theory-based features, information-based features, composition, and physicochemical features, etc., were extracted. In total, 282 features were constructed. To select the features that contributed most to the identification, the ReliefF-based feature selection method was adopted to measure the weights of these features. As a result, 212 features were curated to train random forest classifiers. Finally, PEPRF obtained an AUC of 0.71 and an accuracy of 0.742. Conclusion: Our results show that PEPRF may be applied as an efficient tool to identify essential proteins.


Set-Sequence-Graph: A Multi-View Approach Towards Exploiting Reviews for Recommendation

Genotyping common, large structural variations in 5,202 genomes using pangenomes, the Giraffe mapper, and the vg toolkit
