mapping file Latest Research Papers

Benchmark study comparing liftover tools for genome conversion of epigenome sequencing data

Abstract As reference genome assemblies are updated, there is a need to convert epigenome sequence data from older genome assemblies to newer versions, to facilitate data integration and visualization on the same coordinate system. Conversion can be done by re-alignment of the original sequence data to the new assembly or by converting the coordinates of the data between assemblies using a mapping file, an approach referred to as ‘liftover’. Compared to re-alignment approaches, liftover is a more rapid and cost-effective solution. Here, we benchmark six liftover tools commonly used for conversion between genome assemblies by coordinates, including UCSC liftOver, rtracklayer::liftOver, CrossMap, NCBI Remap, flo and segment_liftover, to determine how they performed for whole genome bisulphite sequencing (WGBS) and ChIP-seq data. Our results show high correlation between the six tools for conversion of 43 WGBS paired samples. For the chromatin sequencing data, interval conversion of 366 ChIP-seq datasets showed that segment_liftover generates more reliable results than UCSC liftOver. However, we found that some regions do not always remain the same after liftover. To further increase the accuracy of liftover and avoid misleading results, we developed a three-step guideline that removes aberrant regions to ensure more robust genome conversion between reference assemblies.
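As a rough illustration of coordinate-based conversion with a mapping file, the sketch below lifts intervals through a toy table of aligned blocks. The `Block` table, the `lift_interval` helper and all coordinates are hypothetical; this does not reproduce the chain-file format or the behaviour of any of the benchmarked tools. Intervals that do not fall entirely within one block are simply reported as unmapped, which is one way a region can fail to survive liftover.

```python
from typing import NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """One aligned block: old [old_start, old_end) maps to new_start onward."""
    chrom: str
    old_start: int
    old_end: int
    new_chrom: str
    new_start: int


# Hypothetical mapping between two assemblies (toy values, not real data).
BLOCKS = [
    Block("chr1", 0, 1000, "chr1", 0),
    Block("chr1", 1200, 2000, "chr1", 1000),  # 200 bp absent from the new assembly
]


def lift_interval(chrom: str, start: int, end: int) -> Optional[Tuple[str, int, int]]:
    """Return the lifted (chrom, start, end), or None if no single block contains it."""
    for b in BLOCKS:
        if b.chrom == chrom and b.old_start <= start and end <= b.old_end:
            shift = b.new_start - b.old_start
            return (b.new_chrom, start + shift, end + shift)
    return None


lifted = lift_interval("chr1", 1300, 1400)   # lies inside the second block
unmapped = lift_interval("chr1", 900, 1300)  # spans the gap between blocks
print(lifted, unmapped)
```

Real tools differ in how they treat intervals that span gaps (splitting, partial mapping, or rejection), so the all-or-nothing rule here is only one possible design choice.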

Semantic data mapping technology to solve semantic data problem on heterogeneity aspect

International Journal of Advances in Intelligent Informatics ◽

10.26555/ijain.v3i3.131 ◽

2017 ◽

Vol 3 (3) ◽

pp. 161 ◽

Cited By ~ 3

Author(s):

Arda Yunianta ◽

Omar Mohammed Barukab ◽

Norazah Yusof ◽

Nataniel Dengen ◽

Haviluddin Haviluddin ◽

...

Keyword(s):

Real Data ◽

Database Systems ◽

Data Mapping ◽

Semantic Data ◽

Ontology Language ◽

Data Problem ◽

Mapping Technology ◽

Application Data ◽

Semantic Aspect ◽

Mapping File

The diversity of applications developed with different programming languages, application/data architectures, database systems and representations of data/information leads to heterogeneity issues. One of the challenges of heterogeneity concerns the semantic aspect of data: data that has the same name but a different meaning, or data that has a different name but the same meaning. Semantic data mapping is currently the best solution to this semantic data problem, and many semantic data mapping technologies have been used in recent years. This research aims to compare and analyze existing semantic data mapping technologies using five criteria. After the comparison and analysis, this research recommends an appropriate semantic data mapping technology based on these criteria. Furthermore, at the end of this research we apply the recommended semantic data mapping technology to real data in a specific application. The result of this research is a semantic data mapping file that contains all data structures in the application data source. This semantic data mapping file can be used to map, share and integrate with semantic data mappings from other applications and can also be integrated with the ontology language.

Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets

10.7287/peerj.preprints.1670 ◽

2016 ◽

Author(s):

Jai Ram Rideout ◽

John H Chase ◽

Evan Bolyen ◽

Gail Ackermann ◽

Antonio Gonzalez ◽

...

Keyword(s):

Data Entry ◽

File Format ◽

Tabular Data ◽

Bioinformatics Analyses ◽

Web Browser ◽

File Formats ◽

Bioinformatics Software ◽

Data Files ◽

Metadata Mapping ◽

Mapping File

Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians, and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who are entering data will often not be familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis. We present Keemei, a Google Sheets Add-on for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google’s Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports validation of two widely used tabular bioinformatics formats, the QIIME sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others. Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.
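To make the idea of tabular format validation concrete, here is a minimal sketch of a validator for tab-separated sample metadata. The three rules it checks (consistent column count, no empty cells, unique IDs in the first column) and the `validate_table` helper are assumptions chosen for illustration; they are not Keemei's actual rule set, nor the QIIME or SRGD format specifications.

```python
import csv
import io


def validate_table(text: str) -> list:
    """Return a list of (row_number, message) problems found in tab-separated text."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return [(0, "file is empty")]
    problems = []
    header = rows[0]
    seen_ids = set()
    # Data rows start at line 2 because line 1 is the header.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append((number, "wrong number of columns"))
            continue
        if any(cell.strip() == "" for cell in row):
            problems.append((number, "empty cell"))
        sample_id = row[0]
        if sample_id in seen_ids:
            problems.append((number, "duplicate sample ID"))
        seen_ids.add(sample_id)
    return problems


# Hypothetical metadata with one duplicated ID and one empty cell.
example = "sample_id\tsite\ns1\tgut\ns1\tskin\ns2\t\n"
problems = validate_table(example)
print(problems)
```

Reporting problems by row number, rather than stopping at the first error, mirrors the goal of letting the people entering data find and correct every issue in one pass.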

Multilevel Versioning

Proceedings of Balisage: The Markup Conference 2014 ◽

10.4242/balisagevol13.nordstrom01 ◽

2014 ◽

Cited By ~ 4

Author(s):

Ari Nordström

Keyword(s):

Business Rules ◽

Xml Documents ◽

Versioning System ◽

Different Levels ◽

Mapping File

“Straight” versioning systems for XML documents that produce a new version for every save, such as eXist DB's versioning extension, aren't as useful as they could be. They produce far too many versions, of which far too few are significant, and so each significant version is very hard to find or use. An old version, for example, cannot be easily located or reliably referenced. Adding check-out and check-in functionality would help alleviate some of the problems but not solve them. In this paper, I propose adding a multilevel, XML-based versioning abstraction on top of this “straight” versioning system, where any new versions are placed on different levels or stages, based on check-out and check-in operations that move the resources up or down in the versioning structure. The multilevel versioning is achieved using several different areas within the system, each of which is itself version-handled using the system's “straight” versioning extension and where each save produces a system address to a specific (straight) version in that area. These addresses are tracked and mapped to the multilevel versions in an XML-based version mapping file when a resource is checked in or out, as defined by the business rules for the abstraction.

Creating on Persistence Mapping File for Vocabulary Comprehension Item Bank of English Network Examination

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.687-691.2284 ◽

2014 ◽

Vol 687-691 ◽

pp. 2284-2287

Author(s):

Feng Hua Li

Keyword(s):

Education System ◽

Present Situation ◽

Logical Structure ◽

Item Bank ◽

Modern Education ◽

Development Direction ◽

Vocabulary Comprehension ◽

Data Persistence ◽

Examination Question ◽

Mapping File

Network examination extends the meaning of the traditional examination, makes examinations fairer, more authoritative and more reliable, accords with the modern education system, and represents the development direction of modern educational examination. To address the present situation in which students score poorly on English vocabulary comprehension, this paper develops a vocabulary comprehension item bank for English network examination, with the research built around the data persistence mapping file. First, the logical structure of the database is designed, comprising the "Examination question Main table", "Alternative vocabulary table" and "Standard answer table". Then the persistence mapping file is designed, together with the database tables and data operation classes corresponding to the mapping file, based on the NHibernate architecture. The research in this paper lays the foundation for the construction of the item bank.

Fast Display Algorithm of Large Map File Based on Memory Mapping File Technique

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.668-669.1064 ◽

2014 ◽

Vol 668-669 ◽

pp. 1064-1067

Author(s):

Dai Hong Jiang

Keyword(s):

Real Time ◽

Large Scale ◽

Image Pyramid ◽

Digital Map ◽

File Access ◽

Memory Scheduling ◽

Memory Mapping ◽

Display Algorithm ◽

Mapping File ◽

File Processing

This paper presents a fast display algorithm for large map files based on the memory-mapped file technique, designed to reduce the number of pointer moves and the amount of data scheduled in memory. The algorithm accesses a large-scale map file through a memory-mapped file to overcome the limitation of traditional file processing methods, which cannot read map image files larger than 2 GB. It is combined with image pyramid technology, which applies different resolutions of the map image depending on the display requirements, in order to achieve fast roaming. The experimental and analytical results show that the algorithm achieves real-time roaming of large-scale digital maps at the best resolution.
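The memory-mapped access described above can be sketched with Python's standard `mmap` module. This is only an illustration of the general technique, not the paper's algorithm: the `read_region` helper, the temporary file standing in for a large map image, and the offsets are all hypothetical, and the image pyramid is not modelled.

```python
import mmap
import os
import tempfile


def read_region(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` through a memory map instead of seek/read calls."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the operating system loads pages on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


# Small stand-in for a large map image file (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 16)
    path = tmp.name

region = read_region(path, 300, 4)
os.remove(path)
print(list(region))
```

Because only the pages that are actually touched are brought into memory, a viewer can slice out the tile it needs without reading the rest of the file.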

Extending MapMan Ontology to Tobacco for Visualization of Gene Expression

Dataset Papers in Biology ◽

10.7167/2013/706465 ◽

2013 ◽

Vol 2013 ◽

pp. 1-7 ◽

Cited By ~ 1

Author(s):

Maurice H. T. Ling ◽

Roel C. Rabara ◽

Prateek Tripathi ◽

Paul J. Rushton ◽

Xijin Ge

Keyword(s):

Plant Species ◽

Microarray Data ◽

Large Scale ◽

Ontology Mapping ◽

Manual Inspection ◽

Custom Made ◽

Genome Level ◽

The Many ◽

Mapping File ◽

Gene Index

Microarrays are a large-scale expression profiling method which has been used to study the transcriptome of plants under various environmental conditions. However, manual inspection of microarray data is difficult at the genome level because of the large number of genes (normally at least 30 000) and the many different processes that occur within any given plant. MapMan software, which was initially developed to visualize microarray data for Arabidopsis, has been adapted to other plant species by mapping other species onto the MapMan ontology. This paper provides a detailed procedure and the relevant computing codes to generate a MapMan ontology mapping file for tobacco (Nicotiana tabacum L.) using potato and Arabidopsis as intermediates. The mapping file can be used directly with our custom-made NimbleGen oligoarray, which contains gene sequences from both the tobacco gene space sequence and the tobacco gene index 4 (NTGI4) collection of ESTs. The generated dataset will be informative for scientists working on tobacco as their model plant by providing a MapMan ontology mapping file for tobacco and the homology between tobacco coding sequences and those of potato and Arabidopsis. Our procedure and codes can also be adapted for other plant species where the complete genome is not yet available.

mapping file
Recently Published Documents

Benchmark study comparing liftover tools for genome conversion of epigenome sequencing data

Semantic data mapping technology to solve semantic data problem on heterogeneity aspect

Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets

Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets

Multilevel Versioning

Creating on Persistence Mapping File for Vocabulary Comprehension Item Bank of English Network Examination

Fast Display Algorithm of Large Map File Based on Memory Mapping File Technique

Extending MapMan Ontology to Tobacco for Visualization of Gene Expression
