MOSGA 2: Comparative genomics and validation tools

2021 ◽  
Author(s):  
Roman Martin ◽  
Hagen Dreßler ◽  
Georges Hattab ◽  
Thomas Hackl ◽  
Matthias G Fischer ◽  
...  

Due to the rapidly growing amount of available genomic information, the need for accessible and easy-to-use analysis tools is increasing. To facilitate eukaryotic genome annotations, we created MOSGA. In this work, we show how MOSGA 2 extends it with several advanced analyses for genomic data. Since the quality of genomic data greatly impacts annotation quality, we included multiple tools to validate user-submitted genome assemblies and help ensure their quality. Moreover, thanks to the integration of comparative genomics methods, users can benefit from a broader genomic view by analyzing multiple genomic data sets simultaneously. Further, we demonstrate the new functionalities of MOSGA 2 through different use cases and practical examples. MOSGA 2 extends the already established application to quality control of genomic data, and it integrates and analyzes multiple genomes in a larger context, e.g., through phylogenetics.

2017 ◽  
Vol 6 (2) ◽  
pp. 505-521 ◽  
Author(s):  
Luděk Vecsey ◽  
Jaroslava Plomerová ◽  
Petr Jedlička ◽  
Helena Munzarová ◽  
Vladislav Babuška ◽  
...  

Abstract. This paper focuses on major issues related to the data reliability and network performance of 20 broadband (BB) stations of the Czech (CZ) MOBNET (MOBile NETwork) seismic pool within the AlpArray seismic experiments. Currently used high-resolution seismological applications require high-quality data recorded for a sufficiently long time interval at seismological observatories and during the entire time of operation of the temporary stations. In this paper we present new hardware and software tools we have been developing during the last two decades while analysing data from several international passive experiments. The new tools help to assure the high-quality standard of broadband seismic data and eliminate potential errors before supplying data to seismological centres. Special attention is paid to crucial issues like the detection of sensor misorientation, timing problems, interchange of record components and/or their polarity reversal, sensor mass centring, or anomalous channel amplitudes due to, for example, imperfect gain. Thorough data quality control should represent an integral constituent of seismic data recording, preprocessing, and archiving, especially for data from temporary stations in passive seismic experiments. Large international seismic experiments require enormous efforts from scientists from different countries and institutions to gather hundreds of stations to be deployed in the field during a limited time period. In this paper, we demonstrate the beneficial effects of the procedures we have developed for acquiring a reliable large set of high-quality data from each group participating in field experiments. The presented tools can be applied manually or automatically on data from any seismic network.


2020 ◽  
Author(s):  
Carlo Cauzzi ◽  
Jarek Bieńkowski ◽  
Susana Custódio ◽  
Christos Evangelidis ◽  
Philippe Guéguen ◽  
...  

ORFEUS (Observatories and Research Facilities for European Seismology) is a non-profit foundation that promotes seismology in the Euro-Mediterranean area through the collection, archival and distribution of seismic waveform data, metadata and closely related products. The data and services are collected or developed at national level by more than 60 contributing Institutions in Pan-Europe and further developed, integrated, standardized, homogenized and promoted through ORFEUS. Among the goals of ORFEUS are: (a) the development and coordination of waveform data products; (b) the coordination of a European data distribution system, and the support for seismic networks in archiving and exchanging digital seismic waveform data; (c) the encouragement of the adoption of best practices for seismic network operation, data quality control and data management; (d) the promotion of open access to seismic waveform data, products and services for the broader Earth science community. These goals are achieved through the development and maintenance of services targeted to a broad community of seismological data users, ranging from earth scientists to earthquake engineering practitioners. Two Service Management Committees (SMCs) are consolidated within ORFEUS devoted to managing, operating and developing (with the support of one or more Infrastructure Development Groups): (i) the European Integrated waveform Data Archive (EIDA; https://www.orfeus-eu.org/data/eida/); and (ii) the European Strong-Motion databases (SM; https://www.orfeus-eu.org/data/strong/). A new SMC is being formed to represent the community of European mobile pools. Products and services for computational seismologists are also considered for integration in the ORFEUS domain. ORFEUS services currently provide access to the waveforms acquired by ~10,000 stations in Pan-Europe, including dense temporary experiments, with strong emphasis on open, high-quality data.
Contributing to ORFEUS data archives means long-term archival, state-of-the-art quality control, improved access and increased usage. Access to data and products is ensured through state-of-the-art information and communications technologies, with strong emphasis on federated web services that considerably improve seamless user access to data gathered and/or distributed by ORFEUS institutions. The web services also facilitate the automation of downstream products. Particular attention is paid to adopting clear policies and licences, and acknowledging the crucial role played by data providers / owners, who are part of the ORFEUS community. There are significant efforts by ORFEUS participating Institutions to enhance the existing services to tackle the challenges posed by the Big Data Era, with emphasis on data quality, improved user experience, and implementation of strategies for scalability, high-volume data access and archival. ORFEUS data and services are assessed and improved through the technical and scientific feedback of a User Advisory Group (UAG), comprised of European Earth scientists with expertise encompassing a broad range of disciplines. All ORFEUS services are developed in coordination with EPOS and are largely integrated in the EPOS Data Access Portal. ORFEUS is one of the founding Parties and fundamental pillars of EPOS Seismology. This contribution presents the current products and services of ORFEUS and introduces the planned key future activities. We aim at stimulating Community feedback about the current and planned ORFEUS strategies.


2019 ◽  
Vol 9 (10) ◽  
pp. 3101-3104 ◽  
Author(s):  
Johnathan Lo ◽  
Michelle M. Jonika ◽  
Heath Blackmon

Microsatellites are repetitive DNA sequences usually found in non-coding regions of the genome. Their quantification and analysis have applications in fields from population genetics to evolutionary biology. As genome assemblies become commonplace, the need for software that can facilitate analyses has never been greater. R packages that can analyze genomic data are particularly important, since R is one of the most popular software environments for biologists. We created an R package, micRocounter, to quantify microsatellites. We have optimized our package for speed, accessibility, and portability, making the automated analysis of large genomic data sets feasible. Computationally intensive algorithms were built in C++ to increase speed. Tests using benchmark datasets show a 200-fold improvement in speed over existing software. A moderately sized genome of 500 Mb can be processed in under 50 seconds. Results are output as an object in R, increasing accessibility and flexibility for practitioners.


2017 ◽  
Author(s):  
Luděk Vecsey ◽  
Jaroslava Plomerová ◽  
Petr Jedlička ◽  
Helena Munzarová ◽  
Vladislav Babuška ◽  
...  

Abstract. This paper focuses on major issues related to data reliability and MOBNET network performance in the AlpArray seismic experiments, in which twenty temporary broad-band stations of the Czech MOBNET pool of mobile stations have been involved. Currently used high-resolution scientific methods require high-quality data recorded for a sufficiently long time interval at observatories and during the full time of operation of temporary stations. In this paper we present both new hardware and software tools that help to assure the high-quality standard of broad-band seismic data. Special attention is paid to issues like the detection of sensor mis-orientation, timing problems, exchange of record components and/or their polarity reversal, sensor mass centring, or anomalous channel amplitudes due to, e.g., imperfect gain. Thorough data-quality control should represent an integral constituent of seismic data recording, pre-processing and archiving, especially for data from temporary stations in passive seismic experiments. Large international seismic experiments require enormous efforts from scientists from different countries and institutions to gather hundreds of stations to be deployed in the field during a limited time period. In this paper, we demonstrate the beneficial effects of the procedures we have developed for obtaining a sufficiently large set of high-quality and reliable data from each group participating in field experiments.


2019 ◽  
Vol 2 (1) ◽  
pp. 19-37 ◽  
Author(s):  
Mikel Hernaez ◽  
Dmitri Pavlichin ◽  
Tsachy Weissman ◽  
Idoia Ochoa

Recently, there has been growing interest in genome sequencing, driven by advances in sequencing technology, in terms of both efficiency and affordability. These developments have allowed many to envision whole-genome sequencing as an invaluable tool for both personalized medical care and public health. As a result, increasingly large and ubiquitous genomic data sets are being generated. This poses a significant challenge for the storage and transmission of these data. Already, it is more expensive to store genomic data for a decade than it is to obtain the data in the first place. This situation calls for efficient representations of genomic information. In this review, we emphasize the need for designing specialized compressors tailored to genomic data and describe the main solutions already proposed. We also give general guidelines for storing these data and conclude with our thoughts on the future of genomic formats and compressors.


2021 ◽  
pp. 000276422110216
Author(s):  
Kazimierz M. Slomczynski ◽  
Irina Tomescu-Dubrow ◽  
Ilona Wysmulek

This article proposes a new approach to analyze protest participation measured in surveys of uneven quality. Because single international survey projects cover only a fraction of the world’s nations in specific periods, researchers increasingly turn to ex-post harmonization of different survey data sets not a priori designed as comparable. However, very few scholars systematically examine the impact of the survey data quality on substantive results. We argue that the variation in source data, especially deviations from standards of survey documentation, data processing, and computer files—proposed by methodologists of Total Survey Error, Survey Quality Monitoring, and Fitness for Intended Use—is important for analyzing protest behavior. In particular, we apply the Survey Data Recycling framework to investigate the extent to which indicators of attending demonstrations and signing petitions in 1,184 national survey projects are associated with measures of data quality, controlling for variability in the questionnaire items. We demonstrate that the null hypothesis of no impact of measures of survey quality on indicators of protest participation must be rejected. Measures of survey documentation, data processing, and computer records, taken together, explain over 5% of the intersurvey variance in the proportions of the populations attending demonstrations or signing petitions.


Author(s):  
Antonella D. Pontoriero ◽  
Giovanna Nordio ◽  
Rubaida Easmin ◽  
Alessio Giacomel ◽  
Barbara Santangelo ◽  
...  
