Shared nomenclature and identifiers for telescopes and instruments

2018 ◽  
Vol 186 ◽  
pp. 04002
Author(s):  
Emmanuelle Perret ◽  
Mireille Louys ◽  
Mihaela Buga ◽  
Soizick Lesteven

In the context of sharing public data, science results are expected to be reproducible, and we therefore need full traceability of the origin of the data. On the documentalist side, there is a need to relate instrumental origins to the published data. We propose to define a shared nomenclature to index each publication with unique designations for facilities, telescopes and instruments, which could benefit from the Virtual Observatory work on semantics. This would help documentalists check the consistency of the instrument description in publications, or make it more explicit. Observation period, data quality and spectral coverage, for instance, may be checked by referencing a global instrumentation service which gathers the nominal observation parameters for the telescope/facility/instrument involved. Based on this indexation mechanism, bibliographic metrics for telescope/instrument usage would be easy to compute, and tracking services like the ESO telescope bibliography database (TelBib) or others would be easier to feed. This paper traces the existing initiatives and gives the example of a facility description framework reusing Virtual Observatory metadata which could be fed by the community.
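
As a purely illustrative sketch of such an indexation mechanism (the class names, identifier strings and bibcode below are hypothetical, not part of the proposed nomenclature), a publication record might carry unique facility/telescope/instrument designations and support simple usage metrics:

```python
from dataclasses import dataclass, field

@dataclass
class ObservationOrigin:
    # Hypothetical identifier strings in an IVOA-like "authority/path" style,
    # purely for illustration of the indexation idea.
    facility_id: str      # e.g. "ivo://example/facility/ESO-Paranal"
    telescope_id: str     # e.g. "ivo://example/telescope/VLT-UT1"
    instrument_id: str    # e.g. "ivo://example/instrument/FORS2"

@dataclass
class PublicationIndex:
    bibcode: str
    origins: list[ObservationOrigin] = field(default_factory=list)

    def uses_instrument(self, instrument_id: str) -> bool:
        """Support simple bibliographic metrics, e.g. counting papers per instrument."""
        return any(o.instrument_id == instrument_id for o in self.origins)

# Usage: index one (fictional) paper and ask whether it used a given instrument.
paper = PublicationIndex(
    bibcode="example-bibcode-0001",
    origins=[ObservationOrigin(
        facility_id="ivo://example/facility/ESO-Paranal",
        telescope_id="ivo://example/telescope/VLT-UT1",
        instrument_id="ivo://example/instrument/FORS2",
    )],
)
print(paper.uses_instrument("ivo://example/instrument/FORS2"))  # True
```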

2021 ◽  
Vol 22 (S6) ◽  
Author(s):  
Yasmine Mansour ◽  
Annie Chateau ◽  
Anna-Sophie Fiston-Lavier

Abstract
Background: Meiotic recombination is a vital biological process playing an essential role in the structural and functional dynamics of genomes. Genomes exhibit highly variable recombination profiles along chromosomes, associated with several chromatin states. However, euchromatin-heterochromatin boundaries are neither available nor easily obtained for non-model organisms, especially newly sequenced ones. Hence, we lack the accurate local recombination rates necessary to address evolutionary questions.
Results: Here, we propose an automated computational tool, based on the Marey map method, that identifies heterochromatin boundaries along chromosomes and estimates local recombination rates. Our method, called BREC (heterochromatin Boundaries and RECombination rate estimates), is not genome-specific and runs even on non-model genomes as long as genetic and physical maps are available. BREC is based on pure statistics and is data-driven, implying that good input data quality remains a strong requirement. Therefore, a data pre-processing module (data quality control and cleaning) is provided. Experiments show that BREC handles differing marker densities and distributions.
Conclusions: BREC's heterochromatin boundaries have been validated with cytological equivalents experimentally generated on the fruit fly Drosophila melanogaster genome, for which BREC returns congruent values. BREC's recombination rates have also been compared with previously reported estimates. Based on these promising results, we believe our tool has the potential to help bring data science into the service of genome biology and evolution. We introduce BREC as an R package with a Shiny web-based, user-friendly application, yielding a fast, easy-to-use, and broadly accessible resource. The BREC R-package is available at the GitHub repository https://github.com/GenomeStructureOrganization.
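
The abstract does not detail BREC's algorithm, but the Marey map idea it builds on can be sketched in a few lines (shown here in Python, while BREC itself is an R package): regress genetic position (cM) on physical position (Mb) and read the local recombination rate (cM/Mb) off the derivative of the fitted curve; regions where the curve flattens (near-zero derivative) are candidate heterochromatin.

```python
import numpy as np

# Minimal sketch of the Marey map idea underlying tools like BREC (not BREC's
# actual algorithm): genetic position (cM) is regressed on physical position (Mb),
# and the local recombination rate (cM/Mb) is the derivative of the fitted curve.
def local_recombination_rate(physical_mb, genetic_cm, degree=3):
    """Fit a polynomial Marey map and return its derivative evaluated at each marker."""
    coeffs = np.polyfit(physical_mb, genetic_cm, deg=degree)
    derivative = np.polyder(coeffs)
    return np.polyval(derivative, physical_mb)

# Toy markers: recombination suppression (e.g. in pericentromeric heterochromatin)
# shows up as a flat stretch of the Marey map, i.e. a near-zero derivative.
phys = np.array([1, 5, 10, 15, 20, 25, 30], dtype=float)    # Mb
gen = np.array([0, 4, 10, 12, 12.5, 18, 26], dtype=float)   # cM
print(np.round(local_recombination_rate(phys, gen), 2))
```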


2017 ◽  
Author(s):  
Amelia McNamara ◽  
Nicholas J Horton

Data wrangling is a critical foundation of data science, and wrangling of categorical data is an important component of this process. However, categorical data can introduce unique issues in data wrangling, particularly in real-world settings with collaborators and periodically updated, dynamic data. This paper discusses common problems arising from categorical variable transformations in R, demonstrates the use of factors, and suggests approaches to address data wrangling challenges. For each problem, we present at least two strategies for management, one in base R and the other from the ‘tidyverse.’ We consider several motivating examples, suggest defensive coding strategies, and outline principles for data wrangling to help ensure data quality and sound analysis.
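
The paper's strategies are given in base R and the tidyverse; purely as an assumed, language-neutral illustration of the class of pitfall involved, here is the analogous trap, and a defensive check, with pandas categoricals in Python:

```python
import pandas as pd

# A column of numeric values read as categorical ("factor-like") data.
s = pd.Series(["10", "2", "10", "30"], dtype="category")

# Pitfall: the integer codes are level indices, not the original values.
print(s.cat.codes.tolist())            # [0, 1, 0, 2] -- NOT 10, 2, 10, 30

# Defensive approach: convert the underlying strings explicitly and fail loudly
# on values that do not parse, instead of silently propagating level codes.
numeric = pd.to_numeric(s.astype(str), errors="coerce")
assert not numeric.isna().any(), "unexpected non-numeric category labels"
print(numeric.tolist())                # [10.0, 2.0, 10.0, 30.0]
```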


2021 ◽  
Vol 251 ◽  
pp. 04010
Author(s):  
Thomas Britton ◽  
David Lawrence ◽  
Kishansingh Rajput

Data quality monitoring is critical to all experiments, as it impacts the quality of any physics result. Traditionally, this is done through an alarm system that detects low-level faults, leaving higher-level monitoring to human crews. Artificial Intelligence is beginning to find its way into scientific applications, but it comes with difficulties, relying on the acquisition of new skill sets in data science, either through education or recruitment. This paper will discuss the development and deployment of the Hydra monitoring system in production at GlueX. It will show how “off-the-shelf” technologies can be rapidly developed into a production system, as well as discuss what sociological hurdles must be overcome to successfully deploy such a system. Early results from production running of Hydra will also be shared, along with an outlook for future development of Hydra.
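
As a generic sketch only, not Hydra's actual implementation, the contrast between a traditional low-level alarm and a learned monitor can be illustrated as follows: the alarm checks one scalar against a threshold, while an off-the-shelf anomaly detector (here scikit-learn's IsolationForest, an assumed stand-in) judges a whole vector of monitoring quantities at once.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Traditional low-level alarm: a single scalar against a fixed threshold.
def low_level_alarm(dead_channel_fraction, threshold=0.05):
    return dead_channel_fraction > threshold

# Assumed feature vectors: one row of summary monitoring quantities per interval,
# e.g. (mean occupancy, dead-channel fraction, mean cluster energy).
rng = np.random.default_rng(0)
good_runs = rng.normal(loc=[100.0, 0.01, 3.2], scale=[5.0, 0.005, 0.1], size=(200, 3))
detector = IsolationForest(random_state=0).fit(good_runs)

new_interval = np.array([[97.0, 0.012, 3.25]])   # resembles normal running
print(low_level_alarm(new_interval[0, 1]))       # False: below threshold
print(detector.predict(new_interval))            # [1] = consistent with good runs
```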


2018 ◽  
Vol 2 ◽  
pp. e25317
Author(s):  
Stijn Van Hoey ◽  
Peter Desmet

The ability to communicate and assess the quality and fitness for use of data is crucial to ensure maximum utility and re-use. Data consumers have certain requirements for the data they seek and need to be able to check if a data set conforms with these requirements. Data publishers aim to provide data with the highest possible quality and need to be able to identify potential errors that can be addressed with the available information at hand. The development and adoption of data publication guidelines is one approach to define and meet those requirements. However, the use of a guideline, the mapping decisions, and the requirements a dataset is expected to meet are generally not communicated with the provided data. Moreover, these guidelines are typically intended for humans only. In this talk, we will present 'whip': a proposed syntax for data specifications. With whip, one can define column-based constraints for tabular (tidy) data using a number of rules, e.g. how data is structured following Darwin Core, how a term uses controlled vocabulary values, or what the expected minimum and maximum values are. These rules are human- and machine-readable, which communicates the specifications and allows them to be validated automatically in pipelines for data publication and quality assessment, such as Kurator. Whip can be formatted as a (YAML) text file that can be provided with the published data, communicating the specifications a dataset is expected to meet. The scope of these specifications can be specific to a dataset, but they can also be used to express the expected data quality and fitness for use of a publisher, consumer or community, allowing bottom-up and top-down adoption. As such, these specifications are complementary to the core set of data quality tests currently under development by the TDWG Biodiversity Data Quality Task Group 2. Whip rules are currently generic, but more specific ones can be defined to address requirements for biodiversity information.
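
Whip's actual YAML syntax is defined by the authors; as a hedged illustration of the idea only, the sketch below uses hypothetical field names (min, max, allowed) for column-based constraints on Darwin Core terms and validates tabular rows against them:

```python
import yaml  # requires: pip install pyyaml

# Hypothetical, simplified spec in the spirit of whip (not whip's real syntax).
SPEC = yaml.safe_load("""
individualCount:
  min: 1
  max: 100
basisOfRecord:
  allowed: [HumanObservation, MachineObservation, PreservedSpecimen]
""")

def validate(rows, spec):
    """Yield (row_index, column, value, reason) for every constraint violation."""
    for i, row in enumerate(rows):
        for column, rules in spec.items():
            value = row.get(column)
            if "allowed" in rules and value not in rules["allowed"]:
                yield i, column, value, "not in controlled vocabulary"
            if "min" in rules and isinstance(value, (int, float)) and value < rules["min"]:
                yield i, column, value, "below minimum"
            if "max" in rules and isinstance(value, (int, float)) and value > rules["max"]:
                yield i, column, value, "above maximum"

rows = [
    {"individualCount": 3, "basisOfRecord": "HumanObservation"},
    {"individualCount": 0, "basisOfRecord": "Observation"},
]
for violation in validate(rows, SPEC):
    print(violation)
```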


Author(s):  
Kaleem Razzaq Malik ◽  
Tauqir Ahmad

This chapter shows the need for better techniques for mapping a Relational Database (RDB) all the way to the Resource Description Framework (RDF), including coverage of each data model's limitations and benefits. Each form of data being transformed has its own importance in the field of data science. RDB is the well-known back-end storage for information used by many kinds of applications, especially web, desktop, remote, embedded, and network-based applications. The EXtensible Markup Language (XML) is the well-known standard for transferring data among all computer-related resources regardless of their type, shape, place, capability and capacity, because its form is understandable by applications. Finally, RDF is the semantically enriched and simple format available in the Semantic Web; it comes in handy when linked data is used to make intelligent inference better and more efficient. Multiple algorithms are built to support the system's experiments and to substantiate the study.
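
As a generic illustration of the RDB-to-RDF direction discussed here (not the chapter's specific mapping algorithm), a single relational row can be re-expressed as RDF triples with rdflib, using an example namespace:

```python
from rdflib import Graph, Literal, Namespace, RDF

# Example namespace; the terms below are illustrative, not from the chapter.
EX = Namespace("http://example.org/")

row = {"id": 42, "name": "Ada Lovelace", "dept": "Mathematics"}  # one RDB row

g = Graph()
subject = EX[f"employee/{row['id']}"]          # mint a URI for the row's primary key
g.add((subject, RDF.type, EX.Employee))        # table name becomes the class
g.add((subject, EX.name, Literal(row["name"])))  # columns become properties
g.add((subject, EX.dept, Literal(row["dept"])))

# Turtle output can then be consumed by Semantic Web / linked data tooling.
print(g.serialize(format="turtle"))
```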


2020 ◽  
Author(s):  
Yasmine Mansour ◽  
Annie Chateau ◽  
Anna-Sophie Fiston-Lavier

Abstract
Motivation: Meiotic recombination is a vital biological process playing an essential role in the structural and functional dynamics of genomes. Genomes exhibit highly variable recombination profiles along chromosomes, associated with several chromatin states. However, euchromatin-heterochromatin boundaries are neither available nor easily obtained for non-model organisms, especially newly sequenced ones. Hence, we lack the accurate local recombination rates necessary to address evolutionary questions.
Results: Here, we propose an automated computational tool, based on the Marey map method, that identifies heterochromatin boundaries along chromosomes and estimates local recombination rates. Our method, called BREC (heterochromatin Boundaries and RECombination rate estimates), is not genome-specific and runs even on non-model genomes as long as genetic and physical maps are available. BREC is based on pure statistics and is data-driven, implying that good input data quality remains a strong requirement. Therefore, a data pre-processing module (data quality control and cleaning) is provided. Experiments show that BREC handles differing marker densities and distributions. BREC’s heterochromatin boundaries have been validated with cytological equivalents experimentally generated on the fruit fly Drosophila melanogaster genome, for which BREC returns congruent values. BREC’s recombination rates have also been compared with previously reported estimates. Based on these promising results, we believe our tool has the potential to help bring data science into the service of genome biology and evolution. We introduce BREC as an R package with a Shiny web-based, user-friendly application, yielding a fast, easy-to-use, and broadly accessible resource.
Availability: The BREC R-package is available at the GitHub repository https://github.com/ymansour21/BREC.


2018 ◽  
Vol 2 ◽  
pp. e26083
Author(s):  
Teresa Mayfield

At an institution without a permanent collections manager or curators, who has time to publish data or research issues on that data? Collections with little or no institutional support often benefit from passionate volunteers who continually seek ways to keep them relevant. The University of Texas at El Paso Biodiversity Collections (UTEP-BC) have been cared for in this manner by a small group of dedicated faculty and emeritus curators who have managed, with no budget, to care for the specimens, perform and publish research about them, and publish a good portion of the collections data. An IMLS grant allowed these dedicated volunteers to hire a Collections Manager who would migrate the already-published data from the collections and add unpublished specimen records from the in-house-developed FileMaker Pro database to a new collection management system (Arctos) that would allow for better records management and ease of publication. Arctos is a publicly searchable web-based system, but most collections also see the benefit of participation with biodiversity data aggregators such as the Global Biodiversity Information Facility (GBIF), iDigBio, and a multitude of discipline-specific aggregators. Publication of biodiversity data to aggregators is loaded with hidden pathways, acronyms, and tech-speak with which a curator, registrar, or collections manager may not be familiar. After navigating the process to publish the data, the reward is feedback! Now data can be improved, and everyone wins, right? In the case of UTEP-BC data, the feedback sits idle as the requirements of the grant under which the Collections Manager was hired take precedence. It will likely remain buried until long after the grant has run its course. Fortunately, the selection of Arctos as a collection management system allowed the UTEP-BC Collections Manager to confer with others publishing biodiversity data to the data aggregators. Members of the Arctos Community have carried on multiple conversations about publishing to aggregators and how to handle the resulting data quality flags. These conversations provide a synthesis of the challenges experienced by collections at over 20 institutions when publishing biodiversity data to aggregators and responding (or not) to their data quality flags. This presentation will cover the experiences and concerns of one Collections Manager as well as those of the Arctos Community related to publishing data to aggregators, deciphering their data quality flags, and developing appropriate responses to those flags.


2017 ◽  
Author(s):  
Anob M. Chakrabarti ◽  
Nejc Haberman ◽  
Arne Praznik ◽  
Nicholas M. Luscombe ◽  
Jernej Ule

Abstract
An interplay of experimental and computational methods is required to achieve a comprehensive understanding of protein-RNA interactions. Crosslinking and immunoprecipitation (CLIP) identifies endogenous interactions by sequencing RNA fragments that co-purify with a selected RNA-binding protein (RBP) under stringent conditions. Here we focus on approaches for the analysis of the resulting data and appraise the methods for peak calling, visualisation, analysis and computational modelling of protein-RNA binding sites. We advocate a combined assessment of cDNA complexity and specificity for data quality control. Moreover, we demonstrate the value of analysing sequence motif enrichment in peaks assigned from CLIP data, and of visualising RNA maps, which examine the positional distribution of peaks around regulated landmarks in transcripts. We use these to assess how variations in CLIP data quality, and in different peak calling methods, affect the insights into regulatory mechanisms. We conclude by discussing future opportunities for the computational analysis of protein-RNA interaction experiments.
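
The RNA map idea can be sketched generically (this is not the authors' exact implementation): collect peak positions relative to a set of regulated landmarks and build a positional metaprofile. A minimal Python sketch with toy coordinates:

```python
import numpy as np

def rna_map(peak_positions, landmark_positions, window=300):
    """Count peaks at each offset in [-window, window] around a set of landmarks."""
    profile = np.zeros(2 * window + 1, dtype=int)
    for landmark in landmark_positions:
        for peak in peak_positions:
            offset = peak - landmark
            if -window <= offset <= window:
                profile[offset + window] += 1
    return profile

# Toy data: crosslink peaks around one regulated landmark (e.g. a splice site).
peaks = [120, 150, 480, 510, 515]
landmarks = [500]
profile = rna_map(peaks, landmarks, window=50)
print(profile.nonzero()[0] - 50)   # offsets with at least one peak: [-20, 10, 15]
```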


Trudy VNIRO ◽  
2020 ◽  
Vol 181 ◽  
pp. 92-101
Author(s):  
M.A. Solovyeva ◽  
G.U. Pilipenko ◽  
D.M. Glazov ◽  
V.A. Peterfeld ◽  
...  

This article presents new data on the movement activity of the Baikal seal (Pusa sibirica), an endemic of Lake Baikal, obtained using satellite telemetry from July 2019 to March 2020. The average distance covered per day was 9.9 ± 2.7 (SE) km for females and 17.0 ± 2.1 km for males; the range of movements during the observation period was up to 5459 km for females and up to 8220 km for males. The most active movements occurred in August and December for males and in November for females. In October, males and females moved the least actively, which may be associated with their movement to shallow, rapidly freezing bays and sors. A sharp decline in activity also took place in January-February, when the seals probably began a “settled” ice period. These data are consistent with previous tagging of subadult Baikal seals in 1990–1991. We obtained lower values of distances covered and lower monthly averages for females compared to males. However, we found no statistically significant differences between males and females, and the question of differences in movement between subadult males and females remains open.
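
The abstract does not say how distances were computed from the telemetry fixes; one common approach, shown here only as an assumed illustration, is to sum great-circle (haversine) distances between consecutive fixes within a day:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Toy track: three fixes on Lake Baikal within one day; the daily distance is the
# sum of the legs between consecutive fixes.
fixes = [(53.20, 107.35), (53.30, 107.60), (53.45, 107.80)]
daily_km = sum(haversine_km(*fixes[i], *fixes[i + 1]) for i in range(len(fixes) - 1))
print(round(daily_km, 1))
```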

