flat file
Recently Published Documents


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jorge Oliveira ◽  
Miguel Antunes ◽  
Claudia P. Godinho ◽  
Miguel C. Teixeira ◽  
Isabel Sá-Correia ◽  
...  

Abstract. Numerous genomes are sequenced and made available to the community through the NCBI portal. However, unlike gene function annotation, the annotation of promoter sequences and the underlying prediction of regulatory associations is mostly unavailable, severely limiting the ability to interpret genome sequences from a functional genomics perspective. Here we present an approach where one can download a genome of interest from NCBI in the GenBank Flat File (.gbff) format and, with a minimum set of commands, have all the information parsed, organized and made available through the platform web interface. The new genomes are also compared with a given reference genome in search of homologous genes, shared regulatory elements and predicted transcription associations. We present this approach within the context of Community YEASTRACT of the YEASTRACT+ portal, thus benefiting from immediate access to all the comparative genomics queries offered in the YEASTRACT+ portal. Besides the yeast community, other communities can install the platform independently, without any constraints. In this work, we exemplify the usefulness of the presented tool, within Community YEASTRACT, in constructing a dedicated database and analysing the genome of the highly promising oleaginous red yeast species Rhodotorula toruloides, which is currently poorly studied at the genome and transcriptome levels and has limited genome editing tools. Regulatory prediction is based on the conservation of promoter sequences and available regulatory networks. The case study examined focuses on the Haa1 transcription factor, a key regulator of yeast resistance to acetic acid, an important inhibitor of industrial bioconversion of lignocellulosic hydrolysates. The new tool described here led to the prediction of a RtHaa1 regulon with expected impact on the optimization of R. toruloides robustness for lignocellulosic and pectin-rich residue biorefinery processes.


Author(s):  
Sara Sgobba ◽  
Chiara Felicetta ◽  
Giovanni Lanzano ◽  
Fadel Ramadan ◽  
Maria D’Amico ◽  
...  

Abstract. We present an extended and updated version of the worldwide NEar-Source Strong-motion (NESS) flat file, which includes an increased number of moderate-to-strong earthquakes recorded in the epicentral area, new source metadata and intensity measures, comprising spectral displacements and fling-step amplitudes retrieved from the extended baseline correction processing of velocity time series. The resulting dataset consists of 81 events with moment magnitude ≥ 5.5 and hypocentral depth shallower than 40 km, corresponding to 1189 three-component waveforms, which are selected to have a maximum source-to-site distance within one fault length. Details on the flat files, metadata, ground-motion parameters, processing scheme, and statistical findings are presented and discussed. The analysis of these data makes it possible to recognize distinctive features (such as pulse-like waveforms, large vertical components, and hanging-wall effects) and to assess their impact on near-source seismic motion. As an example, we use the NESS2.0 dataset for calibrating an empirical correction factor of a regional ground-motion model (GMM) mainly based on far-field records. In this way, we can adjust the median predictions to account for near-source effects not fully captured by the reference model. The final goal of this work is to promote the use of the NESS2 flat file as a tool to disseminate qualified and referenced near-source data and metadata, with a view to improving the constraints of GMMs (both empirical and physics-based) close to the source.


2021 ◽  
Vol 19 (6) ◽  
pp. 2343-2370
Author(s):  
Federico Passeri ◽  
Cesare Comina ◽  
Sebastiano Foti ◽  
Laura Valentina Socco

Abstract. The compilation and maintenance of experimental databases are of crucial importance in all research fields, allowing researchers to develop and test new methodologies. In this work, we present a flat-file database of experimental dispersion curves and shear wave velocity profiles, mainly from active surface wave testing, but also including data from passive surface wave testing and invasive methods. The Polito Surface Wave flat-file Database (PSWD) is a collection of experimental measurements gathered over the past 25 years at different Italian sites. The database content is discussed in this paper to evaluate some statistical properties of surface wave test results. Comparisons with other methods for shear wave velocity measurements are also considered. The main novelty of this work is the homogeneity of the PSWD in terms of processing and interpretation methods. A common processing strategy and a new inversion approach were applied to all the data in the PSWD to guarantee consistency. The PSWD can be useful for further correlation studies and is made available as a reference benchmark for the validation and verification of novel interpretation procedures by other researchers.


2021 ◽  
Author(s):  
Karyna Rodriguez ◽  
Neil Hodgson

Seismic data has been and continues to be the main tool for hydrocarbon exploration. Storing very large quantities of seismic data, as well as making it easily accessible and with machine learning functionality, is the way forward to gain regional and local understanding of petroleum systems. Seismic data has been made available as a streamed service through a web-based platform allowing seismic data access on the spot, from large datasets stored in the cloud. A data lake can be defined as transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. The global library of data has been deconstructed from the rigid flat file format traditionally associated with seismic and transformed into a distributed, scalable, big data store. This allows for rapid access, complex queries, and efficient use of computer power – fundamental criteria for enabling Big Data technologies such as deep learning.

This data lake concept is already changing the way we access seismic data, enhancing the efficiency of gaining insights into any hydrocarbon basin. Examples include the identification of potentially prolific mixed turbidite/contourite systems in the Trujillo Basin offshore Peru, together with important implications of BSR-derived geothermal gradients, which are much higher than expected in a fore arc setting, opening new exploration opportunities. Another example is de-risking and ranking of offshore Malvinas Basin blocks by gaining new insights into areas until very recently considered to be non-prospective. Further de-risking was achieved by carrying out an in-depth source rock analysis in the Malvinas and conjugate southern South Africa Basins. Additionally, the data lake enabled the development of machine learning algorithms for channel recognition which were successfully applied to data offshore Australia and Norway.

“On demand” regional seismic dataset access is proving invaluable in our efforts to make hydrocarbon exploration more efficient and successful. Machine learning algorithms are helping to automate the more mechanical tasks, leaving time for the more valuable task of analysing the results. The geological insights gained by combining these 2 aspects confirm the value of seismic data lakes.


Geosciences ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 67
Author(s):  
Erika Schiappapietra ◽  
Chiara Felicetta ◽  
Maria D’Amico

We present an upgraded processing scheme (eBASCO, extended BASeline COrrection) to remove the baseline of strong-motion records by means of a piece-wise linear detrending of the velocity time history. Unlike standard processing schemes, eBASCO does not apply any filtering to remove the low-frequency content of the signal. This approach preserves both the long-period near-source ground motion, characterized by a one-sided pulse in the velocity trace, and the offset at the end of the displacement trace (fling-step). The software is suitable for rapid identification of fling-containing waveforms within large strong-motion datasets. The ground displacement of about 600 three-component near-source waveforms has been recovered with the aim of (1) extensively testing the capability of eBASCO to capture the long-period content of near-source records, and (2) compiling a qualified strong-motion flat file for calibrating attenuation models for peak ground displacement (PGD), 5% damped displacement response spectra (DS), and permanent displacement amplitude (PD). The results provide a more accurate estimate of ground motions that can be adopted for different engineering purposes, such as performance-based seismic design of structures.


2021 ◽  
Author(s):  
Alex H Wagner ◽  
Lawrence Babb ◽  
Gil Alterovitz ◽  
Michael Baudis ◽  
Matthew Brush ◽  
...  

Abstract. Maximizing the personal, public, research, and clinical value of genomic information will require that clinicians, researchers, and testing laboratories exchange genetic variation data reliably. Developed by a partnership among national information resource providers, public initiatives, and diagnostic testing laboratories under the auspices of the Global Alliance for Genomics and Health (GA4GH), the Variation Representation Specification (VRS, pronounced “verse”) is an extensible framework for the semantically precise and computable representation of variation that complements contemporary human-readable and flat file standards for variation representation. VRS objects are designed to be semantically precise representations of variation, and leverage this design to enable unique, federated identification of molecular variation. We describe the components of this framework, including the terminology and information model, schema, data sharing conventions, and a reference implementation, each of which is intended to be broadly useful and freely available for community use. The specification, documentation, examples, and community links are available at https://vrs.ga4gh.org/.


Geosciences ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 15
Author(s):  
Sara Sgobba ◽  
Giovanni Lanzano ◽  
Francesca Pacor ◽  
Chiara Felicetta

Near-source effects can amplify seismic ground motion, placing large demands on structures; their identification and characterization are therefore fundamental for engineering applications. Among the most relevant features, forward-directivity effects may generate near-fault records characterized by a large velocity pulse and an unusual response spectral shape amplified in a narrow frequency band. In this paper, we explore the main statistical features of acceleration and displacement response spectra of a suite of 230 pulse-like signals (impulsive waveforms) contained in the NESS1 (NEar Source Strong-motion) flat-file. These pulse-like signals are analyzed in terms of pulse period and pulse azimuthal orientation. We highlight the most relevant differences between the pulse-like spectra and the ordinary (i.e., no-pulse) ones, and quantify the contribution of the pulse through a corrective factor of the spectral ordinates. Results show that the proposed empirical factors are able to capture the amplification effect induced by near-fault directivity, and thus they could usefully be included in the framework of probabilistic seismic hazard analysis to adjust ground-motion model (GMM) predictions.


2020 ◽  
Vol 2 ◽  
pp. 1-1
Author(s):  
Dalia E. Varanka

Abstract. Knowledge graphs (KG) are a virtual layer connecting disparate databases into an interoperable framework. Though the application of KGs for enterprises is increasing, geospatial KG design is not common. This presentation describes U.S. Geological Survey (USGS) research to build KGs for integrating geospatial and non-spatial attribute semantics of topographic data. Those geographic information system databases are composed of various feature types and metadata attributes organized into various themes and stored in different data formats, such as geodatabases, flat-file spreadsheets, and raster images. The system being created tests two research objectives: 1) the feasibility of semantic technology approaches for geospatial data within the context of national topographic data and 2) the contribution to building a body of knowledge about system architecture for geospatial ontologies and linked open data. This presentation discusses the context of topographic data semantics, the problem and aims of building the system, and the integrated KG framework. The basic workflow and operations of the system architecture, which consists of open-source software, are described. The architecture modifies existing software with unique solutions such as performing GeoSPARQL queries with Postgres, a relational table datastore, and a map interface with extensions to support linked data queries as browseable graphs. As public spatial data infrastructure, the system is made available as a Docker Container on GitHub.

