No budget mitogenomics: Assembling 14 new mitogenomes for the ant subfamily Pseudomyrmecinae from public data

Author(s):  
Gabriel A Vieira ◽  
Francisco Prosdocimi

The advent of Next Generation Sequencing has reduced sequencing costs and increased the number of genomic projects across a huge range of taxa, generating an unprecedented number of publicly available genomic datasets. Often, only a tiny fraction of the genome data produced by researchers, the part of outstanding relevance, is used in their work. This allows the data generated to be recycled in further projects worldwide. The assembly of complete mitogenomes is frequently overlooked, though it is useful for understanding evolutionary relationships among taxa, especially those with poor mtDNA sampling at the level of genera and families. This is exactly the case for ants (Hymenoptera: Formicidae) and more specifically for the subfamily Pseudomyrmecinae, a group of arboreal ants with several cases of convergent coevolution and for which no complete mitochondrial sequence was available. In this work, we assembled, annotated and performed comparative genomic analyses of 14 new complete mitochondrial genomes from Pseudomyrmecinae species, relying solely on public datasets available from the Sequence Read Archive (SRA). We used all complete mitogenomes available for ants to study gene order conservation and to generate two phylogenetic trees using (i) a concatenated set of 13 mitochondrial genes and (ii) the whole mitochondrial sequences. Even though the tree topologies diverged subtly from each other (and from previous studies), our results confirm several known relationships and generate new evidence for sister clade classification inside the Pseudomyrmecinae clade. We also performed a synteny analysis for Formicidae and identified possible sites at which nucleotide insertions happened in the mitogenomes of pseudomyrmecine ants. Using a data mining/bioinformatics approach, the current work increased the number of complete mitochondrial genomes available for ants from 15 to 29, demonstrating the unique potential of public databases for mitogenomics studies.
The wide applications of mitogenomes in research and the presence of mitochondrial data in different public dataset types make the “no budget mitogenomics” approach ideal for comprehensive molecular studies, especially for undersampled taxa.
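The abstract mentions building one tree from a concatenated set of 13 mitochondrial genes. As a rough illustration of what such a concatenation step can look like, the sketch below joins per-gene alignments into one sequence per taxon. It is not the authors' pipeline: the function name `concatenate_alignments`, the taxon labels, the toy sequences, and the choice to pad missing genes with gaps are all assumptions made for this example.

```python
from typing import Dict, List


def concatenate_alignments(
    genes: Dict[str, Dict[str, str]], gene_order: List[str]
) -> Dict[str, str]:
    """Join per-gene alignments into one concatenated sequence per taxon.

    `genes` maps gene name -> {taxon: aligned sequence}. A taxon missing
    from a gene alignment is padded with gaps (an assumption of this sketch).
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for gene in gene_order:
        aln = genes[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Use the aligned sequence if present, otherwise a run of gaps.
            supermatrix[taxon] += aln.get(taxon, "-" * width)
    return supermatrix


# Toy input with two genes and two hypothetical taxa (not real data).
toy_genes = {
    "cox1": {"taxon_A": "ATGGCA", "taxon_B": "ATGGCT"},
    "nad1": {"taxon_A": "TTAGG"},
}
matrix = concatenate_alignments(toy_genes, ["cox1", "nad1"])
print(matrix)
```

In a real analysis the 13 per-gene alignments would come from an alignment program, and the concatenated matrix would be passed to a phylogenetic inference tool; neither step is shown here.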

2018 ◽  
Author(s):  
Gabriel A Vieira ◽  
Francisco Prosdocimi



PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6271 ◽  
Author(s):  
Gabriel A. Vieira ◽  
Francisco Prosdocimi

The advent of Next Generation Sequencing has reduced sequencing costs and increased the number of genomic projects across a huge range of taxa, generating an unprecedented number of publicly available genomic datasets. Often, only a tiny fraction of the genomic data produced by researchers, the part of outstanding relevance, is used in their work. This allows the data generated to be recycled in further projects worldwide. The assembly of complete mitogenomes is frequently overlooked, though it is useful for understanding evolutionary relationships among taxa, especially those with poor mtDNA sampling at the level of genera and families. This is exactly the case for ants (Hymenoptera: Formicidae) and more specifically for the subfamily Pseudomyrmecinae, a group of arboreal ants with several cases of convergent coevolution and for which no complete mitochondrial sequence was available. In this work, we assembled, annotated and performed comparative genomic analyses of 14 new complete mitochondrial genomes from Pseudomyrmecinae species, relying solely on public datasets available from the Sequence Read Archive (SRA). We used all complete mitogenomes available for ants to study gene order conservation and to generate two phylogenetic trees using (i) a concatenated set of 13 mitochondrial genes and (ii) the whole mitochondrial sequences. Even though the tree topologies diverged subtly from each other (and from previous studies), our results confirm several known relationships and generate new evidence for sister clade classification inside the Pseudomyrmecinae clade. We also performed a synteny analysis for Formicidae and identified possible sites at which nucleotide insertions happened in the mitogenomes of pseudomyrmecine ants. Using a data mining/bioinformatics approach, the current work increased the number of complete mitochondrial genomes available for ants from 15 to 29, demonstrating the unique potential of public databases for mitogenomics studies.
The wide applications of mitogenomes in research and the presence of mitochondrial data in different public dataset types make the “no budget mitogenomics” approach ideal for comprehensive molecular studies, especially for undersampled taxa.


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Taras K Oleksyk ◽  
Walter W Wolfsberger ◽  
Alexandra M Weber ◽  
Khrystyna Shchubelka ◽  
Olga T Oleksyk ◽  
...  

Abstract. Background: The main goal of this collaborative effort is to provide genome-wide data for a previously underrepresented population in Eastern Europe, and to cross-validate genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes from an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that was resequenced by Illumina NovaSeq6000 S4 at high coverage. Results: The genome data were searched for genomic variation represented in this population, and a number of variant types are reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest survey to date of genetic variation in Ukraine, creating a public reference resource intended to support medical research in a large understudied population. Conclusions: Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information, bringing forth many novel, endemic and medically related alleles.


2021 ◽  
Vol 10 (3) ◽  
pp. 154
Author(s):  
Robert Jeansoulin

Providing long-term data about the evolution of railway networks in Europe may help us understand how European Union (EU) member states behave over the long term, and how they can comply with present EU recommendations. This paper proposes a methodology for collecting data about railway stations at the maximal extent of the French railway network, a century ago. The expected outcome is a geocoded dataset of French railway stations (gares), which: (a) links gares to each other, (b) links gares with French communes, the basic administrative level for statistical information. Present stations are well documented in public data, but thousands of past stations are sparsely recorded, not geocoded, and often ignored, except in volunteer geographic information (VGI), either collaboratively through Wikipedia or individually. VGI is very valuable in keeping track of that heritage, and remote sensing, including aerial photography, is often the last chance to obtain precise locations. The approach is a series of steps: (1) meta-analysis of the public datasets, (2) three-step fusion (measure, decision, combination) between public datasets, (3) computer-assisted geocoding for gares where fusion fails, (4) integration of additional gares gathered from VGI, (5) automated quality control, indicating where quality is questionable. These five families of methods form a comprehensive computer-assisted reconstruction process (CARP), which constitutes the core of this paper. The outcome is a reliable dataset (in GeoJSON format under an open license) encompassing, as of January 2021, more than 10,700 items linked to about 7500 of the 35,500 communes of France: that is 60% more than recorded before.
This work demonstrates: (a) it is possible to reconstruct transport data from the past, at a national scale; (b) the value of remote sensing and of VGI is considerable in completing public sources from an historical perspective; (c) data quality can be monitored all along the process and (d) the geocoded outcome is ready for a large variety of further studies with statistical data (demography, density, space coverage, CO2 simulation, environmental policies, etc.).
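The measure, decision, and combination steps named in the abstract can be illustrated with a small sketch. The abstract does not say how each step is computed, so everything below is an assumption for illustration: the function names, the record layout, the use of great-circle distance plus name similarity as the measure, and the threshold values.

```python
import math
from difflib import SequenceMatcher
from typing import Optional


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def fuse_station(a: dict, b: dict, max_km: float = 1.0, min_name_sim: float = 0.8) -> Optional[dict]:
    """Fuse two candidate station records, or return None if they do not match."""
    # Measure: spatial distance and name similarity (hypothetical criteria).
    distance = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    # Decision: both thresholds are assumptions of this sketch.
    if distance > max_km or name_sim < min_name_sim:
        return None
    # Combination: average the coordinates and keep track of both sources.
    return {
        "name": a["name"],
        "lat": (a["lat"] + b["lat"]) / 2,
        "lon": (a["lon"] + b["lon"]) / 2,
        "sources": [a["source"], b["source"]],
    }


# Toy records (invented coordinates, not real stations).
rec_a = {"name": "Gare Exemple", "lat": 48.0, "lon": 2.0, "source": "dataset_A"}
rec_b = {"name": "gare exemple", "lat": 48.001, "lon": 2.001, "source": "dataset_B"}
rec_far = {"name": "Gare Exemple", "lat": 49.0, "lon": 2.0, "source": "dataset_B"}

merged = fuse_station(rec_a, rec_b)
rejected = fuse_station(rec_a, rec_far)
print(merged, rejected)
```

Records for which the decision step returns `None` would correspond to the cases the abstract hands over to computer-assisted geocoding.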


2020 ◽  
Vol 36 (20) ◽  
pp. 5115-5116 ◽  
Author(s):  
August E Woerner ◽  
Jennifer Churchill Cihlar ◽  
Utpal Smart ◽  
Bruce Budowle

Abstract. Motivation: Assays in mitochondrial genomics rely on accurate read mapping and variant calling. However, there are known and unknown nuclear paralogs whose genetic properties differ fundamentally from those of the mitochondrial genome. Such paralogs complicate the interpretation of mitochondrial genome data and confound variant calling. Results: Remove the Numts! (RtN!) was developed to categorize reads from massively parallel sequencing data based not on the expected properties and sequence identities of paralogous nuclear-encoded mitochondrial sequences, but on sequence similarity to a large database of publicly available mitochondrial genomes. RtN! removes low-level sequencing noise and mitochondrial paralogs without impacting variant calling, whereas competing methods were shown to remove true variants from mitochondrial mixtures. Availability and implementation: https://github.com/Ahhgust/RtN. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 34 (01) ◽  
pp. 865-872
Author(s):  
Soham Pal ◽  
Yash Gupta ◽  
Aditya Shukla ◽  
Aditya Kanade ◽  
Shirish Shevade ◽  
...  

Machine learning models are increasingly being deployed in practice. Machine Learning as a Service (MLaaS) providers expose such models to queries by third-party developers through application programming interfaces (APIs). Prior work has developed model extraction attacks, in which an attacker extracts an approximation of an MLaaS model by making black-box queries to it. We design ActiveThief – a model extraction framework for deep neural networks that makes use of active learning techniques and unannotated public datasets to perform model extraction. It does not expect strong domain knowledge or access to annotated data on the part of the attacker. We demonstrate that (1) it is possible to use ActiveThief to extract deep classifiers trained on a variety of datasets from image and text domains, while querying the model with as few as 10-30% of samples from public datasets, (2) the resulting model exhibits a higher transferability success rate of adversarial examples than prior work, and (3) the attack evades detection by the state-of-the-art model extraction detection method, PRADA.
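The abstract says ActiveThief relies on active learning to choose which public samples to send to the target model, but it does not name the selection strategies used. As one common example of such a strategy, the sketch below ranks unlabeled samples by the entropy of a substitute model's predicted class probabilities and picks the most uncertain ones. The function names, the probability values, and the budget are assumptions for illustration, not the paper's implementation.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions: List[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` samples with the highest predictive entropy."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical substitute-model outputs for three unlabeled samples.
substitute_predictions = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, high entropy
    [0.60, 0.30, 0.10],  # in between
]
to_query = select_most_uncertain(substitute_predictions, budget=2)
print(to_query)
```

In an extraction loop, the selected samples would be sent to the target API, and the returned labels would be used to retrain the substitute model before the next selection round.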


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Jia-Qian Liu ◽  
Wen-Xing Li ◽  
Jun-Juan Zheng ◽  
Qing-Nan Tian ◽  
Jing-Fei Huang ◽  
...  

Abstract. Background: Various apolipoproteins widely distributed among vertebrata play key roles in lipid metabolism and have a direct correlation with human diseases as diagnostic markers. However, the evolutionary history of apolipoproteins across species remains unclear. Nine human apolipoproteins and well-annotated genome data of 30 species were used to identify 210 apolipoprotein family members distributed among species from fish to humans. Our study focused on the evolution of nine exchangeable apolipoproteins (ApoA-I/II/IV/V, ApoC-I~IV and ApoE) from Chondrichthyes, Holostei, Teleostei, Amphibia, Sauria (including Aves), Prototheria, Marsupialia and Eutheria. Results: In this study, we report the overall distribution and the frequent gain and loss evolutionary events of apolipoprotein family members in vertebrata. Phylogenetic trees of orthologous apolipoproteins indicated evident divergence between species evolution and apolipoprotein phylogeny. Successive gain and loss events were found by evaluating the presence and absence of apolipoproteins in the context of species evolution. For example, only ApoA-I and ApoA-IV occurred in cartilaginous fish as ancient apolipoproteins. ApoA-II, ApoE, and ApoC-I/ApoC-II were found in Holostei, Coelacanthiformes, and Teleostei, respectively, but the latter three apolipoproteins were absent from Aves. ApoC-I was also absent from Cetartiodactyla. The apolipoprotein ApoC-III emerged in terrestrial animals, and ApoC-IV first arose in Eutheria. The results indicate that the order of the emergence of apolipoproteins is most likely ApoA-I/ApoA-IV, ApoE, ApoA-II, ApoC-I/ApoC-II, ApoA-V, ApoC-III, and ApoC-IV. Conclusions: This study reveals not only the phylogeny of apolipoprotein family members in species from Chondrichthyes to Eutheria but also the occurrence and origin of new apolipoproteins.
The broad perspective of gain and loss events and the evolutionary scenario of apolipoproteins across vertebrata provide a significant reference for the research of apolipoprotein function and related diseases.


2012 ◽  
Vol 3 (1) ◽  
Author(s):  
Nell Sedransk ◽  
Linda J. Young ◽  
Cliff Spiegelman

Making published, scientific research data publicly available can benefit scientists and policy makers only if there is sufficient information for these data to be intelligible. Thus the necessary meta-data go beyond the scientific, technological detail and extend to the statistical approach and methodologies applied to these data. The statistical principles that give integrity to researchers’ analyses and interpretations of their data require documentation. This is true when the intent is to verify or validate the published research findings; it is equally true when the intent is to utilize the scientific data in conjunction with other data or new experimental data to explore complex questions; and it is profoundly important when the scientific results and interpretations are taken outside the world of science to establish a basis for policy, for legal precedent or for decision-making. When research draws on already public data bases, e.g., a large federal statistical data base or a large scientific data base, selection of data for analysis, whether by selection (subsampling) or by aggregating, is specific to that research so that this (statistical) methodology is a crucial part of the meta-data. Examples illustrate the role of statistical meta-data in the use and reuse of these public datasets and the impact on public policy and precedent.


Phytotaxa ◽  
2014 ◽  
Vol 162 (4) ◽  
pp. 223 ◽  
Author(s):  
Richard Verano Dumilag ◽  
Arturo Lluisma

Although the phylogeny of the genus Kappaphycus has been the subject of a number of published studies, the phylogenetic placement of Kappaphycus inermis within the genus has remained unresolved. In this study, we sought to determine the phylogenetic affinities of K. inermis with the other congeneric species using mitochondrial (cox1 and cox2–3 spacer) and plastid (rbcL and RuBisCo spacer) markers, using specimens collected from northwestern Philippines. Morphological observations of the collected materials confirmed the presence of key morphological features that distinguish K. inermis from the other members of Kappaphycus. Molecular analyses based on the organellar genetic markers revealed that K. inermis is indeed phylogenetically distinct from K. alvarezii, K. striatus, K. cottonii and K. malesianus, a species which was recently erected based on specimens from Malaysia. The Philippine K. inermis specimens formed a sister clade to K. malesianus (also referred to as “Aring-aring” in Malaysia) in phylogenetic trees inferred from cox1, cox2–3 spacer and rbcL, but not the RuBisCo spacer whose sequence is identical in both K. inermis and K. malesianus. The analysis also revealed that specimens of unidentified Kappaphycus species collected from two other sites in the Philippines and referred to as “Aring-aring” by local farmers/traders were varieties of K. alvarezii and K. striatus.


Author(s):  
Yun-Young Hwang et al.

In order to make public data more useful, it is necessary to provide relevant datasets that meet the needs of users. We introduce a method for linking datasets by deriving linkages between the fields of structured datasets provided by public data portals. We define a dataset and the connectivity between datasets; this connectivity is based on the metadata of each dataset and on the linkage between the actual data field names and values. We constructed standard field names and, based on this standard, established the relationships between the datasets. This paper covers 31,692 structured datasets (as of May 31, 2020) among the public data portal datasets. We extracted 1,185,846 field names from over 30,000 datasets. Analysis of the field names showed that those related to spatial information were the most common, at 35%. This paper therefore verifies the method of deriving relations between datasets by focusing on the field names classified as spatial information, for which we defined spatial standard field names. To derive similar field names, we extracted field names related to spatial concepts such as locations, coordinates, addresses, and zip codes used in public datasets. With the standard field names designed for spatial information, we derived a 43% cooperation rate across the 31,692 datasets. In the future, we plan to add further similar field names to improve the dataset cooperation rate of the spatial information standard.
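The idea of linking datasets through standard field names can be sketched as follows. The abstract does not give the standard field list, the matching rules, or a formal definition of the cooperation rate, so the synonym table, the function names, the toy datasets, and the "share of datasets with at least one link" measure below are all assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical synonym table mapping standard spatial field names to raw variants.
STANDARD_FIELDS: Dict[str, Set[str]] = {
    "address": {"address", "addr", "road_address"},
    "zip_code": {"zip", "zipcode", "postal_code"},
    "latitude": {"lat", "latitude", "y_coord"},
    "longitude": {"lon", "lng", "longitude", "x_coord"},
}


def standardize(field: str) -> Optional[str]:
    """Map a raw field name to a standard field name, or None if unknown."""
    key = field.strip().lower()
    for standard, variants in STANDARD_FIELDS.items():
        if key in variants:
            return standard
    return None


def link_datasets(datasets: Dict[str, List[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Link every pair of datasets that share at least one standard field."""
    standardized = {}
    for name, fields in datasets.items():
        mapped = {standardize(f) for f in fields}
        standardized[name] = {m for m in mapped if m is not None}
    links = {}
    for a, b in combinations(sorted(standardized), 2):
        shared = standardized[a] & standardized[b]
        if shared:
            links[(a, b)] = shared
    return links


def linked_share(datasets: Dict[str, List[str]], links: Dict[Tuple[str, str], Set[str]]) -> float:
    """Fraction of datasets that take part in at least one link."""
    linked = {name for pair in links for name in pair}
    return len(linked) / len(datasets)


# Toy datasets with invented field names.
toy = {
    "parks": ["Name", "LAT", "LNG"],
    "libraries": ["title", "latitude", "longitude", "zip"],
    "budgets": ["year", "amount"],
}
links = link_datasets(toy)
share = linked_share(toy, links)
print(links, share)
```

Here two of the three toy datasets are linked through their coordinate fields, so the share is 2/3; adding more variants to the synonym table is the kind of extension the abstract describes as future work.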

