Accessible molecular phylogenomics at no cost: obtaining 14 new mitogenomes for the ant subfamily Pseudomyrmecinae from public data

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6271 ◽  
Author(s):  
Gabriel A. Vieira ◽  
Francisco Prosdocimi

The advent of Next Generation Sequencing has reduced sequencing costs and increased the number of genomic projects across a huge number of organismal taxa, generating an unprecedented amount of publicly available genomic datasets. Often, researchers use only a tiny fraction of the genomic data they produce, the part most relevant to their work. This allows the generated data to be recycled in further projects worldwide. The assembly of complete mitogenomes is frequently overlooked, though it is useful for understanding evolutionary relationships among taxa, especially those with poor mtDNA sampling at the level of genera and families. This is exactly the case for ants (Hymenoptera: Formicidae) and more specifically for the subfamily Pseudomyrmecinae, a group of arboreal ants with several cases of convergent coevolution, for which no complete mitochondrial sequence was available. In this work, we assembled, annotated and performed comparative genomics analyses of 14 new complete mitochondrial genomes from Pseudomyrmecinae species, relying solely on public datasets available from the Sequence Read Archive (SRA). We used all complete mitogenomes available for ants to study gene order conservation and to generate two phylogenetic trees using (i) a concatenated set of 13 mitochondrial genes and (ii) the whole mitochondrial sequences. Even though the tree topologies diverged subtly from each other (and from previous studies), our results confirm several known relationships and generate new evidence for sister clade classification inside the Pseudomyrmecinae clade. We also performed a synteny analysis for Formicidae and identified possible sites at which nucleotide insertions happened in the mitogenomes of pseudomyrmecine ants. Using a data mining/bioinformatics approach, the current work increased the number of complete mitochondrial genomes available for ants from 15 to 29, demonstrating the unique potential of public databases for mitogenomics studies. 
The wide applications of mitogenomes in research and the presence of mitochondrial data in different public dataset types make the “no budget mitogenomics” approach ideal for comprehensive molecular studies, especially for undersampled taxa.
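The "concatenated set of 13 mitochondrial genes" mentioned above can be illustrated with a minimal sketch. It assumes each gene has already been aligned separately and is held as a mapping from taxon name to aligned sequence; the function name `concatenate_alignments`, the gap-padding rule for missing genes, and the toy sequences are illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Join per-gene alignments into one supermatrix (hypothetical helper).

    Each alignment maps a taxon name to its aligned sequence. A taxon that
    lacks a gene is padded with gap characters of that gene's width.
    """
    # Collect every taxon seen in any gene, in a stable order.
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}

    for aln in gene_alignments:
        # All sequences of one gene must have the same aligned length.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene alignment must share one length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * width))

    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data only: two short "genes" for two placeholder taxa.
cox1 = {"taxon_A": "ATGC", "taxon_B": "ATGA"}
nad1 = {"taxon_A": "TT-"}
supermatrix = concatenate_alignments([cox1, nad1])
```

In this sketch the concatenated matrix would then be passed to a tree-building program; which program and substitution model to use is outside what the abstract states.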

2018 ◽  
Author(s):  
Gabriel A Vieira ◽  
Francisco Prosdocimi

The advent of Next Generation Sequencing has reduced sequencing costs and increased the number of genomic projects across a huge number of organismal taxa, generating an unprecedented amount of publicly available genomic datasets. Often, researchers use only a tiny fraction of the genome data they produce, the part most relevant to their work. This allows the generated data to be recycled in further projects worldwide. The assembly of complete mitogenomes is frequently overlooked, though it is useful for understanding evolutionary relationships among taxa, especially those with poor mtDNA sampling at the level of genera and families. This is exactly the case for ants (Hymenoptera: Formicidae) and more specifically for the subfamily Pseudomyrmecinae, a group of arboreal ants with several cases of convergent coevolution, for which no complete mitochondrial sequence was available. In this work, we assembled, annotated and performed comparative genomics analyses of 14 new complete mitochondrial genomes from Pseudomyrmecinae species, relying solely on public datasets available from the Sequence Read Archive (SRA). We used all complete mitogenomes available for ants to study gene order conservation and to generate two phylogenetic trees using (i) a concatenated set of 13 mitochondrial genes and (ii) the whole mitochondrial sequences. Even though the tree topologies diverged subtly from each other (and from previous studies), our results confirm several known relationships and generate new evidence for sister clade classification inside the Pseudomyrmecinae clade. We also performed a synteny analysis for Formicidae and identified possible sites at which nucleotide insertions happened in the mitogenomes of pseudomyrmecine ants. Using a data mining/bioinformatics approach, the current work increased the number of complete mitochondrial genomes available for ants from 15 to 29, demonstrating the unique potential of public databases for mitogenomics studies. 
The wide applications of mitogenomes in research and the presence of mitochondrial data in different public dataset types make the “no budget mitogenomics” approach ideal for comprehensive molecular studies, especially for undersampled taxa.



2021 ◽  
Vol 10 (3) ◽  
pp. 154
Author(s):  
Robert Jeansoulin

Providing long-term data about the evolution of railway networks in Europe may help us understand how European Union (EU) member states behave in the long term, and how they can comply with present EU recommendations. This paper proposes a methodology for collecting data about railway stations at the maximal extent of the French railway network, a century ago. The expected outcome is a geocoded dataset of French railway stations (gares), which (a) links gares to each other and (b) links gares with French communes, the basic administrative level for statistical information. Present stations are well documented in public data, but thousands of past stations are sparsely recorded, not geocoded, and often ignored, except in volunteered geographic information (VGI), contributed either collaboratively through Wikipedia or individually. VGI is very valuable in keeping track of that heritage, and remote sensing, including aerial photography, is often the last chance to obtain precise locations. The approach is a series of steps: (1) meta-analysis of the public datasets, (2) three-step fusion (measure, decision, combination) between public datasets, (3) computer-assisted geocoding for gares where fusion fails, (4) integration of additional gares gathered from VGI, (5) automated quality control, indicating where quality is questionable. These five families of methods form a comprehensive computer-assisted reconstruction process (CARP), which constitutes the core of this paper. The outcome is a reliable dataset, in GeoJSON format under an open license, encompassing (by January 2021) more than 10,700 items linked to about 7500 of the 35,500 communes of France: that is 60% more than recorded before. 
This work demonstrates: (a) it is possible to reconstruct transport data from the past, at a national scale; (b) the value of remote sensing and of VGI is considerable in completing public sources from an historical perspective; (c) data quality can be monitored all along the process and (d) the geocoded outcome is ready for a large variety of further studies with statistical data (demography, density, space coverage, CO2 simulation, environmental policies, etc.).
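The measure, decision, combination sequence of the fusion step can be sketched for a single pair of station records. Everything specific here is an assumption made for illustration: the record layout, the haversine distance as the measure, the exact-name test and 1 km threshold as the decision, and coordinate averaging as the combination. The abstract does not say which measures or rules the paper uses.

```python
import math
from typing import Dict, Optional


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def fuse(rec_a: Dict, rec_b: Dict, max_km: float = 1.0) -> Optional[Dict]:
    """Fuse two records if they appear to describe the same station (hypothetical rule)."""
    # Measure: spatial distance and case-insensitive name agreement.
    distance = haversine_km(rec_a["lat"], rec_a["lon"], rec_b["lat"], rec_b["lon"])
    same_name = rec_a["name"].casefold() == rec_b["name"].casefold()

    # Decision: reject the pair unless both criteria hold.
    if not same_name or distance > max_km:
        return None

    # Combination: average the coordinates and keep track of both sources.
    return {
        "name": rec_a["name"],
        "lat": (rec_a["lat"] + rec_b["lat"]) / 2,
        "lon": (rec_a["lon"] + rec_b["lon"]) / 2,
        "sources": sorted({rec_a["source"], rec_b["source"]}),
    }


# Placeholder records, not real stations.
a = {"name": "Gare Exemple", "lat": 48.0, "lon": 2.0, "source": "dataset_a"}
b = {"name": "gare exemple", "lat": 48.001, "lon": 2.0, "source": "dataset_b"}
far = {"name": "Gare Exemple", "lat": 49.0, "lon": 2.0, "source": "dataset_b"}
merged = fuse(a, b)
```

Pairs for which `fuse` returns `None` would correspond, in the workflow described above, to the cases handed over to computer-assisted geocoding.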


2020 ◽  
Vol 34 (01) ◽  
pp. 865-872
Author(s):  
Soham Pal ◽  
Yash Gupta ◽  
Aditya Shukla ◽  
Aditya Kanade ◽  
Shirish Shevade ◽  
...  

Machine learning models are increasingly being deployed in practice. Machine Learning as a Service (MLaaS) providers expose such models to queries by third-party developers through application programming interfaces (APIs). Prior work has developed model extraction attacks, in which an attacker extracts an approximation of an MLaaS model by making black-box queries to it. We design ActiveThief – a model extraction framework for deep neural networks that makes use of active learning techniques and unannotated public datasets to perform model extraction. It does not expect strong domain knowledge or access to annotated data on the part of the attacker. We demonstrate that (1) it is possible to use ActiveThief to extract deep classifiers trained on a variety of datasets from image and text domains, while querying the model with as few as 10-30% of samples from public datasets, (2) the resulting model exhibits a higher transferability success rate of adversarial examples than prior work, and (3) the attack evades detection by the state-of-the-art model extraction detection method, PRADA.
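To make the role of active learning concrete, the sketch below ranks unlabeled public samples by the entropy of a substitute model's predicted class probabilities and picks the most uncertain ones to send to the target API. Entropy-based uncertainty sampling is one common active learning strategy and is used here as an assumption; the abstract does not state which strategies ActiveThief applies, and the names `entropy` and `select_queries` are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one predicted probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs: List[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` pool samples with the highest entropy.

    `pool_probs[i]` is the substitute model's softmax output for sample i.
    """
    ranked = sorted(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:budget]


# Toy predictions for three unlabeled samples over three classes.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    [0.60, 0.30, 0.10],  # intermediate
]
chosen = select_queries(pool, budget=2)
```

In an extraction loop of this kind, the chosen samples would be labeled by the black-box model, added to the substitute model's training set, and the selection repeated until the query budget is spent.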


2012 ◽  
Vol 3 (1) ◽  
Author(s):  
Nell Sedransk ◽  
Linda J. Young ◽  
Cliff Spiegelman

Making published, scientific research data publicly available can benefit scientists and policy makers only if there is sufficient information for these data to be intelligible. Thus the necessary meta-data go beyond the scientific, technological detail and extend to the statistical approach and methodologies applied to these data. The statistical principles that give integrity to researchers’ analyses and interpretations of their data require documentation. This is true when the intent is to verify or validate the published research findings; it is equally true when the intent is to utilize the scientific data in conjunction with other data or new experimental data to explore complex questions; and it is profoundly important when the scientific results and interpretations are taken outside the world of science to establish a basis for policy, for legal precedent or for decision-making. When research draws on already public data bases, e.g., a large federal statistical data base or a large scientific data base, selection of data for analysis, whether by selection (subsampling) or by aggregating, is specific to that research so that this (statistical) methodology is a crucial part of the meta-data. Examples illustrate the role of statistical meta-data in the use and reuse of these public datasets and the impact on public policy and precedent.


Phytotaxa ◽  
2014 ◽  
Vol 162 (4) ◽  
pp. 223 ◽  
Author(s):  
Richard Verano Dumilag ◽  
Arturo Lluisma

Although the phylogeny of the genus Kappaphycus has been the subject of a number of published studies, the phylogenetic placement of Kappaphycus inermis within the genus has remained unresolved. In this study, we sought to determine the phylogenetic affinities of K. inermis with the other congeneric species using mitochondrial (cox1 and cox2–3 spacer) and plastid (rbcL and RuBisCo spacer) markers, using specimens collected from northwestern Philippines. Morphological observations of the collected materials confirmed the presence of key morphological features that distinguish K. inermis from the other members of Kappaphycus. Molecular analyses based on the organellar genetic markers revealed that K. inermis is indeed phylogenetically distinct from K. alvarezii, K. striatus, K. cottonii and K. malesianus, a species which was recently erected based on specimens from Malaysia. The Philippine K. inermis specimens formed a sister clade to K. malesianus (also referred to as “Aring-aring” in Malaysia) in phylogenetic trees inferred from cox1, cox2–3 spacer and rbcL, but not the RuBisCo spacer whose sequence is identical in both K. inermis and K. malesianus. The analysis also revealed that specimens of unidentified Kappaphycus species collected from two other sites in the Philippines and referred to as “Aring-aring” by local farmers/traders were varieties of K. alvarezii and K. striatus.


Author(s):  
Yun-Young Hwang et al.

In order to make public data more useful, it is necessary to provide relevant datasets that meet the needs of users. We introduce a method for deriving linkages between fields of structured datasets provided by public data portals. We define a dataset and the connectivity between datasets; the connectivity is based on the metadata of each dataset and the linkage between the actual data field names and values. We constructed standard field names and, based on this standard, established the relationships between the datasets. This paper covers 31,692 structured datasets (as of May 31, 2020) among the public data portal datasets. We extracted 1,185,846 field names from over 30,000 datasets. Analysis of the field names showed that those related to spatial information were the most common, at 35%. This paper therefore verified the method of deriving relations between datasets by focusing on the field names classified as spatial information, for which we defined spatial standard field names. To derive similar field names, we extracted field names related to spatial concepts such as locations, coordinates, addresses, and zip codes used in public datasets. The standard field names for spatial information were designed and yielded a 43% cooperation rate across the 31,692 datasets. In the future, we plan to add further similar field names to improve the dataset cooperation rate of the spatial information standard.
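The idea of linking datasets through standard field names can be sketched as follows. The synonym table, the normalization rule, and the `linked_fraction` measure are all hypothetical; in particular, whether the share of datasets with at least one link corresponds to the "cooperation rate" reported above is an assumption, since the abstract does not define that rate.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical standard spatial field names and the variants mapped to them.
STANDARD_FIELDS: Dict[str, Set[str]] = {
    "address": {"address", "addr", "street_address"},
    "zip_code": {"zip", "zipcode", "zip_code", "postal_code"},
    "latitude": {"lat", "latitude"},
    "longitude": {"lon", "lng", "longitude"},
}


def to_standard(field_name: str) -> Optional[str]:
    """Map a raw field name to a standard name, or None if it is not recognized."""
    key = field_name.strip().lower().replace(" ", "_").replace("-", "_")
    for standard, variants in STANDARD_FIELDS.items():
        if key in variants:
            return standard
    return None


def link_datasets(datasets: Dict[str, List[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return, for each linked pair of datasets, the standard fields they share."""
    standardized: Dict[str, Set[str]] = {}
    for name, fields in datasets.items():
        mapped = {to_standard(f) for f in fields}
        mapped.discard(None)
        standardized[name] = mapped

    links: Dict[Tuple[str, str], Set[str]] = {}
    for a, b in combinations(sorted(standardized), 2):
        shared = standardized[a] & standardized[b]
        if shared:
            links[(a, b)] = shared
    return links


def linked_fraction(datasets: Dict[str, List[str]]) -> float:
    """Share of datasets that take part in at least one link."""
    linked = {name for pair in link_datasets(datasets) for name in pair}
    return len(linked) / len(datasets) if datasets else 0.0


# Toy catalogue with made-up dataset and field names.
catalogue = {
    "parks": ["Park Name", "Lat", "Lng"],
    "schools": ["school", "latitude", "longitude", "Zip Code"],
    "budget": ["year", "amount"],
}
links = link_datasets(catalogue)
```

Extending the synonym table with further similar field names, as the authors propose for future work, would in this sketch simply enlarge the sets in `STANDARD_FIELDS`.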


Author(s):  
Taghi Ghassemi-Khademi ◽  
Mohammad Ali Oshaghi ◽  
Hassan Vatandoost ◽  
Seyed Massoud Madjdzadeh ◽  
Mohammad Amin Gorouhi

Background: Among the blood-sucking insects, Anopheles mosquitoes have a very special position, because they transmit parasites of the genus Plasmodium, which cause malaria, one of the main vector-borne diseases worldwide. The aim of this review study was to evaluate the utility of complete mitochondrial genomes in the phylogenetic classification of the species of Anopheles. Methods: The complete mitochondrial genome sequences belonging to 28 species of the genus Anopheles (n=32) were downloaded from NCBI. The phylogenetic trees were constructed using the ML, NJ, ME, and Bayesian inference methods. Results: In general, the results of the present survey revealed that the complete mitochondrial genomes perform very accurately in recognizing the taxonomic and phylogenetic status of these species and provide a higher level of support than trees based on individual or partial mitochondrial genes, so that by using them we can meticulously reconstruct and modify Anopheles classification. Conclusion: Understanding the taxonomic position of Anopheles can be a very effective step in better planning for controlling these malaria vectors in the world and will improve our knowledge of their evolutionary biology.


2019 ◽  
Author(s):  
Gang Liu ◽  
Lizhi Zhou ◽  
Guanghong Zhao

The phylogenetic relationships between owls and nightjars are rather complex and controversial. To clarify these relationships, we determined the complete mitochondrial genomes of Glaucidium cuculoides, Otus scops, Glaucidium brodiei, Caprimulgus indicus, and Strix leptogrammica, and estimated phylogenetic trees based on the complete mitochondrial genomes and aligned sequences from closely related species that were obtained from GenBank. The complete mitochondrial genomes were 17392, 17317, 17549, 17536, and 16307 bp in length. All mitochondrial genomes contained 13 protein-coding genes, two rRNAs, 22 tRNAs, and a putative control region. All mitochondrial genomes except for that of Strix leptogrammica contained a pseudo-control region. ATG, GTG, and ATA are the usual start codons, whereas TAA is the most frequent stop codon. All tRNAs in the new mtDNAs could be folded into canonical cloverleaf secondary structures except for tRNASer (AGY) and tRNALeu (CUN), which lack the “DHU” arm. The phylogenetic relationships demonstrated that Strigiformes and Caprimulgiformes are independent orders, and Aegothelidae is a family within Caprimulgiformes. The results also revealed that Accipitriformes is an independent order, and Pandionidae and Sagittariidae are independent families. The results also supported that Apodiformes is polyphyletic, and hummingbirds (family Trochilidae) belong to Apodiformes. Piciformes was most distantly related to all other analyzed orders.


Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1466 ◽  
Author(s):  
Eduardo Casilari ◽  
Raúl Lora-Rivera ◽  
Francisco García-Lagos

Due to the repercussions of falls on both the health and self-sufficiency of older people and on the financial sustainability of healthcare systems, the study of wearable fall detection systems (FDSs) has gained much attention in recent years. The core of an FDS is the algorithm that discriminates falls from conventional Activities of Daily Life (ADLs). This work presents and evaluates a convolutional deep neural network applied to identify fall patterns based on the measurements collected by a transportable tri-axial accelerometer. In contrast with most works in the related literature, the evaluation is performed against a wide set of public data repositories containing the traces obtained from diverse groups of volunteers during the execution of ADLs and mimicked falls. Although the method can yield very good results when it is hyper-parameterized for a certain dataset, the global evaluation with the other repositories highlights the difficulty of extrapolating to other testbeds the network architecture that was configured and optimized for a particular dataset.

