public repositories
Recently Published Documents


TOTAL DOCUMENTS

218
(FIVE YEARS 147)

H-INDEX

11
(FIVE YEARS 5)

2022 ◽  
Author(s):  
Claudio Filipi Gonçalves dos Santos ◽  
João Paulo Papa

Several image processing tasks, such as image classification and object detection, have been significantly improved using Convolutional Neural Networks (CNN). Many architectures, such as ResNet and EfficientNet, achieved outstanding results on at least one dataset at the time of their creation. A critical factor in training concerns the network’s regularization, which prevents the structure from overfitting. This work analyzes several regularization methods developed in the last few years, showing significant improvements for different CNN models. The works are classified into three main areas. The first is called “data augmentation”, where all the techniques focus on performing changes in the input data. The second, named “internal changes”, describes procedures to modify the feature maps generated by the neural network or the kernels. The last one, called “label”, concerns transforming the labels of a given input. This work presents two main differences compared to other available surveys about regularization: (i) the papers gathered in the manuscript are not older than five years, and (ii) all works referred to here are reproducible, i.e., their code is available in public repositories or they have been directly implemented in some framework, such as TensorFlow or Torch.
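As an illustration of the “label” category, the sketch below applies label smoothing, one widely used label-transforming regularizer. The abstract does not say which label methods the survey covers, so the choice of technique, the function name `smooth_labels`, and the default `epsilon` are assumptions made for this example only.

```python
def smooth_labels(class_index, num_classes, epsilon=0.1):
    """Return a label-smoothed target distribution for one sample.

    Instead of a one-hot vector, the true class receives 1 - epsilon plus its
    share of the smoothing mass, and every class receives epsilon / num_classes.
    """
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index out of range")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")

    off_value = epsilon / num_classes          # mass spread over all classes
    target = [off_value] * num_classes
    target[class_index] += 1.0 - epsilon       # remaining mass on the true class
    return target


# Hypothetical example: 4 classes, true class is index 2.
smoothed = smooth_labels(class_index=2, num_classes=4, epsilon=0.1)
print(smoothed)
```

The smoothed vector still sums to 1, so it can be used directly as the target of a cross-entropy loss.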


2022 ◽  
Vol 29 (1) ◽  
pp. 91-101
Author(s):  
Gustavo Caetano Borges ◽  
Julio Cesar Dos Reis ◽  
Claudia Bauzer Medeiros

Scientific research in all fields has advanced in complexity and in the amount of data generated. The heterogeneity of data repositories, data meaning and their metadata standards makes this problem even more significant. In spite of several proposals to find and retrieve research data from public repositories, there is still a need for more comprehensive retrieval solutions. In this article, we specify and develop a mechanism to search for scientific data that takes advantage of metadata records and semantic methods. We present the conception of our architecture and how we have implemented it in a use case in the agriculture domain.


2022 ◽  
Author(s):  
Eric Lai ◽  
David Becker ◽  
Pius Brzoska ◽  
Tyler Cassens ◽  
Jeremy Davis-Turak ◽  
...  

The rapid emergence of new SARS-CoV-2 variants raises a number of public health questions including the capability of diagnostic tests to detect new strains, the efficacy of vaccines, and how to map the geographical distribution of variants to better understand patterns of transmission and possible load on healthcare resources. Next-Generation Sequencing (NGS) is the primary method for detecting and tracing the emergence of new variants, but it is expensive, and it can take weeks before sequence data is available in public repositories. Here, we describe a Polymerase Chain Reaction (PCR)-based genotyping approach that is significantly less expensive, accelerates reporting on SARS-CoV-2 variants, and can be implemented in any testing lab performing PCR. Specific Single Nucleotide Polymorphisms (SNPs) and indels are identified that have high positive percent agreement (PPA) and negative percent agreement (NPA) compared to NGS for the major genotypes that circulated in 2021. Using a 48-marker panel, testing on 1,128 retrospective samples yielded a PPA and NPA in the 96.3 to 100% and 99.2 to 100% range, respectively, for the top 10 most prevalent lineages. The effect on PPA and NPA of reducing the number of panel markers was also explored. In addition, with the emergence of Omicron, we also developed an Omicron genotyping panel that distinguishes the Delta and Omicron variants using four (4) highly specific SNPs. Data from testing demonstrates the capability to use the panel to rapidly track the growing prevalence of the Omicron variant in the United States in December 2021.
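For readers unfamiliar with the two agreement metrics, the sketch below computes PPA and NPA for one lineage, treating the NGS call as the comparator. The function name and the toy calls are hypothetical and are not data from the study.

```python
def percent_agreement(test_calls, reference_calls):
    """Compute (PPA, NPA) in percent for one lineage.

    Each element is True if the sample was called as the lineage of interest.
    `reference_calls` is the comparator method (NGS in the abstract above).
    Returns None for a metric whose denominator is zero.
    """
    if len(test_calls) != len(reference_calls):
        raise ValueError("call lists must have the same length")

    tp = fp = tn = fn = 0
    for test, ref in zip(test_calls, reference_calls):
        if ref and test:
            tp += 1          # both methods call the lineage
        elif ref and not test:
            fn += 1          # reference positive, test negative
        elif not ref and test:
            fp += 1          # reference negative, test positive
        else:
            tn += 1          # both methods agree it is not the lineage

    ppa = 100.0 * tp / (tp + fn) if (tp + fn) else None
    npa = 100.0 * tn / (tn + fp) if (tn + fp) else None
    return ppa, npa


# Toy example with five samples (made-up values).
pcr = [True, True, False, False, True]
ngs = [True, True, True, False, False]
ppa, npa = percent_agreement(pcr, ngs)
print(f"PPA = {ppa:.1f}%, NPA = {npa:.1f}%")
```

PPA is the share of reference-positive samples that the test also calls positive; NPA is the share of reference-negative samples that the test also calls negative.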


2021 ◽  
Vol 12 ◽  
Author(s):  
Neha Jha ◽  
Dwight Hall ◽  
Akshay Kanakan ◽  
Priyanka Mehta ◽  
Ranjeet Maurya ◽  
...  

Globally, SARS-CoV-2 has moved from one tide to another with ebbs in between. Genomic surveillance has greatly aided the detection and tracking of the virus and the identification of the variants of concern (VOC). The knowledge gained from genomic surveillance helps public health and healthcare officials in a populous country like India to plan in advance. An integrative analysis of the publicly available datasets in GISAID from India reveals the differential distribution of clades, lineages, gender, and age over a year (Apr 2020–Mar 2021). The significant insights include early evidence of the B.1.617 and B.1.1.7 lineages in specific states of India. Pan-India longitudinal data highlighted that B.1.36* was the predominant clade in India until January–February 2021, after which it has gradually been replaced by the B.1.617.1 lineage, from December 2020 onward. Regional analysis of the spread of SARS-CoV-2 indicated that B.1.617.3 was first seen in India in the month of October in the state of Maharashtra, while the now most prevalent strain, B.1.617.2, was first seen in Bihar and subsequently spread to the states of Maharashtra, Gujarat, and West Bengal. To enable a real-time understanding of the transmission and evolution of the SARS-CoV-2 genomes, we built a transmission map available on https://covid19-indiana.soic.iupui.edu/India/EmergingLineages/April2020/to/March2021. Based on our analysis, the rate estimate for divergence in our dataset was 9.48e-4 substitutions per site/year for SARS-CoV-2. This would enable pandemic preparedness with the addition of future sequencing data from India available in the public repositories for tracking and monitoring the VOCs and variants of interest (VOI). This would aid decision making from the public health perspective.
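To put the reported rate in more tangible terms, the short calculation below converts substitutions per site per year into expected substitutions per genome. The genome length is an assumption (the 29,903-nt Wuhan-Hu-1 reference, NC_045512.2); the abstract does not state which length the authors used.

```python
# Rate reported in the abstract (substitutions per site per year).
RATE_PER_SITE_PER_YEAR = 9.48e-4

# Assumption: length of the Wuhan-Hu-1 reference genome, in nucleotides.
GENOME_LENGTH_NT = 29_903

# Expected substitutions accumulated across the whole genome.
subs_per_genome_per_year = RATE_PER_SITE_PER_YEAR * GENOME_LENGTH_NT
subs_per_genome_per_month = subs_per_genome_per_year / 12

print(f"{subs_per_genome_per_year:.1f} substitutions per genome per year")
print(f"{subs_per_genome_per_month:.1f} substitutions per genome per month")
```

Under this assumption the rate corresponds to roughly 28 substitutions per genome per year, or a little over two per month.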


2021 ◽  
Author(s):  
Megha Mathur ◽  
Sumeet Patiyal ◽  
Anjali Dhall ◽  
Shipra Jain ◽  
Ritu Tomer ◽  
...  

In the past few decades, public repositories of nucleotide sequences have grown at exponential rates. This poses a major challenge to researchers trying to predict the structure and function of nucleotide sequences. To annotate the function of nucleotide sequences using machine learning techniques, it is important to compute features/attributes of these sequences. In the last two decades, several software packages and platforms have been developed to derive a wide range of features from nucleotide sequences. To complement the existing methods, here we present a platform named Nfeature, developed for computing a wide range of features of DNA and RNA sequences. It comprises three major modules, namely Composition, Correlation, and Binary profiles. The Composition module computes different types of composition, including mono-/di-/tri-nucleotide composition, reverse complement composition, and pseudo composition. The Correlation module computes various types of correlation, including auto-correlation, cross-correlation, and pseudo-correlation. Similarly, the Binary profile module computes binary profiles based on nucleotides, di-nucleotides, and di-/tri-nucleotide properties. Nfeature also computes the entropy of sequences, repeats in sequences, and the distribution of nucleotides in sequences. In addition to computing features over the whole sequence, it can compute features from part of a sequence, such as split-composition, N-terminal, and C-terminal. In a nutshell, Nfeature combines existing features with a number of novel features such as nucleotide repeat index, distance distribution, entropy, binary profile, and properties. The tool computes a total of 29217 and 14385 features for DNA and RNA sequences, respectively. To provide a highly efficient and user-friendly tool, we have developed a standalone package and a web-based platform (https://webs.iiitd.edu.in/raghava/nfeature).
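A minimal sketch of three of the feature types named above (k-mer composition, reverse complement, and sequence entropy) is given here for orientation. It is not Nfeature's code or API: the function names are hypothetical, the definitions are the textbook ones, and Nfeature's exact normalization may differ.

```python
import math
from collections import Counter
from itertools import product


def kmer_composition(seq, k=1, alphabet="ACGT"):
    """Fraction of each possible k-mer among all overlapping k-mers in seq."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(total))
    # Report every possible k-mer, including those with zero occurrences.
    return {"".join(p): counts["".join(p)] / total
            for p in product(alphabet, repeat=k)}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the nucleotide distribution in seq."""
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq.upper())
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical example sequence.
dna = "ATGCGCGTTA"
print(kmer_composition(dna, k=1))               # mono-nucleotide composition
print(kmer_composition(dna, k=2)["CG"])         # one di-nucleotide frequency
print(kmer_composition(reverse_complement(dna), k=1))  # reverse complement composition
print(shannon_entropy(dna))
```

For RNA, the same functions could be used with `alphabet="ACGU"` and a matching complement table.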


2021 ◽  
Author(s):  
Gáspár Lukács ◽  
Andreas Gartus

Conducting research via the internet is a formidable and ever-increasingly popular option for behavioral scientists. However, it is widely acknowledged that web-browsers are not optimized for research: In particular, the timing of display changes (e.g., a stimulus appearing on the screen) still leaves room for improvement. So far, the typically recommended best (or least worst) timing method has been a single requestAnimationFrame (RAF) JavaScript function call within which one would give the display command and obtain the time of that display change. In our Study 1, we assessed two alternatives: Calling the RAF twice consecutively, or calling the RAF during a continually ongoing independent loop of recursive RAF calls. While the former has shown little or no improvement as compared to single RAF calls, with the latter we significantly and substantially improved overall precision, and achieved practically faultless precision in most practical cases. In Study 2, we reassessed this “RAF loop” timing method with images in combination with three different display methods: We found that the precision remained high when using either visibility or opacity changes, while drawing on a canvas element consistently led to comparatively lower precision. We recommend the “RAF loop” display timing method for improved precision in future studies, and visibility or opacity changes when using image stimuli. We have also shared, in public repositories, the easy-to-use code for this method, exactly as employed in our studies.


2021 ◽  
Author(s):  
Amanda Warr ◽  
Caitlin Newman ◽  
Nicky Craig ◽  
Ingrida Vendelė ◽  
Rizalee Pilare ◽  
...  

African Swine Fever virus (ASFV) is the causative agent of a deadly, panzootic disease, infecting wild and domesticated suid populations. Long confined to the African continent, the virus began to spread around the globe after an outbreak of a particularly infectious variant in Georgia in 2007, severely impacting pork production and local economies. The virus is highly contagious and has a mortality of up to 100% in domestic pigs. It is critical to track the spread of the virus, detect variants associated with pathology, and implement biosecurity measures in the most effective way to limit its spread. Due to its size and other limitations, the 170-190 kbp DNA virus has not been well sequenced, with fewer than 200 genome sequences available in public repositories. Here we present an efficient, low-cost method of sequencing ASFV at scale. The method uses tiled PCR amplification of the virus to achieve greater coverage, multiplexability and accuracy on a portable sequencer than is achievable using shotgun sequencing. We also present Lilo, a pipeline for assembling tiled amplicon data from viral or microbial genomes without relying on polishing against a reference, allowing for assembly of structural variation and hypervariable regions that other methods fail on. The resulting ASFV genomes are near complete, lacking only parts of the highly repetitive 3’ and 5’ telomeric regions, and have a high level of accuracy. Our results will allow sequencing of ASFV at optimal efficiency and high throughput to monitor and act on the spread of the virus.


Diversity ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 628
Author(s):  
Serena Cavallero ◽  
Margherita Montalbano Di Filippo ◽  
Emiliano Mori ◽  
Andrea Viviano ◽  
Claudio De Liberato ◽  
...  

Adult specimens of Trichuris sp. collected from crested porcupines (Hystrix cristata) from Italy were characterized using an integrative taxonomic approach involving morphological and molecular tools. The morphological features of this Trichuris sp. were compared to data already available for Trichuris spp. from Hystrix sp., revealing diagnostic traits, such as spicule length in males or vulva shape in females, which distinguish this Trichuris sp. from the other species. Evidence from sequence analysis of the partial mitochondrial COX1 region indicated that the taxon under study is a distinct lineage. Biometrical and genetic data suggested this Trichuris sp. to be a valid and separate taxon. However, since molecular data from other Trichuris spp. infecting Hystrix, such as T. infundibulus, T. hystricis, T. javanica, T. landak and T. lenkorani, are missing in public repositories, the number and identity of distinct lineages able to infect porcupines remain only partially defined.


2021 ◽  
Author(s):  
Yu-Ning Huang ◽  
Anushka Rajesh ◽  
Ram Ayyala ◽  
Aditya Sarkar ◽  
Ruiwei Guo ◽  
...  

The scientific community has accumulated enormous amounts of genomic data stored in specialized public repositories. Genomic data is easily accessible and available from public genomic repositories, allowing the biomedical community to effectively share omics datasets. However, improperly annotated or incomplete metadata accompanying the raw omics data can negatively impact the utility of shared data for secondary analysis. In this study, we perform a comprehensive analysis of 137 studies covering 18,559 samples across seven therapeutic fields to assess the completeness of metadata accompanying omics studies in both publications and their related online repositories, and make observations about how the process of data sharing could be made more reliable. This analysis involved an initial literature survey to find studies in the seven therapeutic fields, namely Alzheimer's disease, acute myeloid leukemia, cystic fibrosis, cardiovascular diseases, inflammatory bowel disease, sepsis, and tuberculosis. We carefully examined the availability of metadata for nine clinical variables: disease condition, age, organism, sex, tissue type, ethnicity, country, mortality, and clinical severity. By comparing the metadata availability in both original publications and online repositories, we observed discrepancies in sharing the metadata. We determine that the overall availability of metadata is 72.8%, where the most completely reported phenotypes are disease condition and organism, and the least is mortality. Additionally, we examined the completeness of metadata reported separately in original publications and online repositories. The completeness of metadata from the original publications across the nine clinical phenotypes is 71.1%. In contrast, the overall completeness of metadata information from the public repositories is 48.6%. Our study is the first to assess the completeness of metadata accompanying raw data across a large number of studies and phenotypes, and it opens a crucial discussion about solutions to improve completeness and accessibility of metadata accompanying omics studies.
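One simple way to quantify completeness of this kind is the percentage of samples that carry a non-empty value for each clinical variable. The sketch below does that for a list of sample records. It is an illustration only: the abstract does not describe the authors' scoring rules, and the function name, the record layout, and the set of markers treated as missing are assumptions.

```python
# Assumption: values treated as "missing" when scoring a metadata field.
MISSING = (None, "", "NA")


def completeness(records, variables):
    """Return (per-variable completeness in %, overall completeness in %).

    `records` is a list of dicts, one per sample; `variables` lists the
    metadata fields to score. A field counts as present when the record has a
    value for it that is not one of the MISSING markers.
    """
    if not records or not variables:
        raise ValueError("need at least one record and one variable")

    per_variable = {}
    filled_total = 0
    for var in variables:
        filled = sum(1 for r in records if r.get(var) not in MISSING)
        per_variable[var] = 100.0 * filled / len(records)
        filled_total += filled

    overall = 100.0 * filled_total / (len(records) * len(variables))
    return per_variable, overall


# Made-up sample records for illustration.
samples = [
    {"age": 54, "sex": "F", "mortality": None},
    {"age": None, "sex": "M"},
]
per_var, overall = completeness(samples, ["age", "sex", "mortality"])
print(per_var)
print(overall)
```

Running the same function once on metadata extracted from publications and once on metadata from repositories would give the kind of side-by-side comparison the abstract reports.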


2021 ◽  
Vol 12 ◽  
Author(s):  
Jay Nimavat ◽  
Chandrashekar Mootapally ◽  
Neelam M. Nathani ◽  
Devyani Dave ◽  
Mukesh N. Kher ◽  
...  

Humankind has suffered many pandemics in history, including measles, SARS, MERS, Ebola, and recently the novel Coronavirus disease caused by SARS-CoV-2. As of September 2021, it has affected over 200 million people and caused over 4 million deaths. India is the second most affected country in the world. To date, more than 38 lakh (3.8 million) viral genomes have been submitted to public repositories such as GISAID and NCBI to analyze the virus phylogeny and mutations. Here, we analyzed 2349 genome sequences of SARS-CoV-2 submitted to GISAID by a single institute, pertaining to infections from Gujarat state, to determine their variants and phylogenetic distributions, with a major focus on the spike protein. More than 93% of the genomes had one or more mutations in the spike glycoprotein. The D614G variant in the spike protein is reported to have a very high frequency of >95% globally, followed by L452R and P681R, and has thus received significant attention. The antigenic propensity of a small peptide of 29 residues, from 597 to 625, of the spike protein variants having D614 and G614 showed that G614 has a slightly higher antigenic propensity. Thus, D614G is the cause of higher viral antigenicity; however, it has not been reported to cause more deaths.

