Absent from DNA and protein: genomic characterization of nullomers and nullpeptides across functional categories and evolution

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ilias Georgakopoulos-Soares ◽  
Ofer Yizhar-Barnea ◽  
Ioannis Mouratidis ◽  
Martin Hemberg ◽  
Nadav Ahituv

Abstract: Nullomers and nullpeptides are short DNA or amino acid sequences that are absent from a genome or proteome, respectively. One potential cause for their absence could be their having a detrimental impact on an organism. Results: Here, we identify all possible nullomers and nullpeptides in the genomes and proteomes of thirty eukaryotes and demonstrate that a significant proportion of these sequences are under negative selection. We also identify nullomers that are unique to specific functional categories (coding sequences, exons, introns, 5′UTR, 3′UTR, and promoters) and show that coding sequence and promoter nullomers are most likely to be selected against. By analyzing all protein sequences across the tree of life, we further identify 36,081 peptides up to six amino acids in length that do not exist in any known organism, termed primes. We next characterize all possible single base pair mutations that can lead to the appearance of a nullomer in the human genome, observing a significantly higher number of mutations than expected by chance for specific nullomer sequences in transposable elements, likely due to their suppression. We also annotate nullomers that appear due to naturally occurring variants and show that a subset of them can be used to distinguish between different human populations. Analysis of nullomers and nullpeptides across vertebrate evolution shows they can also be used as phylogenetic classifiers. Conclusions: We provide a catalog of nullomers and nullpeptides in distinct functional categories, develop methods to systematically study them, and highlight the use of variability in these sequences in other analyses.
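As a rough illustration of the definition above, the following Python sketch lists the k-mers missing from a sequence and reports which of them appear after a single-base substitution. It is not the authors' pipeline: the function names, the toy sequence, and the choice to scan only the given strand (ignoring the reverse complement) are assumptions made for this example.

```python
from itertools import product


def find_nullomers(sequence, k, alphabet="ACGT"):
    """Return the sorted k-mers over `alphabet` that never occur in `sequence`.

    Hypothetical helper: only the given strand is scanned, which is a
    simplifying assumption of this sketch.
    """
    seq = sequence.upper()
    # Every k-length window actually present in the sequence.
    observed = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    # Every possible k-mer over the alphabet.
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return sorted(kmer for kmer in all_kmers if kmer not in observed)


def nullomers_created_by_substitution(sequence, position, alt_base, k, nullomers):
    """Return the nullomers that appear when `alt_base` replaces the base
    at 0-based `position` (hypothetical helper for illustration)."""
    seq = sequence.upper()
    mutated = seq[:position] + alt_base.upper() + seq[position + 1:]
    # Only windows overlapping the mutated position can change.
    start = max(0, position - k + 1)
    end = min(position, len(mutated) - k)
    windows = {mutated[i:i + k] for i in range(start, end + 1)}
    return sorted(windows & set(nullomers))


# Toy input, not real genomic data.
toy_genome = "ACGTACGTTGCA"
toy_nullomers = find_nullomers(toy_genome, 2)
```

The same `find_nullomers` function can be pointed at a protein sequence by passing the twenty-letter amino acid alphabet, which gives a comparable sketch for nullpeptides. Enumerating every k-mer grows exponentially with k, so this approach is only practical for short lengths.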

2020 ◽  
Author(s):  
Ilias Georgakopoulos-Soares ◽  
Ofer Yizhar Barnea ◽  
Ioannis Mouratidis ◽  
Martin Hemberg ◽  
Nadav Ahituv

Abstract Nullomers and nullpeptides are short DNA or amino acid sequences that are absent from a genome or proteome, respectively. One potential cause for their absence could be that they have a detrimental impact on an organism. Here, we identified all possible nullomers and nullpeptides in the genomes and proteomes of over thirty species and show that a significant proportion of these sequences are under negative selection. We assign nullomers to different functional categories (coding sequences, exons, introns, 5’UTR, 3’UTR and promoters) and show that nullomers from coding sequences and promoters are most likely to be selected against. Utilizing variants in the human population, we annotate variant-associated nullomers, highlighting their potential use as DNA ‘fingerprints’. Phylogenetic analyses of nullomers and nullpeptides across evolution show that they could be used to build phylogenetic trees. Our work provides a catalog of genomic and proteome derived absent k-mers, together with a novel scoring function to determine their potential functional importance. In addition, it shows how these unique sequences could be used as DNA ‘fingerprints’ or for phylogenetic analyses.


Plant Disease ◽  
2018 ◽  
Vol 102 (12) ◽  
pp. 2571-2577 ◽  
Author(s):  
Scott Adkins ◽  
Tom D’Elia ◽  
Kornelia Fillmer ◽  
Patchara Pongam ◽  
Carlye A. Baker

Foliar symptoms suggestive of virus infection were observed on the ornamental plant hoya (Hoya spp.; commonly known as waxflower) in Florida. An agent that reacted with commercially available tobamovirus detection reagents was mechanically transmitted to Chenopodium quinoa and Nicotiana benthamiana. Rod-shaped particles ∼300 nm in length and typical of tobamoviruses were observed in partially purified virion preparations by electron microscopy. An experimental host range was determined by mechanical inoculation with virions, and systemic infections were observed in plants in the Asclepiadaceae, Apocynaceae, and Solanaceae families. Some species in the Solanaceae and Chenopodiaceae families allowed virus replication only in inoculated leaves, and were thus only local hosts for the virus. Tested plants in the Amaranthaceae, Apiaceae, Brassicaceae, Cucurbitaceae, Fabaceae, and Malvaceae did not support either local or systemic virus infection. The complete genome for the virus was sequenced and shown to have a typical tobamovirus organization. Comparisons of genome nucleotide sequence and individual gene deduced amino acid sequences indicate that it is a novel tobamovirus sharing the highest level of sequence identity with Streptocarpus flower break virus and members of the Brassicaceae-infecting subgroup of tobamoviruses. The virus, for which the name Hoya chlorotic spot virus (HoCSV) is proposed, was detected in multiple hoya plants from different locations in Florida.


2020 ◽  
Author(s):  
Chunyu Liu ◽  
Jessica L. Fetterman ◽  
Yong Qian ◽  
Xianbang Sun ◽  
Kaiyu Yan ◽  
...  

ABSTRACT We investigated the concordance of mitochondrial DNA heteroplasmic mutations (heteroplasmies) in different types of maternal pairs (n=6,745 pairs) of European (EA, n=4,718 pairs) and African (AA, n=2,027 pairs) Americans with whole genome sequences (WGSs). The average concordance rate of heteroplasmies was highest between mother-offspring pairs, followed by sibling-sibling pairs and more distantly related maternal pairs in both EA and AA participants. The allele fractions of concordant heteroplasmies exhibited high correlation (R²=0.8) between paired individuals. Compared to concordant heteroplasmies, discordant ones were more likely to be located in coding regions and to be nonsynonymous or nonsynonymous-deleterious (p<0.001). The average number of heteroplasmies per individual (i.e. heteroplasmic burden) was at a similar level until older age (70-80 years old) and increased significantly thereafter (p<0.01). The burden of deleterious heteroplasmies (combined annotation-dependent depletion score≥15), however, was significantly correlated with advancing age (20-44, 45-64, ≥65 years, p-trend=0.01). A genome-wide association analysis of the heteroplasmic burden identified many significant (P<5e-8) common variants (minor allele frequency>0.05) at 11p11.12. Many of the top SNPs act as strong long-range cis regulators of protein tyrosine phosphatase receptor type J. This study provides further evidence that mtDNA heteroplasmies may be inherited or somatic. Somatic heteroplasmic variants increase with advancing age and are more likely to have an adverse impact on mitochondrial function. Further studies are warranted for functional characterization of the deleterious heteroplasmies occurring with advancing age and the association of the 11p11.12 region of the nuclear genome with mtDNA heteroplasmy.


2013 ◽  
Vol 94 (10) ◽  
pp. 2266-2277 ◽  
Author(s):  
Yuding Fan ◽  
Shujing Rao ◽  
Lingbing Zeng ◽  
Jie Ma ◽  
Yong Zhou ◽  
...  

A novel fish reovirus, Hubei grass carp disease reovirus (HGDRV; formerly grass carp reovirus strain 104, GCRV104), was isolated from diseased grass carp in China in 2009 and the full genome sequence was determined. This reovirus was propagated in a grass carp kidney cell line with a typical cytopathic effect. The total size of the genome was 23 706 bp with a 51 mol% G+C content, and the 11 dsRNA segments encoded 12 proteins (two proteins encoded by segment 11). A nucleotide sequence similarity search using blastn found no significant matches except for segment 2, which partially matched that of the RNA-dependent RNA polymerase (RdRp) from several viruses in the genera Aquareovirus and Orthoreovirus of the family Reoviridae. At the amino acid level, seven segments (Seg-1 to Seg-6, and Seg-8) matched with species in the genera Aquareovirus (15–46 % identities) and Orthoreovirus (12–44 % identities), while for four segments (Seg-7, Seg-9, Seg-10 and Seg-11) no similarities in these genera were found. Conserved terminal sequences, 5′-GAAUU----UCAUC-3′, were found in each HGDRV segment at the 5′ and 3′ ends, and the 5′-terminal nucleotides were different from any known species in the genus Aquareovirus. Phylogenetic analysis based on RdRp amino acid sequences from members of the family Reoviridae showed that HGDRV clustered with aquareoviruses prior to joining a branch common with orthoreoviruses. Based on these observations, we propose that HGDRV is a new species in the genus Aquareovirus that is distantly related to any known species within this genus.


2006 ◽  
Vol 87 (2) ◽  
pp. 387-394 ◽  
Author(s):  
Yang Li ◽  
Li Tan ◽  
Yanqiu Li ◽  
Wuguo Chen ◽  
Jiamin Zhang ◽  
...  

Genomic characterization of Heliothis armigera cypovirus (HaCPV) isolated from China showed that insects were co-infected with several cypoviruses (CPVs). One of the CPVs (HaCPV-5) could be separated from the others by changing the rearing conditions of the Heliothis armigera larvae. This finding was further confirmed by nucleotide sequencing analysis. Genomic sequences of segments S10–S7 from HaCPV-14, S10 and S7 from HaCPV-5, and S10 from Heliothis assulta CPV-14 were compared. Results from database searches showed that the nucleotide sequences and deduced amino acid sequences of the newly identified CPVs had high levels of identity with those of reported CPVs of the same type, but not with CPVs of different types. Putative amino acid sequences of HaCPV-5 S7 were similar to that of the protein from Rice ragged stunt virus (genus Oryzavirus, family Reoviridae), suggesting that CPVs and oryzaviruses are related more closely than other genera of the family Reoviridae. Conserved motifs were also identified at the ends of each RNA segment of the same virus type: type 14, 5′-AGAAUUU…CAGCU-3′; and type 5, 5′-AGUU…UUGC-3′. Our results are consistent with classification of CPV types based on the electrophoretic patterns of CPV double-stranded RNA.


2017 ◽  
Vol 2017 ◽  
pp. 1-14 ◽  
Author(s):  
Gary Xie ◽  
Shannon L. Johnson ◽  
Karen W. Davenport ◽  
Mathumathi Rajavel ◽  
Torsten Waldminghaus ◽  
...  

The genetic make-up of most bacteria is encoded in a single chromosome while about 10% have more than one chromosome. Among these, Vibrio cholerae, with two chromosomes, has served as a model system to study various aspects of chromosome maintenance, mainly replication, and faithful partitioning of multipartite genomes. Here, we describe the genomic characterization of strains that are an exception to the two-chromosome rule: naturally occurring single-chromosome V. cholerae. Whole genome sequence analyses of NSCV1 and NSCV2 (natural single-chromosome vibrio) revealed that the Chr1 and Chr2 fusion junctions contain prophages, IS elements, and direct repeats, in addition to large-scale chromosomal rearrangements such as inversions, insertions, and long tandem repeats elsewhere in the chromosome compared to prototypical two-chromosome V. cholerae genomes. Many of the known cholera virulence factors are absent. The two origins of replication and associated genes are generally intact with synonymous mutations in some genes, as are recA and mismatch repair (MMR) genes dam, mutH, and mutL; MutS function is probably impaired in NSCV2. These strains are ideal tools for studying mechanistic aspects of maintenance of chromosomes with multiple origins and other rearrangements and the biological, functional, and evolutionary significance of multipartite genome architecture in general.


2002 ◽  
Vol 76 (11) ◽  
pp. 5339-5349 ◽  
Author(s):  
Javier Martín ◽  
Philip D. Minor

ABSTRACT CHAT and Cox type 1 live-attenuated poliovirus strains were developed in the 1950s to be used as vaccines for humans. This paper describes their characterization with respect to virulence, sensitivity for growth at high temperatures, and complete nucleotide and amino acid sequences. The results are compared to those for their common parental wild virus, the Mahoney strain, and to those for two other poliovirus strains derived from Mahoney, the Sabin 1 vaccine strain and the mouse-adapted LS-a virus. Analysis of four isolates from cases of vaccine-associated paralytic poliomyelitis related to the CHAT vaccine revealed genetic and phenotypic properties of the CHAT strain following replication in the human gut. CHAT-VAPP strain 134 contained a genome highly evolved from that of CHAT (1.1% nucleotide differences), suggesting long-term circulation of a vaccine-derived strain in the human population. The molecular mechanisms of attenuation and evolution of poliovirus in humans are discussed in the context of the global polio eradication initiative.


2022 ◽  
Author(s):  
Fateh Singh ◽  
Katherukamem Rajukumar ◽  
Dhanapal Senthilkumar ◽  
Govindarajulu Venkatesh ◽  
Deepali Srivast ◽  
...  

Abstract During a surveillance study to monitor porcine epidemic diarrhoea virus and transmissible gastroenteritis virus in India, a total of 1043 swine samples including faeces (n=264) and clotted blood (n=779) were collected and tested. Five samples (four faecal and one serum) showed cytopathic effects in Vero cells. Transmission electron microscopy of infective cell supernatant revealed the presence of two types of virions. Next generation sequencing (de novo) enabled complete genome assembly of Mammalian orthorubulavirus 5 (MRuV5; 15246 bp) and all 10 gene segments of Mammalian orthoreovirus (MRV; 22219 bp and 20512 bp). Genetic analysis of the MRuV5 revealed grouping of the Indian MRuV5 with those isolated from various mammalian species in South Korea and China, sharing more than 99% nucleotide identity. Deduced amino acid sequences of the HN, NP and F genes of MRuV5 isolates showed three (92L, 111R, 447H), two (86S, 121S) and two (139T, 246T) amino acid substitutions, respectively, compared to previously reported virus strains. The Indian MRV isolates were identified as MRV type-3 based on genetic analysis of the S1 gene, showing the highest nucleotide identity (97.73%) with the MRV3 strain ZJ2013 isolated from pigs in China. Deduced amino acid sequences of the MRV3 S1 gene revealed amino acid residues 198-204NLAIRLP, 249I, 340D, 419E known for sialic acid binding and neurotropism. We report the co-isolation and whole-genomic characterization of MRuV5 and MRV3 recorded incidentally for the first time from domestic pigs in India. These findings call for detailed surveillance studies and continuous monitoring of the evolution and spread of emerging viruses, which may have pathogenic potential in animal and human hosts.


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0243132
Author(s):  
Haifen Li ◽  
Xuanqiang Liang ◽  
Baojin Zhou ◽  
Xiaoping Chen ◽  
Yanbin Hong ◽  
...  

In order to obtain more valuable insights into the protein dynamics and accumulation of allergens in seeds during underground development, we performed a proteomic study on developing peanut seeds at seven different stages. A total of 264 proteins with altered abundance and containing at least one unique peptide were detected by matrix-assisted laser desorption ionization time-of-flight/time-of-flight mass spectrometry (MALDI-TOF/TOF MS). All identified proteins were classified into five functional categories as level 1 and 20 secondary functional categories as level 2. Among them, 88 identified proteins (IPs) were related to carbohydrate/amino acid/lipid transport and metabolism, indicating that carbohydrate/amino acid/lipid metabolism played a key role in the underground development of peanut seeds. Hierarchical cluster analysis showed that all IPs could be classified into eight cluster groups according to the abundance profiles, suggesting that the modulatory patterns of these identified proteins were complicated during seed development. The largest group contained 41 IPs, the expression of which decreased at R2 and reached a maximum at R3 but gradually decreased from R4. A total of 14 IPs were identified as allergen-like proteins by BLAST with A genome (Arachis duranensis) or B genome (Arachis ipaensis) translated allergen sequences. Abundance profile analysis of 14 identified allergens showed that the expression of all allergen proteins was low or undetectable by 2-DE at the early stages (R1 to R4), and began to accumulate from the R5 stage and gradually increased. Network analysis showed that most of the significant proteins were involved in active metabolic pathways in early development. Real time RT-PCR analysis revealed that transcriptional regulation was approximately consistent with expression at the protein level for 8 selected identified proteins. In addition, some amino acid sequences that may be associated with new allergens were also discussed.


Author(s):  
W. W. Barker ◽  
W. E. Rigsby ◽  
V. J. Hurst ◽  
W. J. Humphreys

Experimental clay mineral-organic molecule complexes long have been known and some of them have been extensively studied by X-ray diffraction methods. The organic molecules are adsorbed onto the surfaces of the clay minerals, or intercalated between the silicate layers. Natural organo-clays also are widely recognized but generally have not been well characterized. Widely used techniques for clay mineral identification involve treatment of the sample with H2O2 or other oxidant to destroy any associated organics. This generally simplifies and intensifies the XRD pattern of the clay residue, but helps little with the characterization of the original organoclay. Adequate techniques for the direct observation of synthetic and naturally occurring organoclays are yet to be developed.

