original sequence
Recently Published Documents


TOTAL DOCUMENTS

84
(FIVE YEARS 21)

H-INDEX

16
(FIVE YEARS 2)

Author(s):  
Mieradilijiang Maimaiti ◽  
Yang Liu ◽  
Huanbo Luan ◽  
Zegao Pan ◽  
Maosong Sun

Data augmentation is widely used in text generation tasks. In machine translation, and especially in low-resource language scenarios, many data augmentation methods have been proposed. The most common approaches to generating pseudo data rely on word omission, random sampling, or replacing some words in the text. However, previous methods can barely guarantee the quality of the augmented data. In this work, we build the data using paraphrase embeddings and POS tagging. Namely, we generate a pseudo monolingual corpus by replacing words that carry the four main POS tags (noun, adjective, adverb, and verb), based on both the paraphrase table and word similarity. We select the larger word-level paraphrase table and obtain the word embedding of each word in the table, then calculate the cosine similarity between these words and the tagged words in the original sequence. In addition, we use a ranking algorithm to choose highly similar words, which reduces semantic errors, and rely on POS-constrained replacement to mitigate syntactic errors to some extent. Experimental results show that our augmentation method consistently outperforms all previous SOTA methods in low-resource settings across seven language pairs from four corpora by 1.16 to 2.39 BLEU points.
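
The replacement step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy paraphrase table, the two-dimensional embeddings, and the similarity threshold of 0.8 are all assumptions made for illustration. A word is replaced only if it carries one of the four POS tags and a paraphrase candidate is similar enough by cosine similarity.

```python
import math

# The four POS classes named in the abstract (tag spellings are an assumption).
REPLACEABLE = {"NOUN", "ADJ", "ADV", "VERB"}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def augment(tagged, paraphrases, embeddings, threshold=0.8):
    """Replace each eligible word by its most similar paraphrase candidate.

    tagged      -- list of (word, pos) pairs for the original sequence
    paraphrases -- hypothetical word-level paraphrase table: word -> candidates
    embeddings  -- hypothetical embedding lookup: word -> vector
    threshold   -- assumed minimum similarity for a replacement to be accepted
    """
    out = []
    for word, pos in tagged:
        best, best_sim = word, threshold
        if pos in REPLACEABLE and word in embeddings:
            for cand in paraphrases.get(word, []):
                if cand in embeddings:
                    sim = cosine(embeddings[word], embeddings[cand])
                    if sim > best_sim:  # keep the top-ranked candidate
                        best, best_sim = cand, sim
        out.append(best)
    return out

if __name__ == "__main__":
    toy_emb = {"big": [1.0, 0.0], "large": [0.9, 0.1], "tiny": [-1.0, 0.0]}
    toy_table = {"big": ["large", "tiny"]}
    print(augment([("the", "DET"), ("big", "ADJ"), ("dog", "NOUN")],
                  toy_table, toy_emb))
```

Restricting candidates to the same POS class keeps the sentence skeleton intact, while the similarity ranking filters out paraphrase entries that would change the meaning.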


2021 ◽  
Author(s):  
Yu M. Kulikov ◽  
E. E. Son

Abstract This paper considers the canonical problem of the evolution of a thin shear layer at Reynolds number Re = 400000 using the novel Compact Accurately Boundary Adjusting high-Resolution Technique (CABARET). The study focuses on how mesh refinement in the high shear rate areas affects the flow properties as the instability develops. The original sequence of computational meshes (256^2, 512^2, 1024^2, 2048^2 cells) is modified using an iterative refinement algorithm based on the hyperbolic tangent. The properties of the solutions obtained are discussed in terms of the initial momentum thickness and the initial vorticity thickness, the viscous and dilatational dissipation rates, and the integral enstrophy. The dependence of the growth rate of the most unstable mode on the mesh resolution is considered. Finally, the accuracy of the calculated mesh functions is estimated via the L1, L2, and L∞ norms.
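
The abstract does not specify the iterative refinement algorithm, so the following is only a generic illustration of how a hyperbolic-tangent mapping can cluster mesh nodes where the shear is highest. It uses the inverse of the tanh map to concentrate nodes around the centerline y = 0. The function name, the stretching parameter `beta`, and the choice of clustering location are assumptions, not details from the paper.

```python
import math

def clustered_nodes(n_cells, beta=2.0, half_width=1.0):
    """Return n_cells + 1 node coordinates on [-half_width, half_width].

    Uniform coordinates xi in [-1, 1] are mapped through
    y = half_width * atanh(xi * tanh(beta)) / beta,
    which shrinks the spacing near y = 0 and widens it toward the edges.
    Larger beta gives stronger clustering; very large beta (about 19 or more)
    makes tanh(beta) round to 1.0 and atanh fail at the end points.
    """
    scale = math.tanh(beta)
    nodes = []
    for i in range(n_cells + 1):
        xi = -1.0 + 2.0 * i / n_cells
        nodes.append(half_width * math.atanh(xi * scale) / beta)
    return nodes

if __name__ == "__main__":
    y = clustered_nodes(16)
    spacing = [b - a for a, b in zip(y, y[1:])]
    print("central spacing:", spacing[8], "edge spacing:", spacing[0])
```

A one-shot mapping like this only shows the clustering idea; an iterative scheme would repeat or adapt such a step until a refinement criterion is met.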


Author(s):  
Randolph T. Williams

Abstract A Poisson process giving rise to earthquakes that occur randomly in time has become a de facto null hypothesis when assessing the periodicity of large (Mw>∼6–7) earthquakes in the paleoseismic record. This implies an exponential distribution of inter-event times (IETs) and therefore an abundance of IETs that are very short relative to the mean value. As such, the Poisson model posits that large ruptures occurring in rapid succession should be relatively common. Below some threshold IET defined by site specific conditions, however, these short IET earthquakes are unlikely to be recorded as distinct events in the paleoseismic record. This article presents the results of simple Monte Carlo simulations that quantify the potential effects of truncation of short IETs on the apparent periodicity of large earthquakes generated by a Poisson process. Results indicate that this truncation effect results in chronologies that appear systematically more periodic than the original sequence of events. The magnitude of this discrepancy depends primarily on the ratio of the minimum preserved and mean IETs in addition to the number of events in the chronology. As such, previous statistical analyses that have assessed for periodicity in the paleoseismic record of large earthquakes likely incorporated a bias in favor of apparent periodicity by employing Poisson behavior as a null hypothesis. This bias can be corrected if the minimum preserved IET can be determined or reasonably assumed for a particular paleoseismic record or site.
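
The truncation effect can be reproduced with a short Monte Carlo sketch. This is not the article's code: the merging rule (an event closer than `min_iet` to the last recorded event is not recorded as distinct) and the use of the coefficient of variation (CV) of inter-event times as the periodicity measure are assumptions chosen for illustration. A Poisson process has CV close to 1, and lower values look more periodic.

```python
import random
import statistics

def simulate_cv(n_events, min_iet, mean_iet=1.0, seed=0):
    """CV of inter-event times after removing unresolvable short intervals.

    n_events -- number of events drawn from the Poisson process
    min_iet  -- assumed minimum inter-event time that the record can resolve
    mean_iet -- mean inter-event time of the underlying process
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(n_events):
        t += rng.expovariate(1.0 / mean_iet)  # exponential inter-event time
        times.append(t)

    # Keep an event only if it is separated from the last recorded one.
    recorded = [times[0]]
    for t in times[1:]:
        if t - recorded[-1] >= min_iet:
            recorded.append(t)

    iets = [b - a for a, b in zip(recorded, recorded[1:])]
    return statistics.pstdev(iets) / statistics.mean(iets)

if __name__ == "__main__":
    for ratio in (0.0, 0.1, 0.25, 0.5):
        print(ratio, round(simulate_cv(20000, ratio), 3))
```

Under this particular merging rule the memoryless property implies that each recorded interval is `min_iet` plus an exponential variate, so the CV tends to 1 / (1 + min_iet / mean_iet) for long chronologies. Short chronologies scatter around that value, which is consistent with the abstract's remark that the number of events also matters.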


2021 ◽  
Vol 59 (5) ◽  
pp. 578-583
Author(s):  
K. E. Glemba ◽  
I. A. Guseva ◽  
A. E. Karateev ◽  
M. A. Makarov ◽  
E. Yu. Samarkina ◽  
...  

Postoperative pain (POP) is a serious complication that worsens the outcome of total knee arthroplasty (TKA) or total hip arthroplasty (THA) in patients with osteoarthritis (OA). The search for predictors of postoperative pain is a pressing problem. The aim of the study was to assess the relationship between polymorphisms of the KCNS1, COMT and OPRM1 genes and the development of POP in OA patients who underwent TKA or THA. Material and methods. The study group consisted of 95 patients with knee or hip OA (64.6% women, aged 65.4±9.0 years) who underwent TKA (47.8%) or THA (52.2%). POP was recorded when pain in the area of surgical intervention ≥40 mm (100 mm visual analog scale, VAS) persisted or appeared 3 and 6 months after surgery. All patients underwent genotyping of polymorphisms of the genes KCNS1 (rs734784), COMT (rs6269, rs4633) and OPRM1 (rs1799971) by real-time polymerase chain reaction using original sequence-specific primers and samples labeled with various fluorescent labels. Registration and interpretation of the results were carried out on the DT-96 amplifier (DNA-Technology LLC, Russia). Results. POP was observed in 32.6% of patients who underwent TKA or THA. The frequency of POP after TKA and THA was 30.2% and 34.0%, respectively (p=0.882). Statistical analysis revealed no differences in the frequencies of the genotypes of the studied genes (p>0.05). The homozygous GG genotype of the KCNS1 polymorphism (rs734784) was associated with the presence of POP under the recessive genetic model (GG vs AA+AG; odds ratio (OR)=3.96 [95% confidence interval (CI): 1.51; 10.37]; p=0.005). The presence of the mutant allele T (TT+CT) in the genotype of the COMT polymorphism (rs4633) reduced the risk of POP compared with carriers of the CC genotype (OR=0.32 [95% CI: 0.12; 0.83]; p=0.02) under the dominant genetic model. There was no significant correlation between the development of POP and carriage of different genotypes and alleles of the COMT (rs6269) and OPRM1 (rs1799971) genes. Conclusions. There is a statistically significant association between polymorphisms of the KCNS1 (rs734784) and COMT (rs4633) genes and the development of chronic POP in patients who underwent TKA or THA. Further studies of the genetic predisposition to POP are required on a larger body of clinical material.


2021 ◽  
Vol 27 (10) ◽  
pp. 531-541
Author(s):  
G. N. Zhukova ◽  
M. V. Ulyanov

The problem considered is the construction of a periodic sequence consisting of at least eight periods from a given sequence that was obtained from an unknown periodic sequence, also containing at least eight periods, by introducing noise in the form of deletion, replacement, and insertion of symbols. To construct a periodic sequence that approximates the given noise-distorted one, the length of the repeating fragment (period) must first be estimated. The distorted original sequence is then divided into successive sections of equal length, where the length takes integer values from 80 to 120 % of the period estimate. Each section is compared with each of the remaining sections, and the section with the minimum edit distance (Levenshtein distance) to any of the remaining sections is selected to build the periodic sequence. Minimization is carried out over all sections of a fixed length, and then over all lengths from 80 to 120 % of the period estimate. To compare fragments of different lengths correctly, we consider the ratio of the edit distance to the fragment length. The fragment length that minimizes this ratio is taken as the period of the approximating periodic sequence, and the fragment itself, repeated the required number of times, forms the approximating sequence. The constructed sequence may contain an incomplete repeating fragment at the end. The quality of the approximation is estimated by the ratio of the edit distance between the original distorted sequence and the constructed periodic sequence of the same length to this length.
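
The procedure can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code. In particular, each section is scored here by its mean normalized edit distance to all other sections of the same length, which is an assumption: scoring by the single closest section can tie at zero for wrong lengths, because sections with the same phase recur even when the length is off. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def approximate_periodic(seq, period_estimate):
    """Return (approximating periodic sequence, chosen period length)."""
    lo = max(1, (8 * period_estimate + 9) // 10)  # ceil of 80 %
    hi = (12 * period_estimate) // 10             # floor of 120 %
    best = None  # (score, length, fragment)
    for length in range(lo, hi + 1):
        sections = [seq[i:i + length]
                    for i in range(0, len(seq) - length + 1, length)]
        if len(sections) < 2:
            continue
        for i, sec in enumerate(sections):
            total = sum(levenshtein(sec, other)
                        for j, other in enumerate(sections) if j != i)
            score = total / (len(sections) - 1) / length  # normalized by length
            if best is None or score < best[0]:
                best = (score, length, sec)
    if best is None:
        raise ValueError("sequence too short for the given period estimate")
    _, length, fragment = best
    repeats = -(-len(seq) // length)  # ceiling division
    # Truncating may leave an incomplete fragment at the end.
    return (fragment * repeats)[:len(seq)], length

if __name__ == "__main__":
    noisy = "abcde" * 3 + "abXde" + "abcde" * 4
    approx, period = approximate_periodic(noisy, 5)
    print(period, approx, levenshtein(noisy, approx) / len(noisy))
```

The last printed value corresponds to the quality measure at the end of the abstract: the edit distance between the distorted sequence and the constructed one, divided by their common length.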


2021 ◽  
Author(s):  
F Javier Ibarrondo ◽  
Christian Hofmann ◽  
Ayub Ali ◽  
Paul Ayoub ◽  
Donald B Kohn ◽  
...  

SARS-CoV-2 continues to evolve in humans. Spike protein mutations increase transmission and potentially evade antibodies raised against the original sequence used in current vaccines. Our evaluation of serum neutralizing activity in persons soon after SARS-CoV-2 infection (in April 2020 or earlier) and in persons vaccinated without prior infection confirmed that common spike mutations can reduce antibody antiviral activity. However, when the persons with prior infection were subsequently vaccinated, their antibodies attained an apparent biologic ceiling of neutralizing potency against all tested variants, equivalent to that against the original spike sequence. These findings indicate that additional antigenic exposure further improves antibody efficacy against variants.


2021 ◽  
Author(s):  
Kristoffer Sahlin

Short-read genome alignment is a fundamental computational step used in many bioinformatic analyses. It is therefore desirable to align such data as fast as possible. Most alignment algorithms consider a seed-and-extend approach. Several popular programs perform the seeding step based on the Burrows-Wheeler Transform with a low memory footprint, but they are relatively slow compared to more recent approaches that use a minimizer-based seeding-and-chaining strategy. Recently, syncmers and strobemers were proposed for sequence comparison. Both protocols were designed for improved conservation of matches between sequences under mutations. Syncmers is a thinning protocol proposed as an alternative to minimizers, while strobemers is a linking protocol for gapped sequences and was proposed as an alternative to k-mers. The main contribution in this work is a new seeding approach that combines syncmers and strobemers. We use a strobemer protocol (randstrobes) to link together syncmers (i.e., in syncmer-space) instead of over the original sequence. Our protocol allows us to create longer seeds while preserving mapping accuracy. A longer seed length reduces the number of candidate regions which allows faster mapping and alignment. We also contribute the insight that speed-wise, this protocol is particularly effective when syncmers are canonical. Canonical syncmers can be created for specific parameter combinations and reduce the computational burden of computing the non-canonical randstrobes in reverse complement. We implement our idea in a proof-of-concept short-read aligner strobealign that aligns short reads 3-4x faster than minimap2 and 15-23x faster than BWA and Bowtie2. Many implementation versions of, e.g., BWA, achieve high speed on specific hardware. Our contribution is algorithmic and requires no hardware architecture or system-specific instructions. Strobealign is available at https://github.com/ksahlin/StrobeAlign.
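
The idea of linking in syncmer-space can be shown with a toy sketch. It is not the strobealign implementation: the hash (CRC32), the open-syncmer offset `t`, the window bounds, and the way the two hashes are combined are assumptions chosen for brevity, and canonical (strand-independent) syncmers are not handled. A k-mer is kept as an open syncmer when its smallest s-mer starts at offset `t`; each kept syncmer is then paired with one of the next few syncmers, chosen by a hash minimum in the manner of randstrobes.

```python
import zlib

def h(s):
    """Deterministic toy hash (CRC32); real tools use other hash functions."""
    return zlib.crc32(s.encode())

def open_syncmers(seq, k, s, t=0):
    """Return (position, k-mer) pairs whose minimal s-mer starts at offset t."""
    out = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        smer_hashes = [h(kmer[j:j + s]) for j in range(k - s + 1)]
        if smer_hashes.index(min(smer_hashes)) == t:
            out.append((i, kmer))
    return out

def link_syncmers(syncmers, w_min=1, w_max=3):
    """Pair each syncmer with one of the next w_min..w_max syncmers.

    The window is counted in syncmers, not in nucleotides, so the seed
    spans a longer stretch of the original sequence than a single k-mer.
    """
    seeds = []
    for i, (pos1, km1) in enumerate(syncmers):
        window = syncmers[i + w_min:i + w_max + 1]
        if not window:
            break
        pos2, km2 = min(window,
                        key=lambda c: (h(km1) + h(c[1])) % (1 << 32))
        seeds.append((pos1, pos2, km1 + km2))
    return seeds

if __name__ == "__main__":
    read = "ACGTTGCATGTCGCATGATGCATGAGAGCTTAGCAT"
    syncs = open_syncmers(read, k=8, s=3)
    for seed in link_syncmers(syncs):
        print(seed)
```

Because only a subset of k-mers survives the syncmer filter, a window of a few syncmers can cover many nucleotides, which is what makes the resulting seeds longer.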


npj Vaccines ◽  
2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Jing Zou ◽  
Xuping Xie ◽  
Camila R. Fontes-Garfias ◽  
Kena A. Swanson ◽  
Isis Kanevsky ◽  
...  

Abstract Initial COVID-19 vaccine candidates were based on the original sequence of SARS-CoV-2. However, the virus has since accumulated mutations, among which the spike D614G is dominant in circulating virus, raising questions about potential virus escape from vaccine-elicited immunity. Here, we report that the D614G mutation modestly reduced (1.7–2.4-fold) SARS-CoV-2 neutralization by BNT162b2 vaccine-elicited mouse, rhesus, and human sera, consistent with the 95% vaccine efficacy observed in the clinical trial.


2021 ◽  
Vol 292 ◽  
pp. 02062
Author(s):  
He Peng-xiang ◽  
Sun Sheng-xiang

Because some emerging industries have only a short development history, the data available for forecasting their future economic behavior are limited, complex, and changeable. Based on the principle of combinatorial prediction, a combined buffer operator built from buffer operators of different orders is proposed, and the correlation area between the generated sequence and the original sequence after the buffer operator is applied is used as the weighting criterion. A grey GM (1,1) prediction model based on the combined buffer operator was established, which effectively overcame the influence of abnormal data and restored the change rule of the data series. The average prediction error for the data in reference [7] using the combined buffer operator established in this paper was 0.98%. Compared with the 6.89%, 11.59% and 1.30% of the original methods, the prediction accuracy is significantly improved.
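
For orientation, the plain grey GM (1,1) model that the abstract builds on can be sketched in a few lines. This sketch deliberately omits the combined buffer operator and its weighting, which the abstract does not specify; the function names and the sample series are illustrative assumptions. The model accumulates the original sequence, fits x0(k) + a·z1(k) = b by least squares, and forecasts from the exponential time response.

```python
import math

def gm11_fit(x0):
    """Estimate the development coefficient a and grey input b."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) is usually fitted to at least 4 points")
    # Accumulated generating sequence x1 (running sum of x0).
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(p * q for p, q in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)  # slope equals -a
    b = (sy - slope * sz) / m
    return -slope, b

def gm11_forecast(x0, steps=1):
    """Forecast the next `steps` values of the original sequence."""
    a, b = gm11_fit(x0)
    if abs(a) < 1e-12:
        raise ValueError("a is ~0; the exponential response is undefined")

    def x1_hat(i):  # fitted accumulated value at 0-based index i
        return (x0[0] - b / a) * math.exp(-a * i) + b / a

    n = len(x0)
    return [x1_hat(i) - x1_hat(i - 1) for i in range(n, n + steps)]

if __name__ == "__main__":
    series = [100 * 1.1 ** k for k in range(6)]
    print(gm11_forecast(series, steps=2))
```

A buffer operator would be applied to the original sequence before fitting, to damp abnormal values, with the fit and forecast steps unchanged.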


Entire identities between the invertebrate Ophiocomina nigra IGKappa gene and the human IGK gene are confirmed, in the present work, at the level of the immunoglobulin domains (constant and variable). The transcriptome of the ophiurid Ophiocomina nigra IGKappa gene was discovered recently (1). It has since been synthesized de novo and cloned in a pUC-GW-Kan plasmid (2), which was a gift from Bo Huang laboratories. The original sequence of the IGKappa gene, after cloning, was the following in 5’-3’:

