High Quality
Recently Published Documents





2022 ◽  
Vol 31 (1) ◽  
pp. 1-46
Chao Liu ◽  
Cuiyun Gao ◽  
Xin Xia ◽  
David Lo ◽  
John Grundy ◽  

Context: Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years. This is because they can often solve many SE challenges without enormous manual feature engineering effort and complex domain knowledge. Objective: Although many DL studies have reported substantial advantages over other state-of-the-art models on effectiveness, they often ignore two factors: (1) reproducibility: whether the reported experimental results can be obtained by other researchers using the authors' artifacts (i.e., source code and datasets) with the same experimental setup; and (2) replicability: whether the reported experimental results can be obtained by other researchers using their re-implemented artifacts with a different experimental setup. We observed that DL studies commonly overlook these two factors and declare them as minor threats or leave them for future work. This is mainly due to high model complexity with many manually set parameters and the time-consuming optimization process, unlike classical supervised machine learning (ML) methods (e.g., random forest). This study aims to investigate the urgency and importance of reproducibility and replicability for DL studies on SE tasks. Method: In this study, we conducted a literature review on 147 DL studies recently published in 20 SE venues and 20 AI (Artificial Intelligence) venues to investigate these issues. We also re-ran four representative DL models in SE to investigate important factors that may strongly affect the reproducibility and replicability of a study. Results: Our statistics show the urgency of investigating these two factors in SE, where only 10.2% of the studies investigate any research question to show that their models can address at least one issue of replicability and/or reproducibility. More than 62.6% of the studies do not even share high-quality source code or complete data to support the reproducibility of their complex models.
Meanwhile, our experimental results show the importance of reproducibility and replicability: the reported performance of a DL model could not be reproduced because of an unstable optimization process. Replicability could be substantially compromised if the model training is not convergent, or if performance is sensitive to the size of the vocabulary and testing data. Conclusion: It is urgent for the SE community to provide a long-lasting link to a high-quality reproduction package, enhance DL-based solution stability and convergence, and avoid performance sensitivity on different sampled data.

2021 ◽  
Lasata Shrestha ◽  
Michelle J. Lin ◽  
Hong Xie ◽  
Margaret G. Mills ◽  
Shah A.M. Bakhash ◽  

Amplicon-based sequencing methods have been central in characterizing the diversity, transmission and evolution of SARS-CoV-2, but need to be rigorously assessed for clinical utility. Here, we validated the Swift Biosciences SARS-CoV-2 Swift Normalase Amplicon Panels using remnant clinical specimens. High-quality genomes meeting our established library and sequence quality criteria were recovered from positive specimens with a 95% limit of detection of 40.08 SARS-CoV-2 copies/PCR reaction. Breadth of genome recovery was evaluated across a range of Ct values (11.3 - 36.7, median 21.6). Out of 428 positive samples, 406 (94.9%) generated genomes with < 10% Ns, with a mean genome coverage of 13,545X (SD 8,382X). No genomes were recovered from PCR-negative specimens (n = 30), or from specimens positive for non-SARS-CoV-2 respiratory viruses (n = 20). Compared to whole-genome shotgun metagenomic sequencing (n = 14) or Sanger sequencing for the spike gene (n = 11), pairwise identity between consensus sequences was 100% in all cases, with highly concordant allele frequencies (R² = 0.99) between Swift and shotgun libraries. When samples from different clades were mixed at varying ratios, expected variants were detected even in 1:99 mixtures. When deployed as a clinical test, 268 tests were performed in the first 23 weeks with a median turnaround time of 11 days, ordered primarily for outbreak investigations and infection control.

BMC Medicine ◽  
2021 ◽  
Vol 19 (1) ◽  
Nina Wilson ◽  
Katie Biggs ◽  
Sarah Bowden ◽  
Julia Brown ◽  
Munyaradzi Dimairo ◽  

Background: Adaptive designs offer great promise in improving the efficiency and patient-benefit of clinical trials. An important barrier to further increased use is a lack of understanding about which additional resources are required to conduct a high-quality adaptive clinical trial, compared to a traditional fixed design. The Costing Adaptive Trials (CAT) project investigated which additional resources may be required to support adaptive trials. Methods: We conducted a mock costing exercise amongst seven Clinical Trials Units (CTUs) in the UK. Five scenarios were developed, derived from funded clinical trials, where a non-adaptive version and an adaptive version were described. Each scenario represented a different type of adaptive design. CTU staff were asked to provide the costs and staff time they estimated would be needed to support the trial, categorised into specified areas (e.g. statistics, data management, trial management). This was calculated separately for the non-adaptive and adaptive version of the trial, allowing paired comparisons. Interviews with 10 CTU staff who had completed the costing exercise were conducted by qualitative researchers to explore reasons for similarities and differences. Results: Estimated resources associated with conducting an adaptive trial were always (moderately) higher than for the non-adaptive equivalent. The median increase was between 2 and 4% for all scenarios, except for sample size re-estimation which was 26.5% (as the adaptive design could lead to a lengthened study period). The highest increase was for statistical staff, with lower increases for data management and trial management staff. The percentage increase in resources varied across different CTUs. The interviews identified possible explanations for differences, including (1) experience in adaptive trials, (2) the complexity of the non-adaptive and adaptive design, and (3) the extent of non-trial specific core infrastructure funding the CTU had.
Conclusions: This work sheds light on additional resources required to adequately support a high-quality adaptive trial. The percentage increase in costs for supporting an adaptive trial was generally modest and should not be a barrier to adaptive designs being cost-effective to use in practice. Informed by the results of this research, guidance for investigators and funders will be developed on appropriately resourcing adaptive trials.

2021 ◽  
Vol 12 ◽  
Xiujia Yang ◽  
Yan Zhu ◽  
Sen Chen ◽  
Huikun Zeng ◽  
Junjie Guan ◽  

Detailed knowledge of the diverse immunoglobulin germline genes is critical for the study of humoral immunity. Hundreds of alleles have been discovered by analyzing antibody repertoire sequencing (Rep-seq or Ig-seq) data via multiple novel allele detection tools (NADTs). However, the performance of these NADTs through antibody sequences with intrinsic somatic hypermutations (SHMs) is unclear. Here, we developed a tool to simulate repertoires by integrating the full spectrum features of an antibody repertoire such as germline gene usage, junctional modification, position-specific SHM and clonal expansion based on 2152 high-quality datasets. We then systematically evaluated these NADTs using both simulated and genuine Ig-seq datasets. Finally, we applied these NADTs to 687 Ig-seq datasets and identified 43 novel allele candidates (NACs) using defined criteria. Twenty-five alleles were validated through findings of other sources. In addition to the NACs detected, our simulation tool, the results of our comparison, and the streamline of this process may benefit further humoral immunity studies via Ig-seq.

2021 ◽  
Alexander Keller ◽  
Irenäus Wlokas ◽  
Maximilian Kohns ◽  
Hans Hasse

The simulation of spray flame processes for the production of high-quality nanoparticles relies on thermophysical properties of the precursor solutions, for which literature data are scarce. Here, we report experimental thermophysical data of solutions of iron(III) nitrate nonahydrate (INN) in (1-propanol + water) mixed solvents. The specific density, viscosity, thermal conductivity, and isobaric heat capacity of the solutions were measured at 101.3 kPa between 288.15 and 333.15 K, solvent compositions ranging from 0.73 mol mol⁻¹ 1-propanol to pure water, and INN molalities up to 1.3 mol kg⁻¹. Empirical correlations of the experimental data are provided.

2021 ◽  
Vol 8 ◽  
Xiao Fu ◽  
Xiaojie Liu ◽  
Jing Li ◽  
Meng Zhang ◽  
Jingjing Jiang ◽  

Objective: The objective of this study was to provide a descriptive analysis of the clinical outcomes achieved in oocyte vitrification in cases where sperm was unavailable on oocyte retrieval day, and to identify predictors of oocyte survival. Methods: This retrospective cohort study used data from a university-affiliated reproductive medical center. There were 321 cycles in which some or all oocytes were vitrified owing to the unavailability of sperm between March 2009 and October 2017. A descriptive analysis of the clinical outcomes including both fresh embryo transfers and cryopreserved embryo transfers was provided. The ability of an individual parameter to forecast oocyte survival per thawing cycle was assessed by binary logistic regression analysis. The cumulative probability of live birth (CPLB) was estimated by using the Kaplan-Meier method according to the total number of oocytes thawed in consecutive procedures. Results: The average survival rate was 83.13%. High-quality embryo rate and blastocyst rate decreased significantly in the vitrified oocyte group compared to fresh control oocytes. The comparison of sibling oocytes in part-oocyte-vitrified cycles shows fewer high-quality embryos developed in the vitrified group. The live birth rate per warmed oocyte was 4.3%. Reasons for lack of sperm availability on oocyte retrieval day and serum cholesterol levels were found to be associated with oocyte survival rate in the present study. Kaplan-Meier analysis showed no significant difference in CPLB between patients ≤35 vs. >35 years. Conclusions: Oocyte vitrification is an indispensable and effective alternative when sperm are not available on oocyte retrieval day. The present study provided evidence that oocytes from infertile couples were more likely to suffer oocyte/embryo vitrification injury. Clinicians need to take this into account when advising patients in similar situations.
Further studies will be necessary to clarify the correlation between serum metabolism parameters and human oocyte survival after vitrification.
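
As an illustration of the Kaplan-Meier estimate mentioned in this abstract, here is a minimal sketch in Python. It is not the authors' analysis: the function name `kaplan_meier` and the sample data are hypothetical, and the "time" axis is assumed to be the cumulative number of oocytes thawed, with patients who stop without a live birth treated as censored. The cumulative probability of the outcome at each step is one minus the estimated survival.

```python
from collections import Counter


def kaplan_meier(observations):
    """Return [(time, survival)] at each distinct time where an event occurred.

    observations: iterable of (time, event) pairs; event is True when the
    outcome occurred at that time and False when the subject was censored.
    """
    observations = list(observations)
    events = Counter(t for t, occurred in observations if occurred)
    exits = Counter(t for t, _ in observations)  # events plus censorings
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction without the event.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone leaving at time t is no longer at risk afterwards.
        at_risk -= exits[t]
    return curve


# Hypothetical data: (oocytes thawed so far, live birth achieved?)
sample = [(4, False), (6, True), (6, False), (8, True), (10, False), (12, True)]
for oocytes, surv in kaplan_meier(sample):
    print(f"{oocytes} oocytes thawed: cumulative probability {1 - surv:.3f}")
```

Censored subjects reduce the number at risk without lowering the survival estimate, which is what distinguishes this estimator from a simple proportion.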

2021 ◽  
Vol 11 (1) ◽  
Barbara Iadarola ◽  
Denise Lavezzari ◽  
Alessandra Modi ◽  
Chiara Degli Esposti ◽  
Cristina Beltrami ◽  

Mummified remains of relevant historical figures are nowadays an important source of information to retrace data concerning their private life and health, especially when historical archives are not available. Next-generation sequencing has proved to be a valuable tool to unravel the characteristics of these individuals through their genetic heritage. Using the strictest criteria currently available for the validation of ancient DNA sequences, whole-genome and whole-exome sequencing data were generated from the mummified remains of an Italian nobleman who died almost 700 years ago, Cangrande della Scala. While genome sequencing could not yield sufficient coverage for in-depth investigation, exome sequencing could overcome the limitations of this approach to achieve significantly high coverage on coding regions, thus allowing the first extensive exome analysis of a mummy genome. Similar to a standard “clinical exome analysis” conducted on modern DNA, an in-depth variant annotation, high-quality filtering and interpretation was performed, leading to the identification of a genotype associated with late-onset Pompe disease (glycogen storage disease type II). This genetic diagnosis was concordant with the limited clinical history available for Cangrande della Scala, who likely represents the earliest known case of this autosomal recessive metabolic disorder.

2021 ◽  
Vol 2021 ◽  
pp. 1-14
Tongxin Wei ◽  
Qingbao Li ◽  
Zhifeng Chen ◽  
Jinjin Liu

Recent works based on deep learning and facial priors have performed well in superresolving severely degraded facial images. However, due to the limitation of illumination, pixels of the monitoring probe itself, focusing area, and human motion, the face image is usually blurred or even deformed. To address this problem, we propose Face Restoration Generative Adversarial Networks to improve the resolution and restore the details of the blurred face. They include the Head Pose Estimation Network, Postural Transformer Network, and Face Generative Adversarial Networks. In this paper, we employ the following: (i) a Swish-B activation function that is used in Face Generative Adversarial Networks to accelerate the convergence speed of the cross-entropy cost function, (ii) a special prejudgment monitor that is added to improve the accuracy of the discriminant, and (iii) the modified Postural Transformer Network that is used with a 3D face reconstruction network to correct faces at different expression pose angles. Our method improves the resolution of face images and performs well in image restoration. We demonstrate that our method can produce high-quality faces and is superior to the most advanced methods on the reconstruction task of blind faces for in-the-wild images; in particular, our 8× SR SSIM and PSNR are, respectively, 0.078 and 1.16 higher than FSRNet in AFLW.

2021 ◽  
Tiago Graf ◽  
Gonzalo Bello ◽  
Felipe Gomes Naveca ◽  
Marcelo Gomes ◽  
Vanessa Leiko Oikawa Cardoso ◽  

The COVID-19 epidemic in Brazil experienced two major country-wide lineage replacements, the first driven by the lineage P.2, formerly classified as variant of interest (VOI) Zeta, in late 2020 and the second by the variant of concern (VOC) Gamma in early 2021. To better understand how these SARS-CoV-2 lineage turnovers occurred in Brazil, we analyzed 11,724 high-quality SARS-CoV-2 whole genomes of samples collected in different country regions between September 2020 and April 2021. Our findings indicate that the spatial dispersion of both variants in Brazil was driven by short and long-distance viral transmission. The lineage P.2 harboring Spike mutation E484K probably emerged around late July 2020 in the Rio de Janeiro (RJ) state, which contributed most (~50%) of the inter-state viral disseminations, and only became locally established in most Brazilian states by October 2020. The VOC Gamma probably arose in November 2020 in the Amazonas (AM) state, which was responsible for 60-70% of the inter-state viral dissemination, and the earliest timing of community transmission of this VOC in many Brazilian states was already traced to December 2020. We estimate that variant Gamma was 1.56-3.06 times more transmissible than variant P.2 co-circulating in RJ and that the median effective reproductive number (Re) of Gamma in RJ and SP states (Re = 1.59-1.91) was lower than in AM (Re = 3.55). In summary, although the epicenter of the lineage P.2 dissemination in Brazil was the heavily interconnected Southeastern region, it displayed a slower rate of spatial spread than the VOC Gamma, which originated in the more isolated Northern Brazilian region. Our findings also support that the VOC Gamma was more transmissible than lineage P.2, although the viral Re of the VOC varied according to the geographic context.
