Fast Numerical Optimization for Genome Sequencing Data in Population Biobanks

AbstractWe develop two efficient solvers for optimization problems arising from large-scale regularized regressions on millions of genetic variants sequenced from hundreds of thousands of individuals. These genetic variants are encoded by the values in the set {0, 1, 2, NA}. We take advantage of this fact and use two bits to represent each entry in a genetic matrix, which reduces memory requirement by a factor of 32 compared to a double precision floating point representation. Using this representation, we implemented an iteratively reweighted least square algorithm to solve Lasso regressions on genetic matrices, which we name snpnet-2.0. When the dataset contains many rare variants, the predictors can be encoded in a sparse matrix. We utilize the sparsity in the predictor matrix to further reduce memory requirement and computational speed. Our sparse genetic matrix implementation uses both the compact 2-bit representation and a simplified version of compressed sparse block format so that matrix-vector multiplications can be effectively parallelized on multiple CPU cores. To demonstrate the effectiveness of this representation, we implement an accelerated proximal gradient method to solve group Lasso on these sparse genetic matrices. This solver is named sparse-snpnet, and will also be included as part of snpnet R package. Our implementation is able to solve group Lasso problems on sparse genetic matrices with more than 1, 000, 000 columns and almost 100, 000 rows within 10 minutes and using less than 32GB of memory.

Download Full-text

Variant calling and quality control of large-scale human genome sequencing data

Emerging Topics in Life Sciences ◽

10.1042/etls20190007 ◽

2019 ◽

Vol 3 (4) ◽

pp. 399-409 ◽

Cited By ~ 1

Author(s):

Brandon Jew ◽

Jae Hoon Sul

Keyword(s):

Quality Control ◽

Genome Sequencing ◽

Genetic Variants ◽

Large Scale ◽

Variant Calling ◽

Sequencing Data ◽

Computational Approaches ◽

Sequencing Errors ◽

Human Genome Sequencing ◽

Number Of Individuals

Abstract Next-generation sequencing has allowed genetic studies to collect genome sequencing data from a large number of individuals. However, raw sequencing data are not usually interpretable due to fragmentation of the genome and technical biases; therefore, analysis of these data requires many computational approaches. First, for each sequenced individual, sequencing data are aligned and further processed to account for technical biases. Then, variant calling is performed to obtain information on the positions of genetic variants and their corresponding genotypes. Quality control (QC) is applied to identify individuals and genetic variants with sequencing errors. These procedures are necessary to generate accurate variant calls from sequencing data, and many computational approaches have been developed for these tasks. This review will focus on current widely used approaches for variant calling and QC.

Download Full-text

Fast sparse image reconstruction method in through-the-wall radars using limited memory Broyden–Fletcher–Goldfarb–Shanno algorithm

International Journal of Microwave and Wireless Technologies ◽

10.1017/s1759078721000866 ◽

2021 ◽

pp. 1-11

Author(s):

Candida Mwisomba ◽

Abdi T. Abdalla ◽

Idrissa Amour ◽

Florian Mkemwa ◽

Baraka Maiseli

Keyword(s):

Image Reconstruction ◽

Large Scale ◽

Optimization Problems ◽

Classical Method ◽

Computational Cost ◽

Least Square ◽

Computational Time ◽

Reconstruction Method ◽

Nonlinear Methods ◽

Limited Memory

Abstract Compressed sensing allows recovery of image signals using a portion of data – a technique that has drastically revolutionized the field of through-the-wall radar imaging (TWRI). This technique can be accomplished through nonlinear methods, including convex programming and greedy iterative algorithms. However, such (nonlinear) methods increase the computational cost at the sensing and reconstruction stages, thus limiting the application of TWRI in delicate practical tasks (e.g. military operations and rescue missions) that demand fast response times. Motivated by this limitation, the current work introduces the use of a numerical optimization algorithm, called Limited Memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS), to the TWRI framework to lower image reconstruction time. LBFGS, a well-known Quasi-Newton algorithm, has traditionally been applied to solve large scale optimization problems. Despite its potential applications, this algorithm has not been extensively applied in TWRI. Therefore, guided by LBFGS and using the Euclidean norm, we employed the regularized least square method to solve the cost function of the TWRI problem. Simulation results show that our method reduces the computational time by 87% relative to the classical method, even under situations of increased number of targets or large data volume. Moreover, the results show that the proposed method remains robust when applied to noisy environment.

Download Full-text

Trajectory Optimization Using Analytical Target Cascading

Journal of Mechanical Design ◽

10.1115/1.4037714 ◽

2017 ◽

Vol 139 (12) ◽

Author(s):

Xiang Li ◽

Xiaonpeng Wang ◽

Houjun Zhang ◽

Yuheng Guo

Keyword(s):

Trajectory Optimization ◽

Large Scale ◽

Optimization Problems ◽

Sparse Matrix ◽

Functional Dependence ◽

Analytical Target Cascading ◽

Scale Problem ◽

Direct Collocation ◽

Target Cascading ◽

Direct Collocation Method

In the previous reports, analytical target cascading (ATC) is generally applied to product optimization. In this paper, the application area of ATC is expanded to trajectory optimization. Direct collocation method is utilized to convert a trajectory optimization into a nonlinear programing (NLP) problem. The converted NLP is a large-scale problem with sparse matrix of functional dependence table (FDT) suitable for the application of ATC. Three numerical case studies are provided to show the effects of ATC in solving trajectory optimization problems.

Download Full-text

New genetic variants associated with major adverse cardiovascular events in patients with acute coronary syndromes and treated with clopidogrel and aspirin

The Pharmacogenomics Journal ◽

10.1038/s41397-021-00245-5 ◽

2021 ◽

Author(s):

Xiaomin Liu ◽

Hanshi Xu ◽

Huaiqian Xu ◽

Qingshan Geng ◽

Wai-Ho Mak ◽

...

Keyword(s):

Genetic Variants ◽

Acute Coronary Syndromes ◽

Cardiovascular Events ◽

Large Scale ◽

Predictive Performance ◽

Coronary Intervention ◽

Major Adverse Cardiovascular Events ◽

Sequencing Data ◽

Coronary Syndromes ◽

High Depth

AbstractAlthough a few studies have reported the effects of several polymorphisms on major adverse cardiovascular events (MACE) in patients with acute coronary syndromes (ACS) and those undergoing percutaneous coronary intervention (PCI), these genotypes account for only a small fraction of the variation and evidence is insufficient. This study aims to identify new genetic variants associated with MACE end point during the 18-month follow-up period by a two-stage large-scale sequencing data, including high-depth whole exome sequencing of 168 patients in the discovery cohort and high-depth targeted sequencing of 1793 patients in the replication cohort. We discovered eight new genotypes and their genes associated with MACE in patients with ACS, including MYOM2 (rs17064642), WDR24 (rs11640115), NECAB1 (rs74569896), EFR3A (rs4736529), AGAP3 (rs75750968), ZDHHC3 (rs3749187), ECHS1 (rs140410716), and KRTAP10-4 (rs201441480). Notably, the expressions of MYOM2 and ECHS1 are downregulated in both animal models and patients with phenotypes related to MACE. Importantly, we developed the first superior classifier for predicting 18-month MACE and achieved high predictive performance (AUC ranged between 0.92 and 0.94 for three machine-learning methods). Our findings shed light on the pathogenesis of cardiovascular outcomes and may help the clinician to make a decision on the therapeutic intervention for ACS patients.

Download Full-text

New genetic variants associated with major adverse cardiovascular events in patients with acute coronary syndromes and treated with clopidogrel and aspirin

10.1101/411165 ◽

2018 ◽

Cited By ~ 1

Author(s):

Xiaomin Liu ◽

Hanshi Xu ◽

Huaiqian Xu ◽

Qingshan Geng ◽

Wai-Ho Mak ◽

...

Keyword(s):

Genetic Variants ◽

Acute Coronary Syndromes ◽

Cardiovascular Events ◽

Large Scale ◽

Predictive Accuracy ◽

Major Adverse Cardiovascular Events ◽

Sequencing Data ◽

Clinical Trial Registry ◽

Coronary Syndromes ◽

High Depth

AbstractImportanceAlthough a few studies have reported the effects of several polymorphisms on major adverse cardiovascular events (MACE) in patients with acute coronary syndromes (ACS) and those undergoing percutaneous coronary intervention (PCI), these genotypes account for only a small fraction of the variation and evidence is insufficient. This study aims to identify new genetic variants associated with MACE by large-scale sequencing data.ObjectiveTo identify the genetic variants that caused MACE.DesignAll patients in this study were allocated to dual antiplatelet therapy for up to 12 months and have the follow-up duration of 18 months.SettingA two-stage association study was performed.ParticipantsWe evaluated the associations of genetic variants and MACE in 1961 patients with ACS undergoing PCI (2009-2012), including high-depth whole exome sequencing of 168 patients in the discovery cohort and high-depth targeted sequencing of 1793 patients in the replication cohort.Main Outcomes and MeasureThe primary clinical efficacy endpoint was the major adverse cardiovascular events (MACE) composite endpoint, including cardiovascular death, myocardial infarction (MI), stroke (CT or MR scan confirmed) and repeated revascularization (RR).ResultsWe discovered and confirmed six new genotypes associated with MACE in patients with ACS. Of which, rs17064642 at MYOM2 increased the risk of MACE (hazard ratio [HR] 2.76; P = 2.95 × 10-9) and reached genome-wide significance. The other five suggestive variants were KRTAP10-4 (rs201441480), WDR24 (rs11640115), ECHS1 (rs140410716), AGAP3 (rs75750968) and NECAB1 (rs74569896). Notably, the expressions of MYOM2 and ECHS1 are down-regulated in both animal models and patients with phenotypes related to MACE. Importantly, we developed the first superior classifier for predicting MACE and achieved high predictive accuracy (0.809).Conclusions and RelevanceWe identified six new genotypes associated with MACE and developed a superior classifier for predicting MACE. Our findings shed light on the pathogenesis of cardiovascular outcomes and may help clinician to make decision on the therapeutic intervention for ACS patients.Trial RegistrationThis study has been registered in the Chinese Clinical Trial Registry (http://www.chictr.org.cn, Registration number: ChiCTR-OCH-11001198).

Download Full-text

Large Scale Plant Optimization Problems

10.23919/acc.1988.4790048 ◽

1988 ◽

Cited By ~ 1

Author(s):

Paul Cronin ◽

Harry Woerde ◽

Rob Vasbinder

Keyword(s):

Large Scale ◽

Optimization Problems ◽

Plant Optimization

Download Full-text

Plasmids or no plasmids? A comparison between the agilent TapeStation and whole-genome sequencing data in a large-scale bacterial sequencing project

10.26226/morressier.56d5ba27d462b80296c95fe7 ◽

2016 ◽

Author(s):

Sarah Alexander

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Project

Download Full-text

Double Precision Is Not Needed for Many-Body Calculations: New Conventional Wisdom

10.26434/chemrxiv.6104804.v1 ◽

2018 ◽

Author(s):

Pavel Pokhilko ◽

Evgeny Epifanovsky ◽

Anna I. Krylov

Keyword(s):

Large Scale ◽

Computation Time ◽

Coupled Cluster ◽

Double Precision ◽

Many Body ◽

Single Precision ◽

Parallel Performance ◽

Point Representation ◽

Electron Repulsion Integrals ◽

Cluster Methods

Using single precision floating point representation reduces the size of data and computation time by a factor of two relative to double precision conventionally used in electronic structure programs. For large-scale calculations, such as those encountered in many-body theories, reduced memory footprint alleviates memory and input/output bottlenecks. Reduced size of data can lead to additional gains due to improved parallel performance on CPUs and various accelerators. However, using single precision can potentially reduce the accuracy of computed observables. Here we report an implementation of coupled-cluster and equation-of-motion coupled-cluster methods with single and double excitations in single precision. We consider both standard implementation and one using Cholesky decomposition or resolution-of-the-identity of electron-repulsion integrals. Numerical tests illustrate that when single precision is used in correlated calculations, the loss of accuracy is insignificant and pure single-precision implementation can be used for computing energies, analytic gradients, excited states, and molecular properties. In addition to pure single-precision calculations, our implementation allows one to follow a single-precision calculation by clean-up iterations, fully recovering double-precision results while retaining significant savings.

Download Full-text