functional linear model
Recently Published Documents


Total documents: 53 (five years: 21)
H-index: 10 (five years: 2)

2021. Author(s): Hongzhi Tong

Abstract: To cope with the challenges of memory bottlenecks and algorithmic scalability when massive data sets are involved, we propose a distributed least squares procedure in the framework of the functional linear model and reproducing kernel Hilbert spaces. This approach divides the big data set into multiple subsets, applies regularized least squares regression to each of them, and then averages the individual outputs as a final prediction. We establish non-asymptotic prediction error bounds for the proposed learning strategy under some regularity conditions. When the target function has only weak regularity, we also introduce some unlabelled data to construct a semi-supervised approach that enlarges the number of partitioned subsets. The results in the present paper provide a theoretical guarantee that the distributed algorithm can achieve the optimal rate of convergence while allowing the whole data set to be partitioned into a large number of subsets for parallel processing.
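The divide, fit and average steps described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the curves are discretized on a regular grid, and a plain ridge penalty stands in for the reproducing kernel Hilbert space regularization. The function names `ridge_fit` and `distributed_fit` and all simulated data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam, dt):
    """Regularized least squares for a discretized functional linear model.

    Rows of X are curves sampled on a common grid with spacing dt, so the
    integral of x(t) * beta(t) is approximated by dt * (X @ beta).
    Assumption: a ridge penalty replaces the RKHS penalty of the paper.
    """
    Z = X * dt                                   # quadrature-weighted design matrix
    n, p = Z.shape
    # Solve (Z'Z / n + lam * I) beta = Z'y / n
    return np.linalg.solve(Z.T @ Z / n + lam * np.eye(p), Z.T @ y / n)

def distributed_fit(X, y, n_subsets, lam, dt):
    """Fit each subset separately, then average the coefficient estimates."""
    blocks = np.array_split(np.arange(len(y)), n_subsets)
    betas = [ridge_fit(X[idx], y[idx], lam, dt) for idx in blocks]
    return np.mean(betas, axis=0)

# Simulated data (hypothetical): curves are random mixtures of sine modes.
rng = np.random.default_rng(0)
n, p = 2000, 50
t = np.linspace(0.0, 1.0, p)
dt = t[1] - t[0]
basis = np.array([np.sin((k + 1) * np.pi * t) for k in range(5)])
X = rng.normal(size=(n, 5)) @ basis
beta_true = np.sin(np.pi * t)
y = dt * (X @ beta_true) + 0.05 * rng.normal(size=n)

beta_full = ridge_fit(X, y, lam=1e-4, dt=dt)
beta_dist = distributed_fit(X, y, n_subsets=10, lam=1e-4, dt=dt)

# Largest gap between full-data and distributed predictions on the sample
pred_gap = np.max(np.abs(dt * X @ (beta_full - beta_dist)))
print(pred_gap)
```

In this toy setting the averaged estimator predicts almost identically to the full-data fit, while each subset fit only needs its own block of the data in memory and the fits can run in parallel.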


2021, Vol 11 (1). Author(s): Weiwei Xiao, Yixuan Wang, Haiyan Liu

Abstract: In this paper, a generalized partially functional linear regression model is proposed and the asymptotic property of the estimated coefficients in the model is established. The results of extensive simulation experiments are consistent with the theoretical result. Finally, two application examples of the model are given. One is a sleep quality study, in which we studied the effects of heart rate, percentage of sleep time on total sleep in bed, wake after sleep onset and number of awakenings during the night on sleep quality in 22 healthy people. The other is a mortality rate study, in which we studied the effects of air quality index, temperature, relative humidity, GDP per capita and the number of beds per thousand people on the mortality rate across 80 major cities in China.


2021. Author(s): Weiwei Xiao, Yixuan Wang, Haiyan Liu

Abstract: In this paper, we propose a generalized functional linear regression model with multiple scalar and functional predictors. We develop maximum likelihood estimators for the regression coefficients. For the functional predictors, we adopt functional principal component analysis to reduce their dimensions. We then propose the generalized auto-covariance operator, based on which an appropriate measure that quantifies the difference between the estimators and their true values is established. The asymptotic joint distribution of the estimated regression functions is proved. For the scalar predictors, we establish a distance between the estimated value and the true value, and prove the asymptotic property of the estimated regression coefficients. The results of extensive simulation experiments are consistent with the theoretical result. Finally, two application examples of the model are given. One is a sleep quality study, in which we studied the effects of heart rate, percentage of sleep time on total sleep in bed, wake after sleep onset and number of awakenings during the night on sleep quality in 22 healthy people. The other is a mortality rate study, in which we studied the effects of air quality index, temperature, relative humidity, GDP per capita and the number of beds per thousand people on the mortality rate across 80 major cities in China.
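The dimension-reduction step mentioned in this abstract, functional principal component analysis followed by regression on the scores together with scalar predictors, can be sketched as below. This is an illustration under assumptions, not the authors' estimator: curves are observed on a regular grid, the response is Gaussian (so maximum likelihood reduces to ordinary least squares), and the helper `fpca_scores` and the simulated data are hypothetical.

```python
import numpy as np

def fpca_scores(X, dt, n_components):
    """FPCA scores and eigenfunctions for curves sampled on a regular grid."""
    Xc = X - X.mean(axis=0)                      # center the curves
    # Right singular vectors of the centered data are discretized eigenfunctions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    phi = Vt[:n_components] / np.sqrt(dt)        # rescale so each phi_k has unit L2 norm
    scores = Xc @ phi.T * dt                     # quadrature approximation of <x_i - mean, phi_k>
    return scores, phi

# Simulated data (hypothetical): two smooth modes plus small noise.
rng = np.random.default_rng(1)
n, p = 300, 101
t = np.linspace(0.0, 1.0, p)
dt = t[1] - t[0]
X = (rng.normal(size=(n, 1)) * 2.0 * np.sin(2 * np.pi * t)
     + rng.normal(size=(n, 1)) * 0.5 * np.cos(2 * np.pi * t)
     + 0.01 * rng.normal(size=(n, p)))
z = rng.normal(size=n)                           # one scalar predictor

scores, phi = fpca_scores(X, dt, n_components=2)

# Gaussian special case: least squares on [intercept, scalar predictor, scores]
y = 1.0 + 0.5 * z + 2.0 * scores[:, 0] + 0.1 * rng.normal(size=n)
design = np.column_stack([np.ones(n), z, scores])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef)
```

For a non-Gaussian response, the least squares call would be replaced by a generalized linear model fit on the same design matrix; the FPCA step is unchanged.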


Stats, 2021, Vol 4 (3), pp. 550-576. Author(s): Jiayu Huang, Jie Yang, Zhangrong Gu, Wei Zhu, Song Wu

In genome-wide association studies (GWAS), efficient incorporation of linkage disequilibria (LD) among densely typed genetic variants into association analysis is a critical yet challenging problem. Functional linear models (FLM), which impose a smoothing structure on the coefficients of correlated covariates, are advantageous in genetic mapping of multiple variants with high LD. Here we propose a novel constrained generalized FLM (cGFLM) framework to perform simultaneous association tests on a block of linked SNPs with various trait types, including continuous, binary and zero-inflated count phenotypes. The new cGFLM applies a set of inequality constraints on the FLM to ensure model identifiability under different genetic codings. The method is implemented via B-splines, and an augmented Lagrangian algorithm is employed for parameter estimation. For hypothesis testing, a test statistic that accounts for the model constraints is derived; it follows a mixture of chi-square distributions. Simulation results show that cGFLM is effective in identifying causal loci and gene clusters compared to several competing methods based on single markers and SKAT-C. We applied the proposed method to analyze a candidate gene-based COGEND study and a large-scale GWAS data set on dental caries risk.


2021, Vol 18 (176). Author(s): Jack D. Hywood, Gregory Rice, Sophie V. Pageon, Mark N. Read, Maté Biro

Swarming has been observed in various biological systems from collective animal movements to immune cells. In the cellular context, swarming is driven by the secretion of chemotactic factors. Despite the critical role of chemotactic swarming, few methods to robustly identify and quantify this phenomenon exist. Here, we present a novel method for the analysis of time series of positional data generated from realizations of agent-based processes. We convert the positional data for each individual time point to a function measuring agent aggregation around a given area of interest, hence generating a functional time series. The functional time series, and a more easily visualized swarming metric of agent aggregation derived from these functions, provide useful information regarding the evolution of the underlying process over time. We extend our method to build upon the modelling of collective motility using drift–diffusion partial differential equations (PDEs). Using a functional linear model, we are able to use the functional time series to estimate the drift and diffusivity terms associated with the underlying PDE. By producing an accurate estimate for the drift coefficient, we can infer the strength and range of attraction or repulsion exerted on agents, as in chemotaxis. Our approach relies solely on using agent positional data. The spatial distribution of diffusing chemokines is not required, nor do individual agents need to be tracked over time. We demonstrate our approach using random walk simulations of chemotaxis and experiments investigating cytotoxic T cells interacting with tumouroids.


Sensors, 2020, Vol 20 (21), pp. 6394. Author(s): William F. Fadel, Jacek K. Urbanek, Nancy W. Glynn, Jaroslaw Harezlak

Various methods exist to measure physical activity. Subjective methods, such as diaries and surveys, are relatively inexpensive ways of measuring one’s physical activity; however, they are prone to measurement error and bias due to self-reporting. Wearable accelerometers offer a non-invasive and objective measure of one’s physical activity and are now widely used in observational studies. Accelerometers record high-frequency data, and each produces an unlabeled time series at the sub-second level. An important activity to identify from the collected data is walking, since it is often the only form of activity for certain populations. Currently, most methods use an activity summary, which ignores the nuances of walking data. We propose a methodology to model specific continuous responses with a functional linear model that uses spectra obtained from the local fast Fourier transform (FFT) of walking as a predictor. We incorporate prior knowledge of the mechanics of walking as additional information for the structure of our transformed walking spectra. The methods were applied to the in-the-laboratory data obtained from the Developmental Epidemiologic Cohort Study (DECOS).
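As a rough illustration of how a spectrum from a local FFT could serve as a functional predictor, the sketch below averages windowed magnitude spectra of a simulated acceleration signal. The windowing scheme, the Hann taper, the function name `local_spectrum` and the simulated 1.8 Hz stride frequency are all assumptions made for this example; the paper's own transform may differ.

```python
import numpy as np

def local_spectrum(signal, fs, window_sec):
    """Average magnitude spectrum over non-overlapping windows of a 1-D signal."""
    win = int(fs * window_sec)
    n_windows = len(signal) // win
    segments = signal[: n_windows * win].reshape(n_windows, win)
    segments = segments - segments.mean(axis=1, keepdims=True)   # remove the offset per window
    taper = np.hanning(win)                                      # reduce spectral leakage
    mags = np.abs(np.fft.rfft(segments * taper, axis=1))
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    return freqs, mags.mean(axis=0)

# Simulated walking-like signal (hypothetical): 1.8 Hz fundamental plus one harmonic.
fs = 100.0
t = np.arange(0, 30, 1 / fs)
rng = np.random.default_rng(3)
accel = (np.sin(2 * np.pi * 1.8 * t)
         + 0.3 * np.sin(2 * np.pi * 3.6 * t)
         + 0.1 * rng.normal(size=t.size))

freqs, spectrum = local_spectrum(accel, fs, window_sec=5.0)
peak_hz = freqs[np.argmax(spectrum)]   # dominant frequency of the averaged spectrum
print(peak_hz)
```

The resulting curve, magnitude as a function of frequency, is one observation of a functional covariate; stacking one such curve per subject gives the design for a scalar-on-function regression.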


Stats, 2020, Vol 3 (4), pp. 510-525. Author(s): Eduardo L. Montoya

In a functional linear model (FLM) with scalar response, the parameter curve quantifies the relationship between a functional explanatory variable and a scalar response. While these models can be ill-posed, a penalized regression spline approach may be used to obtain an estimate of the parameter curve. The penalized regression spline estimate depends on the value of a smoothing parameter. However, the ability to obtain a reasonable parameter curve estimate relies on how much information is present in the covariate functions for estimating the parameter curve. We propose to quantify the information present in the covariate functions to estimate the parameter curve. In addition, we examine the influence of this information on the stability of the parameter curve estimator and on the performance of smoothing parameter selection methods in an FLM with a scalar response.
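The dependence of the estimate on the smoothing parameter can be seen in a small sketch. As a simplifying assumption, the parameter curve is represented by its values on the observation grid and penalized through second differences, which stands in for the penalized regression spline of the abstract. The names `penalized_flm` and `roughness` and the simulated Brownian-motion-like covariates are hypothetical.

```python
import numpy as np

def penalized_flm(X, y, lam, dt):
    """Estimate beta(t) on the grid with a second-difference roughness penalty."""
    Z = X * dt                                   # quadrature weights for the integral
    p = Z.shape[1]
    D2 = np.diff(np.eye(p), n=2, axis=0)         # (p - 2) x p second-difference matrix
    # Minimise ||y - Z b||^2 + lam * ||D2 b||^2
    return np.linalg.solve(Z.T @ Z + lam * (D2.T @ D2), Z.T @ y)

def roughness(b):
    """Sum of squared second differences of the estimated curve."""
    return float(np.sum(np.diff(b, n=2) ** 2))

# Simulated data (hypothetical): Brownian-motion-like covariate curves.
rng = np.random.default_rng(2)
n, p = 200, 60
t = np.linspace(0.0, 1.0, p)
dt = t[1] - t[0]
X = np.cumsum(rng.normal(size=(n, p)), axis=1) / np.sqrt(p)
beta_true = np.sin(np.pi * t)
y = dt * (X @ beta_true) + 0.05 * rng.normal(size=n)

beta_rough = penalized_flm(X, y, lam=1e-6, dt=dt)    # almost unpenalized
beta_smooth = penalized_flm(X, y, lam=10.0, dt=dt)   # heavily smoothed
print(roughness(beta_rough), roughness(beta_smooth))
```

A very small smoothing parameter leaves the ill-posed least squares problem nearly unregularized and yields a wiggly curve, while a large value forces the estimate toward a straight line; selection methods aim for a value between these extremes.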


Biostatistics, 2020. Author(s): Yang Li, Fan Wang, Mengyun Wu, Shuangge Ma

Summary In recent biomedical research, genome-wide association studies (GWAS) have demonstrated great success in investigating the genetic architecture of human diseases. For many complex diseases, multiple correlated traits have been collected. However, most of the existing GWAS are still limited because they analyze each trait separately without considering their correlations and suffer from a lack of sufficient information. Moreover, the high dimensionality of single nucleotide polymorphism (SNP) data still poses tremendous challenges to statistical methods, in both theoretical and practical aspects. In this article, we innovatively propose an integrative functional linear model for GWAS with multiple traits. This study is the first to approximate SNPs as functional objects in a joint model of multiple traits with penalization techniques. It effectively accommodates the high dimensionality of SNPs and correlations among multiple traits to facilitate information borrowing. Our extensive simulation studies demonstrate the satisfactory performance of the proposed method in the identification and estimation of disease-associated genetic variants, compared to four alternatives. The analysis of type 2 diabetes data leads to biologically meaningful findings with good prediction accuracy and selection stability.

