Outlier detection for questionnaire data in biobanks

2019 ◽  
Vol 48 (4) ◽  
pp. 1305-1315 ◽  
Author(s):  
Rieko Sakurai ◽  
Masao Ueki ◽  
Satoshi Makino ◽  
Atsushi Hozawa ◽  
Shinichi Kuriyama ◽  
...  

Abstract Background: Biobanks increasingly collect, process and store omics data together with more conventional epidemiologic information, which necessitates considerable effort in data cleaning. An efficient outlier detection method that reduces manual labour is highly desirable. Method: We develop an unsupervised machine-learning method for outlier detection, kurPCA, that combines principal component analysis with kurtosis to ascertain the existence of outliers. In addition, we propose a novel regression adjustment approach to improve detection, the regression adjustment for data by systematic missing patterns (RAMP). Result: Application to epidemiological record data in a large-scale biobank (Tohoku Medical Megabank Organization, Japan) shows that a combination of kurPCA and RAMP effectively detects known errors or inconsistent patterns. Conclusions: The simulation and application results confirm that our methods perform well. The proposed methods are useful for many practical analysis scenarios.
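The idea of pairing principal component analysis with kurtosis can be illustrated with a minimal sketch. This is not the authors' kurPCA implementation, and it does not cover RAMP. The function name `kurtosis_pca_flags` and both thresholds are hypothetical choices for illustration. The sketch assumes a complete numeric matrix with no constant columns.

```python
import numpy as np

def kurtosis_pca_flags(X, kurt_threshold=3.0, z_threshold=4.0):
    """Flag rows with extreme scores on heavy-tailed principal components.

    Illustrative only: thresholds are assumptions, not values from the paper.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column (assumes no column is constant).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # PCA via SVD: the rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    keep = s > 1e-10 * s[0]            # drop numerically empty components
    scores = Z @ Vt[keep].T
    std_scores = scores / scores.std(axis=0)
    # Excess kurtosis of each component's scores (about 0 for Gaussian data).
    excess_kurt = (std_scores ** 4).mean(axis=0) - 3.0
    heavy = excess_kurt > kurt_threshold
    # A row is flagged if it is extreme on any heavy-tailed component.
    flags = (np.abs(std_scores[:, heavy]) > z_threshold).any(axis=1)
    return flags, excess_kurt

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(500, 5))
    data[7] = [15.0, -15.0, 15.0, -15.0, 15.0]   # one planted gross error
    flags, kurt = kurtosis_pca_flags(data)
    print("flagged rows:", np.flatnonzero(flags))
```

A single gross error inflates the kurtosis of the component it dominates, which is why kurtosis can signal that outliers exist before any individual record is inspected.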

2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Weiming Kuang ◽  
Shi An ◽  
Huifu Jiang

Large-scale GPS data contain hidden information, and advanced data mining techniques offer the opportunity to discover knowledge in them that may be useful for transportation systems. In major metropolitan cities, many taxicabs are equipped with GPS devices. Because taxis operate continuously for nearly 24 hours per day, they can be used as reliable sensors of the perceived traffic state. In this paper, the entire city was divided into subregions by roads, and taxi GPS data were transformed into traffic flow data to build a traffic flow matrix. In addition, a highly efficient anomaly detection method based on wavelet transform and PCA (principal component analysis) was proposed for detecting anomalous traffic events in urban regions. A traffic anomaly is considered to occur in a subregion when the values of the corresponding indicators deviate significantly from the expected values. This method was evaluated using a GPS dataset generated by more than 15,000 taxis over a period of half a year in Harbin, China. The results show that this detection method is effective and efficient.
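A common way to apply PCA to a traffic flow matrix is to treat the leading components as the expected pattern and to inspect what is left over. The sketch below shows only that residual step, under stated assumptions: rows are time bins, columns are subregions, the wavelet stage described in the abstract is omitted, and the mean-plus-three-sigma cutoff is a hypothetical rule rather than the paper's criterion. The function name `pca_residual_anomalies` is illustrative.

```python
import numpy as np

def pca_residual_anomalies(flow, n_components=2, n_sigma=3.0):
    """Flag time bins whose flow deviates from the low-rank 'normal' pattern.

    flow: array of shape (n_time_bins, n_subregions).
    Returns (anomalous time indices, most deviating subregion per index, SPE).
    """
    flow = np.asarray(flow, dtype=float)
    centered = flow - flow.mean(axis=0)
    # Principal axes of the subregion-by-subregion covariance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    P = Vt[:n_components].T                      # basis of the normal subspace
    residual = centered - centered @ P @ P.T     # part PCA does not explain
    spe = (residual ** 2).sum(axis=1)            # squared prediction error
    threshold = spe.mean() + n_sigma * spe.std() # assumed, simple cutoff
    times = np.flatnonzero(spe > threshold)
    regions = np.abs(residual[times]).argmax(axis=1)
    return times, regions, spe

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(240)                           # ten days of hourly bins
    a = np.array([1.0, 0.8, 0.6, 1.2, 0.9, 0.7])
    b = np.array([0.5, -0.4, 0.9, -0.2, 0.6, -0.8])
    flow = (100 + 40 * np.outer(np.sin(2 * np.pi * t / 24), a)
            + 20 * np.outer(np.cos(2 * np.pi * t / 24), b)
            + rng.normal(scale=2.0, size=(240, 6)))
    flow[120, 3] += 50                           # synthetic incident
    times, regions, _ = pca_residual_anomalies(flow)
    print("anomalous bins:", times, "subregions:", regions)
```

One caveat of this design: if `n_components` is set too high, a large anomaly can be absorbed into the normal subspace and go undetected.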


Author(s):  
Pooja Prabhu ◽  
A. K. Karunakar ◽  
Sanjib Sinha ◽  
N. Mariyappa ◽  
G. K. Bhargava ◽  
...  

Abstract In a general scenario, brain images acquired with magnetic resonance imaging (MRI) may be tilted, which distorts the brain MR images. This tilt may result in misalignment during image registration for medical applications. Manually correcting (or estimating) the tilt on a large scale is time-consuming, expensive, and requires expertise in brain anatomy. Thus, there is a need for an automatic way of performing tilt correction in three orthogonal directions (X, Y, Z). The proposed work aims to correct the tilt automatically by measuring the pitch angle, yaw angle, and roll angle about the X-axis, Z-axis, and Y-axis, respectively. For correction of the tilt around the Z-axis (pointing in the superior direction), image processing techniques, principal component analysis, and similarity measures are used. Tilt around the X-axis (pointing to the right) is corrected using morphological operations, and tilt around the Y-axis (pointing in the anterior direction) is corrected using orthogonal regression. The proposed approach was applied to adjust the tilt observed in T1- and T2-weighted MR images. The simulation study with the proposed algorithm yielded an error of 0.40 ± 0.09° and outperformed the approaches in existing studies. The tilt angles obtained with the proposed algorithm were 6.2 ± 3.94°, 2.35 ± 2.61°, and 5 ± 4.36° in the X-, Z-, and Y-directions, respectively. The proposed work corrects the tilt more accurately and robustly than existing studies.
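As a rough illustration of how principal component analysis can yield an in-plane rotation estimate, the sketch below computes the major axis of a binary foreground mask and reports its angle from the vertical image axis. It is not the authors' pipeline: the segmentation step, the similarity measures, and the handling of the other two axes are all omitted, and the function name `principal_axis_angle` and the sign convention are assumptions.

```python
import numpy as np

def principal_axis_angle(mask):
    """Angle in degrees between a mask's major axis and the vertical axis.

    Positive angles mean the axis leans toward increasing column index as
    the row index increases (image coordinates). Result lies in (-90, 90].
    """
    rows, cols = np.nonzero(mask)
    coords = np.column_stack([cols, rows]).astype(float)  # (x, y) pairs
    coords -= coords.mean(axis=0)
    cov = np.cov(coords, rowvar=False)
    # eigh returns eigenvalues in ascending order; the last column of the
    # eigenvector matrix is therefore the major axis.
    _, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, -1]
    angle = np.degrees(np.arctan2(major[0], major[1]))
    # An axis has no direction, so fold the angle into (-90, 90].
    if angle > 90.0:
        angle -= 180.0
    elif angle <= -90.0:
        angle += 180.0
    return float(angle)

if __name__ == "__main__":
    # Synthetic elongated ellipse tilted by 10 degrees from vertical.
    th = np.radians(10.0)
    yy, xx = np.mgrid[-100:101, -100:101]
    u = xx * np.sin(th) + yy * np.cos(th)   # coordinate along the major axis
    v = xx * np.cos(th) - yy * np.sin(th)   # coordinate along the minor axis
    mask = (u / 80.0) ** 2 + (v / 40.0) ** 2 <= 1.0
    print("estimated tilt (deg):", principal_axis_angle(mask))
```

The estimate is only meaningful when the foreground is clearly elongated; for a nearly circular mask the major axis is ill-defined.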


2021 ◽  
Vol 503 (1) ◽  
pp. 270-291
Author(s):  
F Navarete ◽  
A Damineli ◽  
J E Steiner ◽  
R D Blum

ABSTRACT W33A is a well-known example of a high-mass young stellar object showing evidence of a circumstellar disc. We revisited the K-band NIFS/Gemini North observations of the W33A protostar using principal components analysis tomography and additional post-processing routines. Our results indicate the presence of a compact rotating disc based on the kinematics of the CO absorption features. The position–velocity diagram shows that the disc exhibits a rotation curve with velocities that rapidly decrease for radii larger than 0.1 arcsec (∼250 au) from the central source, suggesting a structure about four times more compact than previously reported. We derived a dynamical mass of $10.0^{+4.1}_{-2.2}\,\mathrm{M}_\odot$ for the ‘disc + protostar’ system, ∼33 per cent smaller than previously reported, but still compatible with high-mass protostar status. A relatively compact $\mathrm{H}_2$ wind was identified at the base of the large-scale outflow of W33A, with a mean visual extinction of ∼63 mag. By taking advantage of supplementary near-infrared maps, we identified at least two other point-like objects driving extended structures in the vicinity of W33A, suggesting that multiple active protostars are located within the cloud. The closest object (Source B) was also identified in the NIFS field of view as a faint point-like object at a projected distance of ∼7000 au from W33A, powering extended K-band continuum emission detected in the same field. Another source (Source C) is driving a bipolar $\mathrm{H}_2$ jet aligned perpendicular to the rotation axis of W33A.


2021 ◽  
Vol 13 (10) ◽  
pp. 5359
Author(s):  
Afrika Onguko Okello ◽  
Jonathan Makau Nzuma ◽  
David Jakinda Otieno ◽  
Michael Kidoido ◽  
Chrysantus Mbi Tanga

The utilization of insect-based feeds (IBF) as an alternative protein source is increasingly gaining momentum worldwide owing to recent concerns over the impact of food systems on the environment. However, large-scale adoption of IBF will depend on farmers’ acceptance of its key qualities. This study evaluates farmers’ perceptions of commercial IBF products and assesses the factors that would influence their adoption. It employs principal component analysis (PCA) to develop perception indices that are subsequently used in multiple regression analysis of survey data collected from a sample of 310 farmers. Over 90% of the farmers were ready and willing to use IBF. The PCA identified feed performance, social acceptability of the use of insects in feed formulation, feed versatility and marketability of livestock products reared on IBF as the key attributes that would inform farmers’ purchase decisions. Awareness of IBF attributes, group membership, off-farm income, wealth status and education significantly influenced farmers’ perceptions of IBF. Interventions such as experimental demonstrations that increase farmers’ technical knowledge of the productivity of livestock fed on IBF are crucial to reducing farmers’ uncertainties about the acceptability of IBF. Public partnerships with resource-endowed farmers and farmer groups are recommended to improve knowledge sharing on IBF.


2011 ◽  
Vol 24 (13) ◽  
pp. 3457-3468 ◽  
Author(s):  
Keyan Fang ◽  
Xiaohua Gou ◽  
Fahu Chen ◽  
Edward Cook ◽  
Jinbao Li ◽  
...  

Abstract This preliminary study explores a point-by-point spatial precipitation reconstruction for northwestern (NW) China, based on a tree-ring network of 132 chronologies. Precipitation variations during the past ~200–400 yr (the common reconstruction period is from 1802 to 1990) are reconstructed for 26 stations in NW China from a nationwide 160-station dataset. The authors introduce a “search spatial correlation contour” method to locate candidate tree-ring predictors for the reconstruction data of a given climate station. Calibration and verification results indicate that most precipitation reconstruction models are acceptable, except for a few reconstructions (stations Hetian, Hami, Jiuquan, and Wuwei) with degraded quality. Additionally, the authors compare four spatial precipitation factors in the instrumental records and reconstructions derived from a rotated principal component analysis (RPCA). The northern and southern Xinjiang factors from the instrumental and reconstructed data agree well with each other. However, differences in spatial patterns between the instrumental and reconstructed data are also found for the other two factors, which probably result from the relatively poor quality of a few stations. Major drought events documented in previous studies (for example, from the 1920s through the 1930s for the eastern part of NW China) are reconstructed in this study.

