Optimal Cluster Analysis for Objective Regionalization of Seasonal Precipitation in Regions of High Spatial–Temporal Variability: Application to Western Ethiopia

2016 ◽  
Vol 29 (10) ◽  
pp. 3697-3717 ◽  
Author(s):  
Ying Zhang ◽  
Semu Moges ◽  
Paul Block

Abstract Defining homogeneous precipitation regions is fundamental for hydrologic applications, yet nontrivial, particularly for regions with highly varied spatial–temporal patterns. Traditional approaches typically include aspects of subjective delineation around sparsely distributed precipitation stations. Here, hierarchical and nonhierarchical (k-means) clustering techniques applied to a gridded dataset are evaluated for objective, automatic delineation. In a spatial sensitivity analysis test, the k-means clustering method produces much more stable cluster boundaries. To identify a reasonable optimal k, various performance indicators are evaluated, including the within-cluster sum of squared errors (WSS) metric, intra- and intercluster correlations, and postvisualization. Two new objective selection metrics (difference in minimum WSS and difference in difference) are developed, based on the elbow method and gap statistics, respectively, to determine k within a desired range. Consequently, eight homogeneous regions are defined with relatively clear and smooth boundaries, as well as low intercluster correlations and high intracluster correlations. The underlying physical mechanisms for the regionalization outcomes not only help justify the optimal number of clusters selected, but also prove informative in understanding the local- and large-scale climate factors affecting Ethiopian summertime precipitation. A principal component linear regression model used to produce cluster-level seasonal forecasts also proves skillful.
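The k-means / WSS / elbow workflow named above can be illustrated with a minimal pure-Python sketch. It is not the authors' code. Assumptions: Lloyd's algorithm with a deterministic farthest-point seeding (swapped in here for reproducibility), toy 2-D points instead of gridded precipitation series, and a generic "largest second difference of WSS" elbow rule. This is not the paper's "difference in minimum WSS" or "difference in difference" metric, whose exact definitions the abstract does not give.

```python
"""Illustrative k-means + WSS elbow sketch (pure Python, toy data only)."""
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding (an assumption, not the paper's choice): start at
    the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_point_init(points, k)
    labels: List[int] = []
    dim = len(points[0])
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
    return labels, centroids


def wss(points: List[Point], labels: List[int], centroids: List[Point]) -> float:
    """Within-cluster sum of squared errors."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def wss_curve(points: List[Point], k_max: int) -> Dict[int, float]:
    """WSS for k = 1..k_max."""
    curve = {}
    for k in range(1, k_max + 1):
        labels, centroids = kmeans(points, k)
        curve[k] = wss(points, labels, centroids)
    return curve


def elbow_k(curve: Dict[int, float]) -> int:
    """Pick the k where the WSS drop into k most exceeds the drop out of k."""
    drop = {k: curve[k - 1] - curve[k] for k in curve if k - 1 in curve}
    candidates = [k for k in drop if k + 1 in drop]
    return max(candidates, key=lambda k: drop[k] - drop[k + 1])


if __name__ == "__main__":
    # Hypothetical toy data: three well-separated 2-D blobs of four points each.
    pts = [(x + dx, y + dy) for x, y in [(0, 0), (10, 0), (0, 10)] for dx in (0, 1) for dy in (0, 1)]
    curve = wss_curve(pts, 4)
    print(curve, elbow_k(curve))
```

On this toy data the WSS falls steeply up to k = 3 and only slightly afterwards, so the rule picks k = 3. Real gridded data rarely show such a sharp elbow, which is the motivation the abstract gives for developing more objective selection metrics.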

Rangifer ◽  
2009 ◽  
Vol 27 (2) ◽  
pp. 107-119
Author(s):  
Henrik Lundqvist ◽  
Öje Danell

The 51 reindeer herding districts in Sweden vary in productivity and in their prerequisites for reindeer herding. In this study we characterize and group reindeer herding districts based on relevant factors affecting reindeer productivity, i.e. topography, vegetation, forage value, habitat fragmentation and reachability, as well as season lengths, snowfall, ice-crust probability, and insect harassment, quantified in 15 variables in total. Through cluster analyses, the herding districts were grouped into seven main groups and three single outliers. The largest group, consisting of 14 herding districts, was further divided into four subgroups. The range properties of the herding districts and groups of districts were characterized through principal component analyses. A comparison of the suggested grouping with existing administrative divisions showed that the two do not coincide. A new division of the herding districts into six administrative sets of districts is suggested in order to improve administrative planning and management of the reindeer herding industry. The results also make it possible to project changes caused by future global climate change. Large-scale investigations using geographical information systems (GIS) and meteorological data would be helpful for administrative purposes, both nationally and internationally, as science-based decision tools in legislative, economic, ecological and structural assessments.

Abstract in Swedish (English translation): Multivariate grouping of Swedish Sami reindeer herding districts (samebyar) based on the basic prerequisites of their reindeer ranges. The Swedish reindeer herding area consists of 51 herding districts (samebyar) that vary in productivity and in their prerequisites for reindeer herding. We analysed the variation among herding districts with respect to 15 variables describing topography, vegetation, forage value, pasture fragmentation, climate, ice-crust occurrence and the activity of parasitic insects, and we propose a division of the herding districts into ten groups. The largest group, consisting of 14 herding districts, was further divided into 4 subgroups. Cluster analyses with 4 different linkage variants were used to group the herding districts. Principal component analysis was used to map the studied variables and the character of the resulting groups of districts. The groups did not follow county borders, and three herding districts fell out as single groups. This study provides a basis for comparing herding districts on the grounds of similarities and differences in productivity and functional characteristics rather than county borders and history. We propose a new administrative division into six areas that could serve as an alternative basis for planning and decisions concerning production aspects of the reindeer herding industry. The results also provide a basis for predicting changes in the herding districts' production prerequisites resulting from climate change.
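Agglomerative cluster analysis with different linkage rules, as used above to group herding districts, can be sketched in a few lines. This is a hedged, naive (cubic-time) illustration, not the authors' procedure. The abstract does not say which four linkage variants were used, so single, complete and average linkage are shown as common examples, and the district feature values are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence

# Linkage rules: how the distance between two clusters is derived from
# all pairwise point distances between them.
LINKAGES: Dict[str, Callable[[List[float]], float]] = {
    "single": min,                            # nearest pair
    "complete": max,                          # farthest pair
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs (UPGMA)
}


def agglomerate(points: Sequence[Sequence[float]], n_clusters: int,
                linkage: str = "average") -> List[List[int]]:
    """Bottom-up clustering: repeatedly merge the two closest clusters until
    n_clusters remain. Returns sorted lists of point indices."""
    link = LINKAGES[linkage]
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b)
        for a, b in combinations(range(len(clusters)), 2):
            d = link([math.dist(points[i], points[j])
                      for i in clusters[a] for j in clusters[b]])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # combinations() yields b > a, so index a stays valid
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical standardized features (e.g. snowfall, forage value) for six districts.
    districts = [(0.0, 0.1), (0.2, 0.0), (2.0, 2.1), (2.1, 1.9), (2.2, 2.0), (6.0, -3.0)]
    for rule in LINKAGES:
        print(rule, agglomerate(districts, 3, rule))
```

In this toy case all three rules agree and the distant sixth district ends up as a singleton, analogous to the "single outliers" above. On real data the linkage rules often disagree, which is one reason to compare several variants.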


2019 ◽  
Vol 35 (19) ◽  
pp. 3679-3683 ◽  
Author(s):  
Aritra Bose ◽  
Vassilis Kalantzis ◽  
Eugenia-Maria Kontopoulou ◽  
Mai Elkady ◽  
Peristera Paschou ◽  
...  

Abstract. Motivation: Principal component analysis (PCA) is a key tool in the study of population structure in human genetics. As modern datasets grow ever larger, traditional approaches that load the entire dataset into system memory (random access memory) become impractical, and out-of-core implementations are the only viable alternative. Results: We present TeraPCA, a C++ implementation of the Randomized Subspace Iteration method to perform Principal Component Analysis of large-scale datasets. TeraPCA can be applied both in-core and out-of-core and is able to successfully operate even on commodity hardware with a system memory of just a few gigabytes. Moreover, TeraPCA has minimal dependencies on external libraries and only requires a working installation of the BLAS and LAPACK libraries. When applied to a dataset containing a million individuals genotyped on a million markers, TeraPCA requires <5 h (in multi-threaded mode) to accurately compute the 10 leading principal components. An extensive experimental analysis shows that TeraPCA is both fast and accurate and is competitive with current state-of-the-art software for the same task. Availability and implementation: Source code and documentation are both available at https://github.com/aritra90/TeraPCA. Supplementary information: Supplementary data are available at Bioinformatics online.
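The core idea of subspace iteration for leading principal components can be shown with a small pure-Python sketch: start from a random Gaussian block, repeatedly multiply by XᵀX, and re-orthonormalize. This is a simplified orthogonal-iteration illustration under stated assumptions, not TeraPCA's algorithm. It omits the small-SVD (Rayleigh–Ritz) step that a full randomized subspace iteration usually ends with, as well as all BLAS/LAPACK use and real out-of-core I/O. It does stream over rows to form Xᵀ(Xv) without building the covariance matrix, which loosely mirrors why row-blocked, out-of-core processing is possible.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def center_columns(rows: Matrix) -> Matrix:
    """Subtract each column's mean (PCA operates on centred data)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - mu for x, mu in zip(r, means)] for r in rows]


def gram_schmidt(vecs: Matrix) -> Matrix:
    """Orthonormalize a list of vectors (modified Gram-Schmidt)."""
    out: Matrix = []
    for v in vecs:
        w = list(v)
        for q in out:
            proj = sum(a * b for a, b in zip(w, q))
            w = [a - proj * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        out.append([a / norm for a in w])
    return out


def xtx_times(rows: Matrix, v: Sequence[float]) -> List[float]:
    """Compute X^T (X v) one row at a time, never forming X^T X."""
    acc = [0.0] * len(v)
    for r in rows:
        s = sum(a * b for a, b in zip(r, v))
        for j, a in enumerate(r):
            acc[j] += a * s
    return acc


def leading_pcs(rows: Matrix, k: int, n_iter: int = 50, seed: int = 0) -> Matrix:
    """Approximate the k leading principal axes by orthogonal iteration
    from a random Gaussian starting block."""
    x = center_columns(rows)
    rng = random.Random(seed)
    m = len(x[0])
    basis = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(k)])
    for _ in range(n_iter):
        basis = gram_schmidt([xtx_times(x, v) for v in basis])
    return basis


def rayleigh(rows: Matrix, v: Sequence[float]) -> float:
    """v^T X^T X v for unit v; assumes rows are already centred."""
    return sum(a * b for a, b in zip(v, xtx_times(rows, v)))
```

Convergence speed depends on the gap between successive eigenvalues. Randomized methods in practice add oversampling and a final small dense decomposition to sharpen accuracy.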


2019 ◽  
Author(s):  
Samuel E. Champer ◽  
Suh Yeon Oh ◽  
Chen Liu ◽  
Zhaoxin Wen ◽  
Andrew G. Clark ◽  
...  

ABSTRACT CRISPR homing gene drives potentially have the capacity for large-scale population modification or suppression. However, resistance alleles formed by the drives can prevent them from successfully spreading. Such alleles have been found to form at high rates in most studies, including those in both insects and mammals. One possible solution to this issue is the use of multiple guide RNAs (gRNAs), thus allowing cleavage by the drive even if resistance sequences are present at some of the gRNA target sequences. Here, we develop a high-fidelity model incorporating several factors affecting the performance of drives with multiple gRNAs, including timing of cleavage, reduction in homology-directed repair efficiency due to imperfect homology around the cleavage site, Cas9 activity saturation, variance in the activity level of individual gRNAs, and formation of resistance alleles due to incomplete homology-directed repair. We parameterize the model using data from homing drive experiments designed to investigate these factors and then use it to analyze several types of homing gene drives. We find that each type of drive has an optimal number of gRNAs, usually between two and eight, dependent on drive type and performance parameters. Our model indicates that utilization of multiple gRNAs is insufficient for construction of successful gene drives, but that it provides a critical boost to drive efficiency when combined with other strategies for population modification or suppression.


2014 ◽  
Author(s):  
Gad Abraham ◽  
Michael Inouye

Principal component analysis (PCA) is routinely used to analyze genome-wide single-nucleotide polymorphism (SNP) data, for detecting population structure and potential outliers. However, the size of SNP datasets has increased immensely in recent years, and PCA of large datasets has become a time-consuming task. We have developed flashpca, a highly efficient PCA implementation based on randomized algorithms, which extracts the top principal components with accuracy identical to that of existing tools in substantially less time. We demonstrate the utility of flashpca on both HapMap3 and a large Immunochip dataset. For the latter, flashpca performed PCA of 15,000 individuals up to 125 times faster than existing tools, with identical results, and PCA of 150,000 individuals using flashpca completed in 4 hours. The increasing size of SNP datasets will make tools such as flashpca essential, as traditional approaches will not adequately scale. This approach will also help to scale other applications that leverage PCA or eigen-decomposition to substantially larger datasets.
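Before PCA of SNP data, genotypes are usually centred and scaled per SNP. The sketch below shows one common convention, not necessarily flashpca's default: genotypes are coded as allele counts 0/1/2, each SNP is centred by 2p and scaled by the binomial standard deviation √(2p(1−p)), where p is the allele frequency among non-missing calls. Missing calls are mean-imputed (0 after centring) and monomorphic SNPs are left as all-zero columns. These are stated assumptions, and the abstract does not specify how flashpca standardizes.

```python
import math
from typing import List, Optional


def standardize_genotypes(geno: List[List[Optional[int]]]) -> List[List[float]]:
    """geno[i][j] is the allele count (0, 1, 2) of individual i at SNP j,
    or None if missing. Returns a centred, scaled matrix ready for PCA."""
    n_ind, n_snp = len(geno), len(geno[0])
    out = [[0.0] * n_snp for _ in range(n_ind)]
    for j in range(n_snp):
        obs = [row[j] for row in geno if row[j] is not None]
        if not obs:
            continue  # SNP entirely missing: leave column as zeros
        p = sum(obs) / (2 * len(obs))   # allele frequency
        sd = math.sqrt(2 * p * (1 - p))  # binomial standard deviation
        if sd == 0:
            continue  # monomorphic SNP carries no information
        for i in range(n_ind):
            g = geno[i][j]
            # Missing calls get the mean genotype, i.e. 0 after centring.
            out[i][j] = 0.0 if g is None else (g - 2 * p) / sd
    return out
```

Scaling by the allele-frequency-based standard deviation keeps rare variants from being swamped by common ones. Other conventions, such as √(p(1−p)) or no scaling, change the principal components somewhat.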


Author(s):  
Jiahui Wu ◽  
Enrique Frias-Martinez ◽  
Vanessa Frias-Martinez

Urban hotspots can be used to model the structure of urban environments and to study or predict various aspects of urban life. Increasing interest in the analysis of urban hotspots has been triggered by the emergence of pervasive technologies that produce massive amounts of spatio-temporal data, including cell phone traces (call detail records, CDRs). Although hotspot analyses using cell phone traces are extensive, there is no consensus among researchers on how to compute hotspots with respect to four important methodological choices: city boundaries, spatial units, interpolation methods, and hotspot variables. Using a large-scale CDR dataset from Mexico, we provide an interpretable, systematic spatial sensitivity analysis of the impact that these methodological choices might have on the stability of the hotspot variables in both static and dynamic settings.


2019 ◽  
Vol 19 (1) ◽  
pp. 4-16 ◽  
Author(s):  
Qihui Wu ◽  
Hanzhong Ke ◽  
Dongli Li ◽  
Qi Wang ◽  
Jiansong Fang ◽  
...  

Over the past decades, peptides as therapeutic candidates have received increasing attention in drug discovery, especially antimicrobial peptides (AMPs), anticancer peptides (ACPs) and anti-inflammatory peptides (AIPs). Peptides are considered capable of regulating various complex diseases that were previously intractable. In recent years, the critical problem of antimicrobial resistance has driven the pharmaceutical industry to look for new therapeutic agents. Compared with small-molecule organic drugs, peptide-based therapies exhibit high specificity and minimal toxicity. Thus, peptides are widely used in the design and discovery of potent new drugs. Currently, large-scale screening of peptide activity with traditional approaches is costly, time-consuming and labor-intensive. Hence, in silico methods, mainly machine learning approaches, have been introduced to predict peptide activity because of their accuracy and effectiveness. In this review, we document recent progress in machine learning-based prediction of peptide activity, which will greatly benefit the discovery of potentially active AMPs, ACPs and AIPs.


Author(s):  
Pooja Prabhu ◽  
A. K. Karunakar ◽  
Sanjib Sinha ◽  
N. Mariyappa ◽  
G. K. Bhargava ◽  
...  

Abstract In general, brain images acquired with magnetic resonance imaging (MRI) may be tilted, distorting the brain MR images. This tilt may cause misalignment during image registration for medical applications. Manually correcting (or estimating) the tilt on a large scale is time-consuming and expensive, and it requires expertise in brain anatomy. Thus, there is a need for an automatic way to correct tilt in three orthogonal directions (X, Y, Z). The proposed work corrects the tilt automatically by measuring the pitch, yaw, and roll angles about the X-, Z-, and Y-axes, respectively. To correct the tilt around the Z-axis (pointing in the superior direction), image processing techniques, principal component analysis, and similarity measures are used. To correct the tilt around the X-axis (pointing to the right), morphological operations are used, and to correct the tilt around the Y-axis (pointing in the anterior direction), orthogonal regression is used. The proposed approach was applied to correct the tilt observed in T1- and T2-weighted MR images. In a simulation study, the proposed algorithm yielded an error of 0.40 ± 0.09° and outperformed other existing studies. The tilt angles (in degrees) estimated by the proposed algorithm were 6.2 ± 3.94, 2.35 ± 2.61, and 5 ± 4.36 in the X-, Z-, and Y-directions, respectively. The proposed work corrects the tilt more accurately and robustly than existing studies.
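The PCA part of in-plane (Z-axis) tilt estimation can be illustrated by taking the orientation of the major principal axis of the foreground pixel coordinates in an axial slice. This is a hedged sketch, not the authors' pipeline, which also uses other image processing steps and similarity measures. Assumptions: foreground pixels are already segmented, the +y array axis points anterior, the head's long (anterior–posterior) axis is the major principal axis, and tilt is its deviation from +y. The helper names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Pixel = Tuple[float, float]


def principal_axis_angle(pixels: Iterable[Pixel]) -> float:
    """Orientation (degrees, in [-90, 90]) of the major principal axis of a
    2-D point cloud, measured counter-clockwise from the +x axis."""
    pts = list(pixels)
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    # Closed-form angle of the leading eigenvector of the 2x2 covariance matrix.
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))


def midline_tilt(pixels: Iterable[Pixel]) -> float:
    """In-plane tilt (degrees) of the major axis relative to the +y axis."""
    a = principal_axis_angle(pixels)
    return a - 90.0 if a > 0 else a + 90.0


def rotate(pixels: Iterable[Pixel], degrees: float) -> List[Pixel]:
    """Rotate points counter-clockwise about the origin by the given angle."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [(x * c - y * s, x * s + y * c) for x, y in pixels]


if __name__ == "__main__":
    # Hypothetical elongated "head" mask, tilted by 5 degrees.
    mask = rotate([(float(x), float(y)) for x in range(-2, 3) for y in range(-10, 11)], 5.0)
    tilt = midline_tilt(mask)
    corrected = rotate(mask, -tilt)  # undo the estimated tilt
    print(tilt, midline_tilt(corrected))
```

Because the covariance of a point cloud is unchanged by translation, the estimate does not depend on where the head sits in the field of view. Near-circular cross-sections, however, give an ill-defined major axis, which is one reason to combine PCA with other cues.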


2021 ◽  
Vol 503 (1) ◽  
pp. 270-291
Author(s):  
F Navarete ◽  
A Damineli ◽  
J E Steiner ◽  
R D Blum

ABSTRACT W33A is a well-known example of a high-mass young stellar object showing evidence of a circumstellar disc. We revisited the K-band NIFS/Gemini North observations of the W33A protostar using principal components analysis tomography and additional post-processing routines. Our results indicate the presence of a compact rotating disc based on the kinematics of the CO absorption features. The position–velocity diagram shows that the disc exhibits a rotation curve with velocities that rapidly decrease for radii larger than 0.1 arcsec (∼250 au) from the central source, suggesting a structure about four times more compact than previously reported. We derived a dynamical mass of $10.0^{+4.1}_{-2.2}\,\mathrm{M}_\odot$ for the ‘disc + protostar’ system, about 33 per cent smaller than previously reported, but still compatible with high-mass protostar status. A relatively compact $\mathrm{H}_2$ wind was identified at the base of the large-scale outflow of W33A, with a mean visual extinction of ∼63 mag. By taking advantage of supplementary near-infrared maps, we identified at least two other point-like objects driving extended structures in the vicinity of W33A, suggesting that multiple active protostars are located within the cloud. The closest object (Source B) was also identified in the NIFS field of view as a faint point-like object at a projected distance of ∼7000 au from W33A, powering extended K-band continuum emission detected in the same field. Another source (Source C) is driving a bipolar $\mathrm{H}_2$ jet aligned perpendicular to the rotation axis of W33A.
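A dynamical mass from a rotation curve rests on the Keplerian relation $M = v^2 r / G$. The sketch below applies that relation under simple assumptions: circular Keplerian rotation around the enclosed mass, no correction for disc inclination, and nominal constants (IAU 2015 nominal solar $GM$, IAU 2012 astronomical unit). The abstract does not describe how the authors fit the rotation curve or handled inclination, so this is only an illustration of the underlying formula, not their derivation.

```python
import math

# Nominal constants (assumed here): solar GM in m^3 s^-2 and the astronomical unit in m.
GM_SUN = 1.3271244e20
AU = 1.495978707e11


def dynamical_mass_msun(v_kms: float, r_au: float) -> float:
    """Enclosed mass (solar masses) for circular Keplerian rotation: M = v^2 r / G."""
    v = v_kms * 1e3   # km/s -> m/s
    r = r_au * AU     # au -> m
    return v * v * r / GM_SUN


def keplerian_velocity_kms(m_msun: float, r_au: float) -> float:
    """Circular orbital speed (km/s) at radius r_au around m_msun solar masses."""
    return math.sqrt(m_msun * GM_SUN / (r_au * AU)) / 1e3


if __name__ == "__main__":
    # Sanity check: Earth's mean orbital speed (~29.78 km/s) at 1 au implies ~1 solar mass.
    print(dynamical_mass_msun(29.78, 1.0))
    # Illustration only (hypothetical inputs): Keplerian speed at 250 au around 10 solar masses.
    print(keplerian_velocity_kms(10.0, 250.0))
```

For illustration only, a 10 solar-mass system would have a Keplerian speed of roughly 6 km/s at 250 au. An unknown inclination $i$ enters as $v_{\mathrm{obs}} = v \sin i$, so an uncorrected fit gives a lower limit on the enclosed mass.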


2021 ◽  
Vol 13 (10) ◽  
pp. 5359
Author(s):  
Afrika Onguko Okello ◽  
Jonathan Makau Nzuma ◽  
David Jakinda Otieno ◽  
Michael Kidoido ◽  
Chrysantus Mbi Tanga

The utilization of insect-based feeds (IBF) as an alternative protein source is increasingly gaining momentum worldwide owing to recent concerns over the impact of food systems on the environment. However, its large-scale adoption will depend on farmers’ acceptance of its key qualities. This study evaluates farmers’ perceptions of commercial IBF products and assesses the factors that would influence their adoption. It employs principal component analysis (PCA) to develop perception indices that are subsequently used in multiple regression analysis of survey data collected from a sample of 310 farmers. Over 90% of the farmers were ready and willing to use IBF. The PCA identified feed performance, social acceptability of the use of insects in feed formulation, feed versatility and marketability of livestock products reared on IBF as the key attributes that would inform farmers’ purchase decisions. Awareness of IBF attributes, group membership, off-farm income, wealth status and education significantly influenced farmers’ perceptions of IBF. Interventions such as experimental demonstrations that increase farmers’ technical knowledge of the productivity of livestock fed on IBF are crucial to reducing farmers’ uncertainty about the acceptability of IBF. Public partnerships with resource-endowed farmers and farmer groups are recommended to improve knowledge sharing on IBF.

