Influence of meteorological input data on backtrajectory cluster analysis – a seven-year study for southeastern Spain

2008 ◽  
Vol 2 (1) ◽  
pp. 65-70 ◽  
Author(s):  
M. Cabello ◽  
J. A. G. Orza ◽  
V. Galiano ◽  
G. Ruiz

Abstract. Backtrajectory differences and the sensitivity of clustering to the meteorological input data are studied. Trajectories arriving in southeastern Spain (Elche) at 3000, 1500 and 500 m for the 7-year period 2000–2006 have been computed employing two widely used meteorological data sets: the NCEP/NCAR Reanalysis and the FNL data sets. Differences between trajectories grow linearly at least up to 48 h and grow faster after 72 h. A k-means cluster analysis performed on each set of trajectories shows differences in the identified clusters (main flows), partially because the number of clusters in each clustering solution differs for the trajectories arriving at 3000 and 1500 m. Trajectory membership in the identified flows is, in general, more sensitive to the input meteorological data than to the initial selection of cluster centroids.
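The membership-sensitivity comparison in the abstract can be illustrated with a minimal pure-Python sketch (hypothetical 2-D points standing in for trajectory endpoints; the study clusters full backtrajectories): run k-means on the same points with a slightly perturbed "input data set" versus a different centroid initialization, and compare cluster memberships pairwise.

```python
import math
import random

def kmeans(points, k, seed):
    """Plain Lloyd k-means on 2-D points; returns cluster labels."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(50):  # fixed iteration budget
        # assignment step: nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def pair_agreement(a, b):
    """Fraction of point pairs on which two labelings agree about
    co-membership (1.0 = identical partitions)."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)

# hypothetical endpoints; the perturbed copy stands in for the same
# trajectories computed from a different meteorological data set
base = [(0, 0), (0.1, 0.2), (5, 5), (5.2, 4.9), (10, 0), (9.8, 0.3)]
perturbed = [(x + 0.05, y - 0.05) for x, y in base]

labels_a = kmeans(base, 3, seed=1)
labels_b = kmeans(perturbed, 3, seed=1)  # same init, different "input data"
labels_c = kmeans(base, 3, seed=2)       # same data, different init
print(pair_agreement(labels_a, labels_b), pair_agreement(labels_a, labels_c))
```

With well-separated toy clusters both agreements are high; with noisy data the abstract's finding is that the input-data effect dominates the initialization effect.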

1990 ◽  
Vol 29 (03) ◽  
pp. 200-204 ◽  
Author(s):  
J. A. Koziol

Abstract. A basic problem of cluster analysis is the determination or selection of the number of clusters evinced in any set of data. We address this issue with multinomial data using Akaike’s information criterion and demonstrate its utility in identifying an appropriate number of clusters of tumor types with similar profiles of cell surface antigens.
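The AIC-based selection of the number of clusters can be sketched as follows (a hedged illustration with toy antigen-profile count vectors, not the paper's data; multinomial coefficients are dropped because they are constant across candidate partitions):

```python
import math

def multinomial_loglik(cluster):
    """Log-likelihood of the count vectors in one cluster under a shared
    multinomial probability vector (MLE: pooled category frequencies)."""
    d = len(cluster[0])
    totals = [sum(x[j] for x in cluster) for j in range(d)]
    grand = sum(totals)
    probs = [t / grand for t in totals]
    ll = 0.0
    for x in cluster:
        for j in range(d):
            if x[j] and probs[j] > 0:  # zero counts contribute nothing
                ll += x[j] * math.log(probs[j])
    return ll

def aic(partition):
    """AIC = -2 logL + 2 * (free parameters); each cluster contributes
    d - 1 free multinomial probabilities."""
    d = len(partition[0][0])
    k = len(partition)
    ll = sum(multinomial_loglik(c) for c in partition)
    return -2 * ll + 2 * k * (d - 1)

# toy profiles: counts concentrated on different surface-antigen categories
data = [[9, 1, 0], [8, 2, 0], [0, 1, 9], [0, 2, 8]]
one_cluster = [data]
two_clusters = [data[:2], data[2:]]
print(aic(one_cluster), aic(two_clusters))  # the lower AIC wins
```

Evaluating AIC over candidate partitions and choosing the minimum selects the two-cluster split here, mirroring the paper's model-selection logic.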


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of the K-means. The first phase executes the CA. CA self-organizes and recognizes the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments are carried out on the Spark platform, and the results verify the good scalability of the C-K-means algorithm. This algorithm can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the C-K-means algorithm outperforms the existing algorithms in accuracy and efficiency under both sequential and parallel conditions.
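The two-phase structure can be sketched schematically (this is a stand-in for the covering algorithm, not the actual CA: a new center is opened whenever a point falls outside every existing cover of a hypothetical radius, so k emerges from the data rather than being prespecified):

```python
import math

def covering_init(points, radius):
    """Phase 1 (schematic stand-in for the CA): open a new center
    whenever a point lies outside every existing cover; the number of
    centers found plays the role of k."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > radius for c in centers):
            centers.append(p)
    return centers

def lloyd(points, centers, iters=20):
    """Phase 2: standard Lloyd iteration from the CA-provided centers."""
    for _ in range(iters):
        k = len(centers)
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, labels

points = [(0, 0), (0.3, 0.1), (6, 6), (6.1, 5.8), (12, 0), (11.7, 0.2)]
centers = covering_init(points, radius=2.0)   # "blind": no k given
centers, labels = lloyd(points, centers)
print(len(centers))  # k discovered from the data
```

The radius here is a toy parameter; the real CA derives covers from similarities in the data and needs no manual center selection.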


2015 ◽  
Vol 17 (3) ◽  
pp. 149
Author(s):  
Pande Made Udiyani ◽  
Sri Kuntjoro

Abstract. THE INFLUENCE OF ATMOSPHERIC CONDITIONS ON THE PROBABILISTIC CALCULATION OF THE RADIOLOGICAL IMPACT OF A PWR 1000-MWe ACCIDENT. The calculation of the radiological impact of fission-product releases due to potential accidents that may occur in a Pressurized Water Reactor (PWR) is required in probabilistic form. Atmospheric conditions contribute greatly to the dispersion of radionuclides in the environment, so this study analyzes the influence of atmospheric conditions on the probabilistic calculation of reactor-accident consequences. The objective is to analyze the influence of atmospheric conditions, represented by different meteorological input-data models, on the radiological consequences of PWR 1000-MWe accidents simulated at sites with different meteorological conditions. Simulations use the PC-Cosyma code in probabilistic calculation mode, with the meteorological input data executed in cyclic and stratified modes, for the Muria Peninsula and Serang Coastal sites. Meteorological data were taken every hour for a full year. The results show that, for the same input model, the cumulative frequency for the Serang Coastal site is higher than for the Muria Peninsula. For the same site, the cumulative frequency of the cyclic input model is higher than that of the stratified model. The cyclic model offers flexibility in setting the level of calculation accuracy and, unlike the stratified model, requires no reference data. Both models involve large amounts of data, and repeated calculations improve the accuracy of the computed statistics. Keywords: accident impact, PWR 1000-MWe, probabilistic, atmospheric, PC-Cosyma
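The two meteorological sampling modes can be illustrated generically (this is not PC-Cosyma's implementation; the hourly wind-speed series and the stratification rule below are hypothetical):

```python
def cyclic_sample(hourly, step):
    """Cyclic sampling: take every `step`-th hourly record as a weather
    sequence start; no reference categorization is needed."""
    return hourly[::step]

def stratified_sample(hourly, strata_of, per_stratum):
    """Stratified sampling: bin records into weather categories and take
    the first `per_stratum` from each bin (needs a reference scheme)."""
    bins = {}
    for rec in hourly:
        bins.setdefault(strata_of(rec), []).append(rec)
    picked = []
    for key in sorted(bins):
        picked.extend(bins[key][:per_stratum])
    return picked

# toy year of hourly records (hour, wind speed in m/s), cycling regimes
year = [(h, (h % 5) + 1.0) for h in range(8760)]
cyc = cyclic_sample(year, step=73)
strat = stratified_sample(year, strata_of=lambda r: int(r[1]), per_stratum=24)
print(len(cyc), len(strat))
```

Both modes here draw 120 weather sequences from the year; the abstract's point is that the cyclic mode needs no reference data while the stratified mode requires a categorization scheme.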


2019 ◽  
Vol 4 (1) ◽  
pp. 64-67
Author(s):  
Pavel Kim

One of the fundamental tasks of cluster analysis is partitioning a multidimensional data sample into groups of clusters: objects that are close in the sense of some given similarity measure. In some problems the number of clusters is set a priori, but more often it must be determined in the course of the clustering itself. With a large number of clusters, especially if the data are “noisy,” the result becomes difficult for experts to analyze, so the number of clusters under consideration is artificially reduced. Formal means of merging “neighboring” clusters are considered, creating the basis for parameterizing the number of significant clusters in the “natural” clustering model [1].
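Merging "neighboring" clusters can be sketched as follows (a simple centroid-distance criterion with a hypothetical threshold; the paper's formal means may differ): clusters whose centroids are closer than the threshold are grouped transitively and collapsed.

```python
import math

def merge_close_clusters(centroids, threshold):
    """Merge clusters whose centroids lie closer than `threshold`:
    union-find over the 'neighboring' relation, then one merged
    centroid per group (unweighted mean of member centroids)."""
    n = len(centroids)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centroids[i], centroids[j]) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(centroids[i])
    return [tuple(sum(v) / len(g) for v in zip(*g)) for g in groups.values()]

# five noisy clusters, two pairs of which are near-duplicates
cents = [(0, 0), (0.4, 0.1), (5, 5), (5.3, 5.2), (10, 0)]
merged = merge_close_clusters(cents, threshold=1.0)
print(len(merged))  # significant clusters remaining after merging
```

The threshold plays the role of the parameter controlling how many significant clusters survive; sweeping it yields the parameterization the abstract refers to.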


2020 ◽  
Vol 13 (11) ◽  
pp. 5277-5310
Author(s):  
Anne Tipka ◽  
Leopold Haimberger ◽  
Petra Seibert

Abstract. Flex_extract is an open-source software package to efficiently retrieve and prepare meteorological data from the European Centre for Medium-Range Weather Forecasts (ECMWF) as input for the widely used Lagrangian particle dispersion model FLEXPART and the related trajectory model FLEXTRA. ECMWF provides a variety of data sets which differ in a number of parameters (available fields, spatial and temporal resolution, forecast start times, level types, etc.). Therefore, the selection of the right data for a specific application, and the settings needed to obtain them, are not trivial. Consequently, the data sets which can be retrieved through flex_extract by both member-state and public users, as well as their properties, are explained. Flex_extract 7.1.2 is a substantially revised version with completely restructured code, mainly written in Python 3, which is introduced with all its input and output files and an explanation of the four application modes. Software dependencies and the methods for calculating the native vertical velocity η̇, the handling of flux data and the preparation of the final FLEXPART input files are documented. Considerations for applications give guidance with respect to the selection of data sets, caveats related to the land–sea mask and orography, etc. Formal software quality-assurance methods have been applied to flex_extract. A set of unit and regression tests as well as code metric data are also supplied. A short description of the installation and usage of flex_extract is provided in the Appendix. The paper also points to online documentation that will be kept up to date with respect to future versions.


2017 ◽  
Vol 13 (2) ◽  
pp. 1-12 ◽  
Author(s):  
Jungmok Ma

One of the major obstacles in applying the k-means clustering algorithm is the selection of the number of clusters k. The multi-attribute utility theory (MAUT)-based k-means clustering algorithm is proposed to tackle this problem by incorporating user preferences. Using MAUT, the decision maker's value structure for the number of clusters and other attributes can be quantitatively modeled and used as the objective function of the k-means. A target clustering problem for the military targeting process is used to demonstrate the MAUT-based k-means and to provide a comparative study. The result shows that the existing clustering algorithms do not necessarily reflect user preferences, while the MAUT-based k-means provides a systematic framework for preference modeling in cluster analysis.
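The idea of a MAUT objective for choosing k can be sketched as a weighted additive utility over two attributes, compactness and parsimony (the weights and single-attribute utilities below are hypothetical stand-ins, not the paper's elicited value structure):

```python
import math

def farthest_first(points, k):
    """Deterministic farthest-first initialization (no random seeds)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def within_ss(points, k):
    """Within-cluster sum of squares from a plain Lloyd run."""
    centers = farthest_first(points, k)
    for _ in range(30):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            m = [p for p, l in zip(points, labels) if l == c]
            if m:
                centers[c] = tuple(sum(v) / len(m) for v in zip(*m))
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))

def maut_utility(k, wss_k, wss_1, k_max, w_compact=0.6, w_few=0.4):
    """Additive MAUT: normalized single-attribute utilities for
    compactness (low WSS) and parsimony (few clusters), combined with
    hypothetical decision-maker weights."""
    u_compact = 1 - wss_k / wss_1        # 1 = tightest clustering
    u_few = 1 - (k - 1) / (k_max - 1)    # 1 = single cluster
    return w_compact * u_compact + w_few * u_few

points = [(0, 0), (0.2, 0.1), (6, 6), (6.2, 5.9), (12, 0), (11.8, 0.1)]
k_max = 5
wss = {k: within_ss(points, k) for k in range(1, k_max + 1)}
best_k = max(wss, key=lambda k: maut_utility(k, wss[k], wss[1], k_max))
print(best_k)
```

Maximizing the utility over candidate k replaces ad hoc elbow rules with an explicit preference model, which is the abstract's central point.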


2011 ◽  
Vol 34 (3) ◽  
pp. 461-481 ◽  
Author(s):  
AGNES TELLINGS ◽  
KARIEN COPPENS ◽  
JOHN GELISSEN ◽  
ROB SCHREUDER

Abstract. Often, the classification of words does not go beyond “difficult” (i.e., infrequent, late-learned, nonimageable, etc.) or “easy” (i.e., frequent, early-learned, imageable, etc.) words. In the present study, we used a latent cluster analysis to divide 703 Dutch words, with scores for eight word properties, into seven clusters of words. Each cluster represents a group of words that share a particular configuration of word properties. This model was empirically validated with three data sets from Grades 2 to 4 children who performed either a lexical decision task or a use decision task with a selection of the words. Significant differences were found between the clusters of words within the three data sets. Implications for further study and for practice are discussed.


2011 ◽  
Vol 8 (2) ◽  
pp. 3571-3597
Author(s):  
M. C. Casper ◽  
G. Grigoryan ◽  
O. Gronz ◽  
O. Gutjahr ◽  
G. Heinemann ◽  
...  

Abstract. To precisely map the changes in hydrologic response of catchments (e.g., water balance, reactivity or extremes) we need sensitive and interpretable indicators. In this study we defined nine hydrologically meaningful signature indices: five indices were sampled on the flow duration curve, four indices were closely linked to the distribution of event runoff coefficients. We applied these signature indices to the output from three hydrologic catchment models located in the Nahe basin (Western Germany) to detect differences in runoff behavior resulting from different meteorological input data. The models were driven by measured and simulated (COSMO-CLM) meteorological data. It could be shown that application of signature indices is a very sensitive tool to assess differences in simulated runoff behavior resulting from climatic data sets of different sources. The hydrological model acts as a filter for the meteorological input and is therefore very sensitive to biases in mean and spatio-temporal distribution of precipitation and temperature. The selected signature indices allow assessing changes in water balance, vertical water distribution, reactivity, seasonality and runoff generation. Bias correction of temperature fields and adjustment of bias correction of precipitation fields seemed to be indispensable. For this reason, future work will focus on improving bias correction for CCLM data sets. Signature indices may then act as indirect "efficiency measures" or "similarity measures" for the reference period of the simulation.
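Sampling signature indices on the flow duration curve can be sketched as exceedance quantiles of the sorted discharge series (the five percentiles below are illustrative, not necessarily the paper's exact choices):

```python
def flow_duration_indices(discharge):
    """Signature indices sampled on the flow duration curve (FDC):
    the flow exceeded a given fraction of the time, read off the
    descending-sorted discharge series."""
    fdc = sorted(discharge, reverse=True)  # highest flow first
    n = len(fdc)

    def q(exceed):
        # flow exceeded `exceed` (fraction) of the time
        return fdc[min(int(exceed * n), n - 1)]

    return {f"Q{int(p * 100)}": q(p) for p in (0.05, 0.2, 0.5, 0.7, 0.95)}

# toy discharge series (m^3/s) standing in for a simulated hydrograph
flows = [1, 2, 2, 3, 5, 8, 13, 21, 3, 2, 1, 1]
idx = flow_duration_indices(flows)
print(idx["Q5"], idx["Q50"], idx["Q95"])
```

Comparing such indices between model runs driven by measured versus COSMO-CLM data is the kind of sensitivity check the abstract describes; the event-runoff-coefficient indices would be computed analogously from event statistics.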


Author(s):  
N.S. Virtseva ◽  
I.E. Vishnyakov ◽  
I.P. Ivanov

Currently, one of the urgent tasks of graph analysis is community detection, and a large number of algorithms have been developed for detecting communities in graphs. Often, however, the detected communities have nothing to do with real groups of people (family, colleagues, friends) and serve only to simplify the graph representation. For many tasks it is useful to detect groups of people who communicate closely with each other. Many community-detection algorithms do not take into account that one participant can belong to several communities, and this is a prerequisite for detecting social circles. The paper overviews the main approaches to community detection and, among these, emphasizes the approaches based on functional optimization, the clique problem, cluster analysis and label propagation. Approaches based on the analysis of ego-networks, i.e., on the subgraph formed by the connections of one participant, are considered separately. The study gives the basic algorithms applicable to detecting communities with particular relationship types from billing information. The findings are useful for community detection depending on the task and the available input data.
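The ego-network approach can be sketched as follows (a toy contact graph standing in for billing data; real social-circle methods are more elaborate): remove the ego and take the connected components of the remaining neighbour subgraph as candidate social circles, which naturally lets one participant appear in several circles across different egos.

```python
def ego_communities(graph, ego):
    """Ego-network approach: drop the ego, then the connected components
    of the remaining neighbour subgraph are candidate social circles."""
    neighbours = set(graph[ego])
    seen, circles = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # traversal restricted to the ego's neighbourhood
        circle, queue = set(), [start]
        while queue:
            v = queue.pop()
            if v in circle:
                continue
            circle.add(v)
            queue.extend(w for w in graph[v]
                         if w in neighbours and w not in circle)
        seen |= circle
        circles.append(circle)
    return circles

# hypothetical billing-style contact graph: who calls whom
graph = {
    "ego": ["a", "b", "c", "d"],
    "a": ["ego", "b"], "b": ["ego", "a"],  # circle 1: a-b
    "c": ["ego", "d"], "d": ["ego", "c"],  # circle 2: c-d
}
print(sorted(sorted(c) for c in ego_communities(graph, "ego")))
```

Because each ego is analyzed independently, the same participant can fall into circles of several egos, which is exactly the overlap property the abstract identifies as a prerequisite for social-circle detection.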

