validation measure
Recently Published Documents


Author(s):  
Michael Netzer ◽  
Friedrich Hanser ◽  
Maximilian Ledochowski ◽  
Daniel Baumgarten

Hydrogen breath tests are a well-established method to help diagnose functional intestinal disorders such as carbohydrate malabsorption or small intestinal bacterial overgrowth. In this work we apply unsupervised machine learning techniques to analyze hydrogen breath test datasets. We propose a method that uses 26 internal cluster validation measures to determine a suitable number of clusters. In an induced external validation step we use a predefined categorization proposed by a medical expert. The results indicate that the majority of the considered internal validation indexes were not able to produce a reasonable clustering. Measured against the predefined categorization performed by the medical expert, a novel shape-based method obtained the highest external validation measure in terms of the adjusted Rand index. The predefined clusterings constitute the basis of a supervised machine learning step that is part of our ongoing research.
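For readers unfamiliar with the external validation measure named in this abstract, the following is a minimal sketch of the adjusted Rand index computed from a contingency table. It is not the authors' code; the function name `adjusted_rand_index` and the toy label lists are assumptions made for illustration only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items.

    Illustrative sketch only: names and inputs are hypothetical,
    not taken from the paper.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency counts: how many items share each (true, predicted) pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels: expert categories vs. cluster assignments.
expert = [0, 0, 1, 1]
clusters = [1, 1, 0, 0]  # same grouping, different label names
print(adjusted_rand_index(expert, clusters))
```

The index is invariant to how clusters are named, which is why it suits a comparison between algorithmic cluster ids and an expert's categories.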


2021 ◽  
Author(s):  
Khalid G. Biro Turk ◽  
Faisal I. Zeineldin ◽  
Abdulrahman M. Alghannam

Evapotranspiration (ET) is an essential process for defining the mass and energy relationship between soil, crop and atmosphere. This study was conducted in the Eastern Region of Saudi Arabia to estimate the actual daily, monthly and annual evapotranspiration (ETa) for different land-use systems using Landsat-8 satellite data during the year 2017/2018. Initially, six land-use and land-cover (LULC) types were identified, namely: date palm, cropland, bare land, urban land, aquatic vegetation, and open water bodies. The Surface Energy Balance Algorithm for Land (SEBAL), supported by climate data, was used to compute the ETa. The SEBAL model outputs were validated using the FAO Penman-Monteith (FAO P-M) method coupled with field observation. The results showed that the annual ETa values varied between 800 and 1400 mm.year−1 for date palm, and were 2000 mm.year−1 for open water and 800 mm.year−1 for croplands. The validation measure showed a significant agreement level between the SEBAL model and the FAO P-M method, with RMSE of 0.84, 0.98 and 1.38 mm.day−1 for date palm, open water and cropland respectively. The study concludes that the ETa produced from the satellite data and the SEBAL model is useful for water resource management in the arid ecosystem of the study area.
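The validation measure reported here is the root mean squared error (RMSE) between two sets of estimates. A minimal sketch of that calculation follows; the function name `rmse` and the sample values are hypothetical and are not data from the study.

```python
from math import sqrt


def rmse(estimated, reference):
    """Root mean squared error between paired estimates (same units as the inputs)."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical daily ETa values in mm/day (illustration only, not study data).
model_eta = [5.1, 6.3, 4.8, 7.0]
reference_eta = [4.6, 6.9, 5.5, 6.2]
print(round(rmse(model_eta, reference_eta), 2))
```

Because RMSE keeps the units of the inputs, a value in mm.day−1 can be read directly against typical daily ETa magnitudes.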


2021 ◽  
Author(s):  
Martin Daumiller ◽  
Stefan Siegel ◽  
Markus Dresel

Research is often specialised and varies in its nature between disciplines, making it difficult to assess and compare the performance of individual researchers. Specific qualitative and quantitative indicators are usually complex and do not work equally well for different research fields. Therefore, the aim of the present study was to develop an economical questionnaire that is valid across disciplines. We constructed a Short Multidisciplinary Research Performance Questionnaire (SMRPQ), with which researchers can briefly report 11 quantitative and qualitative performance aspects from four areas (research quality, facilitation, transfer/exchange, and reputation) in relation to their peer reference groups (fellow researchers with the same status and discipline). To validate this questionnaire, 557 German researchers from Physics, History, and Psychology fields (53% male, 34% post-docs, 19% full professors) completed it, and for the purpose of convergent and discriminant validation additionally made assessments regarding specific quantitative and qualitative indicators of research performance as well as affective, cognitive, and behavioural aspects of their research activities (perceptions of positive affect, help-seeking, procrastination). The results attested reliable measurement, endorsed the postulated structure of the newly developed instrument, and confirmed its invariance across the three disciplines. The SMRPQ and the validation measure were strongly positively correlated, and both demonstrated similar associations with affect, cognition, and behaviour at work. Therefore, it can be considered a valid and economical approach for assessing research performance of individual researchers across different disciplines, especially within nomothetic research (e.g. regarding personal antecedents of successful research).


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Zhou Shen ◽  
Colin Zhi Wei Teo ◽  
Kartik Ayyer ◽  
N. Duane Loh

We propose an encryption–decryption framework for validating diffraction intensity volumes reconstructed using single-particle imaging (SPI) with X-ray free-electron lasers (XFELs) when the ground truth volume is absent. This conceptual framework exploits each reconstructed volume's ability to decipher latent variables (e.g. orientations) of unseen sentinel diffraction patterns. Using this framework, we quantify novel measures of orientation disconcurrence, inconsistency, and disagreement between the decryptions by two independently reconstructed volumes. We also study how these measures can be used to define data sufficiency and its relation to spatial resolution, and the practical consequences of focusing XFEL pulses to smaller foci. This conceptual framework overcomes critical ambiguities in using Fourier Shell Correlation (FSC) as a validation measure for SPI. Finally, we show how this encryption–decryption framework naturally leads to an information-theoretic reformulation of the resolving power of XFEL-SPI, which we hope will lead to principled frameworks for experiment and instrument design.


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1782
Author(s):  
Aurelio López-Fernández ◽  
Domingo S. Rodríguez-Baena ◽  
Francisco Gómez-Vela

Nowadays, Biclustering is one of the most widely used machine learning techniques to discover local patterns in datasets from different areas such as energy consumption, marketing, social networks or bioinformatics, among others. Particularly in bioinformatics, Biclustering techniques have become extremely time-consuming, and the number of results generated has become huge, due to the continuous increase in the size of the databases over the last few years. For this reason, validation techniques must be adapted to this new environment in order to help researchers focus their efforts on a specific subset of results in an efficient, fast and reliable way. This situation may well be considered a Big Data context. In this sense, multiple machine learning techniques have been implemented using Graphics Processing Unit (GPU) technology and the CUDA architecture to accelerate the processing of large databases. However, as far as we know, this technology has not yet been applied to any bicluster validation technique. In this work, a multi-GPU version of one of the most used bicluster validation measures, Mean Squared Residue (MSR), is presented. It takes advantage of all the hardware and memory resources offered by GPU devices. Because of this, gMSR is able to validate a massive number of biclusters in any Biclustering-based study within a Big Data context.
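As background for the measure named above, here is a minimal single-threaded sketch of the Mean Squared Residue in its standard Cheng and Church form: the mean of squared residues a_ij − (row mean) − (column mean) + (overall mean). It is not the gMSR implementation and involves no GPU code; the function name and the toy matrices are assumptions for illustration.

```python
def mean_squared_residue(bicluster):
    """MSR of a bicluster given as a list of equal-length rows of numbers.

    CPU-only illustration; names are hypothetical and this is not gMSR.
    """
    if not bicluster or not bicluster[0]:
        raise ValueError("bicluster must have at least one row and one column")
    n_rows, n_cols = len(bicluster), len(bicluster[0])

    row_means = [sum(row) / n_cols for row in bicluster]
    col_means = [sum(row[j] for row in bicluster) / n_rows for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i, row in enumerate(bicluster):
        for j, value in enumerate(row):
            residue = value - row_means[i] - col_means[j] + overall_mean
            total += residue * residue
    return total / (n_rows * n_cols)


# A purely additive pattern (row offset + column offset) is perfectly coherent.
additive = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
# A checkerboard has no additive structure.
checkerboard = [[1, 0], [0, 1]]
print(mean_squared_residue(additive), mean_squared_residue(checkerboard))
```

Lower values indicate a more coherent bicluster. Each bicluster is scored independently of the others, which is the kind of workload that can be spread across parallel hardware.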


2020 ◽  
Author(s):  
D O'Neill ◽  
Andrew Lensen ◽  
Bing Xue ◽  
Mengjie Zhang

© 2018 IEEE. Clustering, an important unsupervised learning task, is very challenging on high-dimensional data, since the generated clusters can be significantly less meaningful as the number of features increases. Feature selection and/or feature weighting can address this issue by selecting and weighting only informative features. These techniques have been extensively studied in supervised learning, e.g. classification, but they are very difficult to use with clustering due to the lack of effective similarity/distance and validation measures. This paper utilises the powerful global search ability of particle swarm optimisation (PSO) on continuous problems to propose a PSO-based method for simultaneous feature selection and feature weighting for clustering on high-dimensional data, where a new validation measure is also proposed as the fitness function of the PSO method. Experiments on datasets with varying dimensionalities and different numbers of known clusters show that the proposed method can successfully improve the clustering performance of different types of clustering algorithms over the baseline of using the original feature set.


Symmetry ◽  
2020 ◽  
Vol 12 (9) ◽  
pp. 1514
Author(s):  
Ji Hoon Ryoo ◽  
Seohee Park ◽  
Seongeun Kim ◽  
Hyun Suk Ryoo

Fuzzy clustering has been broadly applied to classify data into K clusters by assigning each data point membership probabilities relative to K centroids. Such a function has been applied to characterizing the clusters associated with a statistical model such as structural equation modeling. The characteristics identified by the statistical model further define the clusters as heterogeneous groups selected from a population. Recently, such a statistical model has been formulated as fuzzy clusterwise generalized structured component analysis (fuzzy clusterwise GSCA). As in fuzzy clustering, the clusters are enumerated to infer the population and its parameters within the fuzzy clusterwise GSCA. However, the identification of clusters in fuzzy clustering is a difficult task because of the data-dependence of classification indexes, which is known as a cluster validity problem. We examined the cluster validity problem within the fuzzy clusterwise GSCA framework and proposed a new criterion for selecting the optimal number of clusters using both fit indexes of the GSCA and the fuzzy validity indexes in fuzzy clustering. The criterion, named the FIT-FHV method, combines a fit index, FIT, from GSCA and a cluster validation measure, FHV, from fuzzy clustering, and performed better than any other indices used in fuzzy clusterwise GSCA.

