Maximum Dissimilarity-Based Algorithm for Discretization of Metocean Data Into Clusters of Arbitrary Size and Dimension

Author(s):  
Samuel Kanner ◽  
Alexia Aubault ◽  
Antoine Peiffer ◽  
Bingbin Yu

In order to accurately estimate the fatigue life of a floating structure, it is necessary to have a large set of discrete environmental conditions. If the damage to a structure largely stems from wave-induced forces, then the creation of a set of environmental conditions, or ‘bins’, is trivial. However, when considering a floating platform supporting a wind turbine, it is necessary to consider not only the wave conditions but also the wind conditions (and perhaps current, if possible). Thus, it is common to have more than five dimensions in the timeseries (e.g., significant wave height, wave period, wave direction, wind speed, wind direction). The creation of bins in two dimensions is easily solved by creating an arbitrary grid and taking the mean of all the observations that fall in a specific cell. In higher dimensions, a p-dimensional cell is not easily visualized, and so the resulting set of bins cannot easily be represented graphically. In this paper, an iterative algorithm is developed to convert N observations, each with p dimensions, into a set of M discrete bins, where M ≪ N. The algorithm presented borrows heavily from the maximum dissimilarity algorithm used in a wide array of fields. The benefit of using this algorithm is that no ‘bias’ is introduced by an initial user-defined grid. That is, given a desired final number of clusters and a certain distance tolerance, a unique set of clusters exists for a given data set. Inherently, the algorithm selects a diverse array of observations, usually including extreme events or outliers, which may have undue impact on the fatigue life of a structure. Although the algorithm is computationally expensive, O(N²M), reductions in computational cost are possible. Most importantly, the algorithm can be written in such a way that memory constraints are not an issue even for N = O(10⁵). The clustering algorithm is described in both graphical and logical terms. A case study is presented, using publicly available data from the Netherlands Enterprise Agency. The data are visualized in two dimensions with the final number of bins equaling approximately 50, 100, 200, 500, 1000, and 2000. These bins are compared with a previous algorithm from these authors. Various measures are presented to assess the fidelity of a set of bins with respect to the initial observations. Each set of bins is analyzed, and it is clear that the MDA-based algorithm outperforms the previous one.
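A minimal sketch of the greedy maximum-dissimilarity (MaxMin) selection step the abstract builds on, assuming Euclidean distance on normalized observations (the function name and arguments are illustrative). Only an N-length distance vector is held in memory, consistent with the memory claim above:

```python
import numpy as np

def maxdiss_select(obs, n_bins, seed_idx=0):
    """Greedy maximum-dissimilarity (MaxMin) selection of n_bins
    representative observations from obs, an (N, p) array."""
    selected = [seed_idx]
    # Distance from every observation to its nearest selected point.
    d_min = np.linalg.norm(obs - obs[seed_idx], axis=1)
    for _ in range(n_bins - 1):
        nxt = int(np.argmax(d_min))        # most dissimilar remaining point
        selected.append(nxt)
        d_min = np.minimum(d_min, np.linalg.norm(obs - obs[nxt], axis=1))
    return np.array(selected)
```

Assigning each remaining observation to its nearest selected point would then yield the M bins.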

Author(s):  
Samuel Kanner ◽  
Alexia Aubault ◽  
Antoine Peiffer ◽  
Bingbin Yu

In order to run a fatigue analysis on a floating structure, it is common practice among ocean engineers to rely upon a large set of test cases, each with a unique set of environmental conditions. For a specific test site, the issue remains of how to obtain a limited set of environmental conditions for these test cases, sometimes known as bins, that can accurately recreate the site conditions. When considering a floating offshore wind turbine, it is necessary to obtain a timeseries of not only the wave conditions but also the wind conditions (and perhaps current, if possible). Thus, it is common to have more than five dimensions in the timeseries (e.g., significant wave height, wave period, wave direction, wind speed, wind direction). The creation of bins in two dimensions is easily solved by creating an arbitrary grid and taking the mean of all the observations that fall in a specific cell. In higher dimensions, an N-dimensional cell is not easily visualized, and so the resulting set of bins cannot easily be represented graphically. In this paper, an efficient, iterative algorithm is developed to convert N-dimensional metocean data into a set of discrete bins of arbitrary size. The algorithm works by setting a tolerance level on the number of observations that must fall in a cell in order to create a bin. If the population threshold is not met, the observations remain unbinned and another iteration is required. Generally, the population threshold can be a function of the iteration number so that all observations will eventually be binned. The algorithm can properly take extreme data into account by setting a tolerance level on the N-dimensional distance within which an observation can be included in a certain bin. A quality measure, q, is created to measure how well a set of bins represents the original data, independent of the number of bins. Depending on the tolerance levels, the algorithm completes in seconds on a standard laptop for the available data set of 20 years with a 3-hour sampling rate. The observations and bins from a case study are shown as an example of how the bins can be created and visualized.
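As an illustration of the population-threshold idea, here is a minimal sketch (not the authors' implementation) in which a fixed p-dimensional grid is used and the threshold decays with the iteration number so that every observation is eventually binned:

```python
import numpy as np

def iterative_binning(obs, cell_size, pop_threshold, max_iter=10):
    """Sketch of threshold-based iterative binning.

    obs           : (N, p) array of observations
    cell_size     : grid spacing per dimension, shape (p,)
    pop_threshold : callable, iteration number -> minimum cell population
    """
    unbinned = obs.copy()
    bins = []
    for it in range(max_iter):
        if len(unbinned) == 0:
            break
        # Assign each observation to a p-dimensional grid cell.
        cells = np.floor(unbinned / cell_size).astype(int)
        keys, inverse, counts = np.unique(
            cells, axis=0, return_inverse=True, return_counts=True)
        keep = counts >= pop_threshold(it)
        for k in np.nonzero(keep)[0]:
            # A bin is the mean of all observations in a populous cell.
            bins.append(unbinned[inverse == k].mean(axis=0))
        # Observations in under-populated cells wait for the next
        # iteration, where the threshold is lower.
        unbinned = unbinned[~keep[inverse]]
    return np.array(bins)

# Example threshold schedule: halves each iteration, bottoming out at 1
# so that all observations are eventually binned.
threshold = lambda it: max(1, 20 // (2 ** it))
```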


2015 ◽  
Vol 21 (5) ◽  
pp. 1173-1183
Author(s):  
Nicholas W. M. Ritchie

Diluvian Clustering is an unsupervised grid-based clustering algorithm well suited to interpreting large sets of noisy compositional data. The algorithm is notable for its ability to identify clusters that are either compact or diffuse and clusters that have either a large or a small number of members. Diluvian Clustering is fundamentally different from most algorithms previously applied to cluster compositional data in that its implementation does not depend upon a metric. In two dimensions, the algorithm reduces to a case with an intuitive, real-world parallel. Furthermore, the algorithm has few tunable parameters, and these parameters have intuitive interpretations. By eliminating the dependence on an explicit metric, it is possible to derive reasonable clusters with disparate variances like those in real-world compositional data sets. The algorithm is computationally efficient. While the worst case scales as O(N²), most cases are closer to O(N), where N is the number of discrete data points. On a mid-range 2014-vintage computer, a typical 20,000-particle, 30-element data set can be clustered in a fraction of a second.
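The real-world parallel suggests a simple two-dimensional picture: histogram the data onto a grid and read clusters off as connected components of sufficiently occupied cells, with no inter-point metric involved. A minimal sketch of that picture (illustrative only; the published algorithm sweeps a falling "water level" over cell counts rather than applying a single threshold):

```python
import numpy as np
from scipy.ndimage import label

def grid_clusters(points, bins=64, min_count=2):
    """Cluster 2-D points as connected components of occupied grid
    cells; only grid adjacency is used, never a distance function."""
    hist, x_edges, y_edges = np.histogram2d(points[:, 0], points[:, 1],
                                            bins=bins)
    labels, n_clusters = label(hist >= min_count)  # connected components
    return labels, n_clusters
```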


Author(s):  
Michael Schatz ◽  
Joachim Jäger ◽  
Marin van Heel

Lumbricus terrestris erythrocruorin is a giant oxygen-transporting macromolecule in the blood of the common earthworm (worm "hemoglobin"). In our current study, we use specimens (kindly provided by Drs W.E. Royer and W.A. Hendrickson) embedded in vitreous ice (1) to avoid artefacts encountered with the negative-stain preparation technique used in previous studies (2-4). Although the molecular structure is well preserved in vitreous ice, the low contrast and high noise level in the micrographs represent a serious problem in image interpretation. Moreover, in this type of preparation the molecules can exhibit many different orientations relative to the object plane of the microscope. Existing analysis techniques, which require alignment of the molecular views relative to one or more reference images, thus often yield unsatisfactory results. We use a new method in which rotation-, translation-, and mirror-invariant functions (5) are first derived from the large set of input images and then classified automatically using multivariate statistical techniques (6). The different molecular views in the data set can thereby be found without reference bias (5). Within each class, all images are aligned relative to the member of the class that contributes least to the class's internal variance (6). This reference image is thus the most typical member of the class. Finally, the aligned images from each class are averaged, resulting in molecular views with enhanced statistical resolution.
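For intuition, a minimal sketch of one way to build a rotation- and translation-invariant function of an image (illustrative; the invariant functions of reference (5) are more sophisticated): the Fourier amplitude spectrum removes translation, and averaging it over angle removes rotation. Mirror invariance also follows, since reflection only reverses the angular coordinate.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def invariant_signature(img, n_r=64, n_theta=128):
    """Rotation-, translation-, and mirror-invariant radial signature
    of a 2-D image (illustrative sketch)."""
    # Translation invariance: Fourier amplitude spectrum.
    amp = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    cy, cx = np.array(amp.shape) / 2.0
    # Resample the spectrum on a polar grid.
    r = np.linspace(0, min(cy, cx) - 1, n_r)
    t = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(r, t, indexing="ij")
    coords = np.array([cy + rr * np.sin(tt), cx + rr * np.cos(tt)])
    polar = map_coordinates(amp, coords, order=1)
    # Rotation (and mirror) invariance: average over the angle.
    return polar.mean(axis=1)
```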


2021 ◽  
pp. 016555152110184
Author(s):  
Gunjan Chandwani ◽  
Anil Ahlawat ◽  
Gaurav Dubey

Document retrieval plays an important role in knowledge management, as it enables us to discover relevant information in existing data. This article proposes a cluster-based inverted indexing algorithm for document retrieval. First, pre-processing is done to remove unnecessary and redundant words from the documents. Then, the documents are indexed by the cluster-based inverted indexing algorithm, which is developed by integrating the piecewise fuzzy C-means (piFCM) clustering algorithm and inverted indexing. After the documents are indexed, query matching is performed for the user queries using the Bhattacharyya distance. Finally, query optimisation is done with the Pearson correlation coefficient, and the relevant documents are retrieved. The performance of the proposed algorithm is analysed on the WebKB and Twenty Newsgroups data sets. The analysis shows that the proposed algorithm offers high performance, with a precision of 1, a recall of 0.70, and an F-measure of 0.8235. The proposed document retrieval system retrieves the most relevant documents and speeds up the storing and retrieval of information.
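A minimal sketch of the Bhattacharyya-distance matching step described above, treating a query and a document as normalized term-weight distributions (function and variable names are illustrative):

```python
import numpy as np

def bhattacharyya_distance(p, q, eps=1e-12):
    """Bhattacharyya distance between two discrete distributions,
    e.g. normalized term-weight vectors of a query and a document."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    bc = np.sum(np.sqrt(p * q))     # Bhattacharyya coefficient in [0, 1]
    return -np.log(bc + eps)        # lower distance = better match
```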


2021 ◽  
Vol 15 (04) ◽  
pp. 513-537
Author(s):  
Marcel Tiator ◽  
Anna Maria Kerkmann ◽  
Christian Geiger ◽  
Paul Grimm

The creation of interactive virtual reality (VR) applications from 3D-scanned content usually involves a great deal of manual and repetitive work. Our research aim is to develop agents that recognize objects in order to enhance the creation of interactive VR applications. We trained partition agents in our superpoint-growing environment, which we extended with an expert function. This expert function solves the sparse-reward problem of previous approaches and enables the use of a variant of imitation learning and deep reinforcement learning with dense feedback. Additionally, the function allows the calculation of a performance metric for the degree of imitation for different partitions. Furthermore, we introduce an environment to optimize the superpoint generation. We trained our agents with 1182 scenes of the ScanNet data set. More specifically, we trained different neural network architectures with 1170 scenes and tested their performance with 12 scenes. Our intermediate results are promising, suggesting that our partition system might be able to assist VR application development from 3D-scanned content in the near future.
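A heavily hedged sketch of the general idea behind using an expert function to densify a sparse reward (the paper's expert function is specific to its superpoint-growing environment; everything here is illustrative):

```python
def dense_reward(partition_before, partition_after, expert_score):
    """Densify a sparse reward with an expert function.

    expert_score : callable mapping a partition to a scalar measure of
                   agreement with the expert (ground-truth) partition.
    The agent is rewarded at every step for improving that agreement,
    instead of receiving feedback only at episode end.
    """
    return expert_score(partition_after) - expert_score(partition_before)
```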


Genetics ◽  
2001 ◽  
Vol 159 (2) ◽  
pp. 699-713
Author(s):  
Noah A Rosenberg ◽  
Terry Burke ◽  
Kari Elo ◽  
Marcus W Feldman ◽  
Paul J Freidlin ◽  
...  

We tested the utility of genetic cluster analysis in ascertaining population structure of a large data set for which population structure was previously known. Each of 600 individuals representing 20 distinct chicken breeds was genotyped for 27 microsatellite loci, and individual multilocus genotypes were used to infer genetic clusters. Individuals from each breed were inferred to belong mostly to the same cluster. The clustering success rate, measuring the fraction of individuals that were properly inferred to belong to their correct breeds, was consistently ~98%. When markers of highest expected heterozygosity were used, genotypes that included at least 8–10 highly variable markers from among the 27 markers genotyped also achieved >95% clustering success. When 12–15 highly variable markers and only 15–20 of the 30 individuals per breed were used, clustering success was at least 90%. We suggest that in species for which population structure is of interest, databases of multilocus genotypes at highly variable markers should be compiled. These genotypes could then be used as training samples for genetic cluster analysis and to facilitate assignments of individuals of unknown origin to populations. The clustering algorithm has potential applications in defining the within-species genetic units that are useful in problems of conservation.
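A minimal sketch of the clustering success rate described above, assuming clusters are matched one-to-one to breeds by maximizing agreement (the matching rule is an assumption; names are illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_success(true_breed, cluster_id):
    """Fraction of individuals assigned to the cluster matched to
    their breed, after an optimal one-to-one breed-cluster matching."""
    breeds = np.unique(true_breed)
    clusters = np.unique(cluster_id)
    # Contingency table: individuals of breed b assigned to cluster c.
    table = np.array([[np.sum((true_breed == b) & (cluster_id == c))
                       for c in clusters] for b in breeds])
    rows, cols = linear_sum_assignment(-table)   # maximize agreement
    return table[rows, cols].sum() / len(true_breed)
```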


2021 ◽  
pp. 49-56
Author(s):  
Stanislav S. Khabarov ◽  
Alexander S. Komshin

Problems of ensuring the safe operation of an aircraft, from the point of view of the fatigue life of its structure, are considered. The relevance of creating and implementing diagnostic systems for monitoring the technical condition of structures of complex technical objects is demonstrated using the example of a helicopter. An original approach to the creation and implementation of complex systems for diagnostics and monitoring of the technical condition of complex technical objects is presented, combining fiber-optic measuring technology and the phase-chronometric method. It is shown that the use of monitoring and diagnostic systems enables the transition to operation based on actual technical condition. The proposed approach makes it possible to increase the time between overhauls and to reduce excess reserves in the reliability factors of structures, which improves the flight performance of aircraft.


Author(s):  
Samuel Kanner ◽  
Bingbin Yu

In this research, the estimation of the fatigue life of a semi-submersible floating offshore wind platform is considered. In order to accurately estimate the fatigue life of a platform, coupled aerodynamic-hydrodynamic simulations are performed to obtain dynamic stress values. The simulations are performed at a multitude of representative environmental states, or “bins,” which can mimic the conditions the structure may endure at a given site, per ABS Floating Offshore Wind Turbine Installation guidelines. To accurately represent the variety of wind and wave conditions, the number of environmental states can be of the order of 10³. Unlike other offshore structures, both the wind and wave conditions must be accounted for, and since these are generally considered independent parameters, the number of states increases drastically. The stress timeseries from these simulations can be used to estimate the damage at a particular location on the structure by commonly accepted methods, such as the rainflow counting algorithm. The damage due to either the winds or the waves can be estimated by a frequency decomposition of the stress timeseries. In this paper, a similar decoupled approach is used to attempt to recover the damage induced in these coupled simulations. Although it is well known that a coupled aero-hydro analysis is necessary to accurately simulate the nonlinear rigid-body motions of the platform, it is less clear whether the same statement can be made about the fatigue properties of the platform. In one approach, the fatigue damage equivalent load is calculated independently from both scatter diagrams of the waves and a rose diagram of the wind. De-coupled simulations are performed to estimate the response over an all-encompassing range of environmental conditions, and a database of responses based on these conditions is constructed. The likelihood of occurrence at a case-study site is then used to compare against the damage equivalent loads from the coupled simulations. The OC5 platform in the Borssele wind farm zone is used as a case study, and the damage equivalent loads from the de-coupled methods are compared to those from the coupled analysis in order to assess these methodologies.
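For concreteness, a minimal sketch of a damage-equivalent-load calculation from a stress timeseries via rainflow counting and Miner's rule (illustrative; the S-N slope and reference cycle count are assumptions, and the third-party `rainflow` package is assumed available):

```python
import numpy as np
import rainflow  # third-party package, assumed available

def damage_equivalent_load(stress, m=3.0, n_ref=1e7):
    """DEL: the constant-amplitude stress range that, applied for n_ref
    cycles, causes the same Miner damage as the input timeseries.
    m is the S-N curve slope (material dependent)."""
    cycles = rainflow.count_cycles(stress)      # [(range, count), ...]
    ranges = np.array([rng for rng, _ in cycles])
    counts = np.array([cnt for _, cnt in cycles])
    return (np.sum(counts * ranges ** m) / n_ref) ** (1.0 / m)
```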


2017 ◽  
Vol 17 (4) ◽  
pp. 850-868 ◽  
Author(s):  
William Soo Lon Wah ◽  
Yung-Tsang Chen ◽  
Gethin Wyn Roberts ◽  
Ahmed Elamin

Analyzing changes in the vibration properties (e.g., natural frequencies) of structures as a result of damage has been heavily used by researchers for damage detection of civil structures. These changes, however, are not only caused by damage to the structural components; they are also affected by the varying environmental conditions the structures face, such as temperature changes, which limits the use of most damage detection methods presented in the literature that do not account for these effects. In this article, a damage detection method capable of distinguishing between the effects of damage and those of changing environmental conditions on damage-sensitive features is proposed. This method eliminates the need to form the baseline of the undamaged structure from damage-sensitive features obtained over a wide range of environmental conditions, as has conventionally been done, and instead utilizes features from two extreme and opposite environmental conditions as baselines. To allow near real-time monitoring, subsequent measurements are added one at a time to the baseline to create new data sets. Principal component analysis is then introduced to process each data set so that patterns can be extracted and damage can be distinguished from environmental effects. The proposed method is tested on a two-dimensional truss structure and validated using measurements from the Z24 Bridge, which was monitored for nearly a year, with damage scenarios applied near the end of the monitoring period. The results demonstrate the robustness of the proposed method for damage detection under changing environmental conditions. The method also works despite the nonlinear effects produced by environmental conditions on damage-sensitive features. Moreover, since each measurement can be analyzed one at a time, near real-time monitoring is possible. The method also indicates damage progression, which makes it advantageous for damage evolution monitoring.
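A minimal sketch of the PCA step, assuming damage-sensitive features (e.g., natural frequencies) collected into a baseline matrix; a new measurement is judged by its residual outside the baseline's principal subspace (names are illustrative, and the paper's procedure of growing the data set one measurement at a time is richer than this):

```python
import numpy as np

def pca_novelty(baseline, new_feature, n_components=2):
    """Residual of a new measurement outside the principal subspace of
    the baseline features; the baseline spans the environmentally
    driven variation, so a large residual suggests damage.

    baseline    : (n_samples, n_features) damage-sensitive features
    new_feature : (n_features,) features from the latest measurement
    """
    mu = baseline.mean(axis=0)
    # Principal directions of the baseline variation.
    _, _, vt = np.linalg.svd(baseline - mu, full_matrices=False)
    basis = vt[:n_components]
    centered = new_feature - mu
    resid = centered - basis.T @ (basis @ centered)
    return np.linalg.norm(resid)   # large residual -> possible damage
```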


2018 ◽  
Vol 25 (2) ◽  
pp. 231-256 ◽  
Author(s):  
Michael Minkov

Purpose: Hofstede's model of national culture has enjoyed enormous popularity but rests partly on faith. It has never been fully replicated and its predictive properties have been challenged. The purpose of this paper is to provide a test of the model's coherence and utility.

Design/methodology/approach: Analyses of secondary data, including the World Values Survey, and a new survey across 56 countries represented by nearly 53,000 probabilistically selected respondents.

Findings: Improved operationalizations of individualism-collectivism (IDV-COLL) suggest it is a robust dimension of national culture. A modern IDV-COLL index supersedes Hofstede's 50-year-old original one. Power distance (PD) seems to be a logical facet of IDV-COLL, rather than an independent dimension. Uncertainty avoidance (UA) lacks internal reliability. Approval of restrictive societal rules and laws is a facet of COLL and is not associated with national anxiety or neuroticism. UA is not a predictor of any of its presumed main correlates: importance of job security, preference for a safe job, trust, racism and xenophobia, subjective well-being, innovation, and economic freedom. The dimension of masculinity-femininity (MAS-FEM) lacks coherence. MAS and FEM job goals and broader values are correlated positively, not negatively, and are not related to the MAS-FEM index. MAS-FEM is not a predictor of any of its presumed main correlates: achievement and competition orientation, help and compassion, preference for a workplace with likeable people, work orientation, religiousness, gender egalitarianism, foreign aid. After a radical reconceptualization and a new operationalization, the so-called "fifth dimension" (CWD, or long-term orientation) becomes more coherent and useful. The new version, called flexibility-monumentalism (FLX-MON), explains the cultural differences between East Asian Confucian societies at one extreme and Latin America plus Africa at the other, and is the best predictor of national differences in educational achievement.

Research limitations/implications: Differences between subsidiaries of a multinational company, such as IBM around 1970, are not necessarily a good source of knowledge about broad cultural differences. A model of national culture must be validated across a large number of countries from all continents, and its predictions should withstand various plausible controls. Much of Hofstede's model (UA, MAS-FEM) fails this test, while the remaining part (IDV-COLL, PD, LTO) needs a serious revision.

Practical implications: Consultancies and business schools still teach Hofstede's model uncritically. They need to be aware of its deficiencies.

Originality/value: As UA and MAS-FEM are apparently misleading artifacts of Hofstede's IBM data set, a thorough revision of Hofstede's model is proposed, reducing it to two dimensions: IDV-COLL and FLX-MON.

