Efficient Algorithm for Discretization of Metocean Data Into Clusters of Arbitrary Size and Dimension
To run a fatigue analysis on a floating structure, ocean engineers commonly rely on a large set of test cases, each with a unique set of environmental conditions. For a specific site, the remaining question is how to obtain a limited set of environmental conditions for these test cases, sometimes known as bins, that accurately represents the conditions at the site. For a floating offshore wind turbine, a time series is needed of not only the wave conditions but also the wind conditions (and current, if available). Thus, it is common to have more than five dimensions in the time series (e.g., significant wave height, wave period, wave direction, wind speed, wind direction, etc.). In two dimensions, bins are easily created by laying down an arbitrary grid and taking the mean of all observations that fall in each cell. In higher dimensions, an N-dimensional cell is not easily visualized, so the resulting set of bins cannot easily be represented graphically. In this paper, an efficient, iterative algorithm is developed to convert N-dimensional metocean data into a set of discrete bins of arbitrary size. The algorithm sets a population threshold, the minimum number of observations that a cell must contain in order to create a bin. If the population threshold is not met, the observations in that cell remain unbinned and another iteration is required. In general, the population threshold can be a function of the iteration number so that all observations are eventually binned. The algorithm accounts for extreme data by setting a tolerance on the N-dimensional distance within which an observation can be included in a given bin. A quality measure, q, is defined to quantify how well a set of bins represents the original data, independent of the number of bins.
Depending on the tolerance levels, the algorithm completes in seconds on a standard laptop for the available data set, which spans 20 years at a 3-hour sampling interval. The observations and bins from a case study are shown as an example of how the bins can be created and visualized.
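The iterative population-threshold idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes a fixed rectangular grid of width `cell_width`, takes each bin as the mean of the observations in its cell, and leaves out the distance tolerance and the quality measure q, since neither is specified here. The names `bin_observations`, `cell_width`, `threshold`, and `max_iter` are hypothetical.

```python
import numpy as np

def bin_observations(data, cell_width, threshold, max_iter=50):
    """Iteratively group rows of `data` (n_obs x n_dim) into bins.

    `threshold(k)` is an assumed callable returning the minimum cell
    population required at iteration k. Returns (centroid, count) pairs.
    """
    data = np.asarray(data, dtype=float)
    unbinned = np.ones(len(data), dtype=bool)
    bins = []
    k = 0
    while unbinned.any():
        # After max_iter iterations, force a threshold of 1 so the loop ends.
        min_pop = 1 if k >= max_iter else max(1, int(threshold(k)))
        idx = np.flatnonzero(unbinned)
        # Integer grid-cell index along each dimension for every unbinned row.
        cells = np.floor(data[idx] / cell_width).astype(int)
        _, inverse, counts = np.unique(
            cells, axis=0, return_inverse=True, return_counts=True
        )
        inverse = inverse.reshape(-1)
        # Cells that meet the population threshold become bins.
        for c in np.flatnonzero(counts >= min_pop):
            members = idx[inverse == c]
            bins.append((data[members].mean(axis=0), len(members)))
            unbinned[members] = False
        k += 1
    return bins
```

A call such as `bin_observations(obs, cell_width=1.0, threshold=lambda k: 3 - 2 * k)` would require three observations per cell in the first pass and accept single observations in the second. Directional variables such as wave or wind direction are circular, and a plain arithmetic mean does not handle the wrap-around; this sketch ignores that.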