The potential of clustering methods to define intersection test scenarios: Assessing real-life performance of AEB

2018 ◽  
Vol 113 ◽  
pp. 1-11 ◽  
Author(s):  
Ulrich Sander ◽  
Nils Lubbe
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gregoire Preud’homme ◽  
Kevin Duarte ◽  
Kevin Dalleau ◽  
Claire Lacomblez ◽  
Emmanuel Bresso ◽  
...  

Abstract: The choice of the most appropriate unsupervised machine-learning method for “heterogeneous” or “mixed” data, i.e. with both continuous and categorical variables, can be challenging. Our aim was to examine the performance of various clustering strategies for mixed data using both simulated and real-life data. We conducted a benchmark analysis of “ready-to-use” tools in R comparing 4 model-based (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based (Gower distance or Unsupervised Extra Trees dissimilarity followed by hierarchical clustering or Partitioning Around Medoids, K-prototypes) clustering methods. Clustering performance was assessed by Adjusted Rand Index (ARI) on 1000 generated virtual populations consisting of mixed variables using 7 scenarios with varying population sizes, number of clusters, number of continuous and categorical variables, proportions of relevant (non-noisy) variables and degree of variable relevance (low, mild, high). Clustering methods were then applied to the EPHESUS randomized clinical trial data (a heart failure trial evaluating the effect of eplerenone) to illustrate the differences between clustering techniques. The simulations revealed the dominance of K-prototypes, Kamila and LCM models over all other methods. Overall, methods using dissimilarity matrices in classical algorithms such as Partitioning Around Medoids and Hierarchical Clustering had a lower ARI compared to model-based methods in all scenarios. When applying clustering methods to a real-life clinical dataset, LCM showed promising results with regard to differences in (1) clinical profiles across clusters, (2) prognostic performance (highest C-index) and (3) identification of patient subgroups with substantial treatment benefit.
The present findings suggest key differences in clustering performance between the tested algorithms (limited to tools readily available in R). In most of the tested scenarios, model-based methods (in particular the Kamila and LCM packages) and K-prototypes typically performed best in the setting of heterogeneous data.
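
The Adjusted Rand Index used as the performance measure in this abstract can be illustrated with a short sketch. The benchmark itself was run with R tools; the Python function below, including its name `adjusted_rand_index`, is an assumption made for illustration and implements the standard pair-counting definition of the ARI.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items.

    Hypothetical helper for illustration; not taken from the benchmark code.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree

    # Contingency counts: items falling into each (true, predicted) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs placed together under each view.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_pairs - expected) / (max_index - expected)
```

A value of 1 means the two partitions are identical up to a renaming of the cluster labels, and values near 0 mean agreement no better than chance.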


2017 ◽  
Vol 2017 ◽  
pp. 1-12 ◽  
Author(s):  
Cheng Zhang

This study focuses on the aircraft recovery problem (ARP). In real-life operations, disruptions always cause schedule failures and inflict great losses on airlines. Therefore, the main objective of the aircraft recovery problem is to minimize the total recovery cost and solve the problem within reasonable runtimes. An aircraft recovery model (ARM) is proposed herein to formulate the ARP, using feasible lines of flights (LOFs) as the basic variables in the model. We define a feasible line of flights as a sequence of flights flown by an aircraft within one day. The number of LOFs grows exponentially with the number of flights. Hence, a two-stage heuristic is proposed to reduce the problem scale. The algorithm integrates a heuristic scoring procedure with an aggregated aircraft recovery model (AARM) to preselect LOFs. The approach is tested on five real-life test scenarios. The computational results show that the proposed model provides a good formulation of the problem and can be solved within reasonable runtimes with the proposed methodology. The two-stage heuristic significantly reduces the number of LOFs after each stage and finally reduces the number of variables and constraints in the aircraft recovery model.


2021 ◽  
Vol 268 ◽  
pp. 01043
Author(s):  
Xi Hu ◽  
Yu Liu ◽  
Xiaopan An

This paper presents a technical method to derive an engine test cycle by establishing a vehicle-to-engine cycle transform model. First, the vehicle cycle and test vehicle data are input, processed and transformed to obtain the corresponding engine condition. Then, the model's built-in gear use strategy is applied to select the gear. Finally, under the selected gear, the vehicle cycle is transformed into an engine cycle expressed in terms of normalized speed and load. In addition, a comparison between the model output cycle and the WHTC cycle demonstrates that this transform method is consistent with the present emission test standard, adaptable to various engine technologies and representative of real-life test scenarios.


2005 ◽  
Vol 2 (2) ◽  
Author(s):  
Matej Francetič ◽  
Mateja Nagode ◽  
Bojan Nastav

Clustering methods are among the most widely used methods in multivariate analysis. Two main groups of clustering methods can be distinguished: hierarchical and non-hierarchical. Due to the nature of the problem examined, this paper focuses on hierarchical methods such as the nearest neighbour, the furthest neighbour, Ward's method, between-groups linkage, within-groups linkage, centroid and median clustering. The goal is to assess the performance of different clustering methods when using concave sets of data, and also to determine in which types of data structure these methods can reveal and correctly assign group membership. The simulations were run in two- and three-dimensional space. Each of the two original shapes was further modified by using different standard deviations of points around the skeleton. In this manner various shapes of sets with different inter-cluster distances were generated. Generating the data sets provides the essential knowledge of cluster membership for comparing the clustering methods' performances. The conclusions are important and interesting, since real-life data seldom follow a simple convex-shaped structure, but they need further work, such as the bootstrap application, the inclusion of the dendrogram-based analysis or other data structures. Therefore this paper can serve as a basis for further study of hierarchical clustering performance with concave sets.
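
Two of the hierarchical methods named above, nearest neighbour (single linkage) and furthest neighbour (complete linkage), can be sketched as a naive agglomerative procedure. This is a minimal illustration, not the authors' simulation code; the function name `agglomerate`, the use of Euclidean distance and the cubic-time loop are all assumptions.

```python
from math import dist


def agglomerate(points, n_clusters, linkage="single"):
    """Naive agglomerative clustering; returns one cluster label per point.

    linkage="single" is the nearest-neighbour rule and linkage="complete"
    the furthest-neighbour rule. Hypothetical helper for illustration.
    """
    combine = {"single": min, "complete": max}[linkage]
    clusters = [[i] for i in range(len(points))]  # start with singletons

    while len(clusters) > n_clusters:
        best = None  # (linkage distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = combine(
                    dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair of clusters
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

On a concave set such as a U-shaped chain of points with a short bar inside it, the nearest-neighbour rule follows the chain and keeps the two shapes apart as long as the gap between them exceeds the spacing within each shape. Rules that favour compact groups may instead cut across the U, which is the kind of difference the abstract sets out to measure.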


2020 ◽  
Vol 10 (12) ◽  
pp. 4355 ◽  
Author(s):  
Raquel Redondo ◽  
Álvaro Herrero ◽  
Emilio Corchado ◽  
Javier Sedano

In recent years, the digital transformation has been advancing in industrial companies, supported by the Key Enabling Technologies (Big Data, IoT, etc.) of Industry 4.0. As a consequence, companies have large volumes of data and information that must be analyzed to give them competitive advantages. This is of the utmost importance in fields such as Failure Detection (FD) and Predictive Maintenance (PdM). Finding patterns in such data is not easy, but cutting-edge technologies, such as Machine Learning (ML), can make great contributions. As a solution, this study extends Hybrid Unsupervised Exploratory Plots (HUEPs), as a visualization technique that combines Exploratory Projection Pursuit (EPP) and Clustering methods. An extended formulation of HUEPs is proposed, adding for the first time the following EPP methods: Classical Multidimensional Scaling, Sammon Mapping and Factor Analysis. Extended HUEPs are validated in a case study associated with a multinational company in the automotive industry sector. Two real-life datasets containing data gathered from a Waterjet Cutting tool are visualized in an intuitive and informative way. The obtained results show that HUEPs is a technique that supports the continuous monitoring of machines in order to anticipate failures. This contribution to visual data analytics can help companies in decision-making, regarding FD and PdM projects.


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7830
Author(s):  
Paweł Stączek ◽  
Jakub Pizoń ◽  
Wojciech Danilczuk ◽  
Arkadiusz Gola

The contemporary market creates a demand for continuous improvement of production, service, and management processes. Increasingly advanced IT technologies help designers to meet this demand, as they allow them to abandon classic design and design-testing methods in favor of techniques that do not require the use of real-life systems and thus significantly reduce the costs and time of implementing new solutions. This is particularly important when re-engineering production and logistics processes in existing production companies, where physical testing is often infeasible as it would require suspension of production for the testing period. In this article, we show how Digital Twin technology can be used to test the operating environment of an autonomous mobile robot (AMR). In particular, the concept of the Digital Twin was used to assess the correctness of the design assumptions adopted for the early phase of the implementation of an AMR vehicle in a company’s production hall. This was done by testing and improving the case of a selected intralogistics task in a potentially “problematic” part of the shop floor with narrow communication routes. Three test scenarios were analyzed. The results confirmed that the use of digital twins could accelerate the implementation of automated intralogistics systems and reduce their costs.


Author(s):  
Onur Önay

Data science and data analytics are becoming increasingly important. They are widely used in scientific and real-life applications. These methods enable us to analyze, understand, and interpret data in every field. In this study, k-means and k-medoids clustering methods are applied to cluster the Level 2 Statistical Regions of Turkey. Clustering analyses are performed for the years 2017 and 2018. The datasets consist of the 2017 and 2018 values of “Distribution of expenditure groups according to Household Budget Survey”, the 2017 and 2018 values of “Gini coefficient by equivalised household disposable income”, and the 2017 values of some features of “Regional Purchasing Power Parities for the main groups of consumption expenditures”. The elbow method and the average silhouette method are first applied to determine the number of clusters. The results are presented and interpreted in the conclusion.
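
The average silhouette method mentioned above scores a candidate clustering by how much closer each point is to its own cluster than to the nearest other cluster. The sketch below is an illustration under assumptions: the function name `average_silhouette` is hypothetical, Euclidean distance is assumed, and the study's own software is not reproduced here.

```python
from math import dist


def average_silhouette(points, labels):
    """Mean silhouette width of a labeling, using Euclidean distance.

    Hypothetical helper for illustration; needs at least two clusters.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scale = max(a, b)
        widths.append((b - a) / scale if scale > 0 else 0.0)
    return sum(widths) / len(widths)
```

To choose the number of clusters, one would run the clustering for several candidate values of k and keep the value with the highest average silhouette.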


Author(s):  
Radhwane Gherbaoui ◽  
Mohammed Ouali ◽  
Nacéra Benamrane

The ad hoc nature of the clustering methods makes simulated data paramount in assessing the performance of clustering methods. Real datasets could be used in the evaluation of clustering methods with the major drawback of missing the assessment of many test scenarios. In this paper, we propose a formal quantification of component overlap. This quantification is derived from a set of theorems which allow us to derive an automatic method for artificial data generation. We also derive a method to estimate parameters of existing models and to evaluate the results of other approaches. Automatic estimation of the overlap rate can also be used as an unsupervised learning approach in data mining to determine the parameters of mixture models from actual observations.


2021 ◽  
Vol 3 ◽  
Author(s):  
Hinta Meijerink ◽  
Camilla Mauroy ◽  
Mia Karoline Johansen ◽  
Sindre Møgster Braaten ◽  
Christine Ursin Steen Lunde ◽  
...  

The coronavirus disease 2019 (COVID-19) response in most countries has relied on testing, isolation, contact tracing, and quarantine (TITQ), which is labor- and time-consuming. Therefore, several countries worldwide launched Bluetooth-based apps as supplementary tools. The aim of using contact tracing apps is to rapidly notify people about their possible exposure to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and thus make the process of TITQ more efficient, especially upon exposure in public places. We evaluated the Norwegian Google Apple exposure notification (GAEN)-based contact tracing app Smittestopp v2 under relevant “real-life” test scenarios. We used a total of 40 devices, representing six different brands, and compared two different exposure configurations, experimented with different time thresholds and weights of the Bluetooth attenuation levels (buckets), and calculated the true notification rates among close contacts (≤2 m and ≥15 min) and false notification of sporadic contacts. In addition, we assessed the impact of using different operating systems and locations of the phone (hand/pocket). The best configuration tested to trigger exposure notification resulted in the correct notification of 80% of the true close contacts and incorrect notification of 34% of the sporadic contacts. Among those who incorrectly received notifications, most (67%) were within 2 m but the duration of contact was <15 min and thus they were not, per se, considered as “close contacts.” Lower sensitivity was observed when using the iOS operating systems or carrying the phone in the pocket instead of in the hand. The results of this study were used to improve and evaluate the performance of the Norwegian contact-tracing app Smittestopp.


Author(s):  
Eric U.O. ◽  
Michael O.O. ◽  
Oberhiri-Orumah G. ◽  
Chike H. N.

Cluster analysis is an unsupervised learning method that classifies data points, usually multidimensional, into groups (called clusters) such that members of one cluster are more similar (in some sense) to each other than to those in other clusters. In this paper, we propose a new k-means clustering method that uses Minkowski’s distance, a generalization of both the Euclidean distance and the Manhattan distance, as its metric in a normed vector space. The k-means clustering methods discussed in this paper are Forgy’s method, Lloyd’s method, MacQueen’s method, Hartigan and Wong’s method, Likas’ method and Faber’s method, which use the usual Euclidean distance. It was observed that the new k-means clustering method performed favourably in comparison with the existing methods in terms of minimization of the total intra-cluster variance using simulated data and real-life data sets.
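
The relationship between the Minkowski, Manhattan and Euclidean distances can be made concrete with a small sketch. The abstract does not describe the proposed algorithm in detail, so the code below only shows the metric and a generic nearest-centroid assignment step; the names `minkowski_distance` and `assign_to_nearest` are hypothetical.

```python
def minkowski_distance(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length vectors.

    p=1 gives the Manhattan distance and p=2 the Euclidean distance.
    """
    if p < 1:
        raise ValueError("p must be at least 1 for a valid metric")
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def assign_to_nearest(points, centroids, p=2.0):
    """Index of the nearest centroid for each point under the order-p metric.

    This is only the assignment step of a k-means style iteration.
    """
    return [
        min(range(len(centroids)),
            key=lambda k: minkowski_distance(pt, centroids[k], p))
        for pt in points
    ]
```

For the points (0, 0) and (3, 4), the order-1 distance is 7 and the order-2 distance is 5, which shows how the choice of p changes which centroid counts as nearest.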

