A Comparative Review of Incremental Clustering Methods for Large Datasets

Several algorithms have been developed for analyzing large, incrementally growing datasets. Incremental algorithms are relatively efficient in dynamic, evolving environments at seeking out small clusters in large datasets. Many algorithms have been devised to limit the search space and to build and update arbitrarily shaped clusters in large incremental datasets. In the real-time visualization of streaming data, where data are in motion and growing dynamically, newly arriving data points must be assigned cluster labels instantly. This paper presents a comparative review of incremental clustering methods for large datasets.
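To make the setting concrete, the following is a minimal sketch of leader-style incremental clustering, one of the simplest methods in this family: each arriving point receives an instant cluster label, and centroids are updated online in O(1) per point. The distance threshold and the toy stream are illustrative choices, not taken from any particular reviewed method.

```python
# Minimal sketch of incremental (leader-style) clustering: each arriving
# point gets an instant label; centroids are updated with a running mean.
import numpy as np

class IncrementalLeaderClusterer:
    def __init__(self, radius: float):
        self.radius = radius              # illustrative distance threshold
        self.centroids: list[np.ndarray] = []
        self.counts: list[int] = []

    def assign(self, x: np.ndarray) -> int:
        """Label one new point and update (or create) its cluster."""
        if self.centroids:
            d = [np.linalg.norm(x - c) for c in self.centroids]
            j = int(np.argmin(d))
            if d[j] <= self.radius:
                self.counts[j] += 1
                # Running-mean centroid update, O(1) per point.
                self.centroids[j] += (x - self.centroids[j]) / self.counts[j]
                return j
        self.centroids.append(x.astype(float).copy())
        self.counts.append(1)
        return len(self.centroids) - 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stream = np.vstack([rng.normal(0, 0.3, (50, 2)),
                        rng.normal(5, 0.3, (50, 2))])  # toy data stream
    rng.shuffle(stream)
    clusterer = IncrementalLeaderClusterer(radius=1.5)
    labels = [clusterer.assign(x) for x in stream]
    print("clusters found:", len(clusterer.centroids))
```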

2020 ◽  
Author(s):  
Panagiotis Argyrakis ◽  
Theodore Chinis ◽  
Alexandra Moshou ◽  
Nikolaos Sagias

Several stations (seismological, geodetic, etc.) suffer from communication problems that create gaps in real-time data transmission. In addition, excess humidity and temperatures beyond the manufacturer's limits often cause the components and circuitry of expensive instruments to fail, resulting in unaffordable servicing or unrepairable damage.

We created a low-cost, open-source device that raises the reliability of such stations and protects the instruments from severe damage. A prototype was installed at the UOA (University of Athens) seismological station KARY (Karystos, Greece) for a year, and the reliability of the station improved tremendously; since then, the device has been upgraded to provide a wireless connection and an IoT GUI (mobile app). A local server was built to serve all the devices without interruption and to provide a secured network.

The software is fully customizable, and multiple inputs provide add-on sensor capability (for example, gas sensors and humidity sensors); all data are collected in a remote database for real-time visualization and archived for further analysis.

The shell that covers the circuitry is 3D-printed from a material resistant to high temperature and humidity, and it is also fully customizable by the user.
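As an illustration of the data-collection loop described above, here is a minimal sketch assuming a generic environmental sensor read and an HTTP ingest endpoint on the local server; the endpoint URL, field names, and alarm thresholds are assumptions for illustration, not the authors' actual firmware or server API.

```python
# Minimal sketch of a station-monitoring loop: sample environment,
# archive to a remote database, warn when outside safe limits.
import time
import random
import requests

INGEST_URL = "http://local-server.example/ingest"  # hypothetical endpoint

def read_sensors():
    """Stand-in for real humidity/temperature sensor drivers."""
    return {
        "station": "KARY",
        "timestamp": time.time(),
        "temperature_c": 20.0 + random.uniform(-5, 15),
        "humidity_pct": 40.0 + random.uniform(-10, 30),
    }

def main():
    while True:
        sample = read_sensors()
        try:
            # Collected centrally for real-time visualization and archiving.
            requests.post(INGEST_URL, json=sample, timeout=5)
        except requests.RequestException:
            pass  # a real deployment would buffer and retry
        # Alert if the enclosure exceeds (illustrative) manufacturer limits.
        if sample["temperature_c"] > 50 or sample["humidity_pct"] > 80:
            print("WARNING: environment outside safe limits", sample)
        time.sleep(60)

if __name__ == "__main__":
    main()
```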


2012 ◽  
Vol 532-533 ◽  
pp. 1611-1615
Author(s):  
Shi Bin Liu ◽  
Xing Yan Liu ◽  
Yan Ping Yang ◽  
Cheng Wang

When mapping large-scale battlefield terrain, rendering it on a computer in real time requires keeping the vector data (points, lines, areas) in regions of high curvature while deleting as much of the vector data in regions of low curvature as possible. This article puts forward a simplified algorithm for calculating the average curvature of the mesh points of the surface, combined with level-of-detail (LOD) model techniques, together with a visualization implementation of the algorithm. Using this algorithm, the curvature value of each mesh point is computed, the center points of low-curvature regions are deleted, and the resulting cavity is re-triangulated, thereby simplifying the LOD model and dramatically reducing the number of vector data (points, lines, areas) in the field of view of the large-scale battlefield terrain. Experiments show that the average-curvature algorithm put forward in this article effectively resolves the conflict between large volumes of vector data and the limited real-time processing capacity of the computer. As a result, it can meet the requirements of rendering large amounts of real-time data for large-scale battlefield terrain in 3D visualization.
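The following is a minimal sketch of the curvature-based pruning idea on a regular height-field grid, assuming the average curvature of a mesh point can be approximated by a discrete Laplacian of the heights; the threshold and toy terrain are illustrative assumptions, and the re-triangulation of the cavity is left out.

```python
# Minimal sketch of curvature-based LOD simplification on a height field:
# keep high-curvature vertices, mark low-curvature interior ones for removal.
import numpy as np

def average_curvature(height: np.ndarray) -> np.ndarray:
    """Approximate per-vertex curvature with a 5-point Laplacian."""
    lap = np.zeros_like(height)
    lap[1:-1, 1:-1] = (
        height[:-2, 1:-1] + height[2:, 1:-1]
        + height[1:-1, :-2] + height[1:-1, 2:]
        - 4.0 * height[1:-1, 1:-1]
    )
    return np.abs(lap)

def simplify(height: np.ndarray, threshold: float) -> np.ndarray:
    """Return a keep-mask; a full pipeline would delete the masked-out
    interior vertices and re-triangulate the resulting cavity."""
    keep = average_curvature(height) >= threshold
    keep[0, :] = keep[-1, :] = keep[:, 0] = keep[:, -1] = True  # keep borders
    return keep

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    terrain = rng.random((64, 64)).cumsum(axis=0)  # toy terrain surface
    mask = simplify(terrain, threshold=0.5)
    print(f"kept {mask.sum()} of {mask.size} vertices")
```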


2022 ◽  
pp. 1-22
Author(s):  
Salem Al-Gharbi ◽  
Abdulaziz Al-Majed ◽  
Abdulazeez Abdulraheem ◽  
Zeeshan Tariq ◽  
Mohamed Mahmoud

Abstract The age of easy oil is ending, and the industry has started drilling in remote, unconventional conditions. To help produce safer, faster, and more effective operations, the utilization of artificial intelligence and machine learning (AI/ML) has become essential. Unfortunately, due to the harsh drilling environment and the data-transmission setup, a significant amount of the real-time data can be defective. The quality and effectiveness of AI/ML models are directly related to the quality of the input data: only if the input data are good will the AI/ML-generated analytical and prediction models be good. Improving the real-time data is therefore critical to the drilling industry. The objective of this paper is to propose an automated approach that applies eight statistical data-quality improvement algorithms to real-time drilling data. These techniques are Kalman filtering, moving average, kernel regression, median filter, exponential smoothing, lowess, wavelet filtering, and polynomial fitting. A dataset of more than 150,000 rows is fed into the algorithms, and their customizable parameters are calibrated to achieve the best improvement result. An evaluation methodology is developed based on the characteristics of real-time drilling data, and the strengths and weaknesses of each algorithm are highlighted. Based on the evaluation criteria, the best results were achieved using exponential smoothing, the median filter, and the moving average. The exponential smoothing and median filter techniques improved the quality of the data by removing most of the invalid data points; the moving average removed even more invalid data points but trimmed the data range.
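For illustration, here is a minimal sketch of three of the eight techniques (moving average, median filter, exponential smoothing) applied to a synthetic noisy channel; the window sizes, smoothing factor, and the signal itself are assumptions, not the paper's calibrated parameters or its drilling dataset.

```python
# Minimal sketch comparing moving average, median filter, and exponential
# smoothing on a noisy 1-D signal with injected invalid spikes.
import numpy as np
import pandas as pd
from scipy.signal import medfilt

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 2000)
signal = np.sin(t) + rng.normal(0, 0.2, t.size)   # noisy "sensor" channel
signal[rng.integers(0, t.size, 20)] += 5.0        # inject invalid spikes

s = pd.Series(signal)
moving_avg = s.rolling(window=25, center=True).mean()  # trims range at edges
median = pd.Series(medfilt(signal, kernel_size=25))    # robust to spikes
exp_smooth = s.ewm(alpha=0.1).mean()                   # exponential smoothing

for name, out in [("moving average", moving_avg),
                  ("median filter", median),
                  ("exponential smoothing", exp_smooth)]:
    resid = (out - np.sin(t)).abs().mean()
    print(f"{name:22s} mean abs error vs. clean signal: {resid:.3f}")
```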


Author(s):  
G. Berthe ◽  
V. Rouchon ◽  
M. Ben Gaid ◽  
A. El Feki

Abstract. The reduction of atmospheric greenhouse gas emissions is a major challenge. In this context, every natural or industrial release of gases such as methane (CH4) or carbon dioxide (CO2) has to be monitored, localized, and quantified. IFP Energies nouvelles (IFPEN) is developing a mobile measurement system called Flair car whose purpose is the detection of various abnormal gas emissions. The Flair car system incorporates various gas sensors, together with a weather station and a GPS (Global Positioning System) module, mounted on a plug-in hybrid electric vehicle. This enables the real-time monitoring and recording of geo-time-stamped gas concentration measurements. Flair map is the on-board real-time visualization software. Flair map development posed two important challenges: a quick and agile software modification capability together with a real-time display of measurements on maps. In order to meet these two challenges, we adopted a software rapid-prototyping approach based on the xDash tool. In this paper, our proposed real-time data visualization approach is first introduced. Then, the rapid-prototyping development methodology that resulted in the Flair map software is described. Finally, two main operational usages of Flair map are illustrated. The first involves real-time visualization, aboard the car, of maps representing data acquired from the gas concentration sensors. The second shows the a-posteriori analysis of measurement campaigns for the purpose of studying methane anomalies.
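As a rough illustration of the kind of geo-time-stamped record such a system acquires, here is a minimal sketch assuming a simple dataclass layout serialized as JSON for a dashboard; the field names and units are illustrative assumptions, not Flair map's actual data model.

```python
# Minimal sketch of a geo-time-stamped gas-concentration record, serialized
# as JSON in the way a real-time dashboard could consume it.
from dataclasses import dataclass, asdict
import json
import time

@dataclass
class GasMeasurement:
    timestamp: float      # UNIX time from the acquisition clock
    lat: float            # GPS latitude, decimal degrees
    lon: float            # GPS longitude, decimal degrees
    ch4_ppm: float        # methane concentration
    co2_ppm: float        # carbon dioxide concentration
    wind_speed_ms: float  # from the on-board weather station

def to_dashboard_event(m: GasMeasurement) -> str:
    """Serialize one record for a map-based real-time display."""
    return json.dumps(asdict(m))

if __name__ == "__main__":
    sample = GasMeasurement(time.time(), 48.8566, 2.3522, 2.1, 415.0, 3.4)
    print(to_dashboard_event(sample))
```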


Author(s):  
Murat Tasci

Although the concept of the natural rate of unemployment, NAIRU, or "U star" is used to measure the amount of slack in the labor market, it is an unobservable quantity that must be estimated using the data currently available. This Commentary investigates the degree to which our estimates of U star at various points in the current business cycle have changed as real-time data have been revised and as more data points have accumulated. I find that the availability of additional data has contributed to a significant change in our estimates of U star at earlier points in the business cycle, a result that suggests we might have been underestimating the level of labor market slack during some of the recent recovery period. In retrospect, our updated estimates of U star suggest labor markets were not as tight as we thought they were at the time.


Sensors ◽  
2021 ◽  
Vol 21 (20) ◽  
pp. 6750
Author(s):  
Mubashir Rehman ◽  
Raza Ali Shah ◽  
Muhammad Bilal Khan ◽  
Syed Aziz Shah ◽  
Najah Abed AbuAli ◽  
...  

The recent severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) outbreak, which causes coronavirus disease 2019 (COVID-19), has emerged as a global pandemic with a high mortality rate. The main complication of COVID-19 is rapid respiratory deterioration, which may cause life-threatening pneumonia. Global healthcare systems are currently facing a scarcity of resources to assist critical patients simultaneously. Indeed, non-critical patients are mostly advised to self-isolate or quarantine themselves at home. However, there are limited healthcare services available during self-isolation at home. According to research, nearly 20–30% of COVID patients require hospitalization, while about 5–12% of patients may require intensive care due to severe health conditions. This pandemic requires global healthcare systems that are intelligent, secure, and reliable. Tremendous efforts have already been made to develop non-contact sensing technologies for the diagnosis of COVID-19. The most significant early indication of COVID-19 is rapid and abnormal breathing. In this research work, RF-based technology is used to collect real-time data on breathing abnormalities. Subsequently, based on these data, a large dataset of simulated breathing abnormalities is generated using a curve-fitting technique for developing a machine learning (ML) classification model. The advantages of generating simulated breathing-abnormality data are twofold: it helps counter the daunting and time-consuming task of real-time data collection, and it improves the accuracy of the ML model. Several ML algorithms are exploited to classify eight breathing abnormalities: eupnea, bradypnea, tachypnea, Biot, sighing, Kussmaul, Cheyne–Stokes, and central sleep apnea (CSA). The performance of the ML algorithms is evaluated in terms of accuracy, prediction speed, and training time on both the real-time breathing data and the simulated breathing data. The results show that the proposed platform classifies breathing patterns in the real-time data with a maximum accuracy of 97.5%, whereas introducing the simulated breathing data increases the accuracy to 99.3%. This work has a notable medical impact, as the introduced method mitigates the challenge of collecting a large, realistic dataset during the pandemic.
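The following is a minimal sketch of the simulated-data idea: fit a parametric model to a short measured trace, then perturb the fitted parameters to generate extra training examples. The sinusoidal breathing model and the two-class setup (eupnea vs. tachypnea) are illustrative simplifications of the paper's eight-class task, not the authors' actual pipeline.

```python
# Minimal sketch: curve-fit a breathing trace, jitter fitted parameters to
# simulate extra data, then train a classifier on the augmented set.
import numpy as np
from scipy.optimize import curve_fit
from sklearn.ensemble import RandomForestClassifier

def breath(t, amp, rate_hz, baseline):
    """Toy breathing waveform: chest displacement vs. time."""
    return baseline + amp * np.sin(2 * np.pi * rate_hz * t)

rng = np.random.default_rng(2)
t = np.linspace(0, 30, 600)  # 30 s observation window

def simulate(rate_hz, n):
    """Fit the model to one noisy trace, then jitter the fitted rate."""
    measured = breath(t, 1.0, rate_hz, 0.0) + rng.normal(0, 0.1, t.size)
    (amp, rate, base), _ = curve_fit(breath, t, measured,
                                     p0=[1.0, rate_hz, 0.0])
    return [breath(t, amp, rate * rng.uniform(0.95, 1.05), base)
            for _ in range(n)]

# Two illustrative classes: eupnea (~0.25 Hz) vs. tachypnea (~0.5 Hz).
X = np.array(simulate(0.25, 100) + simulate(0.5, 100))
y = np.array([0] * 100 + [1] * 100)

clf = RandomForestClassifier(random_state=0).fit(X[::2], y[::2])
print("held-out accuracy:", clf.score(X[1::2], y[1::2]))
```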


2009 ◽  
Vol 14 (2) ◽  
pp. 109-119 ◽  
Author(s):  
Ulrich W. Ebner-Priemer ◽  
Timothy J. Trull

Convergent experimental data, autobiographical studies, and investigations of daily life have all demonstrated that gathering information retrospectively is a highly dubious methodology. Retrospection is subject to multiple systematic distortions (e.g., the affective valence effect, the mood-congruent memory effect, duration neglect, and the peak-end rule), as it is based on (often biased) storage and recollection of memories of the original experiences or behaviors of interest. The method of choice for circumventing these biases is the use of electronic diaries to collect self-reported symptoms, behaviors, or physiological processes in real time. Different terms have been used for this kind of methodology: ambulatory assessment, ecological momentary assessment, experience sampling method, and real-time data capture. Even though the terms differ, they have in common the use of computer-assisted methodology to assess self-reported symptoms, behaviors, or physiological processes while the participant undergoes normal daily activities. In this review we discuss the main features and advantages of ambulatory assessment with regard to clinical psychology and psychiatry: (a) the use of real-time assessment to circumvent biased recollection, (b) assessment in real life to enhance generalizability, (c) repeated assessment to investigate within-person processes, (d) multimodal assessment, including psychological, physiological, and behavioral data, (e) the opportunity to assess and investigate context-specific relationships, and (f) the possibility of giving feedback in real time. Using prototypic examples from the clinical psychology and psychiatry literature, we demonstrate that ambulatory assessment can answer specific research questions better than laboratory or questionnaire studies.


Diabetes ◽  
2020 ◽  
Vol 69 (Supplement 1) ◽  
pp. 399-P
Author(s):  
Ann Marie Hasse ◽  
Rifka Schulman ◽  
Tori Calder

Author(s):  
Lakshmi Praneetha

Nowadays, data streams are gigantic and fast-changing, and their uses range from basic scientific applications to critical business and financial ones. Useful information is abstracted from the stream and represented in the form of micro-clusters during the online phase; in the offline phase, micro-clusters are merged to form macro-clusters. The DBSTREAM technique captures the density between micro-clusters by means of a shared density graph in the online phase. The density data in this graph are then used during reclustering to improve the formation of clusters, but DBSTREAM takes more time when handling corrupted data points. In this paper, an early pruning algorithm is applied before pre-processing of the data, and a Bloom filter is used for recognizing corrupted data. Our experiments on real-time datasets show that this approach improves the efficiency of macro-cluster formation by 90% and increases the number of micro-clusters generated within a short time.
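As an illustration of the early-pruning step, here is a minimal Bloom filter sketch: signatures of known-corrupt readings are inserted, and incoming stream points are pruned when the filter reports a possible match. The bit-array size, hash scheme, and signature format are assumptions for illustration, not the paper's configuration.

```python
# Minimal Bloom filter for pruning corrupted stream points before the
# online micro-clustering phase. False positives are possible; false
# negatives are not.
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

if __name__ == "__main__":
    corrupt = BloomFilter()
    corrupt.add("sensor42:NaN")          # known-bad reading signature
    stream = ["sensor42:NaN", "sensor7:23.5"]
    kept = [x for x in stream if x not in corrupt]  # early pruning
    print(kept)  # -> ['sensor7:23.5']
```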

