data clusters
Recently Published Documents

Total documents: 141 (63 in the last five years)
H-index: 11 (2 in the last five years)

Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 8047
Author(s):  
Mina Bagherzade Ghazvini ◽  
Miquel Sànchez-Marrè ◽  
Edgar Bahilo ◽  
Cecilio Angulo

Operational modes of a process are described by a number of relevant features that indicate the state of the process. In industrial systems, hundreds of sensors continuously collect data that show how the relationships between variables change over time and that reveal different modes of operation. Gas turbines' operational modes are usually defined in terms of their expected energy production, and most research either focuses a priori on obtaining these modes from a single variable, the active load, or assumes a fixed number of states and builds predictive models that classify new situations into the predefined operational modes. In this work, however, we take into account all available sensor-based parameters, because other factors can also influence the system status, which allows a priori unknown operational modes to be identified. Furthermore, for gas turbine management, a key issue is detecting these modes with a real-time monitoring system. Our approach uses unsupervised machine learning, specifically a clustering ensemble, to discover consistent clusters, which group similar data together, and to generate their descriptions automatically. Once interpreted by experts, these descriptions are identified and characterized as operational modes of an industrial process, without any a priori bias about what the operational modes should be. The proposed methodology can thus discover and identify unknown operational modes through data-driven models. The methodology was tested in a case study with Siemens gas turbine data. From the available sensor data, cluster descriptions were obtained automatically from aggregated clusters. The quality of the partitions was improved by tuning one consistency parameter and by excluding outlier clusters with filtering thresholds. 
Finally, operational modes and/or sub-operational modes were identified through process experts' interpretation of the cluster descriptions, and the experts evaluated the results very positively.
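The ensemble idea above (many base partitions, a consistency parameter, and a filtering threshold for outlier clusters) can be illustrated with a generic evidence-accumulation sketch. This is not the authors' method: the base clusterer (linking points closer than a distance `eps`, with `eps` varied across the ensemble) and the names `consensus_clusters`, `consistency`, and `min_size` are hypothetical choices made for brevity.

```python
import math
from itertools import combinations


def components(n, linked):
    """Connected components of the graph on range(n) whose edges satisfy linked(i, j)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if linked(i, j):
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def consensus_clusters(points, eps_values, consistency=0.5, min_size=2):
    """Evidence-accumulation consensus over an ensemble of base partitions."""
    n = len(points)
    co = [[0] * n for _ in range(n)]  # co-association counts
    for eps in eps_values:
        # Base partition (assumed clusterer): link points closer than eps.
        base = components(n, lambda i, j: math.dist(points[i], points[j]) <= eps)
        for group in base:
            for i, j in combinations(group, 2):
                co[i][j] += 1
                co[j][i] += 1
    m = len(eps_values)
    # Consistency parameter: keep pairs grouped together in >= this share of partitions.
    groups = components(n, lambda i, j: co[i][j] / m >= consistency)
    # Filtering threshold: groups smaller than min_size are reported as outliers.
    clusters = [g for g in groups if len(g) >= min_size]
    outliers = sorted(i for g in groups if len(g) < min_size for i in g)
    return clusters, outliers
```

Raising `consistency` splits the consensus into tighter groups, while `min_size` plays the role of an outlier-cluster filter; in practice the base partitions would come from whatever clustering algorithms the ensemble uses.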


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Claus Metzner ◽  
Achim Schilling ◽  
Maximilian Traxdorf ◽  
Holger Schulze ◽  
Patrick Krauss

In clinical practice, human sleep is classified into stages, each associated with different levels of muscular activity and marked by characteristic patterns in the EEG signals. It is, however, unclear whether this subdivision into discrete stages with sharply defined boundaries truly reflects the dynamics of human sleep. To address this question, we consider one-channel EEG signals as heterogeneous random walks: stochastic processes controlled by hyper-parameters that are themselves time-dependent. We first demonstrate the heterogeneity of the random process by showing that each sleep stage has a characteristic distribution and temporal correlation function of the raw EEG signals. Next, we perform a super-statistical analysis by computing hyper-parameters, such as the standard deviation, kurtosis, and skewness of the raw signal distributions, within consecutive 30-second epochs. It turns out that the hyper-parameters also have characteristic, sleep-stage-dependent distributions, which can be exploited for a simple Bayesian sleep stage detection. Moreover, we find that the hyper-parameters are not piecewise constant, as traditional hypnograms would suggest, but show rising or falling trends within and across sleep stages, pointing to an underlying continuous rather than subdivided process that controls human sleep. Based on the hyper-parameters, we finally perform a pairwise similarity analysis between the different sleep stages, using a quantitative measure for the separability of data clusters in multi-dimensional spaces.
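The per-epoch hyper-parameters and the Bayesian stage detection described above can be sketched as follows. Assumptions: kurtosis is the non-excess (Pearson) moment ratio, and the Bayesian step is a naive-Bayes rule with Gaussian class-conditional densities whose per-stage means and standard deviations (`stage_stats`) and priors are hypothetical inputs; the paper's exact density model is not given here.

```python
import math
import statistics


def epoch_hyperparameters(signal, fs, epoch_seconds=30):
    """Standard deviation, skewness and (non-excess) kurtosis of each full epoch."""
    n = int(fs * epoch_seconds)  # samples per epoch
    features = []
    for start in range(0, len(signal) - n + 1, n):
        x = signal[start:start + n]
        mu = statistics.fmean(x)
        sd = statistics.pstdev(x, mu)
        if sd == 0:
            # Constant epoch: higher moments are undefined.
            features.append({"std": 0.0, "skewness": math.nan, "kurtosis": math.nan})
            continue
        m3 = sum((v - mu) ** 3 for v in x) / n
        m4 = sum((v - mu) ** 4 for v in x) / n
        features.append({"std": sd, "skewness": m3 / sd ** 3, "kurtosis": m4 / sd ** 4})
    return features


def classify_epoch(features, stage_stats, priors):
    """Naive-Bayes stage choice with a Gaussian likelihood per hyper-parameter."""
    best_stage, best_logp = None, -math.inf
    for stage, stats in stage_stats.items():
        logp = math.log(priors[stage])
        for name, (mean, std) in stats.items():
            z = (features[name] - mean) / std
            logp += -0.5 * z * z - math.log(std * math.sqrt(2 * math.pi))
        if logp > best_logp:
            best_stage, best_logp = stage, logp
    return best_stage
```

Only the hyper-parameters listed in each stage's entry of `stage_stats` enter the likelihood, so the same function works with one feature (e.g. the standard deviation) or all three.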


Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2950
Author(s):  
Marián Trnka ◽  
Sakhia Darjaa ◽  
Marian Ritomský ◽  
Róbert Sabo ◽  
Milan Rusko ◽  
...  

A frequently used procedure to examine the relationship between categorical and dimensional descriptions of emotions is to ask subjects to place verbal expressions representing emotions in a continuous multidimensional emotional space. This work takes a different approach. It aims to create a system that predicts the values of Activation and Valence (AV) directly from the sound of emotional speech utterances, without using their semantic content or any other additional information. The system uses X-vectors to represent the sound characteristics of the utterance and a Support Vector Regressor to estimate the AV values. The system is trained on a pool of three publicly available databases with dimensional annotation of emotions. The quality of regression is evaluated on the test sets of the same databases. The mapping of categorical emotions to the dimensional space is tested on another pool of eight categorically annotated databases. The aim of the work was to test whether, in each unseen database, the predicted values of Valence and Activation place emotion-tagged utterances in the AV space in accordance with expectations based on Russell's circumplex model of affective space. Due to the great variability of speech data, clusters of emotions form overlapping clouds. Their average location can be represented by centroids. A hypothesis on the position of these centroids is formulated and evaluated. The system's ability to separate the emotions is evaluated by measuring the distances between the centroids. It can be concluded that the system works as expected and that the positions of the clusters follow the hypothesized rules. Although the variance in individual measurements is still very high and the overlap of emotion clusters is large, the AV coordinates predicted by the system lead to an observable separation of the emotions in accordance with the hypothesis. 
Knowledge from training databases can therefore be used to predict AV coordinates of unseen data of various origins. This could be used to detect high levels of stress or depression. With the appearance of more dimensionally annotated training data, the systems predicting emotional dimensions from speech sound will become more robust and usable in practical applications in call-centers, avatars, robots, information-providing systems, security applications, and the like.
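The centroid-based evaluation described above can be sketched as follows: average the predicted (valence, activation) pairs per emotion label, measure pairwise centroid distances, and check each centroid against an expected quadrant. The quadrant table `EXPECTED_SIGNS` is a hypothetical, simplified reading of Russell's circumplex, and the sketch assumes AV predictions centered at 0; it is not the hypothesis formulated in the paper.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical expectations: sign of (valence, activation) per category,
# assuming AV coordinates centred at 0.
EXPECTED_SIGNS = {"anger": (-1, 1), "happiness": (1, 1), "sadness": (-1, -1)}


def centroids(predictions):
    """predictions: iterable of (label, valence, activation) triples."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for label, v, a in predictions:
        s = sums[label]
        s[0] += v
        s[1] += a
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def centroid_distances(cents):
    """Euclidean distance between every pair of emotion centroids."""
    return {(p, q): math.dist(cents[p], cents[q]) for p, q in combinations(sorted(cents), 2)}


def matches_hypothesis(cents, expected=EXPECTED_SIGNS):
    """True for each label whose centroid lies in its expected AV quadrant."""
    return {
        label: math.copysign(1, cents[label][0]) == sv and math.copysign(1, cents[label][1]) == sa
        for label, (sv, sa) in expected.items()
        if label in cents
    }
```

Larger centroid distances indicate better separation even when the per-utterance clouds overlap heavily, which is why centroids rather than individual points are compared.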


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Ji Feng ◽  
Bokai Zhang ◽  
Ruisheng Ran ◽  
Wanli Zhang ◽  
Degang Yang

Traditional clustering methods usually require the user to select neighborhood parameters and the number of clusters, and the optimal values vary with the shape of the data, which requires prior knowledge. To address this parameter selection problem, we propose an effective clustering algorithm based on an adaptive neighborhood, which obtains satisfactory clustering results without setting the neighborhood parameters or the number of clusters. The core idea of the algorithm is to first iterate adaptively to a logarithmic stable state and obtain neighborhood information according to the distribution characteristics of the dataset, then mark and peel off the boundary points according to this neighborhood information, and finally cluster the data around the core points as centers. We have conducted extensive comparative experiments on datasets of different sizes and distributions and achieved satisfactory results.
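One well-known parameter-free way to obtain an adaptive neighborhood is natural-neighbor search, in which the neighborhood size k grows until the number of points that nobody lists as a neighbor stops changing. The sketch below assumes that scheme as a stand-in: the paper's "logarithmic stable state" criterion is not reproduced, and the boundary rule in `peel_boundary` (a point is a boundary point if fewer than k points list it as a neighbor) is a hypothetical choice.

```python
import math


def natural_neighbor_k(points, max_k=None):
    """Grow k until the count of points without reverse neighbours stops changing."""
    n = len(points)
    max_k = max_k or n - 1
    # Each point's other points, ordered by distance (ties keep index order).
    order = [sorted((j for j in range(n) if j != i), key=lambda j: math.dist(points[i], points[j]))
             for i in range(n)]
    reverse_count = [0] * n
    prev_orphans = None
    for k in range(1, max_k + 1):
        for i in range(n):
            reverse_count[order[i][k - 1]] += 1  # i's k-th neighbour gains a reverse neighbour
        orphans = sum(1 for c in reverse_count if c == 0)
        if orphans == 0 or orphans == prev_orphans:
            return k, [order[i][:k] for i in range(n)]
        prev_orphans = orphans
    return max_k, [order[i][:max_k] for i in range(n)]


def peel_boundary(neighbors):
    """Hypothetical rule: boundary points are listed as a neighbour by fewer than k points."""
    k = len(neighbors[0])
    reverse = [0] * len(neighbors)
    for nbrs in neighbors:
        for j in nbrs:
            reverse[j] += 1
    core = [i for i, r in enumerate(reverse) if r >= k]
    boundary = [i for i, r in enumerate(reverse) if r < k]
    return core, boundary
```

The remaining step, growing clusters outward from the core points and then reattaching the peeled boundary points, would follow from these two outputs.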


2021 ◽  
Vol 9 (2) ◽  
pp. 19
Author(s):  
Iwan Ady Prabowo ◽  
Yohan Alief Rizaldy ◽  
Sri Siswanti

Intense business competition among mountain-equipment providers selling the same products makes it necessary to map on-site agents in order to determine which agents should be prioritized. Fuzzy C-means is a data grouping technique in which the membership of each data point in a cluster is determined by a degree of membership. The purpose of this study is to design and build an application for grouping agents. Data were collected through direct interviews, yielding data on the items ordered. The design model follows the System Development Life Cycle (SDLC), and the system is designed with the Unified Modeling Language (UML). The web-based agent mapping system with fuzzy c-means clustering is built with PHP as the programming language and MySQL as the database. The study produced three data clusters that can support priority decisions: of the 30 agents, the first cluster contains 15 agents, the second contains 1 agent, and the third contains 14 agents.
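For readers unfamiliar with the technique, the standard fuzzy c-means iteration alternates a weighted center update and a membership update, u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)). The sketch below is a generic stdlib implementation, not the study's PHP application; the fuzzifier `m = 2`, the tolerance, and the random initialization are assumed defaults.

```python
import math
import random


def fuzzy_c_means(data, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: weighted mean of the data with weights u_ij^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(wi * x[d] for wi, x in zip(w, data)) / sw for d in range(dim)))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        new_u = []
        for x in data:
            dists = [math.dist(x, cj) for cj in centers]
            if any(d == 0 for d in dists):
                # The point coincides with a centre: split membership among such centres.
                row = [1.0 if d == 0 else 0.0 for d in dists]
                s = sum(row)
                row = [v / s for v in row]
            else:
                row = [1.0 / sum((dj / dk) ** (2 / (m - 1)) for dk in dists) for dj in dists]
            new_u.append(row)
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

A hard grouping, like the three agent clusters reported above, is obtained by assigning each point to the cluster with its largest membership.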


2021 ◽  
Vol 11 (18) ◽  
pp. 8416
Author(s):  
Changki Lee ◽  
Uk Jung

Measuring the dissimilarity between two observations is the basis of many data mining and machine learning algorithms, and its effectiveness has a significant impact on learning outcomes. The dissimilarity or distance computation has been a manageable problem for continuous data because many numerical operations can be successfully applied. However, unlike continuous data, defining a dissimilarity between pairs of observations with categorical variables is not straightforward. This study proposes a new method to measure the dissimilarity between two categorical observations, called a context-based geodesic dissimilarity measure, for the categorical data clustering problem. The proposed method considers the relationships between categorical variables and discovers the implicit topological structures in categorical data. In other words, it can effectively reflect the nonlinear patterns of arbitrarily shaped categorical data clusters. Our experimental results confirm that the proposed measure that considers both nonlinear data patterns and relationships among the categorical variables yields better clustering performance than other distance measures.
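The geodesic idea can be illustrated generically: start from a simple per-attribute mismatch (overlap) dissimilarity, connect each record to its k nearest records, and take shortest-path lengths through that graph as the dissimilarity, as in Isomap-style geodesic distances. This is not the paper's context-based measure, which additionally models relationships between the categorical variables; the overlap base distance and the neighborhood size `k` are assumptions.

```python
import math


def overlap_distance(a, b):
    """Simple matching dissimilarity: fraction of attributes whose categories differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def geodesic_dissimilarity(records, k=2):
    """Shortest-path distances through a symmetrised k-nearest-neighbour graph."""
    n = len(records)
    d = [[overlap_distance(records[i], records[j]) for j in range(n)] for i in range(n)]
    g = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        g[i][i] = 0.0
        # Keep edges only to the k nearest records (ties keep index order).
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
        for j in nearest:
            g[i][j] = g[j][i] = d[i][j]
    # Floyd-Warshall: relax every pair through every intermediate record.
    for mid in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][mid] + g[mid][j] < g[i][j]:
                    g[i][j] = g[i][mid] + g[mid][j]
    return g
```

Pairs that are not connected through the neighborhood graph keep an infinite dissimilarity, so `k` must be large enough for the graph to be connected.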


Author(s):  
Md. Zakir Hossain ◽  
Md. Jakirul Islam ◽  
Md. Waliur Rahman Miah ◽  
Jahid Hasan Rony ◽  
Momotaz Begum

The amount of data has been increasing exponentially in every sector, such as banking and securities, healthcare, education, manufacturing, consumer trade, transportation, and energy. Much of this data is noisy, varies in shape, and contains outliers. In such cases, it is challenging to find the desired data clusters using conventional clustering algorithms. DBSCAN is a popular clustering algorithm that is widely used for noisy, arbitrarily shaped data with outliers. However, its performance depends heavily on the proper selection of the cluster radius (Eps) and the minimum number of points (MinPts) required to form a cluster in the given dataset. In real-world clustering problems, it is difficult to select exact values of Eps and MinPts for an unknown dataset. To address this, the paper proposes a dynamic DBSCAN algorithm that calculates suitable values of Eps and MinPts dynamically, thereby increasing the clustering quality for the given problem. The paper evaluates the performance of the dynamic DBSCAN algorithm on seven challenging datasets. The experimental results confirm the effectiveness of the dynamic DBSCAN algorithm compared with well-known clustering algorithms.
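To make the two parameters concrete, here is plain DBSCAN together with one common data-driven way to pick them. This is not the paper's dynamic procedure: the rule MinPts = max(3, 2 × dimensions) and taking Eps as a fixed quantile of the sorted k-distance values are assumed heuristics used only for illustration.

```python
import math


def region(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]


def estimate_params(points, quantile=0.9):
    """Assumed heuristic: MinPts from dimensionality, Eps from the k-distance curve."""
    n, dim = len(points), len(points[0])
    min_pts = max(3, 2 * dim)
    k = min_pts - 1  # neighbours besides the point itself
    kdist = sorted(
        sorted(math.dist(points[i], points[j]) for j in range(n) if j != i)[k - 1]
        for i in range(n)
    )
    eps = kdist[min(n - 1, int(quantile * n))]
    return eps, min_pts


def dbscan(points, eps, min_pts):
    """Classic DBSCAN; returns a cluster id per point, with -1 for noise."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(points, i, eps)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region(points, j, eps)
            if len(nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(nbrs)
        cluster += 1
    return labels
```

The quantile is a stand-in for locating the "knee" of the k-distance curve; a dynamic method would replace `estimate_params` with its own selection rule.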


2021 ◽  
Vol 25 (110) ◽  
pp. 127-137
Author(s):  
Sonia Tigua Moreira ◽  
Edison Cruz Navarrete ◽  
Geovanny Cordova Perez

The world of finance is immersed in multiple controversies, laden with the contradictions and uncertainties typical of a social ecosystem, which generate dynamic changes and lead to significant transformations; in this context, Big Data becomes crucial for logical, real-time decision-making. This article falls within that field of knowledge. Its general objective is to explore the strengths, weaknesses, and future trends of Big Data in the financial sector, using a scientific exploration approach with the bibliographic tools Scopus and SciELO and the search term "Big Data" restricted to the financial sector. The findings show the growing importance of gaining knowledge from the huge amount of financial data generated daily worldwide, and of developing predictive capacity to create scenarios that help find solutions and make timely decisions. Keywords: Big Data, financial sector, decision-making.


Author(s):  
McKinlee M. Salazar ◽  
Mônica T. Pupo ◽  
Amanda M. V. Brown

Interactions between insect symbionts and plant pathogens are dynamic and complex, sometimes involving direct antagonism or synergy and sometimes involving ecological and evolutionary leaps, as insect symbionts transmit through plant tissues or plant pathogens transition to become insect symbionts. Hemipterans such as aphids, whiteflies, psyllids, leafhoppers, and planthoppers are well-studied plant pests that host diverse symbionts and vector plant pathogens. The related hemipteran treehoppers (family Membracidae) are less well studied but offer a potentially new and diverse array of symbionts and plant pathogenic interactions through their distinct woody plant hosts and ecological interactions with diverse tending hymenopteran taxa. To explore membracid symbiont–pathogen diversity and co-occurrence, this study performed shotgun metagenomic sequencing on 20 samples (16 species) of treehoppers and characterized putative symbionts and pathogens using a combination of rapid BLAST database searches, phylogenetic analysis of assembled scaffolds, and correlation analysis. Among the 8.7 billion base pairs of assembled scaffolds were matches to 9 potential plant pathogens, 12 potential primary and secondary insect endosymbionts, numerous bacteriophages, and other viruses, entomopathogens, and fungi. Notable discoveries include a divergent Brenneria plant pathogen-like organism, several bee-like Bombella and Asaia strains, novel strains of Arsenophonus-like and Sodalis-like symbionts, Ralstonia sp. and Ralstonia-type phages, Serratia sp., and APSE-type phages and bracoviruses. There were several short Phytoplasma and Spiroplasma matches, but there was no indication of plant viruses in these data. Clusters of positively correlated microbes, such as yeast-like symbionts and Ralstonia, viruses and Serratia, and APSE phage with parasitoid-type bracoviruses, suggest directions for future analyses. 
Together, results indicate membracids offer a rich palette for future study of symbiont–plant pathogen interactions.


Author(s):  
Bettina Grün ◽  
Gertraud Malsiner-Walli ◽  
Sylvia Frühwirth-Schnatter

In model-based clustering, the Galaxy data set is often used as a benchmark data set to study the performance of different modeling approaches. Aitkin (Stat Model 1:287–304) compares maximum likelihood and Bayesian analyses of the Galaxy data set and expresses reservations about the Bayesian approach due to the fact that the prior assumptions imposed remain rather obscure while playing a major role in the results obtained and conclusions drawn. The aim of the paper is to address Aitkin's concerns about the Bayesian approach by shedding light on how the specified priors influence the number of estimated clusters. We perform a sensitivity analysis of different prior specifications for the mixtures of finite mixture model, i.e., the mixture model where a prior on the number of components is included. We use an extensive set of different prior specifications in a full factorial design and assess their impact on the estimated number of clusters for the Galaxy data set. Results highlight the interaction effects of the prior specifications and provide insights into which prior specifications are recommended to obtain a sparse clustering solution. A simulation study with artificial data provides further empirical evidence to support the recommendations. A clear understanding of the impact of the prior specifications removes restraints preventing the use of Bayesian methods due to the complexity of selecting suitable priors. Also, the regularizing properties of the priors may be intentionally exploited to obtain a suitable clustering solution meeting prior expectations and needs of the application.
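How priors shape the number of clusters can be explored by simulating from the prior alone. In a mixture of finite mixtures, the number of components K has a prior and the weights get a symmetric Dirichlet prior; the sketch below draws allocations of N observations (82 for the Galaxy data) through the equivalent Pólya-urn scheme and tabulates the number of filled components K+. The prior on K passed as `draw_k`, and the two Dirichlet variants (a fixed parameter `gamma`, or `gamma / K`), are illustrative placeholders, not the specifications studied in the paper.

```python
import random
from collections import Counter


def filled_components(n_obs, k, alpha, rng):
    """Allocate n_obs observations to k components with the Dirichlet(alpha) weights
    integrated out (Polya urn) and return the number of non-empty components."""
    sizes = [0] * k
    for _ in range(n_obs):
        j = rng.choices(range(k), weights=[s + alpha for s in sizes])[0]
        sizes[j] += 1
    return sum(1 for s in sizes if s > 0)


def prior_filled_distribution(n_obs, draw_k, gamma, dynamic=True, draws=500, seed=1):
    """Monte Carlo prior distribution of K+ under a prior on K and a Dirichlet on weights."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(draws):
        k = draw_k(rng)  # hypothetical prior on the number of components
        alpha = gamma / k if dynamic else gamma
        counts[filled_components(n_obs, k, alpha, rng)] += 1
    return {kp: c / draws for kp, c in sorted(counts.items())}
```

Running this for several values of `gamma` shows the regularizing effect mentioned above: a small Dirichlet parameter concentrates the prior on few filled clusters, whereas a large fixed parameter spreads observations over many components.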

