cluster assignment
Recently Published Documents


TOTAL DOCUMENTS

80
(FIVE YEARS 38)

H-INDEX

10
(FIVE YEARS 3)

2021 ◽  
Vol 7 (1) ◽  
pp. 304-313
Author(s):  
Edyta Kuk ◽  
Michał Kuk ◽  
Damian Janiga ◽  
Paweł Wojnarowski ◽  
Jerzy Stopa

Artificial Intelligence plays an increasingly important role in many industrial applications as it has great potential for solving complex engineering problems. One such application is the optimization of petroleum reservoir production. It is crucial to produce hydrocarbons efficiently as their geological resources are limited. From an economic point of view, optimization of hydrocarbon well control is an important factor as it affects the whole market. The solution proposed in this paper is based on state-of-the-art artificial intelligence methods, optimal control, and decision tree theory. The proposed idea is to apply a novel temporal clustering algorithm, which uses an autoencoder for temporal dimensionality reduction and a temporal clustering layer for cluster assignment, to cluster wells into groups depending on the production situation in the vicinity of each well, which makes it possible to react proactively. Then the optimal control of wells belonging to specific groups is determined using an auto-adaptive decision tree whose parameters are optimized using a novel sequential model-based algorithm configuration method. Optimization of petroleum reservoir production translates directly into several economic benefits: reduced operating costs, more effective production, and higher overall income without any extra expenditure, as only the control is changed. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Author(s):  
Fanny Wegner ◽  
Tim Roloff ◽  
Michael Huber ◽  
Samuel Cordey ◽  
Alban Ramette ◽  
...  

Objective: This first pilot on external quality assessment (EQA) of SARS-CoV-2 whole genome sequencing, initiated by the ESCMID Study Group for Genomic and Molecular Diagnostics (ESGMD) and Swiss Society for Microbiology (SSM), aims to build a framework between laboratories in order to improve pathogen surveillance sequencing. Methods: Ten samples with varying viral loads were sent out to 15 clinical laboratories, which had free choice of sequencing methods and bioinformatic analyses. The key aspects on which the individual centres were compared were the identification of 1) SNPs and indels, 2) Pango lineages, and 3) clusters between samples. Results: The participating laboratories used a wide array of methods and analysis pipelines. Most were able to generate whole genomes for all samples. Genomes were sequenced to varying depth (up to 100-fold difference across centres). There was a very good consensus regarding the majority of reporting criteria, but there were a few discrepancies in lineage and cluster assignment. Additionally, there were inconsistencies in variant calling. The main reasons for discrepancies were missing data, bioinformatic choices, and interpretation of data. Conclusions: The pilot EQA was an overall success. It was able to show the high quality of participating labs and provide valuable feedback in cases where problems occurred, thereby improving the sequencing setup of laboratories. A larger follow-up EQA should, however, improve on defining the variables and format of the report. Additionally, contamination and/or minority variants should be a further aspect of assessment.


Sensors ◽  
2021 ◽  
Vol 21 (18) ◽  
pp. 6070
Author(s):  
Hai Van Pham ◽  
Dat Hoang Thanh ◽  
Philip Moore

Deep learning methods predicated on convolutional neural networks and graph neural networks have enabled significant improvement in node classification and prediction when applied to graph representations, learning node embeddings that effectively represent the hierarchical properties of graphs. An interesting approach (DiffPool) utilises a differentiable graph pooling technique which learns a ‘differentiable soft cluster assignment’ for nodes at each layer of a deep graph neural network, with nodes mapped onto sets of clusters. However, effective control of the learning process is difficult given the inherent complexity of an ‘end-to-end’ model with the potential for a large number of parameters (including the potential for redundant parameters). In this paper, we propose an approach termed FPool, which is a development of the basic method adopted in DiffPool (where pooling is applied directly to node representations). Techniques designed to enhance data classification have been created and evaluated using a number of popular and publicly available sensor datasets. Experimental results for FPool demonstrate improved classification and prediction performance when compared to the alternative methods considered. Moreover, FPool shows a significant reduction in training time over the basic DiffPool framework.
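
The soft cluster assignment that DiffPool learns can be illustrated with a minimal sketch of a single pooling step. This is not the FPool method and not the authors' code. It assumes the standard DiffPool formulation, in which a row-wise softmax turns raw scores into an assignment matrix S, pooled features are SᵀZ and the pooled adjacency is SᵀAS. In DiffPool the embeddings `z` and the raw `scores` come from two graph neural networks; here they are plain inputs, and all function names are hypothetical.

```python
import math


def softmax_rows(m):
    """Row-wise softmax, so each node's cluster memberships sum to 1."""
    out = []
    for row in m:
        mx = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def diffpool_step(adj, z, scores):
    """One pooling step (assumed DiffPool formulation).

    adj:    n x n adjacency matrix
    z:      n x d node embeddings (an input here, a GNN output in DiffPool)
    scores: n x k raw assignment scores (likewise an input here)
    """
    s = softmax_rows(scores)                   # n x k soft cluster assignment
    s_t = transpose(s)
    x_pooled = matmul(s_t, z)                  # k x d cluster features, S^T Z
    adj_pooled = matmul(matmul(s_t, adj), s)   # k x k coarsened graph, S^T A S
    return s, x_pooled, adj_pooled
```

Because every row of S sums to 1, the entries of the pooled adjacency add up to the same total as the entries of the original adjacency, which is a convenient sanity check when experimenting with the sketch.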


PLoS Genetics ◽  
2021 ◽  
Vol 17 (8) ◽  
pp. e1009713
Author(s):  
Hanna Julienne ◽  
Vincent Laville ◽  
Zachary R. McCaw ◽  
Zihuai He ◽  
Vincent Guillemot ◽  
...  

Genome-wide association studies (GWASs) have uncovered a wealth of associations between common variants and human phenotypes. Here, we present an integrative analysis of GWAS summary statistics from 36 phenotypes to decipher multitrait genetic architecture and its link with biological mechanisms. Our framework incorporates multitrait association mapping along with an investigation of the breakdown of genetic associations into clusters of variants harboring similar multitrait association profiles. Focusing on two subsets of immunity and metabolism phenotypes, we then demonstrate how genetic variants within clusters can be mapped to biological pathways and disease mechanisms. Finally, for the metabolism set, we investigate the link between gene cluster assignment and the success of drug targets in randomized controlled trials.


2021 ◽  
Vol 11 (16) ◽  
pp. 7595
Author(s):  
Alessia Bastianoni ◽  
Enrico Guastaldi ◽  
Alessio Barbagli ◽  
Stefano Bernardinetti ◽  
Andrea Zirulia ◽  
...  

The hydrogeochemical characteristics of the significant subterranean water body between “Cecina River and San Vincenzo” (Italy) were evaluated using multivariate statistical analysis methods, such as principal component analysis and self-organizing maps (SOMs), with the objective of studying the spatiotemporal relationships of the aquifer. The dataset consisted of the chemical composition of groundwater samples collected between 2010 and 2018 at 16 wells distributed across the whole aquifer. For these wells, all major ions were determined. A 4 × 8 self-organizing map was constructed to evaluate spatiotemporal changes in the water body. After SOM clustering, we obtained three clusters that successfully grouped all data with similar chemical characteristics. These clusters can be viewed as reflecting the presence of three water types: (i) Cluster 1: low salinity/mixed waters; (ii) Cluster 2: high salinity waters; and (iii) Cluster 3: low salinity/fresh waters. Results showed that the major ions had the greatest influence on the groundwater chemistry, and the difference in their concentrations allowed the definition of three clusters within the obtained SOM. Temporal changes in cluster assignment were only observed in two wells, located in areas more susceptible to changes in the water table levels, and therefore, hydrodynamic conditions. The result of the SOM clustering was also displayed using the classical hydrochemical approach of the Piper plot. It was observed that these changes were not as easily identified when the raw data were used. The spatial display of the clustering results allowed evaluation in a hydrogeological context in a quick and cost-effective way. Thus, our approach can be used to quickly analyze large datasets, suggest recharge areas, and recognize spatiotemporal patterns.
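
The step of assigning samples to SOM units and watching for temporal changes can be sketched as follows. This is an illustration under stated assumptions, not the study's workflow. The map is assumed to be already trained, `make_grid` is a hypothetical stand-in that uses grid coordinates as placeholder weights, and the optional `unit_cluster` mapping from map units to cluster labels is also hypothetical.

```python
import math


def make_grid(rows, cols):
    """Placeholder for a trained SOM: unit (r, c) gets weight vector [r, c].

    A real map would hold one learned weight per measured variable.
    """
    return {(r, c): [float(r), float(c)] for r in range(rows) for c in range(cols)}


def best_matching_unit(sample, weights):
    """Return the grid position whose weight vector is closest to the sample."""
    return min(weights, key=lambda unit: math.dist(sample, weights[unit]))


def assignment_changes(samples, weights, unit_cluster=None):
    """Indices t at which a well's assignment differs from the one at t - 1.

    samples:      time-ordered measurements for a single well
    unit_cluster: optional mapping from map unit to cluster label; when it is
                  omitted, the best matching unit itself serves as the label
    """
    labels = []
    for sample in samples:
        unit = best_matching_unit(sample, weights)
        labels.append(unit_cluster[unit] if unit_cluster else unit)
    return [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]
```

Calling `assignment_changes` once per well with that well's time series returns an empty list for wells whose assignment never changes, which corresponds to most of the wells in the abstract's account.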


2021 ◽  
pp. annrheumdis-2021-220331
Author(s):  
Alexander Platzer ◽  
Farideh Alasti ◽  
Josef S Smolen ◽  
Daniel Aletaha ◽  
Helga Radner ◽  
...  

Objectives: Identification of trajectories of radiographic damage in rheumatoid arthritis (RA) by clustering patients according to the shape of their curve of Sharp-van der Heijde scores (SHSs) over time, and development of models to predict their progression cluster from baseline characteristics. Methods: Patient-level data over a 2-year period from five large randomised controlled trials on tumour necrosis factor inhibitors in RA were used. SHSs were clustered in a shape-respecting manner to identify distinct clusters of radiographic progression. Characteristics of patients within different progression clusters were compared at baseline and over time. Logistic regression models were developed to predict trajectory of radiographic progression using information at baseline. Results: In total, 1887 patients with 7738 X-rays were used for cluster analyses. We identified four distinct clusters with characteristic shapes of radiographic progression: one with a stable SHS over the whole 2-year period (C0/lowChange; 86%); one with relentless progression (C1/rise; 5.8%); one with decreasing SHS (C2/improvement; 6.9%); and one with SHS going up and down (C3/bothWays; 1.4%). Robustness of the clusters was confirmed using different clustering methods. Regression models identified disease duration, baseline C-reactive protein (CRP) and SHS, and treatment status as predictors for cluster assignment. Conclusions: We were able to identify and partly characterise four different clusters of radiographic progression over time in patients with RA, most remarkably one with relentless progression and another one with amelioration of joint damage over time, suggesting the existence of distinct patterns of joint damage accrual in RA.


2021 ◽  
Author(s):  
Albert Dominguez Mantes ◽  
Daniel Mas Montserrat ◽  
Carlos Bustamante ◽  
Xavier Giró-i-Nieto ◽  
Alexander G Ioannidis

Characterizing the genetic substructure of large cohorts has become increasingly important as genetic association and prediction studies are extended to massive, increasingly diverse biobanks. ADMIXTURE and STRUCTURE are widely used unsupervised clustering algorithms for characterizing such ancestral genetic structure. These methods decompose individual genomes into fractional cluster assignments, with each cluster representing a vector of DNA marker frequencies. The assignments and clusters provide an interpretable representation for geneticists to describe population substructure at the sample level. However, with the rapidly increasing size of population biobanks and the growing numbers of variants genotyped (or sequenced) per sample, such traditional methods become computationally intractable. Furthermore, multiple runs with different hyperparameters are required to properly depict the population clustering using these traditional methods, increasing the computational burden. This can lead to days of compute. In this work we present Neural ADMIXTURE, a neural network autoencoder that follows the same modeling assumptions as ADMIXTURE, providing similar (or better) clustering, while reducing the compute time by orders of magnitude. In addition, this network can include multiple outputs, providing results equivalent to running the original ADMIXTURE algorithm many times with different numbers of clusters. These models can also be stored, allowing later cluster assignment to be performed in linear computational time.
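
The idea of fractional cluster assignments over vectors of marker frequencies can be made concrete with a small sketch. It assumes the usual admixture model, in which the expected allele frequency of individual i at marker j is the sum over clusters k of Q[i][k] · F[k][j], with each row of Q summing to 1. It shows only this decoding relationship, not Neural ADMIXTURE itself or how Q and F are fitted, and the function name is hypothetical.

```python
def expected_frequencies(q, f):
    """Expected per-individual marker frequencies under the admixture model.

    q: n x k fractional cluster assignments (each row sums to 1)
    f: k x m marker frequencies, one row per cluster
    Returns an n x m matrix whose entry (i, j) is sum_k q[i][k] * f[k][j].
    """
    n_markers = len(f[0])
    return [
        [sum(q_ik * f_k[j] for q_ik, f_k in zip(q_i, f)) for j in range(n_markers)]
        for q_i in q
    ]
```

An individual assigned entirely to one cluster reproduces that cluster's frequency vector, while an evenly admixed individual lands halfway between the two cluster vectors.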


2021 ◽  
Author(s):  
María Eugenia Videla ◽  
Juliana Iglesias ◽  
Cecilia Bruno

A number of clustering algorithms are available to depict population genetic structure (PGS) with genomic data; however, there is no consensus on which methods perform best. We conducted a simulation study of three PGS scenarios with k = 2, 5 and 10 subpopulations, recreating several maize genomes as a model to: (i) compare three well-known clustering methods: UPGMA, k-means, and a Bayesian method (BM), (ii) assess four internal validation indices: CH, Connectivity, Dunn and Silhouette, to determine the reliable number of groups defining a PGS, and (iii) estimate the misclassification rate for each validation index. Moreover, a publicly available maize dataset was used to illustrate the outcomes of our simulation. BM was the best method to classify individuals in all tested scenarios, without assignment errors. Conversely, UPGMA was the method with the highest misclassification rate. In scenarios with 5 and 10 subpopulations, the CH and Connectivity indices showed the greatest underestimation of group number for all clustering algorithms. The Dunn and Silhouette indices showed the best performance with BM. Nevertheless, since Silhouette measures the degree of confidence in cluster assignment, and BM measures the probability of cluster membership, these results should be considered with caution. In this study we found that BM was efficient at depicting the PGS in both simulated and real maize datasets. This study offers a robust alternative to unveil the existing PGS, thereby facilitating population studies and breeding strategies in maize programs. Moreover, the present findings may have implications for other crop species.
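
Of the validation indices mentioned, the Silhouette is described as measuring the degree of confidence in cluster assignment. A minimal sketch of the standard definition follows, assuming Euclidean distance and at least two clusters. It is an illustration with a hypothetical function name, not the study's implementation.

```python
import math
from statistics import mean


def silhouette_scores(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b).

    a: mean distance to the other points in the same cluster
    b: smallest mean distance to the points of any other cluster
    Points in singleton clusters get 0.0. Requires at least two clusters.
    """
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        dists = {}  # cluster label -> distances from p to that cluster's points
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(other, []).append(math.dist(p, q))
        if own not in dists:  # p is alone in its cluster
            scores.append(0.0)
            continue
        a = mean(dists[own])
        b = min(mean(d) for label, d in dists.items() if label != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores
```

Values near 1 indicate a point that sits well inside its assigned cluster, and negative values indicate a point that is on average closer to another cluster than to its own.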


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 2950
Author(s):  
Paul Myland ◽  
Sebastian Babilon ◽  
Tran Quoc Khanh

Intelligent systems for interior lighting strive to balance economical, ecological, and health-related needs. For this purpose, they rely on sensors to assess and respond to the current room conditions. With an increased demand for more dedicated control, the number of sensors used in parallel increases considerably. In this context, the present work focuses on optical sensors with three spectral channels used to capture color-related information about the illumination conditions, such as their chromaticities and correlated color temperatures. One major drawback of these devices, in particular with regard to intelligent lighting control, is that even same-type color sensors show production-related differences in their color registration. Standard methods for color correction are either impractical for large-scale production or result in large colorimetric errors. Therefore, this article shows the feasibility of a novel sensor binning approach that uses the sensor responses to a single white light source for cluster assignment. A cluster-specific color correction is shown to significantly reduce the registered color differences for a selection of test stimuli to values in the range of 0.003–0.008 Δu′v′, which enables the wide use of such sensors in practice and, at the same time, requires minimal additional effort in sensor commissioning.

