Augmenting the sensor network around Helgoland using unsupervised machine learning methods

Author(s):  
Viktoria Wichert ◽  
Holger Brix

A sensor network surrounds the island of Helgoland, supplying marine data centers with autonomous measurements of variables such as temperature, salinity, chlorophyll and oxygen saturation. The result is a data collection describing the complex conditions around Helgoland, which lies at the boundary between the coastal area and the open sea. Spatio-temporal phenomena, such as passing river plumes and pollutant influx during flood events, can be found in this data set, and the data provided by the existing measurement network allow these events to be detected and investigated.

Because of Helgoland's important role in understanding the transition between coastal and open-sea conditions, there are plans to augment the sensor network around the island with another underwater sensor station, an Underwater Node (UWN). The new node should optimally complement the existing sensor network, so it makes sense to place it in an area that is not yet well represented by other sensors. The exact spatial and temporal extent of a sensor's area of representativity is hard to determine; it is assumed to be the area whose statistical conditions are similar to those the sensor measures. In the complex system around Helgoland this area is difficult to specify and may change in both space and time.

Using an unsupervised machine learning approach, I determine areas of representativity around Helgoland with the goal of finding an ideal placement for a new sensor node. The areas of representativity are identified by clustering a dataset containing time series from the existing sensor network and complementary model data covering several years. The computed areas of representativity are compared with the existing sensor placements to decide where to deploy the additional UWN so that it achieves good coverage for further investigations of spatio-temporal phenomena.

A challenge in the clustering analysis is determining whether the spatial areas of representativity remain stable enough over time to base a long-term sensor placement decision on the results. I compare results across different periods of time and investigate how fast areas of representativity change spatially over time and whether some areas remain stable over several years. This also gives insights into the occurrence and long-term behavior of spatio-temporal events around Helgoland.

Whether spatial areas of representativity remain temporally stable enough to be taken into account when augmenting sensor networks influences future network design decisions. In this way, the extended sensor network can capture a greater variety of the spatio-temporal phenomena around Helgoland and provide an overview of the long-term behavior of the marine system.
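
The clustering and stability check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's method: the abstract does not name the clustering algorithm, so plain k-means (with farthest-first initialization) stands in for it; each location (for example a model grid cell) is assumed to be described by its standardized multivariate time series; and agreement between two periods is scored with the adjusted Rand index. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def standardize(series):
    """Z-score each variable (last axis) over all locations and time steps."""
    mean = series.mean(axis=(0, 1), keepdims=True)
    std = series.std(axis=(0, 1), keepdims=True)
    return (series - mean) / np.where(std == 0, 1.0, std)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialization; one label per row of X."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        # next center: the point farthest from all centers chosen so far
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def representativity_areas(series, k):
    """series: (n_locations, n_times, n_vars) array -> one area label per location."""
    z = standardize(series)
    return kmeans(z.reshape(len(z), -1), k)


def adjusted_rand_index(a, b):
    """Permutation-invariant agreement of two labelings (1.0 = identical partitions)."""
    _, a = np.unique(a, return_inverse=True)
    _, b = np.unique(b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)

    def pairs(n):
        return n * (n - 1) / 2

    sum_ij = pairs(table).sum()
    sum_a = pairs(table.sum(axis=1)).sum()
    sum_b = pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

In use, each period (for example one year) would be clustered separately and the two label vectors compared: an index near 1 suggests the areas of representativity are stable between those periods, while low values indicate that they shift. The adjusted Rand index is chosen here because cluster numbering is arbitrary between runs.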

2020 ◽  
pp. 1-12
Author(s):  
Qinglong Ding ◽  
Zhenfeng Ding

Sports competition characteristics play an important role in judging the fairness of a game and in improving athletes' skills. At present, feature recognition in sports competitions is disturbed by the environmental background, which degrades recognition results. To improve feature recognition in sports competitions, this study improves the TLD (Tracking-Learning-Detection) algorithm to address its shortcomings, applies it to long-term pedestrian tracking with PTZ (pan-tilt-zoom) cameras, and uses machine learning to build a feature recognition model for sports competitions based on the improved algorithm. The improved TLD algorithm is then analyzed and verified experimentally on a standard data set, and the experimental results are presented visually using statistical methods. The results indicate that the proposed method is effective to a certain degree.
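
The abstract does not say which part of TLD was improved. For orientation, the short-term tracker in the original TLD design is the Median Flow tracker, which tracks points forward and then backward, discards the half with the largest forward-backward error, and moves the box by the median displacement. The sketch below shows only that box-update step and assumes the point correspondences (for example from pyramidal optical flow) are already computed; the function name and inputs are hypothetical.

```python
import numpy as np


def median_flow_update(box, pts_prev, pts_fwd, pts_back):
    """Update a bounding box from tracked points, Median-Flow style.

    box: (x, y, w, h) in the previous frame.
    pts_prev: (n, 2) points inside the box in the previous frame.
    pts_fwd:  (n, 2) their positions tracked forward into the current frame.
    pts_back: (n, 2) pts_fwd tracked backward into the previous frame.
    """
    # forward-backward error: how far a point lands from where it started
    fb_error = np.linalg.norm(pts_prev - pts_back, axis=1)
    keep = fb_error <= np.median(fb_error)  # keep the more reliable half
    p0, p1 = pts_prev[keep], pts_fwd[keep]

    # robust translation: median displacement of the kept points
    dx, dy = np.median(p1 - p0, axis=0)

    # robust scale: median ratio of pairwise distances after / before
    i, j = np.triu_indices(len(p0), k=1)
    d0 = np.linalg.norm(p0[i] - p0[j], axis=1)
    d1 = np.linalg.norm(p1[i] - p1[j], axis=1)
    valid = d0 > 0
    scale = np.median(d1[valid] / d0[valid]) if valid.any() else 1.0

    # move the box center, then rescale the box around it
    x, y, w, h = box
    cx, cy = x + w / 2 + dx, y + h / 2 + dy
    new_w, new_h = w * scale, h * scale
    return (cx - new_w / 2, cy - new_h / 2, new_w, new_h)
```

Using medians rather than means is what makes the update tolerant of badly tracked points, for example on cluttered backgrounds; TLD's detector and learning components are not shown.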


Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 2372-2372
Author(s):  
Habib Hamidi ◽  
Christopher R Bolen ◽  
Elisabeth A Lasater ◽  
Diana Dunshee ◽  
Elizabeth A Punnoose ◽  
...  

Abstract Introduction: AML is a heterogeneous disease with a wide array of common genetic aberrations. Traditional classification of AML uses both classical cytogenetics and mutational profiling to stratify patients into four distinct risk groups (European LeukemiaNet, ELN). However, tumor gene expression profiles can play an important role in response to therapy and are potentially useful for unravelling the heterogeneity of AML. In this study, we hypothesized that clinical outcomes and variable responses to therapeutic modalities in AML may be driven by patterns of gene expression, and we sought to identify clinically actionable molecular subtypes using the available RNAseq data from the BEAT AML functional genomics study. Methods: An unsupervised machine learning approach based on consensus non-negative matrix factorization (cNMF) was applied to voom-normalized BEAT AML RNAseq data from patient samples with ≥50% blasts (N=389) to identify transcriptomic molecular subtypes. These subtypes were then compared with the genomic subtypes for their association with clinical outcome (log-rank test) and ex vivo drug sensitivity (Kruskal-Wallis test). The subtypes were also characterized biologically by gene signature scoring with well-curated pathway signatures (GSVA analysis using Hallmark pathways), cell type enrichment (xCell) and AML differentiation state (scRNAseq signature based on van Galen et al.). Finally, a random forest classifier was trained on BEAT AML samples to predict the NMF subtypes in an independent data set (TCGA AML cohort). Results: Our cNMF-based analysis identified six clusters of patients based on the 5,060 (top 10%) most variable genes. These novel subtypes were strongly prognostic (Figure 1A, log-rank p=2.79e-08) and were independent of the ELN genomic subtypes (ANOVA p=4.45e-07). Comparison with other genomic classifications is ongoing.
The prognostic value of the transcriptomic subtypes was further validated by predicting the subtypes in an independent cohort (TCGA LAML, N=200). We observed a significant association with outcome (Figure 1B, p=0.00013), with clusters 5 and 1 showing markedly better prognosis, similar to BEAT AML. These subtypes also displayed unique biological profiles, including significant associations with scRNAseq-derived AML differentiation-state cell types, Hallmark pathways and cellularity signatures. Notably, clusters 1 and 3 showed a mature phenotype, while clusters 2, 4 and 5 were more progenitor-like (Table 1). Importantly, the transcriptomic subtypes were highly predictive of ex vivo drug sensitivity: sensitivity to 70 compounds was significantly associated with cNMF subtype (Kruskal-Wallis p<0.01), compared with 4 compounds for the ELN subtypes. Of the tested molecules, single-agent venetoclax was the most strongly associated with subtype (p=1.7e-13); two subtypes were strongly resistant (median IC50 of 10 µM) and four were sensitive, with IC50s in the sub-micromolar range (Table 1). No association was seen between the ELN subtypes and venetoclax sensitivity (p=0.35). Conclusions: Unsupervised machine learning-based clustering of transcriptomic data identified six novel subtypes that are similarly prognostic to the ELN genomic subtypes and provide a novel avenue for identifying clinically actionable subsets of AML. Figure 1. Disclosures: Hamidi: Genentech: Current Employment, Current equity holder in publicly-traded company. Bolen: Genentech: Current Employment; F. Hoffmann-La Roche: Current equity holder in publicly-traded company. Lasater: Genentech: Current Employment, Current equity holder in publicly-traded company. Dunshee: Genentech/Roche: Current Employment, Current equity holder in publicly-traded company. Punnoose: Genentech: Current Employment, Current equity holder in publicly-traded company.
Dail: Genentech/Roche: Current Employment, Current equity holder in publicly-traded company.
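
The consensus NMF step in the Methods can be illustrated with one common consensus scheme: factorize a non-negative genes × samples matrix several times from different random starts, assign each sample to the factor with the largest weight in H, and record how often each pair of samples lands in the same cluster. This is a sketch under assumptions, not the study's pipeline: the exact cNMF variant, number of runs and preprocessing are not stated (voom output is log-scale and can be negative, so some shift or other transformation to non-negative values is assumed), and all names are hypothetical. The factorization uses Lee-Seung multiplicative updates for the Frobenius loss.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factorize non-negative V (genes x samples) as W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def consensus_matrix(V, k, n_runs=20):
    """Fraction of runs in which each pair of samples shares a cluster."""
    n_samples = V.shape[1]
    C = np.zeros((n_samples, n_samples))
    for run in range(n_runs):
        _, H = nmf(V, k, seed=run)
        labels = H.argmax(axis=0)  # each sample joins its dominant factor
        C += labels[:, None] == labels[None, :]
    return C / n_runs
```

Entries close to 0 or 1 indicate stable co-clustering; final subtypes are then typically read off the consensus matrix (for example by hierarchical clustering), and the choice of k is compared across candidate values.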


2020 ◽  
Author(s):  
Anil Kumar ◽  
Manish Prateek

Abstract Background: This study examined the significance of Ki-67 labels and calculated the proliferation score, based on counting immunopositive and immunonegative nuclear sections with the help of machine learning, to predict the intensity of breast carcinoma. Methods: The BreCaHAD (Breast Cancer Histopathological Annotation and Diagnosis) dataset includes various malignant cases from different patients' routine diagnoses. It contains H&E-stained microscopic histopathological images at 40x magnification, stored as RGB TIFF files. The method starts with preprocessing that focuses on resizing, smoothing and enhancement. After preprocessing, the RGB samples are decomposed into HSI values. The BreCaHAD data set is hematoxylin and eosin (H&E) stained, and brown and blue color levels play a major role in differentiating immunopositive from immunonegative nuclear sections. Blue in RGB and hue in HSI are the intrinsic characteristics of H&E Ki-67. Shape parameters are calculated after segmentation by Otsu thresholding and unsupervised machine learning. Morphological operators resolve overlapping nucleus sections in the sample images so that nuclei are counted correctly, which increases the accuracy of the automatic segmentation. Result: Using nine morphological features and an unsupervised machine learning technique on the BreCaHAD dataset, the label of breast carcinoma is predicted. The proposed methodology achieves precision 95.7%, recall 93.8%, F-score 94.74%, accuracy 0.9088, specificity 0.6803, BCR 0.7975 and MCC 0.5855, which is better than existing techniques. Conclusion: This study developed an efficient automated nuclear section segmentation model implemented on the BreCaHAD dataset, which contains H&E-stained microscopic biopsy images.
Potentially, this model can assist pathologists in the fast, effective, efficient and accurate computation of the Ki-67 proliferation score on IHC images of breast carcinoma.


Author(s):  
Pavel Kikin ◽  
Alexey Kolesnikov ◽  
Alexey Portnov ◽  
Denis Grischenko

The state of ecological systems, along with their general characteristics, is almost always described by indicators that vary in space and time, which significantly complicates the construction of mathematical models for predicting the state of such systems. One way to simplify and automate the construction of such models is to use machine learning methods. The article compares traditional and neural-network-based algorithms and machine learning methods for predicting spatio-temporal series of ecosystem data. The following algorithms and methods were analyzed and compared: logistic regression, random forest, gradient boosting on decision trees, SARIMAX, long short-term memory (LSTM) networks and gated recurrent units (GRU). The study uses data sets with both spatial and temporal components: the number of mosquitoes, the number of dengue infections, the physical condition of tropical grove trees, and the water level in the river. The article discusses the preprocessing steps required for each algorithm. Kolmogorov complexity was also estimated for the data sets used, as one parameter that can help formalize the choice of the best algorithm when constructing mathematical models of spatio-temporal data. Based on the results of the analysis, recommendations are given on applying particular methods and specific technical solutions depending on the characteristics of the data set describing a particular ecosystem.
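
Kolmogorov complexity itself is uncomputable, so in practice it is approximated; a common proxy is the length of a lossless compression of the discretized series. The article's exact estimator is not described here, so the sketch below is an assumption: it quantizes a series into a fixed number of levels and reports the zlib compressed-to-raw size ratio, where lower values indicate a more regular, presumably easier-to-model series. The function name and the choice of 16 levels are hypothetical.

```python
import zlib

import numpy as np


def compression_complexity(series, n_levels=16):
    """Compressed size / raw size of a quantized series (lower = more regular).

    Only a compression-based proxy: real Kolmogorov complexity cannot be
    computed, and the value depends on the quantization and the compressor.
    """
    x = np.asarray(series, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        symbols = np.zeros(len(x), dtype=np.uint8)
    else:
        # map the series onto n_levels discrete symbols, one byte each
        symbols = np.round((x - x.min()) / span * (n_levels - 1)).astype(np.uint8)
    raw = symbols.tobytes()
    return len(zlib.compress(raw, 9)) / len(raw)
```

Applied to each data set, such a score could be compared with the forecasting error of each method; because the value depends on the quantization and the compressor, it is best used for ranking series under identical settings rather than as an absolute measure.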


Water ◽  
2019 ◽  
Vol 11 (6) ◽  
pp. 1268 ◽  
Author(s):  
Zhenzhen Di ◽  
Miao Chang ◽  
Peikun Guo ◽  
Yang Li ◽  
Yin Chang

Worldwide, including in China, most industrial wastewater is still discharged directly into aquatic environments without adequate treatment. Because of a lack of data and of suitable methods, the relationships between pollutants discharged in wastewater and those in surface water have not been fully revealed, and unsupervised machine learning techniques, such as clustering algorithms, have been neglected in related research fields. In this study, real-time monitoring data for chemical oxygen demand (COD), ammonia nitrogen (NH3-N), pH, and dissolved oxygen in wastewater discharged from 2213 factories and in surface water at 18 monitoring sections (sites) in 7 administrative regions of the Yangtze River Basin from 2016 to 2017 were collected and analyzed with the partitioning around medoids (PAM) and expectation–maximization (EM) clustering algorithms, the Welch t-test, the Wilcoxon test, and Spearman correlation. The results showed that, compared with the spatial cluster of unpolluted sites, the spatial cluster of heavily polluted sites, where more wastewater was discharged, had relatively high COD (>100 mg L−1) and NH3-N (>6 mg L−1) concentrations and relatively low pH (<6) in wastewater from 15 industrial classes, which are subject to different discharge limits in the pollutant discharge standards. The results also showed that the economic activities generating wastewater and the geographical distribution of heavily polluted wastewater changed from 2016 to 2017: the concentration ranges of pollutants in discharges widened and the contributions from some emerging enterprises became more important. The correlations between wastewater quality and surface water quality strengthened when the whole-year data sets were reduced to the heavily polluted periods identified by EM clustering and water quality evaluation.
This study demonstrates how unsupervised machine learning algorithms can objectively and effectively mine real-time monitoring data and highlight spatio-temporal relationships between pollutants in wastewater discharges and surface water to support scientific water resource management.
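
Partitioning around medoids, one of the two clustering algorithms named above, can be sketched in its classic BUILD + SWAP form: greedily choose k representative observations (medoids), then keep applying the medoid/non-medoid exchange that most lowers the total distance to the nearest medoid. This is a minimal sketch assuming Euclidean distances on a small feature matrix (for example hypothetical per-site COD and NH3-N values, which in practice would usually be standardized first); the study's features, distance measure and software are not specified here.

```python
import numpy as np


def pam(X, k, max_iter=100):
    """Partitioning around medoids (BUILD + SWAP) on Euclidean distances.

    Returns (medoid_indices, labels). Each SWAP pass costs O(k * n^2) cost
    evaluations, which is fine for a few hundred observations.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    def cost(meds):
        # total distance from every point to its nearest medoid
        return D[:, meds].min(axis=1).sum()

    # BUILD: start from the most central point, then greedily add the
    # point that lowers the total distance the most
    medoids = [int(D.sum(axis=1).argmin())]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: cost(medoids + [c])))

    # SWAP: apply the best medoid/non-medoid exchange until none helps
    best = cost(medoids)
    for _ in range(max_iter):
        trial_cost, trial = min(
            (cost(medoids[:i] + [o] + medoids[i + 1:]), medoids[:i] + [o] + medoids[i + 1:])
            for i in range(k)
            for o in range(n)
            if o not in medoids
        )
        if trial_cost >= best - 1e-12:
            break
        best, medoids = trial_cost, trial

    labels = D[:, medoids].argmin(axis=1)
    return np.array(medoids), labels
```

Because medoids are actual observations rather than averages, each cluster is represented by a real monitoring site or discharge record, and the method is less sensitive to extreme concentrations than k-means.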


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Luca Pappalardo ◽  
Paolo Cintia ◽  
Alessio Rossi ◽  
Emanuele Massucco ◽  
Paolo Ferragina ◽  
...  

Abstract Soccer analytics is attracting increasing interest in academia and industry, thanks to sensing technologies that provide high-fidelity data streams for every match. Unfortunately, these detailed data are owned by specialized companies and are therefore rarely publicly available for scientific research. To fill this gap, this paper describes the largest open collection of soccer logs ever released, containing all the spatio-temporal events (passes, shots, fouls, etc.) that occurred during each match of an entire season of seven prominent soccer competitions. Each match event contains information about its position, time, outcome, player and characteristics. The nature of team sports like soccer, halfway between the abstraction of a game and the reality of complex social systems, combined with the unique size and composition of this dataset, provides an ideal ground for tackling a wide range of data science problems, including the measurement and evaluation of performance at both the individual and the collective level, and the determinants of success and failure.
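
To make the event description concrete, the sketch below models one match event with the attributes the abstract lists (type, position, time, outcome, player, plus a team) and computes a simple performance measure, pass accuracy, at either the player or the team level. The field names, coordinate convention and record layout are hypothetical and do not reproduce the released files' actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MatchEvent:
    # Hypothetical fields; the released dataset uses its own schema.
    event_type: str   # e.g. "pass", "shot", "foul"
    player_id: int
    team_id: int
    second: float     # time since the start of the match period
    x: float          # pitch position, assumed 0-100 on each axis
    y: float
    successful: bool  # outcome of the event


def pass_accuracy(events, by="player_id"):
    """Share of successful passes per player (by="player_id") or per team (by="team_id")."""
    attempts = defaultdict(int)
    completed = defaultdict(int)
    for event in events:
        if event.event_type == "pass":
            key = getattr(event, by)
            attempts[key] += 1
            completed[key] += int(event.successful)
    return {key: completed[key] / attempts[key] for key in attempts}
```

Grouping the same events by player or by team is what lets one dataset support both individual and collective performance measures.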


2020 ◽  
Author(s):  
Josefine Umlauft ◽  
Philippe Roux ◽  
Florent Gimbert ◽  
Albanne Lecointre ◽  
Bertrand Rouet-LeDuc ◽  
...  

The cryosphere is a highly active and dynamic environment that rapidly responds to changing climatic conditions. The underlying processes are poorly understood because they remain challenging to observe. Glacial dynamics are strongly intermittent in time and heterogeneous in space, so monitoring with high spatio-temporal resolution is essential. As part of the RESOLVE project, continuous seismic observations were obtained with a dense seismic network (100 nodes, array diameter 700 m) installed on the Argentière Glacier (French Alps) in May 2018. This unique data set offers the chance to study targeted processes and dynamics within the cryosphere in detail on a local scale.

We apply classical beamforming over the array (matched field processing) and unsupervised machine learning techniques to identify, cluster and locate seismic sources in 5D (x, y, z, velocity, time). Sources located with high resolution and accuracy are related to processes and activity within the ice body, e.g. the geometry and dynamics of crevasses or the interaction at the glacier/bedrock interface, depending on meteorological conditions such as daily temperature fluctuations or snowfall. Our preliminary results indicate strong potential in poorly resolved sources, which can be observed with statistical consistency and reveal new insights into structural features and physical properties of the glacier (e.g. analysis of scatterers).
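
Matched field processing can be illustrated with the Bartlett processor: for each candidate source position, model the phase delays to every station, and measure how coherently the recorded spectra match that model. The sketch below makes strong simplifying assumptions that the study does not: a 2D search instead of 5D, one known homogeneous velocity, straight rays and noise-free synthetic data. Normalizing each frequency separately removes the unknown source spectrum and origin time. All names, the station layout and the 1800 m/s velocity are hypothetical.

```python
import numpy as np


def replica(stations, point, freqs, velocity):
    """Modelled phase delays from a candidate source to each station.

    Assumes a homogeneous medium with one constant velocity and straight rays.
    Returns an (n_freqs, n_stations) array whose rows have unit norm.
    """
    travel_time = np.linalg.norm(stations - point, axis=1) / velocity
    w = np.exp(-2j * np.pi * np.outer(freqs, travel_time))
    return w / np.sqrt(len(stations))


def bartlett_mfp(spectra, stations, grid, freqs, velocity):
    """Bartlett matched-field power for each candidate point in `grid`.

    spectra: (n_freqs, n_stations) complex Fourier coefficients of the records.
    Rows are normalized per frequency, so any per-frequency complex source
    factor drops out; power is 1 for a perfect match and smaller elsewhere.
    """
    d = spectra / np.linalg.norm(spectra, axis=1, keepdims=True)
    power = np.empty(len(grid))
    for i, point in enumerate(grid):
        w = replica(stations, point, freqs, velocity)
        power[i] = np.mean(np.abs(np.sum(np.conj(w) * d, axis=1)) ** 2)
    return power
```

The candidate with the highest power is the source estimate; extending the grid to depth and velocity, and repeating this over time windows, gives the kind of multi-dimensional source catalog that can then be clustered.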

