Deep Clustering to Identify Sources of Urban Seismic Noise in Long Beach, California

Author(s):  
Dylan Snover ◽  
Christopher W. Johnson ◽  
Michael J. Bianco ◽  
Peter Gerstoft

Abstract Ambient seismic noise consists of emergent and impulsive signals generated by natural and anthropogenic sources. Developing techniques to identify specific cultural noise signals will benefit studies performing seismic imaging from continuous records. We examine spectrograms of urban cultural noise from a spatially dense seismic array located in Long Beach, California. The spectral features of the waveforms are used to develop a self-supervised clustering model for differentiating cultural noise into separable types of signals. We use 161 hr of seismic data from 5200 geophones that contain impulsive signals originating from human activity. The model uses convolutional autoencoders, a self-supervised machine-learning technique, to learn latent features from spectrograms produced from the data. The latent features are evaluated using a deep clustering algorithm to separate the noise signals into different classes. We evaluate the separation of data and analyze the classes to identify the likely sources of the signals present in the data. To interpret the model performance, we examine the time–frequency domain features of the signals and the spatiotemporal evolution observed for each class. We demonstrate that clustering using deep autoencoders is a useful approach to characterizing seismic noise and identifying novel signals in the data.

BMJ Open ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. e044500
Author(s):  
Yauhen Statsenko ◽  
Fatmah Al Zahmi ◽  
Tetiana Habuza ◽  
Klaus Neidl-Van Gorkom ◽  
Nazar Zaki

Background Despite the necessity, there is no reliable biomarker to predict disease severity and prognosis of patients with COVID-19. The currently published prediction models are not fully applicable to clinical use. Objectives To identify predictive biomarkers of COVID-19 severity and to justify their threshold values for the stratification of the risk of deterioration that would require transferring to the intensive care unit (ICU). Methods The study cohort (560 subjects) included all consecutive patients admitted to Dubai Mediclinic Parkview Hospital from February to May 2020 with COVID-19 confirmed by PCR. The challenge in finding the cut-off thresholds was the unbalanced dataset (eg, 72 patients admitted to ICU vs 488 non-severe cases). Therefore, we customised a supervised machine learning (ML) algorithm in terms of the threshold value used to predict worsening. Results With the default thresholds returned by the ML estimator, the performance of the models was low. It was improved by setting the cut-off level to the 25th percentile for lymphocyte count and the 75th percentile for other features. The study justified the following threshold values of the laboratory tests done on admission: lymphocyte count <2.59×10⁹/L, and the upper levels for total bilirubin 11.9 μmol/L, alanine aminotransferase 43 U/L, aspartate aminotransferase 32 U/L, D-dimer 0.7 mg/L, activated partial thromboplastin time (aPTT) 39.9 s, creatine kinase 247 U/L, C reactive protein (CRP) 14.3 mg/L, lactate dehydrogenase 246 U/L, troponin 0.037 ng/mL, ferritin 498 ng/mL and fibrinogen 446 mg/dL. Conclusion The performance of the neural network trained with top valuable tests (aPTT, CRP and fibrinogen) is admissible (area under the curve (AUC) 0.86; 95% CI 0.486 to 0.884; p<0.001) and comparable with the model trained with all the tests (AUC 0.90; 95% CI 0.812 to 0.902; p<0.001). A free online tool at https://med-predict.com illustrates the study results.
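
The percentile-based cut-offs described in the Results can be illustrated with a minimal sketch. It assumes the percentile is taken over a feature's values in the cohort, which the abstract does not state explicitly; the function names and the sample values are hypothetical and are not data from the study.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def flag_high_risk(value, cutoff, low_is_bad=False):
    """Flag a lab value against a cut-off.

    low_is_bad=True is for features such as lymphocyte count, where values
    below the cut-off indicate risk; otherwise values above it are flagged.
    """
    return value < cutoff if low_is_bad else value > cutoff


# Hypothetical admission values, NOT data from the study.
crp = [2.0, 5.0, 8.0, 14.0, 30.0]
lymph = [0.8, 1.2, 1.9, 2.4, 3.0]

crp_cut = percentile(crp, 75)      # 75th percentile for "upper level" features
lymph_cut = percentile(lymph, 25)  # 25th percentile for lymphocyte count
```

Moving the cut-off away from an estimator's default is one common way to trade specificity for sensitivity when the positive class (here, ICU admissions) is rare.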


1991 ◽  
Vol 81 (4) ◽  
pp. 1101-1114
Author(s):  
Jerry A. Carter ◽  
Noel Barstow ◽  
Paul W. Pomeroy ◽  
Eric P. Chael ◽  
Patrick J. Leahy

Abstract Evidence is presented supporting the view that high-frequency seismic noise decreases with increased depth. Noise amplitudes are higher near the free surface where surface-wave noise, cultural noise, and natural (wind-induced) noise predominate. Data were gathered at a hard-rock site in the northwestern Adirondack lowlands of northern New York. Between 15- and 40-Hz noise levels at this site are more than 10 dB less at 945-m depth than they are at the surface, and from 40 to 100 Hz the difference is more than 20 dB. In addition, time variability of the spectra is shown to be greater at the surface than at either 335- or 945-m depths. Part of the difference between the surface and subsurface noise variability may be related to wind-induced noise. Coherency measurements between orthogonal components of motion show high-frequency seismic noise is more highly organized at the surface than it is at depth. Coherency measurements between the same component of motion at different vertical offsets show a strong low-frequency coherence at least up to 945-m vertical offsets. As the vertical offset decreases, the frequency band of high coherence increases.


2020 ◽  
Author(s):  
Xiao Lai ◽  
Pu Tian

Abstract Supervised machine learning, especially deep learning based on a wide variety of neural network architectures, has contributed tremendously to fields such as marketing, computer vision and natural language processing. However, development of unsupervised machine learning algorithms has been a bottleneck of artificial intelligence. Clustering is a fundamental unsupervised task in many different subjects. Unfortunately, no present algorithm is satisfactory for clustering of high dimensional data with strong nonlinear correlations. In this work, we propose a simple and highly efficient hierarchical clustering algorithm based on encoding by composition rank vectors and tree structure, and demonstrate its utility with clustering of protein structural domains. No record comparison, which is an expensive and essential step common to all present clustering algorithms, is involved. Consequently, it achieves hierarchical clustering with linear time and space complexity and is thus applicable to arbitrarily large datasets. The key factor in this algorithm is the definition of composition, which depends on the physical nature of the target data and therefore needs to be constructed case by case. Nonetheless, the algorithm is general and applicable to any high dimensional data with strong nonlinear correlations. We hope this algorithm will inspire a rich research field of encoding-based clustering well beyond composition rank vector trees.


Author(s):  
U. K. Sridevi ◽  
N. Nagaveni

Clustering is an important technique for finding relevant content in a document collection, and it also reduces the search space. Current clustering research emphasizes the development of more efficient clustering methods without considering domain knowledge and the user's needs. In recent years the semantics of documents have been utilized in document clustering. The discussed work focuses on a clustering model in which an ontology approach is applied. The major challenge is to use the background knowledge in the similarity measure. This paper presents a system for ontology-based annotation and clustering of documents. A semi-automatic document annotation and concept weighting scheme is used to create an ontology-based knowledge base. The Particle Swarm Optimization (PSO) clustering algorithm can be applied to obtain the clustering solution. The accuracy of clustering has been computed before and after combining ontology with the Vector Space Model (VSM). The proposed ontology-based framework gives improved performance and better clustering compared to the traditional vector space model. The result using ontology was significant and promising.


2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Kerry E. Poppenberg ◽  
Vincent M. Tutino ◽  
Lu Li ◽  
Muhammad Waqas ◽  
Armond June ◽  
...  

Abstract Background Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. 
In our data, demographics and comorbidities did not affect model performance. Conclusions We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate effect of covariates.


2016 ◽  
Vol 4 (2) ◽  
pp. 285-307 ◽  
Author(s):  
Arnaud Burtin ◽  
Niels Hovius ◽  
Jens M. Turowski

Abstract. In seismology, the signal is usually analysed for earthquake data, but earthquakes represent less than 1 % of continuous recording. The remaining data are considered as seismic noise and were for a long time ignored. Over the past decades, the analysis of seismic noise has constantly increased in popularity, and this has led to the development of new approaches and applications in geophysics. The study of continuous seismic records is now open to other disciplines, like geomorphology. The motion of mass at the Earth's surface generates seismic waves that are recorded by nearby seismometers and can be used to monitor mass transfer throughout the landscape. Surface processes vary in nature, mechanism, magnitude, space and time, and this variability can be observed in the seismic signals. This contribution gives an overview of the development and current opportunities for the seismic monitoring of geomorphic processes. We first describe the common principles of seismic signal monitoring and introduce time–frequency analysis for the purpose of identification and differentiation of surface processes. Second, we present techniques to detect, locate and quantify geomorphic events. Third, we review the diverse layout of seismic arrays and highlight their advantages and limitations for specific processes, like slope or channel activity. Finally, we illustrate all these characteristics with the analysis of seismic data acquired in a small debris-flow catchment where geomorphic events show interactions and feedbacks. Further developments must aim to fully understand the richness of the continuous seismic signals, to better quantify the geomorphic activity and to improve the performance of warning systems. Seismic monitoring may ultimately allow the continuous survey of erosion and transfer of sediments in the landscape on the scales of external forcing.


Geophysics ◽  
1975 ◽  
Vol 40 (6) ◽  
pp. 1066-1072 ◽  
Author(s):  
H. M. Iyer

A seismic noise experiment was conducted in the East Mesa area of Imperial Valley, California, by the U.S. Geological Survey (USGS) in May 1972. There is a pronounced heat flow anomaly over the area, and between July 1972 and the present five deep test wells have been drilled over the anomaly by the U.S. Bureau of Reclamation (U.S. Bureau of Reclamation, 1974). At the time of our survey, we were aware of results from a preliminary seismic noise survey in East Mesa by Teledyne Geotech (Douze and Sorrells, 1972). A detailed noise survey was conducted by Teledyne Geotech soon after our experiment (Geothermal Staff of Teledyne Geotech, 1972). Both the Teledyne Geotech surveys show noise levels (in the 3.0 to 5.0 Hz band) 12–18 dB higher over the area where the thermal gradients and heat flow reach maximum values than in the surroundings. Our results, on the other hand, show that the seismic noise field in the area is dominated by cultural noise, and it is impossible to see a noise anomaly that can be related to the geothermal phenomena in East Mesa. We think that it is important to take into account this disagreement between the two results in order to make a critical evaluation of the utility of seismic noise as a geothermal prospecting tool. The purpose of this note is to put our findings on record.


Author(s):  
Ting Xie ◽  
Taiping Zhang

As a powerful unsupervised learning technique, clustering is a fundamental task of big data analysis. However, many traditional clustering algorithms perform poorly, in both computational efficiency and clustering accuracy, on big data that is high-dimensional, sparse and noisy. To alleviate these problems, this paper presents a Feature K-means clustering model on the feature space of big data and introduces a fast algorithm for it based on the Alternating Direction Multiplier Method (ADMM). We show the equivalence of the Feature K-means model in the original space and the feature space and prove the convergence of its iterative algorithm. Computationally, we compare the Feature K-means with Spherical K-means and Kernel K-means on several benchmark data sets, including artificial data and four face databases. Experiments show that the proposed approach is comparable to the state-of-the-art algorithm in big data clustering.


2020 ◽  
Vol 91 (5) ◽  
pp. 2757-2768 ◽  
Author(s):  
Han Xiao ◽  
Zachary Cohen Eilon ◽  
Chen Ji ◽  
Toshiro Tanimoto

Abstract Seismic noise with frequencies above 1 Hz is often called “cultural noise” and is generally correlated quite well with human activities. Recently, cities in mainland China and Italy imposed restrictions on travel and day-to-day activity in response to COVID-19, which gave us an unprecedented opportunity to study the relationship between seismic noise above 1 Hz and human activities. Using seismic records from stations in China and Italy, we show that seismic noise above 1 Hz was primarily generated by the local transportation systems. The lockdown of the cities and the imposition of travel restrictions led to an ∼4–12 dB decrease in seismic noise power in mainland China. Data also show that different Chinese cities experienced distinct periods of diminished cultural noise, related to differences in local response to the epidemic. In contrast, there was only ∼1–6 dB decrease of seismic noise power in Italy, after the country was put under a lockdown. The noise data indicate that traffic flow did not decrease as much in Italy and show how different cities reacted distinctly to the lockdown conditions.
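
For readers less used to decibel figures, the reported decreases can be translated into power ratios with the standard definition, change in dB = 10·log10(P_after / P_before). The short sketch below only applies that definition; the function names are illustrative and nothing here reproduces the authors' processing.

```python
import math


def power_change_db(p_before, p_after):
    """Change in power expressed in decibels (negative means a decrease)."""
    return 10.0 * math.log10(p_after / p_before)


def remaining_power_fraction(decrease_db):
    """Fraction of the original power left after a decrease of decrease_db dB."""
    return 10.0 ** (-decrease_db / 10.0)


# A 4 dB decrease leaves roughly 40% of the original noise power,
# and a 12 dB decrease leaves roughly 6%.
frac_4db = remaining_power_fraction(4.0)
frac_12db = remaining_power_fraction(12.0)
```

Under this definition, the ∼1–6 dB range quoted for Italy corresponds to roughly 79% down to 25% of the original power remaining.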


2012 ◽  
Vol 51 (2) ◽  
pp. 185-190 ◽  
Author(s):  
Alex J. Cannon

Abstract Regression-guided clustering is introduced as a means of constructing circulation-to-environment synoptic climatological classifications. Rather than applying an unsupervised clustering algorithm to synoptic-scale atmospheric circulation data, one instead augments the atmospheric circulation dataset with predictions from a supervised regression model linking circulation to environment. The combined dataset is then entered into the clustering algorithm. The level of influence of the environmental dataset can be controlled by a simple weighting factor. The method is generic in that the choice of regression model and clustering algorithm is left to the user. Examples are given using standard multivariate linear regression models and the k-means clustering algorithm, both established methods in synoptic climatology. Results for southern British Columbia, Canada, indicate that model performance can be made to range between that of a fully unsupervised algorithm and a fully supervised algorithm.
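
The augment-then-cluster idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: the regression is a one-predictor least-squares fit on a single hypothetical circulation feature (standing in for whatever model the user chooses), the weighting is assumed to be a simple multiplication of the predictions, k-means uses a deterministic farthest-point initialisation, and all names and data are invented.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x with a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - b * mx, b


def sqdist(p, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def regression_guided_clusters(circ, env, k, weight, feature=1):
    """Append weighted regression predictions to each row, then cluster."""
    a, b = fit_line([row[feature] for row in circ], env)
    augmented = [
        list(row) + [weight * (a + b * row[feature])] for row in circ
    ]
    return kmeans(augmented, k)


# Toy data: feature 0 dominates the circulation variance,
# but the environmental variable depends only on feature 1.
circ = [[0.0, 0.0], [10.0, 0.0], [0.0, 1.0], [10.0, 1.0]]
env = [0.0, 0.0, 5.0, 5.0]

unsupervised = regression_guided_clusters(circ, env, k=2, weight=0.0)
guided = regression_guided_clusters(circ, env, k=2, weight=10.0)
```

With `weight=0.0` the clusters follow the dominant circulation feature; with a large weight they follow the environmental response, which mirrors the range between fully unsupervised and fully supervised behaviour that the abstract describes.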

