INSTANCE – the Italian seismic dataset for machine learning

2021 ◽  
Vol 13 (12) ◽  
pp. 5509-5544
Author(s):  
Alberto Michelini ◽  
Spina Cianetti ◽  
Sonja Gaviano ◽  
Carlo Giunchi ◽  
Dario Jozinović ◽  
...  

Abstract. Italian earthquake waveform data are collected here in a dataset suited for machine learning (ML) applications. The dataset consists of nearly 1.2 million three-component (3C) waveform traces from about 50 000 earthquakes and more than 130 000 noise 3C waveform traces, for a total of about 43 000 h of data and an average of 21 3C traces per event. The earthquake list is based on the Italian Seismic Bulletin (http://terremoti.ingv.it/bsi, last access: 15 February 2020) of the Istituto Nazionale di Geofisica e Vulcanologia between January 2005 and January 2020, and it includes events in the magnitude range between 0.0 and 6.5. The waveform data have been recorded primarily by the Italian National Seismic Network (network code IV) and include both weak-motion (HH, EH channels) and strong-motion (HN channels) recordings. All the waveform traces have a length of 120 s, are sampled at 100 Hz, and are provided both in counts and in ground motion physical units after deconvolution of the instrument transfer functions. The waveform dataset is accompanied by metadata consisting of more than 100 parameters that provide comprehensive information on the earthquake source, the recording stations, the trace features, and other derived quantities. This rich set of metadata allows users to target the data selection for their own purposes. Many of these metadata can be used as labels in ML analysis or for other studies. The dataset, assembled in HDF5 format, is available at http://doi.org/10.13127/instance (Michelini et al., 2021).
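As a rough illustration of the trace geometry and the metadata-driven selection described in the abstract, the sketch below uses plain Python. Only the 120 s trace length, the 100 Hz sampling rate, the three components, and the channel codes come from the abstract. The field names (`trace_id`, `magnitude`, `channel`), the helper functions, and the sample records are hypothetical and are not taken from the INSTANCE metadata schema.

```python
# Values stated in the abstract.
TRACE_LENGTH_S = 120      # trace length in seconds
SAMPLING_RATE_HZ = 100    # samples per second
N_COMPONENTS = 3          # three-component (3C) traces


def samples_per_component(length_s=TRACE_LENGTH_S, rate_hz=SAMPLING_RATE_HZ):
    """Return the number of samples in one component of a trace."""
    return length_s * rate_hz


def select_traces(records, min_magnitude, channels):
    """Keep records at or above min_magnitude recorded on one of the given channels."""
    return [
        r for r in records
        if r["magnitude"] >= min_magnitude and r["channel"] in channels
    ]


# Hypothetical metadata rows; keys and values are illustrative assumptions only.
EXAMPLE_RECORDS = [
    {"trace_id": "a", "magnitude": 1.2, "channel": "HH"},
    {"trace_id": "b", "magnitude": 3.4, "channel": "HN"},
    {"trace_id": "c", "magnitude": 4.1, "channel": "HH"},
    {"trace_id": "d", "magnitude": 2.9, "channel": "EH"},
]

if __name__ == "__main__":
    # Shape of one 3C trace as (components, samples).
    print((N_COMPONENTS, samples_per_component()))
    # Select strong-motion (HN) recordings of events with magnitude >= 3.0.
    strong = select_traces(EXAMPLE_RECORDS, 3.0, {"HN"})
    print([r["trace_id"] for r in strong])
```

With the real dataset, the same kind of filter would be applied to the published metadata table before the matching waveforms are read from the HDF5 files. The actual column names should be taken from the dataset documentation.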


Stroke ◽  
2021 ◽  
Vol 52 (Suppl_1) ◽  
Author(s):  
Rania Abdelkhaleq ◽  
Victor Lopez-Rivera ◽  
Sergio Salazar-Marioni ◽  
Songmi Lee ◽  
Youngran Kim ◽  
...  

Introduction: Evaluation of infarct core by advanced neuroimaging has facilitated patient selection for endovascular stroke therapy (EST); however, the accuracy of machine-learning analysis compared to these modalities remains unexplored. We test the performance of the computed tomography Alberta Stroke Program Early Computed Tomography Score (CT-ASPECTS) vs. Computed Tomography Perfusion (CTP)-RAPID vs. an extension of our novel machine-learning model, Deep Symmetry-sensitive Network (DeepSymNet [ref]), using the final infarct volume (FIV) in patients with rapid and successful endovascular reperfusion as the gold standard. Methods and Materials: We identified consecutive patients with large vessel occlusion acute ischemic stroke who underwent EST with TICI 2b/3 reperfusion. FIV was determined by volumetric measurements on 24-48 h DWI MRI. The DeepSymNet algorithm combines symmetric and absolute brain representations and had been trained to predict CTP-RAPID core size from CTA source images acquired at presentation. Performance at predicting FIV was determined by Pearson's correlation for CT-ASPECTS, CTP-RAPID, and DeepSymNet. Data are presented as median [IQR]. Results: Among the 76 patients who met inclusion criteria, 55.2% were male, the median age was 68 years [54-77], and 32.8% were White. An MCA occlusion was present in 71% of the patients, and 55% of all occlusions were left-sided. Median ASPECTS on presentation was 8 [7-8.5] and the median FIV was 10 mL [2-37]. ASPECTS, CTP-RAPID, and DeepSymNet all correlated with FIV, with comparable performance from ASPECTS (R² = -0.398) and CTP-RAPID (R² = 0.403) and superior performance by DeepSymNet (R² = -0.606) (Table). Conclusions: The DeepSymNet machine-learning model analyzing CTA source images demonstrated superior performance to ASPECTS and CTP-RAPID in FIV prediction. These findings suggest that machine-learning models may provide improved predictions of infarct core and improved selection for EST.
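The performance measure named in the Methods, Pearson's correlation between a predictor and FIV, can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `pearson_r` is hypothetical, and the numbers are made up for demonstration and are not study data.

```python
import math


def pearson_r(x, y):
    """Return Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the summed squared deviations.
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Made-up illustrative values (NOT study data): a score that falls as
# infarct volume grows, and the corresponding volumes in mL.
EXAMPLE_SCORE = [10, 9, 8, 7, 6]
EXAMPLE_FIV_ML = [2, 5, 10, 30, 60]

if __name__ == "__main__":
    r = pearson_r(EXAMPLE_SCORE, EXAMPLE_FIV_ML)
    print(f"r = {r:.3f}")
```

A score such as ASPECTS decreases as early ischemic change becomes more extensive, so a negative coefficient against FIV is the expected direction for that predictor.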


2021 ◽  
Vol 14 (3) ◽  
pp. 101016 ◽  
Author(s):  
Jim Abraham ◽  
Amy B. Heimberger ◽  
John Marshall ◽  
Elisabeth Heath ◽  
Joseph Drabick ◽  
...  

Author(s):  
Dhiraj J. Pangal ◽  
Guillaume Kugener ◽  
Shane Shahrestani ◽  
Frank Attenello ◽  
Gabriel Zada ◽  
...  

Author(s):  
John J. Squiers ◽  
Jeffrey E. Thatcher ◽  
David Bastawros ◽  
Andrew J. Applewhite ◽  
Ronald D. Baxter ◽  
...  
