Data pre-processing pipeline generation for AutoETL

2021 ◽  
pp. 101957
Author(s):  
Joseph Giovanelli ◽  
Besim Bilalli ◽  
Alberto Abelló
2019 ◽  
Vol 12 (2) ◽  
pp. 120-127 ◽  
Author(s):  
Wael Farag

Background: In this paper, a Convolutional Neural Network (CNN) that learns safe driving behavior and smooth steering manoeuvring is proposed to empower autonomous driving technologies. The training data are collected from a front-facing camera, together with the steering commands issued by an experienced driver driving in traffic as well as on urban roads. Methods: These data are then used to train the proposed CNN to perform what is called “Behavioral Cloning”. The proposed behavioral cloning CNN is named “BCNet”, and its deep seventeen-layer architecture was selected after extensive trials. BCNet is trained using the Adam optimization algorithm, a variant of the Stochastic Gradient Descent (SGD) technique. Results: The paper describes the development and training process in detail and presents the image processing pipeline harnessed in the development. Conclusion: After extensive simulations, the proposed approach proved successful in cloning the driving behavior embedded in the training data set.
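The abstract names Adam as the variant of SGD used to train BCNet. As a point of reference (not the authors' implementation, whose architecture and hyperparameters are not given here), the Adam update rule can be sketched in a few lines of NumPy:

```python
import numpy as np

def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and
    its square (v), each bias-corrected, then a normalized gradient step."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grads
    state["v"] = beta2 * state["v"] + (1 - beta2) * grads ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return params - lr * m_hat / (np.sqrt(v_hat) + eps), state

# Toy use: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
state = {"t": 0, "m": np.zeros_like(w), "v": np.zeros_like(w)}
for _ in range(2000):
    w, state = adam_step(w, 2 * w, state, lr=0.05)
```

Because Adam normalizes each step by the running second-moment estimate, the effective step size is roughly `lr` regardless of gradient scale, which is one reason it is a common default for training deep CNNs such as the one described here.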


2010 ◽  
Vol 17 (4) ◽  
pp. 550-559 ◽  
Author(s):  
C. Hintermüller ◽  
F. Marone ◽  
A. Isenegger ◽  
M. Stampanoni

GigaScience ◽  
2017 ◽  
Vol 6 (2) ◽  
Author(s):  
Mohamed Mysara ◽  
Mercy Njima ◽  
Natalie Leys ◽  
Jeroen Raes ◽  
Pieter Monsieurs

2012 ◽  
Vol 396 (3) ◽  
pp. 032121 ◽  
Author(s):  
S Zimmer ◽  
L Arrabito ◽  
T Glanzman ◽  
T Johnson ◽  
C Lavalley ◽  
...  

2021 ◽  
Author(s):  
Joseph H. Kennedy ◽  
Krik Hogenson ◽  
Andrew Johnston ◽  
Heidi Kristenson ◽  
Alex Lewandowski ◽  
...  

<p>Synthetic Aperture Radar (SAR), with its capability of imaging day or night, ability to penetrate dense cloud cover, and suitability for interferometry, is a robust dataset for event/change monitoring. SAR data can be used to inform decision makers dealing with natural and anthropogenic hazards such as floods, earthquakes, deforestation and glacier movement. However, SAR data has only recently become freely available with global coverage, and requires complex processing with specialized software to generate analysis-ready datasets. Furthermore, processing SAR is often resource-intensive, in terms of computing power and memory, and the sheer volume of data available for processing can be overwhelming. For example, ESA's Sentinel-1 has produced ~10PB of data since launch in 2014. Even subsetting the data to a small scientific area of interest can result in many thousands of scenes, which must be processed into an analysis-ready format.</p><p>The Alaska Satellite Facility (ASF) Hybrid Pluggable Processing Pipeline (HyP3), which is now out of beta and available to the public, provides custom, on-demand processing of Sentinel-1 SAR data at no cost to users. HyP3 is integrated directly into Vertex, ASF's primary data discovery tool, so users can easily select an area of interest on the Earth, find available SAR products, and click a button to send them (individually, or as a batch) to HyP3 for Radiometric Terrain Correction (RTC), Interferometric SAR (InSAR), or Change Detection processing. Processing leverages AWS cloud computing and is done in parallel for rapid product generation. Each process provides options to customize the processing and final output products, and provides metadata-rich, analysis-ready final products to users.</p><p>In addition to the Vertex user interface, HyP3 provides a RESTful API and a Python software development kit (SDK) to allow programmatic access and the ability to build HyP3 into user workflows. HyP3 is open source and designed to allow users to develop new processing plugins or stand up their own custom processing pipelines.</p><p>We will present an overview of using HyP3, both inside Vertex and programmatically, and the available output products. We will demonstrate using HyP3 to investigate the consequences of natural hazards and very briefly discuss the technologies and software design principles used in the development of HyP3 and how users could contribute new plugins or stand up their own custom processing pipelines.</p>
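The abstract mentions HyP3's RESTful API and Python SDK for programmatic access. A minimal sketch of what a job submission looks like, assuming the public HyP3 deployment and the `RTC_GAMMA` job type (the granule name below is a hypothetical placeholder, and actual submission requires Earthdata Login credentials):

```python
# Build the JSON body for a POST to the HyP3 /jobs endpoint requesting
# Radiometric Terrain Correction (RTC) of one Sentinel-1 scene.
HYP3_API = "https://hyp3-api.asf.alaska.edu"  # public HyP3 deployment

def rtc_job_payload(granule, name="rtc-example"):
    """Assemble a single-job request payload for RTC processing."""
    return {
        "jobs": [
            {
                "job_type": "RTC_GAMMA",
                "name": name,
                "job_parameters": {"granules": [granule]},
            }
        ]
    }

# Hypothetical granule identifier, for illustration only:
payload = rtc_job_payload("S1A_IW_SLC__EXAMPLE_GRANULE")

# With credentials configured, the hyp3_sdk package wraps this for you, e.g.:
#   from hyp3_sdk import HyP3
#   job = HyP3().submit_rtc_job("S1A_IW_SLC__EXAMPLE_GRANULE", name="rtc-example")
#   job = HyP3().watch(job)  # wait for processing, then download products
```

The batch behavior described in the abstract falls out of the same structure: the `jobs` list simply carries one entry per scene selected in Vertex.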


2021 ◽  
Author(s):  
Abul Hasan ◽  
Mark Levene ◽  
David Weston ◽  
Renate Fromson ◽  
Nicolas Koslover ◽  
...  

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources in order to assist decision makers. Social media is important in this respect; however, to make sense of the textual information it provides, and to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report Macro- and Micro-averaged F1 scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19, when the models are trained on human-labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. Also, we highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.
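The final stage of the pipeline described above trains support vector machine models on concept-based vector representations and reports Macro- and Micro-averaged F1 scores. The following sketch illustrates that stage only, on synthetic "concept presence" vectors (the authors' actual features, data, and labels are not available here):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

# Toy concept-presence vectors: each dimension marks whether an extracted
# concept (e.g. a symptom or a severity marker) appears in a post; the
# label is a triage class in {0, 1, 2}. All data here are synthetic.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(120, 6)).astype(float)
y = (X[:, 0] + X[:, 1] + X[:, 2] >= 2).astype(int) + (X[:, 3] > 0).astype(int)

# Linear SVM for the three-way triage step, then the two averaged F1 scores.
clf = LinearSVC(C=1.0).fit(X[:80], y[:80])
pred = clf.predict(X[80:])
macro = f1_score(y[80:], pred, average="macro")  # unweighted mean of per-class F1
micro = f1_score(y[80:], pred, average="micro")  # F1 over pooled predictions
```

Reporting both averages, as the abstract does, is informative because macro-F1 weights rare triage classes equally with common ones, while micro-F1 reflects overall per-post accuracy.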


2011 ◽  
Vol 2 (3) ◽  
pp. 137-156 ◽  
Author(s):  
Michael Schmidt ◽  
Marc Reichenbach ◽  
Andreas Loos ◽  
Dietmar Fey
