Hands-on training about overfitting

Overfitting is one of the critical problems in developing machine learning models. With machine learning becoming an essential technology in computational biology, we must include training about overfitting in all courses that introduce this technology to students and practitioners. We propose hands-on training on overfitting that is suitable for introductory-level courses and can be carried out on its own or embedded within any data science course. We use workflow-based design of machine learning pipelines, experimentation-based teaching, and a hands-on approach that focuses on concepts rather than the underlying mathematics. We detail the data analysis workflows we use in training and motivate them from the viewpoint of teaching goals. Our proposed approach relies on Orange, an open-source data science toolbox that combines data visualization and machine learning and is tailored for education in machine learning and exploratory data analysis.

Workshop Preview: Data Analytics and Machine Learning Hackathon 2021: A deep dive into the open-source data challenge for E&P

The Leading Edge ◽ 10.1190/tle40010068.1 ◽ 2021 ◽ Vol 40 (1) ◽ pp. 68-71

Author(s): Haibin Di ◽ Anisha Kaul ◽ Leigh Truelove ◽ Weichang Li ◽ Wenyi Hu ◽ ...

Keyword(s): Machine Learning ◽ Data Analytics ◽ Learning Experience ◽ Data Set ◽ Deep Dive ◽ Open Source Data ◽ Research Workshop ◽ Hands On ◽ Source Data ◽ Exploration And Production

We present a data challenge as part of the hackathon planned for the August 2021 SEG Research Workshop on Data Analytics and Machine Learning for Exploration and Production. The hackathon aims to provide hands-on machine learning experience for beginners and advanced practitioners, using a relatively well-defined problem and a carefully curated data set. The seismic data are from New Zealand's Taranaki Basin. The labels for a subset of the data have been generated by an experienced geologist. The objective of the challenge is to develop innovative machine learning solutions to identify key horizons.

VISUALIZATION OF E-COMMERCE DATA USING AUGMENTED REALITY

Journal of Information System and Technology Management ◽ 10.35631/jistm.519009 ◽ 2020 ◽ Vol 5 (19) ◽ pp. 104-122

Author(s): Azzan Amin ◽ Haslina Arshad ◽ Ummul Hanan Mohamad

Keyword(s): Machine Learning ◽ Augmented Reality ◽ Data Visualization ◽ Data Science ◽ Moving Average ◽ Open Data ◽ Visual Presentation ◽ Depth Dimension ◽ 2D Data ◽ Visualization Application

Data visualization is a significant element of data analysis and communication. As data become more complex, visual presentation helps users understand them. Two-dimensional (2D) visuals are commonly used for data visualization, but their lack of a depth dimension limits how well the data can be understood. Therefore, the effectiveness of augmented reality (AR) in data visualization was studied through the development of an AR data visualization application using e-commerce data. Machine learning models are also used in the application to provide predictive analysis functions. To provide quality e-commerce data and an optimal machine learning model, the data science process is carried out in the Python programming language. The e-commerce data selected for this study are open data obtained from the Kaggle website. The data set has 9994 records and 21 attributes. The AR data visualization application will make it easier for users to understand the e-commerce data in depth and to visualize sales profit forecasts based on the Auto-Regressive Integrated Moving Average (ARIMA) model.

Open-source data analysis and visualization software platform: SAGUARO

10.1117/12.894908 ◽ 2011 ◽ Cited By ~ 3

Author(s): Dae Wook Kim ◽ Benjamin J. Lewis ◽ James H. Burge

Keyword(s): Data Analysis ◽ Open Source ◽ Software Platform ◽ Visualization Software ◽ Open Source Data ◽ Source Data

A Bayesian machine learning model for estimating building occupancy from open source data

Natural Hazards ◽ 10.1007/s11069-016-2164-9 ◽ 2016 ◽ Vol 81 (3) ◽ pp. 1929-1956 ◽ Cited By ~ 8

Author(s): Robert Stewart ◽ Marie Urban ◽ Samantha Duchscherer ◽ Jason Kaufman ◽ April Morton ◽ ...

Keyword(s): Machine Learning ◽ Open Source ◽ Learning Model ◽ Open Source Data ◽ Machine Learning Model ◽ Source Data ◽ Building Occupancy ◽ Bayesian Machine Learning

openMNGlab: Data Analysis Framework for Microneurography – A Technical Report

10.3233/shti210556 ◽ 2021

Author(s): Fabian Schlebusch ◽ Frederic Kehrein ◽ Rainer Röhrig ◽ Barbara Namer ◽ Ekaterina Kutafina

Keyword(s): Machine Learning ◽ Data Analysis ◽ Data Acquisition ◽ Data Visualization ◽ Fiber Tracking ◽ Data Formats ◽ Technical Report ◽ Electrophysiological Technique ◽ Feature Database ◽ Major Data

openMNGlab is an open-source software framework for data analysis, tailored to the specific needs of microneurography, an electrophysiological technique that is particularly important for research on peripheral neural fiber coding. Currently, openMNGlab loads data from Spike2 and Dapsys, two major data acquisition solutions. Because it builds on the Neo software, openMNGlab can be easily extended to handle the most common electrophysiological data formats. Furthermore, it provides methods for data visualization and fiber tracking, as well as a modular feature database for extracting features for data analysis and machine learning.

Data Visualization on Crime Against Women

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.b1014.0782s419 ◽ 2019 ◽ Vol 8 (2S4) ◽ pp. 83-85

Keyword(s): Machine Learning ◽ Data Visualization ◽ Data Science ◽ Learning Algorithm ◽ Current Data ◽ Machine Learning Algorithm ◽ Graphical Approach ◽ Crime Data ◽ Statistical Variance

Visualization is now expected for all forms of data. It is important to understand data and their statistical variance graphically. Visualization of crime data supports analyzing and preventing threats in society. According to recent surveys and records, India has experienced many crimes against women. Data visualization is a useful approach for analyzing and helping to prevent such crimes. Currently available data technologies are appropriate for visualization tasks aimed at women's safety. Efficient visualization, combined with an effective machine learning algorithm and its performance, answers data-related requests in the field of data science. This paper presents the details of crime against women through a graphical approach and illustrates how to signal unsafe levels with alerts to safeguard women.

Machine Learning Approach to Extracting Emotions Information from Open Source Data for Relative Forecasting of Stock Prices

2018 10th Computer Science and Electronic Engineering (CEEC) ◽ 10.1109/ceec.2018.8674180 ◽ 2018 ◽ Cited By ~ 1

Author(s): Ashish Bhatia ◽ Hani Hagras ◽ Jason J. Lepley

Keyword(s): Machine Learning ◽ Open Source ◽ Stock Prices ◽ Learning Approach ◽ Open Source Data ◽ Machine Learning Approach ◽ Source Data

Airbnb (Air Bed and Breakfast) Listing Analysis Through Machine Learning Techniques

10.4018/978-1-7998-8455-2.ch008 ◽ 2022 ◽ pp. 209-232

Author(s): Xiang Li ◽ Jingxi Liao ◽ Tianchuan Gao

Keyword(s): Machine Learning ◽ Principal Component Analysis ◽ Data Science ◽ Principal Component ◽ Machine Learning Techniques ◽ Classification Models ◽ Performance Measurements ◽ Learning Techniques ◽ Source Data ◽ Bed And Breakfast

Machine learning is a broad field that spans multiple disciplines, including mathematics, computer science, and data science. Some of its concepts, such as deep neural networks, can be complicated and difficult to explain in a few words. This chapter focuses on essential methods, such as classification from supervised learning, clustering, and dimensionality reduction, that can be interpreted and explained in a way that is accessible to beginners. Data for Airbnb (Air Bed and Breakfast) listings in London are used as the source data to study the effect of each machine learning technique. K-means clustering, principal component analysis (PCA), random forest, and other methods are used to build classification models from the features, predict the classification results, and provide performance measurements to test the models.

The democratization of data science education

10.7287/peerj.preprints.3195v1 ◽ 2017 ◽ Cited By ~ 1

Author(s): Sean Kross ◽ Roger D Peng ◽ Brian S Caffo ◽ Ira Gooding ◽ Jeffrey T Leek

Keyword(s): Machine Learning ◽ Science Education ◽ Data Analysis ◽ Data Science ◽ Online Data ◽ The Past ◽ The Us ◽ Science Curricula ◽ The Impact ◽ And Training

Over the last three decades, data has become ubiquitous and cheap. This transition has accelerated over the last five years, and training in statistics, machine learning, and data analysis has struggled to keep up. In April 2014 we launched a program of nine courses, the Johns Hopkins Data Science Specialization, which has had more than 4 million enrollments over the past three years. Here the program is described and compared to both standard and more recently developed data science curricula. We show that novel pedagogical and administrative decisions introduced in our program are now standard in online data science programs. The impact of the Data Science Specialization on data science education in the US is also discussed. Finally, we conclude with some thoughts about the future of data science education in a data-democratized world.

Interferometric SAR and Machine Learning: Using Open Source Data to Detect Archaeological Looting and Destruction

Journal of Computer Applications in Archaeology ◽ 10.5334/jcaa.70 ◽ 2021 ◽ Vol 4 (1) ◽ pp. 47-62

Author(s): Hassan El-Hajj

Keyword(s): Machine Learning ◽ Open Source ◽ Open Source Data ◽ Source Data ◽ Interferometric Sar
