Delivering a machine learning course on HPC resources

2020 · Vol. 245 · pp. 08016
Author(s):  
Stefano Bagnasco ◽  
Gabriele Gaetano Fronzé ◽  
Federica Legger ◽  
Stefano Lusso ◽  
Sara Vallero

In recent years, proficiency in data science and machine learning (ML) has become one of the most requested skills for jobs in both industry and academia. Machine learning algorithms typically require large data sets to train the models and extensive use of computing resources, both for training and inference. Especially for deep learning algorithms, training performance can be dramatically improved by exploiting Graphics Processing Units (GPUs). The skill set needed by a data scientist is therefore extremely broad, ranging from knowledge of ML models to distributed programming on heterogeneous resources. While most of the available training resources focus on ML algorithms and tools such as TensorFlow, we designed a course for doctoral students in which model training is tightly coupled with the underlying technologies used to dynamically provision resources. Throughout the course, students have access to a dedicated cluster of computing nodes on local premises. A set of libraries and helper functions is provided to execute a parallelized ML task by automatically deploying a Spark driver and several Spark execution nodes as Docker containers. Task scheduling is managed by an orchestration layer (Kubernetes). This solution automates the delivery of the software stack required by a typical ML workflow and enables scalability by allowing ML tasks, including training, to run on commodity (i.e. CPU) or high-performance (i.e. GPU) resources distributed over different hosts across a network. The adaptation of the same model to OCCAM, the HPC facility at the University of Turin, is currently under development.
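
The abstract does not show the course's actual helper library; the following is a minimal sketch, assuming a reachable Kubernetes cluster and a prebuilt container image, of how a Spark driver can request executor pods. The API server address and image name are placeholders, not the course's real configuration.

```python
# Minimal sketch (not the course's helper library) of a Spark driver
# requesting executors from a Kubernetes cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ml-course-task")
    # Kubernetes API server of the local cluster (placeholder address)
    .master("k8s://https://kube-apiserver.example.org:6443")
    # Docker image holding the Spark + ML software stack (placeholder name)
    .config("spark.kubernetes.container.image", "ml-course/spark-worker:latest")
    # Number of executor pods Kubernetes should schedule
    .config("spark.executor.instances", "4")
    .getOrCreate()
)

# Trivial parallelized task: each executor processes a slice of the data.
rdd = spark.sparkContext.parallelize(range(1_000_000), numSlices=4)
print(rdd.map(lambda x: x * x).sum())

spark.stop()
```

Kubernetes then schedules the requested executor pods on whichever hosts satisfy the task's resource requests, which is what lets the same workflow run on CPU or GPU nodes.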

2021 · Vol. 9
Author(s):  
Huanhuan Zhao ◽  
Xiaoyu Zhang ◽  
Yang Xu ◽  
Lisheng Gao ◽  
Zuchang Ma ◽  
...  

Hypertension is a widespread chronic disease. Risk prediction of hypertension is an intervention that contributes to the early prevention and management of hypertension. Implementing such an intervention requires an effective and easy-to-implement hypertension risk prediction model. This study evaluated and compared the performance of four machine learning algorithms in predicting the risk of hypertension from easy-to-collect risk factors. A dataset of 29,700 samples collected through physical examinations was used for model training and testing. First, we identified easy-to-collect risk factors of hypertension through univariate logistic regression analysis. Then, based on the selected features, 10-fold cross-validation was used to optimize four models, random forest (RF), CatBoost, an MLP neural network, and logistic regression (LR), and to find the best hyperparameters on the training set. Finally, the performance of the models was evaluated by AUC, accuracy, sensitivity, and specificity on the test set. The experimental results showed that the RF model outperformed the other three models, achieving an AUC of 0.92, an accuracy of 0.82, a sensitivity of 0.83, and a specificity of 0.81. In addition, Body Mass Index (BMI), age, family history, and waist circumference (WC) are the four primary risk factors of hypertension. These findings show that it is feasible to use machine learning algorithms, especially RF, to predict hypertension risk without clinical or genetic data. The technique can provide a non-invasive and economical way to prevent and manage hypertension in a large population.
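
As an illustration of the modeling pipeline described above, here is a minimal sketch in Python; the DataFrame `df`, its column names, and the hyperparameter grid are hypothetical, not the study's actual schema or search space.

```python
# Sketch of the abstract's pipeline: feature selection is assumed done;
# `df` is a hypothetical DataFrame with the named risk factors (numeric,
# with family_history encoded as 0/1) and a binary `hypertension` label.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score, accuracy_score

X = df[["bmi", "age", "family_history", "waist_circumference"]]
y = df["hypertension"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 10-fold cross-validation on the training set to pick hyperparameters,
# mirroring the procedure described in the abstract.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    cv=10,
    scoring="roc_auc",
)
search.fit(X_train, y_train)

# Evaluate the tuned model on the held-out test set.
proba = search.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, proba))
print("Accuracy:", accuracy_score(y_test, search.predict(X_test)))
```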


Author(s):  
P. Priakanth ◽  
S. Gopikrishnan

The idea of an intelligent, independent learning machine has fascinated humans for decades. The philosophy behind machine learning is to automate the creation of analytical models so that algorithms can learn continuously from the available data. Since IoT will be among the major sources of new data, data science will contribute greatly to making IoT applications more intelligent. Machine learning can be applied in cases where the desired outcome is known (guided learning), where the data is not known beforehand (unguided learning), or where learning results from interaction between a model and the environment (reinforcement learning). This chapter answers the following questions: How can machine learning algorithms be applied to IoT smart data? What taxonomy of machine learning algorithms can be adopted in IoT? And what characteristics of real-world IoT data call for data analytics?
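
As a toy illustration of this taxonomy (not taken from the chapter), the sketch below contrasts guided and unguided learning on hypothetical IoT temperature-sensor readings; reinforcement learning is only noted in a comment because it requires an interactive environment.

```python
# Toy illustration of the learning paradigms on synthetic sensor data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# 100 readings from two hypothetical sensors (ambient and device temperature).
readings = rng.normal(loc=[20.0, 45.0], scale=2.0, size=(100, 2))

# Guided (supervised) learning: the desired outcome, e.g. "overheating",
# is known for past readings, so a classifier can be trained on labels.
labels = (readings[:, 1] > 45.0).astype(int)
clf = LogisticRegression().fit(readings, labels)

# Unguided (unsupervised) learning: no labels; discover structure such as
# groups of readings that behave similarly.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(readings)

print(clf.predict([[21.0, 48.0]]), clusters[:5])
# Reinforcement learning (not shown) would instead learn a control policy,
# e.g. when to throttle a device, from reward signals in the environment.
```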


Author(s):  
R. Suganya ◽  
Rajaram S. ◽  
Kameswari M.

Thyroid disorders are currently common and widespread among women worldwide. In India, seven out of ten women suffer from thyroid problems. Various studies in the research literature estimate that about 35% of Indian women present with prevalent goiter. It is necessary to take preventive measures at the early stages; otherwise, thyroid disorders can cause infertility in women. This recent review discusses various analytics models used to handle different types of thyroid problems in women. The chapter analyzes and compares different classification models, both machine learning algorithms and deep learning algorithms, for classifying different thyroid problems. Literature from both machine learning and deep learning is considered. This literature review on thyroid problems will help analyze the causes and characteristics of thyroid disorders. The dataset used to build and validate the algorithms was provided by the UCI Machine Learning Repository.
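
As a minimal sketch of the kind of comparison the chapter describes, assuming the UCI thyroid data has been exported locally as a CSV with numeric, already-encoded features and a categorical "class" column (the file name and schema are placeholders):

```python
# Sketch: compare a classical ML model with a simple neural model on a
# locally saved copy of the UCI thyroid data (hypothetical file/columns).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("thyroid.csv")      # placeholder path
X = df.drop(columns=["class"])       # features assumed numeric/encoded
y = df["class"]

# Cross-validated accuracy for each candidate classifier.
for name, model in [
    ("random forest", RandomForestClassifier(random_state=0)),
    ("MLP", MLPClassifier(max_iter=1000, random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```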



