Fundamentals of Machine Learning

Author(s):  
Thomas P. Trappenberg

Machine learning is exploding, both in research and in industrial applications. This book aims to be a brief introduction to this area, given the importance of this topic in many disciplines, from the sciences to engineering, and even its broader impact on our society. The book tries to contribute a style that balances brevity of explanation, rigor of mathematical argument, and the outlining of principal ideas. At the same time, it tries to give a comprehensive overview of a variety of methods and to show how they relate to one another and to specializations within this area. This includes an introduction to Bayesian approaches to modeling as well as to deep learning. Writing small programs to apply machine learning techniques is made easy today by the availability of high-level programming systems. This book offers examples in Python with the machine learning libraries sklearn and Keras. The first four chapters concentrate largely on the practical side of applying machine learning techniques. The book then discusses more fundamental concepts and includes their formulation in a probabilistic context. This is followed by chapters on advanced models, namely recurrent neural networks and reinforcement learning. The book closes with a brief discussion of the impact of machine learning and AI on our society.

2019 ◽  
Vol 19 (11) ◽  
pp. 2541-2549
Author(s):  
Chris Houser ◽  
Jacob Lehner ◽  
Nathan Cherry ◽  
Phil Wernette

Abstract. Rip currents and other surf hazards are an emerging public health issue globally. Lifeguards, warning flags, and signs are important, and to varying degrees they are effective strategies to minimize risk to beach users. In the United States and other jurisdictions around the world, lifeguards use coloured flags (green, yellow, and red) to indicate whether the danger posed by the surf and rip hazard is low, moderate, or high respectively. The choice of flag depends on the lifeguard(s) monitoring the changing surf conditions along the beach and over the course of the day using both regional surf forecasts and careful observation. There is a potential that the chosen flag is not consistent with the beach user perception of the risk, which may increase the potential for rescues or drownings. In this study, machine learning is used to determine the potential for error in the flags used at Pensacola Beach and the impact of that error on the number of rescues. Results of a decision tree analysis indicate that the colour flag chosen by the lifeguards was different from what the model predicted for 35 % of days between 2004 and 2008 (n=396/1125). Days when there is a difference between the predicted and posted flag colour represent only 17 % of all rescue days, but those days are associated with ∼60 % of all rescues between 2004 and 2008. Further analysis reveals that the largest number of rescue days and total number of rescues are associated with days where the flag deployed over-estimated the surf and hazard risk, such as a red or yellow flag flying when the model predicted a green flag would be more appropriate based on the wind and wave forcing alone. While it is possible that the lifeguards were overly cautious, it is argued that they most likely identified a rip forced by a transverse-bar and rip morphology common at the study site. 
Regardless, the results suggest that beach users may be discounting lifeguard warnings if the flag colour is not consistent with how they perceive the surf hazard or the regional forecast. Results suggest that machine learning techniques have the potential to support lifeguards and thereby reduce the number of rescues and drownings.
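The disagreement figures above are simple proportions over daily records. A minimal sketch of that bookkeeping follows, assuming each day is stored as a hypothetical `BeachDay` record holding the posted flag, a model-predicted flag, and the rescue count. It does not reproduce the authors' decision tree, only the comparison made once predictions exist; an over-estimate is counted when the posted flag is more severe than the predicted one. As a check on the reported rate, 396/1125 ≈ 0.352, i.e. the 35 % quoted above.

```python
from dataclasses import dataclass

# Flag severity ordering (assumption: green < yellow < red, as described in the abstract).
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


@dataclass
class BeachDay:
    """One day of records (hypothetical structure, not the study's data format)."""
    posted_flag: str     # flag flown by the lifeguards
    predicted_flag: str  # flag predicted by a model from wind and wave forcing
    rescues: int         # rescues recorded that day


def disagreement_summary(days):
    """Summarise posted/predicted flag disagreement and how rescues fall on those days."""
    mismatched = [d for d in days if d.posted_flag != d.predicted_flag]
    rescue_days = [d for d in days if d.rescues > 0]
    mismatched_rescue_days = [d for d in mismatched if d.rescues > 0]
    total_rescues = sum(d.rescues for d in days)
    return {
        # fraction of all days on which the posted flag differs from the prediction
        "mismatch_rate": len(mismatched) / len(days),
        # fraction of rescue days that are also mismatch days
        "share_of_rescue_days": (len(mismatched_rescue_days) / len(rescue_days)
                                 if rescue_days else 0.0),
        # fraction of all rescues that happened on mismatch days
        "share_of_rescues": (sum(d.rescues for d in mismatched) / total_rescues
                             if total_rescues else 0.0),
        # mismatch days where the posted flag was more severe than the predicted one
        "overestimated_days": sum(1 for d in mismatched
                                  if SEVERITY[d.posted_flag] > SEVERITY[d.predicted_flag]),
    }


# Invented toy records, for illustration only.
days = [
    BeachDay("red", "green", 5),
    BeachDay("green", "green", 0),
    BeachDay("yellow", "yellow", 1),
    BeachDay("green", "green", 2),
]
print(disagreement_summary(days))
```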


Author(s):  
Jasleen Kaur Sethi ◽  
Mamta Mittal

ABSTRACT Objective: The focus of this study is to monitor the effect of the lockdown imposed during the coronavirus disease (COVID-19) pandemic on various air pollutants, and to identify the pollutants that affect COVID-19 fatalities so that pollution-control measures can be enforced. Methods: Several machine learning techniques (decision trees, linear regression, and random forest) have been applied to correlate air pollutants with COVID-19 fatalities in Delhi. Furthermore, the concentrations of various air pollutants and the air quality index during the lockdown period are compared with those of the previous two years, 2018 and 2019. Results: The experimental work shows that concentrations of ozone and toluene increased during the lockdown period. It has also been deduced that the pollutants that may affect COVID-19 mortality are ozone, NH3, NO2, and PM10. Conclusions: Through the lockdown, the novel coronavirus pandemic has led to environmental restoration. However, measures to control ozone pollution are needed, as its concentration has increased significantly and it also affects the COVID-19 mortality rate.
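The comparison and correlation steps can be sketched with the standard library. This is a hedged illustration, not the authors' pipeline: `percent_change` compares a pollutant's mean concentration during lockdown with a baseline period, and `pearson` is a plain correlation coefficient, a much simpler measure than the decision tree, linear regression, and random forest models the study applied. Function names and input values are hypothetical.

```python
from math import sqrt
from statistics import mean


def percent_change(lockdown, baseline):
    """Percentage change of the mean lockdown concentration relative to a baseline period."""
    base = mean(baseline)
    return 100.0 * (mean(lockdown) - base) / base


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences,
    e.g. daily pollutant concentration vs. daily fatalities (hypothetical inputs)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

A positive `percent_change` means the pollutant rose during lockdown relative to the baseline years; a correlation says nothing about causation on its own, which is one reason the study used predictive models instead.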


Author(s):  
Christine A. Toh ◽  
Elizabeth M. Starkey ◽  
Conrad S. Tucker ◽  
Scarlett R. Miller

The emergence of ideation methods that generate large volumes of early-phase ideas has led to a need for reliable and efficient metrics for measuring the creativity of these ideas. However, existing human judgment-based creativity assessments and numeric model-based creativity assessment approaches suffer from low reliability and impose prohibitive computational burdens on human raters, owing to the high level of human input needed to calculate creativity scores. In addition, there is a need for an efficient method of computing the creativity of the large sets of design ideas typically generated during the design process. This paper focuses on developing and empirically testing a machine learning approach for computing the design creativity of large sets of design ideas, to increase the efficiency and reliability of creativity evaluation methods in design research. The results of this study show that machine learning techniques can predict the creativity of ideas with relatively high accuracy and sensitivity. These findings show that machine learning has the potential to rate the creativity of generated ideas based on their descriptions.
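Accuracy and sensitivity are the two metrics the results refer to. A minimal sketch of how they are computed from predicted and reference labels, assuming (hypothetically) that each idea carries a binary label such as 1 for "highly creative" and 0 otherwise; the study's actual rating scale and thresholds may differ.

```python
def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Accuracy = correct / all; sensitivity (recall) = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    # true positives: creative ideas the model also labels creative
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # false negatives: creative ideas the model misses
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = correct / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, sensitivity
```

Sensitivity matters here because a rater screening large idea sets mainly wants to avoid missing the creative ones.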


Author(s):  
Qi Wang ◽  
Xia Zhao ◽  
Jincai Huang ◽  
Yanghe Feng ◽  
Zhong Liu ◽  
...  

The concept of ‘big data’ has been widely discussed, and its value has been illuminated throughout a variety of domains. To quickly mine potential value and cope with the ever-increasing volume of information, machine learning is playing an increasingly important role and faces more challenges than ever. Because few studies exist on how to modify machine learning techniques to accommodate big data environments, we provide a comprehensive overview of the history of the evolution of big data, the foundations of machine learning, and the bottlenecks and trends of machine learning in the big data era. More specifically, based on learning principles, we discuss regularization to enhance generalization. The data-quality challenges of big data are reduced to the curse of dimensionality, class imbalance, concept drift, and label noise, and the underlying reasons and mainstream methodologies to address these challenges are introduced. Learning model development has been driven by domain specifics, dataset complexities, and the presence or absence of human involvement. In this paper, we propose a robust learning paradigm by aggregating the aforementioned factors. We believe that over the next few decades these perspectives will lead to novel ideas and encourage more studies aimed at incorporating knowledge and establishing data-driven learning systems that involve both data quality considerations and human interactions.
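Regularization enhances generalization by penalizing large parameters. As one concrete instance, chosen here for illustration rather than taken from the paper, L2 (ridge) regularization of a one-feature, no-intercept linear model y ≈ w*x minimizes sum((y_i - w*x_i)^2) + lam*w^2. Setting the derivative with respect to w to zero gives w = sum(x_i*y_i) / (sum(x_i^2) + lam), so a larger penalty `lam` shrinks w toward zero. The function name is hypothetical.

```python
def ridge_1d(xs, ys, lam):
    """L2-regularised least squares for y ≈ w * x (single feature, no intercept).

    Minimises sum((y - w*x)**2) + lam * w**2; the minimiser is
    w = sum(x*y) / (sum(x*x) + lam).
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)
```

With `lam = 0` this is ordinary least squares; increasing `lam` trades some training fit for a smaller, more stable weight.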


2021 ◽  
Author(s):  
Thiago Abdo ◽  
Fabiano Silva

The purpose of this paper is to analyze different machine learning approaches and algorithms for integration as automated assistance in a tool that aids the creation of new annotated datasets. We evaluate how they scale in an environment without dedicated machine learning hardware. In particular, we study their impact on a dataset with few examples and on one that is still being constructed. We experiment with deep learning algorithms (BERT) and classical learning algorithms with a lower computational cost (W2V and GloVe combined with RF and SVM). Our experiments show that deep learning algorithms have a performance advantage over classical techniques. However, deep learning algorithms have a high computational cost, making them inadequate for an environment with reduced hardware resources. Simulations using active and iterative machine learning techniques to assist the creation of new datasets are conducted. For these simulations, we use the classical learning algorithms because of their lower computational cost. The knowledge gathered from our experimental evaluation aims to support the creation of a tool for building new text datasets.
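The abstract does not say which query strategy the active learning simulations used. One common choice, assumed here, is uncertainty sampling: a classifier (for example RF, or SVM with probability estimates) scores the unlabeled texts, and those whose predicted positive-class probability is closest to 0.5 are sent to the annotator next. `select_most_uncertain` is a hypothetical helper.

```python
def select_most_uncertain(probabilities, k):
    """Indices of the k unlabeled examples whose positive-class probability
    is closest to 0.5, i.e. where the current classifier is least confident."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]
```

In a loop, the selected texts are labeled, added to the training set, and the classifier is retrained before the next query; this keeps per-round cost low, which suits the hardware-constrained setting described above.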


2018 ◽  
Vol 10 (1) ◽  
pp. 58-72 ◽  
Author(s):  
Muhammad Rizwan Rashid Rana ◽  
Asif Nawaz ◽  
Javed Iqbal

Abstract Sentiment classification is the process of exploring the sentiments, emotions, ideas, and thoughts that people express in sentences. Sentiment classification allows us to judge people's sentiments and feelings by analyzing their reviews, social media comments, and similar texts about various aspects of a subject. Machine learning techniques and lexicon-based techniques are the approaches most widely used in sentiment classification to predict sentiment from customer reviews and comments. Machine learning techniques include learning algorithms such as naive Bayes and support vector machines, whereas lexicon-based techniques rely on resources such as SentiWordNet and WordNet. The main aim of this survey is to give a nearly complete picture of sentiment classification techniques. The survey provides a comprehensive overview of recent and past research on sentiment classification and suggests research questions and approaches for future work.
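As an illustration of the machine learning side, here is a minimal multinomial naive Bayes sentiment classifier with add-one (Laplace) smoothing, written without external libraries. Whitespace tokenization, the function names, and the toy reviews used to exercise it are assumptions for the sketch, not taken from the survey.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n_docs)  # log prior P(class)
        n_words = sum(word_counts[label].values())
        for token in tokenize(doc):
            if token in vocab:  # words never seen in training are ignored
                # add-one smoothed log likelihood P(token | class)
                score += math.log((word_counts[label][token] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A lexicon-based approach would instead sum prior polarity scores for each word (for example from SentiWordNet) without any training step.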


2020 ◽  
Vol 163 ◽  
pp. 06009
Author(s):  
Evgeniy Malygin ◽  
Mikhail Lychagin

This study proposes an approach for simulating heavy metal concentrations in river waters using machine learning techniques. A regression model was built that captured the relationship between the concentration of heavy metals and metalloids (HMM) and several characteristics of the studied catchment. Machine learning techniques made it possible to simulate the annual variability of HMM concentrations. This approach makes it possible to explore the impact of different factors on the studied processes.
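The abstract does not specify the regression method or predictors. A minimal sketch of the simplest case follows: an ordinary least-squares fit of concentration against a single catchment characteristic, assuming a hypothetical predictor `xs` (for example a land-use share) and response `ys` (an HMM concentration). The actual study may use several predictors and a different learner.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx      # slope: change in concentration per unit of the predictor
    a = my - b * mx    # intercept
    return a, b


def predict(a, b, x):
    """Simulated concentration for a new predictor value."""
    return a + b * x
```

The fitted slope is what lets such a model be used to explore how a catchment factor affects the simulated concentration.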


Author(s):  
Dinesh Rathi

This study investigates and characterizes the impact of different features of email on the effective routing of email to domain experts. The findings of the study would help in understanding how machine learning techniques such as classification could be applied effectively to develop better automatic triage processes in digital reference services.
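One way to study the impact of email features on routing is to build the same classifier from different feature subsets and compare the results. The sketch below assumes each email is a dict with hypothetical `subject` and `body` fields, tags every token with its field, and routes a new email to the expert whose past emails share the most features with it. This overlap-based nearest-profile classifier is chosen for illustration; the study's own features and classifier are not stated in the abstract.

```python
from collections import Counter


def features(email, use_subject=True, use_body=True):
    """Bag-of-words features; each token is prefixed with the field it came from."""
    feats = Counter()
    if use_subject:
        feats.update("subject:" + t for t in email["subject"].lower().split())
    if use_body:
        feats.update("body:" + t for t in email["body"].lower().split())
    return feats


def build_profiles(history, **opts):
    """Accumulate the features of past emails routed to each expert."""
    profiles = {}
    for email, expert in history:
        profiles.setdefault(expert, Counter()).update(features(email, **opts))
    return profiles


def route(email, profiles, **opts):
    """Pick the expert whose profile overlaps most with the email's features."""
    feats = features(email, **opts)

    def overlap(expert):
        profile = profiles[expert]
        return sum(min(count, profile[token]) for token, count in feats.items())

    return max(profiles, key=overlap)
```

Toggling `use_subject` and `use_body` (and passing the same options to `build_profiles` and `route`) shows how much each field contributes to correct routing.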


Author(s):  
Prakhar Mehrotra

The objective of this chapter is to discuss the integration of advancements made in the field of artificial intelligence into existing business intelligence tools. Specifically, it discusses how business intelligence tools can integrate time series analysis, supervised and unsupervised machine learning techniques, and natural language processing to unlock deeper insights, make predictions, and execute strategic business actions from within the tool itself. This chapter also provides a high-level overview of current state-of-the-art AI techniques and provides examples in the realm of business intelligence. The ultimate goal of this chapter is to leave readers thinking about what the future of business intelligence could look like and how enterprises can benefit from integrating AI into it.
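Time series analysis is one of the techniques the chapter proposes integrating into business intelligence tools. As a deliberately small example of a forecast that could run inside such a tool, here is simple exponential smoothing, a standard method chosen for illustration and not taken from the chapter; the smoothing factor `alpha` and any input series are assumptions.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast.

    Each new observation updates the level as
    level = alpha * value + (1 - alpha) * level.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level
```

A larger `alpha` reacts faster to recent values; `alpha = 1` simply repeats the last observation.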

