Automated Machine Learning for Business
Latest Publications


Published By Oxford University Press

9780190941659, 9780197601495

Author(s):  
Kai R. Larsen ◽  
Daniel S. Becker

After preparing your dataset, you should be quite familiar with the business problem, the subject matter, and the content of the dataset. This section is about modeling data: using data to train algorithms to create models that can be used to predict future events or understand past events. The section shows where data modeling fits in the overall machine learning pipeline. Traditionally, we store real-world data in one or more databases or files. This data is extracted, and features and a target (T) are created and submitted to the “Model Data” stage (the topic of this section). Following the completion of this stage, the model produced is examined (Section V) and placed into production. With the model in the production system, current data generated from the real-world environment is fed into the system. In the example case of a diabetes patient, we enter information from a new patient’s electronic health record into the system, and a database lookup retrieves additional data for feature creation.


Author(s):  
Kai R. Larsen ◽  
Daniel S. Becker

Access to additional and relevant data will lead to better predictions from algorithms until we reach the point where more observations (cases) are no longer helpful to detect the signal, the feature(s), or conditions that inform the target. In addition to obtaining more observations, we can also look for additional features of interest that we do not currently have, at which point it will invariably be necessary to integrate data from different sources. This section introduces the process of data integration, starting with two methods, “joins” (to access more features) and “unions” (to access more observations), and goes on to cover regular expressions, data summarization, crosstabs, data reduction and splitting, and data wrangling in all its flavors.
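The distinction between the two methods can be illustrated with a minimal sketch in plain Python. It assumes small in-memory tables; the table names, the `patient_id` key, the `hba1c` column, and the helper functions `union` and `left_join` are all hypothetical and are not taken from the book.

```python
# Hypothetical patient tables; every name and value here is illustrative only.
visits_2022 = [
    {"patient_id": 1, "age": 54},
    {"patient_id": 2, "age": 61},
]
visits_2023 = [
    {"patient_id": 3, "age": 47},
]
# A second source keyed by patient_id that holds an extra feature.
lab_results = {
    1: {"hba1c": 7.2},
    3: {"hba1c": 6.4},
}


def union(*tables):
    """Stack tables that share the same columns: more observations (rows)."""
    return [dict(row) for table in tables for row in table]


def left_join(rows, lookup, key):
    """Attach columns from lookup to each row: more features (columns).

    Rows with no match in lookup get None for the new columns.
    """
    new_columns = sorted({col for extra in lookup.values() for col in extra})
    joined = []
    for row in rows:
        extra = lookup.get(row[key], {})
        merged = dict(row)
        for col in new_columns:
            merged[col] = extra.get(col)
        joined.append(merged)
    return joined


all_visits = union(visits_2022, visits_2023)       # 3 rows, same columns
enriched = left_join(all_visits, lab_results, key="patient_id")  # adds hba1c
```

In practice a dataframe library or SQL would usually do this work; the sketch only shows that a union adds rows while a join adds columns, and that unmatched rows in a join end up with missing values that later steps have to handle.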


Author(s):  
Kai R. Larsen ◽  
Daniel S. Becker

Although we have evaluated all the measures and selected the best model for this case, and much of the machine learning process has been clarified, our understanding of the problem context is still relatively immature. That is, while we have carefully specified the problem, we still do not fully understand what drives the target. Convincing management to support the implementation of the model typically includes explaining the answers to “why,” “what,” “where,” and “when” questions embedded in the model. While the model may be the best overall possible model according to selected measures, for the particular problem related to hospital readmissions, it is still not clear why the model predicts that some patients will be readmitted and that others will not. It also remains unknown what features drive these outcomes, where the patients who were readmitted come from, or whether or not this is relevant. In this case, time information is unavailable, so the “when” question is not relevant, but it is easy to imagine that patients admitted in the middle of the night might have worse outcomes due to tired staff or lack of access to the best physicians. If we can convince management that the current analysis is useful, we can likely also make a case for the collection of additional data. The new data might include more information on past interactions with this patient, as well as date and time information to test the hypothesis about the effect of time-of-admission and whether the specific staff caring for a patient matters.


Author(s):  
Kai R. Larsen ◽  
Daniel S. Becker

This section covers the first steps of the Machine Learning Life Cycle Model: how to specify a business problem, acquire subject matter expertise, define the prediction target, define the unit of analysis, identify success criteria, evaluate risks, and finally, decide whether to continue a project. The focus is on who will use the model, whether management is supportive, whether the drivers of the model can be visualized, and how much value a model can produce.


Author(s):  
Kai R. Larsen ◽  
Daniel S. Becker

This section covers the final steps of the machine learning life cycle. Consider these the most important steps of the entire process. This is the point at which we have the greatest potential to help our organization reap the benefits of machine learning. In traditional information systems development, 60–80% of the cost of a system comes during the maintenance phase, so these steps are important. This section covers how to deploy a machine learning model, as well as how to document and maintain it. A chapter covers the seven types of target leakage, followed by time-aware validation and time-series analysis.
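As an illustration of the idea behind time-aware validation, the following sketch holds out the most recent observations instead of a random sample, so that the model is always evaluated on data that comes after the data it was trained on. This is an assumption about the general technique, not the book's own procedure; the `records` data, the `admitted` and `readmitted` field names, and the `time_aware_split` helper are hypothetical.

```python
from datetime import date

# Hypothetical observations; dates and labels are illustrative only.
records = [
    {"admitted": date(2023, 1, 5), "readmitted": 0},
    {"admitted": date(2023, 3, 17), "readmitted": 1},
    {"admitted": date(2023, 2, 9), "readmitted": 0},
    {"admitted": date(2023, 5, 2), "readmitted": 1},
    {"admitted": date(2023, 4, 21), "readmitted": 0},
]


def time_aware_split(rows, time_key, holdout_fraction=0.4):
    """Train on the earliest rows and validate on the most recent ones."""
    ordered = sorted(rows, key=lambda row: row[time_key])
    # Always hold out at least one row.
    n_holdout = max(1, round(len(ordered) * holdout_fraction))
    return ordered[:-n_holdout], ordered[-n_holdout:]


train, validation = time_aware_split(records, "admitted")
```

A random split of the same rows could place later admissions in the training set and earlier ones in the validation set, which lets information from the future leak into training; sorting by time before splitting avoids that particular problem.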


Author(s):  
Kai R. Larsen ◽  
Daniel S. Becker

Machine learning is involved in search, translation, detecting depression, predicting the likelihood of college dropout, finding lost children, and selling all kinds of products. While barely beyond its inception, the current machine learning revolution will affect people and organizations no less than the Industrial Revolution affected weavers and many other skilled laborers. Machine learning will automate hundreds of millions of jobs that, even a decade ago, were considered too complex for machines ever to take over, including driving, flying, painting, programming, and customer service, as well as many of the jobs previously reserved for humans in the fields of finance, marketing, operations, accounting, and human resources. This section explains how automated machine learning addresses exploratory data analysis, feature engineering, algorithm selection, hyperparameter tuning, and model diagnostics. The section covers the eight criteria considered essential for AutoML to have significant impact: accuracy, productivity, ease of use, understanding and learning, resource availability, process transparency, generalization, and recommended actions.

