Acquire and Integrate Data
Access to additional and relevant data will lead to better predictions from algorithms until we reach the point where more observations (cases) are no longer helpful to detect the signal, the feature(s), or conditions that inform the target. In addition to obtaining more observations, we can also look for additional features of interest that we do not currently have, at which point it will invariably be necessary to integrate data from different sources. This section introduces this process of data integration, starting with an introduction of two methods: “joins” (to access more features) and “unions” (to access more observations) and continues on to cover regular expressions, data summarization, crosstabs, data reduction and splitting, and data wrangling in all its flavors.