The availability of realistic network data plays a significant role in fostering collaboration and ensuring U.S. technical leadership in network security research. Unfortunately, a host of technical, legal, policy, and privacy issues limit the ability of operators to produce
datasets for information security testing. In an effort to help overcome these limitations, several data collection efforts (e.g., CRAWDAD[14], PREDICT [34]) have been established in the past few years. The key principle used in all of these efforts to assure low-risk, high-value data is that of trace anonymization—the process of sanitizing data before release so that potentially sensitive information cannot be extracted.