People spontaneously divide everyday experience into smaller units, a process known as event segmentation. To measure event segmentation, studies typically ask participants to explicitly mark the boundaries between events as they watch a movie (a segmentation task). These data may then be used to infer how others are likely to segment the same movie. However, substantial variability in performance across individuals could undermine the ability to generalize across groups, especially as more research moves online. To address this concern, we used several widely employed and novel measures to quantify segmentation agreement across groups of different sizes (n = 2–32). We applied these measures to data collected on different platforms and with different movie types: commercial films viewed in the lab versus everyday activities viewed online. All agreement measures captured non-random and video-specific boundary identification with sample sizes as small as 2, though with notable between-sample variability. As sample size increased, agreement values improved and eventually stabilized. Stabilization occurred at smaller sample sizes when measures reflected (1) agreement between two groups rather than between an individual and a group, (2) identification of boundaries between small (fine-grained) rather than large (coarse-grained) events, and (3) segmentation of everyday activities online rather than commercial films in the lab. These analyses can inform how sample sizes are tailored to the comparison of interest, the materials, and the data collection platform. In addition to demonstrating that both online and in-lab segmentation performance is reliable at moderate sample sizes, this study supports using these data to infer when everyday activities and commercial films are likely to be segmented.