Semantic web services have received significant attention in recent years, and many frameworks, algorithms, and tools leveraging them have been proposed. Nevertheless, surprisingly little effort has so far been put into evaluating these approaches. The main obstacle to thorough evaluation is the lack of large and diverse test collections of semantic web services. In this paper, we analyze the requirements for such collections and the shortcomings of the state of the art in this respect. Our contribution to overcoming those shortcomings is OPOSSum, a portal that supports the community in collaboratively building the necessary standard semantic web service test collections. We discuss how existing test collections have been integrated with OPOSSum, showcase the benefits of OPOSSum through an illustrative use case, and outline next steps towards better standard test collections of semantic web services.