Reusable Component Retrieval from a Large Repository Using Word2Vec with Continuous Bag of Words
A reusable code component is one that can be used with little or no adaptation in the application being developed. The main concern in this process is maintaining these reusable components in one place, called a ‘Repository’, so that they can be identified and reused effectively. Word embeddings represent textual information as numeric vectors, and they have become so pervasive that almost all Natural Language Processing projects make use of them. In this work, we use the Word2Vec model to find vector representations of the features of a reusable component: the features, expressed as a sequence of words, are given as input to a Word2Vec network. Our method uses Word2Vec with the Continuous Bag of Words (CBOW) architecture, in which the words surrounding a position are used to predict the word at that position. It outperforms existing methods and achieves an accuracy of 94.8% in identifying existing reusable components.
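The pipeline described above (feature words, then CBOW embeddings, then component identification) can be illustrated with a small, self-contained sketch. It is not the authors' implementation. The toy repository `REPO` and the helpers `train_cbow`, `component_vector` and `retrieve` are hypothetical. The code also makes several assumptions: it trains CBOW with a full softmax in pure Python, whereas practical Word2Vec implementations use negative sampling or hierarchical softmax. It builds each component's vector as the mean of its word vectors, and it ranks components by cosine similarity. None of these choices is confirmed by the abstract.

```python
import math
import random

# Hypothetical toy repository: each component is described by a sequence of feature words.
REPO = {
    "csv_parser": ["parse", "csv", "file", "rows", "columns"],
    "json_parser": ["parse", "json", "string", "objects"],
    "http_client": ["send", "http", "request", "response"],
    "image_resize": ["resize", "image", "width", "height"],
}


def train_cbow(docs, dim=16, window=2, epochs=100, lr=0.05, seed=0):
    """Minimal CBOW: average the context embeddings, predict the centre word via full softmax."""
    rng = random.Random(seed)
    w2i = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    vocab_size = len(w2i)
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(vocab_size)]
    w_out = [[0.0] * dim for _ in range(vocab_size)]
    for _ in range(epochs):
        for doc in docs:
            ids = [w2i[w] for w in doc]
            for pos, target in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                ctx = [ids[j] for j in range(lo, hi) if j != pos]
                if not ctx:
                    continue
                # Hidden layer: mean of the context word vectors.
                h = [sum(w_in[c][k] for c in ctx) / len(ctx) for k in range(dim)]
                scores = [sum(w_out[v][k] * h[k] for k in range(dim)) for v in range(vocab_size)]
                m = max(scores)  # subtract the max for a numerically stable softmax
                exps = [math.exp(s - m) for s in scores]
                z = sum(exps)
                grad_h = [0.0] * dim
                for v in range(vocab_size):
                    # d(cross-entropy)/d(score_v) = p_v - 1[v == target]
                    err = exps[v] / z - (1.0 if v == target else 0.0)
                    for k in range(dim):
                        grad_h[k] += err * w_out[v][k]
                        w_out[v][k] -= lr * err * h[k]
                # Spread the hidden-layer gradient back over the context words.
                for c in ctx:
                    for k in range(dim):
                        w_in[c][k] -= lr * grad_h[k] / len(ctx)
    return w2i, w_in


def component_vector(words, w2i, w_in):
    """Assumed aggregation: mean of the embeddings of the known feature words."""
    known = [w_in[w2i[w]] for w in words if w in w2i]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_words, repo, w2i, w_in):
    """Rank repository components by cosine similarity to the query's feature vector."""
    q = component_vector(query_words, w2i, w_in)
    if q is None:
        return []
    scored = []
    for name, words in repo.items():
        vec = component_vector(words, w2i, w_in)
        if vec is not None:
            scored.append((cosine(q, vec), name))
    return [name for _, name in sorted(scored, reverse=True)]


if __name__ == "__main__":
    w2i, w_in = train_cbow(list(REPO.values()))
    print(retrieve(["parse", "csv", "file", "rows", "columns"], REPO, w2i, w_in))
```

Querying with the exact features of an existing component ranks that component first (`csv_parser` here), because its vector has cosine similarity 1 with the query. The full softmax is affordable only because the toy vocabulary is tiny. For a real repository, a library implementation such as gensim's `Word2Vec` with `sg=0` (which selects CBOW) would be the more practical choice.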