From word embeddings to document similarities for improved information retrieval in software engineering

Linear Algebraic Structure of Word Senses, with Applications to Polysemy

Word embeddings are ubiquitous in NLP and information retrieval, but it is unclear what they represent when the word is polysemous. Here it is shown that multiple word senses reside in linear superposition within the word embedding and simple sparse coding can recover vectors that approximately capture the senses. The success of our approach, which applies to several embedding methods, is mathematically explained using a variant of the random walk on discourses model (Arora et al., 2016). A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 “discourse atoms” that gives a succinct description of which other words co-occur with that word sense. Discourse atoms can be of independent interest, and make the method potentially more useful. Empirical tests are used to verify and support the theory.
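
The idea of senses in linear superposition can be illustrated with a toy sketch. The abstract does not say which sparse coding solver is used, so the greedy matching pursuit below is an assumed stand-in, and the three-dimensional `atoms`, the mixing weights, and the function names are all hypothetical values chosen for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(x, atoms, k):
    """Greedy sparse coding of x over unit-norm atoms, using at most k steps.

    Returns (coeffs, residual), where coeffs maps atom index to weight.
    """
    residual = list(x)
    coeffs = {}
    for _ in range(k):
        # Pick the atom most correlated with what is still unexplained.
        best = max(range(len(atoms)), key=lambda i: abs(dot(residual, atoms[i])))
        c = dot(residual, atoms[best])
        if abs(c) < 1e-12:
            break
        coeffs[best] = coeffs.get(best, 0.0) + c
        residual = [r - c * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Hypothetical unit-norm "discourse atoms" (toy 3-d space, not real data).
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A polysemous word modeled as a weighted sum of two sense directions.
word = [0.7 * a + 0.3 * b for a, b in zip(atoms[0], atoms[2])]

coeffs, residual = matching_pursuit(word, atoms, k=2)
print(coeffs, residual)
```

In this toy setting the two selected atoms are exactly the ones that were mixed to build `word`. With real embeddings the atoms are learned and not orthogonal, so the recovery is only approximate, as the abstract notes.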

Configuring and Assembling Information Retrieval Based Solutions for Software Engineering Tasks

2016 IEEE International Conference on Software Maintenance and Evolution (ICSME)

10.1109/icsme.2016.85

2016

Author(s): Bogdan Dit

Keyword(s): Information Retrieval, Software Engineering

Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '15

10.1145/2766462.2767752

2015

Cited By ~ 73

Author(s): Ivan Vulić, Marie-Francine Moens

Keyword(s): Information Retrieval, Word Embeddings, Retrieval Models, Cross Lingual

Integrating and Evaluating Neural Word Embeddings in Information Retrieval

Proceedings of the 20th Australasian Document Computing Symposium - ADCS '15

10.1145/2838931.2838936

2015

Cited By ~ 45

Author(s): Guido Zuccon, Bevan Koopman, Peter Bruza, Leif Azzopardi

Keyword(s): Information Retrieval, Word Embeddings

Word embeddings for the software engineering domain

Proceedings of the 15th International Conference on Mining Software Repositories - MSR '18

10.1145/3196398.3196448

2018

Cited By ~ 16

Author(s): Vasiliki Efstathiou, Christos Chatzilenas, Diomidis Spinellis

Keyword(s): Software Engineering, Word Embeddings

Deep Learning

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Deep Learning Techniques and Optimization Strategies in Big Data Analytics

10.4018/978-1-7998-1192-3.ch008

2020

pp. 124-141

Cited By ~ 3

Author(s): Menaga D., Revathi S.

Keyword(s): Information Retrieval, Deep Learning, Software Engineering, Multimedia Information, Research Area, Multimedia Application, Information Process, Display Devices, Storage Devices, Major Application

Multimedia applications are a significant and growing research area, driven by advances in software engineering, storage devices, networks, and display devices. To satisfy users' multimedia information needs, it is essential to build efficient applications for processing, accessing, and analyzing multimedia information that support tasks such as retrieval, recommendation, search, classification, and clustering. Deep learning is an emerging technique in multimedia information processing that addresses problems arising in both conventional and recent research. The main aim is to resolve multimedia-related problems using deep learning. The deep learning revolution is discussed along with its description and features. Finally, the major applications are explained with respect to different fields. This chapter analyzes the problem of retrieval after discussing multimedia information retrieval, that is, the ability to retrieve an object of any multimedia type.

Semantically enhanced term frequency based on word embeddings for Arabic information retrieval

2016 4th IEEE International Colloquium on Information Science and Technology (CiSt)

10.1109/cist.2016.7805076

2016

Cited By ~ 1

Author(s): Abdelkader El Mahdaouy, Said Ouatik El Alaoui, Eric Gaussier

Keyword(s): Information Retrieval, Word Embeddings, Term Frequency, Arabic Information Retrieval, Semantically Enhanced

Crawling Wikipedia Pages to Train Word Embeddings Model for Software Engineering Domain

14th Innovations in Software Engineering Conference (formerly known as India Software Engineering Conference)

10.1145/3452383.3452401

2021

Author(s): Siba Mishra, Arpit Sharma

Keyword(s): Software Engineering, Word Embeddings

Supporting evidence-based Software Engineering with collaborative information retrieval

Proceedings of the 6th International ICST Conference on Collaborative Computing: Networking, Applications, Worksharing

10.4108/icst.collaboratecom.2010.9

2010

Cited By ~ 3

Author(s): Heri Ramampiaro, Daniela Cruzes, Reidar Conradi, Manoel Mendonça

Keyword(s): Information Retrieval, Software Engineering, Evidence Based, Supporting Evidence, Collaborative Information Retrieval

A Data-Driven Strategy to Combine Word Embeddings in Information Retrieval

10.5121/csit.2021.110107

2021

Author(s): Alfredo Silva, Marcelo Mendoza

Keyword(s): Information Retrieval, Natural Language Processing, Language Processing, Ad Hoc, Data Driven, Word Embeddings, Continuous Vector, Benchmark Data, Promising Line, Vector Representations

Word embeddings are vital descriptors of words in unigram representations of documents for many tasks in natural language processing and information retrieval. Representing queries has been one of the most critical challenges in this area because a query consists of only a few terms and has little descriptive capacity. Strategies such as averaging word embeddings can enrich a query's descriptive capacity, since they favor the identification of related terms from the continuous vector representations that characterize these approaches. We propose a data-driven strategy to combine word embeddings. We use IDF combinations of embeddings to represent queries, showing that these representations outperform the average word embeddings recently proposed in the literature. Experimental results on benchmark data show that our proposal performs well, suggesting that data-driven combinations of word embeddings are a promising line of research in ad-hoc information retrieval.
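
The contrast between a plain average and an IDF-weighted combination of query term embeddings can be sketched as follows. The abstract does not give the exact weighting formula, so the smoothed IDF used here is an assumption (one common variant), and the toy corpus, the two-dimensional embeddings, and the function names are hypothetical.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed IDF per term: log((N + 1) / (df + 1)) + 1 (assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def query_vector(terms, embeddings, weights=None):
    """Weighted mean of the embeddings of query terms found in the vocabulary.

    With weights=None every term counts equally (plain average).
    Returns None if no query term has an embedding.
    """
    known = [t for t in terms if t in embeddings]
    if not known:
        return None
    w = [1.0 if weights is None else weights.get(t, 1.0) for t in known]
    total = sum(w)
    dim = len(embeddings[known[0]])
    return [
        sum(wi * embeddings[t][d] for wi, t in zip(w, known)) / total
        for d in range(dim)
    ]


# Toy corpus and embeddings, invented purely for illustration.
docs = [["the", "parser"], ["the", "build"], ["the", "test"]]
embeddings = {"the": [1.0, 0.0], "parser": [0.0, 1.0]}

idf = idf_table(docs)
avg = query_vector(["the", "parser"], embeddings)
weighted = query_vector(["the", "parser"], embeddings, weights=idf)
print(avg, weighted)
```

Because "parser" occurs in fewer documents than "the", its IDF is larger and the weighted query vector moves toward the "parser" embedding, whereas the plain average treats both terms equally.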
