Building Lexical Resources for Dialectical Arabic
A major difficulty in the natural language processing of Arabic dialects is the lack of lexical resources. This shortage hinders the adoption and commercialization of related technologies such as machine translation, speech recognition, and sentiment analysis. Current solutions frequently rely on lexica that are specific to the task at hand and limited to a particular language variety. Modern communication platforms, including social media, bring together people from different nations and regions, which has increased the demand for general-purpose lexica that support effective natural language processing solutions. This chapter presents a collaborative web-based platform for building a cross-dialectical, general-purpose lexicon for Arabic dialects. The platform was tested by a team of two annotators, a reviewer, and a lexicographer. The lexicon expansion rate was measured and analyzed to estimate the overhead required to reach the desired lexicon size. Inter-annotator reliability was analyzed using Cohen's Kappa.
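As a rough illustration of the reliability measure named above, the following is a minimal sketch of Cohen's Kappa for two annotators, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohen_kappa` and the toy labels are assumptions made for illustration; they are not taken from the chapter or its platform.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items.

    Hypothetical helper for illustration; not the chapter's implementation.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two annotators'
    # marginal proportions, summed over all categories.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; Kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels (invented): two annotators judging four candidate lexicon entries.
annotator_1 = ["accept", "accept", "reject", "reject"]
annotator_2 = ["accept", "reject", "reject", "reject"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(kappa)  # 0.5
```

In this toy case the annotators agree on three of four items (p_o = 0.75) while chance agreement is 0.5, so Kappa is 0.5. Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance.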