Filling a vacancy takes a lot of (costly) time. Automated preprocessing of applications with artificial intelligence, e.g., analyzing them with machine learning algorithms, can help save this time. We investigate whether such systems are biased with respect to gender, origin, and nobility. Using a corpus of common German reference letter sentences, we address two research questions.
First, we test sentiment analysis services offered by Amazon, Google, IBM, and Microsoft. All tested services rate the sentiment of the same template sentences highly inconsistently and exhibit bias at least with regard to gender.
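To illustrate the probing setup, the following minimal sketch varies only the name slot of a template sentence and compares mean scores; `analyze_sentiment` is a hypothetical stand-in for a concrete client of one of the tested services (Amazon Comprehend, Google Cloud Natural Language, IBM Watson NLU, Azure Text Analytics), and the templates and name lists are illustrative, not those from our corpus.

```python
from statistics import mean

def analyze_sentiment(text: str) -> float:
    # Hypothetical stand-in: replace with a call to a real sentiment
    # service; assumed to return a score in [-1, 1].
    return 0.0

# German template sentences with a name slot; names act as a gender proxy.
TEMPLATES = [
    "{name} hat die Aufgaben stets zu unserer vollen Zufriedenheit erledigt.",
    "{name} war bei allen Kolleginnen und Kollegen sehr beliebt.",
]
FEMALE_NAMES = ["Anna", "Maria"]  # illustrative name lists
MALE_NAMES = ["Peter", "Thomas"]

def mean_score(names):
    return mean(
        analyze_sentiment(t.format(name=n)) for t in TEMPLATES for n in names
    )

# Identical sentences, only the name differs: a notable gap in mean
# sentiment indicates gender bias in the service.
gap = mean_score(FEMALE_NAMES) - mean_score(MALE_NAMES)
print(f"female-male sentiment gap: {gap:+.3f}")
```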
gender. Second, we examine the impact of (im-)balanced training data
sets on classifiers, which are trained to estimate the sentiment of sentences from our corpus. This experiment shows that imbalanced data,
on the one hand, lead to biased results, but on the other hand, under
certain conditions, can lead to fair results.
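As a concrete sketch of the second experiment, assuming scikit-learn and a toy English stand-in for our corpus (the records, group labels, and subsampling ratio below are illustrative, not our actual setup), one can train the same pipeline on a balanced and an artificially imbalanced sample and compare per-group accuracy:

```python
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the corpus: text, sentiment label, demographic group.
DATA = [
    {"text": "she always delivered excellent work", "label": "pos", "group": "f"},
    {"text": "she often missed her deadlines", "label": "neg", "group": "f"},
    {"text": "he always delivered excellent work", "label": "pos", "group": "m"},
    {"text": "he often missed his deadlines", "label": "neg", "group": "m"},
] * 25  # replicate so both training sets have enough examples

def subsample_group(data, group, keep=0.2, seed=0):
    # Drop most examples of one group to simulate an imbalanced corpus.
    rng = random.Random(seed)
    return [d for d in data if d["group"] != group or rng.random() < keep]

def train(data):
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit([d["text"] for d in data], [d["label"] for d in data])
    return clf

def accuracy_by_group(clf, data):
    # Per-group accuracy; a gap between groups signals biased results.
    out = {}
    for g in sorted({d["group"] for d in data}):
        subset = [d for d in data if d["group"] == g]
        preds = clf.predict([d["text"] for d in subset])
        out[g] = sum(p == d["label"] for p, d in zip(preds, subset)) / len(subset)
    return out

print("balanced:  ", accuracy_by_group(train(DATA), DATA))
print("imbalanced:", accuracy_by_group(train(subsample_group(DATA, "f")), DATA))
```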