E-mail classification with machine learning and word embeddings for improved customer support

Machine learning has emerged as a powerful approach in materials discovery. Its major challenge is selecting features that create interpretable representations of materials, useful across multiple prediction tasks. We introduce an end-to-end machine learning model that automatically generates descriptors that capture a complex representation of a material’s structure and chemistry. This approach builds on computational topology techniques (namely, persistent homology) and word embeddings from natural language processing. It automatically encapsulates geometric and chemical information directly from the material system. We demonstrate our approach on multiple nanoporous metal–organic framework datasets by predicting methane and carbon dioxide adsorption across different conditions. Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from the commonly used, manually curated features, consistently achieving an average 25–30% decrease in root-mean-squared deviation and an average increase of 40–50% in R² scores. A key advantage of our approach is interpretability: our model identifies the pores that correlate best to adsorption at different pressures, which contributes to understanding atomic-level structure–property relationships for materials design.

Machine learning for financial transaction classification across companies using character‐level word embeddings of text fields

Intelligent Systems in Accounting Finance & Management ◽

10.1002/isaf.1500 ◽

2021 ◽

Author(s):

Rasmus Kær Jørgensen ◽

Christian Igel

Keyword(s):

Machine Learning ◽

Word Embeddings ◽

Financial Transaction

Analyzing the Effect of Document Representation on Machine Learning Approaches in Multi-Class e-Mail Filtering

2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings)(WI'06) ◽

10.1109/wi.2006.41 ◽

2006 ◽

Cited By ~ 1

Author(s):

Helmut Berger ◽

Michael Dittenbach ◽

Dieter Merkl

Keyword(s):

Machine Learning ◽

Learning Approaches ◽

Document Representation ◽

E Mail ◽

Class E

Comparing Deep-Learning Architectures and Traditional Machine-Learning Approaches for Satire Identification in Spanish Tweets

Mathematics ◽

10.3390/math8112075 ◽

2020 ◽

Vol 8 (11) ◽

pp. 2075

Author(s):

Óscar Apolinario-Arzube ◽

José Antonio García-Díaz ◽

José Medina-Moreira ◽

Harry Luna-Aveiga ◽

Rafael Valencia-García

Keyword(s):

Machine Learning ◽

Deep Learning ◽

User Interfaces ◽

State Of The Art ◽

Learning Approaches ◽

Word Embeddings ◽

Linguistic Features ◽

Intended Meaning ◽

Language User ◽

Learning Architectures

Automatic satire identification can help to identify texts in which the intended meaning differs from the literal meaning, improving tasks such as sentiment analysis, fake news detection or natural-language user interfaces. Typically, satire identification is performed by training a supervised classifier for finding linguistic clues that can determine whether a text is satirical or not. For this, the state-of-the-art relies on neural networks fed with word embeddings that are capable of learning interesting characteristics regarding the way humans communicate. However, as far as our knowledge goes, there are no comprehensive studies that evaluate these techniques in Spanish in the satire identification domain. Consequently, in this work we evaluate several deep-learning architectures with Spanish pre-trained word-embeddings and compare the results with strong baselines based on term-counting features. This evaluation is performed with two datasets that contain satirical and non-satirical tweets written in two Spanish variants: European Spanish and Mexican Spanish. Our experimentation revealed that term-counting features achieved similar results to deep-learning approaches based on word-embeddings, both outperforming previous results based on linguistic features. Our results suggest that term-counting features and traditional machine learning models provide competitive results regarding automatic satire identification, slightly outperforming state-of-the-art models.

A Feature Based Simple Machine Learning Approach with Word Embeddings to Named Entity Recognition on Tweets

Natural Language Processing and Information Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-319-59569-6_30 ◽

2017 ◽

pp. 254-259 ◽

Cited By ~ 2

Author(s):

Mete Taşpınar ◽

Murat Can Ganiz ◽

Tankut Acarman

Keyword(s):

Machine Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Approach ◽

Word Embeddings ◽

Named Entity ◽

Simple Machine ◽

Machine Learning Approach ◽

Feature Based

Comparing Supervised Machine Learning Strategies and Linguistic Features to Search for Very Negative Opinions

Information ◽

10.3390/info10010016 ◽

2019 ◽

Vol 10 (1) ◽

pp. 16 ◽

Cited By ~ 3

Author(s):

Sattam Almatarneh ◽

Pablo Gamallo

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Empirical Study ◽

Learning Strategies ◽

Supervised Machine Learning ◽

Support Vector ◽

Word Embeddings ◽

Linguistic Features ◽

Machine Learning Classifiers ◽

Supervised Machine Learning Classifiers

In this paper, we examine the performance of several classifiers in the process of searching for very negative opinions. More precisely, we do an empirical study that analyzes the influence of three types of linguistic features (n-grams, word embeddings, and polarity lexicons) and their combinations when they are used to feed different supervised machine learning classifiers: Naive Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM). The experiments we have carried out show that SVM clearly outperforms NB and DT in all datasets by taking into account all features individually as well as their combinations.
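As a rough illustration of one cell in such a comparison (n-gram features fed to a Naive Bayes classifier), the sketch below uses only the Python standard library. It is not the authors' setup: the toy sentences, the labels `very_negative` and `other`, the helper `ngrams`, and the class `NaiveBayes` are all hypothetical, and add-one smoothing is an assumption made here.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return word unigrams up to n_max-grams of a whitespace-tokenized text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for feat in ngrams(text):
                if feat in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, for illustration only.
TRAIN_TEXTS = [
    "worst product ever",
    "absolutely terrible service",
    "worst service ever",
    "good product overall",
    "service was fine",
    "decent product nothing special",
]
TRAIN_LABELS = ["very_negative"] * 3 + ["other"] * 3

model = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("terrible product worst ever"))
print(model.predict("good service overall fine"))
```

Swapping in word embeddings or a polarity lexicon would only change the `ngrams` feature function; comparing against a decision tree or an SVM would require replacing the classifier itself.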

A survey and evaluation of supervised machine learning techniques for spam e-mail filtering

2015 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT) ◽

10.1109/icecct.2015.7226077 ◽

2015 ◽

Cited By ~ 5

Author(s):

Tarjani Vyas ◽

Payal Prajapati ◽

Somil Gadhwal

Keyword(s):

Machine Learning ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Survey And Evaluation ◽

E Mail

Elsevier’s Encyclopedia of Neuroscience. 9896. George Adelman, Barry H. Smith. Elsevier’s Encyclopedia of Neuroscience. Regional Sales Office, Customer Support Department, P.O. Box 945, New York, N.Y. 10159‐0945, U.S.A. Tel: (+1) 212 633 3730, Toll Free number for North‐American customers: 1‐888‐4ES‐INFO (437‐4636), Fax: (+1) 212 633 3680, E‐mail: usinfo‐[email protected] Europe, Middle‐East, Africa, Asia (except Far East): Regional Sales Office, Customer Support Department, P.O. Box 211, 1000 AE Amsterdam, The Netherlands. Tel: (+31) 20 485 3757, Fax: (+31) 20 485 3432, E‐mail: nlinfo‐[email protected] Elsevier Science, 1998. ISBN: 0‐444‐92614‐9. $169.95 (Mac/PC single use version)

Electronic Resources Review ◽

10.1108/err.1998.2.9.104.96 ◽

1998 ◽

Vol 2 (9) ◽

pp. 104-105

Author(s):

Brad Eden

Keyword(s):

New York ◽

Middle East ◽

The Netherlands ◽

East Africa ◽

North American ◽

Far East ◽

Customer Support ◽

Single Use ◽

E Mail

An Expanded Feature Extraction of E-Mail Header for Spam Recognition

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.846-847.1672 ◽

2013 ◽

Vol 846-847 ◽

pp. 1672-1675 ◽

Cited By ~ 2

Author(s):

Yuan Ning Liu ◽

Ye Han ◽

Xiao Dong Zhu ◽

Fei He ◽

Li Yan Wei

Keyword(s):

Machine Learning ◽

Set Theory ◽

Rough Set ◽

Rough Set Theory ◽

Spam Filtering ◽

Feature Sets ◽

Filtering Method ◽

The Past ◽

Machine Learning Methods ◽

E Mail

One current spam filtering method extracts attributes from the e-mail header and uses machine learning methods to classify the sample sets. Over time, however, spammers change the ways they send spam, which greatly changes spam headers, so attributes defined in the past cannot handle this change adequately. This paper extracts attributes from all header fields that can be forged in order to expand the feature sets, then uses rough set theory to classify the sample sets. Experiments showed that including more attributes in the feature sets may lead to better performance than other algorithms, in terms of higher recall and precision and lower false recognition.
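Header attribute extraction of this kind can be sketched with Python's standard `email` package. The abstract does not list the attributes the paper uses, so the ones below (and the names `header_features` and `SAMPLE`) are hypothetical choices made for illustration; the rough set classification step is not shown.

```python
from email import message_from_string
from email.utils import parseaddr


def header_features(raw_message: str) -> dict:
    """Extract a few illustrative attributes from an e-mail's header fields."""
    msg = message_from_string(raw_message)

    # parseaddr returns (display name, address); only the address is used.
    _, from_addr = parseaddr(msg.get("From", ""))
    _, reply_addr = parseaddr(msg.get("Reply-To", ""))
    subject = msg.get("Subject", "")
    received = msg.get_all("Received") or []

    return {
        "num_received": len(received),
        "has_message_id": msg.get("Message-ID") is not None,
        "has_date": msg.get("Date") is not None,
        "reply_to_differs": bool(reply_addr)
        and reply_addr.lower() != from_addr.lower(),
        "subject_all_caps": subject.isupper(),
    }


# Hypothetical message, for illustration only.
SAMPLE = (
    "Received: from mail.example.org by mx.example.com; Mon, 1 Jan 2024 10:00:00 +0000\n"
    "Received: from unknown by mail.example.org; Mon, 1 Jan 2024 09:59:00 +0000\n"
    "From: Support <support@example.com>\n"
    "Reply-To: claims@example.net\n"
    "Subject: URGENT NOTICE\n"
    "\n"
    "Body text.\n"
)

features = header_features(SAMPLE)
for name, value in features.items():
    print(f"{name}: {value}")
```

Each message then becomes one row of attribute values, which is the kind of table a downstream classifier could consume.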

Spam/ham e-mail classification using machine learning methods based on bag of words technique

2018 26th Signal Processing and Communications Applications Conference (SIU) ◽

10.1109/siu.2018.8404347 ◽

2018 ◽

Cited By ~ 2

Author(s):

Esra Sahin ◽

Murat Aydos ◽

Fatih Orhan

Keyword(s):

Machine Learning ◽

Bag Of Words ◽

Learning Methods ◽

Machine Learning Methods ◽

E Mail
