A Text Mining Technique for Manufacturing Supplier Classification

Volume 1B: 35th Computers and Information in Engineering Conference ◽

10.1115/detc2015-46694 ◽

2015 ◽

Cited By ~ 4

Author(s):

Peyman Yazdizadeh ◽

Farhad Ameri

Keyword(s):

Text Mining ◽

CNC Machining ◽

Training Dataset ◽

Web Presence ◽

R Programming Language ◽

Probabilistic Technique ◽

Textual Description ◽

Textual Data ◽

R Programming

The web presence of manufacturing suppliers is constantly increasing, and so is the volume of textual data available online about their capabilities. To process this large volume of data and infer new knowledge about supplier capabilities, different text mining techniques such as association rule generation, classification, and clustering can be applied. This paper focuses on classifying manufacturing suppliers based on the textual descriptions of their capabilities in their online profiles. A probabilistic technique based on the Naïve Bayes method is adopted and implemented in the R programming language. Casting and CNC machining are used as the example supplier classes in this work. The performance of the proposed classifier is evaluated experimentally using standard metrics such as precision, recall, and F-measure. It was observed that improving the precision of the classification requires a larger training dataset with more relevant terms.
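
The abstract names a Naïve Bayes classifier built in R and evaluated with precision, recall, and F-measure, but gives no implementation details. As a hedged illustration of the general technique (not the authors' code), the Python sketch below assumes a multinomial Naïve Bayes model over lowercase alphabetic tokens with add-one (Laplace) smoothing. The helper names `train_nb`, `predict_nb`, and `precision_recall_f1` and the four toy supplier profiles are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (digits are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model: log priors plus per-class term counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> term frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log-posterior score."""
    priors, word_counts, vocab = model
    scores = {}
    for c, log_prior in priors.items():
        total = sum(word_counts[c].values())
        score = log_prior
        for tok in tokenize(doc):
            if tok in vocab:  # terms never seen in training are skipped
                # Add-one (Laplace) smoothed term likelihood.
                score += math.log((word_counts[c][tok] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy supplier profiles (not the paper's dataset).
train_docs = [
    "sand casting and investment casting of aluminum parts",
    "die casting foundry producing iron castings",
    "cnc milling and turning of precision machined components",
    "5-axis cnc machining centers for tight tolerance parts",
]
train_labels = ["casting", "casting", "machining", "machining"]
model = train_nb(train_docs, train_labels)

test_docs = ["we offer investment casting", "precision cnc turning services"]
test_labels = ["casting", "machining"]
preds = [predict_nb(model, d) for d in test_docs]
print(preds)  # ['casting', 'machining']
print(precision_recall_f1(test_labels, preds, positive="casting"))  # (1.0, 1.0, 1.0)
```

Skipping unseen tokens, rather than giving them a smoothed probability, is one of several reasonable design choices; it keeps out-of-vocabulary words from shifting the scores.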

A Novel Text-Mining Approach for Retrieving Pharmacogenomics Associations From the Literature

Frontiers in Pharmacology ◽

10.3389/fphar.2020.602030 ◽

2020 ◽

Vol 11 ◽

Author(s):

Maria-Theodora Pandi ◽

Peter J. van der Spek ◽

Maria Koromina ◽

George P. Patrinos

Keyword(s):

Text Mining ◽

Generalized Linear Models ◽

Linear Models ◽

Biomedical Literature ◽

Linear Kernel ◽

R Programming Language ◽

Research Areas ◽

Text Classifiers ◽

R Programming ◽

Further Development

Text mining of the biomedical literature is an emerging field that has already found a variety of applications in many research areas, including genetics, personalized medicine, and pharmacogenomics. In this study, we describe a novel text-mining approach for extracting pharmacogenomics associations. The code was implemented in the R programming language, using custom scripts where needed and functions from existing libraries elsewhere. Articles (abstracts or full texts) matching a specified query were retrieved from PubMed, and concept annotations were obtained from PubTator Central. Terms denoting a Mutation or a Gene, as well as Chemical terms corresponding to drug compounds, were normalized, and the sentences containing these terms were filtered and preprocessed to create appropriate training sets. Finally, after training and adequate hyperparameter tuning, four text classifiers (FastText, linear-kernel SVMs, XGBoost, and Lasso and Elastic-Net Regularized Generalized Linear Models) were created and evaluated for their performance in identifying pharmacogenomics associations. Although further improvements are needed before this text-mining approach can be properly implemented in clinical practice, our study offers a comprehensive, simplified, and up-to-date approach for identifying and assessing research articles enriched in clinically relevant pharmacogenomics relationships. Furthermore, this work highlights a series of challenges in the effective application of text mining to biomedical literature, whose resolution could substantially contribute to the further development of this field.
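
The fourth classifier's name matches the title of R's glmnet package. Assuming it was a binary (association vs. no association) logistic model fitted with glmnet, which the abstract does not confirm, the penalized objective glmnet minimizes for the binomial family is:

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big]
+ \lambda\Big[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big]
```

Here $y_i \in \{0,1\}$ is the label of sentence $i$, $x_i$ its feature vector, $\lambda \ge 0$ the penalty strength, and $\alpha \in [0,1]$ the mixing parameter: $\alpha = 1$ gives the lasso and $\alpha = 0$ gives ridge regression. For such a model, $\lambda$ and $\alpha$ are the natural hyperparameters to tune.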

Ooredoo Rayek

International Journal of Technology Diffusion ◽

10.4018/ijtd.2020040105 ◽

2020 ◽

Vol 11 (2) ◽

pp. 66-81

Author(s):

Badia Klouche ◽

Sidi Mohamed Benslimane ◽

Sakina Rim Bennabi

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Text Mining ◽

Sentiment Analysis ◽

Experimental Results ◽

Support Vector ◽

Textual Data ◽

New Strategy ◽

Set Up

Sentiment analysis is one of the recently emerging research areas in sentiment polarity classification and text mining, particularly given the considerable number of opinions available on social media. The Algerian telephone operator Ooredoo, like other operators, relies in its new strategy for winning customers on exploiting their opinions through sentiment analysis. The purpose of this work is to set up a system called “Ooredoo Rayek”, whose objective is to collect, transliterate, translate, and classify the textual data expressed by the Ooredoo operator's customers. This article develops a set of rules for transliterating Algerian Arabizi into Algerian dialect. Furthermore, the authors used Naïve Bayes (NB) and Support Vector Machine (SVM) classifiers to assign polarity tags to Facebook comments from Ooredoo's official pages, written in a multilingual and multi-dialect context. Experimental results show that the system performs well, reaching 83% accuracy.
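
The abstract mentions a hand-written rule set for transliterating Algerian Arabizi (Arabic written with Latin letters and digits) but does not list the rules. A minimal sketch of rule-based transliteration with greedy longest-match lookup could look like this in Python. The `RULES` table holds only a few illustrative, commonly seen Arabizi conventions chosen for this example and is not the authors' rule set; `transliterate` is a hypothetical helper name.

```python
# Illustrative Arabizi-to-Arabic-script rules (hypothetical, NOT the article's
# rule set). Digit rules follow common chat conventions: 3 for ain (ع),
# 7 for haa (ح), 9 for qaf (ق, a Maghrebi usage). Vowels are deliberately
# omitted because their mapping is ambiguous, so they pass through unchanged.
RULES = {
    "ch": "ش",
    "gh": "غ",
    "kh": "خ",
    "3": "ع",
    "7": "ح",
    "9": "ق",
    "b": "ب",
    "d": "د",
    "l": "ل",
    "m": "م",
    "n": "ن",
    "r": "ر",
    "s": "س",
}


def transliterate(word, rules=RULES):
    """Greedy longest-match transliteration; unmatched characters are copied as-is."""
    max_len = max(len(key) for key in rules)
    out = []
    i = 0
    while i < len(word):
        # Try the longest possible chunk first so digraphs like "kh" win over "k".
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size].lower()
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            out.append(word[i])  # no rule matched: keep the original character
            i += 1
    return "".join(out)


print(transliterate("khbr"))  # خبر
print(transliterate("3ch"))   # عش
```

Longest-match ordering lets digraphs such as `kh` take precedence over single letters; a real system would also need context-sensitive rules and vowel handling, which this sketch leaves out.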

K-Means Clustering Algorithm Based Classification of Soil Fertility in North West Nigeria

FUDMA Journal of Sciences ◽

10.33003/fjs-2020-0402-363 ◽

2020 ◽

Vol 4 (2) ◽

pp. 780-787

Author(s):

Ibrahim Hassan Hayatu ◽

Abdullahi Mohammed ◽

Barroon Ahmad Isma’eel ◽

Sahabi Yusuf Ali

Keyword(s):

Soil Fertility ◽

Crop Yield ◽

Clustering Algorithm ◽

Soil Samples ◽

North West ◽

R Programming ◽

Available Information ◽

Northwest Region ◽

The Relationship

Soil fertility determines a plant's development process, which guarantees food sufficiency and the security of lives and properties through bumper harvests. The fertility of soil varies by region, which determines the type of crops to be planted. However, there is no repository or other source of information about soil fertility in any region of Nigeria, especially the Northwest of the country. The only available information is soil samples with their attributes, which give little or no information to the average farmer. This has affected crop yield in all regions, particularly the Northwest, resulting in lower food production. Therefore, this study aims to classify soil data by fertility in the Northwest region of Nigeria using R programming. Data were obtained from the Department of Soil Science, Ahmadu Bello University, Zaria. The data contain 400 soil samples, each with 13 attributes. The relationships between soil attributes were examined. The k-means clustering algorithm was employed to analyze soil fertility clusters. Four clusters were identified: cluster 1 has the highest fertility, followed by cluster 2, and fertility decreases as the cluster number increases. Identifying the most fertile clusters will guide farmers on where best to concentrate their planting in order to improve productivity and crop yield.
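
The abstract reports grouping 400 samples with 13 attributes into four clusters with k-means, without implementation details. A minimal Python sketch of the standard Lloyd iteration (assign each sample to its nearest centroid, then move each centroid to the mean of its members) is given here. Two assumptions to flag: it uses deterministic farthest-first seeding instead of random starts, and it is not R's built-in `kmeans()`, whose default is the Hartigan-Wong algorithm. The helper names and the six two-attribute toy samples are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy samples with two attributes each (not the study's data).
samples = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(samples, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because k-means relies on Euclidean distance, attributes measured on different scales would usually be standardized first; ranking the resulting clusters by fertility, as the study does, is a separate interpretation step.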

A Comparison of Rule-Based and Machine Learning Models for Classification of Human Factors Aviation Safety Event Reports

Proceedings of the Human Factors and Ergonomics Society Annual Meeting ◽

10.1177/1071181320641034 ◽

2020 ◽

Vol 64 (1) ◽

pp. 129-133

Author(s):

Katherine Darveau ◽

Daniel Hannon ◽

Chad Foster

Keyword(s):

Machine Learning ◽

Human Factors ◽

Human Error ◽

Data Science ◽

Aircraft Engine ◽

Rule Based ◽

Root Cause ◽

Textual Data ◽

Safety Event

There is growing interest in the study and practice of applying data science (DS) and machine learning (ML) to automate decision making in safety-critical industries. As an alternative or augmentation to human review, there are opportunities to explore these methods for classifying aviation operational events by root cause. This study seeks to apply a thoughtful approach to design, compare, and combine rule-based and ML techniques to classify events caused by human error in aircraft/engine assembly, maintenance or operation. Event reports contain a combination of continuous parameters, unstructured text entries, and categorical selections. A Human Factors approach to classifier development prioritizes the evaluation of distinct data features and entry methods to improve modeling. Findings, including the performance of tested models, led to recommendations for the design of textual data collection systems and classification approaches.

FuzzyR: An Extended Fuzzy Logic Toolbox for the R Programming Language

2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) ◽

10.1109/fuzz48607.2020.9177780 ◽

2020 ◽

Author(s):

Chao Chen ◽

Tajul Rosli Razak ◽

Jonathan M. Garibaldi

Keyword(s):

Fuzzy Logic ◽

Programming Language ◽

R Programming Language ◽

R Programming

Text Mining for the Automatic Classification of Road Accident Reports

Proceedings of the 30th European Safety and Reliability Conference and 15th Probabilistic Safety Assessment and Management Conference ◽

10.3850/978-981-14-8593-0_5850-cd ◽

2020 ◽

Author(s):

Dario Valcamonico ◽

Piero Baraldi ◽

Francesco Amigoni ◽

Enrico Zio

Keyword(s):

Text Mining ◽

Automatic Classification ◽

Road Accident

Studying the Effects of Performing Text Mining to Improve Classification of Clustered Questions based on Bloom Taxonomy

Indian Journal of Science and Technology ◽

10.17485/ijst/2016/v9i28/97818 ◽

2016 ◽

Vol 9 (28) ◽

Author(s):

Nur Suhailayani Suhaimi ◽

Norazam Arbin ◽

Nur Najihah Zulkifli

Keyword(s):

Text Mining ◽

Bloom Taxonomy

Classification of software patches: a text mining approach

Journal of Software Maintenance and Evolution: Research and Practice ◽

10.1002/smr.468 ◽

2011 ◽

Vol 23 (2) ◽

pp. 69-87 ◽

Cited By ~ 3

Author(s):

Uzma Raja ◽

Marietta J. Tretter

Keyword(s):

Text Mining

Combining Text Mining and Data Visualization Techniques to Understand Consumer Experiences of Electronic Cigarettes and Hookah in Online Forums

Online Journal of Public Health Informatics ◽

10.5210/ojphi.v7i1.5783 ◽

2015 ◽

Vol 7 (1) ◽

Cited By ~ 1

Author(s):

Annie T. Chen ◽

Shu-Hong Zhu ◽

Mike Conway

Keyword(s):

Text Mining ◽

Data Visualization ◽

Electronic Cigarettes ◽

Discussion Forums ◽

Online Forums ◽

Textual Data ◽

Consumer Experiences ◽

Visualization Techniques

Our aim in this work is to apply text mining and novel visualization techniques to textual data derived from online health discussion forums in order to better understand consumers' experiences and perceptions of electronic cigarettes and hookah.

Understanding the Behavior of Zadeh’s Extension Principle for One-to-One Functions by R Programming Language

Advances in Intelligent Systems and Computing - Intelligent and Fuzzy Techniques: Smart and Innovative Solutions ◽

10.1007/978-3-030-51156-2_153 ◽

2020 ◽

pp. 1309-1315

Author(s):

Abbas Parchami ◽

Parisa Khalilpoor

Keyword(s):

Programming Language ◽

Extension Principle ◽

R Programming Language ◽

One To One ◽

R Programming
