A Text Mining Technique for Manufacturing Supplier Classification

Author(s):  
Peyman Yazdizadeh ◽  
Farhad Ameri

The web presence of manufacturing suppliers is constantly increasing, and so is the volume of online textual data that pertains to their capabilities. To process this large volume of data and infer new knowledge about supplier capabilities, different text mining techniques such as association rule generation, classification, and clustering can be applied. This paper focuses on the classification of manufacturing suppliers based on the textual descriptions of their capabilities available in their online profiles. A probabilistic technique based on the Naïve Bayes method is adopted and implemented in the R programming language. Casting and CNC machining are used as the example classes of suppliers in this work. The performance of the proposed classifier is evaluated experimentally using standard metrics such as precision, recall, and F-measure. It was observed that improving the precision of the classification process requires a larger training dataset with more relevant terms.
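
As a rough illustration of the approach this abstract describes, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing, followed by precision, recall, and F-measure for one class. It is written in Python rather than the R used in the paper, and it is not the authors' implementation: the supplier descriptions, function names, and whitespace tokenizer are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real profiles need more cleaning).
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Count documents per class and term occurrences per class."""
    class_doc_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for term in tokenize(text):
            term_counts[label][term] += 1
            vocabulary.add(term)
    return class_doc_counts, term_counts, vocabulary


def classify(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_doc_counts, term_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_doc_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_terms = sum(term_counts[label].values())
        for term in tokenize(text):
            if term not in vocabulary:
                continue  # ignore terms never seen in training
            smoothed = (term_counts[label][term] + 1) / (total_terms + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall_f1(true_labels, predicted_labels, positive):
    """Precision, recall, and F-measure for one class treated as positive."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy training set; not data from the paper.
training_docs = [
    ("sand casting of aluminum and iron parts", "casting"),
    ("investment casting and die casting foundry", "casting"),
    ("cnc milling and turning of precision parts", "cnc"),
    ("5-axis cnc machining center milling", "cnc"),
]
model = train_naive_bayes(training_docs)
print(classify(model, "die casting foundry for aluminum"))  # casting
print(classify(model, "cnc turning and milling"))  # cnc
```

With so few training terms, a single shared word can tip the decision, which is consistent with the abstract's observation that a larger training set with more relevant terms is needed to improve precision.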

2020 ◽  
Vol 11 ◽  
Author(s):  
Maria-Theodora Pandi ◽  
Peter J. van der Spek ◽  
Maria Koromina ◽  
George P. Patrinos

Text mining in biomedical literature is an emerging field that has already been applied in many research areas, including genetics, personalized medicine, and pharmacogenomics. In this study, we describe a novel text-mining approach for the extraction of pharmacogenomics associations. The code used toward this end was implemented in the R programming language, either through custom scripts, where needed, or through functions from existing libraries. Articles (abstracts or full texts) that correspond to a specified query were extracted from PubMed, while concept annotations were derived from PubTator Central. Terms that denote a Mutation or a Gene, as well as Chemical terms corresponding to drug compounds, were normalized, and the sentences containing these terms were filtered and preprocessed to create appropriate training sets. Finally, after training and adequate hyperparameter tuning, four text classifiers were created and evaluated (FastText, linear-kernel SVMs, XGBoost, and Lasso and Elastic-Net Regularized Generalized Linear Models) with regard to their performance in identifying pharmacogenomics associations. Although further improvements are essential before this text-mining approach can be properly implemented in clinical practice, our study stands as a comprehensive, simplified, and up-to-date approach for the identification and assessment of research articles enriched in clinically relevant pharmacogenomics relationships. Furthermore, this work highlights a series of challenges concerning the effective application of text mining in biomedical literature, whose resolution could substantially contribute to the further development of this field.
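
The sentence-filtering step mentioned in this abstract might look roughly like the sketch below, written in Python rather than the R used in the study. It assumes each sentence already comes with a list of (mention, concept type) pairs; that input shape, the function name, and the choice to normalize mentions into type placeholders are illustrative assumptions, not the authors' pipeline or the PubTator Central response format.

```python
# Concept types named in the abstract; the grouping into two sets is an assumption.
GENETIC_TYPES = {"Gene", "Mutation"}
DRUG_TYPES = {"Chemical"}


def candidate_sentences(annotated_sentences):
    """Keep sentences that mention both a genetic concept and a drug concept.

    annotated_sentences: iterable of (sentence, [(mention, concept_type), ...]).
    Each kept sentence has its mentions replaced by a type placeholder.
    """
    kept = []
    for sentence, annotations in annotated_sentences:
        types = {concept_type for _, concept_type in annotations}
        if types & GENETIC_TYPES and types & DRUG_TYPES:
            normalized = sentence
            for mention, concept_type in annotations:
                # Hypothetical normalization: swap the mention for its type tag.
                normalized = normalized.replace(mention, f"<{concept_type.upper()}>")
            kept.append(normalized)
    return kept


# Hypothetical annotated input for illustration only.
example = [
    ("CYP2C9 variants alter warfarin dose requirements.",
     [("CYP2C9", "Gene"), ("warfarin", "Chemical")]),
    ("Warfarin is a widely prescribed anticoagulant.",
     [("Warfarin", "Chemical")]),
]
filtered = candidate_sentences(example)
print(filtered)  # ['<GENE> variants alter <CHEMICAL> dose requirements.']
```

Only the first sentence survives because it is the only one that pairs a genetic term with a drug term; sentences of this kind would then be labeled and passed to the classifiers.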


2020 ◽  
Vol 11 (2) ◽  
pp. 66-81
Author(s):  
Badia Klouche ◽  
Sidi Mohamed Benslimane ◽  
Sakina Rim Bennabi

Sentiment analysis is an emerging research area within text mining that concerns the classification of sentiment polarity, particularly given the considerable number of opinions available on social media. The Algerian telephone operator Ooredoo, like other operators, aims in its new strategy to win new customers by exploiting their opinions through sentiment analysis. The purpose of this work is to set up a system called “Ooredoo Rayek”, whose objective is to collect, transliterate, translate, and classify the textual data expressed by the Ooredoo operator's customers. This article develops a set of rules for transliterating Algerian Arabizi into Algerian dialect. Furthermore, the authors use Naïve Bayes (NB) and Support Vector Machine (SVM) classifiers to assign polarity tags to Facebook comments from the official Ooredoo pages, which are written in a multilingual and multi-dialect context. Experimental results show that the system performs well, with 83% accuracy.


2020 ◽  
Vol 4 (2) ◽  
pp. 780-787
Author(s):  
Ibrahim Hassan Hayatu ◽  
Abdullahi Mohammed ◽  
Barroon Ahmad Isma’eel ◽  
Sahabi Yusuf Ali

Soil fertility governs plant development, which in turn underpins food sufficiency and the security of lives and property through bumper harvests. The fertility of soil varies by region, thereby determining the type of crops to be planted. However, there is no repository or other source of information about soil fertility for any region in Nigeria, especially the Northwest of the country. The only available information is soil samples with their attributes, which give little or no information to the average farmer. This has affected crop yield in all regions, particularly the Northwest, resulting in lower food production. Therefore, this study aims to classify soil data by fertility in the Northwest region of Nigeria using R programming. Data were obtained from the Department of Soil Science at Ahmadu Bello University, Zaria. The data comprise 400 soil samples with 13 attributes. The relationship between soil attributes was observed based on the data. The K-means clustering algorithm was employed to analyze soil fertility clusters. Four clusters were identified, with cluster 1 having the highest fertility, followed by cluster 2, and with fertility decreasing as the cluster number increases. The identification of the most fertile clusters will guide farmers on where best to concentrate their planting in order to improve productivity and crop yield.
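
For readers unfamiliar with the clustering step, here is a minimal Python sketch of K-means (Lloyd's algorithm); the study itself used R. The two-attribute samples, the choice of k = 2, and the deterministic initialization from the first k points are assumptions made to keep the example small and reproducible. The study worked with 13 attributes and four clusters.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm. Returns (assignments, centroids)."""
    # Assumption: seed centroids with the first k points so results are repeatable.
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical samples: (attribute A, attribute B). Not data from the study.
samples = [
    (0.2, 5.1), (2.9, 6.8), (0.3, 5.0),
    (3.1, 6.9), (0.25, 5.2), (3.0, 7.0),
]
assignments, centroids = kmeans(samples, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

In practice, attributes measured on different scales are usually standardized before clustering, since K-means relies on Euclidean distance and a large-valued attribute would otherwise dominate.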


Author(s):  
Katherine Darveau ◽  
Daniel Hannon ◽  
Chad Foster

There is growing interest in the study and practice of applying data science (DS) and machine learning (ML) to automate decision making in safety-critical industries. As an alternative or augmentation to human review, there are opportunities to explore these methods for classifying aviation operational events by root cause. This study seeks to apply a thoughtful approach to design, compare, and combine rule-based and ML techniques to classify events caused by human error in aircraft/engine assembly, maintenance or operation. Event reports contain a combination of continuous parameters, unstructured text entries, and categorical selections. A Human Factors approach to classifier development prioritizes the evaluation of distinct data features and entry methods to improve modeling. Findings, including the performance of tested models, led to recommendations for the design of textual data collection systems and classification approaches.


2011 ◽  
Vol 23 (2) ◽  
pp. 69-87 ◽  
Author(s):  
Uzma Raja ◽  
Marietta J. Tretter

Author(s):  
Annie T. Chen ◽  
Shu-Hong Zhu ◽  
Mike Conway

Our aim in this work is to apply text mining and novel visualization techniques to textual data derived from online health discussion forums in order to better understand consumers' experiences and perceptions of electronic cigarettes and hookah.

