Text Complexity Classification Data Mining Model Based on Dynamic Quantitative Relationship between Modality and English Context

With the rapid development of mobile internet technology, dynamic data now contain a large amount of unstructured data, such as text and multimedia data, so these unstructured data must be analyzed and processed to obtain potentially valuable information. This article begins with a theoretical study of text complexity analysis. It examines the sources of text complexity and its five characteristics (dynamism, complexity, concealment, sentiment, and ambiguity) in light of how users express their needs in the network environment. Second, following the text mining process of data collection, data processing, and data visualization, it proposes subdividing user demand analysis into three stages (text complexity acquisition, recognition, and expression), yielding a text complexity analysis based on text mining technology. Next, drawing on computational linguistics and mathematical-statistical analysis together with machine learning and information retrieval, text in any format is converted into a representation that can be used for machine learning, and patterns or knowledge are derived from that representation. Then, by comparing text mining techniques and combining them with a hierarchical structure model of text complexity analysis, the article proposes a quantitative-relationship complexity analysis framework based on text mining technology, implemented using web crawler technology. Experimental results show that the collected quantitative relationship information can be identified and expressed so that it is converted into product features. Integrating market data with text data helps improve model performance, and the use of text data further improves prediction accuracy.
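
The step of converting "text in any format" into a representation usable for machine learning is commonly done with a bag-of-words or TF-IDF encoding. The abstract does not say which representation the authors use, so the sketch below simply assumes TF-IDF; the function names tokenize and tfidf_vectors and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Turn raw documents into TF-IDF vectors over a shared, sorted vocabulary.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


if __name__ == "__main__":
    # Hypothetical product-review snippets, for illustration only.
    docs = [
        "The battery life is great",
        "The screen is too dim",
        "Great screen, great price",
    ]
    vocab, vectors = tfidf_vectors(docs)
    print(vocab)
    for doc, vec in zip(docs, vectors):
        print(doc, "->", [round(v, 3) for v in vec])
```

Terms that appear in every document get an IDF of zero, so the resulting vectors emphasize words that distinguish one text from another; these vectors can then be fed to any standard classifier or clustering method.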

CyberCan: A New Dictionary for Cantonese Social Media Text Segmentation

10.31235/osf.io/tyjr7 ◽

2021 ◽

Author(s):

Fei Shen ◽

Wenting Yu ◽

Chen Min ◽

Qianying Ye ◽

Chuanli Xia ◽

...

Keyword(s):

Social Media ◽

Text Mining ◽

Word Segmentation ◽

Unstructured Data ◽

Text Segmentation ◽

Chinese Word ◽

Chinese Word Segmentation ◽

Text Data ◽

Social Media Text

Text mining has become a dominant approach to extracting useful information from massive volumes of unstructured online data. However, existing tools for Chinese word segmentation are not well suited to processing Cantonese social media text. This project developed CyberCan (https://github.com/shenfei1010/CyberCan), a lexicon of contemporary Cantonese built from more than 100 million internet texts. We compared the word segmentation performance of CyberCan with that of existing Mandarin and Cantonese lexicons. The findings suggest that CyberCan outperforms all existing lexicons by a considerable margin.
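
To illustrate why the lexicon matters for segmentation, the sketch below uses forward maximum matching (FMM), one classic lexicon-driven segmentation method, and scores the output with word-level F1 against a gold segmentation. Both choices are assumptions: the abstract does not say which segmenter or metric the comparison used. The toy lexicons are illustrative and are not CyberCan entries; the function names are hypothetical.

```python
def fmm_segment(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest lexicon word.

    Characters not covered by any lexicon entry are emitted as single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def to_spans(tokens: list[str]) -> set[tuple[int, int]]:
    """Convert a token sequence into (start, end) character offsets."""
    spans, start = set(), 0
    for tok in tokens:
        spans.add((start, start + len(tok)))
        start += len(tok)
    return spans


def segmentation_f1(pred: list[str], gold: list[str]) -> float:
    """Word-level F1: a predicted word counts only if both its boundaries match gold."""
    p, g = to_spans(pred), to_spans(gold)
    correct = len(p & g)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "我哋今日去飲茶"  # "we are going to yum cha today"
    gold = ["我哋", "今日", "去", "飲茶"]
    cantonese_lexicon = {"我哋", "今日", "去", "飲茶"}  # hypothetical entries
    lexicon_without_pronoun = {"我", "今日", "飲茶"}  # lacks Cantonese 我哋
    for name, lex in [("with 我哋", cantonese_lexicon), ("without 我哋", lexicon_without_pronoun)]:
        pred = fmm_segment(text, lex)
        print(name, pred, round(segmentation_f1(pred, gold), 3))
```

A lexicon missing the Cantonese pronoun 我哋 splits it into 我 / 哋, which costs both precision and recall; this is the kind of error a Cantonese-specific lexicon is meant to avoid.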

Psychometric and Validity Issues in Machine Learning Approaches to Personality Assessment: A Focus on Social Media Text Mining

European Journal of Personality ◽

10.1002/per.2290 ◽

2020 ◽

Vol 34 (5) ◽

pp. 826-844 ◽

Cited By ~ 1

Author(s):

Louis Tay ◽

Sang Eun Woo ◽

Louis Hickman ◽

Rachel M. Saef

Keyword(s):

Machine Learning ◽

Social Media ◽

Text Mining ◽

Personality Assessment ◽

Ground Truth ◽

Psychometric Validation ◽

Learning Approaches ◽

Text Data ◽

Personality Psychology ◽

Social Media Text

In the age of big data, substantial research is now moving toward using digital footprints like social media text data to assess personality. Nevertheless, there are concerns and questions regarding the psychometric and validity evidence of such approaches. We seek to address this issue by focusing on social media text data and (i) conducting a review of psychometric validation efforts in social media text mining (SMTM) for personality assessment and discussing additional work that needs to be done; (ii) considering additional validity issues from the standpoint of reference (i.e. ‘ground truth’) and causality (i.e. how personality determines variations in scores derived from SMTM); and (iii) discussing the unique issues of generalizability when validating SMTM for personality assessment across different social media platforms and populations. In doing so, we explicate the key validity and validation issues that need to be considered as a field to advance SMTM for personality assessment, and, more generally, machine learning personality assessment methods. © 2020 European Association of Personality Psychology
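
One standard way to gather the validity evidence discussed above is a convergent/discriminant check. Here, trait scores derived from social media text should correlate more strongly with a questionnaire measure of the same trait than with a measure of a different trait. The sketch assumes per-person scores are already available. The helper name pearson_r is hypothetical, and the scores in the usage example are invented for illustration.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list has zero variance")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented scores for five people, for illustration only.
    smtm_extraversion = [2.5, 2.9, 4.2, 4.8, 3.1]  # predicted from social media text
    self_extraversion = [2.0, 3.0, 4.0, 5.0, 3.5]  # questionnaire, same trait
    self_conscientiousness = [4.0, 2.0, 5.0, 3.0, 2.5]  # questionnaire, different trait
    print("convergent r:", round(pearson_r(smtm_extraversion, self_extraversion), 3))
    print("discriminant r:", round(pearson_r(smtm_extraversion, self_conscientiousness), 3))
```

A high convergent correlation on its own does not settle the "ground truth" and causality questions raised above. It is one piece of evidence that would need to be replicated across platforms and populations.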

An Efficient Algorithm for Text Mining in Business Intelligence using Machine Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.d2119.049620 ◽

2020 ◽

Vol 9 (6) ◽

pp. 772-774

Keyword(s):

Machine Learning ◽

Social Network ◽

Text Mining ◽

Business Intelligence ◽

Efficient Algorithm ◽

Social Network Sites ◽

Unstructured Data ◽

Efficient Approach ◽

Unstructured Text

Data plays an important role in the success of any organization, so organizations require more data to make decisions about planning and improvement. Of the data that an organization generates, 80 to 90 percent is unstructured. Text mining is the process of retrieving interesting and previously unknown information from unstructured text. Social network sites also generate huge amounts of data; these data can reveal people's behavior and opinions, but analyzing them is a difficult task. This paper proposes an efficient approach to text mining using machine learning.

Assessment of Congruence of Unstructured Data Using Text Mining Technology

10.1109/cbi52690.2021.10067 ◽

2021 ◽

Author(s):

Denis Kovtun

Keyword(s):

Text Mining ◽

Unstructured Data ◽

Mining Technology

An Analysis on Text Mining Techniques for Smart Literature Review

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/1121022021 ◽

2021 ◽

Vol 10 (2) ◽

pp. 1284-1288

Keyword(s):

Text Mining ◽

Review Paper ◽

Evaluation Criteria ◽

Unstructured Data ◽

Future Research ◽

Literature Analysis ◽

Research Papers ◽

Text Data ◽

Web Technologies ◽

Analysis Process

With the development of web technologies, databases, social networks, and similar platforms, a large amount of text data is generated every day. Most of the data on the internet is unstructured, and this unstructured data can provide valuable knowledge; text mining techniques are widely used to extract it. Large numbers of research papers are published in journals and conferences every day. These papers are very valuable for future research and investigation and act as a source of future innovation. Researchers write review papers to provide up-to-date knowledge about a specific field, but review papers cover a limited number of papers and involve reading each one manually. Because of the volume of research published each day, researchers cannot go through every paper to keep up with their field of interest. Various text mining techniques have therefore been used to automate literature analysis. This paper reviews the text mining techniques used in automatic literature analysis. We collected papers that applied text mining techniques to previous literature to extract valuable knowledge. This review presents an overview of these techniques, their evaluation criteria, and their limitations and challenges in exploring literature to identify research trends.
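
A very simple form of the trend analysis described here tracks, for each publication year, the share of papers whose text mentions a given term. The review does not prescribe this method. The sketch below assumes papers are available as (year, text) pairs and that terms are lowercase single words; the function name term_trends and the sample records are hypothetical.

```python
import re
from collections import Counter, defaultdict


def term_trends(papers: list[tuple[int, str]], terms: set[str]) -> dict[str, dict[int, float]]:
    """Share of papers per year whose text mentions each term.

    Matching is on lowercase word tokens, so terms must be lowercase single words.
    """
    papers_per_year = Counter(year for year, _ in papers)
    hits = defaultdict(Counter)  # term -> year -> number of papers mentioning it
    for year, text in papers:
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        for term in terms & tokens:
            hits[term][year] += 1
    return {
        term: {year: hits[term][year] / n for year, n in sorted(papers_per_year.items())}
        for term in sorted(terms)
    }


if __name__ == "__main__":
    # Hypothetical (year, title) records, for illustration only.
    papers = [
        (2019, "Topic models for product reviews"),
        (2019, "LDA topic modeling of abstracts"),
        (2020, "BERT embeddings for citation analysis"),
        (2020, "Topic drift detection with BERT"),
    ]
    for term, by_year in term_trends(papers, {"topic", "bert"}).items():
        print(term, by_year)
```

Normalizing by the number of papers per year keeps the trend from simply reflecting growth in publication volume.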

Web Service Architectures for Text Mining

Handbook of Research on Text and Web Mining Technologies ◽

10.4018/978-1-59904-990-8.ch047 ◽

2010 ◽

pp. 822-839

Author(s):

Neil Davis

Keyword(s):

Text Mining ◽

Web Services ◽

Web Service ◽

Scientific Literature ◽

Unstructured Data ◽

Scientific Publishing ◽

End User ◽

Mining Technology ◽

Pubmed Database ◽

Research Scientists

Text mining technology can be used to assist in finding relevant or novel information in large volumes of unstructured data, such as that which is increasingly available in the electronic scientific literature. However, publishers are not text mining specialists, nor typically are the end-user scientists who consume their products. This situation suggests a Web services based solution, where text mining specialists process the literature obtained from publishers and make their results available to remote consumers (research scientists). In this chapter we discuss the integration of Web services and text mining within the domain of scientific publishing and explore the strengths and weaknesses of three generic architectural designs for delivering text mining Web services. We argue for the superiority of one of these and demonstrate its viability by reference to an application designed to provide access to the results of text mining over the PubMed database of scientific abstracts.

Classification of Public Report Data on the Online Public Aspiration and Complaint Service Portal (Lapor!) Using the Naïve Bayes Classification Method

STATISTIKA: Journal of Theoretical Statistics and Its Applications ◽

10.29313/jstat.v18i1.3872 ◽

2018 ◽

Vol 18 (1) ◽

pp. 11-20

Author(s):

Achmad Kurniansyah Thalib

Keyword(s):

Machine Learning ◽

Text Mining ◽

Naive Bayes ◽

Naïve Bayes ◽

Unstructured Data ◽

Irian Jaya

The Online Public Aspiration and Complaint Service (Layanan Aspirasi dan Pengaduan Online Rakyat, LAPOR!) is one of the programs launched by the government to gather as much information as possible from the public in the form of criticism and suggestions. Public reports in the health sector, which consist of unstructured text data, are classified into three classes (Aspiration, Complaint, and Question) using a machine learning method, Naïve Bayes. From January 2013 to December 2015, 87,492 public reports entered the LAPOR! system: 32,047 (about 37%) had not yet been responded to, 8,072 (about 9%) were in the process of being responded to, and the remaining 47,373 (54%) had been responded to and declared complete. The largest numbers of reports came from DKI Jakarta province and from Java island as a whole. The province contributing the most reports was DKI Jakarta with 25,129 reports, followed by West Java with 15,445, East Java with 6,106, Central Java with 5,818, and so on. The provinces that submitted the fewest reports were Papua, Maluku, North Maluku, West Sulawesi, West Irian Jaya, and Gorontalo, each with fewer than 100 reports. The classification results are then analyzed using text mining, whose main idea is to explore and extract as broadly as possible from a very large and continuously growing body of data in order to discover facts and information that are important and useful for a variety of purposes. The classification achieved an accuracy of 96.67%.
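
The abstract names Naïve Bayes but not its variant or feature set. The sketch below assumes a multinomial Naïve Bayes over word counts with add-one (Laplace) smoothing, which is one common choice for text. The six training reports are invented English stand-ins for the three classes, not LAPOR! data, and the class and variable names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def log_posterior(self, text: str, label: str) -> float:
        """Unnormalized log P(label | text) = log P(label) + sum of log P(word | label)."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.class_counts[label] / self.n_docs)
        for word in tokenize(text):
            if word in self.vocab:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text: str) -> str:
        return max(self.class_counts, key=lambda label: self.log_posterior(text, label))


# Invented training reports, for illustration only.
TRAIN_TEXTS = [
    "please add more health clinics in rural areas",
    "we hope the hospital opens a night service",
    "the hospital staff were rude and slow",
    "medicine was out of stock again at the clinic",
    "how do I register for national health insurance",
    "when will the vaccination schedule be announced",
]
TRAIN_LABELS = ["Aspiration", "Aspiration", "Complaint", "Complaint", "Question", "Question"]

if __name__ == "__main__":
    model = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
    for report in ["how do I get health insurance", "the staff at the clinic were rude"]:
        print(report, "->", model.predict(report))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and smoothing keeps an unseen word-class pair from zeroing out a class entirely.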

Email Spams via Text Mining using Machine Learning Techniques

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.d1915.029420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2535-2539

Keyword(s):

Machine Learning ◽

Text Mining ◽

Daily Basis ◽

Unstructured Data ◽

Machine Learning Techniques ◽

Spam Detection ◽

High Quality ◽

Quality Of Information ◽

Learning Techniques

A large amount of potentially useful data is generated on a daily basis. This data is generally unstructured and ambiguous, which makes it hard to draw meaning from. High-quality information can be extracted from it, typically by identifying patterns and trends. This is done using text mining, which involves parsing the unstructured data, processing it, and then uncovering the meaningful and interesting information hidden in it. This paper presents machine learning techniques for text mining that are useful for detecting spam in emails.
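
The abstract does not name the specific machine learning techniques it covers. As one simple illustration of the parse, process, and classify pipeline it describes, the sketch below trains a perceptron on binary bag-of-words features. The perceptron is an assumption chosen for brevity, not the paper's method. The four emails and all function names are hypothetical.

```python
import re
from collections import defaultdict


def features(text: str) -> set[str]:
    """Binary bag-of-words features: the set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def train_perceptron(emails: list[str], labels: list[bool], epochs: int = 10) -> tuple[dict[str, float], float]:
    """Train a perceptron and return (weights, bias). True/+1 = spam, False/-1 = ham."""
    weights: dict[str, float] = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for text, spam in zip(emails, labels):
            y = 1 if spam else -1
            feats = features(text)
            score = bias + sum(weights[f] for f in feats)
            if y * score <= 0:  # misclassified (or on the boundary): update
                for f in feats:
                    weights[f] += y
                bias += y
                mistakes += 1
        if mistakes == 0:  # every training email is classified correctly
            break
    return weights, bias


def predict_spam(text: str, weights: dict[str, float], bias: float) -> bool:
    return bias + sum(weights.get(f, 0.0) for f in features(text)) > 0


# Invented training emails, for illustration only.
TRAIN_EMAILS = [
    "win free money now",
    "claim your free prize now",
    "meeting agenda for monday",
    "lunch with the team tomorrow",
]
TRAIN_LABELS = [True, True, False, False]

if __name__ == "__main__":
    w, b = train_perceptron(TRAIN_EMAILS, TRAIN_LABELS)
    for email in ["free money now", "team meeting tomorrow"]:
        print(email, "->", "spam" if predict_spam(email, w, b) else "ham")
```

On linearly separable data the perceptron is guaranteed to converge. Real spam filters typically use richer features and probabilistic or margin-based classifiers, and they report precision and recall because false positives (legitimate mail marked as spam) are costly.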
