web text mining
Recently Published Documents


TOTAL DOCUMENTS

41
(FIVE YEARS 7)

H-INDEX

5
(FIVE YEARS 0)

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Zihui Zheng

With the advent of the big data era and the rapid development of the Internet industry, text mining has come to play an indispensable role in natural language processing. Many everyday applications depend on natural language processing technology, such as machine translation, intelligent response, and semantic search. At the same time, with the development of artificial intelligence, text mining has gradually become a research hotspot. There are many ways to realize text mining. This paper mainly describes the realization of web text mining and of a text structure algorithm based on HTML, and compares the clustering time of several web text mining methods. This comparison also shows which web mining method is the most efficient. Repeated experimental comparisons on the WebKB datasets further suggest that web text mining provides a basis for an intelligent detection algorithm for Chinese language logic.


2021 ◽  
Author(s):  
Shu Wang ◽  
Lang Qian ◽  
Yunqiang Zhu ◽  
Jia Song ◽  
Feng Lu ◽  
...  

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Chenhua Zu

This paper adopts Hadoop to build and test a storage and retrieval platform for painting resources. It uses Hadoop as the platform and MapReduce as the computing framework, and stores massive log data in the Hadoop Distributed File System (HDFS), which solves the storage problem of massive data. According to the business requirements, the system is designed to follow the process of web text mining and is divided into four modules: log data preprocessing, log data storage, log data analysis, and log data visualization. The core of the system is the log data analysis module. Statistical analysis covers search keyword ranking, the relationship between Uniform Resource Locators (URLs) and user clicks, URL ranking, and other dimensions. Canopy coarse clustering is performed first on the search keywords, K-means clustering is then applied to the results of Canopy clustering, and cosine similarity is used to group users and build user portraits. The Hadoop development environment is installed and deployed, and functional and performance tests are conducted on the implemented system. The constructed private cloud platform for remote sensing image data can realize online retrieval of remote sensing image metadata and fast download of remote sensing image data and solve the problems in storage, data sharing, and management of remote sensing image data to a certain extent.
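
The Canopy-then-K-means step described in this abstract can be sketched as follows. This is a minimal single-machine illustration, not the paper's MapReduce implementation: the thresholds `t1` and `t2`, the toy keyword-count vectors, and all function names are assumptions, and cosine distance (1 minus cosine similarity) is used in both stages.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    return 1.0 - cosine_similarity(a, b)

def canopy(points, t1, t2):
    """Coarse pass: return a list of (center_index, member_indices).

    t1 is the loose threshold (canopy membership), t2 the tight threshold
    (points this close to a center cannot become centers themselves).
    """
    assert t1 > t2, "the loose threshold must exceed the tight one"
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        c = remaining[0]
        members = [i for i in remaining
                   if cosine_distance(points[c], points[i]) <= t1]
        canopies.append((c, members))
        # Drop the center and everything within the tight threshold.
        remaining = [i for i in remaining
                     if i != c and cosine_distance(points[c], points[i]) > t2]
    return canopies

def kmeans(points, initial_centers, iterations=10):
    """K-means with cosine distance, seeded by the given centers."""
    centers = [list(c) for c in initial_centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by cosine distance.
        labels = [min(range(len(centers)),
                      key=lambda k: cosine_distance(p, centers[k]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy keyword-count vectors for four hypothetical users.
points = [[5, 1, 0], [4, 2, 0], [0, 1, 5], [0, 2, 4]]
canopies = canopy(points, t1=0.5, t2=0.3)
seeds = [points[center] for center, _ in canopies]
labels, centers = kmeans(points, seeds)
print(labels)  # [0, 0, 1, 1]
```

Seeding K-means from the canopy centers removes the need to guess the number of clusters in advance, which is the usual motivation for running the coarse pass first.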


Author(s):  
Canhui Li

Background: To improve the information efficiency in web text mining, filtration is utilized. Methods: A web content mining technology based on web text mining, augmented information support (AIS), is proposed for improving the efficiency of web text mining. Additionally, the AIS technology is applied to the Xiangshan science conference website, and the AIS4XSSC text mining system is developed. The developed system is tested for its efficiency, and its main functions are discussed. Results: 192 documents are represented by 8352 vectors, and 192 × 8352 vectors are obtained; the similarity between the 192 vectors is calculated using the cosine of the included angle, yielding a 192 × 192 symmetric matrix, and 35 categories are formed by hierarchical clustering based on the similarity between texts. Conclusion: The results show that the AIS technology can effectively extract information from a large amount of web texts. The proposed system improves the efficiency of information retrieval and can push valuable information to users.
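
The pipeline in the Results (document vectors, a symmetric cosine similarity matrix, then hierarchical clustering) can be illustrated with a small sketch. It is not the AIS4XSSC implementation: the toy term-count vectors, the average-linkage merge criterion, and the function names are assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine of the included angle between two term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_matrix(vectors):
    """n x n matrix of pairwise cosine similarities (symmetric by construction)."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

def agglomerative(sim, num_clusters):
    """Average-linkage hierarchical clustering on a similarity matrix.

    Starts with one cluster per document and repeatedly merges the two
    clusters with the highest mean pairwise similarity.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Four toy documents over a three-term vocabulary (rows = documents).
docs = [[2, 1, 0], [3, 1, 0], [0, 1, 3], [0, 2, 2]]
sim = similarity_matrix(docs)
clusters = agglomerative(sim, num_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

With 192 documents the same functions would produce a 192 × 192 matrix; the stopping point (here a fixed cluster count) is a design choice, and a similarity cutoff would work equally well.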


Author(s):  
Zahraa Faiz Hussain ◽  
Hind Raad Ibraheem ◽  
Mohammad Alsajri ◽  
Ahmed Hussein Ali ◽  
Mohd Arfian Ismail ◽  
...  

Data mining is the process of detecting patterns in large amounts of data, as part of knowledge discovery. Classification is a form of data analysis that extracts a model describing important data classes. One of the outstanding classification methods in data mining is the support vector machine (SVM). It is capable of predicting results and is generally more effective than other classification methods. The SVM is a well-known supervised machine learning technique that has been applied successfully to a variety of problems, including regression, classification, and clustering, in diverse domains such as gene expression and web text mining. In this study, we propose a new model for classifying the iris data set using an SVM classifier and a genetic algorithm to optimize the c and gamma parameters of a linear SVM; in addition, the principal component analysis (PCA) algorithm is used for feature reduction.


Author(s):  
Huihua He ◽  
Si He ◽  
Yan Li ◽  
...  

Introduction. The current study investigated the characteristics of the parenting needs and questions of Mainland Chinese parents of young children. Specifically, Web text-mining technology was used to identify themes of parenting needs and questions, as well as the parents' emotional status hidden in their question texts. Method. A total of 921,483 questions that parents posted on the top five parenting Websites in China during a 36-month study period were collected. Results. Daily care is one of the most important topics that concerned parents. Contemporary Mainland Chinese parents tend to raise questions about parental knowledge and skills. Different themes of questions could also be identified for different care-givers and different age groups of young children. Conclusions. From a parenting-oriented perspective, contemporary Chinese parents frequently asked personalised questions through the Internet. Considerable needs related to grandparenting emerged. Programme designers and social policy makers should empower and support young children's parents with their parental knowledge, skills and emotional competence.


2018 ◽  
Vol 1 (1) ◽  
pp. 40-49
Author(s):  
Sugiarto Cokrowibowo ◽  
Ismail Majid

There are billions of web documents on the World Wide Web, which keeps growing in volume, velocity, and complexity, and most of their content is by nature unstructured. A technique or tool is needed for extracting text data from a web page that can adapt to both unstructured and semi-structured page content. In this study, the authors propose using the Java library Jsoup to extract web documents and then visualize the results as a word cloud.
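
As a rough analogue of the extraction step, the sketch below pulls visible text out of an HTML string and counts word frequencies, which is the input a word cloud renderer typically needs. It uses Python's standard `html.parser` module rather than Jsoup, and the sample HTML, the `VisibleTextExtractor` class, and the `extract_text` and `word_frequencies` helpers are hypothetical names introduced for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    """Return the text content of an HTML document as one string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

def word_frequencies(text):
    """Lowercase the text and count runs of letters (digits and punctuation are dropped)."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)

sample_html = """<html><head><title>Mining</title><style>p {color: red}</style></head>
<body><h1>Web mining</h1><p>Text mining of web pages.</p>
<script>var x = 1;</script></body></html>"""

text = extract_text(sample_html)
freq = word_frequencies(text)
print(freq.most_common(2))  # [('mining', 3), ('web', 2)]
```

The resulting counts can be passed to any word cloud renderer; real pages would also need stop-word removal, which is omitted here.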


2018 ◽  
Vol 173 ◽  
pp. 03060
Author(s):  
ZHANG Ying

Against the background of the Internet economy and the sharing economy, tourist scenic spots should pay more attention to tourists' online public opinion and work to cultivate online word of mouth. Taking the Wanlu Valley ecotourism area in Guangdong province as an example, the paper collects Baidu index data and uses ROST Content Mining software to mine the post-consumption review texts of five tourist websites: Tongcheng, Ctrip, Grasshopper's Honeycomb, Meituan, and Qunar. By mining the high-frequency characteristic words of the tourist review texts, constructing a social semantic network matrix map, and then jointly analyzing the tourist network attention index and the tourists' evaluation and perception information, the results demonstrate that the characteristics of the scenic spot, service attitude, and tourist facilities are the focuses of tourist evaluation: the number of high-frequency words is large and the degree of praise is high. Therefore, scenic spots should pay attention to the integrated development of the "tourism +" industry, improve service quality, enrich tourism experience projects, and promote the industrial transformation, upgrading, and innovative development of eco-tourism destinations.
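
The high-frequency word and semantic network steps can be illustrated with a small co-occurrence count. This sketch does not reproduce ROST Content Mining: the pre-tokenized sample reviews, the `top_n` cutoff, and the function names are assumptions, and a pair of high-frequency words is counted once for each review in which both appear.

```python
from collections import Counter
from itertools import combinations

def high_frequency_words(reviews, top_n):
    """Return the top_n most frequent words across all tokenized reviews."""
    counts = Counter(word for review in reviews for word in review)
    return [word for word, _ in counts.most_common(top_n)]

def cooccurrence(reviews, vocabulary):
    """Count, for each pair of vocabulary words, the reviews containing both.

    Keys are alphabetically ordered pairs, so (a, b) and (b, a) share one entry.
    """
    vocab = set(vocabulary)
    pairs = Counter()
    for review in reviews:
        present = sorted(set(review) & vocab)
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical reviews, already segmented into words.
reviews = [
    ["scenery", "beautiful", "service", "good"],
    ["service", "good", "facilities", "clean"],
    ["scenery", "beautiful", "facilities"],
]

top_words = high_frequency_words(reviews, top_n=5)
network = cooccurrence(reviews, top_words)
print(network[("beautiful", "scenery")])  # 2
print(network[("good", "service")])       # 2
```

The pair counts form the edge weights of a word network; Chinese review text would first need word segmentation, which is outside this sketch.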

