Data Mining

Author(s):  
Martin Atzmueller

Data mining provides approaches for identifying and discovering non-trivial patterns and models hidden in large collections of data. In applied natural language processing, data mining usually requires preprocessed data that has been extracted from textual documents, and this data is often integrated with other data sources. This chapter provides an overview of data mining, focusing on approaches for pattern mining, cluster analysis, and predictive model construction. For each of these, we discuss exemplary techniques that are especially useful in the applied natural language processing context. We also describe how the presented data mining approaches relate to text mining, text classification, and clustering, and discuss interesting problems and future research directions.
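As a rough illustration of pattern mining on text, the sketch below mines frequent term sets from documents treated as sets of tokens, using a level-wise (Apriori-style) search. It is a generic example, not a technique taken from the chapter. The function name `frequent_itemsets`, the whitespace tokenization, the toy documents and the thresholds are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Level-wise (Apriori-style) frequent itemset mining (illustrative sketch).

    transactions: list of sets (here: the distinct terms of each document)
    min_support: minimum number of transactions an itemset must occur in
    Returns a dict mapping frozenset itemsets to their support counts.
    """
    result = {}
    # Level 1: count individual items across all transactions.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    k = 1
    while current:
        result.update(current)
        if k == max_size:
            break
        k += 1
        # Candidate generation: unions of two frequent (k-1)-itemsets of size k,
        # kept only if every (k-1)-subset is itself frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Support counting: one pass over the transactions per candidate.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
    return result


# Hypothetical toy corpus; each document becomes the set of its tokens.
docs = [
    "data mining finds patterns in text",
    "text mining applies data mining to text",
    "clustering groups similar text documents",
    "pattern mining and data mining",
]
transactions = [set(d.split()) for d in docs]
freq = frequent_itemsets(transactions, min_support=2)
for itemset, support in sorted(freq.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), support)
```

In a real NLP pipeline the token sets would come from the preprocessing step the abstract mentions (for example, lemmatized and stop-word-filtered terms) rather than a plain `split()`.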

2000, Vol 6 (2), pp. 163-181
Author(s):  
QIANG ZHOU, FUJI REN

In this paper, we propose a new ambiguity representation scheme, the Structure Preference Relation (SPR), which captures useful quantitative distribution information about ambiguous structures. We introduce two automatic acquisition algorithms, one that acquires SPRs from a treebank and one that acquires them from raw text, and give experimental results that demonstrate the feasibility of the algorithms. Finally, we introduce some SPR applications in linguistics and natural language processing, such as preference-based parsing and the discovery of representative ambiguous structures, and propose some future research directions.


Author(s):  
Shaoxiang Chen, Ting Yao, Yu-Gang Jiang

Deep learning has recently achieved great success in solving specific artificial intelligence problems, and substantial progress has been made in Computer Vision (CV) and Natural Language Processing (NLP). As a bridge between vision and language, video captioning is the task of producing a natural-language utterance (usually a sentence) that describes the visual content of a video. The task naturally decomposes into two sub-tasks. The first is to encode the video, learning a visual representation from a thorough understanding of its content. The second is caption generation, which decodes the learned representation into a sentence, word by word. In this survey, we first formulate the problem of video captioning, then review state-of-the-art methods categorized by their emphasis on vision or language, followed by a summary of standard datasets and representative approaches. Finally, we highlight challenges in this task that are not yet fully understood and present future research directions.


Author(s):  
Constanţa-Nicoleta Bodea, Maria-Iuliana Dascalu, Radu Ioan Mogos, Stelian Stancu

The growth of technology-enhanced education has transformed education into a data-intensive domain. As in many other data-intensive domains, interest in data analysis through various analytics is growing. The article starts by defining learning analytics (LA), with relevant views from the literature. The background section includes a discussion of the relationships between LA, educational data mining, and academic analytics. The main section describes learning analytics as an emerging trend in educational systems, discussing the main issues, controversies, and problems in this area. The final part of the article presents future research directions and the conclusion.


2008, pp. 849-879
Author(s):  
Dan A. Simovici

This chapter presents data mining techniques that use metrics defined on the set of partitions of finite sets. Partitions are naturally associated with object attributes, and major data mining problems such as classification, clustering, and data preparation benefit from an algebraic and geometric study of the metric space of partitions. The metrics we find most useful are derived from a generalization of the entropic metric. We discuss techniques that produce smaller classifiers, allow incremental clustering of categorical data, and help users better prepare training data for constructing classifiers. Finally, we discuss open problems and future research directions.
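To make the idea of a metric on partitions concrete, the sketch below computes the classical Shannon-entropy form of the entropic metric between the partitions induced by two attributes, d(π, σ) = H(π|σ) + H(σ|π) = 2H(π ∧ σ) − H(π) − H(σ), where π ∧ σ is the partition whose blocks are the non-empty intersections of blocks of π and σ. It illustrates only this base case; the generalization the chapter builds on is not reproduced here. The function names and the toy attributes are hypothetical.

```python
from collections import Counter
from math import log2


def partition_entropy(labels):
    """Shannon entropy (in bits) of the partition induced by a list of block labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def entropic_distance(a, b):
    """Entropic metric d(pi, sigma) = 2*H(pi ^ sigma) - H(pi) - H(sigma).

    a, b: values of two attributes for the same objects, in the same order.
    Each attribute partitions the objects into blocks of equal value.
    """
    if len(a) != len(b):
        raise ValueError("both attributes must describe the same objects")
    # Blocks of the meet pi ^ sigma are identified by (value_a, value_b) pairs.
    meet = list(zip(a, b))
    return 2 * partition_entropy(meet) - partition_entropy(a) - partition_entropy(b)


# Hypothetical attributes over four objects.
color = ["red", "red", "blue", "blue"]
size = ["S", "S", "L", "L"]   # induces the same partition as color
shape = ["a", "b", "a", "b"]  # cuts across the blocks of color

print(entropic_distance(color, size))   # identical partitions
print(entropic_distance(color, shape))  # maximally "crossing" partitions
```

Attributes that induce the same partition are at distance 0 even though their values differ, which is why such metrics are useful for spotting redundant attributes before building a classifier.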


Author(s):  
Md Mahbubur Rahim, Maryam Jabberzadeh, Nergiz Ilhan

E-procurement systems that have been in place for over a decade have begun incorporating digital tools such as big data, cloud computing, the internet of things, and data mining. Hence, there is a rich literature on both earlier e-procurement systems and advanced, digitally enabled e-procurement systems. This literature addresses many research issues (e.g., adoption) associated with e-procurement. However, one critical issue that has so far received no rigorous attention is the “unit of analysis,” an important methodological concern in the e-procurement research context. Accordingly, the aim of this chapter is twofold: 1) to discuss how the notion of “unit of analysis” has been conceptualised in the e-procurement literature and 2) to discuss how e-procurement scholars have justified its use in addressing the research issues under investigation. Finally, the chapter provides several interesting findings and outlines future research directions.


2022, pp. 1477-1503
Author(s):  
Ali Al Mazari

HIV/AIDS big data analytics evolved as a potential initiative connecting three major scientific disciplines: (1) the emergence and evolution of HIV biology; (2) the complex clinical and medical problems and practices associated with the infections and diseases; and (3) the computational methods for mining HIV/AIDS biological, medical, and clinical big data. This chapter reviews the computational and data mining perspectives on HIV/AIDS in the big data era. The chapter focuses on the research opportunities in this domain, identifies the challenges facing the development of big data analytics in the HIV/AIDS domain, and then highlights future research directions for big data in the healthcare sector.


2020, pp. 205-228
Author(s):  
George A. Khachatryan

Instruction modeling is still in its early stages. This chapter discusses promising directions in which instruction modeling could develop in the coming years. These include increasing the richness of interfaces used in instruction modeling programs (e.g., by allowing students to enter responses in free form and having them graded via natural language processing); applying instruction modeling to subjects beyond mathematics, including English, foreign languages, and science; using educational data mining to create automated “coaches” that help teachers better implement instruction modeling programs in their classrooms; creating approaches to instruction modeling that allow for rapid authorship of content; redesigning schools (in schedules as well as architecture) to optimize the use of instruction modeling; and putting in place government policies to encourage the use of comprehensive blended learning programs (such as those developed through instruction modeling).

