Soft Computing for XML Data Mining

Efficient tools and algorithms for knowledge discovery in large data sets have been devised during the recent years. These methods exploit the capability of computers to search huge amounts of data in a fast and effective manner. However, the data to be analyzed is imprecise and afflicted with uncertainty. In the case of heterogeneous data sources such as text, audio and video, the data might moreover be ambiguous and partly conflicting. Besides, patterns and relationships of interest are usually vague and approximate. Thus, in order to make the information mining process more robust or say, human-like methods for searching and learning it requires tolerance towards imprecision, uncertainty and exceptions. Thus, they have approximate reasoning capabilities and are capable of handling partial truth. Properties of the aforementioned kind are typical soft computing. Soft computing techniques like Genetic Algorithms (GA), Artificial Neural Networks, Fuzzy Logic, Rough Sets and Support Vector Machines (SVM) when used in combination was found to be effective. Therefore, soft computing algorithms are used to accomplish data mining across different applications (Mitra S, Pal S K & Mitra P, 2002; Alex A Freitas, 2002). Extensible Markup Language (XML) is emerging as a de facto standard for information exchange among various applications of World Wide Web due to XML’s inherent data self-describing capacity and flexibility of organizing data. In XML representation, the semantics are associated with the contents of the document by making use of self describing tags which can be defined by the users. Hence XML can be used as a medium for interoperability over the Internet. With these advantages, the amount of data that is being published on the Web in the form of XML is growing enormously and many naïve users find the need to search over large XML document collections (Gang Gou & Rada Chirkova, 2007; Luk R et al., 2000).

Download Full-text

PCA for heterogeneous data sets in a distributed data mining

Proceedings of the Fourth Annual ACM Bangalore Conference on - COMPUTE '11 ◽

10.1145/1980422.1980451 ◽

2011 ◽

Author(s):

E. Chandra ◽

P. Ajitha

Keyword(s):

Data Mining ◽

Heterogeneous Data ◽

Distributed Data Mining ◽

Data Sets ◽

Distributed Data

Download Full-text

Outlier data Mining of large Data Sets relying on fast decomposition simulated annealing algorithm

10.1109/icris52159.2020.00170 ◽

2020 ◽

Author(s):

Wenjie Jia ◽

Zhihong He

Keyword(s):

Data Mining ◽

Simulated Annealing ◽

Simulated Annealing Algorithm ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Annealing Algorithm ◽

Outlier Data ◽

Fast Decomposition

Download Full-text

Knowledge Discovery in Large Data Sets: A Primer for Data Mining Applications in Health Care

Health Informatics - Nursing Informatics ◽

10.1007/978-1-4757-3252-8_10 ◽

2000 ◽

pp. 139-148 ◽

Cited By ~ 2

Author(s):

Patricia A. Abbott

Keyword(s):

Data Mining ◽

Health Care ◽

Knowledge Discovery ◽

Large Data ◽

Large Data Sets ◽

Data Sets

Download Full-text

Performance Evaluation of Request and Response Time for Audio and Video Data Sets

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-135 ◽

2019 ◽

pp. 133-136

Author(s):

Dr. Manish L Jivtode

Keyword(s):

Web Services ◽

Web Service ◽

Information Exchange ◽

Video Data ◽

Data Sets ◽

Representational State Transfer ◽

State Transfer ◽

Extensible Markup ◽

Communications Protocol ◽

Audio Video

Web services are applications that allow for communication between devices over the internet and are independent of the technology. The devices are built and use standardized eXtensible Markup Language (XML) for information exchange. A client or user is able to invoke a web service by sending an XML message and then gets back and XML response message. There are a number of communication protocols for web services that use the XML format such as Web Services Flow Language (WSFL), Blocks Extensible Exchange Protocol(BEEP) etc. Simple Object Access Protocol (SOAP) and Representational State Transfer (REST) are used options for accessing web services. It is not directly comparable that SOAP is a communications protocol while REST is a set of architectural principles for data transmission. In this paper, the data size of 1KB, 2KB, 4KB, 8KB and 16KB were tested each for Audio, Video and result obtained for CRUD methods. The encryption and decryption timings in milliseconds/seconds were recorded by programming extensibility points of a WCF REST web service in the Azure cloud..

Download Full-text

Detection of FAKE NEWS on SOCIAL MEDIA using CLASSIFICATION Data Mining Techniques

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1637.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 3132-3138

Keyword(s):

Machine Learning ◽

Data Mining ◽

Social Media ◽

Information Exchange ◽

Learning Algorithm ◽

Daily Life ◽

Support Vector ◽

Machine Learning Algorithm ◽

Fake News ◽

Other Information

In today’s world social media is one of the most important tool for communication that helps people to interact with each other and share their thoughts, knowledge or any other information. Some of the most popular social media websites are Facebook, Twitter, Whatsapp and Wechat etc. Since, it has a large impact on people’s daily life it can be used a source for any fake or misinformation. So it is important that any information presented on social media should be evaluated for its genuineness and originality in terms of the probability of correctness and reliability to trust the information exchange. In this work we have identified the features that can be helpful in predicting whether a given Tweet is Rumor or Information. Two machine learning algorithm are executed using WEKA tool for the classification that is Decision Tree and Support Vector Machine.

Download Full-text

Data Mining: A Bagged Decision Tree Classifier Algorithm For Ids Intrusion Detection System Based Attacks Classification

Design Engineering ◽

10.17762/de.v2021i04.1800 ◽

2021 ◽

pp. 1826-1839

Author(s):

Sandeep Adhikari, Dr. Sunita Chaudhary

Keyword(s):

Data Mining ◽

Intrusion Detection ◽

Decision Tree ◽

Intrusion Detection System ◽

Detection System ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Decision Tree Classifier ◽

Tree Classifier

The exponential growth in the use of computers over networks, as well as the proliferation of applications that operate on different platforms, has drawn attention to network security. This paradigm takes advantage of security flaws in all operating systems that are both technically difficult and costly to fix. As a result, intrusion is used as a key to worldwide a computer resource's credibility, availability, and confidentiality. The Intrusion Detection System (IDS) is critical in detecting network anomalies and attacks. In this paper, the data mining principle is combined with IDS to efficiently and quickly identify important, secret data of interest to the user. The proposed algorithm addresses four issues: data classification, high levels of human interaction, lack of labeled data, and the effectiveness of distributed denial of service attacks. We're also working on a decision tree classifier that has a variety of parameters. The previous algorithm classified IDS up to 90% of the time and was not appropriate for large data sets. Our proposed algorithm was designed to accurately classify large data sets. Aside from that, we quantify a few more decision tree classifier parameters.

Download Full-text

Folksonomy

Virtual Communities ◽

10.4018/978-1-60960-100-3.ch313 ◽

2011 ◽

pp. 877-891

Author(s):

Katrin Weller ◽

Isabella Peters ◽

Wolfgang G. Stock

Keyword(s):

Large Data ◽

Document Retrieval ◽

Point Of View ◽

Knowledge Organization ◽

Data Sets ◽

Document Collections ◽

User Input ◽

Indexing Method ◽

Indexing And Retrieval ◽

Locating Information

This chapter discusses folksonomies as a novel way of indexing documents and locating information based on user generated keywords. Folksonomies are considered from the point of view of knowledge organization and representation in the context of user collaboration within the Web 2.0 environments. Folksonomies provide multiple benefits which make them a useful indexing method in various contexts; however, they also have a number of shortcomings that may hamper precise or exhaustive document retrieval. The position maintained is that folksonomies are a valuable addition to the traditional spectrum of knowledge organization methods since they facilitate user input, stimulate active language use and timeliness, create opportunities for processing large data sets, and allow new ways of social navigation within document collections. Applications of folksonomies as well as recommendations for effective information indexing and retrieval are discussed.

Download Full-text

Bibliomining for Library Decision-Making

Encyclopedia of Information Science and Technology, Second Edition ◽

10.4018/978-1-60566-026-4.ch058 ◽

2011 ◽

pp. 341-345

Author(s):

Scott Nicholson ◽

Jeffrey Stanton

Keyword(s):

Data Mining ◽

Digital Libraries ◽

Large Data ◽

Data Sets ◽

Data Mining Techniques ◽

Governmental Organizations ◽

Non Governmental Organizations ◽

The People ◽

The World ◽

Use Of The Internet

Most people think of a library as the little brick building in the heart of their community or the big brick building in the center of a campus. These notions greatly oversimplify the world of libraries, however. Most large commercial organizations have dedicated in-house library operations, as do schools, non-governmental organizations, as well as local, state, and federal governments. With the increasing use of the Internet and the World Wide Web, digital libraries have burgeoned, and these serve a huge variety of different user audiences. With this expanded view of libraries, two key insights arise. First, libraries are typically embedded within larger institutions. Corporate libraries serve their corporations, academic libraries serve their universities, and public libraries serve taxpaying communities who elect overseeing representatives. Second, libraries play a pivotal role within their institutions as repositories and providers of information resources. In the provider role, libraries represent in microcosm the intellectual and learning activities of the people who comprise the institution. This fact provides the basis for the strategic importance of library data mining: By ascertaining what users are seeking, bibliomining can reveal insights that have meaning in the context of the library’s host institution. Use of data mining to examine library data might be aptly termed bibliomining. With widespread adoption of computerized catalogs and search facilities over the past quarter century, library and information scientists have often used bibliometric methods (e.g., the discovery of patterns in authorship and citation within a field) to explore patterns in bibliographic information. During the same period, various researchers have developed and tested data mining techniques—advanced statistical and visualization methods to locate non-trivial patterns in large data sets. Bibliomining refers to the use of these bibliometric and data mining techniques to explore the enormous quantities of data generated by the typical automated library.

Download Full-text

Audio and Speech Processing for Data Mining

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch017 ◽

2011 ◽

pp. 98-103 ◽

Cited By ~ 1

Author(s):

Zheng-Hua Tan

Keyword(s):

Data Mining ◽

Speech Processing ◽

Large Data ◽

Data Preprocessing ◽

Multimedia Data ◽

Data Sets ◽

Data Types ◽

Multimedia Data Mining ◽

Customer Preferences ◽

And Storage

The explosive increase in computing power, network bandwidth and storage capacity has largely facilitated the production, transmission and storage of multimedia data. Compared to alpha-numeric database, non-text media such as audio, image and video are different in that they are unstructured by nature, and although containing rich information, they are not quite as expressive from the viewpoint of a contemporary computer. As a consequence, an overwhelming amount of data is created and then left unstructured and inaccessible, boosting the desire for efficient content management of these data. This has become a driving force of multimedia research and development, and has lead to a new field termed multimedia data mining. While text mining is relatively mature, mining information from non-text media is still in its infancy, but holds much promise for the future. In general, data mining the process of applying analytical approaches to large data sets to discover implicit, previously unknown, and potentially useful information. This process often involves three steps: data preprocessing, data mining and postprocessing (Tan, Steinbach, & Kumar, 2005). The first step is to transform the raw data into a more suitable format for subsequent data mining. The second step conducts the actual mining while the last one is implemented to validate and interpret the mining results. Data preprocessing is a broad area and is the part in data mining where essential techniques are highly dependent on data types. Different from textual data, which is typically based on a written language, image, video and some audio are inherently non-linguistic. Speech as a spoken language lies in between and often provides valuable information about the subjects, topics and concepts of multimedia content (Lee & Chen, 2005). The language nature of speech makes information extraction from speech less complicated yet more precise and accurate than from image and video. This fact motivates content based speech analysis for multimedia data mining and retrieval where audio and speech processing is a key, enabling technology (Ohtsuki, Bessho, Matsuo, Matsunaga, & Kayashi, 2006). Progress in this area can impact numerous business and government applications (Gilbert, Moore, & Zweig, 2005). Examples are discovering patterns and generating alarms for intelligence organizations as well as for call centers, analyzing customer preferences, and searching through vast audio warehouses.

Download Full-text

Visual Data Mining for Discovering Association Rules

Data Warehousing and Mining ◽

10.4018/978-1-59904-951-9.ch125 ◽

2008 ◽

pp. 2105-2120

Author(s):

Kesaraporn Techapichetvanich ◽

Amitava Datta

Keyword(s):

Data Mining ◽

Association Rules ◽

Association Rule ◽

Large Data ◽

Data Sets ◽

Visual Data Mining ◽

Useful Knowledge ◽

Large Databases ◽

A New Technique ◽

Mining Association Rule

Both visualization and data mining have become important tools in discovering hidden relationships in large data sets, and in extracting useful knowledge and information from large databases. Even though many algorithms for mining association rules have been researched extensively in the past decade, they do not incorporate users in the association-rule mining process. Most of these algorithms generate a large number of association rules, some of which are not practically interesting. This chapter presents a new technique that integrates visualization into the mining association rule process. Users can apply their knowledge and be involved in finding interesting association rules through interactive visualization, after obtaining visual feedback as the algorithm generates association rules. In addition, the users gain insight and deeper understanding of their data sets, as well as control over mining meaningful association rules.

Download Full-text