Schema Extraction of Document-Oriented Database for Data Warehouse

Author(s):  
A. Nurul Istiqamah ◽  
Kemas Rahmat Saleh Wiharja

The data warehouse is a well-known solution for analyzing business data from heterogeneous sources. Unfortunately, a data warehouse can only analyze structured data, yet nowadays, thanks to the popularity of social media and the ease of creating data on the web, we are experiencing a flood of unstructured data. Therefore, we need an approach that can "structure" unstructured data into structured data that a data warehouse can process. To do this, we propose a schema extraction approach using Google Cloud Platform that creates a schema from unstructured data. Based on our experiment, our approach successfully produces a schema from unstructured data. To the best of our knowledge, we are the first to use Google Cloud Platform for schema extraction. We also show that our approach helps database developers understand unstructured data better.
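As an illustration of the general idea only (the paper's Google Cloud Platform pipeline is not detailed in the abstract), a minimal Python sketch can infer a field-to-type schema from a collection of JSON-like documents; all names and data below are illustrative.

```python
from collections import defaultdict

def extract_schema(documents):
    """Infer a field -> set-of-type-names schema from JSON-like documents.

    A toy stand-in for a schema extraction pipeline: document stores have
    optional and inconsistently typed fields, so each field maps to every
    type observed across the collection.
    """
    schema = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            schema[field].add(type(value).__name__)
    return dict(schema)

# Example: social media records with inconsistent structure.
docs = [
    {"user": "alice", "likes": 3, "geo": None},
    {"user": "bob", "likes": "12"},  # same field, different type
]
print(extract_schema(docs))
# {'user': {'str'}, 'likes': {'int', 'str'}, 'geo': {'NoneType'}}
```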

2021 ◽  
Vol 7 ◽  
pp. e347
Author(s):  
Bhavana R. Bhamare ◽  
Jeyanthi Prabhu

Due to the massive growth of the Web, people post reviews of products, movies, and places they visit on social media. These reviews help customers as well as product owners evaluate products. Structured data is easier to analyze than unstructured data, and reviews are available in an unstructured format. Aspect-Based Sentiment Analysis mines the aspects of a product from reviews and then determines the sentiment for each aspect. In this work, two methods for aspect extraction are proposed, evaluated on the SemEval restaurant review dataset and the Yelp and Kaggle datasets. The first method is a multivariate filter-based approach to feature selection that selects significant features while reducing redundancy among them; it improves F1-score compared to a method that uses only relevant features selected by Term Frequency weight. The second method extracts features using selective dependency relations obtained with the Stanford NLP parser; features extracted by selective dependency rules outperform features extracted using all dependency rules. A hybrid approach combines lemma features with selective dependency relation features, and with this hybrid feature set 94.78% accuracy and an 85.24% F1-score are achieved in the aspect category prediction task.
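A minimal sketch of the selective-dependency idea, using spaCy in place of the Stanford NLP parser the authors used; the chosen relations (amod, acomp/nsubj) and the example sentence are illustrative, not the paper's configuration.

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def aspect_candidates(review):
    """Return (aspect, opinion) pairs found via selected dependency arcs."""
    pairs = []
    for tok in nlp(review):
        if tok.dep_ == "amod" and tok.head.pos_ == "NOUN":
            # Adjectival modifier: "great pizza" -> (pizza, great)
            pairs.append((tok.head.text, tok.text))
        elif tok.dep_ == "acomp":
            # Copular predicate: "the service was slow" -> (service, slow)
            subjects = [c for c in tok.head.children if c.dep_ == "nsubj"]
            if subjects:
                pairs.append((subjects[0].text, tok.text))
    return pairs

print(aspect_candidates("The pizza was great but the service was painfully slow."))
# [('pizza', 'great'), ('service', 'slow')]
```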


Author(s):  
Wafaa A. Al-Rabayah ◽  
Ahmad Al-Zyoud

Sentiment analysis is the process of determining the polarity (i.e., positive, negative, or neutral) of a given text. The enormous growth of information on the web, especially on social media, makes it challenging to retrieve and analyze that information on time; timely analysis of unstructured data gives businesses a competitive advantage by helping them better understand their customers' needs and preferences. This literature review covers a number of studies on sentiment analysis and examines the connection between sentiment analysis of social network content and customer retention. We focus on sentiment analysis and discuss concepts related to the field, the most important relevant studies and their results, its methods and areas of application, and its business applications; finally, we discuss how sentiment analysis can improve customer retention based on retrieved data.
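The review does not name a specific tool, but a minimal polarity classifier of the kind described can be sketched with NLTK's VADER analyzer; the 0.05 cutoff is a common convention for VADER's compound score, not a value from the paper.

```python
# pip install nltk; then run nltk.download("vader_lexicon") once.
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()

def polarity(text, threshold=0.05):
    """Map VADER's compound score to positive/negative/neutral."""
    score = sia.polarity_scores(text)["compound"]
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

print(polarity("The support team resolved my issue quickly, very happy!"))  # positive
print(polarity("Still waiting after two weeks. Terrible."))                 # negative
```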


Big Data ◽  
2016 ◽  
pp. 1495-1518
Author(s):  
Mohammad Alaa Hussain Al-Hamami

Big Data comprises the systems, and the techniques emerging around them, that organizations use to remain competitive. Big Data includes structured, semi-structured, and unstructured data. Structured data are data formatted for use in a database management system, while semi-structured and unstructured data include all types of unformatted data, including multimedia and social media content. Among practitioners and applied researchers, the reaction to data available through blogs, Twitter, Facebook, and other social media can be described as a "data rush" promising new insights about consumers' choices, behavior, and many other issues. In the past, Big Data was used only by governments and very large enterprises with the ability to build their own infrastructure for hosting and mining large amounts of data. This chapter shows the requirements for protecting Big Data environments with the same rigorous security strategies applied to traditional database systems.


Author(s):  
Caio Saraiva Coneglian ◽  
Elvis Fusco

The data available on the Web is growing exponentially, providing information of high added value to organizations. Such information is spread across diverse sources and varied formats, such as videos and photos on social media. However, unstructured data makes information retrieval difficult and fails to meet users' informational needs efficiently, because the meaning of documents stored on the Web is hard to understand. In the context of an Information Retrieval architecture, this research aims to implement a semantic extraction agent for the Web that locates, processes, and retrieves information from the most varied Big Data sources, serving as the basis for informational environments that aid the Information Retrieval process. The agent uses an ontology to add semantics to retrieval and to the presentation of results to users, thereby meeting their needs.
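A rough sketch of what such a semantic extraction agent might look like, assuming spaCy for entity extraction and rdflib for the ontology triples; the namespace and property names are invented for illustration and are not the authors' design.

```python
# pip install spacy rdflib && python -m spacy download en_core_web_sm
import spacy
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

nlp = spacy.load("en_core_web_sm")
EX = Namespace("http://example.org/onto/")  # illustrative ontology namespace

def annotate(text, doc_uri):
    """Extract named entities and record them as ontology-typed triples."""
    g = Graph()
    page = URIRef(doc_uri)
    g.add((page, RDF.type, EX.WebResource))
    for ent in nlp(text).ents:
        entity = URIRef(EX[ent.text.replace(" ", "_")])
        g.add((entity, RDF.type, EX[ent.label_]))  # e.g. ex:ORG, ex:GPE
        g.add((page, EX.mentions, entity))
        g.add((entity, EX.surfaceForm, Literal(ent.text)))
    return g

g = annotate("Google announced a partnership with NASA in California.",
             "http://example.org/page/42")
print(g.serialize(format="turtle"))
```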


The Semantic Web is not just a matter of translating HTML into RDF/OWL languages; it is a matter of understanding the content of the web through knowledge graphs, in which entities are connected by relationships. This content is composed of resources (web pages) that contain, for example, text, images, and audio, so entities must be extracted from these resources. Currently, most web content is in HTML5, a W3C recommendation that only marginally describes document structure through annotations. The main challenge is to transform unstructured data in plain HTML files into structured data (e.g., RDF or OWL). The current work provides first-hand guidance for dealing with unstructured, heterogeneous data residing on the web using Twinkle, a Java tool for executing SPARQL queries against FOAF (Friend Of A Friend) documents.
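Twinkle itself is a Java GUI, but the same kind of SPARQL query over a FOAF document can be sketched in Python with rdflib; the inline FOAF snippet and the query are illustrative stand-ins for a document fetched from the web.

```python
# pip install rdflib
from rdflib import Graph

# A tiny inline FOAF document standing in for one fetched from the web.
foaf_doc = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:a foaf:name "Alice" ; foaf:knows _:b .
_:b foaf:name "Bob" .
"""

g = Graph()
g.parse(data=foaf_doc, format="turtle")

# The same kind of query one would run in Twinkle against a FOAF file.
query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?friend WHERE {
    ?p foaf:name ?name .
    OPTIONAL { ?p foaf:knows/foaf:name ?friend }
}
"""
for row in g.query(query):
    print(row.name, row.friend)
# Alice Bob
# Bob  None
```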


Author(s):  
Sanjeev Kumar Punia ◽  
Manoj Kumar ◽  
Thompson Stephan ◽  
Ganesh Gopal Deverajan ◽  
Rizwan Patan

Broadly, three classes of machine learning classification algorithms are used to discover correlations, hidden patterns, and other useful information in the large data sets known as big data. Today, Twitter, Facebook, Instagram, and many other social media networks are used to collect unstructured data. Converting unstructured data into structured data or meaningful information is a tedious task, and machine learning classification algorithms are used to perform it. In this paper, the authors first collect unstructured research data from a frequently used social media network (i.e., Twitter) using a Twitter application program interface (API) stream. Second, they apply machine learning classification algorithms from the three classes (supervised, unsupervised, and reinforcement), such as decision trees (DT), neural networks (NN), support vector machines (SVM), naive Bayes (NB), linear regression (LR), and k-nearest neighbor (K-NN), to the collected data set. The paper concludes with a comparison of these algorithms.
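A hedged sketch of such a comparison with scikit-learn, using a toy labeled set in place of streamed tweets; LogisticRegression stands in for the paper's "linear regression (LR)" since the task here is classification, and the reinforcement class is omitted.

```python
# pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for tweets pulled via the Twitter streaming API.
texts = ["love this phone", "worst service ever", "great battery life",
         "totally broken on arrival", "absolutely fantastic", "do not buy"] * 5
labels = [1, 0, 1, 0, 1, 0] * 5

models = {
    "DT":   DecisionTreeClassifier(),
    "NN":   MLPClassifier(max_iter=500),
    "SVM":  SVC(),
    "NB":   MultinomialNB(),
    "LR":   LogisticRegression(),  # classification analogue of the paper's LR
    "K-NN": KNeighborsClassifier(n_neighbors=3),
}

for name, model in models.items():
    pipe = make_pipeline(TfidfVectorizer(), model)
    scores = cross_val_score(pipe, texts, labels, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```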


Author(s):  
Cate Dowd

Trigonometry in NLP (Natural Language Processing) algorithms can sort word connotations. The triple structure of RDF grammar also extends to semantics in machine learning and big-data processing, but ontologies and a metamodel are essential for meaningful relations across data, and they should inform the design of new journalism systems. Major processing platforms used by Facebook and Yahoo are distributed systems like Hadoop, with resource negotiation features and computations applied to text. Google's NLP likewise uses cosine vectors for the connotations of words. Data processing already works across structured data, such as online news tags, and unstructured data, such as social media tags with folksonomy characteristics, although social media also uses structured data. However, journalism has yet to build semantic systems from an ontological base. To that end, ontologies spanning journalism, social media, and public relations, together with a little OWL to reason about resources, can inform AI sub-systems and wider system perspectives.
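The "cosine vectors" idea reduces to cosine similarity between word-embedding vectors: words with similar connotations point in similar directions. A minimal NumPy sketch, using toy 3-dimensional vectors in place of learned embeddings.

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| |v|): near 1 = similar direction, near 0 = unrelated."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 3-d embeddings; real systems use hundreds of dimensions learned from text.
vectors = {
    "happy":  np.array([0.9, 0.8, 0.1]),
    "joyful": np.array([0.85, 0.75, 0.2]),
    "tax":    np.array([0.1, 0.2, 0.9]),
}
print(cosine_similarity(vectors["happy"], vectors["joyful"]))  # close to 1
print(cosine_similarity(vectors["happy"], vectors["tax"]))     # much smaller
```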

