Efficient Computation of the Well-Founded Semantics over Big Data

2014 ◽  
Vol 14 (4-5) ◽  
pp. 445-459 ◽  
Author(s):  
ILIAS TACHMAZIDIS ◽  
GRIGORIS ANTONIOU ◽  
WOLFGANG FABER

Data originating from the Web, sensor readings and social media result in increasingly huge datasets. The so-called Big Data comes with new scientific and technological challenges while creating new opportunities, hence the increasing interest in academia and industry. Traditionally, logic programming has focused on complex knowledge structures/programs, so the question arises whether and how it can work in the face of Big Data. In this paper, we examine how the well-founded semantics can process huge amounts of data through mass parallelization. More specifically, we propose and evaluate a parallel approach using the MapReduce framework. Our experimental results indicate that our approach is scalable and that the well-founded semantics can be applied to billions of facts. To the best of our knowledge, this is the first work that addresses large-scale nonmonotonic reasoning without the restriction of stratification for predicates of arbitrary arity.
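The abstract does not detail the implementation, but the core MapReduce building block behind this kind of parallel rule evaluation is a distributed join on shared rule variables. The following minimal Python emulation of a single map/reduce round is only an illustrative sketch of that building block; the rule, predicate names and facts are hypothetical and not taken from the paper.

# One MapReduce round evaluating the recursive rule
#   reachable(X, Z) :- link(X, Y), reachable(Y, Z).
# The map phase keys every fact by the join variable Y; the reduce phase
# combines matching facts to derive new reachable/2 facts.
from collections import defaultdict

link = [("a", "b"), ("b", "c"), ("c", "d")]        # link(X, Y) facts
reachable = [("b", "c"), ("c", "d"), ("d", "e")]   # reachable(Y, Z) facts

def map_phase():
    for x, y in link:
        yield y, ("link", x)
    for y, z in reachable:
        yield y, ("reachable", z)

def reduce_phase(grouped):
    for values in grouped.values():
        lefts = [v for tag, v in values if tag == "link"]
        rights = [v for tag, v in values if tag == "reachable"]
        for x in lefts:
            for z in rights:
                yield ("reachable", x, z)

grouped = defaultdict(list)
for key, value in map_phase():
    grouped[key].append(value)

print(sorted(set(reduce_phase(grouped))))   # new facts derived in this round

In a full computation of the well-founded model, rounds like this are iterated to a fixpoint; the sketch only shows the shape of one distributed join step.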

2019 ◽  
pp. 1049-1070
Author(s):  
Fabian Neuhaus

User data created in digital contexts has increasingly become of interest for analysis, and for spatial analysis in particular. Large-scale computer user management systems such as digital ticketing and social networking are creating vast amounts of data. Such data systems can contain information generated by potentially millions of individuals. This kind of data has been termed big data. The analysis of big data, in its spatial as well as its temporal and social dimensions, can be of much interest in the context of cities and urban areas. This chapter discusses this potential along with a selection of sample work and an in-depth case study. The focus is mainly on the use of insights gained from social media data, especially from the Twitter platform, with regard to cities and urban environments. The first part of the chapter discusses a range of examples that make use of big data and the mapping of digital social network data. The second part discusses the way the data is collected and processed. An important section is dedicated to ethical considerations. A summary and an outlook are discussed at the end.


Author(s):  
Caio Saraiva Coneglian ◽  
Elvis Fusco

The data available on the Web is growing exponentially, providing information of high added value to organizations. Such information can be stored in diverse bases and in varied formats, such as videos and photos in social media. However, unstructured data presents great difficulty for information retrieval, because the meaning of documents stored on the Web is hard to interpret, so the informational needs of users are not met efficiently. In the context of an Information Retrieval architecture, this research aims to implement a semantic extraction agent for the Web that allows the location, treatment, and retrieval of information in Big Data settings across the most varied informational sources. The agent serves as the basis for building informational environments that aid the Information Retrieval process, using an ontology to add semantics to the retrieval process and to the presentation of the results obtained, so that users' needs can be met.
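As an illustration of how an ontology can add semantics to retrieval, the following sketch expands a user keyword with the labels of ontologically related concepts before the search is run. The tiny ontology, the class names and the rdflib dependency are assumptions made for the example, not details from the chapter.

# Expand a keyword query with subclass labels taken from an ontology,
# so that a search for a broad concept also matches more specific items.
from rdflib import Graph, Literal

ontology_ttl = """
@prefix ex:   <http://example.org/media#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:SocialMediaItem rdfs:label "social media item" .
ex:Video rdfs:subClassOf ex:SocialMediaItem ; rdfs:label "video" .
ex:Photo rdfs:subClassOf ex:SocialMediaItem ; rdfs:label "photo" .
"""

graph = Graph()
graph.parse(data=ontology_ttl, format="turtle")

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subLabel WHERE {
    ?cls rdfs:label ?label .
    ?sub rdfs:subClassOf ?cls ;
         rdfs:label ?subLabel .
    FILTER(LCASE(STR(?label)) = LCASE(STR(?kw)))
}
"""

def expand_query(keyword):
    rows = graph.query(QUERY, initBindings={"kw": Literal(keyword)})
    return [keyword] + [str(row.subLabel) for row in rows]

# Expanded terms, order may vary: ['Social Media Item', 'video', 'photo']
print(expand_query("Social Media Item"))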


Author(s):  
Bunjamin Memishi ◽  
Shadi Ibrahim ◽  
Maria S. Perez ◽  
Gabriel Antoniu

MapReduce has become a relevant framework for Big Data processing in the cloud. In large-scale clouds, failures do occur and may cause unwanted performance degradation in Big Data applications. As the reliability of MapReduce depends on how well it detects and handles failures, this book chapter investigates the problem of failure detection in the MapReduce framework. The case studies of this contribution reveal that the current static timeout value is not adequate and demonstrate significant variations in the application's response time under different timeout values. Arguing that comparatively little attention has been devoted to failure detection in the framework, the chapter presents design ideas for a new adaptive timeout.
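A minimal sketch of the kind of adaptive timeout argued for here, assuming the timeout is derived from recently observed task response times rather than a fixed cluster-wide constant; the heuristic used (mean plus a multiple of the standard deviation) is illustrative and not the chapter's own algorithm.

# Derive a per-application timeout from the response times of recently
# completed tasks instead of using one static value for every workload.
import statistics

def adaptive_timeout(recent_response_times, safety_factor=3.0, floor=30.0):
    """Return a timeout in seconds adapted to recent task behaviour."""
    if len(recent_response_times) < 2:
        return floor                          # not enough history yet
    mean = statistics.mean(recent_response_times)
    spread = statistics.stdev(recent_response_times)
    return max(floor, mean + safety_factor * spread)

# Hypothetical completion times (seconds) of the last few map tasks.
recent = [42.0, 38.5, 51.2, 40.3, 45.7]
print(f"adaptive timeout: {adaptive_timeout(recent):.1f} s")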


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0249993
Author(s):  
Paul X. McCarthy ◽  
Xian Gong ◽  
Sina Eghbal ◽  
Daniel S. Falster ◽  
Marian-Andrei Rizoiu

Ever since the web began, the number of websites has been growing exponentially. These websites cover an ever-increasing range of online services that fill a variety of social and economic functions across a growing range of industries. Yet the networked nature of the web, combined with the economics of preferential attachment, increasing returns and global trade, suggests that over the long run a small number of competitive giants are likely to dominate each functional market segment, such as search, retail and social media. Here we perform a large-scale longitudinal study to quantify the distribution of attention given in the online environment to competing organisations. In two large online social media datasets, containing more than 10 billion posts and spanning more than a decade, we tally the volume of external links posted towards the organisations’ main domain name as a proxy for the online attention they receive. We also use the Common Crawl dataset—which contains the linkage patterns between more than a billion different websites—to study the patterns of link concentration over the past three years across the entire web. Lastly, we showcase the linking between economic, financial and market data by exploring the relationships between online attention on social media and the growth in enterprise value of the electric carmaker Tesla. Our analysis shows that although we observe consistent growth in all the macro indicators—in the total amount of online attention, in the number of organisations with an online presence, and in the functions they perform—a smaller number of organisations account for an ever-increasing proportion of total user attention, usually with one large player dominating each function. These results highlight how the evolution of the online economy involves innovation, diversity, and then competitive dominance.
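The tallying step described above can be illustrated with a small sketch: count external links per organisation's main domain as a proxy for attention, then look at how concentrated that attention is. The posts below are invented; the study itself works on billions of posts and the Common Crawl link graph.

# Count links to each main domain and report the share of the top domain.
from collections import Counter
from urllib.parse import urlparse

posts = [
    "check this out https://www.example-search.com/results?q=ev",
    "bought it on https://shop.example-retail.com/item/42",
    "great thread https://www.example-search.com/answers/17",
    "news via https://www.example-search.com/news",
]

def main_domain(url):
    host = urlparse(url).netloc.lower()
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host

counts = Counter(
    main_domain(token)
    for post in posts
    for token in post.split()
    if token.startswith("http")
)

total = sum(counts.values())
top_domain, top_count = counts.most_common(1)[0]
print(counts)
print(f"'{top_domain}' receives {top_count / total:.0%} of the tallied attention")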


Author(s):  
Rasmus Helles ◽  
Jacob Ørmen ◽  
Klaus Bruhn Jensen ◽  
Signe Sophus Lai ◽  
Ericka Menchen-Trevino ◽  
...  

In recent years, large-scale analysis of log data from digital devices - often termed "big data analysis" (Lazer, Kennedy, King, & Vespignani, 2014) - has taken hold in the field of internet research. Through Application Programming Interfaces (APIs) and commercial measurement, scholars have been able to analyze social media users (Freelon, 2014) and web audiences (Taneja, 2016) on an unprecedented scale. And by developing digital research tools, scholars have been able to track individuals across websites (Menchen-Trevino, 2013) and mobile applications (Ørmen & Thorhauge, 2015) in greater detail than ever before. Big data analysis holds unique potential for studying communication in depth and across many individuals (see e.g. Boase & Ling, 2013; Prior, 2013). At the same time, this approach introduces new methodological challenges in the transparency of data collection (Webster, 2014), the sampling of participants and the validity of conclusions (Rieder, Abdulla, Poell, Woltering, & Zack, 2015). Firstly, data aggregation is typically designed for commercial rather than academic purposes. The type of data included, as well as how it is presented, depends in large part on the business interests of measurement and advertisement companies (Webster, 2014). Secondly, when relying on this kind of secondary data it can be difficult to validate the output or the techniques used to generate the data (Rieder, Abdulla, Poell, Woltering, & Zack, 2015). Thirdly, the unit of analysis is often media-centric, taking specific websites or social network pages as the empirical basis instead of individual users (Taneja, 2016). This makes it hard to untangle the behavior of real-world users from the aggregate trends. Lastly, variations in what users do might be so large that it is necessary to move from the aggregate to smaller groups of users to make meaningful inferences (Welles, 2014). Internet research is thus faced with a new research approach in big data analysis, whose potentials and perils need to be discussed in combination with traditional approaches. This panel explores the role of big data analysis in relation to the wider repertoire of methods in internet research. The panel comprises four presentations that each shed light on the complementarity of big data analysis with more traditional qualitative and quantitative methods. The first presentation opens the discussion with an overview of strategies for combining digital traces and commercial audience data with qualitative interviews and quantitative survey methods. The next presentation explores the potential of trace data to improve upon the experimental method: researcher-collected data enables scholars to operate in a real-world setting, in contrast to a research lab, while obtaining informed consent from participants. The third presentation argues that large-scale audience data provide a unique perspective on internet use; by integrating census-level information about users with detailed traces of their behavior across websites, commercial audience data combine the strengths of surveys and digital trace data respectively. Lastly, the fourth presentation shows how multi-institutional collaboration makes it possible to document social media activity (on Twitter) for a whole country (Australia) in a comprehensive manner, a feat not possible through other methods at a similar scale. Through these four presentations, the panel aims to situate big data analysis in the broader repertoire of internet research methods.


Author(s):  
Samir Sellami ◽  
Taoufiq Dkaki ◽  
Nacer Eddine Zarour ◽  
Pierre-Jean Charrel

The diversification of the web into the Web of Data and social media means that companies need to gather all the necessary data to help make the best-informed market decisions. However, data providers on the web publish data in various data models and may equip it with different search capabilities, thus requiring data integration techniques to access them. This work explores the current challenges in this area, discusses the limitations of some existing integration tools, and addresses them by proposing a semantic mediator-based approach to virtually integrate enterprise data with large-scale social and linked data. The implementation of the proposed approach is a configurable middleware application and a user-friendly keyword search interface that retrieves its input from internal enterprise data combined with various SPARQL endpoints and Web APIs. An evaluation study was conducted to compare its features with recent integration approaches. The results illustrate the added value and usability of the contributed approach.
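A minimal sketch of the mediator pattern described above, answering one keyword query from internal enterprise data combined with a public SPARQL endpoint. The internal records, the query shape and the SPARQLWrapper dependency are assumptions made for the example (and the broad label filter shown would be far too slow in production, where an indexed search service would be used); the sketch also needs network access to DBpedia.

# Answer a keyword query by merging internal records with linked data
# retrieved from a SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

internal_crm = {"Tesla, Inc.": {"account_id": "A-1042", "segment": "automotive"}}

def mediated_search(keyword):
    # 1. Internal enterprise data.
    internal_hits = {name: record for name, record in internal_crm.items()
                     if keyword.lower() in name.lower()}

    # 2. External linked data via a public SPARQL endpoint.
    endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
    endpoint.setQuery(f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?entity ?label WHERE {{
            ?entity rdfs:label ?label .
            FILTER(LANG(?label) = "en" && CONTAINS(LCASE(?label), "{keyword.lower()}"))
        }} LIMIT 5
    """)
    endpoint.setReturnFormat(JSON)
    bindings = endpoint.query().convert()["results"]["bindings"]

    # 3. One merged answer for the user.
    return {"internal": internal_hits,
            "external": [b["label"]["value"] for b in bindings]}

print(mediated_search("tesla"))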


Author(s):  
Grigoris Antoniou ◽  
Sotiris Batsakis ◽  
Raghava Mutharaju ◽  
Jeff Z. Pan ◽  
Guilin Qi ◽  
...  

As more and more data is being generated by sensor networks, social media and organizations, the Web interlinking this wealth of information becomes more complex. This is particularly true for the so-called Web of Data, in which data is semantically enriched and interlinked using ontologies. In this large and uncoordinated environment, reasoning can be used to check the consistency of the data and of associated ontologies, or to infer logical consequences which, in turn, can be used to obtain new insights from the data. However, reasoning approaches need to be scalable in order to enable reasoning over the entire Web of Data. To address this problem, several high-performance reasoning systems, which mainly implement distributed or parallel algorithms, have been proposed in the last few years. These systems differ significantly, for instance in terms of reasoning expressivity, computational properties such as completeness, or reasoning objectives. In order to provide a first complete overview of the field, this paper reports a systematic review of such scalable reasoning approaches over various ontological languages, reporting details about the methods and the conducted experiments. We highlight the shortcomings of these approaches and discuss some of the open problems related to performing scalable reasoning.
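As a toy illustration of the kind of inference these systems scale up, the sketch below materialises two standard RDFS rules (type propagation along subClassOf, and subClassOf transitivity) by forward chaining on a single machine; the triples are hypothetical, and the surveyed systems distribute this sort of closure computation over many machines.

# Forward-chain RDFS rules rdfs9 and rdfs11 until no new triples appear.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

triples = {
    ("ex:alice", TYPE, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
}

def rdfs_closure(kb):
    kb = set(kb)
    while True:
        sub = {(s, o) for s, p, o in kb if p == SUBCLASS}
        derived = set()
        # rdfs9: (x rdf:type C), (C rdfs:subClassOf D)  =>  (x rdf:type D)
        derived |= {(x, TYPE, d) for x, p, c in kb if p == TYPE
                    for c2, d in sub if c2 == c}
        # rdfs11: subClassOf is transitive.
        derived |= {(c, SUBCLASS, e) for c, d in sub for d2, e in sub if d2 == d}
        if derived <= kb:
            return kb
        kb |= derived

for triple in sorted(rdfs_closure(triples)):
    print(triple)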

