A survey of large-scale reasoning on the Web of data

Author(s):  
Grigoris Antoniou ◽  
Sotiris Batsakis ◽  
Raghava Mutharaju ◽  
Jeff Z. Pan ◽  
Guilin Qi ◽  
...  

Abstract: As more and more data is generated by sensor networks, social media and organizations, the Web interlinking this wealth of information becomes more complex. This is particularly true for the so-called Web of Data, in which data is semantically enriched and interlinked using ontologies. In this large and uncoordinated environment, reasoning can be used to check the consistency of the data and of associated ontologies, or to infer logical consequences which, in turn, can be used to obtain new insights from the data. However, reasoning approaches need to be scalable in order to enable reasoning over the entire Web of Data. To address this problem, several high-performance reasoning systems, mainly implementing distributed or parallel algorithms, have been proposed in recent years. These systems differ significantly, for instance in reasoning expressivity, computational properties such as completeness, or reasoning objectives. To provide a first complete overview of the field, this paper reports a systematic review of such scalable reasoning approaches over various ontological languages, detailing both the methods and the conducted experiments. We highlight the shortcomings of these approaches and discuss some of the open problems related to performing scalable reasoning.
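A minimal illustrative sketch of the kind of inference these systems perform: a naive single-machine forward-chaining pass that materialises rdf:type triples from rdfs:subClassOf axioms. The function and sample triples are invented for illustration; the surveyed systems distribute or parallelise this fixpoint computation across many machines.

```python
# A naive forward-chaining pass materialising rdf:type inferences from
# rdfs:subClassOf axioms, iterated to a fixpoint.
def rdfs_type_closure(triples):
    """Derive (s, rdf:type, c2) from (s, rdf:type, c1) and (c1, rdfs:subClassOf, c2)."""
    TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
    inferred = set(triples)
    while True:  # iterate until no new triple is derived
        new = {
            (s, TYPE, c2)
            for (s, p, c1) in inferred if p == TYPE
            for (c1b, p2, c2) in inferred if p2 == SUBCLASS and c1b == c1
        } - inferred
        if not new:
            return inferred
        inferred |= new

facts = {
    ("ex:Rex", "rdf:type", "ex:Dog"),
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
}
closure = rdfs_type_closure(facts)
# closure now also contains (ex:Rex, rdf:type, ex:Mammal) and (ex:Rex, rdf:type, ex:Animal)
```

Completeness, one of the properties on which the surveyed systems differ, corresponds here to whether the fixpoint is actually reached over the full dataset.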

Author(s):  
Samir Sellami ◽  
Taoufiq Dkaki ◽  
Nacer Eddine Zarour ◽  
Pierre-Jean Charrel

The diversification of the web into the Web of Data and social media means that companies need to gather all the necessary data to make the best-informed market decisions. However, data providers on the web publish data in various data models and may expose it through different search capabilities, so data integration techniques are required to access them. This work explores the current challenges in this area, discusses the limitations of some existing integration tools, and addresses them by proposing a semantic mediator-based approach to virtually integrate enterprise data with large-scale social and linked data. The proposed approach is implemented as a configurable middleware application and a user-friendly keyword search interface that retrieves its input from internal enterprise data combined with various SPARQL endpoints and Web APIs. An evaluation study was conducted to compare its features with recent integration approaches; the results illustrate the added value and usability of the contributed approach.
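The mediator idea can be sketched as a keyword search fanned out over heterogeneous sources whose results are merged into one ranked list. The source names, record shape and ranking heuristic below are assumptions for illustration, not the paper's middleware:

```python
# Hypothetical mediator sketch: fan a keyword query out to several sources
# (an internal database, SPARQL endpoints, Web APIs) and merge the hits.
def keyword_search(keyword, sources):
    """sources: mapping source-name -> list of {'label': str, 'uri': str} records."""
    hits = []
    for name, records in sources.items():
        for rec in records:
            if keyword.lower() in rec["label"].lower():
                hits.append({"source": name, **rec})
    # Rank exact label matches before partial ones (a simple, illustrative heuristic).
    return sorted(hits, key=lambda h: h["label"].lower() != keyword.lower())

sources = {
    "enterprise_db": [{"label": "Tesla", "uri": "crm:cust/42"}],
    "dbpedia_sparql": [{"label": "Tesla, Inc.", "uri": "dbr:Tesla,_Inc."}],
}
results = keyword_search("tesla", sources)
```

In a real deployment the lists of records would be produced lazily by translating the keyword into per-source queries (SPARQL, REST calls), which is the virtual-integration aspect of the approach.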


Author(s):  
Shimei Pan ◽  
Tao Ding

Automated representation learning is behind many recent success stories in machine learning. It is often used to transfer knowledge learned from a large dataset (e.g., raw text) to tasks for which only a small number of training examples are available. In this paper, we review recent advances in learning to represent social media users in low-dimensional embeddings. The technology is critical for building high-performance social media-based models of human traits and behavior, since ground truth for assessing latent human traits and behavior is often expensive to acquire at large scale. In this survey, we review typical methods for learning a unified user embedding from heterogeneous user data (e.g., combining social media text with images to learn a unified user representation). Finally, we point out some current issues and future directions.
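A simple baseline for fusing heterogeneous modalities into one user vector (an illustrative sketch, not necessarily a method from the survey) is weighted concatenation of normalised per-modality embeddings; the modality names and weights below are assumptions:

```python
import math

# Fuse per-modality user vectors into one unified embedding by concatenating
# L2-normalised vectors, optionally scaled by a per-modality weight.
def unify_user_embedding(modality_vectors, weights=None):
    weights = weights or {}
    unified = []
    for name in sorted(modality_vectors):  # fixed modality order for reproducibility
        vec = modality_vectors[name]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        w = weights.get(name, 1.0)
        unified.extend(w * x / norm for x in vec)
    return unified

user = {"text": [3.0, 4.0], "image": [1.0, 0.0, 0.0]}
emb = unify_user_embedding(user)  # [image..., text...] = [1.0, 0.0, 0.0, 0.6, 0.8]
```

More sophisticated methods learn the fusion jointly (e.g., with a shared encoder) rather than concatenating fixed per-modality vectors.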


Author(s):  
Diego Berrueta ◽  
Antonio Campos ◽  
Emilio Rubiera ◽  
Carlos Tejo ◽  
José E. Labra

The web of data is a new evolutionary step of the web that involves the publication, interchange and consumption of meaningful, raw data by taking full advantage of the web architecture. All the parties involved in the tourism industry should consider the opportunities offered by this new web. Entry barriers are low because existing data sources and documents can easily be leveraged to become part of this extended web. At the same time, new services and platforms that exploit the data are beginning to show the large potential for increased technological and business opportunities. This chapter discusses a new scenario of large-scale information availability and efficient data flows.


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0249993
Author(s):  
Paul X. McCarthy ◽  
Xian Gong ◽  
Sina Eghbal ◽  
Daniel S. Falster ◽  
Marian-Andrei Rizoiu

Ever since the web began, the number of websites has been growing exponentially. These websites cover an ever-increasing range of online services that fill a variety of social and economic functions across a growing range of industries. Yet the networked nature of the web, combined with the economics of preferential attachment, increasing returns and global trade, suggests that over the long run a small number of competitive giants are likely to dominate each functional market segment, such as search, retail and social media. Here we perform a large-scale longitudinal study to quantify the distribution of attention given in the online environment to competing organisations. In two large online social media datasets, containing more than 10 billion posts and spanning more than a decade, we tally the volume of external links posted towards each organisation's main domain name as a proxy for the online attention it receives. We also use the Common Crawl dataset, which contains the linkage patterns between more than a billion different websites, to study the patterns of link concentration over the past three years across the entire web. Lastly, we showcase the linking between economic, financial and market data by exploring the relationships between online attention on social media and the growth in enterprise value of the electric carmaker Tesla. Our analysis shows that despite consistent growth in all the macro indicators (the total amount of online attention, the number of organisations with an online presence, and the functions they perform), a smaller number of organisations account for an ever-increasing proportion of total user attention, usually with one large player dominating each function. These results highlight how the evolution of the online economy involves innovation, diversity, and then competitive dominance.
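The link-tally proxy and a standard concentration measure can be sketched as below. The helper names and the choice of the Herfindahl-Hirschman index are illustrative assumptions; the paper's exact methodology may differ.

```python
from collections import Counter
from urllib.parse import urlparse

# Tally external links per domain as a proxy for online attention.
def attention_shares(urls):
    counts = Counter(urlparse(u).netloc.removeprefix("www.") for u in urls)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}

# Herfindahl-Hirschman index: approaches 1.0 as one player captures all attention.
def hhi(shares):
    return sum(s * s for s in shares.values())

posts = ["https://tesla.com/model3", "https://www.tesla.com/news", "https://rivian.com/r1t"]
shares = attention_shares(posts)  # {'tesla.com': 2/3, 'rivian.com': 1/3}
```

Tracking such a concentration index over time is one way to quantify the "one large player per function" trend the study reports.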


2021 ◽  
Author(s):  
Andrew Kamal

CloutContracts is a smart contracts layer on top of, and complementary to, BitClout, and potentially other social media platforms in the future. As a smart contracts layer, it lets creators onboarded to CloutContracts build high-performance DApps with an emphasis on low gas fees, customization, speed and various social aspects. This will eventually allow creators to build large-scale networks and tokenization use cases, and bring blockchain adaptability to their fanbases. Unlike traditional rollup networks or DApp tools, the emphasis is on the creator, adaptability, and expanded functionalities such as modular tools or microservices. CloutContracts aims to ensure that creators do not have to choose between the expanded functionality of running their own blockchain and the accessibility of running on top of an existing network. This creates a new class of blockchain developers, with ease of access spanning everything from the most basic level to running complex lightweight apps in JavaScript or Solidity.


Author(s):  
Uche Ogbuji ◽  
Mark Baker

If you search for books and other media on the Web, you find Amazon, Wikipedia, and many other resources long before you see any libraries. This is a historical problem of librarians' having started ahead of the state of the art in database technologies, and yet unable to keep up with mainstream computing developments, including the Web. As a result, libraries are left with extraordinarily rich catalogs in formats which are unsuited to the Web, and which need a lot of work to adapt for the Web. A first step towards addressing this problem, BIBFRAME is a model developed for representing metadata from libraries and other cultural heritage institutions in linked data form. Libhub is a project building on BIBFRAME to convert traditional library formats, especially MARC/XML, to Web resource pages using BIBFRAME and other vocabulary frameworks. The technology used to implement Libhub transforms MARC/XML to a semi-structured, RDF-like metamodel called Versa, from which various outputs are possible, including data-rich Web pages. The authors developed a pipeline processing technology in Python in order to address the need for high performance and scalability as well as a prodigious degree of customization to accommodate a half century of variations and nuances in library cataloging conventions. The heart of this pipelining system is in the open-source project pybibframe, and the main way to customize the transform for non-technical librarians is a pattern microlanguage called marcpatterns.py. 
Using marcpatterns.py recipes specialized for the first Libhub participant, Denver Public Library, and further refined from patterns common among public libraries, the first prerelease of linked data Web pages has already demonstrated the dramatic improvement in visibility for the library, and the quality, curated content for the Web, made possible through the adaptive, semistructured transform from notoriously abstruse library catalog formats. This paper discusses an unorthodox approach to structured and heuristics-based transformation of a large corpus of XML in a difficult format which does not well serve the richness of its content. It covers some of the pragmatic choices made by developers of the system, who happen to be pioneering advocates of the Web, markup, and the standards around these, but who had to subordinate purity to the urgent need to effect large-scale exposure of dark cultural heritage data in difficult circumstances for a small development and maintenance team. This is a case study of how proper knowledge of XML and its related standards must combine with agile techniques and "worse-is-better" concessions to solve a stubborn problem in extracting value from cultural heritage markup.
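The declarative pattern idea behind marcpatterns.py-style recipes might be sketched as below. The MARC tags and subfields are real cataloguing conventions, but the mapping table and function are invented for illustration and are not the pybibframe API:

```python
# Hypothetical declarative mapping from MARC field/subfield pairs to output
# properties; real recipes also handle indicators, repeats and conditionals.
PATTERNS = {
    ("245", "a"): "bf:title",      # 245$a: title statement
    ("100", "a"): "bf:creator",    # 100$a: main entry, personal name
    ("260", "b"): "bf:publisher",  # 260$b: publisher name
}

def transform_record(marc_fields):
    """marc_fields: list of (tag, subfield_code, value) tuples -> property dict."""
    out = {}
    for tag, code, value in marc_fields:
        prop = PATTERNS.get((tag, code))
        if prop:
            # Strip ISBD trailing punctuation, a typical cataloguing-convention quirk.
            out.setdefault(prop, []).append(value.rstrip(" /:;,."))
    return out

record = [("245", "a", "Moby Dick /"), ("100", "a", "Melville, Herman,")]
props = transform_record(record)
```

Keeping the field-to-property knowledge in a data table rather than code is what lets non-technical librarians customize the transform for local cataloging variations.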


2020 ◽  
Vol 12 (4) ◽  
pp. 64 ◽  
Author(s):  
Qaiser Ijaz ◽  
El-Bay Bourennane ◽  
Ali Kashif Bashir ◽  
Hira Asghar

Modern datacenters are reinforcing their computational power and energy efficiency by assimilating field-programmable gate arrays (FPGAs). The sustainability of this large-scale integration depends on enabling multi-tenant FPGAs. This requisite amplifies the importance of the communication architecture and the virtualization method, with the required features, in order to meet the high-end objective. Consequently, in the last decade, academia and industry have proposed several virtualization techniques and hardware architectures addressing resource management, scheduling, adoptability, segregation, scalability, performance overhead, availability, programmability, time-to-market, security, and, mainly, multitenancy. This paper provides an extensive survey covering three important aspects: a discussion of non-standard terms used in the existing literature, network-on-chip evaluation choices as a means to explore the communication architecture, and virtualization methods under the latest classification. The purpose is to emphasize the importance of choosing an appropriate communication architecture, virtualization technique and standard language to evolve multi-tenant FPGAs in datacenters. None of the previous surveys encapsulated these aspects in a single work. Open problems are indicated for the scientific community as well.


2020 ◽  
Vol 10 (1) ◽  
pp. 357-368
Author(s):  
Farzam Matinfar

Abstract: This paper introduces Wikipedia as an extensive knowledge base that provides additional information about a great number of web resources in the semantic web, and shows how RDF web resources in the web of data can be linked to this encyclopedia. Given an input web resource, the designed system identifies the topic of the resource and links it to the corresponding Wikipedia article. To perform this task, we use the core labeling properties in the web of data to specify the candidate Wikipedia articles for a web resource. Finally, a knowledge-based approach is used to identify the most appropriate article in the Wikipedia database. Evaluation shows the high performance of the designed system.
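The candidate-selection step, matching a resource's labelling-property values (rdfs:label, skos:prefLabel, and similar) against article titles, might look like this sketch; the function names and normalisation are assumptions, not the paper's implementation:

```python
# Hypothetical candidate generation: normalise label strings and look them up
# in an index of Wikipedia article titles.
def candidate_articles(labels, article_titles):
    def norm(s):
        return " ".join(s.lower().split())  # case-fold and collapse whitespace
    index = {norm(t): t for t in article_titles}
    return [index[norm(l)] for l in labels if norm(l) in index]

titles = ["Semantic Web", "Web of Data", "Tim Berners-Lee"]
cands = candidate_articles(["semantic  web", "Linked Data"], titles)
# cands == ["Semantic Web"]
```

A knowledge-based disambiguation step would then score the surviving candidates against the resource's context to pick the single best article.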


Author(s):  
John Domingue ◽  
Dieter Fensel

Abstract: We believe that the future for problem-solving method (PSM) derived work is very promising. In short, PSMs provide a solid foundation for creating a semantic layer supporting planetary-scale networks. Moreover, within a world-scale network where billions of services are used and created by billions of parties in an ad hoc, dynamic fashion, we believe that PSM-based mechanisms provide the only viable approach to dealing with the sheer scale systematically. Our current experiments in this area are based upon a generic ontology for describing Web services derived from earlier work on PSMs. We outline how platforms based on our ontology can support large-scale networked interactivity in three main areas. Within a large European project we are able to map business-level process descriptions to semantic Web service descriptions, enabling business experts to manage and use enterprise processes running in corporate information technology systems. Although highly successful, Web service-based applications predominantly run behind corporate firewalls and are far less pervasive on the general Web. Within a second large European project we are extending our semantic service work using the principles underlying the Web and Web 2.0 to transform the Web from a Web of data to one where services are managed and used at large scale. Significant initiatives are now underway in North America, Asia, and Europe to design a new Internet using a "clean-slate" approach to fulfill the demands created by new modes of use and the additional 3 billion users linked to mobile phones. Our investigations within the European-based Future Internet program indicate that a significant opportunity exists for our PSM-derived work to address the key challenges currently identified: scalability, trust, interoperability, pervasive usability, and mobility. We outline one PSM-derived approach as an exemplar.


2014 ◽  
Vol 14 (4-5) ◽  
pp. 445-459 ◽  
Author(s):  
ILIAS TACHMAZIDIS ◽  
GRIGORIS ANTONIOU ◽  
WOLFGANG FABER

Abstract: Data originating from the Web, sensor readings and social media result in increasingly huge datasets. This so-called Big Data comes with new scientific and technological challenges while creating new opportunities, hence the increasing interest in academia and industry. Traditionally, logic programming has focused on complex knowledge structures/programs, so the question arises whether and how it can work in the face of Big Data. In this paper, we examine how the well-founded semantics can process huge amounts of data through mass parallelization. More specifically, we propose and evaluate a parallel approach using the MapReduce framework. Our experimental results indicate that our approach is scalable and that the well-founded semantics can be applied to billions of facts. To the best of our knowledge, this is the first work that addresses large-scale nonmonotonic reasoning without the restriction of stratification for predicates of arbitrary arity.
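The MapReduce treatment of a single rule application can be sketched as a map phase that keys facts on the shared join variable and a reduce phase that pairs up bindings within each key. This toy single-process sketch (not the paper's implementation) illustrates the idea for a rule p(X,Z) :- q(X,Y), r(Y,Z):

```python
from collections import defaultdict

# Map phase: key every q(X,Y) and r(Y,Z) fact on the shared join variable Y,
# so all facts that can join land in the same bucket (on the same reducer).
def map_phase(q_facts, r_facts):
    buckets = defaultdict(lambda: {"q": [], "r": []})
    for x, y in q_facts:
        buckets[y]["q"].append(x)
    for y, z in r_facts:
        buckets[y]["r"].append(z)
    return buckets

# Reduce phase: within each bucket, pair up bindings to emit derived p(X,Z) facts.
def reduce_phase(buckets):
    return {(x, z) for parts in buckets.values() for x in parts["q"] for z in parts["r"]}

derived = reduce_phase(map_phase([("a", "b")], [("b", "c")]))
# derived == {("a", "c")}
```

Handling negation under the well-founded semantics requires alternating such derivation rounds with rounds that compute unfounded sets, which is where the approach goes beyond plain monotonic materialisation.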

