Swift Logic for Big Data and Knowledge Graphs

Author(s):  
Luigi Bellomarini ◽  
Georg Gottlob ◽  
Andreas Pieris ◽  
Emanuel Sallinger

Many modern companies wish to maintain knowledge in the form of a corporate knowledge graph and to use and manage this knowledge via a knowledge graph management system (KGMS). We formulate various requirements for a fully fledged KGMS. In particular, such a system must be capable of performing complex reasoning tasks while, at the same time, achieving efficient and scalable reasoning over Big Data with acceptable computational complexity. Moreover, a KGMS needs interfaces to corporate databases, the web, and machine-learning and analytics packages. We present knowledge representation and reasoning (KRR) formalisms and a system achieving these goals.

Digital technology has changed rapidly in recent years, and with this change, the number of data systems, sources, and formats has increased exponentially. The process of extracting data from these multiple source systems and transforming it to suit various analytics processes is therefore rapidly gaining importance. For Big Data, transformation is particularly challenging because data generation is a continuous process. In this paper, we extract data from various heterogeneous sources on the web and transform it into a form widely used in data warehousing, so that it caters to the analytical needs of the machine learning community.
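
As a rough illustration of this extract-transform flow, the following Python sketch pulls records from two hypothetical web sources and normalises them into a single flat, warehouse-ready table; the URLs, column names, and output file are placeholders rather than the pipeline used in the paper.

```python
# A minimal extract-transform sketch; sources and column names are hypothetical.
import pandas as pd
import requests

# Extract: pull records from two heterogeneous web sources (placeholder URLs).
api_records = requests.get("https://example.org/api/sales", timeout=30).json()
csv_records = pd.read_csv("https://example.org/exports/sales.csv")

# Transform: normalise both sources to one shared flat schema.
api_df = pd.DataFrame(api_records).rename(columns={"ts": "sale_date", "amt": "amount"})
combined = pd.concat([api_df, csv_records], ignore_index=True)
combined["sale_date"] = pd.to_datetime(combined["sale_date"], errors="coerce")
combined = combined.dropna(subset=["sale_date", "amount"]).drop_duplicates()

# Load: write the cleaned, analysis-ready table for downstream ML/analytics.
combined.to_csv("sales_fact.csv", index=False)
```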


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jui-Chan Huang ◽  
Po-Chang Ko ◽  
Cher-Min Fong ◽  
Sn-Man Lai ◽  
Hsin-Hung Chen ◽  
...  

With the increase in the number of online shopping users, customer loyalty is directly related to product sales. This research explores the statistical modeling and simulation of online shopping customer loyalty based on machine learning and big data analysis, using a machine learning clustering algorithm to model customer loyalty. A k-means interactive mining algorithm based on a hash structure performs data mining on a multidimensional hierarchical tree of corporate credit risk; the support thresholds for the different levels of data mining are adjusted continuously according to specific requirements, and effective association rules are selected until satisfactory results are obtained. After credit risk assessment and early-warning modeling for the enterprise, an initial preselected model is obtained. The information to be collected is first fetched by a web crawler from the target website into a temporary web page database, where it undergoes preprocessing steps such as completion, deduplication, analysis, and extraction. These steps ensure that each crawled page is parsed correctly and guard against corrupt data caused by network errors during crawling; correctly parsed data is stored for subsequent data cleaning and analysis. To parse the HTML documents, a Java program first sets the subject keyword and URL and parses the HTML from the retrieved file or string based on the structure of the website; it then uses CSS selectors to locate the list information on the page, retrieves the data, and stores it in an Elements collection. In the overall fit test of the model, the root mean square error of approximation (RMSEA) is 0.053, between 0.05 and 0.08. The results show that the model designed in this study achieves a relatively good fit and strengthens customers' perception of shopping websites, and that relationship trust plays a major role in maintaining customer loyalty.
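
The clustering step can be illustrated with plain k-means from scikit-learn; this is a minimal sketch, not the paper's hash-structure-based interactive k-means variant, and the customer features and input file are hypothetical.

```python
# A minimal sketch of segmenting customers by loyalty-related features with
# plain k-means; the paper's interactive, hash-structure-based variant is not
# reproduced, and the feature names below are hypothetical.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = pd.read_csv("customers.csv")  # hypothetical input file
features = customers[["purchase_frequency", "avg_order_value", "return_rate"]]

# Standardise features so no single scale dominates the distance metric.
X = StandardScaler().fit_transform(features)

# Cluster into k loyalty segments; k would be tuned, e.g., by silhouette score.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
customers["loyalty_segment"] = kmeans.labels_
print(customers.groupby("loyalty_segment").mean(numeric_only=True))
```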


JAMIA Open ◽  
2020 ◽  
Vol 3 (3) ◽  
pp. 332-337
Author(s):  
Bhuvan Sharma ◽  
Van C Willis ◽  
Claudia S Huettner ◽  
Kirk Beaty ◽  
Jane L Snowdon ◽  
...  

Abstract. Objectives: Describe an augmented intelligence approach to facilitate the update of evidence for associations in knowledge graphs. Methods: New publications are filtered through multiple machine-learning study classifiers, and filtered publications are combined with articles already included as evidence in the knowledge graph. The corpus is then subjected to named entity recognition, semantic dictionary mapping, term vector space modeling, pairwise similarity, and focal entity match to identify highly related publications. Subject matter experts review recommended articles to assess inclusion in the knowledge graph; discrepancies are resolved by consensus. Results: Study classifiers achieved F-scores from 0.88 to 0.94, and similarity thresholds for each study type were determined by experimentation. Our approach reduces the human literature review load by 99%, and over the past 12 months, 41% of recommendations were accepted to update the knowledge graph. Conclusion: Integrated search and recommendation exploiting current evidence in a knowledge graph is useful for reducing human cognitive load.
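
The term vector space modeling and pairwise similarity steps can be sketched as follows; this minimal example assumes abstracts are available as plain strings and omits the authors' named entity recognition, dictionary mapping, and focal entity match stages. The threshold value is a hypothetical stand-in for the experimentally tuned, per-study-type thresholds.

```python
# A minimal sketch of term-vector-space modeling plus pairwise similarity for
# recommending related publications; not the authors' full pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

evidence_abstracts = ["...existing evidence article 1...", "...article 2..."]
candidate_abstracts = ["...newly filtered publication..."]

# Fit the vector space on the evidence corpus, then project candidates into it.
vectorizer = TfidfVectorizer(stop_words="english")
evidence_vecs = vectorizer.fit_transform(evidence_abstracts)
candidate_vecs = vectorizer.transform(candidate_abstracts)

# Pairwise similarity of each candidate to existing evidence; recommend those
# whose best match clears a per-study-type threshold (value hypothetical).
sims = cosine_similarity(candidate_vecs, evidence_vecs)
THRESHOLD = 0.35
recommended = [i for i, row in enumerate(sims) if row.max() >= THRESHOLD]
print("candidates to send for expert review:", recommended)
```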


Semantic Web ◽  
2021 ◽  
pp. 1-20
Author(s):  
Pierre Monnin ◽  
Chedy Raïssi ◽  
Amedeo Napoli ◽  
Adrien Coulet

Knowledge graphs are freely aggregated, published, and edited in the Web of data, and thus may overlap. Hence, a key task resides in aligning (or matching) their content. This task encompasses the identification, within an aggregated knowledge graph, of nodes that are equivalent, more specific, or weakly related. In this article, we propose to match nodes within a knowledge graph by (i) learning node embeddings with Graph Convolutional Networks such that similar nodes have low distances in the embedding space, and (ii) clustering nodes based on their embeddings, in order to suggest alignment relations between nodes of the same cluster. We conducted experiments with this approach on the real-world application of aligning knowledge in the field of pharmacogenomics, which motivated our study. We particularly investigated the interplay between domain knowledge and GCN models with the following two foci. First, we applied inference rules associated with domain knowledge, independently or combined, before learning node embeddings, and we measured the improvements in matching results. Second, while our GCN model is agnostic to the exact alignment relations (e.g., equivalence, weak similarity), we observed that distances in the embedding space are coherent with the "strength" of these different relations (e.g., smaller distances for equivalences), letting us consider clustering and distances in the embedding space as a means to suggest alignment relations in our case study.
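
Step (ii) can be sketched as follows, assuming node embeddings have already been produced by a GCN (step (i), not shown); the embedding file and distance threshold are hypothetical placeholders.

```python
# A minimal sketch of clustering precomputed node embeddings and reading
# intra-cluster distances as a hint of alignment strength; the GCN that
# produces the embeddings is not shown.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.pairwise import euclidean_distances

# Hypothetical GCN output with shape (n_nodes, embedding_dim).
embeddings = np.load("node_embeddings.npy")

# Group nodes whose embeddings are close; each cluster suggests candidate
# alignment relations among its members (threshold is a placeholder).
clusters = AgglomerativeClustering(n_clusters=None, distance_threshold=1.0).fit(embeddings)

# Within a cluster, smaller pairwise distances suggest stronger relations
# (e.g., equivalence), larger ones weaker similarity.
for cid in np.unique(clusters.labels_):
    members = np.where(clusters.labels_ == cid)[0]
    if len(members) > 1:
        dists = euclidean_distances(embeddings[members])
        print(f"cluster {cid}: nodes {members}, max intra-cluster dist {dists.max():.3f}")
```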


2021 ◽  
Author(s):  
Aisha Mohamed ◽  
Ghadeer Abuoda ◽  
Abdurrahman Ghanem ◽  
Zoi Kaoudi ◽  
Ashraf Aboulnaga

Abstract. Knowledge graphs represented as RDF datasets are integral to many machine learning applications. RDF is supported by a rich ecosystem of data management systems and tools, most notably RDF database systems that provide a SPARQL query interface. Surprisingly, machine learning tools for knowledge graphs do not use SPARQL, despite the obvious advantages of using a database system. This is due to the mismatch between SPARQL and machine learning tools in terms of data model and programming style. Machine learning tools work on data in tabular format and process it using an imperative programming style, while SPARQL is declarative and has as its basic operation matching graph patterns to RDF triples. We posit that a good interface to knowledge graphs from a machine learning software stack should use an imperative, navigational programming paradigm based on graph traversal rather than the SPARQL query paradigm based on graph patterns. In this paper, we present RDFFrames, a framework that provides such an interface. RDFFrames provides an imperative Python API that gets internally translated to SPARQL, and it is integrated with the PyData machine learning software stack. RDFFrames enables the user to make a sequence of Python calls to define the data to be extracted from a knowledge graph stored in an RDF database system, and it translates these calls into a compact SPARQL query, executes it on the database system, and returns the results in a standard tabular format. Thus, RDFFrames is a useful tool for data preparation that combines the usability of PyData with the flexibility and performance of RDF database systems.
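
For contrast, the extraction workflow that RDFFrames automates looks roughly like the following when written by hand with SPARQLWrapper and pandas; this is not RDFFrames' own API, and the endpoint and query are illustrative.

```python
# A hand-written version of the kind of extraction RDFFrames automates:
# a SPARQL query against an RDF endpoint whose results are flattened into
# the tabular format PyData tools expect. Endpoint and query are illustrative.
import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?person ?name WHERE {
        ?person a dbo:Scientist ;
                rdfs:label ?name .
        FILTER (lang(?name) = "en")
    } LIMIT 100
""")
endpoint.setReturnFormat(JSON)
rows = endpoint.query().convert()["results"]["bindings"]

# Flatten the SPARQL JSON bindings into a DataFrame for downstream ML use.
df = pd.DataFrame([{k: v["value"] for k, v in r.items()} for r in rows])
print(df.head())
```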


Author(s):  
Alexandros Vassiliades ◽  
Nick Bassiliades ◽  
Filippos Gouidis ◽  
Theodore Patkos

Abstract. In the field of domestic cognitive robotics, it is important to have a rich representation of knowledge about how household objects are related to each other and with respect to human actions. In this paper, we present a domain-dependent knowledge retrieval framework for household environments, constructed by extracting knowledge from the VirtualHome dataset (http://virtual-home.org). The framework provides knowledge about sequences of actions for performing human-scale tasks in a household environment, answers queries about household objects, and performs semantic matching between entities from the web knowledge graphs DBpedia, ConceptNet, and WordNet and those in our knowledge graph. We offer a set of predefined SPARQL templates that directly address the ontology on which our knowledge retrieval framework is built, as well as general querying capabilities through SPARQL; a sketch of such a template appears below. We evaluated our framework via two user evaluations.
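
A minimal sketch of such a template, using rdflib and an invented ontology prefix; the file name and the class and property names are hypothetical stand-ins for the framework's actual ontology.

```python
# A minimal sketch of querying a household KG with a predefined SPARQL
# template via rdflib; prefix, classes, and properties are hypothetical.
from rdflib import Graph

g = Graph()
g.parse("virtualhome_kg.ttl", format="turtle")  # hypothetical KG export

TEMPLATE = """
PREFIX vh: <http://example.org/virtualhome#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?step ?action WHERE {
    ?task a vh:HouseholdTask ;
          rdfs:label %(task_label)s ;
          vh:hasStep ?step .
    ?step vh:action ?action .
}
ORDER BY ?step
"""

# Fill the template for one task and list its action sequence.
for row in g.query(TEMPLATE % {"task_label": '"make coffee"'}):
    print(row.step, row.action)
```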


2021 ◽  
Author(s):  
David Geleta ◽  
Andriy Nikolov ◽  
Gavin Edwards ◽  
Anna Gogleva ◽  
Richard Jackson ◽  
...  

The use of knowledge graphs as a data source for machine learning methods to solve complex problems in life sciences has rapidly become popular in recent years. Our Biological Insights Knowledge Graph (BIKG) combines relevant data for drug development from public as well as internal data sources to provide insights for a range of tasks: from identifying new targets to repurposing existing drugs. Besides the common requirements for organisational knowledge graphs, such as capturing the domain precisely and giving users the ability to search and query the data, the focus on handling multiple use cases and supporting use case-specific machine learning models presents additional challenges: the data models must be streamlined for the performance of downstream tasks; graph content must be easily customisable for different use cases; and different projections of the graph content are required to support a wider range of consumption modes. In this paper we describe the main design choices in the implementation of the BIKG graph and discuss different aspects of its life cycle: from graph construction to exploitation.


2022 ◽  
Vol 13 (1) ◽  
pp. 1-21
Author(s):  
Zhihan Lv ◽  
Ranran Lou ◽  
Hailin Feng ◽  
Dongliang Chen ◽  
Haibin Lv

Two-dimensional arrays of bi-component structures made of cobalt and permalloy elliptical dots with thickness of 25 nm, length of 1 µm, and width of 225 nm have been prepared by a self-aligned shadow deposition technique. Brillouin light scattering has been exploited to study the frequency dependence of thermally excited magnetic eigenmodes on the intensity of the external magnetic field, applied along the easy axis of the elements.

Scientific information technology has developed rapidly; the purpose here is to make people's lives more convenient and to ensure sound information management and classification. A machine learning algorithm is improved to obtain an optimized Light Gradient Boosting Machine (LightGBM) algorithm. An Android-based intelligent support information management system is then designed on top of LightGBM for big data analysis and classification management of the information in the system. The system comprises modules for employee registration and login, company announcements, attendance management, self-service, and daily tools, with the company as the subject. The performance of the constructed information management system is analyzed through simulations. Results demonstrate that the training time of the optimized LightGBM algorithm stabilizes at about 100 s and the test time at 0.68 s. Its accuracy reaches 89.24%, at least 3.6% higher than the other machine learning algorithms considered. Moreover, the acceleration-efficiency analysis of each algorithm suggests that the optimized LightGBM algorithm is well suited to processing large amounts of data: its acceleration effect is more apparent, and its acceleration ratio is higher than those of the other algorithms. Hence, the constructed intelligent support information management system achieves high accuracy while keeping the error low, with an apparent acceleration effect. This model can provide an experimental reference for information classification and management in various fields.
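
A baseline LightGBM classifier of the kind the paper optimises can be sketched with the standard Python API; the paper's optimisations and the Android system integration are not reproduced here, and the dataset is synthetic.

```python
# A minimal baseline LightGBM classifier; the paper's optimised variant and
# system integration are not reproduced, and the data below is synthetic.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the system's information-classification records.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Gradient-boosted decision trees; hyperparameters are illustrative defaults.
model = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05, num_leaves=31)
model.fit(X_train, y_train)

print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```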


Author(s):  
Dipali Navnath Argade ◽  
Shailaja Dilip Pawar ◽  
Vijay Vitthal Thitme ◽  
Amol Dagu Shelkar

Machine learning is the core subarea of AI. It lets computers enter a self-learning mode without explicit programming: when fed new data, these computers learn, grow, change, and develop by themselves. The concept of machine learning has been around for a while now, but the ability to automatically and quickly apply mathematical calculations to big data is only now gaining momentum. Machine learning is used in many places, such as the self-driving Google car, web recommendation engines (friend recommendations on Facebook, offer suggestions from Amazon), and cyber fraud detection. In this paper, we introduce the basic terms of machine learning.
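
The self-learning idea can be illustrated with a minimal scikit-learn example (a hypothetical choice of model and dataset, not tied to this paper): a model is fit on labelled examples and then applied to data it has never seen.

```python
# A minimal illustration of learning from data without explicit programming:
# fit a model on labelled examples, then generalise to unseen data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)  # learn from examples
print("held-out accuracy:", model.score(X_test, y_test))  # apply to new data
```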

