LIMES: A Framework for Link Discovery on the Semantic Web

Author(s):  
Axel-Cyrille Ngonga Ngomo ◽  
Mohamed Ahmed Sherif ◽  
Kleanthi Georgala ◽  
Mofeed Mohamed Hassan ◽  
Kevin Dreßler ◽  
...  

Abstract The Linked Data paradigm builds upon a backbone of distributed knowledge bases connected by typed links. The sheer volume of current knowledge bases, as well as their number, poses two major challenges when aiming to support the computation of links across and within them. The first is that tools for link discovery have to be time-efficient when computing links. The second is that these tools have to produce links of high quality to serve the applications built upon Linked Data well. Solutions to the second problem build upon efficient computational approaches developed to solve the first and combine these with dedicated machine learning techniques. The current version of the Limes framework is the product of seven years of research on these two challenges. A series of machine learning techniques and efficient computation approaches were developed and integrated into this framework to address the link discovery problem. The framework combines these diverse algorithms within a generic and extensible architecture. In this article, we give an overview of version 1.7.4 of the open-source release of the framework. In particular, we focus on the architecture of the framework, an intuition of its inner workings, and a brief overview of the approaches it contains. Descriptions of applications in which the framework has been used complete the paper. Our framework is open-source and available under a GNU license at https://github.com/dice-group/LIMES, together with a user manual and a developer manual.
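To make the task concrete, the following is a minimal sketch of the link discovery problem that LIMES addresses, not LIMES's actual API: given resources from two knowledge bases, emit owl:sameAs links for pairs whose labels are sufficiently similar. The data, names, and the naive pairwise loop are illustrative assumptions; LIMES itself uses declarative link specifications and far more efficient algorithms.

```python
# Minimal, hypothetical sketch of link discovery (NOT the LIMES API):
# compare labels of resources from two knowledge bases and emit an
# owl:sameAs link whenever the similarity exceeds a threshold.
from difflib import SequenceMatcher

source = {"ex:Berlin": "Berlin", "ex:Paris": "Paris"}       # hypothetical KB 1
target = {"dbr:Berlin": "Berlin", "dbr:Parijs": "Paris"}    # hypothetical KB 2

def similarity(a: str, b: str) -> float:
    """Simple string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

THRESHOLD = 0.9
links = [
    (s, "owl:sameAs", t)
    for s, s_label in source.items()
    for t, t_label in target.items()
    if similarity(s_label, t_label) >= THRESHOLD
]
for triple in links:
    print(triple)
```

The loop above is quadratic in the number of resources; the time-efficiency challenge named in the abstract is precisely about avoiding such exhaustive comparisons.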

2021 ◽  
Vol 1804 (1) ◽  
pp. 012133
Author(s):  
Mahmood Shakir Hammoodi ◽  
Hasanain Ali Al Essa ◽  
Wial Abbas Hanon

Software maintainability is a vital quality aspect according to ISO standards. It has been a concern for decades and remains a top priority today. At present, the majority of software applications, particularly open source software, are developed using object-oriented methodologies. Researchers have previously applied statistical techniques to metric data extracted from software in order to evaluate maintainability. More recently, machine learning models and algorithms have also been used in a majority of research works to predict maintainability. In this research, we performed an empirical case study on the open source software JFreeChart by applying machine learning algorithms. The objective was to study the relationships between certain metrics and maintainability.
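As a hedged illustration of this kind of study (on synthetic data, not the actual JFreeChart measurements), one can regress a maintainability proxy on object-oriented metrics and inspect the cross-validated fit; the metric names below are common CK metrics, assumed here purely for illustration.

```python
# Hypothetical sketch: learn the relationship between object-oriented
# metrics and a maintainability proxy. All values are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Columns: WMC, CBO, LCOM, DIT -- common CK metrics per class (assumed).
X = rng.integers(1, 50, size=(200, 4)).astype(float)
# Maintainability proxy (e.g., change count), a noisy function of the metrics.
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0.0, 2.0, size=200)

model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"Mean cross-validated R^2: {scores.mean():.2f}")
```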


Electronics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1421
Author(s):  
Haechan Park ◽  
Nakhoon Baek

With the growth of artificial intelligence and deep learning technology, many active research efforts apply the related techniques in various fields. To test and apply the latest machine learning techniques in gaming, it is very useful to have a lightweight game engine for quick prototyping. Our game engine is implemented in a cost-effective way, compared to well-known commercial proprietary game engines, by utilizing open source products. Due to its simple internal architecture, our game engine is especially well suited to modifying and reviewing new functions through quick, repetitive tests. In addition, the game engine has a DNN (deep neural network) module, with which it can apply deep learning algorithms to game features in real time. Our DNN module uses a simple C++ function interface rather than additional programming languages and/or scripts. This simplicity enables us to apply machine learning techniques to game applications more efficiently and casually. We also encountered some technical issues during development, mostly while integrating various open source products into a single game engine. We present details of these technical issues and our solutions.
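The paper's module exposes inference through a C++ function interface; the following Python sketch only illustrates the design idea of a single plain function call for DNN inference inside a game loop, with all weights and the game state invented for the example.

```python
# Analogous sketch of a single-call DNN interface (the actual module is C++):
# game code passes the current state to one function and gets action scores back.
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical weights of a tiny two-layer network controlling an NPC.
W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 2)), np.zeros(2)

def dnn_infer(game_state: np.ndarray) -> np.ndarray:
    """Single-call inference: game state in, action scores out."""
    hidden = np.maximum(0.0, game_state @ W1 + b1)  # ReLU hidden layer
    return hidden @ W2 + b2

# In the game loop: pick an action each frame from the current state.
state = np.array([0.2, -1.0, 0.5, 0.0])
action = int(np.argmax(dnn_infer(state)))
print("chosen action:", action)
```

Keeping inference behind one function like this is what lets game code call the DNN per frame without touching a scripting layer.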


Author(s):  
Yuan Zhao ◽  
Tieke He ◽  
Zhenyu Chen

Assigning bug reports to individual developers is typically a manual, time-consuming, and tedious task. Although some machine learning techniques have been adopted to alleviate this dilemma, they mainly focus on open source projects, which use traditional repositories such as Bugzilla to manage their bug reports. With the boom of the mobile Internet, new requirements and methods of software testing are emerging, especially crowdsourced testing. Unlike traditional channels, whose bug reports are often heavyweight (standardized, with detailed attribute localization), bug reports tend to be lightweight in the context of crowdsourced testing. To exploit the differences in bug report assignment in this new setting, a unified bug report assignment framework is proposed in this paper. The framework is capable of handling both traditional heavyweight bug reports and lightweight ones by (i) first preprocessing the bug reports and selecting features, (ii) then tuning the parameters that indicate the ratios of choosing different methods to vectorize bug reports, and (iii) finally applying classification algorithms to assign bug reports. Extensive experiments are conducted on three datasets to evaluate the proposed framework. The results indicate the applicability of the proposed framework and also reveal the differences in bug report assignment between traditional repositories and crowdsourced ones.
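A hedged sketch of the three stages named above, on invented toy data: (i) preprocess the report text, (ii) vectorize it (plain TF-IDF here, whereas the paper tunes ratios between several vectorization methods), and (iii) classify each report to a developer.

```python
# Toy illustration of the three-stage assignment pipeline; reports and
# developer names are invented, and the ratio-tuning step is simplified
# to a single TF-IDF vectorizer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reports = [
    "app crashes on login screen",
    "null pointer exception in payment module",
    "UI button misaligned on settings page",
    "login token expires too early",
]
assignees = ["alice", "bob", "carol", "alice"]  # hypothetical developers

assigner = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),  # stages (i) + (ii)
    MultinomialNB(),                                        # stage (iii)
)
assigner.fit(reports, assignees)
print(assigner.predict(["crash when user logs in"]))  # likely -> ['alice']
```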


2014 ◽  
Vol 62 (3) ◽  
pp. 193-201 ◽  
Author(s):  
Fabio Ribeiro Cerqueira ◽  
Tiago Geraldo Ferreira ◽  
Alcione de Paiva Oliveira ◽  
Douglas Adriano Augusto ◽  
Eduardo Krempser ◽  
...  

2021 ◽  
Author(s):  
Pittawat Taveekitworachai ◽  
Jonathan H. Chan

The Krathu-500 corpus contains 574 Pantip post titles and bodies together with all comments on each post, for a total of 63,293 comments. The corpus provides Thai as used in real-life situations, in conversational form, across various contexts and types. It serves as a good resource for improving the capability of machine learning techniques that deal with the Thai language. A smaller, sentiment-labeled version of the comments dataset is also provided, with 6,306 records. The labeled corpus is a human-annotated dataset with three labels: negative, neutral, and positive. The project also comprises an open-source repository that allows anyone interested to modify and build on top of the current source code and dataset.
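A possible usage sketch for the sentiment-labeled subset follows; the file name and column names are assumptions for illustration, not the repository's documented layout, so the actual schema should be checked in the Krathu-500 repository.

```python
# Hypothetical usage: train a three-class (negative/neutral/positive)
# sentiment classifier on the labeled comments. File and column names
# are assumed, not taken from the repository.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("krathu500_sentiment.csv")  # assumed file name
X_train, X_test, y_train, y_test = train_test_split(
    df["comment"], df["label"], test_size=0.2, random_state=0
)

# Character n-grams are a reasonable default for Thai, which is written
# without spaces between words.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")
```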


2022 ◽  
Vol 80 (1) ◽  
Author(s):  
Romana Haneef ◽  
Mariken Tijhuis ◽  
Rodolphe Thiébaut ◽  
Ondřej Májek ◽  
Ivan Pristaš ◽  
...  

Abstract Background The capacity to use data linkage and artificial intelligence to estimate and predict health indicators varies across European countries. However, estimating health indicators from linked administrative data is challenging for several reasons, such as variability in data sources and data collection methods (resulting in reduced interoperability at various levels and reduced timeliness), the availability of a large number of variables, and a lack of skills and capacity to link and analyze big data. The main objective of this study is to develop methodological guidelines for calculating population-based health indicators, to guide European countries using linked data and/or machine learning (ML) techniques with new methods. Method We systematically performed the following step-wise approach to develop the methodological guidelines: (i) a scientific literature review, (ii) identification of inspiring examples from European countries, and (iii) development of a checklist of the guidelines' contents. Results We have developed methodological guidelines that provide a systematic approach for studies using linked data and/or ML techniques to produce population-based health indicators. These guidelines include a detailed checklist of the following items: rationale and objective of the study (i.e., the research question), study design, linked data sources, study population/sample size, study outcomes, data preparation, data analysis (i.e., statistical techniques, sensitivity analysis, and potential issues during data analysis), and study limitations. Conclusions This is the first study to develop methodological guidelines for studies focused on population health using linked data and/or machine learning techniques. These guidelines would support researchers in adopting and developing a systematic approach for high-quality research methods. There is a need for high-quality research methodologies using more linked data and ML techniques to develop a structured, cross-disciplinary approach to improving population health information and thereby population health.


Author(s):  
Sebastian Hellmann ◽  
Jens Lehmann ◽  
Sören Auer

The vision of the Semantic Web aims to make use of semantic representations on the largest possible scale: the Web. Large knowledge bases such as DBpedia, OpenCyc, and GovTrack are emerging and are freely available as Linked Data and SPARQL endpoints. Exploring and analysing such knowledge bases is a significant hurdle for Semantic Web research and practice. As one possible direction for tackling this problem, the authors present an approach for obtaining complex class expressions from objects in knowledge bases by using machine learning techniques. The chapter describes in detail how to leverage existing techniques to achieve scalability on large knowledge bases available as SPARQL endpoints or Linked Data. The algorithms are made available in the open source DL-Learner project, and the chapter presents several real-life scenarios in which they can be used by Semantic Web applications.
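As a hedged sketch of the data-gathering step such approaches depend on, the following pulls instances from the public DBpedia SPARQL endpoint to serve as positive examples for concept learning; the actual class expression learning is performed by DL-Learner itself, and this only illustrates endpoint access. The query and class are illustrative choices, not taken from the chapter.

```python
# Fetch candidate positive examples from a SPARQL endpoint (DBpedia).
# The query and class are illustrative; DL-Learner would consume such
# URIs as positive examples of a learning problem.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT ?city WHERE {
        ?city a dbo:City ;
              dbo:country dbr:Germany .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
positives = [b["city"]["value"] for b in results["results"]["bindings"]]
print(positives)  # URIs usable as positive examples for concept learning
```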

