Web Mining to Create Semantic Content: A Case Study for the Environment

Information extraction is a field of natural language processing that converts unstructured text into structured information. Many types of information on the Internet are published as unstructured text on websites, which has created a need for technology that can analyze text and turn the relevant knowledge it contains into structured information. One example of unstructured information is the main content of web pages. Researchers have developed various approaches to information extraction, both manual and automatic, but their accuracy and extraction speed still need to be improved. This research proposes an information extraction approach that combines bootstrapping with Ontology-Based Information Extraction (OBIE). Bootstrapping, which starts from a small seed of labelled data, is used to minimize human involvement in the extraction process, while an ontology guides the extraction of classes, properties and instances in order to provide semantic content for the Semantic Web. Combining the two approaches is expected to increase both the speed of the extraction process and the accuracy of its results. As a case study, the information extraction system is applied to the "LonelyPlanet" dataset.
Keywords: information extraction, ontology, bootstrapping, Ontology-Based Information Extraction, OBIE, performance
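
The abstract does not spell out the extraction algorithm. As a hedged illustration only, here is a minimal pattern-based bootstrapping sketch, assuming a toy ontology in which a class is represented by a handful of seed instances and a "pattern" is simply the two tokens preceding a mention. The class `City`, the seeds, the sentences and the fixed two-token window are all hypothetical and are not taken from the LonelyPlanet dataset.

```python
import re
from collections import Counter

# Hypothetical toy ontology: each class is given a tiny seed set of labelled instances.
SEEDS = {"City": {"Paris", "Tokyo"}}

# Hypothetical unlabelled sentences standing in for the main content of web pages.
CORPUS = [
    "Travellers love the old town of Paris.",
    "Travellers love the old town of Kyoto.",
    "Night markets in Tokyo stay open late.",
    "Night markets in Taipei stay open late.",
    "The food of Lisbon is cheap.",
]

WINDOW = 2  # assumption: a context pattern is the two tokens preceding a mention


def tokenize(sentence):
    return re.findall(r"\w+", sentence)


def learn_patterns(instances, corpus):
    """Collect the WINDOW tokens that precede each known instance."""
    patterns = Counter()
    for sentence in corpus:
        tokens = tokenize(sentence)
        for i in range(WINDOW, len(tokens)):
            if tokens[i] in instances:
                patterns[tuple(tokens[i - WINDOW:i])] += 1
    return patterns


def apply_patterns(patterns, corpus):
    """Propose every token that follows a learned context pattern as a new instance."""
    found = set()
    for sentence in corpus:
        tokens = tokenize(sentence)
        for i in range(WINDOW, len(tokens)):
            if tuple(tokens[i - WINDOW:i]) in patterns:
                found.add(tokens[i])
    return found


def bootstrap(seeds, corpus, iterations=2):
    """Alternate pattern learning and instance extraction, starting from the seeds."""
    instances = set(seeds)
    for _ in range(iterations):
        instances |= apply_patterns(learn_patterns(instances, corpus), corpus)
    return instances


# Attach every extracted instance to its ontology class as a (subject, predicate, object) triple.
triples = sorted(
    (inst, "rdf:type", cls)
    for cls, seed_set in SEEDS.items()
    for inst in bootstrap(seed_set, CORPUS)
)
for t in triples:
    print(t)
```

Starting from two seeds, the learned contexts ("town of", "markets in") pick up Kyoto and Taipei but not Lisbon. A realistic system would also score patterns and limit how many new instances each round accepts, since unchecked bootstrapping tends to drift toward instances of the wrong class.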

The recipes of Philosophy of Science: Characterizing the semantic structure of corpora by means of topic associative rules

PLoS ONE, Vol 15 (11), pp. e0242353, 2020. DOI: 10.1371/journal.pone.0242353

Author(s): Christophe Malaterre, Jean-François Chartier, Francis Lareau

Keyword(s): Philosophy Of Science, Full Text, Topic Modeling, Semantic Content, Semantic Structure, Language Generation, Topological Features, Text Content, F Measure

Scientific articles have semantic contents that are usually quite specific to their disciplinary origins. To characterize such semantic contents, topic-modeling algorithms make it possible to identify topics that run throughout corpora. However, they remain limited when it comes to investigating the extent to which topics are used together in specific documents and form particular associative patterns. Here, we propose to characterize such patterns through the identification of "topic associative rules" that describe how topics are associated within given sets of documents. As a case study, we use a corpus from a subfield of the humanities (the philosophy of science) consisting of the complete full-text content of one of its main journals: Philosophy of Science. On the basis of a pre-existing topic modeling, we develop a methodology with which we infer a set of 96 topic associative rules that characterize specific types of articles depending on how these articles combine topics in peculiar patterns. Such rules offer a finer-grained window onto the semantic content of the corpus and can be interpreted as "topical recipes" for distinct types of philosophy of science articles. Examining rule networks and rule predictive success for different article types, we find a positive correlation between topological features of rule networks (connectivity) and the reliability of rule predictions (as summarized by the F-measure). Topic associative rules thereby not only contribute to characterizing the semantic contents of corpora at a finer granularity than topic modeling, but may also help to classify documents or identify document types, for instance to improve natural language generation processes.
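
The abstract does not detail how the rules are inferred. As a hedged illustration, generic association-rule mining over topic sets (support and confidence thresholds) can be sketched as below. The topic labels, documents and thresholds are hypothetical, and scoring a rule's F-measure by treating it as a predictor of its consequent topic is an assumption; the authors' rules characterize article types, which this toy does not model.

```python
from itertools import combinations

# Hypothetical toy corpus: each article is reduced to the set of topics it uses
# (e.g. topics whose weight in a pre-existing topic model exceeds a threshold).
DOCS = [
    {"causation", "explanation", "biology"},
    {"causation", "explanation"},
    {"causation", "explanation", "physics"},
    {"confirmation", "probability"},
    {"confirmation", "probability", "explanation"},
    {"causation", "physics"},
]


def support(itemset, docs):
    """Fraction of documents that contain every topic in `itemset`."""
    return sum(itemset <= d for d in docs) / len(docs)


def mine_rules(docs, min_support=0.3, min_confidence=0.6):
    """Enumerate rules lhs -> rhs between topic pairs that pass both thresholds."""
    topics = sorted(set().union(*docs))
    found = []
    for a, b in combinations(topics, 2):
        pair_support = support({a, b}, docs)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, docs)
            if confidence >= min_confidence:
                found.append((lhs, rhs, round(pair_support, 3), round(confidence, 3)))
    return found


def f_measure(lhs, rhs, docs):
    """Score lhs -> rhs as a predictor: precision is the rule's confidence,
    recall is the share of rhs-documents that the rule covers."""
    both = support({lhs, rhs}, docs)
    precision = both / support({lhs}, docs)
    recall = both / support({rhs}, docs)
    return 2 * precision * recall / (precision + recall)


rules = mine_rules(DOCS)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup}, confidence={conf}, F={f_measure(lhs, rhs, DOCS):.2f}")
```

Note that rules are directional: here "physics -> causation" holds with confidence 1.0, while "causation -> physics" falls below the confidence threshold.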

Design-by-Analogy: Exploring for Analogical Inspiration With Behavior, Material, and Component-Based Structural Representation of Patent Databases

Journal of Computing and Information Science in Engineering, Vol 19 (2), 2019. DOI: 10.1115/1.4043364

Cited By ~ 2

Author(s): Hyeonik Song, Katherine Fu

Keyword(s): Mechanical Design, Semantic Content, Structural Representation, History Of, Design By Analogy, Computational Methodology, Design Solutions, Analogical Retrieval, Novel Design

Design-by-analogy (DbA) is an important method for innovation that has gained much attention due to its history of leading to successful and novel design solutions. The method uses a repository of existing design solutions in which designers can recognize and retrieve analogical inspiration. Yet exploring for analogical inspiration has been a laborious task for designers. This work presents a computational methodology driven by a topic modeling technique called non-negative matrix factorization (NMF). NMF is widely used in text mining for its ability to discover topics within documents based on their semantic content. In the proposed methodology, NMF is performed iteratively to build hierarchical repositories of design solutions, with which designers can explore clusters of analogical stimuli. The methodology has been applied to a repository of mechanical design-related patents, processed to contain only component-, behavior-, or material-based content, to test whether unique and valuable attribute-based analogical inspiration can be discovered from the different representations of patent data. The hierarchical repositories have been visualized, and a case study has been conducted to test the effectiveness of the analogical retrieval process of the proposed methodology. Overall, this paper demonstrates that the exploration-based computational methodology may give designers greater control over design repositories when retrieving analogical inspiration for DbA practice.
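
To make the NMF step concrete, here is a small sketch using the standard Lee-Seung multiplicative updates for the Frobenius-norm objective, written directly in NumPy. The term list and counts are hypothetical stand-ins for processed patent text, and the `dominant_topic` helper is our own illustrative addition: one plausible reading of "performed iteratively" is to call `nmf` again on the rows of `V` assigned to each topic to build the next level of the hierarchy, but that recursion is an assumption, not the authors' code.

```python
import numpy as np


def nmf(V, k, iterations=500, seed=0, eps=1e-10):
    """Factor a non-negative matrix V (documents x terms) as V ~ W @ H using
    Lee-Seung multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k))  # document-topic weights
    H = rng.random((k, m))  # topic-term weights
    for _ in range(iterations):
        # eps keeps the denominators away from zero
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def dominant_topic(W):
    """Assign each document to the topic with the largest weight in its row of W."""
    return W.argmax(axis=1)


# Hypothetical term counts for four patent-like snippets.
terms = ["gear", "shaft", "torque", "polymer", "resin", "coating"]
V = np.array(
    [
        [3, 2, 2, 0, 0, 0],
        [2, 3, 1, 0, 0, 0],
        [0, 0, 0, 2, 3, 2],
        [0, 0, 1, 3, 2, 2],
    ],
    dtype=float,
)

W, H = nmf(V, k=2)
for t, row in enumerate(H):
    top_terms = [terms[i] for i in np.argsort(row)[::-1][:3]]
    print(f"topic {t}: {top_terms}")
print("document -> topic:", dominant_topic(W))
```

Because every update multiplies non-negative entries by non-negative ratios, `W` and `H` stay non-negative, which is what makes each topic readable as an additive bundle of terms. In practice a library implementation such as scikit-learn's `NMF` would normally be used instead of hand-written updates.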

Using Web Mining in the Analysis of Housing Prices: A Case study of Tehran

2019 5th International Conference on Web Research (ICWR), 2019. DOI: 10.1109/icwr.2019.8765250

Author(s): Rahimberdi Annamoradnejad, Issa Annamoradnejad, Taher Safarrad, Jafar Habibi

Keyword(s): Web Mining, Housing Prices

Web Mining and Analytics for Improving E-Government Services in India

Advances in Data Mining and Database Management - Web Usage Mining Techniques and Applications Across Industries, pp. 223-247, 2017. DOI: 10.4018/978-1-5225-0613-3.ch009

Cited By ~ 2

Author(s): Rajan Gupta, Sunil K. Muttoo, Saibal K. Pal

Keyword(s): State Government, Web Mining, Central Government, Turnaround Time, Developing Nations, Theoretical Background, Practical Case, Electronic Transactions, Form Of Government

The ever-increasing use of technology and globalization have created a need for organizations, states, nations and the entire globe to handle information quickly, accurately and intelligently. Whatever its form of government, every nation has come under pressure to shorten the turnaround time of its interactions with citizens. This pressure gave rise to the concept of e-Governance, which has been implemented by many nations; the UN has even reported an increase in e-Governance activity around the world. However, the major problems that developing nations must address are the digital divide and a lack of e-infrastructure. India started its e-Governance plan through a proposal in 2006, establishing the National e-Governance Plan, popularly known as NeGP, headed by the Ministry of Communications and Information Technology, Government of India. According to the Electronic Transaction and Aggregation Layer, millions of transactions take place on a regular basis. In 2015 alone, Indian citizens carried out over 2 billion transactions across categories and sectors such as agriculture and health. Central government projects alone accounted for around 980 million electronic transactions, while state government projects, combined across all states, accounted for close to 1.2 billion. The data generated by India's e-Governance initiative opens up many opportunities for data analysts and data-mining experts to explore it and derive insights. The aim of this chapter is to introduce the areas and sectors in India where analytics can be applied to e-Governance-related entities such as citizens, corporations and government departments. It will help researchers, academics and students understand the areas of e-Governance where web mining and data analysis can be applied. The theoretical background is supported by a practical case study to aid understanding of the concepts of web analysis and mining in e-Governance.

AL-QuIn

Semantic Web, pp. 52-74, 2013. DOI: 10.4018/978-1-4666-3610-1.ch003

Cited By ~ 1

Author(s): Francesca A. Lisi

Keyword(s): Relational Database, Web Mining, Relational Learning, Pattern Discovery, Frequent Pattern, Learning Approach, Discovery Process, On Line, Multiple Granularity

Onto-Relational Learning is an extension of Relational Learning aimed at accounting for ontologies in a clear, well-founded and elegant manner. The system QuIn supports a variant of the frequent pattern discovery task by following the Onto-Relational Learning approach. It takes taxonomic ontologies into account during the discovery process and produces descriptions of a given relational database at multiple granularity levels. The functionalities of the system are illustrated by means of examples taken from a Semantic Web Mining case study concerning the analysis of relational data extracted from the on-line CIA World Fact Book.
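
The idea of describing data "at multiple granularity levels" can be illustrated with a deliberately simplified, propositional sketch: generalizing items along a taxonomy can turn infrequent specific concepts into a frequent coarser one. This is not the system's actual algorithm, which discovers relational patterns; the taxonomy, the country facts and `min_support` below are hypothetical, loosely inspired by the World Fact Book case study.

```python
from collections import Counter

# Hypothetical two-level taxonomy: specific concept -> more general concept.
TAXONOMY = {
    "Catholicism": "Christianity",
    "Protestantism": "Christianity",
    "Sunni": "Islam",
    "Shia": "Islam",
}

# Hypothetical facts extracted from a relational source: country -> religions.
FACTS = {
    "A": {"Catholicism"},
    "B": {"Protestantism", "Catholicism"},
    "C": {"Sunni"},
    "D": {"Shia", "Sunni"},
    "E": {"Protestantism"},
}


def generalize(items, level):
    """Level 0 keeps the specific concepts; level 1 replaces each by its parent."""
    if level == 0:
        return set(items)
    return {TAXONOMY.get(item, item) for item in items}


def frequent_concepts(facts, level, min_support):
    """Support = fraction of countries whose (generalized) set mentions the concept."""
    counts = Counter()
    for items in facts.values():
        counts.update(generalize(items, level))  # a set, so each country counts once
    n = len(facts)
    return {c: k / n for c, k in counts.items() if k / n >= min_support}


for level in (0, 1):
    print(level, frequent_concepts(FACTS, level, min_support=0.5))
# 0 {}
# 1 {'Christianity': 0.6}
```

No specific religion reaches 50% support, but once concepts are lifted one level in the taxonomy, Christianity does; this is the kind of effect that taking taxonomic ontologies into account during discovery makes visible.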

Web mining based framework for solving usual problems in recommender systems. A case study for movies' recommendation

Neurocomputing, Vol 176, pp. 72-80, 2016. DOI: 10.1016/j.neucom.2014.10.097

Cited By ~ 32

Author(s): María N. Moreno, Saddys Segrera, Vivian F. López, María Dolores Muñoz, Ángel Luis Sánchez

Keyword(s): Recommender Systems, Web Mining

The Case Study for the Basic Information Service of Job Post Resource Based on Web Mining

2012 International Conference on Computer Science and Service System, 2012. DOI: 10.1109/csss.2012.131

Cited By ~ 1

Author(s): Qingxia Kong, Yang Cai, Quanyin Zhu

Keyword(s): Web Mining, Information Service, Basic Information

QUALE LOGICA (PER LA FISICA)? [Which Logic (for Physics)?]

Istituto Lombardo - Accademia di Scienze e Lettere - Incontri di Studio, 2019. DOI: 10.4081/incontri.2019.462

Author(s): Marco Erba

Keyword(s): Quantum Theory, Scientific Practice, Present Report, Semantic Content, Scientific Activity, Logical System, Empirical Regularity, Peculiar Case, Mathematics And Physics

The preeminent motivation for scientific practice, stated in a weak form, can be recognized in the identification of recurring phenomena (that is, empirical regularities), together with their manipulation, both experimental and theoretical. One can therefore raise the question of whether it is necessary to adopt a set of rules for the logical inferential process in order to assign a syntax, a semantic content and possibly an interpretation to the empirical evidence. According to Aristotle, non-contradiction is "the firmest principle of all": irrefutable, since otherwise the very possibility of formulating thoughts fails. This report sets out the consequences of rejecting some of the laws of classical logic, such as non-contradiction. Such a possibility sheds light on a plurality of logical systems, some features of which, significant for mathematics and physics, are examined; for instance, the relevance of dialetheism and intuitionism is discussed. The report also discusses the basis on which one should choose the logical system to adopt for scientific activity. Quantum theory, a peculiar case study, serves as the common thread (fil rouge) throughout.

Web Mining in e-Procurement: A Case Study in Indonesia

2021 3rd Asia Pacific Information Technology Conference, 2021. DOI: 10.1145/3449365.3449382

Author(s): Julius Dimas Trisaktyo Nugroho, Rahmad Mahendra, Indra Budi

Keyword(s): Web Mining
