Patent Retrieval Experiments in the Context of the CLEF IP Track 2009

Author(s):  
Daniela Becks ◽  
Christa Womser-Hacker ◽  
Thomas Mandl ◽  
Ralph Kölle

Patents are critical intellectual assets for any competitive business. With ever-increasing patent filings, effective prior art search has become an essential task in patent retrieval, a subfield of information retrieval (IR). The goal of prior art search is to find and rank documents related to a query patent. Query formulation is a key step in prior art search: the structure of the patent is exploited to generate queries from the various fields available in the patent text. Because a patent can span multiple technical domains, this work argues that technical domain and patent structure have a combined effect on the effectiveness of patent retrieval. The study uses International Patent Classification (IPC) codes to categorize query patents into eight technical domains and explores eighteen different combinations of patent fields to generate search queries. A total of 144 retrieval experiments (eight domains by eighteen field combinations) were carried out using the BM25 ranking algorithm. Retrieval performance is evaluated in terms of recall over the top 1000 retrieved records. The empirical results support this assumption, and a two-way analysis of variance is conducted to validate the hypotheses. The findings may help patent information retrieval professionals develop domain-specific patent retrieval systems that exploit the patent structure.
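The pipeline described in this abstract (build a query from selected patent fields, rank candidates with BM25, measure recall in the top k) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the field names, the toy documents, the whitespace tokenizer, and the parameter values `k1=1.2` and `b=0.75` are all assumptions, and the IDF formula is one common BM25 variant.

```python
import math
from collections import Counter


def build_query(patent, fields):
    """Concatenate the lowercased tokens of the chosen patent fields (hypothetical field names)."""
    return [tok for f in fields for tok in patent[f].lower().split()]


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def recall_at_k(ranked_ids, relevant_ids, k=1000):
    """Fraction of the relevant documents that appear in the top k of the ranking."""
    top = set(ranked_ids[:k])
    return len(top & set(relevant_ids)) / len(relevant_ids)


# Toy collection and query patent (illustrative data only).
collection = {
    "D1": "battery electrode lithium coating".split(),
    "D2": "engine piston cooling".split(),
    "D3": "lithium battery separator membrane".split(),
}
query_patent = {
    "title": "Lithium battery",
    "abstract": "Electrode coating for battery",
}

query = build_query(query_patent, ["title", "abstract"])
ids = list(collection)
scores = bm25_scores(query, [collection[i] for i in ids])
ranked = [i for i, _ in sorted(zip(ids, scores), key=lambda p: p[1], reverse=True)]
relevant = {"D1", "D3"}
```

Changing the list passed to `build_query` corresponds to trying a different field combination; repeating the run per domain and per combination would yield the kind of experiment grid the abstract describes.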


Author(s):  
Jianxi Luo ◽  
Binyang Song ◽  
Lucienne Blessing ◽  
Kristin Wood

Traditionally, design opportunities and directions are conceived based on expertise, intuition, or time-consuming user studies and marketing research at the fuzzy front end of the design process. Herein, we propose the use of the total technology space map (TSM) as a visual ideation aid for rapidly conceiving high-level design opportunities. The map is comprised of various technology domains positioned according to knowledge proximity, which is measured based on a large quantity of patent data. It provides a systematic picture of the total technology space to enable stimulated ideation beyond the designer's knowledge. Designers can browse the map and navigate various technologies to conceive new design opportunities that relate different technologies across the space. We demonstrate the process of using TSM as a rapid ideation aid and then analyze its applications in two experiments to show its effectiveness and limitations. Furthermore, we have developed a cloud-based system for computer-aided ideation, that is, InnoGPS, to integrate interactive map browsing for conceiving high-level design opportunities with domain-specific patent retrieval for stimulating concrete technical concepts, and to potentially embed machine-learning and artificial intelligence in the map-aided ideation process.


Author(s):  
Thomas Mandl

In the 1960s, automatic indexing methods for texts were developed. They already implemented the “bag-of-words” approach, which still prevails. Although automatic indexing is widely used today, many information providers and even Internet services still rely on human information work. In the 1970s, research shifted its interest to partial-match retrieval models and proved their superiority over Boolean retrieval models. Vector-space and later probabilistic retrieval models were developed. However, it took until the 1990s for partial-match models to succeed in the market. The Internet played a great role in this success. All Web search engines were based on partial-match models and provided ranked lists as results rather than unordered sets of documents. Consumers got used to this kind of search system, and all big search engines included partial-match functionality. However, there are many niches in which Boolean methods still dominate, for example, patent retrieval. The basis for information retrieval systems may be pictures, graphics, videos, music objects, structured documents, or combinations thereof. This article is mainly concerned with information retrieval for text documents.
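The contrast drawn above between Boolean (exact-match) retrieval and partial-match retrieval can be illustrated with a small sketch. It assumes a toy bag-of-words collection, a strict AND query for the Boolean case, and TF-IDF weighting with cosine similarity as one simple vector-space model; none of these specifics come from the article itself.

```python
import math
from collections import Counter


def boolean_and(query_terms, docs):
    """Exact match: the unordered set of ids whose documents contain every query term."""
    return {doc_id for doc_id, tokens in docs.items()
            if all(t in tokens for t in query_terms)}


def tfidf_vectors(docs):
    """Build bag-of-words TF-IDF vectors; returns (vectors, idf)."""
    n = len(docs)
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = {doc_id: {t: c * idf[t] for t, c in Counter(tokens).items()}
               for doc_id, tokens in docs.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ranked_search(query_terms, docs):
    """Partial match: ids of documents with non-zero similarity, best first."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query_terms).items()}
    scored = [(cosine(q, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [doc_id for score, doc_id in scored if score > 0]


# Toy collection (illustrative data only).
docs = {
    "A": "patent retrieval with boolean queries".split(),
    "B": "ranked retrieval for web search".split(),
    "C": "music recommendation".split(),
}
query = ["patent", "retrieval", "search"]

exact = boolean_and(query, docs)      # no document contains all three terms
partial = ranked_search(query, docs)  # documents sharing at least one term
```

In this toy case the Boolean query returns nothing, because no single document contains all three terms, while the ranked search still surfaces the two documents that match part of the query.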


2019 ◽  
Vol 28 (4) ◽  
pp. 558-569
Author(s):  
Ana B Gil-González ◽  
Andrea Vázquez-Ingelmo ◽  
Fernando de la Prieta ◽  
Ana de Luis-Reboredo ◽  
Alfonso González-Briones

A patent is a property right granted to any new shape, configuration or arrangement of elements, of any device, tool, instrument, mechanism or other object or part thereof, that allows for a better or different operation, use or manufacture of the object that incorporates it or that provides it with some utility, advantage or technical effect that it did not have before. As a document, a patent is a title that recognizes the right to exploit the patented invention exclusively, preventing others from making, selling or using it without the consent of the owner. Patenting is motivated by the aim of promoting creativity: because only one person holds the patent, competition in the market is hindered, which protects the initial investment and guards against plagiarism. Patents are available to the public for dissemination and general knowledge. It is generally recognized in the specialized literature that patents can be used as an indicator of the results generated by research and development activities, and that they are very useful for measuring various social, economic or technological aspects. For this reason, it is of interest to have tools or systems that make it possible to obtain the patents developed in a specific period of time and to analyse various economic and social factors. Such analyses can offer a perspective on society's progress in the technological field. This paper proposes a platform specifically designed to obtain knowledge about patents as an indicator of Spanish social, economic or technological aspects. For this purpose, the platform provides retrieval, analysis and visualization functionalities that represent data on the patent landscape, using data obtained from the Spanish Patent and Trademark Office (OEPM) as a particular case study.


2020 ◽  
Vol 27 (8) ◽  
pp. 1891-1912
Author(s):  
Hengqin Wu ◽  
Geoffrey Shen ◽  
Xue Lin ◽  
Minglei Li ◽  
Boyu Zhang ◽  
...  

Purpose: This study proposes an approach to solve the fundamental problem in using query-based methods (i.e. search engines and patent retrieval tools) to screen patents of information and communication technology in construction (ICTC). The fundamental problem is that ICTC incorporates various techniques and thus cannot be simply represented by man-made queries. To address this concern, this study develops a binary classifier that uses deep learning and NLP techniques to automatically identify whether a patent is relevant to ICTC, thus accurately screening a corpus of ICTC patents.
Design/methodology/approach: This study employs NLP techniques to convert the textual data of patents into numerical vectors. Then, a supervised deep learning model is developed to learn the relations between the input vectors and outputs.
Findings: The validation results indicate that (1) the proposed approach performs better in screening ICTC patents than traditional machine learning methods; (2) besides the United States Patent and Trademark Office (USPTO), which provides structured and well-written patents, the approach could also accurately screen patents from the Derwent Innovations Index (DIX), in which patents are written in different genres.
Practical implications: This study contributes a specific collection of ICTC patents, which is not provided by the patent offices.
Social implications: The proposed approach contributes an alternative way of gathering a corpus of patents for domains like ICTC that neither exist as a searchable classification in patent offices nor are accurately represented by man-made queries.
Originality/value: A deep learning model with two layers of neurons is developed to learn the non-linear relations between the input features and outputs, providing better performance than traditional machine learning models. This study uses advanced NLP techniques, namely lemmatization and part-of-speech (POS) tagging, to process the textual data of ICTC patents.


2011 ◽  
Vol 47 (3) ◽  
pp. 309-322
Author(s):  
Yen-Liang Chen ◽  
Yu-Ting Chiu
