Hierarchical Interpretable Topical Embeddings for Exploratory Search and Real-Time Document Tracking

Author(s):  
Anastasia Ianina ◽  
Konstantin Vorontsov

Real-time monitoring of scientific papers and technological news requires fast processing of complex search queries aimed at acquiring thematically relevant information. For this purpose, the authors develop an exploratory search engine based on probabilistic hierarchical topic modeling. A topic model gives a low-dimensional, sparse, interpretable vector representation (topical embedding) of a text, which is used for ranking documents by their similarity to the query. They explore several ways of comparing topical vectors, including searching with thematically homogeneous text segments. Topical hierarchies are built using the regularized EM algorithm from the BigARTM project. The topic-based search achieves better precision and recall than other approaches (TF-IDF, fastText, LSTM, BERT) and even human assessors, who spend up to an hour completing the same search task. They also find that blending hierarchical topic vectors with pretrained neural embeddings is a promising way of enriching both models, raising precision and recall above 90%.
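
As a rough illustration of ranking documents by the similarity of their topical embeddings to a query, the sketch below scores sparse topic vectors with cosine similarity. Cosine similarity is only one possible comparison and is an assumption here; the abstract does not say which measures the authors use. The function names (`cosine`, `rank_documents`) and the toy topic weights are hypothetical and are not taken from BigARTM.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {topic_id: weight}."""
    dot = sum(w * v.get(topic, 0.0) for topic, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical sparse topic distributions (illustrative values only).
docs = {
    "doc_a": {0: 0.7, 3: 0.3},
    "doc_b": {1: 0.9, 2: 0.1},
    "doc_c": {0: 0.5, 1: 0.5},
}
query = {0: 0.8, 3: 0.2}

ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Storing each vector as a dictionary keeps only the nonzero topics, which suits the sparsity the abstract describes.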

2020 ◽  
pp. 1-19
Author(s):  
Fernando Cantú-Bazaldúa

World economic aggregates are compiled infrequently and released after considerable lags. There are, however, many potentially relevant series released in a timely manner and at a higher frequency that could provide significant information about the evolution of global aggregates. The challenge is then to extract the relevant information from this multitude of indicators and combine it to track the real-time evolution of the target variables. We develop a methodology based on dynamic factor models adapted for variables with heterogeneous frequencies, ragged ends and missing data. We apply this methodology to nowcast global trade in goods and services. In addition to monitoring these variables in real time, this method can also be used to obtain short-term forecasts based on the most up-to-date values of the underlying indicators.


Database ◽  
2021 ◽  
Vol 2021 ◽  
Author(s):  
Valerio Arnaboldi ◽  
Jaehyoung Cho ◽  
Paul W Sternberg

Finding relevant information from newly published scientific papers is becoming increasingly difficult due to the pace at which articles are published every year as well as the increasing amount of information per paper. Biocuration and model organism databases provide a map for researchers to navigate through the complex structure of the biomedical literature by distilling knowledge into curated and standardized information. In addition, scientific search engines such as PubMed and text-mining tools such as Textpresso allow researchers to easily search for specific biological aspects from newly published papers, facilitating knowledge transfer. However, digesting the information returned by these systems, often a large number of documents, still requires considerable effort. In this paper, we present Wormicloud, a new tool that summarizes scientific articles in a graphical way through word clouds. This tool is aimed at facilitating the discovery of new experimental results not yet curated by model organism databases and is designed for both researchers and biocurators. Wormicloud is customized for the Caenorhabditis elegans literature and provides several advantages over existing solutions, including being able to perform full-text searches through Textpresso, which provides more accurate results than other existing literature search engines. Wormicloud is integrated through direct links from gene interaction pages in WormBase. Additionally, it allows analysis on the gene sets obtained from literature searches with other WormBase tools such as SimpleMine and Gene Set Enrichment. Database URL: https://wormicloud.textpressolab.com


2020 ◽  
Vol 10 (24) ◽  
pp. 9154
Author(s):  
Paula Morella ◽  
María Pilar Lambán ◽  
Jesús Royo ◽  
Juan Carlos Sánchez ◽  
Jaime Latapia

The purpose of this work is to develop a new Key Performance Indicator (KPI) that quantifies the cost of the Six Big Losses defined by Nakajima and to implement it in a Cyber Physical System (CPS), achieving real-time monitoring of the KPI. The methodology is as follows. A cost model is used together with the description of the Six Big Losses to develop the indicator accurately. At the same time, the machine tool has been integrated into a CPS, enhancing real-time data acquisition using Industry 4.0 technologies. Once the KPI was defined, we developed software (in Python) that turns these real-time data into relevant information by calculating our indicator. Finally, we carried out a case study showing the results of our new KPI and comparing them with other indicators related to the Six Big Losses but in different dimensions. As a result, our research quantifies the Six Big Losses economically, improves the detection of the largest ones so that they can be addressed, and highlights the importance of considering the productive, sustainable, and economic dimensions at the same time.


2021 ◽  
pp. 147387162110481
Author(s):  
Haijun Yu ◽  
Shengyang Li

Hyperspectral images (HSIs) have become increasingly prominent as they can maintain the subtle spectral differences of the imaged objects. Designing approaches and tools for analyzing HSIs presents a unique set of challenges due to their high-dimensional characteristics. An improved color visualization approach is proposed in this article to achieve communication between users and HSIs in the field of remote sensing. Under the real-time interactive control and color visualization, this approach can help users intuitively obtain the rich information hidden in original HSIs. Using the dimensionality reduction (DR) method based on band selection, high-dimensional HSIs are reduced to low-dimensional images. Through drop-down boxes, users can freely specify images that participate in the combination of RGB channels of the output image. Users can then interactively and independently set the fusion coefficient of each image within an interface based on concentric circles. At the same time, the output image will be calculated and visualized in real time, and the information it reflects will also be different. In this approach, channel combination and fusion coefficient setting are two independent processes, which allows users to interact more flexibly according to their needs. Furthermore, this approach is also applicable for interactive visualization of other types of multi-layer data.
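
The two independent steps described above, choosing which reduced band images feed each RGB channel and then setting a fusion coefficient per image, can be sketched as a per-channel weighted blend. This is only a minimal sketch: the abstract does not give the fusion formula, so normalizing by the sum of the coefficients is an assumption, and the names `fuse_channel` and `compose_rgb` and the tiny example bands are hypothetical.

```python
def fuse_channel(bands, coeffs):
    """Blend equally sized 2-D band images (lists of rows) into one channel.

    Assumption: the channel is the coefficient-weighted average of the bands.
    """
    if not bands or len(bands) != len(coeffs):
        raise ValueError("need one coefficient per selected band")
    total = sum(coeffs)
    if total <= 0:
        raise ValueError("fusion coefficients must sum to a positive value")
    rows, cols = len(bands[0]), len(bands[0][0])
    return [
        [sum(c * band[r][k] for band, c in zip(bands, coeffs)) / total
         for k in range(cols)]
        for r in range(rows)
    ]


def compose_rgb(selected, coeffs):
    """Build the output image from per-channel band selections and coefficients."""
    return {ch: fuse_channel(selected[ch], coeffs[ch]) for ch in ("R", "G", "B")}


# Hypothetical 1x2 band images left after band selection (values in [0, 1]).
band1 = [[0.0, 1.0]]
band2 = [[1.0, 1.0]]
band3 = [[0.5, 0.0]]

# Step 1: channel combination. Step 2: fusion coefficients, set independently.
selected = {"R": [band1, band2], "G": [band2], "B": [band3]}
coeffs = {"R": [1.0, 3.0], "G": [1.0], "B": [2.0]}

rgb = compose_rgb(selected, coeffs)
print(rgb)
```

Keeping `selected` and `coeffs` as separate inputs mirrors the independence of the two processes: changing a coefficient recomputes the image without touching the channel assignment.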


Algorithms ◽  
2018 ◽  
Vol 11 (7) ◽  
pp. 97
Author(s):  
Lin Tang ◽  
Lin Liu ◽  
Jianhou Gan

Author(s):  
Allison Ragan ◽  
Tessa Sommer ◽  
Frank Drews

This study examined the effect of humor on airline safety information retention. Passenger attention to pre-flight safety demonstrations is low, even though it may impact the chance of survival in an aviation accident. Airlines have employed humor and entertainment to educate passengers on safety information. This study explored whether a humorous presentation increases retention of safety information or whether humor acts as a distraction from safety-relevant information. Participants viewed two pre-flight safety demonstration videos (humorous and non-humorous) in counterbalanced order, then answered short-answer questions about the content of the videos. Retention scores after viewing either type of video for the first time were the same. However, when a humorous video was shown prior to a standard safety video, retention scores for safety material dropped. These findings suggest that humorous safety demonstrations may be more effective, not because they are best at conveying information, but because passengers do not attend to standard videos if they have previously been exposed to a humorous version.


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Alexandros Andre Chaaraoui ◽  
Francisco Flórez-Revuelta

This paper presents a novel silhouette-based feature for vision-based human action recognition, which relies on the contour of the silhouette and a radial scheme. Its low dimensionality and ease of extraction make it well suited to real-time scenarios. This feature is used in a learning algorithm that, by means of model fusion of multiple camera streams, builds a bag of key poses, which serves as a dictionary of known poses and allows converting the training sequences into sequences of key poses. These are used to perform action recognition by means of a sequence matching algorithm. Experimentation on three different datasets returns high and stable recognition rates. To the best of our knowledge, this paper presents the highest results so far on the MuHAVi-MAS dataset. The method is suitable for real-time use, since it easily performs above video frequency. Therefore, the related requirements that applications such as ambient-assisted living services impose are successfully fulfilled.


Author(s):  
Jyotsna Talreja Wassan

The digitization of the world in various areas, including the health care domain, has brought about remarkable changes. Electronic Health Records (EHRs) have emerged for maintaining and analyzing real health care data online, unlike the traditional paper-based system, to accelerate the clinical environment and provide better healthcare. These digitized health care records are a form of Big Data, not only because they are voluminous but also because they are real-time, dynamic, sporadic and heterogeneous in nature. It is desirable to extract relevant information from EHRs to assist the various stakeholders of the clinical environment. The role, scope and impact of the Big Data paradigm on health care are discussed in this chapter.


Author(s):  
Iris Xie

Online catalogs are types of interactive computer systems; they can also be called “interactive catalogs” because a user interacts with the computer to find relevant information. The interaction is the main difference between Online Public Access Catalogs (OPACs) and other types of library catalogs (Hildreth, 1982; Matthews, 1985). Online catalogs are regarded as real-time interactive retrieval systems for libraries (Fayen, 1983). According to Peters (1991), the history of online catalogs can be characterized by three decades of development. In the 1960s, the development of online catalogs was led by the development of computer technology and the library community’s desire to increase efficiency in finding library materials. In the 1970s, commercial vendors started to replace large university libraries as the principal developers of computer-based library systems. In the 1980s, local libraries expanded their control of the library catalog systems.

