Interfaces for Searching and Triaging Large Document Sets: An Ontology-Supported Visual Analytics Approach

Information ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 8
Author(s):  
Jonathan Demelo ◽  
Kamran Sedig

We investigate the design of ontology-supported, progressively disclosed visual analytics interfaces for searching and triaging large document sets. The goal is to distill a set of criteria that can help guide the design of such systems. We begin with a background on information search, triage, machine learning, and ontologies. We then review research on the multi-stage information-seeking process to distill the criteria. To demonstrate their utility, we apply the criteria to the design of a prototype visual analytics interface: VisualQUEST (Visual interface for QUEry, Search, and Triage). VisualQUEST allows users to plug and play document sets and expert-defined ontology files within a domain-independent environment for multi-stage information search and triage tasks. We describe VisualQUEST through a functional workflow and conclude with a discussion of ongoing formative evaluations, limitations, and future work, followed by a summary.
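
The abstract does not say how VisualQUEST uses its ontology files internally. As a purely hypothetical illustration of one common way an expert-defined ontology can support search and triage, the Python sketch below expands a query concept to all of its narrower concepts and ranks documents by how many of the expanded terms they mention. The toy ontology, the documents, and the names `expand_concept` and `triage` are invented for this example; real ontology files (e.g. OWL) would need a proper parser.

```python
from collections import deque

# Hypothetical toy ontology: each concept maps to its narrower (child) concepts.
ONTOLOGY = {
    "cardiovascular disease": ["hypertension", "arrhythmia"],
    "arrhythmia": ["atrial fibrillation"],
    "hypertension": [],
    "atrial fibrillation": [],
}

def expand_concept(ontology, concept):
    """Return the concept plus all of its descendants (breadth-first walk)."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for child in ontology.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def triage(documents, ontology, concept):
    """Rank document ids by how many expanded-concept terms each text mentions."""
    terms = expand_concept(ontology, concept)
    scored = []
    for doc_id, text in documents.items():
        lowered = text.lower()
        # Naive substring matching; a real system would tokenize and normalize.
        hits = sum(1 for term in terms if term in lowered)
        if hits:
            scored.append((hits, doc_id))
    # Most hits first; ties broken by document id so the output is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]
```

Expanding the query before matching lets a search for a broad concept surface documents that only mention its narrower terms, which is the kind of support an ontology can add over plain keyword search.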

2021 ◽  
Vol 11 (3-4) ◽  
pp. 1-38
Author(s):  
Rita Sevastjanova ◽  
Wolfgang Jentner ◽  
Fabian Sperrle ◽  
Rebecca Kehlbeck ◽  
Jürgen Bernard ◽  
...  

Linguistic insight in the form of high-level relationships and rules in text forms the basis of our understanding of language. However, the data-driven generation of such structures often lacks labeled resources that can be used as training data for supervised machine learning. Creating such ground-truth data is time-consuming and often requires domain expertise to resolve textual ambiguities and characterize linguistic phenomena. Furthermore, creating and refining machine learning models is often challenging for linguists, as the models are often complex, opaque, and difficult to understand. To tackle these challenges, we present a visual analytics technique for interactive data labeling that applies concepts from gamification and explainable Artificial Intelligence (XAI) to support complex classification tasks. The visual-interactive labeling interface promotes the creation of effective training data. Visual explanations of learned rules reveal the decisions of the machine learning model and support iterative and interactive optimization. The gamification-inspired design guides the user through the labeling process and provides feedback on model performance. As an instance of the proposed technique, we present QuestionComb, a workspace tailored to the task of question classification (i.e., classifying questions as information-seeking vs. non-information-seeking). Our evaluation studies confirm that gamification concepts are beneficial for engaging users through continuous feedback, offering an effective visual analytics technique when combined with active learning and XAI.
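
The abstract combines the labeling interface with active learning but does not name a query strategy. As a stand-in, the sketch below uses margin sampling, a standard uncertainty-based strategy: the next question to label is the one whose two most probable classes are closest. The probability vectors and the names `margin`, `select_query` and `labeling_order` are hypothetical.

```python
def margin(probabilities):
    """Gap between the two most probable classes; a small gap means an uncertain prediction."""
    if len(probabilities) < 2:
        raise ValueError("margin needs at least two class probabilities")
    top, second = sorted(probabilities, reverse=True)[:2]
    return top - second

def select_query(pool):
    """Return the id of the unlabeled instance with the smallest margin."""
    return min(pool, key=lambda instance_id: margin(pool[instance_id]))

def labeling_order(pool):
    """Order in which instances would be queried if predictions stayed fixed.

    In a real loop the model is retrained after every label, so the
    probabilities would change; they are frozen here to keep the sketch short.
    """
    remaining = dict(pool)
    order = []
    while remaining:
        chosen = select_query(remaining)
        order.append(chosen)
        del remaining[chosen]
    return order

# Hypothetical [information-seeking, non-information-seeking] probabilities.
pool = {"q1": [0.9, 0.1], "q2": [0.55, 0.45], "q3": [0.7, 0.3]}
```

Here `select_query(pool)` picks `"q2"`, the question the model is least sure about; labeling it first is the usual rationale for uncertainty-driven selection.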


2021 ◽  
Vol 11 (3-4) ◽  
pp. 1-42
Author(s):  
Jürgen Bernard ◽  
Marco Hutter ◽  
Michael Sedlmair ◽  
Matthias Zeppelzauer ◽  
Tamara Munzner

Strategies for selecting the next data instance to label, in service of generating labeled data for machine learning, have been considered separately in the machine learning literature on active learning and in the visual analytics literature on human-centered approaches. We propose a unified design space for instance selection strategies to support detailed and fine-grained analysis covering both of these perspectives. We identify a concise set of 15 properties, namely measurable characteristics of datasets or of machine learning models applied to them, that cover most of the strategies in these literatures. To quantify these properties, we introduce Property Measures (PMs) as fine-grained building blocks that can be used to formalize instance selection strategies. In addition, we present a taxonomy of PMs to support the description, evaluation, and generation of PMs across four dimensions: machine learning (ML) Model Output, Instance Relations, Measure Functionality, and Measure Valence. We also create computational infrastructure to support qualitative visual data analysis: a visual analytics explainer for PMs, built around an implementation of PMs using cascades of eight atomic functions. It supports eight analysis tasks, covering the analysis of datasets and ML models using visual comparison within and between PMs and groups of PMs, and over time during the interactive labeling process. We iteratively refined the PM taxonomy, the explainer, and the task abstraction in parallel with each other during a two-year formative process, and show evidence of their utility through a summative evaluation with the same infrastructure. This research builds a formal baseline for better understanding the commonalities and differences of instance selection strategies, which can serve as a stepping stone for the synthesis of novel strategies in future work.
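
The abstract describes PMs as building blocks implemented as cascades of atomic functions, but it does not list those eight functions. The sketch below is a hypothetical illustration of the cascade idea only: it chains a few invented atomic steps into a density-style measure, loosely in the spirit of the Instance Relations dimension. None of the step or function names come from the paper.

```python
import math
from functools import reduce

def cascade(*steps):
    """Chain single-argument functions left to right into one measure."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)

def density_measure(dataset, k):
    """Hypothetical density-style PM: higher when an instance's k nearest neighbours are close."""
    def distances_from(index):
        # Distances from one instance to every other instance in the dataset.
        return [math.dist(dataset[index], other)
                for j, other in enumerate(dataset) if j != index]

    measure = cascade(
        distances_from,                   # instance index -> list of distances
        sorted,                           # ascending order
        lambda ds: ds[:k],                # keep the k nearest neighbours
        lambda ds: sum(ds) / len(ds),     # mean neighbour distance
        lambda mean: 1.0 / (1.0 + mean),  # invert so dense regions score high
    )
    return [measure(i) for i in range(len(dataset))]

scores = density_measure([(0, 0), (0, 1), (1, 0), (10, 10)], k=2)
```

Swapping a single step, for example replacing the mean with a maximum, yields a different measure from the same parts, which is the appeal of describing measures as compositions of small functions.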


2020 ◽  
Vol 13 (5) ◽  
pp. 1020-1030
Author(s):  
Pradeep S. ◽  
Jagadish S. Kallimani

Background: With the advent of data analysis and machine learning, there is a growing impetus for analyzing historical data and generating models from it. Data comes in numerous forms and shapes, with an abundance of challenges. The form of data most amenable to analysis is numerical data: with the plethora of available algorithms and tools, such data is quite manageable. Another form of data is categorical, which is subdivided into ordinal (ordered categories) and nominal (unordered, named categories). Such data can also be broadly classified as sequential or non-sequential; sequential data is easier to preprocess using algorithms. Objective: This paper deals with the challenge of applying machine learning algorithms to categorical data of a non-sequential nature. Methods: Applying several data analysis algorithms directly to such data yields biased results, which makes it impossible to generate a reliable predictive model. In this paper, we address this problem by walking through a handful of techniques that, during our research, helped us deal with a large categorical dataset of a non-sequential nature. In subsequent sections, we discuss possible implementable solutions and the shortfalls of these techniques. Results: The methods are applied to sample datasets available in the public domain, and the results in terms of classification accuracy are satisfactory. Conclusion: The best preprocessing technique we observed in our research is one-hot encoding, which breaks categorical features down into binary features that can be fed into an algorithm to predict the outcome. The example we used is not abstract but a real-time production services dataset with many complex variations of categorical features. Our future work includes creating a robust model on such data and deploying it in industry-standard applications.
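
One-hot encoding, which the authors identify as the best preprocessing technique, replaces each nominal feature with one binary column per distinct category, so no artificial order is imposed on the categories. A minimal pure-Python sketch follows; the column names and rows are invented, and in practice pandas (`get_dummies`) or scikit-learn (`OneHotEncoder`) would normally be used.

```python
def fit_categories(rows, column):
    """Collect the sorted set of distinct values seen in one categorical column."""
    return sorted({row[column] for row in rows})

def one_hot(rows, columns):
    """Replace each nominal column with one 0/1 indicator per category."""
    vocab = {col: fit_categories(rows, col) for col in columns}
    header = [f"{col}={value}" for col in columns for value in vocab[col]]
    encoded = []
    for row in rows:
        vector = []
        for col in columns:
            # Exactly one indicator per column is 1: the row's own category.
            vector.extend(1 if row[col] == value else 0 for value in vocab[col])
        encoded.append(vector)
    return header, encoded

# Hypothetical rows resembling a services dataset.
rows = [
    {"service": "db", "region": "eu"},
    {"service": "web", "region": "us"},
    {"service": "db", "region": "us"},
]
header, encoded = one_hot(rows, ["service", "region"])
```

The vocabulary is fixed from the training rows, so categories that first appear at prediction time need explicit handling; scikit-learn's `OneHotEncoder` offers `handle_unknown="ignore"` for this.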


2021 ◽  
Vol 40 (5) ◽  
pp. 9471-9484
Author(s):  
Yilun Jin ◽  
Yanan Liu ◽  
Wenyu Zhang ◽  
Shuai Zhang ◽  
Yu Lou

With the advancement of machine learning, credit scoring can be performed more effectively. As one of the widely recognized machine learning methods, ensemble learning has demonstrated significant improvements in predictive accuracy over individual machine learning models for credit scoring. This study proposes a novel multi-stage ensemble model with multiple K-means-based selective undersampling for credit scoring. First, a new multiple K-means-based undersampling method is proposed to deal with imbalanced data. Then, a new selective sampling mechanism is proposed to adaptively select the better-performing base classifiers. Finally, a new feature-enhanced stacking method is proposed to construct an effective ensemble model by combining the shortlisted base classifiers. In the experiments, four datasets and four evaluation indicators are used to evaluate the performance of the proposed model, and the experimental results demonstrate the superiority of the proposed model over other benchmark models.
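
The abstract does not detail its multiple K-means procedure or the selective mechanism. The sketch below shows only the basic idea of cluster-based undersampling under simple assumptions: cluster the majority class into as many clusters as the desired sample size and keep the real sample nearest each centroid. It uses a single deterministic k-means run in pure Python; the function names are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's k-means; centroids start at the first k points so results are deterministic."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment

def kmeans_undersample(majority, target_size):
    """Shrink the majority class to at most target_size samples, one per cluster."""
    k = min(target_size, len(majority))
    if k <= 0:
        return []
    centroids, assignment = kmeans(majority, k)
    kept = []
    for c, centroid in enumerate(centroids):
        members = [p for p, a in zip(majority, assignment) if a == c]
        if members:
            # Keep the real sample closest to the centroid, not the centroid itself.
            kept.append(min(members, key=lambda p: math.dist(p, centroid)))
    return kept

majority = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
```

Picking one representative per cluster keeps the retained majority samples spread across the class's regions, unlike random undersampling, which can discard whole regions by chance.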


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3616
Author(s):  
Jan Ubbo van Baardewijk ◽  
Sarthak Agarwal ◽  
Alex S. Cornelissen ◽  
Marloes J. A. Joosen ◽  
Jiska Kentrop ◽  
...  

Early detection of exposure to a toxic chemical, e.g., in a military context, can be life-saving. We propose to use machine learning techniques and multiple continuously measured physiological signals to detect exposure and to identify the chemical agent. Such detection and identification could be used to alert individuals to take appropriate medical countermeasures in time. As a first step, we evaluated whether exposure to an opioid (fentanyl) or a nerve agent (VX) could be detected in freely moving guinea pigs using features from respiration, electrocardiography (ECG) and electroencephalography (EEG), where machine learning models were trained and tested on different sets of animals (across-subject classification). Results showed that this was possible with close to perfect accuracy, with respiratory features being the most relevant. Exposure detection accuracy rose steeply to over 95% correct during the first five minutes after exposure. Additional models were trained to classify an exposed state as induced by either fentanyl or VX. This was possible with an accuracy of almost 95%, with EEG features proving most relevant. Exposure detection models that were trained on subsets of animals generalized to subsets of animals that were exposed to other dosages of different chemicals. While future work is required to validate the principle in other species and to assess the robustness of the approach under different, realistic circumstances, our results indicate that using different continuously measured physiological signals for early detection and identification of toxic agents is promising.
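
Across-subject classification means a model is always tested on animals it was not trained on, so it cannot exploit individual-specific signal characteristics. The abstract says only that training and test sets differed; the sketch below assumes a leave-one-subject-out scheme, one common way to enforce this, with a hypothetical `(subject_id, features, label)` sample layout.

```python
def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test) splits grouped by subject id.

    Each sample is a (subject_id, features, label) tuple; no subject ever
    appears in both the training and the test fold of the same split.
    """
    subjects = sorted({subject for subject, _, _ in samples})
    for held_out in subjects:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test

# Hypothetical samples: two windows from gp1, one each from gp2 and gp3.
samples = [
    ("gp1", [0.1], 0),
    ("gp1", [0.2], 1),
    ("gp2", [0.3], 0),
    ("gp3", [0.4], 1),
]
splits = list(leave_one_subject_out(samples))
```

Splitting by window instead of by animal would let windows from the same animal land in both folds, which typically inflates accuracy; grouping by subject avoids that leak.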


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Kyoungsik Na

Purpose: This study explores the effects of cognitive load on the propensity to reformulate queries during information seeking on the web. Design/methodology/approach: This study employs an experimental design to analyze the effect of cognitive load manipulations on the propensity for query reformulation between experimental and control groups. In total, three affective components that contribute to cognitive load were manipulated: mental demand, temporal demand and frustration. Findings: A significant difference in the propensity for query reformulation was found between searchers exposed to cognitive load manipulations and searchers who were not. Those exposed to cognitive load manipulations made half as many query reformulations as searchers who were not exposed. Furthermore, the National Aeronautics and Space Administration Task Load Index (NASA-TLX) cognitive load scores of searchers exposed to the three cognitive load manipulations were higher than those of searchers who were not exposed, indicating that the manipulation was effective. Query reformulation behavior did not differ across task types. Originality/value: The findings suggest that a dual-task method and NASA-TLX assessment serve as good indicators of cognitive load. Because the findings show that cognitive load hinders a searcher's interaction with information search tools, this study provides empirical support for reducing cognitive load when designing information systems or user interfaces.
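
NASA-TLX collects 0 to 100 ratings on six subscales: mental, physical and temporal demand, performance, effort, and frustration. The abstract does not say which scoring variant was used, so the sketch shows both standard ones: Raw TLX, the unweighted mean, and weighted TLX, where each weight is the number of times a subscale was chosen across 15 pairwise comparisons. The dictionary keys and example ratings are invented.

```python
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")

def raw_tlx(ratings):
    """Raw TLX: unweighted mean of the six 0-100 subscale ratings."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)

def weighted_tlx(ratings, weights):
    """Weighted TLX: weights count wins in the 15 pairwise comparisons."""
    if sum(weights[s] for s in SUBSCALES) != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15

# Hypothetical ratings from one participant.
ratings = {"mental": 70, "physical": 10, "temporal": 60,
           "performance": 40, "effort": 50, "frustration": 70}
weights = {"mental": 5, "physical": 0, "temporal": 4,
           "performance": 1, "effort": 2, "frustration": 3}
```

For these ratings, `raw_tlx(ratings)` is 50.0, while the weighted score is higher because the subscales this participant weighted most (mental and temporal demand, frustration) are also the highest rated.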


2019 ◽  
Vol 9 (2) ◽  
pp. 81-86
Author(s):  
H. M. Shashikala ◽  
S. Srinivasaragavan

Web-based use of e-resources plays a vital role in information seeking. In this context, the present study examined the use of e-resources (e-books, e-journals and e-databases subscribed to through the Health Science Library and Information Network (HELINET) Consortium and the ERMED Consortium) by the faculty members and PG students of Kempegowda Institute of Medical Sciences and Information Centre, Bangalore, Karnataka State. A structured questionnaire was designed and distributed to 150 faculty members and PG students to assess how effectively they use e-resources for study, teaching and research. A total of 135 completed questionnaires were received, and the response rate was 82.66%. The results show that most teaching faculty and PG students preferred Google and Yahoo as search engines for their information needs. At the same time, they consulted PubMed, ScienceDirect and Ovid's journal databases to access e-resources.


2019 ◽  
Vol 8 (1) ◽  
pp. 562
Author(s):  
Yuni Rahmah ◽  
Elva Rahmah

Abstract: This paper discusses the information search behavior of the millennial generation in meeting its information needs. The study aims to describe the information-seeking behavior of the millennial generation at Padang State University. Data were collected through observation and questionnaires distributed to students of the Indonesian and Regional Languages and Literature Department at Padang State University. The analysis led to the following conclusions.
(1) Starting: activities that initiate information seeking. All respondents (100%) determine the topic before conducting an information search; 90% search for information after discussion or consultation with lecturers; 95.23% recognize their information needs while attending lectures; 88.4% know their information needs specifically; and 88% search for information once they are aware of a need for it.
(2) Chaining: following chains of citations or other forms of referential connection between documents. 92.8% use a bibliography to search for information, 90.4% use author names from a core reference to look for other references, and 92.9% use subjects from a core reference to look for other references.
(3) Browsing: semi-directed searching in areas considered likely to contain the needed information. 73% report that libraries can always meet their information needs; 95.2% look for information on the internet if it is not found in printed sources; 92.8% go directly to the internet if the information they need is not found in printed sources; and 45.22% identify queries (keywords).
(4) Differentiating: using features of information sources as a basic reference for checking the quality of their content. 88% consider the internet their main source of information, and 92.84% state that printed sources are still very much needed to fulfill information needs.
(5) Monitoring: keeping up to date by focusing on selected sources. 88.09% look for the latest information on the internet by searching for recent articles, and 78.56% need the latest information to enrich their reference sources.
(6) Extracting: systematically working through one source to retrieve information considered important. 90.47% often use search engines (Google, Yahoo) when they need information; 88.09% often use journal databases to obtain information; 78.56% copy information directly after finding it on the internet; and 76.19% use the Google search engine because they find it more relevant than other search engines.
Keywords: behavior, millennial generation, information.


Author(s):  
Lion Schulz ◽  
Stephen M. Fleming ◽  
Peter Dayan

The metacognitive sense of confidence can play a critical role in regulating decision-making. In particular, a lack of confidence can justify the explicit, potentially costly, instrumental acquisition of extra information that might resolve the underlying uncertainty. Recent work has suggested a statistically sophisticated tapestry behind the information governing both the making and monitoring of choices. Here, we extend this tapestry to reveal extra richness in the use of confidence for controlling information seeking. We thereby highlight how different models of metacognition can generate diverse relationships between action, confidence, and information search. More broadly, our work shows how crucial it can be to treat metacognitive monitoring and control together.
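
To make the idea of confidence controlling information seeking concrete, the sketch below uses a deliberately simple Bayesian observer; it is not one of the authors' models. It assumes two equally likely states s = +1 or -1 and evidence x drawn from N(s·μ, σ²). The observer chooses the sign of x, its confidence is the posterior probability of that choice, 1 / (1 + exp(-2μ|x|/σ²)), and it seeks more information whenever confidence falls below a hypothetical fixed threshold.

```python
import math

def confidence(evidence, mu=1.0, sigma=1.0):
    """Posterior probability that the chosen option (sign of evidence) is correct.

    Assumes two equally likely states s = +1/-1 and evidence ~ N(s * mu, sigma^2).
    """
    return 1.0 / (1.0 + math.exp(-2.0 * mu * abs(evidence) / sigma ** 2))

def decide(evidence, threshold=0.8, mu=1.0, sigma=1.0):
    """Return ('seek', None) when confidence is below threshold, else ('choose', option)."""
    if confidence(evidence, mu, sigma) < threshold:
        return ("seek", None)
    return ("choose", 1 if evidence >= 0 else -1)
```

With these defaults, weak evidence such as `decide(0.1)` triggers a request for more information, while `decide(2.0)` commits to option 1. Richer metacognitive models would replace the fixed threshold with one that weighs the cost of extra information against its expected benefit.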

