A Topic Modeling Approach to Study Design Requirements

2021 ◽  
Author(s):  
Cheng Chen ◽  
Jesse Mullis ◽  
Beshoy Morkos

Abstract Risk management is vital to a product’s lifecycle. The current practice of reducing risk relies on domain experts or management tools to identify unexpected engineering changes, approaches that are prone to human error and laborious operation. To address this, this study presents a framework that contributes to requirements management by implementing a generative probabilistic model, supervised latent Dirichlet allocation (LDA) with collapsed Gibbs sampling (CGS), to study the topic composition of three unlabeled and unstructured industrial requirements documents. As finding the preferred number of topics remains an open question, a case study estimates an appropriate number of topics to represent each requirements document based on both perplexity and coherence values. Using human evaluations and interpretable visualizations, the results demonstrate the different levels of design detail obtained by varying the number of topics. Further, a relevance measurement provides the flexibility to improve the quality of topics. Designers can increase design efficiency by understanding, organizing, and analyzing high-volume requirements documents in configuration management based on topics across different domains. With domain knowledge and purposeful interpretation of topics, designers can make informed decisions on product evolution and mitigate the risk of unexpected engineering changes.
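The core sampler behind LDA can be illustrated with a minimal pure-Python sketch. This implements plain unsupervised LDA with collapsed Gibbs sampling on a toy tokenized corpus; the paper's supervised variant additionally conditions on document-level response variables, which is omitted here, and all data and names are illustrative.

```python
import random

def lda_cgs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Plain (unsupervised) LDA fitted with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    widx = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * n_topics for _ in docs]      # doc -> topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic -> word counts
    nk = [0] * n_topics                       # tokens per topic
    z = []                                    # topic assignment per token
    for d, doc in enumerate(docs):            # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1; nkw[k][widx[w]] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(iters):                    # collapsed Gibbs sweeps
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = z[d][i], widx[w]
                ndk[d][k] -= 1; nkw[k][wi] -= 1; nk[k] -= 1
                # full conditional p(z_i = t | z_-i, w)
                wts = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                       for t in range(n_topics)]
                r = rng.random() * sum(wts)
                acc, new_k = 0.0, n_topics - 1  # fallback guards float round-off
                for t, wgt in enumerate(wts):
                    acc += wgt
                    if r < acc:
                        new_k = t
                        break
                z[d][i] = new_k
                ndk[d][new_k] += 1; nkw[new_k][wi] += 1; nk[new_k] += 1
    top_words = [[vocab[i] for i in sorted(range(V), key=lambda i: -nkw[t][i])[:3]]
                 for t in range(n_topics)]
    return top_words, ndk
```

On a corpus with two clearly separated vocabularies, the two recovered topics typically align with them; perplexity and coherence, as used in the paper, would then guide the choice of `n_topics`.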

2021 ◽  
pp. 016555152110077
Author(s):  
Sulong Zhou ◽  
Pengyu Kan ◽  
Qunying Huang ◽  
Janet Silbernagel

Natural disasters cause significant damage, casualties, and economic losses. Twitter has been used to support prompt disaster response and management because people tend to communicate and spread information on public social media platforms during disaster events. To retrieve real-time situational awareness (SA) information from tweets, the most effective way to mine the text is natural language processing (NLP). Among advanced NLP models, supervised approaches can classify tweets into different categories to gain insight and leverage useful SA information from social media data. However, high-performing supervised models require domain knowledge to specify categories and involve costly labelling tasks. This research proposes a guided latent Dirichlet allocation (LDA) workflow to investigate temporal latent topics in tweets during a recent disaster event, the 2020 Hurricane Laura. By integrating prior knowledge, a coherence model, LDA topic visualisation, and validation against official reports, our guided approach reveals that most tweets contain several latent topics during the 10-day period of Hurricane Laura. This result indicates that state-of-the-art supervised models have not fully utilised tweet information because they assign each tweet only a single label. In contrast, our model can not only identify emerging topics during different disaster events but also provide multilabel references to the classification schema. In addition, our results can help responders, stakeholders, and the general public quickly identify and extract SA information so that they can adopt timely response strategies and wisely allocate resources during hurricane events.
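One common way to implement the "guided" part of guided LDA is to seed topics through an asymmetric topic-word Dirichlet prior, so seed words pull their topic toward a known theme. The sketch below shows that generic mechanism; the paper's exact formulation may differ, and the seed words are invented.

```python
def seeded_eta(vocab, seed_topics, base=0.01, boost=1.0):
    """Build an asymmetric topic-word Dirichlet prior: each seed word gets
    extra prior mass in the topic it is meant to guide."""
    idx = {w: i for i, w in enumerate(vocab)}
    eta = [[base] * len(vocab) for _ in seed_topics]
    for t, seeds in enumerate(seed_topics):
        for w in seeds:
            if w in idx:                 # ignore seeds missing from the vocabulary
                eta[t][idx[w]] += boost
    return eta
```

The resulting `eta` matrix would replace the symmetric prior in a standard LDA sampler, biasing (but not forcing) seed words toward their designated topics.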


2022 ◽  
Vol 24 (3) ◽  
pp. 0-0

In this digital era, people are keen to share their feedback about products, services, and current issues on social networks and other platforms. A fine-grained analysis of this feedback can give a clear picture of what people think about a particular topic. This work proposes an almost unsupervised aspect-based sentiment analysis approach for textual reviews. Latent Dirichlet allocation, along with linguistic rules, is used for aspect extraction. Aspects are ranked by their probability distribution values and then clustered into predefined categories using frequent terms with domain knowledge. The SentiWordNet lexicon is used for sentiment scoring and classification. Experiments on two popular datasets show the superiority of our strategy compared to existing methods, with 85% average accuracy when tested on manually labeled data.
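The lexicon-based scoring step can be sketched as follows. The lexicon here is a tiny invented stand-in for SentiWordNet (which actually scores synsets rather than raw words), so treat this only as a minimal illustration of lexicon-based polarity classification.

```python
# Hypothetical mini-lexicon in the SentiWordNet style: word -> (pos, neg) scores.
LEXICON = {
    "great": (0.75, 0.0),
    "excellent": (0.875, 0.0),
    "poor": (0.0, 0.625),
    "terrible": (0.0, 0.75),
}

def classify_aspect_sentence(tokens, lexicon=LEXICON):
    """Sum positive and negative lexicon scores over the tokens of one
    aspect-bearing sentence and return a polarity label."""
    pos = sum(lexicon.get(t, (0.0, 0.0))[0] for t in tokens)
    neg = sum(lexicon.get(t, (0.0, 0.0))[1] for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

In the full pipeline, this classifier would run per aspect-bearing sentence after LDA and the linguistic rules have extracted the aspects.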


The term “Big data” refers to “high-volume data sets that are relatively complex in nature and that challenge conventional database management tools in processing and analysis”. In the digital universe, the volume and variety of data we deal with today have grown massively from different sources such as business informatics, social media networks, high-definition TV images, mobile network data, banking data from ATM machines, genomics and GPS trails, telemetry from automobiles, meteorology, financial market data, etc. Data scientists confirm that 80% of the data gathered today is in unstructured format, i.e., images, pixel data, videos, geospatial data, PDF files, etc. Because of this massive growth of data and its varied formats, organizations face multiple challenges in capturing, storing, mining, analyzing, and visualizing Big data. This paper aims to exemplify the key challenges faced by most organizations and the significance of implementing emerging Big data techniques for effective extraction of business intelligence to make better and faster decisions.


2021 ◽  
pp. 139-150
Author(s):  
Jakub Flotyński ◽  
Paweł Sobociński ◽  
Sergiusz Strykowski ◽  
Dominik Strugała ◽  
Paweł Buń ◽  
...  

Domain-specific knowledge representation is an essential element of efficient management of professional training. Formal and powerful knowledge representation for training systems can be built upon semantic web standards, which enable reasoning and complex queries against the content. Virtual reality training is currently used in multiple domains, particularly when the activities are potentially dangerous for trainees or require advanced skills or expensive equipment. However, the available methods and tools for creating VR training systems do not use knowledge representation. Therefore, the creation, modification, and management of training scenarios are problematic for domain experts without expertise in programming and computer graphics. In this paper, we propose an approach to creating semantic virtual training scenarios in which users’ activities and mistakes, as well as equipment and its possible errors, are represented using domain knowledge understandable to domain experts. We have verified the approach by developing a user-friendly editor of VR training scenarios for electrical operators of high-voltage installations.
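At its simplest, a knowledge-based training scenario can be reduced to an ordered sequence of named activities against which a trainee's performed actions are checked. The sketch below shows that idea; the step names are hypothetical, loosely inspired by high-voltage switching procedures, and this is not the authors' actual scenario model.

```python
def find_mistakes(scenario, performed):
    """Compare a trainee's performed actions against an ordered scenario
    and return (step, expected, actual) tuples for every deviation."""
    mistakes = []
    for step, expected in enumerate(scenario):
        actual = performed[step] if step < len(performed) else None
        if actual != expected:
            mistakes.append((step, expected, actual))
    return mistakes
```

A semantic representation would additionally let a domain expert query such deviations ("which trainees skipped the voltage check?") rather than read raw logs.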


2021 ◽  
Vol 5 (12) ◽  
pp. 73
Author(s):  
Daniel Kerrigan ◽  
Jessica Hullman ◽  
Enrico Bertini

Eliciting knowledge from domain experts can play an important role throughout the machine learning process, from correctly specifying the task to evaluating model results. However, knowledge elicitation is also fraught with challenges. In this work, we consider why and how machine learning researchers elicit knowledge from experts in the model development process. We develop a taxonomy to characterize elicitation approaches according to the elicitation goal, elicitation target, elicitation process, and use of elicited knowledge. We analyze the elicitation trends observed in 28 papers with this taxonomy and identify opportunities for adding rigor to these elicitation approaches. We suggest future directions for research in elicitation for machine learning by highlighting avenues for further exploration and drawing on what we can learn from elicitation research in other fields.


Author(s):  
Joey Jansen van Vuuren ◽  
Louise Leenen ◽  
Marthie M. Grobler ◽  
Ka Fai Peter Chan ◽  
Zubeida C. Khan

In the socio-technical domain, scientists are often confronted with a class of problems termed messy, ill-structured, or wicked. These problems address complex issues that are not well-defined, contain unresolvable uncertainties, and are characterized by a lack of common agreement on problem definition. This chapter proposes a new mixed-methods research technique, Morphological Ontology Design Engineering (MODE), which can be applied to develop models for ill-structured problems. MODE combines three different research methodologies into a single methodology, drawing from research paradigms that include exploratory and descriptive approaches to model development. General morphological analysis offers a systematic method to extract meaningful information from domain experts, while ontology-based representation is used to logically represent domain knowledge. The design science methodology guides the entire process. MODE is applied to a case study in which an ontological model is developed to support the implementation of a South African national cybersecurity policy.
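General morphological analysis can be sketched computationally as enumerating a morphological field and pruning it with a cross-consistency assessment. The parameters, values, and inconsistency below are invented toy examples, not from the chapter's cybersecurity case study.

```python
from itertools import product

def morphological_field(parameters, inconsistent_pairs):
    """General morphological analysis: enumerate all parameter-value
    configurations, then drop any containing an inconsistent value pair
    (the cross-consistency assessment)."""
    names = list(parameters)
    solutions = []
    for combo in product(*(parameters[n] for n in names)):
        values = set(combo)
        if not any(pair <= values for pair in inconsistent_pairs):
            solutions.append(dict(zip(names, combo)))
    return solutions
```

In MODE, the surviving configurations would then be formalised in an ontology so that the consistency judgments elicited from experts become queryable domain knowledge.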


Author(s):  
Jakub Flotyński

Abstract The main element of extended reality (XR) environments is behavior-rich 3D content consisting of objects that act and interact with one another as well as with users. Such actions and interactions constitute the evolution of the content over time. Multiple application domains of XR, e.g., education, training, marketing, merchandising, and design, could benefit from the analysis of 3D content changes based on general or domain knowledge comprehensible to average users or domain experts. Such analysis can be intended, in particular, to monitor, comprehend, examine, and control XR environments as well as users’ skills, experience, interests and preferences, and XR objects’ features. However, it is difficult to achieve as long as XR environments are developed with methods and tools that focus on programming and 3D modeling rather than expressing domain knowledge accompanying content users and objects, and their behavior. The main contribution of this paper is an approach to creating explorable knowledge-based XR environments with semantic annotations. The approach combines description logics with aspect-oriented programming, which enables knowledge representation in an arbitrary domain as well as transformation of available environments with minimal users’ effort. We have implemented the approach using well-established development tools and exemplify it with an explorable immersive car showroom. The approach enables efficient creation of explorable XR environments and knowledge acquisition from XR.
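The aspect-oriented idea, capturing domain-level facts about object behaviour without modifying the behaviour code itself, can be approximated in Python with a decorator that logs each method call as a knowledge triple. This is a loose analogy; the paper combines description logics with AOP in a different toolchain, and the showroom method here is invented.

```python
import functools

def semantic_trace(log):
    """Aspect-like decorator: record each method call as a
    (subject, predicate, object) knowledge triple, leaving the
    method body untouched."""
    def decorate(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            log.append((type(self).__name__, method.__name__, args))
            return method(self, *args, **kwargs)
        return wrapper
    return decorate

TRIPLES = []

class CarShowroom:
    @semantic_trace(TRIPLES)
    def open_door(self, car):
        return f"{car} door opened"
```

The accumulated triples form the explorable log: queries over them ("which cars did the user inspect?") answer domain questions about content evolution without inspecting 3D code.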


2019 ◽  
Vol 28 (01) ◽  
pp. 052-054
Author(s):  
Gretchen Jackson ◽  
Jianying Hu

Objective: To summarize significant research contributions to the field of artificial intelligence (AI) in health in 2018. Methods: Ovid MEDLINE® and Web of Science® databases were searched to identify original research articles that were published in the English language during 2018 and presented advances in the science of AI applied in health. Queries employed Medical Subject Heading (MeSH®) terms and keywords representing AI methodologies and limited results to health applications. Section editors selected 15 best paper candidates that underwent peer review by internationally renowned domain experts. Final best papers were selected by the editorial board of the 2018 International Medical Informatics Association (IMIA) Yearbook. Results: Database searches returned 1,480 unique publications. Best papers employed innovative AI techniques that incorporated domain knowledge or explored approaches to support distributed or federated learning. All top-ranked papers incorporated novel approaches to advance the science of AI in health and included rigorous evaluations of their methodologies. Conclusions: Performance of state-of-the-art AI machine learning algorithms can be enhanced by approaches that employ a multidisciplinary biomedical informatics pipeline to incorporate domain knowledge and can overcome challenges such as sparse, missing, or inconsistent data. Innovative training heuristics and encryption techniques may support distributed learning with preservation of privacy.


2019 ◽  
Vol 9 (4) ◽  
pp. 1-20 ◽  
Author(s):  
Nicola Burns ◽  
Yaxin Bi ◽  
Hui Wang ◽  
Terry Anderson

There is a need to automatically classify information from online reviews. Customers want useful information about different aspects of a product or service, as well as the sentiment expressed towards each aspect. This article proposes an Enhanced Twofold-LDA model (latent Dirichlet allocation), in which one LDA is used for aspect assignment and another for sentiment classification, aiming to determine aspect and sentiment automatically. The enhanced model incorporates domain knowledge (i.e., seed words) to produce more focused topics and can handle two aspects at the sentence level simultaneously. The experimental results show that the Enhanced Twofold-LDA model produces topics more related to aspects than the state-of-the-art ASUM method (Aspect and Sentiment Unification Model), while remaining comparable to ASUM in sentiment classification performance.


Author(s):  
Sebastian Nusser ◽  
Clemens Otte ◽  
Werner Hauptmann ◽  
Rudolf Kruse

This chapter describes a machine learning approach for classification problems in safety-related domains. The proposed method is based on ensembles of low-dimensional submodels. The use of low-dimensional submodels enables domain experts to understand the mechanisms of the learned solution. Due to the limited dimensionality of the submodels, each individual model can be visualized and thus interpreted and validated against the domain knowledge. The ensemble of all submodels overcomes the limited predictive performance of each single submodel, while the overall solution remains interpretable and verifiable. Through different examples from real-world applications, the authors show that their classification approach is applicable to a wide range of classification problems in safety-related applications, ranging from decision support systems and plant monitoring and diagnosis systems to control tasks with very high safety requirements.
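The ensemble-of-interpretable-submodels idea can be sketched with trivially simple one-dimensional threshold models combined by majority vote. The chapter's submodels are richer (typically low-dimensional classifiers an expert can plot), so this is only a structural illustration with invented thresholds.

```python
class AxisThreshold:
    """A deliberately simple 1-D submodel: a threshold on one feature.
    Because it is one-dimensional, an expert can plot and validate it."""
    def __init__(self, dim, thresh):
        self.dim, self.thresh = dim, thresh

    def predict(self, x):
        return 1 if x[self.dim] > self.thresh else 0

def ensemble_predict(submodels, x):
    """Majority vote over the individually interpretable submodels."""
    votes = sum(m.predict(x) for m in submodels)
    return 1 if 2 * votes > len(submodels) else 0
```

Each submodel stays auditable on its own, while the vote recovers predictive power, which is the trade-off the chapter exploits for safety-critical use.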

