Semantic Analytics in Intelligence

Author(s):  
Boanerges Aleman-Meza ◽  
Amit P. Sheth ◽  
Devanand Palaniswami ◽  
Matthew Eavenson ◽  
I. Budak Arpinar

We describe an ontological approach for determining the relevance of documents based on the underlying concept of exploiting complex semantic relationships among real-world entities. This research builds upon semantic metadata extraction and annotation, practical domain-specific ontology creation, main-memory query processing, and the notion of semantic association. A prototype application illustrates the approach by supporting the identification of insider threats for document access. In this scenario, we describe how investigative assignments performed by intelligence analysts are captured into a context of investigation by including concepts and relationships from the ontology. A relevance measure for documents is computed using semantic analytics techniques. Additionally, a graph-based visualization component allows exploration of potential document access beyond the ‘need to know’. We also discuss how a commercial product using Semantic Web technology, Semagix Freedom, is used for metadata extraction when designing and populating an ontology from heterogeneous sources.

2020 ◽  
Author(s):  
Derek Koehl ◽  
Carson Davis ◽  
Rahul Ramachandran ◽  
Udaysankar Nair ◽  
Manil Maskey

Word embeddings are numeric representations of text that capture meanings and semantic relationships. Embeddings can be constructed using different methods, such as one-hot encoding, frequency-based approaches, or prediction-based approaches. Prediction-based approaches such as Word2Vec can generate word embeddings that capture the underlying semantics and word relationships in a corpus. Studies have shown that Word2Vec embeddings generated from a domain-specific corpus can both predict relationships and augment word vectors to improve classification. We describe results from two experiments that use Earth science word embeddings constructed with Word2Vec from a corpus of over 20,000 journal papers.

The first experiment explores the analogy prediction performance of word embeddings built from the Earth science journal corpus and trained using domain-specific vocabulary. Our results demonstrate that the domain-specific word embeddings predict Earth science analogy questions more accurately than general-corpus embeddings predict general analogy questions. While the results are as anticipated, the substantial increase in accuracy, particularly in the lexicographical domain, was encouraging. The results point to the need for a comprehensive Earth science analogy test set that covers the full breadth of lexicographical and encyclopedic categories for validating word embeddings.

The second experiment uses the word embeddings to augment metadata keyword classification. Metadata describing NASA datasets include science keywords that are assigned manually, which can lead to errors and inconsistencies. These science keywords come from a controlled vocabulary and aid data discovery via faceted search and relevancy ranking. Because few metadata records have proper descriptions and keywords, word embeddings were used for augmentation. A fully connected neural network was trained to suggest keywords given a description text. This approach provided the best accuracy, at ~76%, of the methods tested.
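
The analogy evaluation mentioned in the first experiment is commonly carried out with the vector-offset method: to answer "a is to b as c is to ?", pick the word whose vector is closest to b - a + c. The abstract does not state which method the authors used, so the following is only a minimal sketch under that assumption. The toy three-dimensional vectors and the names `TOY_EMBEDDINGS`, `cosine`, and `solve_analogy` are hypothetical and stand in for vectors that a trained Word2Vec model would supply.

```python
import math

# Hypothetical toy vectors (not from the paper); a real experiment would
# load vectors produced by a trained Word2Vec model instead.
TOY_EMBEDDINGS = {
    "rain":   [1.0, 0.0, 0.0],
    "liquid": [1.0, 1.0, 0.0],
    "snow":   [0.0, 0.0, 1.0],
    "solid":  [0.0, 1.0, 1.0],
    "wind":   [0.5, 0.0, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset method."""
    # Target point in embedding space: b - a + c.
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # The three question words are excluded from the candidate answers.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


answer = solve_analogy(TOY_EMBEDDINGS, "rain", "liquid", "snow")
```

Accuracy on an analogy test set would then be the fraction of questions for which the returned word matches the expected answer.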


Author(s):  
Shi Kuo Chang ◽  
Vincenzo Deufemia ◽  
Giuseppe Polese

In this chapter we present normal forms for the design of multimedia database schemes with reduced manipulation anomalies. To this end, we first discuss how to describe the semantics of multimedia attributes based upon the concept of generalized icons, already used in the modeling of multimedia languages. Then, we introduce new extended dependencies involving different types of multimedia data. Such dependencies are based on domain-specific similarity measures that are used to detect semantic relationships between complex data types. Based upon these new dependencies, we have defined five normal forms for multimedia databases, some focusing on the level of segmentation of multimedia attributes, others on the level of fragmentation of tables.
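
One plausible reading of a similarity-based dependency is a relaxed functional dependency: whenever two tuples are similar on the left-hand attribute, they must also be similar on the right-hand attribute. The abstract does not give the formal definition, so the sketch below is an assumption for illustration only. The function name, the absolute-difference distance, the thresholds, and the sample rows are all hypothetical.

```python
from itertools import combinations


def abs_diff(x, y):
    """Hypothetical distance function; real multimedia data would need a
    domain-specific similarity measure (e.g. over image or audio features)."""
    return abs(x - y)


def holds_similarity_dependency(rows, lhs, rhs, lhs_dist, rhs_dist,
                                lhs_tol, rhs_tol):
    """Return True if every pair of rows that is similar on `lhs`
    (distance <= lhs_tol) is also similar on `rhs` (distance <= rhs_tol)."""
    for r1, r2 in combinations(rows, 2):
        similar_lhs = lhs_dist(r1[lhs], r2[lhs]) <= lhs_tol
        similar_rhs = rhs_dist(r1[rhs], r2[rhs]) <= rhs_tol
        if similar_lhs and not similar_rhs:
            return False  # this pair violates the dependency
    return True


# Hypothetical rows: a scalar feature extracted from a signal and a reading.
rows_ok = [
    {"ecg": 0.10, "pulse": 60},
    {"ecg": 0.12, "pulse": 62},
    {"ecg": 0.90, "pulse": 110},
]
# Adding a row that is similar on "ecg" but far apart on "pulse"
# breaks the dependency.
rows_bad = rows_ok + [{"ecg": 0.11, "pulse": 95}]

ok = holds_similarity_dependency(
    rows_ok, "ecg", "pulse", abs_diff, abs_diff, 0.05, 5)
bad = holds_similarity_dependency(
    rows_bad, "ecg", "pulse", abs_diff, abs_diff, 0.05, 5)
```

With both thresholds set to zero and equality as the distance test, this check reduces to an ordinary functional dependency check.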


Author(s):  
Arvind Rangarajan ◽  
Pradeep Radhakrishnan ◽  
Abha Moitra ◽  
Andrew Crapo ◽  
Dean Robinson

Early manufacturability feedback is critical for reducing product cost and lead time. This paper describes a new architecture and platform for authoring and applying manufacturability rules for design. The key step is to define a domain-specific ontology by creating a higher-level semantic language that describes design and manufacturing concepts relevant to specific manufacturing processes. This language has two primary uses: expressing design in the context of manufacturing, and relating manufacturing constraints on design as declarative rules. OWL and Jena (a reasoning engine) are used in the background to reason about specific designs and provide manufacturability feedback in a client-server model. The use of Semantic Web technology makes it easier to augment manufacturability feedback with a query system for the designer that uses the same rule knowledge base to answer what-if scenarios. The query system is implemented using SPARQL and the CAD design context, which enhances the user experience. This novel approach makes it easier for domain experts to write or verify rules and for designers to validate concepts before changing the CAD model. It also helps maintain independence between the CAD platform and core enterprise knowledge. A pilot study in the sheet metal domain is implemented to demonstrate the steps necessary for complete early manufacturability analysis software and to highlight the benefits of this approach.

