Video saliency prediction through machine learning with semantic information

Computational semantics performs automatic meaning analysis of natural language. Research in computational semantics designs meaning representations and develops mechanisms for automatically assigning those representations and reasoning over them. Computational semantics is not a single monolithic task but consists of many subtasks, including word sense disambiguation, multi-word expression analysis, semantic role labeling, the construction of sentence semantic structure, coreference resolution, and the automatic induction of semantic information from data. The development of manually constructed resources has been vastly important in driving the field forward. Examples include WordNet, PropBank, FrameNet, VerbNet, and TimeBank. These resources specify the linguistic structures to be targeted in automatic analysis, and they provide high-quality human-generated data that can be used to train machine learning systems. Supervised machine learning based on manually constructed resources is a widely used technique. A second core strand has been the induction of lexical knowledge from text data. For example, words can be represented through the contexts in which they appear (called distributional vectors or embeddings), such that semantically similar words have similar representations. Alternatively, semantic relations between words can be inferred from the patterns of words that link them. Wide-coverage semantic analysis always needs more data, in the form of both lexical knowledge and world knowledge, and automatic induction at least alleviates this problem. Compositionality is a third core theme: the systematic construction of structural meaning representations of larger expressions from the meaning representations of their parts. These representations typically use logics of varying expressivity, which makes them well suited to automatic inference with theorem provers. Manual specification and automatic acquisition of knowledge are closely intertwined.
Manually created resources are automatically extended or merged. The automatic induction of semantic information is guided and constrained by manually specified information, which is much more reliable. In addition, for restricted domains, the construction of logical representations is learned from data. It is at the intersection of manual specification and machine learning that some of the current larger questions of computational semantics are located. For instance, should we build general-purpose semantic representations, or is lexical knowledge simply too domain-specific, so that we would be better off learning task-specific representations every time? When performing inference, is it more beneficial to have the solid ground of a human-generated ontology, or is it better to reason directly with text snippets for more fine-grained and gradual inference? Do we obtain a better and deeper semantic analysis as we use better and deeper manually specified linguistic knowledge, or is the future in powerful learning paradigms that learn to carry out an entire task from natural language input and output alone, without pre-specified linguistic knowledge?
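The abstract describes representing words through the contexts in which they appear, so that semantically similar words end up with similar representations. A minimal count-based sketch of that idea follows. The toy corpus, the window size, and the helper names `context_vectors` and `cosine` are illustrative assumptions. Practical systems use large corpora and typically reweight the counts (for example with PMI) or learn dense embeddings instead.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Count, for each word, the words occurring within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            ctx = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    ctx[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: "cat" and "dog" share contexts, "stocks" does not.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "stocks fell sharply today",
]
vecs = context_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))     # high: shared contexts
print(round(cosine(vecs["cat"], vecs["stocks"]), 3))  # zero: no shared contexts
```

"cat" and "dog" are never compared directly. They come out similar only because they occur next to the same words, and that is the core of the distributional approach.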


Video saliency prediction using enhanced spatiotemporal alignment network

Pattern Recognition ◽

10.1016/j.patcog.2020.107615 ◽

2021 ◽

Vol 109 ◽

pp. 107615

Author(s):

Jin Chen ◽

Huihui Song ◽

Kaihua Zhang ◽

Bo Liu ◽

Qingshan Liu

Keyword(s):

Saliency Prediction ◽

Video Saliency


Semantic Information G Theory and Logical Bayesian Inference for Machine Learning

Information ◽

10.3390/info10080261 ◽

2019 ◽

Vol 10 (8) ◽

pp. 261 ◽

Cited By ~ 3

Author(s):

Lu

Keyword(s):

Machine Learning ◽

Bayesian Inference ◽

Mutual Information ◽

Em Algorithm ◽

Semantic Information ◽

Feature Space ◽

Iteration Algorithm ◽

Membership Functions ◽

Multilabel Learning ◽

Learning Functions

An important problem in machine learning is that, when more than two labels are used, it is very difficult to construct and optimize a group of learning functions that remain useful when the prior distribution of instances changes. To resolve this problem, semantic information G theory, Logical Bayesian Inference (LBI), and a group of Channel Matching (CM) algorithms are combined to form a systematic solution. A semantic channel in G theory consists of a group of truth functions or membership functions. In comparison with the likelihood functions, Bayesian posteriors, and logistic functions typically used in popular methods, membership functions are more convenient to use and provide learning functions that do not suffer from the above problem. In LBI, every label is learned independently. For multilabel learning, we can directly obtain a group of optimized membership functions from a sufficiently large labeled sample, without preparing different samples for different labels. Furthermore, a group of CM algorithms is developed for machine learning. For Maximum Mutual Information (MMI) classification of three classes with Gaussian distributions in a two-dimensional feature space, only 2–3 iterations are required for the mutual information between the three classes and three labels to surpass 99% of the MMI for most initial partitions. For mixture models, the Expectation-Maximization (EM) algorithm is improved to form the CM-EM algorithm, which can outperform the EM algorithm when the mixture ratios are imbalanced or when local convergence exists. For MMI classification in high-dimensional feature spaces, the CM iteration algorithm needs to be combined with neural networks. LBI needs further investigation for the unification of statistics and logic.
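The abstract claims two things: LBI learns each label's membership (truth) function independently, and such functions stay useful when the prior P(x) changes. The sketch below illustrates this over a discrete instance space. It assumes the normalization T*(θj|x) = P(yj|x) / max_x P(yj|x) and the semantic Bayes formula P(x|θj) = P(x)T(θj|x)/T(θj), with logical probability T(θj) = Σx P(x)T(θj|x). These formulas come from the wider G-theory literature and are not stated in this abstract. The posterior values and the helper names are hypothetical.

```python
import numpy as np


def optimized_truth_function(p_y_given_x):
    """Assumed LBI-style estimate: T*(theta_j|x) = P(y_j|x) / max_x P(y_j|x)."""
    p = np.asarray(p_y_given_x, dtype=float)
    return p / p.max()


def semantic_bayes(p_x, truth):
    """Return P(x|theta_j) = P(x) T(theta_j|x) / T(theta_j) and T(theta_j)."""
    p_x = np.asarray(p_x, dtype=float)
    logical_prob = float(np.dot(p_x, truth))  # T(theta_j) = sum_x P(x) T(theta_j|x)
    return p_x * truth / logical_prob, logical_prob


# Hypothetical posterior P(y_j|x) of one label over five discrete instances x.
posterior = [0.05, 0.3, 0.6, 0.3, 0.05]
truth = optimized_truth_function(posterior)

# The same truth function is reused unchanged under two different priors P(x).
priors = ([0.2, 0.2, 0.2, 0.2, 0.2], [0.4, 0.3, 0.1, 0.1, 0.1])
for prior in priors:
    likelihood, t = semantic_bayes(prior, truth)
    print(np.round(likelihood, 3), round(t, 3))
```

The truth function is estimated once. After that, only the prior is swapped, and the semantic Bayes formula yields a properly normalized likelihood for each prior. That is the property the abstract contrasts with Bayesian posteriors, which must be re-estimated when P(x) changes.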


Towards Capturing Sonographic Experience: Cognition-Inspired Ultrasound Video Saliency Prediction

Communications in Computer and Information Science - Medical Image Understanding and Analysis ◽

10.1007/978-3-030-39343-4_15 ◽

2020 ◽

pp. 174-186 ◽

Cited By ~ 1

Author(s):

Richard Droste ◽

Yifan Cai ◽

Harshita Sharma ◽

Pierre Chatelain ◽

Aris T. Papageorghiou ◽

...

Keyword(s):

Saliency Prediction ◽

Video Saliency


Using the Semantic Information G Measure to Explain and Extend Rate-Distortion Functions and Maximum Entropy Distributions

Entropy ◽

10.3390/e23081050 ◽

2021 ◽

Vol 23 (8) ◽

pp. 1050

Author(s):

Chenguang Lu

Keyword(s):

Machine Learning ◽

Mutual Information ◽

Data Compression ◽

Maximum Entropy ◽

Semantic Information ◽

Rate Distortion ◽

Distortion Function ◽

Partition Functions ◽

Statistical Probability ◽

Distortion Functions

In the rate-distortion function and the Maximum Entropy (ME) method, Minimum Mutual Information (MMI) distributions and ME distributions are expressed by Bayes-like formulas that include Negative Exponential Functions (NEFs) and partition functions. Why do these non-probability functions appear in Bayes-like formulas? In addition, the rate-distortion function has three disadvantages: (1) the distortion function is subjectively defined; (2) defining the distortion function between instances and labels is often difficult; (3) it cannot be used for data compression according to the labels’ semantic meanings. The author previously proposed the semantic information G measure, which uses both statistical probability and logical probability. We can now explain NEFs as truth functions, partition functions as logical probabilities, Bayes-like formulas as semantic Bayes’ formulas, MMI as Semantic Mutual Information (SMI), and ME as extreme ME minus SMI. To overcome the above disadvantages, this paper establishes the relationship between truth functions and distortion functions, obtains truth functions from samples by machine learning, and constructs constraint conditions with truth functions to extend rate-distortion functions. Two examples are used to help readers understand the MMI iteration and to support the theoretical results. Using truth functions and the semantic information G measure, we can combine machine learning and data compression, including semantic compression. Further studies are needed to explore general data compression and recovery according to semantic meaning.
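The Bayes-like formula with an NEF and a partition function already appears in the classic Blahut–Arimoto iteration for a point on the rate-distortion curve at slope s ≤ 0. The update is P(yj|x) = P(yj) exp(s·d(x, yj)) / Z(x), with partition function Z(x) = Σk P(yk) exp(s·d(x, yk)). The sketch below implements only that standard MMI iteration. It is not the paper's G-measure extension. The factor exp(s·d) is the NEF that the paper reinterprets as a truth function. The function name, iteration count, and test case are illustrative choices.

```python
import numpy as np


def blahut_arimoto(p_x, d, s, iters=200):
    """Standard Blahut-Arimoto iteration for one rate-distortion point.

    p_x: source distribution, shape (m,)
    d:   distortion matrix, shape (m, n), d[i, j] = d(x_i, y_j)
    s:   slope parameter, s <= 0
    Returns (rate in bits, expected distortion, P(y|x)).
    """
    m, n = d.shape
    q_y = np.full(n, 1.0 / n)      # initial output distribution P(y)
    nef = np.exp(s * d)            # negative exponential function exp(s*d)
    for _ in range(iters):
        z = nef @ q_y                              # partition function Z(x), shape (m,)
        p_y_given_x = nef * q_y / z[:, None]       # Bayes-like update
        q_y = p_x @ p_y_given_x                    # re-estimate P(y)
    z = nef @ q_y
    p_y_given_x = nef * q_y / z[:, None]
    joint = p_x[:, None] * p_y_given_x
    distortion = float(np.sum(joint * d))
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(joint > 0, p_y_given_x / q_y[None, :], 1.0)
    rate = float(np.sum(joint * np.log2(ratio)))   # mutual information I(X;Y)
    return rate, distortion, p_y_given_x


# Uniform binary source with Hamming distortion.
R, D, _ = blahut_arimoto(np.array([0.5, 0.5]), np.array([[0.0, 1.0], [1.0, 0.0]]), -2.0)
print(round(R, 4), round(D, 4))
```

In this symmetric case P(y) stays uniform, so the result can be checked in closed form: D = e^s / (1 + e^s) and R = 1 − H(D), where H is the binary entropy.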


Real-Time Video Saliency Prediction Via 3D Residual Convolutional Neural Network

IEEE Access ◽

10.1109/access.2019.2946479 ◽

2019 ◽

Vol 7 ◽

pp. 147743-147754 ◽

Cited By ~ 4

Author(s):

Zhenhao Sun ◽

Xu Wang ◽

Qiudan Zhang ◽

Jianmin Jiang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Real Time ◽

Saliency Prediction ◽

Video Saliency


A CNN Model for Human Parsing Based on Capacity Optimization

Applied Sciences ◽

10.3390/app9071330 ◽

2019 ◽

Vol 9 (7) ◽

pp. 1330 ◽

Cited By ~ 1

Author(s):

Yalong Jiang ◽

Zheru Chi

Keyword(s):

Neural Networks ◽

Computational Efficiency ◽

Semantic Information ◽

State Of The Art ◽

Depth Estimation ◽

Baseline Model ◽

Computational Burden ◽

Proposed Model ◽

Saliency Prediction ◽

Benchmark Solutions

Although state-of-the-art performance has been achieved in pixel-specific tasks such as saliency prediction and depth estimation, convolutional neural networks (CNNs) still perform unsatisfactorily in human parsing, where semantic information of detailed regions must be perceived under variations in viewpoint, pose, and occlusion. In this paper, we propose to improve the robustness of human parsing modules by introducing a depth-estimation module. A novel scheme is proposed for integrating a depth-estimation module with a human-parsing module. The robustness of the overall model is improved with automatically obtained depth labels. Computational efficiency, another major concern, is also discussed. Our proposed human parsing module with 24 layers achieves performance similar to that of a baseline CNN model with over 100 layers, and the overall model has fewer parameters than the baseline model. Furthermore, we propose to reduce the computational burden by replacing a conventional CNN layer with a stack of simplified sub-layers, which further reduces the overall number of trainable parameters. Experimental results show that integrating the two modules improves human parsing without additional human labeling. The proposed model outperforms the benchmark solutions, and its capacity is better matched to the complexity of the task.
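The abstract does not say exactly how a conventional CNN layer is replaced by a stack of simplified sub-layers. As a hedged illustration of why such a replacement can reduce trainable parameters, the sketch below compares two weight counts. The first is a standard 3×3 convolution. The second is one familiar factorization: a depthwise 3×3 convolution followed by a 1×1 pointwise convolution. This depthwise-separable scheme is a swapped-in example, not necessarily the authors' design, and the channel sizes are arbitrary.

```python
def conv_params(k, c_in, c_out, bias=False):
    """Weights of a standard k x k convolution: k*k*c_in*c_out (+ c_out biases)."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def separable_params(k, c_in, c_out, bias=False):
    """Depthwise k x k conv (k*k*c_in) followed by a 1x1 pointwise conv (c_in*c_out)."""
    return k * k * c_in + c_in * c_out + ((c_in + c_out) if bias else 0)


# Arbitrary example: a 3x3 layer mapping 256 channels to 256 channels.
standard = conv_params(3, 256, 256)
stacked = separable_params(3, 256, 256)
print(standard, stacked, round(standard / stacked, 2))
```

Most of the saving comes from moving the k×k spatial filtering into a per-channel operation. With 256 input and output channels, the factorized stack needs roughly one ninth of the weights of the single standard layer.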


Video Saliency Prediction via Joint Discrimination and Local Consistency

IEEE Transactions on Cybernetics ◽

10.1109/tcyb.2020.2989158 ◽

2020 ◽

pp. 1-12

Author(s):

Zheng Wang ◽

Ziqi Zhou ◽

Huchuan Lu ◽

Qinghua Hu ◽

Jianmin Jiang

Keyword(s):

Local Consistency ◽

Saliency Prediction ◽

Video Saliency
