Video Summarization Based on Multimodal Features

Author(s):  
Yu Zhang ◽  
Ju Liu ◽  
Xiaoxi Liu ◽  
Xuesong Gao

In this manuscript, the authors present a keyshot-based supervised video summarization method that uses feature fusion and LSTM networks. The framework has three parts: 1) The authors formulate video summarization as a sequence-to-sequence problem, in which the importance score of video content is predicted from the video feature sequence. 2) By considering visual and textual features simultaneously, the authors build deeply fused multimodal features and summarize videos with a recurrent encoder-decoder architecture based on bi-directional LSTM. 3) Most importantly, to train the supervised video summarization framework, the authors adopt as importance scores and ground truth the number of users who selected the current video clip for their final video summary. Comparisons are performed with state-of-the-art methods and with different variants of FLSum and T-FLSum. The F-score and rank correlation coefficient results on TVSum and SumMe show the outstanding performance of the proposed method.
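The ground-truth construction and the F-score evaluation mentioned above can be illustrated with a minimal sketch. It assumes each user's summary is a binary selection per clip; the function names and the toy data are hypothetical and are not taken from the manuscript.

```python
def importance_scores(user_selections):
    """Per-clip importance: the number of users who selected that clip.

    user_selections is a list of binary lists, one per user (assumed format).
    """
    n_clips = len(user_selections[0])
    return [sum(sel[i] for sel in user_selections) for i in range(n_clips)]


def f_score(predicted, reference):
    """Harmonic mean of precision and recall between two binary selections."""
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(predicted)
    recall = overlap / sum(reference)
    return 2 * precision * recall / (precision + recall)


# Toy data: three users, three clips (hypothetical values).
votes = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
scores = importance_scores(votes)
```

A predicted summary could then be compared against each user's selection with `f_score`, and the results averaged or maximized over users, depending on the benchmark protocol.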

2021 ◽  
Author(s):  
Yiming Qian

A High Definition visual-attention-based video summarization algorithm is proposed to extract feature frames and create a video summary. Specifically, the proposed framework is used as the basis for establishing whether incorporating visual attention mechanisms into the processing pipeline has a measurable impact on the summaries constructed. The algorithm was assessed against manual, human-generated key-frame summaries provided with the tested datasets from the Open Video Dataset (www.open-video.org). Of the frames selected by the algorithm, up to 68.1% agreed with the manual frame summaries, depending on the category and length of the video. In particular, including colour-attention models (in general) in the summarization framework has a clear impact on the agreement rate with the ground truth, and the proposed colour-attention model achieves stronger agreement with human-selected summaries than other models from the literature.


2021 ◽  
Vol 18 (6) ◽  
pp. 9294-9311
Author(s):  
Yunyun Sun ◽  
Peng Li ◽  
Zhaohui Jiang ◽  
Sujun Hu ◽  
...  

Numerous limitations of shot-based and content-based key-frame extraction approaches have encouraged the development of cluster-based algorithms. This paper proposes an Optimal Threshold and Maximum Weight (OTMW) clustering approach that allows accurate and automatic extraction of video summaries. First, the video content is analyzed using image color, texture and information complexity, and a video feature dataset is constructed. Then a Golden Section method is proposed to determine the optimal solution of the threshold function. The initial cluster centers and the cluster number k are obtained automatically by the improved clustering algorithm. The video frames are grouped into k clusters with the k-means algorithm. The representative frame of each cluster is extracted using the Maximum Weight method, and an accurate video summary is obtained. The proposed approach is tested on 16 multi-type videos; on the key-frame quality evaluation indices, the average Fidelity and Ratio are 96.11925 and 97.128, respectively. The key-frames extracted by the proposed approach are consistent with human visual judgement. The performance of the proposed approach is compared with several state-of-the-art cluster-based algorithms, over which Fidelity is increased by 12.49721, 10.86455, 10.62984 and 10.4984375, respectively. In addition, the Ratio is increased by 1.958 on average with small fluctuations. The experimental results demonstrate the advantage of the proposed solution over several related baselines on sixteen diverse datasets and validate that the proposed approach can accurately extract video summaries from multi-type videos.
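The abstract does not give the threshold function, so the following is only a generic sketch of golden-section search, the classical technique the "Golden Section method" appears to build on. It minimizes a unimodal function on an interval; the quadratic objective in the usage line is a hypothetical stand-in, not the paper's threshold function.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # reciprocal of the golden ratio, about 0.618


def golden_section_min(f, lo, hi, tol=1e-6):
    """Approximate the minimizer of a unimodal function f on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)  # left interior probe
    d = a + INV_PHI * (b - a)  # right interior probe
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old left probe becomes the right one.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old right probe becomes the left one.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


# Hypothetical objective with its minimum at 0.3.
best_threshold = golden_section_min(lambda t: (t - 0.3) ** 2, 0.0, 1.0)
```

Each iteration reuses one of the two previous function evaluations, so only one new evaluation is needed per step, which is the main appeal of the method when the objective is expensive.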


Energies ◽  
2021 ◽  
Vol 14 (11) ◽  
pp. 3119
Author(s):  
Yinjiao Su ◽  
Xuan Liu ◽  
Yang Teng ◽  
Kai Zhang

Mercury (Hg) is a toxic trace element emitted from coal conversion and utilization. Samples of different coal ranks and of gangue from Ningwu Coalfield are selected and investigated in this study. To understand how mercury distribution characteristics depend on the coalification degree, Pearson regression analysis coupled with Spearman rank correlation is employed to explore the relationships between mercury and sulfur and between mercury and ash in coal, and a sequential chemical extraction method is adopted to identify the Hg speciation in the coal and gangue samples. The measured results show that Hg is positively related to total sulfur content in coal and that the affinity of Hg to different sulfur forms varies with the coalification degree. Organic sulfur has the biggest impact on Hg in peat, and this impact weakens as the coalification degree increases from lignite to bituminous coal. Sulfate sulfur is related to Hg only in peat or lignite, as its content in coal is low. However, the Pearson linear correlation coefficients of Hg and pyritic sulfur are relatively high: 0.479 for lignite, 0.709 for sub-bituminous coal and 0.887 for bituminous coal. Hg is also related to ash content in coal, with Pearson linear correlation coefficients of 0.504, 0.774 and 0.827 in lignite, sub-bituminous coal and bituminous coal, respectively. Furthermore, Hg distribution depends directly on its speciation in coal. The total proportion of F2 + F3 + F4 increases from 41.5% in peat to 87.4% in bituminous coal, while the average proportion of F5 decreases from 56.8% in peat to 12.4% in bituminous coal. These findings imply that both Hg and sulfur are enriched in coal largely through migration from the organic state to the inorganic state as the coalification degree increases in Ningwu Coalfield.


2021 ◽  
Vol 11 (3) ◽  
pp. 1064
Author(s):  
Jenq-Haur Wang ◽  
Yen-Tsang Wu ◽  
Long Wang

In social networks, users can easily share information and express their opinions. Given the huge amount of data posted by many users, it is difficult to search for relevant information. In addition to individual posts, it would be useful to recommend groups of people with similar interests. Past studies on user preference learning focused on single-modal features such as review contents or demographic information of users. However, such information is usually hard to obtain in most social media without explicit user feedback. In this paper, we propose a multimodal feature fusion approach to implicit user preference prediction, which combines text and image features from user posts to recommend similar users in social media. First, we use the convolutional neural network (CNN) and TextCNN models to extract image and text features, respectively. Then, these features are combined using early and late fusion methods as a representation of user preferences. Lastly, a list of users with the most similar preferences is recommended. The experimental results on real-world Instagram data show that the best performance is achieved with late fusion of the individual classification results for images and texts, with a best average top-k accuracy of 0.491. This validates the effectiveness of deep learning methods for fusing multimodal features to represent social user preferences. Further investigation is needed to verify the performance on different types of social media.
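Late fusion of per-modality classification results can be sketched as a weighted average of class probabilities. This is an illustration under stated assumptions only: the abstract does not specify the combination rule, and `late_fusion`, `top_k` and the `image_weight` parameter are hypothetical names introduced here.

```python
def late_fusion(image_probs, text_probs, image_weight=0.5):
    """Combine two per-class probability vectors by weighted averaging.

    image_weight is a hypothetical tuning parameter in [0, 1].
    """
    if len(image_probs) != len(text_probs):
        raise ValueError("both modalities must score the same classes")
    w = image_weight
    return [w * p + (1 - w) * q for p, q in zip(image_probs, text_probs)]


def top_k(scores, k):
    """Indices of the k highest-scoring classes, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical classifier outputs for three preference classes.
fused = late_fusion([0.8, 0.1, 0.1], [0.2, 0.5, 0.3])
best_two = top_k(fused, 2)
```

Early fusion, by contrast, would concatenate the image and text feature vectors before a single classifier is applied, so the two modalities interact inside the model rather than at the decision level.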


2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Orhun Utku Aydin ◽  
Abdel Aziz Taha ◽  
Adam Hilbert ◽  
Ahmed A. Khalil ◽  
Ivana Galinovic ◽  
...  

Average Hausdorff distance is a widely used performance measure to calculate the distance between two point sets. In medical image segmentation, it is used to compare ground truth images with segmentations allowing their ranking. We identified, however, ranking errors of average Hausdorff distance making it less suitable for applications in segmentation performance assessment. To mitigate this error, we present a modified calculation of this performance measure that we have coined “balanced average Hausdorff distance”. To simulate segmentations for ranking, we manually created non-overlapping segmentation errors common in magnetic resonance angiography cerebral vessel segmentation as our use-case. Adding the created errors consecutively and randomly to the ground truth, we created sets of simulated segmentations with increasing number of errors. Each set of simulated segmentations was ranked using both performance measures. We calculated the Kendall rank correlation coefficient between the segmentation ranking and the number of errors in each simulated segmentation. The rankings produced by balanced average Hausdorff distance had a significantly higher median correlation (1.00) than those by average Hausdorff distance (0.89). In 200 total rankings, the former misranked 52 whilst the latter misranked 179 segmentations. Balanced average Hausdorff distance is more suitable for rankings and quality assessment of segmentations than average Hausdorff distance.
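The difference between the two measures can be sketched on small point sets. The abstract does not state the formulas, so this sketch rests on an assumption: the standard average Hausdorff distance normalizes each directed sum by the size of its own source set, while the balanced variant normalizes both directed sums by the number of ground-truth points. The function names are hypothetical.

```python
import math


def directed_sum(src, dst):
    """Sum, over points in src, of the distance to the nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src)


def average_hausdorff(ground_truth, segmentation):
    """Standard form: each directed sum is divided by its own set size."""
    g_to_s = directed_sum(ground_truth, segmentation)
    s_to_g = directed_sum(segmentation, ground_truth)
    return (g_to_s / len(ground_truth) + s_to_g / len(segmentation)) / 2


def balanced_average_hausdorff(ground_truth, segmentation):
    """Assumed balanced form: both sums are divided by the ground-truth size."""
    g_to_s = directed_sum(ground_truth, segmentation)
    s_to_g = directed_sum(segmentation, ground_truth)
    n = len(ground_truth)
    return (g_to_s / n + s_to_g / n) / 2


# Hypothetical example: the segmentation adds two false-positive points.
gt = [(0, 0), (1, 0)]
seg = [(0, 0), (1, 0), (5, 0), (6, 0)]
ahd = average_hausdorff(gt, seg)
bahd = balanced_average_hausdorff(gt, seg)
```

In this toy case the extra points enlarge the segmentation and thereby shrink the standard measure's second term, whereas the balanced form, under the assumption above, penalizes them without that dilution.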


2001 ◽  
Vol 4 (5) ◽  
pp. 961-970 ◽  
Author(s):  
Heidi J Wengreen ◽  
Ronald G Munger ◽  
Siew Sun Wong ◽  
Nancy A West ◽  
Richard Cutler

Objective: To evaluate the 137-item Utah Picture-sort Food-frequency Questionnaire (FFQ) in the measurement of usual dietary intake in older adults. Design: The picture-sort FFQ was administered at baseline and again one year later. Three seasonal 24-hour dietary recall interviews were collected during the year between the two FFQs. Mean nutrient intakes were compared between methods and between administrations of the FFQ. Setting: The FFQ interviews were administered in respondents' homes or care-centres. The 24-hour diet recalls were conducted by telephone interview on random days of the week. Subjects: Two-hundred-and-eight men and women aged 55–84 years were recruited by random sample of controls from a case–control study of nutrition and bone health in Utah. Results: After adjustment for total energy intake, median Spearman rank correlation coefficients between the two picture-sort FFQs were 0.69 for men aged ≤69 years, 0.66 for men aged >69 years; and 0.68 for women aged ≤69 years, 0.67 for women aged >69 years. Median correlation coefficients between methods were 0.50 for men ≤69 years old, 0.52 for men >69 years old; 0.55 for women ≤69 years old, 0.46 for women >69 years old. Conclusions: We report intake correlations between methods and administrations comparable to those reported in the literature for traditional paper-and-pencil FFQs and one other picture-sort method of FFQ. This dietary assessment method may improve ease and accuracy of response in this and other populations with low literacy levels, poor memory skill, impaired hearing, or poor vision.
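The Spearman rank correlation coefficient reported above is the Pearson correlation computed on ranks. A minimal sketch follows, assuming tied values receive the average of their ranks; the intake values are hypothetical and are not data from the study.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical nutrient intakes from two assessment methods.
ffq = [1800, 2100, 1950, 2400, 1700]
recall = [1750, 2000, 2050, 2300, 1600]
rho = spearman(ffq, recall)
```

Because only the ordering of the values matters, the coefficient is insensitive to monotone rescaling of either method, which makes it a common choice for comparing dietary instruments that differ in absolute calibration.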


1987 ◽  
Vol 41 (1) ◽  
pp. 40 ◽  
Author(s):  
Sallie Keller-McNulty ◽  
Mark McNulty

2012 ◽  
Vol 37 (1) ◽  
pp. 65-69 ◽  
Author(s):  
Prasath Jayakaran ◽  
Gillian M Johnson ◽  
S John Sullivan

Background and Aim: The physical asymmetries associated with a prosthesis raise the question of the validity of the Sensory Organization Test (SOT) measures (equilibrium score (ES) and strategy score (SS)) in lower limb amputees. This study explores the validity of these measures in transtibial amputees by correlating them with their corresponding centre of pressure (COP) excursion/velocity measures. Technique: Fifteen transtibial amputees (69.5 ± 6.5 years) completed three trials for each of the six SOT conditions. Discussion: The Spearman’s rank correlation coefficients between ESs and global COP excursion/velocity measures ranged from 0.52 to 0.71 for Conditions 1, 4 and 5, 0.79 to 0.85 for Conditions 2 and 3, and 0.39 to 0.43 for Condition 6. The coefficients for SSs ranged between 0.78 and 0.97 for Conditions 1 to 5 and 0.55 to 0.67 for Condition 6. The corresponding sound and prosthetic side COP variables demonstrated varying strengths of association with ES and SS. Clinical relevance: Of the two clinical measures examined, the SSs are strongly reflective of COP excursion/velocity measures, and these findings have application in the interpretation of SOT when evaluating balance in transtibial amputees.

