Positional Mask Attention for Video Sequence Modeling

Author(s):  
Jiaxuan Wang ◽  
Chaoyi Wang ◽  
Yang Hua ◽  
Tao Song ◽  
Zhengui Xue ◽  
...  
2019 ◽  
Vol 63 (5) ◽  
pp. 50401-1-50401-7 ◽  
Author(s):  
Jing Chen ◽  
Jie Liao ◽  
Huanqiang Zeng ◽  
Canhui Cai ◽  
Kai-Kuang Ma

Abstract For robust three-dimensional video transmission over error-prone channels, this article proposes an efficient multiple description coding scheme for multi-view video based on the correlation of spatial polyphase transformed subsequences (CSPT_MDC_MVC). The input multi-view video sequence is first separated into four subsequences by a spatial polyphase transform, and the subsequences are then grouped into two descriptions. Because macroblocks at corresponding positions in these subsequences are correlated, the subsequences need not all be coded in the same way. In each description, one subsequence is coded directly by the Joint Multi-view Video Coding (JMVC) encoder and the other subsequence is classified into four sets. According to this classification, the indirectly coded subsequence selectively reuses the prediction mode and the prediction vector of its directly coded counterpart, which reduces both the bitrate and the coding complexity of multiple description coding for multi-view video. On the decoder side, gradient-based directional interpolation is employed to improve the side reconstruction quality. The effectiveness and robustness of the proposed algorithm are verified by experiments on the JMVC coding platform.
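
The polyphase separation described in the abstract can be sketched for a single frame as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, a 2x2 sampling grid is assumed because the abstract mentions four subsequences, and the diagonal pairing of subsequences into descriptions is an assumption, since the abstract does not say how they are grouped.

```python
def polyphase_split(frame):
    """Split a 2-D frame (list of rows) into four polyphase subframes.

    Subframe (dy, dx) keeps the pixels at rows dy, dy+2, ... and
    columns dx, dx+2, ... (a 2x2 sampling grid is assumed).
    """
    return {
        (dy, dx): [row[dx::2] for row in frame[dy::2]]
        for dy in (0, 1)
        for dx in (0, 1)
    }


def group_descriptions(subframes):
    """Group four subframes into two descriptions.

    Assumption: diagonal pairing, so each description still covers
    every 2x2 neighbourhood of the frame. The abstract does not
    specify the grouping.
    """
    description_1 = (subframes[(0, 0)], subframes[(1, 1)])
    description_2 = (subframes[(0, 1)], subframes[(1, 0)])
    return description_1, description_2


def polyphase_merge(subframes):
    """Inverse of polyphase_split: interleave the subframes again."""
    height = len(subframes[(0, 0)]) + len(subframes[(1, 0)])
    width = len(subframes[(0, 0)][0]) + len(subframes[(0, 1)][0])
    frame = [[None] * width for _ in range(height)]
    for (dy, dx), sub in subframes.items():
        for i, row in enumerate(sub):
            for j, value in enumerate(row):
                frame[2 * i + dy][2 * j + dx] = value
    return frame


# A toy 4x4 "frame" whose pixel values are just their raster index.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
subs = polyphase_split(frame)
desc_1, desc_2 = group_descriptions(subs)
restored = polyphase_merge(subs)
```

When both descriptions arrive, interleaving the four subframes recovers the frame exactly. When only one arrives, the decoder has to interpolate the missing pixels, which is the role the abstract assigns to gradient-based directional interpolation.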


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
M.-C. Audétat ◽  
S. Cairo Notari ◽  
J. Sader ◽  
C. Ritz ◽  
T. Fassier ◽  
...  

Abstract Background Primary care physicians are at the very heart of managing patients suffering from multimorbidity. However, several studies have highlighted that some physicians feel ill-equipped to manage these kinds of complex clinical situations. Few studies are available on the clinical reasoning processes at play during the long-term management and follow-up of patients suffering from multimorbidity. This study aims to contribute to a better understanding of how the clinical reasoning of primary care physicians is affected during follow-up consultations with these patients. Methods A qualitative research project based on semi-structured interviews with primary care physicians in an ambulatory setting will be carried out, using the video-stimulated recall interview method. Participants will be filmed in their work environment during a standard consultation with a patient suffering from multimorbidity, using a “button camera” (a small camera) pinned to their white coat. The recording will then be used in a semi-structured interview between the physician and the research team to prompt stimulated recall. Stimulated recall is a research method that allows the investigation of cognitive processes by inviting participants to recall their concurrent thinking during an event when prompted by a video sequence. During this interview, participants will be shown different video sequences and asked to discuss them; the aim will be to encourage them to make their clinical reasoning processes explicit. Fifteen to twenty interviews are planned to reach data saturation. The interviews will be transcribed verbatim and the data will be analysed according to a standard content analysis, using deductive and inductive approaches. Conclusion Study results will contribute to the scientific community’s overall understanding of clinical reasoning. This will subsequently give future generations of primary care physicians access to more adequate training in managing patients suffering from multimorbidity in their practice. As a result, this will improve the quality of patients’ care and treatment.


Author(s):  
Nujud Aloshban ◽  
Anna Esposito ◽  
Alessandro Vinciarelli

Abstract Depression is one of the most common mental health issues. (It affects more than 4% of the world’s population, according to recent estimates.) This article shows that the joint analysis of linguistic and acoustic aspects of speech allows one to discriminate between depressed and nondepressed speakers with an accuracy above 80%. The approach used in the work is based on networks designed for sequence modeling (bidirectional Long Short-Term Memory networks) and multimodal analysis methodologies (late fusion, joint representation and gated multimodal units). The experiments were performed over a corpus of 59 interviews (roughly 4 hours of material) involving 29 individuals diagnosed with depression and 30 control participants. In addition to an accuracy above 80%, the results show that multimodal approaches perform better than unimodal ones owing to people’s tendency to manifest their condition through one modality only, a source of diversity across unimodal approaches. In addition, the experiments show that it is possible to measure the “confidence” of the approach and automatically identify a subset of the test data in which the performance is above a predefined threshold. Overall, the results suggest that depression can be detected effectively with unobtrusive and inexpensive technologies based on the automatic analysis of speech and language.
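
Of the multimodal methodologies named in the abstract, late fusion is the simplest to illustrate. The sketch below is hypothetical and is not taken from the article: it assumes two unimodal classifiers (one linguistic, one acoustic) that each output a probability that the speaker is depressed, fuses them with a weighted average, and uses the distance from the decision threshold as a stand-in for "confidence". The abstract does not state how confidence is actually measured, and all names and default values here are illustrative.

```python
def late_fusion(p_text, p_audio, w_text=0.5):
    """Weighted average of two unimodal posteriors (assumed fusion rule)."""
    return w_text * p_text + (1.0 - w_text) * p_audio


def classify_with_confidence(p_text, p_audio, threshold=0.5, min_conf=0.2):
    """Fuse two posteriors, then return (label, confidence, accepted).

    confidence is a hypothetical measure: the distance of the fused
    probability from the decision threshold, doubled so that it lies
    in [0, 1] when threshold is 0.5. Samples with confidence below
    min_conf are flagged as not accepted.
    """
    p = late_fusion(p_text, p_audio)
    label = "depressed" if p >= threshold else "control"
    confidence = abs(p - threshold) * 2.0
    return label, confidence, confidence >= min_conf


# The two modalities agree: high confidence, sample is accepted.
agree = classify_with_confidence(0.9, 0.7)

# The two modalities disagree: the fused score sits near the
# threshold, so the sample falls outside the confident subset.
disagree = classify_with_confidence(0.8, 0.3)
```

Keeping only the accepted samples mirrors the idea in the abstract of identifying a subset of the test data on which performance exceeds a predefined level, although the selection rule used here is an assumption.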

