Coherent video generation for multiple hand-held cameras with dynamic foreground

Fang-Lue Zhang; Connelly Barnes; Hao-Tian Zhang; Junhong Zhao; Gabriel Salas

doi:10.1007/s41095-020-0187-3


Computational Visual Media ◽

10.1007/s41095-020-0187-3 ◽

2020 ◽

Vol 6 (3) ◽

pp. 291-306

Author(s):

Fang-Lue Zhang ◽

Connelly Barnes ◽

Hao-Tian Zhang ◽

Junhong Zhao ◽

Gabriel Salas

Keyword(s):

Video Sequence ◽

State Of The Art ◽

Close Attention ◽

Cut Points ◽

Social Events ◽

Art Methods ◽

Previous State ◽

Smooth Transitions

For many social events such as public performances, multiple hand-held cameras may capture the same event. This footage is often collected by amateur cinematographers who typically have little control over the scene and may not pay close attention to the camera. For these reasons, each individually captured video may fail to cover the whole time of the event, or may lose track of interesting foreground content such as a performer. We introduce a new algorithm that can synthesize a single smooth video sequence of moving foreground objects captured by multiple hand-held cameras. This allows later viewers to gain a cohesive narrative experience that can transition between different cameras, even though the input footage may be less than ideal. We first introduce a graph-based method for selecting a good transition route. This allows us to automatically select good cut points for the hand-held videos, so that smooth transitions can be created between the resulting video shots. We also propose a method to synthesize a smooth photorealistic transition video between each pair of hand-held cameras, which preserves dynamic foreground content during this transition. Our experiments demonstrate that our method outperforms previous state-of-the-art methods, which struggle to preserve dynamic foreground content.


Simple Shading Correction Method for Brightfield Whole Slide Imaging

Sensors ◽

10.3390/s20113084 ◽

2020 ◽

Vol 20 (11) ◽

pp. 3084

Author(s):

Yoon-Oh Tak ◽

Anjin Park ◽

Janghoon Choi ◽

Jonghyun Eom ◽

Hyuk-Sang Kwon ◽

...

Keyword(s):

State Of The Art ◽

Correction Method ◽

Input Image ◽

Image Sequences ◽

Whole Slide Imaging ◽

Pattern Noise ◽

Art Methods ◽

Previous State ◽

Fixed Pattern Noise

Whole slide imaging (WSI) refers to the process of creating a high-resolution digital image of a whole slide. Since digital images are typically produced by stitching image sequences acquired from different fields of view, the visual quality of the images can be degraded owing to shading distortion, which produces black plaid patterns on the images. A shading correction method for brightfield WSI is presented, which is simple but robust not only against typical image artifacts caused by specks of dust and bubbles, but also against fixed-pattern noise, or spatial variations in pixel values under uniform illumination. The proposed method comprises two main steps. The first step constructs candidates of a shading distortion model from a stack of input image sequences. The second step selects the optimal model from the candidates. The proposed method was compared experimentally with two previous state-of-the-art methods, regularized energy minimization (CIDRE) and background and shading correction (BaSiC), and showed better correction scores, as smoothing operations and constraints were not imposed when estimating the shading distortion. The correction scores, averaged over 40 image collections, were as follows: proposed method, 0.39 ± 0.099; CIDRE method, 0.67 ± 0.047; BaSiC method, 0.55 ± 0.038. Based on the quantitative evaluations, we can confirm that, compared with the two previous state-of-the-art methods, the proposed method corrects not only shading distortion but also fixed-pattern noise.
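As a rough illustration of what correcting a shading distortion from an image stack involves, the sketch below uses a common per-pixel median estimate with a multiplicative shading model. This is a swapped-in textbook technique, not the candidate-construction and model-selection procedure described in the abstract; the function names, the multiplicative model, and the toy tile values are all assumptions made for the example.

```python
from statistics import median

def estimate_shading(stack):
    """Estimate a multiplicative shading map as the per-pixel median over a
    stack of equally sized tiles, normalised to a mean of 1 (assumed model)."""
    rows, cols = len(stack[0]), len(stack[0][0])
    shading = [[median(tile[r][c] for tile in stack) for c in range(cols)]
               for r in range(rows)]
    mean = sum(map(sum, shading)) / (rows * cols)
    return [[value / mean for value in row] for row in shading]

def correct_tile(tile, shading):
    """Divide each pixel by the estimated shading value at that position.
    Assumes the shading map contains no zeros."""
    return [[v / s for v, s in zip(tile_row, shading_row)]
            for tile_row, shading_row in zip(tile, shading)]

# Hypothetical data: three uniformly bright tiles seen through the same shading.
true_shading = [[1.0, 0.5], [0.5, 1.0]]
stack = [[[level * s for s in row] for row in true_shading]
         for level in (80.0, 100.0, 120.0)]

shading = estimate_shading(stack)
flat = correct_tile(stack[2], shading)
print(flat)  # every pixel of the corrected tile has (nearly) the same value
```

The median is used here only because it tolerates a few outlier pixels such as dust specks; a real pipeline would need many more tiles than this toy stack.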


Partially Supervised Named Entity Recognition via the Expected Entity Ratio Loss

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00429 ◽

2021 ◽

Vol 9 ◽

pp. 1320-1335

Author(s):

Thomas Effland ◽

Michael Collins

Keyword(s):

Latent Variables ◽

State Of The Art ◽

Named Entity Recognition ◽

Entity Recognition ◽

Annotation Scheme ◽

Named Entity ◽

Art Methods ◽

Previous State

We study learning named entity recognizers in the presence of missing entity annotations. We approach this setting as tagging with latent variables and propose a novel loss, the Expected Entity Ratio, to learn models in the presence of systematically missing tags. We show that our approach is both theoretically sound and empirically useful. Experimentally, we find that it meets or exceeds performance of strong and state-of-the-art baselines across a variety of languages, annotation scenarios, and amounts of labeled data. In particular, we find that it significantly outperforms the previous state-of-the-art methods from Mayhew et al. (2019) and Li et al. (2021) by +12.7 and +2.3 F1 score in a challenging setting with only 1,000 biased annotations, averaged across 7 datasets. We also show that, when combined with our approach, a novel sparse annotation scheme outperforms exhaustive annotation for modest annotation budgets.


Visual Dialogue State Tracking for Question Generation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6856 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11831-11838 ◽

Cited By ~ 2

Author(s):

Wei Pang ◽

Xiaojie Wang

Keyword(s):

State Of The Art ◽

Experimental Results ◽

Question Generation ◽

Art Methods ◽

Previous State ◽

State Tracking ◽

Art Performance

GuessWhat?! is a visual dialogue task between a guesser and an oracle. The guesser aims to locate an object in an image that the oracle has in mind by asking a sequence of Yes/No questions. Asking appropriate questions as the dialogue progresses is vital for a successful final guess. As a result, the progress of the dialogue should be properly represented and tracked. Previous models for question generation pay little attention to the representation and tracking of dialogue states, and are therefore prone to asking low-quality questions such as repeated questions. This paper proposes a visual dialogue state tracking (VDST) based method for question generation. A visual dialogue state is defined as the distribution on objects in the image as well as representations of objects. Representations of objects are updated with the change of the distribution on objects. An object-difference based attention is used to decode a new question. The distribution on objects is updated by comparing the question-answer pair and objects. Experimental results on the GuessWhat?! dataset show that our model significantly outperforms existing methods and achieves new state-of-the-art performance. It is also notable that our model reduces the rate of repeated questions from more than 50% to 21.9% compared with previous state-of-the-art methods.


A Multifeature Learning and Fusion Network for Facial Age Estimation

Sensors ◽

10.3390/s21134597 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4597

Author(s):

Yulan Deng ◽

Shaohua Teng ◽

Lunke Fei ◽

Wei Zhang ◽

Imad Rida

Keyword(s):

Age Estimation ◽

State Of The Art ◽

Great Influence ◽

Embedded Devices ◽

Face Images ◽

Memory Overhead ◽

Art Methods ◽

Previous State ◽

Effectiveness And Efficiency ◽

Facial Age

Age estimation from face images has attracted much attention due to its many real-world applications, such as video surveillance and social networking. However, most existing studies learn a single kind of age feature and ignore other appearance features such as gender and race, which have a great influence on the age pattern. In this paper, we proposed a compact multifeature learning and fusion method for age estimation. Specifically, we first used three subnetworks to learn gender, race, and age information. Then, we fused these complementary features to form more robust features for age estimation. Finally, we engineered a regression-ranking age-feature estimator to convert the fusion features into exact age numbers. Experimental results on three benchmark databases demonstrated the effectiveness and efficiency of the proposed method on facial age estimation in comparison to previous state-of-the-art methods. Moreover, compared with previous state-of-the-art methods, our model was more compact, with only a 20 MB memory overhead, and is suitable for deployment on mobile or embedded devices for age estimation.


Multi-Resolution Autoregressive Graph-to-Graph Translation for Molecules

10.26434/chemrxiv.8266745.v1 ◽

2019 ◽

Author(s):

Wengong Jin ◽

Regina Barzilay ◽

Tommi S Jaakkola

Keyword(s):

Drug Discovery ◽

State Of The Art ◽

Molecular Graph ◽

Biochemical Properties ◽

Large Margin ◽

Previous State ◽

Translation Methods ◽

Atom Level ◽

Precursor Molecules ◽

Prior State

The problem of accelerating drug discovery relies heavily on automatic tools to optimize precursor molecules so that they have better biochemical properties. Our work in this paper substantially extends the prior state of the art in graph-to-graph translation methods for molecular optimization. In particular, we realize coherent multi-resolution representations by interweaving trees over substructures with the atom-level encoding of the original molecular graph. Moreover, our graph decoder is fully autoregressive, and interleaves each step of adding a new substructure with the process of resolving its connectivity to the emerging molecule. We evaluate our model on multiple molecular optimization tasks and show that our model outperforms previous state-of-the-art baselines by a large margin.


Multi-hop assortativities for network classification

Journal of Complex Networks ◽

10.1093/comnet/cny034 ◽

2018 ◽

Vol 7 (4) ◽

pp. 603-622 ◽

Cited By ~ 1

Author(s):

Leonardo Gutiérrez-Gómez ◽

Jean-Charles Delvenne

Keyword(s):

Machine Learning ◽

Scientific Collaboration ◽

State Of The Art ◽

Medical Engineering ◽

Research Field ◽

Classification Task ◽

Collaboration Network ◽

Structural Patterns ◽

Art Methods

Several social, medical, engineering and biological challenges rely on discovering the functionality of networks from their structure and node metadata, when it is available. For example, in chemoinformatics one might want to detect whether a molecule is toxic based on structure and atomic types, or discover the research field of a scientific collaboration network. Existing techniques rely on counting or measuring structural patterns that are known to show large variations from network to network, such as the number of triangles, or the assortativity of node metadata. We introduce the concept of multi-hop assortativity, which captures the similarity of the nodes situated at the extremities of a randomly selected path of a given length. We show that multi-hop assortativity unifies various existing concepts and offers a versatile family of ‘fingerprints’ to characterize networks. These fingerprints in turn make it possible to recover the functionalities of a network with the help of the machine learning toolbox. Our method is evaluated empirically on established social and chemoinformatic network benchmarks. Results reveal that our assortativity-based features are competitive, providing highly accurate results that often outperform state-of-the-art methods for the network classification task.
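To make the idea of comparing the nodes at the two ends of a path of a given length more concrete, here is a deliberately simplified sketch. It counts all walks with a fixed number of edges and reports the fraction whose end nodes carry the same categorical label. This endpoint-agreement fraction is an assumption chosen for illustration and is not the assortativity measure defined in the paper; the function name and the toy graph are hypothetical.

```python
def endpoint_label_agreement(adjacency, labels, hops):
    """Fraction of walks with `hops` edges whose two end nodes share a label.

    `adjacency` is a square 0/1 matrix (list of lists) and `labels` gives one
    categorical label per node. Walk counts come from repeated multiplication
    by the adjacency matrix.
    """
    n = len(adjacency)
    # Start from the identity matrix: one zero-edge walk from each node to itself.
    walks = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(hops):
        walks = [[sum(walks[i][k] * adjacency[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    total = sum(map(sum, walks))
    same = sum(walks[i][j] for i in range(n) for j in range(n)
               if labels[i] == labels[j])
    return same / total if total else 0.0

# Toy path graph 0 - 1 - 2 - 3 with two label groups.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
node_labels = ["a", "a", "b", "b"]

# One value per hop count gives a small "fingerprint" vector for the graph.
fingerprint = [endpoint_label_agreement(path_graph, node_labels, k) for k in (1, 2)]
print(fingerprint)
```

Evaluating the function for several hop counts yields a fixed-length feature vector per network, which is the kind of input a standard classifier could consume.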


Automatic Detection of Discrimination Actions from Social Images

Electronics ◽

10.3390/electronics10030325 ◽

2021 ◽

Vol 10 (3) ◽

pp. 325

Author(s):

Zhihao Wu ◽

Baopeng Zhang ◽

Tianchen Zhou ◽

Yan Li ◽

Jianping Fan

Keyword(s):

Action Recognition ◽

State Of The Art ◽

Automatic Detection ◽

Experimental Results ◽

Practical Approach ◽

Detection And Identification ◽

Art Methods ◽

Image Set ◽

Social Images ◽

Relationship Identification

In this paper, we developed a practical approach for automatic detection of discrimination actions from social images. Firstly, an image set is established, in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Secondly, a practical approach is developed to achieve automatic detection and identification of discrimination actions and relationships from social images. Thirdly, the task of relationship identification is seamlessly integrated with the task of discrimination action recognition into one single network called the Co-operative Visual Translation Embedding++ network (CVTransE++). We also compared our proposed method with numerous state-of-the-art methods, and our experimental results demonstrated that our proposed methods can significantly outperform state-of-the-art approaches.


Using spatial-temporal ensembles of convolutional neural networks for lumen segmentation in ureteroscopy

International Journal of Computer Assisted Radiology and Surgery ◽

10.1007/s11548-021-02376-3 ◽

2021 ◽

Author(s):

Jorge F. Lazo ◽

Aldo Marzullo ◽

Sara Moccia ◽

Michele Catellani ◽

Benoit Rosa ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Automatic Segmentation ◽

Temporal Information ◽

Invasive Technique ◽

Dice Similarity Coefficient ◽

Specular Reflections ◽

Lumen Segmentation ◽

Previous State

Purpose: Ureteroscopy is an efficient endoscopic minimally invasive technique for the diagnosis and treatment of upper tract urothelial carcinoma. During ureteroscopy, the automatic segmentation of the hollow lumen is of primary importance, since it indicates the path that the endoscope should follow. In order to obtain an accurate segmentation of the hollow lumen, this paper presents an automatic method based on convolutional neural networks (CNNs). Methods: The proposed method is based on an ensemble of 4 parallel CNNs to simultaneously process single and multi-frame information. Of these, two architectures are taken as core models, namely a U-Net based on residual blocks ($m_1$) and Mask-RCNN ($m_2$), which are fed with single still frames $I(t)$. The other two models ($M_1$, $M_2$) are modifications of the former ones, consisting of the addition of a stage that uses 3D convolutions to process temporal information. $M_1$ and $M_2$ are fed with triplets of frames ($I(t-1)$, $I(t)$, $I(t+1)$) to produce the segmentation for $I(t)$. Results: The proposed method was evaluated using a custom dataset of 11 videos (2673 frames), which were collected and manually annotated from 6 patients. We obtain a Dice similarity coefficient of 0.80, outperforming previous state-of-the-art methods. Conclusion: The obtained results show that spatial-temporal information can be effectively exploited by the ensemble model to improve hollow lumen segmentation in ureteroscopic images. The method is also effective in the presence of poor visibility, occasional bleeding, or specular reflections.
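For readers unfamiliar with the metric reported above, the following minimal sketch computes the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks. The masks, the function name, and the convention of returning 1.0 for two empty masks are assumptions for illustration and are not taken from the paper's evaluation code.

```python
def dice_coefficient(pred, truth):
    """Dice similarity between two binary masks given as lists of 0/1 rows."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    # Count pixels that are foreground in both masks.
    intersection = sum(p & t
                       for pred_row, truth_row in zip(pred, truth)
                       for p, t in zip(pred_row, truth_row))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Hypothetical 3x4 masks: predicted lumen vs. manual annotation.
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
annotated = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]

print(dice_coefficient(predicted, annotated))  # 3 shared pixels, 4 + 4 total -> 0.75
```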


A Deep Learning Approach to Predict Autism Spectrum Disorder Using Multisite Resting-State fMRI

Applied Sciences ◽

10.3390/app11083636 ◽

2021 ◽

Vol 11 (8) ◽

pp. 3636

Author(s):

Faria Zarin Subah ◽

Kaushik Deb ◽

Pranab Kumar Dhar ◽

Takeshi Koshiba

Keyword(s):

Autism Spectrum Disorder ◽

Resting State ◽

State Of The Art ◽

Resting State Fmri ◽

Autism Spectrum ◽

Spectrum Disorder ◽

Bootstrap Analysis ◽

Proposed Model ◽

Art Methods ◽

The Mean

Autism spectrum disorder (ASD) is a complex and degenerative neuro-developmental disorder. Most of the existing methods utilize functional magnetic resonance imaging (fMRI) to detect ASD with a very limited dataset, which provides high accuracy but results in poor generalization. To overcome this limitation and to enhance the performance of the automated autism diagnosis model, in this paper, we propose an ASD detection model using functional connectivity features of resting-state fMRI data. Our proposed model utilizes two commonly used brain atlases, Craddock 200 (CC200) and Automated Anatomical Labelling (AAL), and two rarely used atlases, Bootstrap Analysis of Stable Clusters (BASC) and Power. A deep neural network (DNN) classifier is used to perform the classification task. Simulation results indicate that the proposed model outperforms state-of-the-art methods in terms of accuracy. The mean accuracy of the proposed model was 88%, whereas the mean accuracy of the state-of-the-art methods ranged from 67% to 85%. The sensitivity, F1-score, and area under receiver operating characteristic curve (AUC) score of the proposed model were 90%, 87%, and 96%, respectively. Comparative analysis of various scoring strategies shows the superiority of the BASC atlas over the other aforementioned atlases in classifying ASD and control.


A contour property based approach to segment nuclei in cervical cytology images

BMC Medical Imaging ◽

10.1186/s12880-020-00533-9 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Iram Tazim Hoque ◽

Nabil Ibtehaz ◽

Saumitra Chakravarty ◽

M. Saifur Rahman ◽

M. Sohel Rahman

Keyword(s):

Pap Smear ◽

State Of The Art ◽

Cervical Cytology ◽

Average Intensity ◽

Nucleus Size ◽

Real Dataset ◽

Art Methods ◽

Convolution Filter ◽

Cervical Cells ◽

Nucleus Segmentation

Background: Segmentation of nuclei in cervical cytology pap smear images is a crucial stage in automated cervical cancer screening. The task itself is challenging due to the presence of cervical cells with spurious edges, overlapping cells, neutrophils, and artifacts. Methods: After the initial preprocessing steps of adaptive thresholding, in our approach, the image passes through a convolution filter to filter out some noise. Then, contours from the resultant image are filtered by their distinctive contour properties followed by a nucleus size recovery procedure based on contour average intensity value. Results: We evaluate our method on a public (benchmark) dataset collected from ISBI and also a private real dataset. The results show that our algorithm outperforms other state-of-the-art methods in nucleus segmentation on the ISBI dataset with a precision of 0.978 and recall of 0.933. A promising precision of 0.770 and a formidable recall of 0.886 on the private real dataset indicate that our algorithm can effectively detect and segment nuclei on real cervical cytology images. Tuning various parameters, the precision could be increased to as high as 0.949 with an acceptable decrease of recall to 0.759. Our method also managed an Aggregated Jaccard Index of 0.681, outperforming other state-of-the-art methods on the real dataset. Conclusion: We have proposed a contour property-based approach for segmentation of nuclei. Our algorithm has several tunable parameters and is flexible enough to adapt to real practical scenarios and requirements.
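The precision/recall trade-off mentioned in the results can be illustrated with a small sketch. It assumes each candidate contour comes with a confidence score and a ground-truth flag, and that a single acceptance threshold is the tunable parameter. Both the scores and the thresholding scheme are hypothetical and are not the parameters used by the paper's algorithm.

```python
def precision_recall(candidates, threshold):
    """Precision and recall when accepting candidates with score >= threshold.

    `candidates` is a list of (score, is_true_nucleus) pairs (hypothetical data).
    """
    tp = sum(1 for score, is_true in candidates if score >= threshold and is_true)
    fp = sum(1 for score, is_true in candidates if score >= threshold and not is_true)
    fn = sum(1 for score, is_true in candidates if score < threshold and is_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up candidate contours: (confidence score, whether it is a real nucleus).
candidates = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, True), (0.40, False), (0.30, True),
]

for threshold in (0.5, 0.85):
    precision, recall = precision_recall(candidates, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Raising the threshold in this toy data removes the false positive but also discards true nuclei, so precision rises while recall falls, which mirrors the direction of the trade-off the abstract describes.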
