Mobile Cameras
Recently Published Documents

TOTAL DOCUMENTS: 64 (last five years: 23)
H-INDEX: 9 (last five years: 2)

2022
Author(s): Hariharan Nagasubramaniam, Rabih Younes

The bokeh effect is becoming an important feature in photography: an object of interest is kept in focus while the rest of the background is blurred. Rendering this effect naturally requires a DSLR with a large aperture, but with current advances in deep learning it can also be produced on mobile cameras. Most existing methods use convolutional neural networks, while some rely on a depth map to render the effect. In this paper, we propose an end-to-end Vision Transformer model for bokeh rendering of images from a monocular camera. The architecture uses vision transformers as its backbone and therefore learns from the entire image rather than only the local regions covered by the filters of a CNN. This retention of global information, combined with first training the model for image restoration before training it to render the background blur, allows our method to produce clearer images and to outperform current state-of-the-art models on the EBB! dataset. The code for our proposed method is available at https://github.com/Soester10/Bokeh-Rendering-with-Vision-Transformers.
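The abstract specifies only the high-level design (a ViT backbone trained first for image restoration, then for bokeh rendering), so the following is a minimal PyTorch-style sketch of such an image-to-image network, not the authors' released code; the class name, patch size, and layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class BokehViT(nn.Module):
    """Illustrative ViT-backbone image-to-image model for background-blur rendering."""
    def __init__(self, img_size=256, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.grid = img_size // patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)      # patchify
        self.pos = nn.Parameter(torch.zeros(1, self.grid * self.grid, dim))  # positional embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)        # global self-attention
        self.decoder = nn.Sequential(                                        # back to full-resolution RGB
            nn.ConvTranspose2d(dim, 64, kernel_size=patch, stride=patch),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
        )

    def forward(self, x):
        b = x.shape[0]
        tokens = self.embed(x).flatten(2).transpose(1, 2) + self.pos         # (B, N, dim)
        tokens = self.encoder(tokens)
        feat = tokens.transpose(1, 2).reshape(b, -1, self.grid, self.grid)
        return torch.sigmoid(self.decoder(feat))                             # rendered image in [0, 1]

# Two-stage idea from the abstract: first fit an image-restoration objective,
# then fine-tune on wide-aperture targets (e.g. the EBB! pairs).
model = BokehViT()
print(model(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 3, 256, 256])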


2021
Vol 12 (1), pp. 317
Author(s): Shakil Ahmed, A F M Saifuddin Saif, Md Imtiaz Hanif, Md Mostofa Nurannabi Shakil, Md Mostofa Jaman, ...

With the advancement of technology, people around the world have ever easier access to internet-enabled devices, and as a result video data is growing rapidly. The proliferation of portable devices such as action cameras, mobile cameras, and motion cameras has also contributed to this growth. Data from these many sources require substantial processing before they can serve different uses, and such enormous volumes of video cannot be browsed in full by end users. In recent years, much research has addressed this issue by generating descriptions from images or recorded visual scenes. This description generation, known as video captioning, is more complex than single-image captioning, and various advanced neural networks have been applied to it. In this paper, we propose an attention-based Bi-LSTM and sequential LSTM (Att-BiL-SL) encoder-decoder model for describing a video in textual form. The model consists of a two-layer attention-based bi-LSTM encoder and a one-layer sequential LSTM decoder for video captioning. The model also extracts universal and native temporal features from the video frames for smooth sentence generation from optical frames. It combines word embeddings with a soft attention mechanism and a beam search optimization algorithm to generate qualitative results. We find that the proposed architecture performs better than various existing state-of-the-art models.
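The abstract describes a two-layer attention-based bi-LSTM encoder and a one-layer LSTM decoder with soft attention; the sketch below is a minimal PyTorch rendition of that general structure, not the authors' implementation. Pre-extracted CNN frame features, the feature and vocabulary sizes, and teacher forcing are assumptions, and beam search decoding is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttBiLSTMCaptioner(nn.Module):
    """Illustrative bi-LSTM encoder over frame features + LSTM decoder with soft attention."""
    def __init__(self, feat_dim=2048, hidden=512, vocab=10000, embed=300):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        self.embed = nn.Embedding(vocab, embed)
        self.attn = nn.Linear(2 * hidden + hidden, 1)            # soft-attention score per frame
        self.decoder = nn.LSTMCell(embed + 2 * hidden, hidden)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, feats, captions):
        enc, _ = self.encoder(feats)                              # (B, T, 2H) frame encodings
        B, L = captions.shape
        h = feats.new_zeros(B, self.decoder.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for t in range(L):                                        # teacher-forced decoding
            query = h.unsqueeze(1).expand(-1, enc.size(1), -1)
            scores = self.attn(torch.cat([enc, query], dim=-1))   # (B, T, 1)
            ctx = (F.softmax(scores, dim=1) * enc).sum(dim=1)     # context vector over frames
            h, c = self.decoder(torch.cat([self.embed(captions[:, t]), ctx], dim=-1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                         # (B, L, vocab)

model = AttBiLSTMCaptioner()
frame_feats = torch.randn(2, 30, 2048)        # 30 frames of CNN features per clip (assumed)
tokens = torch.randint(0, 10000, (2, 12))     # caption tokens for teacher forcing
print(model(frame_feats, tokens).shape)       # torch.Size([2, 12, 10000])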


2021
Vol 17 (4), pp. 1-20
Author(s): Liang Dong, Jingao Xu, Guoxuan Chi, Danyang Li, Xinglin Zhang, ...

Smartphone localization is essential to a wide spectrum of applications in the era of mobile computing. The ubiquity of smartphone mobile cameras and ambient surveillance cameras, together with mature computer vision techniques, holds promise for sub-meter-accuracy localization services. In general, ambient-camera-based solutions can localize pedestrians in video frames at a fine granularity, but their tracking performance in dynamic environments remains unreliable. Mobile-camera-based solutions, on the contrary, can track pedestrians continuously, but they usually require constructing a large image database, a labor-intensive overhead for practical deployment. We see an opportunity to integrate these two promising approaches to overcome the above limitations, and we revisit the smartphone localization problem from this fresh perspective. Fusing mobile-camera-based and ambient-camera-based systems is non-trivial, however, because the cameras differ in perspective and parameters and their localization results do not directly correspond. In this article, we propose iMAC, an integrated mobile-camera and ambient-camera localization system that achieves sub-meter accuracy and enhanced robustness with zero human start-up effort. The key innovation of iMAC is a carefully designed fusion framework that eliminates these camera disparities, consisting of a projection map function that automatically calibrates ambient cameras, an instant crowd fingerprint model that describes user motion patterns, and a confidence-aware matching algorithm that associates results from the two sub-systems. We fully implement iMAC on commodity smartphones and validate its performance in five different scenarios. The results show that iMAC achieves a remarkable localization accuracy of 0.68 m, outperforming state-of-the-art systems by more than 75%.
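The abstract names a confidence-aware matching algorithm but does not detail it; the snippet below sketches one generic way to combine position estimates from the mobile-camera and ambient-camera sub-systems with confidence weights. The function name, the 2-D setting, and the disagreement threshold are assumptions, not iMAC's actual algorithm.

import numpy as np

def fuse_estimates(p_mobile, conf_mobile, p_ambient, conf_ambient, max_gap=2.0):
    """Confidence-weighted fusion of two 2-D position estimates (illustrative only).

    p_* are (x, y) positions in metres, conf_* are scalar confidences in [0, 1].
    If the sub-systems disagree by more than max_gap metres, keep the more
    confident estimate instead of averaging.
    """
    p_mobile = np.asarray(p_mobile, dtype=float)
    p_ambient = np.asarray(p_ambient, dtype=float)
    if np.linalg.norm(p_mobile - p_ambient) > max_gap:
        return p_mobile if conf_mobile >= conf_ambient else p_ambient
    w = conf_mobile / (conf_mobile + conf_ambient + 1e-9)
    return w * p_mobile + (1.0 - w) * p_ambient

# Example: a phone-side track vs. a detection projected from a surveillance camera.
print(fuse_estimates((3.2, 1.0), 0.6, (3.5, 1.3), 0.8))  # weighted toward the ambient estimate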


2021
Author(s): Haoying Li, Honghao Chen, Meng Chang, Huajun Feng, Zhihai Xu, ...

2021
Author(s): Francescomaria Faticanti, Francesco Bronzino, Francesco De Pellegrini

Photonics, 2021
Vol 8 (10), pp. 454
Author(s): Yuru Huang, Yikun Liu, Haishan Liu, Yuyang Shui, Guanwen Zhao, ...

Image fusion and reconstruction from multiple images taken by distributed or mobile cameras require accurate calibration to avoid image mismatching. This calibration becomes difficult in fog, when no clear nearby reference is available. In this work, multi-view images taken in fog by two cameras fixed on a moving platform are fused. The positions and aiming directions of the cameras are determined by taking a close, still-visible object as a reference. One camera with a large field of view (FOV) acquires images of a short-distance object that remains visible in fog; this reference is then used to calibrate the camera system and determine the position and pointing direction at each viewpoint. The extrinsic parameter matrices obtained from these data are applied to fuse distant images captured by the other camera beyond the visibility range. Experimental verification was carried out in a fog chamber, and the technique is shown to be valid for image reconstruction in fog without a prior in-plane reference. The synthetic image, accumulated and averaged over ten views, shows potential applicability for fog removal. The improvement in structural similarity is discussed and compared in detail with conventional single-view defogging techniques.
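As an illustration of the fusion step, the sketch below warps several views into a common reference frame and averages them, assuming the per-view homographies are already available from the extrinsic calibration described above; it is a simplified stand-in for the paper's pipeline, and the function name and sizes are assumptions.

import cv2
import numpy as np

def fuse_views(images, homographies, size):
    """Average multiple fog-degraded views after warping them into a reference frame.

    images: list of HxWx3 uint8 arrays; homographies: 3x3 maps into the reference
    view (assumed to come from calibration); size: (width, height) of the output.
    """
    acc = np.zeros((size[1], size[0], 3), dtype=np.float32)
    for img, H in zip(images, homographies):
        acc += cv2.warpPerspective(img, H, size).astype(np.float32)
    return np.clip(acc / len(images), 0, 255).astype(np.uint8)

# Dummy example with two identical views and identity homographies:
views = [np.full((240, 320, 3), 128, dtype=np.uint8)] * 2
fused = fuse_views(views, [np.eye(3)] * 2, (320, 240))
print(fused.shape, fused.dtype)  # (240, 320, 3) uint8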


2021
Vol 7 (1), pp. 571-604
Author(s): Mauricio Delbracio, Damien Kelly, Michael S. Brown, Peyman Milanfar

The first mobile camera phone was sold only 20 years ago, when taking pictures with one's phone was an oddity, and sharing pictures online was unheard of. Today, the smartphone is more camera than phone. How did this happen? This transformation was enabled by advances in computational photography—the science and engineering of making great images from small-form-factor, mobile cameras. Modern algorithmic and computing advances, including machine learning, have changed the rules of photography, bringing to it new modes of capture, postprocessing, storage, and sharing. In this review, we give a brief history of mobile computational photography and describe some of the key technological components, including burst photography, noise reduction, and super-resolution. At each step, we can draw naive parallels to the human visual system.
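To make the burst-photography idea concrete, here is a toy illustration (not from the review) of why merging a burst reduces noise: averaging N frames of a static scene with independent noise lowers the noise standard deviation by roughly 1/sqrt(N). Real pipelines additionally align frames and reject motion, which is omitted here.

import numpy as np

def merge_burst(frames):
    """Toy burst denoiser: average N frames of a static scene (alignment omitted)."""
    return np.mean(np.stack(frames, axis=0), axis=0)

rng = np.random.default_rng(0)
clean = np.full((100, 100), 0.5)
burst = [clean + rng.normal(0.0, 0.1, clean.shape) for _ in range(8)]
merged = merge_burst(burst)
print(round(np.std(burst[0] - clean), 3), round(np.std(merged - clean), 3))  # ~0.1 vs ~0.035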


Author(s): Sara Merlino

In this paper, I analyse video recordings of speech-language therapy sessions for people diagnosed with aphasia. I particularly explore the way in which the speech-language therapists instruct the patients to correctly pronounce speech sounds (e.g. phonemes, syllables) by deploying not only audible but also visible forms of cues. By using their bodies – face and gestures – as an instructional tool, the therapists make visual perceptual access to articulatory features of pronunciation relevant and salient. They can also make these sensory practices accountable through the use of other senses, such as touch. Data was collected in a hospital and in a rehabilitation clinic, tracking each patient’s recovery, and is part of a longitudinal multisite corpus. The paper considers the way in which participants in the therapeutic process use and coordinate forms of sensory access to language that are based on hearing and seeing. It highlights the importance of audio and video recordings to make accessible the auditory and visual details of these sensorial experiences – particularly, proper framings and the complementary use of fixed and mobile cameras.  

