Bokeh Effect Rendering with Vision Transformers

2022 ◽  
Author(s):  
Hariharan Nagasubramaniam ◽  
Rabih Younes

The bokeh effect is becoming an important feature in photography: an object of interest is kept in focus while the rest of the background is blurred. Rendering this effect optically requires a DSLR with a large aperture diameter, but with current advances in deep learning it can also be produced on mobile cameras. Most existing methods use Convolutional Neural Networks, and some rely on a depth map to render the effect. In this paper, we propose an end-to-end Vision Transformer model for bokeh rendering of images from a monocular camera. The architecture uses vision transformers as its backbone, thus learning from the entire image rather than only from the parts covered by the filters in a CNN. This property of retaining global information, coupled with initial training of the model for image restoration before training it to render the background blur, allows our method to produce clearer images and outperform the current state-of-the-art models on the EBB! dataset. The code for our proposed method can be found at: https://github.com/Soester10/Bokeh-Rendering-with-Vision-Transformers.


Author(s):  
Weixiang Xu ◽  
Xiangyu He ◽  
Tianli Zhao ◽  
Qinghao Hu ◽  
Peisong Wang ◽  
...  

Large neural networks are difficult to deploy on mobile devices because of their intensive computation and storage requirements. To alleviate this, we study ternarization, a balance between efficiency and accuracy that quantizes both weights and activations into ternary values. In previous ternarized neural networks, a hard threshold Δ is introduced to determine quantization intervals. Although the selection of Δ greatly affects the training results, previous works estimate Δ via an approximation or treat it as a hyper-parameter, which is suboptimal. In this paper, we present Soft Threshold Ternary Networks (STTN), which enable the model to automatically determine quantization intervals instead of depending on a hard threshold. Concretely, we replace the original ternary kernel with the addition of two binary kernels at training time, where ternary values are determined by the combination of two corresponding binary values. At inference time, we add up the two binary kernels to obtain a single ternary kernel. Our method dramatically outperforms the current state of the art, narrowing the performance gap between full-precision networks and extremely low-bit networks. Experiments on ImageNet with AlexNet (Top-1 55.6%) and ResNet-18 (Top-1 66.2%) achieve a new state of the art.
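
The two-binary-kernel construction described in this abstract can be illustrated with a minimal Python sketch. It assumes the binary kernels take values in {-1, +1} and that their element-wise sum is halved so that the result lands in {-1, 0, +1}; the abstract does not state the scaling, and the names `combine_binary_kernels` and `dot` are hypothetical, not taken from the authors' code.

```python
def combine_binary_kernels(b1, b2):
    """Merge two binary kernels into one ternary kernel.

    Assumption: every entry of b1 and b2 is -1 or +1, so each sum is
    -2, 0, or +2, and integer halving yields -1, 0, or +1.
    """
    if len(b1) != len(b2):
        raise ValueError("kernels must have the same length")
    return [(x + y) // 2 for x, y in zip(b1, b2)]


def dot(a, b):
    """Plain dot product, standing in for one convolution window."""
    return sum(x * y for x, y in zip(a, b))


# Two illustrative binary kernels (flattened) and one input window.
b1 = [1, -1, 1, -1]
b2 = [1, 1, -1, -1]
x = [3, 5, 7, 2]

ternary = combine_binary_kernels(b1, b2)

# Training-time view: two binary kernels applied separately, then added.
train_out = dot(x, b1) + dot(x, b2)
# Inference-time view: one ternary kernel (factor 2 undoes the halving).
infer_out = 2 * dot(x, ternary)
```

Because the dot product is linear, applying the two binary kernels separately and adding the outputs gives the same value as applying the merged kernel once, which is why a single ternary kernel suffices at inference time. Positions where the two binary values disagree become zero, so no explicit threshold is needed in this sketch.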


2021 ◽  
Vol 7 ◽  
pp. e495
Author(s):  
Saleh Albahli ◽  
Hafiz Tayyab Rauf ◽  
Abdulelah Algosaibi ◽  
Valentina Emilia Balas

Artificial intelligence (AI) has played a significant role in image analysis and feature extraction, applied to detect and diagnose a wide range of chest-related diseases. Although several researchers have used current state-of-the-art approaches and have produced impressive chest-related clinical outcomes, specific techniques may not contribute many advantages if one type of disease is detected without the rest being identified. Attempts to identify multiple chest-related diseases have been ineffective because the available data were insufficient and imbalanced. This research contributes to the healthcare industry and the research community by proposing synthetic data augmentation in three deep Convolutional Neural Network (CNN) architectures for the detection of 14 chest-related diseases. The employed models are DenseNet121, InceptionResNetV2, and ResNet152V2; after training and validation, an average ROC-AUC score of 0.80 was obtained, which is competitive with previous models that were trained for multi-class classification to detect anomalies in X-ray images. This research illustrates how the proposed model applies state-of-the-art deep neural networks to classify 14 chest-related diseases with better accuracy.


Information ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 98 ◽  
Author(s):  
Tariq Ahmad ◽  
Allan Ramsay ◽  
Hanady Ahmed

Assigning sentiment labels to documents is, at first sight, a standard multi-label classification task. Many approaches have been used for this task, but the current state-of-the-art solutions use deep neural networks (DNNs). As such, it seems likely that standard machine learning algorithms, such as these, will provide an effective approach. We describe an alternative approach, involving the use of probabilities to construct a weighted lexicon of sentiment terms, then modifying the lexicon and calculating optimal thresholds for each class. We show that this approach outperforms the use of DNNs and other standard algorithms. We believe that DNNs are not a universal panacea and that paying attention to the nature of the data that you are trying to learn from can be more important than trying out ever more powerful general purpose machine learning algorithms.
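
The lexicon-and-threshold idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the weight of a word for a class is estimated as the fraction of training documents containing the word that carry that class, a document is scored by the mean weight of its tokens, and the per-class thresholds are set by hand rather than optimised. The lexicon-modification step mentioned in the abstract is omitted, and all function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def build_lexicon(docs):
    """Estimate P(label | word) from (tokens, labels) training pairs."""
    word_count = Counter()
    word_label_count = defaultdict(Counter)
    for tokens, labels in docs:
        for word in set(tokens):  # count each word once per document
            word_count[word] += 1
            for label in labels:
                word_label_count[word][label] += 1
    return {
        word: {label: n / word_count[word] for label, n in counts.items()}
        for word, counts in word_label_count.items()
    }


def score(tokens, lexicon, label):
    """Mean lexicon weight of the tokens for one label (0.0 if empty)."""
    if not tokens:
        return 0.0
    total = sum(lexicon.get(w, {}).get(label, 0.0) for w in tokens)
    return total / len(tokens)


def predict(tokens, lexicon, thresholds):
    """Assign every label whose score reaches that label's threshold."""
    return {
        label
        for label, cutoff in thresholds.items()
        if score(tokens, lexicon, label) >= cutoff
    }


# Toy multi-label training data (hypothetical).
train = [
    (["great", "happy"], {"joy"}),
    (["awful", "sad"], {"sadness"}),
    (["happy", "sad"], {"joy", "sadness"}),
]
lexicon = build_lexicon(train)
thresholds = {"joy": 0.75, "sadness": 0.75}  # hand-picked, not optimised

single = predict(["great", "happy"], lexicon, thresholds)
both = predict(["happy", "sad"], lexicon, thresholds)
```

Keeping one threshold per class is what makes the task multi-label: a document can clear several thresholds at once, or none.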


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Hylke E. Beck ◽  
Seth Westra ◽  
Jackson Tan ◽  
Florian Pappenberger ◽  
George J. Huffman ◽  
...  

We introduce the Precipitation Probability DISTribution (PPDIST) dataset, a collection of global high-resolution (0.1°) observation-based climatologies (1979–2018) of the occurrence and peak intensity of precipitation (P) at daily and 3-hourly time-scales. The climatologies were produced using neural networks trained with daily P observations from 93,138 gauges and hourly P observations (resampled to 3-hourly) from 11,881 gauges worldwide. Mean validation coefficient of determination (R²) values ranged from 0.76 to 0.80 for the daily P occurrence indices, and from 0.44 to 0.84 for the daily peak P intensity indices. The neural networks performed significantly better than current state-of-the-art reanalysis (ERA5) and satellite (IMERG) products for all P indices. Using a 0.1 mm 3 h⁻¹ threshold, P was estimated to occur 12.2%, 7.4%, and 14.3% of the time, on average, over the global, land, and ocean domains, respectively. The highest P intensities were found over parts of Central America, India, and Southeast Asia, along the western equatorial coast of Africa, and in the intertropical convergence zone. The PPDIST dataset is available via www.gloh2o.org/ppdist.
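
As a rough illustration of what a threshold-based occurrence statistic means, the sketch below computes the fraction of 3-hourly time steps in which precipitation reaches a threshold. It is an assumption-laden toy, not the PPDIST procedure: the function name `occurrence_fraction` is hypothetical, the sample values are made up, and treating "occurs" as "greater than or equal to the threshold" is a choice the abstract does not specify.

```python
def occurrence_fraction(series_mm_per_3h, threshold=0.1):
    """Fraction of 3-hourly steps with precipitation >= threshold (mm)."""
    if not series_mm_per_3h:
        raise ValueError("series must not be empty")
    wet_steps = sum(1 for p in series_mm_per_3h if p >= threshold)
    return wet_steps / len(series_mm_per_3h)


# Made-up 3-hourly accumulations in mm: two of the four steps are "wet".
sample = [0.0, 0.05, 0.2, 1.5]
fraction = occurrence_fraction(sample)
```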


2013 ◽  
Vol 36 (6) ◽  
pp. 623-624 ◽  
Author(s):  
Tobias A. Mattei

By integrating the classic psychological principles of the ancient art of memory (AAOM) with the most recent paradigms in cognitive neuroscience (i.e., the concepts of hodotopic organization and nonlinear dynamics of brain neural networks), Llewellyn provides an up-to-date model of the complex psychological relationships between memory, imagination, and dreams in accordance with current state-of-the-art principles in neuroscience.


Author(s):  
Aydin Ayanzadeh ◽  
Sahand Vahidnia

In this paper, we leverage state-of-the-art models trained on the ImageNet dataset. We use the pre-trained models and their learned weights to extract features from the dog breed identification dataset. Afterwards, we apply fine-tuning and data augmentation to increase test accuracy in the classification of the dog breeds dataset. The performance of the proposed approaches is compared with state-of-the-art ImageNet models such as ResNet-50, DenseNet-121, DenseNet-169 and GoogLeNet. We achieved 89.66%, 85.37%, 84.01% and 82.08% test accuracy, respectively, which shows the superior performance of the proposed method over previous work on the Stanford dog breeds dataset.


Author(s):  
Alex Dexter ◽  
Spencer A. Thomas ◽  
Rory T. Steven ◽  
Kenneth N. Robinson ◽  
Adam J. Taylor ◽  
...  

High-dimensionality omics and hyperspectral imaging datasets present difficult challenges for feature extraction and data mining because of the huge numbers of features that cannot be examined simultaneously. The sample numbers and variables of these methods are constantly growing as new technologies are developed, and computational analysis needs to evolve to keep up with growing demand. Current state-of-the-art algorithms can handle some routine datasets but struggle when datasets grow above a certain size. We present an approach that trains deep neural networks on non-linear dimensionality reduction, in particular t-distributed stochastic neighbour embedding (t-SNE), to overcome prior limitations of these methods.

One Sentence Summary: Analysis of prohibitively large datasets by combining deep learning via neural networks with non-linear dimensionality reduction.


2021 ◽  
Vol 15 (02) ◽  
pp. 161-187
Author(s):  
Olav A. Nergård Rongved ◽  
Steven A. Hicks ◽  
Vajira Thambawita ◽  
Håkon K. Stensland ◽  
Evi Zouganeli ◽  
...  

Developing systems for the automatic detection of events in video is a task which has gained attention in many areas including sports. More specifically, event detection for soccer videos has been studied widely in the literature. However, there are still a number of shortcomings in the state-of-the-art such as high latency, making it challenging to operate at the live edge. In this paper, we present an algorithm to detect events in soccer videos in real time, using 3D convolutional neural networks. We test our algorithm on three different datasets from SoccerNet, the Swedish Allsvenskan, and the Norwegian Eliteserien. Overall, the results show that we can detect events with high recall, low latency, and accurate time estimation. The trade-off is a slightly lower precision compared to the current state-of-the-art, which has higher latency and performs better when a less accurate time estimation can be accepted. In addition to the presented algorithm, we perform an extensive ablation study on how the different parts of the training pipeline affect the final results.


2015 ◽  
Vol 15 (4-5) ◽  
pp. 481-494 ◽  
Author(s):  
Craig Blackmore ◽  
Oliver Ray ◽  
Kerstin Eder

This paper introduces a new logic-based method for optimising the selection of compiler flags on embedded architectures. In particular, we use Inductive Logic Programming (ILP) to learn logical rules that relate effective compiler flags to specific program features. Unlike earlier work, we aim to infer human-readable rules and we seek to develop a relational first-order approach which automatically discovers relevant features rather than relying on a vector of predetermined attributes. To this end we generated a data set by measuring execution times of 60 benchmarks on an embedded system development board and we developed an ILP prototype which outperforms the current state-of-the-art learning approach in 34 of the 60 benchmarks. Finally, we combined the strengths of the current state of the art and our ILP method in a hybrid approach which reduced execution times by an average of 8% and up to 50% in some cases.

