model size
Recently Published Documents


TOTAL DOCUMENTS: 282 (five years: 105)
H-INDEX: 23 (five years: 4)

2022 · Vol 4 (1) · pp. 22-41
Author(s): Nermeen Abou Baker, Nico Zengeler, Uwe Handmann

Transfer learning is a machine learning technique that reuses knowledge acquired in a source domain, in the form of learned weights, to enhance learning in a target domain. The technique is ubiquitous because it achieves high performance while saving training time, memory, and network-design effort. In this paper, we investigate how to select the pre-trained model that best meets the target-domain requirements for image classification tasks. In our study, we refined the output layers and general network parameters to apply the knowledge of eleven image-processing models, pre-trained on ImageNet, to five different target-domain datasets. We measured accuracy, accuracy density, training time, and model size to evaluate the pre-trained models in training sessions of both one and ten episodes.
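The core mechanic this abstract describes — reusing frozen pre-trained weights and refining only the output layer — can be sketched in plain NumPy. The "backbone" below is a hypothetical random-projection stand-in for a real ImageNet network, not an actual pre-trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained feature extractor: its weights are frozen
# (reused, never updated) during target-domain training.
W_backbone = rng.normal(size=(64, 16))

def backbone(x):
    # Frozen "pre-trained" features: a fixed ReLU random projection.
    return np.maximum(x @ W_backbone, 0.0)

# Toy target-domain data, 2 classes.
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(float)

# Transfer learning: only the refined output layer is trained.
w_head = np.zeros(16)
b_head = 0.0
lr = 0.1
for _ in range(300):
    F = backbone(X)                                # frozen features
    p = 1.0 / (1.0 + np.exp(-(F @ w_head + b_head)))  # sigmoid head
    w_head -= lr * (F.T @ (p - y)) / len(y)        # gradient step on head only
    b_head -= lr * np.mean(p - y)

p_final = 1.0 / (1.0 + np.exp(-(backbone(X) @ w_head + b_head)))
acc = np.mean((p_final > 0.5) == y)
```

Because the backbone stays fixed, only a 16-parameter head is optimized, which is what makes fine-tuning cheap in time and memory.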


2022 · Vol 14 (1) · pp. 21
Author(s): Weiwei Zhang, Xin Ma, Yuzhao Zhang, Ming Ji, Chenghui Zhen

Because of the arbitrary shooting angles and camera movement of drones and the limited computing power of drone platforms, pedestrian detection in drone scenes poses a considerable challenge. This paper proposes a new convolutional neural network structure, SMYOLO, which balances accuracy and speed in three ways: (1) by combining depthwise separable convolution with pointwise convolution and replacing the activation function, it reduces the computation and parameter count of the original network; (2) by adding a batch normalization (BN) layer, SMYOLO accelerates convergence and improves generalization; and (3) through scale matching, it reduces the feature loss of the original network. Compared with the original network model, SMYOLO reduces accuracy by only 4.36%, while the model size shrinks by 76.90%, inference speed increases by 43.29%, and target detection is accelerated by 33.33%, minimizing the network model's volume while preserving detection accuracy.
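The parameter savings from swapping a standard convolution for a depthwise separable one (depthwise k×k plus pointwise 1×1) can be checked with simple arithmetic; the 256-channel, 3×3 layer below is a generic illustration, not SMYOLO's actual configuration:

```python
def standard_conv_params(c_in, c_out, k):
    # A standard conv mixes channels and space in one k×k kernel per output channel.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise: one k×k kernel per input channel; pointwise: 1×1 across channels.
    return c_in * k * k + c_in * c_out

std = standard_conv_params(256, 256, 3)          # 256*256*9 = 589,824
sep = depthwise_separable_params(256, 256, 3)    # 2,304 + 65,536 = 67,840
reduction = 1 - sep / std                        # ≈ 88.5% fewer parameters
```

The same factoring also cuts multiply-accumulate operations by roughly the same ratio, which is why it is a standard move for compute-limited platforms such as drones.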


2021
Author(s): Ranit Karmakar, Saeid Nooshabadi

Abstract Colon polyps, small clumps of cells on the lining of the colon, can lead to colorectal cancer (CRC), one of the leading types of cancer globally; early detection of these polyps is therefore crucial in preventing CRC. This paper proposes a lightweight deep learning model for colorectal polyp segmentation that achieves state-of-the-art accuracy while significantly reducing model size and complexity. The proposed deep learning autoencoder model employs a set of state-of-the-art architectural blocks and optimization objective functions to achieve the desired efficiency. The model is trained and tested on five publicly available colorectal polyp segmentation datasets (CVC-ClinicDB, CVC-ColonDB, EndoScene, Kvasir, and ETIS). We also performed ablation testing to examine various aspects of the autoencoder architecture, and evaluated the model with most of the common image segmentation metrics. The backbone model achieved a dice score of 0.935 on the Kvasir dataset and 0.945 on the CVC-ClinicDB dataset, improving accuracy by 4.12% and 5.12% respectively over the current state-of-the-art network, while using 88 times fewer parameters, 40 times less storage space, and 17 times less computation. Our ablation study showed that adding ConvSkip to the autoencoder slightly improves performance, but the improvement was not significant (p-value = 0.815).
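The dice score used to evaluate the segmentation model above has a short standard definition — twice the mask overlap divided by the total mask area. A minimal NumPy version for binary masks:

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    # Dice = 2|A ∩ B| / (|A| + |B|); eps guards against two empty masks.
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1, 0],
              [0, 1, 0]])
b = np.array([[1, 0, 0],
              [0, 1, 1]])
# overlap = 2 pixels, |a| = 3, |b| = 3 → dice = 4/6 ≈ 0.667
score = dice_score(a, b)
```

A dice of 0.935 on Kvasir therefore means predicted and ground-truth polyp masks overlap almost completely on average.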


2021 · Vol 12 (1) · pp. 44
Author(s): Seokjin Lee, Minhan Kim, Seunghyeon Shin, Seungjae Baek, Sooyoung Park, ...

In recent acoustic scene classification (ASC) models, various auxiliary methods have been applied to enhance performance, e.g., subsystem ensembles and data augmentation. In particular, ensembles of several submodels can be effective in ASC models, but they increase the model size because several submodels must be stored, making them hard to use in model-complexity-limited ASC tasks. In this paper, we seek a performance-enhancement method that retains the advantage of the model-ensemble technique without increasing model size. Our method is based on the mean-teacher model, which was developed for consistency learning in semi-supervised learning. Because our problem is supervised learning, which differs from the purpose of the conventional mean-teacher model, we modify the detailed strategies to maximize consistency-learning performance. To evaluate the method's effectiveness, experiments were performed with an ASC database from the Detection and Classification of Acoustic Scenes and Events 2021 Task 1A. The small ASC model with our proposed method improved the log loss to 1.009 and the F1-score to 67.12%, whereas the vanilla ASC model showed a log loss of 1.052 and an F1-score of 65.79%.
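The two moving parts of a mean-teacher setup — a teacher whose weights are an exponential moving average (EMA) of the student's, and a consistency loss between their predictions — can be sketched as follows. This is a generic illustration of the technique, not the paper's modified strategy; MSE between softmax outputs is one common choice of consistency loss:

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    # Teacher weights track an exponential moving average of the student's:
    # theta_t <- alpha * theta_t + (1 - alpha) * theta_s.
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}

def consistency_loss(student_logits, teacher_logits):
    # Penalize disagreement between softened predictions (MSE variant).
    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    return np.mean((softmax(student_logits) - softmax(teacher_logits)) ** 2)

teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
teacher = ema_update(teacher, student, alpha=0.9)  # w moves 10% toward student
```

At inference time only the teacher is kept, so the averaging behaves like a temporal ensemble of student checkpoints without storing multiple submodels — which is exactly the size advantage the abstract targets.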


2021 · Vol 13 (24) · pp. 5143
Author(s): Bo Huang, Zhiming Guo, Liaoni Wu, Boyong He, Xianjiang Li, ...

Image super-resolution (SR) technology aims to recover high-resolution images from low-resolution originals, and it is of great significance for the high-quality interpretation of remote sensing images. However, most current SR-reconstruction approaches suffer from network-training difficulties and from computational complexity that grows with the number of network layers, which makes them unsuitable for application scenarios with limited computing resources. Furthermore, the complex spatial distributions and rich details of remote sensing images make their reconstruction harder. In this paper, we propose the pyramid information distillation attention network (PIDAN) to address these issues. Specifically, we propose the pyramid information distillation attention block (PIDAB) as the building block of the PIDAN. The key components of the PIDAB are the pyramid information distillation (PID) module and the hybrid attention mechanism (HAM) module. First, the PID module uses feature distillation with parallel multi-receptive-field convolutions to extract short- and long-path feature information, which lets the network obtain more non-redundant image features. Then, the HAM module enhances the network's sensitivity to high-frequency image information. Extensive validation experiments show that, compared with other advanced CNN-based approaches, the PIDAN achieves a better balance between image SR performance and model size.
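The short-path/long-path idea behind feature distillation can be sketched at its simplest: at each step the channels are split, a "distilled" slice is kept as-is (short path) while the rest is sent on for further refinement (long path). This is a generic sketch of the distillation split used in lightweight SR networks, not the published PIDAB, and the 1/4 keep ratio is an assumption:

```python
import numpy as np

def distillation_split(x, keep_ratio=0.25):
    # x: feature map of shape (channels, H, W).
    # Short path: keep a slice of channels unchanged ("distilled" features).
    # Long path: the remaining channels go through further convolutions.
    c = x.shape[0]
    k = max(1, int(c * keep_ratio))
    distilled, coarse = x[:k], x[k:]
    return distilled, coarse

x = np.ones((64, 8, 8))
distilled, coarse = distillation_split(x)
# After several such steps, the distilled slices are concatenated and fused,
# so cheap identity paths carry information alongside the refined path.
```

Because only a shrinking fraction of channels passes through the expensive convolutions, the block stays light — the property that lets PIDAN-style networks trade little SR quality for a much smaller model.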


2021 · Vol 2137 (1) · pp. 012062
Author(s): Chengshuai Fan

Abstract Magnetic tile images exhibit uneven illumination, complex surface texture, and low contrast. Because traditional defect-detection algorithms struggle to identify defects accurately, and deep learning algorithms struggle to balance classification accuracy against model size and speed, a defect classification algorithm based on attention-based EfficientNet is proposed. The algorithm first strengthens the network's spatial and location information for image features by integrating the Convolutional Block Attention Module, improving its ability to identify defects. On this basis, Criss-Cross Attention is added so that the network can better capture contextual information along the horizontal and vertical crossing paths of image features, allowing each pixel to ultimately capture full-image dependencies on all pixels. Experimental results show that the algorithm achieves higher classification accuracy than EfficientNet-B0, reaching 99.11%, and strikes a better balance between accuracy, speed, and model size than other classification models.
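The channel-attention half of CBAM can be sketched in a few lines: squeeze each channel's spatial extent with average- and max-pooling, turn the pooled statistics into a per-channel gate, and rescale the feature map. This simplified sketch omits CBAM's shared MLP and its spatial-attention stage:

```python
import numpy as np

def channel_attention(x):
    # x: feature map of shape (channels, H, W).
    # Squeeze spatial dims two ways, as in CBAM's channel branch.
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    # Sigmoid gate per channel (the shared MLP between pooling and the
    # sigmoid is omitted in this simplified sketch).
    gate = 1.0 / (1.0 + np.exp(-(avg + mx)))
    # Rescale: informative channels are emphasized, others suppressed.
    return x * gate[:, None, None]

feat = np.ones((4, 3, 3))
out = channel_attention(feat)
```

Because the gate is computed from global pooled statistics, it costs almost nothing relative to the convolutions it modulates — which is why attention modules like this can raise accuracy without hurting the model-size/speed balance.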


2021 · Vol 2021 (12) · pp. 124003
Author(s): Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, ...

Abstract We show that a variety of modern deep learning tasks exhibit a 'double-descent' phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a function of model size, but also as a function of the number of training epochs. We unify these phenomena by defining a new complexity measure, which we call the effective model complexity, and conjecture a generalized double descent with respect to this measure. Furthermore, our notion of model complexity allows us to identify certain regimes where increasing (even quadrupling) the number of training samples actually hurts test performance.
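Model-size double descent can be reproduced in a far simpler setting than deep networks: minimum-norm linear regression, where "model size" is the number of features used. The sketch below sweeps the feature count past the interpolation threshold (features = training samples); the dimensions and noise level are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 40, 200, 80

X = rng.normal(size=(n_train + n_test, d))
beta = rng.normal(size=d)
y = X @ beta + 0.5 * rng.normal(size=n_train + n_test)
Xtr, Xte = X[:n_train], X[n_train:]
ytr, yte = y[:n_train], y[n_train:]

test_err = []
for p in range(5, d + 1, 5):
    # Minimum-norm least squares on the first p features ("model size" = p).
    # pinv gives the least-squares solution when p < n_train and the
    # minimum-norm interpolating solution when p >= n_train.
    w = np.linalg.pinv(Xtr[:, :p]) @ ytr
    test_err.append(np.mean((Xte[:, :p] @ w - yte) ** 2))
# Test error typically rises toward p = n_train (the interpolation
# threshold) and falls again beyond it — the double-descent shape.
```

Plotting `test_err` against `p` makes the classic peak at the interpolation threshold visible; in the paper's terms, effective model complexity crosses the size of the training set at that point.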


2021 · pp. 1-7
Author(s): Lina Majeed Haider Al-Haideri, Necla Cakmak

Electronic and structural features of uranium-doped graphene (UG) models were investigated in this work using the density functional theory (DFT) approach. Three model sizes, UG1, UG2, and UG3, were investigated, defined by the number of layers surrounding the central U-doped region. Stabilized structures were obtained and their electronic molecular-orbital features were evaluated accordingly. The results indicated that stabilized structures could be obtained and that their electronic features are indeed size-dependent. The conductivity was expected to be highest for the UG3 model and lowest for the UG1 model. The energy levels of the highest occupied and lowest unoccupied molecular orbitals (HOMO and LUMO) provided the evidence for these electronic-conductivity trends. Consequently, the size of a UG model could determine its electronic features, tailoring it for specific applications.
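The HOMO-LUMO argument reduces to simple arithmetic: a smaller gap implies easier electronic excitation and thus higher expected conductivity. The orbital energies below are invented illustrative numbers (the paper's actual DFT values are not given here); only the trend — UG3 most conductive — follows the abstract:

```python
# Hypothetical (HOMO, LUMO) energies in eV for the three model sizes;
# illustrative placeholders, NOT the paper's computed DFT values.
levels = {
    "UG1": (-5.6, -2.1),
    "UG2": (-5.3, -2.6),
    "UG3": (-5.1, -3.0),
}

def homo_lumo_gap(homo, lumo):
    # The gap is the energy needed to promote an electron from HOMO to LUMO.
    return lumo - homo

gaps = {model: homo_lumo_gap(*e) for model, e in levels.items()}
# Smallest gap → highest expected conductivity.
most_conductive = min(gaps, key=gaps.get)
```

With these placeholder energies the gaps shrink from UG1 to UG3, mirroring the size-dependent conductivity ordering reported in the abstract.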

