student model
Recently Published Documents


TOTAL DOCUMENTS

211
(FIVE YEARS 54)

H-INDEX

15
(FIVE YEARS 2)

2022 ◽  
pp. 1-39
Author(s):  
Zhicheng Geng ◽  
Zhanxuan Hu ◽  
Xinming Wu ◽  
Luming Liang ◽  
Sergey Fomel

Detecting subsurface salt structures from seismic images is important for seismic structural analysis and subsurface modeling. Recently, deep learning has been successfully applied to salt segmentation problems. However, most studies focus on supervised salt segmentation and require large amounts of accurately labeled data, which are laborious and time-consuming to collect, especially for the geophysics community. In this paper, we propose a semi-supervised framework for salt segmentation that requires only a small amount of labeled data. Our method adopts the mean teacher approach: we train two models that share the same network architecture. The student model is optimized with a combination of a supervised loss and an unsupervised consistency loss, while the teacher model's weights are the exponential moving average (EMA) of the student model's. The unsupervised consistency loss extracts information from unlabeled data by constraining the network to give consistent predictions for an input and its perturbed version. We train and test our semi-supervised method on both synthetic and real datasets. Results demonstrate that the proposed semi-supervised salt segmentation method outperforms the supervised baseline when labeled training data are scarce.
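The mean-teacher mechanics the abstract describes — an EMA teacher and a consistency term between predictions on clean and perturbed inputs — can be sketched in a few lines. This is a minimal pure-Python illustration, not the paper's implementation; the decay `alpha`, the MSE form of the consistency term, and the weight `lam` are illustrative assumptions.

```python
def ema_update(teacher_w, student_w, alpha=0.99):
    """Teacher weights are an exponential moving average of student weights."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher_w, student_w)]

def consistency_loss(pred_clean, pred_perturbed):
    """Mean squared difference between predictions for an input and its perturbed version."""
    n = len(pred_clean)
    return sum((a - b) ** 2 for a, b in zip(pred_clean, pred_perturbed)) / n

def student_objective(supervised, consistency, lam=1.0):
    """Student objective: supervised loss plus weighted unsupervised consistency loss."""
    return supervised + lam * consistency
```

After each student gradient step, `ema_update` is applied so the teacher trails the student smoothly; only labeled batches contribute the supervised term, while all batches contribute the consistency term.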


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Wen-Ting Li ◽  
Shang-Bing Gao ◽  
Jun-Qiang Zhang ◽  
Shu-Xing Guo

Recent advances in pretrained language models have produced state-of-the-art results on various natural language processing tasks. However, these huge pretrained models are difficult to deploy in practical settings such as mobile and embedded devices, and no pretrained language model exists for the chemical industry. In this work, we propose a method to pretrain a smaller language representation model for the chemical industry domain. First, a large corpus of chemical industry texts is used for pretraining, and a nontraditional knowledge distillation technique builds a compact model that learns the knowledge in the BERT model. By learning the embedding layer, the middle layers, and the prediction layer in separate stages, the compact model acquires not only the probability distribution of the prediction layer but also the representations of the embedding and middle layers, thus inheriting the learning ability of the BERT model. Finally, the distilled model is applied to downstream tasks. Experiments show that, compared with current BERT distillation methods, our method makes full use of the rich feature knowledge in the middle layers of the teacher model while building a student model on a BiLSTM architecture. This avoids the excessive size of traditional transformer-based student models and improves the accuracy of the language model in the chemical domain.
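The staged distillation described above — matching the teacher's embedding layer, middle layers, and prediction layer in turn — can be sketched as a per-stage loss selector. The stage names and loss choices (MSE for hidden representations, soft cross-entropy for the prediction layer) are illustrative assumptions, not the paper's exact formulation.

```python
import math

def mse(a, b):
    """Mean squared error, used here for the embedding- and middle-layer stages."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def soft_cross_entropy(teacher_probs, student_probs):
    """Cross-entropy against the teacher's soft output distribution (prediction stage)."""
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

def stage_loss(stage, teacher_out, student_out):
    """Pick the distillation loss for the current training stage."""
    if stage in ("embedding", "middle"):
        return mse(teacher_out, student_out)
    if stage == "prediction":
        return soft_cross_entropy(teacher_out, student_out)
    raise ValueError(f"unknown stage: {stage}")
```

Training would sweep the stages in order, so the BiLSTM student first aligns its representations with BERT's layers before imitating its output distribution.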


Author(s):  
Bingxue Zhang ◽  
Yang Shi ◽  
Yuxing Li ◽  
Chengliang Chai ◽  
Longfeng Hou

Author(s):  
Pengcheng Xu ◽  
Kyungsang Kim ◽  
Jeongwan Koh ◽  
Dufan Wu ◽  
Yu Rim Lee ◽  
...  

Abstract Segmentation is widely used in diagnosis, lesion detection, and surgery planning. Although deep learning (DL)-based segmentation methods currently outperform traditional methods, most DL-based segmentation models are computationally expensive and memory-inefficient, which makes them unsuitable for interventional liver surgery. A simple remedy is to shrink the segmentation model for fast inference, but there is a trade-off between model size and performance. In this paper, we propose a DL-based real-time 3-D liver CT segmentation method in which knowledge distillation (KD), i.e., knowledge transfer from a teacher model to a student model, compresses the model while preserving performance. Because knowledge transfer is known to be inefficient when the teacher and student model sizes differ greatly, we propose a growing teacher assistant network (GTAN) that gradually learns the knowledge without extra computational cost and can transfer knowledge efficiently even across a large size gap. In our results, the Dice similarity coefficient of the student model with KD improved by 1.2% (85.9% to 87.1%) over the student model without KD, matching the teacher model's performance with only 8% (100k) of its parameters. Furthermore, with a student model of 2% (30k) parameters, the proposed GTAN improved the Dice coefficient by about 2% over the student model without KD, with an inference time of 13 ms per case. The proposed method therefore has great potential for intervention in liver surgery and can also be utilized in many real-time applications.
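The teacher-assistant idea — bridging a large size gap with intermediate models — reduces to chaining distillation steps through a shrinking size schedule. The schedule values and the `distill_step` callback interface below are hypothetical; this sketch only shows the control flow, not the GTAN training itself.

```python
def distill_chain(models, distill_step):
    """Pass knowledge through a sequence of progressively smaller models.

    `models` might be e.g. [teacher, assistant_1, assistant_2, student]:
    each hop calls distill_step(current_teacher, next_model), and the
    freshly distilled model becomes the teacher for the next, smaller one.
    Returns the final (smallest) model.
    """
    teacher = models[0]
    for student in models[1:]:
        distill_step(teacher, student)
        teacher = student
    return teacher
```

Distilling through intermediate assistants keeps each teacher-student gap small, which is the stated reason direct teacher-to-student transfer underperforms when the sizes differ greatly.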


2021 ◽  
Vol 2 (4) ◽  
pp. 5296-5312
Author(s):  
Fernando Acevedo Calamet

The research reviewed in this article focused on two problem areas: (i) the effect that the structural characteristics of the socio-academic context in which a Higher Education (HE) organization is embedded, when that context is unfavorable, have on the conditions of the entering student, especially on his or her disposition and motivation towards learning; and (ii) the evaluation of the degree to which this effect puts student persistence at risk. In the case of Rivera, in northeastern Uruguay, the structure of labor and especially tertiary educational opportunities, scarce and poorly diversified, constitutes a breeding ground for dropout-risk events. The main objective of the research was to provide theoretically consistent and empirically supported inputs for the development of a "pro-persistence" student model in HE applicable to unfavorable socio-academic contexts, thereby surpassing in applicability the most widely accepted model in the current academic world: the "Model of Institutional Action for Student Success" (MIASS) formulated by Tinto in 2012. The research especially took into account relevant theoretical and conceptual approaches to the subject, among which the most recent work of Tinto, Seidman, Kuh, and Pascarella & Terenzini stands out for its depth and rigor. The research adopted a meso-structural approach and a predominantly qualitative methodological strategy: documentary analysis, in-depth interviews, and discussion groups; a census survey was also applied.
The most relevant result is that in places which, like Rivera, offer few options for higher education, the possibilities of student persistence are notoriously restricted, since in these cases students' intrinsic motivation towards their studies is usually weak: a considerable number of students, upon graduating from secondary education, decide to enroll in one of the few HE programs available in their city rather than the one they would prefer if that option existed. This weak motivation is therefore the main risk factor for dropping out, especially in the first year. This finding is the substantive basis on which a "pro-persistence" student model alternative to the MIASS will have to be developed, one applicable in unfavorable socio-academic contexts. Herein lies the main contribution this research can offer to HE organizations in contexts with reduced educational and employment opportunities, both in terms of attractive labor market insertion (during or upon completion of higher education) and, especially, of a scarce and poorly diversified HE offer.


Author(s):  
Taehyeon Kim ◽  
Jaehoon Oh ◽  
Nak Yil Kim ◽  
Sangwook Cho ◽  
Se-Young Yun

Knowledge distillation (KD), which transfers knowledge from a cumbersome teacher model to a lightweight student model, has been investigated as a way to design efficient neural architectures. Generally, the objective function of KD is the Kullback-Leibler (KL) divergence loss between the softened probability distributions of the teacher and student models, with temperature scaling hyperparameter τ. Despite its widespread use, few studies have discussed how such softening influences generalization. Here, we show theoretically that the KL divergence loss focuses on logit matching as τ increases and on label matching as τ goes to 0, and show empirically that logit matching is positively correlated with performance improvement in general. From this observation, we consider an intuitive KD loss function, the mean squared error (MSE) between the logit vectors, so that the student model can directly learn the logits of the teacher model. The MSE loss outperforms the KL divergence loss, which we explain by the difference in penultimate-layer representations induced by the two losses. Furthermore, we show that sequential distillation can improve performance and that KD, particularly when using the KL divergence loss with small τ, mitigates label noise. The code to reproduce the experiments is publicly available online at https://github.com/jhoon-oh/kd_data/.
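The two losses the abstract contrasts are short enough to write out. This is a minimal sketch of the standard temperature-scaled KL objective and the direct logit-MSE alternative; the τ² scaling factor is the conventional normalization that keeps gradient magnitudes comparable across temperatures, included here as a common convention rather than a claim about the paper's exact code.

```python
import math

def softmax(logits, tau=1.0):
    """Temperature-scaled softmax; larger tau softens the distribution."""
    exps = [math.exp(z / tau) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_kl(teacher_logits, student_logits, tau):
    """KL divergence between softened teacher and student distributions,
    scaled by tau**2."""
    p = softmax(teacher_logits, tau)
    q = softmax(student_logits, tau)
    return tau ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def kd_mse(teacher_logits, student_logits):
    """Direct logit matching: mean squared error between logit vectors."""
    n = len(teacher_logits)
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / n
```

Note that `kd_mse` is invariant to τ and penalizes any logit offset, while `kd_kl` with large τ increasingly resembles logit matching, which is the theoretical connection the abstract draws.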


Author(s):  
Chao Ye ◽  
Huaidong Zhang ◽  
Xuemiao Xu ◽  
Weiwei Cai ◽  
Jing Qin ◽  
...  

Deep neural networks have proven to be very powerful tools for object detection in various scenes. Their remarkable performance, however, heavily depends on the availability of a large number of high-quality labeled data, which are time-consuming and costly to acquire for scenes with densely packed objects. We present a novel semi-supervised approach to this problem, built on a common teacher-student model and integrated with a novel intersection-over-union (IoU) aware consistency loss and a new proposal consistency loss. The IoU-aware consistency loss evaluates the IoU over prediction pairs of the teacher and student models, encouraging the student's predictions to closely match the teacher's, and reweights the importance of different prediction pairs to suppress low-confidence pairs. The proposal consistency loss ensures proposal consistency between the two models, making it possible to involve the region proposal network in training with unlabeled data. We also construct a new dataset, RebarDSC, containing 2,125 rebar images annotated with 350,348 bounding boxes in total (164.9 annotations per image on average), to evaluate the proposed method. Extensive experiments are conducted on both the RebarDSC dataset and the well-known large public dataset SKU-110K. Experimental results corroborate that the proposed method improves object detection performance in densely packed scenes, consistently outperforming state-of-the-art approaches. The dataset is available at https://github.com/Armin1337/RebarDSC.
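The IoU computation and the reweighting idea can be illustrated concretely. The `iou` helper below is the standard box-overlap formula; the `iou_aware_consistency` aggregation (penalty `w * (1 - w)` normalized by total weight) is a hypothetical stand-in for the paper's loss, chosen only to show how low-overlap pairs get down-weighted.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_aware_consistency(teacher_boxes, student_boxes):
    """Penalize teacher-student disagreement per matched box pair,
    reweighted by IoU so low-overlap (low-confidence) pairs are suppressed."""
    num, den = 0.0, 0.0
    for t, s in zip(teacher_boxes, student_boxes):
        w = iou(t, s)
        num += w * (1.0 - w)  # disagreement penalty shrinks as boxes agree
        den += w
    return num / den if den > 0 else 0.0
```

A pair with zero overlap contributes neither penalty nor weight, which is the suppression behavior the abstract describes for low-confidence pairs.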


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Mingyong Li ◽  
Qiqi Li ◽  
Lirong Tang ◽  
Shuang Peng ◽  
Yan Ma ◽  
...  

Cross-modal hashing encodes heterogeneous multimedia data into compact binary codes to achieve fast and flexible retrieval across different modalities. Owing to its low storage cost and high retrieval efficiency, it has received widespread attention. Supervised deep hashing significantly improves search performance and usually yields more accurate results, but requires extensive manual annotation of the data. In contrast, unsupervised deep hashing struggles to achieve satisfactory performance due to the lack of reliable supervisory information. To solve this problem, inspired by knowledge distillation, we propose a novel unsupervised knowledge distillation cross-modal hashing method based on semantic alignment (SAKDH), which reconstructs the similarity matrix from the hidden correlation information of a pretrained unsupervised teacher model; the reconstructed similarity matrix then guides a supervised student model. Specifically, the teacher model first adopts an unsupervised semantic-alignment hashing method that constructs a modal-fusion similarity matrix; then, under the supervision of the teacher's distilled information, the student model generates more discriminative hash codes. Experimental results on two extensive benchmark datasets (MIRFLICKR-25K and NUS-WIDE) show that, compared with several representative unsupervised cross-modal hashing methods, the mean average precision (MAP) of the proposed method achieves a significant improvement, fully reflecting its effectiveness in large-scale cross-modal data retrieval.
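The similarity-matrix construction the abstract relies on can be sketched as pairwise cosine similarities over teacher features, with a simple weighted fusion across modalities. The cosine form, the fusion rule, and the weight `gamma` are illustrative assumptions standing in for SAKDH's actual semantic-alignment construction.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def similarity_matrix(features):
    """Pairwise similarities over a batch of teacher feature vectors."""
    return [[cosine(u, v) for v in features] for u in features]

def fuse(sim_image, sim_text, gamma=0.5):
    """Hypothetical weighted fusion of per-modality similarity matrices
    into a single modal-fusion matrix used to supervise the student."""
    n = len(sim_image)
    return [[gamma * sim_image[i][j] + (1.0 - gamma) * sim_text[i][j]
             for j in range(n)] for i in range(n)]
```

The fused matrix plays the role of pseudo-labels: the student is trained so that Hamming distances between its hash codes reproduce these pairwise similarities.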

