A Weakly Supervised and Deep Learning Method for an Additive Topic Analysis of Large Corpora

2021 ◽  
Vol 3 (1) ◽  
pp. 29-59
Author(s):  
Yair Fogel-Dror ◽  
Shaul R. Shenhav ◽  
Tamir Sheafer

Abstract The collaborative effort of theory-driven content analysis can benefit significantly from the use of topic analysis methods, which allow researchers to add more categories while developing or testing a theory. This additive approach enables the reuse of previous efforts of analysis or even the merging of separate research projects, thereby making these methods more accessible and increasing the discipline’s ability to create and share content analysis capabilities. This paper proposes a weakly supervised topic analysis method that uses both a low-cost unsupervised method to compile a training set and supervised deep learning as an additive and accurate text classification method. We test the validity of the method, specifically its additivity, by comparing the results of the method after adding 200 categories to an initial number of 450. We show that the suggested method provides a foundation for a low-cost solution for large-scale topic analysis.
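The pipeline above has two stages: a low-cost unsupervised step that compiles a weakly labeled training set, and a supervised deep learning classifier trained on it. Below is a minimal sketch of that two-stage idea, under assumptions of our own: seed keywords stand in for the unsupervised labeling step, and a small bag-of-embeddings classifier stands in for the deep learning stage; the categories, documents, and hyperparameters are illustrative and are not taken from the paper.

```python
# Minimal sketch: weak labeling from seed keywords, then supervised training.
import torch
import torch.nn as nn

seeds = {0: ["election", "vote"], 1: ["inflation", "market"]}   # hypothetical categories
docs = ["the vote was close", "the market reacted to inflation", "voters cast a vote today"]

# Step 1: weak labeling -- each document gets the category whose seed words it matches most.
weak_labels = [max(seeds, key=lambda c: sum(w in d.split() for w in seeds[c])) for d in docs]

# Step 2: supervised learning on the weakly labeled set.
vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d.split()}))}
def encode(doc, length=8):                         # pad/truncate token ids to a fixed length
    ids = [vocab[w] for w in doc.split()][:length]
    return ids + [len(vocab)] * (length - len(ids))   # index len(vocab) is the pad token

X = torch.tensor([encode(d) for d in docs])
y = torch.tensor(weak_labels)

model = nn.Sequential(
    nn.EmbeddingBag(len(vocab) + 1, 32, padding_idx=len(vocab)),  # averages word embeddings
    nn.Linear(32, len(seeds)),                     # one output per category
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(50):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(X), y)
    loss.backward()
    optimizer.step()
```

In this sketch, adding categories additively amounts to extending the seed dictionary, widening the output layer, and retraining on the enlarged weakly labeled set.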


Author(s):  
Rui Guo ◽  
Xiaobin Hu ◽  
Haoming Song ◽  
Pengpeng Xu ◽  
Haoping Xu ◽  
...  

Abstract Purpose To develop a weakly supervised deep learning (WSDL) method that can utilize incomplete/missing survival data to predict the prognosis of extranodal natural killer/T cell lymphoma, nasal type (ENKTL) based on pretreatment 18F-FDG PET/CT results. Methods One hundred and sixty-seven patients with ENKTL who underwent pretreatment 18F-FDG PET/CT were retrospectively collected. Eighty-four patients were followed up for at least 2 years (training set = 64, test set = 20). A WSDL method was developed to enable the integration of the remaining 83 patients with incomplete/missing follow-up information into the training set. To test generalization, these data were derived from three types of scanners. A prediction similarity index (PSI) was derived from deep learning features of the images. Its discriminative ability was calculated and compared with that of a conventional deep learning (CDL) method. Univariate and multivariate analyses were used to explore the significance of PSI and clinical features. Results PSI achieved area under the curve scores of 0.9858 and 0.9946 (training set) and 0.8750 and 0.7344 (test set) in the prediction of progression-free survival (PFS) with the WSDL and CDL methods, respectively. A PSI threshold of 1.0 could significantly differentiate the prognosis. In the test set, WSDL and CDL achieved prediction sensitivity, specificity, and accuracy of 87.50% and 62.50%, 83.33% and 83.33%, and 85.00% and 75.00%, respectively. Multivariate analysis confirmed PSI to be an independent significant predictor of PFS with both methods. Conclusion The WSDL-based framework was more effective for extracting 18F-FDG PET/CT features and predicting the prognosis of ENKTL than the CDL method.
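The reported sensitivity, specificity, and accuracy follow from applying a cutoff to the PSI. A minimal sketch of that evaluation step is shown below; the scores and outcomes are synthetic, and only the 1.0 threshold is taken from the abstract.

```python
# Synthetic illustration of thresholding a score to obtain sensitivity/specificity/accuracy.
import numpy as np

psi_scores = np.array([1.8, 0.4, 1.2, 0.9, 1.5, 0.3])   # hypothetical PSI per patient
events     = np.array([1,   0,   1,   0,   1,   0])      # 1 = progression within follow-up
threshold = 1.0                                           # cutoff value reported in the abstract

predicted = (psi_scores >= threshold).astype(int)
tp = np.sum((predicted == 1) & (events == 1))
tn = np.sum((predicted == 0) & (events == 0))
fp = np.sum((predicted == 1) & (events == 0))
fn = np.sum((predicted == 0) & (events == 1))

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / len(events)
print(sensitivity, specificity, accuracy)
```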


2020 ◽  
Vol 10 (8) ◽  
pp. 2878 ◽  
Author(s):  
Jihyun Seo ◽  
Hanse Ahn ◽  
Daewon Kim ◽  
Sungju Lee ◽  
Yongwha Chung ◽  
...  

Automated pig monitoring is an important issue in the surveillance environment of a pig farm. For large-scale pig farms in particular, practical issues such as monitoring cost should be considered, but such consideration based on low-cost embedded boards has not yet been reported. Since low-cost embedded boards have more limited computing power than typical PCs and involve tradeoffs between execution speed and accuracy, achieving fast and accurate detection of individual pigs for “on-device” pig monitoring applications is very challenging. Therefore, in this paper, we propose a method for the fast detection of individual pigs by reducing the computational workload of the 3 × 3 convolutions in widely used, deep-learning-based object detectors. Then, in order to recover the accuracy of the “lightweight” object detector, we generate a three-channel composite image as its input through “simple” image preprocessing techniques. Our experimental results on an NVIDIA Jetson Nano embedded board show that the proposed method can improve the integrated performance of both execution speed and accuracy of widely used, deep-learning-based object detectors by a factor of up to 8.7.
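The abstract does not spell out exactly how the 3 × 3 convolution workload is reduced, so the sketch below uses a standard depthwise-separable substitution purely to illustrate the kind of lightweight replacement involved; the channel counts and feature-map sizes are arbitrary.

```python
# One common way to cut the cost of a dense 3x3 convolution: depthwise + pointwise.
import torch
import torch.nn as nn

class SeparableConv3x3(nn.Module):
    """Depthwise 3x3 followed by pointwise 1x1: far fewer multiply-adds and
    parameters than a dense 3x3 convolution with the same channel counts."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

dense = nn.Conv2d(64, 128, 3, padding=1)
light = SeparableConv3x3(64, 128)
params = lambda m: sum(p.numel() for p in m.parameters())
print(params(dense), params(light))   # the separable block uses roughly 8x fewer parameters
x = torch.randn(1, 64, 56, 56)        # e.g. a feature map from a surveillance frame
assert dense(x).shape == light(x).shape
```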


2019 ◽  
Vol 7 (4) ◽  
pp. T911-T922
Author(s):  
Satyakee Sen ◽  
Sribharath Kainkaryam ◽  
Cen Ong ◽  
Arvind Sharma

Salt model building has long been considered a severe bottleneck for large-scale 3D seismic imaging projects. It is one of the most time-consuming, labor-intensive, and difficult-to-automate processes in the entire depth imaging workflow, requiring significant intervention by domain experts to manually interpret the salt bodies on noisy, low-frequency, and low-resolution seismic images at each iteration of the salt model building process. The difficulty of this task and the need to automate it are well recognized by the imaging community and have propelled the use of deep-learning-based convolutional neural network (CNN) architectures to carry it out. However, significant challenges remain for reliable production-scale deployment of CNN-based methods for salt model building, mainly due to the poor generalization capabilities of these networks. When used on new surveys never seen by the CNN models during the training stage, the interpretation accuracy of these models drops significantly. To remedy this key problem, we have introduced a U-shaped encoder-decoder CNN architecture trained using a specialized regularization strategy aimed at reducing the generalization error of the network. Our regularization scheme perturbs the ground truth labels in the training set. Two different perturbations are discussed: the first randomly changes the labels of the training set, flipping salt labels to sediment and vice versa; the second smooths the labels. We have determined that such perturbations act as a strong regularizer, preventing the network from making highly confident predictions on the training set and thus reducing overfitting. An ensemble strategy is also used for test-time augmentation and is shown to further improve the accuracy. The robustness of our CNN models, in terms of reduced generalization error and improved interpretation accuracy, is demonstrated with real data examples from the Gulf of Mexico.
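A minimal sketch of the two label perturbations described above, applied to a binary salt/sediment mask, is given below; the flip probability and smoothing strength are illustrative values, not the ones used in the paper.

```python
# Label perturbation on a binary salt mask: random flips and label smoothing.
import numpy as np
from scipy.ndimage import gaussian_filter

def flip_labels(mask, p=0.05, rng=np.random.default_rng(0)):
    """Randomly flip a small fraction of salt<->sediment labels."""
    flips = rng.random(mask.shape) < p
    return np.where(flips, 1 - mask, mask)

def smooth_labels(mask, sigma=2.0):
    """Blur the hard 0/1 mask into soft targets in [0, 1]."""
    return gaussian_filter(mask.astype(float), sigma=sigma)

mask = np.zeros((128, 128), dtype=np.uint8)
mask[40:90, 30:100] = 1                      # a toy "salt body"
soft = smooth_labels(flip_labels(mask))      # perturbed targets used during training
```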


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 30885-30896 ◽  
Author(s):  
Jibing Gong ◽  
Hongyuan Ma ◽  
Zhiyong Teng ◽  
Qi Teng ◽  
Hekai Zhang ◽  
...  

2013 ◽  
Vol 361-363 ◽  
pp. 2122-2126
Author(s):  
Jun Chen ◽  
Xiao Hua Li ◽  
Lan Ma

Traditional transit travel information is acquired through trip sample surveys, which have disadvantages including high cost and a short data lifecycle. This paper investigates a transit travel demand analysis method based on Advanced Public Transportation Systems (APTS) data. The study collected APTS data for Nanning City, China, and established a multi-source APTS data analysis platform using data warehouse technology. Based on research into the key problems, the paper presents the analysis procedure and content. It then proposes the core algorithms of the method, which determine the boarding, alighting, and transfer bus stops of smart card passengers. Finally, these algorithms are tested on large-scale real-world APTS data. The results show that the analysis method is low-cost, operable, and highly accurate.
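As an illustration of the boarding-stop determination, the sketch below matches a smart card tap to the most recent stop arrival of the same vehicle. The data layout and field names are hypothetical; the actual algorithms operate on the APTS multi-source platform described in the paper.

```python
# Toy boarding-stop inference: assign each card tap to the latest stop arrival
# of the same bus that occurred at or before the tap time.
from datetime import datetime

stop_arrivals = [  # (bus_id, stop_id, arrival_time) from the vehicle location feed
    ("bus_12", "S01", datetime(2013, 5, 6, 8, 0)),
    ("bus_12", "S02", datetime(2013, 5, 6, 8, 7)),
    ("bus_12", "S03", datetime(2013, 5, 6, 8, 15)),
]
card_taps = [  # (card_id, bus_id, tap_time) from the smart card system
    ("card_A", "bus_12", datetime(2013, 5, 6, 8, 8)),
]

def boarding_stop(tap, arrivals):
    bus_id, tap_time = tap[1], tap[2]
    candidates = [a for a in arrivals if a[0] == bus_id and a[2] <= tap_time]
    return max(candidates, key=lambda a: a[2])[1] if candidates else None

for tap in card_taps:
    print(tap[0], "boarded at", boarding_stop(tap, stop_arrivals))  # card_A boarded at S02
```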


2021 ◽  
Author(s):  
Abhilasha Dubey ◽  
Sanjay Upadhyay ◽  
Manjeet Mehta

A rapid, reliable, and robust method for the detection of SARS-CoV-2 is an indispensable diagnostic need. The development of such diagnostic methods will help address further waves of the pandemic through rapid disease surveillance and will help allay fears. To meet this challenge, we have developed a rapid RT-qPCR method for the detection of three target or confirmatory genes in less than 30 minutes. The assay showed 100% sensitivity and 100% specificity when tested on 120 samples. We compared a conventional extraction-based method with an extraction-free method and then further reduced the run time of the extraction-free method. Additionally, we validated our rapid RT-qPCR method for the assessment of pooled samples. We hereby propose a highly reliable approach for the mass screening of samples with ease of operation at low cost. Finally, we designed a single-tube analysis method that provides qualitative as well as quantitative results in minimal time.
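Pooled-sample screening of the kind mentioned above typically follows a two-stage logic: test pools first, then retest members of positive pools individually. A minimal sketch of that screening logic with made-up sample results and pool size is shown below; it illustrates only the bookkeeping, not the laboratory protocol itself.

```python
# Two-stage pooled screening: far fewer tests than one-per-sample when prevalence is low.
def screen(samples, pool_size=5):
    """Test pools first; only retest members of positive pools individually."""
    positives, tests = [], 0
    for i in range(0, len(samples), pool_size):
        pool = samples[i:i + pool_size]
        tests += 1                                  # one RT-qPCR run for the pool
        if any(s["positive"] for s in pool):        # pool flags positive
            for s in pool:
                tests += 1                          # individual confirmatory runs
                if s["positive"]:
                    positives.append(s["id"])
    return positives, tests

samples = [{"id": i, "positive": i in (7, 23)} for i in range(50)]
print(screen(samples))   # two positives found with 20 tests instead of 50
```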


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 171548-171558 ◽  
Author(s):  
Jiaying Wang ◽  
Yaxin Li ◽  
Jing Shan ◽  
Jinling Bao ◽  
Chuanyu Zong ◽  
...  

2021 ◽  
Author(s):  
Melanie Brandmeier ◽  
Eya Cherif

Degradation of large forest areas such as the Brazilian Amazon due to logging and fires can increase the human footprint well beyond deforestation. Monitoring and quantifying such changes on a large scale has been addressed by several research groups (e.g., Souza et al. 2013) by making use of freely available remote sensing data such as the Landsat archive. However, fully automatic large-scale land cover/land use mapping is still one of the great challenges in remote sensing. One problem is the availability of reliable “ground truth” labels for training supervised learning algorithms. For the Amazon area, several land cover maps with 22 classes are available from the MapBiomas project; they were derived by semi-automatic classification and verified by extensive fieldwork (Project MapBiomas). These labels cannot be considered real ground truth, as they were themselves derived from Landsat data, but they can still be used for weakly supervised training of deep learning models that have the potential to improve predictions on the higher-resolution data now available. The term weakly supervised learning was originally coined by Zhou (2017) and refers to the attempt to construct predictive models from incomplete, inexact, and/or inaccurate labels, as is often the case in remote sensing. To this end, we investigate advanced deep learning strategies on Sentinel-1 time series and Sentinel-2 optical data to improve large-scale automatic mapping and monitoring of land cover changes in the Amazon area. Sentinel-1 data has the advantage of being resistant to the cloud cover that often hinders optical remote sensing in the tropics.

We propose new architectures that are adapted to the particularities of remote sensing data (S1 time series and multispectral S2 data) and compare their performance to state-of-the-art models. Results using only spectral data were very promising, with overall test accuracies of 77.9% for Unet and 74.7% for a DeepLab implementation with a ResNet50 backbone, and F1 measures of 43.2% and 44.2%, respectively. On the other hand, preliminary results for new architectures leveraging the multi-temporal aspect of SAR data have improved the quality of mapping, particularly for agricultural classes. For instance, our newly designed network AtrousDeepForestM2 has quantitative performance similar to DeepLab (F1 of 58.1% vs. 62.1%) but produces better qualitative land cover maps.

To make our approach scalable and feasible for others, we integrate the trained models into a geoprocessing tool in ArcGIS that can also be deployed in a cloud environment and offers a variety of post-processing options to the user.

Souza, J., Carlos M., et al. (2013). “Ten-Year Landsat Classification of Deforestation and Forest Degradation in the Brazilian Amazon.” Remote Sensing 5(11): 5493-5513.

Zhou, Z.-H. (2017). “A brief introduction to weakly supervised learning.” National Science Review 5(1): 44-53.

Project MapBiomas - Collection 4.1 of Brazilian Land Cover & Use Map Series, accessed in January 2020 through the link: https://mapbiomas.org/colecoes-mapbiomas?cama_set_language=en
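As a rough illustration of the weakly supervised setup described above, the sketch below pairs multispectral patches with MapBiomas-derived class rasters used as noisy per-pixel targets. The tiny network, patch size, and band count are placeholders and do not reflect the Unet, DeepLab, or AtrousDeepForestM2 architectures discussed in the abstract.

```python
# Weak supervision sketch: train a per-pixel classifier on MapBiomas-derived labels.
import torch
import torch.nn as nn

NUM_BANDS, NUM_CLASSES = 10, 22        # e.g. 10 S2 bands, 22 MapBiomas classes

model = nn.Sequential(                 # stand-in for the Unet / DeepLab backbones
    nn.Conv2d(NUM_BANDS, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, NUM_CLASSES, 1),     # per-pixel class logits
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

patches = torch.randn(4, NUM_BANDS, 64, 64)               # fake Sentinel-2 patches
weak_labels = torch.randint(0, NUM_CLASSES, (4, 64, 64))   # MapBiomas raster crops (noisy)

logits = model(patches)
loss = nn.functional.cross_entropy(logits, weak_labels)    # weak labels used as targets
loss.backward()
optimizer.step()
```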


2019 ◽  
Vol 2019 ◽  
pp. 1-14 ◽  
Author(s):  
Sambuddha Ghosal ◽  
Bangyou Zheng ◽  
Scott C. Chapman ◽  
Andries B. Potgieter ◽  
David R. Jordan ◽  
...  

The yield of cereal crops such as sorghum (Sorghum bicolor L. Moench) depends on the distribution of crop heads in varying branching arrangements. Therefore, counting the head number per unit area is critical for plant breeders to correlate with the genotypic variation in a specific breeding field. However, measuring such phenotypic traits manually is an extremely labor-intensive process and suffers from low efficiency and human error. Moreover, the process is almost infeasible for large-scale breeding plantations or experiments. Machine-learning-based approaches such as deep convolutional neural network (CNN) based object detectors are promising tools for efficient object detection and counting. However, a significant limitation of such deep learning-based approaches is that they typically require a massive amount of hand-labeled images for training, which is still a tedious process. Here, we propose an active-learning-inspired, weakly supervised deep learning framework for sorghum head detection and counting from UAV-based images. We demonstrate that it is possible to significantly reduce human labeling effort without compromising final model performance (R2 between the human count and machine count of 0.88) by using a semi-trained CNN model (i.e., one trained with limited labeled data) to perform synthetic annotation. In addition, we visualize key features that the network learns. This improves trustworthiness by enabling users to better understand and trust the decisions that the trained deep learning model makes.
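A minimal sketch of the synthetic-annotation step described above: a detector trained on a small hand-labeled subset annotates unlabeled UAV images, and only its confident detections are kept as pseudo-labels for the next training round. The torchvision detector (assuming a recent torchvision version), confidence cutoff, and images below are placeholders, not the paper's model.

```python
# Pseudo-labeling with a semi-trained detector: keep only high-confidence detections.
import torch
import torchvision

# Pretend this model was already "semi-trained" on the small hand-labeled subset.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, weights_backbone=None, num_classes=2)
detector.eval()

unlabeled = [torch.rand(3, 256, 256) for _ in range(2)]   # stand-in UAV image crops
CONF = 0.8                                                # keep only confident boxes

with torch.no_grad():
    outputs = detector(unlabeled)

pseudo_labels = []
for img, out in zip(unlabeled, outputs):
    keep = out["scores"] >= CONF
    pseudo_labels.append({"boxes": out["boxes"][keep], "labels": out["labels"][keep]})
# pseudo_labels can now be merged with the hand-labeled set for the next training round
```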

