TREC Deep Learning Track: Reusable Test Collections in the Large Data Regime

Author(s):  
Nick Craswell ◽  
Bhaskar Mitra ◽  
Emine Yilmaz ◽  
Daniel Campos ◽  
Ellen M. Voorhees ◽  
...  
2020 ◽  
Author(s):  
Turki Turki ◽  
Y-h. Taguchi

Analyzing single-cell pancreatic data would play an important role in understanding various metabolic diseases and health conditions. Due to the sparsity and noise present in such single-cell gene expression data, analyzing various functions related to the inference of gene regulatory networks, derived from single-cell data, remains difficult, thereby posing a barrier to a deeper understanding of cellular metabolism. Since recent studies have led to the reliable inference of single-cell gene regulatory networks (SCGRNs), the challenge of discriminating between SCGRNs has now arisen. By accurately discriminating between SCGRNs (e.g., distinguishing SCGRNs of healthy pancreas from those of T2D pancreas), biologists would be able to annotate, organize, visualize, and identify common patterns of SCGRNs for metabolic diseases. Such annotated SCGRNs could play an important role in speeding up the process of building large data repositories. In this study, we aimed to contribute to the development of a novel deep learning (DL) application. First, we generated a dataset consisting of 224 SCGRNs belonging to both T2D and healthy pancreas and made it freely available. Next, we chose seven DL architectures (VGG16, VGG19, Xception, ResNet50, ResNet101, DenseNet121, and DenseNet169), trained each of them on the dataset, and evaluated their predictions on a test set. We evaluated the DL architectures on an HP workstation platform with a single NVIDIA GeForce RTX 2080Ti GPU. Experimental results on the whole dataset, using several performance measures, demonstrated the superiority of the VGG19 DL model in the automatic classification of SCGRNs derived from the single-cell pancreatic data.


2020 ◽  
pp. 1826-1838
Author(s):  
Rojalina Priyadarshini ◽  
Rabindra K. Barik ◽  
Chhabi Panigrahi ◽  
Harishchandra Dubey ◽  
Brojo Kishore Mishra

This article describes how machine learning (ML) algorithms are useful for analyzing data and extracting meaningful information that can be used in various other applications. In the last few years, the dimension and structure of data have grown explosively. Conventional ML algorithms face several difficulties when dealing with such voluminous and unstructured big data. Modern ML tools are designed and used to deal with all sorts of data complexities. Deep learning (DL) is one of the modern ML tools commonly used to find the hidden structure and cohesion within these large data sets; given proper training on parallel platforms with intelligent optimization techniques, it supports further analysis and interpretation of the data for future prediction and classification. This article focuses on the DL tools and software that have been used over the past couple of years in various areas, especially in healthcare applications.


2022 ◽  
pp. 27-50
Author(s):  
Rajalaxmi Prabhu B. ◽  
Seema S.

A lot of user-generated data is available these days from large platforms, blogs, websites, and other review sites. These data are usually unstructured. Analyzing sentiments from these data automatically is considered an important challenge. Several machine learning algorithms have been implemented to identify opinions in large data sets, and much research has been undertaken on machine learning approaches to sentiment analysis. Machine learning depends mainly on the data required for model building, and hence suitable feature extraction techniques also need to be applied. This chapter addresses several deep learning approaches, their challenges, and future issues. Deep learning techniques are considered important in predicting the sentiments of users. This chapter aims to analyze deep learning techniques for predicting sentiments and to explain the importance of several approaches for mining opinions and determining sentiment polarity.


2020 ◽  
Vol 34 (01) ◽  
pp. 598-605
Author(s):  
Chaoran Cheng ◽  
Fei Tan ◽  
Zhi Wei

We consider the problem of Named Entity Recognition (NER) on biomedical scientific literature, and more specifically genomic variants recognition. Significant success has been achieved for NER on canonical tasks in recent years, where large data sets are generally available. However, it remains a challenging problem in many domain-specific areas, especially domains where only a small number of gold annotations can be obtained. In addition, genomic variant entities exhibit diverse linguistic heterogeneity, differing greatly from those that have been characterized in existing canonical NER tasks. The state-of-the-art machine learning approaches rely heavily on arduous feature engineering to characterize those unique patterns. In this work, we present the first successful end-to-end deep learning approach to bridge the gap between generic NER algorithms and low-resource applications through genomic variants recognition. Our proposed model can achieve promising performance without any hand-crafted features or post-processing rules. Our extensive experiments and results may shed light on other similar low-resource NER applications.


Author(s):  
Qusay Abdullah Abed ◽  
Osamah Mohammed Fadhil ◽  
Wathiq Laftah Al-Yaseen

In general, multidimensional data (from mobile applications, for example) contain a large amount of unnecessary information. Web app users find it difficult to get the information they need quickly and effectively because of the sheer volume of data (big data produced every second). One effective solution to this problem is web personalization. In this paper, we study data mining for web personalization using a blended deep learning model, and we explore how this model helps to analyze and estimate huge numbers of operations. Providing personalized recommendations to improve reliability depends on the web application making use of the useful information it holds. The results of this research are important for the training and testing of large data sets for deep mixed learning based on a back-propagation neural network model. The Hadoop framework was used to perform a number of experiments in different environments with a learning rate between -1 and +1. In addition, a number of techniques were used to evaluate several parameters, such as the true positive cases, in order to assess the proposed model.


2020 ◽  
Author(s):  
Hannes Wartmann ◽  
Sven Heins ◽  
Karin Kloiber ◽  
Stefan Bonn

Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Here we investigate RNA-seq metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-seq metadata. We show how our algorithm outperforms existing approaches as well as traditional deep learning methods for the prediction of tissue, sample source, and patient sex information across several large data repositories. By using a model architecture similar to Siamese networks, the algorithm is able to learn biases from datasets with few samples. Our domain adaptation approach achieves metadata annotation accuracies up to 12.3% better than a previously published method. Lastly, we provide a list of more than 10,000 novel tissue and sex label annotations for 8,495 unique SRA samples.


Sensors ◽  
2020 ◽  
Vol 20 (1) ◽  
pp. 322 ◽  
Author(s):  
Faraz Malik Awan ◽  
Yasir Saleem ◽  
Roberto Minerva ◽  
Noel Crespi

Machine/Deep Learning (ML/DL) techniques have been applied to large data sets in order to extract relevant information and to make predictions. The performance and the outcomes of different ML/DL algorithms may vary depending upon the data sets being used, as well as on the suitability of algorithms to the data and the application domain under consideration. Hence, determining which ML/DL algorithm is most suitable for a specific application domain and its related data sets would be a key advantage. To respond to this need, a comparative analysis of well-known ML/DL techniques, including Multilayer Perceptron, K-Nearest Neighbors, Decision Tree, Random Forest, and Voting Classifier (or the Ensemble Learning Approach), has been conducted for the prediction of parking space availability. This comparison utilized Santander’s parking data set, initiated while working on the H2020 WISE-IoT project. The data set was used to evaluate the considered algorithms and to determine the one offering the best prediction. The results of this analysis show that, regardless of the data set size, less complex algorithms such as Decision Tree, Random Forest, and KNN outperform more complex algorithms such as Multilayer Perceptron in terms of prediction accuracy, while providing comparable information for the prediction of parking space availability. In addition, in this paper, we provide Top-K parking space recommendations on the basis of the distance between the current position of vehicles and free parking spots.
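
The Top-K recommendation step mentioned at the end of this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says that spots are ranked by distance, so the use of the haversine (great-circle) distance, the function names `haversine_km` and `top_k_free_spots`, the dictionary layout of a spot, and the coordinates are all assumptions made for illustration.

```python
import heapq
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (assumed metric, not stated in the abstract)."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def top_k_free_spots(vehicle, spots, k=3):
    """Return the k free spots nearest to the vehicle as (distance_km, spot_id) pairs."""
    lat, lon = vehicle
    candidates = (
        (haversine_km(lat, lon, s["lat"], s["lon"]), s["id"])
        for s in spots
        if s["free"]  # occupied spots are never recommended
    )
    return heapq.nsmallest(k, candidates)


# Hypothetical spots and vehicle position, invented for this example.
spots = [
    {"id": "A", "lat": 43.4623, "lon": -3.8100, "free": True},
    {"id": "B", "lat": 43.4700, "lon": -3.8000, "free": False},
    {"id": "C", "lat": 43.4630, "lon": -3.8090, "free": True},
    {"id": "D", "lat": 43.4800, "lon": -3.7900, "free": True},
]
vehicle = (43.4625, -3.8095)
ranked = top_k_free_spots(vehicle, spots, k=2)
for distance_km, spot_id in ranked:
    print(f"{spot_id}: {distance_km:.3f} km")
```

In a fuller system the `free` flag would presumably come from the availability predictor rather than from a fixed table; `heapq.nsmallest` avoids sorting the whole candidate list when K is small.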


2020 ◽  
Vol 13 ◽  
pp. 263177452093522
Author(s):  
Shraddha Gulati ◽  
Andrew Emmanuel ◽  
Mehul Patel ◽  
Sophie Williams ◽  
Amyn Haji ◽  
...  

Artificial intelligence is a strong focus of interest for global health development. Diagnostic endoscopy is an attractive substrate for artificial intelligence with a real potential to improve patient care through standardisation of endoscopic diagnosis and to serve as an adjunct to enhanced imaging diagnosis. The possibility to amass large data to refine algorithms makes adoption of artificial intelligence into global practice a potential reality. Initial studies in luminal endoscopy involve machine learning and are retrospective. Improvement in diagnostic performance is appreciable through the adoption of deep learning. Research foci in the upper gastrointestinal tract include the diagnosis of neoplasia, including Barrett’s, squamous cell and gastric where prospective and real-time artificial intelligence studies have been completed demonstrating a benefit of artificial intelligence–augmented endoscopy. Deep learning applied to small bowel capsule endoscopy also appears to enhance pathology detection and reduce capsule reading time. Prospective evaluation including the first randomised trial has been performed in the colon, demonstrating improved polyp and adenoma detection rates; however, these appear to be relevant to small polyps. There are potential additional roles of artificial intelligence relevant to improving the quality of endoscopic examinations, training and triaging of referrals. Further large-scale, multicentre and cross-platform validation studies are required for the robust incorporation of artificial intelligence–augmented diagnostic luminal endoscopy into our routine clinical practice.

