Image Caption Generator Using Deep Learning

Author(s):  
A. V. N. Kameswari

Abstract: When humans see an image, their brain can easily tell what the image is about, but a computer cannot do so easily. Computer vision researchers have worked on this problem extensively and long considered it impossible. With advances in deep learning techniques and the availability of huge datasets and computing power, we can now build models that generate captions for an image. Image caption generation is a popular research area of deep learning that deals with image understanding and a language description for that image. Generating well-formed sentences requires both syntactic and semantic understanding of the language. Describing the content of an image in accurately formed sentences is a very challenging task, but it could also have a great impact by helping visually impaired people better understand the content of images. The biggest challenge is creating a description that captures not only the objects contained in an image but also how these objects relate to each other. This paper uses the Flickr_8K dataset and the Flickr8k_text folder, which contains Flickr8k.token, the main file of the dataset; it holds the image names and their respective captions, separated by newlines (“\n”). A CNN, the pre-trained Xception model, is used to extract features from the image. An LSTM uses the information from the CNN to help generate a description of the image. The Flickr8k_text folder also contains the Flickr_8k.trainImages.txt file, which lists the 6000 image names used for training. After the CNN-LSTM model is defined, an image file is passed as a parameter on the command prompt to test the image caption generator; the model generates a caption for the image, and its accuracy is assessed by calculating the BLEU score between the generated and reference captions. Keywords: Image Caption Generator, Convolutional Neural Network, Long Short-Term Memory, BLEU score, Flickr_8K
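
The BLEU evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes tokenized captions, uniform n-gram weights, no smoothing, and a maximum n-gram order of 2; the function names `ngrams` and `bleu` are illustrative and are not taken from the paper, which does not state which BLEU variant or library it used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=2):
    """Sentence-level BLEU with uniform weights and no smoothing (assumed setup)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, a zero precision at any order gives a zero score.
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)
```

A generated caption would be scored against all reference captions for the same image, for example `bleu(generated.split(), [ref.split() for ref in reference_captions])`, where `generated` and `reference_captions` are hypothetical variables.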

2018 ◽  
Author(s):  
Andre Lamurias ◽  
Luka A. Clarke ◽  
Francisco M. Couto

Abstract: Recent studies have proposed deep learning techniques, namely recurrent neural networks, to improve biomedical text mining tasks. However, these techniques rarely take advantage of existing domain-specific resources, such as ontologies. In Life and Health Sciences there is a vast and valuable set of such resources publicly available, which are continuously being updated. Biomedical ontologies are nowadays a mainstream approach to formalize existing knowledge about entities, such as genes, chemicals, phenotypes, and disorders. These resources contain supplementary information that may not yet be encoded in training data, particularly in domains with limited labeled data. We propose a new model, BO-LSTM, that takes advantage of domain-specific ontologies by representing each entity as the sequence of its ancestors in the ontology. We implemented BO-LSTM as a recurrent neural network with long short-term memory units, using an open biomedical ontology, which in our case study was Chemical Entities of Biological Interest (ChEBI). We assessed the performance of BO-LSTM on detecting and classifying drug-drug interactions in a publicly available corpus from an international challenge, composed of 792 drug descriptions and 233 scientific abstracts. By using the domain-specific ontology in addition to word embeddings and WordNet, BO-LSTM improved the F1-score of both the detection and the classification of drug-drug interactions, particularly in a document set with a limited number of annotations. Our findings demonstrate that, besides the high performance of current deep learning techniques, domain-specific ontologies can still be useful to mitigate the lack of labeled data.
Author summary: A high quantity of biomedical information is only available in documents such as scientific articles and patents. Due to the rate at which new documents are produced, we need automatic methods to extract useful information from them.
Text mining is a subfield of information retrieval which aims at extracting relevant information from text. Scientific literature is a challenge to text mining because of the complexity and specificity of the topics approached. In recent years, deep learning has obtained promising results in various text mining tasks by exploring large datasets. On the other hand, ontologies provide a detailed and sound representation of a domain and have been developed for diverse biomedical domains. We propose a model that combines deep learning algorithms with biomedical ontologies to identify relations between concepts in text. We demonstrate the potential of this model to extract drug-drug interactions from abstracts and drug descriptions. This model can be applied to other biomedical domains using an annotated corpus of documents and an ontology related to that domain to train a new classifier.
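
The idea of representing an entity as the sequence of its ancestors can be sketched as follows. This is only an illustration under simplifying assumptions: each term has a single parent (real ontologies such as ChEBI allow several), the path is ordered from root to entity, and the terms in `TOY_PARENTS` are invented placeholders, not actual ChEBI entries. The function name `ancestor_sequence` is hypothetical.

```python
def ancestor_sequence(entity, parent_of):
    """Return the path from the ontology root down to `entity`.

    `parent_of` maps each term to its single parent; the root has no entry.
    """
    path = [entity]
    seen = {entity}
    while path[-1] in parent_of:
        parent = parent_of[path[-1]]
        if parent in seen:  # guard against cycles in a malformed map
            raise ValueError(f"cycle detected at {parent!r}")
        seen.add(parent)
        path.append(parent)
    return list(reversed(path))


# Invented toy hierarchy, used only to show the shape of the output.
TOY_PARENTS = {
    "entity_d": "class_c",
    "class_c": "class_b",
    "class_b": "root",
}

print(ancestor_sequence("entity_d", TOY_PARENTS))
```

In a model of this kind, such a sequence of terms would typically be mapped to embeddings and fed to a recurrent layer alongside the word sequence; how the abstract's model does this in detail is not stated here.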


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Kazi Nabiul Alam ◽  
Md Shakib Khan ◽  
Abdur Rab Dhruba ◽  
Mohammad Monirujjaman Khan ◽  
Jehad F. Al-Amri ◽  
...  

The COVID-19 pandemic has had a devastating effect on many people, creating severe anxiety, fear, and complicated feelings or emotions. After the initiation of vaccinations against coronavirus, people’s feelings have become more diverse and complex. Our aim is to understand and unravel their sentiments in this research using deep learning techniques. Social media is currently the best way to express feelings and emotions, and with the help of Twitter, one can have a better idea of what is trending and going on in people’s minds. Our motivation for this research was to understand the diverse sentiments of people regarding the vaccination process. In this research, the timeline of the collected tweets was from December 21 to July 21. The tweets contained information about the most common vaccines available recently from across the world. The sentiments of people regarding vaccines of all sorts were assessed using the natural language processing (NLP) tool Valence Aware Dictionary for sEntiment Reasoner (VADER). Grouping the polarities of the obtained sentiments into three classes (positive, negative, and neutral) helped us visualize the overall scenario; our findings included 33.96% positive, 17.55% negative, and 48.49% neutral responses. In addition, we included our analysis of the timeline of the tweets in this research, as sentiments fluctuated over time. A recurrent neural network- (RNN-) oriented architecture, including long short-term memory (LSTM) and bidirectional LSTM (Bi-LSTM), was used to assess the performance of the predictive models, with LSTM achieving an accuracy of 90.59% and Bi-LSTM achieving 90.83%. Other performance metrics such as precision, F1-score, and a confusion matrix were also used to validate our models and findings more effectively. This study improves understanding of the public’s opinion on COVID-19 vaccines and supports the aim of eradicating coronavirus from the world.
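
The grouping of VADER polarities into three classes can be sketched as below. The sketch assumes each tweet already has a VADER compound score in [-1, 1] and uses the ±0.05 cutoffs commonly suggested in VADER's documentation; the abstract does not state which cutoffs the study used, and the names `polarity_label` and `label_shares` are hypothetical.

```python
from collections import Counter


def polarity_label(compound, threshold=0.05):
    """Map a compound score to a class (threshold is an assumed convention)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def label_shares(compounds, threshold=0.05):
    """Return the percentage of scores falling in each polarity class."""
    counts = Counter(polarity_label(c, threshold) for c in compounds)
    total = len(compounds)
    return {
        label: 100.0 * counts[label] / total
        for label in ("positive", "negative", "neutral")
    }
```

Applied to the compound scores of a tweet collection, `label_shares` yields a percentage breakdown of the same form as the positive, negative, and neutral shares reported in the abstract.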


2018 ◽  
Vol 7 (3.27) ◽  
pp. 258 ◽  
Author(s):  
Yecheng Yao ◽  
Jungho Yi ◽  
Shengjun Zhai ◽  
Yuwen Lin ◽  
Taekseung Kim ◽  
...  

The decentralization of cryptocurrencies has greatly reduced the level of central control over them, impacting international relations and trade. Further, wide fluctuations in cryptocurrency price indicate an urgent need for an accurate way to forecast this price. This paper proposes a novel method to predict cryptocurrency price by considering various factors such as market cap, volume, circulating supply, and maximum supply based on deep learning techniques such as the recurrent neural network (RNN) and the long short-term memory (LSTM), which are effective learning models for training data, with the LSTM being better at recognizing longer-term associations. The proposed approach is implemented in Python and validated for benchmark datasets. The results verify the applicability of the proposed approach for the accurate prediction of cryptocurrency price.
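
Sequence models such as RNNs and LSTMs are usually trained on fixed-length windows of past observations. The sketch below shows one common way to build such windows from a price series; the window length, the single-feature input, and the name `make_windows` are assumptions for illustration, since the abstract does not describe the paper's preprocessing.

```python
def make_windows(series, window):
    """Split a series into (past window, next value) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])  # the last `window` observations
        targets.append(series[start + window])       # the value to predict
    return inputs, targets


# Hypothetical closing prices, used only to show the output shape.
prices = [1, 2, 3, 4, 5]
x, y = make_windows(prices, window=3)
print(x)
print(y)
```

With several factors per time step (for example market cap and volume), each element of `series` would be a feature vector instead of a single number, and the same windowing applies.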


10.6036/10007 ◽  
2021 ◽  
Vol 96 (5) ◽  
pp. 528-533
Author(s):  
XAVIER LARRIVA NOVO ◽  
MARIO VEGA BARBAS ◽  
VICTOR VILLAGRA ◽  
JULIO BERROCAL

Cybersecurity has stood out in recent years with the aim of protecting information systems. Different methods, techniques and tools have been used to make the most of the existing vulnerabilities in these systems. Therefore, it is essential to develop and improve new technologies, as well as intrusion detection systems that allow detecting possible threats. However, the use of these technologies requires highly qualified cybersecurity personnel to analyze the results and reduce the large number of false positives that these technologies present in their results. This generates the need to research and develop new high-performance cybersecurity systems that allow efficient analysis and resolution of these results. This research presents the application of machine learning techniques to classify real traffic in order to identify possible attacks. The study has been carried out using machine learning tools applying deep learning algorithms such as multi-layer perceptron and long short-term memory. Additionally, this document presents a comparison between the results obtained by applying the aforementioned algorithms and algorithms that are not deep learning, such as random forest and decision tree. Finally, the results obtained are presented, showing that the long short-term memory algorithm is the one that provides the best results in relation to precision and logarithmic loss.
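
The two metrics named in the abstract, precision and logarithmic loss, can be computed as in the following sketch. It assumes a binary attack/normal labeling (1 for attack, 0 for normal); the study may have used a multi-class setting, and the function names are illustrative.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def precision(y_true, y_pred):
    """Fraction of predicted positives that are true positives."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0
```

Lower logarithmic loss means better-calibrated probability estimates, while higher precision means fewer false positives among the flagged traffic, which matches the abstract's concern with reducing false alarms.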


2021 ◽  
Vol 4 (1) ◽  
pp. 121-128
Author(s):  
A Iorliam ◽  
S Agber ◽  
MP Dzungwe ◽  
DK Kwaghtyo ◽  
S Bum

Social media provides opportunities for individuals to communicate anonymously and express hateful feelings and opinions from the comfort of their rooms. This anonymity has become a shield for many individuals or groups who use social media to express deep hatred for other individuals or groups, tribes or races, religions, genders, and belief systems. In this study, a comparative analysis is performed using Long Short-Term Memory and Convolutional Neural Network deep learning techniques for hate speech classification. This analysis demonstrates that the Long Short-Term Memory classifier achieved an accuracy of 92.47%, while the Convolutional Neural Network classifier achieved an accuracy of 92.74%. These results show that deep learning techniques can effectively distinguish hate speech from normal speech.


Author(s):  
Thang

In this research, we propose a method for predicting human-robot interactive intention. The proposed algorithm makes use of the OpenPose library and a long short-term memory deep neural network. The neural network observes the human posture as a time series and then predicts the human interactive intention. We train the deep neural network using a dataset that we generated. The experimental results show that our proposed method is able to predict the human-robot interactive intention, achieving 92% accuracy on the testing set.


2021 ◽  
Author(s):  
Usha Devi G ◽  
Priyan M K ◽  
Gokulnath Chandra Babu ◽  
Gayathri Karthick

Abstract: Twitter sentiment analysis is an automated process of analyzing text data to determine the opinion or feeling expressed in public tweets from various fields. For example, in the marketing and political fields, a huge number of tweets with hashtags are posted every moment and passed from one user to another over the internet. Sentiment analysis is a challenging task for researchers, mainly because of the difficulty of correctly interpreting context: for certain tweet words it is hard to evaluate what is truly a negative or a positive statement within a huge corpus of tweet data. This problem undermines the integrity of the system and can significantly reduce user reliability. In this paper, we identify each tweet word and assign a meaning to it. The feature work combines tweet words, word2vec, and stop words, and is integrated into the deep learning techniques of a convolutional neural network model and long short-term memory; these algorithms can identify the pattern of stop word counts with their own strategies. The two models are trained and applied to the IMDB dataset, which contains 50,000 movie reviews. A large amount of Twitter data is also processed to predict sentiment for tweet classification. With the proposed methodology, samples collected experimentally from a real-time environment can be discriminated well, and the efficacy of the system is improved. The deep learning algorithms aim to rate the review tweets and are also able to identify movie reviews with testing accuracies of 87.74% and 88.02%.


2017 ◽  
Author(s):  
Brett K. Beaulieu-Jones ◽  
Patryk Orzechowski ◽  
Jason H. Moore

Electronic Health Records (EHRs) contain a wealth of patient data useful to biomedical researchers. At present, both the extraction of data and methods for analyses are frequently designed to work with a single snapshot of a patient’s record. Health care providers often perform and record actions in small batches over time. By extracting these care events, a sequence can be formed providing a trajectory for a patient’s interactions with the health care system. These care events also offer a basic heuristic for the level of attention a patient receives from health care providers. We show that it is possible to learn meaningful embeddings from these care events using two deep learning techniques, unsupervised autoencoders and long short-term memory networks. We compare these methods to traditional machine learning methods, which require a point-in-time snapshot to be extracted from an EHR.


Author(s):  
Sujeet Kumar Shukla ◽  
Saurabh Dubey ◽  
Aniket Kumar Pandey ◽  
Vineet Mishra ◽  
Mayank Awasthi ◽  
...  

In this paper, we focus on one of the visual recognition facets of computer vision, i.e. image captioning. Using deep learning techniques, image captioning aims to generate captions for an image automatically. Initially, a Convolutional Neural Network (InceptionV3) is used to detect the objects in the image. Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) with an attention mechanism are used to generate a syntactically and semantically correct caption for the image based on the detected objects. In our project, we work with a traffic sign dataset that has been captioned using the process described above. This model is extremely useful for visually impaired people who need to cross roads safely.


Author(s):  
Dr. A. M. Chandrashekhar

Describing the content of an image has been a fundamental problem of Machine learning that connects computer vision and natural language processing. In recent years, the task of object recognition has advanced at an exceptional rate which in turn has made image captioning that much better and easier. In this paper, we have discussed the usage of image captioning using deep learning for the visually impaired. We have used Convolutional Neural Networks along with Long Short-Term Memory to train and generate captions for images along with a text-to-speech engine which makes the experience of visually impaired users who are browsing the internet much smoother. We discuss how the model was implemented, its different components and modules along with a result analysis conducted on a set of outputs peer reviewed by our colleagues, friends and professors.

