International Journal of Knowledge Discovery in Bioinformatics
Latest Publications


Total documents: 99 (five years: 0)

H-index: 8 (five years: 0)

Published by IGI Global

1947-9123, 1947-9115

Author(s):  
Priya Deshpande ◽  
Alexander Rasin ◽  
Eli T Brown ◽  
Jacob Furst ◽  
Steven M. Montner ◽  
...  

Teaching files are widely used by radiologists in the diagnostic process and for student education. Most hospitals maintain an active collection of teaching files for internal purposes, but many teaching files are also publicly available online, some linked to secondary sources. However, public sources offer very limited (and ad hoc) search capabilities. Building on their previous work on data integration and text-based search, the authors extended their Integrated Radiology Image Search (IRIS 1.1) engine with a new medical ontology, SNOMED CT, and the ICD10 dictionary. IRIS 1.1 integrates public data sources and applies query expansion with exact and partial matches to find relevant teaching files. On a set of 28 representative queries from multiple sources, the search engine finds more relevant teaching cases than other publicly available search engines.


Author(s):  
Samah Jamal Fodeh ◽  
Edwin D. Boudreaux ◽  
Rixin Wang ◽  
Dennis Silva ◽  
Robert Bossarte ◽  
...  

While many studies have explored the use of social media and behavioral changes of individuals, few have examined the utility of social media for suicide detection and prevention. The study by Jashinsky et al. identified specific language patterns associated with a set of twelve suicide risk factors. The authors extended these methods to assess the significance of the language used on Twitter for suicide detection. This article quantifies the use of Twitter to express suicide-related language and its potential to detect users at high risk of suicide. The authors searched Twitter for tweets indicative of 12 suicide risk factors. They divided Twitter users into two groups, “high risk” and “at risk”, based on two of the risk factors (“self-harm” and “prior suicide attempts”) and examined language patterns by computing co-occurrences of terms in tweets, which helped identify relationships between suicide risk factors in both groups.
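As a rough illustration of what computing term co-occurrences in tweets can look like, the sketch below counts how often pairs of tracked terms appear in the same short text. It is not the authors' pipeline: the function name `term_cooccurrences`, the whitespace tokenization, and the sample texts and terms are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def term_cooccurrences(tweets, terms):
    """Count how often each pair of tracked terms appears in the same tweet."""
    vocabulary = {t.lower() for t in terms}
    counts = Counter()
    for tweet in tweets:
        # Naive whitespace tokenization (an assumption; real tweets need more care).
        tokens = set(tweet.lower().split())
        present = sorted(tokens & vocabulary)
        # Every unordered pair of tracked terms in this tweet co-occurs once.
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Hypothetical sample texts and terms, for illustration only.
sample = [
    "cannot sleep and feel lonely tonight",
    "so lonely and so tired",
    "tired lonely sleep",
]
pairs = term_cooccurrences(sample, ["sleep", "lonely", "tired"])
```

Pairs are stored as alphabetically sorted tuples, so `("lonely", "sleep")` and `("sleep", "lonely")` are counted under one key.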


Author(s):  
Alvaro J Riascos ◽  
Natalia Serna

Health-care systems that rely on hospitalization for early patient treatment pose a financial concern for governments. In this article, the authors propose a hospitalization prevention program in which the decision of whether to intervene on a patient depends on a simple decision model and on a prediction of the patient's risk of an annual length-of-stay obtained with machine learning techniques. The results show that the prevention program achieves significant cost savings relative to several base scenarios for program efficacies greater than or equal to 40% and intervention costs per patient of 100,000 to 700,000 Colombian pesos (i.e., approximately 14% to 100% of the average cost per patient in Colombia's statutory health care system). This article also shows that tree-based methods outperform linear regressions when predicting an annual length-of-stay and that the final model achieves a lower out-of-sample error than those of the Heritage Health Prize.


Author(s):  
Deepali Virmani ◽  
Nikita Jain ◽  
Ketan Parikh ◽  
Shefali Upadhyaya ◽  
Abhishek Srivastav

This article examines how data can be organized, linked with other data, and grouped into clusters. Clustering is the process of organizing a given set of objects into a set of disjoint groups called clusters. There are a number of clustering algorithms, such as k-means, k-medoids, and normalized k-means. The focus remains on the efficiency and accuracy of these algorithms, as well as on the time clustering takes and on reducing overlap between clusters. K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering problem. The k-means algorithm partitions data into K clusters starting from randomly chosen centroids, and because it operates on numeric values it cannot be used directly to cluster real-world data containing categorical values. Poor selection of initial centroids can result in poor clustering. This article proposes a variant of k-means that selects the initial centres and normalizes the data, resulting in better clustering, reduced overlap, and less time required for clustering.
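The two ingredients named in the abstract, normalizing the data and choosing initial centres deliberately rather than at random, can be sketched as follows. The abstract does not specify the authors' selection rule, so this example substitutes a farthest-point heuristic and min-max scaling as stand-ins; the function names and the sample data are assumptions.

```python
import math


def min_max_normalize(points):
    """Rescale every feature to [0, 1] so no single feature dominates distances."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    return [
        tuple(
            (p[d] - lows[d]) / (highs[d] - lows[d]) if highs[d] > lows[d] else 0.0
            for d in dims
        )
        for p in points
    ]


def farthest_point_init(points, k):
    """Assumed heuristic: start at the first point, then repeatedly add the
    point farthest from the centres chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        )
    return centres


def k_means(points, k, max_iter=100):
    """Standard Lloyd iterations, seeded with the deterministic centres above."""
    centres = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centres


# Hypothetical two-feature data with very different scales.
data = [(1.0, 200.0), (1.2, 220.0), (0.9, 210.0),
        (8.0, 900.0), (8.3, 950.0), (7.9, 920.0)]
labels, centres = k_means(min_max_normalize(data), k=2)
```

Because the seeding is deterministic, repeated runs on the same data give the same clusters, which is one motivation for replacing random initial centroids.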


Author(s):  
Sumitra Kisan ◽  
Sarojananda Mishra ◽  
Ajay Chawda ◽  
Sanjay Nayak

This article describes how fractal dimension (FD) plays a vital role in fractal geometry. It is a measure that characterizes the complexity and irregularity of fractals, denoting the amount of space they fill. There are many procedures for evaluating the dimension of fractal surfaces, such as the box count, differential box count, and improved differential box count methods. These methods are mainly used for grey scale images. The authors' objective in this article is to estimate the fractal dimension of color images using different color models. The authors propose a novel method for the estimation in the CMY and HSV color spaces. To obtain their results, they tested the method on a number of color images in the RGB color space. The authors present their experimental results and discuss the issues that characterize the approach. The article concludes with an analysis of the FDs calculated for images in different color spaces.
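For orientation, the simplest of the procedures listed, plain box counting on a binary image, can be sketched as below: count the boxes of side s that contain foreground pixels, then take the slope of log N(s) against log(1/s). This is not the differential box count and not the authors' color-space method; the function names and the two synthetic test shapes are assumptions.

```python
import math


def box_count(pixels, size):
    """Number of size-by-size boxes containing at least one foreground pixel."""
    return len({(r // size, c // size) for r, c in pixels})


def box_counting_dimension(pixels, sizes):
    """Least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(pixels, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic shapes given as sets of (row, column) foreground coordinates.
side = 64
filled_square = {(r, c) for r in range(side) for c in range(side)}
diagonal_line = {(i, i) for i in range(side)}
sizes = [1, 2, 4, 8, 16]

fd_square = box_counting_dimension(filled_square, sizes)
fd_line = box_counting_dimension(diagonal_line, sizes)
```

The filled square and the diagonal line serve as sanity checks, since their estimated dimensions should come out close to 2 and 1 respectively.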


Author(s):  
Gauri Jain ◽  
Manisha Sharma ◽  
Basant Agarwal

This article describes how spam detection in social media text is becoming increasingly important because of the exponential increase in spam volume over the network. It is challenging, especially for text limited to a small number of characters. Effective spam detection requires a larger number of effective features to be learned. In the current article, the use of a deep learning technology known as a convolutional neural network (CNN) is proposed for spam detection, with an added semantic layer on top of it. The resultant model is known as a semantic convolutional neural network (SCNN). The semantic layer trains random word vectors with the help of Word2vec to obtain semantically enriched word embeddings. WordNet and ConceptNet are used to find a word similar to a given word in case it is missing from word2vec. The architecture is evaluated on two corpora: the SMS Spam dataset (UCI repository) and a Twitter dataset (tweets scraped from public live tweets). The authors' approach outperforms state-of-the-art results, with 98.65% accuracy on the SMS spam dataset and 94.40% accuracy on the Twitter dataset.
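The fallback for words missing from the embedding vocabulary can be pictured with a small sketch. Plain dictionaries stand in for the trained word2vec model and for the WordNet/ConceptNet lookups, so every name and value here (`embed_with_fallback`, `toy_embeddings`, `toy_similar`) is hypothetical and does not reflect the authors' implementation.

```python
def embed_with_fallback(word, embeddings, similar_words):
    """Return the word's vector, or the vector of the first similar word
    that has one; return None when no candidate is in the vocabulary."""
    if word in embeddings:
        return embeddings[word]
    for candidate in similar_words.get(word, []):
        if candidate in embeddings:
            return embeddings[candidate]
    return None


# Toy stand-ins: a two-dimensional "embedding table" and a similar-word map.
toy_embeddings = {"free": [0.9, 0.1], "prize": [0.8, 0.3]}
toy_similar = {"reward": ["award", "prize"]}

vector = embed_with_fallback("reward", toy_embeddings, toy_similar)
missing = embed_with_fallback("xyzzy", toy_embeddings, toy_similar)
```

Here "reward" has no vector of its own, "award" is also absent, and the lookup settles on the vector for "prize".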


Author(s):  
Saeed Rouhani ◽  
Maryam MirSharif

In this article, the authors propose a method for diagnosing gestational diabetes mellitus (GDM) in the initial stages of pregnancy, to facilitate diagnosis and prevent the condition. In the modern industrial world, with changing lifestyles and dietary habits, the incidence of complex diseases has been growing. GDM is a chronic disease and one of the major health problems that is often diagnosed in the middle or late period of pregnancy, when it is too late for prediction. If it is not treated, it will cause serious complications and various side effects for mother and child. This article is designed to answer the question: “What is the best approach in timely and accurate prediction of GDM?” To that end, an artificial neural network and a decision tree are proposed to reduce prediction error and to improve the accuracy and precision of prediction. The results illustrate that intelligent diagnosis systems can improve the quality of healthcare, timely prediction, prevention, and knowledge discovery in bioinformatics.


Author(s):  
Amit Kumar ◽  
Bikash Kanti Sarkar

This article describes how, over the last few decades, data mining research has made significant progress across a wide spectrum of applications. Prediction on multi-domain data sets is a challenging task due to the imbalanced, voluminous, conflicting, and complex nature of the data sets. A learning algorithm is the most important technique for solving these problems, and learning algorithms are widely used for classification purposes. However, choosing the learners that perform best for data sets of particular domains is itself a challenging task in data mining. This article provides a comparative performance assessment of various state-of-the-art learning algorithms over multi-domain data sets to identify the effective classifier(s) for a particular domain, e.g., artificial, natural, semi-natural, etc. In the present article, a total of 14 real-world data sets are selected from the University of California, Irvine (UCI) machine learning repository for conducting experiments using three competent individual learners and their hybrid combinations.


Author(s):  
Mohammad Ahsan ◽  
Madhu Kumari ◽  
Tajinder Singh ◽  
Triveni Lal Pal

This article describes how social media has emerged as a main vehicle of information diffusion among people, who often share their experiences, feelings, and knowledge through these channels. Some pieces of information quickly reach a large number of people, while others do not. The authors analyzed this variation by collecting tweets on the 2016 U.S. presidential election. This article gives a comprehensive understanding of how sentiment encoded in textual content can affect information diffusion, along with the effect of content features (i.e., URLs and hashtags) and contextual features (i.e., number of followers, followees, tweets generated by the user so far, account age, and tweet age). To explore the relationship between sentiment content and information diffusion, the authors first checked the features' significance as indicators of diffusibility by using random forests. Finally, support vector and k-neighbors regression models are used to capture the complete dynamics of information diffusion. The experiments and results clearly reveal that sentiment prominently helps in making a better prediction of information diffusion.


Author(s):  
Libi Hertzberg ◽  
Assif Yitzhaky ◽  
Metsada Pasmanik-Chor

This article describes how the last decade has been characterized by the production of huge amounts of different types of biological data. Following that, a flood of bioinformatics tools has been published. However, many of these tools are commercial or require computational skills. In addition, not all tools provide intuitive and highly accessible visualization of the results. The authors have developed GEView (Gene Expression View), a free, user-friendly tool harboring several existing algorithms and statistical methods for the analysis of high-throughput gene, microRNA, or protein expression data. It can be used to perform basic analyses such as quality control, outlier detection, batch correction, and differential expression analysis through a single intuitive graphical user interface. GEView is unique in its simplicity and in the highly accessible visualization it provides. Together with its basic and intuitive functionality, it allows biomedical scientists with no computational skills to independently analyze and visualize high-throughput data produced in their own labs.

