Movie Success Prediction

2019 ◽  
Vol 8 (3) ◽  
pp. 5659-5663

The film industry is a billion-dollar business, and an extensive amount of movie-related data is available on the web. In this system we analyze a movie dataset to predict the success of films. Historical information about each component that influences a movie's success or failure, such as the actors, actresses, director, and music, is assigned a weightage, and based on various parameters we predict whether a movie will be a flop, average, or a superhit. Several algorithms are applied to this prediction task, and the model focuses on attribute selection for predicting movie success. A comparative analysis is performed to determine which of the algorithms gives the most accurate results. Parameters important for predicting a movie's success include gross, genres, release date, the star power of actors, actresses, and directors, and budget. The dataset contains 28 parameters, and the task is to identify the most relevant ones. This is achieved with feature selection methods, as shown in Figure 1, available in the "sklearn" library of Python; the methods considered include decision trees, information gain, and gain ratio. A heatmap is generated to visualize movie success in different regions, and graphs of time versus algorithm and accuracy versus algorithm are produced for analysis.
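As a hedged sketch of the kind of sklearn-based attribute selection the abstract describes (the dataset shape, scoring function, and number of kept features here are illustrative assumptions, not the paper's actual setup):

```python
# Illustrative sketch only: synthetic data stands in for the 28-parameter
# movie dataset; mutual information ranks attributes against the 3-class
# target (0 = flop, 1 = average, 2 = superhit).
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((200, 28))          # 200 movies x 28 attributes (budget, gross, ...)
y = rng.integers(0, 3, size=200)   # success label: flop / average / superhit

selector = SelectKBest(mutual_info_classif, k=8).fit(X, y)
relevant = selector.get_support(indices=True)  # indices of retained attributes
print("Indices of the most relevant attributes:", relevant)
```

On real data, the retained indices would be inspected against the named parameters (budget, genres, star power, and so on) rather than random columns as here.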

2016 ◽  
Vol 13 (10) ◽  
pp. 6885-6891 ◽  
Author(s):  
Amarnath B ◽  
S. Appavu alias Balamurugan

A new feature selection method based on Inductive probability is proposed in this paper. The main idea is to find the dependent attributes and remove the redundant ones among them. The technology to obtain the needed dependency is based on the Inductive probability approach. The purpose of the proposed method is to reduce computational complexity and increase the classification accuracy of the selected feature subsets. The dependence between two attributes is determined from the probabilities of their joint values that contribute to positive and negative classification decisions. If there is an opposing set of attribute values that do not lead to opposing classification decisions (zero probability), the two attributes are considered independent; otherwise they are dependent, one of them can be removed, and the number of attributes is thereby reduced. A new attribute selection algorithm with Inductive probability is implemented and evaluated through extensive experiments, and compared with related attribute selection algorithms over eight datasets (Molecular Biology, Connect4, Soybean, Zoo, Balloon, Mushroom, Lenses and Fictional) from the UCI Machine Learning Repository.
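One loose way to picture the joint-value dependence test is to check whether any joint (a, b) value pair co-occurs with opposing class decisions. This is only an illustration of the general idea; the function name, the boolean criterion, and the value encoding are assumptions, not the authors' exact inductive-probability formulation:

```python
from collections import defaultdict

def joint_values_split_classes(attr_a, attr_b, labels):
    """Illustration: record which class labels co-occur with each joint
    (a, b) value pair, and report whether any joint value is observed
    with more than one decision class."""
    outcomes = defaultdict(set)
    for a, b, y in zip(attr_a, attr_b, labels):
        outcomes[(a, b)].add(y)
    return any(len(classes) > 1 for classes in outcomes.values())

# Joint value (0, 0) appears with both class 0 and class 1:
print(joint_values_split_classes([0, 0, 1], [0, 0, 1], [1, 0, 1]))  # True
```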


2010 ◽  
Vol 9 ◽  
pp. CIN.S3794 ◽  
Author(s):  
Xiaosheng Wang ◽  
Osamu Gotoh

Gene selection is of vital importance in molecular classification of cancer using high-dimensional gene expression data. Because of the distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and robust feature selection methods is extremely crucial. We investigated the properties of one feature selection approach proposed in our previous work, which was the generalization of the feature selection method based on the depended degree of attribute in rough sets. We compared the feature selection method with the established methods: the depended degree, chi-square, information gain, Relief-F and symmetric uncertainty, and analyzed its properties through a series of classification experiments. The results revealed that our method was superior to the canonical depended degree of attribute based method in robustness and applicability. Moreover, the method was comparable to the other four commonly used methods. More importantly, the method can exhibit the inherent classification difficulty with respect to different gene expression datasets, indicating the inherent biology of specific cancers.
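The classical depended degree that this work generalizes can be sketched as the fraction of objects whose condition-attribute values determine the decision class uniquely. This is a minimal illustration assuming discrete attribute values, not the authors' generalized variant:

```python
from collections import defaultdict

def depended_degree(condition_rows, decisions):
    """gamma(C, D): fraction of objects in the positive region, i.e.
    objects whose condition-attribute tuple occurs with exactly one
    decision class across the whole dataset."""
    classes_per_block = defaultdict(set)
    for row, d in zip(condition_rows, decisions):
        classes_per_block[tuple(row)].add(d)
    in_positive_region = sum(
        1 for row in condition_rows if len(classes_per_block[tuple(row)]) == 1
    )
    return in_positive_region / len(condition_rows)

# Two objects share attribute values (0,) but disagree on the class,
# so only the third object lies in the positive region: gamma = 1/3.
print(depended_degree([[0], [0], [1]], [0, 1, 0]))
```

A gamma of 1.0 means the selected attributes fully determine the class, which is the sense in which the measure scores an attribute subset.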


Author(s):  
GULDEN UCHYIGIT ◽  
KEITH CLARK

Text classification is the problem of classifying a set of documents into a pre-defined set of classes. A major problem in text classification is the high dimensionality of the feature space. Only a small subset of the words are feature words useful for determining a document's class, while the rest add noise, can make the results unreliable, and significantly increase computational time. A common approach to dealing with this problem is feature selection, in which the number of words in the feature space is significantly reduced. In this paper we present the experiments of a comparative study of feature selection methods used for text classification. Ten feature selection methods were evaluated, including a new feature selection method called the GU metric. The other feature selection methods evaluated in this study are: the Chi-Squared (χ2) statistic, NGL coefficient, GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, Fisher Criterion, and BSS/WSS coefficient. The experimental evaluations show that the GU metric obtained the best F1 and F2 scores. The experiments were performed on the 20 Newsgroups dataset with the Naive Bayesian Probabilistic Classifier.


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 151525-151538 ◽  
Author(s):  
Xinzheng Wang ◽  
Bing Guo ◽  
Yan Shen ◽  
Chimin Zhou ◽  
Xuliang Duan
