Predict App Rank on Google Play Using the Random Forest Method

2021 ◽  
Vol 8 (9) ◽  
pp. 436-441
Author(s):  
Abdul Khaliq ◽  
Eko Hariyanto ◽  
Supina Batubara

Application developers and users are key to the market impact of application development. Developers need to predict accurately how an application will perform in the market, and accurate predictions matter because user ratings affect an application's success. Users give ratings to judge whether an application is good. A higher rating means that users like the application, and it can serve as a benchmark for other users deciding whether to download it. So many apps are available on the Google Play Store that users cannot evaluate them one by one. Therefore, a rating prediction system is needed to identify suitable applications based on the ratings users give. Predictions are made using the random forest algorithm. This study uses the Google Play Store dataset, which has 10840 rows and 13 attributes. The random forest algorithm achieved an average accuracy of 93.8%. Keywords: Google Play Store, Rating, Prediction, Random Forest.

10.29007/cfm3 ◽  
2019 ◽  
Author(s):  
Salman Faizi ◽  
Shawon Rahman

Software application development must include implementation of core functionality along with secure coding to contain security vulnerabilities of applications. Considering the life cycle that a software application undergoes, application developers have many opportunities to include security starting from the very first stage of planning or requirement gathering. However, before even starting requirement gathering, the software application development team must select a framework to use for the application’s lifecycle. Based on the application and organizational characteristics, software application developers must select the best-fit framework for the lifecycle. A software application’s functionality and security start with picking the right lifecycle framework. When it comes to application development frameworks, one size does not fit all. Based on the characteristics of the application development organization such as the number of application developers involved, project budget and criticality, and the number of teams, one of the five frameworks will work better than others. Keywords: Software development lifecycle, software functionality, software security, application development, framework security


Author(s):  
Harits Ar Rosyid ◽  
Utomo Pujianto ◽  
Moch Rajendra Yudhistira

There are various ways to improve the quality of a person's education, and one of them is reading. Reading increases insight and knowledge of many kinds of things. However, reading ability and comprehension differ from person to person, which can be a problem when reading material exceeds the reader's comprehension ability. Therefore, it is necessary to determine the load of reading material using Lexile Levels. A Lexile Level is a value that measures both the complexity of reading material and a person's reading ability. The reading material is therefore classified based on its Lexile Level value. The material is first clustered by Lexile Level into 2 clusters, easy and difficult, using the k-means method. After clustering, the reading load of the material is classified using the Random Forest method. The k-means method was chosen because its computation is simple and fast. The Random Forest algorithm builds several decision trees and combines their votes to produce the final classification. The results indicate that the scenario using 2 clusters with SMOTE and GIFS preprocessing performs well, with an accuracy of 76.03%, precision of 81.85%, and recall of 76.05%.
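The clustering step described in this abstract can be illustrated with a minimal sketch of k-means on one-dimensional scores. This is not the authors' implementation: the function name `kmeans_1d`, the deterministic initialisation, and the sample Lexile values are all assumptions made for illustration.

```python
def kmeans_1d(values, k=2, iterations=100):
    """Cluster scalar values into k groups with Lloyd's algorithm (k >= 2)."""
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and the number of values")
    # Assumed initialisation: spread the starting centroids across the sorted range.
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Hypothetical Lexile scores for eight texts (illustrative values only).
lexile_scores = [420, 480, 510, 560, 980, 1040, 1100, 1210]
labels, centroids = kmeans_1d(lexile_scores, k=2)
print(labels)     # cluster 0 = easier texts, cluster 1 = harder texts
print(centroids)
```

The resulting cluster labels could then serve as the target classes for a downstream classifier such as a random forest.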


2021 ◽  
Vol 5 (1) ◽  
pp. 61-69
Author(s):  
Ievgen Nastenko ◽  
Vitaliy Maksymenko ◽  
Sergiy Potashev ◽  
Volodymyr Pavlov ◽  
Vitalii Babenko ◽  
...  

Background. Recent studies show that cardiovascular diseases, including coronary heart disease, are the leading causes of death and one of the main factors of disability worldwide. The detection of cases of this type of disease over the past 30 years has increased from 271 million to 523 million and the number of deaths – from 12.1 million to 18.6 million. Cardiovascular diseases are the main cause of death among the population of Ukraine and, according to this indicator, the country remains one of the world leaders. Coronary heart disease is the leading factor in the loss of health in Ukraine and modern diagnostic methods, including machine learning algorithms, are increasingly being used for timely detection. Objective. According to the data of speckle-tracking echocardiography using the random forest method, construct classification algorithms for diagnosing violations of the kinematics of left ventricular contractions in patients with coronary heart disease at rest, and when using an echostress test with a dobutamine test. Methods. Speckle-tracking echocardiography was used to examine 40 patients with coronary heart disease and 16 in whom no cardiac pathology was found. Echocardiography was recorded in B mode in three positions: along the long axis, in 4-chamber, and 2-chamber positions. In total, 6245 frames of the video stream were used: 1871 – without cardiac abnormalities, and 4374 – in the presence of pathology during the examination. 56 patients (2509 frames of video data) were examined without the use of a dobutamine test and 38 patients (3736 frames of video data) – using an echostress test with a dobutamine test if no disturbances were found at rest. Dobutamine doses of 10, 20, and 40 mcg were administered under the supervision of an anesthesiologist. The data of texture analysis of images were used as informative features. To build an algorithm for detecting coronary heart disease the random forest algorithm was applied. Results. 
At the first stage of the study, norm-versus-pathology diagnostic algorithms were constructed for the state of rest and for dobutamine doses of 10, 20, and 40 mcg. Before applying the algorithm, the samples were randomly divided into training (70%) and test (30%) sets. The classifiers were evaluated for accuracy, sensitivity, and specificity. On the test samples, the accuracy of diagnostic conclusions varied from 97 to 99%. At the second stage of the study, to increase the versatility of the models, the classifier was built for all images, without dividing them by dobutamine dose. The accuracy for the test samples ranged from 96.6 to 97.8%. The data of texture analysis of images were used to construct the diagnostic algorithms by the random forest method. Conclusions. High-precision classification models were obtained using the random forest algorithm. The developed models can be applied to the analysis of echocardiograms obtained in B mode on equipment that is not equipped with speckle tracking technology.
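The evaluation protocol named in this abstract (a random 70/30 split, then accuracy, sensitivity, and specificity) can be sketched as follows. The helper names `split_train_test` and `diagnostic_metrics`, the seed, and the toy labels are assumptions for illustration and do not come from the study.

```python
import random


def split_train_test(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items and split it into training and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def diagnostic_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of pathological cases that were detected.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Specificity: share of normal cases that were correctly cleared.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


train_ids, test_ids = split_train_test(range(10))

# Toy labels: 1 = pathology, 0 = norm (illustrative values only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

One design point worth noting: when several frames come from the same patient, splitting at the frame level can place frames from one patient in both sets, so a patient-level split may give a more conservative estimate.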



2021 ◽  
Vol 5 (2) ◽  
pp. 630
Author(s):  
I Putu Ananda Miarta Utama ◽  
Sri Suryani Prasetyowati ◽  
Yuliant Sibaroni

In the hotel tourism sector, social media plays an inseparable role because tourists tend to share their experiences of a hotel's services and products by adding pictures, reviews, and ratings, which serve as references for other tourists, for example on the online platform TripAdvisor. However, the large number of tourist experiences shared about a hotel can make it confusing to choose the right hotel to visit. Therefore, this study carries out an aspect-based analysis of hotel reviews, which makes it easier for tourists to choose the right hotel based on the best-rated aspects. The dataset used is the TripAdvisor Hotel Reviews dataset available on the Kaggle website, which has five aspects: Room, Location, Cleanliness, Registration, and Service. Reviews were classified into positive and negative categories using a Hybrid Classifier based on Random Forest, SVM, and Naive Bayes. On multi-aspect data, the Hybrid Classifier achieved better accuracy than classification with a single algorithm: the Hybrid Classifier obtained an average accuracy of 84%, Naive Bayes 82.4%, Random Forest 82.2%, and SVM 81%.
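The abstract does not state how the three base classifiers are combined. One common option is hard majority voting, sketched below under that assumption. The function name `majority_vote` and the sample predictions are hypothetical.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-model label lists by taking the most common label per item."""
    combined = []
    for votes in zip(*model_predictions):
        # With three binary classifiers there is always a strict majority.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical sentiment predictions for four reviews from three base models.
rf_pred = ["pos", "neg", "pos", "neg"]
svm_pred = ["pos", "pos", "neg", "neg"]
nb_pred = ["neg", "pos", "pos", "neg"]

hybrid_pred = majority_vote(rf_pred, svm_pred, nb_pred)
print(hybrid_pred)
```

A vote of this kind can outperform each individual model when the base classifiers make different errors, which is consistent with the small accuracy gain reported above.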


2020 ◽  
Vol 6 (2) ◽  
pp. 230-239
Author(s):  
Richky Faizal Amir ◽  
Irwan Agus Sobari ◽  
Rousyati Rousyati

Abstract: Software metrics datasets are generally imbalanced. Class imbalance in a dataset can reduce the performance of software defect prediction models because the models tend to assign minority-class samples to the majority class. This study uses the National Aeronautics and Space Administration (NASA) Metrics Data Program (MDP) dataset. In the pre-processing stage, the Particle Swarm Optimization (PSO) method is proposed to address the attribute problem in the training data, and the Random Over Sampling (ROS) resampling method is proposed to deal with class imbalance. This study proposes that the Random Forest method combined with Adaboost can estimate the defect level of software from training data. The results indicate that the Resampling + Adaboost + Random Forest algorithm can be used to predict software defects with an average accuracy of 94.70% and an AUC of 0.939, while the PSO + Random Forest algorithm has an average accuracy of only 89.60% and an AUC of 0.636; the two models differ by 5.10 percentage points in accuracy and 0.303 in AUC. Statistical tests show a significant difference between the proposed model and the Random Forest model, with a p-value (0.036) smaller than the alpha value (0.05). Keywords: Imbalanced Class, Resample, Particle Swarm Optimization, Random Forest, Adaboost, Software Defect
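Random over sampling, the resampling step named in this abstract, can be sketched in a few lines: minority-class samples are duplicated at random until every class matches the size of the largest one. The function name `random_oversample`, the seed, and the toy module names are assumptions for illustration and are not taken from the study.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, lab in zip(samples, labels) if lab == cls]
        # Draw with replacement from the class's own samples to fill the gap.
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy data: six clean modules and two defective ones (illustrative only).
modules = ["m1", "m2", "m3", "m4", "m5", "m6", "d1", "d2"]
labels = ["clean"] * 6 + ["defect"] * 2

balanced_modules, balanced_labels = random_oversample(modules, labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training portion only, so that duplicated samples cannot leak into the data used for evaluation.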


TEM Journal ◽  
2021 ◽  
pp. 1209-1219
Author(s):  
Nur Widiyasono ◽  
Ida Ayu Dwi Giriantari ◽  
Made Sudarma ◽  
L Linawati

The potential for cyber-attacks against Internet of Things (IoT) infrastructure is enormous because devices run on pre-existing network infrastructure; the Mirai malware attack is one example. In this network forensics investigation, the Random Forest (RF) algorithm is used to classify and detect Mirai malware attacks. Trials were carried out using 5 attack scenarios and device types. The experimental results show that the RF algorithm achieves optimal performance, with an average accuracy of 95.01%, recall of 90.82%, F1 score of 93.85%, and a best precision of 99.23%. In addition, the Random Forest algorithm is suitable for processing very large data. The contribution of this research is a recommendation that the RF algorithm can be used to classify and identify Mirai malware attacks on Internet of Things infrastructure.


Mousaion ◽  
2019 ◽  
Vol 36 (3) ◽  
Author(s):  
Chimango Nyasulu ◽  
Winner Chawinga ◽  
George Chipeta

Governments the world over are increasingly challenging universities to produce human resources with the right skills sets and knowledge required to drive their economies in this twenty-first century. It therefore becomes important for universities to produce graduates that bring tangible and meaningful contributions to the economies. Graduate tracer studies are hailed to be one of the ways in which universities can respond and reposition themselves to the actual needs of the industry. It is against this background that this study was conducted to establish the relevance of the Department of Information and Communication Technology at Mzuzu University to the Malawian economy by systematically investigating occupations of its former students after graduating from the University. The study adopted a quantitative design by distributing an online-based questionnaire with predominantly closed-ended questions. The study focused on three key objectives: to identify key employing sectors of ICT graduates, to gauge the relevance of the ICT programme to its former students’ jobs and businesses, and to establish the level of satisfaction of the ICT curriculum from the perspectives of former ICT graduates. The key findings from the study are that the ICT programme is relevant to the industry. However, some respondents were of the view that the curriculum should be strengthened by revising it through an addition of courses such as Mobile Application Development, Machine Learning, Natural Language Processing, Data Mining, and LINUX Administration to keep abreast with the ever-changing ICT trends and job requirements. The study strongly recommends the need for regular reviews of the curriculum so that it is continually responding to and matches the needs of the industry.


2020 ◽  
Vol 27 (6) ◽  
pp. 37-55
Author(s):  
E. V. Zarova ◽  
E. I. Dubravskaya

The topic of quantitative research on informal employment has a consistently high relevance both in the Russian Federation and in other countries due to its high dependence on cyclicality and crisis stages in economic dynamics of countries with any level of economic development. Developing effective government policy measures to overcome the negative impact of informal employment requires special attention in theoretical and applied research to assessing the factors and conditions of informal employment in the Russian Federation including at the regional level. Such effects of informal employment as a shortfall in taxes, potential losses in production efficiency, and negative social consequences are a concern for the authorities of the federal and regional levels. Development of quantitative indicators to determine the level of informal employment in the regions, taking into account their specifics in the general spatial and economic system of Russia are necessary to overcome these negative effects. The article proposes and tests methods for solving the problem of assessing the impact of hierarchical relationships on macroeconomic factors at the regional level of informal employment in constituent entities of the Russian Federation. Majority of the works on the study of informal employment are based on basic statistical methods of spatial-dynamic analysis, as well as on the now «traditional» methods of cluster and correlation-regression analysis. Without diminishing the merits of these methods, it should be noted that they are somewhat limited in identifying hidden structural connections and interdependencies in such a complex multidimensional phenomenon as informal employment. 
In order to substantiate the possibility of overcoming these limitations, the article proposes indicators of regional statistics that directly and indirectly characterize informal employment and also presents the possibilities of using the «random forest» method to identify groups of constituent entities of the Russian Federation that have similar macroeconomic factors of informal employment. The novelty of this method in terms of research objectives is that it allows one to assess the impact of macroeconomic indicators of regional development on the level of informal employment, taking into account the implicit, not predetermined by the initial hypotheses, hierarchical relationships of factor indicators. Based on the generalization of the studies presented in the literature, as well as the authors’ statistical calculations using Rosstat data, the authors came to the conclusion about the high importance of macroeconomic parameters of regional development and systemic relationships of macroeconomic indicators in substantiating the differentiation of the informal level across the constituent entities of the Russian Federation.

