A Note on Breiman's Random Forest Data Mining Technique and Conventional Cox Modeling of Survival Statistics: The Case of the Phantom “Induct” Covariate in the Ohio State University Kidney Transplant Database

This paper presents a salary prediction system that uses job listings from an employment website, in this case Glassdoor.com. A data mining technique is used to build the model: a number of job listings are scraped from the website and cleaned on the basis of several factors, including rival companies, revenue and required skills, in order to predict the salary to expect when applying for a data science job. Linear regression, lasso regression and random forest regressors are tuned using GridSearchCV to reach the best model. The model can be further extended with a Flask API so that it can be deployed on the internet for public use.

Predicting Diabetes Disease using Random Forest Tree (Rft) Data Mining Technique

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d1019.1284s519 ◽

2020 ◽

Vol 8 (4S5) ◽

pp. 46-48

Keyword(s):

Data Mining ◽

Random Forest ◽

Blood Sugar ◽

Energy Use ◽

Primary Source ◽

Organ Damage ◽

Forest Tree ◽

Data Mining Technique ◽

Mining Technique ◽

Random Forest Tree

Diabetes is a condition that occurs when blood glucose, also known as blood sugar, is too high. Blood sugar is the body's primary source of energy, and it comes from the food you eat. Insulin, a pancreatic hormone, helps glucose from food get into the cells for energy use. The name is also shared with an unrelated condition, "diabetes insipidus", which entails complications with the processing of fluids in the kidney. Insulin is the key to the cell's ability to use glucose. Problems with the processing of insulin, or with how cells perceive insulin, can easily throw the body's carefully balanced glucose metabolism out of control [1]. Diabetes emerges when either of these conditions occurs: blood sugar levels rise and crash, and the risk of organ damage increases. Earlier prediction of diabetes could allow proper treatment and protect people from avoidable illness. For this prediction we can apply data mining, which is used predominantly in healthcare organizations for decision making and disease detection. In this paper, data were collected from the UCI repository and the data mining tool WEKA is used to predict diabetes. The database contains 768 instances, of which 500 belong to the tested-negative class and 268 to the tested-positive class. An experimental study is carried out using a classification technique called the Random Forest Tree (RFT) classifier to predict diabetes. In this research, we used different numbers of cross-validation folds to achieve better accuracy, and we found that k = 8 folds gives the highest accuracy, 76.69%, compared with the other fold counts.
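To make the figures in this abstract concrete, the following is a minimal sketch of how 768 instances divide into k = 8 cross-validation folds, together with the accuracy a trivial majority-class predictor would reach on the stated 500/268 split. It is not the authors' WEKA procedure: the function names, the random seed and the plain (unstratified) shuffling are assumptions made for illustration only.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary, assumed value
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first folds
        folds.append(indices[start:start + size])
        start += size
    return folds

def train_test_splits(folds):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Class counts as stated in the abstract.
N_NEGATIVE, N_POSITIVE = 500, 268
N_TOTAL = N_NEGATIVE + N_POSITIVE

folds = k_fold_indices(N_TOTAL, k=8)
fold_sizes = [len(fold) for fold in folds]
splits = list(train_test_splits(folds))

# Accuracy of always predicting the larger class, a reference point for 76.69%.
majority_baseline = max(N_NEGATIVE, N_POSITIVE) / N_TOTAL

print(fold_sizes)
print(f"majority-class baseline accuracy: {majority_baseline:.2%}")
```

With k = 8, each model is trained on 672 instances and tested on the remaining 96. The majority-class baseline of about 65% gives a sense of how much the reported 76.69% improves on always predicting "tested negative".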

Estimating the soil water retention curve: Comparison of multiple nonlinear regression approach and random forest data mining technique

Computers and Electronics in Agriculture ◽

10.1016/j.compag.2020.105502 ◽

2020 ◽

Vol 174 ◽

pp. 105502

Author(s):

M. Rastgou ◽

H. Bayat ◽

M. Mansoorizadeh ◽

Andrew S. Gregory

Keyword(s):

Data Mining ◽

Random Forest ◽

Water Retention ◽

Water Retention Curve ◽

Soil Water Retention Curve ◽

Soil Water Retention ◽

Data Mining Technique ◽

Retention Curve ◽

Mining Technique ◽

Regression Approach

Modeling knowledge and functional intent for context-aware pragmatic analysis

ACM SIGWEB Newsletter ◽

10.1145/3447879.3447882 ◽

2021 ◽

pp. 1-4

Author(s):

Nikhita Vedula

Keyword(s):

Data Mining ◽

Natural Language ◽

Language Processing ◽

Ohio State University ◽

State University ◽

Research Award ◽

User Intentions ◽

The Ohio State University ◽

Institute Of Technology ◽

The Web

Nikhita Vedula is an Applied Scientist at Amazon Alexa Science. She obtained her PhD in Computer Science and Engineering from the Ohio State University in August 2020, advised by Professor Srinivasan Parthasarathy. She received her bachelor's degree from the National Institute of Technology, Nagpur, India in 2015. Her research interests are at the intersection of data mining, natural language processing and social computing. Over the course of her PhD, her research involved designing efficient and novel machine learning and computational linguistic techniques that extract, interpret and transform the vast, unstructured digital content into structured knowledge representations in diverse contexts. She has worked with researchers from interdisciplinary fields such as emergency response, marketing, sociology and psychology. She performed research internships at Nokia Bell Laboratories, Adobe Research and Amazon Alexa AI. Her work has been published at several top data mining conferences such as the Web Conference, SIGIR, WSDM and ICDM. Her work on detecting user intentions from their natural language interactions won the Best paper award at the Web Conference 2020. She was a recipient of a Graduate Research Award (2020), a Presidential Fellowship (2019) and a University Graduate Fellowship (2015) at the Ohio State University. She was also selected as a Rising Star in EECS (2019).

Data Mining Approach for Educational Decision Support

EKSAKTA: Journal of Sciences and Data Analysis ◽

10.20885/eksakta.vol2.iss1.art5 ◽

2021 ◽

Vol 2 (1) ◽

pp. 33-44

Author(s):

Sinta Septi Pangastuti ◽

Kartika Fithriasari ◽

Nur Iriawan ◽

Wahyuni Suryaningtyas

Keyword(s):

Data Mining ◽

Random Forest ◽

Classification Accuracy ◽

Storage System ◽

Performance Criteria ◽

Data Mining Technique ◽

Mining Technique ◽

Data Mining Approach ◽

Educational Decision ◽

Boosting Algorithm

Data mining techniques in the education sector have begun to evolve along with the development of technology and the amount of data that can be stored in an education database storage system. One such database is that of the Bidikmisi scholarships in Indonesia. The Bidikmisi data used in this study are classified using a classification data mining technique. The technique used is random forest in combination with boosting and bagging algorithms. These algorithms are also combined with the SMOTE algorithm to handle class imbalance in the dataset. Based on the performance criteria G-mean and AUC, the algorithms combined with SMOTE tended to perform better. The classification accuracy of each method is more than 90%.
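The two performance criteria named here can be sketched in a few lines. The example below assumes a binary classification setting, where G-mean is the geometric mean of sensitivity and specificity and AUC is the probability that a randomly chosen positive instance is scored above a randomly chosen negative one. All counts and scores are hypothetical and are not taken from the Bidikmisi data.

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (TP rate) and specificity (TN rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

def accuracy(tp, fn, tn, fp):
    """Fraction of all instances classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)

def auc(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical imbalanced test set: 50 minority and 950 majority instances.
balanced_model = dict(tp=40, fn=10, tn=900, fp=50)
majority_only = dict(tp=0, fn=50, tn=950, fp=0)  # always predicts the majority class

for name, cm in [("balanced_model", balanced_model), ("majority_only", majority_only)]:
    print(name, f"accuracy={accuracy(**cm):.3f}", f"g_mean={g_mean(**cm):.3f}")

# Hypothetical classifier scores for two positive and two negative instances.
example_auc = auc([0.9, 0.8], [0.1, 0.85])
print(f"auc={example_auc:.2f}")
```

In this hypothetical case the majority-only predictor reaches 95% accuracy but a G-mean of zero, which suggests why G-mean and AUC, rather than accuracy alone, are the more informative criteria on an imbalanced dataset.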

Land-Subsidence Spatial Modeling Using the Random Forest Data-Mining Technique

Spatial Modeling in GIS and R for Earth and Environmental Sciences ◽

10.1016/b978-0-12-815226-3.00006-5 ◽

2019 ◽

pp. 147-159 ◽

Cited By ~ 4

Author(s):

Hamid Reza Pourghasemi ◽

Mohsen Mohseni Saravi

Keyword(s):

Data Mining ◽

Random Forest ◽

Land Subsidence ◽

Spatial Modeling ◽

Data Mining Technique ◽

Mining Technique

Analysis of classification learning algorithms

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v17.i2.pp1029-1039 ◽

2020 ◽

Vol 17 (2) ◽

pp. 1029

Author(s):

Hana Rashied Esmaeel

Keyword(s):

Data Mining ◽

Random Forest ◽

Software Tool ◽

Average Error ◽

Classification Learning ◽

Data Mining Technique ◽

Mining Technique ◽

Information Engineering ◽

Apply Data Mining Technique ◽

Lower Contact

The paper applies data mining techniques. Five classification algorithms were used to build models: ZeroR, SMO, Naive Bayesian, J48 and Random Forest. The analysis was implemented using the WEKA (3.8.2) data mining software tool. The information was collected from the College of Information Engineering (COIE) at Al Nahrain University in a variety of forms, using a "Referendum" to estimate teacher performance; it was stored in an Excel file in CSV format and then converted to ARFF (Attribute-Relation File Format). Several criteria (time taken to build the models, accuracy and average error) were used to evaluate the algorithms. Random Forest and SMO predict better than the other algorithms, since their accuracy is the highest and their average error is the lowest. "The teacher clarification and wanting to be useful to students" was the strongest attribute. Further, removing the poorly ranked attributes (10, 11, 12 and 14), which have a lower impact on the dataset, can increase the accuracy of the algorithms.
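Of the five algorithms listed, ZeroR is the simplest: it ignores all attributes and always predicts the most frequent class seen in training, which makes it a natural floor against which the other classifiers are judged. The sketch below is an illustrative pure-Python version, not WEKA's implementation; the class labels and data are hypothetical and do not come from the COIE survey.

```python
from collections import Counter

class ZeroR:
    """Baseline classifier that always predicts the most frequent training label."""

    def fit(self, labels):
        # most_common(1) returns [(label, count)] for the modal label.
        self.prediction_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_instances):
        # Attributes are ignored, so only the number of instances matters.
        return [self.prediction_] * n_instances

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical teacher-performance ratings (illustrative only).
train_labels = ["good"] * 6 + ["poor"] * 3 + ["average"] * 1
test_labels = ["good", "good", "poor", "good", "average"]

model = ZeroR().fit(train_labels)
predictions = model.predict(len(test_labels))
baseline_accuracy = accuracy(test_labels, predictions)

print(model.prediction_, baseline_accuracy)
```

Any classifier worth keeping should beat this baseline on the same test split, which is one way to read the comparison of accuracy and average error described above.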

Five Decades of Research Experience in Speech Anatomy and Physiology

Perspectives of the ASHA Special Interest Groups ◽

10.1044/persp1.sig5.4 ◽

2016 ◽

Vol 1 (5) ◽

pp. 4-12

Author(s):

David P. Kuehn

Keyword(s):

Ohio State University ◽

State University ◽

Research Experience ◽

University Of Iowa ◽

University Of Illinois ◽

Anatomy And Physiology ◽

The Ohio State University ◽

Speech Science ◽

The University ◽

East Carolina University

This report highlights some of the major developments in the area of speech anatomy and physiology drawing from the author's own research experience during his years at the University of Iowa and the University of Illinois. He has benefited greatly from mentors including Professors James Curtis, Kenneth Moll, and Hughlett Morris at the University of Iowa and Professor Paul Lauterbur at the University of Illinois. Many colleagues have contributed to the author's work, especially Professors Jerald Moon at the University of Iowa, Bradley Sutton at the University of Illinois, Jamie Perry at East Carolina University, and Youkyung Bae at the Ohio State University. The strength of these researchers and their students bodes well for future advances in knowledge in this important area of speech science.

Cardiology Fellowship Education in the Era of High-density Training, Data Tracking, and Quality Measures

The American Heart Hospital Journal ◽

10.15420/ahhj.2011.9.2.99 ◽

2011 ◽

Vol 9 (2) ◽

pp. 99

Author(s):

Alex J Auseon ◽

Albert J Kolibash

Keyword(s):

Ohio State University ◽

Training Data ◽

State University ◽

Training Requirements ◽

Healthcare Environment ◽

Medical Centers ◽

Work Hour ◽

The Ohio State University ◽

Data Tracking

Background: Educating trainees during cardiology fellowship is a process in constant evolution, with program directors regularly adapting to increasing demands and regulations as they strive to prepare graduates for practice in today's healthcare environment. Methods and Results: In a 10-year follow-up to a previous manuscript regarding fellowship education, we reviewed the literature regarding the most topical issues facing training programs in 2010, describing our approach at The Ohio State University. Conclusion: In the midst of challenges posed by the increasing complexity of training requirements and documentation, work hour restrictions, and the new definitions of quality and safety, we propose methods of curricula revision and collaboration that may serve as an example to other medical centers.
