Selection of Important Features for Optimizing Crop Yield Prediction

Author(s):  
Maya Gopal P S ◽  
Bhargavi R

In agriculture, crop yield prediction is critical. Crop yield depends on various features, including geographic, climatic, and biological ones. This research article discusses five Feature Selection (FS) algorithms, namely Sequential Forward FS, Sequential Backward Elimination FS, Correlation-based FS, Random Forest Variable Importance, and the Variance Inflation Factor algorithm. Data used for the analysis was drawn from secondary sources of the Tamil Nadu state Agriculture Department for a period of 30 years; 75% of the data was used for training and 25% for testing. The performance of the feature selection algorithms is evaluated by Multiple Linear Regression (MLR), with the RMSE, MAE, R, and RRMSE metrics calculated for each. The adjusted R2 was used to find the optimum feature subset, and the time complexity of the algorithms was also considered. The selected features are applied to MLR, Artificial Neural Network, and M5Prime models. MLR gives 85% accuracy using the features selected by the Sequential Forward FS algorithm.
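The greedy search behind sequential forward selection can be sketched generically. This is a minimal illustration, not the paper's implementation: the scoring function below is a made-up stand-in (the study scores subsets by fitting MLR and computing adjusted R2), and the feature names are hypothetical.

```python
def forward_select(features, score, k):
    """Greedy sequential forward selection: start empty, repeatedly add
    the feature that most improves the score, up to k features."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best_f, best_s = None, None
        for f in remaining:
            s = score(selected + [f])
            if best_s is None or s > best_s:
                best_f, best_s = f, s
        if selected and best_s <= score(selected):
            break  # no remaining candidate improves the score
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

# Toy score: pretend two features are informative, with a small size penalty.
informative = {"rainfall", "max_temp"}
score = lambda subset: len(informative & set(subset)) - 0.01 * len(subset)
feats = ["rainfall", "max_temp", "canal_length", "tanks"]
print(forward_select(feats, score, k=3))  # → ['rainfall', 'max_temp']
```

In a real pipeline, `score` would train MLR on the candidate subset and return adjusted R2 on held-out data.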

2019 ◽  
Vol 35 (1) ◽  
pp. 9-14 ◽  
Author(s):  
P. S. Maya Gopal ◽  
R Bhargavi

Abstract. In agriculture, crop yield prediction is critical. Crop yield depends on various features which can be categorized as geographical, climatic, and biological. Geographical features consist of cultivable land in hectares, canal length to cover the cultivable land, and the number of tanks and tube wells available for irrigation. Climatic features consist of rainfall, temperature, and radiation. Biological features consist of seeds, minerals, and nutrients. In total, 15 features were considered for this study to understand their impact on paddy crop yield for all seasons of each year. For selecting vital features, five filter and wrapper approaches were applied. For assessing the prediction accuracy of each feature selection algorithm, a Multiple Linear Regression (MLR) model was used. The RMSE, MAE, R, and RRMSE metrics were used to evaluate the performance of the feature selection algorithms. Data used for the analysis was drawn from secondary sources of the state Agriculture Department, Government of Tamil Nadu, India, for over 30 years. Seventy-five percent of the data was used for training and 25% for testing. Low computational time was also considered in the selection of the best feature subset. All feature selection algorithms gave similar RMSE, RRMSE, R, and MAE values, so the adjusted R2 value was used to find the optimum feature subset despite the small deviations. The evaluation of the dataset used in this work shows that total area of cultivation, number of tanks and open wells used for irrigation, length of canals used for irrigation, and average maximum temperature during the season of the crop are the best features for better crop yield prediction in the study area. The MLR gives 85% model accuracy for the selected features with low computational time.
Keywords: Feature selection algorithm, Model validation, Multiple linear regression, Performance metrics.
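The evaluation metrics named above have standard definitions; a plain-Python sketch (one common convention per metric — e.g. RRMSE here is RMSE divided by the observed mean, which may differ from the paper's exact formula):

```python
import math

def rmse(y, yhat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rrmse(y, yhat):
    """Relative RMSE: RMSE divided by the mean observed value."""
    return rmse(y, yhat) / (sum(y) / len(y))

def pearson_r(y, yhat):
    """Correlation coefficient R between observed and predicted values."""
    my, mh = sum(y) / len(y), sum(yhat) / len(yhat)
    cov = sum((a - my) * (b - mh) for a, b in zip(y, yhat))
    sy = math.sqrt(sum((a - my) ** 2 for a in y))
    sh = math.sqrt(sum((b - mh) ** 2 for b in yhat))
    return cov / (sy * sh)

def adjusted_r2(y, yhat, p):
    """Adjusted R2 for a regression with p predictors and n samples;
    it penalizes adding predictors that do not improve the fit."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - my) ** 2 for a in y)
    r2 = 1 - ss_res / ss_tot
    n = len(y)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

Because adjusted R2 shrinks as predictors are added without benefit, it is a natural tie-breaker when several subsets give similar RMSE and MAE.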


Author(s):  
T. Thurkkaivel ◽  
G. A. Dheebakaran ◽  
V. Geethalakshmi ◽  
S. G. Patil ◽  
K. Bhuvaneshwari

Advance knowledge of harvestable products, especially essential food crops such as rice, wheat, maize, and pulses, would allow policymakers and traders to plan procurement, processing, pricing, marketing, and related infrastructure and procedures. Many statistical models are used for yield prediction with different weather parameter combinations, and their performance depends on the accuracy of the location's weather inputs. In this context, a study was conducted at the Agro Climate Research Centre, Tamil Nadu Agricultural University, Coimbatore, during the Kharif (2020) season to compare the performance of four multivariate weather-based models, viz., SMLR, LASSO, ENET, and Bayesian models, for rice yield prediction in Thanjavur district of Tamil Nadu State, using Tmax, Tmin, mean RH, WS, SSH, EVP, and RF as inputs. The results indicated that the R2, RMSE, and nRMSE values of the above models ranged from 0.54 to 0.79, 149 to 398 kg/ha, and 4.0 to 10.6 per cent, respectively. The study concluded that the Bayesian model was the most reliable, followed by LASSO and ENET. In addition, it was found that the Bayesian model could perform well even with limited weather parameters, and that omitting wind speed, sunshine hours, and evaporation data would not affect model performance. It is concluded that the Bayesian model may be a better option for rice yield forecasting in Thanjavur district of Tamil Nadu.
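Of the four compared models, LASSO is the easiest to sketch. Below is a minimal coordinate-descent LASSO in plain Python, purely illustrative (it is not the study's implementation, and the tiny data is made up; `X` columns and `y` are assumed pre-centered so no intercept is fitted):

```python
def soft_threshold(rho, lam):
    """LASSO's soft-thresholding operator: shrinks toward zero by lam."""
    if rho < -lam:
        return rho + lam
    if rho > lam:
        return rho - lam
    return 0.0

def lasso_cd(X, y, lam, iters=100):
    """Coordinate-descent LASSO on a list-of-rows design matrix X.
    Each pass updates one coefficient at a time against the partial
    residual that excludes that coefficient's own contribution."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                pred_minus_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_minus_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam) / z
    return beta

# Two centered predictors; only the first actually drives y = 2 * x1.
X = [[-1.5, 1], [-0.5, -1], [0.5, -1], [1.5, 1]]
y = [-3, -1, 1, 3]
print(lasso_cd(X, y, lam=0.1))  # → [1.98, 0.0]
```

The L1 penalty zeroes out the uninformative coefficient entirely, which is why LASSO doubles as a feature selector in settings like this one.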


2020 ◽  
Vol 8 (5) ◽  
pp. 3516-3520

The main objective of this research is to predict crop yields based on cultivation area, rainfall, and maximum and minimum temperature data. It will help Indian farmers predict crop yield according to environmental conditions. Nowadays, machine-learning-based crop yield prediction is more popular than traditional models because of its accuracy. In this paper, Linear Regression, Support Vector Regression, Decision Tree, and Random Forest are compared with the XGBoost algorithm. The above-mentioned algorithms are compared based on R2, Mean Squared Error, and Mean Absolute Error. The dataset is prepared from the data.gov.in site for the years 2000 to 2014. Only data for four south Indian states (Andhra Pradesh, Karnataka, Tamil Nadu, and Kerala) is taken, since these states have similar climatic conditions. The proposed XGBoost-based model shows much better results than the other models, with an R2 of 0.9391, the best among the compared models.
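XGBoost itself is a large library; as a hedged illustration of the underlying gradient-boosting idea only (not XGBoost, and not the paper's pipeline), here is squared-loss boosting over regression stumps on a single made-up feature:

```python
def fit_stump(x, y):
    """Best single-threshold regression stump on a 1-D feature:
    predicts the mean of each side of the split."""
    best = None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - (lm if xi <= t else rm)) ** 2 for xi, yi in zip(x, y))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]  # (threshold, left_mean, right_mean)

def boost(x, y, rounds=20, lr=0.5):
    """Gradient boosting for squared loss: each round fits a stump to the
    current residuals and adds a shrunken copy to the ensemble."""
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, resid)
        stumps.append((t, lm, rm))
        pred = [pi + lr * (lm if xi <= t else rm) for xi, pi in zip(x, pred)]
    return stumps, pred
```

XGBoost adds regularization, second-order gradients, and clever tree construction on top of this loop, but the residual-fitting core is the same.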


2015 ◽  
Vol 2015 ◽  
pp. 1-5 ◽  
Author(s):  
Manli Zhou ◽  
Youxi Luo ◽  
Guoquan Sun ◽  
Guoqin Mai ◽  
Fengfeng Zhou

Efficient and intuitive characterization of biological big data is becoming a major challenge for modern bio-OMIC based scientists, and interactive visualization and exploration of big data has proven to be one of the successful solutions. Most existing feature selection algorithms do not allow interactive input from users during the feature selection optimization process. This study addresses that question by fixing a few user-input features in the final selected feature subset, formulating these user-input features as constraints in a programming model. The proposed algorithm, fsCoP (feature selection based on constrained programming), performs similarly to or much better than the existing feature selection algorithms, even under constraints drawn from both the literature and the existing algorithms. An fsCoP biomarker may be intriguing for further wet lab validation, since it satisfies both the classification optimization function and the biomedical knowledge. fsCoP may also be used for interactive exploration of bio-OMIC big data by interactively adding user-defined constraints to the model.
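The "user-input features as constraints" idea can be sketched as a forward search seeded with pinned features. This is a simplification, not fsCoP's constrained-programming formulation; the scoring function and gene names below are hypothetical.

```python
def constrained_forward_select(features, score, pinned, k):
    """Forward selection that honours user-input constraints: the pinned
    features are fixed in the subset, and the search only adds to them."""
    selected = list(pinned)
    remaining = [f for f in features if f not in pinned]
    while remaining and len(selected) < k:
        best_f = max(remaining, key=lambda f: score(selected + [f]))
        if score(selected + [best_f]) <= score(selected):
            break  # no addition improves on the current subset
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

# Toy score: two features are informative; "g5" is pinned by the user
# (e.g. a biomarker known from the literature) and kept regardless.
informative = {"g1", "g3"}
score = lambda subset: len(informative & set(subset)) - 0.01 * len(subset)
print(constrained_forward_select(["g1", "g2", "g3", "g4", "g5"],
                                 score, pinned=["g5"], k=4))
# → ['g5', 'g1', 'g3']
```

Note that the pinned feature stays in the subset even though it never improves the toy score, which is exactly the behaviour a user-defined constraint is meant to enforce.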


2021 ◽  
pp. 1-19
Author(s):  
Lulu Li

Set-valued data is a significant kind of data, arising for example from different search engines, market data, and patients' symptoms and behaviours. An information system (IS) based on incomplete set-valued data is called an incomplete set-valued information system (ISVIS), which is a generalized model of a single-valued incomplete information system. This paper presents feature selection for an ISVIS by means of uncertainty measurement. Firstly, the similarity degree between two information values on a given feature of an ISVIS is proposed. Then, the tolerance relation on the object set with respect to a given feature subset is obtained. Next, λ-reduction in an ISVIS is presented. Moreover, connections between the proposed feature selection and uncertainty measurement are exhibited. Lastly, feature selection algorithms based on the λ-discernibility matrix, λ-information granulation, λ-information entropy, and λ-significance in an ISVIS are provided. To demonstrate the practical significance of the provided algorithms, a numerical experiment is carried out; the results report the number of features and the average feature-subset size obtained by each feature selection algorithm.
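The similarity degree and tolerance relation can be illustrated on toy set-valued data. Jaccard similarity is used below as a stand-in for the paper's similarity degree (an assumption, since the paper's exact definition is not reproduced here), and the objects and λ value are made up:

```python
def similarity(a, b):
    """Jaccard similarity between two set values (stand-in for the
    paper's similarity degree)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def tolerance_pairs(objects, feature_subset, lam):
    """Pairs of objects that are lambda-tolerant: at least lam-similar
    on every feature in the subset."""
    pairs = set()
    names = list(objects)
    for i, u in enumerate(names):
        for v in names[i:]:
            if all(similarity(objects[u][f], objects[v][f]) >= lam
                   for f in feature_subset):
                pairs.add((u, v))
    return pairs

# Toy ISVIS: three patients described by a set-valued "symptom" feature.
objects = {
    "x1": {"symptom": {"fever", "cough"}},
    "x2": {"symptom": {"fever"}},
    "x3": {"symptom": {"rash"}},
}
print(tolerance_pairs(objects, ["symptom"], lam=0.5))
```

With λ = 0.5, x1 and x2 are tolerant (similarity 1/2) while x3 is tolerant only with itself; reducts then seek the smallest feature subset that preserves this relation.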


Genes ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 1814
Author(s):  
Yuanyuan Han ◽  
Lan Huang ◽  
Fengfeng Zhou

Biological omics data such as transcriptomes and methylomes have the inherent “large p small n” paradigm, i.e., the number of features is much larger than that of the samples. A feature selection (FS) algorithm selects a subset of the transcriptomic or methylomic biomarkers in order to build a better prediction model. The hidden patterns in the FS solution space make it challenging to achieve a feature subset with satisfying prediction performances. Swarm intelligence (SI) algorithms mimic the target searching behaviors of various animals and have demonstrated promising capabilities in selecting features with good machine learning performances. Our study revealed that different SI-based feature selection algorithms contributed complementary searching capabilities in the FS solution space, and their collaboration generated a better feature subset than the individual SI feature selection algorithms. Nine SI-based feature selection algorithms were integrated to vote for the selected features, which were further refined by the dynamic recursive feature elimination framework. In most cases, the proposed Zoo algorithm outperformed the existing feature selection algorithms on transcriptomics and methylomics datasets.
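The voting stage, where nine SI algorithms' subsets are integrated, can be sketched simply. This is an illustrative reading of "integrated to vote", not the Zoo algorithm itself, and the gene names and algorithm labels are hypothetical:

```python
from collections import Counter

def vote_features(selections, min_votes):
    """Combine the subsets chosen by several FS algorithms: keep any
    feature selected by at least min_votes of them."""
    votes = Counter(f for subset in selections for f in set(subset))
    return sorted(f for f, c in votes.items() if c >= min_votes)

selections = [
    ["geneA", "geneB"],           # e.g. a particle-swarm run
    ["geneA", "geneC"],           # e.g. a grey-wolf run
    ["geneA", "geneB", "geneD"],  # e.g. a firefly run
]
print(vote_features(selections, min_votes=2))  # → ['geneA', 'geneB']
```

The paper then refines the voted set further with dynamic recursive feature elimination; voting first keeps features that multiple complementary searches agree on.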


2012 ◽  
Vol 546-547 ◽  
pp. 1538-1543 ◽  
Author(s):  
Chao Chen ◽  
Hao Dong Zhu

To enhance processing speed, reduce occupied memory, and filter out irrelevant or low-relevance features, feature selection algorithms must be used. However, most existing feature selection methods are serial and too time-inefficient to be applied to massive text data sets, so improving the efficiency of feature selection through parallelism is an active research topic. This paper presents a feature selection method based on Parallel Binary Immune Quantum-Behaved Particle Swarm Optimization (PBIQPSO). The presented method uses Binary Immune Quantum-Behaved Particle Swarm Optimization to select the feature subset and takes advantage of multiple computing nodes to improve time efficiency, so it can quickly acquire feature subsets that are more representative. Experimental results show that the method is effective.
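The parallel part of the idea, evaluating many particles' candidate feature masks concurrently, can be sketched as follows. This is not PBIQPSO (no immune or quantum-behaved mechanics, and a thread pool stands in for the paper's multiple computing nodes); the fitness function is a made-up placeholder for training a classifier on the masked features.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fitness(mask):
    """Stand-in fitness: reward masks keeping features 0 and 2, penalize
    subset size (a real system would train a classifier here)."""
    informative = {0, 2}
    kept = {i for i, bit in enumerate(mask) if bit}
    return len(kept & informative) - 0.1 * len(kept)

def binary_pso_step(swarm, workers=4):
    """One evaluation sweep: score all particles' bit masks in parallel,
    mirroring the multi-node evaluation idea of the paper."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(fitness, swarm))
    best_i = max(range(len(swarm)), key=scores.__getitem__)
    return swarm[best_i], scores[best_i]

random.seed(0)
swarm = [[random.randint(0, 1) for _ in range(5)] for _ in range(8)]
best_mask, best_score = binary_pso_step(swarm)
```

A full binary PSO would then update each particle's velocity and mask toward the personal and global bests before the next parallel sweep; since classifier training dominates the cost, parallelizing the fitness sweep is where the speedup comes from.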


2013 ◽  
Vol 380-384 ◽  
pp. 1593-1599
Author(s):  
Hao Yan Guo ◽  
Da Zheng Wang

The traditional motivation behind feature selection algorithms is to find the best subset of features for a task using one particular learning algorithm. However, it has often been found that no single classifier is entirely satisfactory for a particular task, so how to further improve the performance of these single systems on the basis of a previously optimal feature subset is an important issue. We investigate the notion of optimal feature selection and present a practical feature selection approach based on the optimal feature subset of a single CAD system, referred to in this paper as a multilevel optimal feature selection method (MOFS). Through MOFS, we select different optimal feature subsets in order to eliminate features that are redundant or irrelevant and obtain optimal features.
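One common way to refine a previously optimal subset is to drop features that are nearly collinear with features already kept. The correlation filter below is a generic sketch of that refinement step, not the MOFS procedure itself, and the columns are invented:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_redundant(features, columns, threshold=0.95):
    """Scan the previously selected features in order and drop any that
    are almost perfectly correlated with a feature already kept."""
    kept = []
    for f in features:
        if all(abs(pearson(columns[f], columns[k])) < threshold for k in kept):
            kept.append(f)
    return kept

# "b" is an exact multiple of "a", so it adds no new information.
cols = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]}
print(drop_redundant(["a", "b", "c"], cols))  # → ['a', 'c']
```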

