Covid-19 Pandemic: Data Analysis and Forecasting using Machine Learning Algorithms (Preprint)
BACKGROUND India reported its first Covid-19 case on 30th Jan 2020. There was practically no significant rise in the number of cases in February, but from March 2020 onwards there has been a huge escalation, as in many other countries across the world. This research paper analyses Covid-19 data first at a global level and then drills down to the scenario in India. Data is gathered from multiple sources, including several authentic government websites. Variables such as gender, geographical location and age are represented using Python and data visualization techniques. Time series analysis and other pattern-recognition techniques are deployed to gain insight into trend patterns and bring more clarity to the current scenario, as the analysis is based entirely on real-time data (up to 19th June 2020). Finally, we use machine learning algorithms to perform predictive analytics for the near-future scenario. We use a sigmoid model to estimate the day on which the number of active cases can be expected to reach its peak and when the curve will start to flatten. The strength of the sigmoid model lies in providing a specific date, which is a unique feature of the analysis in this paper. Feature engineering techniques are also used to transform the data to a logarithmic scale, which affords better comparison by reducing the influence of extreme values and outliers. Building on its short-term predictions, the model can be tuned to forecast over longer time intervals. Needless to say, many factors will determine the number of cases in the coming days.
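The idea of reading a peak and a flattening date off a sigmoid can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: it assumes a three-parameter logistic curve L / (1 + exp(-k(t - t0))), where L is the plateau (the maximum count), k the growth rate and t0 the inflection day where growth is fastest. The fitting technique here (a grid search over L, with k and t0 obtained by a straight-line fit of the logit-transformed counts) is a swapped-in stand-in; in practice a library routine such as SciPy's `scipy.optimize.curve_fit` would normally be used. The function names, the grid settings, the synthetic data, and the definition of the "flattening day" as the day the curve reaches 99% of L are all hypothetical choices made for this example.

```python
import math


def logistic(t, L, k, t0):
    """Sigmoid (logistic) curve: plateau L, growth rate k, inflection day t0."""
    return L / (1.0 + math.exp(-k * (t - t0)))


def linear_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_logistic(days, counts, n_grid=2000, max_factor=5.0):
    """Hypothetical fitter: grid-search the plateau L above the observed maximum.

    For a fixed L, ln(L / y - 1) = -k*t + k*t0 is linear in t, so k and t0
    follow from a straight-line fit. The (L, k, t0) with the smallest squared
    error in the original (untransformed) count space is returned.
    Assumes all counts are strictly positive.
    """
    peak = max(counts)
    best = None
    for i in range(1, n_grid + 1):
        L = peak * (1.0 + (max_factor - 1.0) * i / n_grid)  # always > peak
        zs = [math.log(L / y - 1.0) for y in counts]
        a, b = linear_fit(days, zs)
        k = -b
        if k <= 0:
            continue  # not a growing sigmoid for this candidate plateau
        t0 = a / k
        sse = sum((logistic(t, L, k, t0) - y) ** 2 for t, y in zip(days, counts))
        if best is None or sse < best[0]:
            best = (sse, L, k, t0)
    return best[1:]


def flattening_day(k, t0, fraction=0.99):
    """Day on which the curve reaches `fraction` of its plateau (assumed definition).

    Solving L * fraction = L / (1 + exp(-k(t - t0))) for t gives
    t = t0 + ln(fraction / (1 - fraction)) / k.
    """
    return t0 + math.log(fraction / (1.0 - fraction)) / k


if __name__ == "__main__":
    # Synthetic, noise-free series for illustration only; NOT the paper's data.
    days = list(range(0, 101))
    counts = [logistic(t, 1000.0, 0.1, 60.0) for t in days]
    L, k, t0 = fit_logistic(days, counts)
    print(f"plateau ~ {L:.0f}, inflection day ~ {t0:.1f}, "
          f"flattening day ~ {flattening_day(k, t0):.1f}")
```

The grid search keeps the sketch dependency-free and makes the role of each parameter explicit: the fitted L plays the part of the predicted maximum, and the flattening day is computed in closed form from k and t0 once the curve has been fitted.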
A major factor is the extent to which the citizens of the country adhere to the rules and restrictions imposed by the Government.

OBJECTIVE To predict the number of positive Covid-19 cases over the next few months.

METHODS Machine learning models: clustering and a sigmoid model.

RESULTS The model predicts a maximum of 258,846 active cases. The curve flattens by day 154, i.e. 25th September, after which it goes down and the number of active cases eventually decreases.

CONCLUSIONS Much research is under way on vaccines, economic dealings, precautions and the reduction of Covid-19 cases. However, we are currently in a mid-Covid situation. India, along with many other countries, is still witnessing an alarming daily rise in the number of cases. We have not yet reached the peak, so the flattening of the curve and the subsequent downward trend have yet to happen. Each day brings fresh information and large amounts of data, and there are many other machine-learning-based predictive models that are beyond the scope of this paper. At the end of the day, however, only the precautionary measures that we as responsible citizens take will help flatten the curve. We must all join hands and follow the rules and regulations strictly; maintaining social distancing and taking the lockdown seriously are the only key. This study is based on real-time data and will be useful to key stakeholders such as government officials and healthcare workers in preparing a combat plan along with stringent measures. The study will also help mathematicians and statisticians predict outbreak numbers more accurately.