Prediction Modeling of Household’s Preparedness of Natural Hazards Mitigation

Author(s):  
Chen Xia ◽  
Yuqing Hu

Natural disasters are increasing in magnitude, frequency, and geographic distribution. Studies have shown that individuals’ self-sufficiency, which largely depends on household preparedness, is critical for hazard mitigation during at least the first 72 hours following a disaster. Many studies have tried to identify, from different aspects, the factors that influence a household’s disaster preparedness, but we still lack an integrative analysis of how these factors jointly contribute to a household’s preparation. This paper aims to build a classification model that predicts whether a household has prepared for a potential disaster based on personal characteristics and the environment in which the household is located. We collect data from the Federal Emergency Management Agency’s 2018 National Household Survey and train four classification models (logistic regression, decision tree, support vector machine, and multi-layer perceptron classifier) to predict, from personal characteristics and the household’s environment, whether a household has prepared for a potential natural disaster. Results show that the multi-layer perceptron classifier outperforms the others, with the highest recall (0.8531) and F1 measure (0.7386). In addition, feature selection results show that, among other factors, a household’s access to disaster-related information is the most critical factor affecting household disaster preparation. Though there is still room for further parameter optimization, the model suggests that disaster management could be supported by gathering publicly accessible data.
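
A minimal sketch of the four-model comparison described above, using scikit-learn. The survey features and labels below are synthetic stand-ins for the FEMA National Household Survey variables, and the model settings are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import recall_score, f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))                         # stand-in survey features
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)   # 1 = household prepared (toy label)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression()),
    "tree":     DecisionTreeClassifier(random_state=0),
    "svm":      make_pipeline(StandardScaler(), SVC()),
    "mlp":      make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"{name:8s} recall={recall_score(y_te, pred):.4f} f1={f1_score(y_te, pred):.4f}")
```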


Author(s):  
S. Boeke ◽  
M. J. C. van den Homberg ◽  
A. Teklesadik ◽  
J. L. D. Fabila ◽  
D. Riquet ◽  
...  

Abstract. Reliable predictions of the impact of natural hazards turning into disasters are important for better targeting humanitarian response as well as for triggering early action. Open data and machine learning can be used to predict loss and damage to the houses and livelihoods of affected people. This research focuses on agricultural loss, more specifically rice loss in the Philippines due to typhoons. Regression and binary classification algorithms are trained using feature selection methods to find the most important explanatory features. Both geographical data for every province and typhoon-specific features of 11 historical typhoons are used as input. The percentage of lost rice area is the output, with an average value of 7.1%. For the regression task, the support vector regressor performed best, with a Mean Absolute Error of 6.83 percentage points. For the classification model, thresholds of 20%, 30% and 40% are tested in order to find the best-performing model. These thresholds represent different levels of lost rice fields for triggering anticipatory action towards farmers. The binary classifiers are trained to increase their ability to correctly predict the positive samples. In all three cases, the support vector classifier performed best, with recall scores of 88%, 75% and 81.82%, respectively. However, the precision scores for these models were low: 17.05%, 14.46% and 10.84%, respectively. For both the support vector regressor and classifier, of all 14 available input features, only wind speed was selected as an explanatory feature. For the other algorithms trained in this study, however, other feature sets were selected, depending also on the hyperparameter settings. This variation in selected feature sets, as well as the imprecise predictions, is a consequence of the small dataset used for this study. It is therefore important that data for more typhoons, as well as data on other explanatory variables, are gathered in order to make more robust and accurate predictions. Also, if loss data become available at municipality level, rather than province level, the models will become more accurate and valuable for operationalization.
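
The abstract describes two linked tasks: regressing the percentage of lost rice area, and triggering on loss thresholds via binary classification. The sketch below, with synthetic wind-speed data standing in for the typhoon features, shows how both could be wired up in scikit-learn; `class_weight="balanced"` is an assumption that mirrors the recall-oriented training described.

```python
import numpy as np
from sklearn.svm import SVR, SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import mean_absolute_error, recall_score, precision_score

rng = np.random.default_rng(1)
wind_speed = rng.uniform(20, 80, size=200)                      # the one selected feature
loss_pct = np.clip(1.2 * wind_speed - 20 + rng.normal(0, 5, 200), 0, 100)

X = wind_speed.reshape(-1, 1)

# regression task: predicted vs. actual percentage of lost rice area
loss_pred = cross_val_predict(SVR(), X, loss_pct, cv=5)
print("MAE (percentage points):", mean_absolute_error(loss_pct, loss_pred))

# classification task: one trigger model per loss threshold
for threshold in (20, 30, 40):
    y = (loss_pct >= threshold).astype(int)
    y_pred = cross_val_predict(SVC(class_weight="balanced"), X, y, cv=5)
    print(f"threshold {threshold}%: recall={recall_score(y, y_pred):.2f} "
          f"precision={precision_score(y, y_pred, zero_division=0):.2f}")
```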


Telematika ◽  
2018 ◽  
Vol 15 (1) ◽  
pp. 77
Author(s):  
Resky Rayvano Moningka ◽  
Djoko Budiyanto Setyohadi ◽  
Khaerunnisa Khaerunnisa ◽  
Pranowo Pranowo

Abstract. The 2010 Mount Merapi eruption was the largest since 1872. Its impact was felt by the people living in the areas affected by the eruption, so disaster management was carried out; one component of this was the fulfillment of basic needs. This research aims to collect public opinion on the fulfillment of basic needs in the shelters after the Merapi eruption, based on Twitter data. The algorithm used in this research is the Support Vector Machine, which builds a classification model over the collected data. The expected result of this study is to identify the basic needs of a shelter. The accuracy obtained with 10-fold cross-validation is 87.96% for the Support Vector Machine and 87.45% for Maximum Entropy. Keywords: twitter, sentiment analysis, merapi eruption, support vector machine
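
A minimal sketch of SVM text classification scored with 10-fold cross-validation as described above. The tweets and labels are invented placeholders, and TF-IDF with a linear SVM is an assumed feature pipeline, since the abstract does not specify the text representation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# placeholder tweets: 1 = expresses an unmet basic need, 0 = need already met
tweets = ["butuh air bersih di posko", "selimut sudah cukup di pengungsian"] * 50
labels = [1, 0] * 50

model = make_pipeline(TfidfVectorizer(), LinearSVC())
scores = cross_val_score(model, tweets, labels, cv=10, scoring="accuracy")
print("10-fold accuracy: %.2f%%" % (100 * scores.mean()))
```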


Author(s):  
Ganesh Udge ◽  
Mahesh Mohite ◽  
Shubhankar Bendre ◽  
Yogeshwar Birnagal ◽  
Disha Wankhede

Current online social networks make it easy to spread and learn about new discoveries and information. In recent days, however, posted content may be irrelevant to the actual topic; in layman’s terms these are attacks, and on Twitter the accounts performing them are called Twitter spammers. Data quality is compromised when malicious and harmful information is added through URLs, bios, emoticons, audio, images/videos, and hashtags across different accounts, by exchanging tweets, personal messages (direct messages), and retweets. Malicious links may lead to misleading sites, which can adversely affect users and interfere with their decision-making processes. To protect the user experience from spammer attacks, a Twitter training dataset is used, and 12 lightweight features, such as the user’s account age, number of followers, and counts of tweets and retweets, are extracted to distinguish spam from non-spam. To enhance performance, discretization of these features is important for transferring spam detection between tweets. Our system builds a classification model for spam detection using binary classification and automatic learning algorithms, namely a Naïve Bayes classifier or a Support Vector Machine classifier, which learns the behaviour of the data. The system categorizes the tweets in the datasets into spam and non-spam classes and provides the user’s feed with only the relevant information. It also reports the impact of data-related factors such as the ratio of spam to non-spam tweets, the size of the training dataset, and data sampling on detection performance. The proposed system detects and analyzes both simple and evolving Twitter spam over time. Spam detection is a major challenge for the system; this work narrows the gap between performance evaluations and focuses primarily on data, features, and patterns to identify real users, informing them about spam tweets along with performance statistics. The aim is to detect spam tweets in real time, since new tweets may exhibit new patterns, which helps in training and updating the dataset and the knowledge base.
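
A hedged sketch of the lightweight-feature approach: a few account-level statistics feed a Naïve Bayes or SVM binary classifier. The three columns shown are a subset of the 12 features named above, and the data and the spam-labelling rule are synthetic.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(2)
# columns: account age (days), follower count, tweets per day
X = rng.uniform([1, 0, 0], [3000, 5000, 200], size=(500, 3))
y = (X[:, 2] > 100).astype(int)          # toy rule: very high tweet rate = spam

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for clf in (GaussianNB(), make_pipeline(StandardScaler(), SVC())):
    clf.fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te), target_names=["non-spam", "spam"]))
```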


2020 ◽  
Author(s):  
◽  
Erick Esteven Montelongo González

The existence of large volumes of data generated in the health area presents an important opportunity for analysis. Such analysis can yield information to support physicians in the decision-making process for the diagnosis or treatment of diseases such as cancer. The present work describes a methodology for classifying patients with liver, lung, and breast cancer using machine learning models, in order to identify the model that performs best. The methodology considers three classification models: Support Vector Machines (SVM), Multi-Layer Perceptron (MLP), and AdaBoost, using both structured and unstructured information from patients’ clinical records. Results show that the best classification model is the MLP using only unstructured data, obtaining 89% precision and demonstrating the usefulness of this type of data in the classification of cancer patients.
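
A minimal sketch of classifying patients from unstructured clinical text with an MLP, the best-performing configuration reported above. TF-IDF vectorization is an assumption, and the notes and labels are invented placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# placeholder clinical notes and cancer-type labels
notes = ["lesion in right hepatic lobe", "mass in left upper lung",
         "palpable lump in left breast"] * 30
labels = ["liver", "lung", "breast"] * 30

model = make_pipeline(TfidfVectorizer(),
                      MLPClassifier(max_iter=1000, random_state=0))
scores = cross_val_score(model, notes, labels, cv=5, scoring="precision_macro")
print("macro precision: %.2f" % scores.mean())
```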


Author(s):  
Luiza Antonie ◽  
Kris Inwood ◽  
Chris Minns ◽  
Fraser Summerfield

Introduction
Linking distinct historical sources on an automated basis directs attention to the quality and representativeness of the linked data created by these systems. Linking with time-invariant personal characteristics arguably minimizes bias or departures from representativeness, even though a wider set of features might generate more links.

Objectives and Approach
The objective of this research is to compare, evaluate, and understand the bias that arises when two linking methodologies are employed on the same data sources. Our approach is to compare linked records from Canadian censuses (linking 1871 to 1881) generated by two different linking strategies. The first method is a support vector machine based classification model on time-invariant individual characteristics; it generates a large number of multiple matches, as records look similar on a small number of time-invariant individual characteristics. The second method adds a second stage that disambiguates multiple matches using family information.

Results
We compare the links produced by the two methods and discuss the results. The comparison covers the number of links produced, their quality (false-positive rate), and the bias of the linked data. A complication is that there are many dimensions of bias; even time-invariant criteria typically generate some bias. As expected, the two-step process produces a larger linked sample. Interestingly, it also produces a lower error rate and different patterns of bias. Both methods understate the Quebec-born, those of French ethnicity, the unmarried, and adolescents. Unexpectedly, the bias in favour of married people is larger when using individual information (first method) than family information (second method). However, family-based linking does over-represent young children.

Conclusion/Implications
Results suggest that neither method will be universally preferable. Rather, the choice of research question may affect the preferred balance of biases and link rate. Fortunately, advances in computational capacity allow a researcher to select the method that generates links most appropriate for the problem at hand.
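
A minimal sketch of the first-stage linking step, assuming candidate record pairs have already been generated: each pair is described by similarity features on time-invariant attributes, and an SVM classifies it as a link or non-link. The features, records, and labelling rule are invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

rng = np.random.default_rng(3)
# columns per candidate pair: name similarity, birth-year distance, birthplace match (0/1)
X = np.column_stack([rng.uniform(0, 1, 2000),
                     rng.integers(0, 6, 2000),
                     rng.integers(0, 2, 2000)]).astype(float)
# toy ground truth: a true link needs close agreement on all three attributes
y = ((X[:, 0] > 0.8) & (X[:, 1] <= 1) & (X[:, 2] == 1)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = SVC(class_weight="balanced").fit(X_tr, y_tr)
# a low false-positive rate corresponds to high precision on the link class
print("link precision:", precision_score(y_te, clf.predict(X_te)))
```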


Agriculture ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 1106
Author(s):  
Yan Hu ◽  
Lijia Xu ◽  
Peng Huang ◽  
Xiong Luo ◽  
Peng Wang ◽  
...  

A rapid and nondestructive tea classification method is of great significance in today’s research. This study uses fluorescence hyperspectral technology and machine learning to distinguish Oolong tea by analyzing the spectral features of tea in the wavelength range from 475 to 1100 nm. The spectral data are preprocessed by multiplicative scatter correction (MSC) and standard normal variate (SNV) transformation, which effectively reduce the impact of baseline drift and tilt. Then principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) are adopted for feature dimensionality reduction and visual display. Random Forest-Recursive Feature Elimination (RF-RFE) is used for feature selection. Decision Tree (DT), Random Forest Classification (RFC), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) are used to establish the classification model. The results show that MSC-RF-RFE-SVM is the best model for the classification of Oolong tea, with accuracies of 100% on the training set and 98.73% on the test set. It can be concluded that fluorescence hyperspectral technology and machine learning are feasible for classifying Oolong tea.
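
A hedged sketch of the winning MSC, RF-RFE, SVM chain. The spectra and class labels are simulated, the MSC implementation follows the standard mean-spectrum regression, and the number of retained bands is an arbitrary choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    ref = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        slope, intercept = np.polyfit(ref, s, 1)    # fit s = slope * ref + intercept
        corrected[i] = (s - intercept) / slope
    return corrected

rng = np.random.default_rng(4)
base = np.sin(np.linspace(0, 3 * np.pi, 125))       # shared spectral shape (toy)
gains = rng.uniform(0.8, 1.2, size=(160, 1))        # simulated scatter effects
offsets = rng.uniform(-0.1, 0.1, size=(160, 1))
X = gains * base + offsets + rng.normal(0, 0.05, size=(160, 125))
y = rng.integers(0, 4, 160)                         # four tea classes (toy labels)

X = msc(X)
selector = RFE(RandomForestClassifier(random_state=0), n_features_to_select=20)
X_sel = selector.fit_transform(X, y)

X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, random_state=0)
clf = SVC().fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```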


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Xiaoli Ruan ◽  
Dongming Zhou ◽  
Rencan Nie ◽  
Yanbu Guo

Apoptosis proteins are strongly related to many diseases and play an indispensable role in maintaining the dynamic balance between cell death and division in vivo. Obtaining localization information on apoptosis proteins is necessary for understanding their function. To date, few researchers have focused on the imbalance of apoptosis datasets before classification, yet this imbalance is prone to causing misclassification. Therefore, in this work, we introduce a method to resolve this problem and enhance prediction accuracy. Firstly, the features of the protein sequence are captured by combining the Improving Pseudo-Position-Specific Scoring Matrix (IM-Psepssm) with the Bidirectional Correlation Coefficient (Bid-CC) algorithm, both derived from the position-specific scoring matrix. Secondly, different feature-fusion and resampling strategies are used to reduce the impact of imbalance on the apoptosis protein datasets. Finally, the feature vectors are fed into a Support Vector Machine (SVM) to train the classification model, and prediction accuracy is evaluated by jackknife cross-validation tests. The experimental results indicate that, for the same feature vector, adopting resampling methods remarkably boosts many significant indicators relative to the unsampled approach when predicting the localization of apoptosis proteins in the ZD98, ZW225, and CL317 databases. Additionally, we also present new user-friendly local software for readers to apply; the code and software can be freely accessed at https://github.com/ruanxiaoli/Im-Psepssm.
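
A minimal sketch of the resampling-before-classification idea with jackknife (leave-one-out) evaluation. Random vectors stand in for the PSSM-derived features, and plain random oversampling is an assumed stand-in for the resampling strategies compared in the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.utils import resample

rng = np.random.default_rng(5)
X_major = rng.normal(0.0, 1.0, size=(80, 40))   # majority subcellular-location class
X_minor = rng.normal(0.5, 1.0, size=(18, 40))   # minority subcellular-location class

# naive random oversampling of the minority class up to balance
X_minor_up = resample(X_minor, n_samples=len(X_major), random_state=0)
X = np.vstack([X_major, X_minor_up])
y = np.array([0] * len(X_major) + [1] * len(X_minor_up))

# jackknife test = leave-one-out cross-validation
acc = cross_val_score(SVC(), X, y, cv=LeaveOneOut()).mean()
print("jackknife accuracy: %.3f" % acc)
```

Note that in practice the resampling should be applied inside the training folds only, so that duplicated minority samples do not leak into the held-out fold.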


Metals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 914
Author(s):  
Estela Ruiz ◽  
Diego Ferreño ◽  
Miguel Cuartas ◽  
Lara Lloret ◽  
Pablo M. Ruiz del Árbol ◽  
...  

Machine Learning classification models have been trained and validated on a dataset (73 features and 13,616 instances) comprising experimental information on a clean cold-forming steel fabricated by electric arc furnace and hot rolling. A classification model was developed to identify inclusion contents above the median. The following algorithms were implemented: Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forests, AdaBoost, Gradient Boosting, Support Vector Classifier and Artificial Neural Networks. Random Forest displayed the best results overall and was selected for the subsequent analyses. The Permutation Importance method was used to identify the variables that influence the inclusion cleanliness, and the impact of these variables was determined by means of Partial Dependence Plots. The influence of the final diameter of the coil has been interpreted considering the changes induced by the hot-rolling process in the distribution of inclusions. Several variables related to the secondary metallurgy and tundish operations have been identified and interpreted in metallurgical terms. In addition, the inspection area during the microscopic examination of the samples also appears to influence the inclusion content. Recommendations have been established for the sampling process and for the manufacturing conditions to optimize the inclusion cleanliness of the steel.
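
A hedged sketch of the interpretation workflow: a Random Forest, permutation importance computed on held-out data, and partial dependence for one variable. The five features and the synthetic target are invented; the study itself used 73 process and inspection variables.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance, partial_dependence
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
X = rng.normal(size=(2000, 5))                  # stand-in process variables
# toy target: inclusion content above the median, driven by two variables
y = (X[:, 0] - 0.5 * X[:, 3] + rng.normal(0, 0.5, 2000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# permutation importance on the test split identifies the influential variables
imp = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
print("permutation importances:", np.round(imp.importances_mean, 3))

# partial dependence shows how one variable shifts the predicted class probability
pd_result = partial_dependence(rf, X_te, features=[0])
print("partial dependence (first grid points):",
      np.round(pd_result["average"][0][:5], 3))
```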


2017 ◽  
Vol 2017 ◽  
pp. 1-9
Author(s):  
Zhuo Pang ◽  
Mei Yuan ◽  
Hao Song ◽  
Zongxia Jiao

Fiber Bragg Grating (FBG) sensors have been increasingly used in the field of Structural Health Monitoring (SHM) in recent years. In this paper, we propose an impact localization algorithm based on Empirical Mode Decomposition (EMD) and a Particle Swarm Optimization-Support Vector Machine (PSO-SVM) to achieve better localization accuracy for an FBG-embedded plate. In our method, EMD is used to extract features from the FBG signals, and PSO-SVM is then applied to automatically train a classification model for impact localization. Meanwhile, an impact monitoring system for FBG-embedded composites has been established to validate our algorithm experimentally. Moreover, the relationship between localization accuracy and the distance from the impact to the nearest sensor has also been studied. Results suggest that, under our experimental conditions, localization accuracy increases as this distance decreases and remains satisfactory, ranging from 93.89% to 97.14%. This article reports an effective and easy-to-implement method for FBG signal processing in SHM systems for composites.
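
A sketch of the signal-processing chain under stated assumptions: EMD features (here, energies of the first few intrinsic mode functions) extracted from simulated FBG signals, then an RBF SVM. The paper tunes the SVM with particle swarm optimization; a plain grid search stands in for it here, named as such. EMD comes from the third-party PyEMD package (installed as EMD-signal), and the feature choice is illustrative.

```python
import numpy as np
from PyEMD import EMD
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(7)

def emd_features(signal):
    """Energies of the first three intrinsic mode functions."""
    imfs = EMD().emd(signal)
    energies = [float(np.sum(imf ** 2)) for imf in imfs[:3]]
    return energies + [0.0] * (3 - len(energies))   # pad if fewer IMFs found

# synthetic impact signals: 100 traces, each labelled with one of 4 impact zones
signals = rng.normal(size=(100, 256))
labels = rng.integers(0, 4, 100)
X = np.array([emd_features(s) for s in signals])

# grid search over (C, gamma) as a simple stand-in for PSO hyperparameter tuning
search = GridSearchCV(SVC(), {"C": [1, 10, 100], "gamma": ["scale", 0.1]}, cv=5)
search.fit(X, labels)
print("best params:", search.best_params_, "cv accuracy:", search.best_score_)
```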

