Effective Training Data Extraction Method to Improve Influenza Outbreak Prediction from Online News Articles (Preprint)

10.2196/23305 ◽  
2020 ◽  
Author(s):  
Beakcheol Jang ◽  
Inhwan Kim ◽  
Jong Wook Kim

BACKGROUND Each year, influenza affects 3 to 5 million people and causes 290,000 to 650,000 fatalities worldwide. To reduce the fatalities caused by influenza, several countries have established influenza surveillance systems to collect early-warning data. However, proper and timely warnings are hindered by a 1- to 2-week delay between the actual disease outbreaks and the publication of surveillance data. To avoid the delay inherent in traditional monitoring methods, novel methods have been proposed for influenza surveillance and prediction using real-time internet data (such as search queries, microblogging, and news). Some of the currently popular approaches extract online data and use machine learning to predict influenza occurrences as a classification task. However, many of these methods extract training data subjectively, which makes it difficult to capture the latent characteristics of the data correctly. There is a critical need for new approaches that extract training data in a way that reflects those latent characteristics. OBJECTIVE In this paper, we propose an effective training data extraction method that reflects the hidden features and improves performance by filtering and selecting only the keywords related to influenza before the prediction. METHODS Word embeddings already provide a distributed representation of words by encoding the hidden relationships between tokens; we enhance them by selecting keywords related to the influenza outbreak and sorting the extracted keywords, using the Pearson correlation coefficient (PCC), in order of correlation with the influenza outbreak. The keyword extraction process is followed by a predictive model based on long short-term memory (LSTM) that predicts the influenza outbreak. To assess the performance of the proposed predictive model, we use and compare a variety of word embeddings.
RESULTS Word embeddings without our proposed sorting process showed a prediction accuracy of 0.8705 when 50.2 keywords were selected on average. In contrast, word embeddings with our proposed sorting process showed a prediction accuracy of 0.8868 and a 12.6% prediction accuracy improvement, even though a smaller amount of training data was selected, with only 20.6 keywords on average. CONCLUSIONS The sorting process strengthens the embedding process, which improves feature extraction because it acts as a knowledge base for the prediction component. The model outperforms other current approaches that use flat extraction before prediction.
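
The sorting step described under METHODS can be illustrated with a minimal sketch. It assumes that each candidate keyword has a weekly frequency series aligned with a weekly influenza case series, and that keywords are ranked by signed PCC in descending order; the abstract does not state either detail. The function name `rank_keywords_by_pcc` and all data below are hypothetical.

```python
import numpy as np

def rank_keywords_by_pcc(keyword_counts, outbreak_series, top_k=None):
    """Sort keywords by Pearson correlation with the outbreak series (descending)."""
    target = np.asarray(outbreak_series, dtype=float)
    scores = {}
    for word, counts in keyword_counts.items():
        x = np.asarray(counts, dtype=float)
        # A constant series has an undefined correlation; treat it as uncorrelated.
        if x.std() == 0 or target.std() == 0:
            scores[word] = 0.0
        else:
            scores[word] = float(np.corrcoef(x, target)[0, 1])
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]

# Hypothetical weekly data, for illustration only (not from the paper).
weekly_cases = [10, 20, 40, 80, 60, 30]
weekly_keyword_counts = {
    "fever":   [1, 2, 4, 8, 6, 3],   # proportional to the case counts
    "weather": [5, 5, 5, 5, 5, 5],   # constant, carries no signal
    "holiday": [8, 6, 4, 1, 2, 5],   # moves against the case counts
}
ranked = rank_keywords_by_pcc(weekly_keyword_counts, weekly_cases)
```

Passing `top_k` keeps only the highest-ranked keywords, which mirrors the idea of feeding a smaller, better-correlated keyword set to the downstream predictor.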


Author(s):  
Mitsuji MUNEYASU ◽  
Nayuta JINDA ◽  
Yuuya MORITANI ◽  
Soh YOSHIDA

2020 ◽  
Vol 13 (1) ◽  
pp. 34
Author(s):  
Rong Yang ◽  
Robert Wang ◽  
Yunkai Deng ◽  
Xiaoxue Jia ◽  
Heng Zhang

The random cropping data augmentation method is widely used to train convolutional neural network (CNN)-based target detectors on optical images (e.g., the COCO dataset). It can expand the scale of the dataset dozens of times while adding only a small amount of computation when training the neural network detector. In addition, random cropping can greatly enhance the spatial robustness of the model, because it makes the same target appear at different positions in the sample image. Random cropping and random flipping have become the standard configuration for tasks with limited training data, which makes it natural to introduce them into the training of CNN-based synthetic aperture radar (SAR) image ship detectors. However, in this paper, we show that directly applying traditional random cropping when training a CNN-based SAR image ship detector may generate a lot of noise in the gradient during backpropagation, which hurts detection performance. To eliminate the noise in the training gradient, a simple and effective training method based on a feature-map mask is proposed. Experiments demonstrate that the proposed method can effectively eliminate the gradient noise introduced by random cropping and significantly improve detection performance under a variety of evaluation indicators without increasing inference cost.
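
For readers unfamiliar with the augmentation discussed above, the following is a minimal sketch of plain random cropping with bounding-box adjustment. It does not implement the feature-map mask proposed in the paper, whose details the abstract does not give. The function name `random_crop` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
import numpy as np

def random_crop(image, boxes, crop_h, crop_w, rng):
    """Crop a random window and shift boxes (x1, y1, x2, y2) into its frame."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    patch = image[top:top + crop_h, left:left + crop_w]

    shifted = np.asarray(boxes, dtype=float).reshape(-1, 4) - [left, top, left, top]
    # Clip to the window; boxes that fall fully outside collapse to zero area.
    shifted[:, [0, 2]] = np.clip(shifted[:, [0, 2]], 0, crop_w)
    shifted[:, [1, 3]] = np.clip(shifted[:, [1, 3]], 0, crop_h)
    keep = (shifted[:, 2] > shifted[:, 0]) & (shifted[:, 3] > shifted[:, 1])
    return patch, shifted[keep], (top, left)
```

Because the window offset changes on every call, the same target lands at a different position in each patch, which is the source of the spatial robustness mentioned in the abstract. Targets cut by the window edge are truncated rather than dropped in this sketch.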


ELKHA ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 54
Author(s):  
Eska Rizqi Naufal ◽  
Gigih Priyandoko ◽  
Fachrudin Hunaini

The three-phase induction motor is reliable, robust, and inexpensive. However, induction motors are also vulnerable: according to a survey conducted by the Electric Power Research Institute (EPRI), 41% of damage cases occur in the bearing, caused by working environment conditions, bearing age, and several other factors. Bearing faults are not easy to identify; applying a data extraction method based on the Discrete Wavelet Transform (DWT) together with the K-Medoids clustering method facilitates the identification process. The extraction method passes the data, in the form of current signals, through digital filters (a low-pass filter and a high-pass filter) to map them into the frequency and time domains simultaneously, and the clustering method groups the data based on certain characteristics. In clustering tests on three-phase induction motor current signal data with three bearing conditions, the Discrete Wavelet Transform with the bior1.1 mother wavelet at decomposition level 2, combined with K-Medoids, produced an accuracy of 86.8%.
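
The pipeline described above (filter-bank decomposition followed by medoid-based clustering) can be sketched as follows. This is an illustration under several assumptions, not the authors' implementation: the two-tap Haar filter pair stands in for bior1.1 (the two are commonly described as sharing the same filters, but treat that as an assumption), band energies are used as features although the abstract does not say which features were clustered, and the medoid initialization is a simple farthest-first heuristic. All function names are hypothetical.

```python
import numpy as np

def haar_dwt(signal):
    """One DWT level: low-pass (approximation) and high-pass (detail), downsampled by 2."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:                      # pad odd-length input by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def dwt_energy_features(signal, level=2):
    """Energy of each detail band plus the final approximation band."""
    feats, approx = [], signal
    for _ in range(level):
        approx, detail = haar_dwt(approx)
        feats.append(np.sum(detail ** 2))
    feats.append(np.sum(approx ** 2))
    return np.array(feats)

def k_medoids(points, k, n_iter=50):
    """Alternating k-medoids with a deterministic farthest-first initialization."""
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    medoids = [int(np.argmin(dist.sum(axis=1)))]          # most central point first
    while len(medoids) < k:
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if len(members):
                # New medoid: the member with the smallest total distance to its cluster.
                new[c] = members[np.argmin(dist[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return np.argmin(dist[:, medoids], axis=1), medoids
```

In use, each current-signal recording would be turned into a feature vector with `dwt_energy_features(recording, level=2)`, the vectors stacked into an array, and `k_medoids(features, k=3)` called to group them into the three bearing conditions.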


2020 ◽  
Vol 9 (2) ◽  
pp. 109 ◽  
Author(s):  
Bo Cheng ◽  
Shiai Cui ◽  
Xiaoxiao Ma ◽  
Chenbin Liang

Feature extraction of urban areas is one of the most important directions of polarimetric synthetic aperture radar (PolSAR) applications. A high-resolution PolSAR image has the characteristics of high dimensionality and nonlinearity. Therefore, to find intrinsic features for target recognition, a building area extraction method for PolSAR images based on the Adaptive Neighborhoods selection Neighborhood Preserving Embedding (ANSNPE) algorithm is proposed. First, 52 features are extracted using the gray-level co-occurrence matrix (GLCM) and five polarization decomposition methods. The feature set is divided into 20 dimensions, 36 dimensions, and 52 dimensions. Next, the ANSNPE algorithm is applied to the training samples, and the resulting projection matrix is applied to the test image to extract the new features. Lastly, a support vector machine (SVM) classifier and post-processing are used to extract the building area, and the accuracy is evaluated. Comparative experiments are conducted using Radarsat-2 data, and the results show that the ANSNPE algorithm can effectively extract the building area and has better generalization ability: the projection matrix is obtained from the training data and can be applied directly to new samples, and the building area extraction accuracy is above 80%. The combination of polarization and texture features provides a wealth of information that is more conducive to the extraction of building areas.
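
As background for the texture features mentioned above, the following is a minimal sketch of a gray-level co-occurrence matrix and two statistics commonly derived from it. The abstract does not list which GLCM statistics are among the 52 features, so contrast and homogeneity are chosen here only as familiar examples; the function names are hypothetical, and the input is assumed to be already quantized to integer gray levels in `[0, levels)`.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized co-occurrence matrix for pixel pairs offset by (dy, dx), with dx, dy >= 0."""
    img = np.asarray(image, dtype=int)
    h, w = img.shape
    src = img[0:h - dy, 0:w - dx]       # reference pixels
    dst = img[dy:h, dx:w]               # neighbours at the given offset
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (src.ravel(), dst.ravel()), 1)
    if symmetric:
        counts += counts.T              # count each pair in both directions
    return counts / counts.sum()

def glcm_contrast(p):
    """Weighted sum of squared gray-level differences."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

def glcm_homogeneity(p):
    """Weights near-diagonal entries (similar gray levels) more heavily."""
    i, j = np.indices(p.shape)
    return float(np.sum(p / (1.0 + (i - j) ** 2)))
```

In a per-pixel feature pipeline these statistics would typically be computed over a sliding window around each pixel, producing one texture channel per statistic and offset.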


Author(s):  
Vikas Menon ◽  
Sujita Kumar Kar ◽  
Natarajan Varadharajan ◽  
Charanya Kaliamoorthy ◽  
Jigyansa Ipsita Pattnaik ◽  
...  

Background Celebrity suicides have the potential to trigger suicide contagion, particularly when media reporting is detailed and imbalanced. We aimed to assess the quality of media reporting of the suicide of a popular Indian entertainment celebrity against the World Health Organization (WHO) suicide reporting guidelines. Methods Relevant news articles that reported the actor’s suicide were retrieved from online news portals of regional and English language newspapers and television channels in the week immediately following the event. Deductive content analysis of these articles was done using a pre-designed data extraction form. Results A total of 573 news articles were analyzed. Several breaches of reporting guidelines were noted: mentioning the word ‘celebrity’ in the title of the report (14.7%), inclusion of the deceased’s photograph (88.5%), and detailed descriptions of the method (50.4%) and location of suicide (70.6%); local language newspapers were more culpable than English newspapers. Helpful reporting characteristics such as mentioning warning signs (4.1%), including educational information (2.7%) and suicide support line details (14.0%) were rarely practiced. Conclusion Media reporting of celebrity suicide in India is imbalanced and poorly adherent to suicide reporting recommendations. Local language news reports display more frequent and serious violations in reporting than English news articles.


2020 ◽  
pp. 002076402096453
Author(s):  
Vikas Menon ◽  
Sujita Kumar Kar ◽  
Marthoenis Marthoenis ◽  
SM Yasir Arafat ◽  
Ginni Sharma ◽  
...  

Background: Little is known about the factors that determine vulnerability to subsequent suicide in the community following a celebrity suicide. Our objective was to investigate the link between an alleged celebrity suicide and further suicidal behaviour in the community in India. Methods: Relevant news articles that reported suicidal behaviour in the population were retrieved from online news portals of regional and English language newspapers in the month immediately following the actor’s death. A deductive analysis of the retrieved suicide news articles was carried out using a pre-designed data extraction form. Results: A total of 1160 relevant news articles were identified from the local language (n = 985) and English (n = 175) newspapers. For a sizeable percentage of these reports (n = 65, 5.6%), the media reported links with celebrity suicide. Odds of subsequent suicide among young (odds ratio [OR] – 9.24), female (OR – 1.94), unemployed (OR – 7.26), those without precipitating life events (OR – 2.94) or mental illness (OR – 1.69) were higher among those with a link to celebrity suicide; likewise, odds of death by hanging (OR – 49.84) and leaving a suicide note (OR – 2.03) were higher among those linked to celebrity suicide. English newspapers (OR – 4.23) were more likely to report events linked to celebrity suicide than local language newspapers. Conclusion: Persons who died by suicide by hanging after a celebrity suicide are more likely to be young, female, unemployed, have a mental disorder or precipitating life events. Suicide prevention efforts must focus on this group and aim to prevent imitation of the method used by the celebrity.

