Anthrax on Twitter: Analysis of Public Discussion of Anthrax Over Twelve Months of Data Collection (Preprint)

BACKGROUND: A computational framework that utilizes machine learning methodologies was created to collect tweets discussing anthrax, further categorize them as relevant by month of data collection, and detect anthrax-related events.

OBJECTIVE: The objective of this study was to detect anthrax-related events and to determine the relevancy of the tweets and topics of discussion over twelve months of data collection.

METHODS: Machine learning techniques were used to determine what people were tweeting about anthrax. Data over time were graphed to see whether an event was detected (a three-fold spike in tweets). A machine learning classifier was created to categorize tweets as relevant. Relevant tweets by month were examined using a topic modeling approach to determine the topics of discussion over time and how events influence that discussion.

RESULTS: Over the twelve months of data collection, 204,008 tweets were collected. Logistic regression performed best for relevancy (precision=0.81, recall=0.81, and F1-score=0.80). Twenty-six topics were found relating to anthrax events, tweets that were highly re-tweeted, natural outbreaks, and news stories.

CONCLUSIONS: This study demonstrated that tweets relating to anthrax can be collected and analyzed over time to determine what people are discussing and to detect key anthrax-related events. Future studies can focus on opinion tweets only, use the methodology to study other terrorism events, or use the methodology to monitor for threats.
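
The event criterion mentioned in the abstract (a three-fold spike in tweets) could be sketched roughly as follows. The abstract does not say what baseline the spike is measured against, so the trailing-window mean used here, the function name `detect_spikes`, and the sample counts are all assumptions for illustration only.

```python
from statistics import mean


def detect_spikes(daily_counts, window=7, factor=3.0):
    """Return indices of days whose count is at least `factor` times
    the mean of the preceding `window` days (an assumed baseline)."""
    spikes = []
    for i in range(window, len(daily_counts)):
        baseline = mean(daily_counts[i - window:i])
        if baseline > 0 and daily_counts[i] >= factor * baseline:
            spikes.append(i)
    return spikes


# Hypothetical daily tweet counts; day 7 is an obvious burst.
counts = [100, 110, 95, 105, 98, 102, 100, 450, 120, 101]
print(detect_spikes(counts))
```

A trailing window keeps a single burst from immediately raising its own baseline; a different baseline (for example, a monthly median) would be an equally reasonable reading of the abstract.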

A Semi-Supervised Learning Approach for Tackling Twitter Spam Drift

International Journal of Computational Intelligence and Applications ◽

10.1142/s146902681950010x ◽

2019 ◽

Vol 18 (02) ◽

pp. 1950010 ◽

Cited By ~ 2

Author(s):

Niddal Imam ◽

Biju Issac ◽

Seibu Mary Jacob

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Research Community ◽

Machine Learning Techniques ◽

Spam Detection ◽

Learning Approach ◽

New Approach ◽

Detection Systems ◽

Learning Techniques ◽

Over Time

Twitter has changed the way people get information by allowing them to express their opinion and comments on the daily tweets. Unfortunately, due to the high popularity of Twitter, it has become very attractive to spammers. Unlike other types of spam, Twitter spam has become a serious issue in the last few years. The large number of users and the high amount of information being shared on Twitter play an important role in accelerating the spread of spam. In order to protect the users, Twitter and the research community have been developing different spam detection systems by applying different machine-learning techniques. However, a recent study showed that the current machine learning-based detection systems are not able to detect spam accurately because spam tweet characteristics vary over time. This issue is called “Twitter Spam Drift”. In this paper, a semi-supervised learning approach (SSLA) has been proposed to tackle this. The new approach uses the unlabeled data to learn the structure of the domain. Different experiments were performed on English and Arabic datasets to test and evaluate the proposed approach and the results show that the proposed SSLA can reduce the effect of Twitter spam drift and outperform the existing techniques.

Popular music lyrics and musicians’ gender over time: A computational approach

Psychology of Music ◽

10.1177/0305735619871602 ◽

2019 ◽

pp. 030573561987160 ◽

Cited By ~ 1

Author(s):

Manuel Anglada-Tort ◽

Amanda E Krause ◽

Adrian C North

Keyword(s):

Machine Learning ◽

Popular Music ◽

Machine Learning Techniques ◽

Mixed Effect ◽

Gender Distribution ◽

Learning Techniques ◽

Inflection Points ◽

The UK ◽

Music Lyrics ◽

Over Time

The present study investigated how the gender distribution of the United Kingdom’s most popular artists has changed over time and the extent to which these changes might relate to popular music lyrics. Using data mining and machine learning techniques, we analyzed all songs that reached the UK weekly top 5 sales charts from 1960 to 2015 (4,222 songs). DICTION software facilitated a computerized analysis of the lyrics, measuring a total of 36 lyrical variables per song. Results showed a significant inequality in gender representation on the charts. However, the presence of female musicians increased significantly over the time span. The most critical inflection points leading to changes in the prevalence of female musicians were in 1968, 1976, and 1984. Linear mixed-effect models showed that the total number of words and the use of self-reference in popular music lyrics changed significantly as a function of musicians’ gender distribution over time, and particularly around the three critical inflection points identified. Irrespective of gender, there was a significant trend toward increasing repetition in the lyrics over time. Results are discussed in terms of the potential advantages of using machine learning techniques to study naturalistic singles sales charts data.

Arabic tweets sentiment analysis – a hybrid scheme

Journal of Information Science ◽

10.1177/0165551515610513 ◽

2016 ◽

Vol 42 (6) ◽

pp. 782-797 ◽

Cited By ~ 42

Author(s):

Haifa K. Aldayel ◽

Aqil M. Azmi

Keyword(s):

Machine Learning ◽

Saudi Arabia ◽

Hybrid Approach ◽

Training Data ◽

Machine Learning Techniques ◽

Good Source ◽

Learning Classifier ◽

Learning Techniques ◽

Semantic Orientation ◽

F Measure

The fact that people freely express their opinions and ideas in no more than 140 characters makes Twitter one of the most prevalent social networking websites in the world. Because Twitter is popular in Saudi Arabia, we believe that tweets are a good source for capturing the public’s sentiment, especially since the country is in a fractious region. After going over the challenges and difficulties that Arabic tweets present, using Saudi Arabia as a basis, we propose our solution. A typical problem is the practice of tweeting in dialectal Arabic. Based on our observations, we recommend a hybrid approach that combines semantic orientation and machine learning techniques. In this approach, the lexical-based classifier labels the training data, a time-consuming task that is often done manually. The output of the lexical classifier is then used as training data for the SVM machine learning classifier. The experiments show that our hybrid approach improved the F-measure of the lexical classifier by 5.76% while the accuracy jumped by 16.41%, achieving an overall F-measure and accuracy of 84% and 84.01%, respectively.
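
The two-stage idea in this abstract (a lexicon-based classifier produces the labels, and a machine learning classifier is then trained on them) can be illustrated with a minimal sketch. Everything here is hypothetical: the toy English lexicon, the sample texts, and the function names are made up, and a small Naive Bayes model stands in for the paper's SVM so that the example needs only the standard library.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon; the paper's Arabic lexicon is not reproduced here.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}


def lexicon_label(text):
    """Semantic-orientation step: sum word polarities; None if undecided."""
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None


def train_naive_bayes(labelled):
    """Count words per class (Naive Bayes is a stand-in for the SVM)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labelled:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict(model, text):
    """Pick the class with the highest Laplace-smoothed log-probability."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for label in class_counts:
        lp = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            if w in vocab:
                lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best


unlabelled = [
    "great service love it",
    "good food great staff",
    "awful delay hate waiting",
    "bad service awful staff",
]
# Stage 1: the lexicon labels the data. Stage 2: a classifier learns from it.
auto_labelled = [(t, lexicon_label(t)) for t in unlabelled if lexicon_label(t)]
model = train_naive_bayes(auto_labelled)
print(lexicon_label("staff delay waiting"), predict(model, "staff delay waiting"))
```

The final line shows why the second stage can help: the lexicon alone cannot decide on a text with no lexicon words, while the trained classifier can still use co-occurring words it saw during training.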

Examining the Evolution of E-Government Development of Nations Through Machine Learning Techniques

Advances in Business Information Systems and Analytics - Handbook of Research on Applied Data Science and Artificial Intelligence in Business and Industry ◽

10.4018/978-1-7998-6985-6.ch004 ◽

2021 ◽

pp. 85-107

Author(s):

Niguissie Mengesha ◽

Anteneh Ayanso

Keyword(s):

Machine Learning ◽

Weighted Average ◽

Policy Implications ◽

Machine Learning Techniques ◽

Member States ◽

Theoretical Perspectives ◽

Learning Techniques ◽

Cluster Profiles ◽

Over Time ◽

Government Development

Several initiatives have tried to measure the efforts nations have made towards developing e-government. The UN E-Government Development Index (EGDI) is the only global report that ranks and classifies the UN Member States into four categories based on a weighted average of normalized scores on online service, telecom infrastructure, and human capital. The authors argue that the EGDI fails in showing the efforts of nations over time and in informing nations and policymakers as to what and from whom to draw policy lessons. Using the UN EGDI data from 2008 to 2020, they profile the UN Member States and show the relevance of machine learning techniques in addressing these issues. They examine the resulting cluster profiles in terms of theoretical perspectives in the literature and derive policy insights from the different groupings of nations and their evolution over time. Finally, they discuss the policy implications of the proposed methodology and the insights obtained.
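
As a rough illustration of the index described above (a weighted average of normalized scores on online service, telecom infrastructure, and human capital), the sketch below computes such a composite and its change between two survey years. The equal weights, the function name `egdi`, and the component scores are assumptions chosen for the example, not values taken from the chapter or the UN data.

```python
def egdi(online_service, telecom_infra, human_capital,
         weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of three normalized (0..1) component scores.

    Equal weights are an assumption made for this sketch.
    """
    components = (online_service, telecom_infra, human_capital)
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("component scores must be normalized to [0, 1]")
    return sum(w * c for w, c in zip(weights, components)) / sum(weights)


# Hypothetical component scores for one country in two survey years.
score_2008 = egdi(0.30, 0.20, 0.70)
score_2020 = egdi(0.60, 0.50, 0.76)
print(round(score_2008, 3), round(score_2020, 3),
      round(score_2020 - score_2008, 3))
```

Comparing the two scores for the same country is one simple way to look at effort over time, which a single-year ranking does not show.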

Ground penetrating radar for buried utilities detection and mapping: a review

Journal of Physics Conference Series ◽

10.1088/1742-6596/2107/1/012056 ◽

2021 ◽

Vol 2107 (1) ◽

pp. 012056

Author(s):

Hasimah Ali ◽

Nurul Syahirah Mohd Ideris ◽

A F Ahmad Zaidi ◽

M S Zanar Azalan ◽

T S Tengku Amran ◽

...

Keyword(s):

Machine Learning ◽

Image Processing ◽

Experimental Design ◽

Data Collection ◽

Ground Penetrating Radar ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Underground Utilities ◽

Ground Penetrating ◽

Non Destructive

This paper presents a review of Ground Penetrating Radar (GPR) detection and mapping of buried utilities, a technique that is widely used for non-destructive investigation and is efficient in use. The review covers experimental design in GPR data collection and surveying, pre-processing, and the extraction of hyperbolic features using image processing and machine learning techniques. Some of the issues and challenges facing GPR interpretation, particularly in extracting the hyperbolic patterns of underground utilities, are also highlighted.

Machine Learning Against Terrorism: How Big Data Collection and Analysis Influences the Privacy-Security Dilemma

Science and Engineering Ethics ◽

10.1007/s11948-020-00254-w ◽

2020 ◽

Vol 26 (6) ◽

pp. 2975-2984

Author(s):

H. M. Verhelst ◽

A. W. Stannat ◽

G. Mecacci

Keyword(s):

Machine Learning ◽

Data Collection ◽

Learning Algorithms ◽

Class Imbalance ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Security Dilemma ◽

Learning Techniques ◽

Bulk Data ◽

Mass Surveillance

Rapid advancements in machine learning techniques allow mass surveillance to be applied on larger scales and utilize more and more personal data. These developments demand reconsideration of the privacy-security dilemma, which describes the tradeoffs between national security interests and individual privacy concerns. By investigating mass surveillance techniques that use bulk data collection and machine learning algorithms, we show why these methods are unlikely to pinpoint terrorists in order to prevent attacks. The diverse characteristics of terrorist attacks, especially when considering lone-wolf terrorism, lead to irregular and isolated (digital) footprints. The irregularity of data affects the accuracy of machine learning algorithms and the mass surveillance that depends on them, which can be explained by three kinds of known problems encountered in machine learning theory: class imbalance, the curse of dimensionality, and spurious correlations. Proponents of mass surveillance often invoke the distinction between collecting data and metadata, in which the latter is understood as a lesser breach of privacy. Their arguments commonly overlook the ambiguity in the definitions of data and metadata and ignore the ability of machine learning techniques to infer the former from the latter. Given the sparsity of datasets used for machine learning in counterterrorism and the privacy risks attendant with bulk data collection, policymakers and other relevant stakeholders should critically re-evaluate the likelihood of success of the algorithms and the collection of data on which they depend.
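
The class imbalance problem named in this abstract can be made concrete with a small worked calculation. The population size, the number of true targets, and the classifier error rates below are hypothetical round numbers chosen for illustration; they do not come from the article.

```python
def expected_flags(population, true_targets, sensitivity, specificity):
    """Expected true positives, false positives, and precision when a
    classifier screens a whole population for a very rare class."""
    true_pos = true_targets * sensitivity
    false_pos = (population - true_targets) * (1 - specificity)
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision


# Hypothetical scenario: 100 targets among 10 million people,
# screened by a classifier that is 99% sensitive and 99% specific.
tp, fp, precision = expected_flags(10_000_000, 100, 0.99, 0.99)
print(f"true positives ~ {tp:.0f}, false positives ~ {fp:.0f}, "
      f"precision ~ {precision:.4%}")
```

Under these assumed numbers, roughly a thousand people are flagged in error for every real target, even though the classifier is right 99% of the time on each individual.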

Machine Learning Model for Imbalanced Cholera Dataset in Tanzania

The Scientific World Journal ◽

10.1155/2019/9397578 ◽

2019 ◽

Vol 2019 ◽

pp. 1-12 ◽

Cited By ~ 2

Author(s):

Judith Leo ◽

Edith Luhanga ◽

Kisangiri Michael

Keyword(s):

Climate Change ◽

Machine Learning ◽

Health Care ◽

Data Collection ◽

Learning Strategies ◽

Principal Component ◽

Machine Learning Techniques ◽

Quality Data ◽

Rank Test ◽

Learning Techniques

Cholera epidemics have remained a public threat throughout history, affecting vulnerable populations living with unreliable water and substandard sanitary conditions. Various studies have observed that the occurrence of cholera is strongly linked to environmental factors such as climate change and geographical location. Climate change has been strongly linked to the seasonal occurrence and wide spread of cholera through the creation of weather patterns that favor the disease’s transmission, infection, and the growth of Vibrio cholerae, which causes the disease. Over the past decades, there have been great achievements in developing epidemic models for the proper prediction of cholera. However, the integration of weather variables and the use of machine learning techniques have not been explicitly deployed in modeling cholera epidemics in Tanzania, owing to challenges in its datasets such as imbalanced data and missing information. This paper explores the use of machine learning techniques to model cholera epidemics with linkage to seasonal weather changes while overcoming the data imbalance problem. The Adaptive Synthetic Sampling Approach (ADASYN) and Principal Component Analysis (PCA) were used to restore the sampling balance and reduce the dimensionality of the dataset. In addition, sensitivity, specificity, and balanced-accuracy metrics were used to evaluate the performance of the seven models. Based on the results of the Wilcoxon signed-rank test and the features of the models, the XGBoost classifier was selected as the best model for the study. The overall results improved our understanding of the significant role of machine learning strategies in health-care data. However, the study could not be treated as a time series problem because of bias in the data collection. The study recommends a review of health-care systems in order to facilitate quality data collection and the deployment of machine learning techniques.
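
The three evaluation metrics named in the abstract (sensitivity, specificity, and balanced accuracy) are commonly computed from confusion-matrix counts as sketched below. The function name `imbalance_metrics` and the label vectors are hypothetical, and the sketch assumes both classes occur in `y_true`.

```python
def imbalance_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, balanced accuracy).

    Assumes y_true contains at least one positive and one negative label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)   # share of actual positives that are found
    specificity = tn / (tn + fp)   # share of actual negatives left unflagged
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Hypothetical imbalanced labels: 2 positive cases out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
sens, spec, bal_acc = imbalance_metrics(y_true, y_pred)
plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(sens, spec, bal_acc, plain_acc)
```

On these made-up labels plain accuracy looks respectable because the negative class dominates, while balanced accuracy is pulled down by the missed positive case, which is why such metrics are preferred for imbalanced data.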

Geographic Field Data Collection: Using machine learning techniques to verify minimum data requirements for the classification task.

Journal of Geography (Chigaku Zasshi) ◽

10.5026/jgeography.105.5_636 ◽

1996 ◽

Vol 105 (5) ◽

pp. 636-648

Author(s):

S. D. KIRKBY ◽

P. W. EKLUND

Keyword(s):

Machine Learning ◽

Data Collection ◽

Field Data ◽

Machine Learning Techniques ◽

Classification Task ◽

Minimum Data ◽

Learning Techniques ◽

Field Data Collection ◽

Data Requirements

Myocardial Infarction Prediction Using Hybrid Machine Learning Techniques

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i3.1716 ◽

2021 ◽

Vol 12 (3) ◽

pp. 4251-4260

Author(s):

Vaddi Niranjan Reddy et al.

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Supervised Learning ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Training Dataset ◽

Learning Classifier ◽

Learning Techniques ◽

Accuracy Performance ◽

Graphical Presentation

Myocardial infarction prediction is an important task in today’s health care domain, and predicting cardiovascular disease is a critical challenge in clinical data analysis. It is difficult for physicians to predict myocardial infarction from huge volumes of health records. To overcome this complexity, an automatic heart disease prediction system is needed to notify the patient and support recovery from the disease. To build such a system, we use machine learning techniques to perform myocardial infarction prediction. Machine learning techniques can be split into multiple types, such as unsupervised and supervised learning classifiers. Supervised learning techniques work with structured data, which makes them suitable for implementing these classifiers. The system therefore uses the supervised learning techniques KNN, RF, NN, DT, NB, and SVM. To predict myocardial infarction, the system uses a training dataset obtained from the UCI ML repository. The system also compares the accuracy of the various machine learning algorithms and presents the results graphically. This makes it possible to assess the risk of the disease in its early stages and to try to save the patient without any loss.

Classification of Autism Spectrum Disorder Data using Machine Learning Techniques

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1114.0886s19 ◽

2019 ◽

Vol 8 (6S) ◽

pp. 565-569

Keyword(s):

Machine Learning ◽

Autism Spectrum ◽

Machine Learning Techniques ◽

Data Sets ◽

Human Communication ◽

Data Set ◽

Complex Disorder ◽

Learning Classifier ◽

Learning Techniques

Autism is a neuro-developmental disability that affects human communication and behaviour. It is a condition that is associated with the complex disorder of the brain which can lead to significant changes in social interaction and behaviour of a human being. Machine learning techniques are being applied to autism data sets to discover useful hidden patterns and to construct predictive models for detecting its risk. This paper focuses on finding the best machine learning classifier on the UCI autism disorder data set for identifying the main factors associated with autism. The results obtained using Multilayer Perceptron, Naive Bayes Classifier and Bayesian Network were compared with J48 Decision tree algorithm. The superiority of Multilayer Perceptron over the well-known classification algorithms in predicting the autism risk is established in this paper.
