Analysis of Harassment Complaints to Detect Witness Intervention by Machine Learning and Soft Computing Techniques

This research is aimed to analyze textual descriptions of harassment situations collected anonymously by the Hollaback! project. Hollaback! is an international movement created to end harassment in all of its forms. Its goal is to collect stories of harassment through the web and a free app all around the world to elevate victims’ individual voices to find a societal solution. Hollaback! pretends to analyze the impact of a bystander during a harassment in order to launch a public awareness-raising campaign to equip everyday people with tools to undo harassment. Thus, the analysis presented in this paper is a first step in Hollaback!’s purpose: the automatic detection of a witness intervention inferred from the victim’s own report. In a first step, natural language processing techniques were used to analyze the victim’s free-text descriptions. For this part, we used the whole dataset with all its countries and locations. In addition, classification models, based on machine learning and soft computing techniques, were developed in the second part of this study to classify the descriptions into those that have bystander presence and those that do not. For this machine learning part, we selected the city of Madrid as an example, in order to establish a criterion of the witness behavior procedure.

Download Full-text

Whether the Weather Will Help Us Weather the COVID-19 Pandemic: Using Machine Learning to Measure Twitter Users' Perceptions

10.1101/2020.07.29.20164814 ◽

2020 ◽

Author(s):

Marichi Gupta ◽

Adity Bansal ◽

Bhav Jain ◽

Jillian Rochelle ◽

Atharv Oak ◽

...

Keyword(s):

Public Health ◽

Machine Learning ◽

Language Processing ◽

Scientific Evidence ◽

The Public ◽

Potential Impact ◽

Twitter Users ◽

Processing Techniques ◽

The Impact ◽

Weather’S Impact

Objective: The potential ability for weather to affect SARS-CoV-2 transmission has been an area of controversial discussion during the COVID-19 pandemic. Individuals' perceptions of the impact of weather can inform their adherence to public health guidelines; however, there is no measure of their perceptions. We quantified Twitter users' perceptions of the effect of weather and analyzed how they evolved with respect to real-world events and time. Materials and Methods: We collected 166,005 tweets posted between January 23 and June 22, 2020 and employed machine learning/natural language processing techniques to filter for relevant tweets, classify them by the type of effect they claimed, and identify topics of discussion. Results: We identified 28,555 relevant tweets and estimate that 40.4% indicate uncertainty about weather's impact, 33.5% indicate no effect, and 26.1% indicate some effect. We tracked changes in these proportions over time. Topic modeling revealed major latent areas of discussion. Discussion: There is no consensus among the public for weather's potential impact. Earlier months were characterized by tweets that were uncertain of weather's effect or claimed no effect; later, the portion of tweets claiming some effect of weather increased. Tweets claiming no effect of weather comprised the largest class by June. Major topics of discussion included comparisons to influenza's seasonality, President Trump's comments on weather's effect, and social distancing. Conclusion: There is a major gap between scientific evidence and public opinion of weather's impacts on COVID-19. We provide evidence of public's misconceptions and topics of discussion, which can inform public health communications.

Download Full-text

Text pre-processing of multilingual for sentiment analysis based on social network data

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v12i1.pp776-784 ◽

2022 ◽

Vol 12 (1) ◽

pp. 776

Author(s):

Neha Garg ◽

Kamlesh Sharma

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Text Processing ◽

Problem Definition ◽

Domain Specific ◽

Twitter Data ◽

Stop Word ◽

Processing Techniques ◽

The Impact

<span>Sentiment analysis (SA) is an enduring area for research especially in the field of text analysis. Text pre-processing is an important aspect to perform SA accurately. This paper presents a text processing model for SA, using natural language processing techniques for twitter data. The basic phases for machine learning are text collection, text cleaning, pre-processing, feature extractions in a text and then categorize the data according to the SA techniques. Keeping the focus on twitter data, the data is extracted in domain specific manner. In data cleaning phase, noisy data, missing data, punctuation, tags and emoticons have been considered. For pre-processing, tokenization is performed which is followed by stop word removal (SWR). The proposed article provides an insight of the techniques, that are used for text pre-processing, the impact of their presence on the dataset. The accuracy of classification techniques has been improved after applying text pre-processing and dimensionality has been reduced. The proposed corpus can be utilized in the area of market analysis, customer behaviour, polling analysis, and brand monitoring. The text pre-processing process can serve as the baseline to apply predictive analysis, machine learning and deep learning algorithms which can be extended according to problem definition.</span>

Download Full-text

A Natural Language Processing Approach to Measuring Treatment Adherence and Consistency Using Semantic Similarity

AERA Open ◽

10.1177/23328584211028615 ◽

2021 ◽

Vol 7 ◽

pp. 233285842110286

Author(s):

Kylie L. Anglin ◽

Vivian C. Wong ◽

Arielle Boguslav

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Semantic Similarity ◽

Language Processing ◽

Intervention Implementation ◽

Proof Of Concept ◽

Coaching Intervention ◽

Processing Techniques ◽

Teacher Coaching ◽

The Impact

Though there is widespread recognition of the importance of implementation research, evaluators often face intense logistical, budgetary, and methodological challenges in their efforts to assess intervention implementation in the field. This article proposes a set of natural language processing techniques called semantic similarity as an innovative and scalable method of measuring implementation constructs. Semantic similarity methods are an automated approach to quantifying the similarity between texts. By applying semantic similarity to transcripts of intervention sessions, researchers can use the method to determine whether an intervention was delivered with adherence to a structured protocol, and the extent to which an intervention was replicated with consistency across sessions, sites, and studies. This article provides an overview of semantic similarity methods, describes their application within the context of educational evaluations, and provides a proof of concept using an experimental study of the impact of a standardized teacher coaching intervention.

Download Full-text

Applying natural language processing and machine learning techniques to patient experience feedback: a systematic review

BMJ Health & Care Informatics ◽

10.1136/bmjhci-2020-100262 ◽

2021 ◽

Vol 28 (1) ◽

pp. e100262

Author(s):

Mustafa Khanbhai ◽

Patrick Anyadi ◽

Joshua Symons ◽

Kelsey Flott ◽

Ara Darzi ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Patient Experience ◽

Language Processing ◽

Performance Metrics ◽

Free Text ◽

Patient Feedback

ObjectivesUnstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations.To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data.MethodsDatabases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded.ResultsNineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers.ConclusionNLP and ML have emerged as an important tool for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.

Download Full-text

Evaluation of Urban-Scale Building Energy-Use Models and Tools—Application for the City of Fribourg, Switzerland

Sustainability ◽

10.3390/su13041595 ◽

2021 ◽

Vol 13 (4) ◽

pp. 1595

Author(s):

Valeria Todeschi ◽

Roberto Boghetti ◽

Jérôme H. Kämpf ◽

Guglielmina Mutani

Keyword(s):

Machine Learning ◽

Energy Use ◽

Residential Buildings ◽

Building Energy ◽

Energy Performance ◽

Space Heating ◽

Engineering Model ◽

Building Energy Use ◽

The Impact ◽

The City

Building energy-use models and tools can simulate and represent the distribution of energy consumption of buildings located in an urban area. The aim of these models is to simulate the energy performance of buildings at multiple temporal and spatial scales, taking into account both the building shape and the surrounding urban context. This paper investigates existing models by simulating the hourly space heating consumption of residential buildings in an urban environment. Existing bottom-up urban-energy models were applied to the city of Fribourg in order to evaluate the accuracy and flexibility of energy simulations. Two common energy-use models—a machine learning model and a GIS-based engineering model—were compared and evaluated against anonymized monitoring data. The study shows that the simulations were quite precise with an annual mean absolute percentage error of 12.8 and 19.3% for the machine learning and the GIS-based engineering model, respectively, on residential buildings built in different periods of construction. Moreover, a sensitivity analysis using the Morris method was carried out on the GIS-based engineering model in order to assess the impact of input variables on space heating consumption and to identify possible optimization opportunities of the existing model.

Download Full-text

Evaluating Cultural Impact in Discursive Space through Digital Footprints

Sustainability ◽

10.3390/su13074043 ◽

2021 ◽

Vol 13 (7) ◽

pp. 4043 ◽

Cited By ~ 1

Author(s):

Jesús López Baeza ◽

Jens Bley ◽

Kay Hartkopf ◽

Martin Niggemann ◽

James Arias ◽

...

Keyword(s):

Social Media ◽

Language Processing ◽

Social Activity ◽

Test Site ◽

Cultural Discourse ◽

Discursive Space ◽

Topic Identification ◽

Small Test ◽

The Impact ◽

The City

The research presented in this paper describes an evaluation of the impact of spatial interventions in public spaces, measured by social media data. This contribution aims at observing the way a spatial intervention in an urban location can affect what people talk about on social media. The test site for our research is Domplatz in the center of Hamburg, Germany. In recent years, several actions have taken place there, intending to attract social activity and spotlight the square as a landmark of cultural discourse in the city of Hamburg. To evaluate the impact of this strategy, textual data from the social networks Twitter and Instagram (i.e., tweets and image captions) are collected and analyzed using Natural Language Processing intelligence. These analyses identify and track the cultural topic or “people talking about culture” in the city of Hamburg. We observe the evolution of the cultural topic, and its potential correspondence in levels of activity, with certain intervention actions carried out in Domplatz. Two analytic methods of topic clustering and tracking are tested. The results show a successful topic identification and tracking with both methods, the second one being more accurate. This means that it is possible to isolate and observe the evolution of the city’s cultural discourse using NLP. However, it is shown that the effects of spatial interventions in our small test square have a limited local scale, rather than a city-wide relevance.

Download Full-text

Citizen Participation and Machine Learning for a Better Democracy

Digital Government: Research and Practice (DGOV) ◽

10.1145/3452118 ◽

2021 ◽

Author(s):

Robert Procter ◽

Miguel Arana-Catania ◽

Felix-Anselm van Lier ◽

Nataliya Tkachenko ◽

Yulan He ◽

...

Keyword(s):

Machine Learning ◽

Citizen Participation ◽

Language Processing ◽

Information Overload ◽

City Council ◽

Decision Making Processes ◽

Development Goals ◽

Democratic Systems ◽

The City

The development of democratic systems is a crucial task as confirmed by its selection as one of the Millennium Sustainable Development Goals by the United Nations. In this article, we report on the progress of a project that aims to address barriers, one of which is information overload, to achieving effective direct citizen participation in democratic decision-making processes. The main objectives are to explore if the application of Natural Language Processing (NLP) and machine learning can improve citizens? experience of digital citizen participation platforms. Taking as a case study the ?Decide Madrid? Consul platform, which enables citizens to post proposals for policies they would like to see adopted by the city council, we used NLP and machine learning to provide new ways to (a) suggest to citizens proposals they might wish to support; (b) group citizens by interests so that they can more easily interact with each other; (c) summarise comments posted in response to proposals; (d) assist citizens in aggregating and developing proposals. Evaluation of the results confirms that NLP and machine learning have a role to play in addressing some of the barriers users of platforms such as Consul currently experience.

Download Full-text

Application of natural language processing methods to extract coded data from administrative data held in the Scottish Prescribing Information System

International Journal for Population Data Science ◽

10.23889/ijpds.v1i1.263 ◽

2017 ◽

Vol 1 (1) ◽

Author(s):

Clifford Nangle ◽

Stuart McTaggart ◽

Margaret MacLeod ◽

Jackie Caldwell ◽

Marion Bennie

Keyword(s):

Information System ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Drug Exposure ◽

Drug Dose ◽

Free Text ◽

Wide Range ◽

The Impact ◽

Prescribing Information

ABSTRACT ObjectivesThe Prescribing Information System (PIS) datamart, hosted by NHS National Services Scotland receives around 90 million electronic prescription messages per year from GP practices across Scotland. Prescription messages contain information including drug name, quantity and strength stored as coded, machine readable, data while prescription dose instructions are unstructured free text and difficult to interpret and analyse in volume. The aim, using Natural Language Processing (NLP), was to extract drug dose amount, unit and frequency metadata from freely typed text in dose instructions to support calculating the intended number of days’ treatment. This then allows comparison with actual prescription frequency, treatment adherence and the impact upon prescribing safety and effectiveness. ApproachAn NLP algorithm was developed using the Ciao implementation of Prolog to extract dose amount, unit and frequency metadata from dose instructions held in the PIS datamart for drugs used in the treatment of gastrointestinal, cardiovascular and respiratory disease. Accuracy estimates were obtained by randomly sampling 0.1% of the distinct dose instructions from source records, comparing these with metadata extracted by the algorithm and an iterative approach was used to modify the algorithm to increase accuracy and coverage. ResultsThe NLP algorithm was applied to 39,943,465 prescription instructions issued in 2014, consisting of 575,340 distinct dose instructions. For drugs used in the gastrointestinal, cardiovascular and respiratory systems (i.e. chapters 1, 2 and 3 of the British National Formulary (BNF)) the NLP algorithm successfully extracted drug dose amount, unit and frequency metadata from 95.1%, 98.5% and 97.4% of prescriptions respectively. However, instructions containing terms such as ‘as directed’ or ‘as required’ reduce the usability of the metadata by making it difficult to calculate the total dose intended for a specific time period as 7.9%, 0.9% and 27.9% of dose instructions contained terms meaning ‘as required’ while 3.2%, 3.7% and 4.0% contained terms meaning ‘as directed’, for drugs used in BNF chapters 1, 2 and 3 respectively. ConclusionThe NLP algorithm developed can extract dose, unit and frequency metadata from text found in prescriptions issued to treat a wide range of conditions and this information may be used to support calculating treatment durations, medicines adherence and cumulative drug exposure. The presence of terms such as ‘as required’ and ‘as directed’ has a negative impact on the usability of the metadata and further work is required to determine the level of impact this has on calculating treatment durations and cumulative drug exposure.

Download Full-text

Application of Machine Learning with Impedance Based Techniques for Structural Health Monitoring of Civil Infrastructure

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f1237.0486s419 ◽

2019 ◽

Vol 8 (6S4) ◽

pp. 1139-1148

Keyword(s):

Machine Learning ◽

Signal Processing ◽

Structural Health Monitoring ◽

Damage Detection ◽

Soft Computing ◽

Health Monitoring ◽

Multiple Cracks ◽

Civil Infrastructure ◽

Structural Health ◽

Soft Computing Techniques

Increased attentiveness on the environmental and effects of aging, deterioration and extreme events on civil infrastructure has created the need for more advanced damage detection tools and structural health monitoring (SHM). Today, these tasks are performed by signal processing, visual inspection techniques along with traditional well known impedance based health monitoring EMI technique. New research areas have been explored that improves damage detection at incipient stage and when the damage is substantial. Addressing these issues at early age prevents catastrophe situation for the safety of human lives. To improve the existing damage detection newly developed techniques in conjugation with EMI innovative new sensors, signal processing and soft computing techniques are discussed in details this paper. The advanced techniques (soft computing, signal processing, visual based, embedded IOT) are employed as a global method in prediction, to identify, locate, optimize, the damage area and deterioration. The amount and severity, multiple cracks on civil infrastructure like concrete and RC structures (beams and bridges) using above techniques along with EMI technique and use of PZT transducer. In addition to survey advanced innovative signal processing, machine learning techniques civil infrastructure connected to IOT that can make infrastructure smart and increases its efficiency that is aimed at socioeconomic, environmental and sustainable development.

Download Full-text

Quantifying changes in bicycle volumes using crowdsourced data

Environment and Planning B Urban Analytics and City Science ◽

10.1177/23998083211066103 ◽

2022 ◽

pp. 239980832110661

Author(s):

Ali Al-Ramini ◽

Mohammad A Takallou ◽

Daniel P Piatkowski ◽

Fadi Alsaleem

Keyword(s):

Machine Learning ◽

The United States ◽

Crowdsourced Data ◽

Machine Learning Approach ◽

Bicycle Infrastructure ◽

The Difference ◽

Infrastructure Investments ◽

Using Data ◽

The Impact ◽

The City

Most cities in the United States lack comprehensive or connected bicycle infrastructure; therefore, inexpensive and easy-to-implement solutions for connecting existing bicycle infrastructure are increasingly being employed. Signage is one of the promising solutions. However, the necessary data for evaluating its effect on cycling ridership is lacking. To overcome this challenge, this study tests the potential of using readily-available crowdsourced data in concert with machine-learning methods to provide insight into signage intervention effectiveness. We do this by assessing a natural experiment to identify the potential effects of adding or replacing signage within existing bicycle infrastructure in 2019 in the city of Omaha, Nebraska. Specifically, we first visually compare cycling traffic changes in 2019 to those from the previous two years (2017–2018) using data extracted from the Strava fitness app. Then, we use a new three-step machine-learning approach to quantify the impact of signage while controlling for weather, demographics, and street characteristics. The steps are as follows: Step 1 (modeling and validation) build and train a model from the available 2017 crowdsourced data (i.e., Strava, Census, and weather) that accurately predicts the cycling traffic data for any street within the study area in 2018; Step 2 (prediction) use the model from Step 1 to predict bicycle traffic in 2019 while assuming new signage was not added; Step 3 (impact evaluation) use the difference in prediction from actual traffic in 2019 as evidence of the likely impact of signage. While our work does not demonstrate causality, it does demonstrate an inexpensive method, using readily-available data, to identify changing trends in bicycling over the same time that new infrastructure investments are being added.

Download Full-text