Identifying Victims of Human Sex Trafficking in Online Ads

Author(s):  
Jessica Whitney ◽  
Marisa Hultgren ◽  
Murray Eugene Jennex ◽  
Aaron Elkins ◽  
Eric Frost

Social media and the interactive Web have enabled human traffickers to lure victims and then sell them faster and in greater safety than ever before. However, these same tools have also enabled investigators in their search for victims and criminals. The authors used a system development action research methodology to create and apply a prototype designed to identify victims of human sex trafficking by analyzing online ads. The prototype used a knowledge management approach, generating actionable intelligence by applying a set of strong filters, based on an ontology, to identify potential victims. The authors applied the prototype to a data set generated from online ads from southern California and used the results to build a revised prototype that incorporated machine learning and text mining enhancements. An unexpected outcome of the second data set was the discovery of the use of emojis, which led to an expanded ontology. The final prototype used the expanded ontology to identify potential victims. The results of applying the prototypes suggest a viable approach to identifying victims of human sex trafficking in online ads.
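
The filtering idea lends itself to a compact illustration. Below is a minimal Python sketch of how an ontology-driven "strong filter" over ad text might look; the categories, indicator terms, and emoji are invented placeholders, not the ontology the authors built, and an ad is flagged only when it matches multiple categories.

```python
# Minimal sketch of an ontology-driven "strong filter" over ad text.
# The categories, indicator terms, and emoji below are invented
# placeholders, not the ontology the authors built.
ONTOLOGY = {
    "age_flags": ["young", "new in town"],
    "movement_flags": ["just arrived", "visiting"],
    "emoji_flags": ["\U0001F338"],  # cherry blossom, as an example symbol
}

def flag_ad(text: str) -> list[str]:
    """Return the ontology categories whose indicator terms appear in the ad."""
    lowered = text.lower()
    return [category for category, terms in ONTOLOGY.items()
            if any(term in lowered for term in terms)]

ads = ["Young and new in town, visiting for the weekend \U0001F338"]
for ad in ads:
    hits = flag_ad(ad)
    if len(hits) >= 2:  # require multiple categories, keeping the filter "strong"
        print(ad, "->", hits)
```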



2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yahya Albalawi ◽  
Jim Buckley ◽  
Nikola S. Nikolov

This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processing techniques applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the traditional machine learning classifiers KNN, SVM, Multinomial NB, and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another for testing the generality of our models only. Our results point to the conclusion that only four of the 26 pre-processing techniques improve classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier, with an F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model, with an F1 score of 75.2% and an accuracy of 90.7%; Mazajak CBOW with the same architecture achieved a higher F1 score of 90.8% but a lower accuracy of 70.89%. Our results also show that the performance of the best of the traditional classifiers we trained is comparable to that of the deep learning methods on the first data set, but significantly worse on the second.
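
As an illustration of the traditional-classifier baseline the paper describes, here is a minimal scikit-learn sketch of a bag-of-words pipeline. The toy English tweets and labels are invented stand-ins; the paper's actual experiments use Arabic tweets, the 26 pre-processing variants, and (for the deep models) pre-trained Mazajak embeddings.

```python
# Sketch of a traditional-classifier baseline: a bag-of-words
# Logistic Regression over (toy) tweets. The data below is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["wash your hands to avoid flu", "great match last night",
          "new vaccine trial results announced", "traffic is terrible today"]
labels = [1, 0, 1, 0]  # 1 = health-related, 0 = not

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(tweets, labels)
print(clf.predict(["flu symptoms and treatment"]))  # expected: [1]
```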


Author(s):  
A Salman Avestimehr ◽  
Seyed Mohammadreza Mousavi Kalan ◽  
Mahdi Soltanolkotabi

Dealing with the sheer size and complexity of today's massive data sets requires computational platforms that can analyze data in a parallelized and distributed fashion. A major bottleneck in such modern distributed computing environments is that some of the worker nodes may run slowly. These nodes, known as stragglers, can significantly slow down computation, as the slowest node may dictate the overall computation time. A recent computational framework, called encoded optimization, creates redundancy in the data to mitigate the effect of stragglers. In this paper, we develop a novel mathematical understanding of this framework, demonstrating its effectiveness in much broader settings than previously understood. We also analyze the convergence behavior of iterative encoded optimization algorithms, allowing us to characterize fundamental trade-offs between convergence rate, data set size, accuracy, computational load (or data redundancy), and straggler tolerance in this framework.
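
The redundancy idea can be illustrated with a toy replication scheme (a simplification of the paper's encoded optimization framework, which codes the data more generally). Each of four workers holds two of four data blocks, so a full least-squares gradient is recoverable from any responding subset of workers that jointly covers all blocks:

```python
# Toy straggler mitigation via 2x data replication: the master recovers
# the full gradient from any responding workers that cover all blocks.
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(8, 3)), rng.normal(size=8)
x = np.zeros(3)
blocks = np.split(np.arange(8), 4)             # 4 data blocks of 2 rows each
assignment = [(0, 1), (1, 2), (2, 3), (3, 0)]  # blocks held by each worker

def partial_grad(block):
    rows = blocks[block]
    return A[rows].T @ (A[rows] @ x - b[rows])  # least-squares gradient piece

# Suppose worker 2 straggles; the rest still cover blocks {0, 1, 2, 3}.
responding = [0, 1, 3]
covered, grad = set(), np.zeros(3)
for w in responding:
    for blk in assignment[w]:
        if blk not in covered:  # count each block's gradient exactly once
            covered.add(blk)
            grad += partial_grad(blk)
assert covered == {0, 1, 2, 3}
print(grad)  # the full-batch gradient, despite the straggler
```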


2012 ◽  
Vol 7 (1) ◽  
pp. 174-197 ◽  
Author(s):  
Heather Small ◽  
Kristine Kasianovitz ◽  
Ronald Blanford ◽  
Ina Celaya

Social networking sites and other social media have enabled new forms of collaborative communication and participation for users, and created additional value as rich data sets for research. Research based on accessing, mining, and analyzing social media data has risen steadily over the last several years and is increasingly multidisciplinary; researchers from the social sciences, humanities, computer science and other domains have used social media data as the basis of their studies. The broad use of this form of data has implications for how curators address preservation, access and reuse for an audience with divergent disciplinary norms related to privacy, ownership, authenticity and reliability.

In this paper, we explore how the characteristics of the Twitter platform, coupled with an ambiguous and evolving understanding of privacy in networked communication, and divergent disciplinary understandings of the resulting data, combine to create complex issues for curators trying to ensure broad-based and ethical reuse of Twitter data. We provide a case study of a specific data set to illustrate how data curators can engage with the topics and questions raised in the paper. While some initial suggestions are offered to librarians and other information professionals who are beginning to receive social media data from researchers, our larger goal is to stimulate discussion and prompt additional research on the curation and preservation of social media data.


2015 ◽  
Vol 115 (4) ◽  
pp. 612-624 ◽  
Author(s):  
Eugene Ch'ng

Purpose – The purpose of this paper is to explore the formation, maintenance and disintegration of a fringe Twitter community in order to understand whether offline community structure applies to online communities. Design/methodology/approach – The research adopted Big Data methodological approaches in tracking user-generated content over a series of months and mapped online Twitter interactions as a multimodal, longitudinal "social information landscape". Centrality measures were employed to gauge the importance of particular user nodes within the complete network, and time-series analysis was used to track ego centralities in order to see whether this particular online community was maintained by specific egos. Findings – The case study shows that communities with distinct boundaries and memberships can form and exist within Twitter's limited user content and sequential policies, which, unlike other social media services, do not support formal groups, demonstrating the resilience of disparate online users when their ideology overcomes social media limitations. Analysis in this paper using social network approaches also reveals that communities are formed and maintained from the bottom up. Research limitations/implications – The research is based on a particular data set collected within a specific time and space. However, due to the rapid, polarising group behaviour, growth, disintegration and decline of the online community, the data set presents a "laboratory" case with which many other online communities can be compared. It is highly possible that the case can be generalised to a broader range of communities, and from it online community theories can be proved or disproved. Practical implications – The paper showed that a particular group of egos with high activity, if removed, could entirely break the cohesiveness of the community. Conversely, strengthening such egos will reinforce the community's strength. The questions mooted within the paper and the methodology outlined can potentially be applied in a variety of social science research areas. The contribution to the understanding of a complex social and political arena, as outlined in the paper, is a key example of such an application within an increasingly strategic research area – and this will surely be applied and developed further by the computer science and security community. Originality/value – The majority of research covering these domains has not focused on communities that are multimodal and longitudinal. This is mainly due to the challenges associated with the collection and analysis of continuous data sets that have high volume and velocity. Such data sets therefore remain unexploited with regard to cyber-community research.
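
As a sketch of the centrality machinery described above, the following snippet builds a small interaction graph with NetworkX and ranks egos by degree and betweenness centrality; the edge list is invented for illustration.

```python
# Sketch of the centrality analysis: build an interaction graph from
# (user, user) mention pairs and rank egos. The edge list is invented.
import networkx as nx

mentions = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"),
            ("dan", "ann"), ("eve", "ann")]
G = nx.Graph(mentions)

degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)
# Egos like "ann" that top both rankings are candidates for the
# community-sustaining accounts the paper tracks over time.
print(sorted(degree, key=degree.get, reverse=True))
```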


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Joshua M. Dempster ◽  
Clare Pacini ◽  
Sasha Pantel ◽  
Fiona M. Behan ◽  
Thomas Green ◽  
...  

Genome-scale CRISPR-Cas9 viability screens performed in cancer cell lines provide a systematic approach to identifying cancer dependencies and new therapeutic targets. As multiple large-scale screens become available, a formal assessment of the reproducibility of these experiments becomes necessary. We analyze data from recently published pan-cancer CRISPR-Cas9 screens performed at the Broad and Sanger Institutes. Despite significant differences in experimental protocols and reagents, we find that the screen results are highly concordant across multiple metrics, with both common and specific dependencies jointly identified across the two studies. Furthermore, robust biomarkers of gene dependency found in one data set are recovered in the other. Through further analysis and replication experiments at each institute, we show that batch effects are driven principally by two key experimental parameters: the reagent library and the assay length. These results indicate that the Broad and Sanger CRISPR-Cas9 viability screens yield robust and reproducible findings.
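
One simple concordance check of the kind such comparisons rely on can be sketched as a correlation of per-gene dependency scores for the same cell line across the two screens; the scores below are invented placeholders, not values from the Broad or Sanger data.

```python
# Sketch of a concordance check: correlate per-gene dependency scores
# across two screens. All numbers below are invented for illustration.
import pandas as pd

broad = pd.Series({"KRAS": -1.8, "TP53": -0.1, "MYC": -1.2, "BRAF": -0.4})
sanger = pd.Series({"KRAS": -1.6, "TP53": 0.0, "MYC": -1.1, "BRAF": -0.5})

print(broad.corr(sanger, method="pearson"))   # agreement of effect sizes
print(broad.corr(sanger, method="spearman"))  # agreement of rankings
```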


2021 ◽  
Vol 11 (3) ◽  
pp. 1294
Author(s):  
Krzysztof Fiok ◽  
Waldemar Karwowski ◽  
Edgar Gutierrez ◽  
Tameika Liciaga ◽  
Alessandro Belmonte ◽  
...  

Volcanoes of hate and disrespect erupt in societies, often with fatal consequences. To address this negative phenomenon, scientists have worked to understand its roots and its expression in language, described as hate speech. As a result, it is now possible to automatically detect and counter hate speech in textual data that spreads rapidly, for example in social media. Recently, however, another approach to tackling the roots of disrespect was proposed, based on the concept of promoting positive behavior instead of only penalizing hate and disrespect. In our study, we followed this approach and discovered that it is hard to find any textual data sets or studies discussing the automatic detection of respectful behaviors and their textual expressions. We therefore decided to contribute probably one of the first human-annotated data sets that allows for supervised training of text analysis methods for the automatic detection of respectful messages. By choosing a data set of tweets that already possessed sentiment annotations, we were also able to discuss the correlation between sentiment and respect. Finally, we provide a comparison of recent machine and deep learning text analysis methods and their performance, which allowed us to demonstrate that automatic detection of respectful messages in social media is feasible.
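
The sentiment-respect correlation mentioned above can be sketched with a simple association measure over paired binary annotations; the labels below are invented for illustration.

```python
# Sketch of a sentiment/respect correlation check over paired binary
# annotations. The labels are invented for illustration.
from scipy.stats import pearsonr

sentiment = [1, 1, 0, 0, 1, 0, 1, 0]  # 1 = positive sentiment
respect = [1, 1, 0, 0, 1, 1, 1, 0]    # 1 = respectful message

r, p = pearsonr(sentiment, respect)
print(f"r = {r:.2f}, p = {p:.3f}")  # a positive r suggests the labels co-vary
```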


Author(s):  
Dharmendra Trikamlal Patel

Exploratory data analysis (EDA) is a technique for analyzing data sets in order to summarize their main characteristics using quantitative and visual methods. The chapter starts with an introduction to exploratory data analysis, discusses the conventional view of it, and describes its main limitations. It then explores the features of quantitative and visual exploratory data analysis in detail, covers the statistical techniques relevant to EDA, and emphasizes the main visual techniques for representing data effectively. R has extraordinary capabilities for dealing with the quantitative and visual aspects of summarizing the main characteristics of a data set, and the chapter provides practical exposure to various plotting systems using R. Finally, the chapter deals with current research and future trends in EDA.
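
The chapter's worked examples use R; purely for illustration, an analogous quantitative-plus-visual EDA pass on an invented data set looks like this in Python.

```python
# Analogous EDA pass in Python (the chapter itself uses R).
# The data frame below is invented for illustration.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"height_cm": [160, 172, 181, 168, 175, 190],
                   "weight_kg": [55, 70, 85, 62, 74, 95]})

print(df.describe())  # quantitative summary: count, mean, std, quartiles
df.plot.scatter(x="height_cm", y="weight_kg")  # visual: bivariate relationship
df["height_cm"].plot.hist()                    # visual: univariate distribution
plt.show()
```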


2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
B. Ojeda-Magaña ◽  
R. Ruelas ◽  
M. A. Corona Nakamura ◽  
D. W. Carr Finch ◽  
L. Gómez-Barba

We take the concept of typicality from the field of cognitive psychology and apply it to the interpretation of numerical data sets and color images through fuzzy clustering algorithms, particularly the GKPFCM, aiming to extract better information from the processed data. The Gustafson-Kessel Possibilistic Fuzzy c-Means (GKPFCM) is a hybrid algorithm based on a relative typicality (the membership degree of Fuzzy c-means) and an absolute typicality (the typicality value of Possibilistic c-means). Using both typicalities makes it possible to learn from and analyze data as well as to relate the results to the theory of prototypes. To demonstrate these results we use a synthetic data set and a digitized image of a glass in a first example, and images from the Berkeley database in a second example. The results clearly demonstrate the advantages of the information obtained about numerical data sets, taking into account the different meanings of the typicalities and the availability of both values from the clustering algorithm used. This approach allows the identification of small homogeneous regions, which are otherwise difficult to find.
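
The membership-degree ("relative typicality") component that GKPFCM builds on can be sketched with a plain fuzzy c-means loop; this omits the possibilistic typicality values and the Gustafson-Kessel covariance-based distance that make up the full hybrid.

```python
# Plain fuzzy c-means (the membership-degree component GKPFCM builds on);
# the full hybrid adds possibilistic typicality and a GK covariance norm.
import numpy as np

def fcm(X, c=2, m=2.0, iters=50, seed=0):
    """Return cluster prototypes and fuzzy membership degrees."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))        # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # fuzzily weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)      # standard FCM membership update
    return centers, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(3.0, 0.5, (20, 2))])
centers, U = fcm(X)
print(centers)  # two cluster prototypes
print(U[0])     # membership degrees ("relative typicality") of the first point
```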


2018 ◽  
Vol 154 (2) ◽  
pp. 149-155
Author(s):  
Michael Archer

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser, non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years before the 2-year cycle with damped waveform appeared varied between 17 and 26, and in some data sets it was not found at all. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.
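
The ACF diagnostic in point 2 can be sketched as follows on an invented yearly count series with a rough 2-year alternation; a negative lag-1 value followed by a weaker positive lag-2 value is the signature described above.

```python
# Sketch of the ACF check on an invented yearly count series with a
# rough 2-year up/down alternation (not the Silwood/Rothamsted data).
import numpy as np
from statsmodels.tsa.stattools import acf

counts = np.array([120, 40, 95, 35, 110, 50, 90, 30, 100, 45,
                   85, 38, 105, 42, 92, 36, 98, 44, 88, 40], dtype=float)
r = acf(counts, nlags=4)
print(r)  # expect a negative lag-1 value and a weaker positive lag-2 value
```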

