Random Sampling in Corpus Design: Cross-Context Generalizability in Automated Multicountry Protest Event Collection

What is the best way to create a gold standard corpus for training a machine learning system designed to collect protest information automatically in a cross-country context? We show that building a gold standard corpus for training and testing machine learning models from randomly chosen news articles in news archives yields better performance than selecting news articles by keyword filtering, which is the most prevalent method currently used in automated event coding. We advance this new bottom-up approach to ensure generalizability and reliability in cross-country comparative protest event collection from international and local news in different countries, languages, sources and time periods, which entails a large variety of event types, actors, and targets. We present the results of comparing our random-sample approach with keyword filtering. We show that machine learning algorithms, and particularly state-of-the-art deep learning tools, perform much better when they are trained with a gold standard corpus built from a randomly selected set of news articles from China, India, and South Africa. Finally, we also present our approach to overcoming the major ethical issues that are intrinsic to protest event coding.
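
The contrast between the two selection strategies described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the toy archive, the keyword list, and the helper names `random_sample` and `keyword_filter` are all assumptions made for illustration.

```python
import random

def random_sample(archive, k, seed=0):
    """Draw k articles uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(archive, k)

def keyword_filter(archive, keywords):
    """Keep only articles whose text mentions at least one keyword."""
    lowered = [kw.lower() for kw in keywords]
    return [doc for doc in archive if any(kw in doc.lower() for kw in lowered)]

# Hypothetical toy archive; three of the five items describe protest events.
archive = [
    "Workers staged a protest outside the factory gates.",
    "Farmers blocked the highway over crop prices.",
    "The central bank kept interest rates unchanged.",
    "Students held a sit-in at the university.",
    "Local team wins the regional cricket final.",
]

# Keyword filtering only surfaces events phrased with the chosen terms.
keyword_pool = keyword_filter(archive, ["protest", "strike"])

# Random sampling draws from the whole archive, relevant or not.
random_pool = random_sample(archive, 3)

print(len(keyword_pool), "article(s) selected by keywords")
print(len(random_pool), "article(s) selected at random")
```

In this toy archive the keyword filter never sees the highway blockade or the sit-in, because neither is described with the chosen terms, whereas every article has the same chance of entering the random sample.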

GoMi - A new gold standard corpus for miRNA Named Entity Recognition to test dictionary, rule-based and machine-learning approaches.

10.1101/2021.10.18.464801 ◽

2021 ◽

Author(s):

Anika Frericks-Zipper ◽

Markus Stepath ◽

Karin Schork ◽

Katrin Marcus ◽

Michael Turewicz ◽

...

Keyword(s):

Machine Learning ◽

Gold Standard ◽

Machine Learning Algorithms ◽

Entity Recognition ◽

Learning Approaches ◽

Rule Based ◽

Text Corpora ◽

Micro Rnas ◽

Gold Standard Corpus ◽

Gold Standards

Biomarkers have been the focus of research for more than 30 years [REF1]. Paone et al. were among the first scientists to use the term biomarker in the course of a comparative study dealing with breast carcinoma [REF2]. In recent years, in addition to proteins and genes, miRNAs (micro RNAs), which play an essential role in gene expression, have gained increased interest as valuable biomarkers. As a result, more and more information on miRNA biomarkers can be extracted via text mining approaches from the increasing amount of scientific literature. In the late 1990s the recognition of specific terms in biomedical texts became a focus of bioinformatic research, with the aim of automatically extracting knowledge from the increasing number of publications. For this, amongst other methods, machine learning algorithms are applied. However, the ability of machine learning or rule-based algorithms to recognize (classify) terms depends on their correct and reproducible training and development. In the case of machine learning-based algorithms, the quality of the available training and test data is crucial. The algorithms have to be trained and tested with curated and trustworthy data sets, the so-called gold or silver standards. Gold standards are text corpora that are annotated by experts, whereas silver standards are curated automatically by other algorithms. Training and calibration of neural networks is based on such corpora. In the literature there are some silver standards with approx. 500,000 tokens [REF3]. There are also already published gold standards for species, genes, proteins or diseases. However, there is no corpus that has been generated specifically for miRNA. To close this gap, we have generated GoMi, a novel and manually curated gold standard corpus for miRNA. GoMi can be used directly to train ML methods and to calibrate or test different algorithms based on the rule-based or dictionary-based approach.
The GoMi gold standard corpus was created using publicly available PubMed abstracts. GoMi can be downloaded here: https://github.com/mpc-bioinformatics/mirnaGS---GoMi.
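
The difference between the dictionary-based and rule-based approaches named in this abstract can be shown with a short Python sketch. The regular expression, the two-entry dictionary, and the example sentence are assumptions made for illustration; they are not taken from GoMi and cover only a few common miRNA naming forms.

```python
import re

# Hypothetical rule: optional three-letter species prefix, "miR" or "let",
# a number, an optional letter suffix, and an optional -3p/-5p arm.
MIRNA_RULE = re.compile(r"\b(?:[a-z]{3}-)?(?:miR|let)-\d+[a-z]?(?:-[35]p)?\b")

# Hypothetical dictionary of known names; real dictionaries are far larger.
MIRNA_DICTIONARY = {"miR-21", "let-7a"}

def rule_based_mentions(text):
    """Return every substring that matches the naming rule."""
    return MIRNA_RULE.findall(text)

def dictionary_mentions(text):
    """Return the dictionary entries that occur verbatim in the text."""
    return [name for name in sorted(MIRNA_DICTIONARY) if name in text]

text = ("Expression of hsa-miR-155-5p and let-7a was elevated, "
        "while miR-21 was unchanged.")

rule_hits = rule_based_mentions(text)
dict_hits = dictionary_mentions(text)
print(rule_hits)
print(dict_hits)
```

The dictionary lookup misses `hsa-miR-155-5p` because that name is not listed, while the rule finds it from its form alone. A manually annotated corpus is what allows such differences to be measured.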

Machine Learning Algorithms in Cardiology Domain: A Systematic Review (Preprint)

10.2196/preprints.14784 ◽

2019 ◽

Author(s):

Georgy Kopanitsa ◽

Aleksei Dudchenko ◽

Matthias Ganzinger

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Search Strategy ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Learning System ◽

Prisma Statement ◽

Meta Analyses ◽

Conference Papers ◽

Huge Variety

BACKGROUND It has been shown in previous decades that machine learning (ML) has a huge variety of possible implementations in medicine and can be very helpful. Nevertheless, cardiovascular diseases cause about a third of all deaths globally. Does ML work in the cardiology domain, and what is the current progress in that regard? OBJECTIVE The review aims at (1) identifying studies where machine learning algorithms were applied in the cardiology domain; (2) providing an overview, based on the identified literature, of the state of the art of ML algorithms applied in cardiology. METHODS For organizing this review, we employed the PRISMA statement. PRISMA is a set of items for reporting in systematic reviews and meta-analyses, focused on the reporting of reviews evaluating randomized trials, but it can also be used as a basis for reporting other systematic reviews. For the review, we adopted the PRISMA statement and identified the following items: review questions, information sources, search strategy, and selection criteria. RESULTS In total, 27 scientific articles or conference papers written in English and reporting on the implementation of an ML method or algorithm in the cardiology domain were included in this review. We examined four aspects: aims of ML systems, methods, datasets, and evaluation metrics. CONCLUSIONS We expect this systematic review to be helpful for researchers developing machine learning systems for the medical domain and in particular for cardiology.

Machine Learning Applications in Breast Cancer Diagnosis

Handbook of Research on Machine Learning Innovations and Trends - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-2229-4.ch020 ◽

2017 ◽

pp. 465-490

Author(s):

Syed Jamal Safdar Gardezi ◽

Mohamed Meselhy Eltoukhy ◽

Ibrahima Faye

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Feature Extraction ◽

Machine Learning Algorithms ◽

Learning Tools ◽

Extraction Techniques ◽

Cad Systems ◽

Cad System ◽

Machine Learning Applications ◽

Detection And Diagnosis

Breast cancer is one of the leading causes of death in women worldwide. Early detection is the key to reducing mortality rates. Mammography screening has proven to be one of the effective tools for the diagnosis of breast cancer. A computer-aided diagnosis (CAD) system is a fast, reliable, and cost-effective tool for assisting radiologists and physicians in the diagnosis of breast cancer. CAD systems play an increasingly important role in clinics by providing a second opinion. Clinical trials have shown that CAD systems have improved the accuracy of breast cancer detection. A typical CAD system involves three major steps: segmentation of suspected lesions, feature extraction, and classification of these regions into normal or abnormal classes and further into benign or malignant stages. The diagnostic ability of any CAD system depends on accurate segmentation, feature extraction techniques and, most importantly, classification tools that are able to discriminate normal tissues from abnormal tissues. In this chapter we discuss the application of machine learning algorithms (e.g. ANN, binary tree, SVM) together with segmentation and feature extraction techniques in CAD system development. Various methods used in the detection and diagnosis of breast lesions in mammography are reviewed. A brief introduction to the machine learning tools used in diagnosis, and to their classification performance with various segmentation and feature extraction techniques, is presented.

Spoken words as biomarkers: using machine learning to gain insight into communication as a predictor of anxiety

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa049 ◽

2020 ◽

Vol 27 (6) ◽

pp. 929-933

Author(s):

George Demiris ◽

Kristin L Corey Magan ◽

Debra Parker Oliver ◽

Karla T Washington ◽

Chad Chadwick ◽

...

Keyword(s):

Machine Learning ◽

Secondary Data ◽

Health Indicators ◽

Machine Learning Algorithms ◽

Standardized Assessments ◽

Learning Tools ◽

Data Set ◽

Problem Solving Therapy ◽

Audio Communication ◽

The Impact

Abstract Objective The goal of this study was to explore whether features of recorded and transcribed audio communication data extracted by machine learning algorithms can be used to train a classifier for anxiety. Materials and Methods We used a secondary data set generated by a clinical trial examining problem-solving therapy for hospice caregivers consisting of 140 transcripts of multiple, sequential conversations between an interviewer and a family caregiver along with standardized assessments of anxiety prior to each session; 98 of these transcripts (70%) served as the training set, holding the remaining 30% of the data for evaluation. Results A classifier for anxiety was developed relying on language-based features. An 86% precision, 78% recall, 81% accuracy, and 84% specificity were achieved with the use of the trained classifiers. High anxiety inflections were found among recently bereaved caregivers and were usually connected to issues related to transitioning out of the caregiving role. This analysis highlighted the impact of lowering anxiety by increasing reciprocity between interviewers and caregivers. Conclusion Verbal communication can provide a platform for machine learning tools to highlight and predict behavioral health indicators and trends.
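
The holdout split and the four metrics reported in this abstract can be sketched in Python as follows. The split sizes follow the abstract (140 transcripts, 70% for training); the confusion-matrix counts are hypothetical values chosen only to sum to the evaluation set size and are not the study's results.

```python
def holdout_sizes(n_items, train_percent):
    """Return (train, evaluation) set sizes using integer arithmetic."""
    n_train = n_items * train_percent // 100
    return n_train, n_items - n_train

def binary_metrics(tp, fp, fn, tn):
    """Compute precision, recall, accuracy and specificity from counts."""
    return {
        "precision": tp / (tp + fp),      # of predicted positives, share correct
        "recall": tp / (tp + fn),         # of actual positives, share found
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "specificity": tn / (tn + fp),    # of actual negatives, share found
    }

n_train, n_eval = holdout_sizes(140, 70)

# Hypothetical counts for the evaluation transcripts (NOT from the paper).
counts = {"tp": 15, "fp": 5, "fn": 5, "tn": 17}
assert sum(counts.values()) == n_eval

metrics = binary_metrics(**counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting specificity alongside precision and recall matters here because a classifier that labels every transcript as high anxiety would reach full recall while identifying no low-anxiety sessions correctly.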

Machine learning for human learners: opportunities, issues, tensions and threats

Educational Technology Research and Development ◽

10.1007/s11423-020-09858-2 ◽

2020 ◽

Author(s):

Mary E. Webb ◽

Andrew Fluck ◽

Johannes Magenheim ◽

Joyce Malyn-Smith ◽

Juliet Waters ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Ethical Issues ◽

Practical Experience ◽

Learning Systems ◽

Learning System ◽

Adaptive Behaviour ◽

Recent Developments ◽

Key Aspects ◽

School Curricula

Abstract Machine learning systems are infiltrating our lives and are beginning to become important in our education systems. This article, developed from a synthesis and analysis of previous research, examines the implications of recent developments in machine learning for human learners and learning. In this article we first compare deep learning in computers and humans to examine their similarities and differences. Deep learning is identified as a sub-set of machine learning, which is itself a component of artificial intelligence. Deep learning often depends on backwards propagation in weighted neural networks, so is non-deterministic: the system adapts and changes through practical experience or training. This adaptive behaviour predicates the need for explainability and accountability in such systems. Accountability is the reverse of explainability. Explainability flows through the system from inputs to output (decision) whereas accountability flows backwards, from a decision to the person taking responsibility for it. Both explainability and accountability should be incorporated in machine learning system design from the outset to meet social, ethical and legislative requirements. For students to be able to understand the nature of the systems that may be supporting their own learning as well as to act as responsible citizens in contemplating the ethical issues that machine learning raises, they need to understand key aspects of machine learning systems and have opportunities to adapt and create such systems. Therefore, some changes are needed to school curricula. The article concludes with recommendations about machine learning for teachers, students, policymakers, developers and researchers.

Algorithmic Opacity: Making Algorithmic Processes Transparent through Abstraction Hierarchy

Proceedings of the Human Factors and Ergonomics Society Annual Meeting ◽

10.1177/1541931218621046 ◽

2018 ◽

Vol 62 (1) ◽

pp. 192-196

Author(s):

Pragya Paudyal ◽

B.L. William Wong

Keyword(s):

Machine Learning ◽

Decision Making ◽

Ethical Decision Making ◽

Ethical Issues ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Intelligence Analysis ◽

Functional Relationships ◽

Decomposition Space ◽

Criminal Intelligence

In this paper we introduce the problem of algorithmic opacity and the challenges it presents to ethical decision-making in criminal intelligence analysis. Machine learning algorithms have played important roles in the decision-making process over the past decades. Intelligence analysts are increasingly being presented with smart black box automation that uses machine learning algorithms to find patterns or interesting and unusual occurrences in big data sets. Algorithmic opacity is the lack of visibility of computational processes, such that humans are not able to inspect their inner workings to ascertain for themselves how the results and conclusions were computed. This is a problem that leads to several ethical issues. In the VALCRI project, we developed an abstraction hierarchy and abstraction decomposition space to identify important functional relationships and system invariants in relation to ethical goals. Such explanatory relationships can be valuable for making algorithmic processes transparent during the criminal intelligence analysis process.

Does Machine Learning Automate Moral Hazard and Error?

The American Economic Review ◽

10.1257/aer.p20171084 ◽

2017 ◽

Vol 107 (5) ◽

pp. 476-480 ◽

Cited By ~ 31

Author(s):

Sendhil Mullainathan ◽

Ziad Obermeyer

Keyword(s):

Machine Learning ◽

Health Care ◽

Characteristic Feature ◽

Health Data ◽

Machine Learning Algorithms ◽

Learning Tools ◽

Human Judgment ◽

Measurement Issues ◽

Applications Of Machine Learning ◽

A Minor

Machine learning tools are beginning to be deployed en masse in health care. While the statistical underpinnings of these techniques have been questioned with regard to causality and stability, we highlight a different concern here, relating to measurement issues. A characteristic feature of health data, unlike other applications of machine learning, is that neither y nor x is measured perfectly. Far from a minor nuance, this can undermine the power of machine learning algorithms to drive change in the health care system--and indeed, can cause them to reproduce and even magnify existing errors in human judgment.

A Review on Various Algorithms used in Machine Learning

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1952248 ◽

2019 ◽

pp. 915-920

Author(s):

Divya Chaudhary ◽

Er. Richa Vasuja

Keyword(s):

Machine Learning ◽

Data Mining ◽

New Technologies ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Learning System ◽

Training Set ◽

Learning Data ◽

Do So

In today's world, data is being generated by every one of us, so it becomes vital for us to handle this data. To do so, new technologies are being developed, such as machine learning and data mining. This paper presents a study of machine learning (ML). Precise approximations are repeatedly produced by machine learning algorithms. A machine learning system effectively “learns” how to make predictions from a training set of completed jobs. The main purpose of the review is to give a rough overview of the most commonly used algorithms in machine learning.

Execution Assessment of Machine Learning Algorithms for Spam Profile Detection on Instagram

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/561032021 ◽

2021 ◽

Vol 10 (3) ◽

pp. 1889-1894

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Tools ◽

Learning Models ◽

K Nearest Neighbor

With every passing second the social network community is growing rapidly; because of that, attackers have shown keen interest in these kinds of platforms and want to distribute mischievous content on them. With the focus on introducing new sets of characteristics and features for counteractive measures, a great many studies have researched the possibility of lessening malicious activities on social media networks. This research aimed to highlight features for identifying spammers on Instagram, and additional features were presented to improve the performance of different machine learning algorithms. The performance of different machine learning algorithms, namely Multilayer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM), was evaluated on the machine learning tools RapidMiner and WEKA. The results of this research tell us that Random Forest (RF) outperformed all other selected machine learning algorithms on both selected machine learning tools. Overall, Random Forest (RF) provided the best results on RapidMiner. These results are useful for researchers who are keen to build machine learning models to detect spamming activities in social network communities.

Measurement of Ethical Issues in Software Products

10.20944/preprints202006.0294.v1 ◽

2020 ◽

Author(s):

Francesco Di Tria

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Computer Science ◽

Ethical Issues ◽

Research Field ◽

Machine Learning Algorithms ◽

Software Products ◽

Ethics Research ◽

Software Product ◽

Formal Requirements

Ethics is a research field that is receiving more and more attention in Computer Science due to the proliferation of artificial intelligence software, machine learning algorithms, robot agents (such as chatbots), and so on. Indeed, ethics research has so far produced a set of guidelines, such as ethical codes, to be followed by people involved in Computer Science. However, little effort has been spent on producing formal requirements to be included in the design process of software able to act ethically with users. In this paper, we investigate those issues that make a software product ethical and propose a set of metrics to quantitatively evaluate whether a software product can be considered ethical or not.
