Minimizing the cost of iterative compilation with active learning

Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, the prediction of protein–protein interactions, and the identification of hidden relationships in a crime network. Several link prediction algorithms, notably those recently introduced using network embedding, are capable of doing this by just relying on the observed part of the network. Often, whether two nodes are linked can be queried, albeit at a substantial cost (e.g., by questionnaires, wet lab experiments, or undercover work). Such additional information can improve the link prediction accuracy, but owing to the cost, the queries must be made with due consideration. Thus, we argue that an active learning approach is of great potential interest and developed ALPINE (Active Link Prediction usIng Network Embedding), a framework that identifies the most useful link status by estimating the improvement in link prediction accuracy to be gained by querying it. We proposed several query strategies for use in combination with ALPINE, inspired by the optimal experimental design and active learning literature. Experimental results on real data not only showed that ALPINE was scalable and boosted link prediction accuracy with far fewer queries, but also shed light on the relative merits of the strategies, providing actionable guidance for practitioners.
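
To make the querying idea concrete, here is a minimal sketch that picks which node pair to query next. It uses plain uncertainty sampling (maximum binary entropy of the predicted link probability) as a stand-in. It is not ALPINE's own utility estimate, and the node names and probabilities are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a Bernoulli(p) link-status prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def pick_query(link_probs: dict) -> tuple:
    """Return the node pair whose predicted link status is most uncertain."""
    return max(link_probs, key=lambda pair: binary_entropy(link_probs[pair]))


# Hypothetical predicted link probabilities for pairs with unknown status.
link_probs = {("a", "b"): 0.95, ("a", "c"): 0.52, ("b", "d"): 0.10}
query = pick_query(link_probs)  # the pair closest to probability 0.5
```

After the queried status comes back, the pair would move to the observed part of the network and the predictions would be recomputed before the next query.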

Active Learning Approaches for Labeling Text: Review and Assessment of the Performance of Active Learning Approaches

Political Analysis, 2020, Vol 28 (4), pp. 532-551
DOI: 10.1017/pan.2020.4

Author(s): Blake Miller, Fridolin Linder, Walter R. Mebane

Keyword(s): Machine Learning, Active Learning, Random Sampling, Supervised Machine Learning, Learning Approaches, Simulation Studies, Text Data, Passive Learning, Machine Learning Model, The Cost

Supervised machine learning methods are increasingly employed in political science. Such models require costly manual labeling of documents. In this paper, we introduce active learning, a framework in which data to be labeled by human coders are not chosen at random but rather targeted in such a way that the required amount of data to train a machine learning model can be minimized. We study the benefits of active learning using text data examples. We perform simulation studies that illustrate conditions where active learning can reduce the cost of labeling text data. We perform these simulations on three corpora that vary in size, document length, and domain. We find that in cases where the document class of interest is not balanced, researchers can label a fraction of the documents one would need using random sampling (or “passive” learning) to achieve equally performing classifiers. We further investigate how varying levels of intercoder reliability affect the active learning procedures and find that even with low reliability, active learning performs more efficiently than does random sampling.
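
The contrast between targeted and random labeling can be illustrated with the classic one-dimensional threshold example. This is a toy sketch under stated assumptions, not the paper's text classifiers: the `oracle` function stands in for a human coder, and the 0.7 threshold and the pool are hypothetical.

```python
def oracle(x: float) -> int:
    """Stand-in for a human coder: the true label is 1 when x >= 0.7 (hypothetical)."""
    return int(x >= 0.7)


def active_label_count(pool: list) -> int:
    """Count the labels needed to locate the class boundary when every query
    goes to the point midway between the known 0s and the known 1s."""
    pool = sorted(pool)
    lo, hi = 0, len(pool)  # invariant: pool[:lo] are known 0s, pool[hi:] are known 1s
    labels_used = 0
    while lo < hi:
        mid = (lo + hi) // 2  # the point whose label is currently most uncertain
        labels_used += 1
        if oracle(pool[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    return labels_used


pool = [i / 100 for i in range(100)]
labels_used = active_label_count(pool)
```

In this idealized, noise-free setting the number of labels grows logarithmically with the pool size, whereas random sampling would need labels on both sides of the boundary close to it, which typically requires far more queries.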

Studying Active Learning in the Cost-Sensitive Framework

2012 45th Hawaii International Conference on System Sciences, 2012
DOI: 10.1109/hicss.2012.552
Cited By ~ 3

Author(s): Victor S. Sheng

Keyword(s): Active Learning, The Cost

Rough Set Based Clustering Using Active Learning Approach

International Journal of Artificial Life Research, 2011, Vol 2 (4), pp. 12-23
DOI: 10.4018/jalr.2011100102
Cited By ~ 1

Author(s): Rekha Kandwal, Prerna Mahajan, Ritu Vijay

Keyword(s): Active Learning, Rough Set, Rough Set Theory, Hamming Distance, Theoretical Background, Learning Approach, Data Sets, Data Set, The Cost, The Given

This paper revisits the problem of active learning and decision making when labeling incurs a cost and unlabeled data is available in abundance. In many real-world applications, large amounts of data are available, but the cost of labeling it correctly prohibits its use. In such cases, active learning can be employed. In this paper, the authors propose rough set based clustering using an active learning approach. The authors extend the basic notion of Hamming distance to propose a dissimilarity measure that helps in finding the approximations of clusters in the given data set. The underlying theoretical background for this decision is rough set theory. The authors evaluated their algorithm on benchmark data sets from the UCI machine learning repository, with promising results.
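
For reference, the basic Hamming distance that the authors start from can be sketched as follows. This shows only the standard measure on categorical records, not the extended dissimilarity measure proposed in the paper. The example records are hypothetical.

```python
def hamming_distance(a, b) -> int:
    """Number of attribute positions at which two records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def normalized_hamming(a, b) -> float:
    """Hamming distance scaled to [0, 1] by the number of attributes."""
    return hamming_distance(a, b) / len(a)


# Hypothetical categorical records with three attributes each.
r1 = ("red", "small", "round")
r2 = ("red", "large", "round")
d = hamming_distance(r1, r2)  # the records differ in one attribute
```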

Active Learning and Mapping

Data Mining, 2013, pp. 66-91
DOI: 10.4018/978-1-4666-2455-9.ch004

Author(s): Laurent A. Baumes

Keyword(s): Data Mining, Heterogeneous Catalysis, Active Learning, Data Collection, Material Science, Search Space, Previous Knowledge, Experimental Apparatus, Samples Selection, The Cost

Data mining technology is increasingly employed in new industrial processes, which require automatic analysis of data and related results in order to reach conclusions quickly. However, for some applications, complete automation may not be appropriate. Unlike traditional data mining contexts, which deal with voluminous amounts of data, some domains are characterized by a scarcity of data, owing to the cost and time involved in conducting simulations or setting up experimental apparatus for data collection. In such domains, it is prudent to balance speed through automation against the utility of the generated data. The authors review the active learning methodology and present a new one that successively generates new samples in order to reach an improved final estimation of the entire search space under investigation, using the knowledge accumulated iteratively through sample selection and the corresponding results. The methodology is shown to be of great interest for applications such as high-throughput material science and especially heterogeneous catalysis, where chemists have no previous knowledge to direct and guide the exploration.

A Comparative Analysis of Active Learning for Biomedical Text Mining

Applied System Innovation, 2021, Vol 4 (1), pp. 23
DOI: 10.3390/asi4010023

Author(s): Usman Naseem, Matloob Khushi, Shah Khalid Khan, Kamran Shaukat, Mohammad Ali Moni

Keyword(s): Machine Learning, Active Learning, De Novo, Comparative Investigation, Free Text, Machine Learning Applications, Text Information, Pathology Reports, Time Required, The Cost

An enormous amount of clinical free-text information, such as pathology reports, progress reports, clinical notes and discharge summaries, has been collected at hospitals and medical care clinics. These data provide an opportunity to develop many useful machine learning applications if they can be transferred into a learnable structure with appropriate labels for supervised learning. The annotation of these data has to be performed by qualified clinical experts, and the high cost of annotation therefore limits their use. Active learning (AL), an underutilised machine learning technique that can label new data, is a promising candidate to address the high cost of labelling the data. AL has been successfully applied to labelling for speech recognition and text classification; however, there is a lack of literature investigating its use for clinical purposes. We performed a comparative investigation of various AL techniques using ML and deep learning (DL)-based strategies on three unique biomedical datasets. We investigated random sampling (RS), least confidence (LC), informative diversity and density (IDD), margin and maximum representativeness-diversity (MRD) AL query strategies. Our experiments show that AL has the potential to significantly reduce the cost of manual labelling. Furthermore, pre-labelling performed using AL expedites the labelling process by reducing the time required for labelling.
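
Two of the query strategies named above, least confidence and margin, have simple textbook definitions that can be sketched directly. The class probabilities below are hypothetical, and the helper names are illustrative rather than taken from the paper.

```python
def least_confidence(probs) -> float:
    """Uncertainty as one minus the probability of the most likely class."""
    return 1.0 - max(probs)


def margin(probs) -> float:
    """Gap between the two most likely classes; a small gap means high uncertainty."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def select(pool_probs, strategy: str) -> int:
    """Return the index of the pool item to send to the annotator next."""
    indices = range(len(pool_probs))
    if strategy == "lc":
        return max(indices, key=lambda i: least_confidence(pool_probs[i]))
    if strategy == "margin":
        return min(indices, key=lambda i: margin(pool_probs[i]))
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical predicted class probabilities for three unlabelled documents.
pool_probs = [[0.90, 0.05, 0.05], [0.50, 0.45, 0.05], [0.40, 0.30, 0.30]]
lc_pick = select(pool_probs, "lc")
margin_pick = select(pool_probs, "margin")
```

On this pool the two strategies disagree: least confidence favours the third document, whose top class is weakest, while margin favours the second, whose top two classes are nearly tied.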

Multi-instance multi-label active learning

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017
DOI: 10.24963/ijcai.2017/262
Cited By ~ 5

Author(s): Sheng-Jun Huang, Nengneng Gao, Songcan Chen

Keyword(s): Active Learning, Learning Algorithm, Expressive Power, Traditional Learning, Great Success, Main Approach, Learning Tasks, Real World Applications, The Cost, Output Space

Multi-instance multi-label learning (MIML) has been successfully applied to many real-world applications. As expressive power increases, the cost of labelling a MIML example increases significantly. It therefore becomes an important task to train an effective MIML model with as few labelled examples as possible. Active learning, which actively selects the most valuable data and queries their labels, is a main approach to reducing labelling cost. Existing active methods have achieved great success in traditional learning tasks, but cannot be directly applied to MIML problems. In this paper, we propose a MIML active learning algorithm, which exploits diversity and uncertainty in both the input and output space to query the most valuable information. This algorithm designs a novel query strategy specifically for MIML objects and acquires more precise information from the oracle without additional cost. Based on the queried information, the MIML model is then effectively trained by simultaneously optimizing the relative rank among instances and labels.

Cost-Effective Active Learning from Diverse Labelers

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017
DOI: 10.24963/ijcai.2017/261
Cited By ~ 7

Author(s): Sheng-Jun Huang, Jia-Lve Chen, Xin Mu, Zhi-Hua Zhou

Keyword(s): Cost Effectiveness, Active Learning, Low Cost, Selection Criterion, Ground Truth, Cost Effective, Classification Model, Data Sets, Active Selection, The Cost

In traditional active learning, there is only one labeler that always returns the ground truth of queried labels. However, in many applications, multiple labelers are available to offer diverse qualities of labeling with different costs. In this paper, we perform active selection on both instances and labelers, aiming to improve the classification model most with the lowest cost. While the cost of a labeler is proportional to its overall labeling quality, we also observe that different labelers usually have diverse expertise, and thus it is likely that labelers with a low overall quality can provide accurate labels on some specific instances. Based on this fact, we propose a novel active selection criterion to evaluate the cost-effectiveness of instance-labeler pairs, which ensures that the selected instance is helpful for improving the classification model, and meanwhile the selected labeler can provide an accurate label for the instance with a relatively low cost. Experiments on both UCI and real crowdsourcing data sets demonstrate the superiority of our proposed approach in selecting cost-effective queries.
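
The idea of scoring instance-labeler pairs jointly can be sketched as below. The scoring rule (instance usefulness times estimated labeler accuracy, divided by labeler cost) is an assumption chosen for illustration and is not the criterion proposed in the paper. All names and numbers are hypothetical.

```python
from itertools import product


def best_pair(usefulness: dict, accuracy: dict, cost: dict) -> tuple:
    """Return the (instance, labeler) pair with the highest illustrative
    cost-effectiveness score: usefulness * accuracy / cost."""
    def score(pair):
        instance, labeler = pair
        return usefulness[instance] * accuracy[(labeler, instance)] / cost[labeler]

    return max(product(usefulness, cost), key=score)


# Hypothetical inputs: how useful each instance is to the model, how much each
# labeler charges, and how accurate each labeler is expected to be per instance.
usefulness = {"x1": 0.9, "x2": 0.2}
cost = {"expert": 5.0, "novice": 1.0}
accuracy = {
    ("expert", "x1"): 0.95, ("expert", "x2"): 0.95,
    ("novice", "x1"): 0.80, ("novice", "x2"): 0.60,
}
pair = best_pair(usefulness, accuracy, cost)
```

With these numbers the cheaper labeler wins on the more useful instance, because the labeler's instance-specific accuracy is high enough to offset the lower overall quality.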

A Model for Animal Home Range Estimation Based on the Active Learning Method

ISPRS International Journal of Geo-Information, 2019, Vol 8 (11), pp. 490
DOI: 10.3390/ijgi8110490

Author(s): Guo, Du, Ma, Huo, Peng

Keyword(s): Active Learning, Home Range, Euclidean Distance, Geodesic Distance, Learning Method, Range Estimation, Distance Transformation, Active Learning Method, Local Convex, The Cost

Home range estimation is the basis of ecology and animal behavior research. Some popular estimators have been presented; however, they have not fully considered the impacts of terrain and obstacles. To address this defect, a novel estimator named the density-based fuzzy home range estimator (DFHRE) is proposed in this study, based on the active learning method (ALM). The Euclidean distance is replaced by the cost distance-induced geodesic distance transformation to account for the effects of terrain and obstacles. Three datasets are used to verify the proposed method, and comparisons with the kernel density-based estimator (KDE) and the local convex hulls (LoCoH) estimators and the cross validation test indicate that the proposed estimator outperforms the KDE and the LoCoH estimators.
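
To show why a cost-based distance differs from the Euclidean one when obstacles are present, here is a minimal sketch of accumulated cost distance on a raster grid using Dijkstra's algorithm. It is a generic illustration, not the DFHRE implementation. The grid, the 4-neighbour movement, and the averaging of adjacent cell costs are assumptions.

```python
import heapq


def cost_distance(grid, start):
    """Accumulated travel cost from start to every reachable cell.
    Cells set to None are impassable obstacles."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None:
                # Cost of a step: mean of the two cells' traversal costs.
                nd = d + (grid[r][c] + grid[nr][nc]) / 2
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist


# Hypothetical terrain: uniform cost 1, with a wall (None) in the middle column.
grid = [
    [1, None, 1],
    [1, None, 1],
    [1, 1, 1],
]
dist = cost_distance(grid, (0, 0))
detour = dist[(0, 2)]  # path must go around the wall
```

The top-right cell is only two cells away in a straight line, but the wall forces a six-step detour, which is the kind of effect a Euclidean distance cannot capture.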

Supporting Project-Based Learning through Economical and Flexible Learning Spaces

Education Sciences, 2019, Vol 9 (3), pp. 212
DOI: 10.3390/educsci9030212
Cited By ~ 1

Author(s): Jesse Eickholt, Vikas Jogiparthi, Patrick Seeling, Quintrese Hinton, Matthew Johnson

Keyword(s): Active Learning, Learning Environments, Classroom Interaction, Project Based Learning, Learning Spaces, Self Directed Learning, Flexible Learning, Quasi Experimental, Directed Learning, The Cost

Project-based learning often centers learning experiences around projects and is characterized by the application of knowledge, management of resources, and self-directed learning. In recent years, newer classroom designs have been developed to facilitate communication, classroom interaction and active learning, but the cost of such spaces can be prohibitive. Here we present two economical options for flexible learning spaces that support the aims of project-based learning and cost much less than typical active learning classroom models. In a quasi-experimental study, one of our economical active learning environments was compared with a traditional classroom and a prototypical active learning classroom. These learning environments were used in a CS2 course that employed a group-based, active learning pedagogy centered on in-class projects. Students’ perceptions were gathered on the classrooms and their supporting technology. Between the economical and prototypical active learning environments, no significant differences were found in students’ perceptions of the space as it related to collaboration and supporting learning. Results from accompanying focus groups indicate that the space was conducive to their learning and helped them engage with peers. These economical and flexible options support the aims of project-based learning at a reduced cost.
