Minimizing the cost of iterative compilation with active learning

Author(s):  
William F. Ogilvie ◽  
Pavlos Petoumenos ◽  
Zheng Wang ◽  
Hugh Leather
2021 ◽  
Vol 11 (11) ◽  
pp. 5043
Author(s):  
Xi Chen ◽  
Bo Kang ◽  
Jefrey Lijffijt ◽  
Tijl De Bie

Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, the prediction of protein–protein interactions, and the identification of hidden relationships in a crime network. Several link prediction algorithms, notably those recently introduced using network embedding, are capable of doing this by just relying on the observed part of the network. Often, whether two nodes are linked can be queried, albeit at a substantial cost (e.g., by questionnaires, wet lab experiments, or undercover work). Such additional information can improve the link prediction accuracy, but owing to the cost, the queries must be made with due consideration. Thus, we argue that an active learning approach is of great potential interest and developed ALPINE (Active Link Prediction usIng Network Embedding), a framework that identifies the most useful link status by estimating the improvement in link prediction accuracy to be gained by querying it. We proposed several query strategies for use in combination with ALPINE, inspired by the optimal experimental design and active learning literature. Experimental results on real data not only showed that ALPINE was scalable and boosted link prediction accuracy with far fewer queries, but also shed light on the relative merits of the strategies, providing actionable guidance for practitioners.
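The idea of choosing which link status to query can be illustrated with a much simpler stand-in than the utility estimate described above. The following is a minimal sketch of plain uncertainty sampling, not ALPINE's own strategies: it assumes a link predictor has already produced a probability for each unobserved node pair, and it queries the pair whose probability is closest to 0.5. The function name and the example probabilities are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def most_uncertain_pair(link_probs: Dict[Pair, float]) -> Pair:
    """Return the unobserved node pair whose predicted link
    probability is closest to 0.5 (plain uncertainty sampling)."""
    if not link_probs:
        raise ValueError("no candidate pairs to query")
    return min(link_probs, key=lambda pair: abs(link_probs[pair] - 0.5))


# Hypothetical predicted probabilities for three unobserved pairs.
candidates = {("a", "b"): 0.95, ("a", "c"): 0.55, ("b", "d"): 0.10}
query = most_uncertain_pair(candidates)
print(query)
```

A strategy that estimates the expected gain in prediction accuracy, as the abstract describes, would replace the `abs(p - 0.5)` score with that estimate while keeping the same select-the-best-pair loop.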


2020 ◽  
Vol 28 (4) ◽  
pp. 532-551
Author(s):  
Blake Miller ◽  
Fridolin Linder ◽  
Walter R. Mebane

Supervised machine learning methods are increasingly employed in political science. Such models require costly manual labeling of documents. In this paper, we introduce active learning, a framework in which data to be labeled by human coders are not chosen at random but rather targeted in such a way that the required amount of data to train a machine learning model can be minimized. We study the benefits of active learning using text data examples. We perform simulation studies that illustrate conditions where active learning can reduce the cost of labeling text data. We perform these simulations on three corpora that vary in size, document length, and domain. We find that in cases where the document class of interest is not balanced, researchers can label a fraction of the documents one would need using random sampling (or “passive” learning) to achieve equally performing classifiers. We further investigate how varying levels of intercoder reliability affect the active learning procedures and find that even with low reliability, active learning performs more efficiently than does random sampling.


2011 ◽  
Vol 2 (4) ◽  
pp. 12-23
Author(s):  
Rekha Kandwal ◽  
Prerna Mahajan ◽  
Ritu Vijay

This paper revisits the problem of active learning and decision making when labeling incurs a cost and unlabeled data is available in abundance. In many real-world applications, large amounts of data are available, but the cost of correctly labeling them prohibits their use. In such cases, active learning can be employed. In this paper, the authors propose a rough set based clustering approach that uses active learning. The authors extend the basic notion of Hamming distance to propose a dissimilarity measure that helps in finding the approximations of clusters in the given data set. The underlying theoretical background for this decision is rough set theory. The authors evaluated their algorithm on benchmark data sets from the UCI machine learning repository, with promising results.
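For reference, the basic Hamming distance that the authors start from can be sketched as follows. This shows only the standard measure on categorical records plus a length-normalized variant; the authors' extended dissimilarity measure is not specified in the abstract and is not reproduced here. The function names and sample records are illustrative assumptions.

```python
from typing import Sequence


def hamming_distance(x: Sequence[str], y: Sequence[str]) -> int:
    """Count the attribute positions at which two records differ."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


def normalized_hamming(x: Sequence[str], y: Sequence[str]) -> float:
    """Hamming distance divided by the number of attributes, in [0, 1]."""
    if len(x) == 0:
        raise ValueError("records must have at least one attribute")
    return hamming_distance(x, y) / len(x)


# Two hypothetical categorical records that differ in one attribute.
r1 = ("red", "small", "round")
r2 = ("red", "large", "round")
print(hamming_distance(r1, r2), normalized_hamming(r1, r2))
```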


Data Mining ◽  
2013 ◽  
pp. 66-91
Author(s):  
Laurent A. Baumes

Data mining technology is increasingly employed in new industrial processes, which require automatic analysis of data and related results in order to reach conclusions quickly. However, for some applications, complete automation may not be appropriate. Unlike traditional data mining contexts, which deal with voluminous amounts of data, some domains are characterized by a scarcity of data, owing to the cost and time involved in conducting simulations or setting up experimental apparatus for data collection. In such domains, it is hence prudent to balance speed through automation against the utility of the generated data. The authors review the active learning methodology and present a new one that successively generates new samples in order to reach an improved final estimation of the entire search space investigated, according to the knowledge accumulated iteratively through sample selection and the corresponding results. The methodology is shown to be of great interest for applications such as high-throughput materials science and especially heterogeneous catalysis, where chemists do not have prior knowledge that would allow them to direct and guide the exploration.


2021 ◽  
Vol 4 (1) ◽  
pp. 23
Author(s):  
Usman Naseem ◽  
Matloob Khushi ◽  
Shah Khalid Khan ◽  
Kamran Shaukat ◽  
Mohammad Ali Moni

An enormous amount of clinical free-text information, such as pathology reports, progress reports, clinical notes and discharge summaries, has been collected at hospitals and medical care clinics. These data provide an opportunity to develop many useful machine learning applications if they can be transferred into a learnable structure with appropriate labels for supervised learning. The annotation of these data has to be performed by qualified clinical experts, so the high cost of annotation limits their use. Active learning (AL), an underutilised machine learning technique that can label new data, is a promising candidate to address the high cost of labelling. AL has been successfully applied to labelling for speech recognition and text classification; however, there is a lack of literature investigating its use for clinical purposes. We performed a comparative investigation of various AL techniques using ML and deep learning (DL)-based strategies on three unique biomedical datasets. We investigated random sampling (RS), least confidence (LC), informative diversity and density (IDD), margin and maximum representativeness-diversity (MRD) AL query strategies. Our experiments show that AL has the potential to significantly reduce the cost of manual labelling. Furthermore, pre-labelling performed using AL expedites the labelling process by reducing the time required for labelling.
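Two of the query strategies named above, least confidence and margin, have standard textbook definitions that can be sketched in a few lines. The sketch below assumes each pool item is represented by a list of predicted class probabilities from some classifier. The function names and the example pool are hypothetical, and the IDD and MRD strategies are not covered.

```python
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty as one minus the top class probability."""
    return 1.0 - max(probs)


def margin(probs: Sequence[float]) -> float:
    """Gap between the two most probable classes (smaller = more uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_least_confident(pool: List[Sequence[float]]) -> int:
    """Index of the pool item with the highest least-confidence score."""
    return max(range(len(pool)), key=lambda i: least_confidence(pool[i]))


def select_smallest_margin(pool: List[Sequence[float]]) -> int:
    """Index of the pool item with the smallest margin."""
    return min(range(len(pool)), key=lambda i: margin(pool[i]))


# Hypothetical class-probability predictions for three unlabelled documents.
pool = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25], [0.50, 0.50, 0.00]]
print(select_least_confident(pool), select_smallest_margin(pool))
```

On this pool the two strategies pick different documents: least confidence prefers the item whose best class is weakest, while margin prefers the item whose top two classes are tied.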


Author(s):  
Sheng-Jun Huang ◽  
Nengneng Gao ◽  
Songcan Chen

Multi-instance multi-label learning (MIML) has been successfully applied to many real-world applications. Along with its enhanced expressive power, the cost of labelling a MIML example increases significantly. It thus becomes an important task to train an effective MIML model with as few labelled examples as possible. Active learning, which actively selects the most valuable data to query their labels, is a main approach to reducing labelling cost. Existing active learning methods have achieved great success in traditional learning tasks, but cannot be directly applied to MIML problems. In this paper, we propose a MIML active learning algorithm, which exploits diversity and uncertainty in both the input and output space to query the most valuable information. This algorithm designs a novel query strategy specifically for MIML objects and acquires more precise information from the oracle without additional cost. Based on the queried information, the MIML model is then effectively trained by simultaneously optimizing the relative rank among instances and labels.


Author(s):  
Sheng-Jun Huang ◽  
Jia-Lve Chen ◽  
Xin Mu ◽  
Zhi-Hua Zhou

In traditional active learning, there is only one labeler that always returns the ground truth of queried labels. However, in many applications, multiple labelers are available to offer diverse qualities of labeling with different costs. In this paper, we perform active selection on both instances and labelers, aiming to improve the classification model as much as possible at the lowest cost. While the cost of a labeler is proportional to its overall labeling quality, we also observe that different labelers usually have diverse expertise, and thus it is likely that labelers with a low overall quality can provide accurate labels on some specific instances. Based on this fact, we propose a novel active selection criterion to evaluate the cost-effectiveness of instance-labeler pairs, which ensures that the selected instance is helpful for improving the classification model, and meanwhile the selected labeler can provide an accurate label for the instance at a relatively low cost. Experiments on both UCI and real crowdsourcing data sets demonstrate the superiority of our proposed approach in selecting cost-effective queries.
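One way to picture a cost-effectiveness criterion over instance-labeler pairs is as a ratio of expected benefit to cost. The sketch below is an illustrative assumption, not the criterion proposed in the paper: it scores each pair as (usefulness of the instance) × (estimated accuracy of the labeler on that instance) ÷ (cost of the labeler). All names and numbers are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (instance id, labeler id)


def best_pair(
    usefulness: Dict[str, float],
    accuracy: Dict[Pair, float],
    cost: Dict[str, float],
) -> Pair:
    """Return the instance-labeler pair with the highest
    usefulness * accuracy / cost score (an assumed scoring rule)."""

    def score(pair: Pair) -> float:
        instance, labeler = pair
        return usefulness[instance] * accuracy[pair] / cost[labeler]

    return max(accuracy, key=score)


# Hypothetical inputs: the novice is cheap and weak overall,
# but happens to be accurate on instance x1.
usefulness = {"x1": 0.9, "x2": 0.4}
cost = {"expert": 5.0, "novice": 1.0}
accuracy = {
    ("x1", "expert"): 0.95,
    ("x1", "novice"): 0.85,
    ("x2", "expert"): 0.95,
    ("x2", "novice"): 0.55,
}
print(best_pair(usefulness, accuracy, cost))
```

With these numbers the cheap labeler is chosen for the instance on which it is reliable, which is the situation the abstract highlights.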


2019 ◽  
Vol 8 (11) ◽  
pp. 490
Author(s):  
Guo ◽  
Du ◽  
Ma ◽  
Huo ◽  
Peng

Home range estimation is the basis of ecology and animal behavior research. Some popular estimators have been presented; however, they have not fully considered the impacts of terrain and obstacles. To address this defect, a novel estimator named the density-based fuzzy home range estimator (DFHRE) is proposed in this study, based on the active learning method (ALM). The Euclidean distance is replaced by the cost distance-induced geodesic distance transformation to account for the effects of terrain and obstacles. Three datasets are used to verify the proposed method, and comparisons with the kernel density-based estimator (KDE) and the local convex hulls (LoCoH) estimators and the cross validation test indicate that the proposed estimator outperforms the KDE and the LoCoH estimators.
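To see why a cost distance differs from the Euclidean distance when obstacles are present, consider a generic least-cost accumulation over a raster grid. The sketch below uses Dijkstra's algorithm with 4-neighbour moves, where the cost of a move is the traversal cost of the cell being entered and `None` marks an impassable obstacle. It is an assumed, simplified stand-in and not the geodesic distance transformation used by DFHRE. The grid and function name are hypothetical.

```python
import heapq
from typing import List, Optional, Tuple

Grid = List[List[Optional[float]]]


def cost_distance(grid: Grid, start: Tuple[int, int]) -> List[List[float]]:
    """Accumulated least cost from `start` to every cell.
    Cells holding None are obstacles and stay at infinity."""
    rows, cols = len(grid), len(grid[0])
    sr, sc = start
    if grid[sr][sc] is None:
        raise ValueError("start cell is an obstacle")
    dist = [[float("inf")] * cols for _ in range(rows)]
    dist[sr][sc] = 0.0
    heap = [(0.0, sr, sc)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None:
                nd = d + grid[nr][nc]  # pay the cost of the entered cell
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist


# Hypothetical 3x3 terrain with a wall in the middle column.
terrain: Grid = [
    [1.0, None, 1.0],
    [1.0, None, 1.0],
    [1.0, 1.0, 1.0],
]
dist = cost_distance(terrain, (0, 0))
print(dist[0][2])
```

The top-right cell is two cells away in a straight line, but the wall forces a detour through the bottom row, so its accumulated cost is 6.0.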


2019 ◽  
Vol 9 (3) ◽  
pp. 212
Author(s):  
Jesse Eickholt ◽  
Vikas Jogiparthi ◽  
Patrick Seeling ◽  
Quintrese Hinton ◽  
Matthew Johnson

Project-based learning often centers learning experiences around projects and is characterized by the application of knowledge, management of resources, and self-directed learning. In recent years, newer classroom designs have been developed to facilitate communication, classroom interaction and active learning, but the cost of such spaces can be prohibitive. Here we present two economical options for flexible learning spaces that support the aims of project-based learning and cost much less than typical active learning classroom models. In a quasi-experimental study, one of our economical active learning environments was paired with a traditional classroom and a prototypical active learning classroom. These learning environments were used in a CS2 course that employed a group-based, active learning pedagogy centered on in-class projects. Students’ perceptions were gathered on the classrooms and their supporting technology. Between the economical and prototypical active learning environments, no significant differences were found in students’ perceptions of the space as it related to collaboration and supporting learning. Results from accompanying focus groups indicate that the space was conducive to students’ learning and helped them engage with peers. These economical and flexible options support the aims of project-based learning at a reduced cost.

