Active Learning and Mapping

Data Mining ◽  
2013 ◽  
pp. 66-91
Author(s):  
Laurent A. Baumes

Data mining technology is increasingly employed in new industrial processes, which require automatic analysis of data and related results in order to reach conclusions quickly. However, for some applications, full automation may not be appropriate. Unlike traditional data mining contexts, which deal with voluminous amounts of data, some domains are characterized by a scarcity of data, owing to the cost and time involved in conducting simulations or setting up experimental apparatus for data collection. In such domains, it is prudent to balance the speed gained through automation against the utility of the generated data. The authors review the active learning methodology and present a new approach that successively generates new samples in order to reach an improved final estimation of the entire search space under investigation, using the knowledge accumulated iteratively through sample selection and the corresponding results. The methodology is shown to be of great interest for applications such as high-throughput materials science, and especially heterogeneous catalysis, where chemists lack prior knowledge to direct and guide the exploration.
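The iterative idea in this abstract (pick a sample, run the costly experiment, use the result to pick the next sample) can be illustrated with a minimal pool-based active learning loop. This is a generic sketch, not the chapter's method: the oracle `run_experiment`, the integer search space, the threshold of 62, and the neighbour-disagreement scoring rule are all assumptions made for illustration.

```python
def run_experiment(x):
    """Hypothetical costly experiment: 1 if the sample is 'active', else 0."""
    return 1 if x >= 62 else 0


def uncertainty(x, labeled, k=2):
    """Score a candidate: neighbour disagreement first, distance to data second."""
    nearest = sorted(labeled, key=lambda item: abs(item[0] - x))[:k]
    disagreement = 1 if len({label for _, label in nearest}) > 1 else 0
    distance = min(abs(px - x) for px, _ in labeled)
    return (disagreement, distance)


def active_learning(pool, seed_points, budget):
    """Query `budget` extra samples, always taking the most uncertain candidate."""
    labeled = [(x, run_experiment(x)) for x in seed_points]
    candidates = [x for x in pool if x not in seed_points]
    for _ in range(budget):
        if not candidates:
            break
        best = max(candidates, key=lambda x: uncertainty(x, labeled))
        candidates.remove(best)
        # Only the selected sample incurs the experimental cost.
        labeled.append((best, run_experiment(best)))
    return labeled


def predict(x, labeled):
    """1-nearest-neighbour estimate built from the samples gathered so far."""
    return min(labeled, key=lambda item: abs(item[0] - x))[1]


pool = list(range(101))  # assumed discretised search space
labeled = active_learning(pool, seed_points=[0, 100], budget=8)
errors = sum(predict(x, labeled) != run_experiment(x) for x in pool)
print(f"{len(labeled)} experiments run, {errors} of {len(pool)} points misestimated")
```

The scoring rule prefers candidates whose nearest labeled neighbours disagree, so queries tend to concentrate near the boundary between outcomes, while the distance term keeps some exploration of unsampled regions. The final loop over the whole pool calls the oracle only to measure the sketch's accuracy; in a real campaign those values would be unknown.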


Author(s):  
Antonella Capriello ◽  
Piercarlo Rossi

With the advent of Web 2.0 technologies, online forms of communication are rich sources of data to study socio-economic growth patterns and consumer behaviours. In this research field, the more robust development of data mining and opinion monitoring depends on fully automating data collection to monitor the evolution of customer opinions and preferences in real time. Although web crawlers or spiders can assist researchers in an innovative and effective way, this data collection approach could give rise to ethical concerns about the cost of web crawling processes and about data protection and privacy. With a focus on opinion monitoring, the chapter aims to discuss the ethical and legal issues of data mining in relation to spidering scripts. This contribution proposes a detailed analysis of the ethical and legal aspects of online data collection by comparing existing legislation. For illustrative purposes, a piece of spidering software is presented to discuss its potential and explore ethical solutions in the data-mining sphere.


1983 ◽  
Vol 20 (4) ◽  
pp. 439-442 ◽  
Author(s):  
Frederick Wiseman ◽  
Marianne Schafer ◽  
Richard Schafer

The authors describe an experimental study designed to determine the effects of a monetary incentive on (1) a potential respondent's decision to participate in a central-location interview, (2) that person's expressed willingness to participate in a future survey, and (3) the cost of data collection.


2021 ◽  
Vol 11 (11) ◽  
pp. 5043
Author(s):  
Xi Chen ◽  
Bo Kang ◽  
Jefrey Lijffijt ◽  
Tijl De Bie

Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, the prediction of protein–protein interactions, and the identification of hidden relationships in a crime network. Several link prediction algorithms, notably those recently introduced using network embedding, are capable of doing this by just relying on the observed part of the network. Often, whether two nodes are linked can be queried, albeit at a substantial cost (e.g., by questionnaires, wet lab experiments, or undercover work). Such additional information can improve the link prediction accuracy, but owing to the cost, the queries must be made with due consideration. Thus, we argue that an active learning approach is of great potential interest and developed ALPINE (Active Link Prediction usIng Network Embedding), a framework that identifies the most useful link status by estimating the improvement in link prediction accuracy to be gained by querying it. We proposed several query strategies for use in combination with ALPINE, inspired by the optimal experimental design and active learning literature. Experimental results on real data not only showed that ALPINE was scalable and boosted link prediction accuracy with far fewer queries, but also shed light on the relative merits of the strategies, providing actionable guidance for practitioners.


2010 ◽  
Vol 40-41 ◽  
pp. 156-161 ◽  
Author(s):  
Yang Li ◽  
Yan Qiang Li ◽  
Zhi Xue Wang

With the rapid development of automotive electronic control units (ECUs), fault diagnosis is becoming increasingly complicated, and the link between fault and symptom is becoming less obvious. To improve maintenance quality and efficiency, the paper proposes a fault diagnosis approach based on data mining technologies. Making full use of the data stream, we first extract fault symptom vectors by processing the data stream, then build a diagnosis decision tree with the ID3 decision tree algorithm, and finally store the rules linking faults to their related symptoms in a historical fault database as a foundation for fault diagnosis. The database provides a basis for trend judgments about future faults. To verify this approach, an example of diagnosing faults in an entertainment ECU is shown. The test results confirm the reliability and validity of this diagnostic method, which reduces the cost of ECU diagnosis.
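The ID3 step mentioned in this abstract can be sketched as follows. ID3 splits on the attribute with the highest information gain and recurses on each subset. The symptom names, fault labels, and the six example rows are hypothetical and are not taken from the paper; they only show the mechanics on symptom vectors of this kind.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def id3(rows, attributes, target):
    """Build a nested-dict decision tree; leaves are fault labels."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {best: {value: id3([r for r in rows if r[best] == value], remaining, target)
                   for value in sorted({row[best] for row in rows})}}


def classify(tree, sample):
    """Follow the tree for one symptom vector (symptom values must be seen in training)."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][sample[attribute]]
    return tree


# Hypothetical symptom vectors for an entertainment ECU (illustrative only).
SYMPTOMS = ["can_timeout", "no_audio", "display_blank"]
ROWS = [
    {"can_timeout": "yes", "no_audio": "yes", "display_blank": "yes", "fault": "bus"},
    {"can_timeout": "yes", "no_audio": "no", "display_blank": "yes", "fault": "bus"},
    {"can_timeout": "no", "no_audio": "yes", "display_blank": "no", "fault": "amplifier"},
    {"can_timeout": "no", "no_audio": "yes", "display_blank": "no", "fault": "amplifier"},
    {"can_timeout": "no", "no_audio": "no", "display_blank": "yes", "fault": "display_driver"},
    {"can_timeout": "no", "no_audio": "no", "display_blank": "no", "fault": "no_fault"},
]

tree = id3(ROWS, SYMPTOMS, "fault")
print(tree)
print(classify(tree, {"can_timeout": "yes", "no_audio": "no", "display_blank": "yes"}))
```

Each root-to-leaf path of the resulting tree reads as one fault-symptom rule, which is the kind of rule the abstract describes storing in the historical fault database.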


BioResources ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. 4891-4904
Author(s):  
Selahattin Bardak ◽  
Timucin Bardak ◽  
Hüseyin Peker ◽  
Eser Sözen ◽  
Yildiz Çabuk

Wood materials have been used in many products such as furniture, stairs, windows, and doors for centuries. There are differences in methods used to adapt wood to ambient conditions. Impregnation is a widely used method of wood preservation. In terms of efficiency, it is critical to optimize the parameters for impregnation. Data mining techniques reduce most of the cost and operational challenges with accurate prediction in the wood industry. In this study, three data-mining algorithms were applied to predict bending strength in impregnated wood materials (Pinus sylvestris L. and Millettia laurentii). Models were created from real experimental data to examine the relationship between bending strength, diffusion time, vacuum duration, and wood type, based on decision trees (DT), random forest (RF), and Gaussian process (GP) algorithms. The highest bending strength was achieved with wenge (Millettia laurentii) wood at a 10 bar vacuum and a diffusion time of 25 min. The results showed that all algorithms are suitable for predicting bending strength. The goodness of fit for the testing phase was determined as 0.994, 0.986, and 0.989 in the DT, RF, and GP algorithms, respectively. Moreover, the importance of attributes was determined in the algorithms.


2019 ◽  
Vol 4 (2) ◽  
Author(s):  
Ahmad Jubaidi

The purpose of this study is to determine the effectiveness of KK, KTP, and AK services in Samarinda Kota sub-district and the factors influencing that effectiveness. The research used a field research method, which gives an overview of the effectiveness of KK, KTP, and AK services in Samarinda Kota sub-district. Data were collected through observation, interviews, and questionnaires, with informants selected for their technical and functional involvement in service delivery to the community. The data obtained were then analyzed qualitatively and supported by quantitative data. The results showed that service implementation in Samarinda Kota sub-district, especially in the field of population administration and civil registration, follows the established mechanisms and regulations, as judged by several service indicators: simplicity is in the "very safe" category at 6.67%; certainty of service procedure and of tariff cost are in the "in accordance" category at 88.33% and 70% respectively; security and convenience of facilities and infrastructure are in the "safe" and "comfortable" categories at 65% and 73.33% respectively; openness, in terms of ease of obtaining information and of service provisions, is in the "easy" and "explained if requested" categories at 71.67% and 63.33% respectively; economy, in terms of the cost of KK, ID cards, and AK, is in the Rp 10,000 to Rp 15,000 category; fairness has a value of 60%; timeliness is in the 1 to 2 days category; and efficiency is in the "exact" category at 80%. The factors that affect the service are apparatus resources, facilities and infrastructure, and public awareness. Keywords: Effectiveness, Public Service


2021 ◽  
Vol 2083 (3) ◽  
pp. 032023
Author(s):  
Le Zhang

Traditional data collection and review methods in power grid planning have long been time-consuming and cumbersome, with poor real-time performance. Applying mobile GIS addresses these problems and makes more efficient use of the data collected by mobile GIS terminals. Mobile GIS data also help resolve urgent problems that have emerged since mobile GIS came into widespread use. The system implements functions such as storage, transmission, and review based on mobile GIS data, which will greatly improve the efficiency of data collection by mobile terminals and reduce the cost of data collection. It also enables planning simulation of the power grid under the intelligent cycle of the whole scene.

