IP Geolocation through Reverse DNS

2022 ◽  
Vol 22 (1) ◽  
pp. 1-29
Author(s):  
Ovidiu Dan ◽  
Vaibhav Parikh ◽  
Brian D. Davison

IP Geolocation databases are widely used in online services to map end-user IP addresses to their geographical location. However, they use proprietary geolocation methods, and in some cases they have poor accuracy. We propose a systematic approach to use reverse DNS hostnames for geolocating IP addresses, with a focus on end-user IP addresses as opposed to router IPs. Our method is designed to be combined with other geolocation data sources. We cast the task as a machine learning problem where, for a given hostname, we first generate a list of potential location candidates, and then we classify each hostname and candidate pair using a binary classifier to determine which location candidates are plausible. Finally, we rank the remaining candidates by confidence (class probability) and break ties by population count. We evaluate our approach against three state-of-the-art academic baselines and two state-of-the-art commercial IP geolocation databases. We show that our work significantly outperforms the academic baselines and is complementary and competitive with commercial databases. To aid reproducibility, we open source our entire approach and make it available to the academic community.
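
The pipeline above can be made concrete with a short, hypothetical sketch of the candidate-generation and ranking steps; the toy gazetteer, the example hostname, and the stand-in classify() score are illustrative assumptions, not the authors' actual data, features, or trained model.

```python
# Hypothetical sketch of the candidate-generation / classification / ranking steps.
# The gazetteer, hostname, and scoring function are toy stand-ins, not the paper's
# actual features or trained classifier.

GAZETTEER = {
    # city name -> (latitude, longitude, population); toy values
    "seattle":  (47.61, -122.33, 737_015),
    "bellevue": (47.61, -122.20, 151_854),
    "portland": (45.52, -122.68, 652_503),
}

def candidates(hostname):
    """Generate location candidates by matching hostname tokens to the gazetteer."""
    tokens = hostname.lower().replace(".", "-").split("-")
    return [t for t in tokens if t in GAZETTEER]

def classify(hostname, candidate):
    """Stand-in for the binary classifier: probability that `candidate` is a
    plausible location for `hostname`.  A real model would use learned features
    (token position, abbreviation patterns, surrounding tokens, ...)."""
    return 1.0 if candidate in hostname.lower() else 0.0

def geolocate(hostname):
    """Rank candidates by classifier confidence; break ties by population."""
    scored = [(classify(hostname, c), GAZETTEER[c][2], c) for c in candidates(hostname)]
    scored.sort(reverse=True)
    return scored[0][2] if scored else None

print(geolocate("host-74-12.seattle.example-isp.net"))   # -> "seattle"
```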

Author(s):  
Chuang Zhang ◽  
Dexin Ren ◽  
Tongliang Liu ◽  
Jian Yang ◽  
Chen Gong

Positive and Unlabeled (PU) learning aims to learn a binary classifier from only positive and unlabeled training data. State-of-the-art methods usually formulate PU learning as a cost-sensitive learning problem, in which every unlabeled example is simultaneously treated as positive and negative with different class weights. However, the ground-truth label of an unlabeled example should be unique, so the existing models inadvertently introduce label noise, which may lead to a biased classifier and deteriorated performance. To solve this problem, this paper proposes a novel algorithm dubbed "Positive and Unlabeled learning with Label Disambiguation" (PULD). We first regard all the unlabeled examples in PU learning as ambiguously labeled as positive and negative, and then employ a margin-based label disambiguation strategy, which enlarges the margin of classifier response between the most likely label and the less likely one, to find the unique ground-truth label of each unlabeled example. Theoretically, we derive the generalization error bound of the proposed method by analyzing its Rademacher complexity. Experimentally, we conduct extensive experiments on both benchmark and real-world datasets, and the results clearly demonstrate the superiority of the proposed PULD over existing PU learning approaches.
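
As a rough, hedged illustration of the disambiguation idea only (not the authors' exact margin-based objective or its theoretical guarantees), the sketch below alternates between fitting a linear classifier and keeping, for each unlabeled example, whichever candidate label currently receives the larger classifier response. The synthetic Gaussian data and the logistic model are assumptions made for the example.

```python
# Minimal sketch: treat each unlabeled point as carrying both candidate labels,
# then iteratively keep the label with the larger classifier response.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pos = rng.normal(+2.0, size=(50, 2))               # labeled positives
X_unl = np.vstack([rng.normal(+2.0, size=(50, 2)),   # unlabeled mix of both classes
                   rng.normal(-2.0, size=(50, 2))])

X = np.vstack([X_pos, X_unl])
y = np.concatenate([np.ones(len(X_pos)),              # known positives
                    -np.ones(len(X_unl))])            # initial guess: negative

for _ in range(10):                                   # alternate: fit, then disambiguate
    clf = LogisticRegression().fit(X, y)
    scores = clf.decision_function(X_unl)
    y[len(X_pos):] = np.where(scores > 0, 1.0, -1.0)  # keep the more likely label

print("recovered positives among unlabeled:", int((y[len(X_pos):] == 1).sum()))
```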


Semantic Web ◽  
2021 ◽  
pp. 1-16
Author(s):  
Esko Ikkala ◽  
Eero Hyvönen ◽  
Heikki Rantala ◽  
Mikko Koho

This paper presents a new software framework, Sampo-UI, for developing user interfaces for semantic portals. The goal is to provide the end-user with multiple application perspectives on Linked Data knowledge graphs, and a two-step usage cycle based on faceted search combined with ready-to-use tooling for data analysis. For the software developer, the Sampo-UI framework makes it possible to create highly customizable, user-friendly, and responsive user interfaces using current state-of-the-art JavaScript libraries and data from SPARQL endpoints, while saving substantial coding effort. Sampo-UI is published on GitHub under the open MIT License and has been utilized in several internal and external projects. The framework has been used thus far in creating six published and five forthcoming portals, mostly related to the Cultural Heritage domain, that have had tens of thousands of end-users on the Web.
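
Sampo-UI itself is a JavaScript framework, but the kind of facet-count query such a portal sends to a SPARQL endpoint can be sketched briefly in Python; the endpoint URL and the class/property IRIs below are placeholders, not part of Sampo-UI's API.

```python
# Hedged illustration of a faceted-search value-count query against a SPARQL
# endpoint; the endpoint and IRIs are placeholders for this example.
import requests

ENDPOINT = "https://example.org/sparql"        # placeholder SPARQL endpoint
QUERY = """
SELECT ?place (COUNT(?item) AS ?count) WHERE {
  ?item a <http://example.org/schema/Artwork> ;       # placeholder class
        <http://example.org/schema/place> ?place .    # placeholder facet property
}
GROUP BY ?place
ORDER BY DESC(?count)
LIMIT 10
"""

resp = requests.get(ENDPOINT, params={"query": QUERY},
                    headers={"Accept": "application/sparql-results+json"})
for row in resp.json()["results"]["bindings"]:
    print(row["place"]["value"], row["count"]["value"])
```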


MRS Bulletin ◽  
1995 ◽  
Vol 20 (8) ◽  
pp. 40-48 ◽  
Author(s):  
J.H. Westbrook ◽  
J.G. Kaufman ◽  
F. Cverna

Over the past 30 years we have seen a strong but uncoordinated effort both to increase the availability of numeric materials-property data in electronic media and to make the resultant mass of data more readily accessible and searchable for the end-user engineer. For numeric property data inquiries, the end user is best able to formulate the question and to judge the utility of the answer, in contrast to textual or bibliographic data, for which information specialists can expeditiously carry out searches. Despite the best efforts of several major programs, there remains a shortfall with respect to comprehensiveness and a gap between the goal of easy access to all the world's numeric databases and what can presently be achieved. The task has proven thornier and therefore much more costly than anyone envisioned, and computer access to data for materials scientists and engineers is still inadequate compared, for example, to the situation for molecular biologists or astronomers. However, progress has been made. More than 100 materials databases are listed and categorized by Wawrousek et al. that address several types of applications, including fundamental research, materials selection, component design, process control, materials identification and equivalency, expert systems, and education. Standardization is improving and access has been made easier. In the discussion that follows, we will examine several characteristics of available information and delivery systems to assess their impact on the successes and limitations of the available products. The discussion will include the types and uses of the data, issues around data reliability and quality, the various formats in which data need to be accessed, and the various media available for delivery. Then we will focus on the state of the art by giving examples of the three major media through which broad electronic access to numeric properties has emerged: on-line systems, workstations, and disks, both floppy and CD-ROM. We will also point to resources for locating numeric property data.


2016 ◽  
Vol 2016 ◽  
pp. 1-10 ◽  
Author(s):  
Huaping Guo ◽  
Weimei Zhi ◽  
Hongbing Liu ◽  
Mingliang Xu

In recent years, the imbalanced learning problem has attracted increasing attention from both academia and industry; the problem concerns the performance of learning algorithms in the presence of data with severe class distribution skews. In this paper, we apply the well-known statistical model of logistic discrimination to this problem and propose a novel method to improve its performance. To fully account for the class imbalance, we design a new cost function which takes into account the accuracies of both the positive and negative classes as well as the precision of the positive class. Unlike traditional logistic discrimination, the proposed method learns its parameters by maximizing the proposed cost function. Experimental results show that, compared with other state-of-the-art methods, the proposed one performs significantly better on measures of recall, G-mean, F-measure, AUC, and accuracy.
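
As a hedged sketch of this idea (not the paper's exact cost function), the snippet below fits logistic-discrimination weights by maximizing a smooth surrogate that multiplies the soft per-class accuracies and the soft positive-class precision; the synthetic imbalanced data and the particular surrogate are assumptions made for illustration.

```python
# Illustrative only: fit logistic weights by maximizing a class-imbalance-aware
# surrogate (product of soft per-class accuracies and soft positive precision)
# instead of the usual log-likelihood.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(+1, 1, (20, 2)),     # minority (positive) class
               rng.normal(-1, 1, (200, 2))])   # majority (negative) class
y = np.concatenate([np.ones(20), np.zeros(200)])

def cost(w):
    z = np.clip(X @ w[:2] + w[2], -30, 30)
    p = 1.0 / (1.0 + np.exp(-z))                       # sigmoid scores
    tp, fp = np.sum(p * y), np.sum(p * (1 - y))
    acc_pos = tp / y.sum()                             # soft accuracy on positives
    acc_neg = np.sum((1 - p) * (1 - y)) / (1 - y).sum()  # soft accuracy on negatives
    prec_pos = tp / (tp + fp + 1e-9)                   # soft precision of positives
    return -(acc_pos * acc_neg * prec_pos)             # maximize the product

w = minimize(cost, x0=np.zeros(3)).x
print("learned weights:", np.round(w, 3))
```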


2022 ◽  
Vol 40 (2) ◽  
pp. 1-31
Author(s):  
Masoud Mansoury ◽  
Himan Abdollahpouri ◽  
Mykola Pechenizkiy ◽  
Bamshad Mobasher ◽  
Robin Burke

Fairness is a critical system-level objective in recommender systems that has been the subject of extensive recent research. A specific form of fairness is supplier exposure fairness, where the objective is to ensure equitable coverage of items across all suppliers in recommendations provided to users. This is especially important in multistakeholder recommendation scenarios where it may be important to optimize utilities not just for the end user but also for other stakeholders such as item sellers or producers who desire a fair representation of their items. This type of supplier fairness is sometimes accomplished by attempting to increase aggregate diversity to mitigate popularity bias and to improve the coverage of long-tail items in recommendations. In this article, we introduce FairMatch, a general graph-based algorithm that works as a post-processing approach after recommendation generation to improve exposure fairness for items and suppliers. The algorithm iteratively adds high-quality items that have low visibility, or items from suppliers with low exposure, to the users’ final recommendation lists. A comprehensive set of experiments on two datasets, with comparisons against state-of-the-art baselines, shows that FairMatch significantly improves exposure fairness and aggregate diversity while maintaining an acceptable level of relevance in the recommendations.
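
FairMatch itself is graph-based and iterative; the toy snippet below only illustrates the general post-processing idea of promoting low-exposure items into users' final lists after a base recommender has produced ranked candidates. The recommendation lists, scores, list size, and quality threshold are invented for the example.

```python
# Toy post-processing re-ranker: keep top items, then fill the last slot with the
# least-exposed acceptable candidate.  Not the FairMatch graph algorithm itself.
from collections import Counter

# hypothetical base recommendations: user -> list of (item, predicted score)
base = {
    "u1": [("i1", 0.90), ("i2", 0.80), ("i7", 0.70)],
    "u2": [("i1", 0.95), ("i3", 0.60), ("i8", 0.55)],
    "u3": [("i1", 0.70), ("i2", 0.65), ("i9", 0.60)],
}
K = 2                                               # final list size (assumed)

# how often each item appears across all users' candidate lists
exposure = Counter(item for recs in base.values() for item, _ in recs)

final = {}
for user, recs in base.items():
    head = [item for item, _ in recs[: K - 1]]      # keep the top K-1 by score
    tail = [item for item, s in recs[K - 1:] if s >= 0.5]  # still-acceptable quality
    tail.sort(key=lambda item: exposure[item])      # least-exposed candidate first
    final[user] = head + tail[:1]

print(final)
```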


2020 ◽  
Vol 34 (04) ◽  
pp. 3962-3969
Author(s):  
Evrard Garcelon ◽  
Mohammad Ghavamzadeh ◽  
Alessandro Lazaric ◽  
Matteo Pirotta

In many fields such as digital marketing, healthcare, finance, and robotics, it is common to have a well-tested and reliable baseline policy running in production (e.g., a recommender system). Nonetheless, the baseline policy is often suboptimal. In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better/optimal policy under the constraint that during the learning process the performance is almost never worse than the performance of the baseline itself. In this paper, we study the conservative learning problem in the contextual linear bandit setting and introduce a novel algorithm, the Conservative Constrained LinUCB (CLUCB2). We derive regret bounds for CLUCB2 that match existing results and empirically show that it outperforms state-of-the-art conservative bandit algorithms in a number of synthetic and real-world problems. Finally, we consider a more realistic constraint where the performance is verified only at predefined checkpoints (instead of at every step) and show how this relaxed constraint favorably impacts the regret and empirical performance of CLUCB2.
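
The snippet below is a simplified, non-contextual illustration of a conservative check in the spirit of this constraint (not CLUCB2 itself, which handles contextual linear payoffs): the optimistic arm is played only if a pessimistic estimate of cumulative reward stays above (1 - alpha) times what the known baseline would have earned. The Bernoulli arms, confidence widths, and horizon are assumptions for the toy example.

```python
# Toy conservative bandit: play the UCB arm only when a lower bound on cumulative
# reward stays above (1 - alpha) * t * baseline mean; otherwise play the baseline.
import numpy as np

rng = np.random.default_rng(2)
means = np.array([0.5, 0.6, 0.75])   # true (unknown) Bernoulli means; arm 0 is the baseline
mu0 = means[0]                       # baseline performance, assumed known to the learner
alpha, T = 0.1, 5000
pulls, rewards = np.zeros(3), np.zeros(3)

for t in range(1, T + 1):
    mean_hat = rewards / np.maximum(pulls, 1)
    width = np.where(pulls > 0, np.sqrt(np.log(t) / (2 * np.maximum(pulls, 1))), np.inf)
    ucb = mean_hat + width
    lcb = np.clip(mean_hat - width, 0.0, 1.0)
    arm = int(np.argmax(ucb))
    # conservative check: pessimistic cumulative reward after this pull must stay
    # above (1 - alpha) times what the baseline alone would have accumulated
    if pulls @ lcb + lcb[arm] < (1 - alpha) * t * mu0:
        arm = 0                      # not provably safe yet: fall back to the baseline
    r = float(rng.random() < means[arm])   # Bernoulli reward
    pulls[arm] += 1
    rewards[arm] += r

print("pulls per arm:", pulls.astype(int))
print("empirical means:", np.round(rewards / np.maximum(pulls, 1), 3))
```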


2021 ◽  
Vol 7 ◽  
pp. e661
Author(s):  
Raghad Baker Sadiq ◽  
Nurhizam Safie ◽  
Abdul Hadi Abd Rahman ◽  
Shidrokh Goudarzi

Organizations in various industries have widely adopted artificial intelligence (AI) maturity models as a systematic approach to assessing their AI capabilities. This study systematically reviews state-of-the-art studies related to AI maturity models. It allows a deeper understanding of the methodological issues relevant to maturity models, especially in terms of the objectives, the methods employed to develop and validate the models, and the scope and characteristics of maturity model development. Our analysis reveals that most works concentrate on developing maturity models with or without empirical validation. It shows that the most significant proportion of models were designed for specific domains and purposes. Maturity model development typically uses a bottom-up design approach, and most of the models have a descriptive characteristic. In addition, the maturity grid and a continuous representation with five levels are currently the prevailing choices in maturity model development. Six of the 13 studies on AI maturity (46%) assess the technology aspect, even in specific domains. This confirms that organizations still need to improve their AI capability and strengthen their AI maturity. This review provides an essential contribution to the evolution of organizations using AI by explaining the concepts, approaches, and elements of maturity models.


2018 ◽  
Vol 8 (12) ◽  
pp. 2512 ◽  
Author(s):  
Ghouthi Boukli Hacene ◽  
Vincent Gripon ◽  
Nicolas Farrugia ◽  
Matthieu Arzel ◽  
Michel Jezequel

Deep learning-based methods have reached state-of-the-art performance, relying on a large quantity of available data and computational power. Such methods remain ill-suited, however, to a major open machine learning problem: incrementally learning new classes and examples over time. Combining the outstanding performance of Deep Neural Networks (DNNs) with the flexibility of incremental learning techniques is a promising avenue of research. In this contribution, we introduce Transfer Incremental Learning using Data Augmentation (TILDA). TILDA is based on pre-trained DNNs as feature extractors, robust selection of feature vectors in subspaces using a nearest-class-mean based technique, majority voting, and data augmentation at both the training and the prediction stages. Experiments on challenging vision datasets demonstrate the ability of the proposed method to perform low-complexity incremental learning, while achieving significantly better accuracy than existing incremental counterparts.
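
A minimal sketch of the nearest-class-mean ingredient, assuming pre-extracted feature vectors (random stand-ins below instead of DNN features): classes and examples can be added incrementally by updating running sums, with no retraining. TILDA's subspace splitting, majority voting, and data augmentation are not reproduced here.

```python
# Incremental nearest-class-mean classifier over fixed feature vectors.
import numpy as np

class IncrementalNCM:
    def __init__(self, dim):
        self.dim = dim
        self.sums, self.counts = {}, {}   # per-class running sum and count

    def partial_fit(self, x, label):
        # adding a class or an example is O(dim): no retraining, no stored exemplars
        self.sums.setdefault(label, np.zeros(self.dim))
        self.sums[label] += x
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, x):
        # assign the class whose mean feature vector is closest to x
        return min(self.sums,
                   key=lambda c: np.linalg.norm(x - self.sums[c] / self.counts[c]))

rng = np.random.default_rng(3)
ncm = IncrementalNCM(dim=64)
for label, center in [("cat", 0.0), ("dog", 1.0), ("bird", -1.0)]:  # classes arrive over time
    for _ in range(20):
        ncm.partial_fit(center + 0.3 * rng.standard_normal(64), label)

print(ncm.predict(1.0 + 0.3 * rng.standard_normal(64)))   # -> most likely "dog"
```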


Author(s):  
Prayag Tiwari ◽  
Massimo Melucci

Machine Learning (ML) helps us recognize patterns in raw data. ML is used in numerous domains, e.g., biomedicine, agriculture, and food technology. Despite recent technological advancements, there is still room for substantial improvement in prediction. Current ML models are based on classical theories of probability and statistics, which can now be replaced by Quantum Theory (QT) with the aim of improving the effectiveness of ML. In this paper, we propose the Binary Classifier Inspired by Quantum Theory (BCIQT) model, which outperforms state-of-the-art classifiers in terms of recall for every category.


Entropy ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. 1143
Author(s):  
Zhenwu Wang ◽  
Tielin Wang ◽  
Benting Wan ◽  
Mengjie Han

Multi-label classification (MLC) is a supervised learning problem in which an object is naturally associated with multiple concepts because it can be described from various dimensions. How to exploit the resulting label correlations is the key issue in MLC problems. The classifier chain (CC) is a well-known MLC approach that can learn complex coupling relationships between labels, but it suffers from two obvious drawbacks: (1) the label ordering is decided at random, although it usually has a strong effect on predictive performance; (2) all the labels are inserted into the chain, although some of them may carry information that is irrelevant and harmful to predicting the others. In this work, we propose a partial classifier chain method with feature selection (PCC-FS) that exploits the correlations between the label and feature spaces and thus solves the two previously mentioned problems simultaneously. In the PCC-FS algorithm, feature selection is performed by learning the covariance between the feature set and the label set, thus eliminating the irrelevant features that can diminish classification performance. Couplings in the label set are extracted, and the coupled labels of each label are inserted simultaneously into the chain structure to execute the training and prediction activities. Experimental results on five metrics demonstrate that, in comparison to eight state-of-the-art MLC algorithms, the proposed method significantly improves on existing multi-label classification approaches.
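
For context, the classifier-chain baseline that PCC-FS extends can be run in a few lines with scikit-learn; this is just the standard CC setup on synthetic multi-label data, not an implementation of PCC-FS's feature selection or partial-chain construction.

```python
# Standard classifier chain on toy multi-label data (baseline that CC-style
# methods such as PCC-FS build on).
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

# toy multi-label data: 5 labels per instance
X, Y = make_multilabel_classification(n_samples=300, n_features=20,
                                      n_classes=5, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# each label's classifier also receives the predictions of the labels that
# precede it in the (randomly ordered) chain
chain = ClassifierChain(LogisticRegression(max_iter=1000),
                        order="random", random_state=0)
chain.fit(X_tr, Y_tr)

print("subset accuracy:", accuracy_score(Y_te, chain.predict(X_te)))
```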

