binary classifier
Recently Published Documents

TOTAL DOCUMENTS: 167 (five years: 72)
H-INDEX: 13 (five years: 4)

2022 · Vol 22 (1) · pp. 1-29
Author(s): Ovidiu Dan, Vaibhav Parikh, Brian D. Davison

IP geolocation databases are widely used in online services to map end-user IP addresses to their geographical location. However, they use proprietary geolocation methods and in some cases have poor accuracy. We propose a systematic approach that uses reverse DNS hostnames to geolocate IP addresses, with a focus on end-user IP addresses as opposed to router IPs. Our method is designed to be combined with other geolocation data sources. We cast the task as a machine learning problem: for a given hostname, we first generate a list of potential location candidates, then classify each hostname-candidate pair with a binary classifier to determine which location candidates are plausible. Finally, we rank the remaining candidates by confidence (class probability) and break ties by population count. We evaluate our approach against three state-of-the-art academic baselines and two state-of-the-art commercial IP geolocation databases. We show that our work significantly outperforms the academic baselines and is complementary and competitive with commercial databases. To aid reproducibility, we open-source our entire approach and make it available to the academic community.
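
As a rough illustration of the candidate-ranking step this abstract describes, here is a minimal Python sketch. It is not the authors' open-sourced code; the Candidate structure, its feature vector, and the fitted classifier `clf` are all assumptions.

```python
# Hypothetical sketch of the candidate-ranking step; `clf` is any fitted
# scikit-learn-style binary classifier, and the Candidate fields are
# placeholders for the paper's hostname/location features.
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    name: str               # candidate location, e.g. a city
    population: int         # used only to break ties
    features: List[float]   # features of the (hostname, candidate) pair

def rank_candidates(clf, candidates, threshold=0.5):
    """Keep candidates the classifier deems plausible, rank them by class
    probability, and break ties by population count."""
    scored = []
    for c in candidates:
        p_plausible = clf.predict_proba([c.features])[0][1]
        if p_plausible >= threshold:
            scored.append((p_plausible, c.population, c.name))
    scored.sort(reverse=True)  # confidence first, then population
    return [name for _, _, name in scored]
```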


2022 · Vol 12 (1) · pp. 94
Author(s): K. K. Mujeeb Rahman, M. Monica Subashini

Autism spectrum disorder (ASD) is a complicated neurological developmental disorder that manifests itself in a variety of ways. Early diagnosis and appropriate medical intervention can dramatically improve the daily lives of children diagnosed with ASD and of their parents. This study investigates the applicability of static features extracted from face photographs of autistic children as a biomarker to distinguish them from typically developing children. We used five pre-trained CNN models (MobileNet, Xception, EfficientNetB0, EfficientNetB1, and EfficientNetB2) as feature extractors and a DNN model as a binary classifier to accurately identify autism in children. We trained the proposed models on a publicly available dataset consisting of face images of children, labeled as autistic and non-autistic. The Xception model outperformed the others, with an AUC of 96.63%, a sensitivity of 88.46%, and an NPV of 88%. EfficientNetB0 produced a consistent prediction score of 59% for the autistic and non-autistic groups at a 95% confidence level.
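
A minimal sketch of the kind of pipeline this abstract describes, using a frozen ImageNet-pretrained Xception network as a feature extractor feeding a small DNN binary classifier. The layer sizes, dropout rate, and optimizer are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch, assuming a frozen ImageNet-pretrained Xception backbone;
# layer sizes and hyperparameters are illustrative, not the paper's.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import Xception

base = Xception(weights="imagenet", include_top=False,
                input_shape=(299, 299, 3), pooling="avg")
base.trainable = False  # use the CNN purely as a feature extractor

model = models.Sequential([
    base,
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # autistic vs. non-autistic
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
```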


Author(s): Banghee So, Emiliano A. Valdez

Classification predictive modeling involves the accurate assignment of observations in a dataset to target classes or categories. Real-world classification problems with severely imbalanced class distributions are increasingly common. In such problems, the minority classes have far fewer observations to learn from than the majority classes. Despite this sparsity, a minority class is often the more interesting class, yet developing a learning algorithm well suited to such observations presents countless challenges. In this article, we propose a novel multi-class classification algorithm, which we refer to as SAMME.C2, specialized to handle severely imbalanced classes. It blends the flexible boosting mechanics of the SAMME algorithm, a multi-class classifier, with those of the Ada.C2 algorithm, a cost-sensitive binary classifier designed to address high class imbalance. Not only do we provide the resulting algorithm, but we also establish a scientific and statistical formulation of the proposed SAMME.C2 algorithm. Through numerical experiments examining various degrees of classification difficulty, we demonstrate the consistently superior performance of our proposed model.
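
The following sketch illustrates only the general idea of combining SAMME-style multi-class boosting with an Ada.C2-style cost-sensitive weight update. It is not the authors' SAMME.C2 implementation; the base learner, cost vector, and stopping rule are assumptions.

```python
# Illustrative sketch only: a SAMME-style multi-class boosting loop with a
# cost-sensitive weight update in the spirit of Ada.C2. Not the authors'
# SAMME.C2; base learner, costs, and stopping rule are assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cost_sensitive_samme(X, y, costs, n_rounds=50):
    """costs[i] > 1 makes errors on observation i more expensive, e.g. for
    minority-class observations."""
    n, K = len(y), len(np.unique(y))
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(n_rounds):
        tree = DecisionTreeClassifier(max_depth=1)
        tree.fit(X, y, sample_weight=w)
        miss = tree.predict(X) != y
        err = np.sum(w * miss) / np.sum(w)
        if err >= 1.0 - 1.0 / K:   # no better than chance: stop early
            break
        alpha = np.log((1.0 - err) / max(err, 1e-10)) + np.log(K - 1.0)
        w *= np.exp(alpha * miss * costs)  # up-weight costly mistakes
        w /= w.sum()
        learners.append(tree)
        alphas.append(alpha)
    return learners, alphas
```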


2021 · Vol 13 (1)
Author(s): Shashvat Prakash, Antoni Brzoska

Component failures in complex systems are often expensive. The loss of operation time is compounded by the costs of emergency repairs, excess labor, and compensation to aggrieved customers. Prognostic health management presents a viable option when the failure onset is observable and the mitigation plan actionable. As data-driven approaches become more favorable, success has been measured in many ways, from basic outcomes, i.e. whether the costs justify the prognostic, to more nuanced detection tests. Prognostic models, likewise, run the gamut from purely physics-based to statistically inferred. Preserving some physics has merit, as it is the source of justification for removing a fully functioning component. However, the method for evaluating competing strategies and optimizing for performance has been inconsistent.

One common approach relies on the binary classifier construct, which compares two prediction states (alert or no alert) with two actual states (failure or no failure). A model alert is a positive; true positives are followed by actual failures and false positives are not. False negatives are failures that occur without any alert, and true negatives complete the table, indicating no alert and no failure. Derivatives of the binary classifier include concepts like precision, i.e. the fraction of alerts which are true positives, and recall, the fraction of events which are preceded by an alert. Precision and recall are useful in determining whether an alert can be trusted (precision) and how many failures it can catch (recall). Other analyses recognize that the underlying sensor signal is continuous, so the alerts will change along with the threshold. For instance, a more extreme threshold will result in fewer alerts and therefore more precision, at the cost of some recall. These types of tradeoff studies have produced the receiver operating characteristic (ROC) curve.

A few ambiguities persist when we apply the binary classifier construct to continuous signals. First, there is no time axis. When does an alert transition from prescriptive to low-value or nuisance? Second, there is no consideration of the nascent information contained in the underlying continuous signal. Instead, it is reduced to alerts via a discrimination threshold.

Fundamentally, prognostic health management is the detection of precursors. Failures which can be prognosticated are necessarily the result of wear-out modes. Whether the wear-out is detectable and trackable is a system observability issue. Observability is a concept rooted in signal processing and controls: a system is considered observable if its internal state can be estimated using only the sensor information. In a prognostic application, sensor signals intended to detect wear will also contain some amount of noise. In this case, noise is anything that is not the wear-out mode, encompassing everything from random variations of the signal to situations where the detection is intermittent or inconsistent. Hence, processing the raw sensor signal to maximize the wear-out precursors and minimize noise will benefit the detection before thresholds are applied. The proposed solution is a filter tuned to maximize detection of the wear-out mode. The evaluation of the filter is crucial, because it is also the evaluation of the entire prognostic. The problem statement thus transforms from a binary classifier to the detection of a discrete event using a continuous signal.

Now we can incorporate the time dimension and require a minimum lead time between a prognostic alert and the event. Filter evaluation is fundamentally performance evaluation for the prognostic detection. First, we aggregate the filtered values in a prescribed lead interval of n samples before each event. The lead traces are averaged so that there is one characteristic averaged behavior before an event. In this characteristic trace, we can consider the value at some critical actionable time, t_ac, before the event, after which there is insufficient time to act on the alert. The filtered signal value at this critical time should be anomalous, i.e. far from its mean value. Further, the filtered value in the interval preceding t_ac should transition from near-average to anomalous. Both the signal value at t_ac and the filtered signal behavior up to that point provide independent evaluation metrics. These frame the prognostic detection problem as it should be stated: a continuous signal detecting a discrete event, rather than a binary classifier. A strong anomaly in the signal that precedes events on an aggregated basis is the alternative performance metric. If only a subset of events shows an anomaly, the detectable failure mode is unique to those events, and the performance can be evaluated accordingly. Thresholding is the final step, once the detection is optimized. The threshold need not be ambiguous at this step: the aggregated trace will indicate clearly which threshold provides the most value.
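
A minimal sketch of the aggregation step outlined above: collect the filtered signal over a fixed lead window before each event, average the traces, and measure how anomalous the characteristic trace is at the critical actionable time t_ac. The window length and the z-score anomaly measure are illustrative assumptions.

```python
# Sketch under stated assumptions: fixed-length lead windows and a z-score
# anomaly measure relative to the whole filtered signal.
import numpy as np

def characteristic_lead_trace(filtered, event_indices, n_lead):
    """Average the filtered signal over the n_lead samples preceding each
    event, yielding one characteristic pre-event trace."""
    traces = [filtered[i - n_lead:i] for i in event_indices if i >= n_lead]
    return np.mean(traces, axis=0)

def anomaly_at_t_ac(filtered, trace, t_ac):
    """Z-score of the characteristic trace t_ac samples before the event,
    relative to the overall signal statistics."""
    value = trace[len(trace) - t_ac]
    return (value - filtered.mean()) / filtered.std()
```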


2021
Author(s): Giorgio Gnecco, Federico Nutarelli, Massimo Riccaboni

This work applies Matrix Completion (MC), a class of machine-learning methods commonly used in the context of recommendation systems, to analyze economic complexity. MC is applied to reconstruct the Revealed Comparative Advantage (RCA) matrix, whose elements express the relative advantage of countries in given classes of products, as evidenced by yearly trade flows. A high-accuracy binary classifier is derived from the MC application, with the aim of discriminating between elements of the RCA matrix that are, respectively, higher and lower than one. We introduce a novel Matrix cOmpletion iNdex of Economic complexitY (MONEY) based on MC and related to the degree of predictability of the RCA entries of different countries (the lower the predictability, the higher the complexity). Unlike previously developed economic complexity indices, MONEY takes into account several singular vectors of the matrix reconstructed by MC, whereas other indices are based on only one or two eigenvectors of a suitable symmetric matrix derived from the RCA matrix. Finally, MC is compared with a state-of-the-art economic complexity index (GENEPY), showing that the per-country false positive rate of a binary classifier constructed from the average entry-wise output of MC is a proxy for GENEPY.
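
As a toy illustration of MC-based entry classification, the sketch below reconstructs a partially observed matrix by iterative truncated-SVD (hard) imputation and classifies entries as above or below one. The rank, iteration count, and imputation scheme are assumptions; the paper's MC method and the MONEY index itself are not reproduced.

```python
# Toy hard-impute sketch; rank and iteration count are assumptions, and
# this is not the paper's MC method or the MONEY index.
import numpy as np

def complete_matrix(M, observed, rank=5, n_iter=100):
    """M: RCA matrix with arbitrary values where unobserved;
    observed: boolean mask, True where the entry is known."""
    X = np.where(observed, M, M[observed].mean())  # initialize gaps
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(observed, M, low_rank)  # keep known entries fixed
    return low_rank

# Entry-wise binary classification: comparative advantage iff RCA >= 1.
# advantage = complete_matrix(rca, observed_mask) >= 1.0
```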


Geosciences · 2021 · Vol 11 (11) · pp. 469
Author(s): Giacomo Titti, Cees van Westen, Lisa Borgatti, Alessandro Pasuto, Luigi Lombardo

Mapping existing landslides is a fundamental prerequisite for building any reliable susceptibility model. From a series of landslide presence/absence conditions and the associated landscape characteristics, a binary classifier learns how to distinguish potentially stable from unstable slopes. Even in data-rich areas where landslide inventories are available, collecting these can already be a challenging task. In data-scarce contexts, where geoscientists have no access to pre-existing inventories, the only solution is to map landslides from scratch. This operation is extremely time-consuming if performed manually, and prone to type I errors if done automatically; both problems are exacerbated over large geographic regions. In this manuscript we examine the mapping requirements for west Tajikistan, where no complete landslide inventory is available. The key question is: how many landslides are required to develop a reliable landslide susceptibility model based on statistical modeling? For such a wide and extremely complex territory, the collection of a sufficiently detailed inventory requires a large investment in time and human resources. At which point of the mapping procedure would the resulting susceptibility model produce significantly better results than a model built with less information? We addressed this question by implementing a binomial Generalized Additive Model trained and validated with different landslide proportions, and measured the induced variability in the resulting susceptibility model. The results of this study are site-specific, but we propose a functional protocol for investigating a problem that is underestimated in the literature.
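
A minimal sketch of the modeling step described here: fit a binomial (logistic) GAM on landslide presence/absence using a chosen proportion of the mapped landslides. pyGAM is one possible implementation; the covariates, the three-smooth formula, and the subsampling scheme are assumptions, not the authors' setup.

```python
# Minimal sketch, assuming three continuous covariates and pyGAM as the
# GAM implementation; the subsampling scheme is an assumption.
import numpy as np
from pygam import LogisticGAM, s

def fit_susceptibility(X, y, proportion, seed=0):
    """X: landscape covariates (e.g. slope, relief, distance to faults);
    y: 1 = mapped landslide presence, 0 = absence. Train on a random
    fraction of the presences to mimic an incomplete inventory."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    keep = rng.choice(pos, size=int(proportion * len(pos)), replace=False)
    idx = np.concatenate([keep, np.flatnonzero(y == 0)])
    gam = LogisticGAM(s(0) + s(1) + s(2))  # one smooth term per covariate
    return gam.fit(X[idx], y[idx])
```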


2021 · Vol 11 (21) · pp. 10268
Author(s): Parag Verma, Ankur Dumka, Rajesh Singh, Alaknanda Ashok, Anita Gehlot, ...

The Internet of Things (IoT) has gained significant importance due to its applicability in diverse environments, and its influence is further aided by a flexible and scalable framework. The extensive and diversified use of the IoT in the past few years has attracted cyber-criminals, who exploit the vulnerabilities of the open-source IoT framework in the absence of robust, standardized security protocols, discouraging existing and potential stakeholders. The authors propose a binary classifier approach, developed from a machine learning ensemble method, to filter and drop malicious traffic and thereby prevent malicious actors from accessing the IoT network and its peripherals. A gradient boosting machine (GBM) ensemble is used to train the binary classifier on pre-processed recorded data packets, detecting anomalies and protecting IoT networks from zero-day attacks. The positive-class performance metrics of the model were an accuracy of 98.27%, a precision of 96.40%, and a recall of 95.70%. The simulation results demonstrate the effectiveness of the proposed model against cyber threats, making it suitable for critical IoT applications.
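
A sketch of the kind of detection model this abstract describes: a gradient boosting binary classifier trained on pre-processed packet features and evaluated with the positive-class metrics the authors report. The scikit-learn estimator, feature set, and hyperparameters are illustrative assumptions.

```python
# Sketch with scikit-learn's gradient boosting; the feature set and
# hyperparameters are illustrative assumptions.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

def train_traffic_filter(X, y):
    """X: pre-processed packet features; y: 1 = malicious, 0 = benign."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    gbm = GradientBoostingClassifier(n_estimators=300, learning_rate=0.1)
    gbm.fit(X_tr, y_tr)
    pred = gbm.predict(X_te)
    return gbm, {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred),
        "recall": recall_score(y_te, pred),
    }
```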


2021 · Vol 0 (0)
Author(s): Christian Porschen, Ralf Schmitz, Rene Schmidt, Kathrin Oelmeier, Kerstin Hammer, ...

Objectives: The aim of this study was to compare the second-trimester thymus-thorax ratio (TTR) between fetuses born preterm (study group) and those born after 37 completed weeks of gestation (control group). Methods: This study was conducted as a retrospective evaluation of ultrasound images of 492 fetuses in the three-vessel view. The TTR was defined as the quotient of the anteroposterior (a.p.) thymus diameter and the a.p. thoracic diameter. Results: Fetuses born preterm showed a larger TTR (p<0.001) in the second trimester than those born after 37 completed weeks of gestation. A binary classifier based on TTR predicted preterm birth (PTB) with a sensitivity of 0.792 and a specificity of 0.552. Conclusions: In our study, fetuses affected by PTB showed an enlarged thymus. These findings led us to hypothesize that inflammation and immunomodulatory processes are altered early in pregnancies affected by PTB. However, TTR alone cannot predict PTB.
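
For illustration, a binary classifier of the kind evaluated here can be built by thresholding the TTR; sensitivity and specificity then follow from the confusion table. The cutoff value is an assumption, as the abstract does not state one.

```python
# Sketch; the cutoff is an assumption, since the abstract does not state it.
import numpy as np

def ttr_classifier_metrics(ttr, preterm, cutoff):
    """ttr: TTR per fetus; preterm: 1 if born preterm, else 0. A larger TTR
    is associated with PTB, so predict PTB when TTR >= cutoff."""
    pred = ttr >= cutoff
    tp = np.sum(pred & (preterm == 1))
    fn = np.sum(~pred & (preterm == 1))
    tn = np.sum(~pred & (preterm == 0))
    fp = np.sum(pred & (preterm == 0))
    return tp / (tp + fn), tn / (tn + fp)  # sensitivity, specificity
```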

