Shared Data Set for Free-Text Keystroke Dynamics Authentication Algorithms

Author(s):  
Augustin-Catalin Iapa ◽  
Vladimir-Ioan Cretu

Identifying or authenticating a computer user is a necessary step to keep networked systems secure and to prevent fraudulent users from accessing accounts. Keystroke dynamics authentication can be used as an additional authentication method. Keystroke dynamics involves in-depth analysis of how a person types on the keyboard, such as how long a key is held down and the time between two consecutive keystrokes. The field has seen continuous growth in scientific research; in the last five years alone, about 10,000 studies in this area have been published. One of the main problems facing researchers is the small number of public data sets that capture how users type. This paper provides researchers with a data set of free text typed on the keyboard by 80 users. The data were collected in a single session via a web platform. The data set contains 410,633 key events collected over a total time interval of almost 24 hours. In similar research, most data sets contain texts written in English; the texts in this data set were written in Romanian. The paper also provides an extensive analysis of the collected data set and presents information relevant to its analysis in future research.
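As a minimal sketch of the two timing features the abstract mentions (how long a key is held, and the time between consecutive keys), the snippet below derives them from raw key events. The event format and field names are assumptions for illustration, not the format of the published data set.

```python
# Minimal sketch: deriving hold (dwell) time and inter-key (flight) time from
# raw key events. The (key, press_ms, release_ms) layout is an assumption,
# not the schema of the published data set.

def keystroke_features(events):
    """events: list of (key, press_ms, release_ms) tuples, ordered by press time."""
    dwell_times = []   # how long each key is held down
    flight_times = []  # time between releasing one key and pressing the next
    for i, (key, press, release) in enumerate(events):
        dwell_times.append(release - press)
        if i > 0:
            prev_release = events[i - 1][2]
            flight_times.append(press - prev_release)
    return dwell_times, flight_times

if __name__ == "__main__":
    sample = [("a", 0, 95), ("n", 180, 260), ("a", 340, 430)]
    dwell, flight = keystroke_features(sample)
    print(dwell)   # [95, 80, 90]
    print(flight)  # [85, 80]
```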

2003 ◽  
Vol 127 (6) ◽  
pp. 680-686 ◽  
Author(s):  
Jules J. Berman

Abstract Context.—In the normal course of activity, pathologists create and archive immense data sets of scientifically valuable information. Researchers need pathology-based data sets, annotated with clinical information and linked to archived tissues, to discover and validate new diagnostic tests and therapies. Pathology records can be used for research purposes (without obtaining informed patient consent for each use of each record), provided the data are rendered harmless. Large data sets can be made harmless through 3 computational steps: (1) deidentification, the removal or modification of data fields that can be used to identify a patient (name, social security number, etc); (2) rendering the data ambiguous, ensuring that every data record in a public data set has a nonunique set of characterizing data; and (3) data scrubbing, the removal or transformation of words in free text that can be used to identify persons or that contain information that is incriminating or otherwise private. This article addresses the problem of data scrubbing. Objective.—To design and implement a general algorithm that scrubs pathology free text, removing all identifying or private information. Methods.—The Concept-Match algorithm steps through confidential text. When a medical term matching a standard nomenclature term is encountered, the term is replaced by a nomenclature code and a synonym for the original term. When a high-frequency “stop” word, such as a, an, the, or for, is encountered, it is left in place. When any other word is encountered, it is blocked and replaced by asterisks. This produces a scrubbed text. An open-source implementation of the algorithm is freely available. Results.—The Concept-Match scrub method transformed pathology free text into scrubbed output that preserved the sense of the original sentences, while it blocked terms that did not match terms found in the Unified Medical Language System (UMLS). The scrubbed product is safe, in the restricted sense that the output retains only standard medical terms. The software implementation scrubbed more than half a million surgical pathology report phrases in less than an hour. Conclusions.—Computerized scrubbing can render the textual portion of a pathology report harmless for research purposes. Scrubbing and deidentification methods allow pathologists to create and use large pathology databases to conduct medical research.
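The Methods section describes a three-way rule: keep high-frequency stop words, replace recognized nomenclature terms with a code plus a synonym, and block everything else with asterisks. The sketch below illustrates that rule only; the tiny dictionary and its codes are placeholders, not real UMLS content or the authors' open-source implementation.

```python
# Toy sketch of the Concept-Match scrubbing rule described in the Methods.
# The nomenclature dictionary and codes below are placeholders, not real
# UMLS identifiers.

STOP_WORDS = {"a", "an", "the", "for", "of", "with", "and", "in"}

# hypothetical term -> (code, synonym) entries
NOMENCLATURE = {
    "adenocarcinoma": ("C0000001", "glandular carcinoma"),
    "colon": ("C0000002", "large intestine"),
}

def concept_match_scrub(text):
    scrubbed = []
    for word in text.lower().split():
        token = word.strip(".,;:")
        if token in STOP_WORDS:
            scrubbed.append(token)                      # keep stop words
        elif token in NOMENCLATURE:
            code, synonym = NOMENCLATURE[token]
            scrubbed.append(f"{code} ({synonym})")      # replace known terms
        else:
            scrubbed.append("*" * len(token))           # block everything else
    return " ".join(scrubbed)

print(concept_match_scrub("Adenocarcinoma of the colon in John Smith"))
# C0000001 (glandular carcinoma) of the C0000002 (large intestine) in **** *****
```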


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiawei Lian ◽  
Junhong He ◽  
Yun Niu ◽  
Tianze Wang

Purpose Current popular image processing technologies based on convolutional neural networks involve heavy computation, high storage cost and low accuracy for tiny defect detection, which conflicts with the high real-time performance, high accuracy and limited computing and storage resources required by industrial applications. Therefore, an improved YOLOv4, named YOLOv4-Defect, is proposed to solve these problems. Design/methodology/approach On the one hand, this study performs multi-dimensional compression on the feature extraction network of YOLOv4 to simplify the model and improves the model's feature extraction ability through knowledge distillation. On the other hand, a prediction scale with a finer receptive field is added to optimize the model structure, which improves detection performance for tiny defects. Findings The effectiveness of the method is verified on the public data sets NEU-CLS and DAGM 2007 and on a steel ingot data set collected in an actual industrial setting. The experimental results demonstrate that the proposed YOLOv4-Defect method greatly improves recognition efficiency and accuracy while reducing the size and computation consumption of the model. Originality/value This paper proposes an improved YOLOv4, named YOLOv4-Defect, for surface defect detection; it is suitable for industrial scenarios with limited storage and computing resources and meets requirements for high real-time performance and precision.
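The abstract mentions compressing the backbone and recovering accuracy through knowledge distillation. The snippet below is a generic Hinton-style soft-target distillation loss, shown only to illustrate the idea of training a compressed "student" network against the original "teacher"; it is not the authors' YOLOv4-Defect training code.

```python
# Generic soft-target knowledge-distillation loss: KL divergence between the
# temperature-softened teacher and student output distributions. Illustrative
# only; not the authors' implementation.
import numpy as np

def softmax(z, temperature=1.0):
    z = np.asarray(z, dtype=float) / temperature
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened class distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return float(np.sum(p_teacher * np.log(p_teacher / p_student)))

print(distillation_loss([2.0, 0.5, 0.1], [2.5, 0.3, 0.2]))
```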


Author(s):  
Todd D. Jack ◽  
Carl N. Ford ◽  
Shari-Beth Nadell ◽  
Vicki Crisp

A causal analysis of aviation accidents by engine type is presented. The analysis employs a top-down methodology that performs a detailed analysis of the causes and factors cited in accident reports to develop a “fingerprint” profile for each engine type. This is followed by an in-depth analysis of each fingerprint that produces a sequential breakdown. Analysis results of National Transportation Safety Board (NTSB) accidents, both fatal and non-fatal, that occurred during the time period of 1990–1998 are presented. Each data set is comprised of all accidents that involved aircraft with the following engine types: turbofan, turbojet, turboprop, and turboshaft (includes turbine helicopters). During this time frame there were 1461 accidents involving turbine powered aircraft; 306 of these involved propulsion malfunctions and/or failures. Analyses are performed to investigate the sequential relationships between propulsion system malfunctions or failures with other causes and factors for each engine type. Other malfunctions or events prominent within each data set are also analyzed. Significant trends are identified. The results from this study can be used to identify areas for future research into intervention, prevention, and mitigation strategies.
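A minimal sketch of the "fingerprint" idea described here is to tally how often each cited cause or factor appears among the accidents for a given engine type. The record layout below is hypothetical and does not reflect the NTSB report schema.

```python
# Sketch of building a cause/factor "fingerprint" per engine type by tallying
# cited causes. The (engine_type, [cause, ...]) record layout is hypothetical.
from collections import Counter, defaultdict

def fingerprints(accidents):
    """accidents: iterable of (engine_type, [cited cause or factor, ...]) tuples."""
    profile = defaultdict(Counter)
    for engine_type, causes in accidents:
        profile[engine_type].update(causes)
    return profile

data = [
    ("turbofan", ["propulsion failure", "maintenance"]),
    ("turbofan", ["weather"]),
    ("turboprop", ["propulsion failure", "pilot decision"]),
]
for engine, counts in fingerprints(data).items():
    print(engine, counts.most_common(2))
```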


2011 ◽  
pp. 24-32 ◽  
Author(s):  
Nicoleta Rogovschi ◽  
Mustapha Lebbah ◽  
Younès Bennani

Most traditional clustering algorithms are limited to data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for clustering, analysis and visualization of mixed (continuous/binary) data. The weights and prototypes are learned simultaneously, ensuring an optimized data clustering. The higher a variable's weight, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process over the different variables, computing weights that influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set and three other mixed data sets. The results show a good quality of the topological ordering and homogeneous clustering.
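To make the weighting idea concrete, the sketch below shows a per-variable weighted distance over mixed continuous/binary data used to pick the best-matching map unit. It illustrates only the distance that such a map relies on, under assumed conventions (squared difference for continuous variables, mismatch count for binary ones); it is not the authors' exact learning rule.

```python
# Sketch of a weighted distance for mixed continuous/binary data: each variable's
# weight scales its contribution when searching for the best-matching unit.
# This is an illustrative assumption, not the paper's learning algorithm.
import numpy as np

def weighted_mixed_distance(x_cont, x_bin, proto_cont, proto_bin, w_cont, w_bin):
    cont = np.sum(w_cont * (x_cont - proto_cont) ** 2)   # continuous part
    binr = np.sum(w_bin * (x_bin != proto_bin))          # binary mismatch part
    return cont + binr

def best_matching_unit(x_cont, x_bin, prototypes, w_cont, w_bin):
    """prototypes: list of (proto_cont, proto_bin) pairs, one per map cell."""
    dists = [weighted_mixed_distance(x_cont, x_bin, pc, pb, w_cont, w_bin)
             for pc, pb in prototypes]
    return int(np.argmin(dists))
```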


2019 ◽  
Vol 18 ◽  
pp. 117693511989029
Author(s):  
James LT Dalgleish ◽  
Yonghong Wang ◽  
Jack Zhu ◽  
Paul S Meltzer

Motivation: DNA copy number (CN) data are a fast-growing source of information used in basic and translational cancer research. Most CN segmentation data are presented without regard to the relationship between chromosomal regions. We offer both a toolkit to help scientists without programming experience visually explore the CN interactome and a package that constructs CN interactomes from publicly available data sets. Results: The CNVScope visualization, based on a publicly available neuroblastoma CN data set, clearly displays a distinct CN interaction in the region of MYCN, a canonical frequent amplicon target in this cancer. Exploration of the data rapidly identified cis and trans events, including a strong anticorrelation between 11q loss and 17q gain, with the region of 11q loss bounded by the cell cycle regulator CCND1. Availability: The Shiny application is readily available for use at http://cnvscope.nci.nih.gov/, and the package can be downloaded from CRAN (https://cran.r-project.org/package=CNVScope), where help pages and vignettes are located. A newer version is available on the GitHub site (https://github.com/jamesdalg/CNVScope/), which features an animated tutorial. The CNVScope package can be locally installed using instructions on the GitHub site for Windows and Macintosh systems. This CN analysis package also runs on a Linux high-performance computing cluster, with options for multinode and multiprocessor analysis of CN variant data. The Shiny application can be started using a single command (which will automatically install the public data package).
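CNVScope itself is an R package distributed on CRAN; the snippet below is only a language-neutral illustration of the underlying "interactome" idea, namely a region-by-region correlation matrix computed from a samples-by-regions copy-number matrix. The data here are random placeholders.

```python
# Illustration of a copy-number "interactome" as a region-by-region correlation
# matrix. Random toy data; CNVScope's actual construction is in R and is more
# involved than this.
import numpy as np

rng = np.random.default_rng(0)
cn = rng.normal(size=(100, 6))                 # 100 samples, 6 genomic regions (toy)
interactome = np.corrcoef(cn, rowvar=False)    # 6 x 6 region-by-region correlations
print(interactome.shape)                       # (6, 6)
```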


2013 ◽  
Vol 31 (4) ◽  
pp. 231-252 ◽  
Author(s):  
Rajat Gupta ◽  
Matthew Gregg ◽  
Hu Du ◽  
Katie Williams

Purpose – To critically compare three future weather year (FWY) downscaling approaches, based on the 2009 UK Climate Projections, used for climate change impact and adaptation analysis in building simulation software. Design/methodology/approach – The validity of these FWYs is assessed through dynamic building simulation modelling to project future overheating risk in typical English homes in the 2050s and 2080s. Findings – The modelling results show that the variation in overheating projections is far too significant to consider the tested FWY data sets equally suitable for the task. Research and practical implications – It is recommended that future research consider harmonisation of the downscaling approaches so as to generate a unified data set of FWYs to be used for a given location and climate projection. If FWYs are to be used in practice, live projects will need viable and reliable FWYs on which to base their adaptation decisions. The differences between the data sets tested could potentially lead to different adaptation priorities, specifically with regard to time series and adaptation phasing through the life of a building. Originality/value – The paper investigates the different results derived from applying FWYs to building simulation. The outcome and implications are important considerations for research and practice involved in using FWY data in building simulation intended for climate change adaptation modelling.
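As a purely illustrative sketch of how FWY data sets can be compared through simulation output, the snippet below counts hours above a fixed temperature threshold for each weather-year run. The 26 °C threshold and the toy series are assumptions; they are not the overheating criterion or data used in the paper.

```python
# Illustrative comparison of future weather year (FWY) runs: count the hours a
# simulated indoor temperature exceeds a comfort threshold. Threshold and data
# are assumptions, not the paper's overheating criterion.

def overheating_hours(hourly_temps_c, threshold_c=26.0):
    return sum(1 for t in hourly_temps_c if t > threshold_c)

fwy_runs = {
    "FWY_method_A": [24.5, 26.5, 27.0, 25.0],   # toy hourly indoor temperatures
    "FWY_method_B": [25.0, 25.5, 26.2, 24.0],
}
for name, temps in fwy_runs.items():
    print(name, overheating_hours(temps))
```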


2015 ◽  
Vol 5 (3) ◽  
pp. 350-380 ◽  
Author(s):  
Abdifatah Ahmed Haji ◽  
Sanni Mubaraq

Purpose – The purpose of this paper is to examine the impact of corporate governance and ownership structure attributes on firm performance following the revised code on corporate governance in Malaysia. The study presents a longitudinal assessment of compliance with the revised code and of its implications for firm performance. Design/methodology/approach – Two data sets, covering the periods before (2006) and after (2008-2010) the revised code, are examined. Drawing from the largest companies listed on Bursa Malaysia (BM), the first data set contains 92 observations in the year 2006, while the second comprises 282 observations drawn from the largest companies listed on BM over a three-year period, from 2008 to 2010. Both accounting (return on assets and return on equity) and market (Tobin’s Q) measures were used to measure firm performance. Multiple and panel data regression analyses were adopted to analyze the data. Findings – The study shows that there were still cases of non-compliance with basic requirements of the code, such as the one-third independent non-executive director (INDs) requirement, even after the revised code. While the regression models indicate marginal significance of board size and independent directors before the revised code, the results indicate that all corporate governance variables have a significant negative relationship with at least one of the measures of corporate performance. An independent chairperson, however, showed a consistent positive impact on firm performance both before and after the revised code. In addition, ownership structure elements were found to have a negative relationship with either accounting or market performance measures, with institutional ownership showing a consistent negative impact on firm performance. Firm size and leverage, as control variables, were significant in determining corporate performance. Research limitations/implications – One limitation is the use of separate measures of corporate governance attributes, as opposed to a corporate governance index (CGI). As a result, the study constructs a CGI based on the recommendations of the revised code and proposes it for use in future research. Practical implications – Some of the largest companies did not comply even with basic requirements such as the “one-third INDs” mandatory requirement. Hence, the regulators may want to reinforce the requirements of the code and also provide examples of good governance practices. The results, which show a consistent positive relationship between the presence of an independent chairperson and firm performance in both data sets, suggest that listed companies consider appointing an independent chairperson in the corporate leadership. The regulatory authorities may also wish to note this phenomenon when drafting future corporate governance codes. Originality/value – This study offers new insights into the implications of regulatory changes for the relationship between corporate governance attributes and firm performance from the perspective of a developing country. The development of a CGI for future research is a novel approach of this study.
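The kind of regression reported here relates a performance measure (e.g. ROA) to governance attributes plus controls. The sketch below fits such a model with statsmodels on toy data; the variable names and values are hypothetical and do not reproduce the paper's specification or panel estimators.

```python
# Sketch of regressing firm performance on governance attributes and controls.
# Variable names and toy data are hypothetical; the paper uses multiple and
# panel data regressions on Bursa Malaysia firms.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "roa":            [0.05, 0.08, 0.02, 0.07, 0.04, 0.06],
    "board_size":     [7, 9, 11, 8, 10, 9],
    "indep_chair":    [1, 1, 0, 1, 0, 1],       # 1 if chairperson is independent
    "inst_ownership": [0.3, 0.5, 0.6, 0.2, 0.7, 0.4],
    "firm_size":      [12.1, 13.4, 14.0, 12.8, 13.9, 13.2],  # e.g. log total assets
})
model = smf.ols("roa ~ board_size + indep_chair + inst_ownership + firm_size",
                data=df).fit()
print(model.params)
```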


Author(s):  
MUSTAPHA LEBBAH ◽  
YOUNÈS BENNANI ◽  
NICOLETA ROGOVSCHI

This paper introduces a probabilistic self-organizing map for topographic clustering, analysis and visualization of multivariate binary data, or of categorical data using binary coding. We propose a probabilistic formalism dedicated to binary data in which cells are represented by a Bernoulli distribution. Each cell is characterized by a prototype with the same binary coding as used in the data space and by the probability of being different from this prototype. The learning algorithm that we propose, a Bernoulli self-organizing map, is an application of the standard EM algorithm. We illustrate the power of this method with six data sets taken from a public data set repository. The results show a good quality of the topological ordering and homogeneous clustering.
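The cell model described in the abstract (a binary prototype plus a single probability of deviating from it) gives each data vector a likelihood that is a product of Bernoulli terms. The sketch below shows that per-cell log-likelihood only; the EM updates and neighborhood function of the full algorithm are omitted and this is not the authors' implementation.

```python
# Per-cell log-likelihood under the model sketched in the abstract: each bit of
# a binary vector differs from the cell's prototype with probability eps.
# The EM / topographic machinery of the full algorithm is not shown here.
import numpy as np

def cell_log_likelihood(x, prototype, eps):
    """x, prototype: 0/1 arrays of equal length; eps: P(bit differs from prototype)."""
    mismatches = np.sum(x != prototype)
    matches = x.size - mismatches
    return mismatches * np.log(eps) + matches * np.log(1.0 - eps)

x = np.array([1, 0, 1, 1])
proto = np.array([1, 0, 0, 1])
print(cell_log_likelihood(x, proto, eps=0.1))   # one mismatch, three matches
```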


2006 ◽  
Vol 101 (2) ◽  
pp. 598-608 ◽  
Author(s):  
Zhenwei Lu ◽  
Ramakrishna Mukkamala

We present an evaluation of a novel technique for continuous (i.e., automatic) monitoring of relative cardiac output (CO) changes by long time interval analysis of a peripheral arterial blood pressure (ABP) waveform in humans. We specifically tested the mathematical analysis technique based on existing invasive and noninvasive hemodynamic data sets. With the former data set, we compared the application of the technique to peripheral ABP waveforms obtained via radial artery catheterization with simultaneous thermodilution CO measurements in 15 intensive care unit patients in which CO was changing because of disease progression and therapy. With the latter data set, we compared the application of the technique to noninvasive peripheral ABP waveforms obtained via a finger-cuff photoplethysmography system with simultaneous Doppler ultrasound CO measurements made by an expert in 10 healthy subjects during pharmacological and postural interventions. We report an overall CO root-mean-squared normalized error of 15.3% with respect to the invasive hemodynamic data set and 15.1% with respect to the noninvasive hemodynamic data set. Moreover, the CO errors from the invasive and noninvasive hemodynamic data sets were only mildly correlated with mean ABP (ρ = 0.41, 0.37) and even less correlated with CO (ρ = −0.14, −0.17), heart rate (ρ = 0.04, 0.19), total peripheral resistance (ρ = 0.38, 0.10), CO changes (ρ = −0.26, −0.20), and absolute CO changes (ρ = 0.03, 0.38). With further development and successful prospective testing, the technique may potentially be employed for continuous hemodynamic monitoring in the acute setting such as critical care and emergency care.
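The summary statistic quoted above can be computed as below. The exact normalization used by the authors is not stated here; this version normalizes each error by the reference (thermodilution or Doppler) CO value, which is one common convention and should be read as an assumption.

```python
# Sketch of a root-mean-squared normalized error between estimated and reference
# cardiac output values. Normalizing by the reference value is an assumption
# about the convention, not a statement of the authors' exact definition.
import numpy as np

def rms_normalized_error_pct(co_estimated, co_reference):
    est = np.asarray(co_estimated, dtype=float)
    ref = np.asarray(co_reference, dtype=float)
    err = (est - ref) / ref
    return 100.0 * np.sqrt(np.mean(err ** 2))

print(rms_normalized_error_pct([4.8, 5.5, 3.9], [5.0, 5.2, 4.1]))
```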


2018 ◽  
Vol 36 (4) ◽  
pp. 1
Author(s):  
Thaís Machado Scherrer ◽  
George Sand França ◽  
Raimundo Silva ◽  
Daniel Brito de Freitas ◽  
Carlos da Silva Vilar

ABSTRACT. Following our previous work, we reanalyze the nonextensive behavior of the circum-Pacific subduction zones, evaluating the impact that using different magnitude types has on the results. We used the same data source and time interval as our previous work, the NEIC catalog for the years 2001 to 2010. Even considering different data sets, the correlation between q and subduction zone asperity is perceptible, but the values found for the nonextensive parameter in the considered data sets show an expressive variation. The data set with surface-wave magnitude exhibits the best fits. Keywords: Nonextensivity, Seismicity, Solid Earth, Earthquake.

