Predicting the user navigation pattern from web logs using weighted support approach

Author(s):  
Om Prakash P. G. ◽  
Jaya A. ◽  
Ananthakumaran S. ◽  
Ganesh G.

<p class="Abstract"><span id="docs-internal-guid-f3d644ee-7fff-d3c1-15b5-f75fe28d3e2d"><span>A weblog contains the history of previous user navigation pattern. If the customer accesses any portal of organization website, the log is generated in web server, based on sequence of user transaction. The weblog stored in the web server as unstructured format, it contains both positive and negative responses i.e. successful and unsuccessful responses, identifying the positive and negative response is not useful for identifying user behavior of individual user. Initially the successful response is taken, from that conversion of unstructured log format to structured log format through data preprocessing technique. The process of data preprocessor contains three step process data cleaning, user identification and session identification. The pattern is discovered by preprocessing technique from that user navigation pattern is generated. From that navigation pattern classifier technique is applied, the conversion of sequence pattern to sub sequence pattern by clustering technique. This research is to identify the user navigation pattern from weblog. The Improved Spanning classification algorithm classifies the frequent, infrequent and semi frequent pattern. To identify the optimal webpage using classificatopn algorithm from thet user behavior is identified.</span></span></p>

Author(s):  
P.G. OM Prakash ◽  
A. Jaya

A weblog contains the history of user navigation patterns recorded while users access a website. The user navigation pattern can be analyzed from the previous navigation history stored in the weblog. The weblog comprises entries such as IP address, status code, number of bytes transferred, categories, and time stamp. User interest can be classified based on categories and attributes, which helps in identifying user behavior. The aim of the research is to identify interested and not-interested user behavior based on classification. The process of identifying user interest consists of a Modified Span Algorithm and a Personalization Algorithm; based on the classification algorithm, user predictions can be analyzed. The research work analyzes user prediction behavior based on user personalization captured from weblogs.
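
As a rough illustration of classifying users as interested or not interested from weblog categories, the following sketch counts category visits per user and applies a simple threshold. The threshold, the input structures, and the helper names are assumptions made for illustration; the paper's Modified Span and Personalization algorithms are not reproduced here.

```python
from collections import Counter

def classify_interest(user_pages, category_of, min_visits=3):
    """Label a user as 'interested' in a category when the number of pages
    viewed in that category reaches a simple threshold.
    `user_pages` maps user id -> list of visited page URLs;
    `category_of` maps a page URL to its category (both hypothetical inputs)."""
    labels = {}
    for user, pages in user_pages.items():
        counts = Counter(category_of.get(p, "other") for p in pages)
        labels[user] = {cat: ("interested" if n >= min_visits else "not interested")
                        for cat, n in counts.items()}
    return labels
```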


1976 ◽  
Vol 15 (01) ◽  
pp. 21-28 ◽  
Author(s):  
Carmen A. Scudiero ◽  
Ruth L. Wong

A free text data collection system has been developed at the University of Illinois utilizing single word, syntax-free dictionary lookup to process data for retrieval. The source document for the system is the Surgical Pathology Request and Report form. To date 12,653 documents have been entered into the system. The free text data was used to create an IRS (Information Retrieval System) database. A program to interrogate this database has been developed to numerically code operative procedures. A total of 16,519 procedure records were generated. One and nine tenths percent of the procedures could not be fitted into any procedure category; 6.1% could not be specifically coded, while 92% were coded into specific categories. A system of PL/1 programs has been developed to facilitate manual editing of these records, which can be performed in a reasonable length of time (1 week). This manual check reveals that these 92% were coded with precision = 0.931 and recall = 0.924. Correction of the readily correctable errors could improve these figures to precision = 0.977 and recall = 0.987. Syntax errors were relatively unimportant in the overall coding process, but did introduce significant error in some categories, such as when right-left-bilateral distinction was attempted. The coded file that has been constructed will be used as an input file to a gynecological disease/PAP smear correlation system. The outputs of this system will include retrospective information on the natural history of selected diseases and a patient log providing information to the clinician on patient follow-up. Thus a free text data collection system can be utilized to produce numerically coded files of reasonable accuracy. Further, these files can be used as a source of useful information both for the clinician and for the medical researcher.
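
The coding and evaluation steps described here can be sketched in Python: a single-word, syntax-free dictionary lookup that ignores word order, plus the precision/recall calculation used in the manual check. This is only an illustrative reconstruction; the original system was a set of PL/1 programs, and the dictionary and code values below are hypothetical.

```python
def code_report(report_text, dictionary):
    """Single-word, syntax-free lookup: each word of the report is looked up
    independently in a word -> procedure-code dictionary; word order and
    grammar are ignored."""
    words = report_text.lower().split()
    return {dictionary[w] for w in words if w in dictionary}

def precision_recall(assigned, correct):
    """Precision and recall of the assigned codes against manually
    verified codes, as in the editing check described above."""
    true_pos = len(assigned & correct)
    precision = true_pos / len(assigned) if assigned else 0.0
    recall = true_pos / len(correct) if correct else 0.0
    return precision, recall

# Hypothetical dictionary, report text and reference codes, for illustration only.
lookup = {"hysterectomy": "PX-101", "appendectomy": "PX-202"}
codes = code_report("Total abdominal hysterectomy specimen received", lookup)
print(codes, precision_recall(codes, {"PX-101"}))
```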


Author(s):  
Jozef Kapusta ◽  
Michal Munk ◽  
Dominik Halvoník ◽  
Martin Drlík

If we are talking about user behavior analytics, we have to understand what the main source of valuable information is. One of these sources is definitely a web server. There are multiple places where we can extract the necessary data. The most common ways are to search for these data in the access log, error log, custom log files of the web server, proxy server log files, web browser logs, browser cookies, etc. A web server log in its default form is known as a Common Log File (W3C, 1995) and keeps information about the IP address, the date and time of the visit, and the accessed and referenced resource. There are standardized methodologies that contain several steps leading to the extraction of new knowledge from the provided data. Usually, the first step in each of them is to identify users, users' sessions, page views, and clickstreams. This process is called pre-processing. The main goal of this stage is to take an unprocessed web server log file as input and, after processing, output meaningful representations which can be used in the next phase. In this paper, we describe in detail user session identification, which can be considered the most important part of data pre-processing. Our paper aims to compare user/session identification using the STT with user/session identification using cookies. This comparison was performed with regard to the quality of the sequential rules generated, i.e., a comparison was made regarding the generation of useful, trivial, and inexplicable rules.
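
The two session-identification strategies being compared can be sketched as follows, assuming that STT here denotes a standard time threshold (an idle-gap cutoff, 30 minutes in this sketch) and that each log record carries a timestamp and, where available, a session cookie. The field names and the threshold value are assumptions made for illustration, not the authors' exact procedure.

```python
from datetime import timedelta

def sessions_by_stt(visits, threshold=timedelta(minutes=30)):
    """Split one visitor's time-ordered page views into sessions whenever
    the idle gap between consecutive requests exceeds the time threshold."""
    sessions, current = [], [visits[0]]
    for prev, cur in zip(visits, visits[1:]):
        if cur["time"] - prev["time"] > threshold:
            sessions.append(current)
            current = []
        current.append(cur)
    sessions.append(current)
    return sessions

def sessions_by_cookie(visits):
    """Group page views by the session cookie issued by the web server."""
    by_id = {}
    for v in visits:
        by_id.setdefault(v["session_cookie"], []).append(v)
    return list(by_id.values())
```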


Author(s):  
Milka Bochere Gesicho ◽  
Martin Chieng Were ◽  
Ankica Babic

Abstract Background The District Health Information Software-2 (DHIS2) is widely used by countries for national-level aggregate reporting of health data. To best leverage DHIS2 data for decision-making, countries need to ensure that data within their systems are of the highest quality. Comprehensive, systematic, and transparent data cleaning approaches form a core component of preparing DHIS2 data for analyses. Unfortunately, there is a paucity of exhaustive and systematic descriptions of data cleaning processes employed on DHIS2-based data. The aim of this study was to report on the methods and results of a systematic and replicable data cleaning approach applied to HIV data gathered within DHIS2 from 2011 to 2018 in Kenya, for secondary analyses. Methods Six programmatic area reports containing HIV indicators were extracted from DHIS2 for all care facilities in all counties in Kenya from 2011 to 2018. Data variables extracted included reporting rate, reporting timeliness, and HIV-indicator data elements per facility per year. A total of 93,179 facility-records from 11,446 health facilities were extracted for the years 2011 to 2018. Van den Broeck et al.'s framework, involving repeated cycles of a three-phase process (data screening, data diagnosis and data treatment), was employed semi-automatically within a generic five-step data-cleaning sequence, which was developed and applied in cleaning the extracted data. Various quality issues were identified, and a Friedman analysis of variance was conducted to examine differences in the distribution of records with selected issues across the eight years. Results Facility-records with no data accounted for 50.23% and were removed. Of the remaining records, 0.03% had reporting rates over 100%. Of facility-records with reporting data, 0.66% and 0.46% were retained for the voluntary medical male circumcision and blood safety programmatic area reports respectively, given that few facilities submitted data or offered these services. The distribution of facility-records with selected quality issues varied significantly by programmatic area (p < 0.001). The final clean dataset obtained was suitable for subsequent secondary analyses. Conclusions Comprehensive, systematic, and transparent reporting of the cleaning process is important for the validity of research studies as well as for data utilization. The semi-automatic procedures used resulted in improved data quality for use in secondary analyses, which could not be secured by automated procedures alone.
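
A hedged sketch of what the screening phase and the year-wise comparison might look like in Python/pandas is shown below. The column names (`reporting_rate`, the indicator columns) and the layout of the per-year table are assumptions; the study's actual five-step sequence is described in the paper and not reproduced here.

```python
import pandas as pd
from scipy.stats import friedmanchisquare

def screen(df: pd.DataFrame, indicator_cols) -> pd.DataFrame:
    """Screening phase: drop facility-records with no data and flag
    implausible reporting rates for the diagnosis phase."""
    has_data = df[indicator_cols].notna().any(axis=1)
    df = df.loc[has_data].copy()
    df["over_100_rate"] = df["reporting_rate"] > 100  # assumed column name
    return df

def compare_years(yearly: pd.DataFrame):
    """Friedman test of whether the share of records with a selected issue
    differs across the eight years; `yearly` has one column per year
    (assumed layout)."""
    stat, p = friedmanchisquare(*[yearly[c] for c in yearly.columns])
    return stat, p
```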


2012 ◽  
Vol 241-244 ◽  
pp. 2365-2369
Author(s):  
Hua Jie Xu ◽  
Xiao Ming Hu ◽  
Dong Dong Zhang

Scripting language (mostly JavaScript) applications are now heavily used on the web to improve the user experience. This trend makes XSS (cross-site scripting) attacks one of the most serious security problems on the current Internet. An XSS defensive scheme based on behavior certification is proposed in this paper. A website behavior model is generated based on the website logic and the user behavior. Browsing behavior certification is then performed against the expected behavior derived from the resulting model, so as to protect the client even in the case that the web server has suffered XSS attacks.
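
Behavior certification of this kind can be illustrated with a toy model in which the expected website behavior is reduced to a set of allowed page-to-page transitions; a session is rejected as soon as it performs a transition the model does not expect. The pages and transitions below are hypothetical, and the real scheme derives its model from the website logic and observed user behavior rather than a hand-written set.

```python
# Hypothetical expected-behavior model: allowed page-to-page transitions.
ALLOWED = {
    ("/login", "/account"),
    ("/account", "/settings"),
    ("/account", "/logout"),
}

def certify(click_stream):
    """Reject a session as soon as it performs a transition that the
    behavior model does not expect (e.g. one injected by an XSS payload)."""
    for src, dst in zip(click_stream, click_stream[1:]):
        if (src, dst) not in ALLOWED:
            return False
    return True

print(certify(["/login", "/account", "/logout"]))        # True
print(certify(["/login", "/account", "/admin/export"]))  # False
```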


2009 ◽  
Vol 24 (S1) ◽  
pp. 1-1
Author(s):  
C. Silva ◽  
I. Gil ◽  
M.A. Mateus ◽  
Ó. Nogueiro

Several studies have been conducted to establish a profile of the suicidal/parasuicidal patient. Several factors have also been identified as possibly influencing suicide rates, including religious practices. Objectives: To characterize the profile of suicidal behavior in a sample of patients followed in a general psychiatric consultation. Methods: An analytical observational study of a random sample of 100 patients followed in a general psychiatric consultation was performed. A survey was conducted to collect socio-economic, religious and clinical data, and the patient's clinical record was consulted. Data analysis was done in Excel 2003. Results: The sample consisted mostly of women (74%), with the most representative age group between 40 and 50 years (27%); most were married (61%), 24% had 2 children and 65% lived in a rural area. The clinical diagnosis (ICD-9) was, in 46% of cases, neurotic depression. 52% considered themselves religious but not practising, 90% being Catholic. A history of suicide attempts/parasuicide occurred in 32% of patients, in the form of drug intoxication (31%) or another method (11%). Most of the individuals said they had already thought about suicide at least once in their lifetime (74%). Only 8% had current suicidal ideation. A family history of suicide occurred in 27%, particularly in first-degree family members, mainly by drowning (7%) and hanging (7%). Conclusions: Our results suggest a high prevalence of suicidal behavior in these patients. For that reason, systematic screening for suicidal ideation should be carried out in this at-risk population.


2020 ◽  
Author(s):  
Milka Gesicho ◽  
Ankica Babic ◽  
Martin Were

Abstract Background The District Health Information Software 2 (DHIS2) is widely used by countries for national-level aggregate reporting of health data. To best leverage DHIS2 data for decision-making, countries need to ensure that data within their systems are of the highest quality. Comprehensive, systematic and transparent data cleaning approaches form a core component of preparing DHIS2 data for use. Unfortunately, there is a paucity of exhaustive and systematic descriptions of data cleaning processes employed on DHIS2-based data. In this paper, we describe the results of a systematic data cleaning approach applied to a national-level DHIS2 instance, using Kenya as the case example. Methods Van den Broeck et al.'s framework, involving repeated cycles of a three-phase process (data screening, data diagnosis and data treatment), was employed on six HIV indicator reports collected monthly from all care facilities in Kenya from 2011 to 2018. This resulted in repeated facility reporting instances. Quality dimensions evaluated included reporting rate, reporting timeliness, and indicator completeness of submitted reports, each assessed per facility per year. The various error types were categorized, and Friedman analyses of variance were conducted to examine differences in the distribution of facilities by error type. Data cleaning was done during the treatment phases. Results A generic five-step data cleaning sequence was developed and applied in cleaning the HIV indicator data reports extracted from DHIS2. Initially, 93,179 facility reporting instances were extracted for the years 2011 to 2018. 50.23% of these instances submitted no reports and were removed. Of the remaining reporting instances, there was over-reporting in 0.03%. Quality issues related to timeliness included scenarios where reports were empty or had data but were never on time. The percentage of reporting instances in these scenarios varied by report type. Among submitted reports, empty reports also varied by report type and ranged from 1.32% to 18.04%. Report quality varied significantly by facility distribution (p = 0.00) and report type. Conclusions The case instance of Kenya reveals significant data quality issues for reported HIV data that were not detected by the inbuilt error detection procedures within DHIS2. More robust and systematic data cleaning processes should be integrated into current DHIS2 implementations to ensure the highest quality data.
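
As an illustration of the error categorization and per-report-type breakdown described above, the following pandas sketch labels each extracted reporting instance and tabulates issue rates by report type. The column names (`submitted`, `on_time`, `n_values`, `report_type`) are assumptions about the extracted table, not the DHIS2 schema or the authors' code.

```python
import pandas as pd

def categorise(reports: pd.DataFrame) -> pd.Series:
    """Label each reporting instance with its main quality issue."""
    def label(row):
        if not row["submitted"]:
            return "no report"
        if row["n_values"] == 0:
            return "empty report"
        if not row["on_time"]:
            return "late report"
        return "ok"
    return reports.apply(label, axis=1)

def issue_rates(reports: pd.DataFrame) -> pd.DataFrame:
    """Percentage of each quality issue per report type."""
    counts = (reports.assign(issue=categorise(reports))
                     .groupby(["report_type", "issue"]).size()
                     .unstack(fill_value=0))
    return counts.div(counts.sum(axis=1), axis=0) * 100
```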

