Assisting the appraisal of e-mail records with automatic classification

2016 ◽  
Vol 26 (3) ◽  
pp. 293-313 ◽  
Author(s):  
André Vellino ◽  
Inge Alberts

Purpose This paper aims to investigate how automatic classification can assist employees and records managers with the appraisal of e-mails as records of value for the organization. Design/methodology/approach The study performed a qualitative analysis of the appraisal behaviours of eight records management experts and used it to train a series of support vector machine classifiers that replicate the decision process for identifying e-mails of business value. Automatic classification experiments were performed on a corpus of 846 e-mails from the mailboxes of two of these experts. Findings Despite the highly contextual nature of record value, these experiments show that the classifiers achieve a high degree of accuracy. Unlike existing manual practices in corporate e-mail archiving, machine classification models are not highly dependent on features such as the identity of the sender and receiver or on threading, forwarding or importance flags. Rather, the dominant discriminating features are textual features from the e-mail body and subject field. Research limitations/implications The need to automatically classify corporate e-mails is growing in importance, as e-mail remains one of the most prevalent recordkeeping challenges. Practical implications Automated methods for identifying e-mail records promise to be of significant benefit to organizations that need to appraise e-mail for long-term preservation and access on demand. Social implications The research adopts an innovative approach to assist employees and records managers with the appraisal of digital records. By doing so, it fosters new insights into the adoption of technological strategies to automate recordkeeping tasks, an important research gap. Originality/value Our experiments show that an SVM classifier can be trained to replicate an expert's decision process for identifying e-mails of business value with a reasonably high degree of accuracy. 
In principle, such a classifier could be integrated into a corporate Electronic Document and Records Management System (EDRMS) to improve the quality of e-mail records appraisal.
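To make the classification step concrete, the following is a minimal sketch, not the authors' pipeline, of a linear SVM trained on bag-of-words features from the subject and body fields, which the abstract identifies as the dominant discriminating features. The field-prefixed tokenization, the Pegasos-style sub-gradient trainer (one standard way to fit a linear SVM without a library), the `lam` and `epochs` values, and the helper names `features`, `train_linear_svm` and `is_record` are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def features(subject: str, body: str) -> Counter:
    """Bag of words, prefixed by field so subject and body words stay distinct."""
    feats: Counter = Counter()
    for field, text in (("subj", subject), ("body", body)):
        for word in re.findall(r"[a-z]+", text.lower()):
            feats[f"{field}:{word}"] += 1
    feats["__bias__"] = 1  # constant feature standing in for the bias term
    return feats


def dot(w: Dict[str, float], x: Counter) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(
    data: List[Tuple[Counter, int]], lam: float = 0.01, epochs: int = 20
) -> Dict[str, float]:
    """Pegasos-style sub-gradient descent on the L2-regularised hinge loss (labels +1/-1)."""
    w: Dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1  # margin check uses the current weights
            for k in w:  # shrink step from the regulariser
                w[k] *= 1.0 - eta * lam
            if violated:  # hinge-loss step only when the margin is violated
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def is_record(w: Dict[str, float], subject: str, body: str) -> bool:
    """Hypothetical decision rule: non-negative score means 'record of business value'."""
    return dot(w, features(subject, body)) >= 0
```

Usage, with invented toy e-mails labelled +1 (record) and -1 (not a record): build `[(features(s, b), y) for s, b, y in train]`, pass it to `train_linear_svm`, then call `is_record(w, subject, body)` on new messages.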

2018 ◽  
Vol 6 (2) ◽  
pp. 69-92 ◽  
Author(s):  
Asanka G. Perera ◽  
Yee Wei Law ◽  
Ali Al-Naji ◽  
Javaan Chahl

Purpose The purpose of this paper is to present a preliminary solution to the problem of estimating human pose and trajectory in near real time using an aerial robot with a monocular camera. Design/methodology/approach The distinguishing feature of the solution is a dynamic classifier selection architecture. Each video frame is corrected for perspective using a projective transformation. Then, a silhouette is extracted and described as a Histogram of Oriented Gradients (HOG), which is classified using a dynamic classifier. A class is defined as a pose-viewpoint pair, and a total of 64 classes are defined to represent a forward walking and turning gait sequence. The dynamic classifier consists of a Support Vector Machine (SVM) classifier C64 that recognizes all 64 classes, and 64 SVM classifiers that recognize four classes each; these four classes are chosen based on the temporal relationship between them, as dictated by the gait sequence. Findings The solution provides three main advantages. First, classification is efficient due to dynamic selection (4-class vs 64-class classification). Second, classification errors are confined to neighbors of the true viewpoints: a wrongly estimated viewpoint is at most adjacent to the true viewpoint, enabling fast recovery from incorrect estimations. Third, the robust temporal relationship between poses is used to resolve the left-right ambiguities of human silhouettes. Originality/value Experiments conducted on both fronto-parallel videos and aerial videos confirm that the solution can achieve accurate pose and trajectory estimation for these different kinds of videos. For example, on the “walking on an 8-shaped path” data set (1,652 frames), the solution achieves estimation accuracies of 85 percent for viewpoints and 98.14 percent for poses.
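The dynamic selection idea can be sketched as follows, under stated assumptions: the first frame is scored against all 64 classes (the role the abstract gives to C64), and each later frame is scored only against the four candidates attached to the previous label. The `score` function is a hypothetical stand-in for the SVM decision values on HOG features, and the candidate map is supplied by the caller because the paper's actual gait-derived neighbor sets are not given here.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-in for an SVM decision function: higher score = more likely class.
Scorer = Callable[[float, int], float]


def classify_sequence(
    frames: Sequence[float],
    score: Scorer,
    candidates: Dict[int, List[int]],
    n_classes: int = 64,
) -> List[int]:
    """Label each frame, restricting later frames to the previous label's candidate set."""
    labels: List[int] = []
    prev: Optional[int] = None
    for feat in frames:
        # First frame: consult every class (the full 64-class classifier).
        # Later frames: consult only the small candidate set (a 4-class classifier).
        pool = range(n_classes) if prev is None else candidates[prev]
        prev = max(pool, key=lambda c: score(feat, c))
        labels.append(prev)
    return labels
```

With a toy cyclic candidate map such as `{c: [c, (c + 1) % 64, (c + 2) % 64, (c - 1) % 64] for c in range(64)}` and scalar "features", a sudden jump in the input can only move the label to a neighbor of the previous one, which mirrors the abstract's point that errors stay confined to adjacent classes.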


2019 ◽  
Vol 12 (4) ◽  
pp. 466-480
Author(s):  
Li Na ◽  
Xiong Zhiyong ◽  
Deng Tianqi ◽  
Ren Kai

Purpose The precise segmentation of brain tumors is the most crucial step in their diagnosis and treatment. Because of noise, uneven gray levels, blurred boundaries and edema around the tumor, the features of the tumor region in brain tumor images are indistinct, which poses a problem for diagnostics. The paper aims to discuss these issues. Design/methodology/approach In this paper, the authors propose an original segmentation solution using Tamura texture and an ensemble Support Vector Machine (SVM) structure. In the proposed technique, 124 features are extracted for each voxel, including Tamura texture features and grayscale features. These features are then ranked using the SVM-Recursive Feature Elimination method, which is also adopted to optimize the parameters of the Radial Basis Function kernel of the SVMs. Finally, the bagging random sampling method is used to construct the ensemble SVM classifier, which classifies voxel types based on a weighted voting mechanism. Findings The experiments are conducted on the BraTS2015 data set. They demonstrate that Tamura texture is very useful in the segmentation of brain tumors, especially the line-likeness feature. The superior performance of the proposed ensemble SVM classifier is demonstrated by comparison with single SVM classifiers as well as other methods. Originality/value The authors propose an original solution for segmentation using Tamura texture and an ensemble SVM structure.
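The bagging-plus-weighted-voting step can be illustrated with a minimal sketch. To stay runnable without an SVM library, it swaps in a nearest-centroid base learner in place of the RBF SVMs, and it assumes each model's vote is weighted by its out-of-bag accuracy; the abstract does not say how the weights are computed, so that rule, the toy 2-D "voxel features" and the helper names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Dict[int, List[float]]  # class label -> centroid


def fit_centroids(X: List[Vector], y: List[int]) -> Model:
    """Stand-in base learner (nearest centroid) used here in place of an RBF SVM."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] += 1
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}


def predict_centroid(model: Model, x: Vector) -> int:
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(model[lab], x)))


def fit_bagged(X: List[Vector], y: List[int], n_models: int = 11, seed: int = 0) -> List[Tuple[Model, float]]:
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        model = fit_centroids([X[i] for i in idx], [y[i] for i in idx])
        # Assumed weighting rule: out-of-bag accuracy (1.0 when no sample is out of bag).
        weight = sum(predict_centroid(model, X[i]) == y[i] for i in oob) / len(oob) if oob else 1.0
        ensemble.append((model, weight))
    return ensemble


def predict_bagged(ensemble: List[Tuple[Model, float]], x: Vector) -> int:
    votes: Dict[int, float] = defaultdict(float)
    for model, weight in ensemble:
        votes[predict_centroid(model, x)] += weight  # weighted vote
    return max(votes, key=votes.get)
```

In the paper's setting each base model would be an RBF SVM trained on the selected Tamura and grayscale features; only the bootstrap-and-vote scaffolding carries over from this sketch.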


2016 ◽  
Vol 26 (2) ◽  
pp. 206-217 ◽  
Author(s):  
Jason R. Baron ◽  
Anne Thurston

Purpose This paper aims to present a high-level summary of the US Archivist’s digital mandate for 2019, embodied in the publication “Managing Government Records”, issued on August 24, 2012, and a summary of US policy. The authors then consider the implications of the US e-recordkeeping initiative for lower-resource countries. Design/methodology/approach After setting out key elements of the US Archivist’s digital mandate, the paper evaluates its policy implications for lower-resource countries based on the authors’ field experience and knowledge of case studies. Findings The USA is embarking on a state-of-the-art approach to managing public sector archives in digital form, with deadlines approaching for all federal agencies to manage e-mail and other e-records. Although a similar need exists in lower-resource countries, there are enormous barriers to successfully implementing a similar approach. Research limitations/implications The Archivist’s 2019 digital mandate assumes that the technology sector will embrace the needs of public sector agencies by working on applicable electronic archiving solutions. Practical implications The Archivist’s Directive has the potential to be an enormous driver of change in the records management profession with respect to the future management of increasingly digital archive collections. Vast collections of public sector e-mail and other forms of e-records will potentially be preserved under the directive, raising the stakes for archivists and records managers to work on solutions for long-term preservation and future access. Social implications The importance of capturing the activities of public-sector institutions in all countries for the purpose of openness, transparency and access cannot be overstated. In an increasingly digital age, new methods are needed to ensure that the historical record of governmental institutions is preserved and made accessible. 
Originality/value The US Archivist’s mandate represents a cutting-edge approach to long-term digital archiving with potential future applicability to the management of public sector records worldwide.


Author(s):  
Sathish Eswaramoorthy ◽  
N. Sivakumaran ◽  
Sankaranarayanan Sekaran

Purpose The purpose of this paper is to tune a support vector machine (SVM) classifier using the grey wolf optimizer (GWO). Design/methodology/approach The work extracts features from the collected data, classifies them with an SVM classifier and uses metaheuristic optimization to tune the classifier parameters. Findings Optimal tuning of the classifier parameters lowers the errors caused by manual interpretation and decreases the risks associated with human perception and repeated visual diagnosis. Originality/value A novel GWO-based tuning algorithm is used for the SVM classifier, which is applied to classifying complex, nonlinear biomedical signals such as intracranial electroencephalograms.
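The grey wolf optimizer itself is compact enough to sketch. This follows the usual GWO update (the three best wolves, alpha, beta and delta, guide every wolf; the coefficient `a` decays linearly from 2 to 0), plus a best-so-far record. It is a minimal sketch, not the authors' algorithm: in their setting the objective `f` would presumably be the SVM's validation error as a function of its parameters (for example C and an RBF width searched on a log scale, which is an assumption, since the abstract does not name them), whereas here any cheap function can be plugged in.

```python
import random
from typing import Callable, List, Sequence, Tuple


def gwo_minimize(
    f: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_wolves: int = 12,
    n_iter: int = 60,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Grey wolf optimizer (minimization) with box constraints."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_x, best_f = list(wolves[0]), float("inf")
    for it in range(n_iter):
        ranked = sorted(wolves, key=f)
        if f(ranked[0]) < best_f:
            best_x, best_f = list(ranked[0]), f(ranked[0])
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        a = 2.0 * (1 - it / n_iter)  # decreases linearly from 2 towards 0
        new_pack = []
        for x in wolves:
            pos = []
            for j, (lo, hi) in enumerate(bounds):
                estimates = []
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[j] - x[j])
                    estimates.append(leader[j] - A * D)
                # Average the three leader-guided estimates and clip to the bounds.
                pos.append(min(max(sum(estimates) / 3, lo), hi))
            new_pack.append(pos)
        wolves = new_pack
    for x in wolves:  # also consider the final pack
        fx = f(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
    return best_x, best_f
```

For example, `gwo_minimize(lambda v: sum(t * t for t in v), [(-5.0, 5.0), (-5.0, 5.0)])` should drive a simple quadratic close to zero; replacing the lambda with a cross-validation routine would turn this into a parameter tuner.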


2021 ◽  
pp. 1-13
Author(s):  
Nadir O. Hamed ◽  
Ahmed H. Samak ◽  
Mostafa A. Ahmad

The evolution of technology has brought new challenges and opportunities for the different dimensions of the feature space. High feature-space dimensionality is one of the most critical issues in e-mail classification because of its effect on accuracy, and finding the subset of features that significantly influences the performance of e-mail spam classification has become an important challenge. To overcome this problem, this paper proposes an intelligent approach, the Binary Differential Evolution Support Vector Machine (BDE-SVM). The proposed approach enhances the Binary Differential Evolution (BDE) algorithm by using the correlation coefficient as a fitness function to select a significant feature subset, which is then evaluated by an SVM classifier. To the best of our knowledge, the correlation coefficient has not previously been used as the fitness function in the differential evolution algorithm. The selected feature subset is used to identify the features that contribute most to the reliability of e-mail spam classification. The enhanced BDE achieves high accuracy. The tests were conducted on two benchmark datasets, “Spambase” and “SpamAssassin”, to assess the feasibility of the proposed solution. Using the full feature set gave an accuracy of 93.55 percent, compared with 93.99 percent for the proposed BDE-SVM approach. Empirical findings also show that our method can effectively reduce the number of features required while enhancing the reliability of e-mail spam classification.
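The feature-selection loop can be sketched as binary differential evolution over bit masks. The abstract does not give the exact binary mutation operator or the exact correlation-based fitness, so both are assumptions here: mutation flips a base bit (with probability `F`) where two other individuals disagree, and the fitness is a CFS-style merit built from absolute Pearson correlations (high feature-to-label correlation, low feature-to-feature correlation). No SVM is involved; in BDE-SVM the chosen subset would then be evaluated by an SVM.

```python
import random
from statistics import fmean
from typing import Callable, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0


def cfs_merit(mask: List[int], columns: List[Sequence[float]], y: Sequence[float]) -> float:
    """Assumed correlation-based fitness: k*mean|r_fy| / sqrt(k + k(k-1)*mean|r_ff|)."""
    idx = [j for j, bit in enumerate(mask) if bit]
    k = len(idx)
    if k == 0:
        return 0.0
    rcf = sum(abs(pearson(columns[j], y)) for j in idx) / k
    pairs = [(a, b) for n, a in enumerate(idx) for b in idx[n + 1:]]
    rff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * rcf / (k + k * (k - 1) * rff) ** 0.5


def bde_select(
    fitness: Callable[[List[int]], float], n_bits: int,
    pop_size: int = 20, generations: int = 40, F: float = 0.8, CR: float = 0.9, seed: int = 0,
) -> Tuple[List[int], float]:
    """One simple binary DE variant (maximization); greedy selection keeps the best mask."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fit = [fitness(m) for m in pop]
    for _ in range(generations):
        for i in range(pop_size):
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            jrand = rng.randrange(n_bits)
            trial = []
            for j in range(n_bits):
                bit = pop[r1][j]
                # Binary "difference": flip the base bit where r2 and r3 disagree.
                if pop[r2][j] != pop[r3][j] and rng.random() < F:
                    bit = 1 - bit
                # Binomial crossover with the target vector.
                trial.append(bit if (rng.random() < CR or j == jrand) else pop[i][j])
            f_trial = fitness(trial)
            if f_trial >= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = max(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]
```

With one feature equal to the label and three mutually uncorrelated noise features, the merit is maximized by selecting only the informative feature, which is what the search should return.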


2018 ◽  
pp. 30-36
Author(s):  
Miklós Gábriel Tulics ◽  
Klára Vicsi

Dysphonia is a common complaint: almost one in four children produces a pathological voice. A mobile-based filtering system that pre-school workers could use to recognize children with dysphonic voices, so that they can get professional help as soon as possible, would be desirable. The goal of this research is to identify acoustic parameters that can distinguish the healthy voices of children from the voices of children with dysphonia. In addition, the possibility of automatic classification is examined. Two-sample t-tests were used to test the statistical significance of differences between the mean values of the acoustic parameters of healthy voices and voices with dysphonia. A two-class classification between the two groups was performed with a support vector machine (SVM) classifier using leave-one-out cross-validation. Formant frequencies, mel-frequency cepstral coefficients (MFCCs), Harmonics-to-Noise Ratio (HNR), Soft Phonation Index (SPI) and frequency band energy ratios based on intrinsic mode functions, measured on different variations of phonemes, showed statistically significant differences between the groups. A high classification accuracy of 93% was achieved by SVMs with linear and RBF kernels using only 8 acoustic parameters. Additional data are needed to build a more general model, but this research can serve as a reference point for classifying the voices of healthy children and children with dysphonia from continuous speech.
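The per-parameter significance test can be sketched with Student's two-sample t statistic. The abstract does not say whether the pooled-variance or Welch form was used; this sketch assumes the pooled form and returns only the statistic and degrees of freedom, since the p-value needs the t distribution (SciPy's `scipy.stats.ttest_ind(a, b, equal_var=True)` computes both, and `equal_var=False` gives Welch's test).

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 denominator).
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

Applied per acoustic parameter (for instance, HNR values of the healthy group vs the dysphonic group), a large |t| relative to the critical value for the returned degrees of freedom marks the parameter as a candidate feature for the SVM.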


2014 ◽  
Vol 8 (2) ◽  
pp. 146-159 ◽  
Author(s):  
D. Saxena ◽  
S.N. Singh ◽  
K.S. Verma ◽  
Shiv K. Singh

Purpose – An electrical power system is expected to deliver undistorted, sinusoidal, rated voltage and current continuously to end users. A power quality (PQ) problem occurs when deviations in voltage and/or current cause failure or mal-operation of customers' equipment. Various methods have been suggested to detect and classify single PQ events in a power system, but their performance in classifying composite PQ events is limited. The purpose of this paper is to classify composite PQ events in emerging power systems. Design/methodology/approach – This paper proposes an effective method to classify composite PQ events using the Hilbert Huang transform (HHT). The performance of probabilistic neural network (PNN) and support vector machine (SVM) classifiers in classifying composite PQ events is compared. Findings – The features extracted with the HHT are simple yet effective. SVM and PNN classifiers are used for PQ classification, and the PNN classifier outperforms the SVM, with a classification accuracy of 100 percent. Originality/value – The PQ signals used for analysis are generated by simulating a practical distribution system of an Indian academic institution.
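A probabilistic neural network is essentially a Parzen-window classifier, which makes it easy to sketch: one Gaussian kernel per training pattern, a per-class average, and an arg-max. This is a minimal sketch, not the authors' implementation; the smoothing parameter `sigma`, the equal class priors, and the toy 2-D feature vectors standing in for HHT-derived features are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def pnn_predict(
    train_X: List[Sequence[float]], train_y: List[str], x: Sequence[float], sigma: float = 0.5
) -> str:
    """Probabilistic neural network (Parzen-window) classifier with a Gaussian kernel."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for xi, yi in zip(train_X, train_y):
        # Pattern layer: one Gaussian unit per training example.
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        totals[yi] += math.exp(-d2 / (2 * sigma ** 2))
        counts[yi] += 1
    # Summation layer averages per class; the output layer picks the largest (equal priors assumed).
    return max(totals, key=lambda c: totals[c] / counts[c])
```

Unlike an SVM, a PNN has no iterative training step (only `sigma` to choose), which is one reason it is a common baseline for PQ event classification.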


Sensor Review ◽  
2018 ◽  
Vol 38 (1) ◽  
pp. 65-73 ◽  
Author(s):  
Rabeb Faleh ◽  
Sami Gomri ◽  
Mehdi Othman ◽  
Khalifa Aguir ◽  
Abdennaceur Kachouri

Purpose In this paper, a novel hybrid approach is proposed to solve the problem of cross-selectivity of gases in an electronic nose (E-nose) by combining support vector machine (SVM) and k-nearest neighbors (KNN) classifiers. Design/methodology/approach First, an E-nose system with three WO3 sensors was used for data acquisition to detect three gases, namely, ozone, ethanol and acetone. Then, two transient parameters, the derivative and the integral, were extracted from each gas response. Next, principal component analysis (PCA) was applied to extract the most relevant sensor data and reduce dimensionality. The new coordinates calculated by PCA were used as inputs for classification by the SVM method. Finally, KNN classification was carried out using only the support vectors (SVs) rather than all the data. Findings This work shows that the proposed fusion method achieves the highest classification rate (100 per cent), compared with the individual classifiers KNN, SVM-linear, SVM-RBF and SVM-polynomial, whose classification rates are 89, 75.2, 80 and 79.9 per cent, respectively. Originality/value The authors propose a fusion classifier approach to improve the classification rate. In this method, the extracted features are projected into the PCA subspace to reduce dimensionality. Then, the obtained principal components are fed to the SVM classifier, and the resulting SVs are used in the KNN method.
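The abstract names the two transient parameters but does not define them precisely. The sketch below assumes the "derivative" is the steepest finite-difference slope of the sampled sensor response and the "integral" is the trapezoidal area under it; any baseline correction the authors may apply is omitted.

```python
from typing import Sequence, Tuple


def transient_features(t: Sequence[float], r: Sequence[float]) -> Tuple[float, float]:
    """Return (derivative, integral) of a sampled gas-sensor response r(t)."""
    if len(t) != len(r) or len(t) < 2:
        raise ValueError("need at least two aligned samples")
    # Finite-difference slopes between consecutive samples.
    slopes = [(r[i + 1] - r[i]) / (t[i + 1] - t[i]) for i in range(len(t) - 1)]
    # Trapezoidal area under the response curve.
    area = sum((t[i + 1] - t[i]) * (r[i] + r[i + 1]) / 2 for i in range(len(t) - 1))
    # Assumption: the "derivative" feature is the steepest slope (largest magnitude).
    return max(slopes, key=abs), area
```

For three sensors this yields six numbers per gas exposure, which is the kind of small feature vector that PCA would then project before the SVM and KNN stages.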


Kybernetes ◽  
2016 ◽  
Vol 45 (6) ◽  
pp. 977-994 ◽  
Author(s):  
Oluyinka Aderemi Adewumi ◽  
Ayobami Andronicus Akinyelu

Purpose – Phishing is one of the major challenges faced by the world of e-commerce today. Because of phishing attacks, many companies and individuals have lost billions of dollars. The global impact of phishing attacks will continue to increase, and thus a more efficient phishing detection technique is required. The purpose of this paper is to investigate and report the use of a nature-inspired machine learning (ML) approach to the classification of phishing e-mails. Design/methodology/approach – ML-based techniques have been shown to be efficient in detecting phishing attacks. In this paper, the firefly algorithm (FFA) was integrated with a support vector machine (SVM) with the primary aim of developing an improved phishing e-mail classifier (known as FFA_SVM) capable of accurately detecting new phishing patterns as they occur. From a data set of 4,000 phishing and ham e-mails, a set of features suitable for phishing e-mail detection was extracted and used to construct the hybrid classifier. Findings – The FFA_SVM was applied to a data set of up to 4,000 phishing and ham e-mails. Simulation experiments were performed to evaluate and compare the performance of the classifier. The tests yielded a classification accuracy of 99.94 percent, a false positive rate of 0.06 percent and a false negative rate of 0.04 percent. Originality/value – To the best of the authors’ knowledge, the hybrid algorithm has not previously been applied, as in this work, to the classification and detection of phishing e-mail.
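The firefly algorithm's core move can be sketched as a small minimizer: each firefly moves towards every brighter (lower-cost) one with an attractiveness that decays with squared distance, plus a shrinking random step. This is a minimal sketch under stated assumptions: the abstract does not say which SVM quantities FFA_SVM optimizes, so `f` is left generic (in practice it might be a validation error over SVM parameters), and the default `beta0`, `gamma`, `alpha` and the 0.97 decay are illustrative choices. As a simplification, the brightest firefly stays put instead of moving randomly, so the best solution found is never lost.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def firefly_minimize(
    f: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n: int = 20,
    n_iter: int = 50,
    beta0: float = 1.0,
    gamma: float = 0.5,
    alpha: float = 0.2,
    seed: int = 0,
) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    cost = [f(x) for x in xs]
    for _ in range(n_iter):
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:  # j is brighter (lower cost), so i moves towards j
                    r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    xs[i] = [
                        min(max(a + beta * (b - a) + alpha * (rng.random() - 0.5), lo), hi)
                        for a, b, (lo, hi) in zip(xs[i], xs[j], bounds)
                    ]
                    cost[i] = f(xs[i])
        alpha *= 0.97  # gradually reduce the random step
    best = min(range(n), key=lambda k: cost[k])
    return xs[best], cost[best]
```

For instance, `firefly_minimize(lambda v: sum(t * t for t in v), [(-2.0, 2.0), (-2.0, 2.0)])` should return a point near the origin; the hybrid classifier would substitute an SVM-based cost for the quadratic.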


2020 ◽  
Author(s):  
Nalika Ulapane ◽  
Karthick Thiyagarajan ◽  
Sarath Kodagoda

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selecting a subset of descriptor variables that optimally describes the output), are available to improve classifier performance. In this paper, we consider a given supervised learning classification task that has to be performed using continuous-valued features. It is assumed that an optimal subset of features has already been selected, so no further feature reduction or feature addition is to be carried out. We then attempt to improve classification performance by passing the given feature set through a transformation that produces a new feature set, which we have named the “Binary Spectrum”. Through a case study on Pulsed Eddy Current sensor data captured during an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases when this Binary Spectrum feature is used, indicating the feature transformation’s potential for broader usage.

