Linking serial homicide – towards an ecologically valid application

2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Tom Pakkanen ◽  
Jukka Sirén ◽  
Angelo Zappalà ◽  
Patrick Jern ◽  
Dario Bosco ◽  
...  

Purpose – Crime linkage analysis (CLA) can be applied in the police investigation phase to sift through a database for cases behaviorally similar to the one under investigation, and in the trial phase to try to prove that the perpetrator of two or more offences is the same person, by showing similarity and distinctiveness across the offences. Lately, research has moved toward more naturalistic settings, analyzing data sets that are as similar to actual crime databases as possible. One such step has been to include one-off offences in the data sets, but this has not yet been done with homicide. The purpose of this paper is to investigate how the linking accuracy of serial homicide is affected as a function of added hard-to-solve one-off offences.

Design/methodology/approach – A sample (N = 117–1160) of Italian serial homicides (n = 116) and hard-to-solve one-off homicides (n = 1–1044, simulated from 45 cases) was analyzed using a Bayesian approach to identify series membership, and a case-by-case comparison of similarity using Jaccard's coefficient. Linking accuracy was evaluated using receiver operating characteristics and by examining the sensitivity and specificity of the model.

Findings – After an initial dip in linking accuracy (as measured by the AUC), accuracy increased as more one-offs were added to the data. While adding one-offs made it easier to identify correct series (increased sensitivity), there was an increase in false positives (decreased specificity) in the linkage decisions. When rank ordering cases according to similarity, linking accuracy was affected negatively as a function of added non-serial cases.

Practical implications – While using a more natural data set, in the sense of adding a significant proportion of non-serial homicides into the mix, does introduce error into the linkage decision, the authors conclude that, taken overall, the findings still support the validity of CLA in practice.

Originality/value – This is the first crime linkage study on homicide to investigate how linking accuracy is affected as a function of non-serial cases being introduced into the data.
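To make the linkage methodology concrete, the following is a minimal sketch of the case-by-case comparison step: offences are represented as binary behaviour vectors, pairwise similarity is computed with Jaccard's coefficient, and the AUC measures how well similarity separates linked from unlinked pairs. The data, coding scheme and variable names are hypothetical; this is not the authors' Bayesian series-membership model.

```python
# Minimal sketch: Jaccard similarity between binary behaviour profiles and
# ROC/AUC evaluation of the resulting pairwise linkage scores.
# Synthetic data; not the authors' coding scheme or Bayesian model.
import numpy as np
from itertools import combinations
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical data: 20 offences, 15 binary crime-scene behaviours each.
# A series_id of -1 marks a one-off offence; other values are series labels.
behaviours = rng.integers(0, 2, size=(20, 15))
series_id = np.array([0, 0, 0, 1, 1, 2, 2, 2] + [-1] * 12)

def jaccard(a, b):
    """Jaccard's coefficient for two binary behaviour vectors."""
    both = np.sum((a == 1) & (b == 1))
    either = np.sum((a == 1) | (b == 1))
    return both / either if either else 0.0

scores, linked = [], []
for i, j in combinations(range(len(behaviours)), 2):
    scores.append(jaccard(behaviours[i], behaviours[j]))
    # A pair counts as linked only if both offences belong to the same series.
    linked.append(int(series_id[i] == series_id[j] and series_id[i] != -1))

# AUC close to 0.5 is expected here because the behaviours are random;
# with real data, higher values indicate better linkage discrimination.
print("AUC:", roc_auc_score(linked, scores))
```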

2015 ◽  
Vol 5 (3) ◽  
pp. 350-380 ◽  
Author(s):  
Abdifatah Ahmed Haji ◽  
Sanni Mubaraq

Purpose – The purpose of this paper is to examine the impact of corporate governance and ownership structure attributes on firm performance following the revised code on corporate governance in Malaysia. The study presents a longitudinal assessment of the compliance and implications of the revised code on firm performance.

Design/methodology/approach – Two data sets, consisting of observations before (2006) and after (2008-2010) the revised code, are examined. Drawing from the largest companies listed on Bursa Malaysia (BM), the first data set contains 92 observations in the year 2006, while the second comprises 282 observations drawn from the largest companies listed on BM over a three-year period, 2008-2010. Both accounting (return on assets and return on equity) and market performance (Tobin's Q) measures were used to measure firm performance. Multiple and panel data regression analyses were adopted to analyze the data.

Findings – The study shows that there were still cases of non-compliance with basic requirements of the code, such as the one-third independent non-executive director (INDs) requirement, even after the revised code. While the regression models indicate marginal significance of board size and independent directors before the revised code, the results indicate that all corporate governance variables have a significant negative relationship with at least one of the measures of corporate performance. Independent chairperson, however, showed a consistent positive impact on firm performance both before and after the revised code. In addition, ownership structure elements were found to have a negative relationship with either accounting or market performance measures, with institutional ownership showing a consistent negative impact on firm performance. Firm size and leverage, as control variables, were significant in determining corporate performance.

Research limitations/implications – One limitation is the use of separate measures of corporate governance attributes, as opposed to a corporate governance index (CGI). As a result, the study constructs a CGI based on the recommendations of the revised code and proposes it for future research use.

Practical implications – Some of the largest companies did not comply even with basic requirements such as the "one-third INDs" mandatory requirement. Hence, the regulators may want to reinforce the requirements of the code and also detail examples of good governance practices. The results, which show a consistent positive relationship between the presence of an independent chairperson and firm performance in both data sets, suggest that listed companies consider appointing an independent chairperson in the corporate leadership. The regulatory authorities may also wish to note this phenomenon when drafting any future corporate governance codes.

Originality/value – This study offers new insights into the implications of regulatory changes for the relationship between corporate governance attributes and firm performance from the perspective of a developing country. The development of a CGI for future research is a novel approach of this study.
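For readers unfamiliar with this kind of governance-performance analysis, the sketch below shows the general shape of a firm performance regression with governance attributes, ownership variables and controls. All column names and data are invented for illustration; this is not the authors' specification, data set or panel estimator.

```python
# Illustrative sketch of a pooled firm-performance regression of the kind the
# study describes (hypothetical column names and synthetic data; not the
# authors' exact specification or data set).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 282  # e.g. the size of the 2008-2010 sample of large listed firms

df = pd.DataFrame({
    "roa": rng.normal(0.05, 0.03, n),             # return on assets
    "board_size": rng.integers(5, 13, n),
    "indep_directors": rng.uniform(0.2, 0.6, n),  # proportion of INDs
    "indep_chair": rng.integers(0, 2, n),         # 1 = independent chairperson
    "inst_ownership": rng.uniform(0.0, 0.8, n),
    "firm_size": rng.normal(14, 1.5, n),          # log of total assets
    "leverage": rng.uniform(0.1, 0.7, n),
    "year": rng.choice([2008, 2009, 2010], n),
})

# OLS with year dummies and robust standard errors; the paper also uses ROE
# and Tobin's Q as outcomes and panel-data estimators, which follow the same
# general pattern.
model = smf.ols(
    "roa ~ board_size + indep_directors + indep_chair + inst_ownership"
    " + firm_size + leverage + C(year)",
    data=df,
).fit(cov_type="HC1")
print(model.summary())
```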


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Tressy Thomas ◽  
Enayat Rajabi

Purpose – The primary aim of this study is to review studies of novel approaches to data imputation, particularly in the machine learning (ML) area, along several dimensions, including the type of method, the experimentation setup and the evaluation metrics used. This ultimately provides an understanding of how well the proposed frameworks are evaluated and what types and ratios of missingness are addressed in the proposals. The review questions in this study are: (1) What ML-based imputation methods were studied and proposed during 2010–2020? (2) How were the experimentation setup, the characteristics of the data sets and the missingness employed in these studies? (3) What metrics were used for the evaluation of the imputation methods?

Design/methodology/approach – The review followed the standard identification, screening and selection process. The initial search on electronic databases for missing value imputation (MVI) based on ML algorithms returned a large number of papers, totaling 2,883. Most of the papers at this stage did not concern an MVI technique relevant to this study. The papers were first screened by title for relevance, and 306 were identified as appropriate. Upon reviewing the abstracts, 151 papers not eligible for this study were dropped, leaving 155 research papers suitable for full-text review. Of these, 117 papers were used in the assessment of the review questions.

Findings – This study shows that clustering- and instance-based algorithms are the most frequently proposed MVI methods. Percentage of correct prediction (PCP) and root mean square error (RMSE) are the most used evaluation metrics in these studies. For experimentation, the majority of the studies sourced their data sets from publicly available data set repositories. A common approach is that the complete data set is used as the baseline to evaluate the effectiveness of imputation on test data sets with artificially induced missingness. The data set size and missingness ratio varied across the experimentations, while the missing data type and mechanism pertain to the capability of the imputation. Computational expense is a concern, and experimentation using large data sets appears to be a challenge.

Originality/value – It is understood from the review that there is no single universal solution to the missing data problem. Variants of ML approaches work well with particular kinds of missingness, depending on the characteristics of the data set. Most of the methods reviewed lack generalization with regard to applicability. Another concern related to applicability is the complexity of the formulation and implementation of the algorithm. Imputations based on k-nearest neighbors (kNN) and clustering algorithms, which are simple and easy to implement, are popular across various domains.
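The evaluation protocol the review repeatedly encounters is easy to illustrate: start from a complete data set, induce missingness artificially, impute, and score the imputed entries against the known ground truth. The sketch below does this with scikit-learn's KNNImputer and RMSE on synthetic data; it stands in for the many method-specific setups covered by the review.

```python
# Minimal sketch of the common evaluation protocol described in the review:
# take a complete data set, induce missingness artificially, impute with a
# kNN-based method and score against the known ground truth (synthetic data).
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(42)
complete = rng.normal(size=(200, 5))          # complete baseline data set

# Induce 10% missingness completely at random (MCAR).
mask = rng.random(complete.shape) < 0.10
with_missing = complete.copy()
with_missing[mask] = np.nan

imputed = KNNImputer(n_neighbors=5).fit_transform(with_missing)

# RMSE computed over the artificially removed entries only.
rmse = np.sqrt(np.mean((imputed[mask] - complete[mask]) ** 2))
print(f"RMSE on imputed entries: {rmse:.3f}")
```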


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiawei Lian ◽  
Junhong He ◽  
Yun Niu ◽  
Tianze Wang

Purpose – Current popular image processing technologies based on convolutional neural networks suffer from heavy computation, high storage cost and low accuracy for tiny defect detection, which conflicts with the high real-time performance, high accuracy and limited computing and storage resources required by industrial applications. Therefore, an improved YOLOv4, named YOLOv4-Defect, is proposed to solve the above problems.

Design/methodology/approach – On the one hand, this study performs multi-dimensional compression processing on the feature extraction network of YOLOv4 to simplify the model and improves the feature extraction ability of the model through knowledge distillation. On the other hand, a prediction scale with a finer receptive field is added to optimize the model structure, which improves the detection performance for tiny defects.

Findings – The effectiveness of the method is verified on the public data sets NEU-CLS and DAGM 2007, and on a steel ingot data set collected in an actual industrial setting. The experimental results demonstrate that the proposed YOLOv4-Defect method can greatly improve recognition efficiency and accuracy while reducing the size and computational cost of the model.

Originality/value – This paper proposes an improved YOLOv4, named YOLOv4-Defect, for surface defect detection, which is conducive to application in various industrial scenarios with limited storage and computing resources and which meets the requirements of high real-time performance and precision.
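As an illustration of the knowledge distillation component, the sketch below shows a generic distillation loss in which a compressed "student" network is trained against both the hard labels and the temperature-softened outputs of a larger "teacher". This is a standard classification-style formulation on toy tensors, not the paper's exact YOLOv4-Defect training objective.

```python
# Hedged sketch of the knowledge-distillation idea: a pruned "student" mimics
# the softened outputs of the full "teacher" (generic classification-style KD
# loss; not the paper's exact formulation for detection heads).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and a soft-label KL term."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft

# Toy usage with random logits for a hypothetical 6-class defect problem.
student = torch.randn(8, 6, requires_grad=True)
teacher = torch.randn(8, 6)
labels = torch.randint(0, 6, (8,))
print(distillation_loss(student, teacher, labels))
```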


2018 ◽  
Vol 70 (6) ◽  
pp. 691-707 ◽  
Author(s):  
Daniel Torres-Salinas ◽  
Juan Gorraiz ◽  
Nicolas Robinson-Garcia

Purpose – The purpose of this paper is to analyze the capabilities, functionalities and appropriateness of Altmetric.com as a data source for the bibliometric analysis of books, in comparison to PlumX.

Design/methodology/approach – The authors perform an exploratory analysis of the metrics the Altmetric Explorer for Institutions platform offers for books. The authors use two distinct data sets of books. On the one hand, the authors analyze the Book Collection included in Altmetric.com. On the other hand, the authors use Clarivate's Master Book List to analyze Altmetric.com's capabilities to download and merge data with external databases. Finally, the authors compare the findings with those obtained in a previous study performed in PlumX.

Findings – Altmetric.com combines and systematically tracks a set of data sources, linked by DOI identifiers, to retrieve metadata for books, with Google Books as its main provider. It also retrieves information from commercial publishers and from some Open Access initiatives, including those led by university libraries, such as Harvard Library. The authors find issues with linkages between records and mentions, as well as ISBN discrepancies. Furthermore, the authors find that automatic bots greatly affect Wikipedia mentions of books. The comparison with PlumX suggests that neither of these tools provides a complete picture of the social attention generated by books and that they are complementary rather than comparable tools.

Practical implications – This study targets different audiences that can benefit from the findings. First, bibliometricians and researchers who seek alternative sources for developing bibliometric analyses of books, with a special focus on the Social Sciences and Humanities. Second, librarians and research managers, who are the main clients to which these tools are directed. Third, Altmetric.com itself, as well as other altmetric providers, which might gain a better understanding of the limitations users encounter and improve this promising tool.

Originality/value – This is the first study to analyze Altmetric.com's functionalities and capabilities for providing metric data for books and to compare results from this platform with those obtained via PlumX.
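As a small illustration of the record-matching task involved in merging an Altmetric export with an external book list, the sketch below joins two toy tables on normalized ISBNs. The records and column names are invented; they do not reflect the actual export schemas of Altmetric.com or the Master Book List.

```python
# Tiny illustration of matching records by ISBN after normalising the
# identifiers (made-up records; column names are assumptions, not the actual
# export schemas of Altmetric.com or the Master Book List).
import pandas as pd

altmetric = pd.DataFrame({
    "isbn": ["978-0-19-982763-3", "9780262033848"],
    "mentions": [12, 87],
})
book_list = pd.DataFrame({
    "ISBN": ["9780199827633", "978-0-262-03384-8"],
    "title": ["Example Book A", "Example Book B"],
})

def clean_isbn(col):
    """Strip hyphens and spaces so equivalent ISBN-13 strings compare equal."""
    return col.str.replace(r"[-\s]", "", regex=True)

altmetric["isbn13"] = clean_isbn(altmetric["isbn"])
book_list["isbn13"] = clean_isbn(book_list["ISBN"])

merged = altmetric.merge(book_list[["isbn13", "title"]], on="isbn13", how="left")
print(merged[["isbn13", "title", "mentions"]])
```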


Author(s):  
Tu Renwei ◽  
Zhu Zhongjie ◽  
Bai Yongqiang ◽  
Gao Ming ◽  
Ge Zhifeng

Unmanned Aerial Vehicle (UAV) inspection has become one of the main methods for transmission line inspection, but it still has shortcomings such as slow detection speed, low efficiency and poor performance in low-light environments. To address these issues, this paper proposes a deep learning detection model based on You Only Look Once (YOLO) v3. On the one hand, the neural network structure is simplified: the three feature maps of YOLO v3 are pruned to two to meet the specific detection requirements, and the K-means++ clustering method is used to calculate anchor values for the data set to improve detection accuracy. On the other hand, a data set of 1,000 power tower and insulator images is collected; these images are flipped and scaled to expand the data set and are further augmented with different illumination conditions and viewing angles. The experimental results show that the model based on the improved YOLO v3 improves detection accuracy by 6.0%, reduces FLOPs by 8.4% and increases detection speed by about 6.0%.
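The anchor computation mentioned above can be illustrated with a short clustering sketch: k-means++ is run on the widths and heights of ground-truth boxes, and the resulting centroids serve as anchors for the two prediction scales. The boxes here are synthetic, and plain Euclidean k-means is used rather than the IoU-based distance some YOLO implementations prefer.

```python
# Illustrative sketch of computing anchor boxes with k-means++ on bounding-box
# widths and heights (synthetic boxes; a real run would use the labelled
# power tower and insulator data set).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Hypothetical (width, height) pairs of ground-truth boxes, normalised to [0, 1].
boxes_wh = rng.uniform(0.02, 0.6, size=(500, 2))

# Two prediction scales with three anchors each -> six anchors in total.
kmeans = KMeans(n_clusters=6, init="k-means++", n_init=10, random_state=0)
kmeans.fit(boxes_wh)

# Sort anchors by area so they can be assigned to the two feature maps.
anchors = kmeans.cluster_centers_
anchors = anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
print(np.round(anchors, 3))
```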


2017 ◽  
Vol 24 (4) ◽  
pp. 1052-1064 ◽  
Author(s):  
Yong Joo Lee ◽  
Seong-Jong Joo ◽  
Hong Gyun Park

Purpose – The purpose of this paper is to measure the comparative efficiency of 18 Korean commercial banks in the presence of negative observations and to examine performance differences among them by grouping them according to their market conditions.

Design/methodology/approach – The authors employ two data envelopment analysis (DEA) models, namely a Banker, Charnes and Cooper (BCC) model and a modified slacks-based measure of efficiency (MSBM) model, both of which can handle negative data. The BCC model is proven to be translation invariant for inputs or outputs, depending on output or input orientation, while the MSBM model is unit invariant in addition to being translation invariant. The authors compare results from both models and choose one for interpreting the results.

Findings – Most Korean banks recovered from their worst performance in 2011 and showed similar performance in recent years. Among the three groups (national banks, regional banks and special banks), most special banks demonstrated superb performance across models and years. In particular, the performance difference between the special banks and the regional banks was statistically significant. The authors conclude that the high performance of the special banks was due to their nationwide market access and ownership type.

Practical implications – This study demonstrates how to analyze and measure the efficiency of entities when variables contain negative observations, using a data set for Korean banks. The authors have tried two major DEA models that are able to handle negative data and propose a practical direction for future studies.

Originality/value – Although there are research papers measuring the performance of banks in Korea, all of the papers on the topic have studied efficiency or productivity using positive data sets. However, variables such as net income and growth rates frequently include negative observations in bank data sets. This is the first paper to investigate the efficiency of bank operations in the presence of negative data in Korea.
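To show what the BCC model computes, the sketch below solves the input-oriented, variable-returns-to-scale DEA linear programme for each decision-making unit with SciPy. The inputs and outputs are synthetic and nonnegative; the paper's treatment of negative observations (translation invariance and the MSBM model) is not reproduced here.

```python
# Minimal sketch of an input-oriented BCC (variable returns to scale) DEA model
# solved as a linear programme with SciPy. Synthetic nonnegative data only.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n_dmu = 18                                   # e.g. 18 banks
X = rng.uniform(1, 10, size=(2, n_dmu))      # two inputs per DMU
Y = rng.uniform(1, 10, size=(2, n_dmu))      # two outputs per DMU

def bcc_efficiency(o):
    """Efficiency score of DMU o; decision vector is [theta, lambda_1..n]."""
    c = np.r_[1.0, np.zeros(n_dmu)]                       # minimise theta
    # Inputs:  sum_j lambda_j * x_ij - theta * x_io <= 0
    A_in = np.hstack([-X[:, [o]], X])
    # Outputs: -sum_j lambda_j * y_rj <= -y_ro
    A_out = np.hstack([np.zeros((Y.shape[0], 1)), -Y])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(X.shape[0]), -Y[:, o]]
    A_eq = np.r_[0.0, np.ones(n_dmu)].reshape(1, -1)      # sum lambda = 1 (VRS)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * n_dmu)
    return res.x[0]

scores = [bcc_efficiency(o) for o in range(n_dmu)]
print(np.round(scores, 3))
```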


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Hendri Murfi

Purpose – The aim of this research is to develop an eigenspace-based fuzzy c-means method for scalable topic detection.

Design/methodology/approach – The eigenspace-based fuzzy c-means (EFCM) method combines representation learning and clustering. The textual data are transformed into a lower-dimensional eigenspace using truncated singular value decomposition. Fuzzy c-means is performed on the eigenspace to identify the centroids of each cluster. The topics are obtained by transforming the centroids back into the nonnegative subspace of the original space. In this paper, we extend the EFCM method for scalability by using two approaches, i.e. single-pass and online processing. We call the developed topic detection methods oEFCM and spEFCM.

Findings – Our simulation shows that both the oEFCM and spEFCM methods provide faster running times than EFCM for data sets that do not fit in memory, although there is a decrease in the average coherence score. For data sets that both fit and do not fit into memory, the oEFCM method provides a better tradeoff between running time and coherence score than spEFCM.

Originality/value – This research produces a scalable topic detection method. Besides this scalability, the developed method also provides a faster running time for data sets that fit in memory.
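The EFCM pipeline described above can be sketched in a few lines: TF-IDF vectors are projected into a low-dimensional eigenspace with truncated SVD, fuzzy c-means is run in that space, and the centroids are mapped back to the term space (keeping the nonnegative part) to read off topics. The tiny corpus and the hand-rolled fuzzy c-means below are for illustration only; the single-pass and online variants (spEFCM, oEFCM) are not reproduced.

```python
# Minimal sketch of the EFCM idea on a toy corpus: truncated SVD eigenspace,
# fuzzy c-means in that space, centroids mapped back to terms as topics.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def fuzzy_cmeans(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means returning (centroids, membership matrix)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(n_clusters), size=len(X))   # soft memberships
    for _ in range(n_iter):
        W = U ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]            # weighted centroids
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return C, U

docs = ["rain flood storm", "storm wind hurricane", "stock market price",
        "market trading price", "rain wind flood", "stock price index"]
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)

svd = TruncatedSVD(n_components=2, random_state=0)
E = svd.fit_transform(X)                                  # eigenspace representation

centroids, memberships = fuzzy_cmeans(E, n_clusters=2)
# Transform centroids back and keep the nonnegative part to interpret topics.
topics = np.maximum(svd.inverse_transform(centroids), 0)
terms = np.array(tfidf.get_feature_names_out())
for k, topic in enumerate(topics):
    print(f"topic {k}:", terms[np.argsort(topic)[::-1][:3]])
```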


Author(s):  
James B. Elsner ◽  
Thomas H. Jagger

Hurricane data originate from careful analysis of past storms by operational meteorologists. The data include estimates of the hurricane position and intensity at 6-hourly intervals. Information related to landfall time, local wind speeds, damages, and deaths, as well as cyclone size, is included. The data are archived by season. Some effort is needed to make the data useful for hurricane climate studies. In this chapter, we describe the data sets used throughout this book. We show you a workflow that includes importing, interpolating, smoothing, and adding attributes. We also show you how to create subsets of the data. Code in this chapter is more complicated and can take longer to run. You can skip this material on first reading and continue with model building in Chapter 7. You can return here when you have an updated version of the data that includes the most recent years.

Most statistical models in this book use the best-track data. Here we describe these data and provide original source material. We also explain how to smooth and interpolate them. Interpolations are needed for regional hurricane analyses. The best-track data set contains the 6-hourly center locations and intensities of all known tropical cyclones across the North Atlantic basin, including the Gulf of Mexico and Caribbean Sea. The data set is called HURDAT, for HURricane DATa. It is maintained by the U.S. National Oceanic and Atmospheric Administration (NOAA) at the National Hurricane Center (NHC). Center locations are given in geographic coordinates (in tenths of degrees); the intensities, representing the one-minute near-surface (∼10 m) wind speeds, are given in knots (1 kt = 0.5144 m s−1); and the minimum central pressures are given in millibars (1 mb = 1 hPa). The data are provided in 6-hourly intervals starting at 00 UTC (Coordinated Universal Time). The version of the HURDAT file used here contains cyclones over the period 1851 through 2010 inclusive. Information on the history and origin of these data is found in Jarvinen et al. (1984). The file has a logical structure that makes it easy to read with a FORTRAN program. Each cyclone contains a header record, a series of data records, and a trailer record.
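As a small illustration of the preprocessing described here (the chapter itself works through this with its own code), the sketch below takes made-up 6-hourly best-track records, converts wind speed from knots to meters per second, and interpolates the track to hourly values with pandas. A real workflow would read the records from the HURDAT file and could substitute spline smoothing for the linear interpolation.

```python
# Illustrative sketch of best-track preprocessing: unit conversion and
# interpolation of 6-hourly records to hourly values. The records below are
# made up; a real workflow would read them from the HURDAT file.
import pandas as pd

kt_to_ms = 0.5144  # 1 kt = 0.5144 m/s, as noted in the text

track = pd.DataFrame(
    {
        "lat": [25.0, 25.6, 26.3, 27.1],        # tenths of degrees in HURDAT,
        "lon": [-80.0, -81.1, -82.0, -82.7],    # shown here already in degrees
        "wind_kt": [65, 70, 80, 75],
        "pressure_mb": [987, 984, 979, 982],
    },
    index=pd.date_range("2010-09-01 00:00", periods=4, freq="6h", tz="UTC"),
)

track["wind_ms"] = track["wind_kt"] * kt_to_ms

# Resample to hourly and interpolate linearly in time; smoothing (e.g. splines)
# could replace this step for regional analyses.
hourly = track.resample("1h").interpolate(method="time")
print(hourly.head(8))
```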


Kybernetes ◽  
2019 ◽  
Vol 48 (9) ◽  
pp. 2006-2029
Author(s):  
Hongshan Xiao ◽  
Yu Wang

Purpose – Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decisions, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance.

Design/methodology/approach – A measurement is first developed for measuring and identifying any significant heterogeneity that exists in the feature space of a data set. The main idea of this measurement is derived from meta-analysis. For a data set with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification.

Findings – The proposed approach has two main advantages over previous methods. The first advantage lies in feature transformation using orthogonal factor analysis, which results in new features without redundancy or irrelevance. The second advantage rests on partitioning samples to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets.

Research limitations/implications – The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue.

Practical implications – Measuring and eliminating any feature space heterogeneity in the data is important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applications of classification techniques to real-world problems.

Originality/value – A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with feature space heterogeneity and improve classification accuracy.
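The general idea of the proposed approach, as described, can be sketched as follows: transform the features with orthogonal factor analysis, partition the samples by clustering the factor scores, and fit one classifier per partition so the ensemble adapts to heterogeneous regions of the feature space. The data are synthetic and the components (factor analysis, k-means, logistic regression) are stand-ins; this is not the authors' exact algorithm or heterogeneity measurement.

```python
# Hedged sketch: factor analysis for feature transformation, clustering of
# factor scores to partition samples, and one classifier per partition
# (synthetic data; not the authors' exact algorithm or measurement).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

fa = FactorAnalysis(n_components=4, random_state=0)
F_tr = fa.fit_transform(X_tr)                 # factor scores (training)
F_te = fa.transform(X_te)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(F_tr)

# One classifier per cluster of factor scores.
models = {k: LogisticRegression(max_iter=1000).fit(F_tr[km.labels_ == k],
                                                   y_tr[km.labels_ == k])
          for k in range(3)}

test_clusters = km.predict(F_te)
pred = np.array([models[k].predict(F_te[[i]])[0]
                 for i, k in enumerate(test_clusters)])
print("accuracy:", (pred == y_te).mean())
```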


2020 ◽  
Vol 11 (8) ◽  
pp. 1619-1632
Author(s):  
Ahmad Al-Harbi

Purpose – The purpose of this paper is to investigate the determinants of Islamic banks' (IBs) liquidity.

Design/methodology/approach – The author uses a generalized least squares fixed effects model on an unbalanced panel data set of all IBs operating in the Organization of Islamic Cooperation countries over the period 1989-2008.

Findings – The estimation results show that all the determinants have statistically significant relationships with IBs' liquidity, but with different signs. On the one hand, foreign ownership, credit risk, profitability, inflation rate, monetary policy and deposit insurance negatively affect IBs' liquidity. On the other hand, capital ratio, size, gross domestic product growth and concentration have a positive nexus with IBs' liquidity.

Originality/value – To the best of the author's knowledge, this is the first empirical study to investigate the determinants of IBs' liquidity using cross-country data with a large sample of IBs (110 banks) over a long period (19 years). Also, the paper includes variables that had not been discussed in previous cross-country studies, such as efficiency, deposit insurance, monetary policy, concentration and market capitalization.
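To illustrate the kind of estimation described, the sketch below fits a liquidity regression with bank fixed effects (via dummy variables) and clustered standard errors on a synthetic unbalanced panel. Variable names, the bank sample and the estimator details are assumptions for illustration; the study itself uses a generalized least squares fixed effects model on actual IB data.

```python
# Hedged sketch: bank fixed effects via dummy variables on a synthetic
# unbalanced panel (hypothetical variables; not the author's data or the
# exact generalized least squares estimator used in the paper).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
rows = []
for bank in range(30):  # a small, made-up sample of banks
    years = rng.choice(np.arange(1989, 2009), size=rng.integers(5, 15),
                       replace=False)
    for year in years:
        rows.append({
            "bank": bank,
            "year": int(year),
            "liquidity": rng.uniform(0.1, 0.6),
            "capital_ratio": rng.uniform(0.05, 0.25),
            "credit_risk": rng.uniform(0.0, 0.1),
            "profitability": rng.normal(0.01, 0.02),
            "gdp_growth": rng.normal(0.03, 0.02),
            "inflation": rng.normal(0.05, 0.03),
        })
panel = pd.DataFrame(rows)

# Bank fixed effects absorb time-invariant bank characteristics; standard
# errors are clustered by bank.
fe = smf.ols(
    "liquidity ~ capital_ratio + credit_risk + profitability"
    " + gdp_growth + inflation + C(bank)",
    data=panel,
).fit(cov_type="cluster", cov_kwds={"groups": panel["bank"]})

# Show only the non-dummy coefficients.
print(fe.params[~fe.params.index.str.startswith("C(")])
```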

