Knowledge discovery in sociological databases

2019 ◽  
Vol 3 (3) ◽  
pp. 315-332
Author(s):  
Zhiwen Pan ◽  
Jiangtian Li ◽  
Yiqiang Chen ◽  
Jesus Pacheco ◽  
Lianjun Dai ◽  
...  

Purpose The General Social Survey (GSS) is a government-funded survey that examines the socio-economic status, quality of life, and structure of contemporary society. The GSS data set is regarded as one of the authoritative sources for government and organization practitioners making data-driven policies. Previous analytic approaches for GSS data sets were designed by combining expert knowledge with simple statistics. By utilizing emerging data mining algorithms, the authors propose a comprehensive data management and data mining approach for GSS data sets. Design/methodology/approach The approach operates in two phases: a data management phase, which improves the quality of GSS data by performing attribute pre-processing and filter-based attribute selection; and a data mining phase, which extracts hidden knowledge from the data set through prediction, classification, association and clustering analyses. Findings According to the experimental evaluation results, the paper has the following findings: performing attribute selection on the GSS data set increases the performance of both classification and clustering analyses; all the data mining analyses can effectively extract hidden knowledge from the GSS data set; and the knowledge generated by the different analyses can, to some extent, cross-validate each other. Originality/value By leveraging the power of data mining techniques, the proposed approach can explore knowledge in a fine-grained manner with minimum human interference. Experiments on the Chinese General Social Survey data set are conducted to evaluate the performance of the approach.
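The abstract does not specify which filter criterion the data management phase uses; as a minimal illustrative sketch (not the authors' implementation), filter-based attribute selection can rank each survey attribute by its empirical mutual information with the class label and keep only the top-scoring attributes:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in nats between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def select_attributes(rows, labels, k):
    """Rank attribute columns by mutual information with the label; keep the top k."""
    n_attrs = len(rows[0])
    scores = [(mutual_information([r[j] for r in rows], labels), j)
              for j in range(n_attrs)]
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]
```

On a toy data set where column 0 determines the label and column 1 is noise, `select_attributes(rows, labels, 1)` returns `[0]`, discarding the uninformative attribute before classification or clustering.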

2018 ◽  
Vol 21 (2) ◽  
pp. 62-71
Author(s):  
Henry O’Lawrence ◽  
Rohan Chowlkar

Purpose The purpose of this paper is to determine the cost effectiveness of palliative care for patients in home health and hospice settings. A secondary data set was utilized to test the hypotheses of this study. Home health care and hospice care services have the potential to avert hospital admissions in patients requiring palliative care, which significantly affects Medicare spending. With the aging population, it has become evident that demand for palliative care will increase four-fold. It was determined that current spending on end-of-life care is rapidly depleting Medicare funds and fiscally weakening numerous families caring for patients under palliative care during life-threatening illnesses. The study found that the majority of people registering for palliative and hospice care settings are above 55 years of age. Design/methodology/approach Variables such as length of stay, mode of payment and disease diagnosis were used to filter the available data set. Secondary data were utilized to test the hypotheses of this study. There are very few studies on hospice and palliative care services, and no study focuses on the cost associated with this care. Since a very large share of the US population is turning 65 and over, it is very important to analyze the cost of palliative and hospice care. For this analysis, data were utilized from the National Home and Hospice Care Survey (NHHCS), which has been conducted periodically by the Centers for Disease Control and Prevention’s National Center for Health Statistics. Descriptive statistics, χ2 tests and t-tests were used to test for statistical significance at the p<0.05 level. Findings The Statistical Package for the Social Sciences (SPSS) was utilized for these results. H1 predicted that patients aged 65 years and over have the highest utilization of home and hospice care.
This study examined various demographic variables in hospice and home health care which may help to evaluate the cost of care and the modes of payment. This section of the results presents the descriptive analysis of dependent, independent and covariate variables that provide overall national estimates of differences in the use of home and hospice care across age groups and sex. Research limitations/implications The data set used was from the 2007 NHHCS survey; no data have been collected thereafter, and this gap in data collection may yield inaccurate findings. To compensate, recent studies that analyzed the cost of palliative care in the USA were reviewed. There has been a lack of evidence demonstrating the cost savings and improved quality of life in palliative/hospice care. New research is needed on the various cost factors affecting palliative care services, as well as on quality of life. Although it is evident that palliative care treatment is less expensive than regular care, since it eliminates direct hospitalization costs, there is inadequate research to prove that it improves quality of life. Detailed research is required on the additional costs incurred in palliative/hospice care services, together with a cost-benefit analysis. Practical implications While various studies reporting information on the expenses and effects of family caregiving toward the end of life were identified, none of the previous research discussed this issue as its central focus. Most studies addressed the broader financial effects of palliative and end-of-life care, including expenses borne by the patients themselves, the medical services system and insurers or charitable/voluntary providers. This reveals a significant gap in the current literature.
Social implications With the aging population, it has become evident that demand for palliative/hospice care will increase four-fold. The NHHCS stopped tracking palliative care requirements after 2007, which has a negative impact on assessing growing needs, since cost analysis can only be performed on existing data. This review has identified a substantial gap in the evidence base with respect to the costs of giving care to and supporting a relative within a palliative/hospice care setting. Originality/value The study showed that cost reductions in aggressive treatments can cover the expenses of palliative/hospice care services. Evaluating outcomes in a physically measurable way is complicated by the intangible nature of many of the individual components of outcome. Although physical and mental well-being can be evaluated to a certain degree, it is considerably more difficult to quantify the social and spiritual dimensions of care that contribute fundamentally to overall quality of care.
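The study ran its χ2 tests in SPSS; as a hedged illustration of what such a test computes (not the paper's code or data), the Pearson χ2 statistic for a 2×2 contingency table, e.g. care setting by age group, can be derived directly from observed and expected counts and compared against the df=1 critical value of 3.841 at p<0.05:

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical counts (setting x age group) for illustration only:
significant = chi_square_2x2([[30, 10], [10, 30]]) > 3.841  # df=1, alpha=0.05
```

A perfectly proportional table yields a statistic of 0, i.e. no evidence of association between setting and age group.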


2017 ◽  
Vol 19 (2) ◽  
pp. 53-66 ◽  
Author(s):  
Michael Preston-Shoot

Purpose The purpose of this paper is twofold: first, to update the core data set of self-neglect serious case reviews (SCRs) and safeguarding adult reviews (SARs), and accompanying thematic analysis; second, to respond to the critique in the Wood Report of SCRs commissioned by Local Safeguarding Children Boards (LSCBs) by exploring the degree to which the reviews scrutinised here can transform and improve the quality of adult safeguarding practice. Design/methodology/approach Further published reviews are added to the core data set from the websites of Safeguarding Adults Boards (SABs) and from contacts with SAB independent chairs and business managers. Thematic analysis is updated using the four domains employed previously. The findings are then further used to respond to the critique in the Wood Report of SCRs commissioned by LSCBs, with implications discussed for Safeguarding Adult Boards. Findings Thematic analysis within and recommendations from reviews have tended to focus on the micro context, namely, what takes place between individual practitioners, their teams and adults who self-neglect. This level of analysis enables an understanding of local geography. However, there are other wider systems that impact on and influence this work. If review findings and recommendations are to fully answer the question “why”, systemic analysis should appreciate the influence of national geography. Review findings and recommendations may also be used to contest the critique of reviews, namely, that they fail to engage practitioners, are insufficiently systemic and of variable quality, and generate repetitive findings from which lessons are not learned. Research limitations/implications There is still no national database of reviews commissioned by SABs so the data set reported here might be incomplete. The Care Act 2014 does not require publication of reports but only a summary of findings and recommendations in SAB annual reports. 
This makes learning for service improvement challenging. Reading the reviews reported here against the strands in the critique of SCRs enables conclusions to be reached about their potential to transform adult safeguarding policy and practice. Practical implications Answering the question “why” is a significant challenge for SARs. Different approaches have been recommended, some rooted in systems theory. The critique of SCRs challenges those now engaged in SARs to reflect on how transformational change can be achieved to improve the quality of adult safeguarding policy and practice. Originality/value The paper extends the thematic analysis of available reviews that focus on work with adults who self-neglect, further building on the evidence base for practice. The paper also contributes new perspectives to the process of conducting SARs by using the analysis of themes and recommendations within this data set to evaluate the critique that reviews are insufficiently systemic, fail to engage those involved in reviewed cases and in their repetitive conclusions demonstrate that lessons are not being learned.


2020 ◽  
Vol 2 (2) ◽  
pp. 01-17
Author(s):  
Khamami Herusantoso ◽  
Ardyanto Dwi Saputra

Within the dwell time, customs clearance is considered the most complex phase, even though it is the shortest compared with the other phases, pre-clearance and post-clearance. To improve the efficiency and effectiveness of the services performed in the customs clearance process, customs authorities must start considering the help of database analysis in identifying obstacles, instead of depending on personal analysis. Useful information is hidden within the importation data set, and it is extractable through data mining techniques. This study explores the customs clearance process for import cargo whose documents are declared through the red channel at the Prime Customs Office Type A of Tanjung Priok (PCO Tanjung Priok), and applies a specific data mining classifier, the decision tree with the J48 algorithm, to evaluate the process. Eleven classification models are developed using unpruned, online-pruning and post-pruning features. The best model is chosen to extract the hidden knowledge describing the factors that affect the customs clearance process, allowing the customs authorities to improve the services they perform in the future.
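J48 is Weka's implementation of C4.5, which grows a decision tree by repeatedly choosing the attribute whose split most reduces label entropy (C4.5 refines plain information gain into gain ratio, and the pruning variants the study compares then trim the grown tree). A minimal sketch of the split criterion (an illustration, not the study's Weka models) is:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on column `attr`, the core of C4.5/J48."""
    n = len(rows)
    groups = {}
    for r, y in zip(rows, labels):
        groups.setdefault(r[attr], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels):
    """Index of the attribute J48 would split on first (by information gain)."""
    return max(range(len(rows[0])), key=lambda j: information_gain(rows, labels, j))
```

On hypothetical clearance records where column 0 (say, document completeness) determines the outcome and column 1 is irrelevant, `best_split` picks column 0.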


2008 ◽  
pp. 2088-2104
Author(s):  
Qingyu Zhang ◽  
Richard S. Segall

This chapter illustrates the use of data mining as a computational intelligence methodology for forecasting data management needs. Specifically, it discusses the use of data mining with multidimensional databases to determine data management needs for selected biotechnology data: forest cover data (63,377 rows and 54 attributes) and a human lung cancer data set (12,600 rows of transcript sequences and 156 columns of gene types). The data mining is performed using four selected software packages: SAS® Enterprise Miner™, Megaputer PolyAnalyst® 5.0, NeuralWare Predict® and BioDiscovery GeneSight®. The analysis and results will be used to enhance the intelligence capabilities of biotechnology research by improving data visualization and forecasting for organizations. The tools and techniques discussed here are representative of those applicable in a typical manufacturing and production environment. Screen shots of each of the four software packages are presented, as are conclusions and future directions.


Author(s):  
Nikos Pelekis ◽  
Babis Theodoulidis ◽  
Ioannis Kopanakis ◽  
Yannis Theodoridis

Quality of Service Open Shortest Path First (QOSPF), based on QoS routing, has been recognized as a missing piece in the evolution of QoS-based services on the Internet. Data mining has emerged as a tool for data analysis, discovery of new information, and autonomous decision-making. This paper focuses on routing algorithms and their applications for computing QoS routes in the OSPF protocol. The proposed approach is based on a data mining approach using rough set theory, in which the attribute-value system about network links is created from the network topology. Rough set theory offers a knowledge discovery approach to extracting routing decisions from the attribute set. The extracted rules can then be used to select significant routing attributes and make routing selections in routers. A case study is conducted to demonstrate that rough set theory is effective in finding the most significant attribute set. It is shown that the algorithm based on data mining and rough sets offers a promising approach to the attribute-selection problem in Internet routing.
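As an illustrative sketch of the rough-set machinery described (not the paper's implementation), the standard dependency degree of the routing decision on a candidate attribute subset, the quantity used to judge attribute significance, can be computed from indiscernibility classes over the link attribute-value table:

```python
def partition(rows, attrs):
    """Group row indices into indiscernibility classes over the given attributes."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), []).append(i)
    return list(blocks.values())

def dependency(rows, decisions, attrs):
    """Rough-set dependency degree: fraction of rows whose indiscernibility
    class under `attrs` maps to a single decision (the positive region)."""
    positive = 0
    for block in partition(rows, attrs):
        if len({decisions[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)
```

An attribute subset with dependency 1.0 determines the routing decision completely; a minimal such subset (a reduct) identifies the significant routing attributes.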


2018 ◽  
Vol 2 (2) ◽  
pp. 164-176
Author(s):  
Zhiwen Pan ◽  
Wen Ji ◽  
Yiqiang Chen ◽  
Lianjun Dai ◽  
Jun Zhang

Purpose Disability datasets are datasets that contain information on disabled populations. By analyzing these datasets, professionals who work with disabled populations can gain a better understanding of their inherent characteristics, so that working plans and policies that effectively help disabled populations can be made accordingly. Design/methodology/approach In this paper, the authors propose a big data management and analytic approach for disability datasets. Findings By using a set of data mining algorithms, the proposed approach provides the following services: the data management scheme improves the quality of disability data by estimating missing attribute values and detecting anomalous and low-quality data instances, and the data mining scheme explores useful patterns reflecting the correlational, associational and interactional relationships between the disability data attributes. Experiments based on a real-world dataset are conducted to prove the effectiveness of the approach. Originality/value The proposed approach enables data-driven decision-making for professionals who work with disabled populations.
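The data management scheme's two services, missing-value estimation and anomaly detection, can be sketched with simple stand-ins; mean imputation and z-score flagging here are illustrative assumptions, not the authors' actual algorithms:

```python
def impute_mean(values):
    """Replace None entries in a numeric attribute with the mean of observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def zscore_anomalies(values, threshold=2.0):
    """Indices of values more than `threshold` standard deviations from the mean,
    flagged as candidate anomalous or low-quality instances."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [i for i, v in enumerate(values) if std and abs(v - mean) / std > threshold]
```

For example, `impute_mean([1, None, 3])` fills the gap with `2.0`, and a single extreme reading in an otherwise flat attribute is flagged by `zscore_anomalies`.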


Author(s):  
Dominique Haughton ◽  
Guangying Hua ◽  
Danny Jin ◽  
John Lin ◽  
Qizhi Wei ◽  
...  

Purpose – The purpose of this paper is to propose data mining techniques to model the return on investment from various types of promotional spending to market a drug, and then to use the model to draw conclusions on how the pharmaceutical industry might allocate promotion expenditures more efficiently, potentially reducing costs to the consumer. The main contributions of the paper are two-fold. First, it demonstrates how to undertake a promotion mix optimization process in the pharmaceutical context and carry it through from beginning to end. Second, the paper proposes using directed acyclic graphs (DAGs) to help unravel the direct and indirect effects of various promotional media on sales volume. Design/methodology/approach – A synthetic data set was constructed to prototype the proposed data mining techniques, and two analysis approaches were investigated. Findings – The two methods were found to yield insights into the problem of the promotion mix in the context of the healthcare industry. First, a factor analysis was followed by a regression analysis and an optimization algorithm applied to the resulting equation. Second, a DAG was used to unravel the direct and indirect effects of promotional expenditures on new prescriptions. Research limitations/implications – The data are synthetic and do not incorporate any time autocorrelations. Practical implications – The promotion mix optimization process is demonstrated from beginning to end, and the issue of negative coefficients in promotion mix models is addressed. In addition, a method is proposed to identify direct and indirect effects on new prescriptions. Social implications – A better allocation of promotional expenditures has the potential to reduce the cost of healthcare to consumers.
Originality/value – The contributions of the paper are two-fold: for the first time in the literature (to the best of the authors’ knowledge), the authors have undertaken a promotion mix optimization process and carried it through from beginning to end. Second, the authors propose the use of DAGs to help unravel the effects of various promotion media on sales volume, notably direct and indirect effects.
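The abstract does not give the fitted response equation or the optimization algorithm; as a hedged sketch under assumed diminishing-returns (square-root) response curves, a fixed promotion budget can be allocated across channels greedily by marginal return, which is optimal for separable concave responses:

```python
def allocate_budget(coeffs, budget, step=1.0):
    """Greedily allocate `budget` in increments of `step` across channels,
    always funding the channel with the highest marginal return under an
    assumed response model: sales contribution = coeffs[j] * sqrt(spend[j])."""
    spend = [0.0] * len(coeffs)

    def marginal(j):
        return coeffs[j] * ((spend[j] + step) ** 0.5 - spend[j] ** 0.5)

    for _ in range(int(budget / step)):
        j = max(range(len(coeffs)), key=marginal)
        spend[j] += step
    return spend
```

With square-root responses the optimum puts spend in proportion to the squared coefficients, so a channel twice as effective receives four times the budget; the channel names, coefficients and response form here are all illustrative assumptions.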


2017 ◽  
Vol 10 (2) ◽  
pp. 111-129 ◽  
Author(s):  
Ali Hasan Alsaffar

Purpose The purpose of this paper is to present an empirical study of the effect of two synthetic attributes on popular classification algorithms applied to data originating from student transcripts. The attributes represent past performance achievements in a course, defined as global performance (GP) and local performance (LP). The GP of a course is the aggregated performance achieved by all students who have taken the course, and the LP of a course is the aggregated performance achieved in the prerequisite courses by the student taking the course. Design/methodology/approach The paper uses educational data mining techniques to predict student performance in courses, identifying the attributes that are the key influencers for predicting the final grade (performance), and reports the effect of the two suggested attributes on the classification algorithms. As a research paradigm, the paper follows the Cross-Industry Standard Process for Data Mining, using the RapidMiner Studio software tool. Six classification algorithms are evaluated: C4.5 and CART decision trees, naive Bayes, k-nearest neighbors, rule-based induction and support vector machines. Findings The outcomes show that the synthetic attributes positively improve the performance of the classification algorithms and are highly ranked according to their influence on the target variable. Originality/value This paper proposes two synthetic attributes that are integrated into a real data set. The key motivation is to improve the quality of the data and make the classification algorithms perform better. The paper also presents empirical results showing the effect of these attributes on the selected classification algorithms.
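Following the definitions above, the two synthetic attributes can be sketched directly; the record layout, aggregation by mean, and grade scale below are illustrative assumptions, not the paper's exact feature engineering:

```python
def global_performance(records, course):
    """GP: mean grade achieved by all students who took `course`.
    records: list of (student, course, grade) transcript rows."""
    grades = [g for _, c, g in records if c == course]
    return sum(grades) / len(grades)

def local_performance(records, student, prereqs):
    """LP: the given student's mean grade over the course's prerequisites."""
    grades = [g for s, c, g in records if s == student and c in prereqs]
    return sum(grades) / len(grades)
```

Each transcript row then gains two extra columns, GP of the course and LP of the student over its prerequisites, before being fed to the classifiers.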


Author(s):  
Maria Torres Vega ◽  
Vittorio Sguazzo ◽  
Decebal Constantin Mocanu ◽  
Antonio Liotta

Purpose The Video Quality Metric (VQM) is one of the most widely used objective methods for assessing video quality, because of its high correlation with the human visual system (HVS). VQM is, however, not viable in real-time deployments such as mobile streaming, not only due to its high computational demands but also because, as a Full Reference (FR) metric, it requires both the original video and its impaired counterpart. In contrast, No Reference (NR) objective algorithms operate directly on the impaired video and are considerably faster, but lose out in accuracy. The purpose of this paper is to study how differently NR metrics perform in the presence of network impairments. Design/methodology/approach The authors assess eight NR metrics, alongside a lightweight FR metric, using VQM as the benchmark on a self-developed network-impaired video data set. The paper covers a range of methods, a diverse set of video types and encoding conditions, and a variety of network impairment test cases. Findings The authors show the extent to which packet loss affects different video types, correlating the accuracy of the NR metrics to the FR benchmark. The paper helps identify the conditions under which simple metrics may be used effectively, and indicates an avenue for controlling the quality of streaming systems. Originality/value Most studies in the literature have focused on assessing streams that are either unaffected by the network (e.g. looking at the effects of video compression algorithms) or affected by synthetic network impairments (i.e. simulated network conditions). The authors show that when streams are affected by real network conditions, assessing Quality of Experience becomes even harder, as the existing metrics perform poorly.
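Correlating NR metric scores against the VQM benchmark amounts to computing a correlation coefficient over matched per-video score lists; a minimal Pearson sketch (illustrative, not the authors' tooling, which may use other coefficients such as Spearman's) is:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists,
    e.g. an NR metric's scores vs the VQM benchmark on the same videos."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

A value near 1 indicates the NR metric tracks the FR benchmark closely under the tested impairments; values near 0 indicate it breaks down.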


2019 ◽  
Vol 5 (2) ◽  
pp. 108-119
Author(s):  
Yeslam Al-Saggaf ◽  
Amanda Davies

Purpose The purpose of this paper is to discuss the design, application and findings of a case study in which a machine learning algorithm is applied to identify grievances expressed on Twitter in an Arabian context. Design/methodology/approach To understand the characteristics of the Twitter users who expressed the identified grievances, data mining techniques and social network analysis were utilised. The study extracted a total of 23,363 tweets, which were stored as a data set. The machine learning algorithm applied to this data set was followed by a data mining process to explore the characteristics of the Twitter users. The network of the users was mapped, and the individual level of interactivity and the network density were calculated. Findings The machine learning algorithm revealed 12 themes, all of which were underpinned by the Arab coalition countries' blockade of Qatar. The data mining analysis revealed that the tweets could be grouped into three clusters; the main cluster included users with large numbers of followers and friends who did not mention other users in their tweets. The social network analysis revealed that, whilst a large proportion of users engaged in direct messages with others, the network ties between them were not strong. Practical implications Borum (2011) notes that invoking grievances is the first step in the radicalisation process. It is hoped that, by understanding these grievances, the study will shed light on what radical groups could invoke to win the sympathy of aggrieved people. Originality/value In combination, the machine learning algorithm offered insights into the grievances expressed within the tweets in an Arabian context, while the data mining and social network analyses revealed characteristics of the Twitter users that are relevant to identifying and managing early intervention against radicalisation.
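The interactivity and network density measures mentioned can be sketched for a directed mention network built from the tweets (a minimal illustration with standard definitions, not the study's code):

```python
def network_density(n_nodes, edges):
    """Density of a directed mention network: realised edges over the
    n * (n - 1) possible directed edges between n users."""
    return len(edges) / (n_nodes * (n_nodes - 1))

def interactivity(user, edges):
    """Per-user interactivity: mentions sent plus mentions received."""
    return sum(1 for a, b in edges if a == user or b == user)
```

A low density alongside a few high-interactivity users matches the study's finding of weak ties concentrated around a main cluster.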

