scholarly journals An Emperical Investigation of Socio-Economic Parameters Associated with Crime in Karnataka Using Predictive Analysis

Author(s):  
Dr. Monica R Mundada

The crime rate in India has been on a rise with growing population and rapid development. Crimes are a social nuisance and brings disrepute to the society and nation at large. Data mining techniques have enabled models to predict crime. The law enforcement officers and police personnel’s need to spend umpteen time to analyze crime from the crime reports published by National Crime Records Bureau and respective State police departments. Therefore, there is necessity of a mechanism which can predict crime by identifying factors responsible for increase in crime rate. This project is an attempt to address this. Socio-economic variables like poverty, urbanization, literacy, and the demographic and social composition of the population are recorded from Census data. Machine Learning algorithms namely Multiple Linear Regression, Random Forest Regressor and Generalized Linear Models are trained and deployed. These algorithms have been implemented assuming both linear and non-linear ship of data and subsequent results have been compared. Further, choropleth maps have been plotted to represent and understand data in a simple way. The results show that socio-economic factors are good indicator in explaining crime rates and good performance is observed with few models

2019 ◽  
Vol 5 ◽  
pp. 237802311982588 ◽  
Author(s):  
Nicole Bohme Carnegie ◽  
James Wu

Our goal for the Fragile Families Challenge was to develop a hands-off approach that could be applied in many settings to identify relationships that theory-based models might miss. Data processing was our first and most time-consuming task, particularly handling missing values. Our second task was to reduce the number of variables for modeling, and we compared several techniques for variable selection: least absolute selection and shrinkage operator, regression with a horseshoe prior, Bayesian generalized linear models, and Bayesian additive regression trees (BART). We found minimal differences in final performance based on the choice of variable selection method. We proceeded with BART for modeling because it requires minimal assumptions and permits great flexibility in fitting surfaces and based on previous success using BART in black-box modeling competitions. In addition, BART allows for probabilistic statements about the predictions and other inferences, which is an advantage over most machine learning algorithms. A drawback to BART, however, is that it is often difficult to identify or characterize individual predictors that have strong influences on the outcome variable.


2021 ◽  
Vol 8 (2) ◽  
pp. 205395172110203
Author(s):  
Mohammad Hossein Jarrahi ◽  
Gemma Newlands ◽  
Min Kyung Lee ◽  
Christine T. Wolf ◽  
Eliscia Kinder ◽  
...  

The rapid development of machine-learning algorithms, which underpin contemporary artificial intelligence systems, has created new opportunities for the automation of work processes and management functions. While algorithmic management has been observed primarily within the platform-mediated gig economy, its transformative reach and consequences are also spreading to more standard work settings. Exploring algorithmic management as a sociotechnical concept, which reflects both technological infrastructures and organizational choices, we discuss how algorithmic management may influence existing power and social structures within organizations. We identify three key issues. First, we explore how algorithmic management shapes pre-existing power dynamics between workers and managers. Second, we discuss how algorithmic management demands new roles and competencies while also fostering oppositional attitudes toward algorithms. Third, we explain how algorithmic management impacts knowledge and information exchange within an organization, unpacking the concept of opacity on both a technical and organizational level. We conclude by situating this piece in broader discussions on the future of work, accountability, and identifying future research steps.


Biosensors ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 257
Author(s):  
Sebastian Fudickar ◽  
Eike Jannik Nustede ◽  
Eike Dreyer ◽  
Julia Bornhorst

Caenorhabditis elegans (C. elegans) is an important model organism for studying molecular genetics, developmental biology, neuroscience, and cell biology. Advantages of the model organism include its rapid development and aging, easy cultivation, and genetic tractability. C. elegans has been proven to be a well-suited model to study toxicity with identified toxic compounds closely matching those observed in mammals. For phenotypic screening, especially the worm number and the locomotion are of central importance. Traditional methods such as human counting or analyzing high-resolution microscope images are time-consuming and rather low throughput. The article explores the feasibility of low-cost, low-resolution do-it-yourself microscopes for image acquisition and automated evaluation by deep learning methods to reduce cost and allow high-throughput screening strategies. An image acquisition system is proposed within these constraints and used to create a large data-set of whole Petri dishes containing C. elegans. By utilizing the object detection framework Mask R-CNN, the nematodes are located, classified, and their contours predicted. The system has a precision of 0.96 and a recall of 0.956, resulting in an F1-Score of 0.958. Considering only correctly located C. elegans with an [email protected] IoU, the system achieved an average precision of 0.902 and a corresponding F1 Score of 0.906.


Author(s):  
Jerome Laviolette ◽  
Catherine Morency ◽  
Owen D. Waygood ◽  
Konstadinos G. Goulias

Car ownership is linked to higher car use, which leads to important environmental, social and health consequences. As car ownership keeps increasing in most countries, it remains relevant to examine what factors and policies can help contain this growth. This paper uses an advanced spatial econometric modeling framework to investigate spatial dependences in household car ownership rates measured at fine geographical scales using administrative data of registered vehicles and census data of household counts for the Island of Montreal, Canada. The use of a finer level of spatial resolution allows for the use of more explanatory variables than previous aggregate models of car ownership. Theoretical considerations and formal testing suggested the choice of the Spatial Durbin Error Model (SDEM) as an appropriate modeling option. The final model specification includes sociodemographic and built environment variables supported by theory and achieves a Nagelkerke pseudo-R2 of 0.93. Despite the inclusion of those variables the spatial linear models with and without lagged explanatory variables still exhibit residual spatial dependence. This indicates the presence of unobserved autocorrelated factors influencing car ownership rates. Model results indicate that sociodemographic variables explain much of the variance, but that built environment characteristics, including transit level of service and local commercial accessibility (e.g., to grocery stores) are strongly and negatively associated with neighborhood car ownership rates. Comparison of estimates between the SDEM and a non-spatial model indicates that failing to control for spatial dependence leads to an overestimation of the strength of the direct influence of built environment variables.


Author(s):  
Virendra Tiwari ◽  
Balendra Garg ◽  
Uday Prakash Sharma

The machine learning algorithms are capable of managing multi-dimensional data under the dynamic environment. Despite its so many vital features, there are some challenges to overcome. The machine learning algorithms still requires some additional mechanisms or procedures for predicting a large number of new classes with managing privacy. The deficiencies show the reliable use of a machine learning algorithm relies on human experts because raw data may complicate the learning process which may generate inaccurate results. So the interpretation of outcomes with expertise in machine learning mechanisms is a significant challenge in the machine learning algorithm. The machine learning technique suffers from the issue of high dimensionality, adaptability, distributed computing, scalability, the streaming data, and the duplicity. The main issue of the machine learning algorithm is found its vulnerability to manage errors. Furthermore, machine learning techniques are also found to lack variability. This paper studies how can be reduced the computational complexity of machine learning algorithms by finding how to make predictions using an improved algorithm.


2013 ◽  
Vol 441 ◽  
pp. 691-694
Author(s):  
Yi Qun Zeng ◽  
Jing Bin Wang

With the rapid development of information technology, data grows explosionly, how to deal with the large scale data become more and more important. Based on the characteristics of RDF data, we propose to compress RDF data. We construct an index structure called PAR-Tree Index, then base on the MapReduce parallel computing framework and the PAR-Tree Index to execute the query. Experimental results show that the algorithm can improve the efficiency of large data query.


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 264-265
Author(s):  
Duy Ngoc Do ◽  
Guoyu Hu ◽  
Younes Miar

Abstract American mink (Neovison vison) is the major source of fur for the fur industries worldwide and Aleutian disease (AD) is causing severe financial losses to the mink industry. Different methods have been used to diagnose the AD in mink, but the combination of several methods can be the most appropriate approach for the selection of AD resilient mink. Iodine agglutination test (IAT) and counterimmunoelectrophoresis (CIEP) methods are commonly employed in test-and-remove strategy; meanwhile, enzyme-linked immunosorbent assay (ELISA) and packed-cell volume (PCV) methods are complementary. However, using multiple methods are expensive; and therefore, hindering the corrected use of AD tests in selection. This research presented the assessments of the AD classification based on machine learning algorithms. The Aleutian disease was tested on 1,830 individuals using these tests in an AD positive mink farm (Canadian Centre for Fur Animal Research, NS, Canada). The accuracy of classification for CIEP was evaluated based on the sex information, and IAT, ELISA and PCV test results implemented in seven machine learning classification algorithms (Random Forest, Artificial Neural Networks, C50Tree, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) using the Caret package in R. The accuracy of prediction varied among the methods. Overall, the Random Forest was the best-performing algorithm for the current dataset with an accuracy of 0.89 in the training data and 0.94 in the testing data. Our work demonstrated the utility and relative ease of using machine learning algorithms to assess the CIEP information, and consequently reducing the cost of AD tests. However, further works require the inclusion of production and reproduction information in the models and extension of phenotypic collection to increase the accuracy of current methods.


2021 ◽  
pp. 1-29
Author(s):  
Fikrewold H. Bitew ◽  
Corey S. Sparks ◽  
Samuel H. Nyarko

Abstract Objective: Child undernutrition is a global public health problem with serious implications. In this study, estimate predictive algorithms for the determinants of childhood stunting by using various machine learning (ML) algorithms. Design: This study draws on data from the Ethiopian Demographic and Health Survey of 2016. Five machine learning algorithms including eXtreme gradient boosting (xgbTree), k-nearest neighbors (K-NN), random forest (RF), neural network (NNet), and the generalized linear models (GLM) were considered to predict the socio-demographic risk factors for undernutrition in Ethiopia. Setting: Households in Ethiopia. Participants: A total of 9,471 children below five years of age. Results: The descriptive results show substantial regional variations in child stunting, wasting, and underweight in Ethiopia. Also, among the five ML algorithms, xgbTree algorithm shows a better prediction ability than the generalized linear mixed algorithm. The best predicting algorithm (xgbTree) shows diverse important predictors of undernutrition across the three outcomes which include time to water source, anemia history, child age greater than 30 months, small birth size, and maternal underweight, among others. Conclusions: The xgbTree algorithm was a reasonably superior ML algorithm for predicting childhood undernutrition in Ethiopia compared to other ML algorithms considered in this study. The findings support improvement in access to water supply, food security, and fertility regulation among others in the quest to considerably improve childhood nutrition in Ethiopia.


Big Data ◽  
2016 ◽  
pp. 2249-2274
Author(s):  
Chinh Nguyen ◽  
Rosemary Stockdale ◽  
Helana Scheepers ◽  
Jason Sargent

The rapid development of technology and interactive nature of Government 2.0 (Gov 2.0) is generating large data sets for Government, resulting in a struggle to control, manage, and extract the right information. Therefore, research into these large data sets (termed Big Data) has become necessary. Governments are now spending significant finances on storing and processing vast amounts of information because of the huge proliferation and complexity of Big Data and a lack of effective records management. On the other hand, there is a method called Electronic Records Management (ERM), for controlling and governing the important data of an organisation. This paper investigates the challenges identified from reviewing the literature for Gov 2.0, Big Data, and ERM in order to develop a better understanding of the application of ERM to Big Data to extract useable information in the context of Gov 2.0. The paper suggests that a key building block in providing useable information to stakeholders could potentially be ERM with its well established governance policies. A framework is constructed to illustrate how ERM can play a role in the context of Gov 2.0. Future research is necessary to address the specific constraints and expectations placed on governments in terms of data retention and use.


2022 ◽  
pp. 27-50
Author(s):  
Rajalaxmi Prabhu B. ◽  
Seema S.

A lot of user-generated data is available these days from huge platforms, blogs, websites, and other review sites. These data are usually unstructured. Analyzing sentiments from these data automatically is considered an important challenge. Several machine learning algorithms are implemented to check the opinions from large data sets. A lot of research has been undergone in understanding machine learning approaches to analyze sentiments. Machine learning mainly depends on the data required for model building, and hence, suitable feature exactions techniques also need to be carried. In this chapter, several deep learning approaches, its challenges, and future issues will be addressed. Deep learning techniques are considered important in predicting the sentiments of users. This chapter aims to analyze the deep-learning techniques for predicting sentiments and understanding the importance of several approaches for mining opinions and determining sentiment polarity.


Sign in / Sign up

Export Citation Format

Share Document