scholarly journals Built Environment Factors Affecting Bike Sharing Ridership: Data-Driven Approach for Multiple Cities

Author(s):  
David Duran-Rodas ◽  
Emmanouil Chaniotakis ◽  
Constantinos Antoniou

Identification of factors influencing ridership is necessary for policy-making, as well as, when examining transferability and aspects of performance and reliability. In this work, a data-driven method is formulated to correlate arrivals and departures of station-based bike sharing systems with built environment factors in multiple cities. Ridership data from stations of multiple cities are pooled in one data set regardless of their geographic boundaries. The method bundles the collection, analysis, and processing of data, as well as, the model’s estimation using statistical and machine learning techniques. The method was applied on a national level in six cities in Germany, and also on an international level in three cities in Europe and North America. The results suggest that the model’s performance did not depend on clustering cities by size but by the relative daily distribution of the rentals. Selected statistically significant factors were identified to vary temporally (e.g., nightclubs were significant during the night). The most influencing variables were related to the city population, distance to city center, leisure-related establishments, and transport-related infrastructure. This data-driven method can help as a support decision-making tool to implement or expand bike sharing systems.

2019 ◽  
Vol 5 ◽  
pp. 237802311881872 ◽  
Author(s):  
Ryan Compton

Sociological research typically involves exploring theoretical relationships, but the emergence of “big data” enables alternative approaches. This work shows the promise of data-driven machine-learning techniques involving feature engineering and predictive model optimization to address a sociological data challenge. The author’s group develops improved generalizable models to identify at-risk families. Principal-components analysis and decision tree modeling are used to predict six main dependent variables in the Fragile Families Challenge, successfully modeling one binary variable but no continuous dependent variables in the diagnostic data set. This indicates that some binary dependent variables are more predictable using a reduced set of uncorrelated independent variables, and continuous dependent variables demand more complexity.


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal focuses on findings to get valuable insights on available data. The large part of Indian Cinema is Bollywood which is a multi-million dollar industry. This paper attempts to predict whether the upcoming Bollywood Movie would be Blockbuster, Superhit, Hit, Average or Flop. For this Machine Learning techniques (classification and prediction) will be applied. To make classifier or prediction model first step is the learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations. Methods: All the techniques related to classification and Prediction such as Support Vector Machine(SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, Adaboost, and KNN will be applied and try to find out efficient and effective results. All these functionalities can be applied with GUI Based workflows available with various categories such as data, Visualize, Model, and Evaluate. Result: To make classifier or prediction model first step is learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations Conclusion: This paper focuses on Comparative Analysis that would be performed based on different parameters such as Accuracy, Confusion Matrix to identify the best possible model for predicting the movie Success. By using Advertisement Propaganda, they can plan for the best time to release the movie according to the predicted success rate to gain higher benefits. Discussion: Data Mining is the process of discovering different patterns from large data sets and from that various relationships are also discovered to solve various problems that come in business and helps to predict the forthcoming trends. This Prediction can help Production Houses for Advertisement Propaganda and also they can plan their costs and by assuring these factors they can make the movie more profitable.


2021 ◽  
pp. 155005942110608
Author(s):  
Jakša Vukojević ◽  
Damir Mulc ◽  
Ivana Kinder ◽  
Eda Jovičić ◽  
Krešimir Friganović ◽  
...  

In everyday clinical practice, there is an ongoing debate about the nature of major depressive disorder (MDD) in patients with borderline personality disorder (BPD). The underlying research does not give us a clear distinction between those 2 entities, although depression is among the most frequent comorbid diagnosis in borderline personality patients. The notion that depression can be a distinct disorder but also a symptom in other psychopathologies led our team to try and delineate those 2 entities using 146 EEG recordings and machine learning. The utilized algorithms, developed solely for this purpose, could not differentiate those 2 entities, meaning that patients suffering from MDD did not have significantly different EEG in terms of patients diagnosed with MDD and BPD respecting the given data and methods used. By increasing the data set and the spatiotemporal specificity, one could have a more sensitive diagnostic approach when using EEG recordings. To our knowledge, this is the first study that used EEG recordings and advanced machine learning techniques and further confirmed the close interrelationship between those 2 entities.


2018 ◽  
Vol 34 (3) ◽  
pp. 569-581 ◽  
Author(s):  
Sujata Rani ◽  
Parteek Kumar

Abstract In this article, an innovative approach to perform the sentiment analysis (SA) has been presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets has been manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets, and it has been observed that SVM has performed best in comparison to other classifiers, and it has an accuracy of 68% for movie reviews and 82% in case of tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used in other cultural/social benefits like predicting/fighting human riots.


2021 ◽  
Author(s):  
Rogini Runghen ◽  
Daniel B Stouffer ◽  
Giulio Valentino Dalla Riva

Collecting network interaction data is difficult. Non-exhaustive sampling and complex hidden processes often result in an incomplete data set. Thus, identifying potentially present but unobserved interactions is crucial both in understanding the structure of large scale data, and in predicting how previously unseen elements will interact. Recent studies in network analysis have shown that accounting for metadata (such as node attributes) can improve both our understanding of how nodes interact with one another, and the accuracy of link prediction. However, the dimension of the object we need to learn to predict interactions in a network grows quickly with the number of nodes. Therefore, it becomes computationally and conceptually challenging for large networks. Here, we present a new predictive procedure combining a graph embedding method with machine learning techniques to predict interactions on the base of nodes' metadata. Graph embedding methods project the nodes of a network onto a---low dimensional---latent feature space. The position of the nodes in the latent feature space can then be used to predict interactions between nodes. Learning a mapping of the nodes' metadata to their position in a latent feature space corresponds to a classic---and low dimensional---machine learning problem. In our current study we used the Random Dot Product Graph model to estimate the embedding of an observed network, and we tested different neural networks architectures to predict the position of nodes in the latent feature space. Flexible machine learning techniques to map the nodes onto their latent positions allow to account for multivariate and possibly complex nodes' metadata. To illustrate the utility of the proposed procedure, we apply it to a large dataset of tourist visits to destinations across New Zealand. We found that our procedure accurately predicts interactions for both existing nodes and nodes newly added to the network, while being computationally feasible even for very large networks. Overall, our study highlights that by exploiting the properties of a well understood statistical model for complex networks and combining it with standard machine learning techniques, we can simplify the link prediction problem when incorporating multivariate node metadata. Our procedure can be immediately applied to different types of networks, and to a wide variety of data from different systems. As such, both from a network science and data science perspective, our work offers a flexible and generalisable procedure for link prediction.


Author(s):  
Shashidhara Bola

A new method is proposed to classify the lung nodules as benign and malignant. The method is based on analysis of lung nodule shape, contour, and texture for better classification. The data set consists of 39 lung nodules of 39 patients which contain 19 benign and 20 malignant nodules. Lung regions are segmented based on morphological operators and lung nodules are detected based on shape and area features. The proposed algorithm was tested on LIDC (lung image database consortium) datasets and the results were found to be satisfactory. The performance of the method for distinction between benign and malignant was evaluated by the use of receiver operating characteristic (ROC) analysis. The method achieved area under the ROC curve was 0.903 which reduces the false positive rate.


Author(s):  
Hesham M. Al-Ammal

Detection of anomalies in a given data set is a vital step in several applications in cybersecurity; including intrusion detection, fraud, and social network analysis. Many of these techniques detect anomalies by examining graph-based data. Analyzing graphs makes it possible to capture relationships, communities, as well as anomalies. The advantage of using graphs is that many real-life situations can be easily modeled by a graph that captures their structure and inter-dependencies. Although anomaly detection in graphs dates back to the 1990s, recent advances in research utilized machine learning methods for anomaly detection over graphs. This chapter will concentrate on static graphs (both labeled and unlabeled), and the chapter summarizes some of these recent studies in machine learning for anomaly detection in graphs. This includes methods such as support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter will reflect the success and challenges of using these methods in the context of graph-based anomaly detection.


Sign in / Sign up

Export Citation Format

Share Document