Using OpenStreetMap Data and Machine Learning to Generate Socio-Economic Indicators

Socio-economic indicators are key to understanding societal challenges. They disassemble complex phenomena to gain insights and deepen understanding. Specific subsets of indicators have been developed to describe sustainability, human development, vulnerability, risk, resilience and climate change adaptation. Nonetheless, insufficient quality and availability of data often limit their explanatory power. Spatial and temporal resolution are often not at a scale appropriate for monitoring. Socio-economic indicators are mostly provided by governmental institutions and are therefore limited to administrative boundaries. Furthermore, different methodological computation approaches for the same indicator impair comparability between countries and regions. OpenStreetMap (OSM) provides an unparalleled standardized global database with a high spatiotemporal resolution. Surprisingly, the potential of OSM seems largely unexplored in this context. In this study, we used machine learning to predict four exemplary socio-economic indicators for municipalities based on OSM. By comparing the predictive power of neural networks to statistical regression models, we evaluated the unhinged resources of OSM for indicator development. OSM provides prospects for monitoring across administrative boundaries, interdisciplinary topics, and semi-quantitative factors like social cohesion. Further research is still required to, for example, determine the impact of regional and international differences in user contributions on the outputs. Nonetheless, this database can provide meaningful insight into otherwise unknown spatial differences in social, environmental or economic inequalities.

Download Full-text

The Influence of Cognitive Biases and Financial Factors on Forecast Accuracy of Analysts

Frontiers in Psychology ◽

10.3389/fpsyg.2021.773894 ◽

2022 ◽

Vol 12 ◽

Author(s):

Paula Carolina Ciampaglia Nardi ◽

Evandro Marcos Saidel Ribeiro ◽

José Lino Oliveira Bueno ◽

Ishani Aggarwal

Keyword(s):

Machine Learning ◽

Text Analysis ◽

Positive Relationship ◽

Cognitive Biases ◽

Explanatory Power ◽

Negative Relationship ◽

Fair Value ◽

Statistical Regression ◽

Learning Methods ◽

Financial Factors

The objective of this study was to jointly analyze the importance of cognitive and financial factors in the accuracy of profit forecasting by analysts. Data from publicly traded Brazilian companies in 2019 were obtained. We used text analysis to assess the cognitive biases from the qualitative reports of analysts. Further, we analyzed the data using statistical regression learning methods and statistical classification learning methods, such as Multiple Linear Regression (MRL), k-dependence Bayesian (k-DB), and Random Forest (RF). The Bayesian inference and classification methods allow an expansion of the research line, especially in the area of machine learning, which can benefit from the examples of factors addressed in this research. The results indicated that, among cognitive biases, optimism had a negative relationship with forecasting accuracy while anchoring bias had a positive relationship. Commonality, to a lesser extent, also had a positive relationship with the analyst’s accuracy. Among financial factors, the most important aspects in the accuracy of analysts were volatility, indebtedness, and profitability. Age of the company, fair value, American Depositary Receipts (ADRs), performance, and loss were still important but on a smaller scale. The results of the RF models showed a greater explanatory power. This research sheds light on the cognitive as well as financial aspects that influence the analyst’s accuracy, jointly using text analysis and machine learning methods, capable of improving the explanatory power of predictive models, together with the use of training models followed by testing.

Download Full-text

Explaining and predicting the impact of authors within a community: an assessment of the bibliometric literature and application of machine learning

Industrial and Corporate Change ◽

10.1093/icc/dtz042 ◽

2019 ◽

Vol 29 (1) ◽

pp. 61-80

Author(s):

Sen Chai ◽

Alexander D’Amour ◽

Lee Fleming

Keyword(s):

Machine Learning ◽

Linear Models ◽

Explanatory Power ◽

Large Body ◽

Explanatory Models ◽

Support Vector ◽

One Year ◽

Number Of Publications ◽

The Impact ◽

Highly Cited

Abstract Following widespread availability of computerized databases, much research has correlated bibliometric measures from papers or patents to subsequent success, typically measured as the number of publications or citations. Building on this large body of work, we ask the following questions: given available bibliometric information in one year, along with the combined theories on sources of creative breakthroughs from the literatures on creativity and innovation, how accurately can we explain the impact of authors in a given research community in the following year? In particular, who is most likely to publish, publish highly cited work, and even publish a highly cited outlier? And, how accurately can these existing theories predict breakthroughs using only contemporaneous data? After reviewing and synthesizing (often competing) theories from the literatures, we simultaneously model the collective hypotheses based on available data in the year before RNA interference was discovered. We operationalize author impact using publication count, forward citations, and the more stringent definition of being in the top decile of the citation distribution. Explanatory power of current theories altogether ranges from less than 9% for being top cited to 24% for productivity. Machine learning (ML) methods yield similar findings as the explanatory linear models, and tangible improvement only for non-linear Support Vector Machine models. We also perform predictions using only existing data until 1997, and find lower predictability than using explanatory models. We conclude with an agenda for future progress in the bibliometric study of creativity and look forward to ML research that can explain its models.

Download Full-text

Forecasting US movies box office performances in Turkey using machine learning algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189120 ◽

2020 ◽

Vol 39 (5) ◽

pp. 6579-6590

Author(s):

Sandy Çağlıyor ◽

Başar Öztayşi ◽

Selime Sezgin

Keyword(s):

Machine Learning ◽

Global Economy ◽

Learning Algorithms ◽

Forecast Model ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

High Stakes ◽

Box Office ◽

Industry Forecast ◽

The Impact

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performances and the industry still struggles to predict box office performances in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. Firstly, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristic. The number of attendances is discretized into three classes. Four popular machine learning algorithms, artificial neural networks, decision tree regression and gradient boosting tree and random forest are employed, and the impact of each group is observed by compared by the performance models. Then the number of target classes is increased into five and eight and results are compared with the previously developed models in the literature.

Download Full-text

The impact of economic plans on the Chinese education system: a machine learning approach

CADMO ◽

10.3280/cad2018-001005 ◽

2018 ◽

pp. 37-49

Author(s):

Wenjun Lin ◽

Xuefu Xu ◽

Francesco Dell’Anna

Keyword(s):

Machine Learning ◽

Education System ◽

Learning Approach ◽

Chinese Education ◽

System A ◽

Machine Learning Approach ◽

The Impact

Download Full-text

PODC 2020 Review

ACM SIGACT News ◽

10.1145/3444815.3444827 ◽

2021 ◽

Vol 51 (4) ◽

pp. 75-81

Author(s):

Ahad Mirza Baig ◽

Alkida Balliu ◽

Peter Davies ◽

Michal Dory

Keyword(s):

Machine Learning ◽

Distributed Computing ◽

Keynote Speaker ◽

Lively Discussion ◽

Theoretical Understanding ◽

New Directions ◽

New Ideas ◽

New Challenges ◽

The Impact ◽

Distributed Machine Learning

Rachid Guerraoui was the rst keynote speaker, and he got things o to a great start by discussing the broad relevance of the research done in our community relative to both industry and academia. He rst argued that, in some sense, the fact that distributed computing is so pervasive nowadays could end up sti ing progress in our community by inducing people to work on marginal problems, and becoming isolated. His rst suggestion was to try to understand and incorporate new ideas coming from applied elds into our research, and argued that this has been historically very successful. He illustrated this point via the distributed payment problem, which appears in the context of blockchains, in particular Bitcoin, but then turned out to be very theoretically interesting; furthermore, the theoretical understanding of the problem inspired new practical protocols. He then went further to discuss new directions in distributed computing, such as the COVID tracing problem, and new challenges in Byzantine-resilient distributed machine learning. Another source of innovation Rachid suggested was hardware innovations, which he illustrated with work studying the impact of RDMA-based primitives on fundamental problems in distributed computing. The talk concluded with a very lively discussion.

Download Full-text

Machine Learning Based Device Simulation Using Multi-variable Non-linear Regression to Assess the Impact of Device Parameter Variability on Threshold Voltage of Double Gate-All-Around (DGAA) MOSFET

2020 IEEE 2nd International Conference on Circuits and Systems (ICCS) ◽

10.1109/iccs51219.2020.9336608 ◽

2020 ◽

Author(s):

Sandeep Moparthi ◽

Chandan Yadav ◽

Gopi Krishna Saramekala ◽

Pramod Kumar Tiwari

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Threshold Voltage ◽

Device Simulation ◽

Double Gate ◽

Device Parameter ◽

Non Linear ◽

The Impact

Download Full-text

Individualized embryo selection strategy developed by stacking machine learning model for better in vitro fertilization outcomes: an application study

Reproductive Biology and Endocrinology ◽

10.1186/s12958-021-00734-z ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Qingsong Xi ◽

Qiyu Yang ◽

Meng Wang ◽

Bo Huang ◽

Bo Zhang ◽

...

Keyword(s):

Machine Learning ◽

In Vitro Fertilization ◽

Endometrial Thickness ◽

Learning System ◽

Embryo Selection ◽

Selection Strategy ◽

Application Study ◽

Vitro Fertilization ◽

The Impact

Abstract Background To minimize the rate of in vitro fertilization (IVF)- associated multiple-embryo gestation, significant efforts have been made. Previous studies related to machine learning in IVF mainly focused on selecting the top-quality embryos to improve outcomes, however, in patients with sub-optimal prognosis or with medium- or inferior-quality embryos, the selection between SET and DET could be perplexing. Methods This was an application study including 9211 patients with 10,076 embryos treated during 2016 to 2018, in Tongji Hospital, Wuhan, China. A hierarchical model was established using the machine learning system XGBoost, to learn embryo implantation potential and the impact of double embryos transfer (DET) simultaneously. The performance of the model was evaluated with the AUC of the ROC curve. Multiple regression analyses were also conducted on the 19 selected features to demonstrate the differences between feature importance for prediction and statistical relationship with outcomes. Results For a single embryo transfer (SET) pregnancy, the following variables remained significant: age, attempts at IVF, estradiol level on hCG day, and endometrial thickness. For DET pregnancy, age, attempts at IVF, endometrial thickness, and the newly added P1 + P2 remained significant. For DET twin risk, age, attempts at IVF, 2PN/ MII, and P1 × P2 remained significant. The algorithm was repeated 30 times, and averaged AUC of 0.7945, 0.8385, and 0.7229 were achieved for SET pregnancy, DET pregnancy, and DET twin risk, respectively. The trend of predictive and observed rates both in pregnancy and twin risk was basically identical. XGBoost outperformed the other two algorithms: logistic regression and classification and regression tree. Conclusion Artificial intelligence based on determinant-weighting analysis could offer an individualized embryo selection strategy for any given patient, and predict clinical pregnancy rate and twin risk, therefore optimizing clinical outcomes.

Download Full-text

The Impact of Work Characteristics on Social Distancing: Implications at the Time of COVID-19

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18105074 ◽

2021 ◽

Vol 18 (10) ◽

pp. 5074

Author(s):

Keisuke Kokubun ◽

Yoshinori Yamakawa

Keyword(s):

Information Processing ◽

Multiple Regression ◽

Explanatory Power ◽

Work Conditions ◽

Multiple Regression Model ◽

Social Characteristics ◽

Work Characteristics ◽

Social Distancing ◽

Using Data ◽

The Impact

The coronavirus disease (COVID-19) continues to spread globally. While social distancing has attracted attention as a measure to prevent the spread of infection, some occupations find it difficult to implement. Therefore, this study aims to investigate the relationship between work characteristics and social distancing using data available on O*NET, an occupational information site. A total of eight factors were extracted by performing an exploratory factor analysis: work conditions, supervisory work, information processing, response to aggression, specialization, autonomy, interaction outside the organization, and interdependence. A multiple regression analysis showed that interdependence, response to aggression, and interaction outside the organization, which are categorized as ”social characteristics,” and information processing and specialization, which are categorized as “knowledge characteristics,” were associated with physical proximity. Furthermore, we added customer, which represents contact with the customer, and remote working, which represents a small amount of outdoor activity, to our multiple regression model, and confirmed that they increased the explanatory power of the model. This suggests that those who work under interdependence, face aggression, and engage in outside activities, and/or have frequent contact with customers, little interaction outside the organization, and little information processing will have the most difficulty in maintaining social distancing.

Download Full-text

Spatiotemporal Evolution and Determinant Factors of the Intra-Regional Trade Community Structures of the Indian Ocean Region

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040214 ◽

2021 ◽

Vol 10 (4) ◽

pp. 214

Author(s):

Lihua Yuan ◽

Xiaoqiang Chen ◽

Changqing Song ◽

Danping Cao ◽

Hong Yi

Keyword(s):

Indian Ocean ◽

Explanatory Power ◽

Community Structures ◽

Indian Ocean Region ◽

Spatiotemporal Evolution ◽

Regional Trade ◽

Determinant Factors ◽

Ocean Region ◽

The Indian Ocean ◽

The Impact

The Indian Ocean Region (IOR) has become one of the main economic forces globally, and countries within the IOR have attempted to promote their intra-regional trade. This study investigates the spatiotemporal evolution of the community structures of the intra-regional trade and the impact of determinant factors on the formation of trade community structures of the IOR from 1996 to 2017 using the methods of social network analysis. Trade communities are groups of countries with measurably denser intra-trade ties but with extra-trade ties that are measurably sparser among different communities. The results show that the extent of trade integration and the trade community structures of the IOR changed from strengthening between 1996 and 2014 to weakening between 2015 and 2017. The largest explanatory power of the formation of the IOR trade community structures was the IOR countries’ economic size, indicating that market remained the strongest driver. The second-largest explanatory power was geographical proximity, suggesting that countries within the IOR engaged in intra-regional trade still tended to select geographically proximate trading partners. The third- and the fourth-largest were common civilization and regional organizational memberships, respectively. This indicates that sharing a common civilization and constructing intra-regional institutional arrangements (especially open trade policies) helped the countries within the IOR strengthen their trade communities.

Download Full-text

The Impact of COVID-19 Epidemic on Indian Economy Unleashed By Machine Learning

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1022/1/012085 ◽

2021 ◽

Vol 1022 ◽

pp. 012085

Author(s):

Kamal Deep Garg ◽

Manik Gupta ◽

Munish Kumar

Keyword(s):

Machine Learning ◽

Indian Economy ◽

The Impact

Download Full-text