The Promise and Pitfalls of Conflict Prediction: Evidence from Colombia and Indonesia

2021 ◽  
pp. 1-45
Author(s):  
Samuel Bazzi ◽  
Robert A. Blair ◽  
Christopher Blattman ◽  
Oeindrila Dube ◽  
Matthew Gudgeon ◽  
...  

How feasible is violence early-warning prediction? Colombia and Indonesia have unusually fine-grained data. We assemble two decades of local violent events alongside hundreds of annual risk factors. We attempt to predict violence one year ahead with a range of machine learning techniques. Our models reliably identify persistent, high-violence hot spots. Violence is not simply autoregressive, as detailed histories of disaggregated violence perform best, but socioeconomic data substitute well for these histories. Even with unusually rich data, however, our models poorly predict new outbreaks or escalations of violence. These “best case” scenarios with annual data fall short of workable early-warning systems.
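As a rough illustration of the one-year-ahead setup described above (not the paper's actual pipeline), the sketch below trains a random forest on a hypothetical district-year panel, lagging the outcome so that year-t risk factors predict year-t+1 violence. The file name and column names are placeholders.

```python
# Hypothetical sketch of one-year-ahead violence prediction on a district-year panel.
# "district_year_panel.csv" and all column names are illustrative placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

panel = pd.read_csv("district_year_panel.csv")
panel = panel.sort_values(["district", "year"])

# Shift the outcome within each district so year-t predictors face year-t+1 violence.
panel["violence_next_year"] = panel.groupby("district")["violent_events"].shift(-1)
panel = panel.dropna(subset=["violence_next_year"])

features = [c for c in panel.columns if c not in ("district", "year", "violence_next_year")]
train = panel[panel["year"] < 2015]    # temporal split: fit on earlier years
test = panel[panel["year"] >= 2015]    # evaluate on held-out later years

model = RandomForestRegressor(n_estimators=500, random_state=0)
model.fit(train[features], train["violence_next_year"])
print("MAE:", mean_absolute_error(test["violence_next_year"], model.predict(test[features])))
```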

2019 ◽  
Author(s):  
Samuel Bazzi ◽  
Robert Blair ◽  
Chris Blattman ◽  
Oeindrila Dube ◽  
Matthew Gudgeon ◽  
...  

Policymakers can take actions to prevent local conflict before it begins, if such violence can be accurately predicted. We examine the two countries with the richest available sub-national data: Colombia and Indonesia. We assemble two decades of fine-grained violence data by type, alongside hundreds of annual risk factors. We predict violence one year ahead with a range of machine learning techniques. Models reliably identify persistent, high-violence hot spots. Violence is not simply autoregressive, as detailed histories of disaggregated violence perform best. Rich socio-economic data also substitute well for these histories. Even with such unusually rich data, however, the models poorly predict new outbreaks or escalations of violence. “Best case” scenarios with panel data fall short of workable early-warning systems.


Hydrology ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 183
Author(s):  
Paul Muñoz ◽  
Johanna Orellana-Alvear ◽  
Jörg Bendix ◽  
Jan Feyen ◽  
Rolando Célleri

Worldwide, machine learning (ML) is increasingly being used for developing flood early warning systems (FEWSs). However, previous studies have not focused on establishing a methodology for determining the most efficient ML technique. We assessed FEWSs with three river states, No-alert, Pre-alert and Alert for flooding, for lead times between 1 and 12 h, using the most common ML techniques: multi-layer perceptron (MLP), logistic regression (LR), K-nearest neighbors (KNN), naive Bayes (NB), and random forest (RF). The Tomebamba catchment in the tropical Andes of Ecuador was selected as a case study. For all lead times, MLP models achieve the highest performance, followed by LR, with f1-macro (log-loss) scores of 0.82 (0.09) and 0.46 (0.20) for the 1 h and 12 h cases, respectively. The ranking was highly variable for the remaining ML techniques. According to the g-mean, LR models forecast correctly and more stably across all states, while the MLP models perform better in the Pre-alert and Alert states. The proposed methodology for selecting the optimal ML technique for a FEWS can be extrapolated to other case studies. Future efforts are recommended to enhance the input data representation and to develop communication applications that boost public awareness of floods.
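A minimal sketch of the model comparison described above, scoring the five listed classifiers by f1-macro and log-loss. Synthetic, imbalanced three-class data stand in for the catchment's inputs and warning labels, so nothing here reflects the study's actual features or results.

```python
# Illustration only: synthetic data stand in for lagged hydrometeorological features (X)
# and the three warning classes (y); compare the five classifiers named above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, log_loss

X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           n_classes=3, weights=[0.8, 0.15, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(random_state=0),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    f1 = f1_score(y_test, clf.predict(X_test), average="macro")   # f1-macro
    ll = log_loss(y_test, clf.predict_proba(X_test))              # log-loss
    print(f"{name}: f1-macro={f1:.2f}, log-loss={ll:.2f}")
```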


Author(s):  
Paul Muñoz ◽  
Johanna Orellana-Alvear ◽  
Jörg Bendix ◽  
Jan Feyen ◽  
Rolando Célleri

The use of Machine Learning (ML) in Flood Early Warning Systems (FEWSs) has gained worldwide popularity. However, determining the most efficient ML technique is still a bottleneck. We assessed FEWSs with three river states, No-alert, Pre-alert, and Alert for flooding, for lead times between 1 and 12 hours, using the most common ML techniques: Multi-Layer Perceptron (MLP), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Random Forest (RF). The Tomebamba catchment in the tropical Andes of Ecuador was selected as a case study. For all lead times, MLP models achieve the highest performance, followed by LR, with f1-macro (log-loss) scores of 0.82 (0.09) and 0.46 (0.20) for the 1- and 12-hour cases, respectively. The ranking was highly variable for the remaining ML techniques. According to the g-mean, LR models forecast correctly and more stably across all states, while the MLP models perform better in the Pre-alert and Alert states. Future efforts are recommended to enhance the input data representation and to develop communication applications that boost public awareness of floods.


2020 ◽  
Vol 122 (14) ◽  
pp. 1-30
Author(s):  
James Soland ◽  
Benjamin Domingue ◽  
David Lang

Background/Context: Early warning indicators (EWI) are often used by states and districts to identify students who are not on track to finish high school and to provide supports/interventions that increase the odds the student will graduate. While EWI are diverse in terms of the academic behaviors they capture, research suggests that indicators like course failures, chronic absenteeism, and suspensions can help identify students in need of additional supports. In parallel with the expansion of administrative data that have made early versions of EWI possible, new machine learning methods have been developed. These methods are data-driven and often designed to sift through thousands of variables with the purpose of identifying the best predictors of a given outcome. While applications of machine learning techniques to identify students at risk of high school dropout have obvious appeal, few studies consider the benefits and limitations of applying those models in an EWI context, especially as they relate to questions of fairness and equity.

Focus of Study: In this study, we provide applied examples of how machine learning can be used to support EWI selection. The purpose is to articulate the broad risks and benefits of using machine learning methods to identify students who may be at risk of dropping out. We focus on dropping out given its salience in the EWI literature, but also anticipate generating insights that will be germane to EWI used for a variety of outcomes.

Research Design: We explore these issues using several hypothetical examples of how ML techniques might be used to identify EWI. For example, we show results from decision tree algorithms, applied to simulated data, that identify predictors of dropout.

Conclusions/Recommendations: Generally, we argue that machine learning techniques have several potential benefits in the EWI context. For example, some related methods can help create clear decision rules for which students are a dropout risk, and their predictive accuracy can be higher than that of more traditional, regression-based models. At the same time, these methods often require additional statistical and data-management expertise to be used appropriately. Further, the black-box nature of machine learning algorithms could invite their users to interpret results through the lens of preexisting biases about students and educational settings.
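As a purely illustrative sketch of the decision-tree idea in the research design above (not the authors' simulation), the example below fits a shallow tree to simulated course-failure, absenteeism, and suspension indicators and prints the resulting decision rules. The simulated dropout risk and its functional form are assumptions made only for the demonstration.

```python
# Illustrative only: a shallow decision tree on simulated EWI data, yielding readable
# decision rules for dropout risk. The data-generating process is assumed.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 5000
course_failures = rng.poisson(0.5, n)
absence_rate = rng.beta(2, 20, n)          # share of school days absent
suspensions = rng.poisson(0.2, n)

# Assumed relationship: dropout risk rises with each indicator.
logit = -3 + 1.2 * course_failures + 8 * absence_rate + 0.9 * suspensions
dropout = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([course_failures, absence_rate, suspensions])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, dropout)
print(export_text(tree, feature_names=["course_failures", "absence_rate", "suspensions"]))
```

The printed rules (e.g., thresholds on absence rate and course failures) are the kind of transparent cut-offs an EWI could adopt, which is the appeal the abstract describes; the black-box concern arises once deeper or ensemble models replace a single shallow tree.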


2018 ◽  
Vol 15 (2) ◽  
pp. 595-600
Author(s):  
R. Sathish Kumar ◽  
M. Chandrasekaran

Web query classification, the task of inferring topical categories from a web search query, is a non-trivial problem in the Information Retrieval domain. The topic categories inferred by a Web query classification system may provide a rich set of features for improving query expansion and web advertising. Conventional methods for Web query classification derive corpus statistics from the web and employ machine-learning techniques to infer Open Directory Project categories. But they suffer from two major drawbacks: the computational overhead of deriving corpus statistics, and the inference of topic categories that are too abstract for semantic discrimination due to polysemy. Concepts too shallow or too deep in the semantic gradient are produced when the wrong senses of the query terms coalesce with the correct senses. This paper proposes and demonstrates a succinct solution to these problems through a method based on the Tree Cut Model and the WordNet thesaurus to infer fine-grained topic categories for Web query classification, and also suggests an enhancement to the Tree Cut Model to resolve sense ambiguities.
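For intuition only (this is not the paper's implementation), the snippet below uses NLTK's WordNet interface to list hypernym paths for the terms of a hypothetical ambiguous query; a tree cut over this noun hierarchy would then select the level of abstraction used as the topic category, which is how overly abstract or overly specific concepts are avoided.

```python
# Illustration only: inspect WordNet hypernym paths for query terms; a tree cut over this
# hierarchy would pick the abstraction level used as the topic category.
# Requires: import nltk; nltk.download("wordnet")
from nltk.corpus import wordnet as wn

query = "jaguar speed"                                    # hypothetical ambiguous query
for term in query.split():
    for synset in wn.synsets(term, pos=wn.NOUN)[:2]:      # first two noun senses
        path = synset.hypernym_paths()[0]                 # one branch up to the root
        print(term, "->", " / ".join(s.name() for s in path))
```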


Author(s):  
Seumas Miller

Recent revelations concerning data firm Cambridge Analytica’s illegitimate use of the data of millions of Facebook users highlight the ethical and, relatedly, legal issues arising from the use of machine learning techniques. Cambridge Analytica is, or rather was (the revelations brought about its demise), a firm that used machine learning processes to try to influence elections in the US and elsewhere by, for instance, targeting ‘vulnerable’ voters in marginal seats with political advertising. Of course, there is nothing new about political candidates and parties employing firms to engage in political advertising on their behalf, but if a data firm has access to the personal information of millions of voters, and is skilled in the use of machine learning techniques, then it can develop detailed, fine-grained voter profiles that enable political actors to reach a whole new level of manipulative influence over voters. My focus in this paper is not on the highly publicised ethical and legal issues arising from Cambridge Analytica’s activities but rather on some important ethical issues arising from the use of machine learning techniques that have not received the attention and analysis they deserve. I focus on three areas in which machine learning techniques are used or, it is claimed, should be used, and which give rise to problems at the interface of law and ethics (or law and morality; I use the terms “ethics” and “morality” interchangeably). The three areas are profiling and predictive policing (Saunders et al. 2016), legal adjudication (Zeleznikow, 2017), and machines’ compliance with legally enshrined moral principles (Arkin 2010). I note that here, as elsewhere, new and emerging technologies are developing rapidly, making it difficult to predict what might or might not be achievable in the future. For this reason, I have adopted the conservative stance of restricting my ethical analysis to existing machine learning techniques and applications rather than those that are the object of speculation or even informed extrapolation (Mittelstadt et al. 2015). This has the consequence that what I might regard as a limitation of machine learning techniques, e.g. in respect of predicting novel outcomes or of accommodating moral principles, might be thought by others to be merely a limitation of currently available techniques. After all, has not the history of AI recently shown the naysayers to have been proved wrong? Certainly, AI has seen some impressive results, including the construction of computers that can defeat human experts in complex games such as chess and Go (Silver et al. 2017), and others that can do a better job than human medical experts at identifying the malignancy of moles and the like (Esteva et al. 2017). However, since by definition future machine learning techniques and applications are not yet with us, the general claim that current limitations will be overcome cannot at this time be confirmed or disconfirmed on the basis of empirical evidence.


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Weiyuan Tong ◽  
Rong Li ◽  
Xiaoqing Gong ◽  
Shuangjiao Zhai ◽  
Xia Zheng ◽  
...  

Gestures serve an important role in enabling natural interactions with computing devices, and they form an important part of everyday nonverbal communication. In increasingly many application scenarios of gesture interaction, such as gesture-based authentication, calligraphy, sketching, and even artistic expression, not only are the underlying gestures complex, consisting of multiple strokes, but their correctness also depends on the order in which the strokes are performed. In this paper, we present WiCG, a novel WiFi sensing approach for capturing and providing feedback on stroke order. Our approach tracks the user’s hand movement during writing and exploits this information, in combination with statistical methods and machine learning techniques, to infer which characters have been written and in which stroke order. We consider Chinese calligraphy as our use case, as the resulting gestures are highly complex and their assessment depends on the correct stroke order. We develop a set of analyses and algorithms to overcome the many issues of this challenging task. We have conducted extensive experiments and user studies to evaluate our approach. Experimental results show that our approach is highly effective in identifying the written characters and their stroke order. We show that our approach can adapt to different deployment environments and user patterns.
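The abstract does not spell out the inference pipeline, so the sketch below is a purely hypothetical illustration of the general idea rather than WiCG's method: each detected stroke (a window of WiFi CSI amplitudes, time by subcarriers) is summarized into a fixed-length vector, the vectors are concatenated in writing order, and a standard classifier is trained on character labels. All names and parameters are assumptions.

```python
# Hypothetical sketch, not WiCG's pipeline: summarize each stroke window, keep the
# writing order by concatenation, and classify with an off-the-shelf model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def stroke_vector(csi_window):
    """Fixed-length summary of one stroke (array of shape time x subcarriers)."""
    return np.hstack([csi_window.mean(axis=0), csi_window.std(axis=0), [len(csi_window)]])

def character_vector(strokes, max_strokes=10):
    """Concatenate stroke summaries in writing order, zero-padded to a fixed count."""
    vecs = [stroke_vector(s) for s in strokes[:max_strokes]]   # assumes at least one stroke
    vecs += [np.zeros_like(vecs[0])] * (max_strokes - len(vecs))
    return np.hstack(vecs)

# Training would pair such vectors with character labels (whose stroke order is encoded
# by the concatenation order), e.g.: clf = RandomForestClassifier().fit(X_train, y_train)
```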


2021 ◽  
Author(s):  
Paul Muñoz ◽  
Johanna Orellana-Alvear ◽  
Jörg Bendix ◽  
Rolando Célleri

Abstract: Short-rain floods, especially flash floods, produce devastating impacts on society, the economy, and ecosystems. A key countermeasure is to develop Flood Early Warning Systems (FEWSs) aimed at forecasting flood warnings with sufficient lead time for decision making. Although Machine Learning (ML) techniques have gained popularity among hydrologists, a research question that remains poorly answered is: which ML technique is best for flood forecasting? To answer this, we compare the efficiencies of FEWSs developed with the five most common ML techniques for flood forecasting, for lead times between 1 and 12 hours. We use the Tomebamba catchment in the Ecuadorean Andes as a case study, with three warning classes to forecast: No-alert, Pre-alert, and Alert of floods. For all lead times, the Multi-Layer Perceptron (MLP) technique achieves the highest model performances (f1-macro score), followed by Logistic Regression (LR), from 0.82 (1 hour) to 0.46 (12 hours). This ranking was confirmed by the log-loss scores, ranging from 0.09 (1 hour) to 0.20 (12 hours) for the above-mentioned methods. Model performances decreased for the remaining ML techniques (K-Nearest Neighbors, Naive Bayes, and Random Forest), but their ranking was highly variable and not conclusive. Moreover, according to the g-mean, LR models show greater stability in correctly classifying all flood classes, whereas MLP models specialize in the minority (Pre-alert and Alert) classes. To improve the performance and applicability of FEWSs, we recommend future efforts to enhance input data representation and to develop communication applications between FEWSs and the public as tools to boost society's preparedness against floods.
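The g-mean criterion cited above is the geometric mean of the per-class recalls, which rewards a model that handles the rare Pre-alert and Alert classes as well as the dominant No-alert class. A minimal sketch, with made-up labels purely for illustration:

```python
# g-mean = geometric mean of per-class recalls; the example labels below are invented.
import numpy as np
from sklearn.metrics import confusion_matrix

def g_mean(y_true, y_pred, labels=("No-alert", "Pre-alert", "Alert")):
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    recalls = np.diag(cm) / cm.sum(axis=1)        # recall of each warning class
    return float(np.prod(recalls) ** (1 / len(recalls)))

y_true = ["No-alert"] * 90 + ["Pre-alert"] * 7 + ["Alert"] * 3
y_pred = ["No-alert"] * 94 + ["Pre-alert"] * 3 + ["Alert"] * 3   # 4 Pre-alerts missed
print(g_mean(y_true, y_pred))   # ~0.75: penalized for weak minority-class recall
```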


2021 ◽  
Author(s):  
Randa Natras ◽  
Michael Schmidt

The accuracy and reliability of Global Navigation Satellite System (GNSS) applications are affected by the state of the Earth's ionosphere, especially when using single-frequency observations, which are employed mostly in mass-market GNSS receivers. In addition, space weather can cause strong, sudden disturbances in the ionosphere, representing a major risk for GNSS performance and reliability. Accurate corrections of ionospheric effects and early-warning information in the presence of space weather are therefore crucial for GNSS applications. This correction information can be obtained by employing a model that describes the complex relation of space weather processes with the non-linear spatial and temporal variability of the Vertical Total Electron Content (VTEC) within the ionosphere and that includes a forecast component considering space weather events to provide an early warning system. Developing such a model is a challenging but important task and of high interest for the GNSS community.

To model the impact of space weather, a complex chain of physical dynamical processes between the Sun, the interplanetary magnetic field, the Earth's magnetic field and the ionosphere needs to be taken into account. Machine learning techniques are suitable for finding patterns and relationships in historical data and for solving problems that are too complex for a traditional approach requiring an extensive set of rules (equations), or for which no acceptable solution is available yet.

The main objective of this study is to develop a model for forecasting the ionospheric VTEC, taking into account physical processes and utilizing state-of-the-art machine learning techniques to learn complex non-linear relationships from the data. In this work, supervised learning is applied to forecast VTEC. This means that the model is provided with a set of (input) variables that have some influence on the VTEC forecast (output). To be more specific, data on solar activity, solar wind, the interplanetary and geomagnetic field, and other information connected to VTEC variability are used as input to predict VTEC values in the future. Different machine learning algorithms are applied, such as decision tree regression, random forest regression and gradient boosting. Decision trees are the simplest and easiest-to-interpret machine learning algorithms, but the forecasted VTEC lacks smoothness. On the other hand, random forest and gradient boosting combine multiple regression trees, which leads to improvements in prediction accuracy and smoothness. However, the results show that the overall performance of the algorithms, measured by the root mean square error, does not differ much between them and improves when the data are well prepared, i.e. cleaned and transformed to remove trends. Preliminary results of this study will be presented, including the methodology, goals, challenges and perspectives of developing the machine learning model.
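A hedged sketch of the regressor comparison described above, scored by root mean square error. The synthetic inputs merely stand in for prepared solar, solar-wind, and geomagnetic drivers and a detrended VTEC target, so nothing here reflects the study's actual data or results.

```python
# Illustration only: compare the three regressors named above by RMSE on synthetic
# stand-ins for space-weather drivers (X) and a detrended VTEC series (y).
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 8))                                   # stand-in drivers
y = X @ rng.normal(size=8) + 0.3 * rng.normal(size=3000)         # stand-in VTEC target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for name, model in [
    ("Decision tree", DecisionTreeRegressor(max_depth=6, random_state=0)),
    ("Random forest", RandomForestRegressor(random_state=0)),
    ("Gradient boosting", GradientBoostingRegressor(random_state=0)),
]:
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name}: RMSE = {rmse:.3f}")
```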

