Bird's-eye view/satellite image of Mexico City
SEDACMaps, Creative Commons 2.0

ISCN Global Mixer: Identifying safe public spaces for women in Mexico City with the DPPD Method

In this episode of the ISCN Global Mixer, we found out how the Data-Powered Positive Deviance (DPPD) method can be used for positive impact in our cities. Robin Nowok from the GIZ Data Lab presented a case that illustrates how safe public spaces for women in Mexico City can be identified by combining rigorous statistics and qualitative research.

Event details

Datetime
17.07.2024, 11:00 - 11:30
Event type
Online (virtual)
Documentation


Key takeaways

  • Rigorous statistical analysis: Data-powered positive deviance (DPPD) is a method for detecting and tracking down previously unseen good practices for common-good oriented development, and it is also very much applicable to smart cities.
  • First quant, then qual: DPPD uses statistics and data to indicate which aspects to investigate. The quantitative analysis thus shows where to target the often more resource-intensive qualitative investigations.
  • Positively deviant factors for safe public spaces: In the described case from Mexico City, DPPD identified the location of spatial units, population density, the number of bars, the number of financial services and (negatively correlated) the distance to the nearest metro station as factors that are relevant for women’s safety and should be investigated further.
  • Again, data: The case from Mexico City made use of 60+ (mostly open) databases, underscoring the importance and opportunities of providing consolidated data in structured environments.
Global Mixer DPPD Cover

It’s one of the fundamental promises of smart cities: Leveraging data to detect patterns and translate them into adaptations and improvements for an effective and efficient common good-oriented urban development. 

But how can we cut through all the noise that big data inevitably produces as well, especially when it is applied to the more complex ("thicker") problems of policy and public administration? In this edition of the ISCN Global Mixer, we looked at the data-powered positive deviance (DPPD) method as one exciting approach to that.

The core idea is to detect and focus on statistical outliers that defy standard distributions: the positive deviants. One of the approach’s classic examples is a project in Vietnam in 1990, where development economists used it to find individual practices that achieved significantly better results in child nutrition among communities facing similar challenges and restricted resources. Scaling these successful behaviors achieved improvements in a largely closed system without depending on an influx of external resources and technology.

Applied to an urban context, GIZ and its Data Lab cooperated with the United Nations Development Programme (UNDP), the University of Manchester, Codeando México and Cohesión Comunitaria e Innovación Social A.C. to identify safe(r) public spaces for women in Mexico City.

In the research design, the number of crimes against women in public spaces was the dependent variable, set against 25 independent variables (e.g. geography, infrastructure, demographics). The unit of analysis was the AGEB, a geostatistical unit in Mexico. For better comparability, the AGEBs were grouped into homogeneous clusters. Prediction models then returned the expected crime rates for AGEBs and their public spaces, against which positive deviants could be detected. A subsequent round of validation checked qualitatively whether the results were plausible. For example, some individual positive deviants contained a large number of closed housing units or the large footprint of a military college, which naturally reduces crime rates in public spaces there. The remaining positively deviant AGEBs were targeted for further qualitative analysis and in-depth field studies. As a result of this DPPD process, a catalogue of recommendations was formulated, including proposals for physical interventions in public space, such as removing abandoned vehicles from public areas or promoting diverse foot traffic including children, families, women and the elderly.
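The detection step described above can be illustrated with a minimal sketch. It is not the project's actual code: it assumes an ordinary least-squares model per group, a hypothetical cut-off of -1.5 standardized residuals, and synthetic data in place of the real AGEB variables. A unit counts as a candidate positive deviant when its observed value lies well below the value predicted for comparable units.

```python
import numpy as np

def find_positive_deviants(X, y, groups, z_threshold=-1.5):
    """Flag units whose observed outcome is far below the group's prediction.

    X: (n, p) predictors, y: (n,) outcome (e.g. crime counts),
    groups: (n,) label of the homogeneous group each unit belongs to.
    z_threshold is an illustrative assumption, not a value from the study.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    flagged = np.zeros(len(y), dtype=bool)
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        # Fit one linear model per group, with an intercept column.
        A = np.column_stack([np.ones(len(idx)), X[idx]])
        coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        residuals = y[idx] - A @ coef
        # Standardize residuals; ddof accounts for the fitted parameters.
        scale = residuals.std(ddof=A.shape[1])
        if scale > 0:
            flagged[idx] = residuals / scale < z_threshold
    return flagged

# Synthetic demo data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n_units = 80
X = rng.normal(size=(n_units, 2))
groups = np.repeat([0, 1], n_units // 2)
y = 20 + 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=n_units)
y[7] -= 10  # plant one unit that performs far better than predicted

flags = find_positive_deviants(X, y, groups)
print(np.flatnonzero(flags))  # indices of candidate positive deviants
```

Flagged units are only candidates. As in the case described above, they still need a plausibility check before any field study.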

How could DPPD be applied in your city and community to render your city smarter in a common-good oriented way? Join the discussion and feel free to reach out to us via iscn@giz.de 

For more details on the presented case, you can watch the recording of the keynote above and refer to the links below. 

Moreover, there is a short technical Q&A specifically on the conducted statistical analyses and research designs at the end of this page.
 

Further links:

Paper: Data-powered positive deviance: Combining traditional and non-traditional data to identify and characterize development-related outperformers

GIZ Data Lab Blog: What happened and what’s next for DPPD

GIZ Data Lab Blog: Identifying safe(r) public spaces for women in Mexico City

Podcast-Episode: Data-Powered Positive Deviance

Technical Q&A on research design

  1.  How was the homogeneous grouping done? Cluster analysis? ANOVA?

The homogeneous grouping was done using cluster analysis based on three variables: socio-economic level, daily incoming trips, and population density. This method identified natural groupings within the data, which makes the results of the analysis easier to interpret and act on.

  2. Why was homogeneous grouping chosen instead of normalization, e.g., crime against women per area per women population?

Clustering and normalization serve different purposes. Clustering controls for multiple factors by grouping similar observations, whereas normalization puts values on the same scale, which improves model performance. We normalize our variables before the grouping for numerical stability. In the Mexico City use case, we wanted to control for three variables: socio-economic level, daily incoming trips and population density. Clustering can reveal the underlying structure and groupings within the data, and can thereby take into account unobserved heterogeneity that we did not control for in the set of clustering variables. Homogeneous grouping allows us to control for multiple dimensions of similarity across different areas, beyond just a single aspect such as crimes against women. This multi-faceted approach provides a more comprehensive understanding of the data, capturing nuances that single-factor normalization might miss.
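A minimal sketch of "normalize first, then group" follows. The answer above does not name the clustering algorithm, so k-means with a deterministic farthest-first initialization is assumed here purely for illustration. The data are synthetic, and two groups are used instead of the four from the study to keep the example short.

```python
import numpy as np

def zscore(X):
    """Put every column on the same scale (mean 0, standard deviation 1)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def farthest_first_centers(X, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        # Squared distance of every point to every center, shape (n, k).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical units: socio-economic index, daily incoming trips, density.
rng = np.random.default_rng(1)
low = np.column_stack([
    0.3 + rng.uniform(-0.05, 0.05, 30),
    2000 + rng.uniform(-300, 300, 30),
    5000 + rng.uniform(-500, 500, 30),
])
high = np.column_stack([
    0.7 + rng.uniform(-0.05, 0.05, 30),
    9000 + rng.uniform(-300, 300, 30),
    15000 + rng.uniform(-500, 500, 30),
])
Z = zscore(np.vstack([low, high]))
labels, centers = kmeans(Z, k=2)
print(labels)
```

Without the z-scoring step, the variable with the largest numeric range (here, density) would dominate the distance calculation and the socio-economic index would have almost no influence on the grouping.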

  3. Which statistical model was used? Was machine learning considered?

For the predictions, we used three models. First, we ran a multiple linear regression. Then, we ran a LASSO linear regression to perform variable selection and thereby enhance the prediction accuracy and interpretability of the statistical model. Finally, we ran a negative binomial regression, because this type of regression is designed to fit models in which the performance measure consists of counts with overdispersion. We ran all three regressions for each of the four clusters of the homogeneous grouping, per category of crime severity and for all crimes, so we have several results depending on the type of regression, cluster, and crime severity.
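Two of the ideas in this answer can be made concrete with a short sketch. It is an illustration on synthetic data, not the project's implementation. The penalty `alpha`, the coordinate-descent solver and all data are assumptions. The first part shows how the LASSO penalty sets the coefficients of irrelevant predictors exactly to zero, which is what "variable selection" means here. The second part shows a simple variance-to-mean check: a ratio well above 1 signals the overdispersion that motivates a negative binomial model over a Poisson model.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with predictor j's current contribution added back.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
    return beta

def dispersion_ratio(counts):
    """Variance-to-mean ratio; about 1 for Poisson-like counts."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(2)

# Five hypothetical predictors, only the first two affect the outcome.
X = rng.normal(size=(500, 5))
X = X - X.mean(axis=0)
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=500)
y = y - y.mean()
beta = lasso_cd(X, y, alpha=0.3)
print(np.round(beta, 2))  # coefficients of the three noise predictors are 0

# Count data with the same mean (4) but different spread.
poisson_counts = rng.poisson(4, size=5000)
nb_counts = rng.negative_binomial(2, 1 / 3, size=5000)
print(dispersion_ratio(poisson_counts), dispersion_ratio(nb_counts))
```

The retained LASSO coefficients are shrunk toward zero, so they understate the true effects. This is the usual price of the penalty.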

 

Contacts