Fitting Logistic Regression with Aggregate Data in R

Example Using Simulated Migration Data

I often get asked: can I use a logistic regression on aggregate data?

We generally associate logistic regression with individual-level (micro) data in which an attribute is encoded as a binary outcome, such as moving (1) or not moving (0). However, logistic regression can also be fitted to aggregate data. In human mobility and migration research, for example, we are often interested in the proportion, or probability, of people moving from an origin to a destination, and we can use a logistic regression to identify the factors related to this probability. These models were popularised by Daniel McFadden's work on discrete choice modelling, for which he was awarded the Nobel Prize in Economic Sciences in 2000. McFadden developed a very elegant formulation in the framework of random utility maximisation and applied it to the study of travel choices.

I recently wrote a computational notebook in R illustrating three ways to estimate a logistic regression model, using individual-level data and aggregate data, that all lead to the same model estimates. See the tweet and the link to the notebook below.
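To give a flavour of the idea, here is a minimal sketch in base R. It is not the code from the notebook: the simulated data, the variable names (`move`, `age_group`, `region`) and the assumed coefficient values are all hypothetical. It fits the same model to (1) individual-level binary outcomes, (2) aggregate counts of movers and stayers, and (3) aggregate proportions weighted by group size.

```r
# Hypothetical simulated migration data; names and parameter values are
# assumptions made for illustration only.
set.seed(123)
n_people <- 5000
region <- factor(sample(c("north", "centre", "south"), n_people, replace = TRUE))
age_group <- factor(sample(c("young", "older"), n_people, replace = TRUE))

# Assumed true log-odds of moving
eta <- -1 + 0.8 * (age_group == "young") + 0.5 * (region == "north")
move <- rbinom(n_people, size = 1, prob = plogis(eta))
micro <- data.frame(move, region, age_group, count = 1)

# 1. Individual-level data: one row per person, binary outcome (1 = moved)
m1 <- glm(move ~ age_group + region, family = binomial, data = micro)

# Aggregate to one row per combination of covariates
agg <- aggregate(cbind(move, count) ~ age_group + region, data = micro, FUN = sum)
names(agg) <- c("age_group", "region", "movers", "total")
agg$stayers <- agg$total - agg$movers
agg$prop <- agg$movers / agg$total

# 2. Aggregate data: two-column response of movers and stayers
m2 <- glm(cbind(movers, stayers) ~ age_group + region,
          family = binomial, data = agg)

# 3. Aggregate data: proportion of movers, weighted by group size
m3 <- glm(prop ~ age_group + region,
          family = binomial, weights = total, data = agg)

# Compare the coefficient estimates side by side
round(cbind(individual = coef(m1), counts = coef(m2), proportions = coef(m3)), 4)
```

The three columns should agree up to numerical tolerance, because the binomial likelihood of the grouped counts differs from the product of the individual Bernoulli likelihoods only by a constant that does not depend on the coefficients. The residual deviance and AIC, by contrast, are not comparable between the individual-level and aggregate fits.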

Computational Notebook

Francisco Rowe
Professor of Population Data Science

My research interests include human mobility and migration; economic geography and spatial inequality; geographic data science.
