# gtfs-dagster

A Dagster setup that scrapes GTFS and GTFS-RT feeds for specified transit agencies and loads them into a DuckDB database.

## Quick start

1. Create the `.env` file: copy `env.sample` to `.env` and set:
   - the Postgres database password (make it something random before the first run)
   - your MobilityDatabase.org API token
   - the locations of the `data`, `config`, and `postgres_data` directories (the default is the working directory; `config` is part of the repo, as it ships with sample configuration files)

   A hypothetical `.env` is sketched after this list.

2. Edit `config/agency_list.csv`. This file defines which agencies and feeds to scrape:
   - See `config/agency_list.csv.sample` for an example; a sketch also follows this list.
   - For each transit agency you want to scrape, add the relevant feed IDs from mobilitydatabase.org.

3. Build the Docker containers: `docker compose build`

4. Run the Docker containers: `docker compose up -d`

5. Open the Dagster web UI at `http://127.0.0.1:3001`.

6. Materialize the first asset, `agency_list` (a sketch of what such an asset might look like follows below).
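
For reference, the `.env` from step 1 ends up looking something like the sketch below. The variable names here are guesses for illustration; use whatever `env.sample` actually defines.

```
# Hypothetical .env -- the real variable names are in env.sample
POSTGRES_PASSWORD=change-me-to-something-random
MOBILITY_API_TOKEN=your-mobilitydatabase-org-token
DATA_DIR=./data
CONFIG_DIR=./config
POSTGRES_DATA_DIR=./postgres_data
```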
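
Likewise, `config/agency_list.csv` from step 2 pairs Mobility Database feed IDs with agencies. The column layout below is an assumption for illustration only; `config/agency_list.csv.sample` is authoritative.

```csv
feed_id,agency_name
mdb-123,Hypothetical Transit Agency
mdb-456,Another Hypothetical Agency
```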
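
Finally, for orientation before step 6: in Dagster, an asset such as `agency_list` is a decorated function that the UI lets you materialize on demand. The sketch below is a minimal guess at its shape, not the implementation in `user_code/`; the config path and row structure are assumptions.

```python
# Minimal sketch of an agency_list asset -- NOT this repo's actual code.
# The config path and row structure are assumptions.
import csv
from pathlib import Path

from dagster import asset


@asset
def agency_list() -> list[dict]:
    """Read config/agency_list.csv into a list of feed records."""
    with Path("config/agency_list.csv").open(newline="") as f:
        return list(csv.DictReader(f))
```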

## To-do

1. Switch the Mobility Database lookup from the keyed API to the CSV catalog published on their GitHub page.
2. Load the scraped data into DuckDB (a rough sketch follows this list).
3. Transform the data in DuckDB.
4. Analyze the data.
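
As a possible starting point for items 2 and 3: DuckDB can ingest GTFS's CSV-formatted text files directly. The database path, feed directory, and table name below are illustrative assumptions, not paths this repo creates.

```python
# Rough sketch for loading and transforming GTFS text files in DuckDB.
# The database path and feed directory are hypothetical placeholders.
import duckdb

con = duckdb.connect("data/gtfs.duckdb")

# Load one GTFS text file (plain CSV) into a raw table.
con.execute("""
    CREATE OR REPLACE TABLE raw_stops AS
    SELECT * FROM read_csv_auto('data/example_feed/stops.txt')
""")

# Example transform: count stops per fare zone.
print(con.execute("""
    SELECT zone_id, count(*) AS n_stops
    FROM raw_stops
    GROUP BY zone_id
    ORDER BY n_stops DESC
""").fetchall())
```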