gtfs-dagster

Dagster setup that scrapes GTFS and GTFS-RT feeds for specified transit agencies and loads them into a DuckDB database.
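For context on what gets scraped: a static GTFS feed is a zip archive of CSV tables such as stops.txt, routes.txt and trips.txt, while GTFS-RT is delivered as Protocol Buffers messages. The sketch below reads one static table from a zipped feed using only the Python standard library. It illustrates the feed format and is not the code this project uses; the helper name `read_gtfs_table` is hypothetical, and loading the rows into DuckDB is left out.

```python
import csv
import io
import zipfile


def read_gtfs_table(zip_bytes: bytes, table: str) -> list[dict[str, str]]:
    """Return the rows of one GTFS table (e.g. "stops") from a zipped feed.

    Hypothetical helper for illustration; each row is a dict keyed by the
    CSV header of <table>.txt inside the archive.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        with zf.open(f"{table}.txt") as raw:
            # GTFS tables are UTF-8 CSV and may begin with a byte-order mark,
            # which the "utf-8-sig" codec strips.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))
```

For a feed saved to disk (the path here is made up), `read_gtfs_table(open("feed.zip", "rb").read(), "stops")` returns the stops as a list of dicts.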

Input

This reads from the config/agency_list.csv file. Copy agency_list.csv.sample to agency_list.csv, then edit it to list the transit agencies you want to scrape, adding each agency's relevant IDs from mobilitydatabase.org.
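The column layout of agency_list.csv is not spelled out here, so the sketch below assumes nothing beyond a header row. It parses the file with Python's standard csv module and drops rows whose cells are all blank, which is a cheap sanity check before starting a run. The helper `load_agencies` is hypothetical, and the column names and values in the usage line (agency_name, mdb_id, EXAMPLE-ID) are made up; use the headers from agency_list.csv.sample.

```python
import csv
import io


def load_agencies(text: str) -> list[dict[str, str]]:
    """Parse agency list CSV text into one dict per non-blank row.

    Hypothetical helper for checking the file; it is not part of this repo.
    Keys come from the file's own header row.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Skip rows whose cells are all empty (e.g. a stray "," line);
        # csv.DictReader already skips completely empty lines by itself.
        if any(isinstance(v, str) and v.strip() for v in row.values()):
            rows.append(row)
    return rows


# Usage with made-up columns and values (replace with your real file):
sample = "agency_name,mdb_id\nExample Transit,EXAMPLE-ID\n,\n"
print(load_agencies(sample))
```

To check the real file, pass it the contents of config/agency_list.csv, e.g. `load_agencies(open("config/agency_list.csv", newline="").read())`.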

Set your environment

.env file

Copy env.sample to .env and set:

  • Postgres database password: set it to a random value before the first run
  • MobilityDatabase.org API token
  • Locations of the data, config, and postgres_data directories (defaults to the working directory)
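Any random string works for the Postgres password. One way to generate one, and to confirm that required keys in .env are non-empty before the first run, is sketched below with only the Python standard library. The helper names are hypothetical, and the key names POSTGRES_PASSWORD and MOBILITY_DB_TOKEN are placeholders; the real names are whatever env.sample defines. The parser is deliberately minimal (KEY=VALUE lines, `#` comments, optional surrounding quotes) and is not a full dotenv implementation.

```python
import secrets


def random_password(nbytes: int = 32) -> str:
    # token_urlsafe(nbytes) base64-encodes nbytes random bytes using only
    # letters, digits, "-" and "_", so the value needs no quoting in .env
    return secrets.token_urlsafe(nbytes)


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and # comments, strips quotes."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("\"'")
    return env


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


# Placeholder key names; use the ones from env.sample.
REQUIRED = ["POSTGRES_PASSWORD", "MOBILITY_DB_TOKEN"]

print(random_password())
```

After filling in .env, `missing_keys(parse_env(open(".env").read()), REQUIRED)` should return an empty list.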

Run it

  docker compose build
  docker compose up -d

Then open the Dagster web UI at 127.0.0.1:3001.
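The containers can take a little while to start after `docker compose up -d`. If you want to script that wait, a small standard-library poller such as the hypothetical `wait_for_ui` below retries the UI address (127.0.0.1:3001) until it answers or a timeout expires. It treats any HTTP response, including an error status, as "up", since that means the server is listening, and it bypasses any configured HTTP proxy because the target is local.

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(url: str = "http://127.0.0.1:3001",
                timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll url until it responds; return False if timeout seconds pass."""
    # An empty ProxyHandler disables proxies from environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered with a non-2xx status, so it is running.
            return True
        except OSError:
            # Connection refused or timed out: not listening yet.
            # (URLError is a subclass of OSError.)
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_ui()` after `docker compose up -d` returns True once the UI is reachable, or False after about a minute.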