
gtfs-dagster

Dagster setup that scrapes GTFS and GTFS-RT for specified transit agencies and adds them to a DuckDB

Input

You define which agencies and feeds to scrape in the file config/agency_list.csv

To include the transit agencies that you want to scrape, add the relevant IDs from mobilitydatabase.org

See config/agency_list.csv.sample for an example.
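Before the first run it can help to check that the list parses and that no ID is blank or repeated. The following is only a sketch: the column name `feed_id`, the agency names, and the `mdb-0001` value are hypothetical placeholders, and the real column layout is whatever config/agency_list.csv.sample defines.

```python
import csv
import io


def read_ids(csv_file, id_column):
    """Return the non-empty, de-duplicated values of id_column, in file order."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise ValueError(f"column {id_column!r} not found in header: {reader.fieldnames}")
    seen = []
    for row in reader:
        value = (row[id_column] or "").strip()
        # Skip blank cells and IDs that were already listed.
        if value and value not in seen:
            seen.append(value)
    return seen


# Hypothetical file contents, used here instead of the real config file.
sample = io.StringIO(
    "agency,feed_id\n"
    "Example Transit,mdb-0001\n"
    "Example Transit,mdb-0001\n"
    "Other Agency,\n"
)
ids = read_ids(sample, "feed_id")
print(ids)
```

To check the real file, pass `open("config/agency_list.csv", newline="")` and the actual ID column name instead of the in-memory sample.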

Set your environment

.env file

Copy env.sample to .env and change:

  • Postgres database password - make it something random before the first run
  • MobilityDatabase.org API token
  • Location of data, config, and postgres_data directories (default is in working directory)
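One way to produce a random Postgres password is Python's standard `secrets` module. This is a minimal sketch, not part of the repository; the helper name `make_password` is made up for illustration. Paste the printed value into whichever password variable env.sample defines.

```python
import secrets


def make_password(nbytes=32):
    """Return a URL-safe random string built from nbytes of OS-level randomness."""
    return secrets.token_urlsafe(nbytes)


password = make_password()
print(password)
```

The result contains only letters, digits, `-` and `_`, so it needs no quoting or escaping in a .env file.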

Run it

docker compose build

docker compose up -d

Access the Dagster web UI at 127.0.0.1:3001