# gtfs-dagster

A Dagster setup that scrapes GTFS and GTFS-RT feeds for specified transit agencies and loads them into a DuckDB database.

## Quick start

1. Create the `.env` file: copy `env.sample` to `.env` and set:
   - the Postgres database password (make it something random before the first run)
   - your MobilityDatabase.org API token
   - the locations of the `data`, `config`, and `postgres_data` directories (the default is the working directory; `config` is part of the repo, as it ships with sample configuration files)

   A hypothetical `.env` is sketched after this list.

2. Edit `config/agency_list.csv`. This file defines which agencies and feeds to scrape:
   - See `config/agency_list.csv.sample` for an example; a sketch also follows this list.
   - For each transit agency you want to scrape, add the relevant feed IDs from mobilitydatabase.org.

3. Build the Docker containers: `docker compose build`

4. Run the Docker containers: `docker compose up -d`

5. Open the Dagster web UI at `http://127.0.0.1:3001`.

6. Materialize the first asset, `agency_list` (a sketch of what such an asset might look like follows below).
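
For reference, the `.env` from step 1 ends up looking something like the sketch below. The variable names here are guesses for illustration; use whatever `env.sample` actually defines.

```
# Hypothetical .env -- the real variable names are in env.sample
POSTGRES_PASSWORD=change-me-to-something-random
MOBILITY_API_TOKEN=your-mobilitydatabase-org-token
DATA_DIR=./data
CONFIG_DIR=./config
POSTGRES_DATA_DIR=./postgres_data
```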
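
Likewise, `config/agency_list.csv` from step 2 pairs Mobility Database feed IDs with agencies. The column layout below is an assumption for illustration only; `config/agency_list.csv.sample` is authoritative.

```csv
feed_id,agency_name
mdb-123,Hypothetical Transit Agency
mdb-456,Another Hypothetical Agency
```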
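
Finally, for orientation before step 6: in Dagster, an asset such as `agency_list` is a decorated function that the UI lets you materialize on demand. The sketch below is a minimal guess at its shape, not the implementation in `user_code/`; the config path and row structure are assumptions.

```python
# Minimal sketch of an agency_list asset -- NOT this repo's actual code.
# The config path and row structure are assumptions.
import csv
from pathlib import Path

from dagster import asset


@asset
def agency_list() -> list[dict]:
    """Read config/agency_list.csv into a list of feed records."""
    with Path("config/agency_list.csv").open(newline="") as f:
        return list(csv.DictReader(f))
```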

## To-do

1. Switch the Mobility Database lookup from the keyed API to the CSV catalog published on their GitHub page.
2. Load the scraped data into DuckDB (a rough sketch follows this list).
3. Transform the data in DuckDB.
4. Analyze the data.
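
As a possible starting point for items 2 and 3: DuckDB can ingest GTFS's CSV-formatted text files directly. The database path, feed directory, and table name below are illustrative assumptions, not paths this repo creates.

```python
# Rough sketch for loading and transforming GTFS text files in DuckDB.
# The database path and feed directory are hypothetical placeholders.
import duckdb

con = duckdb.connect("data/gtfs.duckdb")

# Load one GTFS text file (plain CSV) into a raw table.
con.execute("""
    CREATE OR REPLACE TABLE raw_stops AS
    SELECT * FROM read_csv_auto('data/example_feed/stops.txt')
""")

# Example transform: count stops per fare zone.
print(con.execute("""
    SELECT zone_id, count(*) AS n_stops
    FROM raw_stops
    GROUP BY zone_id
    ORDER BY n_stops DESC
""").fetchall())
```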