gtfs-dagster
Dagster setup that scrapes GTFS and GTFS-RT for specified transit agencies and adds them to a DuckDB database.
Quick start
- Edit the .env file. Copy env.sample to .env and change:
  - Postgres database password: make it something random before the first run
  - MobilityDatabase.org API token
  - Location of the data, config, and postgres_data directories (default is in the working directory). config is part of the repo as it comes with sample configuration files.
- Edit config/agency_list.csv
  - See config/agency_list.csv.sample for an example.
  - Define which agencies and feeds to scrape with the file.
  - To include the transit agencies that you want to scrape, add the relevant Feed IDs from mobilitydatabase.org
- Build the docker containers:
  docker compose build
- Run the docker containers:
  docker compose up -d
- Access the Dagster web UI at 127.0.0.1:3001
- Materialize the first asset: agency_list
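
For the first step above, a minimal shell sketch of creating .env and generating a random Postgres password might look like the following. It is not part of the repository: the `PG_PASSWORD` shell variable is a hypothetical name used only for illustration, and the actual key to edit is whatever env.sample defines. Run it from the repository root.

```shell
# Create .env from the sample, without overwriting an existing .env.
if [ -f env.sample ] && [ ! -f .env ]; then
  cp env.sample .env
fi

# Generate a random 48-character hex string (24 random bytes) to use as the
# Postgres password. PG_PASSWORD is a hypothetical name for this sketch only;
# paste the value into .env under the key that env.sample defines.
PG_PASSWORD="$(head -c 24 /dev/urandom | od -An -tx1 | tr -d ' \n')"
echo "Suggested Postgres password: ${PG_PASSWORD}"
```

The value is printed rather than written into .env automatically, since the key names in env.sample are not assumed here. Set it before the first `docker compose up -d`, as the Quick start asks.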
To-do:
- Switch the MobilityData lookup from the API, which requires a key, to the CSV on their GitHub page.
- Load data into DuckDB
- Transform data in DuckDB
- Analyze data