
gtfs-dagster

A Dagster setup that scrapes GTFS and GTFS-RT feeds for specified transit agencies and adds them to a DuckDB database.

Quick start

  1. Edit the .env file. Copy env.sample to .env and change:
    • The Postgres database password. Make it something random before the first run.
    • The MobilityDatabase.org API token.
    • The locations of the data, config, and postgres_data directories (the default is the working directory). The config directory is part of the repo because it ships with sample configuration files.
  2. Edit config/agency_list.csv.
    • See config/agency_list.csv.sample for an example.
    • This file defines which agencies and feeds to scrape.
    • To include a transit agency, add its relevant feed IDs from mobilitydatabase.org.
  3. Build the Docker containers: docker compose build

  4. Run the Docker containers: docker compose up -d

  5. Access the Dagster web UI at http://127.0.0.1:3001

  6. Materialize the first asset: agency_list
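The .env and agency_list.csv steps above can be partly scripted. The sketch below is a minimal, stdlib-only helper and is not part of this repository: the hypothetical generate_postgres_password uses Python's secrets module to produce a random Postgres password, and the hypothetical read_feed_ids lists the feed IDs in an agency list CSV. The column name feed_id is an assumed placeholder; check config/agency_list.csv.sample for the real header.

```python
import csv
import secrets
from pathlib import Path


def generate_postgres_password(num_bytes: int = 24) -> str:
    """Return a random, URL-safe password for the .env file."""
    # secrets.token_urlsafe draws from a cryptographically secure RNG;
    # 24 random bytes encode to 32 base64url characters.
    return secrets.token_urlsafe(num_bytes)


def read_feed_ids(csv_path: Path, column: str = "feed_id") -> list[str]:
    """Read Mobility Database feed IDs from an agency list CSV.

    "feed_id" is a hypothetical column name; the real header may differ.
    """
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"{csv_path} has no {column!r} column")
        # Skip empty cells and strip stray whitespace around IDs.
        return [
            row[column].strip()
            for row in reader
            if row[column] and row[column].strip()
        ]


if __name__ == "__main__":
    # Paste the printed value into .env as the Postgres password.
    print(generate_postgres_password())
```

Generating the password with secrets rather than random matters here because the random module is not designed for security-sensitive values.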

To-do:

  1. Switch the Mobility Database integration from the API (which requires a key) to the CSV catalog published on their GitHub page.
  2. Load data into DuckDB.
  3. Transform data in DuckDB.
  4. Analyze the data.
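As a possible starting point for the first two to-do items, here is a hedged, stdlib-only sketch with two hypothetical helpers. select_catalog_feeds picks the wanted feeds out of the Mobility Database catalog CSV text; the column names mdb_source_id and urls.latest are assumptions and should be checked against the actual CSV. read_gtfs_tables reads a downloaded GTFS feed (a ZIP archive of CSV-formatted .txt files) into memory, one list of row dicts per table, as a step before loading into DuckDB.

```python
import csv
import io
import zipfile
from pathlib import PurePosixPath


def select_catalog_feeds(
    catalog_csv: str,
    wanted_ids: set[str],
    id_column: str = "mdb_source_id",  # assumed column name, verify
    url_column: str = "urls.latest",   # assumed column name, verify
) -> dict[str, str]:
    """Map each wanted feed ID to its download URL, given catalog CSV text."""
    found: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(catalog_csv)):
        feed_id = (row.get(id_column) or "").strip()
        if feed_id in wanted_ids:
            found[feed_id] = (row.get(url_column) or "").strip()
    return found


def read_gtfs_tables(source) -> dict[str, list[dict[str, str]]]:
    """Read every .txt table in a GTFS zip (path or binary file object).

    Returns {table_name: [row_dict, ...]}, e.g. {"stops": [...]}.
    """
    tables: dict[str, list[dict[str, str]]] = {}
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if not name.endswith(".txt"):
                continue  # skip directories and non-GTFS files
            # GTFS files are UTF-8; "utf-8-sig" also drops a leading BOM.
            with io.TextIOWrapper(
                zf.open(name), encoding="utf-8-sig", newline=""
            ) as text:
                tables[PurePosixPath(name).stem] = list(csv.DictReader(text))
    return tables
```

Holding whole tables in memory is fine for small feeds. For large files such as stop_times.txt, extracting the archive and letting DuckDB ingest the files with its read_csv function is likely the better route.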