# gtfs-dagster
Dagster setup that scrapes GTFS and GTFS-RT feeds for specified transit agencies and adds them to a DuckDB database.
## Quick start
1. Edit the `.env` file.
   Copy `env.sample` to `.env` and change:
   - Postgres database password: make it something random before the first run.
   - MobilityDatabase.org API token.
   - *Optional:* the locations of the `data`, `config`, and `postgres_data` directories. By default they are in the working directory. `config` is part of the repo because it comes with sample configuration files.
2. Edit `config/agency_list.csv`.
   - See `config/agency_list.csv.sample` for an example.
   - This file defines which agencies and feeds to scrape.
   - For each transit agency you want to scrape, add its Feed ID from mobilitydatabase.org.
3. Build the Docker images:
   `docker compose build`
4. Run the Docker containers:
   `docker compose up -d`
5. Open the Dagster web UI at http://127.0.0.1:3001
6. Materialize the first asset: `agency_list`
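
For the Postgres password in step 1, one way to get a random value is Python's standard `secrets` module. This is only a minimal sketch: the function name `make_password` and the 32-byte length are illustrative choices, not something this repo prescribes.

```python
import secrets


def make_password(nbytes: int = 32) -> str:
    """Return a URL-safe random string built from `nbytes` random bytes.

    `make_password` is a hypothetical helper for this README, not part of the repo.
    """
    # token_urlsafe only emits letters, digits, "-" and "_", so the value
    # can be pasted into a .env file without quoting or escaping.
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    print(make_password())
```

Run it once and paste the printed value into the Postgres password entry of your `.env`, using whatever variable name `env.sample` defines.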
## To-do:
1. Switch the mobilitydata lookup from the API, which needs a key, to the CSV on their GitHub page.
2. Load data into DuckDB
3. Transform data in DuckDB
4. Analyze data