Add a Table to the BigQuery NDS dataset
Notifications Data Service (NDS) uses Spanner as its relational database. Every 6 hours an Airflow DAG (defined here), pushes new updates to the notifications_data_service dataset in BigQuery.
Currently, the following tables are replicated in this dataset:
notificationsin_app_notificationsemail_notificationspush_notificationspush_notification_tokens
If a new table is added to Spanner and you'd like for this data to exist in BigQuery as well, the DAG will need to be updated. These are the steps for doing so:
Step 1: Add an index
Add an index to the updated_at column of the Spanner table you wish to replicate. This field is used by the DAG to fetch newly updated records and it helps to make this query more performant.
Examples:
- https://github.com/joinhandshake/notifications/pull/388
- https://github.com/joinhandshake/notifications/pull/257
Step 2: Define the new BigQuery table via terraform
Tables within the BigQuery notifications_data_service dataset are managed by terraform to keep things consistent between environments. It's an extra step, but prevents environment-specific DAG bugs.
Examples:
After you've merged terraform changes, you can use Spacelift to apply them. The notifications stacks can be found here
Step 3: Update the DAG
Update the sync-incremental-nds-spanner-to-bq DAG on how to merge newly updated data from the Spanner table into the corresponding BigQuery table.
Examples:
DAG Links
For each environment running this DAG, you can find links below to the Airflow homepage and to the specific sync-incremental-nds-spanner-to-bq DAG.
In Airflow, you can check the bottom left-hand corner to verify which environment you're in
go/airflow-us-staging | US Staging: sync-incremental-nds-spanner-to-bq
go/airflow-us-production | US Production: sync-incremental-nds-spanner-to-bq
go/airflow-eu-staging | EU Staging: sync-incremental-nds-spanner-to-bq
go/airflow-eu-production | EU Production: sync-incremental-nds-spanner-to-bq