Skip to main content

Add a Table to the BigQuery NDS dataset

Notifications Data Service (NDS) uses Spanner as its relational database. Every 6 hours an Airflow DAG (defined here), pushes new updates to the notifications_data_service dataset in BigQuery.

Currently, the following tables are replicated in this dataset:

  • notifications
  • in_app_notifications
  • email_notifications
  • push_notifications
  • push_notification_tokens

If a new table is added to Spanner and you'd like for this data to exist in BigQuery as well, the DAG will need to be updated. These are the steps for doing so:

Step 1: Add an index

Add an index to the updated_at column of the Spanner table you wish to replicate. This field is used by the DAG to fetch newly updated records and it helps to make this query more performant.

Examples:

Step 2: Define the new BigQuery table via terraform

Tables within the BigQuery notifications_data_service dataset are managed by terraform to keep things consistent between environments. It's an extra step, but prevents environment-specific DAG bugs.

Examples:

info

After you've merged terraform changes, you can use Spacelift to apply them. The notifications stacks can be found here

Step 3: Update the DAG

Update the sync-incremental-nds-spanner-to-bq DAG on how to merge newly updated data from the Spanner table into the corresponding BigQuery table.

Examples:


For each environment running this DAG, you can find links below to the Airflow homepage and to the specific sync-incremental-nds-spanner-to-bq DAG.

info

In Airflow, you can check the bottom left-hand corner to verify which environment you're in

go/airflow-us-staging | US Staging: sync-incremental-nds-spanner-to-bq

go/airflow-us-production | US Production: sync-incremental-nds-spanner-to-bq

go/airflow-eu-staging | EU Staging: sync-incremental-nds-spanner-to-bq

go/airflow-eu-production | EU Production: sync-incremental-nds-spanner-to-bq