Tuesday, June 30, 2026
Priorities
- Export BigQuery results into Google Drive — DA-1348
Work Log
Export BigQuery results into Google Drive — DA-1348
Situation
Yesterday, I already created an exploration document about exporting BigQuery query results into Google Drive. Today, I planned to implement the exploration document by creating an example of the integration within the Airflow.
Tasks
Create an Airflow DAG to export BigQuery query results into Google Drive.
Actions
Step 1. Create an Image and push into Artifact Registry.
rclone.conf
[gdrive]
type = drive
scope = drive
service_account_file = <service-account-json>
; https://drive.google.com/drive/folders/<root-folder-id>
root_folder_id = <root-folder-id>
Dockerfile
FROM google/cloud-sdk:alpine
COPY --from=rclone/rclone:latest /usr/local/bin/rclone /usr/local/bin/rclone
RUN apk add --no-cache bash
WORKDIR /app
COPY sync.sh .
COPY rclone.conf .
RUN chmod +x sync.sh
sync.sh
#!/bin/bash
GCS_BUCKET="gs://${BUCKET_ID}"
DRIVE_REMOTE="gdrive"
GCS_GLOB="${GCS_BUCKET}/${USER_ID}/${QUERY_ID}/${QUERY_ID}_*.csv"
QUERY=$(cat "${QUERY_PATH}")
gcloud auth activate-service-account --key-file="${GOOGLE_APPLICATION_CREDENTIALS}"
bq query --use_legacy_sql=false \
"EXPORT DATA OPTIONS (
uri = '${GCS_GLOB}',
format = 'CSV',
header = true,
overwrite = true
) AS
${QUERY}"
# If the file size is greater than 100 MB, throw an error.
# This is to ensure the Airflow VM Instance keep running.
MAX_BYTES=$((100 * 1024 * 1024))
TOTAL_BYTES=$(gcloud storage du -s "${GCS_GLOB}" | awk '{print $1}')
if [ "${TOTAL_BYTES}" -gt "${MAX_BYTES}" ]; then
echo "Total size ${TOTAL_BYTES} bytes exceeds the ${MAX_BYTES} bytes (100 MB) limit." >&2
exit 1
fi
TMP="$(mktemp -d)"
trap 'rm -rf "$TMP"' EXIT
gcloud storage cp "${GCS_GLOB}" "${TMP}/"
rclone config show
rclone sync "${TMP}" "${DRIVE_REMOTE}:${USER_ID}/${QUERY_ID}"
Step 2. Create an Airflow DAG, and use the created Image.
...
t1 = DockerOperator(
task_id='run_task_1',
image='bigquery_to_gdrive:0.0.9',
mounts=[
Mount(source='/secrets', target='/secrets', type='bind'),
Mount(source='/scripts', target='/scripts', type='bind'),
],
api_version='auto',
auto_remove=True,
command="sh /app/sync.sh ",
docker_url="unix://var/run/docker.sock",
network_mode="bridge",
dag=dag,
environment = {
'GOOGLE_APPLICATION_CREDENTIALS':'<service-account-json>',
'USER_ID':'<user-id>',
'QUERY_ID':'<query-id>',
'QUERY_PATH':'<query-path>',
'BUCKET_ID':'<bucket-id>',
'RCLONE_CONFIG':'<rclone-config>',
},
xcom_all=True,
tty=True,
)
t1
Result
Currently, We have a workflow for exporting the BigQuery query results into Google Drive. Furthermore, I already created a playbook for this workflow so that the data stakeholders can create the workflow by themselves.
Blockers
N/A
Carry-overs
- Create a blog post for the llm inference live demo
Reflection
N/A