Friday, June 19, 2026

Priorities

Work Log

Insider Ingestion Pipeline — The export file is too large — DA-1349

Situation: The Insider ingestion pipeline failed because the export file was larger than 1 GB, which is not allowed for a single-file export. I also needed to fix the data as soon as possible, because the data science team needs the Insider data for the dashboard queries they are cutting over from Mixpanel to Insider.

Task: Resolve the issue and fix yesterday's data so that it is available in BigQuery.

Action: I modified the script to use a wildcard in the destination filename so that the export is split across multiple files, and I fixed yesterday's data. Details below:

before

json_filename=$(echo "$filename" | sed -e "s/.csv/.json/g")
echo "Extracting to JSON: gs://${BUCKET}/temp/${json_filename}"
bq extract \
--location=asia-southeast2 \
--destination_format NEWLINE_DELIMITED_JSON \
"${TEMP_DATASET_ID}.${TEMP_TABLE_ID}" \
gs://"$BUCKET"/temp/"$json_filename"

after

json_filename=$(echo "$filename" | sed -e "s/.csv/-*.json/g")
echo "Extracting to JSON: gs://${BUCKET}/temp/${json_filename}"
bq extract \
--location=asia-southeast2 \
--destination_format NEWLINE_DELIMITED_JSON \
"${TEMP_DATASET_ID}.${TEMP_TABLE_ID}" \
gs://"$BUCKET"/temp/"$json_filename"
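Note on the filename rewrite: in the sed expression above, the unescaped `.` matches any character and the `g` flag rewrites every match, so a name that contains "csv" elsewhere could be altered in more than one place. The following is a minimal sketch of a stricter variant, not what the pipeline currently runs. The helper name `to_wildcard_json` and the sample filename are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): turn "name.csv"
# into the wildcard pattern "name-*.json" used as the export destination.
to_wildcard_json() {
  local filename="$1"
  # Escape the dot and anchor the match at the end of the string so that
  # only a trailing ".csv" extension is rewritten.
  printf '%s\n' "$filename" | sed -e 's/\.csv$/-*.json/'
}

# The sample filename is an assumption for illustration.
json_pattern=$(to_wildcard_json "insider_events.csv")

# Quote the variable so the shell does not try to expand the "*" locally.
echo "$json_pattern"   # prints: insider_events-*.json
```

The resulting pattern can be passed to bq extract exactly as in the "after" snippet; the wildcard is expanded on the BigQuery side into numbered shard files.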

Result: Yesterday's data is now available in BigQuery, and the data science team can continue the cutover initiative.

Blockers

N/A

Carry-overs

Reflection

I couldn't sleep last night for no apparent reason, and my beloved team recommended magnesium bisglycinate to help me sleep tonight. I tried to fall asleep at 8:30 PM but couldn't, and I swear I didn't have a single thing on my mind. This has happened on several nights this month. I'd like to thank my manager for letting me work from home today.