Friday, June 19, 2026
Priorities
- Resolve the Insider ingestion pipeline issue — DA-1349
Work Log
Insider Ingestion Pipeline — The export file is too large — DA-1349
Situation: The Insider ingestion pipeline failed because the export file exceeded 1 GB, the limit for a single-file BigQuery export. I also needed to fix the data as soon as possible because the data scientist team needs the Insider data for the dashboard queries supporting the cutover from Mixpanel to Insider.
Task: Resolve the issue and fix yesterday's data so that it is available in BigQuery.
Action: I modified the script to use a wildcard in the destination URI so that the export is split into multiple files, then fixed yesterday's data. Details below:
Before:
json_filename=$(echo "$filename" | sed -e "s/.csv/.json/g")
echo "Extracting to JSON: gs://${BUCKET}/temp/${json_filename}"
bq extract \
--location=asia-southeast2 \
--destination_format NEWLINE_DELIMITED_JSON \
"${TEMP_DATASET_ID}.${TEMP_TABLE_ID}" \
gs://"$BUCKET"/temp/"$json_filename"
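After:
# Replace a trailing ".csv" with "-*.json" (e.g. foo.csv -> foo-*.json) so that bq extract
# writes several sharded files instead of one file, which is rejected above 1 GB.
# The dot is escaped and the match anchored so only the extension is replaced.
json_filename=$(echo "$filename" | sed -e 's/\.csv$/-*.json/')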
echo "Extracting to JSON: gs://${BUCKET}/temp/${json_filename}"
bq extract \
--location=asia-southeast2 \
--destination_format NEWLINE_DELIMITED_JSON \
"${TEMP_DATASET_ID}.${TEMP_TABLE_ID}" \
gs://"$BUCKET"/temp/"$json_filename"
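The updated `bq extract` call relies on BigQuery's wildcard export: the `*` in the destination URI is replaced by a shard number, zero-padded to 12 digits and starting at `000000000000`, so one export becomes several files under `gs://${BUCKET}/temp/`. A minimal sketch of the resulting names follows; `to_wildcard_json`, `shard_name`, and the sample filename `insider_20260618.csv` are hypothetical and not part of the pipeline, and the script only prints names locally without calling `bq` or touching GCS.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: turn a CSV name into the wildcard JSON name used for the export.
# The dot is escaped and anchored, so only a trailing ".csv" is replaced.
to_wildcard_json() {
  printf '%s\n' "$1" | sed -e 's/\.csv$/-*.json/'
}

# Hypothetical helper: the object name BigQuery would give shard number $2
# when exporting to wildcard pattern $1 (shard number zero-padded to 12 digits).
shard_name() {
  local pattern=$1 index=$2
  local padded
  padded=$(printf '%012d' "$index")
  # ${var/\*/x} replaces the first literal "*" in the pattern with the padded number
  printf '%s\n' "${pattern/\*/$padded}"
}

pattern=$(to_wildcard_json "insider_20260618.csv")
echo "$pattern"              # prints insider_20260618-*.json
shard_name "$pattern" 0      # prints insider_20260618-000000000000.json
shard_name "$pattern" 1      # prints insider_20260618-000000000001.json
```

Escaping the dot matters: with an unescaped pattern such as `s/.csv/-*.json/g`, a name like `mycsvfile.csv` becomes `m-*.jsonfile-*.json`, because `.` matches any character. Anything that reads the export afterwards should match the whole `-*.json` pattern rather than a single object name.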
Result: Yesterday's data is now available in BigQuery, and the data scientist team can continue the cutover initiative.
Blockers
N/A
Carry-overs
- Create a live demo for LLM Inference
Reflection
I couldn't sleep last night for no apparent reason, and my beloved team recommended magnesium bisglycinate to help me sleep tonight. I tried to fall asleep at 8:30 PM but couldn't, and I swear I didn't have a single thing on my mind. This has happened several times this month. I'd like to thank my manager for letting me work from home today.