Monday, June 22, 2026
Priorities
Work Log
GitLab to GitHub Migration — DA-1346, DA-1343
Situation
I am continuing the migration of our repositories from GitLab to GitHub. (details)
Today, I planned to migrate two repositories and their CI/CD pipelines from GitLab to GitHub.
Task
Migrate the following GitLab repositories to GitHub:
1. gitlab warehouseflow
2. gitlab etl-batch
Action
I migrated each repository with the following commands.
git clone --mirror git@gitlab.com:allofresh/data/<repo-name>.git
cd <repo-name>.git
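# A mirror clone already has "origin" pointing at GitLab, so GitHub is added under a separate remote name
git remote add github git@github.com:allofresh/<repo-name>.git
git push github --force --all
git push github --force --tags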
After that, I manually converted each repository's CI/CD pipeline from GitLab CI/CD to GitHub Actions.
Result
As a result, the following repositories and their CI/CD pipelines have been migrated to GitHub:
1. github warehouseflow
2. github etl-batch
Insider Data Pipeline — Data Duplication Issue
Situation
The Insider team informed us that their integration system produced duplicate data.
They sent the CSV files to our cloud storage bucket twice.
Only CSV files whose names contain the 2531 code are impacted, and they all have exactly the same size.
If duplicates did reach our bucket, I could deduplicate the data on the session_id, event_name, user_id, and timestamp fields.
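A minimal sketch of that deduplication, assuming the CSV files have a header row containing those four columns. The function name `deduplicate_rows` and the sample rows are hypothetical and only illustrate the idea; the real pipeline may do this differently.

```python
import csv
import io

# Key fields named in this log entry; everything else here is illustrative.
DEDUP_KEYS = ("session_id", "event_name", "user_id", "timestamp")


def deduplicate_rows(rows, keys=DEDUP_KEYS):
    """Yield the first row seen for each distinct combination of key fields."""
    seen = set()
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in seen:
            seen.add(key)
            yield row


# Hypothetical sample: the first event appears twice, as if delivered twice.
sample = (
    "session_id,event_name,user_id,timestamp\n"
    "s1,page_view,u1,2026-06-04T10:00:00\n"
    "s1,add_to_cart,u1,2026-06-04T10:01:00\n"
    "s1,page_view,u1,2026-06-04T10:00:00\n"
)
unique_rows = list(deduplicate_rows(csv.DictReader(io.StringIO(sample))))
print(len(unique_rows))  # prints 2
```

Keeping the first occurrence preserves the original row order, which makes the output easy to compare against the source file.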
Task
Find the duplicated files in the cloud storage bucket.
Action
I checked for duplicated files manually by eyeballing them, since I only needed to check the partitions from 4/6/26 through 22/6/26.
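For a larger date range, the same check could be scripted. The sketch below is not what I ran; it assumes a bucket listing is already available as (path, size) pairs and that paths look like `<partition>/<file>.csv`. The function name `find_suspect_duplicates`, the path layout, and the sample listing are all hypothetical.

```python
from collections import defaultdict

CODE = "2531"  # code named in this log entry


def find_suspect_duplicates(objects, code=CODE):
    """Group affected CSV objects by (partition, size).

    `objects` is an iterable of (path, size_in_bytes) pairs, for example
    taken from a bucket listing. Only groups with more than one file are
    returned, since those are the candidates for a double delivery.
    """
    groups = defaultdict(list)
    for path, size in objects:
        partition, _, name = path.rpartition("/")
        if code in name and name.endswith(".csv"):
            groups[(partition, size)].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}


# Hypothetical listing; paths and sizes are made up for illustration.
listing = [
    ("dt=2026-06-04/events_2531_a.csv", 1024),
    ("dt=2026-06-04/events_2531_b.csv", 1024),
    ("dt=2026-06-04/events_9999_a.csv", 1024),
    ("dt=2026-06-05/events_2531_a.csv", 2048),
]
suspects = find_suspect_duplicates(listing)
for (partition, size), paths in suspects.items():
    print(partition, size, paths)
```

Equal size is only a hint that two files are copies, so any group this returns would still need a content comparison (for example a checksum) before deleting anything.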
Result
Fortunately, there are no duplicated files, so I don't need to run the deduplication process.
Blockers
N/A
Carry-overs
- Create a live demo for LLM Inference
Reflection
N/A