Sunday, June 28, 2026
Priorities
- Create a live demo for LLM Inference
Work Log
Situation
I want to create a live demo of LLM Inference without relying on third-party vendors such as OpenAI or Claude. To achieve that, I have set up the following components:
- Vector database.
- Embedding service.
- Language model service.
To complete the LLM Inference demo, I need a landing page where users can ask questions and receive human-like answers.
Tasks
- Create a frontend service where users can ask questions and get answers.
- Create a backend service that can vectorize the user's question, search the vector database for related content, and generate an answer using the language model.
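The backend task describes what is commonly called retrieval-augmented generation: embed the question, fetch the nearest stored passages, and hand them to the model as context. Below is a minimal, dependency-free sketch of that flow under loud assumptions. `embed` is a toy hashed bag-of-words function standing in for the real embedding service, `InMemoryVectorStore` stands in for the vector database, and `generate` is any callable that would send the prompt to the language model service. None of these names come from the actual demo.

```python
import math
import re
import zlib
from typing import Callable

# Hypothetical stand-in for the embedding service: a hashed bag-of-words
# vector. A real deployment would call the self-hosted embedding model instead.
DIM = 512


def embed(text: str) -> list[float]:
    """Map text to a unit-length vector (all zeros if it has no tokens)."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 keeps bucket assignment stable across processes, unlike hash().
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical stand-in for the vector database: brute-force search."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        self._items.append((embed(text), text))

    def search(self, query_vec: list[float], k: int = 2) -> list[str]:
        # Stored vectors are unit length (or all zeros), so the dot product
        # equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(query_vec, vec)), text)
            for vec, text in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


def build_prompt(question: str, contexts: list[str]) -> str:
    """Put the retrieved passages in front of the question for the model."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question: str, store: InMemoryVectorStore,
           generate: Callable[[str], str], k: int = 2) -> str:
    """Vectorize the question, retrieve top-k passages, and ask the model."""
    contexts = store.search(embed(question), k=k)
    return generate(build_prompt(question, contexts))


if __name__ == "__main__":
    store = InMemoryVectorStore()
    for doc in [
        "Kubernetes schedules containers across a cluster.",
        "nginx serves static HTML pages.",
        "Flask is a Python web framework.",
    ]:
        store.add(doc)
    # generate is a placeholder; echoing the prompt shows what the model sees.
    print(answer("What serves static HTML pages?", store, generate=lambda p: p))
```

Swapping the toy pieces for real clients only changes `embed`, the store, and `generate`; the embed, search, and prompt order stays the same.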
Actions
For the frontend service, I created a simple HTML page containing a textbox and two buttons, served by nginx. For the backend service, I wrote a Python script that uses Flask to expose a REST API and communicate with the other services.
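The backend's job here is to accept a question from the HTML page and return the model's answer as JSON. The sketch below shows that request/response contract as a plain WSGI callable so it runs on the standard library alone; a Flask app is itself a WSGI application, so in the real script this logic would normally sit in a Flask route instead. The `/ask` path, the `question`/`answer` JSON fields, and `answer_fn` (which would wrap the embed, search, and generate steps) are hypothetical, not the demo's actual API.

```python
import io
import json
from typing import Callable, Iterable


def make_app(answer_fn: Callable[[str], str]):
    """Build a WSGI app exposing a hypothetical POST /ask JSON endpoint."""

    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def respond(status: str, payload: dict) -> list[bytes]:
            body = json.dumps(payload).encode("utf-8")
            start_response(status, [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ])
            return [body]

        if environ.get("PATH_INFO") != "/ask":
            return respond("404 Not Found", {"error": "not found"})
        if environ.get("REQUEST_METHOD") != "POST":
            return respond("405 Method Not Allowed", {"error": "use POST"})
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            data = json.loads(environ["wsgi.input"].read(length) or b"{}")
        except ValueError:  # also covers json.JSONDecodeError
            return respond("400 Bad Request", {"error": "invalid JSON"})
        question = data.get("question") if isinstance(data, dict) else None
        if not isinstance(question, str) or not question.strip():
            return respond("400 Bad Request", {"error": "question is required"})
        return respond("200 OK", {"answer": answer_fn(question.strip())})

    return app


def call_app(app, method: str, path: str, body: bytes = b"") -> tuple[str, dict]:
    """Invoke the WSGI app directly (no server) and decode its JSON reply."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    payload = json.loads(b"".join(app(environ, start_response)))
    return captured["status"], payload
```

For a quick local check, `wsgiref.simple_server.make_server("", 8000, make_app(answer_fn)).serve_forever()` serves the app on port 8000; that is a development server only, not a stand-in for the nginx and Kubernetes setup.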
Result
As a result, the LLM Inference live demo is now available at the following link:
- LLM Inference Live Demo: https://ilyasahsan.xyz/chat-server
Additionally, all LLM Inference components run on my own Kubernetes cluster, without any external dependencies such as OpenAI or Claude.
Blockers
N/A
Carry-overs
N/A
Reflection
N/A