Sunday, June 28, 2026

Priorities

Create a live demo for LLM Inference

Work Log

Situation

I want to create a live demo of LLM Inference without using third-party vendors such as OpenAI, Claude, etc. To achieve that, I have established the following components:

Vector database.
Embedding service.
Language model service.

To complete the LLM Inference demo, I need to create a landing page where users can ask a question and get an answer that reads as if it came from a human.

Tasks

Create a frontend service where users can ask questions and get the results.
Create a backend service that is able to vectorize the user's question, search for it in the vector database, and return the result using the language model.
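
The backend task above (vectorize, search, generate) can be sketched roughly as follows. This is only an illustration, not the code running in the demo: the embedding service, vector database, and language model are replaced by hypothetical in-memory stand-ins (toy_embed, a plain Python list, and toy_generate), and all function names are assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical vocabulary used only by the toy embedding below.
VOCAB = ["kubernetes", "nginx", "flask"]


def toy_embed(text: str) -> List[float]:
    """Stand-in for the embedding service: counts vocabulary words."""
    words = text.lower().split()
    return [float(words.count(word)) for word in VOCAB]


def toy_generate(prompt: str) -> str:
    """Stand-in for the language model service: echoes the prompt."""
    return "Answer based on:\n" + prompt


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: Vector, index: List[Tuple[Vector, str]], top_k: int = 2) -> List[str]:
    """Stand-in for the vector database: return the top_k most similar passages."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[0]),
        reverse=True,
    )
    return [text for _, text in ranked[:top_k]]


def answer_question(
    question: str,
    embed: Callable[[str], Vector],
    index: List[Tuple[Vector, str]],
    generate: Callable[[str], str],
    top_k: int = 2,
) -> str:
    """Vectorize the question, retrieve context, and ask the model."""
    query_vec = embed(question)
    passages = search(query_vec, index, top_k)
    prompt = "Context:\n" + "\n".join(passages) + "\n\nQuestion: " + question
    return generate(prompt)
```

In a real deployment, embed, search, and generate would each call the corresponding service over the network; passing them in as parameters keeps the flow easy to test with stand-ins like the ones above.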

Actions

For the frontend service, I created a simple HTML page containing a textbox with 2 buttons, and served it using nginx. For the backend service, I created a Python script that uses Flask to expose a REST API and communicate with the other services.
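
A minimal sketch of what such a Flask REST endpoint might look like is shown below. The route name /ask, the JSON field names, the port, and the answer_question placeholder are all assumptions for illustration; the actual script behind the demo may be structured differently.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def answer_question(question: str) -> str:
    """Hypothetical placeholder for the embed -> search -> generate flow."""
    return f"(stub answer) You asked: {question}"


@app.route("/ask", methods=["POST"])
def ask():
    # Parse the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}

    question = str(payload.get("question", "")).strip()
    if not question:
        return jsonify(error="question is required"), 400

    return jsonify(answer=answer_question(question))


if __name__ == "__main__":
    # Assumed port; in this setup nginx would sit in front of the backend.
    app.run(host="0.0.0.0", port=5000)
```

The HTML page could then send the textbox content as a JSON POST request to this endpoint and display the returned answer field.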

Result

As a result, the LLM Inference live demo is now available and can be accessed at the following link:

LLM Inference Live Demo: https://ilyasahsan.xyz/chat-server

Additionally, all the LLM Inference components are running in my own Kubernetes cluster without using any external dependencies such as OpenAI or Claude.

Blockers

N/A

Carry-overs

N/A

Reflection

N/A