Sunday, June 28, 2026

Priorities

Work Log


Situation

I want to create a live demo of LLM Inference that does not rely on third-party services such as OpenAI or Claude. To achieve that, I have already set up the following components:

  1. Vector database.
  2. Embedding service.
  3. Language model service.

To complete the LLM Inference demo, I need to create a landing page where users can ask a question and get a human-like answer.

Tasks

  1. Create a frontend service where users can ask questions and get the results.
  2. Create a backend service that vectorizes the user's question, searches for it in the vector database, and generates the answer using the language model.
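
The backend flow in task 2 (vectorize, search, generate) can be sketched roughly as below. This is only an illustration, not the actual backend: the names answer_question, search, and cosine_similarity are hypothetical, the vector database is replaced by an in-memory list, and the embedding and language model services are passed in as plain functions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: Vector, index: List[Tuple[Vector, str]], top_k: int = 2) -> List[str]:
    """Stand-in for the vector database: return the top_k most similar texts."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[0]),
        reverse=True,
    )
    return [text for _, text in ranked[:top_k]]


def answer_question(
    question: str,
    embed: Callable[[str], Vector],      # assumed wrapper around the embedding service
    index: List[Tuple[Vector, str]],     # assumed stand-in for the vector database
    generate: Callable[[str], str],      # assumed wrapper around the language model service
    top_k: int = 2,
) -> str:
    """Vectorize the question, retrieve context, and ask the model for an answer."""
    query_vec = embed(question)
    context = search(query_vec, index, top_k)
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
    return generate(prompt)
```

In the real backend, embed and generate would call the embedding service and the language model service, and search would query the vector database instead of sorting a list.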

Actions

For the frontend service, I created a simple HTML page containing a text box and two buttons, and served it using nginx. For the backend service, I wrote a Python script that uses Flask to expose a REST API and communicate with the other services.
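
A minimal sketch of what such a Flask endpoint might look like is shown below. The route /ask, the JSON field names question and answer, and the answer_question stub are all assumptions made for illustration; the actual script may be organized differently.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def answer_question(question: str) -> str:
    """Hypothetical placeholder: the real version would call the embedding
    service, the vector database, and the language model service."""
    return f"(stub answer for: {question})"


@app.route("/ask", methods=["POST"])
def ask():
    # Parse the JSON body; fall back to an empty dict on missing or invalid JSON.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    question = str(payload.get("question", "")).strip()
    if not question:
        return jsonify({"error": "question is required"}), 400
    return jsonify({"answer": answer_question(question)})
```

The HTML page could then send the text box content as a POST request with a JSON body such as {"question": "..."} and display the answer field from the response.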

Result

As a result, the LLM Inference live demo is now available and can be accessed at the following link:

  - LLM Inference Live Demo: https://ilyasahsan.xyz/chat-server

Additionally, all the LLM Inference components run in my own Kubernetes cluster, without any external dependencies such as OpenAI or Claude.


Blockers

N/A

Carry-overs

N/A

Reflection

N/A