I have just built a sql chatbot for local data using langchain and streamlit. The whole process was surprisingly smooth with these two great tools, so here is how I built it, along with some thoughts.

In case you don’t know what langchain and streamlit are, here is a brief introduction. Langchain, in my mind, is a tool that connects to all kinds of language models (e.g. openai) and provides interfaces to use them easily. Streamlit is a tool that lets you write apps with a UI in pure python. So langchain + streamlit means you can build llm apps with a UI using only python.

The final app looks like this:

(screenshot: the sqlbot chat UI)

Here is how I built it:

1. Install langchain and streamlit

pip install langchain streamlit

(I’ve simplified the installation; in a real project you should set up a virtual environment and install them inside it.)
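
For example, on Linux or macOS that setup could look like this (openai is added here because the code below uses openai models):

python -m venv .venv
source .venv/bin/activate
pip install langchain streamlit openai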

2. Find your data connection string

Under the hood langchain uses sqlalchemy to connect to the database, so you can use any database that sqlalchemy supports. Here is what a connection string for mysql looks like:

mysql+pymysql://<username>:<password>@<host>/<dbname>[?<options>]
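
Note that the mysql+pymysql dialect also needs the driver installed (pip install pymysql). Before wiring the string into the app, you can sanity-check it with a tiny sqlalchemy snippet like this (the connection string below is made up):

from sqlalchemy import create_engine, text

# create_engine takes the same connection string that langchain will use later
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")
with engine.connect() as conn:
    # a trivial query just to confirm the connection works
    print(conn.execute(text("SELECT 1")).scalar())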

3. Set up a streamlit app

You just need to create a python file and import streamlit.
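
A minimal file, just to confirm streamlit runs, could look like this:

import streamlit as st

# placeholder page; the real chatbot logic is added in the next steps
st.title("Talk to your data")
st.write("hello from streamlit")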

4. Use langchain sql agent to talk to your database

The langchain sql agent allows you to use an agent to explore your database. The agent is powered by an llm, which could be openai or an open source model like llama2.
Besides the sql agent, you can also use the sql chain, which to me is like a manual car, with the sql agent as the automatic: the chain always runs the same generate-query-and-answer steps, while the agent decides its own steps. A sketch of the chain version is below.
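
For reference, a sql chain version could look roughly like this (a sketch only; depending on your langchain version, SQLDatabaseChain may live in langchain_experimental instead of langchain, and the database path here is made up):

from langchain.llms import OpenAI
from langchain.sql_database import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain  # older versions: langchain.chains

db = SQLDatabase.from_uri("sqlite:///example.db")  # made-up local database
llm = OpenAI(temperature=0, openai_api_key="sk-...")  # placeholder key
# the chain always follows the same steps: write sql, run it, answer from the result
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
print(db_chain.run("How many rows are in the users table?"))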

5. Hook up langchain and streamlit

StreamlitCallbackHandler is the trick that makes langchain and streamlit work together: for each step the langchain agent takes, it calls back into streamlit to update the UI, so you can watch the agent’s thoughts and tool calls live. The handler ships with langchain’s callbacks module.

Finally, here is all the code (with sensitive info removed):

from langchain.sql_database import SQLDatabase
from langchain.llms import OpenAI
from langchain.agents import AgentType
from langchain.callbacks import StreamlitCallbackHandler
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
import streamlit as st


st.title("Talk to your data")
api_key = st.text_input("api_key")
db_string = st.text_input("db_string")

if api_key and db_string:
    # connect to the database and wrap it in a toolkit the agent can use
    db = SQLDatabase.from_uri(db_string)
    toolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=0, openai_api_key=api_key))
    agent_executor = create_sql_agent(
        llm=OpenAI(temperature=0, streaming=True, openai_api_key=api_key),
        toolkit=toolkit,
        verbose=True,
        agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    )
else:
    st.write("Please input your openai api_key and db_string")
    st.stop()  # stop here so the chat below never runs without an agent


if prompt := st.chat_input():
    st.chat_message("user").write(prompt)
    with st.chat_message("assistant"):
        # stream the agent's intermediate steps into the streamlit container
        st_callback = StreamlitCallbackHandler(st.container())
        response = agent_executor.run(prompt, callbacks=[st_callback])
        st.write(response)

Say the file is named app.py; then you can run it with

streamlit run app.py

6. Deploy

You can deploy it to streamlit cloud, or to your own server. I deployed it to my own server with docker; here is the Dockerfile:

FROM python:3.11

WORKDIR /app

COPY requirements.txt .

RUN pip install -r requirements.txt

COPY ./app .

# the file name must match the streamlit file you created (app.py above)
CMD ["streamlit", "run", "app.py"]
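
The Dockerfile expects a requirements.txt next to it; mine was roughly like this (exact pins depend on your setup, and pymysql is only needed if you use mysql):

langchain
streamlit
openai
pymysql

Then build and run it with something like:

docker build -t sqlbot .
docker run -p 8501:8501 sqlbot

Streamlit listens on port 8501 by default, which is why that port is published here.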

Then you have a sql chatbot that works with almost any database you have access to. Easy, right?

Some thoughts

While it’s easy to build this app, it still has some limitations.

  1. Openai has a token limit per request (a few thousand tokens for the models used here), so if you have a large database schema to send, the request will fail (a workaround is sketched after this list).
  2. LLMs are not as good at sql as they are at other everyday tasks; in my experience the accuracy is roughly 60-70%.
  3. You have to pay for openai, or you need a decent GPU to run local llm models.
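
For the first limitation, a workaround I find reasonable is to restrict which tables the agent can see, so the schema sent to the model stays small. SQLDatabase supports an include_tables argument for this; a small sketch (table names are made up):

# only expose a few tables to keep the schema prompt small
db = SQLDatabase.from_uri(db_string, include_tables=["orders", "customers"])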

So ideally this tool is best used for exploring your data: you can use it to generate some sql queries and then use those queries elsewhere. However, both llms and their related tools are developing fast, so I’m looking forward to more powerful uses in the future.

Update 2023-10-14

I have put this app together with other streamlit tools I built in this repo: streamlit-tools. You can check it out and give it a try.