A colleague recently asked me whether we could build a chatbot to search a collection of documents. He wanted to find past proposals in which we had pitched services related to an upcoming project. It took literally 5-10 minutes to get a working version.

I gave ChatGPT a very basic prompt (the only 'AI knowledge' required here was the phrase "RAG implementation"): "Give me a basic RAG implementation in Python that can search across my pdf's".

After less than 30 seconds it gave me a bunch of code.

A bunch of code
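
The generated code is not reproduced here, but the retrieval half of a basic RAG pipeline is small enough to sketch. The version below is an illustration, not what ChatGPT produced. It assumes the PDF text has already been extracted into plain strings, and it swaps in a pure-Python TF-IDF ranking where a real implementation would more likely use an embedding model and a vector index. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunks):
    """Build one TF-IDF weighted term vector per chunk, plus the IDF table."""
    term_counts = [Counter(tokenize(chunk)) for chunk in chunks]
    doc_freq = Counter(term for counts in term_counts for term in counts)
    n = len(chunks)
    # Smoothed inverse document frequency: rarer terms get higher weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in counts.items()} for counts in term_counts]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, vectors, idf, top_k=5):
    """Return up to top_k (chunk, score) pairs, best match first."""
    q_counts = Counter(tokenize(question))
    # Terms that never occur in the documents cannot match, so drop them.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scored = sorted(((cosine(q_vec, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(chunks[i], score) for score, i in scored[:top_k] if score > 0]


def build_prompt(question, retrieved):
    """Assemble the text that would be sent to the language model."""
    context = "\n\n".join(chunk for chunk, _ in retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The "generation" step is then a single call to whichever LLM you use, passing the output of `build_prompt`. That call is left out here because it depends on the provider and API key.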

I also asked it how to run the code (in Google Colab):

Running the code

And then I followed the instructions (all within Colab):

And now, from a Colab cell, we can run !python rag_pdf.py ask "question goes here" --top_k 5 --llm openai to answer questions using the uploaded source documents.

Answer questions

The answer does match page 89 of the source document (Seven Sketches in Compositionality: An Invitation to Applied Category Theory by Brendan Fong and David I. Spivak):

Source validation
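
The command used above, with its `ask` subcommand and `--top_k` and `--llm` options, implies a small command-line interface. A guess at how such an interface could be wired up with Python's standard `argparse` module is sketched below. Only the subcommand and flag names come from the command shown earlier. The defaults, help strings, and function names are assumptions, and the body of `main` is a placeholder rather than the real script.

```python
import argparse


def build_parser():
    """Create a parser that accepts: ask "question" --top_k N --llm NAME."""
    parser = argparse.ArgumentParser(
        prog="rag_pdf.py",
        description="Ask questions about a set of PDFs (illustrative sketch).",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    ask = subparsers.add_parser("ask", help="answer a question from the documents")
    ask.add_argument("question", help="the question to answer")
    # Default of 5 is an assumption; it mirrors the value used in the post.
    ask.add_argument("--top_k", type=int, default=5,
                     help="number of chunks to retrieve")
    # Default of "openai" is an assumption for the same reason.
    ask.add_argument("--llm", default="openai",
                     help="which LLM backend to send the prompt to")
    return parser


def main(argv=None):
    """Parse arguments and describe what a real script would do next."""
    args = build_parser().parse_args(argv)
    if args.command == "ask":
        # Placeholder: a real script would retrieve chunks and call the LLM here.
        print(f"Would retrieve {args.top_k} chunks for {args.question!r} "
              f"and send them to {args.llm}")
    return args
```

Calling `main(["ask", "question goes here", "--top_k", "5", "--llm", "openai"])` parses the same arguments as the Colab command.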

Limitations

Obviously this is just a prototype. There is no clever chunking of the documents, only one type of search algorithm, no evaluation of accuracy, and out of the box it only works within this ephemeral Colab environment. But it was literally faster to build this chatbot and start asking it questions than it would have been to read through all of the PDFs manually.
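
To give a sense of what slightly less naive chunking could look like, here is a minimal sketch of fixed-size chunks with overlap, so that a sentence cut at a chunk boundary still appears whole in the neighbouring chunk. It is one common option rather than a recommendation. The function name and the default sizes are assumptions, and it counts whitespace-separated words rather than model tokens.

```python
def chunk_with_overlap(text, chunk_size=200, overlap=50):
    """Split text into word chunks where each chunk repeats the last
    `overlap` words of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is a pure subset of the previous one.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Larger overlaps reduce the chance of splitting a relevant passage but increase the number of chunks to index, so the right values depend on the documents.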