Using Chroma DB as a search engine

ChromaDB: an out-of-the-box semantic search engine

I have always wanted a way to talk to my laptop the way I would talk to a human assistant: ask for something in natural language and have it done.
With the advent of LLMs, and transformers in general, we now have access to AI agents that can potentially make us 2-10x more productive.

I am interested in building small, smart tools for my personal laptop that make me some percentage more productive. In that spirit, I am going to work on a tool that doesn't require a graphics card to run but still responds to my text queries in an acceptable time frame.

How I am going to use this tool:
- searching for text in the PDFs on my laptop
- searching for new items in the RSS feeds already synced to my laptop
- searching code snippets
- and any number of other ways


Creating an in-memory collection

import chromadb

# Client() keeps everything in memory, so the data is gone when the process exits
client = chromadb.Client()
collection = client.create_collection("chroma_demo")

By default, Chroma uses the all-MiniLM-L6-v2 sentence transformer to create embeddings.
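
To get a feel for what an embedding-based search does, here is a minimal sketch in plain Python. The three-number vectors and the labels are invented for illustration; they are not output from all-MiniLM-L6-v2, which produces 384 numbers per text. Squared Euclidean distance is used here as one common choice of distance, and the helper names `squared_l2` and `nearest` are my own.

```python
# Toy stand-ins for embeddings. These vectors are made up by hand and are
# NOT produced by all-MiniLM-L6-v2 (real embeddings have 384 dimensions).
toy_embeddings = {
    "school": [0.9, 0.1, 0.0],
    "hospital": [0.1, 0.9, 0.1],
    "garage": [0.0, 0.2, 0.9],
}


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query_vector, embeddings):
    """Return the label whose vector is closest to query_vector."""
    return min(embeddings, key=lambda name: squared_l2(query_vector, embeddings[name]))


# Hypothetical vector for a medical-sounding query.
query = [0.2, 0.8, 0.0]
closest = nearest(query, toy_embeddings)
print(closest)
```

The idea is that texts with similar meaning end up with vectors that are close together, so a search becomes "find the stored vector with the smallest distance to the query vector".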

Add some documents

collection.add(
    documents=[
        "Divij is the class monitor",
        "Vikas is new doctor in village",
        "Gopal can fix your car",
    ],
    metadatas=[{"category": "education"}, {"category": "medical"}, {"category": "vehicle"}],
    ids=["id1", "id2", "id3"],
)

Run a semantic query for matching documents

results = collection.query(
    query_texts=["tell me about doctor"],
    n_results=1
)
print(results)

Output:

{'ids': [['id2']], 'distances': [[1.2264858484268188]], 'metadatas': [[{'category': 'medical'}]], 'embeddings': None, 'documents': [['Vikas is new doctor in village']], 'uris': None, 'data': None, 'included': ['metadatas', 'documents', 'distances']}
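
Each field in the result is a list of lists: the outer list has one entry per query text, and the inner list has one entry per returned document. A smaller distance means a closer match. The sketch below copies the dictionary printed above (leaving out the None fields) and turns it into one record per hit, so it runs without Chroma installed. The helper name `flatten_hits` is my own and is not part of the Chroma API.

```python
# The dictionary printed above, copied from the output (None fields omitted).
results = {
    "ids": [["id2"]],
    "distances": [[1.2264858484268188]],
    "metadatas": [[{"category": "medical"}]],
    "documents": [["Vikas is new doctor in village"]],
}


def flatten_hits(results, query_index=0):
    """Turn the parallel lists for one query into a list of per-hit dicts."""
    return [
        {"id": hit_id, "distance": distance, "metadata": metadata, "document": document}
        for hit_id, distance, metadata, document in zip(
            results["ids"][query_index],
            results["distances"][query_index],
            results["metadatas"][query_index],
            results["documents"][query_index],
        )
    ]


hits = flatten_hits(results)
for hit in hits:
    print(f'{hit["id"]}  {hit["distance"]:.3f}  {hit["document"]}')
```

With more than one query text, passing a different `query_index` would pick out the hits for that query.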
