ChromaDB: an out-of-the-box semantic search engine
I have always wanted a way to talk to my laptop like I would to a human assistant: describe a task in natural language and have it done.
With the advent of LLMs, and transformers in general, we now have access to AI agents that can make us 2-10X more productive.
I am interested in building small, smart tools for my personal laptop that make me x% more productive. In that spirit, I am going to work on a tool that doesn't require a graphics card to run but still responds to my text queries in an acceptable time frame.
How I am going to use this tool:
- searching for text in the PDFs on my laptop
- searching for new items in the RSS feeds already synced to my laptop
- searching code snippets
- and n other ways
Creating an in-memory collection
import chromadb

# Client() keeps everything in memory, so the data is gone when the process exits
client = chromadb.Client()
collection = client.create_collection("chroma_demo")
By default, Chroma uses the following transformer model to create embeddings: all-MiniLM-L6-v2
Add some documents
collection.add(
    documents=["Divij is the class monitor", "Vikas is new doctor in village", "Gopal can fix your car"],
    metadatas=[{"category": "education"}, {"category": "medical"}, {"category": "vehicle"}],
    ids=["id1", "id2", "id3"],
)
Run a semantic query for matching documents
results = collection.query(
    query_texts=["tell me about doctor"],
    n_results=1,
)
print(results)
Output:
{'ids': [['id2']], 'distances': [[1.2264858484268188]], 'metadatas': [[{'category': 'medical'}]], 'embeddings': None, 'documents': [['Vikas is new doctor in village']], 'uris': None, 'data': None, 'included': ['metadatas', 'documents', 'distances']}
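The result is nested one level deeper than you might expect: each field holds one inner list per query text, and each inner list holds one entry per hit. A smaller distance means a closer match. Here is a rough sketch of how the output could be flattened into one record per hit; flatten_results is a helper name made up for this post, not part of Chroma, and the dictionary is the output above copied as-is so the snippet runs on its own:

```python
# The result dictionary printed above, copied verbatim
results = {
    "ids": [["id2"]],
    "distances": [[1.2264858484268188]],
    "metadatas": [[{"category": "medical"}]],
    "embeddings": None,
    "documents": [["Vikas is new doctor in village"]],
    "uris": None,
    "data": None,
    "included": ["metadatas", "documents", "distances"],
}


def flatten_results(results, query_index=0):
    """Return one dict per hit for the query at position query_index.

    Hypothetical helper for illustration only; it is not a Chroma API.
    """
    return [
        {"id": doc_id, "distance": distance, "document": document, "metadata": metadata}
        for doc_id, distance, document, metadata in zip(
            results["ids"][query_index],
            results["distances"][query_index],
            results["documents"][query_index],
            results["metadatas"][query_index],
        )
    ]


hits = flatten_results(results)
for hit in hits:
    print(f'{hit["id"]}  {hit["distance"]:.4f}  {hit["document"]}  {hit["metadata"]}')
```

With several query texts, passing query_index=1, 2, and so on picks out the hits for each of them.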