In this demo, you’ll learn how to use Chroma with OpenAI and LangChain. Thanks to LangChain, the interface for working with different vector databases is remarkably consistent. In this section, you’ll focus on Chroma, but remember that you can readily substitute it with another supported database if you prefer.
Getting Started with Chroma
Chroma is an open-source vector database designed with developer productivity in mind. To install the necessary LangChain integration, return to your terminal and execute:
pip install langchain-chroma
Now, create a database instance with Chroma:
from langchain_chroma import Chroma
db = Chroma(
    embedding_function=embeddings_model,
)
You’ve initialized Chroma by providing an embedding model. Note that you can leave out the api_key attribute when creating an OpenAI embedding model; it’ll automatically be picked up from your environment, looked up as an OPENAI_API_KEY variable by default.
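For example, here’s a quick sketch of relying on that environment variable (the key value below is a placeholder for illustration; never hard-code a real credential):

```python
import os

# Assumption: in practice you export OPENAI_API_KEY in your shell or a
# .env file; setting it here is only to illustrate the lookup.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")

# With the variable present, no api_key argument is needed:
# embeddings_model = OpenAIEmbeddings()
print("OPENAI_API_KEY" in os.environ)  # True
```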
By default, Chroma stores data in memory. However, that means your data will be lost when the app restarts. You’ll configure Chroma to store your data on disk instead.
Also, you need to organize your data effectively. Just as you’d use tables in SQL databases or collections in NoSQL databases, you create a collection in Chroma to group related data. Update your Chroma initialization code to include these enhancements:
db = Chroma(
    collection_name="speech_collection",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)
With these changes, your data will be saved to disk and organized within the “speech_collection.”
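Because the data now persists on disk, you can reconnect to the same store in a later session. A minimal sketch, assuming the langchain-openai package provides your embeddings model as in the snippet above:

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Reconnect to the persisted store: the same collection_name and
# persist_directory reload the vectors saved earlier.
db = Chroma(
    collection_name="speech_collection",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)
```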
Populating Chroma With Data
Next, insert data into your Chroma database. LangChain abstracts away the low-level details, so you’ll work with LangChain document objects to represent your data.
In a new cell, add the following code:
from uuid import uuid4
from langchain_core.documents import Document
document_1 = Document(
    page_content="20 tons of cocoa have been deposited at Warehouse AX749",
    metadata={"source": "messaging_api"},
    id=1,
)
document_2 = Document(
    page_content="The National Geographic Society has discovered a new "
        "species of aquatic animal, off the coast of Miami. They have "
        "been exploring at 8000 miles deep in the Pacific Ocean. They "
        "believe there's a lot more to learn from the oceans.",
    metadata={"source": "news"},
    id=2,
)
document_3 = Document(
    page_content="Martin Luther King's speech, I Have a Dream, remains "
        "one of the world's greatest ever. Here's everything he said "
        "in 5 minutes.",
    metadata={"source": "website"},
    id=3,
)
document_4 = Document(
    page_content="For the first time in 1200 years, the Kalahari "
        "desert receives 200ml of rain.",
    metadata={"source": "tweet"},
    id=4,
)
document_5 = Document(
    page_content="New multi-modal learning content about AI is ready "
        "from Kodeco.",
    metadata={"source": "kodeco_rss_feed"},
    id=5,
)
documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
]
uuids = [str(uuid4()) for _ in range(len(documents))]
db.add_documents(ids=uuids, documents=documents)
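Each uuid4 call produces a unique string id, which identifies a document in the store. A quick stand-alone check of that uniqueness:

```python
from uuid import uuid4

# uuid4 draws 122 random bits per id, so collisions are practically
# impossible for any realistic number of documents.
ids = [str(uuid4()) for _ in range(5)]
print(len(set(ids)))  # 5 distinct ids
```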
So far, so good. Now, here comes some of the beauty of working with vector data stores: the search capability. Traditional SQL or NoSQL databases demand you adhere to specific query syntax, but with vector databases, you interact using natural language — just like talking to a person!
Remember, vector stores retrieve data based on semantic search. That means search results come back with a close alignment to whatever your query means, not just its exact keywords.
Watch it in action. Execute this query in a new cell:
results = db.similarity_search(
    "What's the latest on the warehouse?",
)
for res in results:
    print(f"* {res.page_content}")
You used the similarity_search function to query your database. It returned:
* 20 tons of cocoa have been deposited at Warehouse AX749
* New multi-modal learning content about AI is ready from Kodeco.
* The National Geographic Society has discovered a new species of
aquatic animal, off the coast of Miami. They have been exploring
at 8000 miles deep in the Pacific Ocean. They believe there's
a lot more to learn from the oceans.
* For the first time in 1200 years, the Kalahari desert receives 200ml of rain.
You have stored five documents. When you ran the query, it returned four. However, only the first document directly relates to your query. So why return that many documents? Additionally, you might notice that the best matching results appear first, with relevance decreasing for subsequent documents. To address this, you can limit the results to a number of the top matches and use metadata to improve filtering and enhance the search results.
results = db.similarity_search(
    "What's the latest on the warehouse?",
    k=2,
    filter={"source": "messaging_api"},
)
for res in results:
    print(f"* {res.page_content}")
This time, it returned only one document, which turned out to be the most relevant to the query:
* 20 tons of cocoa have been deposited at Warehouse AX749
Ranking Results With Similarity Scores
Chroma also offers the similarity_search_with_score() function, which not only returns relevant documents but also a similarity score for each. This score quantifies how closely a document’s embedding aligns with your query’s. You can use these scores to filter out less-relevant results or even incorporate them into your application’s logic.
results = db.similarity_search_with_score(
    "Where can I find tutorials on AI?",
    k=1,
    filter={"source": "kodeco_rss_feed"},
)
for res, score in results:
    print(f'''
    similarity_score: {score:.3f}
    content: {res.page_content}
    source: {res.metadata['source']}
    ''')
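Note that the scale of the score depends on the collection’s distance metric; in Chroma’s default setup it’s a distance, where lower means closer, so treat it as relative rather than absolute. To build intuition for how embedding alignment is measured, here’s a self-contained cosine-similarity sketch on toy 3-dimensional vectors (real embeddings have hundreds of dimensions; all values below are made up):

```python
import math

def cosine_similarity(a, b):
    # Dot product of the vectors divided by the product of their lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.0]
doc_close = [0.8, 0.2, 0.1]   # points roughly the same way as query
doc_far = [0.0, 0.1, 0.9]     # points in a different direction

print(round(cosine_similarity(query, doc_close), 3))
print(round(cosine_similarity(query, doc_far), 3))
```

The closer two vectors point in the same direction, the nearer the value is to 1, which mirrors how a document whose meaning matches your query ranks higher in the results.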