Deploy an embedding model in OpenSearch for vector similarity retrieval, then deploy a large language model via PAI for online inference, combining both into a retrieval-augmented generation (RAG) pipeline.
See pai/pai-deploy-inference.
See opensearch/opensearch-deploy-model.
Q: How do I build and deploy a retrieval-augmented generation pipeline using embedding search and LLM inference? A: Deploy an embedding model in OpenSearch for vector similarity retrieval and a large language model on PAI for online inference, then connect the two. At query time, the question is embedded, the most similar passages are retrieved from OpenSearch, and those passages are passed to the LLM on PAI as context for generation.
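The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration only, not the OpenSearch or PAI API: toy_embed is a hypothetical bag-of-words stand-in for the deployed embedding model, retrieve is an in-memory stand-in for the OpenSearch vector query, and generate is a hypothetical callable that would wrap a request to the PAI inference endpoint.

```python
import math
from collections import Counter
from typing import Callable, List


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for the embedding model: word-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """In-memory stand-in for the vector search: top-k most similar docs."""
    q = toy_embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, toy_embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Place the retrieved passages ahead of the question as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def rag_answer(
    query: str,
    docs: List[str],
    generate: Callable[[str], str],  # assumed wrapper around the LLM endpoint
    k: int = 2,
) -> str:
    """Retrieve the top-k passages, then hand the prompt to the LLM."""
    passages = retrieve(query, docs, k)
    return generate(build_prompt(query, passages))
```

In a real deployment, retrieve would be replaced by a query against the OpenSearch vector index and generate by a call to the PAI service. Passing generate in as a parameter keeps the orchestration logic independent of either product's client library.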