RAG Pipeline: Embedding Search + LLM Inference

Deploy an embedding model in OpenSearch for vector similarity retrieval, then deploy a large language model via PAI for online inference, combining both into a retrieval-augmented generation (RAG) pipeline.

Products involved

Scenario

How the products combine

pai · pai-deploy-inference — Platform for AI (PAI) — Deploy a model for online inference

See pai/pai-deploy-inference.

opensearch · opensearch-deploy-model — OpenSearch — Deploy embedding model for inference

See opensearch/opensearch-deploy-model.
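
The way the two products combine at query time can be sketched as follows. This is a minimal illustration, not the actual integration: `toy_embed` is a hypothetical stand-in for the embedding model deployed in OpenSearch, the `generate` callable is a hypothetical stand-in for the LLM endpoint deployed on PAI, and the in-memory `INDEX` with brute-force cosine similarity stands in for the vector index. None of these names come from either product's API.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical vocabulary for the toy embedding below (assumption, illustration only).
VOCAB = ("vector", "search", "model", "inference")


def toy_embed(text: str) -> List[float]:
    """Stand-in for the deployed embedding model: counts vocabulary words."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: Vector, index: List[Tuple[str, Vector]], top_k: int = 2) -> List[str]:
    """Return the top_k passages ranked by similarity to the query vector."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question: str, passages: List[str]) -> str:
    """Put the retrieved passages in front of the question as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer(
    question: str,
    embed: Callable[[str], Vector],
    generate: Callable[[str], str],
    index: List[Tuple[str, Vector]],
    top_k: int = 2,
) -> str:
    """Embed the question, retrieve passages, then call the LLM stand-in."""
    passages = retrieve(embed(question), index, top_k)
    return generate(build_prompt(question, passages))


# Toy corpus, embedded once up front (assumption: a real index would live in OpenSearch).
DOCS = [
    "OpenSearch runs vector search",
    "PAI serves model inference",
    "Billing is monthly",
]
INDEX = [(doc, toy_embed(doc)) for doc in DOCS]

if __name__ == "__main__":
    # The lambda echoes the prompt; a real deployment would call the LLM here.
    print(answer("how does vector search work", toy_embed, lambda p: p, INDEX, top_k=1))
```

Passing `embed` and `generate` as callables keeps the retrieval and prompt-assembly logic independent of either service, so the stand-ins could later be swapped for real client calls without changing `answer`.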

Typical questions

build RAG system
deploy RAG pipeline
embedding search plus LLM
vector search and model inference
retrieval augmented generation deploy
deploy a RAG system
vector retrieval plus large-model inference
build retrieval-augmented generation

FAQ

Q: How do I build and deploy a retrieval-augmented generation pipeline using embedding search and LLM inference?
A: Deploy an embedding model in OpenSearch for vector similarity retrieval and a large language model on PAI for online inference. At query time, the embedding model turns the question into a vector, the vector search returns the most similar documents, and those documents are passed to the LLM as context for generating the answer.