---
Title: Build system
URL Source: https://company-skill.com/p/bailian/bailian-build-system
Language: en
Description: You want to build a Retrieval-Augmented Generation (RAG) system that grounds Large Language Models (LLMs) with your custom enterprise data, documents, or real-time web search results. This involves…
---

# Build system

Part of **Bailian (Alibaba Cloud Model Studio)**. Route queries via `POST https://company-skill.com/api/route`.

## What You Want to Do

You want to build a Retrieval-Augmented Generation (RAG) system that grounds Large Language Models (LLMs) with your custom enterprise data, documents, or real-time web search results. This involves ingesting data, generating embeddings, retrieving relevant context, and optionally reranking results before passing them to the LLM.

**Typical User Questions**:
- How to build a RAG application?
- Use custom dataset for RAG

- RAG data usage guide

## Decision Tree

Pick the best path for your situation:

- **If** you need to process large batches of text asynchronously (up to 100,000 lines / 200 MB) or use specific models like `text-embedding-v4` and `qwen3-rerank` → Use **Custom Vector Search & Reranking API** (go to *bailian/bailian-search*)
- **If** you want to upload `PDF, TXT, DOCX` files under `50MB` via a web UI and configure `Chunk Size` visually → Use **Platform RAG Data Management** (go to *bailian/bailian-llm*)
- **If** you need to integrate real-time web or image search using `enable_search` and `search_strategy` parameters in your LLM calls → Use **Custom Vector Search & Reranking API** (go to *bailian/bailian-search*)
- **Otherwise (default)** → **Platform RAG Data Management** (the fastest way to get a basic RAG app running without writing code, ideal for standard document Q&A).

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| Custom Vector Search & Reranking API | Building highly customized RAG pipelines with granular control | High | Yes | Yes | 100 QPS per model for embedding/reranking APIs | `bailian/api/bailian-search` |
| Platform RAG Data Management | Using built-in platform capabilities to manage custom datasets | Medium | No | No | 1 million tokens free per month for standard RAG | `bailian/guide/bailian-llm` |

## Path Details

### Path 1: Custom Vector Search & Reranking API

**Best For**: Building highly customized RAG pipelines with granular control over embeddings, retrieval, and text reranking.

**Brief Description**: 
A collection of DashScope APIs (OpenAI Compatible and Native) for programmatically building RAG pipelines. It includes text and multimodal embeddings, document reranking, knowledge retrieval via `file_search` and `vector_store_ids`, and web or image search augmentation. You can orchestrate the entire retrieval flow using SDKs.

**Key technical facts**:
- **Billing**: Per-token billing for embeddings/NLU (e.g., text-embedding-v4 at CNY 0.0005 / 1K tokens); per-call fees for Web Search (CNY 3-4 / 1K calls) and Image Search (CNY 48 / 1K calls).
- **Max Concurrency**: 100 QPS per model for most embedding/reranking APIs; 15 RPS per account for Web Search; 1 QPS for async task submission (max 3 concurrent tasks).
- **Regions Available**: China (Beijing), International (Singapore).
- **Auth Method**: Bearer Token via Authorization header (Authorization: Bearer $DASHSCOPE_API_KEY).
- **Prerequisites**: `DASHSCOPE_API_KEY` environment variable configured; OpenAI SDK (>=1.0.0) or DashScope SDK (>=1.14.0) installed.

**When to Use**:
- You need granular programmatic control over RAG pipeline components like text/multimodal embeddings, reranking, and intent recognition via SDKs.
- You need to process large batches of text for embeddings asynchronously (up to 100,000 lines / 200 MB via Async Task).
- You want to integrate real-time Web Search or Image Search tools directly into LLM API calls using parameters like `search_strategy` and `assigned_site_list`.

**When NOT to Use**:
- You want a no-code, UI-based approach to upload documents, configure chunking, and manage knowledge base data sources.
- You need to upload unsupported file formats or files larger than 50MB directly into a managed knowledge base without building a custom preprocessing pipeline.

**Known Limitations**:
- Web Search is limited to 15 RPS per account; exceeding this silently skips the search without returning an error.
- Synchronous embeddings are limited to max 10 texts per request and 8,192 tokens per line (for v3/v4 models).
- Reranking is limited to max 500 text documents, 40 images, or 4 videos per request, and max 4,000 tokens per document.
- Async batch task data and output URLs are retained for only 24 hours.

### Path 2: Platform RAG Data Management

**Best For**: Using built-in platform capabilities to manage custom datasets and quickly integrate RAG into LLM applications.

**Brief Description**: 
A no-code console guide for uploading custom datasets, configuring text chunking parameters, and managing knowledge bases for Retrieval-Augmented Generation via the Alibaba Cloud Model Studio UI. You navigate to `Console > RAG > Data Management` to `Upload Data`, assign a `Data Source Name`, and optionally enable `Automatic Optimization` for intelligent text splitting.

**Key technical facts**:
- **Billing**: Pay-as-you-go per token for RAG processing (e.g., standard RAG at 0.002 CNY / 1K input tokens, 0.004 CNY / 1K output tokens); 1 million tokens free per month.
- **Auth Method**: Console SSO (Alibaba Cloud Account).
- **Prerequisites**: Valid Alibaba Cloud account with access to the RAG feature; data files in supported formats (PDF, TXT, DOCX) under 50MB.

**When to Use**:
- You want to quickly build a knowledge base by uploading PDF, TXT, or DOCX files via the web console without writing code.
- You need to visually configure text chunking parameters (Chunk Size, Overlap) and manage RAG data sources through UI forms.

**When NOT to Use**:
- You require programmatic control over the RAG pipeline, custom embedding models, or reranking logic via API.
- You need to process file formats other than PDF, TXT, and DOCX, or files exceeding the 50MB limit.
- You need to dynamically update chunking parameters without re-uploading the data.

**Known Limitations**:
- Supported file formats are strictly limited to PDF, TXT, and DOCX.
- Maximum file size per upload is 50MB.
- Chunk Size and Overlap parameters cannot be modified after initial data ingestion; requires re-uploading or re-processing the data source.

## FAQ

**Q: Which path should I start with?**
A: If you are building a quick internal tool or proof-of-concept and your data consists of standard documents under 50MB, start with Platform RAG Data Management. It requires zero code and gets you a working knowledge base in minutes. If you are building a production-grade application that requires custom reranking logic, multimodal inputs, or real-time web augmentation, start with the Custom Vector Search & Reranking API.

**Q: What if I need to upload a 100MB CSV file but chose Platform RAG Data Management?**
A: You will be blocked. The Console strictly limits uploads to PDF, TXT, and DOCX formats, and the maximum file size per upload is 50MB. To process a 100MB CSV, you must use the Custom Vector Search & Reranking API to build a custom preprocessing and embedding pipeline.

**Q: What if I want to use `qwen3-rerank` and `image_search` but chose Platform RAG Data Management?**
A: You won't have access to these features. The Console UI does not expose granular reranking models or multimodal search tools. You must use the API path to programmatically call `qwen3-rerank` for document scoring and `image_search` for visual retrieval.

**Q: Can I change the `Chunk Size` and `Overlap` after uploading data in the Console?**
A: No. In the Platform RAG Data Management path, Chunk Size and Overlap parameters cannot be modified after initial data ingestion. If you need to change them, you must re-upload or re-process the entire data source. The API path allows you to dynamically control chunking in your own preprocessing code before sending text to the embedding endpoints.

**Q: What happens if I exceed 15 RPS on Web Search using the Custom Vector Search API?**
A: The system will silently skip the search without returning an error. Your LLM will simply generate a response without the augmented web context. If your application requires high-throughput web scraping, you need to implement your own external search caching or rate-limiting layer.

**Q: How do I handle multimodal data like images and videos for RAG?**
A: You must use the Custom Vector Search & Reranking API. The API supports `MultiModalEmbedding` for processing images and videos, and allows reranking up to 40 images or 4 videos per request. The Console path only supports text-based documents.

**Q: Is there a free tier for RAG processing?**
A: Yes, the Platform RAG Data Management path includes 1 million free tokens per month for standard RAG processing. The API path charges per token for embeddings (e.g., CNY 0.0005 / 1K tokens for text-embedding-v4) and per call for search tools, without a specified monthly free tier for those specific API calls.

## Related queries

build rag, rag pipeline, retrieval augmented generation, knowledge base, vector search, reranking pipeline, rag system, how to build rag, how to do rag, where to upload rag data, can I use custom dataset for rag, dashscope embedding, bailian rag, file_search, vector_store_ids, qwen3-rerank, retrival

---
Part of [Bailian (Alibaba Cloud Model Studio)](https://company-skill.com/p/bailian.md) · https://company-skill.com/llms.txt