---
Title: Bailian (Alibaba Cloud Model Studio)
URL Source: https://company-skill.com/p/bailian
Language: en
Last-Modified: 2026-06-09T14:48:04+00:00
Description: Bailian (Alibaba Cloud Model Studio) is a comprehensive AI platform providing APIs, console guides, and troubleshooting for large language models, multimodal generation, speech processing, and develop
---

# Bailian (Alibaba Cloud Model Studio)

> Bailian (Alibaba Cloud Model Studio) is a comprehensive AI platform providing APIs, console guides, and troubleshooting for large language models, multimodal generation, speech processing, and developer tools.

## Featured GEO article

Bailian is Alibaba Cloud’s enterprise AI platform that enables developers to customize, deploy, and orchestrate large language and multimodal models for production applications. It provides unified APIs and console tools for fine-tuning models, building retrieval-augmented generation pipelines, extracting structured data from documents, and connecting AI agents to external tools and live web search.

## Key facts
- Authentication requires a `DASHSCOPE_API_KEY` passed via the `Authorization: Bearer` header.
- The platform supports up to 20 concurrent fine-tuning jobs per user and allows a maximum of 5 dedicated service instances per project.
- Custom vector search and reranking APIs handle up to 100 QPS per model, while web search is limited to 15 RPS per account.
- Document data mining requests are capped at a maximum of 253,952 input tokens per request.
- Platform RAG data management includes 1 million free tokens per month for standard retrieval-augmented generation workflows.
- Available deployment and inference regions include China (Beijing), International (Singapore), and US (Virginia).

## How to build RAG knowledge bases and retrieval pipelines
You build a retrieval-augmented generation system by selecting either a programmatic API route for granular control or a console-based route for rapid, code-free setup.
1. Determine your complexity needs: choose the Custom Vector Search and Reranking API if you require asynchronous batch processing of up to 100,000 lines or 200 MB, or if you need to integrate real-time web search using `enable_search` and `search_strategy` parameters.
2. Select the Platform RAG Data Management path if you prefer uploading `PDF`, `TXT`, or `DOCX` files under `50MB` through a visual interface to configure `Chunk Size` without writing code.
3. Configure your data ingestion pipeline by generating embeddings with models like `text-embedding-v4` and applying `qwen3-rerank` for context optimization.
4. Connect the retrieval output to your target large language model using `file_search` and `vector_store_ids` to ground responses in your enterprise data.

## How to deploy custom or fine-tuned AI models as endpoints
You deploy models as scalable HTTP endpoints by choosing between infrastructure-as-code automation or a visual console interface for resource configuration.
1. Assess your deployment workflow: use Programmatic Model Deployment if you need to automate capacity scaling via HTTP PUT requests and import models directly from OSS buckets for CI/CD pipelines.
2. Choose Console Model Deployment if you require visual validation of GPU instance types, such as `gpu.gn7i-c4g1.4xlarge`, and need to configure VPC and security group `Network Settings`.
3. Verify your workspace permissions and ensure your `DASHSCOPE_API_KEY` is properly configured for authentication.
4. Initiate the deployment process, noting that billing begins immediately upon successful provisioning based on your selected plan, such as PTU, MU, CU, or LoRA.

## How to extract and understand information from documents and images
You extract text and structured data from visual media by routing your requests to either advanced multimodal vision models or specialized optical character recognition services.
1. Identify your extraction goal: route to Multimodal Vision and Document Mining if you need complex visual reasoning, GUI automation via `gui-plus`, or deep layout analysis using `qwen-doc-turbo`.
2. Select Specialized OCR and Image Translation if your primary requirement is translating embedded text while preserving layout with `qwen-mt-image` or performing real-time speech translation.
3. Configure request parameters like `vl_high_resolution_images`, `file_parsing_strategy`, and `ocr_options` to optimize handling of complex document structures.
4. Monitor concurrency limits, which cap at 100 QPS per model with a maximum of 10 concurrent requests, and ensure your input stays within the 253,952 token limit per request.

## How to fine-tune a large language or multimodal model
You customize base models with proprietary datasets by preparing your training data and selecting a deployment-ready fine-tuning path.
1. Prepare your custom dataset and determine whether you require full-parameter training or efficient LoRA adaptation.
2. Submit your training job through the platform, keeping in mind the limit of 20 concurrent fine-tuning jobs per user.
3. Monitor training progress and validate model performance before transitioning to the deployment phase.
4. Once validated, route the fine-tuned model to your chosen endpoint configuration for production inference.

## How to integrate external tools, MCP servers, and web search into AI agents
You connect large language models to external systems by configuring model calling parameters and leveraging platform-managed search capabilities.
1. Enable real-time data augmentation by adding `enable_search` and defining a `search_strategy` in your model API requests.
2. Configure Web Search MCP settings through the console to allow agents to query live internet data and image repositories.
3. Integrate external tool calls and MCP servers directly into your agent orchestration layer to expand functional capabilities beyond native model knowledge.
4. Test the integration using the platform’s API testing guides to verify that tool outputs are correctly formatted and passed back to the language model.

## Frequently Asked Questions

**Q: how do I build rag knowledge bases and retrieval pipelines**
A: You build them by choosing between the Custom Vector Search and Reranking API for programmatic control over embeddings and reranking, or the Platform RAG Data Management console for uploading files under 50 MB and configuring chunk sizes visually.

**Q: what's the best way to build rag**
A: The best approach depends on your technical requirements: use the API path for high-volume asynchronous processing and granular parameter control, or use the platform console for rapid, code-free setup with 1 million free monthly tokens.

**Q: how do I deploy custom or fine-tuned ai models as endpoints**
A: You deploy them by selecting either Programmatic Model Deployment for CI/CD automation and OSS bucket imports, or Console Model Deployment for visual GPU instance selection and VPC network configuration.

**Q: what's the best way to deploy**
A: Use the console for straightforward, one-click deployments with visual resource validation, or use the programmatic API if you require infrastructure-as-code automation and scriptable capacity scaling.

**Q: how do I extract and understand information from documents and images**
A: Route your requests to the Multimodal Vision and Document Mining API for complex layout analysis and structured data extraction, or use the Specialized OCR and Image Translation API for high-precision text extraction and layout-preserving translation.

**Q: what's the best way to extract data from pdf**
A: Use the Multimodal Vision path with `qwen-doc-turbo` and configure `file_parsing_strategy` and `ocr_options` to accurately mine tables and fields from complex PDF layouts up to 253,952 input tokens.

**Q: how do I fine-tune a large language or multimodal**
A: Prepare your proprietary dataset, select between full-parameter or LoRA training methods, and submit your job through the platform while staying within the 20 concurrent job limit per user.

**Q: what's the best way to fine-tune**
A: The optimal method aligns with your infrastructure needs: use the console for guided dataset management and visual validation, or use the API for automated, programmatic training workflows integrated into existing pipelines.

**Q: how do I integrate external tools, mcp servers, and web search into ai agents**
A: Add `enable_search` and `search_strategy` parameters to your LLM calls, configure Web Search MCP through the console, and connect external MCP servers directly to your agent orchestration layer.

**Q: what's the best way to integrate tools**
A: Leverage the platform’s built-in Web Search MCP and API parameters for seamless, low-code integration, or build custom tool connectors using the DashScope SDK for full programmatic control over external data flows.

## Key terms
Retrieval-Augmented Generation is a system architecture that grounds large language models with custom enterprise data by ingesting documents, generating embeddings, and retrieving relevant context before generating responses.
LoRA is a parameter-efficient training method that allows developers to adapt large models with custom datasets without retraining the entire network.
MCP Server is an external integration standard that enables AI agents to

Bailian (Alibaba Cloud Model Studio) is available as agent-callable skills via DaaS. Route any question to the best skill with `POST https://company-skill.com/api/route` `{"query": "...", "product": "bailian"}`.

## What you can do

### [Build system](https://company-skill.com/p/bailian/bailian-build-system.md)

## What You Want to Do

You want to build a Retrieval-Augmented Generation (RAG) system that grounds Large Language Models (LLMs) with your custom enterprise data, documents, or real-time web search results. This involves ingesting data, generating embeddings, retrieving relevant context, and optionally reranking results before passing them to the LLM.

**Typical User Questions**:
- How to build a RAG application?
- Use custom dataset for RAG

- RAG data usage guide

## Decision Tree

Pick the best path for your situation:

- **If** you need to process large batches of text asynchronously (up to 100,000 lines / 200 MB) or use specific models like `text-embedding-v4` and `qwen3-rerank` → Use **Custom Vector Search & Reranking API** (go to *bailian/bailian-search*)
- **If** you want to upload `PDF, TXT, DOCX` files under `50MB` via a web UI and configure `Chunk Size` visually → Use **Platform RAG Data Management** (go to *bailian/bailian-llm*)
- **If** you need to integrate real-time web or image search using `enable_search` and `search_strategy` parameters in your LLM calls → Use **Custom Vector Search & Reranking API** (go to *bailian/bailian-search*)
- **Otherwise (default)** → **Platform RAG Data Management** (the fastest way to get a basic RAG app running without writing code, ideal for standard document Q&A).

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| Custom Vector Search & Reranking API | Building highly customized RAG pipelines with granular control | High | Yes | Yes | 100 QPS per model for embedding/reranking APIs | `bailian/api/bailian-search` |
| Platform RAG Data Management | Using built-in platform capabilities to manage custom datasets | Medium | No | No | 1 million tokens free per month for standard RAG | `bailian/guide/bailian-llm` |

## Path Details

### Path 1: Custom Vector Search & Reranking API

**Best For**: Building highly customized RAG pipelines with granular control over embeddings, retrieval, and text reranking.

**Brief Description**: 
A collection of DashScope APIs (OpenAI Compatible and Native) for programmatically building RAG pipelines. It includes text and multimodal embeddings, document reranking, knowledge retrieval via `file_search` and `vector_store_ids`, and web or image search augmentation. You can orchestrate the entire retrieval flow using SDKs.

**Key technical facts**:
- **Billing**: Per-token billing for embeddings/NLU (e.g., text-embedding-v4 at CNY 0.0005 / 1K tokens); per-call fees for Web Search (CNY 3-4 / 1K calls) and Image Search (CNY 48 / 1K calls).
- **Max Concurrency**: 100 QPS per model for most embedding/reranking APIs; 15 RPS per account for Web Search; 1 QPS for async task submission (max 3 concurrent tasks).
- **Regions Available**: China (Beijing), International (Singapore).
- **Auth Method**: Bearer Token via Authorization header (Authorization: Bearer $DASHSCOPE_API_KEY).
- **Prerequisites**: `DASHSCOPE_API_KEY` environment variable configured; OpenAI SDK (>=1.0.0) or DashScope SDK (>=1.14.0) installed.

**When to Use**:
- You need granular programmatic control over RAG pipeline components like text/multimodal embeddings, reranking, and intent recognition via SDKs.
- You need to process large batches of text for embeddings asynchronously (up to 100,000 lines / 200 MB via Async Task).
- You want to integrate real-time Web Search or Image Search tools directly into LLM API calls using parameters like `search_strategy` and `assigned_site_list`.

**When NOT to Use**:
- You want a no-code, UI-based approach to upload documents, configure chunking, and manage knowledge base data sources.
- You need to upload unsupported file formats or files larger than 50MB directly into a managed knowledge base without building a custom preprocessing pipeline.

**Known Limitations**:
- Web Search is limited to 15 RPS per account; exceeding this silently skips the search without returning an error.
- Synchronous embeddings are limited to max 10 texts per request and 8,192 tokens per line (for v3/v4 models).
- Reranking is limited to max 500 text documents, 40 images, or 4 videos per request, and max 4,000 tokens per document.
- Async batch task data and output URLs are retained for only 24 hours.

### Path 2: Platform RAG Data Management

**Best For**: Using built-in platform capabilities to manage custom datasets and quickly integrate RAG into LLM applications.

**Brief Description**: 
A no-code console guide for uploading custom datasets, configuring text chunking parameters, and managing knowledge bases for Retrieval-Augmented Generation via the Alibaba Cloud Model Studio UI. You navigate to `Console > RAG > Data Management` to `Upload Data`, assign a `Data Source Name`, and optionally enable `Automatic Optimization` for intelligent text splitting.

**Key technical facts**:
- **Billing**: Pay-as-you-go per token for RAG processing (e.g., standard RAG at 0.002 CNY / 1K input tokens, 0.004 CNY / 1K output tokens); 1 million tokens free per month.
- **Auth Method**: Console SSO (Alibaba Cloud Account).
- **Prerequisites**: Valid Alibaba Cloud account with access to the RAG feature; data files in supported formats (PDF, TXT, DOCX) under 50MB.

**When to Use**:
- You want to quickly build a knowledge base by uploading PDF, TXT, or DOCX files via the web console without writing code.
- You need to visually configure text chunking parameters (Chunk Size, Overlap) and manage RAG data sources through UI forms.

**When NOT to Use**:
- You require programmatic control over the RAG pipeline, custom embedding models, or reranking logic via API.
- You need to process file formats other than PDF, TXT, and DOCX, or files exceeding the 50MB limit.
- You need to dynamically update chunking parameters without re-uploading the data.

**Known Limitations**:
- Supported file formats are strictly limited to PDF, TXT, and DOCX.
- Maximum file size per upload is 50MB.
- Chunk Size and Overlap parameters cannot be modified after initial data ingestion; requires re-uploading or re-processing the data source.

## FAQ

**Q: Which path should I start with?**
A: If you are building a quick internal tool or proof-of-concept and your data consists of standard documents under 50MB, start with Platform RAG Data Management. It requires zero code and gets you a working knowledge base in minutes. If you are building a production-grade application that requires custom reranking logic, multimodal inputs, or real-time web augmentation, start with the Custom Vector Search & Reranking API.

**Q: What if I need to upload a 100MB CSV file but chose Platform RAG Data Management?**
A: You will be blocked. The Console strictly limits uploads to PDF, TXT, and DOCX formats, and the maximum file size per upload is 50MB. To process a 100MB CSV, you must use the Custom Vector Search & Reranking API to build a custom preprocessing and embedding pipeline.

**Q: What if I want to use `qwen3-rerank` and `image_search` but chose Platform RAG Data Management?**
A: You won't have access to these features. The Console UI does not expose granular reranking models or multimodal search tools. You must use the API path to programmatically call `qwen3-rerank` for document scoring and `image_search` for visual retrieval.

**Q: Can I change the `Chunk Size` and `Overlap` after uploading data in the Console?**
A: No. In the Platform RAG Data Management path, Chunk Size and Overlap parameters cannot be modified after initial data ingestion. If you need to change them, you must re-upload or re-process the entire data source. The API path allows you to dynamically control chunking in your own preprocessing code before sending text to the embedding endpoints.

**Q: What happens if I exceed 15 RPS on Web Search using the Custom Vector Search API?**
A: The system will silently skip the search without returning an error. Your LLM will simply generate a response without the augmented web context. If your application requires high-throughput web scraping, you need to implement your own external search caching or rate-limiting layer.

**Q: How do I handle multimodal data like images and videos for RAG?**
A: You must use the Custom Vector Search & Reranking API. The API supports `MultiModalEmbedding` for processing images and videos, and allows reranking up to 40 images or 4 videos per request. The Console path only supports text-based documents.

**Q: Is there a free tier for RAG processing?**
A: Yes, the Platform RAG Data Management path includes 1 million free tokens per month for standard RAG processing. The API path charges per token for embeddings (e.g., CNY 0.0005 / 1K tokens for text-embedding-v4) and per call for search tools, without a specified monthly free tier for those specific API calls.

### [Deploy model](https://company-skill.com/p/bailian/bailian-deploy-model.md)

## What You Want to Do

You want to take a custom, fine-tuned, or LoRA model and deploy it as a dedicated, scalable API endpoint on Alibaba Cloud's Bailian platform so that applications can consume it via HTTP requests.

**Typical User Questions**:
- How to deploy my fine-tuned model?

- Deploy custom LoRA model from OSS
- Can I deploy models via API?

## Decision Tree

Pick the best path for your situation:

- **If** you are managing infrastructure as code, integrating with CI/CD pipelines, or automating capacity scaling via HTTP PUT requests → Use **Programmatic Model Deployment** (go to *bailian/bailian-model*)
- **If** you need to visually select specific GPU instance types (e.g., `gpu.gn7i-c4g1.4xlarge`) and configure VPC/security group `Network Settings` → Use **Console Model Deployment** (go to *bailian/bailian-model*)
- **If** you require subscription-based billing for training units (which the API does not support) → Use **Console Model Deployment** (go to *bailian/bailian-model*)
- **Otherwise (default)** → Use **Console Model Deployment**, as it provides visual validation of resources, network configurations, and region scoping before committing to dedicated GPU costs.

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| Programmatic Model Deployment | Infrastructure-as-code, automated scaling, and managing dedicated endpoints programmatically. | Medium | Yes | Yes | Max 20 concurrent fine-tune jobs per user | `bailian/api/bailian-model` |
| Console Model Deployment | Quick one-click deployment, visual resource selection, environment preparation, and region scoping. | Low | No | No | Max 5 dedicated service instances per project | `bailian/guide/bailian-model` |

## Path Details

### Path 1: Programmatic Model Deployment

**Best For**: Infrastructure-as-code, automated scaling, and managing dedicated endpoints programmatically.

**Brief Description**: A stateless HTTP API for deploying, scaling, and managing dedicated model services. It allows you to automate the import of custom models directly from OSS buckets using REST calls and Bearer token authentication, making it ideal for CI/CD pipelines.

**Key Facts** — pulled from fact_card:
- **Billing**: Billed per request/token based on the selected plan (PTU, MU, CU, LoRA). Billing starts immediately upon successful deployment.
- **Regions**: China, International
- **Auth**: Authorization: Bearer $DASHSCOPE_API_KEY
- **Prerequisites**: DASHSCOPE_API_KEY environment variable, Workspace with model deployment permissions

**When to Use**:
- Need scriptable deployment for CI/CD pipelines and Infrastructure-as-Code.
- Automating the import of custom LoRA or full-parameter models directly from OSS buckets.
- Managing deployment capacity and scaling programmatically via HTTP PUT requests.

**When NOT to Use**:
- User requires visual selection of specific GPU instance types (e.g., `gpu.gn7i-c4g1.4xlarge`) and VPC/security group network configurations.
- Need to perform manual model evaluation and annotation through a visual interface.

**Known Limitations**:
- Maximum of 20 concurrent or succeeded fine-tune jobs per user.
- Deployment name suffix must be unique and maximum 8 characters long, otherwise a Conflict error occurs.
- Training jobs created via API support only token-based billing; subscription-based training units require the console.

### Path 2: Console Model Deployment

**Best For**: Quick one-click deployment, visual resource selection, environment preparation, and region scoping.

**Brief Description**: A visual web console interface for configuring and deploying dedicated model services. It provides UI workflows to `Create Dedicated Service`, select your `Instance Type`, configure `Network Settings`, and choose your `Billing Method`. It also includes tools to `Import Model` weights and take services `Offline`.

**Key Facts** — pulled from fact_card:
- **Billing**: Per Token (Pay-as-you-go), Per Instance Hour (Dedicated GPU), or Per Model Unit (MU) / Provisioned Throughput (PTU) subscription.
- **Instance Types**: `gpu.gn7i-c4g1.4xlarge`, `gpu.gn7i-c8g1.8xlarge`, `gpu.gn7i-c16g1.16xlarge`
- **Regions**: China (Beijing), US (Virginia), Singapore, Germany (Frankfurt), China (Hong Kong)
- **Auth**: Console SSO / Alibaba Cloud account login
- **Prerequisites**: Active Alibaba Cloud account with valid payment method, OSS bucket with LoRA files (for importing custom models), Published dataset (for fine-tuning)

**When to Use**:
- Need to visually select specific GPU instance types and configure VPC/security group network settings.
- Performing manual model evaluation, annotation, and comparative evaluation using visual dimension templates.
- Setting up region-specific deployments and service scopes for data residency compliance.

**When NOT to Use**:
- Need to automate deployment and scaling via CI/CD pipelines or Infrastructure-as-Code.
- Managing more than 5 dedicated service instances per project.

**Known Limitations**:
- Maximum of 5 dedicated service instances per project.
- CPT and Image-to-Video training sets do not support draft status and must be published immediately upon creation.
- Importing custom LoRA models requires specific files (`adapter_model.safetensors` and `adapter_config.json`) in an OSS bucket that is not using Archive storage.

## FAQ

**Q: Which path should I start with?**
A: Start with Console Model Deployment if this is your first time setting up a dedicated endpoint. It allows you to visually verify your `Instance Type` selection and `Network Settings` before incurring dedicated GPU costs, and ensures your region scope meets data residency requirements.

**Q: What if I need to manage more than 5 dedicated service instances per project but chose Console Model Deployment?**
A: If you need to manage more than 5 dedicated service instances per project but chose Console Model Deployment, you'll hit a hard limit. The console restricts you to a maximum of 5 dedicated service instances per project. You must use Programmatic Model Deployment to exceed this project-level cap.

**Q: What if I need subscription-based training units but used Programmatic Model Deployment?**
A: If you need subscription-based training units but used Programmatic Model Deployment, you'll hit a billing limitation. Training jobs created via the API only support token-based billing. You must use the Console to purchase and apply subscription-based training units.

**Q: How do I import a custom LoRA model from OSS?**
A: Both paths support this. In the console, use the `Import Model` feature and ensure your OSS bucket contains `adapter_model.safetensors` and `adapter_config.json` (and is not using Archive storage). Via the API, you can automate the import directly from your OSS buckets using REST calls.

**Q: How do I take a deployed model offline?**
A: In the console, you can change the service status to `Offline` through the Model Deployment Console UI. Programmatically, you manage the deployment lifecycle and scale down or delete the endpoint via HTTP DELETE/PUT requests.

**Q: Can I perform A/B testing or manual evaluation on my deployed model?**
A: Manual evaluation and annotation are strictly features of the Console Model Deployment path, which provides visual dimension templates for comparative evaluation. The API path does not include visual evaluation tools.

### [Extract documents](https://company-skill.com/p/bailian/bailian-extract-documents.md)

## What You Want to Do

You want to extract text, structured data, or visual information from documents (like PDFs) and images, or translate text embedded within visual media while maintaining the original context and layout.

**Typical User Questions**:
- How to extract data from PDF?
- OCR text extraction from images
- GUI automation using vision models
- Image translation preserving layout

## Decision Tree

Pick the best path for your situation:

- **If** you need to perform complex visual reasoning using `qvq-max` with `enable_thinking`, execute GUI automation using `gui-plus`, or mine structured data from complex PDF layouts using `qwen-doc-turbo` → Use **Multimodal Vision & Document Mining** (go to *bailian/bailian-multimodal*)
- **If** you need to translate text embedded in images while preserving the original layout using `qwen-mt-image`, or perform real-time speech-to-speech translation using `qwen3.5-livetranslate-flash-realtime` → Use **Specialized OCR & Image Translation** (go to *bailian/bailian-translation*)
- **Otherwise (default)** → Use **Multimodal Vision & Document Mining**. It is the most versatile path for general document understanding, raw OCR, and visual question-answering tasks.

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| Multimodal Vision & Document Mining | Complex visual reasoning, GUI automation, and extracting structured data from complex PDF layouts using large vision models. | Medium | Yes | Yes | Document Data Mining is limited to max 253,952 input tokens per request. | `bailian/api/bailian-multimodal` |
| Specialized OCR & Image Translation | High-precision raw text extraction, layout preservation, and translating text embedded within images using dedicated OCR models. | Low | Yes | Yes | Image translation query API has a default rate limit of 1 RPS for polling task status. | `bailian/api/bailian-translation` |

## Path Details

### Path 1: Multimodal Vision & Document Mining

**Best For**: Complex visual reasoning, GUI automation, and extracting structured data from complex PDF layouts using large vision models.

**Brief Description**: 
This path leverages Bailian Multimodal Understanding and Interaction APIs to process images, videos, and documents. It utilizes advanced vision models like `qvq-max` for step-by-step visual reasoning, `qwen-doc-turbo` for deep document data mining, and `gui-plus` for UI interaction. You can fine-tune extraction behavior using parameters like `vl_high_resolution_images`, `file_parsing_strategy`, and `ocr_options` to handle complex layouts and high-resolution inputs.

**Key technical facts**:
- Billing: Per-token billing model. Input tokens (including text, image, video, and audio tokens) and output tokens are priced separately.
- Concurrency: 100 QPS per model, with a maximum of 10 concurrent requests per model.
- Auth: Bearer Token (Header: Authorization: Bearer $DASHSCOPE_API_KEY)
- Regions: China (Beijing), Singapore (International), US (Virginia)

**When to Use**:
- User needs to extract structured data, tables, and specific fields from complex PDF and document files using `qwen-doc-turbo`.
- User requires GUI automation and interaction based on UI screenshots using `gui-plus` models.
- User needs complex visual reasoning and step-by-step image analysis using `qvq-max` or thinking models.

**When NOT to Use**:
- User needs to translate text embedded within images while preserving the original layout (use `qwen-mt-image` in Specialized OCR & Image Translation).
- User needs real-time speech-to-speech translation or live audio/video stream translation with voice cloning (use `qwen3.5-livetranslate-flash-realtime` in Specialized OCR & Image Translation).

**Known Limitations**:
- Video maximum duration is 2 hours for qwen3.6 series, 1 hour for qwen3-vl series, and 10 minutes for other models.
- OCR is limited to max 8K tokens per request and max image size 10MB.
- Document Data Mining is limited to max 253,952 input tokens per request, max 32,768 output tokens, and max 9,000 tokens per message.
- Audio understanding is limited to max 40 minutes of audio per request.

### Path 2: Specialized OCR & Image Translation

**Best For**: High-precision raw text extraction, layout preservation, and translating text embedded within images using dedicated OCR models.

**Brief Description**: 
This path utilizes Bailian Translation and Localization APIs specifically designed for machine translation and image localization. It features `qwen-mt-image` for layout-preserving image translation, `qwen-mt-plus` for domain-specific text translation, and `gummy-realtime-v1` for live audio streams. You can control translation behavior using `translation_options`, `domainHint`, and `imageSegment`, and handle long-running image tasks by passing the `X-DashScope-Async: enable` header to retrieve a `task_id` for polling.

**Key technical facts**:
- Billing: Text/OCR billed per 1,000 tokens; Audio/Video billed per 1,000 tokens or per second; Image translation billed per successfully generated image; Real-time Speech (Gummy) billed per second of active connection.
- Concurrency: 100 QPS per model; Max 10 concurrent WebSocket connections for real-time translation; Default 1 QPS for polling image translation task status API.
- Auth: Bearer Token (Header: Authorization: Bearer $DASHSCOPE_API_KEY)
- Regions: China (Beijing), Singapore (International), US (Virginia)

**When to Use**:
- User needs to translate text embedded within images while preserving the original layout using `qwen-mt-image`.
- User requires real-time speech-to-speech translation or live audio/video stream translation with voice cloning using `qwen3.5-livetranslate-flash-realtime`.
- User needs machine translation with custom terminology, translation memory, and domain prompting using `qwen-mt-plus`.

**When NOT to Use**:
- User needs to extract structured data, tables, and specific fields from complex PDF and document files (use `qwen-doc-turbo` in Multimodal Vision & Document Mining).
- User needs GUI automation and interaction based on UI screenshots (use `gui-plus` in Multimodal Vision & Document Mining).

**Known Limitations**:
- Image translation limits: Max 100 MB per image, dimensions between 15x15 and 8192x8192 pixels, and URL cannot contain Chinese characters.
- Image translation query API has a default rate limit of 1 RPS; requires async task callback for more frequent queries.
- Qwen-MT text translation models have a max of 8,192 tokens per request.
- Gummy short-sentence models have a max 1 minute audio duration per task.
- Real-time translation (Gummy) only supports one target language for translation at a time.

## FAQ

Q: Which path should I start with?
A: Start with **Multimodal Vision & Document Mining** if your primary goal is to read, understand, or extract data from PDFs and images into text or JSON formats. It is the most robust path for general document intelligence and visual reasoning.

Q: What if I need to extract structured tables from a 50-page PDF but chose Specialized OCR & Image Translation?
A: You'll hit a wall because the translation path lacks document data mining capabilities and is limited to 8,192 tokens for text models. You must use `qwen-doc-turbo` in the Multimodal path, which supports up to 253,952 input tokens per request specifically for complex PDF layout extraction.

Q: What if I need to translate a UI screenshot while preserving the original layout but chose Multimodal Vision & Document Mining?
A: You'll fail to preserve the layout. The Multimodal path is for extraction and reasoning, not layout-preserving image generation. You must use `qwen-mt-image` in the Specialized OCR & Image Translation path to generate a translated image with the original visual layout intact.

Q: Can I use the multimodal path for real-time speech-to-speech translation during a live video stream?
A: No. The Multimodal path only supports audio understanding (up to 40 minutes per request) for analysis, not live translation. For real-time speech-to-speech translation with voice cloning, you must use `qwen3.5-livetranslate-flash-realtime` in the Specialized OCR & Image Translation path.

Q: What are the concurrency limits if I process a large batch of document translations?
A: Both paths support 100 QPS per model for standard API calls. However, if you are using the async image translation API in the Translation path, polling the task status endpoint is strictly rate-limited to 1 RPS by default. You should implement async task callbacks instead of aggressive polling to avoid throttling.

Q: How do I handle high-resolution images for OCR in the Multimodal path?
A: You should enable the `vl_high_resolution_images` parameter in your API request to ensure the model processes the full resolution rather than downscaling it. Keep in mind the hard limit of 10MB per image and 8K tokens per OCR request.

### [Fine model](https://company-skill.com/p/bailian/bailian-fine-model.md)

## What You Want to Do

You want to customize a foundational large language model (LLM) or multimodal model (like Qwen or wan2.6-i2v) using your own proprietary data. This process, known as fine-tuning, adapts the model's weights to improve performance on specific downstream tasks, adopt a specific tone, or learn new domain-specific knowledge. 

Depending on your workflow and technical resources, you can either script this process for automated pipelines or use a visual interface for hands-on data preparation and monitoring. Alibaba Cloud Model Studio (Bailian) supports multiple fine-tuning paradigms, including Supervised Fine-Tuning (SFT), Continual Pre-Training (CPT), and Direct Preference Optimization (DPO), as well as Efficient training (LoRA) for resource-efficient adaptation.

**Typical User Questions**:
- How do I fine-tune Qwen?
- Can I automate model training via API?
- Fine-tuning best practices for Qwen

## Decision Tree

Pick the best path for your situation:

- **If** you need to integrate fine-tuning into CI/CD pipelines, execute asynchronous batch tasks, or manage datasets programmatically using OpenAI-compatible SDKs → Use Programmatic Fine-Tuning via API (go to *bailian/bailian-model*)
- **If** you require visual Data Cleansing (e.g., Sensitive Data Masking), Data Augmentation, or subscription-based training units for billing → Use Console-based Visual Fine-Tuning (go to *bailian/bailian-model*)
- **If** you are training video models like wan2.6-i2v and need to ensure you are using the efficient_sft (LoRA) training type via automated scripts → Use Programmatic Fine-Tuning via API (go to *bailian/bailian-model*)
- **Otherwise (default)** → Console-based Visual Fine-Tuning (safest for interactive hyperparameter tuning, visual log monitoring, and one-click model publishing without writing polling scripts).

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|--------------|
| Programmatic Fine-Tuning via API | Automating training pipelines, managing datasets programmatically, and integrating into CI/CD workflows. | high | Yes | Yes | Billed per token consumed; max 20 concurrent or succeeded jobs per user. | `bailian/api/bailian-model` |
| Console-based Visual Fine-Tuning | Interactive data preparation, visual hyperparameter tuning, and monitoring training progress via the UI. | low | No | No | Supports subscription-based training units and includes 5 hours free training per month. | `bailian/guide/bailian-model` |

## Path Details

### Path 1: Programmatic Fine-Tuning via API

**Best For**: Automating training pipelines, managing datasets programmatically, and integrating into CI/CD workflows.

**Brief Description**: 
A stateless HTTP API and SDK-based workflow for creating and managing fine-tuning jobs on Alibaba Cloud Model Studio. You submit training file IDs and hyperparameters directly to the DashScope endpoint. This approach is ideal for developers who want to treat model training as an infrastructure-as-code component, executing asynchronous batch training tasks and polling job status via the returned job_id.

**Core Workflow Concepts**:
- **File Management**: Training data must first be uploaded via the File Management API to generate file IDs.
- **Job Submission**: Hyperparameters and file IDs are submitted to the DashScope endpoint to create the job.
- **Status Polling**: Because the API is stateless, you must poll the job_id to track training progress and retrieve the final model checkpoint.

**Key technical facts**:
- Billing: Billed per token consumed during training (Total tokens * epochs * unit price). API-created jobs support only token-based billing.
- Max concurrency: 20 fine-tune jobs running or succeeded per user
- Regions available: China (Standard), International (Standard)
- Prerequisites: DASHSCOPE_API_KEY environment variable, Training dataset uploaded via File Management API to get file IDs

**When to Use**:
- Automating training pipelines and integrating fine-tuning into CI/CD workflows.
- Managing datasets and launching jobs programmatically using OpenAI-compatible SDKs or DashScope SDK.
- Executing asynchronous batch training tasks and polling job status via job_id.

**When NOT to Use**:
- User requires subscription-based billing (training units) instead of pay-as-you-go token billing.
- User needs visual data cleansing, sensitive data masking, or data augmentation workflows before training.
- User wants to visually monitor training logs and metrics without writing polling scripts.

**Known Limitations**:
- Maximum of 20 concurrent or succeeded fine-tune jobs per user.
- Maximum file size for training data is 1 GB per file.
- Video models like wan2.6-i2v currently only support the efficient_sft (LoRA) training type.

### Path 2: Console-based Visual Fine-Tuning

**Best For**: Interactive data preparation, visual hyperparameter tuning, and monitoring training progress via the UI.

**Brief Description**: 
A visual web interface in Alibaba Cloud Model Studio for preparing datasets, configuring hyperparameters, and launching SFT, CPT, or DPO fine-tuning jobs. It provides built-in tools like Data Stream and Data Cleansing, allows you to select your specific Training Method, and utilizes Platform Storage for managing datasets. You can initiate the entire process seamlessly via the Create Training Task UI, making it highly accessible for non-engineers or those doing exploratory training.

**Core Workflow Concepts**:
- **Data Preparation**: Upload data to Platform Storage, use Data Stream to inspect it, and apply Data Cleansing to remove PII or format errors.
- **Task Configuration**: Use the Create Training Task wizard to select your base model, Training Method (SFT, CPT, DPO), and hyperparameters.
- **Monitoring & Publishing**: Watch visual training logs and loss curves, then use one-click publishing to deploy the model to an endpoint.

**Key technical facts**:
- Billing: Supports both per-token pay-as-you-go billing and subscription-based training units. Includes 5 hours free training per month.
- Regions available: China (Beijing), US (Virginia), Singapore, Germany (Frankfurt), China (Hong Kong)
- Prerequisites: Active Alibaba Cloud account, Dataset in 'Published' status

**When to Use**:
- User needs to perform visual data cleansing (e.g., Sensitive Data Masking) and data augmentation before training.
- User wants to use subscription-based training units for billing instead of token-based pay-as-you-go.
- User prefers interactive hyperparameter tuning, visual log monitoring, and one-click model publishing without writing code.

**When NOT to Use**:
- User needs to automate recurring fine-tuning jobs via CI/CD pipelines or external scripts.
- User wants to manage training files programmatically using OpenAI-compatible file APIs.

**Known Limitations**:
- Datasets must be in 'Published' status to be used in training jobs; draft datasets are not supported.
- CPT and Image-to-Video training sets do not support draft status and must be published immediately upon creation.
- Platform Storage for datasets is free and unlimited, but relies on console UI rather than programmatic OSS mounting for training data.

## FAQ

Q: Which path should I start with?
A: Start with Console-based Visual Fine-Tuning if you are doing this manually for the first time, as it provides visual log monitoring, interactive hyperparameter tuning, and 5 hours of free training per month. Choose the API path only if you are building an automated CI/CD pipeline or need to manage hundreds of datasets programmatically.

Q: What if I need subscription-based billing but chose the API path?
A: If you need to use training units but chose the API, you'll hit a strict billing limitation: API-created training jobs only support pay-as-you-go token-based billing. You must use the Console to access subscription-based training units and claim the monthly free training hours.

Q: What if I want to automate CI/CD pipelines but chose the Console path?
A: If you need to automate recurring jobs but chose the Console, you'll hit a wall because Platform Storage relies entirely on the console UI rather than programmatic OSS mounting. Furthermore, the UI lacks OpenAI-compatible file APIs for automated dataset management, making CI/CD integration practically impossible.

Q: Can I use draft datasets for training in the Console?
A: No. Datasets must be in 'Published' status to be used in training jobs. Specifically, CPT and Image-to-Video training sets do not support draft status and must be published immediately upon creation. If your data is still in draft, the Create Training Task flow will not let you select it.

Q: What is the maximum file size and concurrency limit for the API approach?
A: The maximum file size for training data is 1 GB per file. Additionally, there is a strict limit of 20 concurrent or succeeded fine-tune jobs per user. If you exceed this, you must delete old succeeded jobs before launching new ones via the API.

Q: What happens if I try to train a video model like wan2.6-i2v using an unsupported method?
A: Video models like wan2.6-i2v currently only support the efficient_sft (LoRA) training type. If you attempt to use a different Training Method via the API or Console, the job will fail or be rejected during the validation phase.

Q: How do the regions differ between the API and Console paths?
A: The API path is broadly available in China (Standard) and International (Standard) regions. The Console path is available in specific regional hubs: China (Beijing), US (Virginia), Singapore, Germany (Frankfurt), and China (Hong Kong). Ensure your Alibaba Cloud account is provisioned in a supported region before starting.

Q: What if my training dataset contains sensitive PII (Personally Identifiable Information)?
A: If you use the API path, you must handle PII masking externally before uploading the file. If you use the Console path, you can leverage the built-in visual Data Cleansing tools, which include Sensitive Data Masking features to automatically redact PII before the training job begins.

### [Integrate mcp](https://company-skill.com/p/bailian/bailian-integrate-mcp.md)

## What You Want to Do

You want to extend the capabilities of your Qwen or other LLM-based agents by connecting them to external data sources, custom APIs, or the live internet using the Model Context Protocol (MCP) or built-in search features.

**Typical User Questions**:
- How to connect LLM to external tools?
- MCP Server
- Enable web search for Qwen

- Model Context Protocol setup
- Tool integration guide

## Decision Tree

Pick the best path for your situation:

- **If** you are coding an agent using `client.responses.create` and need to programmatically connect to custom MCP servers or external APIs via the `tools array` → Use **MCP Server API Connection** (go to *bailian/bailian-integration*)
- **If** you want to quickly enable built-in web search via the `MCP Marketplace` and need the `Streamable HTTP protocol` endpoint for AI coding tools without writing backend code → Use **Web Search MCP Configuration** (go to *bailian/bailian-search*)
- **Otherwise (default)** → Use **MCP Server API Connection**, as it provides the most flexibility for custom agent development, supports standard OpenAI-compatible SDKs, and allows integration with any third-party MCP server.

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| MCP Server API Connection | Programmatically connecting LLMs to external tools, databases, and custom MCP servers via code. | High | Yes | Yes | Max 10 MCP servers per request | `bailian/api/bailian-integration` |
| Web Search MCP Configuration | Quickly enabling and configuring built-in web search capabilities and MCP services via the console. | Low | No | No | Free tier limited to 2,000 calls/month | `bailian/guide/bailian-search` |

## Path Details

### Path 1: MCP Server API Connection

**Best For**: Programmatically connecting LLMs to external tools, databases, and custom MCP servers via code.

**Brief Description**: A programmatic integration path using the Responses API and the sse (Server-Sent Events) protocol to connect LLMs to external or custom MCP servers via OpenAI-compatible SDKs. You will configure the `server_url` and `server_protocol` within the tools array when calling `client.responses.create`.

**Key Facts** — pulled from fact_card:
- Billing: Per-token billing for model inference; MCP server fees are separate and subject to individual server billing rules.
- Cold start: —
- Max model size: —
- Runtimes: Python (openai>=1.0.0, dashscope>=1.14.0), Node.js (openai)
- Custom Docker: —
- Auto-scaling: —
- Auth method: Bearer Token via Authorization header using DASHSCOPE_API_KEY
- Max concurrency: 100 QPS per model; Maximum 10 MCP servers per request
- Regions available: China Region, International Region
- Prerequisites: DASHSCOPE_API_KEY environment variable, OpenAI-compatible SDK (openai>=1.0.0)

**When to Use**:
- Need to programmatically connect LLMs to custom or third-party MCP servers via code.
- Require streaming responses and token usage metrics extraction via OpenAI-compatible SDK.

**When NOT to Use**:
- User wants to use the standard Chat Completions API (`client.chat.completions.create`) for tool calling.
- User needs a protocol other than SSE (e.g., Streamable HTTP) for MCP server communication.
- User wants to configure the built-in Web Search MCP via the console UI without writing code.

**Known Limitations**:
- MCP is only supported via the Responses API (`client.responses.create`); the standard Chat Completions API support for MCP tool configurations is limited or requires additional configuration — please refer to the detail skill to confirm.
- Currently, only the sse protocol is supported for MCP server communication.
- A maximum of 10 MCP servers can be configured in the tools array for a single request.

### Path 2: Web Search MCP Configuration

**Best For**: Quickly enabling and configuring built-in web search capabilities and MCP services via the console.

**Brief Description**: A console-based configuration path to enable the built-in Web Search MCP service via the MCP Marketplace and retrieve its Streamable HTTP protocol endpoint for AI coding tools. You simply click Enable Now to activate the service and generate your endpoint.

**Key Facts** — pulled from fact_card:
- Billing: Free tier of 2,000 calls per month; service automatically stops after quota exhaustion.
- Cold start: —
- Max model size: —
- Runtimes: —
- Custom Docker: —
- Auto-scaling: —
- Auth method: Model Studio API key (sk-xxx format) in authorization header
- Max concurrency: —
- Regions available: China region, International regions (via Firecrawl)
- Prerequisites: Coding Plan subscription, Model Studio API key (sk-xxx format)

**When to Use**:
- User wants to quickly enable built-in web search capabilities for AI coding tools (like Qwen Code or Claude Code) via the console UI.
- User needs the Streamable HTTP endpoint for Web Search MCP without writing custom integration code.

**When NOT to Use**:
- User needs to integrate custom, third-party MCP servers programmatically (use MCP Server API Connection instead).
- User requires more than 2,000 web search calls per month and needs a paid tier (service stops after free quota).
- User wants to use web search directly in standard Qwen model API calls via the `enable_search` parameter rather than via MCP.

**Known Limitations**:
- Free quota is strictly limited to 2,000 calls per month, and the service automatically stops functioning once exhausted.
- Upgrading from the legacy SSE protocol to Streamable HTTP requires manually clicking 'Cancel Activation' before re-enabling the service.
- International region users may need to sign up for and use third-party Firecrawl API keys instead of the native Aliyun endpoint.

## FAQ

Q: Which path should I start with?
A: If you are building a custom agent application and need to connect to various external APIs or databases, start with MCP Server API Connection. If you just need to give an AI coding assistant internet access quickly without writing backend code, start with Web Search MCP Configuration.

Q: What if I want to use the standard Chat Completions API but chose MCP Server API Connection?
A: You'll hit a blocking limitation. MCP is only supported via the Responses API (`client.responses.create`); the standard Chat Completions API (`client.chat.completions.create`) does not support MCP tool configurations. You must refactor your code to use the Responses API.

Q: What if I need more than 2,000 web search calls per month but used Web Search MCP Configuration?
A: The service will automatically stop functioning once the free quota is exhausted. There is no built-in paid tier upgrade for this specific console MCP service; you would need to use a different search API or the standard `enable_search` parameter in the Qwen API.

Q: What if I need the Streamable HTTP protocol for my custom MCP server but used MCP Server API Connection?
A: You will not be able to connect. The API connection path currently only supports the sse (Server-Sent Events) protocol for MCP server communication. You must ensure your custom server supports SSE.

Q: What if my agent needs to connect to 15 different custom tools but used MCP Server API Connection?
A: You will hit a hard limit. A maximum of 10 MCP servers can be configured in the tools array for a single request. You must consolidate your tools or route them through a single aggregator MCP server.

Q: What if I just want web search in standard Qwen API calls without setting up MCP?
A: Neither of these MCP paths is the right choice. You should instead use the `enable_search` parameter directly in your standard Qwen model API calls, which bypasses the MCP protocol entirely.

Q: What if I am in an international region but chose Web Search MCP Configuration?
A: You may need to sign up for and use third-party Firecrawl API keys instead of the native Aliyun endpoint, as the built-in service relies on Firecrawl outside the China region.

### [Manage security](https://company-skill.com/p/bailian/bailian-manage-security.md)

## What You Want to Do

You need to secure access to Alibaba Cloud Model Studio by managing API keys, encrypting sensitive payloads, configuring private network boundaries, or setting up team permissions and content guardrails.

**Typical User Questions**:
- How to get API key?
- VPC (How to configure VPC and PrivateLink in Bailian?)
- Generate temporary API key
- (How to assign Bailian permissions to team members?)
- RSA encryption for model inputs
- Private Link connection setup

## Decision Tree

Pick the best path for your situation:

- **If** you need to generate short-lived temporary keys (TTL 1 to 1800 seconds) or encrypt sensitive payloads using RSA public keys via REST API → Use **Programmatic Key & Encryption Management** (go to `bailian/api/bailian-access`)
- **If** you need to establish private network access via VPC Interface Endpoints, configure PrivateLink Reverse Endpoints, or manage team SSO and RBAC via the UI → Use **Console Network & Permission Setup** (go to `bailian/guide/bailian-access`)
- **Otherwise (default)** → Use **Console Network & Permission Setup**. This is the safest starting point for initial account setup, creating your permanent `DASHSCOPE_API_KEY`, and establishing baseline network and workspace configurations before automating anything via code.

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| Programmatic Key & Encryption Management | Generating temporary API keys, managing async tasks, and applying RSA encryption | Medium | Yes | Yes | Temporary API key TTL is strictly limited to a maximum of 1800 seconds | `bailian/api/bailian-access` |
| Console Network & Permission Setup | Configuring VPC, Private Link, MSE gateways, and managing team workspace permissions via UI | Medium | No | No | PrivateLink reverse endpoints for Secure Storage require an existing VPC in China (Beijing) spanning zones G, H, or L | `bailian/guide/bailian-access` |

## Path Details

### Path 1: Programmatic Key & Encryption Management

**Best For**: Generating temporary API keys, managing async tasks, and applying RSA encryption for secure payloads.

**Brief Description**: 
This path utilizes synchronous REST APIs for generating temporary API keys, managing asynchronous tasks, and obtaining RSA public keys to encrypt sensitive model payloads. It relies on endpoints like `POST /api/v1/tokens` and `POST /api/v1/tasks/{task_id}/cancel`, requiring a permanent `DASHSCOPE_API_KEY` for Bearer Token authentication.

**Key technical facts**:
- **Billing**: Temporary API Keys: Free of charge. Async Task Management & RSA Encryption: Billed per request.
- **Auth Method**: Bearer Token using permanent API key (Authorization: Bearer $DASHSCOPE_API_KEY)
- **Max Concurrency**: 20 QPS per Alibaba Cloud account for Async Task Management
- **Regions Available**: China (Default), International, US, Hong Kong, Europe (Frankfurt)
- **Prerequisites**: Permanent API key stored in `DASHSCOPE_API_KEY` environment variable

**When to Use**:
- Need to generate short-lived, secure API keys (1 to 1800 seconds) for temporary programmatic access.
- Need to programmatically query, batch query, or cancel asynchronous tasks via REST API.
- Need to encrypt sensitive model inputs in transit using RSA public keys before calling model APIs.

**When NOT to Use**:
- Need to configure VPC, PrivateLink, or network security boundaries (use Console Network & Permission Setup path instead).
- Need to manage workspace permissions, team members, or SSO via UI.
- Need long-lived API keys (temporary keys max out at 30 minutes).

**Known Limitations**:
- Temporary API key TTL is strictly limited to a maximum of 1800 seconds (30 minutes).
- Async tasks can only be canceled when in the PENDING state; canceling RUNNING, SUCCEEDED, or FAILED tasks returns an UnsupportedOperation error.
- Completed asynchronous tasks and their results are automatically deleted by the system after exactly 24 hours.
- Async Task Management is rate-limited to 20 QPS per Alibaba Cloud account, which includes all RAM users under the account.

### Path 2: Console Network & Permission Setup

**Best For**: Configuring VPC, Private Link, MSE gateways, and managing team workspace permissions via UI.

**Brief Description**: 
A console-based guide for configuring network security (VPC, PrivateLink, MSE gateways), managing workspace permissions, and setting up team SSO and AI guardrails. This path involves setting up the Home Business Space, configuring Zone IP Configuration, and managing team access via the Token Plan (Team Edition).

**Key technical facts**:
- **Billing**: API Keys: Free. Private Link: Incurs usage costs + CEN cross-region fees. AI Guardrail: 0.002 CNY / 1K tokens. Batch Inference: 50% cost of real-time.
- **Auth Method**: Console SSO / RAM user with AliyunBailianFullAccess policy
- **Regions Available**: China (Beijing), China (Hong Kong), Singapore, US (Virginia)
- **Prerequisites**: Alibaba Cloud account or RAM user with administrator permissions, AliyunBailianFullAccess RAM policy, VPC in the same region as the Model Studio service, Security group allowing inbound traffic on ports 80 and 443.

**When to Use**:
- Need to establish private network access to Model Studio APIs via VPC Interface Endpoints or PrivateLink.
- Need to route traffic to VPC resources (OSS, AnalyticDB, Elasticsearch) using MSE Cloud Native Gateway.
- Need to manage team workspaces, assign RBAC roles, or configure SSO via Token Plan (Team Edition).
- Need to enable AI Guardrails for input and output content moderation.

**When NOT to Use**:
- Need to programmatically generate temporary API keys or encrypt payloads via REST API (use Programmatic Key & Encryption Management path instead).
- Need to automate infrastructure setup via code rather than clicking through console wizards.
- Need to query or cancel async tasks programmatically.

**Known Limitations**:
- The full API key is only shown once immediately after creation in the console and cannot be retrieved later if lost.
- SSO configuration cannot be edited while the organization has members; all members must be removed first before modifying SSO settings.
- Advanced model monitoring is only available in China (Beijing), Singapore, and US (Virginia) regions.
- Batch inference JSONL files are strictly limited to a maximum of 50,000 lines and 500MB in total size.
- PrivateLink reverse endpoints for Secure Storage require an existing VPC in China (Beijing) region spanning zones G, H, or L.

## FAQ

Q: Which path should I start with?
A: Start with Console Network & Permission Setup to create your initial permanent `DASHSCOPE_API_KEY`, configure your Home Business Space, and establish baseline network boundaries. Once the foundation is set, use the Programmatic path for automated, short-lived access.

Q: What if I need long-lived API keys for my production backend but chose Programmatic Key & Encryption Management?
A: If you need permanent credentials but chose the Programmatic path, you'll hit a hard limit: it only generates temporary API keys with a strict maximum TTL of 1800 seconds (30 minutes). For permanent keys, you must use the Console path.

Q: What if I need to encrypt sensitive model inputs but chose Console Network & Permission Setup?
A: If you need payload encryption but chose the Console path, you'll hit a wall: the console handles network-level security but does not provide the RSA public keys needed for payload-level encryption. You must use the Programmatic path to fetch RSA keys and encrypt inputs before calling `com.aliyuncs.dashscope` endpoints.

Q: Can I use the Programmatic path to set up a Reverse Endpoint for my VPC?
A: If you need to configure network infrastructure like a Reverse Endpoint but chose the Programmatic path, you'll find no such API. Network components like Reverse Endpoints, Interface Endpoints, and Zone IP Configuration can only be provisioned through the Console path.

Q: How do I handle team permissions and content moderation?
A: Use the Console path. It provides the UI to manage the Token Plan (Team Edition) for RBAC/SSO and to enable the AI Guardrail for input/output content moderation. These administrative features are not exposed via the programmatic REST APIs.

### [Transcribe speech](https://company-skill.com/p/bailian/bailian-transcribe-speech.md)

## What You Want to Do

You want to convert spoken audio into text, translate speech across languages in real-time or in pre-recorded files, or integrate speech recognition directly into a native mobile application.

**Typical User Questions**:
- How to transcribe audio files?
- (How does Bailian do real-time speech recognition?)
- Real-time speech translation API
- SDK (How to integrate speech recognition SDK on mobile?)
- Translate live audio streams
- (How to add custom hotwords to speech recognition?)

## Decision Tree

Pick the best path for your situation:

- **If** your primary goal is single-language speech-to-text transcription using models like `paraformer-v2` or `fun-asr`, and you need to process pre-recorded files up to 2GB via **Async Task** or live streams via **WebSocket** → Use **ASR Transcription API** (go to *bailian/bailian-asr*)
- **If** you need cross-language speech-to-speech translation, live stream localization, or multi-language file dubbing using models like `qwen3.5-livetranslate-flash-realtime` or `gummy-realtime-v1` → Use **Speech Translation & Dubbing API** (go to *bailian/bailian-translation*)
- **If** you are building a native Android or iOS app and need to embed pre-compiled SDKs (like `nuisdk.framework` or AAR files) with mobile-optimized security → Use **Mobile SDK Integration** (go to *bailian/bailian-asr*)
- **Otherwise (default)** → Use **ASR Transcription API**, as it is the most versatile backend solution for general speech-to-text tasks and custom hotword management.

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| ASR Transcription API | Core speech-to-text transcription, custom hotword management, and live streaming recognition. | Medium | Yes | Yes | Billed per second of audio processed (e.g., ~0.00022 CNY/sec for fun-asr). | `bailian/api/bailian-asr` |
| Speech Translation & Dubbing API | Cross-language speech-to-speech translation, live stream localization, and multi-language file dubbing. | Medium | Yes | Yes | Real-time translation limits image input to a max of 2 images per second for visual context. | `bailian/api/bailian-translation` |
| Mobile SDK Integration | Embedding on-device or mobile-optimized speech recognition into Android/iOS applications. | High | Yes | No | Uses short-lived temporary API keys (valid for 60 seconds) for secure mobile auth. | `bailian/guide/bailian-asr` |

## Path Details

### Path 1: ASR Transcription API

**Best For**: Core speech-to-text transcription, custom hotword management, and live streaming recognition.

**Brief Description**: A stateless HTTP and **WebSocket** API service for transcribing live audio streams or pre-recorded audio files into text using models like **paraformer-v2** and **fun-asr**. It supports custom hotwords (speech-biasing) and speaker diarization to improve domain-specific accuracy.

**Key technical facts**:
- Billing: Billed per second of audio processed (e.g., ~0.00022 CNY/sec for fun-asr) or per 1,000 tokens.
- Max concurrency: 100 QPS per model for REST APIs; up to 10 concurrent WebSocket connections per host.
- Regions available: China (Beijing), International (Singapore).
- Prerequisites: **DASHSCOPE_API_KEY** environment variable configured; publicly accessible URLs for async file transcription (max 100 URLs, max 2GB per file).

**When to Use**:
- Need to transcribe pre-recorded audio files up to 2GB and 12 hours in duration using the **Async Task** API (requires header `X-DashScope-Async: enable`).
- Require speaker diarization or custom hotword management to improve domain-specific accuracy.
- Building a backend service that processes live audio streams via WebSocket with low latency.

**When NOT to Use**:
- Need cross-language speech-to-speech translation or live stream localization.
- Building a native Android/iOS app and want to embed on-device or mobile-optimized SDKs without managing raw WebSocket frames.

**Known Limitations**:
- **Async Task** file transcription requires audio files to be hosted on publicly accessible URLs.
- **WebSocket** connections may time out if there is prolonged silence without a heartbeat or finish-task event.
- API keys are region-specific; a China (Beijing) key will not work on the International (Singapore) endpoint.

### Path 2: Speech Translation & Dubbing API

**Best For**: Cross-language speech-to-speech translation, live stream localization, and multi-language file dubbing.

**Brief Description**: An API service for real-time speech-to-speech translation, audio/video file dubbing, and cross-language live stream localization using models like **qwen3.5-livetranslate-flash-realtime** and **gummy-realtime-v1**. It handles multiple **modalities** including audio and visual context to disambiguate terms.

**Key technical facts**:
- Billing: Billed per 1,000 tokens (e.g., 0.002 CNY/1K tokens for qwen-mt-plus) or per second of active connection (e.g., 0.00015 CNY/sec for gummy-realtime-v1).
- Max concurrency: 100 QPS per model; max 10 concurrent WebSocket connections for real-time translation.
- Regions available: China (Beijing), International (Singapore).
- Prerequisites: **DASHSCOPE_API_KEY** environment variable configured; OpenAI SDK >= 1.0.0 or DashScope SDK >= 1.14.0.

**When to Use**:
- Need to perform real-time speech-to-speech translation with voice cloning (`session.enable_voice_clone`) to preserve the original speaker's voice.
- Translating live video streams where visual context (image frames) is needed to improve translation accuracy.
- Require cross-language dubbing for pre-recorded audio/video files using the OpenAI-compatible streaming API.

**When NOT to Use**:
- Only need single-language speech recognition without cross-language translation.
- Need to translate text embedded within static images while preserving the original layout.

**Known Limitations**:
- Gummy models only support translation into one target language at a time (`translationLanguages` max length 1).
- When using the OpenAI Python SDK for file translation, custom parameters like `translation_options` must be wrapped in the `extra_body` dictionary.
- Real-time audio and video translation limits image input to a maximum of 2 images per second for visual context.
- Real-time streaming requires managing buffers like `input_audio_buffer.append`.

### Path 3: Mobile SDK Integration

**Best For**: Embedding on-device or mobile-optimized speech recognition into Android/iOS applications.

**Brief Description**: A console-guided integration path for embedding Alibaba Cloud's speech recognition capabilities into native Android and iOS applications using pre-compiled SDKs. It involves configuring native build phases like **Embed & Sign** and **Link Binary With Libraries** to ensure proper framework loading.

**Key technical facts**:
- Billing: Billed per minute or per 1,000 tokens depending on the underlying ASR model selected in the console (e.g., 0.002 CNY/min for qwen3-asr-flash-realtime).
- Runtimes: Android (AAR / C++ via android_libs), iOS (via Xcode).
- Auth method: **short-lived temporary API key** (valid for 60 seconds) recommended for mobile applications.
- Prerequisites: Android Studio or Xcode installed; API key obtained from Model Management console.

**When to Use**:
- Building a native Android or iOS application and need pre-compiled SDKs (AAR/framework) to handle audio streaming and microphone access.
- Need to implement secure mobile authentication using a **short-lived temporary API key** to prevent long-term key compromise in client apps.
- Want to use the Bailian Console's 'SDK Download and Integration' wizard to quickly scaffold mobile speech recognition features.

**When NOT to Use**:
- Building a backend Python/Java service to process large pre-recorded audio files via Async Tasks.
- Need cross-language speech-to-speech translation or live video stream localization.

**Known Limitations**:
- Requires manual SDK integration steps, such as adding AAR files to `app/libs` or setting `nuisdk.framework` to 'Embed & Sign' in Xcode Build Phases.
- Does not provide the full backend Async Task file transcription API directly; focuses on real-time mobile streaming and UI integration (e.g., using `DashGummySpeechRecognizerActivity.java` or `DashGummySpeechTranscriberViewController`).
- High-concurrency TTS optimization requires specific Java SDK environment variables which are backend-focused, not mobile-native.

## FAQ

Q: Which path should I start with?
A: If you are building a backend service and just need accurate speech-to-text transcription with custom hotwords, start with the **ASR Transcription API**. It is the most versatile default for processing both live WebSocket streams and large pre-recorded files via Async Tasks.

Q: What if I need to translate a live video stream but chose the ASR Transcription API?
A: If you need cross-language translation with visual context but chose the ASR API, you'll hit a dead end because the ASR API only outputs single-language text and does not support image frame inputs or `session.enable_voice_clone`. You must use the **Speech Translation & Dubbing API** instead.

Q: What if I am building a native Android app but chose the ASR Transcription API backend approach?
A: If you embed the raw backend API into a mobile app, you'll risk exposing your long-term `DASHSCOPE_API_KEY` in the client code, and you'll have to manually manage raw WebSocket frames. Use the **Mobile SDK Integration** path to leverage pre-compiled SDKs and secure 60-second temporary API keys.

Q: Can I use the Speech Translation API to translate text inside a static image?
A: No. The Speech Translation & Dubbing API is designed for audio/video streams and dubbing. If you need to translate text embedded within static images while preserving the layout, you should look into Qwen-MT-Image Async Task instead.

Q: How do I pass custom translation parameters when using the OpenAI Python SDK for file dubbing?
A: When using the OpenAI-compatible endpoint for translation, you cannot pass custom parameters directly. You must wrap parameters like `translation_options` inside the `extra_body` dictionary in your API call.

Q: What happens if my audio file is 3GB and I try to use the ASR Async Task API?
A: The Async Task API has a strict maximum file size limit of 2GB per file. For files larger than 2GB, your request will be rejected. You will need to split the audio into smaller chunks before uploading them to your publicly accessible URL.

Q: What if I use a China (Beijing) API key on the International (Singapore) endpoint?
A: API keys are strictly region-specific. If you use a Beijing key on the Singapore endpoint, your authentication will fail with an invalid token error. Ensure your environment variable matches the region of the endpoint you are calling.


## Frequently asked questions

### How do I build RAG knowledge bases and retrieval pipelines?

You can create retrieval-augmented generation systems by following the dedicated intent skill for building knowledge bases and retrieval pipelines. The bailian-build-system documentation provides two alternative implementation paths.

### How do I deploy custom or fine-tuned AI models as endpoints?

You can host models for production inference by using the dedicated intent skill for deploying custom or fine-tuned AI models. Refer to the bailian-deploy-model documentation to access the two supported deployment paths.

### How do I extract and understand information from documents and images?

You can perform OCR and document data mining by utilizing the dedicated intent skill for extracting information from documents and images. The bailian-extract-documents documentation outlines the two alternative implementation paths.

### How do I fine-tune a large language or multimodal model?

You can customize models with your own data by accessing the dedicated intent skill for fine-tuning large language or multimodal models. Consult the bailian-fine-model documentation for the two available configuration paths.

## Cross-product integrations

- [AI Agent Manages Notion CMS for Vercel Site](https://company-skill.com/p/_combos/ai-agent-manages-notion-cms-for-vercel-site-e41629.md) (notion + vercel + alinux + cloudflare)
- [AI Agent with Notion via MCP](https://company-skill.com/p/_combos/ai-agent-with-notion-via-mcp-84aab0.md) (notion)
- [AI Content Engine with Public Site and Enterprise Search](https://company-skill.com/p/_combos/ai-content-engine-with-public-site-and-enterpris-9db7c8.md) (alinux + cloudflare + notion + vercel + idaas)
- [AI Content Platform on Managed Infrastructure](https://company-skill.com/p/_combos/ai-content-platform-on-managed-infrastructure-265158.md) (alinux + cloudflare + notion + vercel + idaas)
- [AI Content Platform with Search and Frontend](https://company-skill.com/p/_combos/ai-content-platform-with-search-and-frontend-d3ca31.md) (alinux + cloudflare + notion + vercel + idaas)
- [AI Content Platform with Site and Search](https://company-skill.com/p/_combos/ai-content-platform-with-site-and-search-7bf25b.md) (alinux + cloudflare + notion + vercel + idaas)
- [AI-Driven Search Knowledge Platform](https://company-skill.com/p/_combos/ai-driven-search-knowledge-platform-803ad0.md) (alinux + cloudflare + notion + vercel + idaas)
- [AI-Powered Contact Center Intelligence Platform](https://company-skill.com/p/_combos/ai-powered-contact-center-intelligence-platform-cbbc60.md) (eb + es + dataworks + ess + rds)

## Use with an AI agent

```bash
curl -s https://company-skill.com/api/route \
  -H 'Content-Type: application/json' \
  -d '{"query": "...", "product": "bailian"}'
```

MCP server: https://company-skill.com/api/mcp/bailian.py

---
Machine-readable: https://company-skill.com/llms.txt · https://company-skill.com/sitemap.xml
