---
Title: Deploy model
URL Source: https://company-skill.com/p/bailian/bailian-deploy-model
Language: en
Description: You want to take a custom, fine-tuned, or LoRA model and deploy it as a dedicated, scalable API endpoint on Alibaba Cloud's Bailian platform so that applications can consume it via HTTP requests.…
---

# Deploy model

Part of **Bailian (Alibaba Cloud Model Studio)**. Route queries via `POST https://company-skill.com/api/route`.

## What You Want to Do

You want to take a custom, fine-tuned, or LoRA model and deploy it as a dedicated, scalable API endpoint on Alibaba Cloud's Bailian platform so that applications can consume it via HTTP requests.

**Typical User Questions**:
- How to deploy my fine-tuned model?
- Deploy custom LoRA model from OSS
- Can I deploy models via API?

## Decision Tree

Pick the best path for your situation:

- **If** you are managing infrastructure as code, integrating with CI/CD pipelines, or automating capacity scaling via HTTP PUT requests → Use **Programmatic Model Deployment** (go to *bailian/api/bailian-model*)
- **If** you need to visually select specific GPU instance types (e.g., `gpu.gn7i-c4g1.4xlarge`) and configure VPC/security group `Network Settings` → Use **Console Model Deployment** (go to *bailian/guide/bailian-model*)
- **If** you require subscription-based billing for training units (which the API does not support) → Use **Console Model Deployment** (go to *bailian/guide/bailian-model*)
- **Otherwise (default)** → Use **Console Model Deployment**, as it provides visual validation of resources, network configurations, and region scoping before committing to dedicated GPU costs.
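
The branches above can be sketched as a small function. This is only an illustration of the decision order; the `Needs` class and its field names are hypothetical and not part of any Bailian SDK.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Answers to the questions the decision tree asks (illustrative names)."""
    automation: bool = False             # IaC, CI/CD, or scaling via HTTP PUT
    visual_gpu_or_network: bool = False  # pick instance type / Network Settings in a UI
    subscription_training: bool = False  # subscription-based training units


def choose_path(needs: Needs) -> str:
    """Return the deployment path, checking branches in the order listed."""
    if needs.automation:
        return "Programmatic Model Deployment"
    if needs.visual_gpu_or_network or needs.subscription_training:
        return "Console Model Deployment"
    # Default branch: the console lets you validate resources visually first.
    return "Console Model Deployment"


print(choose_path(Needs(automation=True)))
print(choose_path(Needs()))
```

Because the branches are checked in order, a team that needs both automation and subscription-based training units lands on the programmatic path here and would still have to use the console for the subscription purchase.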

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| Programmatic Model Deployment | Infrastructure-as-code, automated scaling, and managing dedicated endpoints programmatically. | Medium | Yes | Yes | Max 20 concurrent fine-tune jobs per user | `bailian/api/bailian-model` |
| Console Model Deployment | Quick one-click deployment, visual resource selection, environment preparation, and region scoping. | Low | No | No | Max 5 dedicated service instances per project | `bailian/guide/bailian-model` |

## Path Details

### Path 1: Programmatic Model Deployment

**Best For**: Infrastructure-as-code, automated scaling, and managing dedicated endpoints programmatically.

**Brief Description**: A stateless HTTP API for deploying, scaling, and managing dedicated model services. It allows you to automate the import of custom models directly from OSS buckets using REST calls and Bearer token authentication, making it ideal for CI/CD pipelines.

**Key Facts**:
- **Billing**: Billed per request/token based on the selected plan (PTU, MU, CU, LoRA). Billing starts immediately upon successful deployment.
- **Regions**: China, International
- **Auth**: `Authorization: Bearer $DASHSCOPE_API_KEY`
- **Prerequisites**: `DASHSCOPE_API_KEY` environment variable, Workspace with model deployment permissions

**When to Use**:
- Need scriptable deployment for CI/CD pipelines and Infrastructure-as-Code.
- Automating the import of custom LoRA or full-parameter models directly from OSS buckets.
- Managing deployment capacity and scaling programmatically via HTTP PUT requests.
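
As a rough sketch of what a scripted call might look like, the snippet below builds (but does not send) a create request and a scaling request with Bearer authentication. The base URL, paths, and payload field names are placeholders invented for illustration; take the real endpoint and schema from the official API reference.

```python
import json
import os
import urllib.request

# Placeholder only: substitute the endpoint from the official API reference.
BASE_URL = "https://example.invalid/api/v1/deployments"


def build_request(method: str, url: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON request authenticated with a Bearer token."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Falls back to a dummy value so the sketch runs without a real key.
api_key = os.environ.get("DASHSCOPE_API_KEY", "sk-placeholder")

# Hypothetical payloads: field names are illustrative, not the documented schema.
create = build_request("POST", BASE_URL, {"model_name": "my-lora", "suffix": "prod01"}, api_key)
scale = build_request("PUT", f"{BASE_URL}/my-deployment/scale", {"capacity": 2}, api_key)

print(create.get_method(), create.full_url)
print(scale.get_method(), scale.full_url)
```

Passing either object to `urllib.request.urlopen` would send it. Keeping request construction separate from sending makes the calls easy to inspect in a CI dry run.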

**When NOT to Use**:
- Need to visually select specific GPU instance types (e.g., `gpu.gn7i-c4g1.4xlarge`) and configure VPC/security group network settings.
- Need to perform manual model evaluation and annotation through a visual interface.

**Known Limitations**:
- Maximum of 20 concurrent or succeeded fine-tune jobs per user.
- The deployment name suffix must be unique and at most 8 characters long; otherwise a Conflict error occurs.
- Training jobs created via API support only token-based billing; subscription-based training units require the console.
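
A client-side check can catch the suffix rule before a request is sent. The sketch below assumes you already have the set of suffixes in use (how you obtain it is up to you); `check_suffix` is a hypothetical helper, not an API function.

```python
MAX_SUFFIX_LEN = 8  # limit stated in the list above


def check_suffix(suffix: str, existing: set[str]) -> list[str]:
    """Return the reasons a deployment name suffix would be rejected, if any."""
    problems = []
    if len(suffix) > MAX_SUFFIX_LEN:
        problems.append(f"suffix is {len(suffix)} characters; maximum is {MAX_SUFFIX_LEN}")
    if suffix in existing:
        problems.append("suffix is already in use")
    return problems


in_use = {"prod01", "staging"}
print(check_suffix("prod01", in_use))
print(check_suffix("canary-2024", in_use))
print(check_suffix("canary", in_use))
```

An empty list means the suffix passes both checks. The service remains the final authority, so a Conflict error should still be handled.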

### Path 2: Console Model Deployment

**Best For**: Quick one-click deployment, visual resource selection, environment preparation, and region scoping.

**Brief Description**: A visual web console interface for configuring and deploying dedicated model services. It provides UI workflows to `Create Dedicated Service`, select your `Instance Type`, configure `Network Settings`, and choose your `Billing Method`. It also includes tools to `Import Model` weights and take services `Offline`.

**Key Facts**:
- **Billing**: Per Token (Pay-as-you-go), Per Instance Hour (Dedicated GPU), or Per Model Unit (MU) / Provisioned Throughput (PTU) subscription.
- **Instance Types**: `gpu.gn7i-c4g1.4xlarge`, `gpu.gn7i-c8g1.8xlarge`, `gpu.gn7i-c16g1.16xlarge`
- **Regions**: China (Beijing), US (Virginia), Singapore, Germany (Frankfurt), China (Hong Kong)
- **Auth**: Console SSO / Alibaba Cloud account login
- **Prerequisites**: Active Alibaba Cloud account with valid payment method, OSS bucket with LoRA files (for importing custom models), Published dataset (for fine-tuning)

**When to Use**:
- Need to visually select specific GPU instance types and configure VPC/security group network settings.
- Performing manual model evaluation, annotation, and comparative evaluation using visual dimension templates.
- Setting up region-specific deployments and service scopes for data residency compliance.

**When NOT to Use**:
- Need to automate deployment and scaling via CI/CD pipelines or Infrastructure-as-Code.
- Managing more than 5 dedicated service instances per project.

**Known Limitations**:
- Maximum of 5 dedicated service instances per project.
- CPT and Image-to-Video training sets do not support draft status and must be published immediately upon creation.
- Importing custom LoRA models requires specific files (`adapter_model.safetensors` and `adapter_config.json`) in an OSS bucket that is not using Archive storage.
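
The LoRA import requirement lends itself to a preflight check before opening `Import Model`. This sketch assumes you have already listed the object keys under your model prefix and looked up the bucket's storage class yourself; `preflight_lora` and its arguments are hypothetical.

```python
REQUIRED_FILES = {"adapter_model.safetensors", "adapter_config.json"}


def preflight_lora(object_keys: list[str], storage_class: str) -> list[str]:
    """Return the problems that would block a LoRA import, if any."""
    # Compare file names only, ignoring any folder prefix in the object key.
    names = {key.rsplit("/", 1)[-1] for key in object_keys}
    problems = [f"missing {name}" for name in sorted(REQUIRED_FILES - names)]
    # Assumption: the caller passes the storage class as a plain string.
    if storage_class.lower() == "archive":
        problems.append("bucket uses Archive storage")
    return problems


keys = ["lora/v1/adapter_model.safetensors", "lora/v1/adapter_config.json"]
print(preflight_lora(keys, "Standard"))
print(preflight_lora(["lora/v1/adapter_config.json"], "Archive"))
```

An empty list means both required files are present and the bucket is not on Archive storage.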

## FAQ

**Q: Which path should I start with?**
A: Start with Console Model Deployment if this is your first time setting up a dedicated endpoint. It allows you to visually verify your `Instance Type` selection and `Network Settings` before incurring dedicated GPU costs, and ensures your region scope meets data residency requirements.

**Q: What if I need to manage more than 5 dedicated service instances per project but chose Console Model Deployment?**
A: The console caps each project at 5 dedicated service instances, and this is a hard limit. To exceed this project-level cap, you must use Programmatic Model Deployment.

**Q: What if I need subscription-based training units but used Programmatic Model Deployment?**
A: Training jobs created via the API support only token-based billing. To use subscription-based training units, purchase and apply them in the console.

**Q: How do I import a custom LoRA model from OSS?**
A: Both paths support this. In the console, use the `Import Model` feature and ensure your OSS bucket contains `adapter_model.safetensors` and `adapter_config.json` (and is not using Archive storage). Via the API, you can automate the import directly from your OSS buckets using REST calls.

**Q: How do I take a deployed model offline?**
A: In the console, you can change the service status to `Offline` through the Model Deployment Console UI. Programmatically, you manage the deployment lifecycle and scale down or delete the endpoint via HTTP DELETE/PUT requests.

**Q: Can I perform A/B testing or manual evaluation on my deployed model?**
A: Manual evaluation and annotation are strictly features of the Console Model Deployment path, which provides visual dimension templates for comparative evaluation. The API path does not include visual evaluation tools.

## Related queries

deploy model, deploy ML model, model deployment, serve model, model serving, publish model, model online, deploy custom model, how to deploy, where to deploy, can I deploy, what is deployment, how do I serve, Create Dedicated Service, Import Model, DashScope API, Bailian console, deply model, deploy

