---
Title: Deploy inference
URL Source: https://company-skill.com/p/pai/pai-deploy-inference
Language: en
Description: You have a trained machine learning model and want to expose it as an online API endpoint for real-time predictions. You may be working with a single model file or a full pipeline that includes…
---

# Deploy inference

Part of **Platform for AI (PAI)**. Route queries via `POST https://company-skill.com/api/route`.

## What You Want to Do

You have a trained machine learning model and want to expose it as an online API endpoint for real-time predictions. You may be working with a single model file or a full pipeline that includes preprocessing and postprocessing steps.

**Typical User Questions**:
- How do I deploy my trained model as an API endpoint?
- Can I deploy a pipeline model in PAI?

## Decision Tree

Pick the best path for your situation:

- **If** you are using the PAI console and want to deploy a single registered model to Elastic Algorithm Service (EAS) with no code → Use Model Gallery EAS (go to *pai/pai-model*)
- **If** your solution requires chaining preprocessing, model inference, and postprocessing into one unified service → Use ML Pipeline (go to *pai/pai-model*)
- **If** you need to integrate model deployment into CI/CD using programmatic calls and your model is in a format like SavedModel, ONNX, or TorchScript → Use REST API (go to *pai/pai-model*)
- **Otherwise (default)** → Start with **Model Gallery EAS**, as it’s the simplest no-code option for most single-model use cases.
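The decision tree above can be read as a small function. The following is a minimal sketch, not part of PAI: `DeploymentNeeds` and `choose_path` are hypothetical names, and checking the pipeline condition before the CI/CD condition is an assumption based only on the order of the list.

```python
from dataclasses import dataclass


@dataclass
class DeploymentNeeds:
    """Hypothetical summary of a deployment scenario (not a PAI type)."""
    needs_pipeline: bool = False  # chain preprocessing, inference, postprocessing
    needs_cicd: bool = False      # programmatic calls from a CI/CD system


def choose_path(needs: DeploymentNeeds) -> str:
    """Return the path name suggested by the decision tree."""
    if needs.needs_pipeline:
        return "ML Pipeline"
    if needs.needs_cicd:
        return "REST API"
    # Default branch; it also covers the console single-model case.
    return "Model Gallery EAS"


print(choose_path(DeploymentNeeds(needs_cicd=True)))  # REST API
```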

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| Model Gallery EAS | Single registered model deployed to Elastic Algorithm Service (EAS) | low | No | No | No-code deployment via PAI console | `pai/guide/pai-model` |
| ML Pipeline | Chained preprocessing, inference, and postprocessing | medium | No | No | Supports unified pipeline deployment without code | `pai/guide/pai-model` |
| REST API | CI/CD integration via programmatic calls | medium | Yes | Yes | Supports formats: SavedModel, ONNX, TorchScript, PMML, Keras H5, etc. | `pai/api/pai-model` |

## Path Details

### Path 1: Model Gallery EAS

**Best For**: Deploying a single registered model to Elastic Algorithm Service (EAS)

**Brief Description**: This is a no-code method that uses the PAI Model Gallery interface to deploy a previously registered model directly to Elastic Algorithm Service (EAS). It is ideal for standard single-model inference scenarios.

**When to Use**:  
- You have a model already registered in Model Gallery  
- You want immediate deployment without writing code  
- Your use case involves a single inference model (not a pipeline)

**When NOT to Use**:  
- You need to automate deployment across environments  
- Your model requires custom preprocessing logic not embedded in the model  
- You require programmatic control over versioning or metadata

### Path 2: ML Pipeline

**Best For**: Chaining preprocessing, model inference, and postprocessing into one unified service

**Brief Description**: This approach allows you to deploy an entire machine learning workflow—including data transformation, model inference, and result formatting—as a single online service. It uses PAI’s visual pipeline tools and requires no code.

**When to Use**:  
- Your prediction requires input normalization or feature engineering before inference  
- You combine multiple models or logic steps in sequence  
- You prefer GUI-based orchestration over scripting

**When NOT to Use**:  
- You only need to serve a standalone model file  
- You require integration with external automation systems  
- You need fine-grained control over Docker images or runtime environments

### Path 3: REST API 

**Best For**: Integrating model registration into CI/CD through programmatic calls

**Brief Description**: This method uses PAI Model Management REST APIs such as `CreateModel` and `CreateModelVersion` to register and manage models programmatically. It supports structured metadata via `FrameworkType`, `FormatType`, and `InferenceSpec`. Requests authenticate with an `Authorization: Bearer <your_api_key>` header, and the `DASHSCOPE_API_KEY` environment variable must be set in your `AIWorkSpace`.

**Key technical facts**:  
- Billing: Per-request; each API call counts as one request regardless of success or failure.  
- Supported model formats: SavedModel, ONNX, TorchScript, PMML, Keras H5, Frozen Pb, Caffe Prototxt, XGBoost, AlinkModel, OfflineModel  
- Auth method: `Authorization: Bearer <your_api_key>`  
- Regions available: `cn-hangzhou`, `cn-shanghai`, `cn-beijing`  
- Prerequisites: `DASHSCOPE_API_KEY` environment variable set, RAM permissions for model operations  
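To illustrate the authentication and region facts above, here is a minimal sketch that builds, but does not send, an authenticated request. Only the `Authorization: Bearer` header, the `DASHSCOPE_API_KEY` variable, and the region list come from this page. The `base_url` argument, the `/models` path, and the `ModelName` body field are hypothetical placeholders, so check the detail skill for the real endpoint and parameters.

```python
import json
import os
import urllib.request

# Regions listed in the key technical facts above.
SUPPORTED_REGIONS = {"cn-hangzhou", "cn-shanghai", "cn-beijing"}


def build_create_model_request(
    base_url: str, region: str, model_name: str
) -> urllib.request.Request:
    """Build (without sending) an authenticated POST request.

    base_url, the "/models" path, and the "ModelName" field are
    hypothetical placeholders, not confirmed PAI API details.
    """
    # The region is only validated here; it is not placed in the URL.
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"region {region!r} is not in the documented list")

    api_key = os.environ.get("DASHSCOPE_API_KEY")
    if not api_key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set")

    body = json.dumps({"ModelName": model_name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/models",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Failing early on a missing key or an unlisted region keeps a CI/CD job from issuing a billed request that cannot succeed, since each call counts as one request regardless of outcome.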

**When to Use**:  
- You need programmatic model registration for CI/CD pipelines  
- You work with models in supported formats (SavedModel, ONNX, TorchScript, etc.)  
- You require fine-grained control over model metadata, labels, and versions  
- You automate model management across multiple workspaces  

**When NOT to Use**:  
- You need an immediate inference endpoint without a separate deployment step  
- You work with custom Docker images or unsupported model formats  
- You prefer GUI-based deployment without writing code  
- You require auto-scaling or A/B testing configuration during deployment  

**Known Limitations**:  
- Does not support direct deployment to EAS — only model registration and version management  
- No built-in inference endpoint creation — requires separate deployment step  
- Metrics field limited to 8192 characters after serialization  
- TensorBoard shared URLs have maximum validity of 604800 seconds (7 days)  
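The two numeric limits above can be checked on the client before calling the API. This is a minimal sketch with hypothetical helper names. It assumes the metrics are serialized as JSON, which this page does not confirm; it states only that the limit applies after serialization.

```python
import json

MAX_METRICS_CHARS = 8192        # limit on the serialized Metrics field
MAX_SHARE_URL_SECONDS = 604800  # 7 days * 24 hours * 3600 seconds


def metrics_fit(metrics: dict) -> bool:
    """Return True if the JSON-serialized metrics stay within the limit."""
    return len(json.dumps(metrics)) <= MAX_METRICS_CHARS


def clamp_share_validity(seconds: int) -> int:
    """Cap a requested TensorBoard share-URL lifetime at the maximum."""
    return min(seconds, MAX_SHARE_URL_SECONDS)
```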

## FAQ

Q: Which path should I start with?  
A: If you’re new to PAI and deploying a single model, start with **Model Gallery EAS**. It’s the fastest no-code option.

Q: What if I want to register a scikit-learn model saved as a `.pkl` file through the REST API path?  
A: You’ll hit a limitation: the REST API path supports only the listed formats, such as SavedModel, ONNX, and PMML. Pickle files are not listed, so the model is likely to be rejected unless you first convert it to a supported format.
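A format check like this can run before any API call. The sketch below compares a format name against the formats listed for the REST API path; `is_supported_format` is a hypothetical helper, and the exact strings that `FormatType` accepts may differ from these display names.

```python
# Display names of the formats listed for the REST API path.
SUPPORTED_FORMATS = {
    "SavedModel", "ONNX", "TorchScript", "PMML", "Keras H5",
    "Frozen Pb", "Caffe Prototxt", "XGBoost", "AlinkModel", "OfflineModel",
}

# Lowercased copy so the comparison ignores case.
_SUPPORTED_LOWER = {name.lower() for name in SUPPORTED_FORMATS}


def is_supported_format(name: str) -> bool:
    """Return True if the format name appears in the documented list."""
    return name.strip().lower() in _SUPPORTED_LOWER


print(is_supported_format("ONNX"))    # True
print(is_supported_format("pickle"))  # False
```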

Q: What if I need an immediate inference endpoint but chose the REST API path?  
A: You’ll find that the API only registers the model (`CreateModel`, `CreateModelVersion`) but doesn’t create an endpoint. You must perform a separate deployment step to EAS, adding complexity.

Q: Can I use the REST API without setting `DASHSCOPE_API_KEY`?  
A: No — the `Authorization: Bearer` header requires a valid API key, and the `DASHSCOPE_API_KEY` environment variable is a prerequisite for authentication in your `AIWorkSpace`.

Q: Does the pipeline deployment support custom Python packages?  
A: Documentation does not specify — see the detail skill for environment customization options.

Q: Are all three paths available in the `cn-beijing` region?  
A: The REST API path is confirmed available in `cn-hangzhou`, `cn-shanghai`, and `cn-beijing`. Region availability for the GUI paths is not documented — check the detail skill.

## Related queries

deploy model, deploy ML model, model deployment, serve model, model serving, publish model, model online, how to deploy, where to deploy, can I deploy, what is deployment, how do I serve, EAS deploy, Designer deployment, deply model, deploye model, modle deploy, make API from model, model as service

---
Part of [Platform for AI (PAI)](https://company-skill.com/p/pai.md) · https://company-skill.com/llms.txt
