---
Title: Manage data
URL Source: https://company-skill.com/p/pai/pai-manage-data
Language: en
Description: You need to either manage the lifecycle and metadata of your training datasets (e.g., create versions, tag files, configure acceleration) or perform data transformations and statistical analysis…
---

# Manage data

Part of **Platform for AI (PAI)**. Route queries via `POST https://company-skill.com/api/route`.

## What You Want to Do

You need to either **manage the lifecycle and metadata of your training datasets** (e.g., create versions, tag files, configure acceleration) or **perform data transformations and statistical analysis** (e.g., encode strings, compute correlations, visualize distributions).

**Typical User Questions**:
- How do I preprocess data before training in PAI?
- Can I run statistical analysis on my dataset in PAI?
- How to calculate feature correlation or perform normality tests?
- How to encode string features or handle missing values without code?

## Decision Tree

Pick the best path for your situation:

- **If** you need to programmatically create, version, or manage dataset metadata using scripts or CI/CD pipelines → Use [API] (go to *pai/pai-dataset*)
- **If** your data source is **OSS** or **NAS** and you require fine-grained control over **SlotLifeCycle**, **EndpointId**, or **SlotId** → Use [API] (go to *pai/pai-dataset*)
- **If** you want to run **Normality Test**, **Pearson Coefficient**, **Box Plot**, or **Histogram** without writing code → Use [Console / Dashboard] (go to *pai/pai-processing*)
- **If** you need to use components like **MTable Assembler**, **Data Pivoting**, **Columns to vector**, or **Imputer Train** in a visual workflow → Use [Console / Dashboard] (go to *pai/pai-processing*)
- **Otherwise (default)** → Start with **Console / Dashboard** if you're exploring data or lack programming resources; use the API path only if automation or integration is required.
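
The decision tree above can be sketched as a small function. This is an illustrative sketch only: the `Situation` class, its field names, and `choose_path` are hypothetical helpers invented for this example, not part of any PAI SDK. The rules are checked top to bottom, matching the order of the bullets.

```python
from dataclasses import dataclass

# Hypothetical summary of a user's needs; all field names are illustrative.
@dataclass
class Situation:
    needs_automation: bool = False         # scripts or CI/CD pipelines
    needs_slot_control: bool = False       # SlotLifeCycle, EndpointId, or SlotId
    data_source: str = ""                  # e.g. "OSS" or "NAS"
    needs_no_code_stats: bool = False      # Normality Test, Pearson Coefficient, ...
    needs_visual_components: bool = False  # MTable Assembler, Data Pivoting, ...


def choose_path(s: Situation) -> str:
    """Return the detail skill suggested by the decision tree."""
    if s.needs_automation:
        return "pai/pai-dataset"
    if s.needs_slot_control and s.data_source.upper() in {"OSS", "NAS"}:
        return "pai/pai-dataset"
    if s.needs_no_code_stats or s.needs_visual_components:
        return "pai/pai-processing"
    # Default branch: explore with the no-code path first.
    return "pai/pai-processing"
```

For example, `choose_path(Situation(needs_slot_control=True, data_source="oss"))` selects the API skill, while an empty `Situation()` falls through to the default.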

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|--------------|
| API | Programmatic dataset creation, versioning, and metadata management | medium | Yes | Yes | Only supports **OSS** and **NAS** as data sources for acceleration slots | `pai/api/pai-dataset` |
| Console / Dashboard | No-code statistical analysis and data transformations | low | No | No | Includes components like **Normality Test**, **Pearson Coefficient**, and **Box Plot** | `pai/guide/pai-processing` |

## Path Details

### Path 1: API 

**Brief Description**: The PAI Dataset Acceleration API is a RESTful service that enables programmatic management of dataset metadata, versions, and acceleration slots. It supports operations like creating datasets, adding labels, and configuring **SlotLifeCycle** policies. Key APIs include **DescribeEndpoint**, **UnbindEndpoint**, and **SlotLifeCycle**, and it requires authentication via **Bearer Token** using the **DASHSCOPE_API_KEY** environment variable.

**Key technical facts**:
- Billing: API DescribeEndpoint 1000 UnbindEndpoint 100 
- Auth method: Bearer Token (Authorization: Bearer $DASHSCOPE_API_KEY)
- Regions available: cn-hangzhou, cn-shanghai, ap-southeast-1
- Prerequisites: a `DASHSCOPE_API_KEY` and an OSS or NAS data source
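
As a minimal sketch of the Bearer Token scheme listed above, the following builds (but does not send) an authenticated request using only the Python standard library. The URL is a placeholder and `build_request` is a hypothetical helper; the actual endpoint paths and response formats are covered in the *pai/pai-dataset* skill and are not assumed here.

```python
import os
import urllib.request
from typing import Optional


def build_request(url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request carrying the Bearer token header.

    The key is read from the DASHSCOPE_API_KEY environment variable
    unless one is passed explicitly.
    """
    key = api_key or os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {key}"},
        method="GET",
    )


# Placeholder URL for illustration only; substitute the real endpoint.
request = build_request("https://example.invalid/datasets", api_key="test-key")
```

Reading the key from the environment keeps it out of scripts and CI/CD configuration files; pass the resulting object to `urllib.request.urlopen` once the real endpoint is known.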

**Known Limitations**:
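- Acceleration slots accept only **OSS** and **NAS** data sources; `DataSourceType` lists OSS, NAS, and CPFS, but only OSS/NAS can be used.
- Label keys must not start with `aliyun` or `acs`, must not contain `http://` or `https://`, and are limited to 128 characters.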
- The `GetDataset` API is limited to 100 QPS.

### Path 2: Console / Dashboard

**Brief Description**: This path uses **Machine Learning Designer** in PAI, offering a no-code visual interface with prebuilt components. You can drag and drop modules like **MTable Assembler**, **Data Pivoting**, **Normality Test**, **Pearson Coefficient**, **Box Plot**, **Histogram**, **Columns to vector**, and **Imputer Train** to build data workflows. Components support tasks like missing value imputation, feature scaling, statistical testing, and visualization through **Field Setting** and **Execution Tuning**.

**Key technical facts**:
- Billing: MTable Assembler Pearson Coefficient 1000 
- Auth method: SSO through the PAI console UI
- Prerequisites: access to PAI; **Normality Test** requires DOUBLE/BIGINT input columns

**When NOT to Use**:
- You need automation or integration with external systems; use the **pai-dataset** API instead

- Columns to vector MTable Expander STRING MTABLE
- Box Plot Machine Learning Studio 

## FAQ

Q: Which path should I start with?
A: If you're exploring your data, running statistics, or lack coding resources, start with **Console / Dashboard**. Only choose the API path if you need to automate dataset creation/versioning or integrate with external systems.

Q: What if I need to compute feature correlation but used the API path?
A: You'll hit a dead end — the **pai-dataset** API manages metadata and acceleration slots but cannot compute **Pearson Coefficient** or run **Normality Test**. These require the visual components in **pai-processing**.
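
For reference, the statistic behind **Pearson Coefficient** is the standard Pearson correlation. The sketch below computes it locally in plain Python so you can sanity-check a small sample. It is not the Designer component's implementation, and the `pearson` helper is a name chosen for this example.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of std deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)
```

A perfectly linear pair such as `[1, 2, 3, 4]` and `[2, 4, 6, 8]` yields a coefficient of 1 (up to floating-point rounding), and reversing one column flips the sign.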

Q: What if my data is in HDFS but I chose the API path?
A: You’ll fail during setup — the API only supports **OSS** and **NAS** as data sources for acceleration slots. HDFS is not supported, so dataset creation will error out.
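
One way to catch this before calling the API is a client-side pre-flight check. The sketch below is a hypothetical helper, not part of any PAI SDK; it only encodes the OSS/NAS restriction stated above.

```python
# Data sources the API path accepts for acceleration slots, per this page.
SUPPORTED_ACCELERATION_SOURCES = frozenset({"OSS", "NAS"})


def check_data_source(data_source_type: str) -> None:
    """Raise ValueError early if the source cannot back an acceleration slot."""
    normalized = data_source_type.strip().upper()
    if normalized not in SUPPORTED_ACCELERATION_SOURCES:
        supported = ", ".join(sorted(SUPPORTED_ACCELERATION_SOURCES))
        raise ValueError(
            f"{data_source_type!r} is not supported for acceleration slots; "
            f"use one of: {supported}"
        )
```

Calling `check_data_source("HDFS")` raises immediately, which is cheaper than discovering the failure during dataset creation.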

Q: Can I use **MTable Assembler** or **Data Pivoting** in the API path?
A: No — these are exclusive to **Machine Learning Designer**. The API path has no equivalent for assembling tables or pivoting data; it only handles dataset-level metadata.

Q: What happens if I try to delete a single version in the API path?
A: You can't; the restriction involves version **v1**. This limitation forces full dataset deletion if you need to remove an old version.

Q: Do I need **DASHSCOPE_API_KEY** for statistical components like **Box Plot**?
A: No. Statistical analysis uses **SSO** authentication in the PAI console. **DASHSCOPE_API_KEY** and **Bearer Token** are only required for the **pai-dataset** API calls.

Q: Can I combine both paths in one workflow?
A: Yes — you can use the API to create and version a dataset stored in **OSS**, then load it into **Machine Learning Designer** to apply **Imputer Train**, **Histogram**, or other components for analysis.

## Related queries

manage training data, preprocess dataset, version training data, encode string features, compute feature correlation, statistical analysis on data, create dataset version, data preprocessing PAI, how to clean data in PAI, can I run normality test, calculate Pearson correlation, handle missing values

---
Part of [Platform for AI (PAI)](https://company-skill.com/p/pai.md) · https://company-skill.com/llms.txt
