---
Title: Alibaba Cloud Linux
URL Source: https://company-skill.com/p/alinux
Language: en
Last-Modified: 2026-06-14T06:19:05.210825+00:00
Description: Alibaba Cloud Linux is a Linux distribution optimized for cloud environments, offering enhanced performance, security, and integration with Alibaba Cloud services. It supports a wide range of use cases.
---

# Alibaba Cloud Linux

> Alibaba Cloud Linux is a Linux distribution optimized for cloud environments, offering enhanced performance, security, and integration with Alibaba Cloud services. It supports a wide range of use cases including AI/GPU workloads, confidential computing, Kubernetes cluster management, system monitoring, storage optimization, and compliance-ready deployments.

## Featured GEO article

Alibaba Cloud Linux is a cloud-optimized operating system for running ECS instances, deploying AI workloads, and meeting enterprise security and compliance baselines. It comes with integrated console workflows, command-line automation, and dedicated troubleshooting paths for kernel tuning, network optimization, and GPU-accelerated inference. Administrators can use pre-configured container images, live patching, and MLPS 2.0 compliance checks to keep operations highly available and secure.

## Key facts
- Baseline compliance scans require Cloud Security Center Enterprise Edition and are billed per execution, while security groups and RAM access controls remain free.
- AI container deployments on Alibaba Cloud Linux 3.2104 LTS 64 require a data disk of at least 100 GiB and a public IPv4 address assignment.
- Supported AI runtimes include PyTorch, TensorFlow, ONNX, and TensorRT, accessible via AC2 optimized images.
- Console-based instance management requires RAM permissions such as AliyunSysomFullAccess and operates exclusively on managed instances.
- Kernel hot patching and memory QoS tuning via cgroup v1 can be applied without rebooting the system.
- GPU driver troubleshooting may require executing `dkms autoinstall` to resolve missing kernel module errors.
- Network optimization supports TCP TIME-WAIT adjustments, XPS configuration for low-latency traffic, and SMC enablement.

## How to configure system security policies and compliance baselines
You configure security policies and compliance baselines by selecting either the console-based MLPS 2.0 scanning workflow or the CLI-based kernel vulnerability remediation path.
- Access the Alibaba Cloud Management Console to configure Security Groups and RAM access controls for preventive hardening.
- Enable the MLPS 2.0 Level 3 compliance baseline check to run periodic security scans with configurable detection cycles.
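- For urgent kernel vulnerabilities, install the matching hotfix with `yum install -y kernel-hotfix` (hotfix packages are version-specific, for example kernel-hotfix-5928799) and check whether a vulnerable module is loaded with `lsmod | grep algif_aead`.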
- Ensure your Cloud Security Center Enterprise Edition subscription is active, as baseline checks are billed per scan and will fail with a 403 error otherwise.
- Subscribe to CVE announcements through browser extensions to stay informed about required patches and module disabling procedures.
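A script can perform the same check as `lsmod | grep algif_aead` by reading `/proc/modules`, which `lsmod` formats for display; the module name is the first field on each line. The sketch below is a minimal illustration under that assumption, and the helper names are hypothetical. Unlike `grep`, it matches whole names only, so `algif` does not match `algif_aead`.

```python
from pathlib import Path
from typing import Optional, Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Return the module names listed in /proc/modules-style text (first field of each line)."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def module_is_loaded(name: str, proc_modules_text: Optional[str] = None) -> bool:
    """Whole-name check, equivalent in intent to `lsmod | grep algif_aead`."""
    if proc_modules_text is None:
        proc_modules_text = Path("/proc/modules").read_text()
    return name in loaded_modules(proc_modules_text)
```

On the instance, `module_is_loaded("algif_aead")` reads the live `/proc/modules`.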

## How to deploy AI models for inference or training
You deploy AI models by launching pre-optimized AC2 containers through the console or pulling base images via Docker CLI.
- Select the AC2 AICPU/GPU path in the console for low-complexity deployment on Alibaba Cloud Linux 3.2104 LTS 64.
- Choose a GPU-capable instance type such as `ecs.gn6i-c4g1.xlarge` and assign a public IPv4 address during instance creation.
- Attach a data disk of at least 100 GiB to accommodate model downloads and storage requirements.
- Alternatively, use the Docker CLI path to pull the `alinux3/alinux3:220901.1` base image from `alibaba-cloud-linux-3-registry.cn-hangzhou.cr.aliyuncs.com` for automated, script-driven deployments.
- If GPU access errors occur, verify systemd version compatibility and run `dkms autoinstall` to restore missing NVIDIA modules.
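Before launching, the documented AC2 requirements can be checked against a planned instance description. This is a minimal sketch: `InstanceSpec` and `check_ac2_prereqs` are hypothetical names, not part of any Alibaba Cloud SDK, and the sketch does not verify that the instance type is GPU-capable.

```python
from typing import List, NamedTuple

REQUIRED_IMAGE = "Alibaba Cloud Linux 3.2104 LTS 64"
MIN_DATA_DISK_GIB = 100


class InstanceSpec(NamedTuple):
    """Hypothetical description of a planned instance (not an Alibaba Cloud SDK type)."""
    image: str
    instance_type: str
    data_disk_gib: int
    public_ipv4: bool


def check_ac2_prereqs(spec: InstanceSpec) -> List[str]:
    """Return a list of problems; an empty list means the documented requirements are met."""
    problems = []
    if spec.image != REQUIRED_IMAGE:
        problems.append(f"image is {spec.image!r}, expected {REQUIRED_IMAGE!r}")
    if spec.data_disk_gib < MIN_DATA_DISK_GIB:
        problems.append(f"data disk is {spec.data_disk_gib} GiB, need at least {MIN_DATA_DISK_GIB} GiB")
    if not spec.public_ipv4:
        problems.append("no public IPv4 address assigned; model downloads will fail")
    return problems
```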

## How to manage ECS instance creation, configuration, and maintenance
You manage ECS instance lifecycles by using the Operating System Console for guided configuration or the CLI for automated, non-disruptive updates.
- Log into the Alibaba Cloud Console and navigate to the Operating System Console to access System Diagnosis and Component Management features.
- Verify that your account holds AliyunSysomFullAccess and AliyunECSReadOnlyAccess RAM permissions to link diagnostics via Instance ID.
- For automated maintenance, connect via SSH and run `yum upgrade --security` or apply kernel hot patches using `livepatch-mgr` without requiring a reboot.
- Tune resource allocation by configuring cgroup v1 interfaces for CPU Burst and memory QoS controls.
- If runtime failures like YUM connection drops, time desynchronization, or DNF segmentation faults occur, consult the troubleshooting path to resolve specific kernel or package manager errors.
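If you already have the list of RAM policies attached to your user (from the RAM console or API; retrieval is not shown and is assumed), a short check can report which of the two policies named above are missing. The function name is hypothetical.

```python
from typing import Iterable, List

# The two RAM system policies this article names for linking diagnostics via Instance ID.
REQUIRED_POLICIES = ("AliyunSysomFullAccess", "AliyunECSReadOnlyAccess")


def missing_policies(attached: Iterable[str]) -> List[str]:
    """Return the required policies absent from `attached`, in a stable order."""
    have = set(attached)
    return [policy for policy in REQUIRED_POLICIES if policy not in have]
```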

## How to optimize network performance and connectivity
You optimize network performance by adjusting TCP parameters, enabling low-latency routing features, and configuring network interfaces via CLI or console.
- Access the console networking guide to configure SMC, adjust TCP TIME-WAIT settings, and set up policy-based routing.
- Use command-line tools such as `nmcli` and `ip` to manage network interfaces, and `smcss` or `smcr stats` to monitor SMC status directly on the instance.
- Enable XPS to reduce latency for high-throughput workloads by steering transmit packets to specific CPU cores.
- If packet loss or BBR congestion control issues arise, verify routing tables and interface bindings using the dedicated network troubleshooting workflow.
- Configure DNS and NIC settings through the console to ensure stable connectivity across classic and VPC networks.
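XPS is configured per transmit queue by writing a hexadecimal CPU bitmap to `/sys/class/net/<dev>/queues/tx-<n>/xps_cpus`. The sketch below computes those bitmaps for a simple round-robin policy (CPU `c` goes to queue `c % queues`). The policy, the device name, and the helper names are assumptions for illustration; this is not the Python 3 script that the Alibaba Cloud Linux guide uses.

```python
from typing import Dict, Iterable, List


def cpu_mask(cpus: Iterable[int]) -> str:
    """Hex CPU bitmap in the kernel's comma-separated 32-bit word format, e.g. '3' or '1,00000000'."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    if mask == 0:
        return "0"
    words: List[int] = []
    while mask:
        words.append(mask & 0xFFFFFFFF)  # lowest 32 CPUs first
        mask >>= 32
    words.reverse()  # most significant word is written first
    return ",".join([format(words[0], "x")] + [format(w, "08x") for w in words[1:]])


def xps_plan(dev: str, n_tx_queues: int, n_cpus: int) -> Dict[str, str]:
    """Round-robin policy: CPU c transmits on queue c % n_tx_queues. Returns {sysfs path: mask}."""
    plan = {}
    for queue in range(n_tx_queues):
        cpus = [c for c in range(n_cpus) if c % n_tx_queues == queue]
        plan[f"/sys/class/net/{dev}/queues/tx-{queue}/xps_cpus"] = cpu_mask(cpus)
    return plan
```

To apply a plan such as `xps_plan("eth0", 2, 4)`, write each mask to its path as root, for example with `Path(path).write_text(mask)`.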

## How to diagnose and resolve system performance issues
You diagnose performance bottlenecks by monitoring resource utilization, clearing cache, and applying targeted fixes for scheduling or memory fragmentation.
- Investigate high CPU or memory usage, scheduling jitter, and container resource discrepancies using the system performance monitoring tools.
- Clear the page cache and track active resource consumption via the command-line interface.
- Enable PSI and review crash logs to identify underlying system stress points and memory pressure.
- Resolve scheduling latency, memory fragmentation, or io_uring errors by following the system performance troubleshooting guides.
- For storage-related bottlenecks, tune dirty page writeback parameters and monitor disk I/O latency to prevent ext4 or NFS performance degradation.
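Once PSI is enabled, each resource exposes a file under `/proc/pressure/` (`cpu`, `memory`, `io`) with `some` and `full` lines of `avg10`, `avg60`, `avg300` (percentages) and `total` (stall time in microseconds) fields. A minimal parser, assuming that standard kernel format (helper names are hypothetical):

```python
from pathlib import Path
from typing import Dict


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI text into {'some': {'avg10': ..., 'avg60': ..., 'avg300': ..., 'total': ...}, 'full': {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {k: float(v) for k, v in (pair.split("=", 1) for pair in pairs)}
    return result


def read_psi(resource: str = "memory", root: str = "/proc/pressure") -> Dict[str, Dict[str, float]]:
    """Read one PSI file ('cpu', 'memory', or 'io'); raises OSError if it is missing or unreadable."""
    return parse_psi(Path(root, resource).read_text())
```

A rising `some` average for `memory` means some tasks are stalling on memory; `full` means all non-idle tasks stalled at the same time. Older kernels print only a `some` line for `cpu`, which the parser handles.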

## Frequently Asked Questions

**Q: How do I configure system security policies and compliance baselines?**
A: Use the Alibaba Cloud Management Console to set up Security Groups, RAM access controls, and MLPS 2.0 Level 3 baseline checks, or apply kernel hotfixes via CLI for immediate vulnerability remediation.

**Q: What's the best way to configure system security?**
A: The best approach combines free, preventive console-based controls with reactive CLI-based patching. Hotfixes installed with `yum install -y kernel-hotfix` must match the running kernel version exactly and apply without downtime.

**Q: How do I deploy AI models for inference or training?**
A: Deploy models by launching AC2 containers on Alibaba Cloud Linux 3.2104 LTS 64 with a GPU instance and 100 GiB data disk, or automate the process by pulling the `alinux3/alinux3:220901.1` image via Docker CLI.

**Q: What's the best way to deploy a model?**
A: The AC2 AICPU/GPU console path is best for low-complexity, pre-optimized deployments, while the Docker CLI path is optimal for full parameter control and automated scripting.

**Q: How do I manage ECS instance creation, configuration, and maintenance?**
A: Use the Operating System Console for guided setup and diagnostics, then switch to CLI commands like `livepatch-mgr` and cgroup v1 tuning for ongoing, non-disruptive maintenance.

**Q: What's the best way to manage the ECS lifecycle?**
A: Combine console-based system diagnosis with RAM-managed access controls and CLI-driven security updates to maintain instance health, handle boot failures, and apply hot patches without reboots.

**Q: How do I optimize network performance and connectivity?**
A: Adjust TCP TIME-WAIT settings, enable XPS for low-latency traffic, configure SMC for high throughput, and use `nmcli` or `ip` to monitor and troubleshoot routing or packet loss.

**Q: What's the best way to optimize network performance?**
A: Implement XPS and SMC through the console or CLI, then validate performance by diagnosing BBR congestion control and policy-based routing configurations.

**Q: How do I diagnose and resolve system performance issues?**
A: Monitor CPU and memory utilization, clear the page cache, enable PSI for pressure tracking, and apply targeted fixes for scheduling jitter, memory fragmentation, or io_uring errors.
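Clearing the page cache from a script amounts to running `sync` and then writing to `/proc/sys/vm/drop_caches`, where `1` frees the page cache, `2` frees reclaimable slab objects such as dentries and inodes, and `3` frees both. A minimal sketch (needs root; the function name is hypothetical, and `target` is a parameter only so the logic can be exercised against a scratch file):

```python
import os
from pathlib import Path

# Values accepted by /proc/sys/vm/drop_caches.
DROP_CACHES = {"pagecache": "1", "dentries_inodes": "2", "all": "3"}


def drop_caches(what: str = "pagecache", target: str = "/proc/sys/vm/drop_caches") -> str:
    """Run sync, then ask the kernel to free clean caches. Requires root on a real system."""
    value = DROP_CACHES[what]
    os.sync()  # drop_caches frees only clean pages, so flush dirty pages first
    Path(target).write_text(value)
    return value
```

Treat this as a diagnostic step: performance typically dips while the cache refills, so avoid running it on a schedule.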

**Q: What's the best way to diagnose system performance?**
A: Use CLI resource tracking alongside PSI logs and crash log analysis to pinpoint bottlenecks, then follow the dedicated troubleshooting workflows for storage, scheduling, or container resource discrepancies.

## Key terms
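- **AC2** (Alibaba Cloud AI Containers): pre-configured, optimized container images for AI workloads on ECS, ACK, or ECI.
- **MLPS 2.0** (Multi-Level Protection Scheme 2.0, 等保 2.0): China's graded cybersecurity protection compliance framework. The baseline checks in this article target its Level 3 (等保三级).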

Alibaba Cloud Linux is available as agent-callable skills via DaaS. Route any question to the best skill with `POST https://company-skill.com/api/route` `{"query": "...", "product": "alinux"}`.
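The routing call above can be assembled with Python's standard library. This sketch only builds the request; the `Content-Type` header is an assumption, and the response format is not described here, so parse the body only after inspecting it.

```python
import json
import urllib.request

ROUTE_URL = "https://company-skill.com/api/route"


def build_route_request(query: str, product: str = "alinux") -> urllib.request.Request:
    """Build (but do not send) the POST request with a {"query", "product"} JSON body."""
    body = json.dumps({"query": query, "product": product}).encode("utf-8")
    return urllib.request.Request(
        ROUTE_URL,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not documented above
        method="POST",
    )
```

Send it with `urllib.request.urlopen(build_route_request("How do I enable SMC?"), timeout=10)` and read the body with `.read().decode()`.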

## What you can do

### [Configure compliance](https://company-skill.com/p/alinux/alinux-configure-compliance.md)

## What You Want to Do

You want to either proactively configure security controls (like Security Groups, RAM permissions, and compliance baselines) or reactively fix specific vulnerabilities (such as CVEs requiring kernel hotfixes or module disabling) on Alibaba Cloud Linux.

**Typical User Questions**:
- How to apply security best practices on Alibaba Cloud Linux?
- How to patch known vulnerabilities like CVE-2021-33909?

## Decision Tree

Pick the best path for your situation:

- **If** you need to configure the **Alibaba Cloud Linux 3 MLPS compliance check** (Alibaba Cloud Linux 3等保合规检查) or the **MLPS Level 3 - Alibaba Cloud Linux 3 compliance baseline check** (等保三级-Alibaba Cloud Linux 3合规基线检查) via GUI → Use **Console / Dashboard** (go to *alinux/alinux-security*)
- **If** you must remediate a specific kernel vulnerability using commands like `yum install -y kernel-hotfix` or `lsmod | grep algif_aead` → Use **Troubleshooting** (go to *alinux/alinux-security*)
- **If** your system shows symptoms of CVEs involving the **algif_aead module** or requires **kpatch**-based live patching → Use **Troubleshooting** (go to *alinux/alinux-security*)
- **Otherwise (default)** → Start with **Console / Dashboard** if you're setting up preventive controls like **Security Groups** and **RAM** policies without immediate vulnerability symptoms.

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| Console / Dashboard | Preventive controls (Security Groups, RAM, baseline checks) | medium | No | No | Baseline check is billed per scan execution; security groups, RAM, and access control are free | `alinux/guide/alinux-security` |
| Troubleshooting | MLPS 2.0 Level 3 hardening and CVE remediation | high | Yes | No | Requires root access and exact kernel version matching for hotfixes like kernel-hotfix-5928799 | `alinux/troubleshooting/alinux-security` |

## Path Details

### Path 1: Console / Dashboard
**Brief Description**: This path uses the Alibaba Cloud Management Console to configure security policies, including **Security Groups**, **RAM** access control, and the **Alibaba Cloud Linux 3 MLPS compliance check** (Alibaba Cloud Linux 3等保合规检查). It enables periodic scanning via the **MLPS Level 3 - Alibaba Cloud Linux 3 compliance baseline check** (等保三级-Alibaba Cloud Linux 3合规基线检查) and supports subscribing to CVE announcements through browser extensions.

**Key technical facts**:
- Billing: Baseline check is billed per scan execution; other features like security groups, RAM, and access control are free

**When to Use**:
- User needs to configure MLPS 2.0 Level 3 compliance baseline checks via GUI
- Administrator wants to set up security groups and RAM access control without writing code
- Team requires periodic compliance scanning with configurable detection cycles and time windows
- User prefers browser-based workflow for subscribing to CVE announcements

**When NOT to Use**:
- Immediate kernel vulnerability remediation is required (use troubleshooting path)
- System hardening must be scripted or automated (this path is manual-only)
- User lacks Cloud Security Center Enterprise Edition (baseline checks will fail with 403 error)
- Root-level CLI access is needed for low-level kernel parameter tuning

**Known Limitations**:
- Baseline checks require Cloud Security Center Enterprise Edition and incur per-scan fees
- Security group guidance opens only essential ports (e.g., 22, 80), and source IP ranges must be restricted manually
- RAM users and permissions must be created and assigned manually, following the principle of least privilege
- CVE subscription requires third-party RSS reader browser extensions
- No automation support — all steps require manual console navigation and form filling
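Because security group hardening on this path is manual, a small offline audit can help review rules before or after editing them in the console. The sketch assumes a simplified, hypothetical rule shape (`InboundRule`) rather than the ECS API data model, treats 22 and 80 as the essential ports named above, and flags SSH rules that are open to any address.

```python
import ipaddress
from typing import Iterable, List, NamedTuple

ALLOWED_PORTS = {22, 80}  # essential ports named in this article; adjust for your workload
IP_RESTRICTED_PORTS = {22}  # ports that should never be open to every address


class InboundRule(NamedTuple):
    """Hypothetical, simplified inbound rule (not the ECS API data model)."""
    port: int
    source_cidr: str


def audit_rules(rules: Iterable[InboundRule]) -> List[str]:
    """Return human-readable findings; an empty list means no issues were detected."""
    findings = []
    for rule in rules:
        network = ipaddress.ip_network(rule.source_cidr, strict=False)
        if rule.port not in ALLOWED_PORTS:
            findings.append(f"port {rule.port} is open but is not an essential port")
        elif rule.port in IP_RESTRICTED_PORTS and network.prefixlen == 0:
            findings.append(f"port {rule.port} is open to {rule.source_cidr}; restrict the source IPs")
    return findings
```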

### Path 2: Troubleshooting
**Best For**: MLPS 2.0 Level 3 hardening and CVE remediation

**Brief Description**: This path uses command-line tools to remediate specific kernel vulnerabilities such as those requiring **kernel-hotfix-5928799**, disable dangerous modules like the **algif_aead module** (verified via `lsmod | grep algif_aead`), and harden systems for **MLPS 2.0 Level 3** compliance. It leverages **kpatch** for live patching without reboot.

**Key technical facts**:
- Auth method: Root or sudo privileges required for system hardening and module manipulation

**When to Use**:
- System exhibits symptoms of known CVEs (e.g., kernel panic, privilege escalation)
- Immediate online remediation is needed without rebooting (via kpatch/livepatch)
- User must disable specific kernel modules (e.g., algif_aead, AF_ALG) to mitigate vulnerabilities
- Compliance failure requires verification of kernel parameters (e.g., user namespaces hardening)

**When NOT to Use**:
- User lacks CLI access or root privileges
- Goal is proactive policy setup rather than reactive vulnerability fixing
- Team requires automated, repeatable compliance enforcement (this path is manual CLI)
- No Cloud Security Center Enterprise Edition available for MLPS 2.0 validation

**Known Limitations**:
- Requires root access and deep Linux system administration knowledge
- Hotfix installation is version-specific and requires exact kernel version matching
- Module disabling (e.g., algif_aead) may break dependent workloads if not assessed first
- MLPS 2.0 Level 3 compliance checks fail with 403 error without Enterprise Edition
- No GUI support — all operations must be performed via CLI commands

## FAQ

Q: Which path should I start with?
A: Start with the **Console / Dashboard** path if you’re building a new secure environment or have no active vulnerability symptoms. Only choose the troubleshooting path if you’ve confirmed a specific CVE or failed a compliance scan.

Q: What if I need to disable the **algif_aead module** but used the guide path?
A: You’ll hit a dead end — the guide path offers no CLI access or module control. You must switch to the troubleshooting path to run `lsmod | grep algif_aead` and apply mitigations.

Q: What if I don’t have Cloud Security Center Enterprise Edition but try to run the **MLPS Level 3 - Alibaba Cloud Linux 3 compliance baseline check** (等保三级-Alibaba Cloud Linux 3合规基线检查)?
A: The baseline check will fail with a 403 error in both paths — Enterprise Edition is mandatory for **MLPS 2.0 Level 3** validation.

Q: Can I automate the **Alibaba Cloud Linux 3 MLPS compliance check** (Alibaba Cloud Linux 3等保合规检查) using scripts?
A: No — the guide path is entirely manual console navigation. If you need automation, neither path currently supports it; consider infrastructure-as-code outside these workflows.

Q: What happens if I apply `yum install -y kernel-hotfix` without verifying my kernel version?
A: The hotfix may fail to install or cause instability — the troubleshooting path requires exact version matching (check with `uname -r` first).
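The exact-match requirement can be enforced in a script before running `yum install`. `platform.release()` returns the same string as `uname -r`. The function name is hypothetical, and the target release must come from the hotfix's advisory.

```python
import platform
from typing import Optional


def kernel_matches(target_release: str, running_release: Optional[str] = None) -> bool:
    """True only when the running kernel release equals the hotfix's target release exactly."""
    if running_release is None:
        running_release = platform.release()  # same string `uname -r` prints
    return running_release.strip() == target_release.strip()
```

For example, `if not kernel_matches("5.10.134-16.al8.x86_64"): raise SystemExit("kernel mismatch")`, where the release string is a hypothetical placeholder.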

Q: Is **RAM** configuration possible in the troubleshooting path?
A: No — **RAM** and **Security Groups** are managed exclusively via the console in the guide path. The troubleshooting path focuses solely on OS/kernel-level fixes.

Q: What if I configure **Security Groups** but selected the troubleshooting path?
A: You’ll be unable to manage Security Groups — they are only configurable via the console in the guide path.

Q: What if I lack Cloud Security Center Enterprise Edition but selected the guide path for the **Alibaba Cloud Linux 3 MLPS compliance check** (Alibaba Cloud Linux 3等保合规检查)?
A: The compliance baseline check will fail with a 403 error — Enterprise Edition is required.

### [Deploy model](https://company-skill.com/p/alinux/alinux-deploy-model.md)

## What You Want to Do

You want to run AI workloads—such as Qwen-7B or ChatGLM3-6B—for inference or training on Alibaba Cloud Linux, using either CPU or GPU acceleration, and need to choose the most suitable deployment method.

**Typical User Questions**:
- How do I deploy a GPU-accelerated AI model on Alibaba Cloud Linux?
- Can I run ChatGLM3-6B on CPU instances?
- Is there a one-click way to deploy AI models?

## Decision Tree

Pick the best path for your situation:

- **If** you are using **Alibaba Cloud Linux 3.2104 LTS 64**, want to deploy **Qwen-7B-Chat or ChatGLM3-6B**, and need **GPU Diagnostics** with minimal setup → Use **AC2 AICPU/GPU** (go to *alinux/alinux-ai*)
- **If** you need to **automate deployment via script** or fully control container parameters using **docker pull** from **alibaba-cloud-linux-3-registry.cn-hangzhou.cr.aliyuncs.com** → Use **Docker CLI** (go to *alinux/alinux-instance*)
- **If** you see errors like **"GPU Access Denied"** or **"modprobe: FATAL: Module nvidia not found"** and your **systemd version is below systemd-239-68.0.2.al8.1** → Use **GPU** (go to *alinux/alinux-gpu*)
- **Otherwise (default)** → Start with **AC2 AICPU/GPU**, as it provides pre-optimized images, console-based **Create Instance** workflow, and built-in support for common AI frameworks on compatible hardware like **ecs.gn6i-c4g1.xlarge**.

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| AC2 AICPU/GPU | AI | low | No | Yes | Requires **Assign Public IPv4 Address** and **Data Disk** (≥100 GiB) for model download and storage | `alinux/guide/alinux-ai` |
| Docker CLI | Scripted deployment with full container control | medium | Yes | Yes | Pulls base image **alinux3/alinux3:220901.1** from **alibaba-cloud-linux-3-registry.cn-hangzhou.cr.aliyuncs.com** | `alinux/cli/alinux-instance` |
| GPU | GPU | high | Yes | No | Fixes issues like **nvidia-smi** failure due to missing kernel modules via **dkms autoinstall** | `alinux/troubleshooting/alinux-gpu` |

## Path Details

### Path 1: AC2 AICPU/GPU

**Best For**: AI

**Brief Description**: Alibaba Cloud AI Containers (AC2) provide pre-configured, optimized container images for AI workloads on ECS, ACK, or ECI. You launch them via the **Create Instance** console flow on **Alibaba Cloud Linux 3.2104 LTS 64**, selecting GPU-capable instance types like **ecs.gn6i-c4g1.xlarge**. The setup includes **GPU Diagnostics** and requires you to **Assign Public IPv4 Address** and attach a **Data Disk**.

**Key technical facts**:
- Runtimes: PyTorch, TensorFlow, ONNX, TensorRT

**When to Use**:
- You want to deploy Qwen-7B-Chat or ChatGLM3-6B with minimal setup

### Path 2: Docker CLI

**Brief Description**: This path uses the **docker pull** command to fetch the base OS image **alinux3/alinux3:220901.1** from the registry **alibaba-cloud-linux-3-registry.cn-hangzhou.cr.aliyuncs.com**. It gives full control over container runtime flags but does not include AI-specific optimizations or GPU guidance.
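For scripted deployments, the pull can be driven from Python. The registry, repository, and tag are the ones given above; building an argument list (instead of a shell string) avoids quoting issues. Function names are hypothetical, and the image is a base OS image, so AI frameworks and GPU drivers still have to be added separately.

```python
from typing import List

REGISTRY = "alibaba-cloud-linux-3-registry.cn-hangzhou.cr.aliyuncs.com"
IMAGE = "alinux3/alinux3"
TAG = "220901.1"


def image_ref(registry: str = REGISTRY, image: str = IMAGE, tag: str = TAG) -> str:
    """Fully qualified reference in Docker's registry/repository:tag form."""
    return f"{registry}/{image}:{tag}"


def docker_pull_argv(ref: str) -> List[str]:
    return ["docker", "pull", ref]


def docker_run_argv(ref: str, *command: str) -> List[str]:
    """Throwaway container (--rm) that runs `command` and exits."""
    return ["docker", "run", "--rm", ref, *command]
```

`subprocess.run(docker_pull_argv(image_ref()), check=True)` performs the pull, and `docker_run_argv(image_ref(), "cat", "/etc/os-release")` builds a quick smoke test.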

**Key technical facts**:
- Runtimes: none specified; the base image has no AI-specific optimizations

**When to Use**:
- You need to script AI model deployment on Alibaba Cloud Linux with full control over container parameters

### Path 3: GPU

**Best For**: GPU

**Brief Description**: This troubleshooting path addresses failures where **nvidia-smi** returns errors like **"GPU Access Denied"** or **"modprobe: FATAL: Module nvidia not found"**, often due to outdated **systemd-239-68.0.2.al8.1** or missing kernel headers. It uses commands like **dkms autoinstall** and **modprobe nvidia** to restore GPU access.

**Key technical facts**:
- Prerequisites: an ECS GPU instance and SysOM

**When to Use**:
- `nvidia-smi` reports 'GPU Access Denied' and the installed systemd version is older than systemd-239-68.0.2.al8.1
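To check the systemd condition from a script, compare the output of `rpm -q systemd` with `systemd-239-68.0.2.al8.1`. The sketch is a deliberate simplification of RPM's version comparison: it keeps only the numeric parts of the version and release (so `al8` is skipped). For authoritative results, use `rpmdev-vercmp` from rpmdevtools. Function names are hypothetical.

```python
import re
from typing import Tuple

MIN_SYSTEMD = "systemd-239-68.0.2.al8.1"
# name-version-release with an optional architecture suffix, as printed by `rpm -q systemd`
_PKG = re.compile(r"^systemd-(\d+)-([0-9A-Za-z.]+?)(?:\.(?:x86_64|aarch64|noarch))?$")


def systemd_key(pkg: str) -> Tuple[int, ...]:
    """Numeric sort key; non-numeric release parts such as 'al8' are skipped (a simplification)."""
    match = _PKG.match(pkg.strip())
    if not match:
        raise ValueError(f"not a systemd package string: {pkg!r}")
    version, release = match.groups()
    return (int(version),) + tuple(int(part) for part in release.split(".") if part.isdigit())


def systemd_is_new_enough(installed: str, minimum: str = MIN_SYSTEMD) -> bool:
    return systemd_key(installed) >= systemd_key(minimum)
```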

## FAQ

Q: Which path should I start with?
A: Start with **AC2 AICPU/GPU** if you’re using **Alibaba Cloud Linux 3.2104 LTS 64**, have a standard model like Qwen-7B, and your instance (e.g., **ecs.gn6i-c4g1.xlarge**) meets GPU and disk requirements (**Data Disk** ≥100 GiB, **Assign Public IPv4 Address**).

Q: What if I need to run a large language model but chose the Docker CLI path?
A: You’ll lack GPU acceleration guidance and pre-optimized runtimes—leading to manual driver setup, potential compatibility issues, and no access to **GPU Diagnostics** tools.

Q: What if I encounter "modprobe: FATAL: Module nvidia not found" but try to use the AC2 deployment path?
A: The deployment will fail because the underlying GPU driver isn’t loaded. You must first use the **GPU** path to run **dkms autoinstall** and ensure **systemd-239-68.0.2.al8.1** or higher is present.

Q: Can I use the Docker CLI path for GPU workloads?
A: Not directly. This path provides no GPU guidance, so you’d need to manually install NVIDIA drivers and the NVIDIA Container Toolkit, which the AC2 path simplifies.

Q: Do I always need a public IP when using AC2 AI containers?
A: Yes. Without **Assign Public IPv4 Address**, the instance cannot download model files from public repositories.

Q: Is the troubleshooting path useful for CPU-only deployments?
A: No—it’s exclusively for GPU-related failures like **GPU Access Denied**. Using it for CPU workloads adds unnecessary complexity.

### [Manage lifecycle](https://company-skill.com/p/alinux/alinux-manage-lifecycle.md)

## What You Want to Do

You need to configure, optimize, or recover an Alibaba Cloud Linux (Alinux) ECS instance—whether during initial setup, ongoing tuning, or when facing runtime failures like boot issues, package manager errors, or performance degradation.

**Typical User Questions**:
- How to configure vCPU pinning for better network performance?
- What to do if YUM fails in classic network?

## Decision Tree

Pick the best path for your situation:

- **If** you are performing initial setup or performance tuning (e.g., adjusting network, memory, or storage parameters) and prefer a graphical interface → Use **Console / Dashboard** (go to *alinux/alinux-instance*)
- **If** you need to script or automate tasks like applying security updates, managing cgroup v1 resources, or deploying kernel hot patches across many instances → Use **CLI** (go to *alinux/alinux-instance*)
- **If** your instance exhibits specific runtime failures such as Kernel panic, YUM repository connection failure, Time desynchronization, Pod deletion failure, ext4 resize fails, DNF segmentation fault, or unbound service timeout → Use **Troubleshooting** (go to *alinux/alinux-instance*)
- **Otherwise (default)** → Start with **Console / Dashboard** if you're new to Alinux or lack command-line access; it provides safe, guided configuration via the Operating System Console.

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| Console / Dashboard | Guided setup and performance tuning | medium | No | No | Requires RAM permissions including AliyunSysomFullAccess and works only on managed instances | `alinux/guide/alinux-instance` |
| CLI | Scripted updates, hot patches, and cgroup tuning | medium | Yes | Yes | Enables kernel hot patch and memory QoS via cgroup v1 without reboot | `alinux/cli/alinux-instance` |
| Troubleshooting | Specific runtime failures | high | Yes | No | Addresses specific failures like OverlayFS dentry leak and soft lockup | `alinux/troubleshooting/alinux-instance` |

## Path Details

### Path 1: Console / Dashboard
**Brief Description**: Configure Alibaba Cloud Linux instances with the Operating System Console in the Alibaba Cloud Console, using features like System Diagnosis and Component Management without running CLI commands. This path requires the instance to be a managed instance and uses its Instance ID to link diagnostics.

**Key technical facts**:
- Billing: standard ECS instance pricing
- Auth method: RAM policies AliyunECSReadOnlyAccess, AliyunSubManageFullAccess, and AliyunSysomFullAccess

### Path 2: CLI

**Brief Description**: Use command-line tools directly on the instance to manage security updates, apply kernel hot patches, and control resources via cgroup v1 interfaces. Commands like `yum upgrade --security` and `livepatch-mgr` enable non-disruptive maintenance and memory QoS tuning.

**Key technical facts**:
- Billing: standard ECS instance pricing
- Auth method: root or sudo privileges

**When to Use**:
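- You need to tune CPU Burst or memory QoS through cgroup interfaces
- You need BCC tracing tools (install with `yum install -y bcc-tools`)
- You want to disable the update-motd service (`systemctl disable update-motd`)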
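In cgroup v1, CPU Burst is set next to the CFS quota in the `cpu` controller: `cpu.cfs_quota_us` caps usage per period, and `cpu.cfs_burst_us` lets a group bank unused quota for short spikes. The sketch assumes a cgroup v1 hierarchy, a kernel that exposes `cpu.cfs_burst_us`, and a hypothetical group path such as `/sys/fs/cgroup/cpu/myapp`. It refuses a burst larger than the quota, a limit upstream kernels also enforce. Run it with `dry_run=False` only as root.

```python
from pathlib import Path
from typing import Dict


def set_cpu_burst(cgroup_dir: str, quota_us: int, burst_us: int, dry_run: bool = True) -> Dict[str, str]:
    """Plan (and optionally apply) CFS quota + burst for one cgroup v1 `cpu` directory.

    Returns {file path: value}; files are written, quota first, only when dry_run is False.
    """
    if quota_us <= 0 or burst_us < 0:
        raise ValueError("quota must be positive and burst must be non-negative")
    if burst_us > quota_us:
        raise ValueError("burst must not exceed quota")
    base = Path(cgroup_dir)
    writes = {
        str(base / "cpu.cfs_quota_us"): str(quota_us),
        str(base / "cpu.cfs_burst_us"): str(burst_us),
    }
    if not dry_run:
        for path, value in writes.items():
            Path(path).write_text(value)
    return writes
```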

### Path 3: Troubleshooting
**Brief Description**: Diagnose and resolve specific runtime failures using CLI tools like `dmesg`, `fsck.ext4`, and `hwclock`. This path addresses documented issues including Kernel panic, OverlayFS dentry leak, YUM repository connection failure (in classic network), Time desynchronization after reboot, Pod deletion failure in Kubernetes, ext4 resize fails, DNF segmentation fault (SysAK 2.2.0), and unbound service timeout in VPCs without public access.

**Key technical facts**:
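- Billing: standard ECS instance pricing
- Auth method: root or sudo privileges

**When to Use**:
- Kubernetes Pods stay stuck in the 'Terminating' state (Pod deletion failure)
- DNF crashes with a segmentation fault on instances running SysAK 2.2.0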

## FAQ

Q: Which path should I start with?
A: If you're configuring a new instance or tuning performance and have console access, start with the **Console / Dashboard** path. If you're already in a broken state (e.g., can't install packages or time is wrong), go straight to the troubleshooting path.

Q: What if I need to apply a kernel hot patch across 50 instances but used the console path?
A: You'll hit a limitation: the Operating System Console doesn’t support kernel hot patch deployment. Only the CLI path supports livepatch-mgr for non-reboot updates.

Q: What if my instance has YUM repository connection failure in classic network but I tried the guide path?
A: The System Diagnosis tool won’t fix YUM repository connection failure—it’s a CLI-level issue specific to classic network. You must use the troubleshooting path with `yum clean all`.

Q: Can I use the CLI path if I don’t have root access?
A: No. Both CLI and troubleshooting paths require root or sudo permissions. Without it, you cannot execute `livepatch-mgr`, adjust cgroup v1 settings, or run `fsck.ext4`.

Q: Does the console path work for all instance types?
A: It only works for managed instances with the proper RAM permissions (AliyunSysomFullAccess). Unmanaged instances, or instances accessed without those permissions, won't appear in System Diagnosis.

Q: What happens if I try to resize an ext4 filesystem using the guide path?
A: The console’s disk management doesn’t handle filesystem resizing—you’ll still need to run `growpart` and `resize2fs` manually, which is covered only in the troubleshooting path under "ext4 resize fails".
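The two documented steps can be wrapped in a helper that derives the `growpart` arguments (disk and partition number) from a partition path. The sketch assumes the `growpart` tool (package cloud-utils-growpart) is installed and handles only `vdX`-style and NVMe device names; function names are hypothetical. Take a snapshot before resizing.

```python
import re
from typing import List, Tuple


def split_partition(partition: str) -> Tuple[str, str]:
    """'/dev/vda1' -> ('/dev/vda', '1'); '/dev/nvme0n1p1' -> ('/dev/nvme0n1', '1')."""
    match = re.match(r"^(/dev/nvme\d+n\d+)p(\d+)$", partition) or re.match(
        r"^(/dev/[a-z]+)(\d+)$", partition
    )
    if not match:
        raise ValueError(f"unsupported partition name: {partition!r}")
    return match.group(1), match.group(2)


def ext4_grow_commands(partition: str) -> List[List[str]]:
    """Grow the partition to fill the disk, then grow the ext4 filesystem (online) to fill it."""
    disk, number = split_partition(partition)
    return [["growpart", disk, number], ["resize2fs", partition]]
```

Run the commands in order as root, for example `for cmd in ext4_grow_commands("/dev/vda1"): subprocess.run(cmd, check=True)`.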

Q: Is memory QoS configurable via the console?
A: No. Memory QoS requires direct manipulation of cgroup v1 files, which is only possible through the CLI path.

### [Optimize performance](https://company-skill.com/p/alinux/alinux-optimize-performance.md)

## What You Want to Do

You want to improve network throughput, reduce latency, or resolve connectivity issues on Alibaba Cloud Linux instances—whether by enabling advanced features like Shared Memory Communication (SMC), tuning kernel parameters, or diagnosing specific failures.

**Typical User Questions**:
- How to adjust the TCP TIME-WAIT timeout?
- How to exclude a secondary NIC from NetworkManager?

## Decision Tree

Pick the best path for your situation:

- **If** you are enabling **Shared Memory Communication (SMC)** or **Transmit Packet Steering (XPS)** for the first time on an instance with **Elastic RoCE Infrastructure (ERI)** support and kernel ≥ **ANCK 5.10.134-16** → Use **SMC/XPS** (go to *alinux/alinux-network*)
- **If** you need to quickly adjust **TCP parameters** (e.g., `net.ipv4.tcp_tw_timeout`) or manage **NetworkManager profiles** via command line for scripting → Use **CLI** (go to *alinux/alinux-network*)
- **If** you observe specific errors like **SMC-001**, **0x03010000**, **network jitter in IPVS mode**, or **policy routing failures** due to missing **CONFIG_IP_MULTIPLE_TABLES** → Use **Troubleshooting** (go to *alinux/alinux-network*)
- **Otherwise (default)** → Start with **CLI** if you need quick, reversible changes; otherwise use the **guide path** for first-time SMC/XPS setup.

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| SMC/XPS | First-time SMC and XPS setup | medium | No | No | Requires ERI hardware and ANCK kernel ≥ 5.10.134-16 | `alinux/guide/alinux-network` |
| CLI | Quick TCP parameter tuning | low | Yes | Yes | `sysctl` changes are temporary unless persisted to `/etc/sysctl.conf` | `alinux/cli/alinux-network` |
| Troubleshooting | SMC and TCP failures | high | Yes | No | Fixes for IPVS jitter require disabling `run_estimation`; BBR issues need `tcp_congestion_control=cubic` or `tc-fq` | `alinux/troubleshooting/alinux-network` |

## Path Details

### Path 1: SMC/XPS

**Best For**: First-time SMC and XPS setup

**Brief Description**: This path provides step-by-step terminal commands to enable **Shared Memory Communication (SMC)** using `sudo modprobe smc` and `smc-ebpf policy config`, or configure **XPS** after creating an ECS instance in the console. It requires **eRDMA** capability and **ERI**-compatible hardware.

**Key technical facts**:
- Billing: All networking console operations described in this guide are free to use. These include SMC configuration, IPv6 management, DNS setup, NetworkManager adjustments, policy-based routing, and network diagnostics.

**When to Use**:
- User needs to enable high-performance networking features like SMC or XPS for the first time
- Instance meets ERI hardware requirements and runs Alibaba Cloud Linux 3 with kernel 5.10.134-16+
- Manual step-by-step guidance is preferred over scriptable solutions

**When NOT to Use**:
- User needs to automate network configuration changes across multiple instances
- Instance does not support Elastic RoCE Infrastructure (ERI) feature
- Quick parameter adjustments are needed without following detailed setup procedures

**Known Limitations**:
- SMC requires specific hardware support (Elastic RoCE Infrastructure feature) and compatible instance types
- XPS configuration requires Python 3 installed on the instance
- SMC does not support IPv6 (AF_INET6) in current implementation
- Guide-based configuration is not automation-friendly and requires manual terminal command execution

### Path 2: CLI

**Best For**: Quick TCP parameter tuning

**Brief Description**: This path uses command-line tools like `sysctl` to adjust kernel parameters (e.g., `net.ipv4.tcp_tw_timeout=30`), and `nmcli` for persistent (`connection modify`) or transient (`device modify`) NetworkManager changes. It also includes `smcr stats` and `smcss` for real-time SMC monitoring.

**Key technical facts**:
- Billing: All networking CLI operations are included with your ECS instance at no additional cost beyond standard ECS pricing.

**When to Use**:
- User needs to quickly modify TCP parameters like TIME-WAIT timeout for high-load scenarios
- Scriptable, automation-friendly network configuration is required
- Real-time SMC performance monitoring is needed via command line
- Persistent network profiles need to be managed via nmcli connection commands

**When NOT to Use**:
- User prefers graphical console interfaces over terminal commands
- Instance lacks required packages like smc-tools for advanced monitoring
- Complex multi-step setup procedures are needed (better handled by guide path)

**Known Limitations**:
- `sysctl` parameter changes are temporary unless explicitly persisted to `/etc/sysctl.conf`
- `nmcli device modify` modifications are temporary and do not survive reboots
- SMC monitoring tools require RDMA-capable instances (e.g., hfr7 or hfc7 series) to show meaningful data
- NetworkManager CLI operations require understanding of connection vs device configuration modes
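
One way to persist such values is sketched below. It assumes a systemd-based image that reads drop-in files from `/etc/sysctl.d/`, which is an alternative to editing `/etc/sysctl.conf` directly. The file name `90-network-tuning.conf` is a hypothetical choice.

```bash
#!/usr/bin/env bash
# Sketch: persist sysctl keys in a drop-in file so they survive reboots.
set -euo pipefail

SYSCTL_DIR="${SYSCTL_DIR:-/etc/sysctl.d}"
DROPIN_NAME="90-network-tuning.conf"   # hypothetical file name

# Add or replace "key = value" in the drop-in file (root is needed for /etc).
persist_sysctl() {
  local key="$1" value="$2"
  local file="${SYSCTL_DIR}/${DROPIN_NAME}"
  mkdir -p "$SYSCTL_DIR"
  touch "$file"
  # Keep every line except an existing assignment of the same key.
  awk -v k="$key" '{
      line = $0
      sub(/^[ \t]+/, "", line)
      split(line, parts, "=")
      name = parts[1]
      sub(/[ \t]+$/, "", name)
      if (name != k) print
    }' "$file" > "${file}.tmp"
  printf '%s = %s\n' "$key" "$value" >> "${file}.tmp"
  mv "${file}.tmp" "$file"
}

# Load the drop-in now instead of waiting for a reboot.
reload_sysctl() {
  sysctl -p "${SYSCTL_DIR}/${DROPIN_NAME}"
}
```

For example, with a hypothetical script name: `sudo bash -c 'source ./persist-sysctl.sh && persist_sysctl net.ipv4.tcp_tw_timeout 30 && reload_sysctl'`.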

### Path 3: Troubleshooting

**Best For**: Fixing specific SMC and TCP failures

**Brief Description**: This path addresses specific failures using diagnostic commands like `smcss -a` and `ibv_devinfo`, and fixes such as setting `net.ipv4.tcp_congestion_control=cubic`, installing `sch_netem` via `kernel-modules-extra`, or disabling IPVS estimation with `net.ipv4.vs.run_estimation=0`.
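
Before applying any fix, a read-only triage might look like the sketch below. It relies only on what this section states: SMCv2 does not handle `AF_INET6`, and `smcss -a` and `ibv_devinfo` are the diagnostic tools. Each tool is skipped if it is not installed. The module check reads `/proc/modules`, so an SMC stack built into the kernel rather than loaded as a module would not show up there.

```bash
#!/usr/bin/env bash
# Sketch: read-only SMC triage; changes nothing on the instance.
set -uo pipefail   # no -e: a missing tool should not stop the remaining checks

# Report IPv6 state (SMCv2 does not support AF_INET6, per the text).
ipv6_status() {
  local f="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
  if [[ ! -r "$f" ]]; then
    echo "ipv6: unknown (cannot read $f)"
  elif [[ "$(<"$f")" == "1" ]]; then
    echo "ipv6: disabled"
  else
    echo "ipv6: enabled"
  fi
}

# Run a diagnostic tool only if it is installed.
try_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "skip: $1 not installed"
  fi
}

smc_triage() {
  ipv6_status
  if grep -q '^smc ' /proc/modules 2>/dev/null; then
    echo "smc module: loaded"
  else
    echo "smc module: not loaded"
  fi
  try_tool ibv_devinfo   # RDMA devices visible to the instance
  try_tool smcss -a      # all SMC sockets
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  smc_triage
fi
```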

**Key technical facts**:
- Billing: All networking troubleshooting operations are included with your ECS instance at no additional cost beyond standard ECS pricing.

**When to Use**:
- SMC connections are failing with fallback to TCP (error codes 0x03010000 or SMC-001)
- TCP BBR is causing high CPU usage and degraded performance on older kernels
- Network jitter occurs in large Kubernetes clusters using IPVS mode
- Policy routing commands fail with 'Operation not permitted' on older kernels

**When NOT to Use**:
- User needs general network optimization rather than fixing specific diagnosed issues
- Instance runs newer kernel versions where documented issues are already resolved
- Problem is not covered by the specific troubleshooting scenarios listed (SMC, BBR, IPVS, policy routing, sch_netem)

**Known Limitations**:
- SMC troubleshooting requires disabling IPv6 since SMCv2 does not support AF_INET6
- TCP BBR issue resolution requires either switching to Cubic algorithm or enabling tc-fq qdisc
- Policy routing fixes require kernel upgrade for Alibaba Cloud Linux 2 instances with kernel ≤ 4.19.34-11.al7
- `sch_netem` module installation requires specific `kernel-modules-extra` or `kernel-modules-internal` packages
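
The fixes listed above map to a small set of commands, sketched below. This sketch makes several assumptions. The interface is named `eth0`, which is hypothetical. The `ip_vs` module is loaded, because the `net.ipv4.vs.*` keys exist only when it is. The `kernel-modules-extra` package name follows the text, and you may need a version that matches your running kernel. Commands are printed unless `DRY_RUN=0` is set. Changes made with `sysctl -w` still need to be persisted separately.

```bash
#!/usr/bin/env bash
# Sketch: targeted fixes from the troubleshooting path (print-only by default).
set -euo pipefail

IFACE="${IFACE:-eth0}"     # hypothetical interface name
DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands, 0 = execute them with sudo

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    printf '%s\n' "$*"
  else
    sudo "$@"
  fi
}

# BBR causing high CPU on older kernels, option 1: switch to Cubic.
fix_bbr_use_cubic() {
  run sysctl -w net.ipv4.tcp_congestion_control=cubic
}

# BBR causing high CPU on older kernels, option 2: keep BBR on an fq qdisc.
fix_bbr_use_fq() {
  run tc qdisc replace dev "$IFACE" root fq
}

# Jitter in large IPVS-mode Kubernetes clusters: stop the rate estimator.
fix_ipvs_jitter() {
  run sysctl -w net.ipv4.vs.run_estimation=0
}

# Missing sch_netem: install the extra modules package, then load the module.
fix_missing_netem() {
  run yum install -y kernel-modules-extra
  run modprobe sch_netem
}
```

For example, with a hypothetical script name: `DRY_RUN=0 bash -c 'source ./smc-tcp-fixes.sh && fix_bbr_use_cubic'`.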

## FAQ

Q: Which path should I start with?
A: If you’re unsure, begin with **CLI** for safe, reversible changes like `tcp_tw_timeout`. Only use the guide path if you’re certain your instance supports **ERI** and you need **SMC/eRDMA**.

Q: What if I try to enable SMC on an instance without ERI support but used the guide path?
A: The `sudo modprobe smc` and `smc-ebpf` commands will appear to succeed, but SMC connections will silently fall back to TCP—defeating the performance goal. Verify ERI compatibility first.

Q: What if I need persistent NetworkManager changes but only used `nmcli device modify`?
A: Your changes will vanish after reboot because `device modify` applies only to the runtime state. Use `nmcli connection modify` for persistence.

Q: Can I use the CLI path to fix an SMC-001 error?
A: Not effectively. **SMC-001** indicates a fundamental setup or compatibility issue (e.g., missing eRDMA, IPv6 enabled). The troubleshooting path provides targeted diagnostics like `smcss -a` and kernel/module checks.

Q: Why would I need `kernel-modules-extra` when troubleshooting?
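A: The `sch_netem` traffic control module provides the `netem` queuing discipline, which emulates delay, packet loss, and jitter. It is not included in the base kernel package. On Alibaba Cloud Linux it is provided by `kernel-modules-extra` (or `kernel-modules-internal`), which must be installed separately. It is not required for the `tc-fq` qdisc used in the BBR fix, because fq is a separate module (`sch_fq`).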
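
To show what the module is used for, here is a minimal `netem` sketch. The interface name and the delay and loss values are hypothetical. Applying netem to the interface that carries your SSH session also delays that session, so use a secondary interface where possible. Commands are printed unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Sketch: emulate a slow, lossy link with netem, then remove it.
set -euo pipefail

IFACE="${IFACE:-eth0}"     # hypothetical interface name
DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands, 0 = execute them with sudo

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    printf '%s\n' "$*"
  else
    sudo "$@"
  fi
}

# Add egress delay and loss; modprobe fails if sch_netem is not installed.
netem_on() {
  local delay="${1:-100ms}" loss="${2:-1%}"
  run modprobe sch_netem
  run tc qdisc replace dev "$IFACE" root netem delay "$delay" loss "$loss"
}

# Delete the root qdisc so the interface falls back to its default qdisc.
netem_off() {
  run tc qdisc del dev "$IFACE" root
}
```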

Q: Does the guide path work on Alibaba Cloud Linux 2?
A: Only partially. Full SMC/XPS support requires **ANCK 5.10.134-16+**, which is available in Alibaba Cloud Linux 3. Alibaba Cloud Linux 2 may lack necessary kernel features such as **CONFIG_IP_MULTIPLE_TABLES** for policy routing.

Q: What happens if I disable `run_estimation` unnecessarily?
A: Setting `net.ipv4.vs.run_estimation=0` stops the IPVS rate estimator, which reduces CPU overhead in large IPVS clusters. The trade-off is that IPVS rate statistics (for example, the output of `ipvsadm -L -n --rate`) stop updating. Only apply this setting if you observe jitter that correlates with IPVS estimation.

### [Troubleshoot performance](https://company-skill.com/p/alinux/alinux-troubleshoot-performance.md)

## What You Want to Do

You need to identify and fix performance bottlenecks in Alibaba Cloud Linux systems—whether caused by high CPU/sys usage, memory leaks (e.g., `slab_unreclaimable`), I/O stalls, network congestion, or unexpected OOM kills. You may be working interactively or building automated diagnostics.

**Typical User Questions**:
- How to diagnose high CPU or memory usage?
- Why is my application experiencing scheduling delays?

## Decision Tree

Pick the best path for your situation:

- **If** you are a non-expert user needing quick automated diagnosis via GUI and have access to the **Operating System Console** with **SysOM** component ≥3.7.0 → Use **SysOM** (go to *alinux/alinux-diagnosis*)
- **If** you require deep kernel-level analysis of issues like `ksoftirqd` spikes, `io_uring` errors, or `watermark_scale_factor` tuning using tools like `bpftrace`, `ftrace`, `crash`, or `perf` → Use **built-in kernel tools** (go to *alinux/alinux-system*)
- **If** you need real-time CLI monitoring with subcommands like `sysak memleak`, `sysak nosched`, or `sysak iofsstat`, or want to automate via scripts using `sysak mservice` or analyze crashes with `kdumpctl readlog` → Use **SysAK** (go to *alinux/alinux-monitoring*)
- **Otherwise (default)** → Start with **SysOM** if you’re in Chinese mainland or Hong Kong and have proper RAM permissions; otherwise, use **SysAK** for immediate CLI-based insight without deep kernel expertise.

## Path Comparison

| Path | Best For | Complexity | Code Required | Automation | Key Fact | Detail Skill |
|------|----------|------------|---------------|------------|----------|-------------|
| Built-in kernel tools | Deep kernel-level CPU, memory, and I/O analysis | medium | Yes | No | Requires manual installation of `bpftrace`, `crash`, and `perf`; needs matching `kernel-debuginfo` for memory leak analysis | `alinux/troubleshooting/alinux-system` |
| SysOM | Quick automated diagnosis in the console | low | No | Yes | Free service; requires `AliyunSysomFullAccess` and `AliyunECSReadOnlyAccess` RAM policies | `alinux/guide/alinux-diagnosis` |
| SysAK | Real-time CLI monitoring and scripted diagnostics | medium | Yes | Yes | Includes HTTP metrics endpoint at `http://127.0.0.1:9200/metrics/raw/`; `kdumpctl readlog` requires Alibaba Cloud Linux 3 kernel ≥5.10.134-14 | `alinux/cli/alinux-monitoring` |

## Path Details

### Path 1: Built-in kernel tools

**Best For**: Deep kernel-level CPU, memory, and I/O diagnostics

**Brief Description**: This approach uses low-level command-line tools like `bpftrace`, `ftrace`, `crash`, and `perf` to perform deep kernel diagnostics. It supports analysis of `slab_unreclaimable` memory, `ksoftirqd` latency spikes, `io_uring` ENOMEM errors, and tuning of `/proc/sys/vm/watermark_scale_factor` or `/proc/async_load_calc`.
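
Before you reach for `crash` or `bpftrace`, you can take a first look at `slab_unreclaimable` directly from `/proc/meminfo`, where the field is named `SUnreclaim`. The minimal sketch below reports a percentage and deliberately sets no threshold, because what counts as abnormal depends on the workload. A natural next step is to list the top consumers with `slabtop -o -s c`.

```bash
#!/usr/bin/env bash
# Sketch: SUnreclaim as a share of MemTotal (both reported in kB).
set -euo pipefail

sunreclaim_pct() {
  local meminfo="${1:-/proc/meminfo}"
  LC_ALL=C awk '
    /^MemTotal:/   { total = $2 }
    /^SUnreclaim:/ { sunr = $2 }
    END { if (total > 0) printf "%.1f\n", sunr * 100 / total }
  ' "$meminfo"
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  echo "SUnreclaim: $(sunreclaim_pct)% of MemTotal"
fi
```

Sampling this value periodically shows whether unreclaimable slab keeps growing. A steady increase is the pattern that justifies deeper analysis.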

**Key technical facts**:
- Billing: N/A

**When to Use**:
- User needs to perform deep kernel-level diagnostics for issues like slab memory leaks
- Investigating high ksoftirqd latency requiring ftrace/bpftrace analysis
- Diagnosing io_uring ENOMEM errors that require ulimit adjustments
- Troubleshooting CPU sys usage spikes under memory pressure requiring watermark parameter tuning
- Analyzing eBPF LRU hash CPU spikes requiring kernel hotfix verification
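
Two of these scenarios depend on specific knobs. `vm.watermark_scale_factor` governs sys-CPU spikes under memory pressure, and the locked-memory limit (`ulimit -l`) governs `io_uring` ENOMEM errors. The minimal sketch below reads both values. When asked, it also prints or applies a new watermark value. The right value is workload-specific and is not suggested here. Per the kernel's vm sysctl documentation, the unit is fractions of 10,000, so the default of 10 means 0.1% of memory.

```bash
#!/usr/bin/env bash
# Sketch: inspect (and optionally set) the knobs named in the scenarios above.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands, 0 = execute them with sudo

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    printf '%s\n' "$*"
  else
    sudo "$@"
  fi
}

# Print current values; read-only.
show_vm_knobs() {
  local f=/proc/sys/vm/watermark_scale_factor
  if [[ -r "$f" ]]; then
    printf 'vm.watermark_scale_factor = %s\n' "$(<"$f")"
  fi
  printf 'memlock limit (ulimit -l) = %s\n' "$(ulimit -l)"
}

# Set vm.watermark_scale_factor (unit: 1/10000 of memory); rejects non-integers.
set_watermark_scale_factor() {
  local value="$1"
  if [[ ! "$value" =~ ^[0-9]+$ ]]; then
    echo "not an integer: $value" >&2
    return 1
  fi
  run sysctl -w "vm.watermark_scale_factor=$value"
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  show_vm_knobs
fi
```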

**When NOT to Use**:
- User is a non-expert seeking quick automated diagnosis
- Need one-click comprehensive system health check without CLI interaction
- Looking for real-time resource monitoring dashboards instead of command-line tools
- Require historical OOM event diagnosis for destroyed instances
- Prefer GUI-based workflow over terminal commands

**Known Limitations**:
- Requires manual installation of diagnostic tools like bpftrace, crash, and perf
- Needs root or sudo privileges for most diagnostic operations
- Requires knowledge of kernel internals and debugging commands
- Some solutions require kernel hotfix installation which may need system reboot
- Memory leak analysis requires matching kernel-debuginfo package version

### Path 2: SysOM

**Best For**: Quick automated diagnosis in the console

**Brief Description**: This web-based method uses the **Operating System Console** with the **SysOM** component (≥3.7.0) to deliver **One-click Diagnosis**, **Quick Diagnostics**, **OOM Diagnosis**, and **Historical Diagnosis** for up to 14 days. It provides **Node Health** insights without CLI interaction and is ideal for users with limited kernel expertise.

**Key technical facts**:
- Billing: All system diagnosis and troubleshooting features are provided free of charge
- Regions available: Chinese mainland, China (Hong Kong)
- Auth method: RAM user must be granted AliyunECSReadOnlyAccess and AliyunSysomFullAccess system policies

**When to Use**:
- User is a non-expert seeking quick automated health checks
- Need to diagnose OOM events through web UI without CLI knowledge
- Require historical diagnosis of ACS/ECI instances even after destruction
- Looking for comprehensive one-click diagnosis with unified diagnostic report
- Prefer GUI-based workflow with form inputs and button clicks

**When NOT to Use**:
- User needs real-time command-line monitoring of system resources
- Require deep kernel-level debugging with tools like bpftrace or crash
- Working in regions outside Chinese mainland or China (Hong Kong)
- Lack required RAM permissions for SysOM access
- Need to automate diagnostics through scripts rather than manual UI interaction

**Known Limitations**:
- Only available in Chinese mainland and China (Hong Kong) regions
- Requires specific RAM permissions (AliyunECSReadOnlyAccess and AliyunSysomFullAccess)
- Historical diagnosis limited to events within the last 14 days
- Requires SysOM component version 3.7.0 or later to be installed
- Instance must be enrolled in management for node health monitoring

### Path 3: SysAK

**Best For**: Real-time CLI monitoring and scripted diagnostics

**Brief Description**: This CLI-centric path uses **sysak** and **kdumpctl** for real-time diagnostics. Key subcommands include `sysak memleak` (for slab analysis), `sysak nosched` (for scheduling delays), `sysak iofsstat` (I/O stats), `sysak loadtask` (CPU load), and `sysak mservice` (background monitoring). It also supports crash analysis via `kdumpctl readlog` on Alibaba Cloud Linux 3.

**Key technical facts**:
- Prerequisites: Install SysAK via `sudo yum install -y sysak`; system memory >2 GB for kdump; Alibaba Cloud Linux 3 kernel ≥5.10.134-14 for `readlog`
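
You can check these prerequisites up front with the minimal, read-only sketch below. It looks for the `sysak` binary and reads `MemTotal` from `/proc/meminfo`. It also compares `uname -r` against the 5.10.134-14 threshold using GNU `sort -V` ordering. That ordering should suffice for release strings like `5.10.134-16.al8.x86_64`, although it is not identical to RPM's version comparison. The sketch does not verify that the distribution is Alibaba Cloud Linux 3.

```bash
#!/usr/bin/env bash
# Sketch: read-only check of the SysAK/kdump prerequisites listed above.
set -euo pipefail

# Succeeds if kernel release $1 sorts at or after version $2 (GNU sort -V).
kernel_at_least() {
  local current="$1" required="$2"
  [[ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" == "$required" ]]
}

# MemTotal in kB from a meminfo-format file.
mem_total_kb() {
  awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

sysak_preflight() {
  local kernel mem_kb
  kernel="$(uname -r)"
  mem_kb="$(mem_total_kb)"

  if command -v sysak >/dev/null 2>&1; then
    echo "sysak: installed"
  else
    echo "sysak: missing (install with: sudo yum install -y sysak)"
  fi

  # The text requires more than 2 GB of memory for kdump.
  if (( mem_kb > 2 * 1024 * 1024 )); then
    echo "memory: ${mem_kb} kB, enough for kdump"
  else
    echo "memory: ${mem_kb} kB, too little for kdump"
  fi

  # kdumpctl readlog needs an Alibaba Cloud Linux 3 kernel >= 5.10.134-14.
  if kernel_at_least "$kernel" "5.10.134-14"; then
    echo "kernel: $kernel, readlog version check passed"
  else
    echo "kernel: $kernel, readlog not supported"
  fi
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  sysak_preflight
fi
```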

**When to Use**:
- User needs real-time command-line monitoring of system resources
- Require continuous background monitoring with sysak mservice
- Need to diagnose system crashes after unexpected reboots using kdumpctl
- Looking for specific diagnostic subcommands like memleak for slab analysis or nosched for scheduling delays
- Want to access metrics programmatically via HTTP endpoint at http://127.0.0.1:9200/metrics/raw/
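
Scripted access to the metrics endpoint can be as small as the sketch below. It assumes that `sysak mservice` is already running and that the endpoint returns line-oriented text. The response format is not documented here, so the filter helper simply matches a substring. `sysak_metric_lines` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: pull SysAK metrics for scripting (response format assumed to be line-oriented).
set -euo pipefail

METRICS_URL="${METRICS_URL:-http://127.0.0.1:9200/metrics/raw/}"

# Fetch the raw metrics; fail fast if nothing is listening.
fetch_sysak_metrics() {
  curl -fsS --max-time 5 "${1:-$METRICS_URL}"
}

# Hypothetical helper: print only lines containing a fixed substring.
sysak_metric_lines() {
  local pattern="$1" url="${2:-$METRICS_URL}" body
  body="$(fetch_sysak_metrics "$url")" || return 1
  grep -F -- "$pattern" <<<"$body" || true
}
```

For example, `sysak_metric_lines mem` prints the lines whose text contains `mem`, and returns non-zero if the endpoint is unreachable.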

**When NOT to Use**:
- User prefers GUI-based diagnosis over terminal commands
- Need historical OOM event diagnosis for destroyed instances
- Working in environments without sudo access to install SysAK
- Require automated one-click comprehensive diagnosis reports
- Lack required system memory (>2 GB) for kdump functionality

**Known Limitations**:
- SysAK is not installed by default and requires manual installation via yum
- kdumpctl readlog feature is specific to Alibaba Cloud Linux 3 (kernel ≥5.10.134-14)
- Requires system memory >2 GB for kdump functionality
- No authentication or cloud credential support; runs entirely locally
- Limited to supported OS versions: Alibaba Cloud Linux 2/3, Anolis OS 8.4+, or CentOS 7 with kernel ≥3.10 on x86_64

## FAQ

Q: Which path should I start with?
A: If you’re in the Chinese mainland or China (Hong Kong) and have `AliyunSysomFullAccess` + `AliyunECSReadOnlyAccess`, start with **SysOM**, which requires no CLI or kernel expertise. Otherwise, use **SysAK** for immediate, scriptable insights without deep kernel knowledge.

Q: What if I need to analyze a past OOM event on a destroyed ECS instance but used the built-in tools path?
A: You’ll hit a dead end—`bpftrace`/`perf` only work on live systems. Only **SysOM** supports **Historical Diagnosis** for destroyed instances (within 14 days).

Q: What if I’m in US West (Silicon Valley) region but chose SysOM for diagnosis?
A: You’ll be unable to access the **Operating System Console**—**SysOM** is only available in **Chinese mainland** and **China (Hong Kong)** regions.

Q: Can I use `sysak memleak` to replace full `crash` + `slabtop` analysis for memory leaks?
A: Partially. It detects slab leaks but lacks the full post-mortem context of `crash` with `kernel-debuginfo`. For deep `SUnreclaim` analysis, the **built-in kernel tools** path is still superior.

Q: What happens if I lack sudo access but try to use the built-in tools path?
A: You’ll fail to install `bpftrace`, `perf`, or `crash`, and most diagnostic commands (e.g., `ftrace` tracing) require root—making the path unusable without elevated privileges.

Q: Does SysAK provide historical data like SysOM?
A: No—`sysak` is real-time only. It cannot retrieve **Historical Diagnosis** of past OOMs or crashes unless `kdumpctl` captured a dump (and even then, only for recent reboots).

Q: Is there a way to get both GUI ease and deep diagnostics?
A: Not in one tool. Use **SysOM** for initial triage, then switch to **built-in kernel tools** or **SysAK** if deeper analysis (e.g., `watermark_scale_factor` tuning or `io_uring` queue inspection) is needed.


## Frequently asked questions

### When should I use the API vs. the console?

Use the **API** for automation, infrastructure-as-code (e.g., Terraform), or integrating with custom applications. Use the **console** for one-off configurations, exploratory setup, or when guided workflows (e.g., migration wizards) are available.

### How do I get started with troubleshooting?

Start with the **intent skills** if you have a clear goal (e.g., “fix slow performance”). Otherwise, use **troubleshooting** sub-skills matching your symptom (e.g., OOM, boot failure, network hang).

### Where do I find CLI commands for common tasks?

Refer to the **cli** sub-skills (e.g., `alinux-instance` for YUM/security updates, `alinux-network` for nmcli). Many guide skills also include equivalent CLI alternatives.

### How do I handle kernel updates without downtime?

Use **kernel live patching** (hotpatch) via the `livepatch-mgr` CLI tool (see `alinux-instance` cli skill) or configure it through the console (guide skill).

### Can I use Alibaba Cloud Linux for compliance requirements?

Yes — use **compliance-ready images** (e.g., MLPS 2.0 Level 3) available in the console. Configure baseline checks and audit policies via the **Security and Compliance** guide skill.

### How can I resolve discrepancies between reported free disk space and "no space left on device" errors?

Use the storage troubleshooting skill for ext4 "no space left on device" errors, and the monitoring troubleshooting skill for discrepancies between `df` and `du` output. Each skill provides the diagnostic workflow and fixes for its issue.

### How do I diagnose why a worker node in a managed cluster repeatedly enters an unhealthy state?

Use the cluster management troubleshooting skill. It provides targeted workflows for diagnosing unhealthy nodes in managed cluster environments and for resolving cluster access problems.

### How can I fix GPU driver access and container permission issues when an AI training container cannot detect the host GPU?

Follow the `alinux-gpu` troubleshooting guide in the AI and GPU workloads skill. It covers GPU driver loading, container GPU access and permissions, and profiling issues, and it restores driver visibility inside the container.
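
Before you debug the container itself, a quick host-side check can confirm that the driver module is loaded. The minimal sketch below assumes an NVIDIA GPU whose driver was installed through DKMS. It checks `/proc/modules` for the `nvidia` module. If the module is missing, the sketch prints `dkms autoinstall`, the remedy suggested for missing kernel module errors. Set `DRY_RUN=0` to execute it.

```bash
#!/usr/bin/env bash
# Sketch: host-side GPU driver check before debugging container GPU access.
set -uo pipefail

DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands, 0 = execute them with sudo

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    printf '%s\n' "$*"
  else
    sudo "$@"
  fi
}

# Succeeds if the nvidia kernel module is listed (path overridable for testing).
nvidia_module_loaded() {
  grep -q '^nvidia ' "${1:-/proc/modules}"
}

gpu_host_check() {
  if nvidia_module_loaded; then
    echo "nvidia module: loaded"
  else
    echo "nvidia module: not loaded; rebuilding DKMS modules"
    run dkms autoinstall
  fi
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  gpu_host_check
fi
```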

## Cross-product integrations

- [AI Agent Manages Notion CMS for Vercel Site](https://company-skill.com/p/_combos/ai-agent-manages-notion-cms-for-vercel-site-e41629.md) (notion + vercel + cloudflare + bailian)
- [AI Content Engine with Public Site and Enterprise Search](https://company-skill.com/p/_combos/ai-content-engine-with-public-site-and-enterpris-9db7c8.md) (cloudflare + bailian + notion + vercel + idaas)
- [AI Content Platform on Managed Infrastructure](https://company-skill.com/p/_combos/ai-content-platform-on-managed-infrastructure-265158.md) (cloudflare + bailian + notion + vercel + idaas)
- [AI Content Platform with Search and Frontend](https://company-skill.com/p/_combos/ai-content-platform-with-search-and-frontend-d3ca31.md) (cloudflare + bailian + notion + vercel + idaas)
- [AI Content Platform with Site and Search](https://company-skill.com/p/_combos/ai-content-platform-with-site-and-search-7bf25b.md) (cloudflare + bailian + notion + vercel + idaas)
- [AI-Driven Search Knowledge Platform](https://company-skill.com/p/_combos/ai-driven-search-knowledge-platform-803ad0.md) (cloudflare + bailian + notion + vercel + idaas)
- [AI Model with Edge API Gateway](https://company-skill.com/p/_combos/ai-model-with-edge-api-gateway-82b873.md) (cloudflare)
- [AI Recommendation Platform with RAG Explanations](https://company-skill.com/p/_combos/ai-recommendation-platform-with-rag-explanations-8803cd.md) (airec + opensearch + bailian + pai + es)

## Use with an AI agent

```bash
curl -s https://company-skill.com/api/route \
  -H 'Content-Type: application/json' \
  -d '{"query": "...", "product": "alinux"}'
```

MCP server: https://company-skill.com/api/mcp/alinux.py

---
Machine-readable: https://company-skill.com/llms.txt · https://company-skill.com/sitemap.xml