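The chmod System Call, with Examples

Posted on 2025-08-06 by 麦芽爸

We continue our tour of important functions in Linux system programming. This time we look at chmod, which changes a file's access permissions.

Introduction to chmod

1. Function Introduction

chmod is a Linux system call that changes the access permissions (also called the file mode bits) of a file or directory. These permissions determine which users may read, write, or execute the file.

File permissions are the foundation of the Unix/Linux security model. Every file has three sets of permission bits: owner (user), group, and others. Each set contains three basic permissions: read (r), write (w), and execute (x).

With chmod, a suitably privileged user (the file's owner or root) can adjust these permissions to control access to the file. For example, a user may want to protect a private file so that only they can read it, or make a script executable by everyone.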

2. Function Prototype

#include <sys/stat.h> // required

int chmod(const char *pathname, mode_t mode);

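3. Functionality

Changing file permissions: sets the access permissions of the file or directory named by pathname to the value given in mode.
Setting absolute permissions: mode replaces the existing permission bits outright; it does not add to or remove from them. It is usually written as an octal number (such as 0644 or 0755) or as a bitwise OR of symbolic constants (such as S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH).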

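4. Parameters

const char *pathname: pointer to a null-terminated (\0) string containing the path of the file or directory whose permissions are to be changed. The path may be relative or absolute.
mode_t mode: the new file permissions. It can be written in two ways:
1. Octal notation:
  - The most common form, e.g. 0644, 0755, 0600.
  - The leading 0 tells the C compiler that the literal is octal.
  - The next three digits give the permissions of the owner (user), the group, and others, in that order.
  - Each digit is a sum of read (4), write (2), and execute (1):
    - 7 (4+2+1) = read + write + execute (rwx)
    - 6 (4+2) = read + write (rw-)
    - 5 (4+1) = read + execute (r-x)
    - 4 (4) = read only (r--)
    - 0 = no permissions (---)
  - For example:
    - 0644 means owner: read/write (6); group and others: read only (4). Common for regular files.
    - 0755 means owner: read/write/execute (7); group and others: read/execute (5). Common for executables and directories.
    - 0600 means owner: read/write (6); group and others: no permissions (0). Common for private files.
2. Symbolic constants:
  - Combine the macros defined in <sys/stat.h> with bitwise OR.
  - Owner:
    - S_IRWXU: read, write, and execute for the owner
    - S_IRUSR: read for the owner
    - S_IWUSR: write for the owner
    - S_IXUSR: execute for the owner
  - Group:
    - S_IRWXG: read, write, and execute for the group
    - S_IRGRP: read for the group
    - S_IWGRP: write for the group
    - S_IXGRP: execute for the group
  - Others:
    - S_IRWXO: read, write, and execute for others
    - S_IROTH: read for others
    - S_IWOTH: write for others
    - S_IXOTH: execute for others
  - Special bits:
    - S_ISUID: set-user-ID bit
    - S_ISGID: set-group-ID bit
    - S_ISVTX: sticky bit
  - For example:
    - S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH is equivalent to 0644.
    - S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH is equivalent to 0755.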

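5. Return Value

On success: returns 0.
On failure:
- Returns -1 and sets the global variable errno to indicate the cause:
  - EACCES: search permission is denied for a directory in the path prefix.
  - EROFS: the file resides on a read-only filesystem.
  - EIO: an I/O error occurred.
  - ELOOP: too many symbolic links were encountered while resolving pathname.
  - ENAMETOOLONG: the pathname is too long.
  - ENOENT: the file does not exist.
  - ENOMEM: insufficient kernel memory was available.
  - ENOTDIR: a component of the path prefix is not a directory.
  - EPERM: the effective UID of the process does not match the file's owner and the process is not privileged (on Linux, it lacks CAP_FOWNER). The same error is returned when the file is marked immutable or append-only.
  - EFAULT: pathname points outside the process's accessible address space.
  - EINVAL: the mode argument is invalid. POSIX permits this error; Linux generally ignores unrecognized bits in mode instead of failing.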

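6. Similar and Related Functions

fchmod(int fd, mode_t mode): does the same as chmod, but identifies the file by an open file descriptor instead of a pathname, which avoids a second path resolution.
fchmodat(int dirfd, const char *pathname, mode_t mode, int flags): a newer function that accepts a path relative to the directory referred to by dirfd and takes an additional flags argument.
umask(mode_t mask): sets the process's file mode creation mask, which affects the default permissions of files created afterward.
stat, lstat, fstat: read a file's current permissions; they do not set them.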

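7. Example Code

Example 1: Basic permission change

This example uses chmod to change a file's permissions. The new mode is given on the command line as an octal number, and a helper decodes it with the symbolic constants.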

#include <sys/stat.h>  // chmod, stat, struct stat
#include <stdio.h>     // perror, printf
#include <stdlib.h>    // exit
#include <errno.h>     // errno
#include <string.h>    // strerror

// 辅助函数：将 mode_t 转换为可读的权限字符串
void print_permissions(mode_t mode) {
    char perms[11];
    // 初始化字符数组
    strcpy(perms, "----------");

    // 用户权限
    if (mode & S_IRUSR) perms[1] = 'r';
    if (mode & S_IWUSR) perms[2] = 'w';
    if (mode & S_IXUSR) perms[3] = 'x';

    // 组权限
    if (mode & S_IRGRP) perms[4] = 'r';
    if (mode & S_IWGRP) perms[5] = 'w';
    if (mode & S_IXGRP) perms[6] = 'x';

    // 其他用户权限
    if (mode & S_IROTH) perms[7] = 'r';
    if (mode & S_IWOTH) perms[8] = 'w';
    if (mode & S_IXOTH) perms[9] = 'x';

    // 特殊位
    if (mode & S_ISUID) perms[3] = (perms[3] == 'x') ? 's' : 'S';
    if (mode & S_ISGID) perms[6] = (perms[6] == 'x') ? 's' : 'S';
    if (mode & S_ISVTX) perms[9] = (perms[9] == 'x') ? 't' : 'T';

    printf("%s", perms);
}

// 辅助函数：打印文件的详细信息
void print_file_info(const char *pathname) {
    struct stat sb;
    if (stat(pathname, &sb) == -1) {
        perror("stat");
        return;
    }

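    printf("Information for file '%s':\n", pathname);
    // ino_t and off_t vary in width across platforms, so cast before printing
    printf("  Inode: %lu\n", (unsigned long) sb.st_ino);
    printf("  Permissions: ");
    print_permissions(sb.st_mode);
    // Mask with 07777 so the set-user-ID, set-group-ID, and sticky bits are shown too
    printf(" (octal: %04o)\n", (unsigned int) (sb.st_mode & 07777));
    printf("  Size: %lld bytes\n", (long long) sb.st_size);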
}

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <file path> <new mode>\n", argv[0]);
        fprintf(stderr, "      The mode must be an octal number (e.g. 0644); symbolic forms such as u+r are not supported\n");
        fprintf(stderr, "      Examples: %s myfile.txt 0644\n", argv[0]);
        fprintf(stderr, "                %s script.sh 0755\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    const char *pathname = argv[1];
    const char *mode_str = argv[2];

    printf("准备更改文件 '%s' 的权限\n", pathname);

    // 打印更改前的信息
    print_file_info(pathname);

    // 解析权限模式
    mode_t new_mode;
    char *endptr;

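    // Parse the mode as an octal number; reject empty input, trailing
    // characters, and values outside 0..07777
    long parsed = strtol(mode_str, &endptr, 8);
    if (endptr == mode_str || *endptr != '\0' || parsed < 0 || parsed > 07777) {
        fprintf(stderr, "Error: unsupported mode '%s'. Use an octal number such as 0644.\n", mode_str);
        exit(EXIT_FAILURE);
    }
    new_mode = (mode_t) parsed;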

    printf("\n新的权限模式: ");
    print_permissions(new_mode);
    printf(" (八进制: %o)\n", new_mode);

    // 执行 chmod 操作
    if (chmod(pathname, new_mode) == -1) {
        perror("chmod 失败");
        exit(EXIT_FAILURE);
    }

    printf("\nchmod 操作成功!\n");

    // 打印更改后的信息
    print_file_info(pathname);

    return 0;
}

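Code explanation:

Two helper functions are defined:
- print_permissions: converts a mode_t value into a human-readable string (such as -rw-r--r--).
- print_file_info: uses stat to fetch and print details about the file, including its permissions.
main takes the file path and the mode string as arguments.
It parses the mode string as an octal number with strtol.
It calls print_file_info to show the state before the change.
It calls chmod(pathname, new_mode) to change the permissions.
On success, it calls print_file_info again to show the state after the change.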

Example 2: Batch permission management

This example sketches a simple batch tool that changes the permissions of several files at once.

#define _GNU_SOURCE
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <dirent.h>
#include <fnmatch.h> // 用于模式匹配

// 递归更改目录下匹配模式的文件权限
int change_permissions_recursive(const char *dir_path, const char *pattern, mode_t mode, int verbose) {
    DIR *dir;
    struct dirent *entry;
    char full_path[1024];
    int changed_count = 0;
    int error_count = 0;

    dir = opendir(dir_path);
    if (!dir) {
        fprintf(stderr, "无法打开目录 '%s': %s\n", dir_path, strerror(errno));
        return -1;
    }

    while ((entry = readdir(dir)) != NULL) {
        // 跳过 . 和 ..
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) {
            continue;
        }

        // 构造完整路径
        snprintf(full_path, sizeof(full_path), "%s/%s", dir_path, entry->d_name);

        // 检查是否匹配模式
        if (fnmatch(pattern, entry->d_name, 0) == 0) {
            // 匹配，尝试更改权限
            if (chmod(full_path, mode) == -1) {
                fprintf(stderr, "警告: 无法更改 '%s' 的权限: %s\n", full_path, strerror(errno));
                error_count++;
            } else {
                if (verbose) {
                    printf("已更改 '%s' 的权限\n", full_path);
                }
                changed_count++;
            }
        }

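        // If the entry is a directory, recurse into it. lstat does not follow
        // symbolic links, so a link that points back up the tree cannot cause
        // unbounded recursion.
        struct stat sb;
        if (lstat(full_path, &sb) == 0 && S_ISDIR(sb.st_mode)) {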
            int sub_result = change_permissions_recursive(full_path, pattern, mode, verbose);
            if (sub_result >= 0) {
                changed_count += sub_result;
            } else {
                error_count++;
            }
        }
    }

    closedir(dir);
    
    if (error_count > 0) {
        return -1;
    }
    return changed_count;
}

// 打印权限更改的摘要
void print_mode_summary(mode_t mode) {
    printf("权限设置为: ");
    // 用户权限
    printf((mode & S_IRUSR) ? "r" : "-");
    printf((mode & S_IWUSR) ? "w" : "-");
    printf((mode & S_IXUSR) ? "x" : "-");
    // 组权限
    printf((mode & S_IRGRP) ? "r" : "-");
    printf((mode & S_IWGRP) ? "w" : "-");
    printf((mode & S_IXGRP) ? "x" : "-");
    // 其他用户权限
    printf((mode & S_IROTH) ? "r" : "-");
    printf((mode & S_IWOTH) ? "w" : "-");
    printf((mode & S_IXOTH) ? "x" : "-");
    printf(" (八进制: %04o)\n", mode);
}

int main(int argc, char *argv[]) {
    if (argc < 4) {
        fprintf(stderr, "Usage: %s <directory> <file pattern> <mode> [-r] [-v] [-p pattern]\n", argv[0]);
        fprintf(stderr, "      -r: recurse into subdirectories\n");
        fprintf(stderr, "      -v: verbose output\n");
        fprintf(stderr, "      -p pattern: extra file-name filter (wildcards supported)\n");
        fprintf(stderr, "      Quote patterns so the shell does not expand them.\n");
        fprintf(stderr, "      Examples: %s /home/user \"*.txt\" 0644 -r -v\n", argv[0]);
        fprintf(stderr, "                %s /var/log \"*\" 0600 -p \"*.log\"\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    const char *dir_path = argv[1];
    const char *pattern = argv[2];
    const char *mode_str = argv[3];
    int recursive = 0;
    int verbose = 0;
    const char *file_pattern = "*"; // 默认匹配所有文件

    // 解析选项
    for (int i = 4; i < argc; i++) {
        if (strcmp(argv[i], "-r") == 0) {
            recursive = 1;
        } else if (strcmp(argv[i], "-v") == 0) {
            verbose = 1;
        } else if (strcmp(argv[i], "-p") == 0 && i + 1 < argc) {
            file_pattern = argv[++i];
        } else {
            fprintf(stderr, "未知选项: %s\n", argv[i]);
            exit(EXIT_FAILURE);
        }
    }

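    // Parse the mode as an octal number; reject empty input, trailing
    // characters, and values outside 0..07777
    mode_t mode;
    char *endptr;
    long parsed = strtol(mode_str, &endptr, 8);
    if (endptr == mode_str || *endptr != '\0' || parsed < 0 || parsed > 07777) {
        fprintf(stderr, "Error: invalid mode '%s'. Use an octal number such as 0644.\n", mode_str);
        exit(EXIT_FAILURE);
    }
    mode = (mode_t) parsed;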

    printf("=== 批量权限更改工具 ===\n");
    printf("目录: %s\n", dir_path);
    printf("文件模式: %s\n", pattern);
    printf("文件名匹配: %s\n", file_pattern);
    print_mode_summary(mode);
    printf("递归: %s\n", recursive ? "是" : "否");
    printf("详细输出: %s\n", verbose ? "是" : "否");

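    // Apply the mode change
    int result;
    if (recursive) {
        // The recursive helper accepts a single pattern: use the -p pattern when
        // one was given, otherwise the positional pattern. Passing the default
        // "*" here would change every entry, directories included.
        const char *effective = (strcmp(file_pattern, "*") != 0) ? file_pattern : pattern;
        result = change_permissions_recursive(dir_path, effective, mode, verbose);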
    } else {
        // 非递归处理
        DIR *dir = opendir(dir_path);
        if (!dir) {
            perror("opendir");
            exit(EXIT_FAILURE);
        }
        
        struct dirent *entry;
        char full_path[1024];
        int changed_count = 0;
        int error_count = 0;
        
        while ((entry = readdir(dir)) != NULL) {
            if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) {
                continue;
            }
            
            // 检查文件名是否匹配模式
            if (fnmatch(pattern, entry->d_name, 0) == 0) {
                // 检查文件名是否匹配文件模式
                if (fnmatch(file_pattern, entry->d_name, 0) == 0) {
                    snprintf(full_path, sizeof(full_path), "%s/%s", dir_path, entry->d_name);
                    if (chmod(full_path, mode) == -1) {
                        fprintf(stderr, "警告: 无法更改 '%s' 的权限: %s\n", full_path, strerror(errno));
                        error_count++;
                    } else {
                        if (verbose) {
                            printf("已更改 '%s' 的权限\n", full_path);
                        }
                        changed_count++;
                    }
                }
            }
        }
        closedir(dir);
        
        if (error_count > 0) {
            result = -1;
        } else {
            result = changed_count;
        }
    }

    if (result == -1) {
        fprintf(stderr, "\n权限更改过程中遇到错误。\n");
        exit(EXIT_FAILURE);
    } else {
        printf("\n权限更改完成。成功更改了 %d 个文件的权限。\n", result);
    }

    return 0;
}

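Code explanation:

change_permissions_recursive implements the recursive permission change, walking the directory with opendir and readdir.
It uses fnmatch to support wildcard patterns (such as *.txt and *.log).
print_mode_summary prints the requested mode in human-readable form.
main handles the command-line arguments and supports the recursive (-r), verbose (-v), and file-name pattern (-p) options.
The program counts the files it changed successfully and the errors it hit.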

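Example 3: Permission security and best practices

This example focuses on the security considerations and best practices for setting permissions.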

#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <pwd.h>
#include <grp.h>

// 创建具有特定权限的测试文件
void create_test_files() {
    printf("=== 创建测试文件 ===\n");
    
    // 1. 创建普通文件
    FILE *fp = fopen("normal_file.txt", "w");
    if (fp) {
        fprintf(fp, "This is a normal file for permission testing.\n");
        fclose(fp);
        printf("创建普通文件: normal_file.txt\n");
    }
    
    // 2. 创建私密文件
    fp = fopen("private_file.txt", "w");
    if (fp) {
        fprintf(fp, "This is a private file containing sensitive data.\n");
        fclose(fp);
        printf("创建私密文件: private_file.txt\n");
    }
    
    // 3. 创建脚本文件
    fp = fopen("test_script.sh", "w");
    if (fp) {
        fprintf(fp, "#!/bin/bash\necho \"This is a test script.\"\n");
        fclose(fp);
        printf("创建脚本文件: test_script.sh\n");
    }
    
    // 4. 创建目录
    if (mkdir("test_directory", 0755) == 0) {
        printf("创建目录: test_directory\n");
    }
    
    printf("\n");
}

// Demonstrate safe permission settings
void demonstrate_secure_permissions() {
    printf("=== Safe permission settings ===\n");
    
    // 1. Regular file (0644)
    printf("1. Setting the regular file to 0644 (rw-r--r--)\n");
    if (chmod("normal_file.txt", 0644) == 0) {
        printf("   Success: normal_file.txt is now rw-r--r--\n");
    } else {
        printf("   Failed: %s\n", strerror(errno));
    }
    
    // 2. Private file (0600)
    printf("2. Setting the private file to 0600 (rw-------)\n");
    if (chmod("private_file.txt", 0600) == 0) {
        printf("   Success: private_file.txt is now rw-------\n");
    } else {
        printf("   Failed: %s\n", strerror(errno));
    }
    
    // 3. Script file (0755)
    printf("3. Setting the script file to 0755 (rwxr-xr-x)\n");
    if (chmod("test_script.sh", 0755) == 0) {
        printf("   Success: test_script.sh is now rwxr-xr-x\n");
    } else {
        printf("   Failed: %s\n", strerror(errno));
    }
    
    // 4. Directory (0755)
    printf("4. Setting the directory to 0755 (rwxr-xr-x)\n");
    if (chmod("test_directory", 0755) == 0) {
        printf("   Success: test_directory is now rwxr-xr-x\n");
    } else {
        printf("   Failed: %s\n", strerror(errno));
    }
    
    printf("\n");
}

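// Warn about dangerous permission settings
void demonstrate_dangerous_permissions() {
    printf("=== Dangerous permission settings ===\n");
    
    printf("The following settings can be a security risk:\n");
    
    // 1. World-writable file
    printf("1. World-writable regular file (0666)\n");
    printf("   Risk: any user can modify the file's contents\n");
    printf("   Advice: use 0644 instead\n");
    
    // 2. World-writable executable
    printf("2. World-writable sensitive script (0777)\n");
    printf("   Risk: any user can modify the script, and whoever runs it next executes the modified code\n");
    printf("   Advice: use 0755 and make sure the script itself is safe\n");
    
    // 3. Private file with loose permissions
    printf("3. Private file with overly loose permissions (0644)\n");
    printf("   Risk: group members and other users can read the private data\n");
    printf("   Advice: use 0600 so that only the owner has access\n");
    
    printf("\n");
}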

// 演示权限检查
void demonstrate_permission_checking() {
    printf("=== 权限检查最佳实践 ===\n");
    
    struct stat sb;
    
    // 检查私密文件权限
    if (stat("private_file.txt", &sb) == 0) {
        mode_t mode = sb.st_mode & 0777;
        printf("检查 private_file.txt 当前权限: %04o\n", mode);
        
        if (mode == 0600) {
            printf("✓ 权限设置正确，只有所有者可读写\n");
        } else if (mode & (S_IRWXG | S_IRWXO)) {
            printf("✗ 警告: 权限过于宽松，组用户或其他用户有访问权限\n");
        } else {
            printf("- 权限设置合理\n");
        }
    }
    
    // 检查脚本文件权限
    if (stat("test_script.sh", &sb) == 0) {
        mode_t mode = sb.st_mode & 0777;
        printf("检查 test_script.sh 当前权限: %04o\n", mode);
        
        if (mode & S_IXUSR) {
            printf("✓ 所有者有执行权限\n");
        } else {
            printf("✗ 所有者没有执行权限，脚本可能无法运行\n");
        }
    }
    
    printf("\n");
}

// 清理测试文件
void cleanup_test_files() {
    printf("=== 清理测试文件 ===\n");
    unlink("normal_file.txt");
    unlink("private_file.txt");
    unlink("test_script.sh");
    rmdir("test_directory");
    printf("清理完成\n");
}

int main() {
    printf("当前用户: UID=%d, GID=%d\n", getuid(), getgid());
    printf("\n");
    
    // 创建测试文件
    create_test_files();
    
    // 演示安全权限设置
    demonstrate_secure_permissions();
    
    // 演示危险权限设置
    demonstrate_dangerous_permissions();
    
    // 演示权限检查
    demonstrate_permission_checking();
    
    // 清理
    cleanup_test_files();
    
    printf("\n=== 权限设置最佳实践总结 ===\n");
    printf("普通文件: 0644 (rw-r--r--)\n");
    printf("私密文件: 0600 (rw-------)\n");
    printf("可执行文件: 0755 (rwxr-xr-x)\n");
    printf("私密可执行文件: 0700 (rwx------)\n");
    printf("目录: 0755 (rwxr-xr-x)\n");
    printf("私密目录: 0700 (rwx------)\n");
    printf("\n安全建议:\n");
    printf("1. 遵循最小权限原则\n");
    printf("2. 定期检查重要文件的权限\n");
    printf("3. 避免使用 0777 或 0666 等过于宽松的权限\n");
    printf("4. 对于敏感文件，使用 0600 或 0700\n");
    printf("5. 理解权限位的含义，避免误操作\n");
    
    return 0;
}

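Code explanation:

create_test_files creates several kinds of test files.
demonstrate_secure_permissions shows how to set safe permissions for different kinds of files.
demonstrate_dangerous_permissions warns about some common dangerous settings.
demonstrate_permission_checking shows how to check whether an existing file's permissions are reasonable.
cleanup_test_files removes the test files that were created.
main drives the whole demonstration and ends with a summary of best practices.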
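
One further point is that a file created with fopen and only then restricted with chmod is briefly readable under the default mode. A hedged alternative, assuming the file does not exist yet, is to create it with the restrictive mode from the start. The function names create_private_file and fd_mode below are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create path exclusively with mode 0600 and return an open descriptor,
   or -1 on failure (including when the file already exists). */
int create_private_file(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        return -1;
    }
    /* The umask can only clear bits, so the file is never wider than 0600.
       fchmod makes the mode exact even under an unusual umask. */
    if (fchmod(fd, S_IRUSR | S_IWUSR) == -1) {
        close(fd);
        unlink(path);
        return -1;
    }
    return fd;
}

/* Permission bits (including set-ID and sticky bits) of an open file,
   or (mode_t) -1 if fstat fails. */
mode_t fd_mode(int fd) {
    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        return (mode_t) -1;
    }
    return sb.st_mode & 07777;
}
```

O_EXCL makes the call fail if the name is already taken, so the caller never ends up writing secrets into a file that someone else created with looser permissions.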

Compiling and running:

# Compile the examples
gcc -o chmod_example1 chmod_example1.c
gcc -o chmod_example2 chmod_example2.c
gcc -o chmod_example3 chmod_example3.c

# Run the examples
# Example 1: basic usage
touch testfile.txt script.sh
./chmod_example1 testfile.txt 0644
./chmod_example1 script.sh 0755

# Example 2: batch processing
mkdir testdir
touch testdir/file1.txt testdir/file2.log
./chmod_example2 testdir "*.txt" 0644 -r -v

# Example 3: security demonstration
./chmod_example3

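Summary:

chmod is the core tool for managing permissions on Linux filesystems, and knowing how to use it matters for system security and file access control. Follow the principle of least privilege, choose permissions that fit what the file is actually used for, and review the permissions of important files regularly to avoid security holes.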


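The chown System Call, with Examples

Posted on 2025-08-06 by 麦芽爸

We continue with another important function in Linux system programming: chown, which changes a file's owner and group.

The chown function

1. Function Introduction

chown is a Linux system call that changes a file's owner user ID (UID) and/or group ID (GID). It lets a suitably privileged user (usually root, or in limited cases the file's current owner) hand a file over to another user or group.

This matters for system administration, permission control, and file sharing. For example, an administrator may need to transfer a file from one user to another, or change a file's group so that members of that group can access it.

Note that on Linux only a privileged process (one with the CAP_CHOWN capability, typically root) may change a file's owner. An unprivileged process cannot give a file away to another user. What the owner of a file can do is change the file's group to any group the owner is a member of.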

2. Function Prototype

#include <unistd.h> // required

int chown(const char *pathname, uid_t owner, gid_t group);

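3. Functionality

Changing the owner: sets the owner UID of the file named by pathname to owner.
Changing the group: sets the group GID of the file named by pathname to group.
Changing both: owner and group can be changed in the same call.
Selective change: if owner or group is the special value -1 (that is, (uid_t) -1 or (gid_t) -1), the corresponding ID is left unchanged.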

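4. Parameters

const char *pathname: pointer to a null-terminated (\0) string containing the path of the file or directory whose ownership is to be changed. The path may be relative or absolute.
uid_t owner: the new owner user ID.
- If it is (uid_t) -1, the file's owner is not changed.
- If it is a valid UID (such as 0, 1000, or 1001), chown attempts to change the file's owner to that UID.
gid_t group: the new group ID.
- If it is (gid_t) -1, the file's group is not changed.
- If it is a valid GID (such as 0, 100, or 1001), chown attempts to change the file's group to that GID.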

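5. Return Value

On success: returns 0.
On failure:
- Returns -1 and sets the global variable errno to indicate the cause:
  - EACCES: search permission is denied for a directory in the path prefix.
  - EIO: an I/O error occurred.
  - ELOOP: too many symbolic links were encountered while resolving pathname.
  - ENAMETOOLONG: the pathname is too long.
  - ENOENT: the file does not exist.
  - ENOMEM: insufficient kernel memory was available.
  - ENOTDIR: a component of the path prefix is not a directory.
  - EPERM: the calling process is not permitted to change the ownership. The most common cause is an unprivileged user trying to change the file's owner to another user.
  - EROFS: the file resides on a read-only filesystem.
  - EFAULT: pathname points outside the process's accessible address space.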

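6. Similar and Related Functions

fchown(int fd, uid_t owner, gid_t group): does the same as chown, but identifies the file by an open file descriptor instead of a pathname. This avoids a second path resolution and can be more efficient or safer in some situations.
lchown(const char *pathname, uid_t owner, gid_t group): like chown, except that if pathname is a symbolic link, lchown changes the ownership of the link itself, not of the file it points to. chown follows symbolic links.
fchownat(int dirfd, const char *pathname, uid_t owner, gid_t group, int flags): a newer function that accepts a path relative to the directory referred to by dirfd and takes additional flags (such as AT_SYMLINK_NOFOLLOW).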

7. Example Code

Example 1: Basic ownership change

This example uses chown to change a file's owner and/or group.

#include <unistd.h>     // chown
#include <stdio.h>      // perror, printf
#include <stdlib.h>     // exit
#include <sys/stat.h>   // struct stat, stat
#include <pwd.h>        // getpwuid
#include <grp.h>        // getgrgid
#include <errno.h>      // errno
#include <string.h>     // strerror

// 辅助函数：打印文件的当前所有权
void print_file_owner(const char *pathname) {
    struct stat sb;
    if (stat(pathname, &sb) == -1) {
        perror("stat");
        return;
    }

    struct passwd *pw = getpwuid(sb.st_uid);
    struct group  *gr = getgrgid(sb.st_gid);

    printf("文件 '%s' 的当前所有权:\n", pathname);
    printf("  UID: %d", sb.st_uid);
    if (pw) {
        printf(" (用户: %s)", pw->pw_name);
    }
    printf("\n");

    printf("  GID: %d", sb.st_gid);
    if (gr) {
        printf(" (组: %s)", gr->gr_name);
    }
    printf("\n");
}

int main(int argc, char *argv[]) {
    if (argc != 4) {
        fprintf(stderr, "用法: %s <文件路径> <新UID> <新GID>\n", argv[0]);
        fprintf(stderr, "      使用 -1 表示不更改相应的ID。\n");
        fprintf(stderr, "      示例: %s myfile.txt 1000 1000\n", argv[0]);
        fprintf(stderr, "            %s myfile.txt -1 1000  (仅更改组)\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    const char *pathname = argv[1];
    uid_t new_uid;
    gid_t new_gid;

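    // Parse the UID and GID arguments; "-1" means "leave unchanged".
    // strtoul with an end-pointer check rejects input such as "abc",
    // which atoi would silently turn into 0 (root).
    char *endptr;
    if (strcmp(argv[2], "-1") == 0) {
        new_uid = (uid_t) -1; // do not change the UID
    } else {
        new_uid = (uid_t) strtoul(argv[2], &endptr, 10);
        if (endptr == argv[2] || *endptr != '\0') {
            fprintf(stderr, "Error: invalid UID: %s\n", argv[2]);
            exit(EXIT_FAILURE);
        }
    }

    if (strcmp(argv[3], "-1") == 0) {
        new_gid = (gid_t) -1; // do not change the GID
    } else {
        new_gid = (gid_t) strtoul(argv[3], &endptr, 10);
        if (endptr == argv[3] || *endptr != '\0') {
            fprintf(stderr, "Error: invalid GID: %s\n", argv[3]);
            exit(EXIT_FAILURE);
        }
    }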

    printf("准备更改文件 '%s' 的所有权:\n", pathname);
    printf("  新 UID: %d (不更改则为-1)\n", (int)new_uid);
    printf("  新 GID: %d (不更改则为-1)\n", (int)new_gid);

    // 打印更改前的所有权
    print_file_owner(pathname);

    // 执行 chown 操作
    if (chown(pathname, new_uid, new_gid) == -1) {
        perror("chown 失败");
        exit(EXIT_FAILURE);
    }

    printf("\nchown 操作成功!\n");

    // 打印更改后的所有权
    print_file_owner(pathname);

    return 0;
}

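Code explanation:

The helper print_file_owner uses stat to fetch the file's information, resolves the UID and GID to user and group names with getpwuid and getgrgid, and prints them.
main takes three command-line arguments: the file path, the new UID, and the new GID.
It treats -1 as "do not change" and converts any other value to the corresponding uid_t or gid_t.
It calls print_file_owner to show the state before the change.
It calls chown(pathname, new_uid, new_gid) to change the ownership.
On success, it calls print_file_owner again to show the state after the change.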

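Example 2: A system administration tool

This example sketches a simple administration scenario in which the ownership of a whole set of files has to be changed.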

#define _GNU_SOURCE // 为了使用一些 GNU 扩展
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>
#include <pwd.h>
#include <grp.h>
#include <dirent.h> // 用于遍历目录

// 检查当前用户是否为 root (UID 0)
int is_root() {
    return (geteuid() == 0);
}

// 递归更改目录下所有文件的所有权
int change_ownership_recursive(const char *dir_path, uid_t uid, gid_t gid) {
    DIR *dir;
    struct dirent *entry;
    char full_path[1024];

    dir = opendir(dir_path);
    if (!dir) {
        perror("opendir");
        return -1;
    }

    while ((entry = readdir(dir)) != NULL) {
        // 跳过 . 和 ..
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) {
            continue;
        }

        // 构造完整路径
        snprintf(full_path, sizeof(full_path), "%s/%s", dir_path, entry->d_name);

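        // Change ownership of the entry itself. lchown does not follow symbolic
        // links, so a link inside the tree cannot redirect the change to a file
        // outside it.
        if (lchown(full_path, uid, gid) == -1) {
            fprintf(stderr, "Warning: cannot change ownership of '%s': %s\n", full_path, strerror(errno));
            // Do not abort the whole run because one file failed
        } else {
            printf("Changed ownership of '%s'\n", full_path);
        }

        // Recurse into real directories only; lstat reports a symlink as a
        // symlink, which also prevents unbounded recursion through link cycles
        struct stat sb;
        if (lstat(full_path, &sb) == 0 && S_ISDIR(sb.st_mode)) {
            change_ownership_recursive(full_path, uid, gid);
        }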
    }

    closedir(dir);
    return 0;
}

int main(int argc, char *argv[]) {
    if (argc < 4) {
        fprintf(stderr, "用法: %s <目录路径> <用户名或UID> <组名或GID> [-r]\n", argv[0]);
        fprintf(stderr, "      -r: 递归更改子目录中所有文件\n");
        fprintf(stderr, "      使用 -1 表示不更改 UID 或 GID\n");
        fprintf(stderr, "      示例: %s /home/newuser alice developers\n", argv[0]);
        fprintf(stderr, "            %s /data -1 mygroup -r\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    if (!is_root()) {
        fprintf(stderr, "警告: 你可能需要 root 权限来更改文件所有权。\n");
        fprintf(stderr, "当前有效 UID: %d\n", geteuid());
    }

    const char *path = argv[1];
    const char *user_str = argv[2];
    const char *group_str = argv[3];
    int recursive = 0;

    // 检查是否有 -r 标志
    for (int i = 4; i < argc; i++) {
        if (strcmp(argv[i], "-r") == 0) {
            recursive = 1;
            break;
        }
    }

    uid_t uid = (uid_t) -1;
    gid_t gid = (gid_t) -1;

    // 解析用户
    if (strcmp(user_str, "-1") != 0) {
        struct passwd *pw = getpwnam(user_str); // 按用户名查找
        if (pw) {
            uid = pw->pw_uid;
        } else {
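            // Fall back to parsing a numeric UID. An empty string or trailing
            // characters are rejected; strtoul would otherwise yield 0, i.e. root.
            char *endptr;
            uid = (uid_t) strtoul(user_str, &endptr, 10);
            if (endptr == user_str || *endptr != '\0') {
                fprintf(stderr, "Error: invalid user name or UID: %s\n", user_str);
                exit(EXIT_FAILURE);
            }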
        }
    }

    // 解析组
    if (strcmp(group_str, "-1") != 0) {
        struct group *gr = getgrnam(group_str); // 按组名查找
        if (gr) {
            gid = gr->gr_gid;
        } else {
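            // Fall back to parsing a numeric GID. An empty string or trailing
            // characters are rejected; strtoul would otherwise yield 0.
            char *endptr;
            gid = (gid_t) strtoul(group_str, &endptr, 10);
            if (endptr == group_str || *endptr != '\0') {
                fprintf(stderr, "Error: invalid group name or GID: %s\n", group_str);
                exit(EXIT_FAILURE);
            }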
        }
    }

    printf("准备更改 '%s' 的所有权:\n", path);
    printf("  UID: %d (%s)\n", (int)uid, (uid==(uid_t)-1) ? "不更改" : user_str);
    printf("  GID: %d (%s)\n", (int)gid, (gid==(gid_t)-1) ? "不更改" : group_str);
    printf("  递归: %s\n", recursive ? "是" : "否");

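    // Change the named path itself first
    int result = chown(path, uid, gid);
    if (result == -1) {
        perror("chown failed");
        exit(EXIT_FAILURE);
    }
    printf("Changed ownership of '%s'\n", path);

    if (recursive) {
        // The helper only visits entries below path and reports its own
        // errors, so errno is not printed again here
        result = change_ownership_recursive(path, uid, gid);
        if (result == -1) {
            exit(EXIT_FAILURE);
        }
    }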

    printf("所有权更改操作完成。\n");
    return 0;
}

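Code explanation:

is_root checks whether the effective user ID of the current process is 0 (root).
change_ownership_recursive walks a directory with opendir and readdir, changes the ownership of every entry, and recurses into subdirectories.
main handles the command-line arguments, accepts either user/group names or numeric UID/GID values, and supports the recursive option -r.
It uses getpwnam and getgrnam to resolve user and group names to a UID and GID.
The -r flag selects whether the recursive function is used to walk the directory tree.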
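
The name-or-number resolution can be isolated into one function. The following is a sketch under the assumption that a string is tried as a user name first and as a decimal UID second; resolve_uid is a hypothetical helper, not a library call.

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

/* Resolve a user name or a decimal UID string. On success stores the UID
   in *out and returns 0; returns -1 if the string is neither. */
int resolve_uid(const char *s, uid_t *out) {
    assert(s != NULL && out != NULL);
    struct passwd *pw = getpwnam(s);
    if (pw != NULL) {
        *out = pw->pw_uid;
        return 0;
    }
    /* Require a leading digit: this rejects "", "-1", "+5", and " 7",
       all of which strtoul would otherwise accept or map to 0. */
    if (*s < '0' || *s > '9') {
        return -1;
    }
    char *end;
    errno = 0;
    unsigned long v = strtoul(s, &end, 10);
    /* (uid_t) -1 is reserved by chown to mean "do not change". */
    if (errno != 0 || *end != '\0' || v >= (uid_t) -1) {
        return -1;
    }
    *out = (uid_t) v;
    return 0;
}
```

The same shape works for groups by swapping getpwnam for getgrnam and uid_t for gid_t.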

Example 3: Error handling and permission checks

This example focuses on the error conditions chown can run into and how to handle them.

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

void demonstrate_chown_errors() {
    printf("=== Chown 错误处理演示 ===\n");

    // 1. 尝试更改不存在的文件
    printf("\n1. 尝试更改不存在的文件:\n");
    if (chown("/nonexistent/file.txt", 1000, 1000) == -1) {
        printf("   错误: %s\n", strerror(errno));
        // 通常返回 ENOENT
    }

    // 2. 创建一个测试文件
    const char *test_file = "chown_test.txt";
    FILE *fp = fopen(test_file, "w");
    if (fp) {
        fprintf(fp, "Test file for chown\n");
        fclose(fp);
        printf("\n2. 创建测试文件: %s\n", test_file);
    } else {
        perror("创建测试文件失败");
        return;
    }

    // 3. 非特权用户尝试将文件给其他用户 (通常会失败)
    printf("\n3. 非特权用户尝试将文件所有权转移给其他用户:\n");
    uid_t current_uid = getuid();
    uid_t target_uid = (current_uid == 1000) ? 1001 : 1000; // 假设另一个用户
    
    printf("   当前用户 UID: %d\n", current_uid);
    printf("   尝试更改为 UID: %d\n", target_uid);
    
    if (chown(test_file, target_uid, (gid_t)-1) == -1) {
        printf("   错误: %s\n", strerror(errno));
        if (errno == EPERM) {
            printf("   说明: 非特权用户不能将文件所有权转移给其他用户\n");
        }
    } else {
        printf("   更改成功 (这在非 root 用户下不太可能)\n");
    }

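    // 4. Try to change a file under /proc
    printf("\n4. Trying to change a file under /proc:\n");
    // Note: demonstrating EROFS needs a filesystem that is really mounted
    // read-only. /proc is normally mounted read-write, so for an unprivileged
    // caller this call typically fails with EPERM instead.
    if (chown("/proc/version", 0, 0) == -1) {
        int err = errno; // save errno before printf can change it
        printf("   Error: %s\n", strerror(err));
        if (err == EROFS) {
            printf("   Note: ownership cannot be changed on a read-only filesystem\n");
        } else if (err == EPERM) {
            printf("   Note: the kernel refused the ownership change\n");
        }
    }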

    // 5. 正常的组更改（如果可能）
    printf("\n5. 尝试更改文件组所有权:\n");
    gid_t current_gid = getgid();
    printf("   当前组 GID: %d\n", current_gid);
    
    // 尝试更改为自己所在的组（更可能成功）
    if (chown(test_file, (uid_t)-1, current_gid) == -1) {
        printf("   更改组失败: %s\n", strerror(errno));
    } else {
        printf("   组所有权更改成功\n");
    }

    // 清理测试文件
    unlink(test_file);
    printf("\n6. 清理完成\n");
}

int main() {
    printf("当前进程信息:\n");
    printf("  实际 UID: %d\n", getuid());
    printf("  有效 UID: %d\n", geteuid());
    printf("  实际 GID: %d\n", getgid());
    printf("  有效 GID: %d\n", getegid());
    
    demonstrate_chown_errors();
    
    printf("\n=== 总结 ===\n");
    printf("chown 常见错误:\n");
    printf("  EPERM: 权限不足（非 root 用户试图更改所有者）\n");
    printf("  ENOENT: 文件不存在\n");
    printf("  EROFS: 只读文件系统\n");
    printf("  EACCES: 搜索路径被拒绝\n");
    printf("  EIO: I/O 错误\n\n");
    
    printf("权限规则:\n");
    printf("  - Root 用户可以更改任何文件的所有者和组\n");
    printf("  - 普通用户通常只能更改自己拥有的文件的组\n");
    printf("  - 普通用户不能将文件所有权转移给其他用户\n");
    
    return 0;
}

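Code explanation:

demonstrate_chown_errors walks through several typical errors that chown can return.
It first operates on a file that does not exist, which shows ENOENT.
It creates a test file for the later steps.
It shows the EPERM error an unprivileged user gets when trying to give a file to another user.
It attempts a change under /proc to show a refusal by the kernel; EROFS itself is only returned on a filesystem that is actually mounted read-only.
It shows a normal group change.
Finally it removes the test file and summarizes the common error types and permission rules.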

Compiling and running:

# Compile the examples
gcc -o chown_example1 chown_example1.c
gcc -o chown_example2 chown_example2.c
gcc -o chown_example3 chown_example3.c

# Run the examples (appropriate privileges required)
# Example 1: basic usage
touch testfile.txt
./chown_example1 testfile.txt 1000 1000  # changing the owner requires root
./chown_example1 testfile.txt -1 1000    # may work without root if you belong to group 1000

# Example 2: system administration
./chown_example2 /tmp/mydir alice developers -r

# Example 3: error handling
./chown_example3

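Summary:

chown is an indispensable tool in Linux system administration for controlling who owns files and directories. Understanding its parameters, return value, and permission model is essential for writing robust system programs. Pay attention to the permission restrictions and possible error conditions, and use it with care, especially in production environments.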


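The chroot System Call, with Examples

Posted on 2025-08-06 by 麦芽爸

chroot in detail

1. Function Introduction

chroot is the Linux system call that changes a process's root directory; the name comes from "change root". You can think of chroot as a "virtual jail warden": it gives a process an isolated view of the filesystem in which a chosen directory appears to be the filesystem root (/).

By changing the process's view of the root directory, chroot creates a restricted execution environment. Inside it, the process cannot reach the filesystem outside the chosen root by pathname, which provides a degree of isolation. It is like putting tinted glasses on the process so that it sees only a particular part of the filesystem.

Important: chroot by itself is not a security boundary. An experienced attacker may be able to break out of a chroot environment in various ways, particularly when the process keeps root privileges.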

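Use cases:

System maintenance and repair
Isolated software build environments
Setting up test environments
Simple sandboxes
System recovery and rescue
A building block of early container-like technologies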

2. Function Prototype

#include <unistd.h>

int chroot(const char *path);

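3. Functionality

chroot changes the root directory of the calling process; child processes created afterward inherit it. After a successful call, the given directory is used to resolve pathnames that begin with /. The current working directory is not changed, so relative paths still resolve from wherever the process was, which may lie outside the new root. This is why chroot is normally paired with chdir.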
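
The usual pairing of chdir and chroot can be sketched as follows. The function name enter_chroot is hypothetical; the order shown (chdir into the directory, chroot("."), then chdir("/")) is one common idiom, not the only valid one.

```c
#define _DEFAULT_SOURCE   /* chroot() is not part of current POSIX */
#include <assert.h>
#include <unistd.h>

/* Confine the calling process to new_root.
   Returns 0 on success, -1 with errno set on failure. */
int enter_chroot(const char *new_root) {
    assert(new_root != NULL);
    /* Step inside first, so the working directory can never be left
       pointing at a location outside the new root. */
    if (chdir(new_root) == -1) {
        return -1;
    }
    /* Requires privilege (CAP_SYS_CHROOT on Linux). */
    if (chroot(".") == -1) {
        return -1;
    }
    /* Re-anchor the working directory at the new "/". */
    return chdir("/");
}
```

A caller that intends to use this for confinement would normally also drop root privileges right after the call, since a process that stays root can undo the confinement.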

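4. Parameters

path: the new root directory
- Type: const char*
- Meaning: pointer to the pathname string of the new root directory
- The path must name an existing directory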

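5. Return Value

On success: returns 0
On failure: returns -1 and sets errno
- EACCES: search permission is denied on a component of the path prefix
- EBUSY: the directory is busy (this code is not listed in the Linux chroot(2) manual page; it is documented for pivot_root(2))
- EFAULT: path points to invalid memory
- EIO: an I/O error occurred
- ELOOP: too many symbolic links were encountered while resolving path
- ENAMETOOLONG: the pathname is too long
- ENOENT: the directory does not exist
- ENOTDIR: a component of path is not a directory
- EPERM: the caller has insufficient privilege (on Linux, it needs the CAP_SYS_CHROOT capability)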

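6. Similar and Related Functions

chdir(): changes the current working directory
pivot_root(): a newer way to switch the root, operating on the root mount
mount(): mounts a filesystem
unshare(): creates new namespaces
clone(): creates a process, optionally in new namespaces
setuid()/setgid(): change the user/group ID
capset(): sets process capabilities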

7. Example Code

Example 1: Basic chroot usage: a simple environment switch

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/sysmacros.h> // makedev (no longer pulled in by <sys/types.h> on recent glibc)
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <dirent.h>

// 创建chroot环境
int create_chroot_environment(const char* chroot_path) {
    printf("创建chroot环境: %s\n", chroot_path);
    
    // 创建根目录
    if (mkdir(chroot_path, 0755) == -1 && errno != EEXIST) {
        perror("创建根目录失败");
        return -1;
    }
    
    // 创建基本目录结构
    const char* dirs[] = {
        "bin", "lib", "lib64", "usr", "etc", "dev", "tmp", "proc"
    };
    
    for (int i = 0; i < 8; i++) {
        char full_path[256];
        snprintf(full_path, sizeof(full_path), "%s/%s", chroot_path, dirs[i]);
        if (mkdir(full_path, 0755) == -1 && errno != EEXIST) {
            perror("创建目录失败");
            return -1;
        }
    }
    
    // Create basic device nodes (simplified)
    
    // 创建null设备节点
    char null_path[256];
    snprintf(null_path, sizeof(null_path), "%s/dev/null", chroot_path);
    if (mknod(null_path, S_IFCHR | 0666, makedev(1, 3)) == -1 && errno != EEXIST) {
        printf("警告: 创建/dev/null失败: %s\n", strerror(errno));
    }
    
    // 创建zero设备节点
    char zero_path[256];
    snprintf(zero_path, sizeof(zero_path), "%s/dev/zero", chroot_path);
    if (mknod(zero_path, S_IFCHR | 0666, makedev(1, 5)) == -1 && errno != EEXIST) {
        printf("警告: 创建/dev/zero失败: %s\n", strerror(errno));
    }
    
    printf("chroot环境创建完成\n");
    return 0;
}

// 显示当前目录结构
void show_directory_tree(const char* path, int depth) {
    DIR* dir = opendir(path);
    if (dir == NULL) {
        printf("无法打开目录: %s\n", path);
        return;
    }
    
    // 显示缩进
    for (int i = 0; i < depth; i++) {
        printf("  ");
    }
    printf("%s/\n", path);
    
    struct dirent* entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) {
            continue;
        }
        
        // 显示文件/目录
        for (int i = 0; i < depth + 1; i++) {
            printf("  ");
        }
        printf("%s%s\n", entry->d_name, 
               entry->d_type == DT_DIR ? "/" : "");
    }
    
    closedir(dir);
}

// 显示文件系统信息
void show_filesystem_info() {
    char cwd[1024];
    if (getcwd(cwd, sizeof(cwd)) != NULL) {
        printf("当前工作目录: %s\n", cwd);
    }
    
    // 显示根目录内容
    printf("根目录内容:\n");
    DIR* root_dir = opendir("/");
    if (root_dir) {
        struct dirent* entry;
        int count = 0;
        while ((entry = readdir(root_dir)) != NULL && count < 10) {
            if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0) {
                printf("  %s%s\n", entry->d_name, 
                       entry->d_type == DT_DIR ? "/" : "");
                count++;
            }
        }
        if (count >= 10) {
            printf("  ... (更多文件)\n");
        }
        closedir(root_dir);
    }
}

int main() {
    printf("=== 基础chroot使用示例 ===\n");
    
    const char* chroot_dir = "/tmp/my_chroot";
    
    // 检查是否具有root权限
    if (geteuid() != 0) {
        printf("警告: chroot需要root权限运行\n");
        printf("请使用sudo运行此程序\n");
        exit(EXIT_FAILURE);
    }
    
    // 创建chroot环境
    if (create_chroot_environment(chroot_dir) == -1) {
        exit(EXIT_FAILURE);
    }
    
    printf("\n1. chroot前的文件系统状态:\n");
    show_filesystem_info();
    show_directory_tree(chroot_dir, 0);
    
    // 获取当前工作目录
    char original_cwd[1024];
    if (getcwd(original_cwd, sizeof(original_cwd)) == NULL) {
        perror("获取当前目录失败");
        exit(EXIT_FAILURE);
    }
    printf("原始工作目录: %s\n", original_cwd);
    
    // 执行chroot
    printf("\n2. 执行chroot操作:\n");
    printf("切换根目录到: %s\n", chroot_dir);
    
    if (chroot(chroot_dir) == -1) {
        perror("chroot失败");
        exit(EXIT_FAILURE);
    }
    
    printf("✓ chroot操作成功\n");
    
    // 改变工作目录到新的根目录
    if (chdir("/") == -1) {
        perror("改变工作目录失败");
        exit(EXIT_FAILURE);
    }
    
    printf("\n3. chroot后的文件系统状态:\n");
    show_filesystem_info();
    
    // 创建一些测试文件
    printf("\n4. 在chroot环境中创建文件:\n");
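    int fd = open("/test_file.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd != -1) {
        // sizeof(msg) - 1 is the exact byte count, excluding the terminating NUL
        const char msg[] = "This is a test file inside the chroot environment\n";
        if (write(fd, msg, sizeof(msg) - 1) == -1) {
            perror("write to test file failed");
        }
        close(fd);
        printf("Created file: /test_file.txt\n");
    }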
    
    // 创建目录
    if (mkdir("/mydir", 0755) == 0) {
        printf("创建目录: /mydir\n");
    }
    
    // 显示chroot环境内容
    show_directory_tree("/", 0);
    
    // 尝试访问原始系统文件（应该失败）
    printf("\n5. 尝试访问原始系统文件:\n");
    if (access("/etc/passwd", F_OK) == -1) {
        printf("✓ 无法访问原始系统文件 /etc/passwd (预期行为)\n");
    } else {
        printf("✗ 仍然可以访问原始系统文件\n");
    }
    
    // 清理测试文件
    unlink("/test_file.txt");
    rmdir("/mydir");
    
    printf("\n=== 基础chroot演示完成 ===\n");
    printf("注意: 此程序在chroot环境中结束\n");
    
    return 0;
}

Example 2: a safer chroot implementation – guarding against escape

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  // makedev()
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <dirent.h>
#include <pwd.h>
#include <time.h>           // time(), ctime()

// 安全chroot函数
int secure_chroot(const char* new_root) {
    struct stat root_stat, cwd_stat;
    char cwd[4096];
    
    printf("执行安全chroot到: %s\n", new_root);
    
    // 1. 验证新根目录存在且是目录
    if (stat(new_root, &root_stat) == -1) {
        perror("无法访问根目录");
        return -1;
    }
    
    if (!S_ISDIR(root_stat.st_mode)) {
        fprintf(stderr, "指定路径不是目录\n");
        return -1;
    }
    
    // 2. 验证新根目录权限
    if (access(new_root, R_OK | X_OK) == -1) {
        perror("根目录权限不足");
        return -1;
    }
    
    // 3. 改变当前工作目录到根目录
    if (chdir(new_root) == -1) {
        perror("改变到根目录失败");
        return -1;
    }
    
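    // 4. Check that the directory we entered is the one validated in step 1
    if (getcwd(cwd, sizeof(cwd)) == NULL) {
        perror("getcwd failed");
        return -1;
    }
    
    if (stat(".", &cwd_stat) == -1) {
        perror("stat on current directory failed");
        return -1;
    }
    
    // A different device/inode pair means new_root was replaced after the stat() call
    if (cwd_stat.st_dev != root_stat.st_dev || cwd_stat.st_ino != root_stat.st_ino) {
        fprintf(stderr, "new root changed during setup, aborting\n");
        return -1;
    }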
    
    // 5. 执行chroot
    if (chroot(".") == -1) {
        perror("chroot失败");
        return -1;
    }
    
    // 6. 再次改变到根目录（防止某些逃逸技术）
    if (chdir("/") == -1) {
        perror("最终改变目录失败");
        return -1;
    }
    
    printf("✓ 安全chroot完成\n");
    return 0;
}

// 在chroot环境中运行的函数
void run_in_chroot() {
    printf("\n=== 在chroot环境中运行 ===\n");
    
    // 显示环境信息
    printf("进程ID: %d\n", getpid());
    printf("父进程ID: %d\n", getppid());
    
    char cwd[4096];
    if (getcwd(cwd, sizeof(cwd)) != NULL) {
        printf("当前工作目录: %s\n", cwd);
    }
    
    // 显示用户信息
    printf("用户ID: %d\n", getuid());
    printf("有效用户ID: %d\n", geteuid());
    printf("组ID: %d\n", getgid());
    
    // 显示根目录内容
    printf("根目录内容:\n");
    DIR* dir = opendir("/");
    if (dir) {
        struct dirent* entry;
        while ((entry = readdir(dir)) != NULL) {
            if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0) {
                printf("  %s%s\n", entry->d_name, 
                       entry->d_type == DT_DIR ? "/" : "");
            }
        }
        closedir(dir);
    }
    
    // 创建测试文件
    int fd = open("/chroot_test.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd != -1) {
        const char* test_content = "Chroot环境测试文件\n创建时间: ";
        write(fd, test_content, strlen(test_content));
        
        // 添加时间戳
        time_t now = time(NULL);
        char time_str[64];
        snprintf(time_str, sizeof(time_str), "%s", ctime(&now));
        // 移除换行符
        char* newline = strchr(time_str, '\n');
        if (newline) *newline = '\0';
        write(fd, time_str, strlen(time_str));
        write(fd, "\n", 1);
        close(fd);
        printf("创建测试文件: /chroot_test.txt\n");
    }
    
    // 显示测试文件内容
    fd = open("/chroot_test.txt", O_RDONLY);
    if (fd != -1) {
        char buffer[256];
        ssize_t bytes_read = read(fd, buffer, sizeof(buffer) - 1);
        if (bytes_read > 0) {
            buffer[bytes_read] = '\0';
            printf("测试文件内容:\n%s", buffer);
        }
        close(fd);
    }
    
    // 演示环境隔离
    printf("\n环境隔离测试:\n");
    
    // 尝试访问原始系统文件
    const char* system_files[] = {
        "/etc/passwd",
        "/etc/shadow",
        "/proc/1/cmdline",
        "/sys/kernel",
        "/dev/sda"
    };
    
    for (int i = 0; i < 5; i++) {
        if (access(system_files[i], F_OK) == 0) {
            printf("  能够访问: %s\n", system_files[i]);
        } else {
            printf("  无法访问: %s (%s)\n", system_files[i], strerror(errno));
        }
    }
    
    // 清理测试文件
    unlink("/chroot_test.txt");
}

// 演示chroot逃逸防护
void demonstrate_escape_protection() {
    printf("\n=== chroot逃逸防护演示 ===\n");
    
    // 这些是常见的chroot逃逸尝试
    printf("尝试常见的逃逸方法:\n");
    
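    // 1. Try to reach the parent via ".." (at the root of a chroot, ".." resolves to the root itself)
    if (chdir("..") == 0) {
        char cwd[4096];
        if (getcwd(cwd, sizeof(cwd)) != NULL) {
            printf("  directory after cd ..: %s (expected: /)\n", cwd);
        }
        // Go back to the root directory
        chdir("/");
    } else {
        printf("  cd .. failed: %s\n", strerror(errno));
    }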
    
    // 2. 尝试通过绝对路径访问
    if (access("/etc/passwd", F_OK) == 0) {
        printf("  能够访问 /etc/passwd (可能存在问题)\n");
    } else {
        printf("  无法访问 /etc/passwd (正常隔离)\n");
    }
    
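    // 3. Try a symlink that points "outside". Creating it succeeds, but the target
    //    is resolved relative to the new root, so it cannot reach the host file.
    if (symlink("/etc/passwd", "/passwd_link") == 0) {
        printf("  symlink created\n");
        // Check whether the link resolves to anything inside the chroot
        if (access("/passwd_link", F_OK) == 0) {
            printf("  symlink resolves to /etc/passwd inside the chroot\n");
        } else {
            printf("  symlink target does not exist inside the chroot (isolated)\n");
        }
        unlink("/passwd_link");
    } else {
        printf("  symlink creation failed: %s\n", strerror(errno));
    }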
}

int main() {
    printf("=== 安全chroot实现示例 ===\n");
    
    // 检查权限
    if (geteuid() != 0) {
        printf("错误: 此程序需要root权限运行\n");
        exit(EXIT_FAILURE);
    }
    
    const char* chroot_path = "/tmp/secure_chroot";
    
    // 创建安全的chroot环境
    printf("1. 创建安全chroot环境:\n");
    
    // 创建基本目录结构
    const char* dirs[] = {"bin", "etc", "dev", "tmp", "usr", "lib", "lib64"};
    if (mkdir(chroot_path, 0755) == -1 && errno != EEXIST) {
        perror("创建根目录失败");
        exit(EXIT_FAILURE);
    }
    
    for (int i = 0; i < 7; i++) {
        char full_path[256];
        snprintf(full_path, sizeof(full_path), "%s/%s", chroot_path, dirs[i]);
        if (mkdir(full_path, 0755) == -1 && errno != EEXIST) {
            perror("创建目录失败");
            exit(EXIT_FAILURE);
        }
    }
    
    // 创建基本设备文件
    char null_path[256];
    snprintf(null_path, sizeof(null_path), "%s/dev/null", chroot_path);
    if (mknod(null_path, S_IFCHR | 0666, makedev(1, 3)) == -1 && errno != EEXIST) {
        printf("警告: 创建/dev/null失败\n");
    }
    
    printf("chroot环境创建完成: %s\n", chroot_path);
    
    // 执行安全chroot
    printf("\n2. 执行安全chroot:\n");
    if (secure_chroot(chroot_path) == -1) {
        exit(EXIT_FAILURE);
    }
    
    // 在chroot环境中运行
    run_in_chroot();
    
    // 演示逃逸防护
    demonstrate_escape_protection();
    
    // 显示最终状态
    printf("\n=== chroot环境演示完成 ===\n");
    printf("当前仍在chroot环境中\n");
    
    return 0;
}

Example 3: building a chroot environment and running programs in it

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  // makedev()
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <dirent.h>
#include <sys/wait.h>

// 复制文件到chroot环境
int copy_file_to_chroot(const char* src, const char* dst_chroot, const char* dst_path) {
    char full_dst_path[512];
    snprintf(full_dst_path, sizeof(full_dst_path), "%s%s", dst_chroot, dst_path);
    
    // 确保目标目录存在
    char* last_slash = strrchr(full_dst_path, '/');
    if (last_slash) {
        *last_slash = '\0';
        // 创建目录（简化实现）
        mkdir(full_dst_path, 0755);
        *last_slash = '/';
    }
    
    // 打开源文件
    int src_fd = open(src, O_RDONLY);
    if (src_fd == -1) {
        printf("警告: 无法打开源文件 %s: %s\n", src, strerror(errno));
        return -1;
    }
    
    // 创建目标文件
    int dst_fd = open(full_dst_path, O_CREAT | O_WRONLY | O_TRUNC, 0755);
    if (dst_fd == -1) {
        printf("警告: 无法创建目标文件 %s: %s\n", full_dst_path, strerror(errno));
        close(src_fd);
        return -1;
    }
    
    // 复制文件内容
    char buffer[8192];
    ssize_t bytes_read, bytes_written;
    
    while ((bytes_read = read(src_fd, buffer, sizeof(buffer))) > 0) {
        bytes_written = write(dst_fd, buffer, bytes_read);
        if (bytes_written != bytes_read) {
            perror("写入目标文件失败");
            close(src_fd);
            close(dst_fd);
            return -1;
        }
    }
    
    close(src_fd);
    close(dst_fd);
    
    printf("复制文件: %s -> %s\n", src, full_dst_path);
    return 0;
}

// 构建基本的chroot环境
int build_basic_chroot(const char* chroot_path) {
    printf("构建基本chroot环境: %s\n", chroot_path);
    
    // 创建目录结构
    const char* dirs[] = {
        "", "bin", "sbin", "etc", "dev", "usr", "usr/bin", 
        "usr/sbin", "lib", "lib64", "tmp", "var", "var/tmp"
    };
    
    for (int i = 0; i < 13; i++) {
        char full_path[256];
        snprintf(full_path, sizeof(full_path), "%s/%s", chroot_path, dirs[i]);
        if (mkdir(full_path, 0755) == -1 && errno != EEXIST) {
            printf("Warning: failed to create directory %s: %s\n", full_path, strerror(errno));
        }
    }
    
    // 创建基本设备文件
    char dev_path[256];
    snprintf(dev_path, sizeof(dev_path), "%s/dev/null", chroot_path);
    if (mknod(dev_path, S_IFCHR | 0666, makedev(1, 3)) == -1 && errno != EEXIST) {
        printf("警告: 创建/dev/null失败\n");
    }
    
    snprintf(dev_path, sizeof(dev_path), "%s/dev/zero", chroot_path);
    if (mknod(dev_path, S_IFCHR | 0666, makedev(1, 5)) == -1 && errno != EEXIST) {
        printf("警告: 创建/dev/zero失败\n");
    }
    
    snprintf(dev_path, sizeof(dev_path), "%s/dev/random", chroot_path);
    if (mknod(dev_path, S_IFCHR | 0666, makedev(1, 8)) == -1 && errno != EEXIST) {
        printf("警告: 创建/dev/random失败\n");
    }
    
    snprintf(dev_path, sizeof(dev_path), "%s/dev/urandom", chroot_path);
    if (mknod(dev_path, S_IFCHR | 0666, makedev(1, 9)) == -1 && errno != EEXIST) {
        printf("警告: 创建/dev/urandom失败\n");
    }
    
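    // Copy basic commands (adjust the paths for your system).
    // Dynamically linked binaries also need their shared libraries and the dynamic
    // loader (listed by ldd) copied into lib/ and lib64/; otherwise exec fails in the chroot.
    printf("Copying basic commands...\n");
    
    // Copy the shell
    copy_file_to_chroot("/bin/sh", chroot_path, "/bin/sh");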
    
    // 复制基本命令
    const char* basic_commands[] = {
        "/bin/ls", "/bin/cat", "/bin/echo", "/bin/pwd",
        "/usr/bin/id", "/bin/ps"
    };
    
    for (int i = 0; i < 6; i++) {
        copy_file_to_chroot(basic_commands[i], chroot_path, basic_commands[i]);
    }
    
    // 创建基本配置文件
    char etc_passwd[256];
    snprintf(etc_passwd, sizeof(etc_passwd), "%s/etc/passwd", chroot_path);
    int fd = open(etc_passwd, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd != -1) {
        const char* passwd_content = 
            "root:x:0:0:root:/root:/bin/sh\n"
            "nobody:x:65534:65534:nobody:/:/bin/sh\n";
        write(fd, passwd_content, strlen(passwd_content));
        close(fd);
        printf("创建 /etc/passwd\n");
    }
    
    char etc_group[256];
    snprintf(etc_group, sizeof(etc_group), "%s/etc/group", chroot_path);
    fd = open(etc_group, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd != -1) {
        const char* group_content = 
            "root:x:0:\n"
            "nobody:x:65534:\n";
        write(fd, group_content, strlen(group_content));
        close(fd);
        printf("创建 /etc/group\n");
    }
    
    printf("基本chroot环境构建完成\n");
    return 0;
}

// 在chroot环境中执行命令
int execute_in_chroot(const char* chroot_path, const char* command) {
    pid_t pid = fork();
    
    if (pid == -1) {
        perror("fork失败");
        return -1;
    }
    
    if (pid == 0) {
        // 子进程
        // 执行chroot
        if (chroot(chroot_path) == -1) {
            perror("chroot失败");
            exit(EXIT_FAILURE);
        }
        
        // 改变到根目录
        if (chdir("/") == -1) {
            perror("chdir失败");
            exit(EXIT_FAILURE);
        }
        
        // 执行命令
        execl("/bin/sh", "sh", "-c", command, (char*)NULL);
        perror("执行命令失败");
        exit(EXIT_FAILURE);
    } else {
        // 父进程等待子进程结束
        int status;
        waitpid(pid, &status, 0);
        
        if (WIFEXITED(status)) {
            int exit_code = WEXITSTATUS(status);
            printf("命令执行完成，退出码: %d\n", exit_code);
            return exit_code;
        } else if (WIFSIGNALED(status)) {
            int signal = WTERMSIG(status);
            printf("命令被信号终止: %d\n", signal);
            return -1;
        }
    }
    
    return 0;
}

// 显示chroot环境内容
void show_chroot_contents(const char* chroot_path) {
    printf("\nchroot环境内容:\n");
    
    DIR* dir = opendir(chroot_path);
    if (dir) {
        struct dirent* entry;
        while ((entry = readdir(dir)) != NULL) {
            if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0) {
                printf("  /%s%s\n", entry->d_name, 
                       entry->d_type == DT_DIR ? "/" : "");
                
                // 显示子目录内容（仅一层）
                if (entry->d_type == DT_DIR) {
                    char sub_path[512];
                    snprintf(sub_path, sizeof(sub_path), "%s/%s", chroot_path, entry->d_name);
                    DIR* sub_dir = opendir(sub_path);
                    if (sub_dir) {
                        struct dirent* sub_entry;
                        int count = 0;
                        while ((sub_entry = readdir(sub_dir)) != NULL && count < 5) {
                            if (strcmp(sub_entry->d_name, ".") != 0 && 
                                strcmp(sub_entry->d_name, "..") != 0) {
                                printf("    %s%s\n", sub_entry->d_name, 
                                       sub_entry->d_type == DT_DIR ? "/" : "");
                                count++;
                            }
                        }
                        if (count >= 5) {
                            printf("    ...\n");
                        }
                        closedir(sub_dir);
                    }
                }
            }
        }
        closedir(dir);
    }
}

int main() {
    printf("=== chroot环境构建与程序执行示例 ===\n");
    
    if (geteuid() != 0) {
        printf("错误: 此程序需要root权限运行\n");
        exit(EXIT_FAILURE);
    }
    
    const char* chroot_path = "/tmp/full_chroot";
    
    // 构建chroot环境
    printf("1. 构建完整的chroot环境:\n");
    if (build_basic_chroot(chroot_path) == -1) {
        exit(EXIT_FAILURE);
    }
    
    show_chroot_contents(chroot_path);
    
    // 在chroot环境中执行命令
    printf("\n2. 在chroot环境中执行命令:\n");
    
    // 执行基本命令
    const char* commands[] = {
        "echo 'Hello from chroot!'",
        "ls -la /",
        "pwd",
        "id",
        "cat /etc/passwd"
    };
    
    for (int i = 0; i < 5; i++) {
        printf("\n执行命令: %s\n", commands[i]);
        printf("--- 输出开始 ---\n");
        execute_in_chroot(chroot_path, commands[i]);
        printf("--- 输出结束 ---\n");
    }
    
    // 创建和运行简单脚本
    printf("\n3. 创建和运行脚本:\n");
    
    // 创建脚本文件
    char script_path[256];
    snprintf(script_path, sizeof(script_path), "%s/test_script.sh", chroot_path);
    int fd = open(script_path, O_CREAT | O_WRONLY | O_TRUNC, 0755);
    if (fd != -1) {
        const char* script_content = 
            "#!/bin/sh\n"
            "echo '=== 测试脚本开始 ==='\n"
            "echo '当前时间:' $(date)\n"
            "echo '当前用户:' $(id)\n"
            "echo '当前目录:' $(pwd)\n"
            "ls -la /\n"
            "echo '=== 测试脚本结束 ==='\n";
        write(fd, script_content, strlen(script_content));
        close(fd);
        printf("创建测试脚本: /test_script.sh\n");
    }
    
    // 执行脚本
    printf("执行测试脚本:\n");
    printf("--- 脚本输出开始 ---\n");
    execute_in_chroot(chroot_path, "/test_script.sh");
    printf("--- 脚本输出结束 ---\n");
    
    // 演示安全性
    printf("\n4. 安全性演示:\n");
    
    // 尝试访问宿主系统文件
    printf("尝试访问宿主系统文件:\n");
    const char* dangerous_commands[] = {
        "ls -la /etc",
        "cat /etc/shadow 2>/dev/null || echo '无法访问/etc/shadow'",
        "ls -la /root 2>/dev/null || echo '无法访问/root'",
        "find /proc -maxdepth 2 2>/dev/null | head -5"
    };
    
    for (int i = 0; i < 4; i++) {
        printf("\n执行安全测试命令: %s\n", dangerous_commands[i]);
        printf("--- 输出开始 ---\n");
        execute_in_chroot(chroot_path, dangerous_commands[i]);
        printf("--- 输出结束 ---\n");
    }
    
    // 清理测试脚本
    unlink(script_path);
    
    printf("\n=== chroot环境演示完成 ===\n");
    printf("环境路径: %s\n", chroot_path);
    printf("注意: 环境文件仍保留在系统中\n");
    
    return 0;
}

Example 4: advanced chroot usage – a system maintenance tool

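#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  // makedev()
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <dirent.h>
#include <sys/mount.h>
#include <sys/wait.h>
#include <mntent.h>
#include <time.h>           // time_t, time()

// This example reuses copy_file_to_chroot() from Example 3;
// copy that function into this file before compiling.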

// chroot环境管理器
typedef struct {
    char path[512];
    int is_active;
    pid_t original_pid;
    time_t create_time;
} chroot_manager_t;

static chroot_manager_t manager = {0};

// 创建完整的系统恢复环境
int create_recovery_environment(const char* chroot_path) {
    printf("创建系统恢复环境: %s\n", chroot_path);
    
    // 创建完整的目录结构
    const char* essential_dirs[] = {
        "", "bin", "sbin", "etc", "dev", "proc", "sys", "tmp",
        "var", "var/log", "var/run", "usr", "usr/bin", "usr/sbin",
        "usr/lib", "lib", "lib64", "mnt", "media", "root", "home"
    };
    
    for (int i = 0; i < 21; i++) {
        char full_path[512];
        snprintf(full_path, sizeof(full_path), "%s/%s", chroot_path, essential_dirs[i]);
        if (mkdir(full_path, 0755) == -1 && errno != EEXIST) {
            printf("警告: 创建目录失败 %s: %s\n", full_path, strerror(errno));
        }
    }
    
    // 创建设备文件
    printf("创建基本设备文件...\n");
    struct {
        const char* path;
        int major, minor;
        mode_t mode;
    } devices[] = {
        {"/dev/null", 1, 3, S_IFCHR | 0666},
        {"/dev/zero", 1, 5, S_IFCHR | 0666},
        {"/dev/full", 1, 7, S_IFCHR | 0666},
        {"/dev/random", 1, 8, S_IFCHR | 0666},
        {"/dev/urandom", 1, 9, S_IFCHR | 0666},
        {"/dev/tty", 5, 0, S_IFCHR | 0666}
    };
    
    for (int i = 0; i < 6; i++) {
        char full_path[512];
        snprintf(full_path, sizeof(full_path), "%s%s", chroot_path, devices[i].path);
        if (mknod(full_path, devices[i].mode, makedev(devices[i].major, devices[i].minor)) == -1 && errno != EEXIST) {
            printf("警告: 创建设备文件失败 %s: %s\n", full_path, strerror(errno));
        }
    }
    
    // 复制系统管理工具
    printf("复制系统管理工具...\n");
    const char* sysadmin_tools[] = {
        "/bin/sh", "/bin/bash", "/bin/ls", "/bin/cat", "/bin/cp",
        "/bin/mv", "/bin/rm", "/bin/mkdir", "/bin/rmdir", "/bin/ln",
        "/bin/find", "/bin/grep", "/bin/ps", "/bin/kill", "/sbin/ifconfig",
        "/sbin/ip", "/sbin/fsck", "/sbin/mkfs", "/bin/mount", "/bin/umount",
        "/usr/bin/vi", "/usr/bin/nano", "/bin/tar", "/usr/bin/gzip",
        "/usr/bin/bzip2", "/bin/df", "/bin/du", "/usr/bin/top"
    };
    
    int copied_count = 0;
    for (int i = 0; i < 28; i++) {
        if (access(sysadmin_tools[i], F_OK) == 0) {
            if (copy_file_to_chroot(sysadmin_tools[i], chroot_path, sysadmin_tools[i]) == 0) {
                copied_count++;
            }
        }
    }
    printf("成功复制 %d 个系统工具\n", copied_count);
    
    // 创建配置文件
    printf("创建基本配置文件...\n");
    
    // /etc/passwd
    char passwd_path[512];
    snprintf(passwd_path, sizeof(passwd_path), "%s/etc/passwd", chroot_path);
    int fd = open(passwd_path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd != -1) {
        const char* passwd_content = 
            "root:x:0:0:root:/root:/bin/bash\n"
            "admin:x:1000:1000:Admin User:/home/admin:/bin/bash\n";
        write(fd, passwd_content, strlen(passwd_content));
        close(fd);
    }
    
    // /etc/group
    char group_path[512];
    snprintf(group_path, sizeof(group_path), "%s/etc/group", chroot_path);
    fd = open(group_path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd != -1) {
        const char* group_content = 
            "root:x:0:\n"
            "admin:x:1000:\n";
        write(fd, group_content, strlen(group_content));
        close(fd);
    }
    
    // /etc/hosts
    char hosts_path[512];
    snprintf(hosts_path, sizeof(hosts_path), "%s/etc/hosts", chroot_path);
    fd = open(hosts_path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd != -1) {
        const char* hosts_content = 
            "127.0.0.1\tlocalhost\n"
            "::1\tlocalhost ip6-localhost ip6-loopback\n";
        write(fd, hosts_content, strlen(hosts_content));
        close(fd);
    }
    
    printf("系统恢复环境创建完成\n");
    return 0;
}

// 在chroot中挂载特殊文件系统
int mount_special_filesystems(const char* chroot_path) {
    printf("挂载特殊文件系统...\n");
    
    char proc_path[512], sys_path[512], dev_path[512];
    snprintf(proc_path, sizeof(proc_path), "%s/proc", chroot_path);
    snprintf(sys_path, sizeof(sys_path), "%s/sys", chroot_path);
    snprintf(dev_path, sizeof(dev_path), "%s/dev", chroot_path);
    
    // 挂载/proc
    if (mount("proc", proc_path, "proc", 0, NULL) == -1) {
        printf("警告: 挂载/proc失败: %s\n", strerror(errno));
    } else {
        printf("挂载 /proc 到 %s\n", proc_path);
    }
    
    // 挂载/sys
    if (mount("sysfs", sys_path, "sysfs", 0, NULL) == -1) {
        printf("警告: 挂载/sys失败: %s\n", strerror(errno));
    } else {
        printf("挂载 /sys 到 %s\n", sys_path);
    }
    
    // 创建并挂载tmpfs到/tmp
    char tmp_path[512];
    snprintf(tmp_path, sizeof(tmp_path), "%s/tmp", chroot_path);
    if (mount("tmpfs", tmp_path, "tmpfs", 0, "size=100M") == -1) {
        printf("警告: 挂载/tmp失败: %s\n", strerror(errno));
    } else {
        printf("挂载 tmpfs 到 %s\n", tmp_path);
    }
    
    return 0;
}

// 卸载特殊文件系统
int unmount_special_filesystems(const char* chroot_path) {
    printf("卸载特殊文件系统...\n");
    
    // Unmount in the reverse order of mounting
    const char* subdirs[] = {"tmp", "sys", "proc"};
    
    for (int i = 0; i < 3; i++) {
        char mount_point[512];
        snprintf(mount_point, sizeof(mount_point), "%s/%s", chroot_path, subdirs[i]);
        if (umount(mount_point) == -1) {
            if (errno != EINVAL) {  // 忽略未挂载的错误
                printf("警告: 卸载 %s 失败: %s\n", mount_point, strerror(errno));
            }
        } else {
            printf("卸载 %s\n", mount_point);
        }
    }
    
    return 0;
}

// 初始化chroot管理器
int init_chroot_manager(const char* chroot_path) {
    strncpy(manager.path, chroot_path, sizeof(manager.path) - 1);
    manager.is_active = 0;
    manager.original_pid = getpid();
    manager.create_time = time(NULL);
    
    printf("初始化chroot管理器\n");
    printf("  环境路径: %s\n", manager.path);
    printf("  管理器PID: %d\n", manager.original_pid);
    
    return 0;
}

// 激活chroot环境
int activate_chroot_environment() {
    if (manager.is_active) {
        printf("chroot环境已激活\n");
        return 0;
    }
    
    printf("激活chroot环境: %s\n", manager.path);
    
    // 挂载特殊文件系统
    mount_special_filesystems(manager.path);
    
    // 执行chroot
    if (chroot(manager.path) == -1) {
        perror("chroot失败");
        return -1;
    }
    
    // 改变到根目录
    if (chdir("/") == -1) {
        perror("chdir失败");
        return -1;
    }
    
    manager.is_active = 1;
    printf("✓ chroot环境已激活\n");
    
    return 0;
}

// 交互式shell
int start_interactive_shell() {
    printf("\n=== 启动交互式shell ===\n");
    printf("提示: 输入 'exit' 退出shell\n");
    printf("当前环境: chroot @ %s\n", manager.path);
    printf("========================\n");
    
    // 启动shell
    execl("/bin/bash", "bash", "--norc", "--noprofile", (char*)NULL);
    
    // 如果execl失败
    perror("启动shell失败");
    return -1;
}

// 执行系统维护任务
int perform_system_maintenance() {
    printf("=== 系统维护任务 ===\n");
    
    // 检查文件系统
    printf("1. 检查文件系统:\n");
    system("df -h");
    
    // 检查磁盘使用情况
    printf("\n2. 磁盘使用情况:\n");
    system("du -sh /* 2>/dev/null | head -10");
    
    // 检查进程
    printf("\n3. 当前进程:\n");
    system("ps aux --forest | head -15");
    
    // 检查网络
    printf("\n4. 网络状态:\n");
    system("ip link show | head -10");
    
    // 检查系统日志
    printf("\n5. 系统日志检查:\n");
    system("dmesg | tail -10");
    
    return 0;
}

int main(int argc, char* argv[]) {
    printf("=== chroot高级应用 - 系统维护工具 ===\n");
    
    if (geteuid() != 0) {
        printf("错误: 此工具需要root权限运行\n");
        exit(EXIT_FAILURE);
    }
    
    const char* chroot_path = "/tmp/recovery_chroot";
    
    // 初始化管理器
    init_chroot_manager(chroot_path);
    
    // 检查命令行参数
    if (argc > 1) {
        if (strcmp(argv[1], "create") == 0) {
            // 创建恢复环境
            printf("创建恢复环境...\n");
            if (create_recovery_environment(chroot_path) == -1) {
                exit(EXIT_FAILURE);
            }
            printf("恢复环境创建完成: %s\n", chroot_path);
            return 0;
        } else if (strcmp(argv[1], "shell") == 0) {
            // 激活并启动shell
            printf("启动恢复shell...\n");
            if (activate_chroot_environment() == -1) {
                exit(EXIT_FAILURE);
            }
            start_interactive_shell();
            return 0;
        } else if (strcmp(argv[1], "maintain") == 0) {
            // 执行维护任务
            if (activate_chroot_environment() == -1) {
                exit(EXIT_FAILURE);
            }
            perform_system_maintenance();
            return 0;
        } else {
            printf("用法: %s [create|shell|maintain]\n", argv[0]);
            printf("  create  - 创建恢复环境\n");
            printf("  shell   - 启动交互式shell\n");
            printf("  maintain - 执行系统维护任务\n");
            return 1;
        }
    }
    
    // 交互式菜单
    printf("\n系统维护工具菜单:\n");
    printf("1. 创建恢复环境\n");
    printf("2. 启动恢复shell\n");
    printf("3. 执行系统维护\n");
    printf("4. 退出\n");
    
    int choice;
    printf("请选择操作 (1-4): ");
    if (scanf("%d", &choice) != 1) {
        printf("输入错误\n");
        return 1;
    }
    
    switch (choice) {
        case 1:
            printf("创建恢复环境...\n");
            create_recovery_environment(chroot_path);
            break;
        case 2:
            printf("启动恢复shell...\n");
            activate_chroot_environment();
            start_interactive_shell();
            break;
        case 3:
            printf("执行系统维护...\n");
            activate_chroot_environment();
            perform_system_maintenance();
            break;
        case 4:
            printf("退出工具\n");
            break;
        default:
            printf("无效选择\n");
            return 1;
    }
    
    printf("\n=== 系统维护工具结束 ===\n");
    
    return 0;
}

Compiling and running

# Example 1 (compiling does not need root; running does)
gcc -o chroot_example1 chroot_example1.c
sudo ./chroot_example1

# Example 2
gcc -o chroot_example2 chroot_example2.c
sudo ./chroot_example2

# Example 3
gcc -o chroot_example3 chroot_example3.c
sudo ./chroot_example3

# Example 4
gcc -o chroot_example4 chroot_example4.c
sudo ./chroot_example4 create
sudo ./chroot_example4 shell

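Important notes

Privileges: chroot requires the CAP_SYS_CHROOT capability, which usually means running as root
Security limits: chroot is not a security boundary; a process that keeps root privileges inside it can escape
Directory validation: make sure the new root directory is trustworthy and intact
File descriptors: chroot does not affect descriptors that are already open, so a directory descriptor opened beforehand still reaches outside the new root
Symbolic links: absolute symlink targets are resolved relative to the new root, so check links when populating the environment
Device files: create the required device nodes correctly
Library dependencies: make sure the shared libraries that programs need are available inside the chroot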

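Best practices

Least privilege: drop privileges as soon as possible after chroot
Environment cleanup: clear unneeded environment variables and close unneeded file descriptors
Directory validation: verify the integrity and safety of the new root directory
Device files: create only the device nodes that are needed
Library dependencies: make sure every required library is present inside the chroot
Monitoring and auditing: monitor activity inside the chroot environment
Regular updates: keep the packages inside the chroot environment up to date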

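These examples show how chroot is used for system administration and isolation. Modern container technologies offer stronger isolation, but chroot remains a useful system administration tool.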


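The clock_adjtime, clock_getres, clock_gettime, clock_nanosleep, and clock_settime system calls, with examples

Posted on 2025-08-06 by 麦芽爸

clock_adjtime – adjust clock parameters

Introduction

The clock_adjtime system call reads or adjusts the parameters of a specified clock and is mainly used for precise time synchronization. It can set the clock's frequency adjustment, time offset, and related parameters, and is typically used in NTP client implementations.

Function prototype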

#include <time.h>
#include <sys/timex.h>
#include <sys/syscall.h>
#include <unistd.h>

int clock_adjtime(clockid_t clk_id, struct timex *buf);

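Purpose

Reads or adjusts the parameters of the specified clock, such as its frequency and time offset, for precise time synchronization.

Parameters

clockid_t clk_id: the clock ID
- CLOCK_REALTIME: the system real-time (wall-clock) clock
- CLOCK_TAI: International Atomic Time
struct timex *buf: pointer to a timex structure; its modes field selects which parameters to set, and the kernel fills in the current values on return

Return value

On success, returns the clock state (TIME_OK, TIME_INS, TIME_DEL, TIME_OOP, TIME_WAIT, or TIME_ERROR)
On failure, returns -1 and sets errno

Restrictions

Modifying clock parameters requires the CAP_SYS_TIME capability, which usually means running as root
A read-only query (modes set to 0) needs no special privilege

Similar functions

adjtimex(): adjusts the system clock
settimeofday(): sets the system time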

示例代码

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/timex.h>
#include <time.h>
#include <errno.h>
#include <string.h>

// 系统调用包装
static int clock_adjtime_wrapper(clockid_t clk_id, struct timex *buf) {
    return syscall(__NR_clock_adjtime, clk_id, buf);
}

int main() {
    struct timex tx;
    int result;
    
    printf("=== Clock_adjtime 函数示例 ===\n");
    printf("当前用户 UID: %d\n", getuid());
    printf("当前有效 UID: %d\n", geteuid());
    
    // 示例1: 获取当前时钟状态
    printf("\n示例1: 获取时钟状态\n");
    
    memset(&tx, 0, sizeof(tx));
    tx.modes = 0; // 仅查询状态
    
    result = clock_adjtime_wrapper(CLOCK_REALTIME, &tx);
    if (result == -1) {
        if (errno == EPERM) {
            printf("  权限不足获取时钟状态: %s\n", strerror(errno));
            printf("  说明: 需要CAP_SYS_TIME能力或root权限\n");
        } else {
            printf("  获取时钟状态失败: %s\n", strerror(errno));
        }
    } else {
        printf("  时钟状态获取成功\n");
        printf("  状态码: %d\n", result);
        printf("  频率偏移: %ld\n", tx.freq);
        printf("  最大误差: %ld\n", tx.maxerror);
        printf("  估计误差: %ld\n", tx.esterror);
    }
    
    // 示例2: 查询时钟参数（不修改）
    printf("\n示例2: 查询时钟参数\n");
    
    memset(&tx, 0, sizeof(tx));
    tx.modes = 0;
    
    result = clock_adjtime_wrapper(CLOCK_REALTIME, &tx);
    if (result != -1) {
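        printf("  Clock parameters:\n");
        printf("    status: %d\n", tx.status);
        printf("    frequency offset: %ld ppm\n", tx.freq / 65536); // scaled ppm -> ppm (integer part)
        printf("    time constant: %ld\n", tx.constant);
        printf("    precision: %ld us\n", tx.precision); // reported in microseconds
        printf("    tolerance: %ld ppm\n", tx.tolerance / 65536); // scaled ppm -> ppm (integer part)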
    }
    
    // 示例3: 错误处理演示
    printf("\n示例3: 错误处理演示\n");
    
    // 使用无效的时钟ID
    memset(&tx, 0, sizeof(tx));
    result = clock_adjtime_wrapper(999, &tx);
    if (result == -1) {
        if (errno == EINVAL) {
            printf("  无效时钟ID错误处理正确: %s\n", strerror(errno));
        }
    }
    
    // 使用无效的指针
    result = clock_adjtime_wrapper(CLOCK_REALTIME, NULL);
    if (result == -1) {
        if (errno == EFAULT) {
            printf("  无效指针错误处理正确: %s\n", strerror(errno));
        }
    }
    
    // 示例4: 时钟类型说明
    printf("\n示例4: 支持的时钟类型\n");
    printf("CLOCK_REALTIME: 系统实时钟（wall-clock time）\n");
    printf("  - 可以被手动设置\n");
    printf("  - 受NTP调整影响\n");
    printf("  - 用于日常时间表示\n\n");
    
    printf("CLOCK_TAI: 国际原子时\n");
    printf("  - 连续时间，无闰秒\n");
    printf("  - 与时钟实时相差固定偏移\n");
    printf("  - 用于精密时间计算\n\n");
    
    // 示例5: NTP相关参数说明
    printf("示例5: NTP相关参数说明\n");
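    printf("Key fields of struct timex:\n");
    printf("  modes: selects which parameters to set (0 means query only)\n");
    printf("  offset: time offset (microseconds, or nanoseconds when STA_NANO is set)\n");
    printf("  freq: frequency offset (scaled ppm, 16-bit fraction)\n");
    printf("  maxerror: maximum error (microseconds)\n");
    printf("  esterror: estimated error (microseconds)\n");
    printf("  status: clock status flags\n");
    printf("  constant: PLL time constant\n");
    printf("  precision: clock precision (microseconds, read-only)\n");
    printf("  tolerance: frequency tolerance (scaled ppm, read-only)\n\n");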
    
    // 示例6: 权限和安全考虑
    printf("示例6: 权限和安全考虑\n");
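    printf("Privileges needed by clock_adjtime:\n");
    printf("1. Read-only queries (modes == 0) need no special privilege\n");
    printf("2. Changing any parameter requires the CAP_SYS_TIME capability\n");
    printf("3. root normally holds CAP_SYS_TIME\n\n");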
    
    printf("安全注意事项:\n");
    printf("1. 不当的时间调整可能影响系统稳定性\n");
    printf("2. 频率调整过大可能导致时间跳跃\n");
    printf("3. 应谨慎设置时间参数\n");
    printf("4. 建议使用NTP守护进程进行时间同步\n\n");
    
    // 示例7: 实际应用场景
    printf("示例7: 实际应用场景\n");
    printf("clock_adjtime主要用于:\n");
    printf("1. NTP客户端实现\n");
    printf("2. 精密时间同步服务\n");
    printf("3. 科学计算时间校准\n");
    printf("4. 金融系统时间同步\n");
    printf("5. 分布式系统时间协调\n\n");
    
    printf("典型使用流程:\n");
    printf("1. 查询当前时钟状态\n");
    printf("2. 计算需要的调整参数\n");
    printf("3. 应用调整参数\n");
    printf("4. 监控调整效果\n");
    printf("5. 必要时进行微调\n\n");
    
    printf("总结:\n");
    printf("clock_adjtime是用于精密时钟调整的系统调用\n");
    printf("主要用于NTP客户端和时间同步服务\n");
    printf("需要适当的权限才能使用\n");
    printf("不当使用可能导致系统时间异常\n");
    
    return 0;
}

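clock_getres – Get Clock Resolution

Introduction

The clock_getres system call returns the resolution (precision) of the specified clock, that is, the smallest time interval the clock can represent.

Prototype

#include <time.h>

int clock_getres(clockid_t clk_id, struct timespec *res);

Purpose

Retrieves the resolution of the specified clock.

Parameters

clockid_t clk_id: the clock ID, for example CLOCK_REALTIME or CLOCK_MONOTONIC
struct timespec *res: pointer to a timespec structure that receives the resolution; it may be NULL, in which case nothing is stored

Return value

Returns 0 on success.
Returns -1 on failure and sets errno.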

Similar functions

clock_gettime(): gets the time of a clock
gettimeofday(): gets the system time

Example code

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <errno.h>
#include <string.h>

int main() {
    struct timespec res;
    int result;
    
    printf("=== Clock_getres 函数示例 ===\n");
    
    // 示例1: 获取各种时钟的精度
    printf("\n示例1: 不同时钟的精度\n");
    
    // CLOCK_REALTIME
    result = clock_getres(CLOCK_REALTIME, &res);
    if (result == -1) {
        printf("  CLOCK_REALTIME精度获取失败: %s\n", strerror(errno));
    } else {
        printf("  CLOCK_REALTIME精度: %ld.%09ld 秒\n", res.tv_sec, res.tv_nsec);
        printf("  即: %ld 纳秒\n", res.tv_nsec);
    }
    
    // CLOCK_MONOTONIC
    result = clock_getres(CLOCK_MONOTONIC, &res);
    if (result == -1) {
        printf("  CLOCK_MONOTONIC精度获取失败: %s\n", strerror(errno));
    } else {
        printf("  CLOCK_MONOTONIC精度: %ld.%09ld 秒\n", res.tv_sec, res.tv_nsec);
        printf("  即: %ld 纳秒\n", res.tv_nsec);
    }
    
    // CLOCK_PROCESS_CPUTIME_ID
    result = clock_getres(CLOCK_PROCESS_CPUTIME_ID, &res);
    if (result == -1) {
        printf("  CLOCK_PROCESS_CPUTIME_ID精度获取失败: %s\n", strerror(errno));
    } else {
        printf("  CLOCK_PROCESS_CPUTIME_ID精度: %ld.%09ld 秒\n", res.tv_sec, res.tv_nsec);
        printf("  即: %ld 纳秒\n", res.tv_nsec);
    }
    
    // CLOCK_THREAD_CPUTIME_ID
    result = clock_getres(CLOCK_THREAD_CPUTIME_ID, &res);
    if (result == -1) {
        printf("  CLOCK_THREAD_CPUTIME_ID精度获取失败: %s\n", strerror(errno));
    } else {
        printf("  CLOCK_THREAD_CPUTIME_ID精度: %ld.%09ld 秒\n", res.tv_sec, res.tv_nsec);
        printf("  即: %ld 纳秒\n", res.tv_nsec);
    }
    
    // 示例2: 错误处理演示
    printf("\n示例2: 错误处理演示\n");
    
    // 使用无效的时钟ID
    result = clock_getres(999, &res);
    if (result == -1) {
        if (errno == EINVAL) {
            printf("  无效时钟ID错误处理正确: %s\n", strerror(errno));
        }
    }
    
    // 使用NULL指针
    result = clock_getres(CLOCK_REALTIME, NULL);
    if (result == 0) {
        printf("  NULL指针参数被接受（用于查询时钟是否存在）\n");
    }
    
    // 示例3: 时钟类型说明
    printf("\n示例3: 支持的时钟类型\n");
    printf("CLOCK_REALTIME: 系统实时钟\n");
    printf("  - 可以被设置和调整\n");
    printf("  - 受NTP和手动调整影响\n");
    printf("  - 用于获取当前时间\n\n");
    
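    printf("CLOCK_MONOTONIC: monotonic clock\n");
    printf("  - Never goes backwards\n");
    printf("  - Unaffected by jumps in the system time; NTP may still slew its rate\n");
    printf("  - Use it to measure intervals\n\n");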
    
    printf("CLOCK_PROCESS_CPUTIME_ID: 进程CPU时间\n");
    printf("  - 测量进程使用的CPU时间\n");
    printf("  - 通常有较高精度\n\n");
    
    printf("CLOCK_THREAD_CPUTIME_ID: 线程CPU时间\n");
    printf("  - 测量线程使用的CPU时间\n");
    printf("  - 用于性能分析\n\n");
    
    // 示例4: 精度对比
    printf("示例4: 不同时钟精度对比\n");
    
    clockid_t clocks[] = {
        CLOCK_REALTIME,
        CLOCK_MONOTONIC,
        CLOCK_PROCESS_CPUTIME_ID,
        CLOCK_THREAD_CPUTIME_ID
    };
    
    const char *clock_names[] = {
        "CLOCK_REALTIME",
        "CLOCK_MONOTONIC",
        "CLOCK_PROCESS_CPUTIME_ID",
        "CLOCK_THREAD_CPUTIME_ID"
    };
    
    for (int i = 0; i < 4; i++) {
        if (clock_getres(clocks[i], &res) == 0) {
            printf("  %-25s: %10ld ns\n", clock_names[i], res.tv_nsec);
        }
    }
    
    // 示例5: 实际应用演示
    printf("\n示例5: 实际应用演示\n");
    printf("时钟精度对程序设计的影响:\n");
    
    // 演示高精度计时
    struct timespec start, end, diff;
    if (clock_getres(CLOCK_MONOTONIC, &res) == 0) {
        printf("  使用CLOCK_MONOTONIC进行高精度计时:\n");
        printf("  理论精度: %ld 纳秒\n", res.tv_nsec);
        
        // 进行简单计时演示
        if (clock_gettime(CLOCK_MONOTONIC, &start) == 0) {
            // 执行一些操作
            volatile int sum = 0;
            for (int i = 0; i < 1000; i++) {
                sum += i;
            }
            
            if (clock_gettime(CLOCK_MONOTONIC, &end) == 0) {
                // 计算时间差
                if (end.tv_nsec < start.tv_nsec) {
                    diff.tv_sec = end.tv_sec - start.tv_sec - 1;
                    diff.tv_nsec = 1000000000 + end.tv_nsec - start.tv_nsec;
                } else {
                    diff.tv_sec = end.tv_sec - start.tv_sec;
                    diff.tv_nsec = end.tv_nsec - start.tv_nsec;
                }
                
                printf("  实际测量时间: %ld.%09ld 秒\n", diff.tv_sec, diff.tv_nsec);
            }
        }
    }
    
    // 示例6: 性能考虑
    printf("\n示例6: 性能考虑\n");
    printf("不同精度时钟的性能特点:\n");
    printf("1. 高精度时钟通常开销较大\n");
    printf("2. 需要在精度和性能间平衡\n");
    printf("3. 选择合适的时钟类型很重要\n");
    printf("4. 避免不必要的高精度要求\n\n");
    
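    printf("Choosing a clock:\n");
    printf("- Wall-clock timestamps: CLOCK_REALTIME\n");
    printf("- Intervals and performance measurement: CLOCK_MONOTONIC\n");
    printf("- CPU time used by the process: CLOCK_PROCESS_CPUTIME_ID\n");
    printf("- CPU time used by one thread: CLOCK_THREAD_CPUTIME_ID\n\n");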
    
    printf("总结:\n");
    printf("clock_getres用于查询时钟精度\n");
    printf("不同类型的时钟有不同的精度\n");
    printf("了解时钟精度有助于正确使用计时函数\n");
    printf("合理选择时钟类型可以提高程序性能\n");
    
    return 0;
}

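clock_gettime – Get Clock Time

Introduction

The clock_gettime system call returns the current time of the specified clock. Compared with gettimeofday, it offers nanosecond rather than microsecond resolution and a choice of clocks.

Prototype

#include <time.h>

int clock_gettime(clockid_t clk_id, struct timespec *tp);

Purpose

Retrieves the current time of the specified clock.

Parameters

clockid_t clk_id: the clock ID
struct timespec *tp: pointer to a timespec structure that receives the time

Return value

Returns 0 on success.
Returns -1 on failure and sets errno.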

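Similar functions

gettimeofday(): gets the system time with microsecond resolution
time(): gets the time in whole seconds
clock_getres(): gets the resolution of a clock

Example code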

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

int main() {
    struct timespec ts;
    int result;
    
    printf("=== Clock_gettime 函数示例 ===\n");
    
    // 示例1: 获取各种时钟的时间
    printf("\n示例1: 不同时钟的时间\n");
    
    // CLOCK_REALTIME
    result = clock_gettime(CLOCK_REALTIME, &ts);
    if (result == -1) {
        printf("  CLOCK_REALTIME时间获取失败: %s\n", strerror(errno));
    } else {
        printf("  CLOCK_REALTIME时间: %ld.%09ld 秒\n", ts.tv_sec, ts.tv_nsec);
        printf("  对应日期: %s", ctime(&ts.tv_sec));
    }
    
    // CLOCK_MONOTONIC
    result = clock_gettime(CLOCK_MONOTONIC, &ts);
    if (result == -1) {
        printf("  CLOCK_MONOTONIC时间获取失败: %s\n", strerror(errno));
    } else {
        printf("  CLOCK_MONOTONIC时间: %ld.%09ld 秒\n", ts.tv_sec, ts.tv_nsec);
    }
    
    // CLOCK_PROCESS_CPUTIME_ID
    result = clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    if (result == -1) {
        printf("  CLOCK_PROCESS_CPUTIME_ID时间获取失败: %s\n", strerror(errno));
    } else {
        printf("  进程CPU时间: %ld.%09ld 秒\n", ts.tv_sec, ts.tv_nsec);
    }
    
    // CLOCK_THREAD_CPUTIME_ID
    result = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    if (result == -1) {
        printf("  CLOCK_THREAD_CPUTIME_ID时间获取失败: %s\n", strerror(errno));
    } else {
        printf("  线程CPU时间: %ld.%09ld 秒\n", ts.tv_sec, ts.tv_nsec);
    }
    
    // 示例2: 高精度计时演示
    printf("\n示例2: 高精度计时演示\n");
    
    struct timespec start, end, elapsed;
    
    // 使用CLOCK_MONOTONIC进行计时
    if (clock_gettime(CLOCK_MONOTONIC, &start) == -1) {
        perror("  获取开始时间失败");
    } else {
        printf("  开始时间: %ld.%09ld 秒\n", start.tv_sec, start.tv_nsec);
        
        // 执行一些操作
        printf("  执行计算操作...\n");
        volatile long sum = 0;
        for (long i = 0; i < 1000000; i++) {
            sum += i;
        }
        
        if (clock_gettime(CLOCK_MONOTONIC, &end) == -1) {
            perror("  获取结束时间失败");
        } else {
            printf("  结束时间: %ld.%09ld 秒\n", end.tv_sec, end.tv_nsec);
            
            // 计算耗时
            if (end.tv_nsec < start.tv_nsec) {
                elapsed.tv_sec = end.tv_sec - start.tv_sec - 1;
                elapsed.tv_nsec = 1000000000 + end.tv_nsec - start.tv_nsec;
            } else {
                elapsed.tv_sec = end.tv_sec - start.tv_sec;
                elapsed.tv_nsec = end.tv_nsec - start.tv_nsec;
            }
            
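            printf("  Elapsed: %ld.%09ld s\n", elapsed.tv_sec, elapsed.tv_nsec);
            // Widen to long long before multiplying so the product cannot overflow a 32-bit long
            printf("  That is: %lld ns\n", (long long)elapsed.tv_sec * 1000000000LL + elapsed.tv_nsec);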
        }
    }
    
    // 示例3: 错误处理演示
    printf("\n示例3: 错误处理演示\n");
    
    // 使用无效的时钟ID
    result = clock_gettime(999, &ts);
    if (result == -1) {
        if (errno == EINVAL) {
            printf("  无效时钟ID错误处理正确: %s\n", strerror(errno));
        }
    }
    
    // 使用NULL指针
    result = clock_gettime(CLOCK_REALTIME, NULL);
    if (result == -1) {
        if (errno == EFAULT) {
            printf("  NULL指针错误处理正确: %s\n", strerror(errno));
        }
    }
    
    // 示例4: 时间格式转换
    printf("\n示例4: 时间格式转换\n");
    
    if (clock_gettime(CLOCK_REALTIME, &ts) == 0) {
        printf("  原始时间: %ld.%09ld 秒\n", ts.tv_sec, ts.tv_nsec);
        
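        // Convert to milliseconds (long long avoids overflow where long is 32 bits)
        long long milliseconds = (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
        printf("  In milliseconds: %lld ms\n", milliseconds);
        
        // Convert to microseconds
        long long microseconds = (long long)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
        printf("  In microseconds: %lld μs\n", microseconds);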
        
        // 转换为纳秒
        long long nanoseconds = (long long)ts.tv_sec * 1000000000 + ts.tv_nsec;
        printf("  纳秒表示: %lld ns\n", nanoseconds);
    }
    
    // 示例5: 不同时钟的特性对比
    printf("\n示例5: 不同时钟特性对比\n");
    
    printf("CLOCK_REALTIME特性:\n");
    printf("  - 表示实际时间（墙上时钟）\n");
    printf("  - 可以被系统管理员修改\n");
    printf("  - 受NTP同步影响\n");
    printf("  - 适用于获取当前日期时间\n\n");
    
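    printf("CLOCK_MONOTONIC:\n");
    printf("  - Increases monotonically and never goes backwards\n");
    printf("  - Unaffected by jumps in the system time; NTP may still slew its rate\n");
    printf("  - Suited to measuring intervals\n");
    printf("  - Counts from an unspecified starting point (on Linux, typically system boot), not from process start\n\n");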
    
    printf("CLOCK_PROCESS_CPUTIME_ID特性:\n");
    printf("  - 测量进程使用的CPU时间\n");
    printf("  - 包括所有线程的CPU时间\n");
    printf("  - 不包括睡眠时间\n");
    printf("  - 适用于性能分析\n\n");
    
    printf("CLOCK_THREAD_CPUTIME_ID特性:\n");
    printf("  - 测量当前线程使用的CPU时间\n");
    printf("  - 不包括睡眠时间\n");
    printf("  - 适用于线程性能分析\n\n");
    
    // 示例6: 实际应用场景
    printf("示例6: 实际应用场景\n");
    
    // 场景1: 性能基准测试
    printf("场景1: 性能基准测试\n");
    if (clock_gettime(CLOCK_MONOTONIC, &start) == 0) {
        // 模拟算法执行
        volatile int dummy = 0;
        for (int i = 0; i < 1000000; i++) {
            dummy += i * i;
        }
        
        if (clock_gettime(CLOCK_MONOTONIC, &end) == 0) {
            long long duration = (end.tv_sec - start.tv_sec) * 1000000000LL + 
                               (end.tv_nsec - start.tv_nsec);
            printf("  算法执行时间: %lld 纳秒\n", duration);
        }
    }
    
    // 场景2: 超时控制
    printf("\n场景2: 超时控制\n");
    if (clock_gettime(CLOCK_MONOTONIC, &start) == 0) {
        long long timeout_ns = 5000000000LL; // 5-second timeout; long long because the value does not fit in a 32-bit long
        
        // 模拟等待操作
        while (1) {
            if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0) {
                long long elapsed = (ts.tv_sec - start.tv_sec) * 1000000000LL + 
                                  (ts.tv_nsec - start.tv_nsec);
                if (elapsed >= timeout_ns) {
                    printf("  操作超时（5秒）\n");
                    break;
                }
            }
            usleep(100000); // 休眠100ms
        }
    }
    
    // 场景3: CPU使用率监控
    printf("\n场景3: CPU使用率监控\n");
    struct timespec cpu_start, cpu_end;
    struct timespec wall_start, wall_end;
    
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu_start) == 0 &&
        clock_gettime(CLOCK_MONOTONIC, &wall_start) == 0) {
        
        // 执行一些CPU密集型操作
        volatile double result = 1.0;
        for (int i = 0; i < 1000000; i++) {
            result *= 1.000001;
        }
        
        if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu_end) == 0 &&
            clock_gettime(CLOCK_MONOTONIC, &wall_end) == 0) {
            
            long long cpu_time = (cpu_end.tv_sec - cpu_start.tv_sec) * 1000000000LL + 
                               (cpu_end.tv_nsec - cpu_start.tv_nsec);
            long long wall_time = (wall_end.tv_sec - wall_start.tv_sec) * 1000000000LL + 
                                (wall_end.tv_nsec - wall_start.tv_nsec);
            
            double cpu_usage = (double)cpu_time / wall_time * 100;
            printf("  CPU使用率: %.2f%%\n", cpu_usage);
            printf("  CPU时间: %lld 纳秒\n", cpu_time);
            printf("  墙钟时间: %lld 纳秒\n", wall_time);
        }
    }
    
    // 示例7: 时区和本地时间
    printf("\n示例7: 时区和本地时间\n");
    
    if (clock_gettime(CLOCK_REALTIME, &ts) == 0) {
        printf("  UTC时间: %ld.%09ld\n", ts.tv_sec, ts.tv_nsec);
        
        // 转换为本地时间
        struct tm *local_tm = localtime(&ts.tv_sec);
        if (local_tm) {
            char time_str[100];
            strftime(time_str, sizeof(time_str), "%Y-%m-%d %H:%M:%S", local_tm);
            printf("  本地时间: %s.%09ld\n", time_str, ts.tv_nsec);
        }
    }
    
    printf("\n总结:\n");
    printf("clock_gettime是现代Linux系统中推荐的高精度计时函数\n");
    printf("支持多种时钟类型，满足不同应用场景需求\n");
    printf("提供纳秒级精度，优于传统的time和gettimeofday函数\n");
    printf("正确使用时钟类型对程序性能和正确性很重要\n");
    
    return 0;
}

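clock_nanosleep – High-Resolution Sleep

Introduction

The clock_nanosleep system call suspends the calling thread with nanosecond resolution. It supports both relative and absolute wake-up times and lets the caller choose the clock, which makes it more flexible than nanosleep.

Prototype

#include <time.h>

int clock_nanosleep(clockid_t clock_id, int flags,
                    const struct timespec *request,
                    struct timespec *remain);

Purpose

Suspends the calling thread for the requested interval, or until the requested absolute time, measured against the specified clock.

Parameters

clockid_t clock_id: the clock ID
int flags: flags
- 0: request is an interval relative to the current time
- TIMER_ABSTIME: request is an absolute time on the chosen clock
const struct timespec *request: the interval or absolute time to sleep for
struct timespec *remain: if not NULL, receives the unslept time when a relative sleep is interrupted by a signal; it is not used with TIMER_ABSTIME

Return value

Returns 0 on success.
If interrupted by a signal handler, returns the error number EINTR.
On any other failure, returns a positive error number such as EINVAL or EFAULT. Unlike most system call wrappers, clock_nanosleep does not return -1 and does not set errno.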

Similar functions

nanosleep(): sleep with nanosecond resolution
sleep(): sleep in whole seconds
usleep(): sleep with microsecond resolution

Example code

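#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <errno.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h> // wait()

// Signal handler: write() is async-signal-safe, printf() is not
void signal_handler(int sig) {
    (void)sig;
    const char msg[] = "  Signal received\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
}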

int main() {
    struct timespec request, remain, start, end;
    int result;
    
    printf("=== Clock_nanosleep 函数示例 ===\n");
    
    // 示例1: 相对时间睡眠
    printf("\n示例1: 相对时间睡眠\n");
    
    // 睡眠100毫秒
    request.tv_sec = 0;
    request.tv_nsec = 100000000; // 100毫秒 = 100,000,000纳秒
    
    if (clock_gettime(CLOCK_MONOTONIC, &start) == -1) {
        perror("  获取开始时间失败");
    }
    
    printf("  开始睡眠: %ld.%09ld 秒\n", start.tv_sec, start.tv_nsec);
    result = clock_nanosleep(CLOCK_MONOTONIC, 0, &request, NULL);
    
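    // clock_nanosleep returns the error number directly; it does not set errno
    if (result != 0) {
        if (result == EINTR) {
            printf("  Sleep interrupted by a signal\n");
        } else {
            printf("  Sleep failed: %s\n", strerror(result));
        }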
    } else {
        printf("  睡眠完成\n");
        if (clock_gettime(CLOCK_MONOTONIC, &end) == -1) {
            perror("  获取结束时间失败");
        } else {
            long long actual_sleep = (end.tv_sec - start.tv_sec) * 1000000000LL + 
                                   (end.tv_nsec - start.tv_nsec);
            printf("  实际睡眠时间: %lld 纳秒\n", actual_sleep);
        }
    }
    
    // 示例2: 绝对时间睡眠
    printf("\n示例2: 绝对时间睡眠\n");
    
    // 获取当前时间
    if (clock_gettime(CLOCK_REALTIME, &start) == 0) {
        printf("  当前时间: %ld.%09ld 秒\n", start.tv_sec, start.tv_nsec);
        
        // 设置绝对睡眠时间（当前时间+2秒）
        struct timespec absolute_time;
        absolute_time.tv_sec = start.tv_sec + 2;
        absolute_time.tv_nsec = start.tv_nsec;
        
        printf("  绝对睡眠时间: %ld.%09ld 秒\n", absolute_time.tv_sec, absolute_time.tv_nsec);
        
        result = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &absolute_time, NULL);
        if (result != 0) { // the return value is the error number; errno is not set
            if (result == EINTR) {
                printf("  Absolute sleep interrupted by a signal\n");
            } else {
                printf("  Absolute sleep failed: %s\n", strerror(result));
            }
        } else {
            printf("  绝对时间睡眠完成\n");
        }
    }
    
    // 示例3: 被信号中断的睡眠
    printf("\n示例3: 被信号中断的睡眠\n");
    
    // 设置信号处理
    signal(SIGUSR1, signal_handler);
    
    // Fork a child process that sends the signal
    pid_t pid = fork();
    if (pid == 0) {
        // 子进程：延迟发送信号
        sleep(1);
        kill(getppid(), SIGUSR1);
        exit(0);
    } else if (pid > 0) {
        // 父进程：长时间睡眠
        request.tv_sec = 5;
        request.tv_nsec = 0;
        
        printf("  开始5秒睡眠，1秒后会被信号中断\n");
        result = clock_nanosleep(CLOCK_MONOTONIC, 0, &request, &remain);
        
        if (result == EINTR) { // clock_nanosleep returns EINTR directly
            printf("  Sleep interrupted by a signal\n");
            printf("  Remaining time: %ld.%09ld s\n", remain.tv_sec, remain.tv_nsec);
        }
        
        wait(NULL); // 等待子进程结束
    }
    
    // 示例4: 错误处理演示
    printf("\n示例4: 错误处理演示\n");
    
    // Invalid clock ID (clock_nanosleep returns the error number; errno is not set)
    request.tv_sec = 1;
    request.tv_nsec = 0;
    result = clock_nanosleep(999, 0, &request, NULL);
    if (result == EINVAL) {
        printf("  Invalid clock ID rejected as expected: %s\n", strerror(result));
    }
    
    // Invalid time value: negative seconds
    request.tv_sec = -1;
    request.tv_nsec = 0;
    result = clock_nanosleep(CLOCK_MONOTONIC, 0, &request, NULL);
    if (result == EINVAL) {
        printf("  Negative interval rejected as expected: %s\n", strerror(result));
    }
    
    // tv_nsec out of range
    request.tv_sec = 0;
    request.tv_nsec = 1000000000; // 1,000,000,000 ns; valid values are 0 to 999,999,999
    result = clock_nanosleep(CLOCK_MONOTONIC, 0, &request, NULL);
    if (result == EINVAL) {
        printf("  Out-of-range tv_nsec rejected as expected: %s\n", strerror(result));
    }
    
    // 示例5: 不同时钟的睡眠效果
    printf("\n示例5: 不同时钟的睡眠效果\n");
    
    printf("CLOCK_REALTIME睡眠:\n");
    printf("  - 基于系统实时时间\n");
    printf("  - 受系统时间调整影响\n");
    printf("  - 适用于绝对时间睡眠\n\n");
    
    printf("CLOCK_MONOTONIC睡眠:\n");
    printf("  - 基于单调递增时间\n");
    printf("  - 不受系统时间调整影响\n");
    printf("  - 适用于相对时间睡眠\n\n");
    
    // 示例6: 高精度定时器演示
    printf("示例6: 高精度定时器演示\n");
    
    printf("创建100毫秒间隔的定时器循环:\n");
    struct timespec interval;
    interval.tv_sec = 0;
    interval.tv_nsec = 100000000; // 100毫秒
    
    for (int i = 0; i < 5; i++) {
        if (clock_gettime(CLOCK_MONOTONIC, &start) == 0) {
            printf("  第%d次: 时间 %ld.%09ld\n", i+1, start.tv_sec, start.tv_nsec);
        }
        
        result = clock_nanosleep(CLOCK_MONOTONIC, 0, &interval, NULL);
        if (result == EINTR) { // the error number is returned directly
            printf("  Iteration %d: sleep interrupted\n", i+1);
            break;
        }
    }
    
    // 示例7: 睡眠精度测试
    printf("\n示例7: 睡眠精度测试\n");
    
    struct timespec sleep_times[] = {
        {0, 1000},      // 1微秒
        {0, 10000},     // 10微秒
        {0, 100000},    // 100微秒
        {0, 1000000},   // 1毫秒
        {0, 10000000},  // 10毫秒
        {0, 100000000}, // 100毫秒
        {1, 0}          // 1秒
    };
    
    const char *time_labels[] = {
        "1微秒", "10微秒", "100微秒", "1毫秒", "10毫秒", "100毫秒", "1秒"
    };
    
    printf("睡眠精度测试结果:\n");
    for (int i = 0; i < 7; i++) {
        if (clock_gettime(CLOCK_MONOTONIC, &start) == 0) {
            result = clock_nanosleep(CLOCK_MONOTONIC, 0, &sleep_times[i], NULL);
            if (clock_gettime(CLOCK_MONOTONIC, &end) == 0) {
                long long actual = (end.tv_sec - start.tv_sec) * 1000000000LL + 
                                 (end.tv_nsec - start.tv_nsec);
                long long requested = sleep_times[i].tv_sec * 1000000000LL + 
                                    sleep_times[i].tv_nsec;
                long long diff = actual - requested;
                
                printf("  %-8s: 请求%8lld ns, 实际%8lld ns, 误差%+6lld ns\n",
                       time_labels[i], requested, actual, diff);
            }
        }
    }
    
    // 示例8: 实际应用场景
    printf("\n示例8: 实际应用场景\n");
    
    // 场景1: 实时系统定时
    printf("场景1: 实时系统定时\n");
    printf("在实时应用中确保精确的时间间隔\n");
    
    // 场景2: 性能基准测试
    printf("\n场景2: 性能基准测试\n");
    printf("提供精确的延迟控制用于性能测试\n");
    
    // 场景3: 动画和游戏循环
    printf("\n场景3: 动画和游戏循环\n");
    printf("维持稳定的帧率和更新频率\n");
    
    // 场景4: 网络超时控制
    printf("\n场景4: 网络超时控制\n");
    printf("实现精确的网络操作超时机制\n");
    
    printf("\n总结:\n");
    printf("clock_nanosleep提供纳秒级精度的睡眠功能\n");
    printf("支持相对时间和绝对时间两种模式\n");
    printf("比传统sleep函数更加灵活和精确\n");
    printf("正确处理信号中断和剩余时间计算\n");
    printf("适用于需要高精度时间控制的应用场景\n");
    
    return 0;
}

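clock_settime – Set Clock Time

Introduction

The clock_settime system call sets the time of the specified clock. It lets a program change the system clock and is used mainly for time synchronization and system administration.

Prototype

#include <time.h>

int clock_settime(clockid_t clk_id, const struct timespec *tp);

Purpose

Sets the time of the specified clock.

Parameters

clockid_t clk_id: the clock ID (normally CLOCK_REALTIME)
const struct timespec *tp: pointer to a timespec structure holding the new time

Return value

Returns 0 on success.
Returns -1 on failure and sets errno.

Restrictions

Setting CLOCK_REALTIME requires the CAP_SYS_TIME capability, which root normally holds.
In practice only CLOCK_REALTIME can be set; clocks such as CLOCK_MONOTONIC are not settable.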

Similar functions

settimeofday(): sets the system time
stime(): sets the system time (deprecated)

Example code

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

int main() {
    struct timespec current_time, new_time;
    int result;
    
    printf("=== Clock_settime 函数示例 ===\n");
    printf("当前用户 UID: %d\n", getuid());
    printf("当前有效 UID: %d\n", geteuid());
    
    // 示例1: 获取当前时间
    printf("\n示例1: 获取当前时间\n");
    
    if (clock_gettime(CLOCK_REALTIME, &current_time) == -1) {
        perror("  获取当前时间失败");
    } else {
        printf("  当前系统时间: %ld.%09ld 秒\n", current_time.tv_sec, current_time.tv_nsec);
        printf("  对应日期: %s", ctime(&current_time.tv_sec));
    }
    
    // 示例2: 权限检查
    printf("\n示例2: 权限检查\n");
    
    // 尝试设置时间（通常会失败）
    new_time.tv_sec = current_time.tv_sec;
    new_time.tv_nsec = current_time.tv_nsec;
    
    result = clock_settime(CLOCK_REALTIME, &new_time);
    if (result == -1) {
        if (errno == EPERM) {
            printf("  权限不足设置时间: %s\n", strerror(errno));
            printf("  说明: 需要CAP_SYS_TIME能力或root权限\n");
        } else {
            printf("  设置时间失败: %s\n", strerror(errno));
        }
    } else {
        printf("  时间设置成功\n");
    }
    
    // 示例3: 错误处理演示
    printf("\n示例3: 错误处理演示\n");
    
    // 使用无效的时钟ID
    result = clock_settime(999, &new_time);
    if (result == -1) {
        if (errno == EINVAL) {
            printf("  无效时钟ID错误处理正确: %s\n", strerror(errno));
        }
    }
    
    // 使用无效的时间值
    struct timespec invalid_time;
    invalid_time.tv_sec = -1;
    invalid_time.tv_nsec = 0;
    
    result = clock_settime(CLOCK_REALTIME, &invalid_time);
    if (result == -1) {
        if (errno == EINVAL) {
            printf("  无效时间值错误处理正确: %s\n", strerror(errno));
        }
    }
    
    // 使用过大的纳秒值
    invalid_time.tv_sec = current_time.tv_sec;
    invalid_time.tv_nsec = 1000000000; // 10亿纳秒，应该 < 1秒
    
    result = clock_settime(CLOCK_REALTIME, &invalid_time);
    if (result == -1) {
        if (errno == EINVAL) {
            printf("  纳秒值过大错误处理正确: %s\n", strerror(errno));
        }
    }
    
    // 使用NULL指针
    result = clock_settime(CLOCK_REALTIME, NULL);
    if (result == -1) {
        if (errno == EFAULT) {
            printf("  NULL指针错误处理正确: %s\n", strerror(errno));
        }
    }
    
    // 示例4: 支持的时钟类型
    printf("\n示例4: 支持的时钟类型\n");
    
    printf("CLOCK_REALTIME:\n");
    printf("  - 系统实时钟\n");
    printf("  - 可以被设置\n");
    printf("  - 用于表示当前时间\n\n");
    
    printf("其他时钟类型:\n");
    printf("  - CLOCK_MONOTONIC: 通常不能设置\n");
    printf("  - CLOCK_PROCESS_CPUTIME_ID: 不能设置\n");
    printf("  - CLOCK_THREAD_CPUTIME_ID: 不能设置\n\n");
    
    // 示例5: 时间格式转换
    printf("示例5: 时间格式转换\n");
    
    if (clock_gettime(CLOCK_REALTIME, &current_time) == 0) {
        printf("  当前时间: %ld.%09ld 秒\n", current_time.tv_sec, current_time.tv_nsec);
        
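        // Convert a date string to time_t
        struct tm time_info;
        memset(&time_info, 0, sizeof(time_info)); // strptime only fills the fields it parses
        if (strptime("2024-01-01 12:00:00", "%Y-%m-%d %H:%M:%S", &time_info) == NULL) {
            printf("  Failed to parse the date string\n");
        }
        time_info.tm_isdst = -1; // let mktime decide whether DST applies
        time_t new_time_t = mktime(&time_info);
        
        printf("  Converted time: %s", ctime(&new_time_t));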
        
        // 转换为timespec格式
        struct timespec converted_time;
        converted_time.tv_sec = new_time_t;
        converted_time.tv_nsec = 0;
        
        printf("  timespec格式: %ld.%09ld 秒\n", converted_time.tv_sec, converted_time.tv_nsec);
    }
    
    // 示例6: 时区考虑
    printf("\n示例6: 时区考虑\n");
    
    if (clock_gettime(CLOCK_REALTIME, &current_time) == 0) {
        printf("  UTC时间: %ld.%09ld 秒\n", current_time.tv_sec, current_time.tv_nsec);
        
        // Local timezone offset. gmtime() and localtime() may return the same static
        // buffer, so use the reentrant localtime_r() and read tm_gmtoff (a glibc/BSD extension).
        struct tm local_tm;
        if (localtime_r(&current_time.tv_sec, &local_tm) != NULL) {
            long tz_offset = local_tm.tm_gmtoff; // seconds east of UTC
            printf("  Timezone offset: %+ld s\n", tz_offset);
        }
    }
    
    // 示例7: 安全考虑
    printf("\n示例7: 安全考虑\n");
    printf("使用clock_settime的安全注意事项:\n");
    printf("1. 需要适当的权限（CAP_SYS_TIME或root）\n");
    printf("2. 不当的时间设置可能影响系统稳定性\n");
    printf("3. 时间跳跃可能影响依赖时间的应用程序\n");
    printf("4. 应该使用NTP等标准时间同步服务\n");
    printf("5. 在生产环境中谨慎使用\n\n");
    
    // 示例8: 实际应用场景
    printf("示例8: 实际应用场景\n");
    
    // 场景1: NTP客户端
    printf("场景1: NTP客户端\n");
    printf("  - 从NTP服务器获取时间\n");
    printf("  - 调整系统时钟\n");
    printf("  - 保持时间同步\n\n");
    
    // 场景2: 系统初始化
    printf("场景2: 系统初始化\n");
    printf("  - 设置初始系统时间\n");
    printf("  - 从硬件时钟同步\n");
    printf("  - 恢复时间设置\n\n");
    
    // 场景3: 调试和测试
    printf("场景3: 调试和测试\n");
    printf("  - 设置特定时间进行测试\n");
    printf("  - 模拟时间相关场景\n");
    printf("  - 性能基准测试\n\n");
    
    // 场景4: 时间同步服务
    printf("场景4: 时间同步服务\n");
    printf("  - 分布式系统时间协调\n");
    printf("  - 数据库事务时间戳\n");
    printf("  - 日志时间同步\n\n");
    
    // 示例9: 替代方案
    printf("示例9: 替代方案\n");
    printf("For routine time management, prefer:\n");
    printf("1. The NTP daemon (ntpd)\n");
    printf("2. systemd-timesyncd\n");
    printf("3. chrony (chronyd daemon, chronyc client)\n");
    printf("4. Avoid setting the system time by hand\n\n");
    
    printf("总结:\n");
    printf("clock_settime用于设置系统时钟时间\n");
    printf("需要适当的权限才能使用\n");
    printf("主要用于时间同步服务\n");
    printf("不当使用可能影响系统稳定性\n");
    printf("推荐使用标准的时间同步服务\n");
    
    return 0;
}


The clone System Call, with Examples

Posted on 2025-08-06 by 麦芽爸

This article covers clone, the Linux system call behind both process and thread creation.

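The clone function

1. Introduction

clone is a Linux-specific system call that offers a flexible, low-level way to create a new process or thread. It is more powerful and more complex than fork: the caller controls precisely which resources the child shares with the calling process, such as the virtual address space, the file descriptor table, and the table of signal handlers.

You can think of clone as a highly configurable copying machine:

You start with an original (the parent process).
You tell the machine (clone) to copy it, but to let the copy (the child) share some parts with the original (memory, open files) while owning others outright (register state, stack).
Depending on the settings, the result is an almost fully independent copy (like fork) or a tightly coupled one that shares most resources (like a thread).

In fact, the pthread library on Linux creates threads by calling clone underneath.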

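2. Prototype

#define _GNU_SOURCE // required for the clone declaration
#include <sched.h>  // required
#include <signal.h> // defines SIGCHLD and other signal constants

// glibc wrapper function
int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...
         /* pid_t *parent_tid, void *tls, pid_t *child_tid */ );

// Raw system call (argument order shown for x86-64; it differs on some architectures)
long syscall(SYS_clone, unsigned long flags, void *stack,
             int *parent_tid, int *child_tid, unsigned long tls);

Note: the clone interface is complex and exists in several variants. The glibc wrapper shown first is the form most programs use. The raw system call takes no fn or arg and, like fork, returns in both parent and child.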

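3. Purpose

Create a new execution flow: creates a new flow of execution, which can be viewed as a lightweight process or a thread.
Control resource sharing: the flags argument determines exactly which kernel resources the new flow shares with the caller.
Specify the entry function: as with pthread_create, the new flow starts executing at a function fn chosen by the caller.
Specify the stack: the caller must supply the stack for the new flow through the stack argument, whereas pthread_create allocates one automatically.
Pass an argument: arg is handed to the entry function fn.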

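4. Parameters

Because clone is complex, this section concentrates on the commonly used arguments of the glibc wrapper:

int (*fn)(void *): pointer to the entry function that the new execution flow runs.
- The function takes one void * argument and returns an int.
- When it returns, the new flow (child process or thread) terminates.
void *stack: pointer to the top (highest address) of the stack memory allocated for the new flow.
- Important: the caller must allocate and manage this memory; clone does not allocate it.
- On most architectures the stack grows from high addresses toward low ones, so the pointer should refer to the end of the allocated region.
int flags: the most important argument, a bitmask that selects which resources the new flow shares with the parent. Common flags:
- CLONE_VM: share the virtual address space. Parent and child then run in the same memory, as threads do.
- CLONE_FS: share filesystem information (root directory, current working directory, umask).
- CLONE_FILES: share the file descriptor table. The child starts with the parent's open descriptors either way; with this flag, a descriptor opened or closed later by one of them is opened or closed for the other as well.
- CLONE_SIGHAND: share the table of signal handlers, so a handler changed by one is changed for the other. It must be combined with CLONE_VM.
- CLONE_PTRACE: if the parent is being traced (ptrace), the child is traced too.
- CLONE_VFORK: suspend the parent until the child calls exec or _exit, which reproduces vfork semantics.
- CLONE_PARENT: the new child's parent is the caller's parent rather than the caller itself.
- CLONE_THREAD: place the child in the caller's thread group. It requires CLONE_SIGHAND (and therefore CLONE_VM) and is normally combined with CLONE_FS and CLONE_FILES to create a thread.
- CLONE_NEW* (such as CLONE_NEWNS, CLONE_NEWUSER): create new namespaces, the basis of container technology such as Docker.
- SIGCHLD: not a CLONE_* flag, but commonly ORed (|) into flags. The low byte of flags names the signal sent to the parent when the child exits; SIGCHLD is the usual choice.
void *arg: a generic pointer passed to the entry function fn.
... (variadic arguments): further arguments for advanced uses, such as setting up thread-local storage (TLS) or retrieving the child's thread ID. Basic uses can omit them.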

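5. Return value

The return value of clone needs some care, because parent and child see different things:

In the parent:
- On success, returns the thread ID (TID) of the new child. For a single-threaded process the TID equals the PID; threads created with CLONE_THREAD share one PID but each has its own TID.
- On failure, returns -1 and sets errno.
In the child (the new execution flow):
- Execution starts directly in fn(arg); the glibc wrapper never returns in the child.
- When fn returns, the child terminates and the return value of fn becomes its exit status.
- To terminate earlier from inside fn, the child should normally call _exit() rather than exit(), so that it does not flush stdio buffers or run atexit handlers that belong to the parent.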

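6. Similar and related functions

fork: creates a new process that is a copy of the parent with its own resources. clone behaves like fork when no sharing flags are set.
vfork: like fork, but suspends the parent until the child calls exec or _exit. clone reproduces this with CLONE_VFORK (together with CLONE_VM for full vfork semantics).
pthread_create: the POSIX threads function for creating a thread. On Linux it calls clone underneath and takes care of stack allocation and the sharing flags.
_exit: terminates the child immediately; use it to leave fn early instead of exit().
wait / waitpid: let the parent wait for a child created by clone with SIGCHLD as the termination signal.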

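7. Example code

Example 1: using `clone` to imitate `fork` (no resources shared)

This example uses clone to create a child that is almost completely independent of the parent, much as fork would.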

// clone_fork_like.c
#define _GNU_SOURCE // 必须定义以使用 clone
#include <sched.h>  // clone
#include <sys/wait.h> // waitpid
#include <unistd.h>   // getpid
#include <stdio.h>    // printf, perror
#include <stdlib.h>   // exit, malloc, free
#include <signal.h>   // SIGCHLD
#include <string.h>   // strerror
#include <errno.h>    // errno

#define STACK_SIZE (1024 * 1024) // 1MB 栈空间

// 子进程要执行的函数
int child_function(void *arg) {
    char *msg = (char *)arg;
    printf("Child process (TID: %d) executing.\n", getpid());
    printf("Child received message: %s\n", msg);

    // 子进程可以执行自己的任务
    for (int i = 0; i < 3; ++i) {
        printf("  Child working... %d\n", i);
        sleep(1);
    }

    printf("Child process (TID: %d) finished.\n", getpid());
    // When fn returns, its return value becomes the child's exit status
    return 42;
}

int main() {
    char *stack;          // 指向栈空间的指针
    char *stack_top;      // 指向栈顶的指针 (clone 需要)
    pid_t ctid;           // 子进程的 TID

    // 1. 为子进程分配栈空间
    // 注意：栈是从高地址向低地址增长的
    stack = malloc(STACK_SIZE);
    if (stack == NULL) {
        perror("malloc stack failed");
        exit(EXIT_FAILURE);
    }
    // stack 指向分配内存的起始地址
    // stack_top 应该指向内存的末尾地址
    stack_top = stack + STACK_SIZE;

    printf("Parent process (PID: %d) starting.\n", getpid());

    // 2. 调用 clone 创建子进程
    // flags = SIGCHLD: 子进程退出时发送 SIGCHLD 信号给父进程
    //         (没有设置 CLONE_VM, CLONE_FILES 等，所以资源不共享，类似 fork)
    ctid = clone(child_function, stack_top, SIGCHLD, "Hello from parent to child!");
    // 注意：这里的 SIGCHLD 是一个常见的用法，表示子进程结束后通知父进程

    if (ctid == -1) {
        perror("clone failed");
        free(stack);
        exit(EXIT_FAILURE);
    }

    printf("Parent process (PID: %d) created child with TID: %d\n", getpid(), ctid);

    // 3. 父进程继续执行自己的任务
    printf("Parent process doing its own work...\n");
    for (int i = 0; i < 5; ++i) {
        printf("  Parent working... %d\n", i);
        sleep(1);
    }

    // 4. 父进程等待子进程结束
    int status;
    pid_t wpid = waitpid(ctid, &status, 0); // 等待特定的子进程
    if (wpid == -1) {
        perror("waitpid failed");
        free(stack);
        exit(EXIT_FAILURE);
    }

    if (WIFEXITED(status)) {
        int exit_code = WEXITSTATUS(status);
        printf("Parent: Child (TID %d) exited with status/code: %d\n", ctid, exit_code);
    } else {
        printf("Parent: Child (TID %d) did not exit normally.\n", ctid);
    }

    // 5. 清理资源
    free(stack);
    printf("Parent process (PID: %d) finished.\n", getpid());

    return 0;
}

代码解释:

定义了栈大小 STACK_SIZE 为 1MB。
定义了子进程的入口函数 child_function。这个函数接受一个 void * 参数，打印信息，做一些工作，然后返回 42。
在 main 函数中：
- 使用 malloc 分配栈空间。
- 计算栈顶指针 stack_top。因为栈向下增长，clone 需要栈顶地址。
- 调用 clone(child_function, stack_top, SIGCHLD, "Hello from parent to child!")。
  - child_function: 子进程入口。
  - stack_top: 子进程的栈顶。
  - SIGCHLD: 标志，表示子进程结束后发送信号。
  - "Hello...": 传递给 child_function 的参数。
- clone 在父进程中返回子进程的 TID。
- 父进程执行自己的任务。
- 调用 waitpid(ctid, ...) 等待子进程结束。
- 检查子进程的退出状态。WEXITSTATUS(status) 获取子进程 child_function 的返回值（42）。
- 释放栈内存。

示例 2：使用 `clone` 创建共享内存的执行流 (类似线程)

这个例子演示了如何使用 clone 创建一个与父进程共享内存空间的执行流，模拟线程的部分行为。

// clone_thread_like.c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/wait.h>
#include <string.h>

#define STACK_SIZE (1024 * 1024)

// 全局变量，用于演示内存共享
volatile int shared_counter = 0;

// 子执行流函数
int thread_like_function(void *arg) {
    char *name = (char *)arg;
    printf("Thread-like process '%s' (TID: %d) started.\n", name, getpid());

    for (int i = 0; i < 100000; ++i) {
        // 修改共享变量
        shared_counter++;
    }
    printf("Thread-like process '%s' finished. Shared counter: %d\n", name, shared_counter);
    return 0;
}

int main() {
    char *stack1, *stack2;
    char *stack_top1, *stack_top2;
    pid_t tid1, tid2;

    stack1 = malloc(STACK_SIZE);
    stack2 = malloc(STACK_SIZE);
    if (!stack1 || !stack2) {
        perror("malloc stacks failed");
        free(stack1);
        free(stack2);
        exit(EXIT_FAILURE);
    }
    stack_top1 = stack1 + STACK_SIZE;
    stack_top2 = stack2 + STACK_SIZE;

    printf("Main process (PID: %d) creating two thread-like processes.\n", getpid());
    printf("Initial shared counter: %d\n", shared_counter);

    // 创建第一个"线程"
    // CLONE_VM: 共享虚拟内存 (包括全局变量 shared_counter)
    tid1 = clone(thread_like_function, stack_top1, CLONE_VM | SIGCHLD, "Thread-1");
    if (tid1 == -1) {
        perror("clone thread 1 failed");
        free(stack1);
        free(stack2);
        exit(EXIT_FAILURE);
    }

    // 创建第二个"线程"
    tid2 = clone(thread_like_function, stack_top2, CLONE_VM | SIGCHLD, "Thread-2");
    if (tid2 == -1) {
        perror("clone thread 2 failed");
        free(stack1);
        free(stack2);
        exit(EXIT_FAILURE);
    }

    printf("Main process created TID1: %d, TID2: %d\n", tid1, tid2);

    // 等待两个"线程"结束
    // 注意：由于共享内存，最后的 shared_counter 值是不确定的（竞态条件）
    waitpid(tid1, NULL, 0);
    waitpid(tid2, NULL, 0);

    printf("Main process finished. Final shared counter: %d (may be < 200000 due to race condition)\n", shared_counter);

    free(stack1);
    free(stack2);
    return 0;
}

代码解释:

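Defines a global variable, volatile int shared_counter. volatile only tells the compiler not to cache or optimize away accesses to it; it does not make shared_counter++ atomic and provides no synchronization between execution flows.
thread_like_function is the function that both "threads" run. Each of them increments shared_counter 100000 times.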
在 main 函数中：
- 分配两个独立的栈空间。
- 调用两次 clone 创建两个执行流。
- 关键: flags 参数是 CLONE_VM | SIGCHLD。
  - CLONE_VM: 这使得子执行流与父进程共享虚拟内存地址空间。因此，它们访问的 shared_counter 是同一个变量。
- 父进程等待两个子执行流结束。
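Important: both execution flows share memory and modify shared_counter at the same time, and shared_counter++ is not atomic (it is a separate load, add, and store). This is a race condition: an increment made by one flow can be overwritten by the other, so the final value of shared_counter is likely to be less than 200000. This is why shared-memory code needs synchronization, such as a mutex or atomic operations.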
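One way to remove the race described above is to make the increment itself atomic. The following is a minimal sketch, assuming a C11 compiler with <stdatomic.h>; the names atomic_counter, atomic_worker, and run_two_atomic_workers are hypothetical and do not appear in the original example. The workers deliberately avoid printf, because stdio buffers are also shared under CLONE_VM.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

/* Shared between parent and children because of CLONE_VM. */
static atomic_int atomic_counter;
static int iterations_per_worker;

/* Hypothetical worker: increments the shared counter atomically. */
static int atomic_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < iterations_per_worker; ++i) {
        atomic_fetch_add(&atomic_counter, 1); /* one indivisible read-modify-write */
    }
    return 0;
}

/* Starts two CLONE_VM children and returns the final counter, or -1 on error. */
int run_two_atomic_workers(int iterations) {
    char *stack1 = malloc(STACK_SIZE);
    char *stack2 = malloc(STACK_SIZE);
    if (!stack1 || !stack2) {
        free(stack1);
        free(stack2);
        return -1;
    }

    atomic_store(&atomic_counter, 0);
    iterations_per_worker = iterations;

    pid_t p1 = clone(atomic_worker, stack1 + STACK_SIZE, CLONE_VM | SIGCHLD, NULL);
    pid_t p2 = clone(atomic_worker, stack2 + STACK_SIZE, CLONE_VM | SIGCHLD, NULL);

    /* Wait before freeing: the children are still running on these stacks. */
    if (p1 > 0) waitpid(p1, NULL, 0);
    if (p2 > 0) waitpid(p2, NULL, 0);
    free(stack1);
    free(stack2);

    if (p1 == -1 || p2 == -1) return -1;
    return atomic_load(&atomic_counter);
}
```

Calling run_two_atomic_workers(100000) should return exactly 200000, because no increment can be lost. A mutex would work as well; atomics are used here only because the critical section is a single increment.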

示例 3：与 `pthread_create` 的对比

这个例子通过代码片段对比 clone 和更高级的 pthread_create。

// 使用 clone (底层，复杂)
int thread_func(void *arg) {
    // ... thread work ...
    return 0;
}
void* wrapper_func(void *arg) {
    return (void*)(long)thread_func(arg);
}
// In main:
char *stack = malloc(STACK_SIZE);
char *stack_top = stack + STACK_SIZE;
clone(thread_func, stack_top, CLONE_VM | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD, arg);

// 使用 pthread_create (高层，简单)
void* pthread_func(void *arg) {
    // ... thread work ...
    return NULL;
}
// In main:
pthread_t thread;
pthread_create(&thread, NULL, pthread_func, arg);
pthread_join(thread, NULL);

解释:

clone 需要手动管理栈、设置多个标志位、处理返回值等，非常底层。
pthread_create 自动处理了栈分配、设置了正确的共享标志、提供了简单的 pthread_t 标识符和 pthread_join 等待机制，更易于使用。

重要提示与注意事项:

底层且复杂: clone 是一个非常底层的系统调用，直接使用它非常复杂且容易出错。除非有特殊需求（如实现自己的线程库、容器技术），否则应优先使用 fork/vfork 或 pthread_create。
栈管理: 调用者必须自己分配和释放子进程/线程的栈空间。忘记释放会导致内存泄漏。
标志位: flags 参数是 clone 的核心。理解各种 CLONE_* 标志的含义及其组合效果至关重要。
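CLONE_THREAD: A thread created with CLONE_THREAD becomes part of the calling process's thread group. All threads in the group share one PID but have different TIDs. Calling exit in any of them terminates the whole thread group. Such threads cannot be waited for with wait/waitpid and do not send SIGCHLD when they end; the creator needs a pthread_join-style mechanism instead (glibc builds it on CLONE_CHILD_CLEARTID and a futex).
_exit vs exit: When the function passed to clone returns, the glibc wrapper terminates the child and uses the return value as its exit status. If a child that shares memory with its parent (CLONE_VM) terminates explicitly, it should call _exit() rather than exit(): exit() flushes stdio buffers and runs atexit handlers that belong to the shared address space, which can have unexpected effects on the parent.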
信号: 理解 SIGCHLD 标志以及如何正确等待子进程非常重要。
可移植性: clone 是 Linux 特有的系统调用，在其他 Unix 系统上不可用。
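The stack-management note above can be made concrete. The examples in this article allocate the child stack with malloc; the sketch below uses mmap with MAP_STACK instead, which is the approach taken by the example in the clone(2) manual page. The helper names alloc_child_stack, child_stack_top, and free_child_stack are hypothetical, chosen only for illustration.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Reserve an anonymous, private mapping for use as a child stack.
 * Returns the lowest address of the mapping, or NULL on failure. */
void *alloc_child_stack(size_t size) {
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

/* Stacks grow downward, so clone() must be given the highest address. */
void *child_stack_top(void *base, size_t size) {
    return (char *)base + size;
}

/* Release the stack. Only call this after the child has terminated. */
int free_child_stack(void *base, size_t size) {
    return munmap(base, size);
}
```

A caller would pass child_stack_top(base, size) as the second argument of clone and call free_child_stack only after waitpid has reported that the child is gone.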

总结:

clone 是 Linux 提供的一个功能强大但使用复杂的系统调用，用于创建新的执行流（进程或线程）。它通过精细的标志位控制资源的共享，是实现线程库和高级进程管理功能（如容器）的基础。虽然直接使用它需要深入了解系统底层知识，但理解其工作原理对于掌握 Linux 进程和线程模型非常有帮助。


close系统调用及示例

发表于2025-08-06由麦芽爸

继续学习 Linux 系统编程中的基础函数。这次我们介绍 close 函数，它是与 open 相对应的，用于关闭不再需要的文件描述符。

1. 函数介绍

close 是一个 Linux 系统调用，其主要功能是关闭一个由 open、pipe、socket 等系统调用打开的文件描述符 (file descriptor)。关闭文件描述符会释放与之关联的内核资源（如文件表项），并使其可以被进程重新使用（例如，后续的 open 调用可能会返回这个刚刚被关闭的文件描述符值）。

你可以把它想象成离开房间时关上门，并交还钥匙（文件描述符），这样其他人（或你自己稍后）才能再使用这把钥匙（文件描述符号）进入别的房间（打开别的文件）。

2. 函数原型

#include <unistd.h>

int close(int fd);

3. 功能

释放资源: 释放与文件描述符 fd 相关的内核资源。
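Buffered data: close does not guarantee that data cached by the kernel has reached the storage device. For regular files, cached writes are normally written back later by the kernel; some filesystems (NFS, for example) flush on close and may report a write error at that point. Call fsync before close when the data must be durable.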
关闭连接: 对于套接字 (socket) 或管道 (pipe)，close 会关闭连接的一端。
回收描述符: 使文件描述符 fd 在当前进程中变为无效，该值可以被后续的文件操作（如 open、dup）重新使用。
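The descriptor-recycling behavior in the last item is easy to observe. The sketch below is a small illustration, assuming a single-threaded program and that /dev/null can be opened; the function name closed_fd_is_reused is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if open() hands back the number that was just closed,
 * 0 if it returns a different number, and -1 on error. */
int closed_fd_is_reused(void) {
    int first = open("/dev/null", O_RDONLY);
    if (first == -1) return -1;

    if (close(first) == -1) return -1;   /* the number is free again */

    /* open() always returns the lowest unused descriptor number. */
    int second = open("/dev/null", O_RDONLY);
    if (second == -1) return -1;

    int reused = (second == first);
    close(second);
    return reused;
}
```

In a single-threaded program this returns 1. In a multi-threaded program another thread may take the number first, which is exactly why using a descriptor after closing it is dangerous.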

4. 参数

int fd: 这是需要关闭的文件描述符。它应该是之前成功调用 open、creat、pipe、socket、dup 等函数返回的有效文件描述符。

5. 返回值

成功时: 返回 0。
失败时: 返回 -1，并设置全局变量 errno 来指示具体的错误原因（例如 EBADF 表示 fd 不是一个有效的、已打开的文件描述符）。

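Important: check the return value of close. Many programs ignore it, but close can fail, and a failure may mean that earlier writes were not stored correctly (for example, the disk is full or the device reported an I/O error). Ignoring the error can hide data loss or inconsistency. On Linux the descriptor is released even when close returns an error, so do not call close on the same fd again.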

6. 相似函数，或关联函数

open: 与 close 相对应，用于打开文件并获取文件描述符。
read, write: 在文件描述符被 close 之后，不能再对它使用 read 或 write。
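dup, dup2, fcntl(F_DUPFD): These functions duplicate a file descriptor. close only closes the one descriptor it is given. The underlying open file description (file offset, status flags) is released only when the last descriptor referring to it is closed. Even then, close by itself does not guarantee that file data has been written to disk.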
fsync: 在 close 之前显式调用 fsync(fd) 可以确保文件的所有修改都被写入到存储设备，提供更强的数据持久性保证。
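The fsync-then-close pattern mentioned above could look like the following sketch. The function name write_file_durably and its parameters are hypothetical; the point is only the order of the calls and that every return value is checked.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Writes len bytes to path, forces them to storage, then closes.
 * Returns 0 on success, -1 on error with errno set. */
int write_file_durably(const char *path, const char *data, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, data + done, len - done);
        if (n == -1) {
            if (errno == EINTR) continue;   /* interrupted, try again */
            int saved = errno;
            close(fd);                      /* the write error is the one to report */
            errno = saved;
            return -1;
        }
        done += (size_t)n;                  /* write may be partial */
    }

    if (fsync(fd) == -1) {                  /* data is not durable before this succeeds */
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    return close(fd);                       /* still checked: close can fail too */
}
```

For example, write_file_durably("/tmp/demo.txt", "hello", 5) returns 0 once the five bytes have been handed to the storage device and the descriptor has been closed.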

7. 示例代码

示例 1：基本的打开、读取、关闭操作

这个例子结合了 open、read 和 close，展示了它们的标准使用流程。

#include <unistd.h>  // read, write, close
#include <fcntl.h>   // open, O_RDONLY
#include <stdio.h>   // perror, printf
#include <stdlib.h>  // exit
#include <errno.h>   // errno

#define BUFFER_SIZE 512

int main() {
    int fd;                 // 文件描述符
    char buffer[BUFFER_SIZE]; // 读取缓冲区
    ssize_t bytes_read;     // 实际读取的字节数
    int close_result;       // close 的返回值

    // 1. 打开文件
    fd = open("sample.txt", O_RDONLY);
    if (fd == -1) {
        perror("Error opening file 'sample.txt'");
        exit(EXIT_FAILURE);
    }
    printf("File 'sample.txt' opened successfully with fd: %d\n", fd);

    // 2. 读取文件内容
    printf("Reading file content:\n");
    while ((bytes_read = read(fd, buffer, BUFFER_SIZE)) > 0) {
        // 将读取的内容写到标准输出
        if (write(STDOUT_FILENO, buffer, bytes_read) != bytes_read) {
            perror("Error writing to stdout");
            // 即使写 stdout 出错，也要尝试关闭原始文件
            close(fd); // 忽略此处 close 的返回值，因为主要错误是 write
            exit(EXIT_FAILURE);
        }
    }

    if (bytes_read == -1) {
        perror("Error reading file");
        // 读取失败，关闭文件
        close(fd); // 忽略此处 close 的返回值，因为主要错误是 read
        exit(EXIT_FAILURE);
    }
    // bytes_read == 0, 表示到达文件末尾，正常流程

    // 3. 关闭文件 - 这是关键步骤
    close_result = close(fd);
    if (close_result == -1) {
        // close 失败！这是一个严重错误，需要处理
        perror("CRITICAL ERROR: Failed to close file 'sample.txt'");
        exit(EXIT_FAILURE); // 或者根据应用逻辑决定如何处理
    }
    printf("File 'sample.txt' closed successfully.\n");

    return 0;
}

代码解释:

使用 open 打开文件。
使用 read 和 write 循环读取并打印文件内容。
关键: 在所有可能的退出路径（正常结束、读/写错误）上都调用了 close(fd)。
最重要: 检查了 close(fd) 的返回值。如果返回 -1，则打印严重错误信息并退出。这确保了我们能发现 close 本身可能遇到的问题（如 I/O 错误导致缓冲区刷新失败）。

示例 2：处理多个文件描述符和错误检查

这个例子展示了打开多个文件，并在程序结束前逐一正确关闭它们，同时进行错误检查。

#include <unistd.h>  // close
#include <fcntl.h>   // open
#include <stdio.h>   // perror, printf, fprintf
#include <stdlib.h>  // exit
#include <string.h>  // strerror
#include <errno.h>   // errno

int main() {
    int fd1 = -1, fd2 = -1, fd3 = -1; // 初始化为无效值
    int result;

    // 打开几个不同的文件
    fd1 = open("file1.txt", O_RDONLY);
    if (fd1 == -1) {
        perror("Failed to open 'file1.txt'");
        // fd2, fd3 还未打开，无需关闭
        exit(EXIT_FAILURE);
    }

    fd2 = open("file2.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd2 == -1) {
        perror("Failed to open/create 'file2.txt'");
        // 关闭之前成功打开的 fd1
        if (close(fd1) == -1) {
            fprintf(stderr, "Warning: Also failed to close 'file1.txt': %s\n", strerror(errno));
        }
        exit(EXIT_FAILURE);
    }

    fd3 = open("/etc/passwd", O_RDONLY); // 尝试打开一个系统文件
    if (fd3 == -1) {
        perror("Failed to open '/etc/passwd'");
        // 关闭之前成功打开的 fd1 和 fd2
        if (close(fd1) == -1) {
            fprintf(stderr, "Warning: Also failed to close 'file1.txt': %s\n", strerror(errno));
        }
        if (close(fd2) == -1) {
            fprintf(stderr, "Warning: Also failed to close 'file2.txt': %s\n", strerror(errno));
        }
        exit(EXIT_FAILURE);
    }

    printf("All files opened successfully: fd1=%d, fd2=%d, fd3=%d\n", fd1, fd2, fd3);

    // ... 这里可以进行文件读写操作 ...

    // 程序结束前，关闭所有打开的文件描述符
    // 注意：关闭顺序通常不重要，但保持一致性是好习惯
    // 关闭时检查每个的返回值

    if (fd1 != -1) {
        result = close(fd1);
        if (result == -1) {
            // 主要错误：记录日志或处理
            fprintf(stderr, "ERROR: Failed to close 'file1.txt' (fd=%d): %s\n", fd1, strerror(errno));
            // 根据应用策略决定是否 exit(EXIT_FAILURE)
        } else {
            printf("Successfully closed 'file1.txt' (fd=%d)\n", fd1);
        }
        fd1 = -1; // 关闭后设为无效值，防止重复关闭
    }

    if (fd2 != -1) {
        result = close(fd2);
        if (result == -1) {
            fprintf(stderr, "ERROR: Failed to close 'file2.txt' (fd=%d): %s\n", fd2, strerror(errno));
        } else {
            printf("Successfully closed 'file2.txt' (fd=%d)\n", fd2);
        }
        fd2 = -1;
    }

    if (fd3 != -1) {
        result = close(fd3);
        if (result == -1) {
            fprintf(stderr, "ERROR: Failed to close '/etc/passwd' (fd=%d): %s\n", fd3, strerror(errno));
        } else {
            printf("Successfully closed '/etc/passwd' (fd=%d)\n", fd3);
        }
        fd3 = -1;
    }

    printf("Program finished closing all files.\n");
    return 0;
}

代码解释:

初始化文件描述符变量为 -1（无效值）。
依次尝试打开多个文件。
如果中间某个 open 失败，在退出前会关闭之前成功打开的文件描述符。
在程序正常结束前，有一个清理阶段，遍历所有可能有效的文件描述符（通过检查是否不等于 -1）并调用 close。
关键: 对每一次 close 调用都检查了返回值。如果失败，会打印错误信息。注意这里使用了 strerror(errno) 来获取 errno 对应的可读错误信息。
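After closing, the variable is set back to -1 so that the same descriptor is never closed twice. Closing an already closed descriptor is not safe in general: in a single-threaded program it usually just fails with EBADF, but if the number has been reused in the meantime (by another thread or by a later open), the second close silently closes an unrelated file.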
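The close-then-invalidate habit from the last point can be wrapped in a small helper. This is a sketch under the assumption that -1 marks "not open", as in the example above; close_and_invalidate is a hypothetical name.

```c
#include <unistd.h>

/* Closes *fdp at most once and marks it as invalid afterwards.
 * Returns the result of close(), or 0 if there was nothing to close. */
int close_and_invalidate(int *fdp) {
    if (*fdp < 0) {
        return 0;          /* already closed: do nothing */
    }
    int rc = close(*fdp);
    *fdp = -1;             /* on Linux the descriptor is gone even if close() failed */
    return rc;
}
```

With this helper the three cleanup blocks in the example reduce to three calls, and calling it a second time on the same variable is harmless.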

总结来说，close 是资源管理的关键环节。养成始终检查 close 返回值的习惯对于编写健壮的 Linux 程序至关重要。


connect系统调用及示例

发表于2025-08-06由麦芽爸

我们继续学习 Linux 系统编程中的重要函数。这次我们介绍 connect 函数，它是 TCP 客户端用来向服务器发起连接请求的核心系统调用。

1. 函数介绍

connect 是一个 Linux 系统调用，主要用于TCP 客户端（使用 SOCK_STREAM 套接字）来主动建立与服务器的连接。它也可以用于UDP 客户端（使用 SOCK_DGRAM 套接字）来设置默认的目标地址。

你可以把 connect 想象成拨打电话：

你先有了一个电话听筒（通过 socket 创建了套接字）。
你知道你要打给谁（知道服务器的 IP 地址和端口号）。
你按下拨打键（调用 connect）。
电话那头的服务器响铃，接听后，你们之间的通话线路就建立了。

For TCP, connect triggers the TCP three-way handshake, the standard procedure the protocol uses to establish a reliable connection.

2. 函数原型

#include <sys/socket.h> // 必需

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

3. 功能

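Establish a connection (TCP): For a TCP socket (SOCK_STREAM), connect sends a connection request to the server address given by addr. It performs the TCP three-way handshake and returns once the connection has been established or has failed.
Set the default destination (UDP): For a UDP socket (SOCK_DGRAM), connect sends no packets and performs no handshake. It only records the destination address in the kernel for that socket. Later write/send calls on the socket go to this address by default, and read/recv only receive datagrams that come from it. This simplifies UDP client code and makes it behave more like TCP.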

4. 参数

int sockfd: 这是之前通过 socket() 系统调用成功创建的套接字文件描述符。
const struct sockaddr *addr: 这是一个指向套接字地址结构的指针，该结构包含了要连接的服务器的地址信息（IP 地址和端口号）。
- 对于 IPv4，通常使用 struct sockaddr_in。
- 对于 IPv6，通常使用 struct sockaddr_in6。
- 在调用时，通常会将具体的地址结构（如 sockaddr_in）强制类型转换为 (struct sockaddr *) 传入。
socklen_t addrlen: 这是 addr 指向的地址结构的大小（以字节为单位）。
- 对于 struct sockaddr_in，这个值通常是 sizeof(struct sockaddr_in)。
- 对于 struct sockaddr_in6，这个值通常是 sizeof(struct sockaddr_in6)。
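The addr/addrlen pairing described above can be sketched with one helper that handles both address families. This is an illustration only: fill_server_addr is a hypothetical name, and struct sockaddr_storage is used simply because it is large enough for either structure.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Parses an IPv4 or IPv6 literal and fills *out and *out_len so they can be
 * passed to connect(). Returns 0 on success, -1 if ip is not a valid literal. */
int fill_server_addr(const char *ip, unsigned short port,
                     struct sockaddr_storage *out, socklen_t *out_len) {
    struct sockaddr_in *v4 = (struct sockaddr_in *)out;
    struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)out;

    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET, ip, &v4->sin_addr) == 1) {
        v4->sin_family = AF_INET;
        v4->sin_port = htons(port);              /* network byte order */
        *out_len = sizeof(struct sockaddr_in);
        return 0;
    }

    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET6, ip, &v6->sin6_addr) == 1) {
        v6->sin6_family = AF_INET6;
        v6->sin6_port = htons(port);
        *out_len = sizeof(struct sockaddr_in6);
        return 0;
    }

    return -1;
}
```

The result is then passed as connect(sockfd, (struct sockaddr *)&storage, len), with the socket created for the matching family (storage.ss_family).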

5. 返回值

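On success (TCP): For a TCP socket, connect returns 0 once the connection has been established. The socket sockfd is then ready for data transfer (read/write).
On success (UDP): For a UDP socket, connect normally returns 0 immediately, because it only records the default address and does not contact the peer. It can still fail locally, for example with ENETUNREACH when there is no route to the destination, or EAFNOSUPPORT when the address family does not match the socket.
On failure: Returns -1 and sets the global variable errno to indicate the cause (for example ECONNREFUSED: the remote host refused the connection; ETIMEDOUT: the attempt timed out; EHOSTUNREACH: the host is unreachable; EADDRINUSE: the local address is already in use; EINVAL: invalid argument or socket state).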

阻塞与非阻塞:

阻塞套接字（默认）：调用 connect 会阻塞（挂起）当前进程，直到连接成功建立或发生错误。对于 TCP，这意味着等待三次握手完成。
非阻塞套接字（通过 SOCK_NONBLOCK 或 fcntl 设置）：调用 connect 会立即返回。
- 如果连接不能立即建立，connect 返回 -1，并将 errno 设置为 EINPROGRESS。
- 程序需要使用 select、poll 或 epoll 来检查套接字何时变为可写（表示连接完成），然后使用 getsockopt 检查 SO_ERROR 选项来确定连接最终是成功还是失败。

6. 相似函数，或关联函数

socket: 用于创建套接字，是 connect 的前置步骤。
bind: （服务器端）将套接字绑定到本地地址。客户端通常不需要显式调用 bind。
listen / accept: （服务器端）用于监听和接受客户端的连接请求。
getpeername: 连接建立后，用于获取对方（peer）的地址信息。
getsockname: 用于获取本地套接字的地址信息。
close: 关闭套接字，对于 TCP 连接，这会发起断开连接的四次挥手过程。

7. 示例代码

示例 1：基本的 TCP 客户端 `connect`

这个例子演示了一个典型的 TCP 客户端如何使用 connect 连接到服务器。

// tcp_client.c
#include <sys/socket.h>  // socket, connect
#include <netinet/in.h>  // sockaddr_in
#include <arpa/inet.h>   // inet_pton
#include <unistd.h>      // close, write, read
#include <stdio.h>       // perror, printf, fprintf
#include <stdlib.h>      // exit
#include <string.h>      // strlen, memset

#define PORT 8080
#define SERVER_IP "127.0.0.1" // 可替换为实际服务器 IP

int main() {
    int sock = 0;
    struct sockaddr_in serv_addr;
    char *hello = "Hello from TCP client";
    char buffer[1024] = {0};
    ssize_t valread; // read returns ssize_t, not int

    // 1. 创建 TCP 套接字
    if ((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        perror("socket creation failed");
        exit(EXIT_FAILURE);
    }
    printf("TCP client socket created (fd: %d)\n", sock);

    // 2. 配置服务器地址
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(PORT);

    // 将 IPv4 地址从文本转换为二进制
    if (inet_pton(AF_INET, SERVER_IP, &serv_addr.sin_addr) <= 0) {
        fprintf(stderr, "Invalid address/ Address not supported: %s\n", SERVER_IP);
        close(sock);
        exit(EXIT_FAILURE);
    }

    // 3. 发起连接 (阻塞调用)
    printf("Connecting to %s:%d...\n", SERVER_IP, PORT);
    if (connect(sock, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        perror("connection failed");
        close(sock);
        exit(EXIT_FAILURE);
    }
    printf("Connected to server successfully.\n");

    // 4. 发送数据
    printf("Sending message: %s\n", hello);
    if (write(sock, hello, strlen(hello)) != (ssize_t)strlen(hello)) {
        perror("write failed");
        // 注意：write 返回 ssize_t，strlen 返回 size_t
        // 比较时最好类型一致或强制转换
    } else {
        printf("Message sent successfully.\n");
    }

    // 5. 接收响应
    printf("Waiting for server response...\n");
    valread = read(sock, buffer, sizeof(buffer) - 1);
    if (valread > 0) {
        buffer[valread] = '\0'; // 确保字符串结束
        printf("Received from server: %s\n", buffer);
    } else if (valread == 0) {
        printf("Server closed the connection.\n");
    } else {
        perror("read failed");
    }

    // 6. 关闭套接字
    close(sock);
    printf("Client socket closed.\n");

    return 0;
}

代码解释:

使用 socket(AF_INET, SOCK_STREAM, 0) 创建一个 IPv4 TCP 套接字。
初始化 struct sockaddr_in 结构体 serv_addr，填入服务器的 IP 地址和端口号。
- 使用 inet_pton(AF_INET, ...) 将点分十进制的 IP 字符串转换为网络二进制格式。
- 使用 htons(PORT) 将端口号从主机字节序转换为网络字节序。
关键步骤: 调用 connect(sock, (struct sockaddr *)&serv_addr, sizeof(serv_addr))。
- 这是一个阻塞调用。程序会在此处暂停，直到连接建立（三次握手完成）或失败。
- 如果服务器没有运行或无法访问，connect 会失败并返回 -1，同时设置 errno。
连接成功后，使用 write 发送数据到服务器。
使用 read 从服务器接收数据。
通信结束后，调用 close 关闭套接字。

示例 2：UDP 客户端使用 `connect` 简化通信

这个例子展示了如何在 UDP 客户端中使用 connect 来设置默认目标地址，从而可以使用 read/write 而不是 sendto/recvfrom。

// udp_client_with_connect.c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PORT 8081
#define SERVER_IP "127.0.0.1"

int main() {
    int sock;
    struct sockaddr_in serv_addr;
    char *message = "Hello UDP server via connect!";
    char buffer[1024];
    ssize_t bytes_sent, bytes_received;

    // 1. 创建 UDP 套接字
    sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) {
        perror("socket creation failed");
        exit(EXIT_FAILURE);
    }
    printf("UDP client socket created (fd: %d)\n", sock);

    // 2. 配置服务器地址
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(PORT);
    if (inet_pton(AF_INET, SERVER_IP, &serv_addr.sin_addr) <= 0) {
        fprintf(stderr, "Invalid address/ Address not supported\n");
        close(sock);
        exit(EXIT_FAILURE);
    }

    // 3. 使用 connect 设置默认目标地址 (UDP 的 connect 不发送数据包)
    printf("Connecting UDP socket to %s:%d (sets default destination)...\n", SERVER_IP, PORT);
    if (connect(sock, (const struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        perror("connect failed");
        close(sock);
        exit(EXIT_FAILURE);
    }
    printf("UDP socket 'connected' (default destination set).\n");

    // 4. 发送数据 (无需指定地址)
    // write/send 都可以，因为目标地址已通过 connect 设置
    bytes_sent = write(sock, message, strlen(message));
    if (bytes_sent < 0) {
        perror("write failed");
    } else {
        printf("Sent %zd bytes to server: %s\n", bytes_sent, message);
    }

    // 5. 接收数据 (只接收来自已连接地址的数据)
    // read/recv 都可以
    bytes_received = read(sock, buffer, sizeof(buffer) - 1);
    if (bytes_received < 0) {
        perror("read failed");
    } else if (bytes_received == 0) {
        // For UDP, 0 means a zero-length datagram arrived; there is no connection to close
        printf("Received an empty (zero-length) datagram.\n");
    } else {
        buffer[bytes_received] = '\0';
        printf("Received %zd bytes from server: %s\n", bytes_received, buffer);
    }

    // 6. 关闭套接字
    close(sock);
    printf("UDP client socket closed.\n");

    return 0;
}

代码解释:

创建一个 SOCK_DGRAM (UDP) 套接字。
配置服务器地址 serv_addr。
关键步骤: 调用 connect(sock, ...). 对于 UDP，这不会发送任何网络数据包。
它只是告诉内核：“对于这个套接字 sock，如果没有特别指定，以后发送的数据就发到 serv_addr，接收的数据也只接受来自 serv_addr 的。”
连接后，可以使用 write/read 或 send/recv 进行数据传输，无需再指定目标地址（不像 sendto/recvfrom 那样）。
这简化了 UDP 客户端的代码，使其用法更接近 TCP。
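That connect on a UDP socket only records the destination can be checked without any server running. The sketch below connects a UDP socket and reads the recorded peer back with getpeername; udp_connected_peer_port is a hypothetical helper name, and the loopback address is assumed to be reachable.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connects a UDP socket to ip:port and returns the peer port that the kernel
 * recorded, or -1 on error. No datagram is sent at any point. */
int udp_connected_peer_port(const char *ip, unsigned short port) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock == -1) return -1;

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1 ||
        connect(sock, (struct sockaddr *)&dst, sizeof(dst)) == -1) {
        close(sock);
        return -1;
    }

    /* Ask the kernel which default destination it stored for this socket. */
    struct sockaddr_in peer;
    socklen_t len = sizeof(peer);
    int result = -1;
    if (getpeername(sock, (struct sockaddr *)&peer, &len) == 0) {
        result = ntohs(peer.sin_port);
    }

    close(sock);
    return result;
}
```

udp_connected_peer_port("127.0.0.1", 8081) returns 8081 even when nothing is listening on that port, which shows that the "connection" exists only as local socket state.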

示例 3：非阻塞 `connect` (高级用法)

这个例子演示了如何对非阻塞套接字使用 connect，并使用 select 来等待连接完成。

// nonblocking_connect.c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <fcntl.h>    // fcntl
#include <sys/select.h> // select
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>    // errno

#define PORT 8080
#define SERVER_IP "127.0.0.1"
#define TIMEOUT_SEC 5

int main() {
    int sock;
    struct sockaddr_in serv_addr;
    fd_set write_fds;
    struct timeval timeout;
    int error;
    socklen_t len = sizeof(error);

    // 1. 创建 TCP 套接字
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        perror("socket creation failed");
        exit(EXIT_FAILURE);
    }

    // 2. 将套接字设置为非阻塞模式
    int flags = fcntl(sock, F_GETFL, 0);
    if (flags < 0) {
        perror("fcntl F_GETFL failed");
        close(sock);
        exit(EXIT_FAILURE);
    }
    if (fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0) {
        perror("fcntl F_SETFL failed");
        close(sock);
        exit(EXIT_FAILURE);
    }
    printf("Socket set to non-blocking mode.\n");

    // 3. 配置服务器地址
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(PORT);
    if (inet_pton(AF_INET, SERVER_IP, &serv_addr.sin_addr) <= 0) {
        fprintf(stderr, "Invalid address\n");
        close(sock);
        exit(EXIT_FAILURE);
    }

    // 4. 发起连接 (非阻塞调用)
    printf("Initiating non-blocking connect to %s:%d...\n", SERVER_IP, PORT);
    int conn_result = connect(sock, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    if (conn_result < 0) {
        if (errno != EINPROGRESS) {
            perror("connect failed with unexpected error");
            close(sock);
            exit(EXIT_FAILURE);
        }
        // 如果 errno == EINPROGRESS, 连接正在进行中
        printf("Connect in progress...\n");
    } else {
        // 立即成功 (罕见)
        printf("Connect succeeded immediately.\n");
        close(sock);
        return 0;
    }

    // 5. 使用 select 等待套接字变为可写 (连接完成或失败)
    FD_ZERO(&write_fds);
    FD_SET(sock, &write_fds);
    timeout.tv_sec = TIMEOUT_SEC;
    timeout.tv_usec = 0;

    printf("Waiting up to %d seconds for connection to complete...\n", TIMEOUT_SEC);
    int select_result = select(sock + 1, NULL, &write_fds, NULL, &timeout);

    if (select_result < 0) {
        perror("select failed");
        close(sock);
        exit(EXIT_FAILURE);
    } else if (select_result == 0) {
        printf("Connection timed out after %d seconds.\n", TIMEOUT_SEC);
        close(sock);
        exit(EXIT_FAILURE);
    } else {
        // select 返回 > 0, 表示至少有一个 fd 就绪
        // 我们只监视了 sock 的可写事件
        if (FD_ISSET(sock, &write_fds)) {
            // 套接字可写，连接过程完成（成功或失败）
            // 需要通过 getsockopt 检查 SO_ERROR 来确定最终结果
            if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &error, &len) < 0) {
                perror("getsockopt failed");
                close(sock);
                exit(EXIT_FAILURE);
            }

            if (error != 0) {
                // error 变量包含了 connect 失败时的 errno 值
                errno = error;
                perror("connect failed asynchronously");
                close(sock);
                exit(EXIT_FAILURE);
            } else {
                printf("Asynchronous connect succeeded!\n");
            }
        }
    }

    // 6. 连接成功，可以进行通信了
    printf("Ready to send/receive data on the connected socket.\n");

    // ... 这里可以进行 read/write 操作 ...

    close(sock);
    printf("Socket closed.\n");
    return 0;
}

代码解释:

创建 TCP 套接字。
使用 fcntl 将套接字设置为非阻塞模式 (O_NONBLOCK)。
配置服务器地址。
调用 connect。因为套接字是非阻塞的：
- 如果连接能立即建立，connect 返回 0（罕见）。
- 如果连接不能立即建立（通常是这种情况），connect 返回 -1，并将 errno 设置为 EINPROGRESS。这表明连接正在后台进行。
关键: 使用 select 来等待连接完成。
- 监视套接字的可写 (write_fds) 事件。对于非阻塞 connect，当连接尝试完成（无论成功还是失败）时，套接字会变为可写。
- 设置一个超时时间，避免无限期等待。
select 返回后，检查是超时还是套接字就绪。
如果套接字就绪，调用 getsockopt(sock, SOL_SOCKET, SO_ERROR, ...) 来获取连接的最终状态。
- 如果 SO_ERROR 的值为 0，表示连接成功。
- 如果 SO_ERROR 的值非 0，该值就是连接失败时的错误码，将其赋给 errno 并打印错误信息。
连接成功后，套接字就可以像平常一样用于 read/write 了。
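The steps above can also be packaged as a reusable function. The sketch below uses poll instead of select, which avoids the FD_SETSIZE limit; this is a substitution, not what the example does. The names connect_with_timeout and demo_connect_to_local_listener are hypothetical, and the demo function exists only so the helper can be exercised without an external server.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connects sock within timeout_ms. Returns 0 on success, -1 with errno set.
 * The socket is left in non-blocking mode. */
int connect_with_timeout(int sock, const struct sockaddr *addr,
                         socklen_t len, int timeout_ms) {
    int flags = fcntl(sock, F_GETFL, 0);
    if (flags == -1 || fcntl(sock, F_SETFL, flags | O_NONBLOCK) == -1) return -1;

    if (connect(sock, addr, len) == 0) return 0;   /* completed immediately */
    if (errno != EINPROGRESS) return -1;           /* real failure */

    struct pollfd pfd = { .fd = sock, .events = POLLOUT };
    int ready;
    do {
        ready = poll(&pfd, 1, timeout_ms);         /* note: the timeout restarts on EINTR */
    } while (ready == -1 && errno == EINTR);
    if (ready == 0) { errno = ETIMEDOUT; return -1; }
    if (ready == -1) return -1;

    /* Writable only means "finished"; SO_ERROR says whether it succeeded. */
    int err = 0;
    socklen_t errlen = sizeof(err);
    if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &errlen) == -1) return -1;
    if (err != 0) { errno = err; return -1; }
    return 0;
}

/* Self-check: listen on an ephemeral loopback port and connect to it. */
int demo_connect_to_local_listener(void) {
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int cli = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    int rc = -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                             /* let the kernel pick a port */

    if (srv != -1 && cli != -1 &&
        bind(srv, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        listen(srv, 1) == 0 &&
        getsockname(srv, (struct sockaddr *)&addr, &alen) == 0) {
        rc = connect_with_timeout(cli, (struct sockaddr *)&addr, alen, 2000);
    }

    if (cli != -1) close(cli);
    if (srv != -1) close(srv);
    return rc;
}
```

The handshake completes in the kernel as soon as the listener's backlog accepts it, so the demo succeeds without ever calling accept.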

重要提示与注意事项:

TCP 三次握手: 对于 TCP 套接字，connect 的核心作用是启动并等待三次握手完成。
UDP 的特殊性: 对于 UDP，connect 不涉及网络交互，仅在内核中设置默认地址。
阻塞 vs 非阻塞: 理解阻塞和非阻塞 connect 的行为差异对于编写高性能或响应式网络程序至关重要。
错误处理: connect 失败的错误码 (errno) 提供了丰富的信息，如 ECONNREFUSED (端口未监听), ETIMEDOUT (超时), ENETUNREACH (网络不可达) 等。
客户端通常不 bind: 客户端程序通常不需要调用 bind 来绑定本地地址，操作系统会自动分配一个临时端口。
getpeername: 连接建立后，可以使用 getpeername 来确认连接的对端地址。

总结:

connect 是 TCP 客户端发起网络连接的关键函数，它对于 UDP 客户端则提供了一种简化地址管理的方法。掌握其在阻塞和非阻塞模式下的行为对于进行有效的网络编程非常重要。


copy_file_range系统调用示例

发表于2025-08-06由麦芽爸

This time we look at copy_file_range, an important function in Linux system programming.

1. 函数介绍

copy_file_range 是一个相对较新的 Linux 系统调用（内核版本 >= 4.5），专门用于在两个文件描述符之间高效地复制数据。

你可以把它想象成一个优化版的 “文件剪切板” 功能：

你不需要先 read 把数据从源文件拿到用户空间的缓冲区。
也不需要再 write 把数据从用户空间缓冲区放到目标文件。
而是直接告诉内核：“嘿，内核，帮我把数据从文件 A 的这里，复制到文件 B 的那里。”

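The kernel does as much of the work as it can inside kernel space, using optimizations such as copy-on-write (COW), reflink, or direct buffer-to-buffer transfer. The data never has to pass through a user-space buffer, which removes the per-chunk read/write system calls and the data copies between user space and kernel space, and can make file copying considerably faster.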

这在复制大文件、备份、文件系统内部移动/复制（如果文件系统支持）等场景下尤其有用。

2. 函数原型

#define _GNU_SOURCE // 必须定义以使用 copy_file_range
#include <unistd.h> // ssize_t
#include <fcntl.h>  // 定义了相关的标志 (如果需要)

ssize_t copy_file_range(int fd_in, off_t *off_in,
                        int fd_out, off_t *off_out,
                        size_t len, unsigned int flags);

3. 功能

高效复制: 在内核内部将数据从一个文件描述符 fd_in 复制到另一个文件描述符 fd_out。
指定范围: 可以指定源文件的起始偏移量 off_in、目标文件的起始偏移量 off_out 以及要复制的字节数 len。
灵活偏移: 通过 off_in 和 off_out 指针，可以控制是使用文件的当前偏移量还是指定绝对偏移量。
潜在优化: 内核可能会利用文件系统特性（如 reflink）来实现零拷贝或写时复制，使得复制操作极其快速。

4. 参数

int fd_in: 源文件的文件描述符。这个文件描述符必须是可读的。
off_t *off_in: 指向一个 off_t 类型变量的指针，该变量指定在源文件中开始复制的偏移量。
- 如果 off_in 是 NULL: 复制从源文件的当前偏移量（由 lseek(fd_in, 0, SEEK_CUR) 决定）开始。复制操作会更新源文件的当前偏移量（增加已复制的字节数）。
- 如果 off_in 非 NULL: 复制从 *off_in 指定的绝对偏移量开始。复制操作不会更新源文件的当前偏移量，但会更新 *off_in 的值（增加已复制的字节数）。
int fd_out: 目标文件的文件描述符。这个文件描述符必须是可写的。
off_t *off_out: 指向一个 off_t 类型变量的指针，该变量指定在目标文件中开始写入的偏移量。
- 如果 off_out 是 NULL: 数据写入到目标文件的当前偏移量。复制操作会更新目标文件的当前偏移量。
- 如果 off_out 非 NULL: 数据写入到 *off_out 指定的绝对偏移量。复制操作不会更新目标文件的当前偏移量，但会更新 *off_out 的值。
size_t len: 请求复制的最大字节数。
unsigned int flags: 控制复制行为的标志。在 Linux 中，目前这个参数必须设置为 0。保留供将来扩展。

5. 返回值

成功时: 返回实际复制的字节数（一个非负值）。这个数可能小于请求的 len（例如，在读取时遇到文件末尾）。
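On failure: Returns -1 and sets the global variable errno to indicate the cause (for example EBADF: a descriptor is invalid, fd_in is not open for reading, or fd_out is not open for writing; EINVAL: an argument is invalid, such as a non-zero flags value or a descriptor that is not a regular file; EXDEV: fd_in and fd_out are on different filesystems and the kernel or filesystem does not support copying across them; ENOMEM: out of memory).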

6. 相似函数，或关联函数

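sendfile: Transfers data efficiently between file descriptors (typically from a file to a socket); it is one of the predecessors of copy_file_range and part of its inspiration. On old kernels (before Linux 2.6.33) the output descriptor had to be a socket, so sendfile could not copy between two regular files there.
splice: Moves data between two file descriptors without passing it through user space; at least one of the two descriptors must be a pipe.
Traditional read/write loop: The most basic way to copy a file. It is less efficient because every chunk costs two system calls and two copies between user space and kernel space.
mmap + memcpy: Avoids the read/write system calls by mapping the files into memory, but memcpy still copies the data in user space. It is more complex to use and not necessarily faster than copy_file_range.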
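Because copy_file_range is not available everywhere (old kernels, unsupported filesystems, non-regular files), a common approach is to try it first and fall back to the read/write loop. The following is a minimal sketch of that idea; copy_fd_with_fallback and copy_path_with_fallback are hypothetical names, and the set of errno values treated as "not supported" is an assumption that may need adjusting for a particular system.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Plain read/write loop, starting at the current offsets of both descriptors. */
static ssize_t copy_with_read_write_loop(int src_fd, int dst_fd) {
    char buf[65536];
    ssize_t total = 0, nread;
    while ((nread = read(src_fd, buf, sizeof(buf))) != 0) {
        if (nread == -1) { if (errno == EINTR) continue; return -1; }
        ssize_t off = 0;
        while (off < nread) {
            ssize_t nw = write(dst_fd, buf + off, (size_t)(nread - off));
            if (nw == -1) { if (errno == EINTR) continue; return -1; }
            off += nw;
        }
        total += nread;
    }
    return total;
}

/* Copies until end of file. Passing NULL offsets makes the kernel advance the
 * file positions, so the fallback can continue where copy_file_range stopped. */
ssize_t copy_fd_with_fallback(int src_fd, int dst_fd) {
    ssize_t total = 0;
    for (;;) {
        ssize_t n = copy_file_range(src_fd, NULL, dst_fd, NULL, 1 << 30, 0);
        if (n == 0) return total;                  /* end of source file */
        if (n == -1) {
            if (errno == ENOSYS || errno == EXDEV ||
                errno == EINVAL || errno == EOPNOTSUPP) {
                ssize_t rest = copy_with_read_write_loop(src_fd, dst_fd);
                return rest == -1 ? -1 : total + rest;
            }
            return -1;
        }
        total += n;
    }
}

/* Convenience wrapper: returns the number of bytes copied, or -1 on error. */
ssize_t copy_path_with_fallback(const char *src_path, const char *dst_path) {
    int src = open(src_path, O_RDONLY);
    if (src == -1) return -1;
    int dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst == -1) { close(src); return -1; }

    ssize_t total = copy_fd_with_fallback(src, dst);
    close(src);
    if (close(dst) == -1) return -1;               /* a failed close may mean lost data */
    return total;
}
```

The caller sees the same result on either path; only the speed differs.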

7. 示例代码

示例 1：基本使用 `copy_file_range` 复制文件

这个例子演示了如何使用 copy_file_range 将一个文件的内容复制到另一个文件。

// copy_file_range_basic.c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>

#define BUFFER_SIZE 1024

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <source_file> <destination_file>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    const char *src_filename = argv[1];
    const char *dst_filename = argv[2];
    int src_fd, dst_fd;
    struct stat src_stat;
    off_t offset_in, offset_out;
    ssize_t bytes_copied, total_bytes_copied = 0;
    size_t remaining;

    // 1. 打开源文件 (只读)
    src_fd = open(src_filename, O_RDONLY);
    if (src_fd == -1) {
        perror("Error opening source file");
        exit(EXIT_FAILURE);
    }

    // 2. 获取源文件大小
    if (fstat(src_fd, &src_stat) == -1) {
        perror("Error getting source file stats");
        close(src_fd);
        exit(EXIT_FAILURE);
    }

    // 3. 创建/打开目标文件 (写入、创建、截断)
    dst_fd = open(dst_filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst_fd == -1) {
        perror("Error opening/creating destination file");
        close(src_fd);
        exit(EXIT_FAILURE);
    }

    printf("Copying '%s' to '%s' using copy_file_range()...\n", src_filename, dst_filename);
    printf("Source file size: %ld bytes\n", (long)src_stat.st_size);

    // 4. 使用 copy_file_range 进行复制
    // 初始化偏移量为 0
    offset_in = 0;
    offset_out = 0;
    remaining = src_stat.st_size;

    while (remaining > 0) {
        // 尝试复制剩余的所有字节，或者一个大块
        // copy_file_range 可能不会一次复制完所有请求的字节
        size_t to_copy = (remaining > 0x7ffff000) ? 0x7ffff000 : remaining; // 限制单次调用大小

        bytes_copied = copy_file_range(src_fd, &offset_in, dst_fd, &offset_out, to_copy, 0);

        if (bytes_copied == -1) {
            perror("Error in copy_file_range");
            // 尝试清理
            close(src_fd);
            close(dst_fd);
            exit(EXIT_FAILURE);
        }

        if (bytes_copied == 0) {
            // 可能已经到达源文件末尾
            fprintf(stderr, "Warning: copy_file_range returned 0 before copying all data.\n");
            break;
        }

        total_bytes_copied += bytes_copied;
        remaining -= bytes_copied;
        printf("  Copied %zd bytes (total: %zd)\n", bytes_copied, total_bytes_copied);
    }

    printf("Copy completed. Total bytes copied: %zd\n", total_bytes_copied);

    // 5. 关闭文件描述符
    if (close(src_fd) == -1) {
        perror("Error closing source file");
    }
    if (close(dst_fd) == -1) {
        perror("Error closing destination file");
    }

    return 0;
}

如何测试:

# 创建一个大一点的测试文件
dd if=/dev/urandom of=large_source_file.txt bs=1M count=10 # 创建 10MB 随机数据文件
# 或者简单点
echo "This is the content of the source file." > small_source_file.txt

# 编译并运行
gcc -o copy_file_range_basic copy_file_range_basic.c
./copy_file_range_basic small_source_file.txt copied_file.txt

# 检查结果
cat copied_file.txt
ls -l small_source_file.txt copied_file.txt

代码解释:

检查命令行参数。
以只读模式打开源文件 src_fd。
使用 fstat 获取源文件的大小 src_stat.st_size。
以写入、创建、截断模式打开（或创建）目标文件 dst_fd。
关键步骤: 进入 while 循环进行复制。
- 初始化 offset_in 和 offset_out 为 0。
- remaining 变量跟踪还剩多少字节需要复制。
- 在循环中，调用 copy_file_range(src_fd, &offset_in, dst_fd, &offset_out, to_copy, 0)。
  - src_fd, dst_fd: 源和目标文件描述符。
  - &offset_in, &offset_out: 传递偏移量的指针。这使得 copy_file_range 在复制后自动更新这两个变量，指向下一次复制的起始位置。
  - to_copy: 本次尝试复制的字节数（做了大小限制）。
  - 0: flags 参数，必须为 0。
- 检查返回值 bytes_copied。
- 如果成功（> 0），则更新 total_bytes_copied 和 remaining。
- 如果返回 0，可能表示源文件已到末尾。
- 如果返回 -1，则处理错误。
循环直到 remaining 为 0 或出错。
打印总复制字节数。
关闭文件描述符。

示例 2：对比 `copy_file_range` 与传统 `read`/`write` 循环

这个例子通过复制同一个大文件，对比 copy_file_range 和传统的 read/write 循环在性能上的差异。

// copy_file_range_vs_read_write.c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>
#include <time.h>

#define BUFFER_SIZE (1024 * 1024) // 1MB buffer

// 使用 read/write 循环复制文件
ssize_t copy_with_read_write(int src_fd, int dst_fd) {
    char *buffer = malloc(BUFFER_SIZE);
    if (!buffer) {
        perror("malloc buffer");
        return -1;
    }

    ssize_t total = 0;
    ssize_t nread, nwritten;

    while ((nread = read(src_fd, buffer, BUFFER_SIZE)) > 0) {
        char *buf_ptr = buffer;
        ssize_t nleft = nread;

        while (nleft > 0) {
            nwritten = write(dst_fd, buf_ptr, nleft);
            if (nwritten <= 0) {
                if (nwritten == -1 && errno == EINTR) {
                    continue; // Interrupted, retry
                }
                perror("write");
                free(buffer);
                return -1;
            }
            nleft -= nwritten;
            buf_ptr += nwritten;
        }
        total += nread;
    }

    if (nread == -1) {
        perror("read");
        free(buffer);
        return -1;
    }

    free(buffer);
    return total;
}

// 使用 copy_file_range 复制文件
ssize_t copy_with_copy_file_range(int src_fd, int dst_fd, size_t file_size) {
    off_t offset_in = 0;
    off_t offset_out = 0;
    size_t remaining = file_size;
    ssize_t bytes_copied, total = 0;

    while (remaining > 0) {
        size_t to_copy = (remaining > 0x7ffff000) ? 0x7ffff000 : remaining;
        bytes_copied = copy_file_range(src_fd, &offset_in, dst_fd, &offset_out, to_copy, 0);
        if (bytes_copied == -1) {
            perror("copy_file_range");
            return -1;
        }
        if (bytes_copied == 0) {
            break;
        }
        total += bytes_copied;
        remaining -= bytes_copied;
    }
    return total;
}

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <source_file>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    const char *src_filename = argv[1];
    int src_fd;
    struct stat src_stat;
    clock_t start, end;
    double cpu_time_used;

    // 打开源文件
    src_fd = open(src_filename, O_RDONLY);
    if (src_fd == -1) {
        perror("open source file");
        exit(EXIT_FAILURE);
    }

    if (fstat(src_fd, &src_stat) == -1) {
        perror("fstat source file");
        close(src_fd);
        exit(EXIT_FAILURE);
    }

    printf("Source file: %s\n", src_filename);
    printf("File size: %ld bytes (%.2f MB)\n", (long)src_stat.st_size, (double)src_stat.st_size / (1024*1024));


    // --- 测试 1: copy_file_range ---
    printf("\n--- Testing copy_file_range ---\n");
    char dst_filename1[] = "copy_file_range_dst.tmp";
    int dst_fd1 = open(dst_filename1, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst_fd1 == -1) {
        perror("open destination file 1");
        close(src_fd);
        exit(EXIT_FAILURE);
    }

    // 重置源文件偏移量
    if (lseek(src_fd, 0, SEEK_SET) == -1) {
        perror("lseek src_fd");
        close(src_fd);
        close(dst_fd1);
        exit(EXIT_FAILURE);
    }

    start = clock();
    ssize_t copied1 = copy_with_copy_file_range(src_fd, dst_fd1, src_stat.st_size);
    end = clock();

    if (copied1 == -1) {
        fprintf(stderr, "copy_file_range failed.\n");
    } else {
        cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
        printf("  Bytes copied: %zd\n", copied1);
        printf("  Time taken: %f seconds\n", cpu_time_used);
    }

    close(dst_fd1);


    // --- 测试 2: read/write loop ---
    printf("\n--- Testing read/write loop ---\n");
    char dst_filename2[] = "read_write_dst.tmp";
    int dst_fd2 = open(dst_filename2, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst_fd2 == -1) {
        perror("open destination file 2");
        close(src_fd);
        // Cleanup
        unlink(dst_filename1);
        exit(EXIT_FAILURE);
    }

    // 重置源文件偏移量
    if (lseek(src_fd, 0, SEEK_SET) == -1) {
        perror("lseek src_fd");
        close(src_fd);
        close(dst_fd2);
        unlink(dst_filename1);
        exit(EXIT_FAILURE);
    }

    start = clock();
    ssize_t copied2 = copy_with_read_write(src_fd, dst_fd2);
    end = clock();

    if (copied2 == -1) {
        fprintf(stderr, "read/write loop failed.\n");
    } else {
        cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
        printf("  Bytes copied: %zd\n", copied2);
        printf("  Time taken: %f seconds\n", cpu_time_used);
    }

    close(dst_fd2);
    close(src_fd);

    // --- 清理 ---
    unlink(dst_filename1);
    unlink(dst_filename2);

    printf("\nPerformance comparison completed.\n");
    if (copied1 != -1 && copied2 != -1) {
        printf("copy_file_range is expected to be faster, especially for large files on same filesystem.\n");
    }
    return 0;
}

How to test:

# Create a large test file
dd if=/dev/zero of=test_large_file.txt bs=1M count=100 # 100 MB file

# Compile and run
gcc -o copy_file_range_vs_read_write copy_file_range_vs_read_write.c
./copy_file_range_vs_read_write test_large_file.txt

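Code walkthrough:

The program defines two functions, copy_with_read_write and copy_with_copy_file_range, one for each copy method.
copy_with_read_write:
- Allocates a 1 MB buffer.
- Reads data into the buffer in a while loop.
- An inner while loop makes sure write sends the whole buffer to the destination file (write may write only part of it).
- Accumulates the total number of bytes copied.
copy_with_copy_file_range:
- Uses the off_t variables offset_in and offset_out to track the source and destination offsets.
- Calls copy_file_range in a while loop until all data has been copied.
main:
- Gets the size of the source file.
- Tests the two methods in turn.
- Measures time with clock(). Note that clock() reports CPU time, not wall-clock time, so time the process spends blocked on I/O is not counted; the figures are still useful for a rough relative comparison.
- Prints the results and removes the temporary files.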

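Important notes:

Kernel version: requires Linux kernel 4.5 or later.
glibc version: glibc 2.27 or later is needed to call copy_file_range directly. On older versions you may need to use syscall.
Performance: the main advantage of copy_file_range is the room it gives the kernel to optimize. If the source and destination are on the same filesystem and it supports reflink (for example Btrfs, XFS, or OCFS2), copy_file_range may create a copy-on-write (COW) clone almost instantly. Even without reflink it is usually more efficient than a read/write loop, because the data does not have to be copied between kernel space and user space.
flags parameter: must currently be 0. New flags may be added in the future.
Across filesystems: copy_file_range may fail with EXDEV when the two files are on different mounted filesystems. Whether a cross-filesystem copy works depends on the kernel version and the filesystems involved, so be prepared to fall back to read/write.
Offset pointers: it is important to understand how the off_in and off_out pointers behave. If a pointer is NULL, the call starts at the file's current offset and advances it. If it is non-NULL, the call uses the offset it points to, updates that value, and leaves the file's own offset unchanged. This makes non-NULL pointers a good fit for multithreaded code or for cases that need precise control over offsets.
Return value: like many I/O functions, copy_file_range may copy fewer bytes than requested, so call it in a loop.
Error handling: always check the return value and errno. EBADF, EINVAL, EXDEV, and ENOMEM are among the errors you may see.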
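
The error-handling and cross-filesystem notes above can be sketched as a wrapper that tries copy_file_range first and switches to a read/write loop when the call is not usable. This is a sketch under assumptions: the function names copy_with_fallback and copy_fallback_read_write are hypothetical, and the set of errno values that trigger the fallback (EXDEV, ENOSYS, EINVAL, EOPNOTSUPP) is a judgment call rather than a rule taken from the article.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy up to len bytes with read/write, starting at the current file offsets. */
static ssize_t copy_fallback_read_write(int src_fd, int dst_fd, size_t len) {
    char buf[65536];
    ssize_t total = 0;
    while (len > 0) {
        size_t want = len < sizeof(buf) ? len : sizeof(buf);
        ssize_t nread = read(src_fd, buf, want);
        if (nread == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (nread == 0) break;                 /* end of input */
        ssize_t done = 0;
        while (done < nread) {                 /* handle partial writes */
            ssize_t nwritten = write(dst_fd, buf + done, (size_t)(nread - done));
            if (nwritten == -1) {
                if (errno == EINTR) continue;
                return -1;
            }
            done += nwritten;
        }
        total += nread;
        len -= (size_t)nread;
    }
    return total;
}

/* Hypothetical wrapper: try copy_file_range, fall back to read/write when the
 * kernel or filesystem cannot perform the copy. NULL offset pointers mean the
 * file offsets are used and advanced, so the fallback resumes where the
 * kernel copy stopped. */
static ssize_t copy_with_fallback(int src_fd, int dst_fd, size_t len) {
    ssize_t total = 0;
    while (len > 0) {
        ssize_t n = copy_file_range(src_fd, NULL, dst_fd, NULL, len, 0);
        if (n == -1) {
            /* Assumption: these errors mean "not supported here", not "bug". */
            if (errno == EXDEV || errno == ENOSYS ||
                errno == EINVAL || errno == EOPNOTSUPP) {
                ssize_t rest = copy_fallback_read_write(src_fd, dst_fd, len);
                return rest == -1 ? -1 : total + rest;
            }
            return -1;
        }
        if (n == 0) break;                     /* end of source file */
        total += n;
        len -= (size_t)n;
    }
    return total;
}
```

Treating EINVAL as a reason to fall back can hide a genuine argument error, so a stricter program might fall back only on EXDEV and ENOSYS.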

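Summary:

copy_file_range is a powerful and efficient system call for copying data between file descriptors. It hands the copy to the kernel, which avoids moving the data through user space, and it can take advantage of filesystem features such as reflink. For applications that need fast file copies on Linux, copy_file_range is worth trying first.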


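chdir system call and examples

Posted on 2025-08-06 by 麦芽爸

chdir: change the current working directory

Function introduction

The chdir system call changes the current working directory of the calling process. After a successful call, every relative path the process uses is resolved against the new working directory.

Function prototype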

#include <unistd.h>

int chdir(const char *path);

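Function

Changes the current working directory of the process to the given path.

Parameters

const char *path: path name of the target directory (relative or absolute)

Return value

Returns 0 on success.
Returns -1 on failure and sets errno:
- EACCES: search permission is denied for a component of the path
- EFAULT: path points outside the accessible address space
- EIO: an I/O error occurred
- ELOOP: too many symbolic links were encountered while resolving the path
- ENAMETOOLONG: the path name is too long
- ENOENT: the directory does not exist
- ENOMEM: insufficient kernel memory
- ENOTDIR: a component of the path is not a directory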
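
To make the errno values above concrete, here is a small sketch that attempts a chdir and reports the specific cause of a failure. The helper names try_chdir and chdir_error_hint are hypothetical, and the hint strings are illustrative wording, not text from any library.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: attempt chdir and return 0 on success or the errno
 * value on failure, so the caller can branch on the specific cause. */
static int try_chdir(const char *path) {
    if (chdir(path) == 0) {
        return 0;
    }
    return errno;
}

/* Hypothetical helper: a short hint for common chdir errno values. */
static const char *chdir_error_hint(int err) {
    switch (err) {
        case 0:            return "success";
        case EACCES:       return "search permission denied on a path component";
        case ENOENT:       return "directory does not exist";
        case ENOTDIR:      return "a path component is not a directory";
        case ENAMETOOLONG: return "path name too long";
        case ELOOP:        return "too many symbolic links";
        default:           return "other error (see strerror)";
    }
}
```

A caller might write int err = try_chdir(dir); and, when err is not 0, print chdir_error_hint(err) before deciding whether to create the directory, ask for another path, or give up.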

Related functions

fchdir(): changes the current working directory through a file descriptor
getcwd(): returns the current working directory

Example code

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <limits.h>

int main() {
    char buffer[PATH_MAX];
    char original_dir[PATH_MAX];
    
    printf("=== Chdir函数示例 ===\n");
    
    // 保存原始目录
    if (getcwd(original_dir, sizeof(original_dir)) == NULL) {
        perror("获取原始目录失败");
        exit(EXIT_FAILURE);
    }
    printf("原始工作目录: %s\n", original_dir);
    
    // 示例1: 基本的目录切换操作
    printf("\n示例1: 基本的目录切换操作\n");
    
    // 切换到根目录
    if (chdir("/") == -1) {
        perror("切换到根目录失败");
    } else {
        printf("成功切换到根目录\n");
        if (getcwd(buffer, sizeof(buffer)) != NULL) {
            printf("当前目录: %s\n", buffer);
        }
    }
    
    // 切换回原始目录
    if (chdir(original_dir) == -1) {
        perror("返回原始目录失败");
    } else {
        printf("成功返回原始目录\n");
        if (getcwd(buffer, sizeof(buffer)) != NULL) {
            printf("当前目录: %s\n", buffer);
        }
    }
    
    // 示例2: 创建测试环境
    printf("\n示例2: 创建测试环境\n");
    
    // 创建测试目录结构
    const char *base_dir = "test_chdir_base";
    const char *sub_dir1 = "test_chdir_base/subdir1";
    const char *sub_dir2 = "test_chdir_base/subdir2";
    const char *deep_dir = "test_chdir_base/subdir1/deepdir";
    
    // 创建目录
    if (mkdir(base_dir, 0755) == -1 && errno != EEXIST) {
        perror("创建基础目录失败");
    } else {
        printf("创建基础目录: %s\n", base_dir);
    }
    
    if (mkdir(sub_dir1, 0755) == -1 && errno != EEXIST) {
        perror("创建子目录1失败");
    } else {
        printf("创建子目录1: %s\n", sub_dir1);
    }
    
    if (mkdir(sub_dir2, 0755) == -1 && errno != EEXIST) {
        perror("创建子目录2失败");
    } else {
        printf("创建子目录2: %s\n", sub_dir2);
    }
    
    if (mkdir(deep_dir, 0755) == -1 && errno != EEXIST) {
        perror("创建深层目录失败");
    } else {
        printf("创建深层目录: %s\n", deep_dir);
    }
    
    // 在目录中创建测试文件
    int fd = open("test_chdir_base/test_file.txt", O_CREAT | O_WRONLY, 0644);
    if (fd != -1) {
        const char *content = "Test file content for chdir demonstration";
        write(fd, content, strlen(content));
        close(fd);
        printf("创建测试文件: test_chdir_base/test_file.txt\n");
    }
    
    // 示例3: 绝对路径和相对路径切换
    printf("\n示例3: 绝对路径和相对路径切换\n");
    
    // 使用绝对路径切换
    char absolute_path[PATH_MAX * 2];
    snprintf(absolute_path, sizeof(absolute_path), "%s/%s", original_dir, base_dir);
    printf("使用绝对路径切换: %s\n", absolute_path);
    
    if (chdir(absolute_path) == -1) {
        perror("使用绝对路径切换失败");
    } else {
        printf("绝对路径切换成功\n");
        if (getcwd(buffer, sizeof(buffer)) != NULL) {
            printf("当前目录: %s\n", buffer);
        }
    }
    
    // 使用相对路径切换到子目录
    printf("使用相对路径切换到子目录1\n");
    if (chdir("subdir1") == -1) {
        perror("切换到子目录1失败");
    } else {
        printf("切换到子目录1成功\n");
        if (getcwd(buffer, sizeof(buffer)) != NULL) {
            printf("当前目录: %s\n", buffer);
        }
    }
    
    // 使用相对路径返回上级目录
    printf("使用相对路径返回上级目录\n");
    if (chdir("..") == -1) {
        perror("返回上级目录失败");
    } else {
        printf("返回上级目录成功\n");
        if (getcwd(buffer, sizeof(buffer)) != NULL) {
            printf("当前目录: %s\n", buffer);
        }
    }
    
    // 切换到深层目录
    printf("切换到深层目录\n");
    if (chdir("subdir1/deepdir") == -1) {
        perror("切换到深层目录失败");
    } else {
        printf("切换到深层目录成功\n");
        if (getcwd(buffer, sizeof(buffer)) != NULL) {
            printf("当前目录: %s\n", buffer);
        }
    }
    
    // 使用多个..返回
    printf("使用多个..返回原始目录\n");
    if (chdir("../../..") == -1) {
        perror("多级返回失败");
    } else {
        printf("多级返回成功\n");
        if (getcwd(buffer, sizeof(buffer)) != NULL) {
            printf("当前目录: %s\n", buffer);
        }
    }
    
    // 示例4: 目录切换的副作用演示
    printf("\n示例4: 目录切换的副作用演示\n");
    
    // 切换到测试目录
    if (chdir(base_dir) == -1) {
        perror("切换到测试目录失败");
    } else {
        printf("切换到测试目录\n");
        
        // 在当前目录创建文件
        fd = open("created_in_cwd.txt", O_CREAT | O_WRONLY, 0644);
        if (fd != -1) {
            const char *file_content = "File created in current working directory";
            write(fd, file_content, strlen(file_content));
            close(fd);
            printf("在当前目录创建文件: created_in_cwd.txt\n");
        }
        
        // 列出当前目录文件
        printf("当前目录文件:\n");
        system("ls -la");
        
        // 切换目录后再次查看
        if (chdir("subdir1") == -1) {
            perror("切换到子目录失败");
        } else {
            printf("\n切换到子目录后:\n");
            system("ls -la");
        }
    }
    
    // 返回原始目录
    if (chdir(original_dir) == -1) {
        perror("返回原始目录失败");
    }
    
    // 示例5: 错误处理演示
    printf("\n示例5: 错误处理演示\n");
    
    // 尝试切换到不存在的目录
    if (chdir("/nonexistent/directory") == -1) {
        printf("切换到不存在的目录: %s\n", strerror(errno));
    }
    
    // 尝试切换到文件而不是目录
    if (chdir("/etc/passwd") == -1) {
        printf("切换到文件而非目录: %s\n", strerror(errno));
    }
    
    // 尝试切换到没有权限的目录
    if (chdir("/root") == -1) {
        printf("切换到无权限目录: %s\n", strerror(errno));
    }
    
    // 尝试使用过长的路径名
    char long_path[PATH_MAX + 100];
    memset(long_path, 'a', sizeof(long_path) - 1);
    long_path[sizeof(long_path) - 1] = '\0';
    if (chdir(long_path) == -1) {
        printf("使用过长路径名: %s\n", strerror(errno));
    }
    
    // 示例6: 实际应用场景
    printf("\n示例6: 实际应用场景\n");
    
    // 场景1: 程序初始化时切换到工作目录
    printf("场景1: 程序工作目录设置\n");
    const char *work_dir = "/tmp";
    if (chdir(work_dir) == -1) {
        printf("无法切换到工作目录 %s: %s\n", work_dir, strerror(errno));
        printf("使用当前目录作为工作目录\n");
    } else {
        printf("成功切换到工作目录: %s\n", work_dir);
    }
    
    // 场景2: 备份脚本中的目录操作
    printf("场景2: 备份操作模拟\n");
    char backup_dir[PATH_MAX];
    snprintf(backup_dir, sizeof(backup_dir), "%s/backup_test", original_dir);
    
    // 创建备份目录
    if (mkdir(backup_dir, 0755) == -1 && errno != EEXIST) {
        perror("创建备份目录失败");
    } else {
        printf("创建备份目录: %s\n", backup_dir);
        
        // 切换到备份目录
        if (chdir(backup_dir) == -1) {
            perror("切换到备份目录失败");
        } else {
            printf("切换到备份目录进行操作\n");
            
            // 模拟备份操作
            system("echo 'Backup operation in progress...' > backup.log");
            system("date >> backup.log");
            printf("备份操作记录已保存\n");
        }
    }
    
    // 场景3: 构建系统中的目录管理
    printf("场景3: 构建系统目录管理\n");
    struct {
        const char *dir_name;
        const char *purpose;
    } build_dirs[] = {
        {"src", "源代码目录"},
        {"include", "头文件目录"},
        {"lib", "库文件目录"},
        {"bin", "可执行文件目录"},
        {"obj", "目标文件目录"}
    };
    
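    // base_dir is a relative path, and scenarios 1 and 2 may have left the
    // process in /tmp or in the backup directory, so return to the original
    // directory first
    if (chdir(original_dir) == -1) {
        perror("Failed to return to the original directory");
    }
    
    // Switch to the base directory
    if (chdir(base_dir) == -1) {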
        perror("切换到基础目录失败");
    } else {
        printf("在 %s 中创建构建目录结构:\n", base_dir);
        
        for (int i = 0; i < 5; i++) {
            if (mkdir(build_dirs[i].dir_name, 0755) == -1 && errno != EEXIST) {
                printf("  创建 %s 失败: %s\n", build_dirs[i].dir_name, strerror(errno));
            } else {
                printf("  创建 %s (%s)\n", build_dirs[i].dir_name, build_dirs[i].purpose);
            }
        }
    }
    
    // 场景4: Web服务器目录切换
    printf("场景4: Web服务器目录安全\n");
    const char *web_root = "/var/www";
    printf("Web服务器尝试切换到根目录: %s\n", web_root);
    
    // 检查目录是否存在和可访问
    if (access(web_root, F_OK) == 0) {
        if (access(web_root, R_OK | X_OK) == 0) {
            printf("目录存在且可访问\n");
            // 在实际应用中会进行chdir操作
        } else {
            printf("目录存在但权限不足\n");
        }
    } else {
        printf("目录不存在或无法访问\n");
    }
    
    // 示例7: 目录切换的安全考虑
    printf("\n示例7: 目录切换的安全考虑\n");
    
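    // Return to the original directory first, so that "." and the relative
    // base_dir below refer to the expected locations
    if (chdir(original_dir) == -1) {
        perror("Failed to return to the original directory");
    }
    
    // Save a file descriptor for the original directory (used to return safely)
    int original_fd = open(".", O_RDONLY);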
    if (original_fd != -1) {
        printf("保存原始目录文件描述符: %d\n", original_fd);
        
        // 执行目录切换
        if (chdir(base_dir) == -1) {
            perror("切换目录失败");
        } else {
            printf("切换到测试目录\n");
            
            // 执行一些操作
            system("pwd");
            
            // 使用文件描述符安全返回
            if (fchdir(original_fd) == -1) {
                perror("使用文件描述符返回失败");
            } else {
                printf("使用文件描述符安全返回原始目录\n");
                system("pwd");
            }
        }
        
        close(original_fd);
    }
    
    // 示例8: 相对路径解析演示
    printf("\n示例8: 相对路径解析\n");
    
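    // base_dir is a relative path, so start from the original directory
    if (chdir(original_dir) == -1) {
        perror("Failed to return to the original directory");
    }
    if (chdir(base_dir) == -1) {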
        perror("切换到测试目录失败");
    } else {
        printf("当前目录: ");
        system("pwd");
        
        // 相对路径解析示例
        struct {
            const char *relative_path;
            const char *description;
        } paths[] = {
            {".", "当前目录"},
            {"..", "上级目录"},
            {"./subdir1", "当前目录下的子目录"},
            {"../subdir2", "上级目录下的另一个子目录"},
            {"subdir1/./deepdir", "带.的路径"},
            {"subdir1/../subdir2", "带..的路径"}
        };
        
        for (int i = 0; i < 6; i++) {
            printf("路径 '%s' (%s):\n", paths[i].relative_path, paths[i].description);
            
            // 保存当前位置
            int save_fd = open(".", O_RDONLY);
            if (save_fd != -1) {
                // 尝试切换
                if (chdir(paths[i].relative_path) == 0) {
                    printf("  切换成功: ");
                    system("pwd");
                    
                    // 返回原位置
                    if (fchdir(save_fd) == -1) {
                        perror("  返回失败");
                    }
                } else {
                    printf("  切换失败: %s\n", strerror(errno));
                }
                close(save_fd);
            }
            printf("\n");
        }
    }
    
    // 返回原始目录
    if (chdir(original_dir) == -1) {
        perror("最终返回原始目录失败");
    }
    
    // 清理测试资源
    printf("\n清理测试资源...\n");
    
    // 删除测试文件
    char test_file_path[PATH_MAX * 2];
    snprintf(test_file_path, sizeof(test_file_path), "%s/%s/created_in_cwd.txt", 
             original_dir, base_dir);
    if (access(test_file_path, F_OK) == 0) {
        unlink(test_file_path);
        printf("删除测试文件\n");
    }
    
    // 删除备份目录和文件
    char backup_file_path[PATH_MAX * 2];
    snprintf(backup_file_path, sizeof(backup_file_path), "%s/backup.log", backup_dir);
    if (access(backup_file_path, F_OK) == 0) {
        unlink(backup_file_path);
    }
    if (access(backup_dir, F_OK) == 0) {
        rmdir(backup_dir);
        printf("删除备份目录\n");
    }
    
    // 删除构建目录
    if (chdir(base_dir) == 0) {
        for (int i = 0; i < 5; i++) {
            rmdir(build_dirs[i].dir_name);
        }
        chdir(original_dir);
    }
    
    // 删除测试目录结构
    char deep_dir_path[PATH_MAX * 2];
    snprintf(deep_dir_path, sizeof(deep_dir_path), "%s/%s", original_dir, deep_dir);
    if (access(deep_dir_path, F_OK) == 0) {
        rmdir(deep_dir_path);
    }
    
    char subdir1_path[PATH_MAX * 2];
    snprintf(subdir1_path, sizeof(subdir1_path), "%s/%s", original_dir, sub_dir1);
    if (access(subdir1_path, F_OK) == 0) {
        rmdir(subdir1_path);
    }
    
    char subdir2_path[PATH_MAX * 2];
    snprintf(subdir2_path, sizeof(subdir2_path), "%s/%s", original_dir, sub_dir2);
    if (access(subdir2_path, F_OK) == 0) {
        rmdir(subdir2_path);
    }
    
    char test_file[PATH_MAX * 2];
    snprintf(test_file, sizeof(test_file), "%s/%s/test_file.txt", original_dir, base_dir);
    if (access(test_file, F_OK) == 0) {
        unlink(test_file);
    }
    
    char base_dir_path[PATH_MAX * 2];
    snprintf(base_dir_path, sizeof(base_dir_path), "%s/%s", original_dir, base_dir);
    if (access(base_dir_path, F_OK) == 0) {
        rmdir(base_dir_path);
        printf("删除测试目录结构完成\n");
    }
    
    return 0;
}


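adjtimex system call and examples

Posted on 2025-08-05 by 麦芽爸

adjtimex in detail

1. Function introduction

adjtimex is a Linux system call that reads and adjusts the state of the kernel clock. It is the core interface used by NTP (Network Time Protocol) daemons and other time synchronization tools, and it provides fine-grained clock adjustment and status queries. Through adjtimex, a program can implement high-precision time synchronization and clock management.

2. Function prototype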

#include <sys/timex.h>
int adjtimex(struct timex *buf);

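3. Function

adjtimex lets a caller read the state of the system clock, including the clock offset, frequency adjustment, and maximum error, and adjust clock parameters to achieve precise time synchronization. Any user can read the state; changing parameters requires root or the CAP_SYS_TIME capability. It is an important part of the Linux time subsystem and provides the low-level support for high-precision timekeeping.

4. Parameters

struct timex *buf: pointer to a timex structure that passes clock parameters in and receives status information back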
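
The fields of struct timex use units that are easy to misread. The freq field is in "scaled ppm" (parts per million with a 16-bit fractional part), which is why the examples below divide by 65536. The offset field is in microseconds unless the STA_NANO status flag is set, in which case it is in nanoseconds. A minimal sketch of the conversions follows; the helper names are hypothetical.

```c
#include <sys/timex.h>

/* Convert the scaled-ppm value found in tx.freq to plain ppm. */
static double scaled_ppm_to_ppm(long scaled) {
    return (double)scaled / 65536.0;
}

/* Convert plain ppm to the scaled-ppm value expected in tx.freq. */
static long ppm_to_scaled_ppm(double ppm) {
    return (long)(ppm * 65536.0);
}

/* Return tx->offset in microseconds, whichever unit the kernel reported. */
static double offset_in_microseconds(const struct timex *tx) {
    if (tx->status & STA_NANO) {
        return (double)tx->offset / 1000.0;   /* nanoseconds to microseconds */
    }
    return (double)tx->offset;                /* already microseconds */
}
```

For example, a freq value of 3276800 corresponds to 50 ppm. The maxerror, esterror, and precision fields are in microseconds.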

5. Return value

On success: returns the clock state (TIME_OK, TIME_INS, TIME_DEL, TIME_OOP, TIME_WAIT, or TIME_ERROR).
On failure: returns -1 and sets errno.
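
Note that the clock state comes back as the return value of the call; struct timex itself has no field for it. A minimal sketch of a read-only query follows. The helper names clock_state_name and query_clock_state are hypothetical.

```c
#include <sys/timex.h>

/* Map an adjtimex() return value to the name of the clock state. */
static const char *clock_state_name(int state) {
    switch (state) {
        case TIME_OK:    return "TIME_OK";
        case TIME_INS:   return "TIME_INS";
        case TIME_DEL:   return "TIME_DEL";
        case TIME_OOP:   return "TIME_OOP";
        case TIME_WAIT:  return "TIME_WAIT";
        case TIME_ERROR: return "TIME_ERROR";
        default:         return "unknown";
    }
}

/* Hypothetical helper: read-only query. With modes = 0 nothing is changed and
 * no privilege is needed. Returns the clock state, or -1 with errno set. */
static int query_clock_state(struct timex *tx) {
    *tx = (struct timex){0};   /* clear every field, including modes */
    return adjtimex(tx);
}
```

A caller could write struct timex tx; int state = query_clock_state(&tx); and, when state is not -1, print clock_state_name(state) together with fields such as tx.freq and tx.maxerror.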

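6. Related functions

settimeofday: sets the system time
gettimeofday: gets the system time
clock_adjtime: a newer interface that adjusts a specified clock
ntp_adjtime: the NTP clock-adjustment function
clock_gettime/clock_settime: high-resolution clock operations

7. Example code

Example 1: Basic adjtimex usage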

#include <sys/timex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>

/**
 * 显示时钟状态信息
 */
void show_clock_status(const struct timex *tx) {
    printf("=== 时钟状态信息 ===\n");
    
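    // Clock offset (microseconds by default, nanoseconds if STA_NANO is set)
    printf("Clock offset: %ld %s\n", tx->offset,
           (tx->status & STA_NANO) ? "ns" : "us");
    
    // Frequency offset: tx->freq is in scaled ppm (ppm with a 16-bit
    // fractional part), so divide by 65536 to get ppm
    printf("Frequency offset: %.3f ppm\n", (double)tx->freq / 65536.0);
    
    // Maximum error (microseconds)
    printf("Maximum error: %ld us\n", tx->maxerror);
    
    // Estimated error (microseconds)
    printf("Estimated error: %ld us\n", tx->esterror);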
    
    // 状态标志
    printf("状态标志: 0x%04x\n", tx->status);
    printf("  ");
    if (tx->status & STA_PLL) printf("STA_PLL ");
    if (tx->status & STA_PPSFREQ) printf("STA_PPSFREQ ");
    if (tx->status & STA_PPSTIME) printf("STA_PPSTIME ");
    if (tx->status & STA_FLL) printf("STA_FLL ");
    if (tx->status & STA_INS) printf("STA_INS ");
    if (tx->status & STA_DEL) printf("STA_DEL ");
    if (tx->status & STA_UNSYNC) printf("STA_UNSYNC ");
    if (tx->status & STA_FREQHOLD) printf("STA_FREQHOLD ");
    if (tx->status & STA_PPSSIGNAL) printf("STA_PPSSIGNAL ");
    if (tx->status & STA_PPSJITTER) printf("STA_PPSJITTER ");
    if (tx->status & STA_PPSWANDER) printf("STA_PPSWANDER ");
    if (tx->status & STA_PPSERROR) printf("STA_PPSERROR ");
    if (tx->status & STA_CLOCKERR) printf("STA_CLOCKERR ");
    printf("\n");
    
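    // Clock precision (microseconds, read-only)
    printf("Clock precision: %ld us\n", tx->precision);
    
    // PLL time constant
    printf("PLL time constant: %ld\n", tx->constant);
    
    // Frequency tolerance (scaled ppm, read-only)
    printf("Frequency tolerance: %.3f ppm\n", (double)tx->tolerance / 65536.0);
    
    // Clock state. struct timex has no `state` member: the state is the
    // return value of adjtimex(), so issue a read-only query (modes = 0).
    struct timex query;
    memset(&query, 0, sizeof(query));
    int state = adjtimex(&query);
    printf("Clock state: ");
    switch (state) {
        case TIME_OK: printf("TIME_OK (synchronized)\n"); break;
        case TIME_INS: printf("TIME_INS (leap second will be inserted)\n"); break;
        case TIME_DEL: printf("TIME_DEL (leap second will be deleted)\n"); break;
        case TIME_OOP: printf("TIME_OOP (leap second in progress)\n"); break;
        case TIME_WAIT: printf("TIME_WAIT (leap second completed)\n"); break;
        case TIME_ERROR: printf("TIME_ERROR (clock not synchronized)\n"); break;
        case -1: printf("query failed: %s\n", strerror(errno)); break;
        default: printf("unknown state (%d)\n", state); break;
    }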
    
    printf("\n");
}

/**
 * 演示基础adjtimex使用方法
 */
int demo_adjtimex_basic() {
    struct timex tx;
    int result;
    
    printf("=== 基础adjtimex使用示例 ===\n");
    
    // 初始化timex结构体
    memset(&tx, 0, sizeof(tx));
    tx.modes = 0;  // 查询模式，不修改任何参数
    
    // 调用adjtimex查询时钟状态
    printf("1. 查询时钟状态:\n");
    result = adjtimex(&tx);
    
    if (result == -1) {
        printf("  查询时钟状态失败: %s\n", strerror(errno));
        if (errno == EPERM) {
            printf("  原因：需要CAP_SYS_TIME权限\n");
        } else if (errno == EINVAL) {
            printf("  原因：参数无效\n");
        }
        return -1;
    }
    
    printf("  查询成功\n");
    show_clock_status(&tx);
    
    // 显示时钟信息
    printf("2. 时钟详细信息:\n");
    printf("  系统时钟: %ld.%06ld 秒\n", tx.time.tv_sec, tx.time.tv_usec);
    
    // 显示频率调整信息
    printf("3. 频率调整信息:\n");
    printf("  频率偏移: %ld (系统单位)\n", tx.freq);
    printf("  频率偏移: %.3f ppm\n", (double)tx.freq / 65536.0);
    
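    // Show PLL parameters (maxerror and esterror are in microseconds)
    printf("4. PLL parameters:\n");
    printf("  PLL offset: %ld\n", tx.offset);
    printf("  PLL maximum error: %ld us\n", tx.maxerror);
    printf("  PLL estimated error: %ld us\n", tx.esterror);
    printf("  PLL time constant: %ld\n", tx.constant);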
    
    // 演示权限检查
    printf("\n5. 权限检查:\n");
    uid_t uid = getuid();
    printf("  当前用户ID: %d\n", uid);
    if (uid == 0) {
        printf("  ✓ 具有root权限，可以调整时钟参数\n");
    } else {
        printf("  ✗ 没有root权限，只能查询时钟状态\n");
        printf("  提示：调整时钟参数需要root权限或CAP_SYS_TIME能力\n");
    }
    
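    // Explain the clock states
    printf("\n=== Clock states ===\n");
    printf("TIME_OK: clock synchronized, no leap second pending\n");
    printf("TIME_INS: a leap second will be inserted at the end of the UTC day\n");
    printf("TIME_DEL: a leap second will be deleted at the end of the UTC day\n");
    printf("TIME_OOP: leap second insertion in progress\n");
    printf("TIME_WAIT: leap second insertion or deletion completed\n");
    printf("TIME_ERROR: clock not synchronized\n");
    
    // Explain the status flags
    printf("\n=== Status flags ===\n");
    printf("STA_PLL: PLL updates enabled\n");
    printf("STA_PPSFREQ: PPS frequency discipline enabled\n");
    printf("STA_PPSTIME: PPS time discipline enabled\n");
    printf("STA_FLL: frequency-locked loop mode selected\n");
    printf("STA_INS: insert a leap second\n");
    printf("STA_DEL: delete a leap second\n");
    printf("STA_UNSYNC: clock unsynchronized\n");
    printf("STA_FREQHOLD: hold frequency\n");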
    
    return 0;
}

int main() {
    return demo_adjtimex_basic();
}

Example 2: Clock adjustment and synchronization

#include <sys/timex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>
#include <math.h>

/**
 * 模拟NTP时间同步
 */
int simulate_ntp_synchronization() {
    struct timex tx;
    int result;
    double network_delay = 0.05;  // 50ms网络延迟
    double frequency_drift = 50.0;  // 50ppm频率漂移
    
    printf("=== NTP时间同步模拟 ===\n");
    
    // 获取当前时钟状态
    printf("1. 获取当前时钟状态:\n");
    memset(&tx, 0, sizeof(tx));
    tx.modes = 0;  // 查询模式
    
    result = adjtimex(&tx);
    if (result == -1) {
        printf("  获取时钟状态失败: %s\n", strerror(errno));
        return -1;
    }
    
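    // maxerror and esterror are reported in microseconds
    printf("  Current clock frequency: %.3f ppm\n", (double)tx.freq / 65536.0);
    printf("  Current maximum error: %ld us\n", tx.maxerror);
    printf("  Current estimated error: %ld us\n", tx.esterror);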
    
    // 模拟网络时间查询
    printf("\n2. 模拟网络时间查询:\n");
    struct timeval network_time;
    gettimeofday(&network_time, NULL);
    
    // 模拟时钟偏移（100ms）
    long offset_us = 100000;  // 100ms偏移
    printf("  检测到时钟偏移: %.3f ms\n", offset_us / 1000.0);
    printf("  网络延迟: %.3f ms\n", network_delay * 1000);
    printf("  频率漂移: %.3f ppm\n", frequency_drift);
    
    // 调整时钟参数
    printf("\n3. 调整时钟参数:\n");
    if (getuid() == 0) {
        memset(&tx, 0, sizeof(tx));
        tx.modes = ADJ_OFFSET | ADJ_FREQUENCY;
        tx.offset = offset_us;  // 微秒偏移
        tx.freq = (long)(frequency_drift * 65536);  // 转换为系统单位
        
        printf("  设置时钟偏移: %ld 微秒\n", tx.offset);
        printf("  设置频率调整: %.3f ppm\n", (double)tx.freq / 65536.0);
        
        result = adjtimex(&tx);
        if (result == -1) {
            printf("  调整时钟参数失败: %s\n", strerror(errno));
        } else {
            printf("  ✓ 时钟参数调整成功\n");
            printf("  新时钟状态: ");
            switch (result) {
                case TIME_OK: printf("TIME_OK\n"); break;
                case TIME_INS: printf("TIME_INS\n"); break;
                case TIME_DEL: printf("TIME_DEL\n"); break;
                case TIME_OOP: printf("TIME_OOP\n"); break;
                case TIME_WAIT: printf("TIME_WAIT\n"); break;
                case TIME_ERROR: printf("TIME_ERROR\n"); break;
                default: printf("未知状态 %d\n", result); break;
            }
        }
    } else {
        printf("  ✗ 没有权限调整时钟参数\n");
        printf("  提示：需要root权限才能调整时钟\n");
    }
    
    // 验证调整结果
    printf("\n4. 验证调整结果:\n");
    memset(&tx, 0, sizeof(tx));
    tx.modes = 0;  // 查询模式
    
    result = adjtimex(&tx);
    if (result != -1) {
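        // maxerror and esterror are reported in microseconds
        printf("  Clock frequency after adjustment: %.3f ppm\n", (double)tx.freq / 65536.0);
        printf("  Maximum error after adjustment: %ld us\n", tx.maxerror);
        printf("  Estimated error after adjustment: %ld us\n", tx.esterror);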
    }
    
    return 0;
}

/**
 * 演示时钟频率调整
 */
int demo_frequency_adjustment() {
    struct timex tx;
    int result;
    
    printf("=== 时钟频率调整演示 ===\n");
    
    // 显示原始频率
    printf("1. 原始时钟频率:\n");
    memset(&tx, 0, sizeof(tx));
    tx.modes = 0;  // 查询模式
    
    result = adjtimex(&tx);
    if (result == -1) {
        printf("  查询时钟频率失败: %s\n", strerror(errno));
        return -1;
    }
    
    double original_freq = (double)tx.freq / 65536.0;
    printf("  原始频率: %.6f ppm\n", original_freq);
    
    // 调整频率（需要root权限）
    printf("\n2. 调整时钟频率:\n");
    if (getuid() == 0) {
        // 增加50ppm频率调整
        long freq_adjust = 50 * 65536;  // 50ppm转换为系统单位
        
        memset(&tx, 0, sizeof(tx));
        tx.modes = ADJ_FREQUENCY;
        tx.freq = freq_adjust;
        
        printf("  设置频率调整: %.3f ppm\n", (double)freq_adjust / 65536.0);
        
        result = adjtimex(&tx);
        if (result == -1) {
            printf("  频率调整失败: %s\n", strerror(errno));
        } else {
            printf("  ✓ 频率调整成功\n");
        }
        
        // 验证调整结果
        printf("\n3. 验证频率调整:\n");
        memset(&tx, 0, sizeof(tx));
        tx.modes = 0;  // 查询模式
        
        result = adjtimex(&tx);
        if (result != -1) {
            double new_freq = (double)tx.freq / 65536.0;
            printf("  调整后频率: %.6f ppm\n", new_freq);
            printf("  频率变化: %.6f ppm\n", new_freq - original_freq);
        }
        
        // 恢复原始频率
        printf("\n4. 恢复原始频率:\n");
        memset(&tx, 0, sizeof(tx));
        tx.modes = ADJ_FREQUENCY;
        tx.freq = (long)(original_freq * 65536);
        
        result = adjtimex(&tx);
        if (result == -1) {
            printf("  恢复原始频率失败: %s\n", strerror(errno));
        } else {
            printf("  ✓ 原始频率恢复成功\n");
        }
    } else {
        printf("  ✗ 没有权限调整时钟频率\n");
        printf("  提示：需要root权限才能调整时钟频率\n");
    }
    
    return 0;
}

int main() {
    printf("=== adjtimex时钟调整演示 ===\n");
    
    // 演示NTP同步模拟
    if (simulate_ntp_synchronization() != 0) {
        return -1;
    }
    
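    // Print a separator line of 50 '=' characters
    // (C has no string repetition operator, so use a loop)
    printf("\n");
    for (int i = 0; i < 50; i++) {
        putchar('=');
    }
    printf("\n");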
    
    // 演示频率调整
    if (demo_frequency_adjustment() != 0) {
        return -1;
    }
    
    return 0;
}

Example 3: High-precision time synchronization

#include <sys/timex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>
#include <math.h>

/**
 * 高精度时间同步器结构
 */
typedef struct {
    double last_offset;      // 上次时钟偏移
    double last_frequency;   // 上次频率调整
    time_t last_sync_time;   // 上次同步时间
    int sync_count;          // 同步次数
    double avg_offset;       // 平均偏移
    double jitter;           // 抖动
    int precision_ppm;       // 精度(ppm)
} precision_sync_t;

/**
 * 初始化高精度同步器
 */
int init_precision_sync(precision_sync_t *sync) {
    memset(sync, 0, sizeof(precision_sync_t));
    sync->precision_ppm = 1;  // 1ppm精度
    sync->last_sync_time = time(NULL);
    
    printf("高精度时间同步器初始化完成\n");
    printf("  目标精度: %d ppm\n", sync->precision_ppm);
    
    return 0;
}

/**
 * 获取高精度时间
 */
int get_high_precision_time(struct timespec *ts) {
    struct timex tx;
    int result;
    
    memset(&tx, 0, sizeof(tx));
    tx.modes = 0;  // 查询模式
    
    result = adjtimex(&tx);
    if (result == -1) {
        return -1;
    }
    
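    ts->tv_sec = tx.time.tv_sec;
    // tv_usec holds nanoseconds when STA_NANO is set, microseconds otherwise
    ts->tv_nsec = (tx.status & STA_NANO) ? tx.time.tv_usec
                                         : tx.time.tv_usec * 1000;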
    
    return 0;
}

/**
 * 计算时钟偏移
 */
double calculate_clock_offset() {
    struct timespec system_time, precise_time;
    double offset;
    
    // 获取系统时间
    if (clock_gettime(CLOCK_REALTIME, &system_time) == -1) {
        return 0.0;
    }
    
    // 获取高精度时间
    if (get_high_precision_time(&precise_time) == -1) {
        return 0.0;
    }
    
    // 计算偏移（纳秒）
    offset = (precise_time.tv_sec - system_time.tv_sec) * 1000000000.0 +
             (precise_time.tv_nsec - system_time.tv_nsec);
    
    return offset / 1000000.0;  // 转换为毫秒
}

/**
 * 演示高精度时间同步
 */
int demo_high_precision_sync() {
    precision_sync_t sync;
    struct timex tx;
    int result;
    int sync_attempts = 5;
    
    printf("=== 高精度时间同步演示 ===\n");
    
    // 初始化同步器
    printf("1. 初始化高精度同步器:\n");
    if (init_precision_sync(&sync) != 0) {
        return -1;
    }
    
    // 显示初始时钟状态
    printf("\n2. 初始时钟状态:\n");
    memset(&tx, 0, sizeof(tx));
    tx.modes = 0;  // 查询模式
    
    result = adjtimex(&tx);
    if (result == -1) {
        printf("  获取时钟状态失败: %s\n", strerror(errno));
        return -1;
    }
    
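    // The clock state is the return value of adjtimex(); struct timex has no
    // `state` member
    printf("  Clock state: ");
    switch (result) {
        case TIME_OK: printf("TIME_OK (synchronized)\n"); break;
        case TIME_ERROR: printf("TIME_ERROR (not synchronized)\n"); break;
        default: printf("state %d\n", result); break;
    }
    
    // maxerror, esterror, and precision are reported in microseconds
    printf("  Current frequency: %.3f ppm\n", (double)tx.freq / 65536.0);
    printf("  Maximum error: %ld us\n", tx.maxerror);
    printf("  Estimated error: %ld us\n", tx.esterror);
    printf("  Clock precision: %ld us\n", tx.precision);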
    
    // 模拟高精度时间同步过程
    printf("\n3. 高精度时间同步过程:\n");
    
    for (int i = 0; i < sync_attempts; i++) {
        printf("  第 %d 次同步:\n", i + 1);
        
        // 模拟网络时间查询
        double network_offset = (rand() % 1000 - 500) / 1000.0;  // -0.5到0.5毫秒
        double network_delay = (rand() % 100) / 1000.0;  // 0到0.1毫秒
        
        printf("    网络偏移: %.3f ms\n", network_offset);
        printf("    网络延迟: %.3f ms\n", network_delay);
        
        // 计算真实的时钟偏移
        double actual_offset = calculate_clock_offset();
        printf("    实际偏移: %.3f ms\n", actual_offset);
        
        // 计算综合偏移
        double total_offset = network_offset + actual_offset;
        printf("    综合偏移: %.3f ms\n", total_offset);
        
        // 更新同步统计
        sync.last_offset = total_offset;
        sync.avg_offset = (sync.avg_offset * sync.sync_count + total_offset) / (sync.sync_count + 1);
        sync.sync_count++;
        
        // 计算抖动
        if (sync.sync_count > 1) {
            sync.jitter = fabs(total_offset - sync.avg_offset);
        }
        
        printf("    平均偏移: %.3f ms\n", sync.avg_offset);
        printf("    当前抖动: %.3f ms\n", sync.jitter);
        
        // 如果有root权限，进行时钟调整
        if (getuid() == 0) {
            memset(&tx, 0, sizeof(tx));
            tx.modes = ADJ_OFFSET | ADJ_STATUS;
            tx.offset = (long)(total_offset * 1000);  // 转换为微秒
            tx.status = STA_PLL;  // 启用PLL模式
            
            printf("    调整时钟偏移: %ld 微秒\n", tx.offset);
            
            result = adjtimex(&tx);
            if (result == -1) {
                printf("    时钟调整失败: %s\n", strerror(errno));
            } else {
                printf("    ✓ 时钟调整成功\n");
            }
        } else {
            printf("    ℹ 没有权限进行时钟调整\n");
        }
        
        // 记录同步时间
        sync.last_sync_time = time(NULL);
        
        if (i < sync_attempts - 1) {
            sleep(2);  // 间隔同步
        }
    }
    
    // 显示最终同步结果
    printf("\n4. 最终同步结果:\n");
    printf("  同步次数: %d\n", sync.sync_count);
    printf("  最后偏移: %.3f ms\n", sync.last_offset);
    printf("  平均偏移: %.3f ms\n", sync.avg_offset);
    // sync.jitter holds only the most recent value, not the maximum
    printf("  Last jitter: %.3f ms\n", sync.jitter);
    printf("  最后同步: %s", ctime(&sync.last_sync_time));
    
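    // Express the target precision as drift: 1 ppm corresponds to
    // 1 microsecond of drift per second of elapsed time
    double sync_accuracy = sync.precision_ppm * 1e-6;
    printf("  Target precision: %.6f seconds of drift per second\n", sync_accuracy);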
    
    // 显示高精度同步优势
    printf("\n=== 高精度同步优势 ===\n");
    printf("1. 精度提升:\n");
    printf("   ✓ 纳秒级时间精度\n");
    printf("   ✓ 微秒级偏移调整\n");
    printf("   ✓ PPM级频率控制\n");
    
    printf("\n2. 稳定性保障:\n");
    printf("   ✓ 抖动抑制\n");
    printf("   ✓ 误差累积控制\n");
    printf("   ✓ 频率漂移补偿\n");
    
    printf("\n3. 实时性:\n");
    printf("   ✓ 快速收敛\n");
    printf("   ✓ 动态调整\n");
    printf("   ✓ 状态监控\n");
    
    return 0;
}

int main() {
    return demo_high_precision_sync();
}

Example 4: Time synchronization monitoring

#include <sys/timex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>
#include <math.h>

/**
 * 时间同步监控数据结构
 */
typedef struct {
    time_t timestamp;
    long offset_us;      // 偏移(微秒)
    long frequency_ppm;   // 频率(ppm)
    long max_error_ms;   // 最大误差(毫秒)
    long est_error_ms;   // 估算误差(毫秒)
    int clock_state;     // 时钟状态
    int status_flags;    // 状态标志
    double jitter_ms;    // 抖动(毫秒)
} sync_monitor_data_t;

/**
 * 时间同步监控器
 */
typedef struct {
    sync_monitor_data_t history[100];  // 历史数据
    int history_count;
    int max_history;
    time_t start_time;
    int monitoring;
} sync_monitor_t;

/**
 * 初始化监控器
 */
int init_sync_monitor(sync_monitor_t *monitor) {
    memset(monitor, 0, sizeof(sync_monitor_t));
    monitor->max_history = 100;
    monitor->start_time = time(NULL);
    monitor->monitoring = 1;
    
    printf("时间同步监控器初始化完成\n");
    printf("  最大历史记录数: %d\n", monitor->max_history);
    printf("  启动时间: %s", ctime(&monitor->start_time));
    
    return 0;
}

/**
 * Collect one sample of clock state
 */
int collect_clock_data(sync_monitor_t *monitor) {
    struct timex tx;
    int result;
    
    if (monitor->history_count >= monitor->max_history) {
        // History is full: drop the oldest sample by shifting the rest down
        memmove(&monitor->history[0], &monitor->history[1], 
                sizeof(sync_monitor_data_t) * (monitor->max_history - 1));
        monitor->history_count = monitor->max_history - 1;
    }
    
    sync_monitor_data_t *current = &monitor->history[monitor->history_count];
    
    // Query the clock state
    memset(&tx, 0, sizeof(tx));
    tx.modes = 0;  // read-only query
    
    result = adjtimex(&tx);
    if (result == -1) {
        printf("Failed to read clock state: %s\n", strerror(errno));
        return -1;
    }
    
    // Fill in the sample. The kernel reports offset in microseconds
    // (nanoseconds when STA_NANO is set) and maxerror/esterror in microseconds.
    current->timestamp = time(NULL);
    current->offset_us = (tx.status & STA_NANO) ? tx.offset / 1000 : tx.offset;
    current->frequency_ppm = tx.freq / 65536;  // ppm with a 16-bit fraction -> whole ppm
    current->max_error_ms = tx.maxerror / 1000;
    current->est_error_ms = tx.esterror / 1000;
    current->clock_state = result;  // the state is the return value; struct timex has no state field
    current->status_flags = tx.status;
    current->jitter_ms = 0.0;  // no previous sample to compare against yet
    
    // Jitter: absolute change in offset since the previous sample
    if (monitor->history_count > 0) {
        sync_monitor_data_t *previous = &monitor->history[monitor->history_count - 1];
        current->jitter_ms = fabs((current->offset_us - previous->offset_us) / 1000.0);
    }
    
    monitor->history_count++;
    
    return 0;
}

/**
 * Print one sample of clock state
 */
void show_clock_status(const sync_monitor_data_t *data) {
    printf("Time: %s", ctime(&data->timestamp));
    printf("  Clock offset: %.3f ms\n", data->offset_us / 1000.0);
    printf("  Frequency adjustment: %ld ppm\n", data->frequency_ppm);
    printf("  Maximum error: %ld ms\n", data->max_error_ms);
    printf("  Estimated error: %ld ms\n", data->est_error_ms);
    printf("  Clock state: %d\n", data->clock_state);
    printf("  Jitter: %.3f ms\n", data->jitter_ms);
    
    printf("  Status flags: 0x%04x ", data->status_flags);
    if (data->status_flags & STA_PLL) printf("STA_PLL ");
    if (data->status_flags & STA_UNSYNC) printf("STA_UNSYNC ");
    if (data->status_flags & STA_FREQHOLD) printf("STA_FREQHOLD ");
    printf("\n");
}

/**
 * Analyze time synchronization quality
 */
void analyze_sync_quality(const sync_monitor_t *monitor) {
    if (monitor->history_count < 2) {
        printf("Not enough data to analyze synchronization quality\n");
        return;
    }
    
    printf("=== Time synchronization quality analysis ===\n");
    
    // Accumulate statistics
    double total_offset = 0, max_offset = 0, min_offset = 999999;
    double total_jitter = 0, max_jitter = 0;
    long total_error = 0, max_error = 0;
    int sync_ok_count = 0;
    
    for (int i = 0; i < monitor->history_count; i++) {
        const sync_monitor_data_t *data = &monitor->history[i];
        double abs_offset = fabs(data->offset_us / 1000.0);
        
        total_offset += abs_offset;
        total_jitter += data->jitter_ms;
        total_error += data->est_error_ms;
        
        if (abs_offset > max_offset) max_offset = abs_offset;
        if (abs_offset < min_offset) min_offset = abs_offset;
        if (data->jitter_ms > max_jitter) max_jitter = data->jitter_ms;
        if (data->est_error_ms > max_error) max_error = data->est_error_ms;
        
        if (abs_offset < 10.0) sync_ok_count++;  // samples within 10 ms count as well synchronized
    }
    
    double avg_offset = total_offset / monitor->history_count;
    double avg_jitter = total_jitter / monitor->history_count;
    double avg_error = (double)total_error / monitor->history_count;
    double sync_quality = (double)sync_ok_count / monitor->history_count * 100;
    
    printf("Synchronization statistics:\n");
    printf("  Average offset: %.3f ms\n", avg_offset);
    printf("  Maximum offset: %.3f ms\n", max_offset);
    printf("  Minimum offset: %.3f ms\n", min_offset);
    printf("  Average jitter: %.3f ms\n", avg_jitter);
    printf("  Maximum jitter: %.3f ms\n", max_jitter);
    printf("  Average error: %.3f ms\n", avg_error);
    printf("  Maximum error: %ld ms\n", max_error);
    printf("  Sync quality: %.1f%%\n", sync_quality);
    
    // Rate the result
    printf("\nAssessment:\n");
    if (avg_offset < 1.0) {
        printf("  ✓ Excellent: average offset < 1ms\n");
    } else if (avg_offset < 10.0) {
        printf("  ℹ Good: average offset < 10ms\n");
    } else {
        printf("  ⚠ Needs improvement: average offset >= 10ms\n");
    }
    
    if (sync_quality > 95.0) {
        printf("  ✓ High reliability: sync quality > 95%%\n");
    } else if (sync_quality > 80.0) {
        printf("  ℹ Medium reliability: sync quality > 80%%\n");
    } else {
        printf("  ⚠ Low reliability: sync quality <= 80%%\n");
    }
}

/**
 * Demonstrate time synchronization monitoring
 */
int demo_sync_monitoring() {
    sync_monitor_t monitor;
    const int monitor_duration = 30;  // monitor for 30 seconds
    time_t start_time;
    
    printf("=== Time synchronization monitoring demo ===\n");
    
    // Initialize the monitor
    printf("1. Initializing the monitor:\n");
    if (init_sync_monitor(&monitor) != 0) {
        return -1;
    }
    
    // Start monitoring
    printf("\n2. Starting time synchronization monitoring:\n");
    printf("   Duration: %d s\n", monitor_duration);
    printf("   Sampling interval: 2 s\n");
    
    start_time = time(NULL);
    
    while (difftime(time(NULL), start_time) < monitor_duration) {
        // Collect one sample
        if (collect_clock_data(&monitor) == 0) {
            if (monitor.history_count > 0) {
                const sync_monitor_data_t *latest = 
                    &monitor.history[monitor.history_count - 1];
                
                printf("\n--- Sample %d ---\n", monitor.history_count);
                show_clock_status(latest);
            }
        } else {
            printf("Failed to collect clock data\n");
        }
        
        sleep(2);  // 2-second sampling interval
    }
    
    // Show the monitoring results
    printf("\n3. Monitoring results:\n");
    printf("  Total samples: %d\n", monitor.history_count);
    printf("  Elapsed time: %.0f s\n", difftime(time(NULL), start_time));
    
    if (monitor.history_count > 0) {
        printf("  Latest sample:\n");
        const sync_monitor_data_t *latest = 
            &monitor.history[monitor.history_count - 1];
        show_clock_status(latest);
    }
    
    // Analyze synchronization quality
    printf("\n4. Synchronization quality analysis:\n");
    analyze_sync_quality(&monitor);
    
    // Show the recent trend
    printf("\n5. Trend (last 10 samples):\n");
    int start_index = (monitor.history_count > 10) ? monitor.history_count - 10 : 0;
    
    printf("%-20s %-10s %-10s %-10s %-10s\n", 
           "Time", "Offset(ms)", "Freq(ppm)", "Error(ms)", "Jitter(ms)");
    printf("%-20s %-10s %-10s %-10s %-10s\n", 
           "----", "----------", "---------", "---------", "----------");
    
    for (int i = start_index; i < monitor.history_count; i++) {
        const sync_monitor_data_t *data = &monitor.history[i];
        char time_str[20];
        strftime(time_str, sizeof(time_str), "%H:%M:%S", localtime(&data->timestamp));
        
        printf("%-20s %-10.3f %-10ld %-10ld %-10.3f\n",
               time_str,
               data->offset_us / 1000.0,
               data->frequency_ppm,
               data->est_error_ms,
               data->jitter_ms);
    }
    
    return 0;
}

int main() {
    return demo_sync_monitoring();
}
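
The monitor above reads raw struct timex fields, and their units are easy to mix up: offset is in microseconds unless the STA_NANO status flag is set (then nanoseconds), and freq is in ppm with a 16-bit fractional part. The following is a minimal sketch of two conversion helpers, assuming glibc's <sys/timex.h>. The names timex_offset_to_us and timex_freq_to_ppm are hypothetical and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/timex.h>

/* Hypothetical helper: convert the kernel's scaled frequency
 * (ppm with a 16-bit fraction) to plain ppm. */
static double timex_freq_to_ppm(long scaled_freq) {
    return (double)scaled_freq / 65536.0;
}

/* Hypothetical helper: return tx->offset in microseconds.
 * The kernel reports nanoseconds when STA_NANO is set in status,
 * and microseconds otherwise. */
static double timex_offset_to_us(const struct timex *tx) {
    assert(tx != NULL);
    if (tx->status & STA_NANO) {
        return (double)tx->offset / 1000.0;
    }
    return (double)tx->offset;
}
```

After a query such as adjtimex(&tx) with tx.modes = 0, a caller could pass &tx to timex_offset_to_us and tx.freq to timex_freq_to_ppm before printing or storing the values.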

Example 5: A simulated NTP client

#include <sys/timex.h>
#include <sys/time.h>   // gettimeofday
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>
#include <math.h>

/**
 * NTP server information
 */
typedef struct {
    char hostname[256];
    int port;
    double stratum;          // stratum
    double delay;            // delay (seconds)
    double offset;           // offset (seconds)
    double dispersion;       // dispersion (seconds)
    time_t last_contact;     // time of last contact
    int reachable;           // whether the server is reachable
} ntp_server_t;

/**
 * NTP client
 */
typedef struct {
    ntp_server_t servers[5];
    int server_count;
    int current_server;
    double sync_threshold;   // sync threshold (milliseconds)
    double max_adjustment;   // maximum adjustment (milliseconds)
    int sync_interval;       // sync interval (seconds)
} ntp_client_t;

/**
 * Initialize the NTP client
 */
int init_ntp_client(ntp_client_t *client) {
    memset(client, 0, sizeof(ntp_client_t));
    
    // Test servers (their measurements below are simulated, not queried)
    const char *test_servers[] = {
        "pool.ntp.org",
        "time.google.com",
        "time.cloudflare.com",
        "ntp.aliyun.com",
        NULL
    };
    
    printf("=== NTP client initialization ===\n");
    
    for (int i = 0; i < 5 && test_servers[i]; i++) {
        ntp_server_t *server = &client->servers[i];
        strncpy(server->hostname, test_servers[i], sizeof(server->hostname) - 1);
        server->hostname[sizeof(server->hostname) - 1] = '\0';
        server->port = 123;  // standard NTP port
        server->stratum = 2 + i;  // simulate different strata
        server->delay = 0.05 + (rand() / (double)RAND_MAX) * 0.1;  // 50-150 ms delay
        server->offset = (rand() / (double)RAND_MAX) * 2.0 - 1.0;  // offset between -1 and 1 s
        server->dispersion = 0.01 + (rand() / (double)RAND_MAX) * 0.05;  // 10-60 ms dispersion
        server->last_contact = time(NULL);
        server->reachable = 1;  // simulated as reachable
        
        client->server_count++;
        printf("  Added server %d: %s (stratum: %.0f)\n", 
               i + 1, server->hostname, server->stratum);
    }
    
    client->current_server = 0;
    client->sync_threshold = 100.0;  // 100 ms threshold
    client->max_adjustment = 500.0;  // 500 ms maximum adjustment
    client->sync_interval = 60;      // 60-second sync interval
    
    printf("  Sync threshold: %.1f ms\n", client->sync_threshold);
    printf("  Maximum adjustment: %.1f ms\n", client->max_adjustment);
    printf("  Sync interval: %d s\n", client->sync_interval);
    
    return 0;
}

/**
 * Select the best NTP server
 */
int select_best_ntp_server(ntp_client_t *client) {
    int best_server = -1;
    double best_quality = 999999.0;
    
    printf("Selecting the best NTP server:\n");
    
    for (int i = 0; i < client->server_count; i++) {
        ntp_server_t *server = &client->servers[i];
        
        if (server->reachable) {
            // Score the server from stratum, delay and dispersion (lower is better)
            double quality = server->stratum + server->delay * 10 + server->dispersion * 5;
            
            printf("  Server %d (%s): quality score %.3f\n", 
                   i + 1, server->hostname, quality);
            
            if (quality < best_quality) {
                best_quality = quality;
                best_server = i;
            }
        }
    }
    
    if (best_server != -1) {
        client->current_server = best_server;
        printf("  Selected server: %s\n", 
               client->servers[best_server].hostname);
    }
    
    return best_server;
}

/**
 * Simulate an NTP time synchronization
 */
int simulate_ntp_sync(ntp_client_t *client) {
    struct timex tx;
    int result;
    int best_server = select_best_ntp_server(client);
    
    if (best_server == -1) {
        printf("No NTP server available\n");
        return -1;
    }
    
    ntp_server_t *server = &client->servers[best_server];
    
    printf("=== NTP time synchronization ===\n");
    printf("Server: %s\n", server->hostname);
    printf("Stratum: %.0f\n", server->stratum);
    printf("Network delay: %.3f s\n", server->delay);
    printf("Clock offset: %.3f s\n", server->offset);
    printf("Dispersion: %.3f s\n", server->dispersion);
    
    // Check whether synchronization is needed
    if (fabs(server->offset) * 1000 > client->sync_threshold) {
        printf("Clock offset exceeds the threshold, synchronization needed\n");
        
        // Adjust the clock only when running as root
        if (getuid() == 0) {
            printf("Running as root, adjusting the clock:\n");
            
            memset(&tx, 0, sizeof(tx));
            
            // Choose a strategy from the size of the offset. The kernel clamps
            // ADJ_OFFSET to +/-0.5 s, so anything larger is stepped instead.
            if (fabs(server->offset) > 0.5) {
                // Large offset: step the clock
                printf("  Step adjustment:\n");
                double whole = floor(server->offset);
                tx.modes = ADJ_SETOFFSET;
                tx.time.tv_sec = (long)whole;
                // tv_usec must be non-negative, so take the fraction above floor()
                tx.time.tv_usec = (long)((server->offset - whole) * 1000000);
                printf("    Time step: %ld s + %ld us\n", 
                       (long)tx.time.tv_sec, (long)tx.time.tv_usec);
            } else {
                // Small offset: slew the clock gradually
                printf("  Gradual adjustment:\n");
                tx.modes = ADJ_OFFSET | ADJ_STATUS;
                tx.offset = (long)(server->offset * 1000000);  // convert to microseconds
                tx.status = STA_PLL;  // note: this replaces the other writable status flags
                printf("    Offset adjustment: %ld us\n", tx.offset);
            }
            
            result = adjtimex(&tx);
            if (result == -1) {
                printf("  Clock adjustment failed: %s\n", strerror(errno));
                return -1;
            }
            
            printf("  ✓ Clock adjustment succeeded\n");
            
            // Show the clock state after the adjustment
            printf("  State after adjustment: ");
            switch (result) {
                case TIME_OK: printf("TIME_OK\n"); break;
                case TIME_INS: printf("TIME_INS\n"); break;
                case TIME_DEL: printf("TIME_DEL\n"); break;
                case TIME_OOP: printf("TIME_OOP\n"); break;
                case TIME_WAIT: printf("TIME_WAIT\n"); break;
                case TIME_ERROR: printf("TIME_ERROR\n"); break;
                default: printf("state %d\n", result); break;
            }
        } else {
            printf("Not running as root, cannot adjust the clock\n");
            printf("Consider using an NTP daemon for time synchronization\n");
        }
    } else {
        printf("Clock offset is within the acceptable range, no adjustment needed\n");
    }
    
    // Update the time of last contact
    server->last_contact = time(NULL);
    
    return 0;
}

/**
 * Demonstrate the NTP client
 */
int demo_ntp_client() {
    ntp_client_t client;
    struct timex tx;
    int result;
    
    printf("=== NTP client demo ===\n");
    
    // Initialize the client
    printf("1. Initializing the NTP client:\n");
    if (init_ntp_client(&client) != 0) {
        return -1;
    }
    
    // Show the current system time
    printf("\n2. Current system time:\n");
    struct timeval current_time;
    gettimeofday(&current_time, NULL);
    printf("  System time: %ld.%06ld\n", (long)current_time.tv_sec, (long)current_time.tv_usec);
    
    // Query the current clock state
    printf("\n3. Current clock state:\n");
    memset(&tx, 0, sizeof(tx));
    tx.modes = 0;  // read-only query
    
    result = adjtimex(&tx);
    if (result != -1) {
        // The clock state is the return value; struct timex has no state field
        printf("  Clock state: ");
        switch (result) {
            case TIME_OK: printf("TIME_OK (synchronized)\n"); break;
            case TIME_ERROR: printf("TIME_ERROR (not synchronized)\n"); break;
            default: printf("state %d\n", result); break;
        }
        // offset is in microseconds unless STA_NANO is set; maxerror/esterror are in microseconds
        printf("  Clock offset: %.6f s\n", tx.offset / ((tx.status & STA_NANO) ? 1e9 : 1e6));
        printf("  Clock frequency: %.3f ppm\n", (double)tx.freq / 65536.0);
        printf("  Maximum error: %ld us\n", tx.maxerror);
        printf("  Estimated error: %ld us\n", tx.esterror);
    }
    
    // Simulate an NTP synchronization
    printf("\n4. Simulated NTP time synchronization:\n");
    if (simulate_ntp_sync(&client) != 0) {
        printf("NTP synchronization failed\n");
        return -1;
    }
    
    // Show the clock state after synchronization
    printf("\n5. Clock state after synchronization:\n");
    memset(&tx, 0, sizeof(tx));
    tx.modes = 0;  // read-only query
    
    result = adjtimex(&tx);
    if (result != -1) {
        printf("  Clock state: ");
        switch (result) {
            case TIME_OK: printf("TIME_OK (synchronized)\n"); break;
            case TIME_INS: printf("TIME_INS (leap second will be inserted)\n"); break;
            case TIME_DEL: printf("TIME_DEL (leap second will be deleted)\n"); break;
            case TIME_OOP: printf("TIME_OOP (leap second insertion in progress)\n"); break;
            case TIME_WAIT: printf("TIME_WAIT (leap second adjustment completed)\n"); break;
            case TIME_ERROR: printf("TIME_ERROR (clock not synchronized)\n"); break;
            default: printf("state %d\n", result); break;
        }
        // offset is in microseconds unless STA_NANO is set; maxerror/esterror are in microseconds
        printf("  Clock offset: %.6f s\n", tx.offset / ((tx.status & STA_NANO) ? 1e9 : 1e6));
        printf("  Clock frequency: %.3f ppm\n", (double)tx.freq / 65536.0);
        printf("  Maximum error: %ld us\n", tx.maxerror);
        printf("  Estimated error: %ld us\n", tx.esterror);
    }
    
    // Show the NTP server information
    printf("\n6. NTP server information:\n");
    for (int i = 0; i < client.server_count; i++) {
        ntp_server_t *server = &client.servers[i];
        printf("  Server %d: %s\n", i + 1, server->hostname);
        printf("    Stratum: %.0f\n", server->stratum);
        printf("    Delay: %.3f s\n", server->delay);
        printf("    Offset: %.3f s\n", server->offset);
        printf("    Dispersion: %.3f s\n", server->dispersion);
        printf("    Last contact: %s", ctime(&server->last_contact));
        printf("    Reachable: %s\n", server->reachable ? "yes" : "no");
    }
    
    // Summarize the benefits of the NTP client
    printf("\n=== Benefits of the NTP client ===\n");
    printf("1. High-precision synchronization:\n");
    printf("   ✓ Microsecond-level time precision\n");
    printf("   ✓ PPM-level frequency control\n");
    printf("   ✓ Millisecond-level offset adjustment\n");
    
    printf("\n2. Smart selection:\n");
    printf("   ✓ Multiple servers\n");
    printf("   ✓ Quality scoring\n");
    printf("   ✓ Dynamic server switching\n");
    
    printf("\n3. Safety:\n");
    printf("   ✓ Privilege checks\n");
    printf("   ✓ Error handling\n");
    printf("   ✓ State monitoring\n");
    
    printf("\n4. Flexible configuration:\n");
    printf("   ✓ Configurable thresholds\n");
    printf("   ✓ Dynamic interval adjustment\n");
    printf("   ✓ Multiple adjustment strategies\n");
    
    return 0;
}

int main() {
    return demo_ntp_client();
}
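
Example 5 fills in each server's delay and offset with random values. A real client derives them from the four timestamps of one request/response exchange, using the standard NTP on-wire calculation. The following is a minimal sketch of that arithmetic only, with no networking. The type ntp_sample_t and the function ntp_compute_sample are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Hypothetical result type for one NTP exchange. */
typedef struct {
    double offset;  /* seconds; positive means the local clock is behind the server */
    double delay;   /* round-trip network delay in seconds */
} ntp_sample_t;

/* t1: client transmit time (client clock)
 * t2: server receive time  (server clock)
 * t3: server transmit time (server clock)
 * t4: client receive time  (client clock)
 * All values are in seconds. */
static ntp_sample_t ntp_compute_sample(double t1, double t2, double t3, double t4) {
    ntp_sample_t sample;

    /* The server cannot reply before it has received the request. */
    assert(t3 >= t2);

    /* Average of the two one-way differences; path asymmetry is ignored. */
    sample.offset = ((t2 - t1) + (t3 - t4)) / 2.0;

    /* Total elapsed time on the client minus the server's processing time. */
    sample.delay = (t4 - t1) - (t3 - t2);

    return sample;
}
```

For instance, with t1 = 10.0, t2 = 10.625, t3 = 10.875 and t4 = 10.5, the offset is 0.5 s and the round-trip delay is 0.25 s. A value obtained this way could replace the simulated server->offset before deciding between stepping and slewing.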

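Notes on using adjtimex

System requirements:

Kernel version: a Linux kernel that supports adjtimex
Privileges: changing the clock requires root or the CAP_SYS_TIME capability; a read-only query (modes = 0) needs no privilege
Architecture support: available on all mainstream architectures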

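Clock states (the return value of adjtimex):

TIME_OK: clock synchronized, no leap second adjustment pending
TIME_INS: a leap second will be inserted at the end of the UTC day
TIME_DEL: a leap second will be deleted at the end of the UTC day
TIME_OOP: a leap second insertion is in progress
TIME_WAIT: a leap second insertion or deletion has completed
TIME_ERROR: the clock is not synchronized to a reliable source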

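Status flags:

STA_PLL: enable phase-locked loop (PLL) updates
STA_UNSYNC: clock unsynchronized
STA_FREQHOLD: hold the frequency
STA_PPSSIGNAL: a valid PPS signal is present
STA_PPSJITTER: PPS signal jitter exceeded
STA_PPSWANDER: PPS signal wander exceeded
STA_PPSERROR: PPS signal calibration error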
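
The examples in this article test status bits one at a time with repeated if statements. As a rough sketch, the flags listed above can also be decoded through a small lookup table. The function name format_status_flags is hypothetical; only the STA_* constants come from <sys/timex.h>.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/timex.h>

/* Hypothetical helper: write the names of the set status flags into buf,
 * separated by single spaces. Output is truncated if buf is too small. */
static void format_status_flags(int status, char *buf, size_t len) {
    static const struct { int bit; const char *name; } flags[] = {
        { STA_PLL,       "STA_PLL" },
        { STA_UNSYNC,    "STA_UNSYNC" },
        { STA_FREQHOLD,  "STA_FREQHOLD" },
        { STA_PPSSIGNAL, "STA_PPSSIGNAL" },
        { STA_PPSJITTER, "STA_PPSJITTER" },
        { STA_PPSWANDER, "STA_PPSWANDER" },
        { STA_PPSERROR,  "STA_PPSERROR" },
    };

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (status & flags[i].bit) {
            size_t used = strlen(buf);
            /* Prefix a space for every name after the first one. */
            snprintf(buf + used, len - used, "%s%s", used ? " " : "", flags[i].name);
        }
    }
}
```

A caller would pass tx.status from a successful adjtimex query together with a buffer of, say, 128 bytes, and then print the buffer.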

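Adjustment modes:

ADJ_OFFSET: set the time offset (gradual adjustment)
ADJ_FREQUENCY: set the frequency offset
ADJ_MAXERROR: set the maximum error
ADJ_ESTERROR: set the estimated error
ADJ_STATUS: set the status flags
ADJ_TIMECONST: set the PLL time constant
ADJ_SETOFFSET: add the value in the time field to the current time (step adjustment)
ADJ_MICRO: select microsecond resolution
ADJ_NANO: select nanosecond resolution
ADJ_TICK: set the tick value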

Error handling:

EPERM: insufficient privilege (modes is nonzero and the caller lacks CAP_SYS_TIME)
EINVAL: invalid argument
EFAULT: invalid pointer

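Performance considerations:

Adjustment frequency: avoid adjusting the clock too often
Adjustment size: limit the size of each adjustment
System load: consider the effect of adjustments on system performance
Monitoring overhead: keep the cost of monitoring low

Security considerations:

Privilege checks: make sure the process has the privileges it needs
Parameter validation: validate every input parameter
Error recovery: have a recovery path ready for failures
Audit logging: record every clock adjustment

Best practices:

Gradual adjustment: prefer gradual adjustment (slewing) over stepping
Privilege checks: check for sufficient privileges before acting
State monitoring: monitor the clock state and performance continuously
Error handling: handle each error case properly
Logging: record all important operations and state changes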
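
The "prefer gradual adjustment" rule can be written down as a small decision function. This is only a sketch under two assumptions: the caller supplies its own "good enough" threshold, and offsets of 0.5 s or more are stepped, because the kernel clamps ADJ_OFFSET to the range (-0.5 s, +0.5 s). The names adjust_kind_t and choose_adjustment are hypothetical.

```c
#include <assert.h>

/* Hypothetical classification of what to do with a measured offset. */
typedef enum {
    ADJUST_NONE,  /* offset is within the caller's threshold */
    ADJUST_SLEW,  /* gradual adjustment via ADJ_OFFSET */
    ADJUST_STEP   /* step adjustment via ADJ_SETOFFSET */
} adjust_kind_t;

/* offset_sec: measured clock offset in seconds (may be negative)
 * threshold_sec: offsets at or below this magnitude are left alone */
static adjust_kind_t choose_adjustment(double offset_sec, double threshold_sec) {
    assert(threshold_sec >= 0.0);

    double magnitude = offset_sec < 0.0 ? -offset_sec : offset_sec;

    if (magnitude <= threshold_sec) {
        return ADJUST_NONE;
    }
    /* ADJ_OFFSET cannot express 0.5 s or more, so larger offsets are stepped. */
    if (magnitude < 0.5) {
        return ADJUST_SLEW;
    }
    return ADJUST_STEP;
}
```

With a 0.1 s threshold, an offset of 0.05 s is left alone, -0.3 s is slewed, and 0.75 s is stepped.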

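The timex structure in detail

struct timex layout (the clock state is not a field; it is the return value of adjtimex):

struct timex {
    unsigned int modes;     // mode selector: which fields to set
    long offset;            // time offset (microseconds; nanoseconds if STA_NANO is set)
    long freq;              // frequency offset (ppm with a 16-bit fraction)
    long maxerror;          // maximum error (microseconds)
    long esterror;          // estimated error (microseconds)
    int status;             // status flags (STA_*)
    long constant;          // PLL time constant
    long precision;         // clock precision (microseconds, read-only)
    long tolerance;         // clock frequency tolerance (read-only)
    struct timeval time;    // current time (read-only, except with ADJ_SETOFFSET)
    long tick;              // microseconds between clock ticks
    long ppsfreq;           // PPS frequency (ppm with a 16-bit fraction, read-only)
    long jitter;            // PPS jitter (read-only; nanoseconds if STA_NANO is set, else microseconds)
    int shift;              // PPS interval duration (read-only)
    long stabil;            // PPS stability (ppm with a 16-bit fraction, read-only)
    long jitcnt;            // count of PPS jitter-limit-exceeded events (read-only)
    long calcnt;            // count of PPS calibration intervals (read-only)
    long errcnt;            // count of PPS calibration errors (read-only)
    long stbcnt;            // count of PPS stability-limit-exceeded events (read-only)
    int tai;                // TAI offset (seconds, read-only)
    int :32; int :32; int :32; int :32;  // padding reserved for future expansion
};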

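Common use cases

1. NTP client:

// Gradually adjust the clock by a measured offset
struct timex tx = {0};
tx.modes = ADJ_OFFSET | ADJ_STATUS;
tx.offset = measured_offset_us;  // microseconds
tx.status = STA_PLL;             // the kernel applies ADJ_OFFSET only in PLL mode
if (adjtimex(&tx) == -1) perror("adjtimex");

2. System clock monitoring:

// Query the clock state
struct timex tx = {0};
tx.modes = 0;  // read-only query
int state = adjtimex(&tx);  // TIME_OK ... TIME_ERROR, or -1 on failure

3. High-precision time service:

// Adjust the frequency
struct timex tx = {0};
tx.modes = ADJ_FREQUENCY;
tx.freq = ppm * 65536;  // convert to ppm with a 16-bit fraction
if (adjtimex(&tx) == -1) perror("adjtimex");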
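
The ppm * 65536 conversion silently truncates fractional ppm values when ppm is an integer, and the kernel clamps the frequency to a scaled value of about 32768000, which corresponds to 500 ppm. The following sketch does the conversion in floating point and clamps explicitly. The name ppm_to_timex_freq and the macro TIMEX_MAX_PPM are hypothetical.

```c
#include <assert.h>

/* Assumed limit: the kernel clamps freq to +/-32768000 scaled units,
 * i.e. +/-500 ppm. */
#define TIMEX_MAX_PPM 500.0

/* Hypothetical helper: convert ppm to the scaled units used by
 * struct timex.freq (ppm with a 16-bit fraction). */
static long ppm_to_timex_freq(double ppm) {
    /* NaN has no meaningful conversion. */
    assert(ppm == ppm);

    if (ppm > TIMEX_MAX_PPM) {
        ppm = TIMEX_MAX_PPM;
    } else if (ppm < -TIMEX_MAX_PPM) {
        ppm = -TIMEX_MAX_PPM;
    }
    return (long)(ppm * 65536.0);
}
```

The result can be assigned to tx.freq before calling adjtimex with ADJ_FREQUENCY; for example, 1.0 ppm becomes 65536 and -0.5 ppm becomes -32768.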

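Summary

adjtimex is a powerful time management function on Linux. It provides:

Precise control: microsecond-level adjustment of the clock offset
Frequency management: PPM-level frequency control
State monitoring: queries of the current clock state
Error reporting: failures are reported through the return value and errno

Used with care, adjtimex can serve as the basis of a high-precision time synchronization system. In practice, pay attention to privilege requirements, error handling, and performance.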
