葫芦的运维日志_linux

✸ ✸ ✸

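When traffic actually reaches a Pod after it starts

In Kubernetes, a Pod that has started cannot necessarily receive traffic right away. K8s uses probes to decide whether a Pod is "ready". Understanding this time gap is essential for avoiding 502 errors during deployments.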

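The three probes

Probe	Purpose	Consequence of failure
startupProbe	Checks whether the container has finished starting	Container is killed and restarted
readinessProbe	Checks whether the container is ready to receive traffic	Pod is removed from the Service Endpoints
livenessProbe	Checks whether the container is still alive	Container is killed and restarted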
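The division of labour in the table can be sketched in a few lines of Python. This is only an illustrative model, not kubelet code: the function name `active_probes` and the `FAILURE_ACTION` mapping are made up for this example. It encodes one extra rule worth remembering: while a configured startupProbe has not yet succeeded, the readiness and liveness probes do not run.

```python
from typing import List

# What happens when each probe exhausts its failureThreshold (from the table above).
FAILURE_ACTION = {
    "startupProbe": "kill and restart the container",
    "readinessProbe": "remove the Pod from the Service Endpoints",
    "livenessProbe": "kill and restart the container",
}


def active_probes(has_startup_probe: bool, startup_succeeded: bool) -> List[str]:
    """Return which probes are being run at this point in the container's life."""
    if has_startup_probe and not startup_succeeded:
        # Readiness and liveness probing is held back until startup succeeds.
        return ["startupProbe"]
    return ["readinessProbe", "livenessProbe"]


for probe in active_probes(has_startup_probe=True, startup_succeeded=False):
    print(probe, "->", FAILURE_ACTION[probe])
```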

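Calculating when traffic arrives

Pod created
  |
  v
Container starts (image pull + process start)
  |
  v
Wait initialDelaySeconds (delay before the first probe; if a startupProbe is set, readiness probing starts only after it succeeds)
  |
  v
Run readinessProbe
  |
  +--failure--> retry after periodSeconds
  |            (the Pod stays NotReady; probing continues until it succeeds)
  |
  +--success--> successThreshold consecutive successes
  |
  v
Pod marked Ready
  |
  v
Endpoints controller adds the Pod IP to the Service Endpoints
  |
  v
kube-proxy updates its iptables / IPVS rules
  |
  v
Traffic actually reaches the Pod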

Fastest case (everything goes well):

Time to traffic = container start time
                + initialDelaySeconds
                + (successThreshold - 1) * periodSeconds
                + probe response time (at most timeoutSeconds)
                + Endpoints update delay (typically 1-2 seconds)
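To make the formula concrete, here is a small Python sketch that plugs numbers into it. The probe values are borrowed from the readinessProbe in this article's example manifest; the 4-second container start and the 2-second Endpoints delay are assumptions picked purely for illustration, and `timeoutSeconds` is used as the upper bound for the probe term.

```python
from dataclasses import dataclass


@dataclass
class ReadinessProbe:
    """Readiness probe timing fields; defaults mirror the Kubernetes defaults."""
    initial_delay_seconds: float = 0
    period_seconds: float = 10
    timeout_seconds: float = 1
    success_threshold: int = 1


def earliest_traffic_seconds(container_start: float,
                             probe: ReadinessProbe,
                             endpoints_delay: float) -> float:
    """Apply the formula above, bounding the probe duration by timeout_seconds."""
    return (container_start
            + probe.initial_delay_seconds
            + (probe.success_threshold - 1) * probe.period_seconds
            + probe.timeout_seconds
            + endpoints_delay)


# Assumed inputs: container start takes 4 s, Endpoints propagation takes 2 s.
probe = ReadinessProbe(initial_delay_seconds=5, period_seconds=5,
                       timeout_seconds=3, success_threshold=1)
print(earliest_traffic_seconds(4, probe, 2))  # 4 + 5 + 0 + 3 + 2 = 14
```

Raising `success_threshold` adds one full `period_seconds` per extra required success, which is usually the largest lever on this number after the container start time itself.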

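Probe configuration example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:                  # required by apps/v1; must match the template labels
    matchLabels:
      app: my-app            # label chosen for this example
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: my-app:v1
        ports:
        - containerPort: 8080

        # Startup probe: gives slow-starting applications more time
        startupProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 30    # waits up to about 5 + 30*5 = 155 seconds

        # Readiness probe: decides whether the Pod receives traffic
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5   # wait 5 seconds before the first probe
          periodSeconds: 5         # probe every 5 seconds
          timeoutSeconds: 3        # per-probe timeout
          successThreshold: 1      # one success marks the Pod Ready
          failureThreshold: 3      # 3 consecutive failures mark it NotReady

        # Liveness probe: decides whether the container needs a restart
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3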

Probe types

# HTTP GET probe (the most common)
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
    httpHeaders:
    - name: Custom-Header
      value: probe

# TCP socket probe (suits non-HTTP services)
readinessProbe:
  tcpSocket:
    port: 3306

# Exec probe (the most flexible)
readinessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy

# gRPC probe (K8s 1.24+)
readinessProbe:
  grpc:
    port: 50051

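Common problems and best practices

Problem 1: 502 errors during deployment

Cause: the old Pods have been killed while the new Pods are not yet Ready, so traffic has nowhere to go.

# Fix: configure the rolling update strategy
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0      # never allow an unavailable Pod
      maxSurge: 1            # start a new Pod first, then kill an old one
  minReadySeconds: 10        # a Pod counts as available only after being Ready for 10 seconds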
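How `maxUnavailable` and `maxSurge` bound the number of Pods during a rollout can be worked out with a short Python sketch. The helper names `resolve` and `rollout_bounds` are hypothetical; the rounding rule (percentages round up for `maxSurge` and down for `maxUnavailable`) follows the Deployment documentation.

```python
import math
from typing import Tuple, Union


def resolve(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Resolve an absolute count or a percentage string such as "25%"."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rollout_bounds(replicas: int,
                   max_surge: Union[int, str],
                   max_unavailable: Union[int, str]) -> Tuple[int, int]:
    """Return (minimum available Pods, maximum total Pods) during a rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas - unavailable, replicas + surge


print(rollout_bounds(3, max_surge=1, max_unavailable=0))           # (3, 4)
print(rollout_bounds(10, max_surge="25%", max_unavailable="25%"))  # (8, 13)
```

With the strategy shown above and 3 replicas, at least 3 Pods stay available at all times, at the cost of briefly running a fourth.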

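Problem 2: Pods restart frequently

Cause: the livenessProbe's initialDelaySeconds is too short, so the application is judged "dead" before it has finished starting.

# Fix: use a startupProbe instead of an overly long initialDelaySeconds
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
# The livenessProbe only starts working once the startupProbe has succeeded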
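A rough back-of-the-envelope check shows why the startupProbe helps. The Python sketch below uses the same approximation as the manifest comment earlier in this article (initialDelaySeconds + failureThreshold * periodSeconds); the 120-second startup time is a hypothetical value for a slow application, and the liveness numbers are the ones from the example manifest.

```python
def failure_window_seconds(initial_delay: int, period: int,
                           failure_threshold: int) -> int:
    """Rough time before a never-succeeding probe exhausts failureThreshold."""
    return initial_delay + failure_threshold * period


APP_STARTUP_SECONDS = 120  # assumption: a slow-starting application

# livenessProbe alone: initialDelaySeconds=15, periodSeconds=10, failureThreshold=3
liveness_only = failure_window_seconds(initial_delay=15, period=10, failure_threshold=3)

# startupProbe from the fix above: periodSeconds=10, failureThreshold=30
with_startup = failure_window_seconds(initial_delay=0, period=10, failure_threshold=30)

print(liveness_only, APP_STARTUP_SECONDS <= liveness_only)  # 45 False
print(with_startup, APP_STARTUP_SECONDS <= with_startup)    # 300 True
```

Under these assumptions the liveness-only setup restarts the container long before the application is up, while the startupProbe leaves it enough room without slowing down liveness checks afterwards.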

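Problem 3: readiness and liveness share the same endpoint

Not recommended. A readiness failure only stops the Pod from receiving traffic for a while, whereas a liveness failure kills the container. With a shared endpoint, a transient dependency failure (for example, the database being briefly unavailable) gets the container killed and restarted, which makes the problem worse.

# Recommended: implement them separately
/healthz  --> livenessProbe  (only checks that the process is alive)
/ready    --> readinessProbe (checks that dependencies are ready: DB, Redis, downstream services)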
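One possible shape for the two endpoints, using only the Python standard library. This is a minimal sketch rather than a recommended server: `check_dependencies` is a placeholder you would replace with real DB/Redis/downstream checks, and `probe_status` and `serve` are names invented for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies() -> bool:
    """Placeholder: replace with real checks (DB ping, Redis ping, ...)."""
    return True


def probe_status(path: str, deps_ready: bool) -> int:
    """Map a probe path to an HTTP status code."""
    if path == "/healthz":
        return 200  # liveness: the process is up and can answer
    if path == "/ready":
        return 200 if deps_ready else 503  # readiness: dependencies must be usable
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Only evaluate dependencies for the readiness path.
        deps_ready = check_dependencies() if self.path == "/ready" else True
        self.send_response(probe_status(self.path, deps_ready))
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Blocks forever; call this from your entrypoint."""
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

The point of the split is that a database outage turns `/ready` into a 503, which takes the Pod out of rotation, while `/healthz` keeps returning 200 so the container is not restarted for a problem a restart cannot fix.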

Debugging probes

# Inspect the Pod's events to see whether a probe is failing
kubectl describe pod my-app-xxx

# Key events:
# Unhealthy: Readiness probe failed: ...
# Unhealthy: Liveness probe failed: ...

# Test the probe endpoints by hand (requires curl in the container image)
kubectl exec my-app-xxx -- curl -s http://localhost:8080/ready
kubectl exec my-app-xxx -- curl -s http://localhost:8080/healthz

# Check whether the Pod is Ready
kubectl get pods -o wide
# In the READY column, 1/1 means ready and 0/1 means not ready
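If the image has no curl, a probe can be approximated from outside the container, for instance through `kubectl port-forward`. The Python sketch below imitates an httpGet probe under the rule that any status code from 200 up to (but not including) 400 counts as success; the function names are invented for this example, and the URL in the usage note is an assumption.

```python
import urllib.error
import urllib.request


def status_is_success(status: int) -> bool:
    """An httpGet probe passes for status codes 200 through 399."""
    return 200 <= status < 400


def http_probe(url: str, timeout_seconds: float = 1.0) -> bool:
    """Return True if the endpoint answers successfully within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return status_is_success(response.status)
    except (urllib.error.URLError, OSError):
        # Covers 4xx/5xx responses (HTTPError), refused connections and timeouts.
        return False
```

For example, after running `kubectl port-forward pod/my-app-xxx 8080:8080` in another terminal, `http_probe("http://localhost:8080/ready", timeout_seconds=3)` gives roughly the answer the readinessProbe from the example manifest would get.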

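Probe parameter quick reference

Parameter	Default	Description
initialDelaySeconds	0	How long to wait after the container starts before probing
periodSeconds	10	Interval between probes
timeoutSeconds	1	Timeout for a single probe
successThreshold	1	Consecutive successes required to pass
failureThreshold	3	Consecutive failures required to fail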
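Since every field in the table is optional, it helps to see what a partially specified probe actually resolves to. A minimal Python sketch, with `effective_probe` as a made-up helper name and the defaults copied from the table:

```python
# Defaults from the quick-reference table above.
PROBE_DEFAULTS = {
    "initialDelaySeconds": 0,
    "periodSeconds": 10,
    "timeoutSeconds": 1,
    "successThreshold": 1,
    "failureThreshold": 3,
}


def effective_probe(spec: dict) -> dict:
    """Fill in any timing field the probe spec leaves out."""
    return {**PROBE_DEFAULTS, **spec}


probe = effective_probe({"periodSeconds": 5})
print(probe["timeoutSeconds"])                             # 1
print(probe["failureThreshold"] * probe["periodSeconds"])  # 15
```

The second number is a rough estimate of how long a broken Pod keeps its state before the failure threshold is reached: with a 5-second period and the default threshold of 3, about 15 seconds.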

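Summary

Pod time to traffic = container start + initialDelaySeconds + probe time + Endpoints update delay. The three probes each have their own job: startupProbe covers startup, readinessProbe covers traffic, and livenessProbe covers liveness. Designing separate probe endpoints and pairing them with a maxUnavailable: 0 rolling update strategy goes a long way toward zero-downtime deployments.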

✸ ✸ ✸

Author: 王梓 | Original link: https://www.bthlt.com/note/13738743-Linuxk8s知识点

Source: 葫芦的运维日志 | When reposting, please credit the source and keep the original link
