Kubernetes

Open-Source GPU Virtualization Projects for Kubernetes

📅 June 14, 2025 · ☕ 4 min read

1. k8s-device-plugin

https://github.com/NVIDIA/k8s-device-plugin is NVIDIA's official Kubernetes device plugin for managing and allocating NVIDIA GPU resources in a Kubernetes cluster. It works with the kubelet to discover and register GPU devices automatically, and exposes them as resources to the Kubernetes scheduler. It supports many GPU models and can handle GPU partitioning and sharing.

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-pod
    spec:
      restartPolicy: Never
      containers:
        - name: cuda-container
    …
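A Pod consumes a GPU advertised by the device plugin by requesting the `nvidia.com/gpu` extended resource. The following is a minimal sketch, separate from the post's own manifest: the Pod name `gpu-pod-sketch`, the file name, and the image are hypothetical placeholders, and only the resource name is taken from the plugin's documented behavior.

```shell
#!/bin/sh
# Hypothetical image name; substitute any CUDA-enabled image you trust.
IMAGE="example.com/cuda-sample:latest"

# Write a Pod manifest that asks the scheduler for one whole GPU.
cat > gpu-pod-sketch.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-sketch
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: ${IMAGE}
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

# On a cluster with the plugin installed, the manifest could be applied with:
#   kubectl apply -f gpu-pod-sketch.yaml
cat gpu-pod-sketch.yaml
```

If no node advertises `nvidia.com/gpu`, a Pod like this would be expected to stay in Pending, which is a quick way to check whether the plugin has registered its devices.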

Adding Scrape Credentials to Node Exporter

📅 May 24, 2025 · ☕ 1 min read

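1. Background

Node Exporter is a widely used component of the Prometheus ecosystem for collecting host metrics, but by default it offers no access authentication. This post describes how to add basic authentication to Node Exporter in a Kubernetes environment to improve security.

2. Configuring credentials for Node Exporter

2.1 Generating the hashed password

Use the htpasswd tool to generate a bcrypt-hashed password:

    htpasswd -nBC 12 "" | tr -d ':\n'

You will be prompted for a password; the resulting output will be a…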
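The hash produced by htpasswd is typically placed in a web configuration file that Node Exporter reads at startup. The sketch below assumes the `basic_auth_users` format used by the Prometheus exporter toolkit and the `--web.config.file` flag of recent Node Exporter releases. The user name `prometheus`, the file name `web-config.yml`, and the placeholder hash are hypothetical and not taken from the post.

```shell
#!/bin/sh
# Hypothetical scrape user name; not taken from the post.
SCRAPE_USER="prometheus"
# Placeholder: paste the bcrypt hash printed by the htpasswd command here.
PASSWORD_HASH='REPLACE_WITH_BCRYPT_HASH'

# Write a web configuration file mapping the user name to its password hash.
cat > web-config.yml <<EOF
basic_auth_users:
  ${SCRAPE_USER}: ${PASSWORD_HASH}
EOF

# Node Exporter would then be started with (assumed flag, recent releases):
#   node_exporter --web.config.file=web-config.yml
cat web-config.yml
```

Prometheus would then need the matching plain-text password in the `basic_auth` section of its scrape configuration, since only the hash is stored on the exporter side.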

kube-proxy Failure Prevents Pods on a Node from Reaching Services

📅 March 31, 2025 · ☕ 3 min read

1. Problem description

Related Pods

    kubectl -n istio-system get pod -o wide
    NAME                      READY   STATUS    RESTARTS   AGE   IP               NODE             NOMINATED NODE   READINESS GATES
    istiod-647c7c9d95-7n7n6   1/1     Running   0          77m   10.244.173.51    docs-ai-a800-4   <none>           <none>
    istiod-647c7c9d95-k6l88   1/1     Running   0          30m   10.244.210.160   ai-a40-2         <none>           <none>
    istiod-647c7c9d95-pj82r   1/1     Running   0          51m   10.244.229.217   docs-ai-a800-2   <none>           <none>

Related Service

    kubectl -n istio-system get svc
    NAME     TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                 AGE
    istiod   ClusterIP   10.99.225.56   <none>        15010/TCP,15012/TCP,443/TCP,15014/TCP   645d

    kubectl -n istio-system get endpoints
    NAME     ENDPOINTS                                                                   AGE
    istiod   10.244.173.51:15012,10.244.210.160:15012,10.244.229.217:15012 + 9 more...   645d

The Endpoints match the Pod IPs.

Test results

On the abnormal node…
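The comparison between Endpoints and Pod IPs can also be done mechanically rather than by eye. The sketch below hard-codes the addresses shown in the output above so that it runs without a cluster; the variable names are illustrative, and the kubectl command in the comment is one assumed way to fetch live values.

```shell
#!/bin/sh
# Pod IPs and endpoint addresses copied from the output above.
# On a live cluster the Pod IPs could instead come from kubectl, for example:
#   kubectl -n istio-system get pod -o jsonpath='{.items[*].status.podIP}'
POD_IPS="10.244.173.51 10.244.210.160 10.244.229.217"
ENDPOINTS="10.244.173.51:15012,10.244.210.160:15012,10.244.229.217:15012"

# Split on commas, strip the ports, and keep one unique IP per line.
ENDPOINT_IPS=$(printf '%s\n' "$ENDPOINTS" | tr ',' '\n' | cut -d: -f1 | sort -u)

MISSING=""
for ip in $POD_IPS; do
  # -x matches the whole line, -F treats the IP as a fixed string.
  if ! printf '%s\n' "$ENDPOINT_IPS" | grep -qxF "$ip"; then
    MISSING="$MISSING $ip"
  fi
done

# Report whether every Pod IP appears among the endpoint addresses.
if [ -z "$MISSING" ]; then
  RESULT="consistent"
else
  RESULT="missing:$MISSING"
fi
echo "$RESULT"
```

When this check passes but traffic to the ClusterIP still fails from one node, the Service data itself is fine and attention shifts to that node's own forwarding path, which is where kube-proxy comes in.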

kubectl logs Fails with NotFound When Viewing Pod Logs

📅 February 22, 2025 · ☕ 1 min read

1. Symptoms

Pod information can be viewed:

    kubectl -n my-testns get pod my-testpod
    NAME         READY   STATUS    RESTARTS   AGE
    my-testpod   1/1     Running   0          2d13h

Pod logs cannot be viewed:

    kubectl -n my-testns logs my-testpod -f
    Error from server (NotFound): the server could not find the requested resource ( pods/log my-testpod)

On the host where the Pod runs, the container logs can be viewed with docker logs.

A kubelet health check returns OK:

    curl -k https://x.x.x.x:10250/healthz

The host's IP address must be used here; the kubectl logs command will directly…

Deploying the Jumpserver Bastion Host on Kubernetes

📅 January 9, 2025 · ☕ 3 min read

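1. Deploying Jumpserver

A StorageClass must be prepared in advance to store Jumpserver's data. Apart from the database mentioned below, each of the components jms-core, jms-web, jms-koko, jms-lion, and jms-chen needs its own PV.

1.1 Deploying MySQL

Deploy MySQL by following https://github.com/shaowenchen/ops-hub/blob/main/database/mysql8.yaml . You need to adjust…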