Creating a Kubernetes cluster with kubeadm

Cluster host information

Hostname   IP              Role     Components
dev-vm1    192.168.1.127   master   etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet, kube-proxy, kubectl, kubeadm, calico, haproxy
dev-vm2    192.168.1.249   master   etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet, kube-proxy, kubectl, kubeadm, calico
dev-vm3    192.168.1.125   master   etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet, kube-proxy, kubectl, kubeadm, calico

Initialize the system

Let iptables see bridged traffic

Make sure the br_netfilter module is loaded. Check with lsmod | grep br_netfilter; if it is not listed, load it with sudo modprobe br_netfilter. Then set the net.bridge.bridge-nf-call-iptables sysctl to 1, as follows:

[root@dev-vm1 ~]# lsmod | grep br_netfilter
[root@dev-vm1 ~]# modprobe br_netfilter
[root@dev-vm1 ~]# lsmod | grep br_netfilter
br_netfilter 22256 0
bridge 151336 2 br_netfilter,ebtable_broute
[root@dev-vm1 ~]# sysctl -w net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-iptables = 1
[root@dev-vm1 ~]# sysctl -w net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-ip6tables = 1

Make the kernel modules and parameters persistent across reboots:

cat> /etc/modules-load.d/k8s.conf << EOF
br_netfilter
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
EOF

cat >> /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
EOF
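
To load the modules and apply the sysctl settings immediately without rebooting, you can run something along these lines (a convenience sketch; the module list matches the file created above, and on kernels 4.19+ nf_conntrack_ipv4 has been renamed to nf_conntrack):

for mod in br_netfilter ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4; do
    modprobe $mod          # load each module listed in /etc/modules-load.d/k8s.conf
done
sysctl --system            # re-read all sysctl configuration, including /etc/sysctl.d/k8s.conf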

Disable swap

# turn off swap
[root@dev-vm1 ~]# swapoff -a && sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
cat >> /etc/sysctl.d/k8s.conf << EOF
vm.swappiness=0
EOF

Disable SELinux and the firewall

[root@dev-vm1 ~]# setenforce 0
[root@dev-vm1 ~]# sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
[root@dev-vm1 ~]# systemctl disable firewalld && systemctl stop firewalld

Configure hosts resolution on all hosts

[root@dev-vm1 ~]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.127 dev-vm1
192.168.1.249 dev-vm2
192.168.1.125 dev-vm3

Configure the Kubernetes yum repository

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

Run the following commands to refresh the yum cache:

[root@dev-vm1 ~]# yum clean all
[root@dev-vm1 ~]# yum makecache
[root@dev-vm1 ~]# yum repolist

Install Docker

Remove any old Docker versions from the system (if present)

sudo yum remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine

Configure the Docker yum repository

sudo yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo

Install the latest Docker

[root@dev-vm1 ~]# yum install -y yum-utils device-mapper-persistent-data lvm2
[root@dev-vm1 ~]# sudo yum install docker-ce docker-ce-cli containerd.io

# start the Docker service
[root@dev-vm1 ~]# systemctl daemon-reload && systemctl enable docker && systemctl start docker

Configure the Docker daemon to use systemd as the cgroup driver

sudo mkdir /etc/docker
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

# restart the Docker service
sudo systemctl enable docker
sudo systemctl daemon-reload
sudo systemctl restart docker
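
To confirm that Docker picked up the systemd cgroup driver, a quick check (a verification step, not in the original):

docker info 2>/dev/null | grep -i "cgroup driver"
# expected output: Cgroup Driver: systemd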

Install kubeadm

[root@dev-vm1 ~]# yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
# enable kubelet to start at boot
[root@dev-vm1 ~]# systemctl enable --now kubelet
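
The command above installs the latest packages from the repository. If you want to pin the version the rest of this document was written against (v1.22.0), a version-pinned install would look roughly like this (an optional variant, not from the original):

yum install -y kubelet-1.22.0 kubeadm-1.22.0 kubectl-1.22.0 --disableexcludes=kubernetes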

Run HAProxy via Docker

Create the /etc/haproxy/ directory to hold the HAProxy configuration file that will be mounted into the container. The configuration file content is as follows:

global
    daemon                              # run as a daemon in the background, same as the "-D" command-line option
    nbproc 1
    log 127.0.0.1 local2                # log via the local2 syslog facility
    pidfile /var/run/haproxy.pid
    maxconn 5000                        # maximum concurrent connections, same as the "-n" option (default 4000)

defaults
    mode http                           # mode {tcp|http|health}: tcp works at layer 4, http at layer 7
    retries 3                           # a server is considered unavailable after 3 failed connections (health-checked via "check")
    option redispatch                   # if the server bound to a serverId goes down, redirect to another healthy server
    option abortonclose
    maxconn 4096
    timeout connect 5000ms
    timeout client 30000ms
    timeout server 30000ms
    timeout check 2000
    log global

# stats page
listen admin_stats
    stats enable
    bind 0.0.0.0:8080                   # listening port
    mode http
    option httplog
    maxconn 5
    stats refresh 30s                   # refresh the page every 30s
    stats uri /moniter                  # stats URI
    stats hide-version                  # hide the HAProxy version
    stats realm Global\ statistics
    stats auth admin:admin123456        # login account:password
    # once configured, the HAProxy stats page is reachable at http://ip:8080/moniter with the account and password above

listen test
    bind 0.0.0.0:7443
    log 127.0.0.1 local0 debug
    balance roundrobin                  # load-balancing algorithm
    mode tcp
    server dev-vm1 192.168.1.127:6443 check port 6443 inter 2 rise 1 fall 2 maxconn 300
    server dev-vm2 192.168.1.249:6443 check port 6443 inter 2 rise 1 fall 2 maxconn 300
    server dev-vm3 192.168.1.125:6443 check port 6443 inter 2 rise 1 fall 2 maxconn 300

Start the HAProxy container with the following command:

docker run -dit --restart=always --name k8s-haproxy \
-v /etc/haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro \
--network host haproxy:2.3
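
To verify that HAProxy came up correctly, you can check the container log and the stats page defined in the configuration above (credentials taken from haproxy.cfg):

docker logs k8s-haproxy                                    # should show no configuration errors
curl -u admin:admin123456 http://127.0.0.1:8080/moniter    # stats page returns HTML when HAProxy is healthy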

Initialize the first master node

The following command lists the component images required for the installation:

[root@dev-test01 ~]# kubeadm config images list
k8s.gcr.io/kube-apiserver:v1.22.0
k8s.gcr.io/kube-controller-manager:v1.22.0
k8s.gcr.io/kube-scheduler:v1.22.0
k8s.gcr.io/kube-proxy:v1.22.0
k8s.gcr.io/pause:3.5
k8s.gcr.io/etcd:3.5.0-0
k8s.gcr.io/coredns/coredns:v1.8.4

Run the following command on dev-vm1:

kubeadm init --control-plane-endpoint "192.168.1.181:6443" --upload-certs --image-repository registry.aliyuncs.com/google_containers --pod-network-cidr=10.96.0.0/12 

Notes:

  • --control-plane-endpoint should be set to the address (or DNS name) and port of the load balancer.
  • --upload-certs uploads the certificates that are shared between all control-plane instances to the cluster.
  • --image-repository points the installation at the Alibaba Cloud mirror, because the Google registry is not reachable from mainland China.
  • The Alibaba mirror does not contain registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.4, so pull registry.aliyuncs.com/google_containers/coredns:latest first and then tag it accordingly (commands below).
  • --apiserver-advertise-address sets the advertise address for this control-plane node's API server, while --control-plane-endpoint sets the shared endpoint for all control-plane nodes.

    docker pull registry.aliyuncs.com/google_containers/coredns:latest
    docker tag registry.aliyuncs.com/google_containers/coredns:latest registry.aliyuncs.com/google_containers/coredns:v1.8.4

The output of the command looks like this:

[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

kubeadm join 192.168.1.181:6443 --token 63wy1b.x3rls1juhbll8wgw \
--discovery-token-ca-cert-hash sha256:e72693bfb1f30f27c64445d215722a2773d17af649ac00528cd2d3d4ae687e4a \
--control-plane --certificate-key 6b3d9ffcbdd14880fa8c1481035519775c6853fb69730a0d9bacf159342f2fc2

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.1.181:6443 --token 63wy1b.x3rls1juhbll8wgw \
--discovery-token-ca-cert-hash sha256:e72693bfb1f30f27c64445d215722a2773d17af649ac00528cd2d3d4ae687e4a

Notes:

  • Copy this output to a text file; it will be needed later when the other master nodes and the worker nodes join the cluster.
  • When --upload-certs is used with kubeadm init, the control-plane certificates are encrypted and uploaded to the kubeadm-certs Secret.
  • To re-upload the certificates and generate a new decryption key, run the following on a control-plane node that has already joined the cluster:
    kubeadm init phase upload-certs --upload-certs
  • You can also specify a custom --certificate-key during init, which can later be used by join. To generate such a key, run kubeadm certs certificate-key.

Copy the generated admin.conf into the current user's home directory:

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
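
kubectl should now be able to reach the cluster; a quick check:

kubectl get nodes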

Configure kube-proxy to use IPVS mode

Edit the kube-proxy configuration and set the mode value to "ipvs" (by default it is empty), as sketched below:
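
A minimal sketch of that change, assuming the kubeadm-managed kube-proxy ConfigMap in the kube-system namespace:

kubectl -n kube-system edit configmap kube-proxy
# in the editor, locate the config.conf section and change
#     mode: ""
# to
#     mode: "ipvs"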


Restart kube-proxy and check its logs; a line containing Using ipvs Proxier confirms that kube-proxy is now running in IPVS mode:
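
One way to restart kube-proxy and inspect the logs (a sketch assuming kubeadm's default k8s-app=kube-proxy label):

kubectl -n kube-system rollout restart daemonset kube-proxy
kubectl -n kube-system logs -l k8s-app=kube-proxy | grep "Using ipvs Proxier"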

Install the Calico network plugin

Calico is used as the network plugin.

Configure NetworkManager

Create the configuration file /etc/NetworkManager/conf.d/calico.conf to prevent NetworkManager from interfering with Calico's interfaces; the content is as follows:

cat> /etc/NetworkManager/conf.d/calico.conf << EOF
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:tunl*;interface-name:vxlan.calico
EOF

Download the Calico manifest calico.yaml

[root@dev-vm1 ~]# curl https://docs.projectcalico.org/manifests/calico.yaml -O

Modify calico.yaml

By default the pod CIDR in calico.yaml is 192.168.0.0/16 and is commented out, so change it to 10.96.0.0/12 and remove the comment:

[root@dev-vm1 ~]# sed -i -e "s?192.168.0.0/16?10.96.0.0/12?g" calico.yaml

After the change it looks like this:

- name: CALICO_IPV4POOL_CIDR
  value: "10.96.0.0/12"

Apply the manifest:

[root@dev-vm1 ~]# kubectl apply -f calico.yaml
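
You can watch the Calico and CoreDNS pods come up before continuing (a verification step, not in the original):

kubectl get pods -n kube-system -l k8s-app=calico-node
kubectl get pods -n kube-system -l k8s-app=kube-dns      # CoreDNS should reach Running once the pod network is up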

Copy the certificates from the first master to the corresponding directories on the other two masters

# Note: before copying, create the target directory on the other two masters first: mkdir -p /etc/kubernetes/pki/etcd/
[root@dev-vm1 ~]# cd /etc/kubernetes/pki/
[root@dev-vm1 pki]# scp ca.* sa.* front-proxy-ca.* 192.168.1.249:/etc/kubernetes/pki/
[root@dev-vm1 pki]# scp ca.* sa.* front-proxy-ca.* 192.168.1.125:/etc/kubernetes/pki/
[root@dev-vm1 pki]# scp etcd/ca.* 192.168.1.249:/etc/kubernetes/pki/etcd/
[root@dev-vm1 pki]# scp etcd/ca.* 192.168.1.125:/etc/kubernetes/pki/etcd/
[root@dev-vm1 pki]# scp /etc/kubernetes/admin.conf 192.168.1.249:/etc/kubernetes/
[root@dev-vm1 pki]# scp /etc/kubernetes/admin.conf 192.168.1.125:/etc/kubernetes/

Join the other masters to the cluster

Run the following commands on each node that is to join the cluster.
As when initializing the first master, registry.aliyuncs.com/google_containers/coredns:latest must first be pulled onto the host and re-tagged:

[root@dev-vm2 ~]# docker pull registry.aliyuncs.com/google_containers/coredns:latest
[root@dev-vm2 ~]# docker tag registry.aliyuncs.com/google_containers/coredns:latest registry.aliyuncs.com/google_containers/coredns:v1.8.4

Before running the join command, create the /etc/cni/net.d directory on the node, then run the join command printed by kubeadm init:

[root@dev-vm2 ~]# mkdir -p /etc/cni/net.d
kubeadm join 192.168.1.181:6443 --token 63wy1b.x3rls1juhbll8wgw \
--discovery-token-ca-cert-hash sha256:e72693bfb1f30f27c64445d215722a2773d17af649ac00528cd2d3d4ae687e4a \
--control-plane --certificate-key 6b3d9ffcbdd14880fa8c1481035519775c6853fb69730a0d9bacf159342f2fc2
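
Once the join completes, kubectl can optionally be configured on this master as well, reusing the admin.conf copied over earlier:

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config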

Install Dashboard

Download the Dashboard deployment YAML

[root@dev-test01 ~]# wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.3.1/aio/deploy/recommended.yaml

Note: for network reasons, this file cannot be downloaded directly from mainland China.

Deploy Dashboard

Apply the YAML file downloaded in the previous step:

[root@dev-test01 ~]# kubectl apply -f recommended.yaml
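
Check that the Dashboard pods are running (a verification step, not in the original):

kubectl get pods -n kubernetes-dashboard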

Create an admin user account for the Dashboard UI

Create a file named dashboard-adminuser.yaml with the following content:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard

Apply dashboard-adminuser.yaml to create the account:

[root@dev-test01 ~]# kubectl apply -f dashboard-adminuser.yaml

Get the Bearer token for signing in

kubectl -n kubernetes-dashboard get secret $(kubectl -n kubernetes-dashboard get sa/admin-user -o jsonpath="{.secrets[0].name}") -o go-template="{{.data.token | base64decode}}"

eyJhbGciOiJSUzI1NiIsImtpZCI6IkNVSms5MFVwdGxEOHlYc0g1TmlTVWJsOUtHblQtT0s0TDlFcDdmTFBTMkEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi11c2VyLXRva2VuLTZ0ZG02Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImFkbWluLXVzZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiIzODg5NDk0ZC1jMjU2LTQwYmItODBiMS00YmE5NjRkNTg4NzQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZXJuZXRlcy1kYXNoYm9hcmQ6YWRtaW4tdXNlciJ9.EzTxAMBgCcJ7HWn28NFBj58teXPjdak2ef-nkqIk7sdBej5TYzkOr_oGkVifEb2ZHsXiMdoGAKH7wO4a4m5K628ghLOPVy1Q8GKGWfvdZqTnqt9ALnmVoFgPnfnnWU3IEENQXwKEpfwJrnRB16-CoGep_rqhFAhGd5o7Y0dxw495FH-hEU8qRNWJr9ZzCcRVf03zG2_fO7zoW9xrCKvL-iZHjGCMbv2hif8zYlJTUpKen0kWa-rElJ7eEF3eJSjq0PwtUtPna9-Z-98CqyuTYs5UDJt7_JJ--Dg16ybQEA-e8bKLnzeEGNdJ4ddppO0UJUCn926ohoA12ElIJKqmOw

Deploy Kubernetes Metrics Server

Metrics Server is a scalable, efficient source of container resource metrics for Kubernetes' built-in autoscaling pipelines.

Download the Metrics Server deployment YAML

wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Note: for network reasons, this file cannot be downloaded directly from mainland China.

Modify components.yaml and apply it

In the Deployment section of components.yaml, add the --kubelet-insecure-tls argument to the container's args to disable kubelet certificate verification, as sketched below:
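
A sketch of the change; the exact argument list differs between metrics-server releases, the relevant part is the added --kubelet-insecure-tls line:

    spec:
      containers:
      - name: metrics-server
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-use-node-status-port
        - --kubelet-insecure-tls        # added: skip verification of the kubelet serving certificates

Then apply the file and, after a minute or two, verify that metrics are being collected:

kubectl apply -f components.yaml
kubectl top nodes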


Set up a highly available Kubernetes cluster with an external etcd cluster

Set up the highly available etcd cluster

Setting up a highly available etcd cluster is covered in the etcd documentation and is not described here.

Copy the TLS certificates used to access etcd

Copy the following certificate files from any node of the etcd cluster to the first master node:

/etc/kubernetes/pki/etcd/ca.crt
/etc/kubernetes/pki/apiserver-etcd-client.crt
/etc/kubernetes/pki/apiserver-etcd-client.key
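
For example, assuming the certificates were generated by kubeadm on the etcd hosts and live under /etc/kubernetes/pki there (adjust the paths and the ETCD_0_IP placeholder to your environment):

mkdir -p /etc/kubernetes/pki/etcd
scp root@ETCD_0_IP:/etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/
scp root@ETCD_0_IP:/etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/
scp root@ETCD_0_IP:/etc/kubernetes/pki/apiserver-etcd-client.key /etc/kubernetes/pki/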

Create the kubeadm configuration file

Create a file named kubeadm-config.yaml with the following content:

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT"   # IP (or DNS name) and port of the apiserver load balancer
imageRepository: "registry.aliyuncs.com/google_containers"
etcd:
  external:
    endpoints:
    - https://ETCD_0_IP:2379
    - https://ETCD_1_IP:2379
    - https://ETCD_2_IP:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key

Initialize the first master node of the cluster

Initializing the master, applying the network plugin, having the other masters join the cluster, and the remaining steps are the same as with the internal (stacked) etcd setup. Run the following command:

kubeadm init --config kubeadm-config.yaml --upload-certs

Problems encountered during installation

CoreDNS pods are not healthy

[root@dev-vm1 ~]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-78d6f96c7b-t2xnr 1/1 Running 0 22m
kube-system calico-node-4t2pb 1/1 Running 0 22m
kube-system coredns-545d6fc579-5rqjz 0/1 Running 0 28m
kube-system coredns-545d6fc579-gztfz 0/1 Running 0 28m
kube-system etcd-dev-vm1 1/1 Running 0 28m

The pod logs show the following errors:

[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
E0607 08:24:21.215568 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.1/tools/cache/reflector.go:167: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User "system:serviceaccount:kube-system:coredns" cannot list resource "endpointslices" in API group "discovery.k8s.io" at the cluster scope
[INFO] plugin/ready: Still waiting on: "kubernetes"

From the log it is clearly a permissions (RBAC) problem.
Fix: edit the cluster role system:coredns and add the rules shown below:

[root@dev-vm1 ~]# kubectl edit clusterrole system:coredns
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - list
  - watch

scheduler and controller-manager are reported as unhealthy in the component status

kubectl get componentstatus shows:

[root@dev-vm1 ~]# kubectl  get componentstatus
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
controller-manager Unhealthy Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused
scheduler Unhealthy Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0 Healthy {"health":"true"}

The cause is that the kube-controller-manager.yaml and kube-scheduler.yaml manifests under /etc/kubernetes/manifests/ set the insecure port to 0 via the --port startup argument.
Fix: comment out the --port=0 argument in both files and restart the kubelet service, as follows:

[root@dev-vm1 ~]# vi /etc/kubernetes/manifests/kube-controller-manager.yaml
apiVersion: v1
kind: Pod
....
spec:
  containers:
  - command:
    - kube-controller-manager
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-name=kubernetes
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
#   - --port=0    # comment out this line
[root@dev-vm1 ~]# vi /etc/kubernetes/manifests/kube-scheduler.yaml
apiVersion: v1
kind: Pod
...
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
#   - --port=0    # comment out this line
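
After commenting out the flag in both manifests, restart kubelet; the static pods are recreated automatically:

systemctl restart kubelet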

kubelet logs errors

systemctl status kubelet -l shows error logs like the following:

Jun 07 22:07:02 dev-vm3 kubelet[43654]: I0607 22:07:02.184088   43654 container_manager_linux.go:995] "CPUAccounting not enabled for process" pid=43654
Jun 07 22:07:02 dev-vm3 kubelet[43654]: I0607 22:07:02.184098 43654 container_manager_linux.go:998] "MemoryAccounting not enabled for process" pid=43654
Jun 07 22:07:03 dev-vm3 kubelet[43654]: E0607 22:07:03.870531 43654 summary_sys_containers.go:47] "Failed to get system container stats" err="failed to get cgroup stats for \"/system.slice/kubelet.service\": failed to get container info for \"/system.slice/kubelet.service\": unknown container \"/system.slice/kubelet.service\"" containerName="/system.slice/kubelet.service"
Jun 07 22:07:03 dev-vm3 kubelet[43654]: E0607 22:07:03.870594 43654 summary_sys_containers.go:47] "Failed to get system container stats" err="failed to get cgroup stats for \"/system.slice/docker.service\": failed to get container info for \"/system.slice/docker.service\": unknown container \"/system.slice/docker.service\"" containerName="/system.slice/docker.service"
Jun 07 22:07:13 dev-vm3 kubelet[43654]: E0607 22:07:13.959837 43654 summary_sys_containers.go:47] "Failed to get system container stats" err="failed to get cgroup stats for \"/system.slice/kubelet.service\": failed to get container info for \"/system.slice/kubelet.service\": unknown container \"/system.slice/kubelet.service\"" containerName="/system.slice/kubelet.service"
Jun 07 22:07:13 dev-vm3 kubelet[43654]: E0607 22:07:13.959873 43654 summary_sys_containers.go:47] "Failed to get system container stats" err="failed to get cgroup stats for \"/system.slice/docker.service\": failed to get container info for \"/system.slice/docker.service\": unknown container \"/system.slice/docker.service\"" containerName="/system.slice/docker.service"
Jun 07 22:07:24 dev-vm3 kubelet[43654]: E0607 22:07:24.068455 43654 summary_sys_containers.go:47] "Failed to get system container stats" err="failed to get cgroup stats for \"/system.slice/kubelet.service\": failed to get container info for \"/system.slice/kubelet.service\": unknown container \"/system.slice/kubelet.service\"" containerName="/system.slice/kubelet.service"

On CentOS, kubelet collects node resource statistics at startup, which requires the corresponding accounting options to be enabled in systemd. If they are not set, kubelet cannot run the statistics and keeps logging the errors above.
Fix: add the following options to the kubelet service drop-in configuration file:

[root@dev-vm3 ~]# vi /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
[Service]
CPUAccounting=true
MemoryAccounting=true
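
Reload systemd and restart kubelet so the drop-in change takes effect:

systemctl daemon-reload
systemctl restart kubelet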