Handling expired kubeadm and etcd certificates

Symptoms

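A test-environment cluster deployed with kubeadm had been running for a year when, today, the Kubernetes API became unreachable. Every kubectl command that fetched resources returned the following error: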

[root@master35 ~]# kubectl get nodes
Unable to connect to the server: x509: certificate has expired or is not yet valid

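The error pointed to an expired certificate, and checking the certificate's validity dates with the following command confirmed it: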

openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text |grep ' Not '
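
The same check can be run over every certificate at once. The following is a minimal sketch, assuming the default kubeadm certificate directory /etc/kubernetes/pki and an openssl binary on the PATH; the function name report_cert_expiry is made up for illustration and is not part of kubeadm.

```shell
#!/usr/bin/env bash
# Print the expiry date of every .crt file in a directory and flag expired ones.
# report_cert_expiry is a hypothetical helper name, not a kubeadm command.
report_cert_expiry() {
    local dir="$1" crt end
    for crt in "$dir"/*.crt "$dir"/etcd/*.crt; do
        [ -e "$crt" ] || continue          # skip glob patterns that matched nothing
        end=$(openssl x509 -in "$crt" -noout -enddate | cut -d= -f2)
        # -checkend 0 exits non-zero if the certificate has already expired
        if openssl x509 -in "$crt" -noout -checkend 0 >/dev/null; then
            echo "OK       $crt (expires $end)"
        else
            echo "EXPIRED  $crt (expired $end)"
        fi
    done
}

# Assumed default location; pass another directory as the first argument.
report_cert_expiry "${1:-/etc/kubernetes/pki}"
```

Listing all certificates in one pass makes it easier to see whether only the apiserver certificate has expired or whether the other certificates share the same expiry date.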

Replacing the apiserver certificates

Log in to the master node:

cd /etc/kubernetes
# Back up the certificates and config files
mkdir ./pki_bak
mkdir ./conf_bak
mv pki/apiserver* ./pki_bak/
mv pki/front-proxy-client.* ./pki_bak/
mv ./admin.conf ./conf_bak/
mv ./kubelet.conf ./conf_bak/
mv ./controller-manager.conf ./conf_bak/
mv ./scheduler.conf ./conf_bak/

# Create the certificates
kubeadm alpha phase certs apiserver --apiserver-advertise-address ${MASTER_API_SERVER_IP}
kubeadm alpha phase certs apiserver-kubelet-client
kubeadm alpha phase certs front-proxy-client

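# Google is blocked on this network, so the commands above cannot complete and
# fail with an error; run them with a config file instead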
kubeadm alpha phase certs apiserver --config /root/yaml/kubeadm-config.yaml
kubeadm alpha phase certs apiserver-kubelet-client --config /root/yaml/kubeadm-config.yaml
kubeadm alpha phase certs front-proxy-client --config /root/yaml/kubeadm-config.yaml

# Generate the new kubeconfig files
kubeadm alpha phase kubeconfig all --config /root/yaml/kubeadm-config.yaml

# Replace the existing admin kubeconfig with the newly generated one
mv $HOME/.kube/config $HOME/.kube/config.old
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
# The file holds admin credentials, so keep it readable by the owner only
chmod 600 $HOME/.kube/config
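
Before restarting anything, it can help to confirm that the regenerated kubeconfig really carries a fresh certificate. The sketch below assumes the kubeconfig embeds the certificate in a client-certificate-data field, as kubeadm-generated admin.conf files do; kubeconfig_client_cert is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Decode the client certificate embedded in a kubeconfig file and print it as PEM.
# kubeconfig_client_cert is a hypothetical helper name.
kubeconfig_client_cert() {
    # The field looks like "    client-certificate-data: <base64>"
    awk '$1 == "client-certificate-data:" { print $2; exit }' "$1" | base64 -d
}

# Assumed default path; pass another kubeconfig as the first argument.
conf="${1:-/etc/kubernetes/admin.conf}"
if [ -r "$conf" ]; then
    # Show the validity window (notBefore / notAfter) of the embedded certificate
    kubeconfig_client_cert "$conf" | openssl x509 -noout -dates
fi
```

If notAfter still shows the old date, the kubeconfig was not regenerated and the backup step should be rechecked.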

After completing the steps above, use docker restart to restart the three containers kube-apiserver, kube-controller-manager, and kube-scheduler.
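
The restart can be scripted roughly as follows. This is a sketch under one assumption: that the kubelet's Docker integration names containers k8s_<container>_<pod>_..., so that filtering on k8s_kube-apiserver matches the apiserver container but not its pause container. The function name restart_control_plane is hypothetical.

```shell
#!/usr/bin/env bash
# Restart the control-plane containers by name prefix.
# Assumption: containers are named k8s_<container>_<pod>_..., so the
# k8s_ prefix excludes the k8s_POD_* pause containers.
restart_control_plane() {
    local name ids
    for name in kube-apiserver kube-controller-manager kube-scheduler; do
        ids=$(docker ps -q --filter "name=k8s_${name}")
        if [ -n "$ids" ]; then
            # $ids is intentionally unquoted: one argument per container ID
            docker restart $ids
        else
            echo "no running container found for ${name}" >&2
        fi
    done
}
```

Calling restart_control_plane on the master restarts all three components in turn and reports any component whose container was not found.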

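If there are multiple master nodes, first back up the certificates and config files on each of them as shown above, then scp the certificates and config files from the master that has just been fixed to the others.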

Verification

kubectl still could not list resources, so check the apiserver log with docker logs:

1 customresource_discovery_controller.go:156] Shutting down DiscoveryController
1 available_controller.go:266] Shutting down AvailableConditionController
1 crdregistration_controller.go:115] Shutting down crd-autoregister controller

This suggested that the etcd certificates were the cause.

Handling the expired etcd certificates

The etcd certificate signing config shows an expiry of 8760h, that is, one year:

cat config.json
{
  "signing": {
    "default": {
      "expiry": "8760h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "8760h"
      }
    }
  }
}
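
To make the expiry values easier to read, the hour counts can be converted to days. This is a small illustrative sketch; expiry_days is a hypothetical helper, not part of cfssl.

```shell
#!/usr/bin/env bash
# Convert a cfssl expiry such as "8760h" into whole days.
# expiry_days is a hypothetical helper name.
expiry_days() {
    local hours="${1%h}"      # strip the trailing "h"
    echo $(( hours / 24 ))
}

expiry_days 8760h     # prints 365  (one year)
expiry_days 87600h    # prints 3650 (roughly ten years)
```

An expiry of 8760h is exactly 365 days, which matches a cluster that fails after running for a year.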

First, back up the etcd data:

cd /var/lib
tar -zvcf etcd.tar.gz etcd/
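
A slightly more defensive variant of the backup adds a timestamp to the archive name and checks that the archive is readable. This is only a sketch; backup_dir is a hypothetical helper, and writing the archive to /var/lib mirrors the command above rather than being a recommendation.

```shell
#!/usr/bin/env bash
# Archive a directory into a timestamped tarball and verify it can be listed.
# backup_dir is a hypothetical helper; arguments: <source dir> <destination dir>.
backup_dir() {
    local src="$1" dest="$2" archive
    archive="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps paths inside the archive relative to the parent directory
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Listing the archive catches a truncated or corrupt file early
    tar -tzf "$archive" >/dev/null || return 1
    echo "$archive"
}

if [ -d /var/lib/etcd ]; then
    backup_dir /var/lib/etcd /var/lib
fi
```

The timestamp keeps repeated backups from overwriting each other, which is useful if the procedure has to be retried.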

Edit the CA config file and change the default certificate signing expiry to 10 years:

[root@master35 etcd]# cat ca-config.json
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "87600h"
      }
    }
  }
}
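
The edit itself can be made with sed instead of by hand. The following sketch assumes the file is named ca-config.json and sits in the current directory; set_ten_year_expiry is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Rewrite every "8760h" expiry in a cfssl config to "87600h", keeping a .bak copy.
# set_ten_year_expiry is a hypothetical helper name.
set_ten_year_expiry() {
    local file="$1"
    cp "$file" "$file.bak"
    # The quotes are part of the pattern, so an already-updated "87600h" is left alone
    sed 's/"8760h"/"87600h"/g' "$file.bak" > "$file"
    # Print the resulting expiry lines for a quick visual check
    grep -n '"expiry"' "$file"
}

if [ -f ca-config.json ]; then
    set_ten_year_expiry ca-config.json
fi
```

Matching the value together with its quotes makes the substitution safe to run twice, since "87600h" does not contain the quoted string "8760h".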

Generate the new certificates:

# Delete the expired certificates
rm -f /etc/etcd/ssl/*

# Create the new certificates
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=config.json -profile=kubernetes etcd-csr.json | cfssljson -bare etcd
cp etcd.pem etcd-key.pem ca.pem /etc/etcd/ssl/
# Copy them to the other etcd nodes
scp -r /etc/etcd/ssl root@${other_node}:/etc/etcd/

# Restart the etcd service (remember: all 3 nodes must be restarted together, otherwise the restart hangs)
systemctl restart etcd
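
Because the members need to restart together, the restart can be issued to all nodes in parallel from one machine. This is a sketch only: it assumes passwordless SSH as root to each etcd node, and both the function name restart_etcd_all and the host names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Restart etcd on every member at roughly the same time.
# restart_etcd_all is a hypothetical helper; it assumes passwordless SSH as root.
restart_etcd_all() {
    local host pid pids="" failed=0
    for host in "$@"; do
        # Run each restart in the background so the members come up together
        ssh "root@${host}" systemctl restart etcd &
        pids="$pids $!"
    done
    for pid in $pids; do
        wait "$pid" || failed=1     # collect failures instead of stopping early
    done
    return "$failed"
}
```

For example, restart_etcd_all etcd1 etcd2 etcd3 (placeholder host names) returns non-zero if the restart failed on any node.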

Once the etcd certificates have been replaced successfully, restart the three containers kube-apiserver, kube-controller-manager, and kube-scheduler again.
