Setting up LXCFS on Kubernetes 1.12 and earlier

Introduction to LXCFS

A common approach in the community is to use lxcfs to provide resource visibility inside containers. lxcfs is an open-source FUSE (userspace filesystem) implementation built to support LXC containers, and it works with Docker containers as well.

Through this userspace filesystem, LXCFS provides the following procfs files inside the container:

/proc/cpuinfo
/proc/diskstats
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime

For example, after the host's /var/lib/lxcfs/proc/meminfo file is mounted at /proc/meminfo inside a Docker container, a process in the container that reads the file gets values that LXCFS's FUSE implementation computes from the container's own cgroup, i.e. the correct memory limit. The application therefore sees the resource constraints that were actually configured.
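To make the mechanism concrete, here is a minimal Go sketch of the idea behind the FUSE handler; this is my own illustration, not lxcfs code. It assumes cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory and only computes MemTotal; the real lxcfs resolves the cgroup of the calling process and synthesizes the whole file.

package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// Read the memory limit enforced by the container's cgroup (cgroup v1).
	raw, err := os.ReadFile("/sys/fs/cgroup/memory/memory.limit_in_bytes")
	if err != nil {
		panic(err)
	}
	limit, err := strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
	if err != nil {
		panic(err)
	}
	// /proc/meminfo reports kilobytes, so a 256MB limit appears as 262144 kB.
	fmt.Printf("MemTotal:       %d kB\n", limit/1024)
}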

Using LXCFS with Docker

Install the lxcfs RPM package:

wget https://copr-be.cloud.fedoraproject.org/results/ganto/lxd/epel-7-x86_64/00486278-lxcfs/lxcfs-2.0.5-3.el7.centos.x86_64.rpm
yum install lxcfs-2.0.5-3.el7.centos.x86_64.rpm

Start lxcfs:

lxcfs /var/lib/lxcfs &

Test

docker run -it -m 256m \
-v /var/lib/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw \
-v /var/lib/lxcfs/proc/diskstats:/proc/diskstats:rw \
-v /var/lib/lxcfs/proc/meminfo:/proc/meminfo:rw \
-v /var/lib/lxcfs/proc/stat:/proc/stat:rw \
-v /var/lib/lxcfs/proc/swaps:/proc/swaps:rw \
-v /var/lib/lxcfs/proc/uptime:/proc/uptime:rw \
ubuntu:16.04 /bin/bash

Result

[root@node1 ~]# docker run -it -m 256m \
> -v /var/lib/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw \
> -v /var/lib/lxcfs/proc/diskstats:/proc/diskstats:rw \
> -v /var/lib/lxcfs/proc/meminfo:/proc/meminfo:rw \
> -v /var/lib/lxcfs/proc/stat:/proc/stat:rw \
> -v /var/lib/lxcfs/proc/swaps:/proc/swaps:rw \
> -v /var/lib/lxcfs/proc/uptime:/proc/uptime:rw \
> ubuntu:16.04 /bin/bash
root@6bcd804eef79:/# free -m
              total        used        free      shared  buff/cache   available
Mem:            256           0         254         189           0         254
Swap:           256           0         256
root@6bcd804eef79:/#

We can see that the total memory is 256MB: the configuration has taken effect.

LXCFS in practice on Kubernetes

Using lxcfs in Kubernetes means solving two problems:

The first is running lxcfs on every node. This is straightforward: deploy a DaemonSet.

The second is mounting the /proc files maintained by lxcfs into every container.

To install and start lxcfs on the cluster nodes, we will do it the Kubernetes way and run the lxcfs FUSE filesystem as a container managed by a DaemonSet.

git clone https://github.com/denverdino/lxcfs-initializer
cd lxcfs-initializer

lxcfs-daemonset.yaml:

apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: lxcfs
  labels:
    app: lxcfs
spec:
  selector:
    matchLabels:
      app: lxcfs
  template:
    metadata:
      labels:
        app: lxcfs
    spec:
      hostPID: true
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: lxcfs
        image: registry.cn-hangzhou.aliyuncs.com/denverdino/lxcfs:3.0.4
        imagePullPolicy: Always
        securityContext:
          privileged: true
        volumeMounts:
        - name: cgroup
          mountPath: /sys/fs/cgroup
        - name: lxcfs
          mountPath: /var/lib/lxcfs
          mountPropagation: Bidirectional
        - name: usr-local
          mountPath: /usr/local
      volumes:
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
      - name: usr-local
        hostPath:
          path: /usr/local
      - name: lxcfs
        hostPath:
          path: /var/lib/lxcfs
          type: DirectoryOrCreate

kubectl apply -f lxcfs-daemonset.yaml

Kubernetes provides the Initializer extension mechanism, which can intercept resource creation and inject processing; with it we can automate the lxcfs file mounts elegantly.

lxcfs-initializer

Enabling the Initializer feature

In Kubernetes 1.13, initializers is still an alpha feature and must be enabled with kube-apiserver flags.

Kubernetes 1.12 is used here; the setup is identical:

--enable-admission-plugins="Initializers,NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota"
--runtime-config=admissionregistration.k8s.io/v1alpha1

--enable-admission-plugins and --admission-control are mutually exclusive; if both are set, kube-apiserver fails to start with:

error: [admission-control and enable-admission-plugins/disable-admission-plugins flags are mutually exclusive, 
enable-admission-plugins plugin "--runtime-config=admissionregistration.k8s.io/v1alpha1" is unknown]

InitializerConfiguration

An InitializerConfiguration resource defines a set of initializers.

Each initializer has a name and a set of rules, and the rules identify the resources it applies to. For example, the configuration below contains a single initializer, named podimage.example.com, which applies to v1 pods.

apiVersion: admissionregistration.k8s.io/v1alpha1
kind: InitializerConfiguration
metadata:
  name: example-config
initializers:
  # the name needs to be fully qualified, i.e., containing at least two "."
  - name: podimage.example.com
    rules:
      # apiGroups, apiVersion, resources all support wildcard "*".
      # "*" cannot be mixed with non-wildcard.
      - apiGroups:
          - ""
        apiVersions:
          - v1
        resources:
          - pods

Once the initializers above have been created in Kubernetes, each newly created pod carries an initializer list in its metadata while in the pending phase:

metadata:
  creationTimestamp: 2019-01-09T08:56:36Z
  generateName: echo-7cfbbd7d49-
  initializers:
    pending:
    - name: podimage.example.com

Note that the flag --include-uninitialized=true is required to see pods in this phase:

./kubectl.sh -n demo-echo get pod --include-uninitialized=true -o yaml

A pod whose metadata still carries a non-empty initializers list is waiting to be initialized; an initializer controller must be deployed to complete the initialization before such pods can leave the pending state.

The initializer controller must be implemented yourself, according to your needs.

Initializer Controller

An initializer controller watches a particular resource type. When it sees a newly created resource, it checks the initializer list in the resource's metadata to decide whether to apply its initialization; once done, it must remove the corresponding entry from that list, otherwise the resource remains stuck waiting for initialization.

See lxcfs-initializer for a concrete implementation.
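The core pattern is roughly the following sketch, modeled on lxcfs-initializer and similar controllers; the function signature is simplified and the names are my own. The controller acts only when its initializer is first in the pending list, removes itself, mutates the object, and writes it back (lxcfs-initializer itself submits a patch rather than calling Update):

const initializerName = "lxcfs.initializer.kubernetes.io"

// initializeDeployment assumes v1 is k8s.io/api/apps/v1 and clientset is a
// *kubernetes.Clientset from a client-go of the same era (pre-context Update).
func initializeDeployment(deployment *v1.Deployment, clientset *kubernetes.Clientset) error {
	if deployment.ObjectMeta.Initializers == nil {
		return nil
	}
	pending := deployment.ObjectMeta.Initializers.Pending
	// By convention a controller acts only when its initializer is first in line.
	if len(pending) == 0 || pending[0].Name != initializerName {
		return nil
	}
	initialized := deployment.DeepCopy()
	if len(pending) == 1 {
		// An empty pending list must become nil, or the object stays uninitialized.
		initialized.ObjectMeta.Initializers = nil
	} else {
		initialized.ObjectMeta.Initializers.Pending = pending[1:]
	}
	// ... mutate initialized.Spec here, e.g. inject the lxcfs volumeMounts ...
	_, err := clientset.AppsV1().Deployments(deployment.Namespace).Update(initialized)
	return err
}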

What happens if there are multiple InitializerConfigurations and multiple initializer controllers?

I did not find this spelled out in the documentation (the Initializers section of the Kubernetes docs is brief), so let's settle it by experiment.

I created two InitializerConfigurations with different names but containing the same rule:

apiVersion: admissionregistration.k8s.io/v1alpha1
kind: InitializerConfiguration
metadata:
  name: example-config
initializers:
  # the name needs to be fully qualified, i.e., containing at least two "."
  - name: podimage.example.com
    rules:
      # apiGroups, apiVersion, resources all support wildcard "*".
      # "*" cannot be mixed with non-wildcard.
      - apiGroups:
          - ""
        apiVersions:
          - v1
        resources:
          - pods
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: InitializerConfiguration
metadata:
  name: example-config-2
initializers:
  # the name needs to be fully qualified, i.e., containing at least two "."
  - name: podimage-2.example.com
    rules:
      # apiGroups, apiVersion, resources all support wildcard "*".
      # "*" cannot be mixed with non-wildcard.
      - apiGroups:
          - ""
        apiVersions:
          - v1
        resources:
          - pods
  - name: podimage.example.com
    rules:
      # apiGroups, apiVersion, resources all support wildcard "*".
      # "*" cannot be mixed with non-wildcard.
      - apiGroups:
          - ""
        apiVersions:
          - v1
        resources:
          - pods

The pod's metadata then looks like this:

metadata:
  creationTimestamp: 2019-01-10T04:03:12Z
  generateName: echo-7cfbbd7d49-
  initializers:
    pending:
    - name: podimage.example.com
    - name: podimage-2.example.com
    - name: podimage.example.com

After repeated experiments varying the lexicographic order of the InitializerConfiguration names, their creation order, and the order of the entries inside them, it turns out that multiple InitializerConfigurations show up in metadata sorted by name, independent of creation time.

Within a single InitializerConfiguration, the initializer entries appear in metadata in the order they are defined.

According to the lxcfs-initializer implementation and the Kubernetes docs, after an initializer controller finishes configuring a target resource, it must remove the corresponding initializer from the resource's metadata.

If multiple initializers are defined and several initializer controllers each handle a different one, the design needs care: you must neither "miss" resources that should be processed, leaving them stuck and never scheduled, nor re-add an initializer that was already removed, which would cause errors from duplicate processing.

Based on what we know so far, the safer approach is to design the initializers to be order-independent, so that any of them may run first; failing that, create only a single InitializerConfiguration, since the order of its entries is the order of the initializers. Likewise, implement a single initializer controller, or make multiple controllers run serially, and have them watch for resource changes as well as creation; otherwise some initializers may be skipped.

The lxcfs-initializer controller only handles ADD events, so if other initializer controllers exist alongside it, some resources may be missed (a mitigation is sketched after the snippet):

// includeUninitializedWatchlist is a ListWatch created with
// IncludeUninitialized: true, so the informer also sees objects whose
// metadata still carries a pending initializer list.
_, controller := cache.NewInformer(includeUninitializedWatchlist, &v1.Deployment{}, resyncPeriod,
	cache.ResourceEventHandlerFuncs{
		// Only AddFunc is registered: the controller reacts to newly
		// created deployments, not to later changes.
		AddFunc: func(obj interface{}) {
			err := initializeDeployment(obj.(*v1.Deployment), c, clientset)
			if err != nil {
				log.Println(err)
			}
		},
	},
)
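If missing resources is a concern, one mitigation, sketched here with the same initializeDeployment helper as above, is to register UpdateFunc as well, so objects are re-examined on every change rather than only on creation:

cache.ResourceEventHandlerFuncs{
	AddFunc: func(obj interface{}) {
		if err := initializeDeployment(obj.(*v1.Deployment), c, clientset); err != nil {
			log.Println(err)
		}
	},
	// Re-check on updates too, e.g. when another controller has just removed
	// its own initializer and ours has moved to the front of the pending list.
	UpdateFunc: func(oldObj, newObj interface{}) {
		if err := initializeDeployment(newObj.(*v1.Deployment), c, clientset); err != nil {
			log.Println(err)
		}
	},
}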

Example:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: lxcfs-initializer-default
  namespace: default
rules:
- apiGroups: ["*"]
  resources: ["pods"]
  verbs: ["initialize", "update", "patch", "watch", "list"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: lxcfs-initializer-service-account
  namespace: default
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: lxcfs-initializer-role-binding
subjects:
- kind: ServiceAccount
  name: lxcfs-initializer-service-account
  namespace: default
roleRef:
  kind: ClusterRole
  name: lxcfs-initializer-default
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  initializers:
    pending: []
  labels:
    app: lxcfs-initializer
  name: lxcfs-initializer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: lxcfs-initializer
  template:
    metadata:
      labels:
        app: lxcfs-initializer
    spec:
      serviceAccountName: lxcfs-initializer-service-account
      containers:
        - name: lxcfs-initializer
          image: registry.cn-hangzhou.aliyuncs.com/denverdino/lxcfs-initializer:0.0.4
          imagePullPolicy: Always
          args:
            - "-annotation=initializer.kubernetes.io/lxcfs"
            - "-require-annotation=true"
---
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: InitializerConfiguration
metadata:
  name: lxcfs.initializer
initializers:
  - name: lxcfs.initializer.kubernetes.io
    rules:
      - apiGroups:
          - "*"
        apiVersions:
          - "*"
        resources:
          - pods

First we create the service account lxcfs-initializer-service-account and grant it permissions to list, watch, update, and patch "pod" resources. Then we deploy a Deployment named "lxcfs-initializer" that uses this service account to run a container handling the creation of "pod" resources: if a deployment carries the annotation initializer.kubernetes.io/lxcfs: "true", the lxcfs files are mounted into that application's containers.
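For reference, the mutation the controller performs is essentially the Kubernetes equivalent of the docker run -v flags used earlier. A hedged sketch follows, showing only /proc/meminfo; the real controller mounts all six lxcfs files, and the helper name here is hypothetical:

// injectLxcfs appends a hostPath volume for the lxcfs-maintained file and
// mounts it over /proc/meminfo in every container (corev1 is k8s.io/api/core/v1).
func injectLxcfs(spec *corev1.PodSpec) {
	fileType := corev1.HostPathFile
	spec.Volumes = append(spec.Volumes, corev1.Volume{
		Name: "lxcfs-proc-meminfo",
		VolumeSource: corev1.VolumeSource{
			HostPath: &corev1.HostPathVolumeSource{
				// Maintained on every node by the lxcfs DaemonSet.
				Path: "/var/lib/lxcfs/proc/meminfo",
				Type: &fileType,
			},
		},
	})
	for i := range spec.Containers {
		spec.Containers[i].VolumeMounts = append(spec.Containers[i].VolumeMounts,
			corev1.VolumeMount{
				Name:      "lxcfs-proc-meminfo",
				MountPath: "/proc/meminfo",
			})
	}
}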

kubectl apply -f lxcfs-initializer.yaml

Next we deploy a simple Apache application.

It is allocated 256MB of memory and declares the annotation "initializer.kubernetes.io/lxcfs": "true".

web.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      annotations:
        "initializer.kubernetes.io/lxcfs": "true"
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: httpd:2.4.32
          imagePullPolicy: Always
          resources:
            requests:
              memory: "256Mi"
              cpu: "500m"
            limits:
              memory: "256Mi"
              cpu: "500m"

kubectl apply -f web.yaml

Verification

kubectl exec web-7f6bc6797c-rb9sk free
             total       used       free     shared    buffers     cached
Mem:        262144       2876     259268       2292          0        304
-/+ buffers/cache:       2572     259572
Swap:            0          0          0

Note:

If your Kubernetes version differs from the one used here, run kubectl api-versions to check which API versions are available.
