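k8s garbage collector analysis (1): startup analysis
Introduction to the garbage collector
The Kubernetes garbage collector runs inside kube-controller-manager. It reclaims resource objects in Kubernetes: it watches resource object events, keeps the dependency relationships between objects up to date, and decides from an object's deletion policy whether its associated objects should be deleted as well.
More precisely, when an owner is deleted with the cascading deletion policy, the owner's dependent objects are deleted along with it.
To track these relationships, the garbage collector watches resource object events and uses the ownerReferences value of each object to build the dependency relationships between objects, that is, the relationships between owner and dependent.
About owner and dependent
Take the creation of a deployment object as an example.
After a deployment object is created, kube-controller-manager creates a replicaset object for it and automatically records the deployment's information in the replicaset's ownerReferences. In the example below, the owner of replicaset test-1-59d7f45ffb is deployment test-1, and the dependent of deployment test-1 is replicaset test-1-59d7f45ffb.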
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-1
  namespace: test
  uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
...
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: test-1-59d7f45ffb
  namespace: test
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: Deployment
    name: test-1
    uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
  uid: 386c380b-490e-470b-a33f-7d5b0bf945fb
...
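Likewise, after the replicaset object is created, kube-controller-manager creates pod objects for it, and each pod records the replicaset's information in its ownerReferences: the replicaset is the owner of the pods, and the pods are dependents of the replicaset.
The ownerReferences value of an object specifies the relationship between owner and dependent.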
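The owner/dependent check can be sketched in a few lines of Go. This is a minimal illustration only: the Object and OwnerReference types and the IsDependentOf helper are hypothetical, simplified stand-ins and not the real Kubernetes API types. The sketch assumes that the match is made on the owner's UID.

```go
package main

import "fmt"

// OwnerReference is a simplified, hypothetical stand-in for one entry
// of an object's ownerReferences list.
type OwnerReference struct {
	Kind string
	Name string
	UID  string
}

// Object is a simplified, hypothetical stand-in for an object's metadata.
type Object struct {
	Kind            string
	Name            string
	UID             string
	OwnerReferences []OwnerReference
}

// IsDependentOf reports whether obj lists owner in its ownerReferences.
// This sketch assumes the comparison is done on the UID.
func IsDependentOf(obj, owner Object) bool {
	for _, ref := range obj.OwnerReferences {
		if ref.UID == owner.UID {
			return true
		}
	}
	return false
}

func main() {
	deploy := Object{Kind: "Deployment", Name: "test-1", UID: "4973d370-3221-46a7-8d86-e145bf9ad0ce"}
	rs := Object{
		Kind: "ReplicaSet",
		Name: "test-1-59d7f45ffb",
		UID:  "386c380b-490e-470b-a33f-7d5b0bf945fb",
		OwnerReferences: []OwnerReference{
			{Kind: deploy.Kind, Name: deploy.Name, UID: deploy.UID},
		},
	}
	fmt.Println("replicaset depends on deployment:", IsDependentOf(rs, deploy))
	fmt.Println("deployment depends on replicaset:", IsDependentOf(deploy, rs))
}
```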
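garbage collector architecture
The most important code in the garbage collector lives in two files, garbagecollector.go and graph_builder.go.
The garbage collector consists of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector) and 3 event queues (graphChanges, attemptToDelete and attemptToOrphan):
1 graph
(1) uidToNode: the object dependency graph, maintained by GraphBuilder, which records the dependency relationships between all objects. Every k8s object corresponds to one node in the graph, and each node maintains a list of owners and a list of dependents.
Example: given deployment A, replicaset B (owner: deployment A) and pod C (owner: replicaset B), the dependency graph looks like this:
There are 3 nodes: A, B and C.
A corresponds to a node with no owner and with B in its dependent list;
B corresponds to a node with A in its owner list and C in its dependent list;
C corresponds to a node with B in its owner list and no dependents.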
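The A/B/C example can be reproduced with a small in-memory graph. The sketch below is hypothetical and heavily simplified: the node and graph types and their methods are illustrative names, not the actual uidToNode implementation. It also assumes that an owner that has not been observed yet gets an empty placeholder node.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a simplified, hypothetical graph node: it records the UIDs of
// its owners and pointers to its direct dependents.
type node struct {
	uid        string
	owners     []string
	dependents map[string]*node
}

// graph maps an object's UID to its node, in the spirit of uidToNode.
type graph map[string]*node

// getOrCreate returns the node for uid, creating an empty one if needed.
func (g graph) getOrCreate(uid string) *node {
	n, ok := g[uid]
	if !ok {
		n = &node{uid: uid, dependents: map[string]*node{}}
		g[uid] = n
	}
	return n
}

// addObject records an object and links it into each owner's dependents.
func (g graph) addObject(uid string, ownerUIDs ...string) {
	n := g.getOrCreate(uid)
	n.owners = ownerUIDs
	for _, ownerUID := range ownerUIDs {
		g.getOrCreate(ownerUID).dependents[uid] = n
	}
}

// dependentsOf returns the sorted UIDs of the direct dependents of uid.
func (g graph) dependentsOf(uid string) []string {
	n, ok := g[uid]
	if !ok {
		return nil
	}
	out := make([]string, 0, len(n.dependents))
	for depUID := range n.dependents {
		out = append(out, depUID)
	}
	sort.Strings(out)
	return out
}

func main() {
	g := graph{}
	g.addObject("A")      // deployment A, no owner
	g.addObject("B", "A") // replicaset B, owned by A
	g.addObject("C", "B") // pod C, owned by B
	for _, uid := range []string{"A", "B", "C"} {
		fmt.Printf("%s owners=%v dependents=%v\n", uid, g[uid].owners, g.dependentsOf(uid))
	}
}
```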
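2 processors
(1) GraphBuilder: maintains the dependency graph of all objects and produces the events that trigger GarbageCollector to reclaim objects. GraphBuilder consumes events from the graphChanges queue and uses the ownerReferences value of each object to build, update and delete entries in the dependency graph, that is, the graph of owner and dependent relationships. Acting as a producer, it then puts events into the attemptToDelete or attemptToOrphan queue to trigger GarbageCollector, which checks whether associated objects need to be reclaimed. GarbageCollector relies on the uidToNode graph when it does so.
(2) GarbageCollector: reclaims and deletes objects. Acting as a consumer, GarbageCollector takes events from the attemptToDelete and attemptToOrphan queues and handles them. If an object has been deleted and its deletion policy is cascading deletion, its associated objects are reclaimed as well; in other words, deleting an owner with the cascading deletion policy also deletes the owner's dependent objects.
3 event queues
(1) graphChanges: events obtained by list/watch against the apiserver; produced by the informers and consumed by GraphBuilder;
(2) attemptToDelete: the cascading deletion event queue; produced by GraphBuilder and consumed by GarbageCollector;
(3) attemptToOrphan: the orphan deletion event queue; produced by GraphBuilder and consumed by GarbageCollector.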
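The producer/consumer flow between the three queues can be sketched with plain Go channels. This is a rough illustration only: the real queues are rate-limited work queues, and the event type, the policy field and the routing rule below are assumptions made for the sketch, not the actual GraphBuilder logic.

```go
package main

import "fmt"

// policy is a hypothetical marker for how a deleted owner should be handled.
type policy int

const (
	none    policy = iota // nothing to do for dependents
	cascade               // dependents should be deleted
	orphan                // dependents should be orphaned
)

// event is a simplified, hypothetical graphChanges entry.
type event struct {
	uid    string
	policy policy
}

// runGraphBuilder consumes graphChanges and produces into the two
// downstream queues, then closes them once the input is exhausted.
func runGraphBuilder(graphChanges <-chan event, attemptToDelete, attemptToOrphan chan<- string) {
	for ev := range graphChanges {
		switch ev.policy {
		case cascade:
			attemptToDelete <- ev.uid
		case orphan:
			attemptToOrphan <- ev.uid
		}
	}
	close(attemptToDelete)
	close(attemptToOrphan)
}

// pipeline wires the three queues together and returns what the
// consumer side (the GarbageCollector role) would receive.
func pipeline(events []event) (toDelete, toOrphan []string) {
	// Buffers are sized so that the producer never blocks in this sketch.
	graphChanges := make(chan event, len(events))
	attemptToDelete := make(chan string, len(events))
	attemptToOrphan := make(chan string, len(events))

	for _, ev := range events {
		graphChanges <- ev
	}
	close(graphChanges)

	runGraphBuilder(graphChanges, attemptToDelete, attemptToOrphan)

	for uid := range attemptToDelete {
		toDelete = append(toDelete, uid)
	}
	for uid := range attemptToOrphan {
		toOrphan = append(toOrphan, uid)
	}
	return toDelete, toOrphan
}

func main() {
	toDelete, toOrphan := pipeline([]event{
		{uid: "A", policy: cascade},
		{uid: "B", policy: none},
		{uid: "C", policy: orphan},
	})
	fmt.Println("attemptToDelete:", toDelete)
	fmt.Println("attemptToOrphan:", toOrphan)
}
```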
Startup flags related to the garbage collector
The kcm (kube-controller-manager) startup flags related to the garbage collector are registered by the following code:
// cmd/kube-controller-manager/app/options/garbagecollectorcontroller.go
// AddFlags adds flags related to GarbageCollectorController for controller manager to the specified FlagSet.
func (o *GarbageCollectorControllerOptions) AddFlags(fs *pflag.FlagSet) {
	if o == nil {
		return
	}
	fs.Int32Var(&o.ConcurrentGCSyncs, "concurrent-gc-syncs", o.ConcurrentGCSyncs, "The number of garbage collector workers that are allowed to sync concurrently.")
	fs.BoolVar(&o.EnableGarbageCollector, "enable-garbage-collector", o.EnableGarbageCollector, "Enables the generic garbage collector. MUST be synced with the corresponding flag of the kube-apiserver.")
}
As the code shows, two kcm startup flags relate to the garbage collector:
(1) enable-garbage-collector: whether to enable the garbage collector; defaults to true;
(2) concurrent-gc-syncs: the number of workers for garbage collector sync operations; defaults to 20.
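To make the two flags and their defaults concrete, here is a minimal sketch that parses them with Go's standard flag package. Note the swap: kube-controller-manager itself uses pflag, as shown in the code above, and the gcOptions type and parseGCFlags function are hypothetical names used only for this illustration.

```go
package main

import (
	"flag"
	"fmt"
)

// gcOptions is a hypothetical holder for the two garbage collector flags.
type gcOptions struct {
	ConcurrentGCSyncs      int
	EnableGarbageCollector bool
}

// parseGCFlags parses args using the defaults described in the text
// (20 workers, garbage collector enabled).
func parseGCFlags(args []string) (gcOptions, error) {
	opts := gcOptions{ConcurrentGCSyncs: 20, EnableGarbageCollector: true}
	fs := flag.NewFlagSet("kube-controller-manager", flag.ContinueOnError)
	fs.IntVar(&opts.ConcurrentGCSyncs, "concurrent-gc-syncs", opts.ConcurrentGCSyncs,
		"number of garbage collector workers allowed to sync concurrently")
	fs.BoolVar(&opts.EnableGarbageCollector, "enable-garbage-collector", opts.EnableGarbageCollector,
		"enables the generic garbage collector")
	err := fs.Parse(args)
	return opts, err
}

func main() {
	defaults, _ := parseGCFlags(nil)
	fmt.Printf("defaults: %+v\n", defaults)

	custom, err := parseGCFlags([]string{"--concurrent-gc-syncs=5", "--enable-garbage-collector=false"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("custom: %+v\n", custom)
}
```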
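The source code analysis of the garbage collector is split into two parts:
(1) startup analysis;
(2) core processing logic analysis.
This post covers the startup analysis.
garbage collector source code analysis: startup
Based on tag v1.17.4
https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4
The startGarbageCollectorController function is the entry point for the analysis.
startGarbageCollectorController
The main logic of startGarbageCollectorController is as follows:
(1) Decide whether to start the garbage collector based on the EnableGarbageCollector variable, whose value comes from the kcm startup flag --enable-garbage-collector and defaults to true; if it is disabled, return immediately without going further;
(2) Initialize the discoveryClient, which is mainly used to discover all resource types in the cluster;
(3) Call garbagecollector.GetDeletableResources to get every resource in the cluster that the garbage collector needs to handle; a resource that supports the delete, list and watch operations is called a deletableResource;
(4) Call garbagecollector.NewGarbageCollector to initialize the garbage collector;
(5) Call garbageCollector.Run to start the garbage collector;
(6) Call garbageCollector.Sync to watch the cluster's deletableResources every 30 seconds; when new deletableResources appear they are synced into the monitors, so that all resources in the cluster stay monitored;
(7) Return an HTTP handler that registers a debug endpoint, which serves the dependency relationships between all objects in the cluster as built by GraphBuilder.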
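Step (3) filters resources by the verbs they support. The following sketch shows that filtering idea in isolation. The apiResource type, the function names and the sample data are hypothetical; the real GetDeletableResources works on discovery results rather than on a hand-written slice.

```go
package main

import "fmt"

// apiResource is a simplified, hypothetical view of a discovered resource.
type apiResource struct {
	Name  string
	Verbs []string
}

// supportsAll reports whether r supports every verb in required.
func supportsAll(r apiResource, required ...string) bool {
	have := make(map[string]bool, len(r.Verbs))
	for _, v := range r.Verbs {
		have[v] = true
	}
	for _, v := range required {
		if !have[v] {
			return false
		}
	}
	return true
}

// deletableResources keeps only the resources that support delete,
// list and watch, preserving the input order.
func deletableResources(all []apiResource) []string {
	var out []string
	for _, r := range all {
		if supportsAll(r, "delete", "list", "watch") {
			out = append(out, r.Name)
		}
	}
	return out
}

func main() {
	// Sample data for illustration only.
	discovered := []apiResource{
		{Name: "pods", Verbs: []string{"create", "delete", "get", "list", "watch"}},
		{Name: "bindings", Verbs: []string{"create"}},
		{Name: "replicasets", Verbs: []string{"delete", "list", "watch"}},
	}
	fmt.Println(deletableResources(discovered))
}
```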
// cmd/kube-controller-manager/app/core.go
func startGarbageCollectorController(ctx ControllerContext) (http.Handler, bool, error) {
	if !ctx.ComponentConfig.GarbageCollectorController.EnableGarbageCollector {
		return nil, false, nil
	}
	gcClientset := ctx.ClientBuilder.ClientOrDie("generic-garbage-collector")
	discoveryClient := cacheddiscovery.NewMemCacheClient(gcClientset.Discovery())
	config := ctx.ClientBuilder.ConfigOrDie("generic-garbage-collector")
	metadataClient, err := metadata.NewForConfig(config)
	if err != nil {
		return nil, true, err
	}
	// Get an initial set of deletable resources to prime the garbage collector.
	deletableResources := garbagecollector.GetDeletableResources(discoveryClient)
	ignoredResources := make(map[schema.GroupResource]struct{})
	for _, r := range ctx.ComponentConfig.GarbageCollectorController.GCIgnoredResources {
		ignoredResources[schema.GroupResource{Group: r.Group, Resource: r.Resource}] = struct{}{}
	}
	garbageCollector, err := garbagecollector.NewGarbageCollector(
		metadataClient,
		ctx.RESTMapper,
		deletableResources,
		ignoredResources,
		ctx.ObjectOrMetadataInformerFactory,
		ctx.InformersStarted,
	)
	if err != nil {
		return nil, true, fmt.Errorf("failed to start the generic garbage collector: %v", err)
	}
	// Start the garbage collector.
	workers := int(ctx.ComponentConfig.GarbageCollectorController.ConcurrentGCSyncs)
	go garbageCollector.Run(workers, ctx.Stop)
	// Periodically refresh the RESTMapper with new discovery information and sync
	// the garbage collector.
	go garbageCollector.Sync(gcClientset.Discovery(), 30*time.Second, ctx.Stop)
	return garbagecollector.NewDebugHandler(garbageCollector), true, nil
}
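The following sections look at parts of startGarbageCollectorController in a little more detail.
1. garbagecollector.NewGarbageCollector
NewGarbageCollector initializes the garbage collector. Its main logic is:
(1) Initialize the GarbageCollector struct;
(2) Initialize the GraphBuilder struct and assign it to the dependencyGraphBuilder field of the GarbageCollector struct.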
// pkg/controller/garbagecollector/garbagecollector.go
func NewGarbageCollector(
	metadataClient metadata.Interface,
	mapper resettableRESTMapper,
	deletableResources map[schema.GroupVersionResource]struct{},
	ignoredResources map[schema.GroupResource]struct{},
	sharedInformers controller.InformerFactory,
	informersStarted <-chan struct{},
) (*GarbageCollector, error) {
	attemptToDelete := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "garbage_collector_attempt_to_delete")
	attemptToOrphan := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "garbage_collector_attempt_to_orphan")
	absentOwnerCache := NewUIDCache(500)
	gc := &GarbageCollector{
		metadataClient:   metadataClient,
		restMapper:       mapper,
		attemptToDelete:  attemptToDelete,
		attemptToOrphan:  attemptToOrphan,
		absentOwnerCache: absentOwnerCache,
	}
	gb := &GraphBuilder{
		metadataClient:   metadataClient,
		informersStarted: informersStarted,
		restMapper:       mapper,
		graphChanges:     workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "garbage_collector_graph_changes"),
		uidToNode: &concurrentUIDToNode{
			uidToNode: make(map[types.UID]*node),
		},
		attemptToDelete:  attemptToDelete,
		attemptToOrphan:  attemptToOrphan,
		absentOwnerCache: absentOwnerCache,
		sharedInformers:  sharedInformers,
		ignoredResources: ignoredResources,
	}
	if err := gb.syncMonitors(deletableResources); err != nil {
		utilruntime.HandleError(fmt.Errorf("failed to sync all monitors: %v", err))
	}
	gc.dependencyGraphBuilder = gb
	return gc, nil
}
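1.1 gb.syncMonitors
gb.syncMonitors calls gb.controllerFor to initialize an informer for each of the deletableResources (resources that support the "delete", "list" and "watch" operations) and to register event handlers (AddFunc, UpdateFunc and DeleteFunc) for resource changes. Every add, update and delete event is pushed to the graphChanges queue, and gb.processGraphChanges then takes events from graphChanges and handles them (covered in detail in the analysis of the garbage collector's processing logic). Monitors for resources that are no longer listed are stopped.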
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) syncMonitors(resources map[schema.GroupVersionResource]struct{}) error {
	gb.monitorLock.Lock()
	defer gb.monitorLock.Unlock()
	toRemove := gb.monitors
	if toRemove == nil {
		toRemove = monitors{}
	}
	current := monitors{}
	errs := []error{}
	kept := 0
	added := 0
	for resource := range resources {
		if _, ok := gb.ignoredResources[resource.GroupResource()]; ok {
			continue
		}
		if m, ok := toRemove[resource]; ok {
			current[resource] = m
			delete(toRemove, resource)
			kept++
			continue
		}
		kind, err := gb.restMapper.KindFor(resource)
		if err != nil {
			errs = append(errs, fmt.Errorf("couldn't look up resource %q: %v", resource, err))
			continue
		}
		c, s, err := gb.controllerFor(resource, kind)
		if err != nil {
			errs = append(errs, fmt.Errorf("couldn't start monitor for resource %q: %v", resource, err))
			continue
		}
		current[resource] = &monitor{store: s, controller: c}
		added++
	}
	gb.monitors = current
	for _, monitor := range toRemove {
		if monitor.stopCh != nil {
			close(monitor.stopCh)
		}
	}
	klog.V(4).Infof("synced monitors; added %d, kept %d, removed %d", added, kept, len(toRemove))
	// NewAggregate returns nil if errs is 0-length
	return utilerrors.NewAggregate(errs)
}
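The kept/added/removed bookkeeping in syncMonitors can be isolated into a small, dependency-free sketch. Everything here is a simplification for illustration: resources are plain strings, a monitor is reduced to a boolean, and the diffMonitors name is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// diffMonitors mirrors the bookkeeping idea of syncMonitors: resources
// that already have a monitor are kept, new ones are added, and monitors
// whose resource is no longer desired (or is ignored) are removed.
func diffMonitors(existing map[string]bool, desired []string, ignored map[string]bool) (current map[string]bool, added, kept, removed []string) {
	// Start by assuming every existing monitor must be removed.
	toRemove := make(map[string]bool, len(existing))
	for r := range existing {
		toRemove[r] = true
	}
	current = map[string]bool{}
	for _, r := range desired {
		if ignored[r] || current[r] {
			continue // ignored resource or duplicate entry
		}
		current[r] = true
		if toRemove[r] {
			delete(toRemove, r)
			kept = append(kept, r)
			continue
		}
		added = append(added, r)
	}
	for r := range toRemove {
		removed = append(removed, r)
	}
	sort.Strings(removed) // map iteration order is random
	return current, added, kept, removed
}

func main() {
	existing := map[string]bool{"pods": true, "jobs": true}
	desired := []string{"pods", "replicasets", "events"}
	ignored := map[string]bool{"events": true}
	_, added, kept, removed := diffMonitors(existing, desired, ignored)
	fmt.Printf("added %v, kept %v, removed %v\n", added, kept, removed)
}
```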
gb.controllerFor
gb.controllerFor initializes the informer for a resource and registers event handlers (AddFunc, UpdateFunc and DeleteFunc) for its change events. Every add, update and delete event for the resource is pushed to the graphChanges queue.
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) controllerFor(resource schema.GroupVersionResource, kind schema.GroupVersionKind) (cache.Controller, cache.Store, error) {
	handlers := cache.ResourceEventHandlerFuncs{
		// add the event to the dependencyGraphBuilder's graphChanges.
		AddFunc: func(obj interface{}) {
			event := &event{
				eventType: addEvent,
				obj:       obj,
				gvk:       kind,
			}
			gb.graphChanges.Add(event)
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			// TODO: check if there are differences in the ownerRefs,
			// finalizers, and DeletionTimestamp; if not, ignore the update.
			event := &event{
				eventType: updateEvent,
				obj:       newObj,
				oldObj:    oldObj,
				gvk:       kind,
			}
			gb.graphChanges.Add(event)
		},
		DeleteFunc: func(obj interface{}) {
			// delta fifo may wrap the object in a cache.DeletedFinalStateUnknown, unwrap it
			if deletedFinalStateUnknown, ok := obj.(cache.DeletedFinalStateUnknown); ok {
				obj = deletedFinalStateUnknown.Obj
			}
			event := &event{
				eventType: deleteEvent,
				obj:       obj,
				gvk:       kind,
			}
			gb.graphChanges.Add(event)
		},
	}
	shared, err := gb.sharedInformers.ForResource(resource)
	if err != nil {
		klog.V(4).Infof("unable to use a shared informer for resource %q, kind %q: %v", resource.String(), kind.String(), err)
		return nil, nil, err
	}
	klog.V(4).Infof("using a shared informer for resource %q, kind %q", resource.String(), kind.String())
	// need to clone because it's from a shared cache
	shared.Informer().AddEventHandlerWithResyncPeriod(handlers, ResourceResyncTime)
	return shared.Informer().GetController(), shared.Informer().GetStore(), nil
}
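2. garbageCollector.Run
garbageCollector.Run starts the garbage collector. Its main logic is:
(1) Call gc.dependencyGraphBuilder.Run to start GraphBuilder, then wait for all resource monitors to sync;
(2) Start as many goroutines as the worker count configured by the startup flag for each of gc.runAttemptToDeleteWorker and gc.runAttemptToOrphanWorker. Both belong to the core processing logic of GarbageCollector and handle objects that need to be reclaimed; they are analyzed in detail in the next post.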
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) Run(workers int, stopCh <-chan struct{}) {
	defer utilruntime.HandleCrash()
	defer gc.attemptToDelete.ShutDown()
	defer gc.attemptToOrphan.ShutDown()
	defer gc.dependencyGraphBuilder.graphChanges.ShutDown()
	klog.Infof("Starting garbage collector controller")
	defer klog.Infof("Shutting down garbage collector controller")
	go gc.dependencyGraphBuilder.Run(stopCh)
	if !cache.WaitForNamedCacheSync("garbage collector", stopCh, gc.dependencyGraphBuilder.IsSynced) {
		return
	}
	klog.Infof("Garbage collector: all resource monitors have synced. Proceeding to collect garbage")
	// gc workers
	for i := 0; i < workers; i++ {
		go wait.Until(gc.runAttemptToDeleteWorker, 1*time.Second, stopCh)
		go wait.Until(gc.runAttemptToOrphanWorker, 1*time.Second, stopCh)
	}
	<-stopCh
}
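The worker fan-out in Run (one delete worker and one orphan worker per configured worker slot) can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: channels stand in for the rate-limited work queues, closing a channel stands in for ShutDown, and drain and runWorkers are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// drain consumes items from queue until it is closed, counting each one.
func drain(queue <-chan string, processed *int64, wg *sync.WaitGroup) {
	defer wg.Done()
	for range queue {
		atomic.AddInt64(processed, 1)
	}
}

// runWorkers starts `workers` goroutines per queue, mirroring the loop in
// Run, and returns how many items each group of workers processed.
func runWorkers(workers int, toDelete, toOrphan []string) (deleted, orphaned int64) {
	attemptToDelete := make(chan string, len(toDelete))
	attemptToOrphan := make(chan string, len(toOrphan))
	for _, item := range toDelete {
		attemptToDelete <- item
	}
	for _, item := range toOrphan {
		attemptToOrphan <- item
	}
	// Closing the channels plays the role of shutting the queues down.
	close(attemptToDelete)
	close(attemptToOrphan)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(2)
		go drain(attemptToDelete, &deleted, &wg)
		go drain(attemptToOrphan, &orphaned, &wg)
	}
	wg.Wait()
	return deleted, orphaned
}

func main() {
	deleted, orphaned := runWorkers(3, []string{"a", "b", "c", "d"}, []string{"x", "y"})
	fmt.Printf("processed %d delete items and %d orphan items\n", deleted, orphaned)
}
```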
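2.1 gc.dependencyGraphBuilder.Run
gc.dependencyGraphBuilder.Run starts GraphBuilder. Its main logic is:
(1) Call gb.startMonitors to start the informers mentioned in 1.1 gb.syncMonitors above;
(2) Call gb.runProcessGraphChanges in a loop through wait.Until with a 1s period. This is the core processing logic of GraphBuilder and is analyzed in the next post.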
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) Run(stopCh <-chan struct{}) {
	klog.Infof("GraphBuilder running")
	defer klog.Infof("GraphBuilder stopping")
	// Set up the stop channel.
	gb.monitorLock.Lock()
	gb.stopCh = stopCh
	gb.running = true
	gb.monitorLock.Unlock()
	// Start monitors and begin change processing until the stop channel is
	// closed.
	gb.startMonitors()
	wait.Until(gb.runProcessGraphChanges, 1*time.Second, stopCh)
	// Stop any running monitors.
	gb.monitorLock.Lock()
	defer gb.monitorLock.Unlock()
	monitors := gb.monitors
	stopped := 0
	for _, monitor := range monitors {
		if monitor.stopCh != nil {
			stopped++
			close(monitor.stopCh)
		}
	}
	// reset monitors so that the graph builder can be safely re-run/synced.
	gb.monitors = nil
	klog.Infof("stopped %d of %d monitors", stopped, len(monitors))
}
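3. garbageCollector.Sync
garbageCollector.Sync periodically queries all deletableResources in the cluster and calls gc.resyncMonitors to update the monitors of GraphBuilder: it initializes an informer and registers event handlers for each newly discovered resource, starts those informers, and tears down the monitors of resources that have been removed.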
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) Sync(discoveryClient discovery.ServerResourcesInterface, period time.Duration, stopCh <-chan struct{}) {
oldResources := make(map[schema.GroupVersionResource]struct{})
wait.Until(func() {
// Get the current resource list from discovery.
newResources := GetDeletableResources(discoveryClient)
...
if err := gc.resyncMonitors(newResources); err != nil {
utilruntime.HandleError(fmt.Errorf("failed to sync resource monitors (attempt %d): %v", attempt, err))
return false, nil
}
klog.V(4).Infof("resynced monitors")
...
3.1 gc.resyncMonitors
Call gc.dependencyGraphBuilder.syncMonitors to initialize the informers and register the event handlers;
Call gc.dependencyGraphBuilder.startMonitors to start the informers.
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) resyncMonitors(deletableResources map[schema.GroupVersionResource]struct{}) error {
	if err := gc.dependencyGraphBuilder.syncMonitors(deletableResources); err != nil {
		return err
	}
	gc.dependencyGraphBuilder.startMonitors()
	return nil
}
4. garbagecollector.NewDebugHandler
garbagecollector.NewDebugHandler returns an HTTP handler that registers a debug endpoint. It serves the dependency relationships between all objects in the cluster, as built by GraphBuilder.
// pkg/controller/garbagecollector/dump.go
func NewDebugHandler(controller *GarbageCollector) http.Handler {
	return &debugHTTPHandler{controller: controller}
}

type debugHTTPHandler struct {
	controller *GarbageCollector
}

func (h *debugHTTPHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	if req.URL.Path != "/graph" {
		http.Error(w, "", http.StatusNotFound)
		return
	}
	var graph graph.Directed
	if uidStrings := req.URL.Query()["uid"]; len(uidStrings) > 0 {
		uids := []types.UID{}
		for _, uidString := range uidStrings {
			uids = append(uids, types.UID(uidString))
		}
		graph = h.controller.dependencyGraphBuilder.uidToNode.ToGonumGraphForObj(uids...)
	} else {
		graph = h.controller.dependencyGraphBuilder.uidToNode.ToGonumGraph()
	}
	data, err := dot.Marshal(graph, "full", "", " ")
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/vnd.graphviz")
	w.Header().Set("X-Content-Type-Options", "nosniff")
	w.Write(data)
	w.WriteHeader(http.StatusOK)
}
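Fetching the object dependency graph
The response is in Graphviz DOT format. Quote the URL so that the shell does not interpret the ? character.
Fetch the full object dependency graph:
curl "http://{master_ip}:{kcm_port}/debug/controllers/garbagecollector/graph" -o {output_file}
Fetch the dependency graph for a specific uid:
curl "http://{master_ip}:{kcm_port}/debug/controllers/garbagecollector/graph?uid={project_uid}" -o {output_file}
Example:
curl "http://192.168.1.10:10252/debug/controllers/garbagecollector/graph?uid=8727f640-112e-21eb-11dd-626400510df6" -o /home/test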
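When the request is built programmatically, the query string is easier to get right with net/url than by string concatenation. The sketch below only constructs the URL; graphURL is a hypothetical helper, and the host and port are whatever your kcm debug endpoint listens on. Since the handler shown earlier reads every uid query parameter, the helper accepts several UIDs.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// graphURL builds the URL of the garbage collector debug graph endpoint.
// With no uids it requests the full graph; otherwise it adds one uid
// query parameter per UID.
func graphURL(host string, port int, uids ...string) string {
	u := url.URL{
		Scheme: "http",
		Host:   net.JoinHostPort(host, strconv.Itoa(port)),
		Path:   "/debug/controllers/garbagecollector/graph",
	}
	q := url.Values{}
	for _, uid := range uids {
		q.Add("uid", uid)
	}
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	fmt.Println(graphURL("192.168.1.10", 10252))
	fmt.Println(graphURL("192.168.1.10", 10252, "8727f640-112e-21eb-11dd-626400510df6"))
}
```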
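Summary
Introduction to the garbage collector
The Kubernetes garbage collector runs inside kube-controller-manager. It reclaims resource objects in Kubernetes: it watches resource object events, keeps the dependency relationships between objects up to date, and decides from an object's deletion policy whether its associated objects should be deleted as well.
garbage collector architecture
The garbage collector consists of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector) and 3 event queues (graphChanges, attemptToDelete and attemptToOrphan).
garbage collector startup analysis
Starting the garbage collector mainly means starting the 2 processors (GraphBuilder and GarbageCollector) and defining the object dependency graph and the 3 event queues (graphChanges, attemptToDelete and attemptToOrphan).
Events obtained by list/watch from the apiserver are put into the graphChanges queue. GraphBuilder takes events from graphChanges, builds the object dependency graph, and puts associated objects into the attemptToDelete or attemptToOrphan queue according to the object's deletion policy. GarbageCollector then takes events from attemptToDelete and attemptToOrphan, looks up the information it needs in the object dependency graph, and finally reclaims and deletes the objects.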