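Introduction to the garbage collector

The Kubernetes garbage collector runs inside kube-controller-manager. It reclaims resource objects in Kubernetes: it watches resource object events, keeps the dependency relationships between objects up to date, and decides, based on an object's deletion policy, whether to delete its associated objects.

More precisely, when an owner is deleted with a cascading deletion policy, the owner's dependent objects are deleted along with it.

To track dependencies, the garbage collector watches resource object events and uses the ownerReferences value of each object to build the relationships between objects, that is, the relationships between owners and dependents.

Owners and dependents

Creating a deployment object serves as the example.

After a deployment object is created, kube-controller-manager creates a replicaset object for it and automatically records the deployment's information in the replicaset's ownerReferences. In the example below, the owner of replicaset test-1-59d7f45ffb is deployment test-1, and the dependent of deployment test-1 is replicaset test-1-59d7f45ffb.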

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-1
  namespace: test
  uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
...
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: test-1-59d7f45ffb
  namespace: test
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: Deployment
    name: test-1
    uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
  uid: 386c380b-490e-470b-a33f-7d5b0bf945fb
...

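Likewise, once the replicaset object exists, kube-controller-manager creates pod objects for it, and each pod carries the replicaset's information in its ownerReferences. The replicaset is the owner of the pods, and the pods are dependents of the replicaset.

The ownerReferences value of an object is what defines the relationship between owner and dependent.

Garbage collector architecture

The detailed architecture and core processing logic of the garbage collector are shown in the figure below.

The most important code in the garbage collector lives in two files, garbagecollector.go and graph_builder.go.

The garbage collector consists of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector), and 3 event queues (graphChanges, attemptToDelete, and attemptToOrphan):

1 graph

(1) uidToNode: the object dependency graph, maintained by GraphBuilder, holding the dependency relationships among all objects. Each k8s object corresponds to one node in the graph, and each node keeps a list of owners and a list of dependents.

Example: given a deployment A, a replicaset B (owned by deployment A), and a pod C (owned by replicaset B), the dependency graph looks like this:

There are 3 nodes: A, B, and C.

A has a node with no owners and B in its dependents list;
B has a node with A in its owners list and C in its dependents list;
C has a node with B in its owners list and no dependents.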
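
The A/B/C example can be sketched in a few lines of Go. This is only an illustration and not the real implementation: the node type below is a simplified stand-in that stores names instead of UIDs, and buildGraph is a hypothetical helper that does not exist in Kubernetes.

```go
package main

import "fmt"

// node is a simplified stand-in for the garbage collector's graph node:
// it keeps only the names of its owners and dependents.
type node struct {
	name       string
	owners     []string
	dependents []string
}

// buildGraph links every object to the owners listed for it.
// Keys are object names, values are the names of their owners.
func buildGraph(ownerRefs map[string][]string) map[string]*node {
	graph := make(map[string]*node)
	// get returns the node for name, creating it on first use.
	get := func(name string) *node {
		if n, ok := graph[name]; ok {
			return n
		}
		n := &node{name: name}
		graph[name] = n
		return n
	}
	for name, owners := range ownerRefs {
		n := get(name)
		for _, o := range owners {
			n.owners = append(n.owners, o)
			ownerNode := get(o)
			ownerNode.dependents = append(ownerNode.dependents, name)
		}
	}
	return graph
}

func main() {
	graph := buildGraph(map[string][]string{
		"A": nil,   // deployment, no owner
		"B": {"A"}, // replicaset owned by A
		"C": {"B"}, // pod owned by B
	})
	for _, name := range []string{"A", "B", "C"} {
		n := graph[name]
		fmt.Printf("%s owners=%v dependents=%v\n", n.name, n.owners, n.dependents)
	}
}
```

Each node ends up with exactly the owner and dependent lists described above: A lists B as a dependent, B lists A as owner and C as dependent, and C lists B as owner.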

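2 processors

(1) GraphBuilder: maintains the dependency graph of all objects and produces the events that trigger GarbageCollector to reclaim objects. GraphBuilder consumes events from the graphChanges queue and, based on each object's ownerReferences, builds, updates, and deletes entries in the dependency graph, that is, the graph of owner and dependent relationships. It then acts as a producer, placing events on the attemptToDelete and attemptToOrphan queues so that GarbageCollector checks whether associated objects need to be reclaimed. GarbageCollector relies on the uidToNode graph when it does so.

(2) GarbageCollector: reclaims and deletes objects. As a consumer, GarbageCollector takes events from the attemptToDelete and attemptToOrphan queues; if an object has been deleted with a cascading deletion policy, it reclaims the associated objects, so the owner's dependents are deleted together with the owner.

3 event queues

(1) graphChanges: events obtained by list/watch against the apiserver; produced by the informers and consumed by GraphBuilder;

(2) attemptToDelete: the cascading-deletion event queue; produced by GraphBuilder and consumed by GarbageCollector;

(3) attemptToOrphan: the orphan-deletion event queue; produced by GraphBuilder and consumed by GarbageCollector.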

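Object deletion policies

Kubernetes has three object deletion policies: Orphan, Foreground, and Background. A policy can be specified when deleting an object. Each is described below.

Foreground deletion

Foreground is a cascading deletion policy: the garbage collector deletes all of the object's dependents.

When an object is deleted with the foreground policy, its deletionTimestamp field is set and its metadata.finalizers field contains the value foregroundDeletion, which blocks the object's deletion. Once the garbage collector has deleted all of the object's blocking dependents (those with ownerReference.blockOwnerDeletion=true), it removes foregroundDeletion from metadata.finalizers, and the owner object is then deleted.

For example, deleting a deployment with the foreground policy deletes objects in the order Pod->ReplicaSet->Deployment.

Background deletion

Background is also a cascading deletion policy: Kubernetes deletes the owner object immediately, and the garbage collector then deletes all of its dependents in the background.

When an object is deleted with the Background policy, no related finalizer is set on it (finalizers are only set for the Foreground and Orphan policies), so it is deleted right away. GraphBuilder then observes the object's delete event and puts its dependents on the attemptToDelete queue, which triggers GarbageCollector to reclaim them.

For example, deleting a deployment with the background policy deletes objects in the order Deployment->ReplicaSet->Pod.

Orphan deletion

Orphan is a non-cascading deletion policy: deleting an object does not automatically delete its dependents, which are then called orphaned objects.

When an object is deleted with the Orphan policy, its metadata.finalizers field contains the value orphan, which blocks the object's deletion until GarbageCollector has removed the owner's entry from the OwnerReferences of every dependent. GarbageCollector then removes orphan from the owner's metadata.finalizers, and only after that can the owner object be deleted.

For example, deleting a deployment with the orphan policy deletes only the Deployment; its ReplicaSet and Pods are kept.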
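
The difference between the three policies comes down to traversal order over the owner/dependent chain. The sketch below is a toy model, not Kubernetes code: dependentsOf and deletionOrder are hypothetical names, and the policies are modeled as a post-order walk (Foreground), a pre-order walk (Background), and no walk at all (Orphan).

```go
package main

import "fmt"

// dependentsOf maps an owner to its direct dependents (illustrative names).
var dependentsOf = map[string][]string{
	"Deployment": {"ReplicaSet"},
	"ReplicaSet": {"Pod"},
}

// deletionOrder returns the order in which objects disappear when root is
// deleted with the given policy ("Foreground", "Background" or "Orphan").
func deletionOrder(root, policy string) []string {
	var order []string
	var visit func(name string)
	switch policy {
	case "Foreground":
		// Dependents go first; the owner is removed last (post-order).
		visit = func(name string) {
			for _, d := range dependentsOf[name] {
				visit(d)
			}
			order = append(order, name)
		}
	case "Background":
		// The owner goes first; dependents follow (pre-order).
		visit = func(name string) {
			order = append(order, name)
			for _, d := range dependentsOf[name] {
				visit(d)
			}
		}
	default:
		// Orphan: only the owner itself is deleted.
		visit = func(name string) {
			order = append(order, name)
		}
	}
	visit(root)
	return order
}

func main() {
	for _, policy := range []string{"Foreground", "Background", "Orphan"} {
		fmt.Println(policy, deletionOrder("Deployment", policy))
	}
}
```

Running it lists Pod, ReplicaSet, Deployment for Foreground; Deployment, ReplicaSet, Pod for Background; and only Deployment for Orphan, matching the three examples above.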

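Specifying a deletion policy when deleting an object

If no deletion policy is specified when an object is deleted, the default is used: Background.

(1) Specifying the background policy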

curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
-d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"}' \
-H "Content-Type: application/json"

(2) Specifying the foreground policy

curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
-d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
-H "Content-Type: application/json"

(3) Specifying the orphan policy

curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
-d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' \
-H "Content-Type: application/json"
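
The three request bodies differ only in propagationPolicy, so they can be generated rather than typed by hand. The sketch below uses only the Go standard library; deleteOptions and deleteBody are hypothetical names introduced here, and the struct mirrors just the three fields that appear in the curl commands, not the full Kubernetes DeleteOptions type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deleteOptions mirrors only the fields used in the curl examples above.
type deleteOptions struct {
	Kind              string `json:"kind"`
	APIVersion        string `json:"apiVersion"`
	PropagationPolicy string `json:"propagationPolicy"`
}

// deleteBody returns the JSON request body for the given propagation policy
// and rejects anything other than the three policies described in this post.
func deleteBody(policy string) (string, error) {
	switch policy {
	case "Background", "Foreground", "Orphan":
	default:
		return "", fmt.Errorf("unknown propagation policy %q", policy)
	}
	b, err := json.Marshal(deleteOptions{
		Kind:              "DeleteOptions",
		APIVersion:        "v1",
		PropagationPolicy: policy,
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	for _, policy := range []string{"Background", "Foreground", "Orphan"} {
		body, err := deleteBody(policy)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(body)
	}
}
```

The printed strings have the same shape as the -d arguments passed to curl above and could be sent as the body of the DELETE request.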

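The source code analysis of the garbage collector is split into two parts:

(1) startup analysis;

(2) core processing logic analysis.

The previous post analyzed how the garbage collector starts up; this post analyzes its core processing logic.

garbage collector source code analysis: processing logic

Based on tag v1.17.4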

https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4

As noted earlier, the most important code in the garbage collector is in garbagecollector.go and graph_builder.go, that is, the GarbageCollector struct and the GraphBuilder struct, so the analysis below is split into those two parts.

1.GraphBuilder

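First, GraphBuilder.

GraphBuilder has 2 main functions:

(1) based on resource events from the informers, it maintains the dependency relationships of all objects in its uidToNode field;

(2) it processes the events in graphChanges and, as a producer, places events on the attemptToDelete and attemptToOrphan queues, triggering the consumer GarbageCollector to reclaim objects.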

1.1 GraphBuilder struct

A quick look at the GraphBuilder struct. Its key fields are:

(1) graphChanges: events observed by the informers are placed in graphChanges, and GraphBuilder consumes them from this queue;

(2) uidToNode (the object dependency graph): keyed by object uid, it records the dependency relationships of all objects, that is, the owner and dependent relationships described earlier. In other words, GraphBuilder maintains a dependency graph of all objects, and GarbageCollector relies on this graph when reclaiming objects;

(3) attemptToDelete and attemptToOrphan: GraphBuilder, as producer, puts events on these two queues, and GarbageCollector, as consumer, processes them.

// pkg/controller/garbagecollector/graph_builder.go
type GraphBuilder struct {
...
// monitors are the producer of the graphChanges queue, graphBuilder alters
// the in-memory graph according to the changes.
graphChanges workqueue.RateLimitingInterface
// uidToNode doesn't require a lock to protect, because only the
// single-threaded GraphBuilder.processGraphChanges() reads/writes it.
uidToNode *concurrentUIDToNode
// GraphBuilder is the producer of attemptToDelete and attemptToOrphan, GC is the consumer.
attemptToDelete workqueue.RateLimitingInterface
attemptToOrphan workqueue.RateLimitingInterface
...
}
// pkg/controller/garbagecollector/graph.go
type concurrentUIDToNode struct {
uidToNodeLock sync.RWMutex
uidToNode map[types.UID]*node
}
// pkg/controller/garbagecollector/graph.go
type node struct {
...
dependents map[*node]struct{}
...
owners []metav1.OwnerReference
}

The struct definitions show that each k8s object corresponds to one node in the dependency graph, and each node keeps a list of owners and a list of dependents.

1.2 GraphBuilder-gb.processGraphChanges

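Next comes GraphBuilder's processing logic, with gb.processGraphChanges as the entry point.

As mentioned, events observed by the informers are placed on the graphChanges queue and consumed by GraphBuilder; processGraphChanges is where that consumption happens.

In this method GraphBuilder is therefore both consumer and producer: it consumes and classifies every event in graphChanges, then produces events onto the attemptToDelete and attemptToOrphan queues for GarbageCollector to consume.

Main logic:

(1) take an event from the graphChanges queue;

(2) read uidToNode to check whether the object already exists in the dependency graph; what happens next depends on that and on the event type;

(3) if uidToNode has no node for the object and the event is an addEvent or updateEvent, create a node for the object and call gb.insertNode to add it to uidToNode and to the dependents of each of its owners;

then call gb.processTransitions, which checks whether the object is being deleted and, if so, whether it is being deleted in orphan mode or foreground mode. (The mode is read from the object's finalizers: taking a deployment as an example, the delete request carries a deletion policy, and kube-apiserver adds the matching finalizer to the deployment object.) In orphan mode the node is added to the attemptToOrphan queue; in foreground mode the object and all of its dependents are added to the attemptToDelete queue;

(4) if uidToNode already has the node and the event is an addEvent or updateEvent, call referencesDiffs to check whether the object's OwnerReferences have changed; if so, update the dependency graph accordingly, and finally call gb.processTransitions;

(5) if the event is a delete event, call gb.removeNode to remove the object from uidToNode and from the dependents of each of its owners, then put the object's dependents on the attemptToDelete queue to trigger GarbageCollector. Finally, check each of the node's owners: an owner that is deleting its dependents may be blocked waiting for this node to go away, so it is added to the attemptToDelete queue as well.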

// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) runProcessGraphChanges() {
for gb.processGraphChanges() {
}
}

// Dequeueing an event from graphChanges, updating graph, populating dirty_queue.
func (gb *GraphBuilder) processGraphChanges() bool {
item, quit := gb.graphChanges.Get()
if quit {
return false
}
defer gb.graphChanges.Done(item)
event, ok := item.(*event)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect a *event, got %v", item))
return true
}
obj := event.obj
accessor, err := meta.Accessor(obj)
if err != nil {
utilruntime.HandleError(fmt.Errorf("cannot access obj: %v", err))
return true
}
klog.V(5).Infof("GraphBuilder process object: %s/%s, namespace %s, name %s, uid %s, event type %v", event.gvk.GroupVersion().String(), event.gvk.Kind, accessor.GetNamespace(), accessor.GetName(), string(accessor.GetUID()), event.eventType)
// Check if the node already exists
existingNode, found := gb.uidToNode.Read(accessor.GetUID())
if found {
// this marks the node as having been observed via an informer event
// 1. this depends on graphChanges only containing add/update events from the actual informer
// 2. this allows things tracking virtual nodes' existence to stop polling and rely on informer events
existingNode.markObserved()
}
switch {
case (event.eventType == addEvent || event.eventType == updateEvent) && !found:
newNode := &node{
identity: objectReference{
OwnerReference: metav1.OwnerReference{
APIVersion: event.gvk.GroupVersion().String(),
Kind: event.gvk.Kind,
UID: accessor.GetUID(),
Name: accessor.GetName(),
},
Namespace: accessor.GetNamespace(),
},
dependents: make(map[*node]struct{}),
owners: accessor.GetOwnerReferences(),
deletingDependents: beingDeleted(accessor) && hasDeleteDependentsFinalizer(accessor),
beingDeleted: beingDeleted(accessor),
}
gb.insertNode(newNode)
// the underlying delta_fifo may combine a creation and a deletion into
// one event, so we need to further process the event.
gb.processTransitions(event.oldObj, accessor, newNode)
case (event.eventType == addEvent || event.eventType == updateEvent) && found:
// handle changes in ownerReferences
added, removed, changed := referencesDiffs(existingNode.owners, accessor.GetOwnerReferences())
if len(added) != 0 || len(removed) != 0 || len(changed) != 0 {
// check if the changed dependency graph unblock owners that are
// waiting for the deletion of their dependents.
gb.addUnblockedOwnersToDeleteQueue(removed, changed)
// update the node itself
existingNode.owners = accessor.GetOwnerReferences()
// Add the node to its new owners' dependent lists.
gb.addDependentToOwners(existingNode, added)
// remove the node from the dependent list of node that are no longer in
// the node's owners list.
gb.removeDependentFromOwners(existingNode, removed)
}
if beingDeleted(accessor) {
existingNode.markBeingDeleted()
}
gb.processTransitions(event.oldObj, accessor, existingNode)
case event.eventType == deleteEvent:
if !found {
klog.V(5).Infof("%v doesn't exist in the graph, this shouldn't happen", accessor.GetUID())
return true
}
// removeNode updates the graph
gb.removeNode(existingNode)
existingNode.dependentsLock.RLock()
defer existingNode.dependentsLock.RUnlock()
if len(existingNode.dependents) > 0 {
gb.absentOwnerCache.Add(accessor.GetUID())
}
for dep := range existingNode.dependents {
gb.attemptToDelete.Add(dep)
}
for _, owner := range existingNode.owners {
ownerNode, found := gb.uidToNode.Read(owner.UID)
if !found || !ownerNode.isDeletingDependents() {
continue
}
// this is to let attempToDeleteItem check if all the owner's
// dependents are deleted, if so, the owner will be deleted.
gb.attemptToDelete.Add(ownerNode)
}
}
return true
}

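Reading the code confirms that when an object is deleted with the Background policy, no related finalizer is set on it (finalizers are only set for the Foreground and Orphan policies), so it is deleted right away. GraphBuilder then observes the object's delete event and puts its dependents on the attemptToDelete queue, which triggers GarbageCollector to reclaim them.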

1.2.1 gb.insertNode

gb.insertNode adds the node to uidToNode and then adds it to the dependents of each of its owners.

// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) insertNode(n *node) {
gb.uidToNode.Write(n)
gb.addDependentToOwners(n, n.owners)
}

func (gb *GraphBuilder) addDependentToOwners(n *node, owners []metav1.OwnerReference) {
for _, owner := range owners {
ownerNode, ok := gb.uidToNode.Read(owner.UID)
if !ok {
// Create a "virtual" node in the graph for the owner if it doesn't
// exist in the graph yet.
ownerNode = &node{
identity: objectReference{
OwnerReference: owner,
Namespace: n.identity.Namespace,
},
dependents: make(map[*node]struct{}),
virtual: true,
}
klog.V(5).Infof("add virtual node.identity: %s\n\n", ownerNode.identity)
gb.uidToNode.Write(ownerNode)
}
ownerNode.addDependent(n)
if !ok {
// Enqueue the virtual node into attemptToDelete.
// The garbage processor will enqueue a virtual delete
// event to delete it from the graph if API server confirms this
// owner doesn't exist.
gb.attemptToDelete.Add(ownerNode)
}
}
}

1.2.2 gb.processTransitions

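gb.processTransitions checks whether the k8s object is being deleted (its deletionTimestamp is non-nil) and carries the finalizer for a deletion policy, then acts accordingly.

Because finalizers are only set for the Foreground and Orphan policies, this method only handles objects deleted with Foreground or Orphan; objects deleted with Background are not handled here.

If the object's deletionTimestamp is non-nil and it has the finalizer for the Orphan policy, the node is put on the attemptToOrphan queue for GarbageCollector to consume;

If the object's deletionTimestamp is non-nil and it has the finalizer for the Foreground policy, n.markDeletingDependents is called to set the node's deletingDependents field to true, meaning the node's dependents are being deleted, and the node and its dependents are put on the attemptToDelete queue for GarbageCollector to consume.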

// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) processTransitions(oldObj interface{}, newAccessor metav1.Object, n *node) {
if startsWaitingForDependentsOrphaned(oldObj, newAccessor) {
klog.V(5).Infof("add %s to the attemptToOrphan", n.identity)
gb.attemptToOrphan.Add(n)
return
}
if startsWaitingForDependentsDeleted(oldObj, newAccessor) {
klog.V(2).Infof("add %s to the attemptToDelete, because it's waiting for its dependents to be deleted", n.identity)
// if the n is added as a "virtual" node, its deletingDependents field is not properly set, so always set it here.
n.markDeletingDependents()
for dep := range n.dependents {
gb.attemptToDelete.Add(dep)
}
gb.attemptToDelete.Add(n)
}
}

func startsWaitingForDependentsOrphaned(oldObj interface{}, newAccessor metav1.Object) bool {
return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerOrphanDependents)
}

func startsWaitingForDependentsDeleted(oldObj interface{}, newAccessor metav1.Object) bool {
return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerDeleteDependents)
}

func deletionStartsWithFinalizer(oldObj interface{}, newAccessor metav1.Object, matchingFinalizer string) bool {
// if the new object isn't being deleted, or doesn't have the finalizer we're interested in, return false
if !beingDeleted(newAccessor) || !hasFinalizer(newAccessor, matchingFinalizer) {
return false
}
// if the old object is nil, or wasn't being deleted, or didn't have the finalizer, return true
if oldObj == nil {
return true
}
oldAccessor, err := meta.Accessor(oldObj)
if err != nil {
utilruntime.HandleError(fmt.Errorf("cannot access oldObj: %v", err))
return false
}
return !beingDeleted(oldAccessor) || !hasFinalizer(oldAccessor, matchingFinalizer)
}

func beingDeleted(accessor metav1.Object) bool {
return accessor.GetDeletionTimestamp() != nil
}

func hasFinalizer(accessor metav1.Object, matchingFinalizer string) bool {
finalizers := accessor.GetFinalizers()
for _, finalizer := range finalizers {
if finalizer == matchingFinalizer {
return true
}
}
return false
}

1.2.3 gb.removeNode

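gb.removeNode removes the object from uidToNode and from the dependents of each of its owners. Its caller, processGraphChanges, then puts the object's dependents on the attemptToDelete queue to trigger GarbageCollector, and finally checks each of the node's owners: an owner that is deleting its dependents may be blocked waiting for this node to go away, so it is added to the attemptToDelete queue as well.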

// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) removeNode(n *node) {
gb.uidToNode.Delete(n.identity.UID)
gb.removeDependentFromOwners(n, n.owners)
}

func (gb *GraphBuilder) removeDependentFromOwners(n *node, owners []metav1.OwnerReference) {
for _, owner := range owners {
ownerNode, ok := gb.uidToNode.Read(owner.UID)
if !ok {
continue
}
ownerNode.deleteDependent(n)
}
}

2.GarbageCollector

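Now for GarbageCollector.

GarbageCollector has 2 main functions:

(1) process events in the attemptToDelete queue, applying the reclaim logic for the foreground or background deletion policy and deleting associated objects;

(2) process events in the attemptToOrphan queue: for the Orphan policy, update all of the owner's dependents to remove the owner's entry from their OwnerReferences, then update the owner object to remove the finalizer for the Orphan policy.

GarbageCollector has 2 key methods:

(1) gc.runAttemptToDeleteWorker: processes events in the attemptToDelete queue and handles reclamation for objects deleted with the foreground or background policy;

(2) gc.runAttemptToOrphanWorker: processes events in the attemptToOrphan queue and handles reclamation for objects deleted with the Orphan policy.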

2.1 GarbageCollector struct

A quick look at the GarbageCollector struct. Its key fields are:

(1) attemptToDelete and attemptToOrphan: GraphBuilder, as producer, puts events on these two queues, and GarbageCollector, as consumer, processes them.

// pkg/controller/garbagecollector/garbagecollector.go
type GarbageCollector struct {
...
attemptToDelete workqueue.RateLimitingInterface
attemptToOrphan workqueue.RateLimitingInterface
...
}

2.2 GarbageCollector-gc.runAttemptToDeleteWorker

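Next comes GarbageCollector's processing logic, with gc.runAttemptToDeleteWorker as the entry point.

runAttemptToDeleteWorker simply calls attemptToDeleteWorker in a loop.

Main logic of attemptToDeleteWorker:

(1) take an object from the attemptToDelete queue;

(2) call gc.attemptToDeleteItem to try to delete the node;

(3) if deletion fails, put the item back on the attemptToDelete queue to retry; an item that has not yet been observed through an informer event is requeued as well.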

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToDeleteWorker() {
for gc.attemptToDeleteWorker() {
}
}

func (gc *GarbageCollector) attemptToDeleteWorker() bool {
item, quit := gc.attemptToDelete.Get()
gc.workerLock.RLock()
defer gc.workerLock.RUnlock()
if quit {
return false
}
defer gc.attemptToDelete.Done(item)
n, ok := item.(*node)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
return true
}
err := gc.attemptToDeleteItem(n)
if err != nil {
if _, ok := err.(*restMappingError); ok {
// There are at least two ways this can happen:
// 1. The reference is to an object of a custom type that has not yet been
// recognized by gc.restMapper (this is a transient error).
// 2. The reference is to an invalid group/version. We don't currently
// have a way to distinguish this from a valid type we will recognize
// after the next discovery sync.
// For now, record the error and retry.
klog.V(5).Infof("error syncing item %s: %v", n, err)
} else {
utilruntime.HandleError(fmt.Errorf("error syncing item %s: %v", n, err))
}
// retry if garbage collection of an object failed.
gc.attemptToDelete.AddRateLimited(item)
} else if !n.isObserved() {
// requeue if item hasn't been observed via an informer event yet.
// otherwise a virtual node for an item added AND removed during watch reestablishment can get stuck in the graph and never removed.
// see https://issue.k8s.io/56121
klog.V(5).Infof("item %s hasn't been observed via informer yet", n.identity)
gc.attemptToDelete.AddRateLimited(item)
}
return true
}

2.2.1 gc.attemptToDeleteItem

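Main logic:

(1) check whether the node is being deleted; if it is and it is not waiting on its dependents, return immediately;

(2) fetch the object for the node from the apiserver; if the object is not found or its UID does not match, enqueue a virtual delete event so that the node is removed from the graph;

(3) call item.isDeletingDependents, which uses the node's deletingDependents field to tell whether the node is currently deleting its dependents. If so, call gc.processDeletingDependentsItem to handle them: check whether the node's blockingDependents have all been deleted; if so, remove the relevant finalizer from the node's object; if not, add the remaining blockingDependents to the attemptToDelete queue;

As described in the GraphBuilder analysis, when GraphBuilder processes events from graphChanges, processTransitions calls n.markDeletingDependents to set the node's deletingDependents field to true;

(4) call gc.classifyReferences to sort the node's owners into 3 categories: solid (the owner exists and is not being deleted), dangling (the owner does not exist), and waitingForDependentsDeletion (the owner exists, is being deleted, and is waiting for its dependents to be deleted);

(5) what happens next depends on the number of solid, dangling, and waitingForDependentsDeletion owners;

(6) case 1: solid is non-empty, meaning the node has at least one owner that exists and is not being deleted, so the object cannot be reclaimed yet. The owners in dangling and waitingForDependentsDeletion are removed from the node's ownerReferences;

(7) case 2: solid is empty, at least one owner of the node is in waitingForDependentsDeletion, and the node still has dependents. The node's object is deleted with the foreground policy;

(8) default case, when neither of the above applies: delete the object through the apiserver, choosing the propagation policy from the finalizers already on the object (orphan or foreground if the corresponding finalizer is present, otherwise background).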
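
Cases (6) to (8) form a three-way decision that is easier to see with the API calls stripped away. The sketch below is a simplified model and not the real method: decide is a hypothetical function that takes plain counts instead of owner reference lists and returns descriptive labels invented for this example. The finalizer strings orphan and foregroundDeletion are the values mentioned earlier in this post.

```go
package main

import "fmt"

// hasFinalizer reports whether want appears in finalizers.
func hasFinalizer(finalizers []string, want string) bool {
	for _, f := range finalizers {
		if f == want {
			return true
		}
	}
	return false
}

// decide models the switch at the end of attemptToDeleteItem using counts.
// The returned labels are illustrative, not values used by Kubernetes.
func decide(solid, dangling, waiting, dependents int, finalizers []string) string {
	switch {
	case solid != 0:
		// At least one live owner: keep the object, optionally prune stale references.
		if dangling == 0 && waiting == 0 {
			return "keep"
		}
		return "keep, remove stale ownerReferences"
	case waiting != 0 && dependents != 0:
		// An owner is waiting on us and we still have dependents of our own.
		return "delete with Foreground"
	default:
		// No solid owner: honor an existing finalizer, else fall back to Background.
		if hasFinalizer(finalizers, "orphan") {
			return "delete with Orphan"
		}
		if hasFinalizer(finalizers, "foregroundDeletion") {
			return "delete with Foreground"
		}
		return "delete with Background"
	}
}

func main() {
	fmt.Println(decide(1, 0, 0, 0, nil))                // keep
	fmt.Println(decide(1, 1, 0, 0, nil))                // keep, remove stale ownerReferences
	fmt.Println(decide(0, 0, 1, 2, nil))                // delete with Foreground
	fmt.Println(decide(0, 1, 0, 0, []string{"orphan"})) // delete with Orphan
	fmt.Println(decide(0, 1, 0, 0, nil))                // delete with Background
}
```

Because Go evaluates switch cases in order, the second case is only reached when solid is zero, which is why case (7) above can say "solid is empty" without the code checking it explicitly.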

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) attemptToDeleteItem(item *node) error {
klog.V(2).Infof("processing item %s", item.identity)
// "being deleted" is an one-way trip to the final deletion. We'll just wait for the final deletion, and then process the object's dependents.
if item.isBeingDeleted() && !item.isDeletingDependents() {
klog.V(5).Infof("processing item %s returned at once, because its DeletionTimestamp is non-nil", item.identity)
return nil
}
// TODO: It's only necessary to talk to the API server if this is a
// "virtual" node. The local graph could lag behind the real status, but in
// practice, the difference is small.
latest, err := gc.getObject(item.identity)
switch {
case errors.IsNotFound(err):
// the GraphBuilder can add "virtual" node for an owner that doesn't
// exist yet, so we need to enqueue a virtual Delete event to remove
// the virtual node from GraphBuilder.uidToNode.
klog.V(5).Infof("item %v not found, generating a virtual delete event", item.identity)
gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
// since we're manually inserting a delete event to remove this node,
// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
item.markObserved()
return nil
case err != nil:
return err
}
if latest.GetUID() != item.identity.UID {
klog.V(5).Infof("UID doesn't match, item %v not found, generating a virtual delete event", item.identity)
gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
// since we're manually inserting a delete event to remove this node,
// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
item.markObserved()
return nil
}
// TODO: attemptToOrphanWorker() routine is similar. Consider merging
// attemptToOrphanWorker() into attemptToDeleteItem() as well.
if item.isDeletingDependents() {
return gc.processDeletingDependentsItem(item)
}
// compute if we should delete the item
ownerReferences := latest.GetOwnerReferences()
if len(ownerReferences) == 0 {
klog.V(2).Infof("object %s's doesn't have an owner, continue on next item", item.identity)
return nil
}
solid, dangling, waitingForDependentsDeletion, err := gc.classifyReferences(item, ownerReferences)
if err != nil {
return err
}
klog.V(5).Infof("classify references of %s.\nsolid: %#v\ndangling: %#v\nwaitingForDependentsDeletion: %#v\n", item.identity, solid, dangling, waitingForDependentsDeletion)
switch {
case len(solid) != 0:
klog.V(2).Infof("object %#v has at least one existing owner: %#v, will not garbage collect", item.identity, solid)
if len(dangling) == 0 && len(waitingForDependentsDeletion) == 0 {
return nil
}
klog.V(2).Infof("remove dangling references %#v and waiting references %#v for object %s", dangling, waitingForDependentsDeletion, item.identity)
// waitingForDependentsDeletion needs to be deleted from the
// ownerReferences, otherwise the referenced objects will be stuck with
// the FinalizerDeletingDependents and never get deleted.
ownerUIDs := append(ownerRefsToUIDs(dangling), ownerRefsToUIDs(waitingForDependentsDeletion)...)
patch := deleteOwnerRefStrategicMergePatch(item.identity.UID, ownerUIDs...)
_, err = gc.patch(item, patch, func(n *node) ([]byte, error) {
return gc.deleteOwnerRefJSONMergePatch(n, ownerUIDs...)
})
return err
case len(waitingForDependentsDeletion) != 0 && item.dependentsLength() != 0:
deps := item.getDependents()
for _, dep := range deps {
if dep.isDeletingDependents() {
// this circle detection has false positives, we need to
// apply a more rigorous detection if this turns out to be a
// problem.
// there are multiple workers run attemptToDeleteItem in
// parallel, the circle detection can fail in a race condition.
klog.V(2).Infof("processing object %s, some of its owners and its dependent [%s] have FinalizerDeletingDependents, to prevent potential cycle, its ownerReferences are going to be modified to be non-blocking, then the object is going to be deleted with Foreground", item.identity, dep.identity)
patch, err := item.unblockOwnerReferencesStrategicMergePatch()
if err != nil {
return err
}
if _, err := gc.patch(item, patch, gc.unblockOwnerReferencesJSONMergePatch); err != nil {
return err
}
break
}
}
klog.V(2).Infof("at least one owner of object %s has FinalizerDeletingDependents, and the object itself has dependents, so it is going to be deleted in Foreground", item.identity)
// the deletion event will be observed by the graphBuilder, so the item
// will be processed again in processDeletingDependentsItem. If it
// doesn't have dependents, the function will remove the
// FinalizerDeletingDependents from the item, resulting in the final
// deletion of the item.
policy := metav1.DeletePropagationForeground
return gc.deleteObject(item.identity, &policy)
default:
// item doesn't have any solid owner, so it needs to be garbage
// collected. Also, none of item's owners is waiting for the deletion of
// the dependents, so set propagationPolicy based on existing finalizers.
var policy metav1.DeletionPropagation
switch {
case hasOrphanFinalizer(latest):
// if an existing orphan finalizer is already on the object, honor it.
policy = metav1.DeletePropagationOrphan
case hasDeleteDependentsFinalizer(latest):
// if an existing foreground finalizer is already on the object, honor it.
policy = metav1.DeletePropagationForeground
default:
// otherwise, default to background.
policy = metav1.DeletePropagationBackground
}
klog.V(2).Infof("delete object %s with propagation policy %s", item.identity, policy)
return gc.deleteObject(item.identity, &policy)
}
}
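
gc.processDeletingDependentsItem

Main logic: check whether the node's blockingDependents (the dependents that block the owner's deletion) have all been deleted. If so, remove the relevant finalizer from the node's object (once the finalizer is gone, kube-apiserver deletes the object); if not, add the remaining blockingDependents to the attemptToDelete queue.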

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) processDeletingDependentsItem(item *node) error {
blockingDependents := item.blockingDependents()
if len(blockingDependents) == 0 {
klog.V(2).Infof("remove DeleteDependents finalizer for item %s", item.identity)
return gc.removeFinalizer(item, metav1.FinalizerDeleteDependents)
}
for _, dep := range blockingDependents {
if !dep.isDeletingDependents() {
klog.V(2).Infof("adding %s to attemptToDelete, because its owner %s is deletingDependents", dep.identity, item.identity)
gc.attemptToDelete.Add(dep)
}
}
return nil
}

item.blockingDependents

item.blockingDependents returns the dependents that block the node's deletion. Whether a dependent blocks its owner's deletion depends on the blockOwnerDeletion value in the dependent's ownerReferences: true means the dependent blocks the owner's deletion.

// pkg/controller/garbagecollector/graph.go
func (n *node) blockingDependents() []*node {
dependents := n.getDependents()
var ret []*node
for _, dep := range dependents {
for _, owner := range dep.owners {
if owner.UID == n.identity.UID && owner.BlockOwnerDeletion != nil && *owner.BlockOwnerDeletion {
ret = append(ret, dep)
}
}
}
return ret
}

2.3 GarbageCollector-gc.runAttemptToOrphanWorker

gc.runAttemptToOrphanWorker handles nodes deleted with the orphan policy.

gc.runAttemptToOrphanWorker simply calls gc.attemptToOrphanWorker in a loop.

Main logic of gc.attemptToOrphanWorker:

(1) take an object from the attemptToOrphan queue;

(2) call gc.orphanDependents: update all of the owner's dependents, removing the owner's entry from their OwnerReferences; on failure, put the owner back on the attemptToOrphan queue;

(3) call gc.removeFinalizer: update the owner object, removing the finalizer for the Orphan policy.

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToOrphanWorker() {
for gc.attemptToOrphanWorker() {
}
}

func (gc *GarbageCollector) attemptToOrphanWorker() bool {
item, quit := gc.attemptToOrphan.Get()
gc.workerLock.RLock()
defer gc.workerLock.RUnlock()
if quit {
return false
}
defer gc.attemptToOrphan.Done(item)
owner, ok := item.(*node)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
return true
}
// we don't need to lock each element, because they never get updated
owner.dependentsLock.RLock()
dependents := make([]*node, 0, len(owner.dependents))
for dependent := range owner.dependents {
dependents = append(dependents, dependent)
}
owner.dependentsLock.RUnlock()
err := gc.orphanDependents(owner.identity, dependents)
if err != nil {
utilruntime.HandleError(fmt.Errorf("orphanDependents for %s failed with %v", owner.identity, err))
gc.attemptToOrphan.AddRateLimited(item)
return true
}
// update the owner, remove "orphaningFinalizer" from its finalizers list
err = gc.removeFinalizer(owner, metav1.FinalizerOrphanDependents)
if err != nil {
utilruntime.HandleError(fmt.Errorf("removeOrphanFinalizer for %s failed with %v", owner.identity, err))
gc.attemptToOrphan.AddRateLimited(item)
}
return true
}

2.3.1 gc.orphanDependents

Main logic: update all dependents of the given owner, removing the owner's entry from their OwnerReferences. Each dependent is handled in its own goroutine to speed things up.

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) orphanDependents(owner objectReference, dependents []*node) error {
errCh := make(chan error, len(dependents))
wg := sync.WaitGroup{}
wg.Add(len(dependents))
for i := range dependents {
go func(dependent *node) {
defer wg.Done()
// the dependent.identity.UID is used as precondition
patch := deleteOwnerRefStrategicMergePatch(dependent.identity.UID, owner.UID)
_, err := gc.patch(dependent, patch, func(n *node) ([]byte, error) {
return gc.deleteOwnerRefJSONMergePatch(n, owner.UID)
})
// note that if the target ownerReference doesn't exist in the
// dependent, strategic merge patch will NOT return an error.
if err != nil && !errors.IsNotFound(err) {
errCh <- fmt.Errorf("orphaning %s failed, %v", dependent.identity, err)
}
}(dependents[i])
}
wg.Wait()
close(errCh)
var errorsSlice []error
for e := range errCh {
errorsSlice = append(errorsSlice, e)
}
if len(errorsSlice) != 0 {
return fmt.Errorf("failed to orphan dependents of owner %s, got errors: %s", owner, utilerrors.NewAggregate(errorsSlice).Error())
}
klog.V(5).Infof("successfully updated all dependents of owner %s", owner)
return nil
}

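Summary

To recap the architecture and core processing logic of the garbage collector.

The garbage collector consists of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector), and 3 event queues (graphChanges, attemptToDelete, and attemptToOrphan).

Events from list/watch against the apiserver are placed on the graphChanges queue. GraphBuilder takes events from graphChanges, builds the object dependency graph, and, depending on the object's deletion policy, puts the relevant objects on the attemptToDelete and attemptToOrphan queues. GarbageCollector then takes events from those two queues, consults the dependency graph, and finally reclaims the objects.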
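
The producer/consumer loop between the two processors can be replayed with two plain slices standing in for the queues. This is a deliberately small, single-threaded model of a Background delete only: simulateBackgroundDelete and the event type are hypothetical, the real queues are rate-limited workqueues drained by concurrent workers, and the apiserver round trip is reduced to appending a new event.

```go
package main

import "fmt"

// event is a stand-in for a delete event delivered by an informer.
type event struct{ deleted string }

// simulateBackgroundDelete replays the queue hand-off for a Background delete
// of root. dependents maps each owner to its direct dependents. It returns
// the objects in the order their delete events were processed.
func simulateBackgroundDelete(root string, dependents map[string][]string) []string {
	graphChanges := []event{{deleted: root}} // produced by the informers
	var attemptToDelete []string             // produced by GraphBuilder
	var removed []string

	for len(graphChanges) > 0 || len(attemptToDelete) > 0 {
		if len(graphChanges) > 0 {
			// GraphBuilder: consume one delete event, drop the node from the
			// graph, and enqueue its dependents for the GarbageCollector.
			ev := graphChanges[0]
			graphChanges = graphChanges[1:]
			removed = append(removed, ev.deleted)
			attemptToDelete = append(attemptToDelete, dependents[ev.deleted]...)
			continue
		}
		// GarbageCollector: delete the item through the API; the deletion
		// comes back later as a new event on graphChanges.
		item := attemptToDelete[0]
		attemptToDelete = attemptToDelete[1:]
		graphChanges = append(graphChanges, event{deleted: item})
	}
	return removed
}

func main() {
	order := simulateBackgroundDelete("Deployment", map[string][]string{
		"Deployment": {"ReplicaSet"},
		"ReplicaSet": {"Pod"},
	})
	fmt.Println(order)
}
```

Each deletion performed by the consumer feeds a new event back to the producer, which is how a single delete of the Deployment eventually reaches the Pod.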

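Object deletion policies

To summarize how a node and its object are deleted under each of the 3 deletion policies.

Foreground deletion

Foreground is a cascading deletion policy: the garbage collector deletes all of the object's dependents.

When an object is deleted with the foreground policy, its deletionTimestamp field is set and its metadata.finalizers field contains the value foregroundDeletion, which blocks the object's deletion. Once the garbage collector has deleted all of the object's blocking dependents (those with ownerReference.blockOwnerDeletion=true), it removes foregroundDeletion from metadata.finalizers, and the owner object is then deleted.

For example, deleting a deployment with the foreground policy deletes objects in the order Pod->ReplicaSet->Deployment.

Background deletion

Background is also a cascading deletion policy: Kubernetes deletes the owner object immediately, and the garbage collector then deletes all of its dependents in the background.

When an object is deleted with the Background policy, no related finalizer is set on it (finalizers are only set for the Foreground and Orphan policies), so it is deleted right away. GraphBuilder then observes the object's delete event and puts its dependents on the attemptToDelete queue, which triggers GarbageCollector to reclaim them.

For example, deleting a deployment with the background policy deletes objects in the order Deployment->ReplicaSet->Pod.

Orphan deletion

Orphan is a non-cascading deletion policy: deleting an object does not automatically delete its dependents, which are then called orphaned objects.

When an object is deleted with the Orphan policy, its metadata.finalizers field contains the value orphan, which blocks the object's deletion until GarbageCollector has removed the owner's entry from the OwnerReferences of every dependent. GarbageCollector then removes orphan from the owner's metadata.finalizers, and only after that can the owner object be deleted.

For example, deleting a deployment with the orphan policy deletes only the Deployment; its ReplicaSet and Pods are kept.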