Requirements

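Until now, our workflows ran services directly in Docker containers started through the docker-java API. A recently launched business workload is far more resource-hungry, though: left unchecked, a single container can max out both CPU and memory, which caused unexplained server crashes. So it was time to put Docker CPU and memory limits in place.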

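Implementation

Building the HostConfig

Build a custom HostConfig that sets the CPU and memory limits. If the pipeline specifies them, use those values; otherwise fall back to the defaults.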

    public void setUp() {
        this.dockerHostConfig = new HostConfig();

        // Memory limit in bytes: pipeline value (in GB) if set, otherwise the default
        Double memoryValue = this.pipeline.getMemory() != null
                ? this.pipeline.getMemory() * 1024 * 1024 * 1024
                : this.config.getDefaultMemoryLimitInGb() * 1024 * 1024 * 1024;
        this.dockerHostConfig.withMemory(memoryValue.longValue());

        // CPU cores: pipeline value if set, otherwise the default
        double cpu = StringUtils.isNotBlank(this.pipeline.getCpu())
                ? Double.parseDouble(this.pipeline.getCpu())
                : this.config.getDefaultCpuCoreLimit();
        // CpuShares is a relative weight: one CPU is 1024, two are 2048, and so on
        this.dockerHostConfig.withCpuShares((int) (cpu * 1024));
    }
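The unit conversions in setUp() can be checked in isolation. The following is a minimal sketch with no docker-java dependency; the class name ResourceLimits and the helper orDefault are hypothetical names introduced only for illustration, and the 6 GB and 2-core inputs are assumed example values.

```java
// Stand-alone sketch of the limit arithmetic (hypothetical helper class, not part of docker-java).
final class ResourceLimits {
    private static final long BYTES_PER_GB = 1024L * 1024L * 1024L;

    private ResourceLimits() {}

    /** Converts a memory limit given in GB to the byte count passed to withMemory(). */
    static long gbToBytes(double gb) {
        return (long) (gb * BYTES_PER_GB);
    }

    /** Converts a core count to CPU shares, using 1024 shares per core as in setUp(). */
    static int coresToShares(double cores) {
        return (int) (cores * 1024);
    }

    /** Returns the configured value if present, otherwise the fallback (hypothetical helper). */
    static double orDefault(Double configured, double fallback) {
        return configured != null ? configured : fallback;
    }

    public static void main(String[] args) {
        double memoryGb = orDefault(null, 6.0); // no pipeline value, so the assumed default applies
        double cores = orDefault(2.0, 1.0);     // pipeline value wins over the assumed default
        System.out.println(gbToBytes(memoryGb));  // prints 6442450944
        System.out.println(coresToShares(cores)); // prints 2048
    }
}
```

Using a long constant for the bytes-per-GB factor keeps the multiplication out of int range, which matters for any limit of 2 GB or more.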

Building the CreateContainerCmd

    public String startContainer(String image,
                                 String name,
                                 List<ContainerPortBind> portBinds,
                                 List<ContainerVolumeBind> volumeBinds,
                                 List<String> extraHosts,
                                 List<String> envs,
                                 List<String> entrypoints,
                                 HostConfig hostConfig,
                                 String... cmds) {
        List<Volume> volumes = new ArrayList<>();
        List<Bind> volumesBinds = new ArrayList<>();
        ……
        ……
        ……
        CreateContainerCmd cmd = this.client.createContainerCmd(image)
                .withName(name)
                .withVolumes(volumes)
                .withBinds(volumesBinds);
        if (portBinds != null && portBinds.size() > 0) {
            cmd = cmd.withPortBindings(portBindings);
        }
        if (cmds != null && cmds.length > 0) {
            cmd = cmd.withCmd(cmds);
        }
        if (extraHosts != null && extraHosts.size() > 0) {
            cmd.withExtraHosts(extraHosts);
        }
        if (envs != null) {
            cmd.withEnv(envs);
        }
        if (entrypoints != null) {
            cmd.withEntrypoint(entrypoints);
        }
        // This line is the key one
        cmd.withHostConfig(hostConfig);
        CreateContainerResponse container = cmd.exec();
        this.client.startContainerCmd(container.getId()).exec();
        return container.getId();
    }

docker inspect containerId

Running docker inspect a436678ccb0c gives the following result:

    "HostConfig": {
        "Binds": [],
        "ContainerIDFile": "",
        "LogConfig": {
            "Type": "json-file",
            "Config": {
                "max-file": "3",
                "max-size": "10m"
            }
        },
        "NetworkMode": "default",
        "PortBindings": null,
        "RestartPolicy": {
            "Name": "",
            "MaximumRetryCount": 0
        },
        "CpuShares": 2048,
        "Memory": 6442450944,
        "NanoCpus": 0,
        "CgroupParent": "",
        "BlkioWeight": 0,
        "BlkioWeightDevice": null
    }

CpuShares and Memory now show the default values we configured, so the API call took effect. Next, let's look at the execution log:

proc "pipeline_task_4b86c7830e4c4e39a77c454589c9e7e9_1" starting 2021-09-22 17:30:15 logPath:/mnt/xx/xx/logs/2021/09/22/bfbadf65-ac41-459d-a96d-3dc9a0105c25/job.log
+ java -jar /datavolume/xxx/xx.jar --spring.profiles.active=test
STDERR: Error: Unable to access jarfile /datavolume/xxx/xx.jar
5c494aeacb87af3a46a4fedc6e695ae888d4d2b9d7e603f24ef7fe114956c782 finished!
proc "pipeline_task_4b86c7830e4c4e39a77c454589c9e7e9_1" exited with status 1
proc "新增節點" error
start to kill all pipeline task
pipeline exit with error

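The jar file could not be found. Looking back at the inspect output, Binds is empty, so the volume mounts were lost. But why? The logic around withVolumes() and withBinds() had not changed at all, so let's read the source to find out.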

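Locating and Fixing the Problem

Before reading the source, it helps to understand Docker's hostConfig, which is stored at /var/lib/docker/containers//hostconfig.json

This file holds the container's host-side runtime configuration: volume binds, CPU and memory limits, DNS, networking, and more. The HostConfig object in docker-java is simply the model for this JSON. Since we supplied a custom HostConfig object, the problem most likely lies in the line cmd.withHostConfig(hostConfig);

The previous binding logic

Previously there were no limits, so no custom HostConfig was passed when building the CreateContainerCmd: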

    CreateContainerCmd cmd = this.client.createContainerCmd(image)
            .withName(name)
            .withVolumes(volumes)
            .withBinds(volumesBinds);

CreateContainerCmd withBinds

    /**
     *
     * @deprecated see {@link #getHostConfig()}
     */
    @Deprecated
    default CreateContainerCmd withBinds(Bind... binds) {
        Objects.requireNonNull(binds, "binds was not specified");
        getHostConfig().setBinds(binds);
        return this;
    }

Tracing getHostConfig() to the implementation class CreateContainerCmdImpl shows that hostConfig is a new object created directly when the class is instantiated:

@JsonProperty("HostConfig")
private HostConfig hostConfig = new HostConfig();

Now look at CreateContainerCmd's withHostConfig() method, which is also implemented in that class:

    @Override
    public CreateContainerCmd withHostConfig(HostConfig hostConfig) {
        this.hostConfig = hostConfig;
        return this;
    }

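It simply replaces the command's original hostConfig. Because our withHostConfig call comes last, it discards the binds that were set earlier. Since CreateContainerCmd's withBinds method is also marked @Deprecated, let's adjust the code so that the binds are set on the HostConfig itself: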

    public String startContainer(String image,
                                 String name,
                                 List<ContainerPortBind> portBinds,
                                 List<ContainerVolumeBind> volumeBinds,
                                 List<String> extraHosts,
                                 List<String> envs,
                                 List<String> entrypoints,
                                 HostConfig hostConfig,
                                 String... cmds) {
        List<Volume> volumes = new ArrayList<>();
        List<Bind> volumesBinds = new ArrayList<>();
        ……
        // This line is the key: set the binds on the HostConfig itself
        hostConfig.withBinds(volumesBinds);
        if (portBinds != null && portBinds.size() > 0) {
            hostConfig.withPortBindings(portBindings);
        }
        if (extraHosts != null && extraHosts.size() > 0) {
            hostConfig.withExtraHosts(extraHosts.toArray(new String[extraHosts.size()]));
        }
        CreateContainerCmd cmd = this.client.createContainerCmd(image)
                .withHostConfig(hostConfig)
                .withName(name)
                .withVolumes(volumes);
        if (cmds != null && cmds.length > 0) {
            cmd = cmd.withCmd(cmds);
        }
        if (envs != null) {
            cmd.withEnv(envs);
        }
        if (entrypoints != null) {
            cmd.withEntrypoint(entrypoints);
        }
        CreateContainerResponse container = cmd.exec();
        this.client.startContainerCmd(container.getId()).exec();
        return container.getId();
    }
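The ordering problem can be reproduced without a Docker daemon. The sketch below uses small stand-in classes (FakeHostConfig and FakeCreateCmd, both hypothetical and not part of docker-java) that copy only the two behaviours discussed above: the deprecated-style withBinds writes into the command's current host config, and withHostConfig replaces that object wholesale. The bind string is an assumed example value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Demonstrates why calling withHostConfig() last drops binds set earlier.
final class OverwriteDemo {
    /** Old order: binds go onto the command first, then the host config is replaced. */
    static int bindsWhenHostConfigSetLast() {
        FakeHostConfig limits = new FakeHostConfig().withMemory(6442450944L);
        return new FakeCreateCmd()
                .withBinds(Arrays.asList("/host/data:/datavolume")) // assumed example bind
                .withHostConfig(limits)                              // replaces the object holding the bind
                .getHostConfig().binds.size();
    }

    /** Fixed order: binds are set on the custom host config before it is handed over. */
    static int bindsWhenSetOnHostConfig() {
        FakeHostConfig limits = new FakeHostConfig()
                .withMemory(6442450944L)
                .withBinds(Arrays.asList("/host/data:/datavolume"));
        return new FakeCreateCmd()
                .withHostConfig(limits)
                .getHostConfig().binds.size();
    }

    public static void main(String[] args) {
        System.out.println(bindsWhenHostConfigSetLast()); // prints 0: the bind was lost
        System.out.println(bindsWhenSetOnHostConfig());   // prints 1: the bind survives
    }
}

// Hypothetical stand-in for HostConfig; models only binds and memory.
final class FakeHostConfig {
    final List<String> binds = new ArrayList<>();
    long memory;

    FakeHostConfig withBinds(List<String> newBinds) {
        binds.clear();
        binds.addAll(newBinds);
        return this;
    }

    FakeHostConfig withMemory(long bytes) {
        memory = bytes;
        return this;
    }
}

// Hypothetical stand-in for the create command; starts with a fresh, empty host config.
final class FakeCreateCmd {
    private FakeHostConfig hostConfig = new FakeHostConfig();

    // Deprecated-style shortcut: writes into whichever host config is current.
    FakeCreateCmd withBinds(List<String> binds) {
        hostConfig.withBinds(binds);
        return this;
    }

    // Swaps in the given host config; anything set on the old one is discarded.
    FakeCreateCmd withHostConfig(FakeHostConfig replacement) {
        this.hostConfig = replacement;
        return this;
    }

    FakeHostConfig getHostConfig() {
        return hostConfig;
    }
}
```

Keeping every host-level setting on the one HostConfig instance avoids depending on the order of builder calls.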

OK, done. Checking with docker stats, the container's CPU usage never goes above 200%.

References

https://github.com/docker-java/docker-java