-
Notifications
You must be signed in to change notification settings - Fork 586
Introduce CPUAffinity process property instead of execCPUAffinity #1296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -340,18 +340,38 @@ For Linux-based systems, the `process` object supports the following process-spe | |||||
|
||||||
* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`. | ||||||
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest). | ||||||
* **`execCPUAffinity`** (object, OPTIONAL) specifies CPU affinity used to execute the process. | ||||||
* **`CPUAffinity`** (object, OPTIONAL) specifies CPU affinity used to execute processes in the runtime and the container. | ||||||
All the properties in this setting expects as argument a string that contain a comma-separated list, with dashes to represent | ||||||
ranges. For example, `0-3,7` represents CPUs 0,1,2,3, and 7. If omitted or empty for particular [lifecycle](runtime.md#lifecycle) stage, | ||||||
the runtime SHOULD NOT change process' CPU affinity, and the affinity is determined by the Linux kernel. | ||||||
The following properties are available: | ||||||
* **`runtimeCreate`** (string, OPTIONAL) is a list of CPUs the runtime parent process should run on during the runtime creation stage, | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. shouldn't this just be configured by the process that runs the OCI runtime so that more of the startup runs on the correct CPUs? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For better or worse, this might be non-trivial to do if someone is using a tool that wraps an OCI runtime and doesn't provide a knob for it (basically the same argument as runtime hooks). I thin matching the lifecycle stages (which also match the hook names) as done here is probably the simplest approach. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. theoretically it is possible, e.g. via systemd to restrict container runtime to run on subset of CPUs, but this gives granularity on splitting CPU affinity for containerd/cri-o itself vs. per-container hook execution. In other words, do not affect higher level runtime' CPU quota with operations that might be local to subset of containers. regarding names: currently it is in form "whereWhat", but OCI hooks are named "whatWhere". if we feel that OCI hook naming is better, I can rename proposed fields in this PR to have more consistency with the hooks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think having the same names as the hooks would be preferable if only for consistency. |
||||||
before entering the container's namespaces or cgroups. CPU affinity should be applied on early stages of container creation, | ||||||
so that any potential CPU consuming operations inside runtime (e.g. [`createRuntime` hooks](#createRuntime-hooks)) | ||||||
will be affinitized to specified list of CPUs. | ||||||
* **`containerCreate`** (string, OPTIONAL) is a list of CPUs the process should run on during the container creation stage, | ||||||
after entering the container's namespaces but before the user process or any of [`createContainer` hooks](#createContainer-hooks) are executed. | ||||||
* **`containerStart`** (string, OPTIONAL) is a list of CPUs the process should be shoud run on during the container start stage, | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
inside the container's namespace during `start` operation. The affinity should be applied before executing [`startContainer` hooks](#startContainer-hooks). | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why do we need to differentiate between these two? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You could argue that the runtime should have a particular set of affinities which then should be adjusted to be more or less restrictive as late as possible to avoid latency on those cores. I suspect most people wouldn't set both at the same time but the distinction does make some sense. Now, whether this should be done before or after applying seccomp rules is a different question... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
the difference is that "runc create" and "runc start" might be executed in different PIDs, thus those PIDs by default would inherit CPU affinity from parent process(es). Having above in the spec allows to make sure that OCI hooks are executed (and quota accounted) on the correct "infrastructure" CPUs. |
||||||
* **`runtimeExec`** (string, OPTIONAL) is a list of CPUs a runtime parent | ||||||
process to be run during `exec` operation, before the transition to container's | ||||||
cgroup. | ||||||
* **`containerExec`** (string, OPTIONAL) is a list of CPUs the process will be run on during `exec` operationon | ||||||
after the transition to container's cgroup. | ||||||
* **`execCPUAffinity`** (object, OPTIONAL, **DEPRECATED**) specifies CPU affinity used to execute the process. | ||||||
This setting is not applicable to the container's init process. | ||||||
The following properties are available: | ||||||
* **`initial`** (string, OPTIONAL) is a list of CPUs a runtime parent | ||||||
process to be run on initially, before the transition to container's | ||||||
cgroup. This is a a comma-separated list, with dashes to represent | ||||||
ranges. For example, `0-3,7` represents CPUs 0,1,2,3, and 7. | ||||||
Deprecated in favor of `runtimeExec` in `CPUAffinity`. | ||||||
* **`final`** (string, OPTIONAL) is a list of CPUs the process will be run | ||||||
on after the transition to container's cgroup. The format is the same as | ||||||
for `initial`. If omitted or empty, runtime SHOULD NOT change process' | ||||||
CPU affinity after the process is moved to container's cgroup, and the | ||||||
final affinity is determined by the Linux kernel. | ||||||
Deprecated in favor of `containerExec` in `CPUAffinity`. | ||||||
|
||||||
### <a name="configZOSProcess" />z/OS Process | ||||||
|
||||||
|
@@ -435,9 +455,11 @@ _Note: symbolic name for uid and gid, such as uname and gname respectively, are | |||||
"soft": 1024 | ||||||
} | ||||||
], | ||||||
"execCPUAffinity": { | ||||||
"initial": "7", | ||||||
"final": "0-3,7" | ||||||
"CPUAffinity": { | ||||||
"runtimeCreate": "7", | ||||||
"runtimeExec": "7", | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For completeness, should we add alsoi |
||||||
"containerCreate": "0-3,7", | ||||||
"containerExec": "0-3,7" | ||||||
} | ||||||
} | ||||||
``` | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -94,8 +94,12 @@ type Process struct { | |
SelinuxLabel string `json:"selinuxLabel,omitempty" platform:"linux"` | ||
// IOPriority contains the I/O priority settings for the cgroup. | ||
IOPriority *LinuxIOPriority `json:"ioPriority,omitempty" platform:"linux"` | ||
// ExecCPUAffinity is Deprecated. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you want various linters to warn about using deprecated functionality (and other linters, that insist that description starts with the // ExecCPUAffinity specifies CPU affinity for exec processes.
//
// Deprecated: use [CPUAffinity] instead. (I.e. after the description, on a separate paragraph, etc.) |
||
// ExecCPUAffinity specifies CPU affinity for exec processes. | ||
ExecCPUAffinity *CPUAffinity `json:"execCPUAffinity,omitempty" platform:"linux"` | ||
// Deprecated: use [Process.CPUAffinity] | ||
ExecCPUAffinity *ExecCPUAffinity `json:"execCPUAffinity,omitempty" platform:"linux"` | ||
// CPUAffinity specifies CPU affinity for executing processes | ||
CPUAffinity *CPUAffinity `json:"CPUAffinity,omitempty" platform:"linux"` | ||
} | ||
|
||
// LinuxCapabilities specifies the list of allowed capabilities that are kept for a process. | ||
|
@@ -129,12 +133,21 @@ const ( | |
IOPRIO_CLASS_IDLE IOPriorityClass = "IOPRIO_CLASS_IDLE" | ||
) | ||
|
||
// CPUAffinity specifies process' CPU affinity. | ||
type CPUAffinity struct { | ||
// ExecCPUAffinity specifies process' CPU affinity during runtime exec operation. | ||
type ExecCPUAffinity struct { | ||
Initial string `json:"initial,omitempty"` | ||
Final string `json:"final,omitempty"` | ||
} | ||
|
||
// CPUAffinity specifies process' CPU affinity. | ||
type CPUAffinity struct { | ||
RuntimeCreate string `json:"runtimeCreate,omitempty"` | ||
ContainerCreate string `json:"containerCreate,omitempty"` | ||
ContainerStart string `json:"containerStart,omitempty"` | ||
RuntimeExec string `json:"runtimeExec,omitempty"` | ||
ContainerExec string `json:"containerExec,omitempty"` | ||
} | ||
|
||
// Box specifies dimensions of a rectangle. Used for specifying the size of a console. | ||
type Box struct { | ||
// Height is the vertical dimension of a box. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be very useful to have an
all
special value which indicates that the CPU affinity should be reset to include as many CPUs as possible.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "max" makes that clearer symantically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with either, but I guess you could say that "max" implies that it might not be all.