Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 26 additions & 4 deletions config.md
Original file line number Diff line number Diff line change
Expand Up @@ -340,18 +340,38 @@ For Linux-based systems, the `process` object supports the following process-spe

* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`.
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest).
* **`execCPUAffinity`** (object, OPTIONAL) specifies CPU affinity used to execute the process.
* **`CPUAffinity`** (object, OPTIONAL) specifies CPU affinity used to execute processes in the runtime and the container.
All the properties in this setting expects as argument a string that contain a comma-separated list, with dashes to represent
ranges. For example, `0-3,7` represents CPUs 0,1,2,3, and 7. If omitted or empty for particular [lifecycle](runtime.md#lifecycle) stage,
the runtime SHOULD NOT change process' CPU affinity, and the affinity is determined by the Linux kernel.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be very useful to have an all special value which indicates that the CPU affinity should be reset to include as many CPUs as possible.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "max" makes that clearer symantically

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with either, but I guess you could say that "max" implies that it might not be all.

The following properties are available:
* **`runtimeCreate`** (string, OPTIONAL) is a list of CPUs the runtime parent process should run on during the runtime creation stage,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this just be configured by the process that runs the OCI runtime so that more of the startup runs on the correct CPUs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better or worse, this might be non-trivial to do if someone is using a tool that wraps an OCI runtime and doesn't provide a knob for it (basically the same argument as runtime hooks).

I thin matching the lifecycle stages (which also match the hook names) as done here is probably the simplest approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

theoretically it is possible, e.g. via systemd to restrict container runtime to run on subset of CPUs, but this gives granularity on splitting CPU affinity for containerd/cri-o itself vs. per-container hook execution. In other words, do not affect higher level runtime' CPU quota with operations that might be local to subset of containers.

regarding names: currently it is in form "whereWhat", but OCI hooks are named "whatWhere". if we feel that OCI hook naming is better, I can rename proposed fields in this PR to have more consistency with the hooks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having the same names as the hooks would be preferable if only for consistency.

before entering the container's namespaces or cgroups. CPU affinity should be applied on early stages of container creation,
so that any potential CPU consuming operations inside runtime (e.g. [`createRuntime` hooks](#createRuntime-hooks))
will be affinitized to specified list of CPUs.
* **`containerCreate`** (string, OPTIONAL) is a list of CPUs the process should run on during the container creation stage,
after entering the container's namespaces but before the user process or any of [`createContainer` hooks](#createContainer-hooks) are executed.
* **`containerStart`** (string, OPTIONAL) is a list of CPUs the process should be shoud run on during the container start stage,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* **`containerStart`** (string, OPTIONAL) is a list of CPUs the process should be shoud run on during the container start stage,
* **`containerStart`** (string, OPTIONAL) is a list of CPUs the process should be run on during the container start stage,

inside the container's namespace during `start` operation. The affinity should be applied before executing [`startContainer` hooks](#startContainer-hooks).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to differentiate between these two?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could argue that the runtime should have a particular set of affinities which then should be adjusted to be more or less restrictive as late as possible to avoid latency on those cores.

I suspect most people wouldn't set both at the same time but the distinction does make some sense. Now, whether this should be done before or after applying seccomp rules is a different question...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to differentiate between these two?

the difference is that "runc create" and "runc start" might be executed in different PIDs, thus those PIDs by default would inherit CPU affinity from parent process(es). Having above in the spec allows to make sure that OCI hooks are executed (and quota accounted) on the correct "infrastructure" CPUs.

* **`runtimeExec`** (string, OPTIONAL) is a list of CPUs a runtime parent
process to be run during `exec` operation, before the transition to container's
cgroup.
* **`containerExec`** (string, OPTIONAL) is a list of CPUs the process will be run on during `exec` operationon
after the transition to container's cgroup.
* **`execCPUAffinity`** (object, OPTIONAL, **DEPRECATED**) specifies CPU affinity used to execute the process.
This setting is not applicable to the container's init process.
The following properties are available:
* **`initial`** (string, OPTIONAL) is a list of CPUs a runtime parent
process to be run on initially, before the transition to container's
cgroup. This is a a comma-separated list, with dashes to represent
ranges. For example, `0-3,7` represents CPUs 0,1,2,3, and 7.
Deprecated in favor of `runtimeExec` in `CPUAffinity`.
* **`final`** (string, OPTIONAL) is a list of CPUs the process will be run
on after the transition to container's cgroup. The format is the same as
for `initial`. If omitted or empty, runtime SHOULD NOT change process'
CPU affinity after the process is moved to container's cgroup, and the
final affinity is determined by the Linux kernel.
Deprecated in favor of `containerExec` in `CPUAffinity`.

### <a name="configZOSProcess" />z/OS Process

Expand Down Expand Up @@ -435,9 +455,11 @@ _Note: symbolic name for uid and gid, such as uname and gname respectively, are
"soft": 1024
}
],
"execCPUAffinity": {
"initial": "7",
"final": "0-3,7"
"CPUAffinity": {
"runtimeCreate": "7",
"runtimeExec": "7",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness, should we add alsoi containerStart in this example?

"containerCreate": "0-3,7",
"containerExec": "0-3,7"
}
}
```
Expand Down
25 changes: 25 additions & 0 deletions schema/config-schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,31 @@
}
}
},
"CPUAffinity": {
"type": "object",
"properties": {
"runtimeCreate": {
"type": "string",
"pattern": "^[0-9, -]*$"
},
"containerCreate": {
"type": "string",
"pattern": "^[0-9, -]*$"
},
"containerStart": {
"type": "string",
"pattern": "^[0-9, -]*$"
},
"runtimeExec": {
"type": "string",
"pattern": "^[0-9, -]*$"
},
"containerExec": {
"type": "string",
"pattern": "^[0-9, -]*$"
}
}
},
"execCPUAffinity": {
"type": "object",
"properties": {
Expand Down
19 changes: 16 additions & 3 deletions specs-go/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -94,8 +94,12 @@ type Process struct {
SelinuxLabel string `json:"selinuxLabel,omitempty" platform:"linux"`
// IOPriority contains the I/O priority settings for the cgroup.
IOPriority *LinuxIOPriority `json:"ioPriority,omitempty" platform:"linux"`
// ExecCPUAffinity is Deprecated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want various linters to warn about using deprecated functionality (and other linters, that insist that description starts with the Id, to keep working fine), the deprecation notice should be added like this:

// ExecCPUAffinity specifies CPU affinity for exec processes.
//
// Deprecated: use [CPUAffinity] instead.

(I.e. after the description, on a separate paragraph, etc.)

// ExecCPUAffinity specifies CPU affinity for exec processes.
ExecCPUAffinity *CPUAffinity `json:"execCPUAffinity,omitempty" platform:"linux"`
// Deprecated: use [Process.CPUAffinity]
ExecCPUAffinity *ExecCPUAffinity `json:"execCPUAffinity,omitempty" platform:"linux"`
// CPUAffinity specifies CPU affinity for executing processes
CPUAffinity *CPUAffinity `json:"CPUAffinity,omitempty" platform:"linux"`
}

// LinuxCapabilities specifies the list of allowed capabilities that are kept for a process.
Expand Down Expand Up @@ -129,12 +133,21 @@ const (
IOPRIO_CLASS_IDLE IOPriorityClass = "IOPRIO_CLASS_IDLE"
)

// CPUAffinity specifies process' CPU affinity.
type CPUAffinity struct {
// ExecCPUAffinity specifies process' CPU affinity during runtime exec operation.
type ExecCPUAffinity struct {
Initial string `json:"initial,omitempty"`
Final string `json:"final,omitempty"`
}

// CPUAffinity specifies process' CPU affinity.
type CPUAffinity struct {
RuntimeCreate string `json:"runtimeCreate,omitempty"`
ContainerCreate string `json:"containerCreate,omitempty"`
ContainerStart string `json:"containerStart,omitempty"`
RuntimeExec string `json:"runtimeExec,omitempty"`
ContainerExec string `json:"containerExec,omitempty"`
}

// Box specifies dimensions of a rectangle. Used for specifying the size of a console.
type Box struct {
// Height is the vertical dimension of a box.
Expand Down
Loading