Description
Originally opened by @mpvl in cuelang/cue#543
Background
Currently, for regular fields, the author of a definition can make backwards-compatible modifications without fear of breaking its users, as long as those users do not use the definition in an embedding. This is a great property to have for large-scale engineering.
In general, a change to an API is backwards compatible when the API is modified to become less specific. For definitions there is one exception: existing fields may not be removed, unless the definition is paired with a catch-all `[string]: T` or `...`. This is because, by default, a user may not add a field to a closed struct, so removing a field from the API would break any usage of the definition that specifies this field.
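As an illustrative sketch (the names `#V1`, `#V2`, and `user` are made up for this example), removing a field is only safe when a catch-all remains:

```cue
// Hypothetical API, version 1.
#V1: {
	a: int
	b: int
}

// Version 2 removes b but keeps "...", so unknown regular
// fields remain allowed.
#V2: {
	a: int
	...
}

// A downstream usage that sets b. Against a closed #V2
// without "..." this would fail with a "b not allowed" error.
user: #V2 & {a: 1, b: 2}
```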
Note that a user may still add fields to an existing definition by embedding it. In that case, changes to the original definition may break its usage. There is a clear distinction, however, between using something as an embedding or not, so guarantees that hinge on whether or not a definition is used as an embedding rest on a clear rule. Also, vet rules could warn about the use of definitions from outside the current module as an embedding.
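For instance (a hypothetical sketch), a user may reopen a definition by embedding it, at which point upstream changes can conflict with the locally added fields:

```cue
#D: {a: int}

// Embedding #D admits fields declared alongside it in the same
// struct, so "extra" is allowed here even though #D is closed.
foo: {
	#D
	extra: string
}

// If the maintainer of #D later adds `extra: int`, this foo breaks.
```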
The Problem
The problem is that this desirable property does not hold for nested definitions. Consider this definition:
```cue
#D: {
	a: int
}
```
A user of this template could write
```cue
foo: #D & {
	#bar: int
}
```
introducing a new definition in the definition #D. This is allowed because the current closedness rules do not apply to definitions.
Suppose the original definition is now changed to
```cue
#D: {
	a: int
	#bar: string
}
```
This now results in a breakage of `foo` downstream. This may be fine if both `#D` and `foo` are maintained by a single owner, but it is problematic if they are maintained by different organizations. The problem is two-fold: on the one hand, the user of `#D` has no guarantee that the maintainer of `#D` will not break their usage, while on the other, the maintainer of `#D` cannot introduce a definition in `#D` without knowing it may break a user.
Note that the same problem does not occur for this snippet
```cue
import "acme.com/pkg"

foo: pkg.#D & {
	_#bar: int
}
```
because hidden definitions are not visible across package boundaries and live in their own namespace (NOTE: not implemented/enforced yet).
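Under those scoping rules, a hidden definition of the same spelling in the defining package would not conflict with the user's, since each lives in its own package namespace. A sketch of the intended, not-yet-enforced behavior:

```cue
// acme.com/pkg
package pkg

#D: {
	a:     int
	_#bar: string // invisible outside acme.com/pkg
}
```

The user's `_#bar: int` above would be qualified by their own package and so would not unify with this `_#bar: string`.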
Proposal
We propose disallowing adding definitions to closed structs. In the example above, this means that

```cue
#D: a: int

foo: #D
foo: #bar: int // proposal makes this illegal
```

would be illegal and result in a `#bar not allowed` error.
The user could still write

```cue
#D: a: int

foo: {
	#D
	#bar: int
}
```
resulting in the same issue. However, this makes the behavior consistent with that of regular fields and results in a single guideline upon which non-breakage guarantees can rely.
Note that users can still write

```cue
#D: {
	a: int
}

foo: #D
foo: _#bar: int
```
to work around the issue as well.
Another less tangible benefit of this proposal is that conceptually there are fewer scenarios for how fields behave, as it makes the behavior of definitions more like that of regular fields. This may result in a simpler model for the user and in simpler code.
We propose that the `...` operator does not apply to definitions.
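In other words, under this proposal `...` would keep admitting unknown regular fields but not definitions; a sketch:

```cue
#D: {
	a: int
	... // admits unknown regular fields only
}

foo: #D & {
	b:    2 // allowed: matched by "..."
	#bar: 3 // still illegal: "..." does not apply to definitions
}
```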
Impact and transition
This is a breaking change. Detecting breakage cases should be fairly straightforward. An automated rewrite is more complex and would likely need to operate on the semantic level, but seems within the realm of possibility. It may require user interaction to decide whether an offending inclusion of a definition should be rewritten as an embedding or as a hidden definition. Always rewriting it as an embedding may be the closest to the original semantics, though.
To minimize changes,

```cue
foo: #D
foo: #bar: int
```

could be rewritten as

```cue
foo: { #D, #bar: _ }
foo: #bar: int
```

This would not be too hard if the position information of conflicting values is accurate.
It would be good to first implement the scoping rules of hidden fields before rolling this change out.
Discussion
Default constraints
The spec currently also allows `...T` to be used in structs (not yet implemented). It seems to not make sense to apply the same `T` to both definitions and regular fields. The most likely reason one would want to add a definition in a certain scope is that it is either a common type for multiple fields (see `#limit` below) or a type that is part of an expression at various points in the struct's fields. In the first case, `...T` would just be applied multiple times. There seem to be very few favorable outcomes for the latter.

As it does not make sense to allow `...T` to apply to definitions, it seems consistent to not have `...` apply to definitions either.
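A sketch of how `...T` (per the spec, not yet implemented) would then behave, constraining only regular fields:

```cue
#D: {
	a: string
	...string // unknown regular fields, if any, must be strings
}

foo: #D & {
	b: "ok" // admitted and constrained by ...string
	// #x: 1 would not be admitted via ...string, since neither
	// "..." nor "...T" applies to definitions.
}
```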
Users can use embedding to work around the constraint of adding definitions.
Possible future extensions
Note that what is discussed in this section is NOT a proposal, but rather a discussion on how the remaining limitations could be addressed in the future in a backwards compatible way.
The introduction of definitions may be useful in some cases. For instance, the author of a definition may want to base it on another definition, but also introduce a definition with some common patterns for its users:
```cue
// acme.com/foo
package foo

#A: {
	a: int
	b: int
}
```
```cue
// example.com/pkg
package pkg

import "acme.com/foo"

#E: foo.#A & {
	#limit: <10 // illegal under the new proposal
	a:      #limit
	b:      #limit
}
```
Users of `#E` could write

```cue
import "example.com/pkg"

e: pkg.#E & {#limit: <5}
```
to collectively tighten the limit of all fields.
To work around the limitations of the new proposal, the author could use either embeddings or a hidden definition. In this case, however, neither is satisfactory. The definition cannot be hidden, because users of the package are supposed to modify `#limit`, and thus it must remain visible. Using embedding brings us back to square one, as the author of `#A` could introduce `#limit` later on and break `#E`.
To resolve this issue, we could introduce qualified definitions that live in the namespace of the package in which they are defined. Syntactically, this could look like:

```cue
// example.com/pkg
package pkg

import "acme.com/foo"

#E: foo.#A & {
	pkg#limit: <10
	a:         pkg#limit
	b:         pkg#limit
}
```
Here `pkg#limit` is a broadening of the identifier syntax, where a `#` may be preceded not just by `_`, but by any valid regular CUE identifier. The identifier either refers to the handle of the current package or to an imported package, fully qualifying the namespace of the definition. So in the example, `pkg#limit` is qualified by the tuple `("example.com/pkg", #limit)`.
A usage of `#E` would look like:

```cue
import "example.com/pkg"

e: pkg.#E & {pkg#limit: <5}
```
This mechanism would allow authors to introduce definitions in a backwards compatible way without worrying about breaking things downstream or being broken by an upstream change.
This mechanism would only apply to definitions and not to regular fields. Regular fields are considered to be what ultimately ends up in a generated configuration and inherently represent a flat space. Definitions do not have this restriction and can benefit from the scoping rules introduced in this section.
Note the close relation of scoped definitions to hidden definitions: 1) both are allowed to introduce a new definition within a closed struct (which must be in the current package in both cases), 2) both have an additional qualifier before the `#` in a definition name, 3) in both cases the meaning of this qualifier is some fully qualified package path (which may be the current package), and 4) the implementation of both relies on qualifying a field based on the package names available in the current file at compile time. This mechanism is almost identical to how exported and unexported fields work in Go.
We reiterate that the discussion of scoped definitions is missing a lot of details and is not intended as a proposal, but serves as an example of how scoped definitions could be allowed in a world where closedness rules apply to non-scoped definitions.