Skip to content

Proposal: don't allow the introduction of new (non-hidden) definitions in a closed struct #543

@cueckoo

Description

@cueckoo

Originally opened by @mpvl in cuelang/cue#543

Background

Currently, for regular fields, an author of a definition can make backwards-compatible modifications without fear of breaking its user, as long as users do not use this definition in an embedding. This is a great property to have for large-scale engineering.

A backwards compatible change in general is when an API is modified to be more less specific, with the exception for definitions that existing fields may not be removed, unless paired with a catch-all [string]: T or .... This is because, by default, a user may not add a field to a closed struct, so removing a field from the API will break a usage of this definition that specifies this field.

Note that a user may still add fields to an existing definitions by embedding it. In this case, changes to original definition may break its usage. There is a clear distinction, however, between using something as an embedding or not, and being able to give guarantees on the basis or whether or not a definition is used as an embedding is a clear rule. Also vet rules could warn about the use of definitions from outside the current module as an embedding.

The Problem

The problem is that this desirable property does not hold for nested definitions. Consider this definition:

#D: {
    a: int
}

A user of this template could write

foo: #D & {
    #bar: int
}

introducing a new definition in the definition #D. This is allowed because the current closedness rules do not apply to definitions.

Suppose the original definition is now changed to

#D: {
    a: int
   #bar: string
}

This now results in a breakage of foo downstream. This may be fine if both #D and foo are maintained by a single owner, but it is problematic if they are maintained by different organizations. The problem is two-fold: on the one hand the user of #D has no guarantee that the maintainer of #D will not break their usage, while the maintainer of #D cannot introduce a definition in #D without knowing it may break a user.

Note that the same problem does not occur for this snippet

import "acme.com/pkg"

foo: pkg.#D & {
    _#bar: int
}

because hidden definitions are not visible across package boundaries and live in their own namespace (NOTE: not implemented/enforced yet).

Proposal

We propose disallowing adding definitions to closed structs. In the example above, this means that

#D: a: int

foo: #D
foo: #bar: int // proposal makes this illegal

would be illegal and result in a #bar not allowed error.

The user could still write

#D: a: int

foo: {
    #D
    #bar: int
}

resulting in the same issue. However, this makes the behavior consistent with regular field and results in a single guideline upon which non-breakage guarantees can rely.

Note that users can still also write

#D: {
    a: int
}

foo: #D
foo: _#bar: int

to work around the issue as well.

Other less tangible benefits of this proposal is that conceptually there are less scenarios in how fields behave, as it makes the behavior of definitions more like that of regular fields. This may result in simpler model for the user and results in simpler code.

We propose that the ... operator does not apply to definitions.

Impact and transition

This is a breaking change. An automated rewrite is somewhat complex and would likely need to operate on the semantic level. Detecting breakage cases should be fairly straightforward. An automated rewrite will not likely be as straightforward, but seems in the realm of possibilities. It may be that it requires user interaction to decide if an offending inclusion of a definition should be rewritten as a an embedding or a hidden definition. Always rewriting it as an embedding may be the closest to the original semantics, though.

To minimize changes,

foo: #D
foo: #bar: int

could be rewritten as

foo: { #D, #bar: _ }
foo: #bar: int

This would not be too hard if position information of conflicting values is accurate.

It would be good to first implement the scoping rules of hidden fields before rolling this change out.

Discussion

Default constraints

The spec currently allows ...T to be used in structs as well (not yet implemented). It seems to not make sense to apply T to definitions and regular fields. The most likely reason why one would want to add a definition in a certain scope is because it either is a common type for multiple fields (see #limit below) or a type that is part of an expression at various points in the struct's fields. In the first case, ...T would just be applied multiple time. There seem to be very few favorable outcomes for the latter.

As it doesn't make sense to allow ...T to apply to definitions, it seems consistent to not have ... apply for definitions.

Users can use embedding to work around the constraint of adding definitions.

Possible future extensions

Note that what is discussed in this section is NOT a proposal, but rather a discussion on how the remaining limitations could be addressed in the future in a backwards compatible way.

The introduction of definitions may be useful in some cases. For instance, an author of a definition may want to base its definition on another definition, but also introduce a definition with some common patterns for its users. For instance

// amce.com/foo
package foo

#A: {
  a: int
  b: 
}
// example.com/pkg
package pkg

import "acme.com/foo"

#E: foo.#A & {
   #limit: <10 // illegal under the new proposal
  a: #limit
  b: #limit
}

Users of #E could write

import "example.com/pkg"

e: pkg.#E & { #limit: <5 }

to collectively tighten the limit of all fields.

To work around the limitations of the new proposal, the author could either use embeddings or a hidden definition. In this case, however, neither are satisfactory. It cannot be hidden, because users of the package are supposed to modify #limit and this it cannot be hidden. Using embedding brings us back to square one, as the author of #A could introduce #limit later on and break #E.

To resolve this issue, we could introduce qualified definitions that live in the namespace of the package in which they are defined. Syntactically, this could look like:

// example.com/pkg
package pkg

import "acme.com/foo"

#E: foo.#A & {
  pkg#limit: <10
  a: pkg#limit
  b: pkg#limit
}

Here pkg#limit is a broadening of the identifier syntax, where a # may be preceded not just by _, but any valid regular CUE identifier. The identifier here either refers to the handle of the current package or an imported package, fully qualifying the namespace of the definition. So in the example, pkg#limit is qualified by the tuple ("example.com/pkg", #limit).

A usage of #E would look like:

import "example.com/pkg"

e: pkg.#E & { pkg#limit: <5 }

This mechanism would allow authors to introduce definitions in a backwards compatible way without worrying about breaking things downstream or being broken by an upstream change.

This mechanism would only apply to definitions and not to regular fields. Regular fields are considered what ends up ultimately in a generated configuration and inherently represent a flat space. Definitions don't have this restriction and can benefit from the scoping results as introduced in this paragraph.

Note the close relation of scoped definitions with hidden definitions: 1) both are allowed to introduce a new definition within a closed struct (must be in the current package in both cases), 2) both have an additional qualifier before the # in a definition name, 3) in both cases the meaning of this qualifier is some fully qualified package path (which may be the current package), 4) the implementation of both relies on qualifying a field based on package names available in the current file during compile time. Note that this is mechanism is almost identical to how exported and unexported fields work in Go.

We reiterate, that the discussion of scoped definitions is missing a lot of details and is intended to be a proposal, but serves as an example to allow scoped definitions in a world where closedness rules apply to non-scoped definitions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions