Skip to content

AbstractVarInfo interface #1

@phipsgabler

Description

@phipsgabler

There are many aspects that make VarInfo a very complex data structure:

  1. VarInfo really consists of multiple mappings: Metadata maps variables to their values,
    their distributions, their range in the underlying vector, and additional metadata such as flags,
    GIDs, and orders. Additionally, there are scalars logp and num_produce associated with the
    whole thing.
  2. These mappings are not normal dictionaries. First, the order of variables needs to be preserved
    (would an ordered dictionary be enough for that?). Second, keys are themselves hierarchical,
    since they contain index values, and we need to be able to handle "sub-keys": get out, say,
    x[1:3] from the entry x.
  3. The mapping between keys and values gets complicated even more, since some of the values need to
    be transformed between the distribution supports and Euclidean space. This is currently
    implemented by special logic in some getters/setters, which automatically call link/invlink.

Correct me if I'm missing something.

Now, it seems to me that a sensible separation of concerns would involve a separate interface of the
dictionary part, taking care of the mappings and the "sub-key" handling. Then the actual VarInfo
can just wrap this special dictionary (or several thereof), and add the additional metadata and
special logic to it. Then different implementation of the dictionary essentially replace the
different kinds of Metadata that separate UntypedVarInfo from TypedVarInfo.

I haven’t really worked with traits yet, but they might also have some advantages. For example, the
recently added ThreadSafeVarInfo is really only adding a thin layer to add an orthogonal concept
to the other variants.

Dictionary Trie Interface

Should describe an ordered mapping from VarName to whatever: actual values (<:Real or
<:AbstractArray{<:Real}?), distributions, flags, etc. – but only one such mapping at a time.

  • Pure functions:
    • iterate, yielding pairs of VarName and the stored value
    • IteratorEltype == HasEltype(), IteratorSize = HasLength()
    • keys, values, pairs, length consistent with iterate
    • eltype, keytype, valuetype
    • get, getindex, haskey for indexing by VarName
    • merge to join two VarInfos
  • Mutating functions:
    • push!, merge! to add and join elements
    • setindex!
    • empty!, delete!
  • Conversions:
    • To array, for vectorization/linearization of values. Could be a method of Array, or copyto!.
    • To (nested?) named tuple
  • Trie-specific functions:
    • Search for matching keys
    • Extract subkeys

David has argued that linearized storage destroys shape information. I think that is does not need
to: one could have one big array, with reshaped views into it – scalars would then just be
zero-dimensional views (akin to Refs). On the other hand, a tree-like implementation could have some other advantages, both with respect to ease of implementation and behaviour.

Subsumption, the issue of matching "sub-keys", can behave in different ways depending on the value
type. For sampled values, using sub-keys should always work: when a value of a vector-of-vectors
x is stored, then x[1][1] must be a valid key. However, this does not need to be the case for
distributions: when S comes from a matrix valued distribution, then S[1,1] is not necessarily
available.

Some pain points from the current implementation: vectorized access, shape preservation, association
of names and values.

Related: Gen's ChoiceMap

VarInfo proper

  • getlogp, setlogp!, resetlogp!, and acclogp! for handling log probabilities

  • gen_num_produce, set_num_produce, increment_num_produce, reset_num_produce; although these
    are specific to particle samplers, and would ideally be handled by them (maybe with a custom
    context in Turing?). A similar argument holds for the order information and certain flags.

  • Linking: Since I haven't yet written any HMC samplers or the like, I unfortunately don't know how
    feasible/practical it is to make most of this more explicit. Mohamed, in a recent
    PR
    , has introduced methods for the
    getters/setters that take the distribution as additional argument to figure out the right thing:

    vi[vn, dist]

    Since that indexing in Julia supports keyword arguments, even somthing like the following would be possible:

    vi[vn, link=dist]

    The other getters and setters could then use the same kind of syntax.

    Then, there would also have to be (I guess?) additional methods for the conversion methods, to
    perform bijections for the array of vectorized values.

    Also, link!/invlink! for the complete mapping – is that necessary?

  • Probably some resemblance of many of the old getters and setters: getidx, getdist, getrange,
    getall, getval, getvals, getinds, syms, getgid, flags. Hopefully some of them will
    become redundant.

  • There should be one well-defined way to build up a VarInfo object. Perhaps an empty one + muliple
    calls to push!. What about push!!, with the possibility of immutable implementations?
    Alternatively: based on mergeing? Maybe merge can be general enough to cover both
    construction of all complex VarInfos, and producing MixedVarInfos when required.

VarName

This is working pretty well in its current form, but the logic for subsumption could be defined a
bit more explicitely. Some helper methods are certainly needed in order to properly support
subsumption in the dictionary interface, and would become useful elsewhere.

In the long term: what about more complex things, like getproperty-based variables?

Generated quantities

Different PPLs handle these in different ways – one alternative is to include them in the trace
(e.g., Soss.jl now uses a nested tuple). If this is considered useful, it needs to included in the
interface.

Related work/discussions

CC @yebai, @cpfiffer, @devmotion, @mohamed82008

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions