Hiera: Implementing a data_dig backend

Included in Puppet Enterprise 2017.1.

Note: This page goes directly into the details of implementing one type of backend. For an intro to the custom backends system, see How custom backends work.

A data_dig backend function is similar to a lookup_key function, but instead of looking up a single key, it looks up a single sequence of keys and subkeys.

Hiera lets you look up individual members of hash and array values using key.subkey notation. In cases where:

  • Lookups are relatively expensive.
  • The data source knows how to extract elements from hash and array values.
  • Users are likely to pass key.subkey requests to the lookup function to access subsets of large data structures.

…then it’s possible to get better performance by writing a data_dig backend instead of a lookup_key backend.


We don’t currently have any realistic examples of data_dig backends. Let us know if you see any in the wild.

Arguments and return type

Hiera calls a data_dig function with three arguments:

  1. An array of lookup key segments.

    The array of key segments is made by splitting the requested lookup key on the dot (.) subkey separator. For example, a lookup for users.dbadmin.uid would result in ['users', 'dbadmin', 'uid']. Positive base-10 integer subkeys (for accessing array members) are converted to Integer objects, but other number-like subkeys remain as strings.

  2. A hash of options. (More on this below.)
  3. A Puppet::LookupContext object. (More on this below.)
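The splitting rule described in item 1 can be approximated in a few lines of plain Ruby. This is a minimal sketch, not Hiera's own parser: split_key is a hypothetical helper, treating every all-digit segment (including 0) as an array index is an assumption based on the description above, and the sketch ignores any quoting or escaping of dots.

```ruby
# Hypothetical helper: approximates how a dotted lookup key becomes segments.
def split_key(key)
  key.split('.').map do |segment|
    # Assumption: only segments made entirely of decimal digits become Integers.
    # Anything else that merely looks numeric ("-1", "1e3") stays a String.
    segment.match?(/\A[0-9]+\z/) ? segment.to_i : segment
  end
end

p split_key('users.dbadmin.uid')  # → ["users", "dbadmin", "uid"]
p split_key('servers.0.name')     # → ["servers", 0, "name"]
p split_key('limits.-1')          # → ["limits", "-1"]
```

A backend therefore has to be ready for a mix of String and Integer segments, which is why the example signatures accept Variant[String, Numeric].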

The function must either call the context object’s not_found method, or return a value for the requested sequence of key segments.

Example signatures:

Puppet language:

function mymodule::hiera_backend(
  Array[Variant[String, Numeric]] $segments,
  Hash                            $options,
  Puppet::LookupContext           $context,
)

Ruby:

dispatch :hiera_backend do
  param 'Array[Variant[String, Numeric]]', :segments
  param 'Hash', :options
  param 'Puppet::LookupContext', :context
end

Like other Hiera data sources, a data_dig function can use the special lookup_options key to configure merge behavior for other keys. See Configuring merge behavior in Hiera data for more info.

If you want to support Hiera interpolation tokens like %{variable} or %{lookup('key')} in your data, you must call context.interpolate on your values before returning them.
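To make the contract concrete, here is a rough sketch of what the body of a data_dig function might do, written as plain Ruby so it runs without Puppet. Everything named here is hypothetical: StubContext stands in for Puppet::LookupContext, sketch_source is an in-memory data source, and try_lookup plays the role of Hiera as the caller. The stub throws a symbol to mimic a not_found call that does not return; treat that mechanism as an assumption rather than documented API.

```ruby
# Hypothetical stand-in for Puppet::LookupContext, for illustration only.
class StubContext
  # Mimics "this method does not return" by unwinding to the caller.
  def not_found
    throw :no_such_key
  end
end

# Hypothetical in-memory data source.
def sketch_source
  {
    'users'   => { 'dbadmin' => { 'uid' => 2033, 'shell' => nil } },
    'servers' => ['web01', 'web02']
  }
end

# Walks the segments one at a time and gives up as soon as one is missing.
def sketch_data_dig(segments, _options, context)
  segments.reduce(sketch_source) do |current, segment|
    case current
    when Hash
      context.not_found unless current.key?(segment)
      current[segment]
    when Array
      in_range = segment.is_a?(Integer) && (0...current.length).cover?(segment)
      context.not_found unless in_range
      current[segment]
    else
      # Tried to dig into a scalar value.
      context.not_found
    end
  end
end

# Hypothetical caller that distinguishes "found nil" from "not found".
def try_lookup(segments)
  catch(:no_such_key) do
    return [:found, sketch_data_dig(segments, {}, StubContext.new)]
  end
  [:not_found, nil]
end

p try_lookup(['users', 'dbadmin', 'uid'])
p try_lookup(['users', 'dbadmin', 'shell'])
p try_lookup(['users', 'nobody'])
```

The second call shows why a missing key must be signalled through the context object rather than by returning nil: the shell key is present and legitimately set to nil.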

The options hash

Hierarchy levels are configured in hiera.yaml. When calling a backend function, Hiera passes a modified version of that configuration as a hash.

The options hash contains the following keys:

  • path — The absolute path to a file on disk. Only present if the user set one of the path, paths, glob, or globs settings. Hiera ensures the file exists before passing it to the function.

    Note: If your backend uses data files, use the context object’s cached_file_data method to read them.

  • uri — A URI that your function can use to locate a data source. Only present if the user set uri or uris. Hiera doesn’t verify the URI before passing it to the function.
  • Every key from the hierarchy level’s options setting. In your documentation, make sure to list any options your backend requires or accepts. Note that the path and uri keys are reserved.

For example: this hierarchy level in hiera.yaml…

  - name: "Secret data: per-node, per-datacenter, common"
    lookup_key: eyaml_lookup_key # eyaml backend
    datadir: data
    paths:
      - "secrets/nodes/%{trusted.certname}.eyaml"
      - "secrets/location/%{facts.whereami}.eyaml"
      - "common.eyaml"
    options:
      pkcs7_private_key: /etc/puppetlabs/puppet/eyaml/private_key.pkcs7.pem
      pkcs7_public_key:  /etc/puppetlabs/puppet/eyaml/public_key.pkcs7.pem

…would result in several different options hashes (depending on the current node’s facts, whether the files exist, etc.), but they would all resemble the following:

  {
    'path' => '/etc/puppetlabs/code/environments/production/data/secrets/nodes/web01.example.com.eyaml',
    'pkcs7_private_key' => '/etc/puppetlabs/puppet/eyaml/private_key.pkcs7.pem',
    'pkcs7_public_key' => '/etc/puppetlabs/puppet/eyaml/public_key.pkcs7.pem'
  }

In your function’s signature, you can validate the options hash by using the Struct data type to restrict its contents. In particular, note that you can disable all of the path(s) and glob(s) settings for your backend by disallowing the path key in the options hash.
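A Struct type in the signature expresses this kind of restriction declaratively. As a rough plain-Ruby analogue of the same idea, the sketch below checks an options hash by hand. The helper name validate_options! and the option names (a required uri and an optional timeout) are hypothetical, chosen only to illustrate a backend that rejects the path key.

```ruby
# Hypothetical check for a backend that locates its data through 'uri'
# and therefore wants to rule out the path(s) and glob(s) settings.
def validate_options!(options)
  if options.key?('path')
    raise ArgumentError, 'this backend does not support path(s) or glob(s)'
  end
  unless options['uri'].is_a?(String)
    raise ArgumentError, "the 'uri' setting is required"
  end

  # Hypothetical list of accepted keys: 'uri' is reserved, 'timeout' is ours.
  unknown = options.keys - %w[uri timeout]
  raise ArgumentError, "unknown options: #{unknown.join(', ')}" unless unknown.empty?

  options
end

validate_options!('uri' => 'https://data.example.com/v1', 'timeout' => 5)
```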

Calling conventions for data_dig functions

Hiera generally calls data_dig functions once per data source for every unique sequence of key segments.

Note that a given hierarchy level can refer to multiple data sources with the paths, uris, and glob(s) settings. Hiera handles each hierarchy level as follows:

  • If the path(s) or glob(s) settings are used, Hiera figures out which files actually exist and calls the function once for each. If no files were found, the function won’t be called at all.
  • If the uri(s) settings are used, Hiera calls the function once per URI.
  • If none of those settings are used, Hiera calls the function once.
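The three cases above can be modelled as a small function that turns one hierarchy level into the list of options hashes the backend would be called with. This is a simplified sketch, not Hiera's internals: calls_for_level and its keyword arguments are hypothetical, and file existence is passed in as a plain list instead of being checked on disk.

```ruby
# Hypothetical model: returns one options hash per call Hiera would make.
def calls_for_level(paths: [], uris: [], options: {}, existing: [])
  if !paths.empty?
    # Only files that exist produce a call; none existing means no calls.
    paths.select { |path| existing.include?(path) }
         .map { |path| options.merge('path' => path) }
  elsif !uris.empty?
    # URIs are not verified, so every one produces a call.
    uris.map { |uri| options.merge('uri' => uri) }
  else
    # No path or URI settings: exactly one call with the bare options.
    [options]
  end
end

p calls_for_level(paths: ['/data/a.yaml', '/data/b.yaml'], existing: ['/data/b.yaml'])
p calls_for_level(uris: ['http://one.example', 'http://two.example'])
p calls_for_level(options: { 'region' => 'eu' })
```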

Hiera tries to cache the value for a given sequence of key segments and use the cached value on subsequent lookups. However, it might call a function again for a given key and data source if the inputs change — for example, if hiera.yaml interpolates a local variable in a file path, Hiera would have to call the function again for scopes where that variable has a different value. (This has a significant performance impact, and is why we tell users to only interpolate facts, trusted, and server_facts in the hierarchy.)

The Puppet::LookupContext object

To support caching and other needs, Hiera provides backends a special Puppet::LookupContext object, which has several methods you can call for various effects.

  • In Ruby functions, this is a normal Ruby object of class Puppet::LookupContext, and you can call methods with standard Ruby syntax (like context.not_found).
  • In Puppet language functions, the context object appears as a special data type (Object) that has methods attached. Right now, there isn’t anything else in the Puppet language that acts like this.

    You can call its methods using Puppet’s chained function call syntax with the method name instead of a normal function — for example, $context.not_found. For methods that take a block, use Puppet’s lambda syntax (parameters outside block) instead of Ruby’s block syntax (parameters inside block).

The following methods are available:


not_found()

Tells Hiera to move on to the next data source. Call this method when your function can’t find a value for a given lookup. This method does not return.

For data_hash backends, use this when the requested data source doesn’t exist. (If it exists and is empty, return an empty hash.) Missing data sources aren’t an issue when using path(s)/glob(s), but are important for backends that locate their own data sources.

For lookup_key and data_dig backends, use this when a requested key isn’t present in the data source or the data source doesn’t exist. Don’t return undef/nil for missing keys, since that’s a legal value that can be set in data.


interpolate(value)

Returns the provided value, but with any Hiera interpolation tokens (like %{variable} or %{lookup('key')}) replaced by their value. This lets you opt in to allowing Hiera-style interpolation in your backend’s data sources. Works recursively on arrays and hashes; hashes can interpolate into both keys and values.

In data_hash backends, interpolation is automatically supported and you don’t need to call this method.

In lookup_key and data_dig backends, you must call this method if you want to support interpolation; if you don’t, Hiera assumes you have your own thing going on.
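The recursive behavior described above (strings, arrays, and both keys and values of hashes) can be illustrated with a simplified stand-in. This sketch is not Hiera's interpolation engine: interpolate_sketch is a hypothetical function, it handles only the plain %{variable} form against a flat hash of variables, and replacing an unknown variable with an empty string is an assumption. In a real backend you would simply call context.interpolate(value).

```ruby
# Hypothetical, simplified stand-in for context.interpolate.
def interpolate_sketch(value, scope)
  case value
  when String
    # Replace each %{name} token with the matching variable, if any.
    value.gsub(/%\{([^}]+)\}/) { scope.fetch(Regexp.last_match(1).strip, '').to_s }
  when Array
    value.map { |element| interpolate_sketch(element, scope) }
  when Hash
    # Both keys and values are interpolated.
    value.each_with_object({}) do |(key, val), result|
      result[interpolate_sketch(key, scope)] = interpolate_sketch(val, scope)
    end
  else
    # Numbers, booleans, and nil pass through untouched.
    value
  end
end

scope = { 'role' => 'db', 'env' => 'prod' }
p interpolate_sketch({ '%{role}_hosts' => ['%{env}-01', 5432] }, scope)
```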


environment_name()

Returns the name of the environment whose hiera.yaml called the function. Returns undef (in Puppet) or nil (in Ruby) if the function was called by the global or module layer.


module_name()

Returns the name of the module whose hiera.yaml called the function. Returns undef (in Puppet) or nil (in Ruby) if the function was called by the global or environment layer.

cache(key, value)

Caches a value, in a per-data-source private cache; also returns the cached value.

On future lookups in this data source, you can retrieve values with cached_value(key). Cached values are immutable, but you can replace the value for an existing key. Cache keys can be anything valid as a key for a Ruby hash. (Notably, this means you can use nil as a key.)

For example, on its first invocation for a given YAML file, the built-in eyaml_lookup_key backend reads the whole file and caches it, and then decrypts only the specific value that was requested. On subsequent lookups into that file, it gets the encrypted value from the cache instead of reading the file from disk again. It also caches decrypted values, so that it won’t have to decrypt again if the same key is looked up repeatedly.

The cache is also useful for storing session keys or connection objects for backends that access a network service.
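The check-then-compute pattern described for the eyaml backend can be sketched as follows. The context class here is a hypothetical in-memory stand-in that only mimics the documented behavior of cache, cache_has_key, and cached_value; slow_decrypt and lookup_secret are likewise invented for illustration, with string reversal standing in for real decryption.

```ruby
# Hypothetical stand-in for the per-data-source cache on the context object.
class CacheContextSketch
  def initialize
    @cache = {}
  end

  # Stores the value and returns it, as described above.
  def cache(key, value)
    @cache[key] = value
  end

  def cache_has_key(key)
    @cache.key?(key)
  end

  def cached_value(key)
    @cache[key]
  end
end

# Hypothetical expensive step; the counter makes the caching effect visible.
def slow_decrypt(ciphertext, counter)
  counter[:calls] += 1
  ciphertext.reverse
end

# Returns the cached value when there is one, otherwise computes and caches it.
def lookup_secret(key, encrypted, context, counter)
  return context.cached_value(key) if context.cache_has_key(key)

  context.cache(key, slow_decrypt(encrypted.fetch(key), counter))
end

context = CacheContextSketch.new
counter = { calls: 0 }
encrypted = { 'db_password' => 'terces' }
2.times { lookup_secret('db_password', encrypted, context, counter) }
p counter[:calls]
```

The sketch tests cache_has_key rather than comparing cached_value against nil, because nil is itself a value that can legitimately be cached.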

Cache lifetime and scope

Each Puppet::LookupContext cache only lasts for the duration of the current catalog compilation; a node can’t access values cached for a previous node.

Hiera creates a separate cache for each combination of inputs for a function call, including inputs like name that are configured in hiera.yaml but not passed to the function. So not only does each hierarchy level have its own cache, but hierarchy levels that use multiple paths have a separate cache for each path.

If any inputs to a function change (for example, a path interpolates a local variable whose value changes between lookups), Hiera uses a fresh cache.


cache_all(hash)

Caches all the key/value pairs from a given hash; returns undef (in Puppet) or nil (in Ruby).


cached_value(key)

Returns a previously cached value from the per-data-source private cache. Returns undef (in Puppet) or nil (in Ruby) if no value with this key has been cached. See cache(key, value) above for more info about how the cache works.


cache_has_key(key)

Checks whether the cache has a value for a given key yet. Returns true or false.


all_cached()

Returns everything in the per-data-source cache, as an iterable object. Note that this iterable object isn’t a hash; if you want a hash, you can use Hash($context.all_cached()) (in the Puppet language) or Hash[context.all_cached()] (in Ruby).

cached_file_data(path) {|content| ...}

Note: The header above uses Ruby’s block syntax. To call this method in the Puppet language, you would use cached_file_data(path) |content| { ... }.

For best performance, use this method to read files in Hiera backends.

Returns the content of the specified file, as a string. If an optional block is provided, it passes the content to the block and returns the block’s return value. For example, the built-in JSON backend uses a block to parse JSON and return a hash:

    context.cached_file_data(path) do |content|
      begin
        JSON.parse(content)
      rescue JSON::ParserError => ex
        # Filename not included in message, so we add it here.
        raise Puppet::DataBinding::LookupError, "Unable to parse (#{path}): #{ex.message}"
      end
    end

On repeated access to a given file, Hiera checks whether the file has changed on disk. If it hasn’t, Hiera uses cached data instead of reading and parsing the file again.

This method does not use the same per-data-source caches as cache(key, value) and friends. It uses a separate cache that lasts across multiple catalog compilations, and is tied to Puppet Server’s environment cache.

Since the cache can outlive a given node’s catalog compilation, do not do any node-specific pre-processing (like calling context.interpolate) in this method’s block.
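The effect of this caching on a backend can be illustrated with a small stand-in. This is only a sketch of the idea: FileCacheSketch is hypothetical, and detecting changes by comparing modification times is an assumption made for illustration, not a description of how Hiera or Puppet Server tracks files.

```ruby
require 'json'
require 'tempfile'

# Hypothetical stand-in for the file cache behind cached_file_data.
class FileCacheSketch
  def initialize
    @entries = {}
  end

  # Re-reads and re-parses only when the file looks changed on disk.
  def cached_file_data(path)
    mtime = File.mtime(path)
    entry = @entries[path]
    return entry[:data] if entry && entry[:mtime] == mtime

    content = File.read(path)
    # With a block, the block's result is what gets cached and returned.
    data = block_given? ? yield(content) : content
    @entries[path] = { mtime: mtime, data: data }
    data
  end
end

file = Tempfile.new(['sketch', '.json'])
file.write('{"users": {"dbadmin": {"uid": 2033}}}')
file.close

parse_count = 0
cache = FileCacheSketch.new
2.times do
  cache.cached_file_data(file.path) do |content|
    parse_count += 1
    JSON.parse(content)
  end
end
p parse_count
file.unlink
```

Because the parsed result is shared by every lookup that hits the cache, the block should produce only data that is the same for all nodes, which is the reason for the warning about node-specific pre-processing.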

explain() { 'message' }

Note: The header above uses Ruby’s block syntax. To call this method in the Puppet language, you would use explain() || { 'message' }. In both cases, the provided block must take zero arguments.

Adds a message, which appears in debug messages or when using puppet lookup --explain. The block provided to this function must return a string.

This is meant for complex lookups where a function tries several different things before arriving at the value. Note that the built-in backends don’t use the explain method, and they still have relatively verbose explanations; this is for when you need to go above and beyond that.

Feel free to not worry about performance when constructing your message; Hiera never executes the explain block unless debugging is enabled.
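The lazy evaluation described above is the reason explain takes a block instead of a string. The sketch below shows the idea with a hypothetical stand-in class; ExplainContextSketch and its explain_enabled flag are invented for illustration and are not part of Puppet.

```ruby
# Hypothetical stand-in showing why explain takes a block.
class ExplainContextSketch
  attr_reader :messages

  def initialize(explain_enabled)
    @explain_enabled = explain_enabled
    @messages = []
  end

  # The block runs only when explanations are being collected.
  def explain
    @messages << yield if @explain_enabled
    nil
  end
end

built = 0
quiet = ExplainContextSketch.new(false)
quiet.explain { built += 1; "tried #{%w[cache api fallback].join(', ')}" }

verbose = ExplainContextSketch.new(true)
verbose.explain { built += 1; "tried #{%w[cache api fallback].join(', ')}" }

p built
p verbose.messages
```

Only the second call builds its message, so an expensive message costs nothing during normal catalog compilation.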
