Helper functions

Helper utilities for creating and interpreting pyhf workspace inputs.

This module provides WorkspaceInterpreter, a thin layer around a pyhf background-only workspace that keeps track of signal injection and channel removal, and converts between signal maps and JSONPatch documents.

It also exposes a small set of pure helper functions that build the patch operation dictionaries consumed by pyhf, along with convenience transformations of the workspace such as luminosity extrapolation and systematic-uncertainty rescaling.

class spey_pyhf.WorkspaceInterpreter(background_only_model: Dict)[source]#

Bookkeeping wrapper around a pyhf background-only workspace.

The interpreter holds the original background-only pyhf workspace dictionary together with a parallel description of any signal injection, control-region masking and modifier configuration provided by the user. Once populated, it can produce the JSONPatch document that pyhf consumes to build the signal-plus-background statistical model, as well as derived workspaces with rescaled luminosity or rescaled systematic uncertainties.

Parameters:: background_only_model (Dict) – a valid pyhf workspace description for the background-only fit, containing at least the keys channels, observations and measurements.
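
For orientation, a minimal background-only workspace carrying the three required keys might look like the sketch below. The channel, sample, modifier and measurement names are made up for illustration; the layout is assumed to follow the pyhf JSON workspace format.

```python
# Hypothetical single-channel, two-bin background-only workspace.
# All names ("SR_example", "bkg", "bkg_unc", "Measurement") are illustrative.
background_only = {
    "channels": [
        {
            "name": "SR_example",
            "samples": [
                {
                    "name": "bkg",
                    "data": [12.0, 7.5],
                    "modifiers": [
                        {"name": "bkg_unc", "type": "shapesys", "data": [1.5, 1.0]}
                    ],
                }
            ],
        }
    ],
    "observations": [{"name": "SR_example", "data": [11.0, 8.0]}],
    "measurements": [
        {"name": "Measurement", "config": {"poi": "mu", "parameters": []}}
    ],
    "version": "1.0.0",
}

# Check that the keys named in the parameter description are present.
required_keys = {"channels", "observations", "measurements"}
missing = required_keys - background_only.keys()
print(sorted(missing))
```

A dictionary of this shape is what background_only_model is expected to hold.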

add_patch(signal_patch: List[Dict]) → None[source]#

Replace the current signal configuration with one read from a JSONPatch.

Parameters:: signal_patch (List[Dict]) – JSONPatch document, typically produced by make_patch(), describing signal sample additions and channel removals.

background_only_model#: pyhf workspace description for the background-only fit.

property bin_map: Dict[str, int]#

Number of bins for every channel declared in the workspace.

Returns:: mapping from channel name to the number of bins of its first sample.
Return type:: Dict[str, int]
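
As a rough illustration of what this mapping contains, the sketch below derives it from a plain workspace dictionary. The function name bin_map_sketch and the toy workspace are hypothetical and are not part of spey_pyhf.

```python
from typing import Dict


def bin_map_sketch(workspace: Dict) -> Dict[str, int]:
    """Map each channel name to the bin count of its first sample."""
    return {
        channel["name"]: len(channel["samples"][0]["data"])
        for channel in workspace["channels"]
    }


# Toy workspace fragment with two channels of different binning.
workspace = {
    "channels": [
        {"name": "SR", "samples": [{"name": "bkg", "data": [3.0, 2.0, 1.0]}]},
        {"name": "CR", "samples": [{"name": "bkg", "data": [50.0]}]},
    ]
}
result = bin_map_sketch(workspace)
print(result)
```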

property channels: Iterator[str]#

Iterate over the channel names declared in the workspace.

Returns:: generator yielding the channel names in the order they appear in workspace["channels"].
Return type:: Iterator[str]

property expected_background_yields: Dict[str, List[float]]#

Total expected background yields per channel, given the current configuration.

Channels listed in remove_list are skipped. A warning is emitted once for any channel that is kept but has not been configured with a signal injection.

Returns:: mapping from channel name to the bin-wise sum of all sample yields contributing to that channel.
Return type:: Dict[str, List[float]]
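
The bin-wise sum described above can be sketched as follows. This is a simplified stand-in that omits the warning logic; expected_background_sketch and the toy inputs are hypothetical names used only for this example.

```python
from typing import Dict, List


def expected_background_sketch(
    workspace: Dict, remove_list: List[str]
) -> Dict[str, List[float]]:
    """Sum all sample yields bin by bin, skipping removed channels."""
    yields = {}
    for channel in workspace["channels"]:
        if channel["name"] in remove_list:
            continue  # channels marked for removal are skipped
        per_sample = [sample["data"] for sample in channel["samples"]]
        # zip(*per_sample) groups the yields of all samples for each bin.
        yields[channel["name"]] = [sum(bin_yields) for bin_yields in zip(*per_sample)]
    return yields


workspace = {
    "channels": [
        {
            "name": "SR",
            "samples": [
                {"name": "ttbar", "data": [1.0, 2.0]},
                {"name": "wjets", "data": [0.5, 0.25]},
            ],
        },
        {"name": "CR", "samples": [{"name": "ttbar", "data": [40.0]}]},
    ]
}
totals = expected_background_sketch(workspace, remove_list=["CR"])
print(totals)
```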

extrapolate_luminosity(factor: float) → WorkspaceInterpreter[source]#

Return a luminosity-extrapolated copy of this interpreter.

Every sample yield, observation count and luminosity-sensitive modifier data field is multiplied by factor. The transformation assumes that relative uncertainties remain constant, so absolute per-bin uncertainties (carried by shapesys, staterror and histosys alternative templates) scale linearly with the yields. Dimensionless modifier data (normsys, normfactor, lumi, shapefactor) is left unchanged.

Both the background-only workspace and any registered signal injection are scaled. The original interpreter is not modified.

Added in version 0.2.1.

Parameters:: factor (float) – luminosity scale factor, typically new_lumi / old_lumi. Must be strictly positive.
Raises:: ValueError – if factor is not strictly positive.
Returns:: a new interpreter wrapping a deep copy of the workspace with all yields and absolute uncertainties scaled by factor, preserving the existing signal injections and channel-removal list.
Return type:: WorkspaceInterpreter
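
The scaling rule can be illustrated on a plain workspace dictionary. The sketch below is not the library implementation: extrapolate_sketch is a hypothetical helper, and it assumes that histosys templates are stored under hi_data and lo_data as in the pyhf workspace format.

```python
import copy
from typing import Dict

# Modifier types whose data are absolute per-bin uncertainties.
ABSOLUTE_TYPES = {"shapesys", "staterror"}


def extrapolate_sketch(workspace: Dict, factor: float) -> Dict:
    """Return a deep copy with yields and absolute uncertainties scaled."""
    if factor <= 0:
        raise ValueError("factor must be strictly positive")
    scaled = copy.deepcopy(workspace)  # the input is left untouched
    for channel in scaled["channels"]:
        for sample in channel["samples"]:
            sample["data"] = [y * factor for y in sample["data"]]
            for mod in sample["modifiers"]:
                if mod["type"] in ABSOLUTE_TYPES:
                    mod["data"] = [u * factor for u in mod["data"]]
                elif mod["type"] == "histosys":
                    mod["data"] = {
                        key: [v * factor for v in template]
                        for key, template in mod["data"].items()
                    }
                # normsys, normfactor, lumi and shapefactor are dimensionless
                # and therefore left unchanged.
    for obs in scaled["observations"]:
        obs["data"] = [n * factor for n in obs["data"]]
    return scaled


workspace = {
    "channels": [
        {
            "name": "SR",
            "samples": [
                {
                    "name": "bkg",
                    "data": [10.0, 4.0],
                    "modifiers": [
                        {"name": "stat", "type": "shapesys", "data": [1.0, 0.5]},
                        {"name": "xsec", "type": "normsys", "data": {"hi": 1.1, "lo": 0.9}},
                        {
                            "name": "jes",
                            "type": "histosys",
                            "data": {"hi_data": [11.0, 5.0], "lo_data": [9.0, 3.0]},
                        },
                    ],
                }
            ],
        }
    ],
    "observations": [{"name": "SR", "data": [9.0, 5.0]}],
}
doubled = extrapolate_sketch(workspace, 2.0)
```

With a factor of 2, the ratio of the shapesys uncertainty to the yield is the same before and after scaling, which is the constant-relative-uncertainty assumption stated above.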

get_channels(channel_index: List[int] | List[str]) → List[str][source]#

Resolve a mix of channel indices and channel names to channel names.

Parameters:: channel_index (Union[List[int], List[str]]) – indices and/or names of the channels to look up.
Returns:: channel names whose index or name appears in channel_index.
Return type:: List[str]
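
One way to picture the lookup is the sketch below, which accepts indices and names in the same list. The helper get_channels_sketch is hypothetical, and returning the matches in workspace order is an assumption of this example rather than documented behaviour.

```python
from typing import List, Union


def get_channels_sketch(
    channel_names: List[str], channel_index: List[Union[int, str]]
) -> List[str]:
    """Keep every channel whose position or name appears in channel_index."""
    return [
        name
        for position, name in enumerate(channel_names)
        if position in channel_index or name in channel_index
    ]


channel_names = ["SR_a", "CR_b", "VR_c"]
selected = get_channels_sketch(channel_names, [0, "VR_c"])
print(selected)
```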

guess_CRVR() → List[str][source]#

Return all channel names that look like control or validation regions.

Classification follows guess_channel_type().

Returns:: channel names classified as "CR" or "VR".
Return type:: List[str]

guess_channel_type(channel_name: str) → str[source]#

Heuristically classify a channel as control, validation or signal region.

The classification is purely string-based: the uppercased channel name is searched for the substrings "CR", "VR" or "SR" in that order and the first match wins. Any other channel name returns "__unknown__". Because the check is a substring match, channel names that happen to contain these letters for unrelated reasons may be misclassified.

Parameters:: channel_name (str) – name of the channel to classify.
Raises:: ValueError – if channel_name is not a channel of this workspace.
Returns:: one of "CR", "VR", "SR" or "__unknown__".
Return type:: str
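
The string heuristic and its main pitfall can be reproduced in a few lines. This sketch works on a bare string and therefore omits the check that the channel belongs to the workspace; guess_channel_type_sketch and the example channel names are hypothetical.

```python
def guess_channel_type_sketch(channel_name: str) -> str:
    """Return the first of "CR", "VR", "SR" found in the uppercased name."""
    upper = channel_name.upper()
    for tag in ("CR", "VR", "SR"):  # order matters: the first match wins
        if tag in upper:
            return tag
    return "__unknown__"


examples = ["CRtop_cuts", "VR_Wjets", "SRlow_mll", "ttbar_region", "scratch_region"]
labels = {name: guess_channel_type_sketch(name) for name in examples}
for name, label in labels.items():
    print(f"{name}: {label}")
```

The last example, scratch_region, is labelled "CR" only because "SCRATCH" happens to contain those two letters, which is the kind of misclassification the description warns about.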

inject_signal(channel: str, data: List[float], modifiers: List[Dict] | None = None) → None[source]#

Register signal yields, and optionally their modifiers, for a channel of the workspace.

If modifiers is provided but does not contain the default lumi and normfactor modifiers (with poi_name taken from the first measurement), they are appended automatically.

Parameters:

channel (str) – name of the target channel; must already exist in the background-only workspace.
data (List[float]) – signal yields, one entry per bin of the channel.
modifiers (Optional[List[Dict]], default None) – modifier dictionaries to attach to the signal sample. When None, _default_modifiers() is used.

Raises:

ValueError – if channel does not exist in the workspace, or if the length of data does not match the number of bins of channel.
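
The default-modifier rule and the two error conditions can be sketched as below. Both helpers are hypothetical; in particular, deciding whether a default is already present by comparing name and type is an assumption of this example, and the dictionaries follow the pyhf convention for lumi and normfactor modifiers.

```python
from typing import Dict, List, Optional


def with_default_modifiers(modifiers: Optional[List[Dict]], poi_name: str) -> List[Dict]:
    """Append the lumi and normfactor defaults when they are missing."""
    defaults = [
        {"name": "lumi", "type": "lumi", "data": None},
        {"name": poi_name, "type": "normfactor", "data": None},
    ]
    if modifiers is None:
        return defaults
    # Assumption: a default counts as present if name and type both match.
    present = {(m["name"], m["type"]) for m in modifiers}
    extra = [d for d in defaults if (d["name"], d["type"]) not in present]
    return list(modifiers) + extra


def check_signal(bin_map: Dict[str, int], channel: str, data: List[float]) -> None:
    """Mirror the documented error conditions for a signal injection."""
    if channel not in bin_map:
        raise ValueError(f"unknown channel: {channel}")
    if len(data) != bin_map[channel]:
        raise ValueError(f"{channel} expects {bin_map[channel]} bins, got {len(data)}")


user_modifiers = [{"name": "sig_unc", "type": "normsys", "data": {"hi": 1.1, "lo": 0.9}}]
completed = with_default_modifiers(user_modifiers, poi_name="mu")
print([m["type"] for m in completed])
```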

make_patch() → List[Dict][source]#

Convert the registered signal injections and removals into a JSONPatch.

The returned patch list contains, in order, one add operation per channel registered via inject_signal(), followed by the remove operations for channels registered via remove_channel(). The remove operations are sorted in descending channel-index order, so that removing one channel does not shift the index of any channel still to be removed when pyhf applies the patch.

Raises:: ValueError – if no signal has been registered yet.
Returns:: JSONPatch document for the signal-plus-background workspace.
Return type:: List[Dict]
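
The ordering of the patch, and the reason for removing channels from the highest index down, can be illustrated with a self-contained sketch. The helper make_patch_sketch, the sample name "signal" and the exact path layout are assumptions made for this example and may differ from what the library emits.

```python
from typing import Dict, List


def make_patch_sketch(
    channel_names: List[str],
    signal_map: Dict[str, List[float]],
    remove_list: List[str],
) -> List[Dict]:
    """Build add operations first, then remove operations in descending index."""
    if not signal_map:
        raise ValueError("no signal has been registered yet")
    patch = []
    for index, name in enumerate(channel_names):
        if name in signal_map:
            patch.append(
                {
                    "op": "add",
                    "path": f"/channels/{index}/samples/0",  # assumed path layout
                    "value": {"name": "signal", "data": signal_map[name], "modifiers": []},
                }
            )
    remove_indices = sorted(
        (i for i, name in enumerate(channel_names) if name in remove_list),
        reverse=True,
    )
    patch.extend({"op": "remove", "path": f"/channels/{i}"} for i in remove_indices)
    return patch


channel_names = ["SR", "CR1", "CR2"]
patch = make_patch_sketch(channel_names, {"SR": [1.0, 0.5]}, ["CR1", "CR2"])
paths = [op["path"] for op in patch]
print(paths)

# Applying the removals in this order to a plain list leaves only "SR".
remaining = list(channel_names)
for op in patch:
    if op["op"] == "remove":
        del remaining[int(op["path"].split("/")[2])]
print(remaining)
```

Had the removals been emitted in ascending order, deleting index 1 first would shift "CR2" to index 1 and the subsequent removal of index 2 would fail.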

patch_to_map(signal_patch: List[Dict], return_remove_list: bool = False) → Tuple[Dict[str, List[float]], Dict[str, List[Dict]], List[str]] | Tuple[Dict[str, List[float]], Dict[str, List[Dict]]][source]#

Convert a JSONPatch document into the internal signal map.

>>> signal_map = {channel_name: signal_yields}
>>> modifier_map = {channel_name: signal_modifiers}

Parameters:

signal_patch (List[Dict]) – JSONPatch document for the signal.
return_remove_list (bool, default False) – if True, also return the list of channel names marked for removal.

Added in version 0.1.5.

Returns:

mapping from channel name to signal yields, mapping from channel name to signal modifiers, and (optionally) the list of channel names marked for removal.

Return type:

Tuple[Dict[str, List[float]], Dict[str, List[Dict]], List[str]] or Tuple[Dict[str, List[float]], Dict[str, List[Dict]]]
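
The reverse conversion can be sketched as follows, assuming the same path layout as a patch whose operations address channels by index ("/channels/<index>/..."). The helper patch_to_map_sketch and the explicit channel_names argument are hypothetical; the real method resolves channel names from the workspace held by the interpreter.

```python
from typing import Dict, List


def patch_to_map_sketch(
    channel_names: List[str],
    signal_patch: List[Dict],
    return_remove_list: bool = False,
):
    """Split a JSONPatch into signal yields, modifiers and removed channels."""
    signal_map, modifier_map, remove_list = {}, {}, []
    for operation in signal_patch:
        # "/channels/2/samples/0".split("/") -> ["", "channels", "2", "samples", "0"]
        index = int(operation["path"].split("/")[2])
        name = channel_names[index]
        if operation["op"] == "add":
            signal_map[name] = operation["value"]["data"]
            modifier_map[name] = operation["value"]["modifiers"]
        elif operation["op"] == "remove":
            remove_list.append(name)
    if return_remove_list:
        return signal_map, modifier_map, remove_list
    return signal_map, modifier_map


channel_names = ["SR", "CR1", "CR2"]
signal_patch = [
    {
        "op": "add",
        "path": "/channels/0/samples/0",
        "value": {
            "name": "signal",
            "data": [1.0, 0.5],
            "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
        },
    },
    {"op": "remove", "path": "/channels/2"},
    {"op": "remove", "path": "/channels/1"},
]
signal_map, modifier_map, removed = patch_to_map_sketch(
    channel_names, signal_patch, return_remove_list=True
)
print(signal_map, removed)
```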

property poi_name: List[Tuple[str, str]]#

Parameter-of-interest name for each measurement.

Returns:: list of (measurement_name, poi_name) tuples, one per entry of workspace["measurements"].
Return type:: List[Tuple[str, str]]
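
In a workspace that follows the pyhf JSON layout, the parameter of interest sits under config.poi of each measurement, so the returned list can be pictured with the sketch below. The helper poi_names_sketch and the measurement names are hypothetical.

```python
from typing import Dict, List, Tuple


def poi_names_sketch(workspace: Dict) -> List[Tuple[str, str]]:
    """Pair each measurement name with its parameter of interest."""
    return [
        (measurement["name"], measurement["config"]["poi"])
        for measurement in workspace["measurements"]
    ]


workspace = {
    "measurements": [
        {"name": "NormalMeasurement", "config": {"poi": "mu_SIG", "parameters": []}},
        {"name": "AltMeasurement", "config": {"poi": "mu_ALT", "parameters": []}},
    ]
}
pois = poi_names_sketch(workspace)
print(pois)
```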

remove_channel(channel_name: str) → None[source]#

Mark a channel to be removed from the likelihood.

Added in version 0.1.5.

Parameters:: channel_name (str) – name of the channel to be removed. Channels unknown to the workspace produce an error log and no modification.

property remove_list: List[str]#

Names of channels marked for removal from the model.

Added in version 0.1.5.

Returns:: channel names registered via remove_channel().
Return type:: List[str]

reset_signal() → None[source]#

Drop all registered signal injections and channel removals.

scale_systematics(fraction: float, modifier_types: List[str] | None = None) → WorkspaceInterpreter[source]#

Return a copy in which systematic-uncertainty deviations are rescaled.

For each modifier whose type is in modifier_types the deviation from the nominal value is multiplied by fraction:

normsys up/down scale factors are rescaled around 1, so that a fraction of 0 makes the systematic vanish (hi = lo = 1) and a fraction of 1 is a no-op;
histosys alternative templates are rescaled around the nominal sample yields with the same convention.

Statistical modifiers (shapesys, staterror) are never modified by this method; listing one of them in modifier_types raises a ValueError. Sample yields and observations are unchanged.

The original interpreter is not modified.

Added in version 0.2.1.

Parameters:

fraction (float) – multiplicative factor applied to each systematic deviation. 1 is a no-op, 0 removes the systematic, intermediate values shrink it. Must be non-negative.
modifier_types (Optional[List[str]], default None) – modifier type values to rescale. When None, defaults to ["normsys", "histosys"]. Statistical modifier types (shapesys, staterror) are not allowed.

Raises:

ValueError – if fraction is negative, or if modifier_types contains a statistical modifier type.

Returns:

a new interpreter wrapping a deep copy of the workspace with the requested systematic deviations rescaled, preserving the existing signal injections and channel-removal list.

Return type:

WorkspaceInterpreter
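
The rescaling convention can be written out explicitly: a normsys factor f becomes 1 + fraction * (f - 1), and a histosys template bin t with nominal yield n becomes n + fraction * (t - n). The sketch below applies this to a list of sample dictionaries; scale_systematics_sketch is a hypothetical helper, not the library implementation, and it assumes the pyhf hi/lo and hi_data/lo_data keys.

```python
import copy
from typing import Dict, List, Optional

STATISTICAL_TYPES = {"shapesys", "staterror"}


def scale_systematics_sketch(
    samples: List[Dict], fraction: float, modifier_types: Optional[List[str]] = None
) -> List[Dict]:
    """Return a deep copy with normsys/histosys deviations multiplied by fraction."""
    if fraction < 0:
        raise ValueError("fraction must be non-negative")
    types = ["normsys", "histosys"] if modifier_types is None else modifier_types
    if STATISTICAL_TYPES & set(types):
        raise ValueError("statistical modifier types cannot be rescaled")
    scaled = copy.deepcopy(samples)
    for sample in scaled:
        nominal = sample["data"]  # yields themselves are never changed
        for mod in sample["modifiers"]:
            if mod["type"] not in types:
                continue
            if mod["type"] == "normsys":
                # Rescale the deviation from 1.
                mod["data"] = {k: 1.0 + fraction * (v - 1.0) for k, v in mod["data"].items()}
            elif mod["type"] == "histosys":
                # Rescale the deviation from the nominal yields, bin by bin.
                mod["data"] = {
                    k: [n + fraction * (t - n) for n, t in zip(nominal, template)]
                    for k, template in mod["data"].items()
                }
    return scaled


samples = [
    {
        "name": "bkg",
        "data": [10.0, 20.0],
        "modifiers": [
            {"name": "xsec", "type": "normsys", "data": {"hi": 1.5, "lo": 0.5}},
            {
                "name": "jes",
                "type": "histosys",
                "data": {"hi_data": [12.0, 24.0], "lo_data": [8.0, 16.0]},
            },
            {"name": "stat", "type": "shapesys", "data": [1.0, 2.0]},
        ],
    }
]
halved = scale_systematics_sketch(samples, 0.5)
removed = scale_systematics_sketch(samples, 0.0)
```

With fraction 0.5 each deviation is halved, and with fraction 0 the normsys factors collapse to 1 while the histosys templates coincide with the nominal yields.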

property signal_per_channel: Dict[str, List[float]]#

Currently registered signal yields, keyed by channel name.

Returns:: mapping from channel name to the signal yields registered via inject_signal() or add_patch().
Return type:: Dict[str, List[float]]

summary(measurement_name: str | None = None, show_samples: bool = False, show_modifiers: bool = False, max_channels: int = 50) → None[source]#

Print a human-readable summary of the workspace and the signal injection state.

The header reports workspace-level statistics (version, number of channels, measurements and observations). Each measurement is listed with its parameter of interest and parameter count. For every channel the summary shows its guessed region type (CR / VR / SR), bin count, observation total, expected-background total, sample count and an aggregated count of modifier types attached to its samples. Injected signals and channels marked for removal are listed at the bottom.

Added in version 0.2.1.

Parameters:

measurement_name (Optional[str], default None) – if given, restrict the per-measurement section to the named measurement.
show_samples (bool, default False) – if True, list every sample name and its yield total beneath each channel.
show_modifiers (bool, default False) – if True, list every modifier name and type per sample. Implies show_samples.
max_channels (int, default 50) – maximum number of channels to print per measurement.
