Interface#

class spey_pyhf.interface.UncorrelatedBackground(signal_yields: List[float], background_yields: List[float], data: List[int], absolute_uncertainties: List[float])[source]#

Bases: PyhfInterface

This backend initiates pyhf.simplemodels.uncorrelated_background, forming an uncorrelated histogram structure with given inputs.

Parameters:

signal_yields (List[float]) – signal yields
background_yields (List[float]) – background yields
data (List[float]) – observations
absolute_uncertainties (List[float]) – absolute uncertainties on the background
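
As a rough illustration of the kind of statistical model this backend wraps, the sketch below writes out a negative log-likelihood by hand. It assumes the usual HistFactory treatment of uncorrelated background uncertainties: one multiplicative nuisance parameter \(\gamma_i\) per bin, constrained by a Poisson term with auxiliary count \(\tau_i = (b_i/\sigma_i)^2\). The function names and the explicit \(\gamma\) argument are hypothetical and are not part of the spey_pyhf API.

```python
import math
from typing import List


def _log_poisson(n: float, lam: float) -> float:
    """Log of the Poisson density, continued to non-integer n via lgamma."""
    return n * math.log(lam) - lam - math.lgamma(n + 1.0)


def uncorrelated_background_nll(
    mu: float,
    gammas: List[float],
    signal_yields: List[float],
    background_yields: List[float],
    data: List[int],
    absolute_uncertainties: List[float],
) -> float:
    """Hand-written negative log-likelihood (illustrative, not the backend's code)."""
    nll = 0.0
    for g, s, b, n, sigma in zip(
        gammas, signal_yields, background_yields, data, absolute_uncertainties
    ):
        tau = (b / sigma) ** 2  # auxiliary count that encodes the uncertainty
        nll -= _log_poisson(n, mu * s + g * b)  # main measurement in this bin
        nll -= _log_poisson(tau, g * tau)  # constraint term on gamma
    return nll


# Yields borrowed from the FullStatisticalModel example on this page.
signal, background, observed, uncert = [12.0, 11.0], [50.0, 52.0], [51, 48], [3.0, 7.0]
nll_nominal = uncorrelated_background_nll(1.0, [1.0, 1.0], signal, background, observed, uncert)
nll_shifted = uncorrelated_background_nll(1.0, [1.5, 1.5], signal, background, observed, uncert)
```

Moving every \(\gamma_i\) away from 1 is penalised twice here, once by the worse description of the data and once by the constraint term, so `nll_shifted` comes out larger than `nll_nominal`.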

asimov_negative_loglikelihood(poi_test: float = 1.0, expected: ExpectationType = observed, test_statistics: str = 'qtilde', **kwargs) → Tuple[float, ndarray]#

Compute negative log-likelihood at fixed \(\mu\) for Asimov data.

Note

The interface first calls backend-specific methods to compute the likelihood. If they are not implemented, it optimizes the objective function through the spey interface. Either a prescription for optimizing the likelihood or an objective function must be available for a backend to be successfully integrated into the spey interface.

Parameters:

poi_test (float, default 1.0) – parameter of interest, \(\mu\).
expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
test_statistics (Text, default "qtilde") –
The test statistic to use.
- 'qtilde': (default) performs the calculation using the alternative test statistic, \(\tilde{q}_{\mu}\), see eq. (62) of [arXiv:1007.1727] (qmu_tilde()).
  
  Warning
  
  Note that this assumes that \(\hat\mu\geq0\), hence allow_negative_signal is assumed to be False. If this function is called directly by the user, spey assumes that this is handled consistently throughout the external code. When computing p-values or upper limits on \(\mu\) through spey, this is taken care of automatically in the backend.
- 'q': performs the calculation using the test statistic \(q_{\mu}\), see eq. (54) of [arXiv:1007.1727] (qmu()).
- 'q0': performs the calculation using the discovery test statistic \(q_{0}\), see eq. (47) of [arXiv:1007.1727] (q0()).
kwargs – keyword arguments for the optimiser.

Raises:

NotImplementedError – If the method is not available for the backend.

Returns:

value of negative log-likelihood at POI of interest and fit parameters (\(\mu\) and \(\theta\)).

Return type:

Tuple[float, np.ndarray]
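
To make the three test_statistics options concrete, the sketch below builds each statistic from negative log-likelihood values using the standard profile-likelihood-ratio definitions. The function names and arguments are hypothetical: `nll_mu` is the negative log-likelihood at the tested \(\mu\), `nll_mu_hat` the value at the global minimum, and `nll_zero` the value at \(\mu = 0\), each already minimised over \(\theta\).

```python
def q_mu_tilde(mu: float, mu_hat: float, nll_mu: float, nll_mu_hat: float, nll_zero: float) -> float:
    """Upper-limit statistic with the physical boundary mu_hat >= 0."""
    if mu_hat > mu:
        return 0.0  # an upward fluctuation is not evidence against mu
    if mu_hat < 0.0:
        return 2.0 * (nll_mu - nll_zero)  # best fit is clipped to mu = 0
    return 2.0 * (nll_mu - nll_mu_hat)


def q_mu(mu: float, mu_hat: float, nll_mu: float, nll_mu_hat: float) -> float:
    """Upper-limit statistic without the boundary at zero."""
    return 0.0 if mu_hat > mu else 2.0 * (nll_mu - nll_mu_hat)


def q_zero(mu_hat: float, nll_zero: float, nll_mu_hat: float) -> float:
    """Discovery statistic: tests mu = 0 against a positive signal."""
    return 2.0 * (nll_zero - nll_mu_hat) if mu_hat >= 0.0 else 0.0


inside = q_mu_tilde(1.0, 0.4, nll_mu=12.0, nll_mu_hat=10.0, nll_zero=10.5)
negative = q_mu_tilde(1.0, -0.3, nll_mu=12.0, nll_mu_hat=10.0, nll_zero=10.5)
above = q_mu_tilde(1.0, 1.2, nll_mu=12.0, nll_mu_hat=10.0, nll_zero=10.5)
```

The only difference between 'qtilde' and 'q' is the middle branch, which is why the warning above insists that \(\hat\mu \geq 0\) be handled consistently.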

author: str = 'SpeysideHEP'#: Author of the backend

combine(other, **kwargs)#

A routine to combine two statistical models.

Note

This function is only available if the backend has a specific routine for combination between same or other backends.

Parameters:

other (BackendBase) – Statistical model object to be combined.

Raises:

NotImplementedError – If the backend does not have a combination scheme.

Returns:

A new statistical model created from the combination of this model and the other one.

Return type:

BackendBase

config(allow_negative_signal: bool = True, poi_upper_bound: float = 10.0) → ModelConfig#

Model configuration.

Parameters:

allow_negative_signal (bool, default True) – If True \(\hat\mu\) value will be allowed to be negative.
poi_upper_bound (float, default 10.0) – upper bound for parameter of interest, \(\mu\).

Returns:

Model configuration. Information regarding the position of POI in parameter list, suggested input and bounds.

Return type:

ModelConfig

doi: List[str] = ['10.5281/zenodo.1169739', '10.21105/joss.02823']#: Citable DOI for the backend

expected_data(pars: List[float]) → List[float]#

Compute the expected value of the statistical model

Parameters:

pars (List[float]) – nuisance parameters, \(\theta\), and parameter of interest, \(\mu\).

Returns:

Expected data of the statistical model

Return type:

List[float]

get_hessian_logpdf_func(expected: ExpectationType = observed, data: List[float] | ndarray | None = None) → Callable[[ndarray], float]#

Currently, the Hessian of \(\log\mathcal{L}(\mu, \theta)\) is only used to compute the variance on \(\mu\). This method returns a callable function which takes fit parameters (\(\mu\) and \(\theta\)) and returns the Hessian.

Parameters:

expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
data (Union[List[float], np.ndarray], default None) – input data to fit

Returns:

Function that takes fit parameters (\(\mu\) and \(\theta\)) and returns Hessian of \(\log\mathcal{L}(\mu, \theta)\).

Return type:

Callable[[np.ndarray], float]
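
The link between the Hessian and the variance on \(\mu\) can be illustrated with a one-parameter toy: near the maximum, the variance is approximately minus the inverse of the second derivative of \(\log\mathcal{L}\). The single-bin model, the finite-difference helper and all names below are assumptions made for this sketch, not the backend's implementation.

```python
import math
from typing import Callable


def log_likelihood(mu: float, n: int = 51, s: float = 12.0, b: float = 50.0) -> float:
    """Single-bin Poisson log-likelihood with expected count mu * s + b."""
    lam = mu * s + b
    return n * math.log(lam) - lam - math.lgamma(n + 1.0)


def second_derivative(func: Callable[[float], float], x: float, step: float = 1e-3) -> float:
    """Central finite-difference estimate of the second derivative."""
    return (func(x + step) - 2.0 * func(x) + func(x - step)) / step**2


# For this toy the maximum sits where mu * s + b equals the observed count.
mu_hat = (51 - 50.0) / 12.0
hessian = second_derivative(log_likelihood, mu_hat)
variance_mu = -1.0 / hessian
```

For this toy the analytic answer is \(n / s^2 = 51/144\), which the finite-difference estimate reproduces closely.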

get_logpdf_func(expected: ExpectationType = observed, data: List[float] | ndarray | None = None) → Callable[[ndarray], float]#

Generate function to compute \(\log\mathcal{L}(\mu, \theta)\) where \(\mu\) is the parameter of interest and \(\theta\) are nuisance parameters.

Parameters:

expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
data (Union[List[float], np.ndarray], default None) – input data to fit

Returns:

Function that takes fit parameters (\(\mu\) and \(\theta\)) and computes \(\log\mathcal{L}(\mu, \theta)\).

Return type:

Callable[[np.ndarray], float]

get_objective_function(expected: ExpectationType = observed, data: List[float] | ndarray | None = None, do_grad: bool = True) → Callable[[ndarray], float | Tuple[float, ndarray]]#

The objective function is the function on which the optimisation is performed. It is expected to be twice the negative log-likelihood, \(-2\log\mathcal{L}(\mu, \theta)\). Additionally, if available, it can be bundled with the gradient of twice the negative log-likelihood.

Parameters:

expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
data (Union[List[float], np.ndarray], default None) – input data to fit
do_grad (bool, default True) – If True, return the objective and its gradient as a tuple (subject to availability); if False, return only the objective function.

Returns:

Function which takes fit parameters (\(\mu\) and \(\theta\)) and returns either objective or objective and its gradient.

Return type:

Callable[[np.ndarray], Union[float, Tuple[float, np.ndarray]]]
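
A minimal sketch of the two return shapes follows, assuming a single-bin Poisson model with only \(\mu\) as a fit parameter. The factory name `make_objective` and the toy model are hypothetical; they only mirror the do_grad switch described above.

```python
import math
from typing import Callable, List, Sequence, Tuple, Union

Objective = Callable[[Sequence[float]], Union[float, Tuple[float, List[float]]]]


def make_objective(n: int, s: float, b: float, do_grad: bool = True) -> Objective:
    """Return a function evaluating -2 log L, optionally with its gradient."""

    def objective(pars: Sequence[float]):
        mu = pars[0]
        lam = mu * s + b  # expected count in the single bin
        value = -2.0 * (n * math.log(lam) - lam - math.lgamma(n + 1.0))
        if not do_grad:
            return value
        # d(-2 log L)/d(mu) = -2 * (n / lam - 1) * s
        return value, [-2.0 * (n / lam - 1.0) * s]

    return objective


value_only = make_objective(51, 12.0, 50.0, do_grad=False)
with_grad = make_objective(51, 12.0, 50.0, do_grad=True)
value, grad = with_grad([1.0])
```

Returning the value and gradient together lets a gradient-based optimiser avoid a second pass over the model at each step.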

get_sampler(pars: ndarray) → Callable[[int], ndarray]#

Retrieves the function to sample from.

Parameters:

pars (np.ndarray) – fit parameters (\(\mu\) and \(\theta\))

Returns:

Function that takes number_of_samples as input and draws as many samples from the statistical model.

Return type:

Callable[[int], np.ndarray]

property is_alive: bool#: Returns True if at least one bin has non-zero signal yield.

manager#: pyhf Manager to handle the interface with pyhf

minimize_asimov_negative_loglikelihood(expected: ExpectationType = observed, test_statistics: str = 'qtilde', **kwargs) → Tuple[float, ndarray]#

A backend specific method to minimize negative log-likelihood for Asimov data.

Parameters:

expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
test_statistics (Text, default "qtilde") –
The test statistic to use.
- 'qtilde': (default) performs the calculation using the alternative test statistic, \(\tilde{q}_{\mu}\), see eq. (62) of [arXiv:1007.1727] (qmu_tilde()).
  
  Warning
  
  Note that this assumes that \(\hat\mu\geq0\), hence allow_negative_signal is assumed to be False. If this function is called directly by the user, spey assumes that this is handled consistently throughout the external code. When computing p-values or upper limits on \(\mu\) through spey, this is taken care of automatically in the backend.
- 'q': performs the calculation using the test statistic \(q_{\mu}\), see eq. (54) of [arXiv:1007.1727] (qmu()).
- 'q0': performs the calculation using the discovery test statistic \(q_{0}\), see eq. (47) of [arXiv:1007.1727] (q0()).
kwargs – keyword arguments for the optimiser.

Raises:

NotImplementedError – If the method is not available for the backend.

Returns:

value of negative log-likelihood and fit parameters (\(\mu\) and \(\theta\)).

Return type:

Tuple[float, np.ndarray]

minimize_negative_loglikelihood(expected: ExpectationType = observed, allow_negative_signal: bool = True, **kwargs) → Tuple[float, ndarray]#

A backend specific method to minimize negative log-likelihood.

Parameters:

expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
allow_negative_signal (bool, default True) – If True \(\hat\mu\) value will be allowed to be negative.
kwargs – keyword arguments for the optimiser.

Raises:

NotImplementedError – If the method is not available for the backend.

Returns:

value of negative log-likelihood and fit parameters (\(\mu\) and \(\theta\)).

Return type:

Tuple[float, np.ndarray]

property model: Base#: Retrieve statistical model container

name: str = 'pyhf.uncorrelated_background'#: Name of the backend

negative_loglikelihood(poi_test: float = 1.0, expected: ExpectationType = observed, **kwargs) → Tuple[float, ndarray]#

Backend specific method to compute negative log-likelihood for a parameter of interest \(\mu\).

Parameters:

poi_test (float, default 1.0) – parameter of interest, \(\mu\).
expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
kwargs – keyword arguments for the optimiser.

Raises:

NotImplementedError – If the method is not available for the backend.

Returns:

value of negative log-likelihood at POI of interest and fit parameters (\(\mu\) and \(\theta\)).

Return type:

Tuple[float, np.ndarray]

spey_requires: str = '>=0.2.0,<0.3.0'#: Spey version required for the backend

version: str = '0.2.1'#: Version of the backend

class spey_pyhf.interface.FullStatisticalModel(signal_patch: Dict, background_only_model: str | Dict)[source]#

Bases: PyhfInterface

pyhf Interface. For details on input structure please see this link

Parameters:

signal_patch (List[Dict]) –
Patch data for the signal model. Please see this link for details on the structure of the input.
background_only_model (Dict or Text) – This input expects background only data that describes the full statistical model for the background. It also accepts str input which indicates the full path to the background only JSON file.

Example:

>>> import spey

>>> background_only = {
...     "channels": [
...         {
...             "name": "singlechannel",
...             "samples": [
...                 {
...                     "name": "background",
...                     "data": [50.0, 52.0],
...                     "modifiers": [
...                         {
...                             "name": "uncorr_bkguncrt",
...                             "type": "shapesys",
...                             "data": [3.0, 7.0],
...                         }
...                     ],
...                 }
...             ],
...         }
...     ],
...     "observations": [{"name": "singlechannel", "data": [51.0, 48.0]}],
...     "measurements": [{"name": "Measurement", "config": {"poi": "mu", "parameters": []}}],
...     "version": "1.0.0",
... }
>>> signal = [
...     {
...         "op": "add",
...         "path": "/channels/0/samples/1",
...         "value": {
...             "name": "signal",
...             "data": [12.0, 11.0],
...             "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
...         },
...     }
... ]
>>> stat_wrapper = spey.get_backend("pyhf")
>>> statistical_model = stat_wrapper(
...     analysis="simple_pyhf",
...     background_only_model=background_only,
...     signal_patch=signal,
... )
>>> statistical_model.exclusion_confidence_level() # [0.9474850259721279]

asimov_negative_loglikelihood(poi_test: float = 1.0, expected: ExpectationType = observed, test_statistics: str = 'qtilde', **kwargs) → Tuple[float, ndarray]#

Compute negative log-likelihood at fixed \(\mu\) for Asimov data.

Parameters:

poi_test (float, default 1.0) – parameter of interest, \(\mu\).
expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
test_statistics (Text, default "qtilde") –
The test statistic to use.
- 'qtilde': (default) performs the calculation using the alternative test statistic, \(\tilde{q}_{\mu}\), see eq. (62) of [arXiv:1007.1727] (qmu_tilde()).
  
  Warning
  
  Note that this assumes that \(\hat\mu\geq0\), hence allow_negative_signal is assumed to be False. If this function is called directly by the user, spey assumes that this is handled consistently throughout the external code. When computing p-values or upper limits on \(\mu\) through spey, this is taken care of automatically in the backend.
- 'q': performs the calculation using the test statistic \(q_{\mu}\), see eq. (54) of [arXiv:1007.1727] (qmu()).
- 'q0': performs the calculation using the discovery test statistic \(q_{0}\), see eq. (47) of [arXiv:1007.1727] (q0()).
kwargs – keyword arguments for the optimiser.

Raises:

NotImplementedError – If the method is not available for the backend.

Returns:

value of negative log-likelihood at POI of interest and fit parameters (\(\mu\) and \(\theta\)).

Return type:

Tuple[float, np.ndarray]

author: str = 'SpeysideHEP'#: Author of the backend

combine(other, **kwargs)[source]#

Combine full statistical models generated by pyhf interface

Parameters:

other (FullStatisticalModel) – other statistical model to be combined with this model
kwargs –
pyhf specific inputs:
- join (str, default None): How to join the two workspaces. Pick from "none", "outer", "left outer" or "right outer".
- merge_channels (bool): Whether or not to merge channels when performing the combine. This is only done with "outer", "left outer", and "right outer" options.
non-pyhf specific inputs:
- update_measurements (bool, default True): If the measurement names of the two statistical models are the same, the other statistical model’s measurement name will be updated. If set to False, measurements will remain as is.

Note

This model is "left" and the other model is considered to be "right".

Raises:

CombinationError – Raised if it is not possible to combine the statistical models.

Returns:

Combined statistical model.

Return type:

FullStatisticalModel

config(allow_negative_signal: bool = True, poi_upper_bound: float = 10.0) → ModelConfig#

Model configuration.

Parameters:

allow_negative_signal (bool, default True) – If True \(\hat\mu\) value will be allowed to be negative.
poi_upper_bound (float, default 10.0) – upper bound for parameter of interest, \(\mu\).

Returns:

Model configuration. Information regarding the position of POI in parameter list, suggested input and bounds.

Return type:

ModelConfig

doi: List[str] = ['10.5281/zenodo.1169739', '10.21105/joss.02823']#: Citable DOI for the backend

expected_data(pars: List[float]) → List[float]#

Compute the expected value of the statistical model

Parameters:

pars (List[float]) – nuisance parameters, \(\theta\), and parameter of interest, \(\mu\).

Returns:

Expected data of the statistical model

Return type:

List[float]

get_hessian_logpdf_func(expected: ExpectationType = observed, data: List[float] | ndarray | None = None) → Callable[[ndarray], float]#

Parameters:

expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
data (Union[List[float], np.ndarray], default None) – input data to fit

Returns:

Function that takes fit parameters (\(\mu\) and \(\theta\)) and returns Hessian of \(\log\mathcal{L}(\mu, \theta)\).

Return type:

Callable[[np.ndarray], float]

get_logpdf_func(expected: ExpectationType = observed, data: List[float] | ndarray | None = None) → Callable[[ndarray], float]#

Generate function to compute \(\log\mathcal{L}(\mu, \theta)\) where \(\mu\) is the parameter of interest and \(\theta\) are nuisance parameters.

Parameters:

expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
data (Union[List[float], np.ndarray], default None) – input data to fit

Returns:

Function that takes fit parameters (\(\mu\) and \(\theta\)) and computes \(\log\mathcal{L}(\mu, \theta)\).

Return type:

Callable[[np.ndarray], float]

get_objective_function(expected: ExpectationType = observed, data: List[float] | ndarray | None = None, do_grad: bool = True) → Callable[[ndarray], float | Tuple[float, ndarray]]#

Parameters:

expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
data (Union[List[float], np.ndarray], default None) – input data to fit
do_grad (bool, default True) – If True, return the objective and its gradient as a tuple (subject to availability); if False, return only the objective function.

Returns:

Function which takes fit parameters (\(\mu\) and \(\theta\)) and returns either objective or objective and its gradient.

Return type:

Callable[[np.ndarray], Union[float, Tuple[float, np.ndarray]]]

get_sampler(pars: ndarray) → Callable[[int], ndarray]#

Retrieves the function to sample from.

Parameters:

pars (np.ndarray) – fit parameters (\(\mu\) and \(\theta\))

Returns:

Function that takes number_of_samples as input and draws as many samples from the statistical model.

Return type:

Callable[[int], np.ndarray]

property is_alive: bool#: Returns True if at least one bin has non-zero signal yield.

manager#: pyhf Manager to handle the interface with pyhf

minimize_asimov_negative_loglikelihood(expected: ExpectationType = observed, test_statistics: str = 'qtilde', **kwargs) → Tuple[float, ndarray]#

A backend specific method to minimize negative log-likelihood for Asimov data.

Parameters:

expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
test_statistics (Text, default "qtilde") –
The test statistic to use.
- 'qtilde': (default) performs the calculation using the alternative test statistic, \(\tilde{q}_{\mu}\), see eq. (62) of [arXiv:1007.1727] (qmu_tilde()).
  
  Warning
  
  Note that this assumes that \(\hat\mu\geq0\), hence allow_negative_signal is assumed to be False. If this function is called directly by the user, spey assumes that this is handled consistently throughout the external code. When computing p-values or upper limits on \(\mu\) through spey, this is taken care of automatically in the backend.
- 'q': performs the calculation using the test statistic \(q_{\mu}\), see eq. (54) of [arXiv:1007.1727] (qmu()).
- 'q0': performs the calculation using the discovery test statistic \(q_{0}\), see eq. (47) of [arXiv:1007.1727] (q0()).
kwargs – keyword arguments for the optimiser.

Raises:

NotImplementedError – If the method is not available for the backend.

Returns:

value of negative log-likelihood and fit parameters (\(\mu\) and \(\theta\)).

Return type:

Tuple[float, np.ndarray]

minimize_negative_loglikelihood(expected: ExpectationType = observed, allow_negative_signal: bool = True, **kwargs) → Tuple[float, ndarray]#

A backend specific method to minimize negative log-likelihood.

Parameters:

expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
allow_negative_signal (bool, default True) – If True \(\hat\mu\) value will be allowed to be negative.
kwargs – keyword arguments for the optimiser.

Raises:

NotImplementedError – If the method is not available for the backend.

Returns:

value of negative log-likelihood and fit parameters (\(\mu\) and \(\theta\)).

Return type:

Tuple[float, np.ndarray]

property model: Base#: Retrieve statistical model container

name: str = 'pyhf'#: Name of the backend

negative_loglikelihood(poi_test: float = 1.0, expected: ExpectationType = observed, **kwargs) → Tuple[float, ndarray]#

Backend specific method to compute negative log-likelihood for a parameter of interest \(\mu\).

Parameters:

poi_test (float, default 1.0) – parameter of interest, \(\mu\).
expected (ExpectationType) –
Sets which values the fitting algorithm should focus on and which p-values are computed.
- observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
- aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
- apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
kwargs – keyword arguments for the optimiser.

Raises:

NotImplementedError – If the method is not available for the backend.

Returns:

value of negative log-likelihood at POI of interest and fit parameters (\(\mu\) and \(\theta\)).

Return type:

Tuple[float, np.ndarray]

spey_requires: str = '>=0.2.0,<0.3.0'#: Spey version required for the backend

version: str = '0.2.1'#: Version of the backend

Simplified likelihoods#

Convert pyhf full statistical models into the simplified likelihood framework.

This module implements Simplify, a spey.ConverterBase plug-in that approximates a pyhf HistFactory likelihood by one of the three simplified-likelihood backends shipped with spey:

“default.correlated_background” – multi-bin Poisson likelihood with a multivariate-Gaussian constraint on the combined background nuisance parameters;
“default.third_moment_expansion” – extension that captures the leading skewness of the per-bin background distribution through a quadratic deformation of the expected counts;
“default.effective_sigma” – variable-Gaussian (effective-\(\sigma\)) treatment of asymmetric per-bin background uncertainties.

Mathematical setting#

Following Buckley, Citron, Fichet, Kraml, Waltenberger and Wardle (JHEP 04 (2019) 064, arXiv:1809.05548), an experimental likelihood with \(N\) independent elementary nuisance parameters \(\boldsymbol{\delta}\) over \(P\) observed counts \(\{n_I^{\mathrm{obs}}\}\) (here \(P\) is the number of analysis bins) is approximated, in the regime \(N \geq P\), by

\[\mathcal{L}_S(\boldsymbol{\alpha}, \boldsymbol{\theta}) = \prod_{I=1}^{P} \mathrm{Pois}\!\left(n_I^{\mathrm{obs}} \,\big|\, n_{s,I}(\boldsymbol{\alpha}) + a_I + b_I\,\theta_I + c_I\,\theta_I^2 \right) \cdot \frac{\exp\!\left(-\tfrac{1}{2}\boldsymbol{\theta}^{\mathrm{T}} \boldsymbol{\rho}^{-1}\boldsymbol{\theta}\right)} {\sqrt{(2\pi)^P}} ,\]

where \(\boldsymbol{\alpha}\) parametrise the new-physics signal, \(n_{s,I}(\boldsymbol{\alpha})\) are the per-bin signal yields and \(\boldsymbol{\theta} = (\theta_1, \dots, \theta_P)\) are combined nuisance parameters, one per bin, that summarise the action of the elementary \(\boldsymbol{\delta}\). The combined parameters \(\boldsymbol{\theta}\) are unit-variance, centred Gaussians whose joint correlations are encoded in the \(P \times P\) matrix \(\boldsymbol{\rho}\). The coefficients \((a_I, b_I, c_I, \rho_{IJ})\) are obtained by matching the mean, the covariance and the third central moment of the background expectation at \(\mu = 0\) to those of the full likelihood (Buckley et al. eqs. 2.6–2.8):

\[\begin{split}m_{1,I} &= a_I + c_I , \\ m_{2,IJ} &= b_I\,b_J\,\rho_{IJ} + 2\,c_I\,c_J\,\rho_{IJ}^{\,2} , \\ m_{3,I} &= 6\,b_I^{\,2}\,c_I + 8\,c_I^{\,3} .\end{split}\]

Inverting these relations (eqs. 2.9–2.12) yields

\[\begin{split}c_I &= -\mathrm{sign}(m_{3,I})\,\sqrt{2\,m_{2,II}}\, \cos\!\left[ \frac{4\pi}{3} + \frac{1}{3}\, \arctan\!\sqrt{8\,\frac{m_{2,II}^{\,3}}{m_{3,I}^{\,2}} - 1} \right] , \\ b_I &= \sqrt{m_{2,II} - 2\,c_I^{\,2}} , \\ a_I &= m_{1,I} - c_I , \\ \rho_{IJ} &= \frac{1}{4\,c_I\,c_J}\, \left(\sqrt{(b_I\,b_J)^{2} + 8\,c_I\,c_J\,m_{2,IJ}} - b_I\,b_J\right) ,\end{split}\]

valid in the regime \(8\,m_{2,II}^{\,3} \geq m_{3,I}^{\,2}\). When \(m_{3,I} \rightarrow 0\) the quadratic correction vanishes (\(c_I \rightarrow 0\)) and the expansion collapses to the standard simplified likelihood with \(a_I = m_{1,I}\), \(b_I = \sqrt{m_{2,II}}\) and a multivariate-Gaussian constraint on \(\boldsymbol{\theta}\) whose correlation matrix is \(\rho_{IJ} = m_{2,IJ}/(b_I\,b_J)\). The latter case is implemented by “default.correlated_background”, while the full quadratic form is implemented by “default.third_moment_expansion”.
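
The inversion above can be checked numerically. The sketch below transcribes the formulas for \(a_I\), \(b_I\), \(c_I\) and \(\rho_{IJ}\) into plain Python and verifies that they reproduce the input moments. The function names and the example moments are hypothetical, and the code is not taken from the spey backends.

```python
import math
from typing import Tuple


def third_moment_coefficients(m1: float, m2: float, m3: float) -> Tuple[float, float, float]:
    """Return (a, b, c) for one bin from its mean, variance and third central moment."""
    if m3 == 0.0:
        return m1, math.sqrt(m2), 0.0  # Gaussian limit: no quadratic term
    if 8.0 * m2**3 < m3**2:
        raise ValueError("inversion requires 8 * m2**3 >= m3**2")
    angle = 4.0 * math.pi / 3.0 + math.atan(math.sqrt(8.0 * m2**3 / m3**2 - 1.0)) / 3.0
    c = -math.copysign(1.0, m3) * math.sqrt(2.0 * m2) * math.cos(angle)
    b = math.sqrt(max(m2 - 2.0 * c**2, 0.0))  # guard against rounding at the boundary
    return m1 - c, b, c


def correlation(b_i: float, b_j: float, c_i: float, c_j: float, m2_ij: float) -> float:
    """Off-diagonal rho_IJ solving m2_IJ = b_I b_J rho + 2 c_I c_J rho**2."""
    bb, cc = b_i * b_j, c_i * c_j
    if cc == 0.0:
        return m2_ij / bb  # standard simplified-likelihood limit
    return (math.sqrt(bb**2 + 8.0 * cc * m2_ij) - bb) / (4.0 * cc)


# Illustrative moments for two bins (made-up numbers).
a_1, b_1, c_1 = third_moment_coefficients(50.0, 100.0, 300.0)
a_2, b_2, c_2 = third_moment_coefficients(52.0, 80.0, 150.0)
rho_12 = correlation(b_1, b_2, c_1, c_2, 40.0)
```

Substituting the results back into the moment relations recovers the inputs, and a vanishing third moment returns \(a_I = m_{1,I}\), \(b_I = \sqrt{m_{2,II}}\), \(c_I = 0\) as stated in the text.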

For asymmetric uncertainties the same combined-parameter strategy is used, but the per-bin expectation \(n^{b}_I + \theta_I\,\sigma_I\) of the standard form is replaced by the variable-Gaussian prescription of Barlow (arXiv:physics/0406120, Sec. 3.6):

\[\sigma^{\mathrm{eff}}_I(\theta_I) = \sqrt{\sigma^{+}_I\,\sigma^{-}_I + (\sigma^{+}_I - \sigma^{-}_I)(\theta_I - n^{b}_I)} ,\]

so that the conditional standard deviation interpolates smoothly between the upper (\(\sigma^{+}\)) and lower (\(\sigma^{-}\)) absolute uncertainties of the bin. This is the form used by “default.effective_sigma”.
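
As a small worked instance of the effective-\(\sigma\) formula, the sketch below evaluates it exactly as written above, with \(\theta_I\) expressed in the same units as \(n^{b}_I\). The function name and the numerical values are hypothetical.

```python
import math


def effective_sigma(theta: float, n_b: float, sigma_plus: float, sigma_minus: float) -> float:
    """Variable-Gaussian width: sqrt(s+ * s- + (s+ - s-) * (theta - n_b))."""
    radicand = sigma_plus * sigma_minus + (sigma_plus - sigma_minus) * (theta - n_b)
    if radicand < 0.0:
        raise ValueError("effective variance is negative for this theta")
    return math.sqrt(radicand)


at_centre = effective_sigma(50.0, 50.0, sigma_plus=4.0, sigma_minus=3.0)
above_centre = effective_sigma(53.0, 50.0, sigma_plus=4.0, sigma_minus=3.0)
symmetric = effective_sigma(55.0, 50.0, sigma_plus=3.0, sigma_minus=3.0)
```

With \(\sigma^{+} > \sigma^{-}\) the width grows as \(\theta_I\) moves above \(n^{b}_I\), and for \(\sigma^{+} = \sigma^{-}\) the expression reduces to a constant width.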

Conversion algorithm#

The converter implements the Monte-Carlo moment extraction advocated in Buckley et al. Sec. 4. For a user-supplied full statistical model \(\mathcal{L}^{\mathrm{SR}}\) (which need not be limited to signal regions, but is referred to as such here for brevity), the algorithm proceeds as follows.

Control likelihood. A control likelihood \(\mathcal{L}^{c}\) is built from the background-only pyhf workspace by attaching a zero-yield signal sample to every channel listed in control_region_indices (defaulting to a substring-based guess of CR/VR channels). When include_modifiers_in_control_model is true the signal modifiers — and therefore their associated nuisance parameters — are propagated to the control sample so that signal-induced systematics contribute to the constraint covariance. Channels outside control_region_indices keep their background-only structure. The parameter of interest \(\mu\) is retained so that the workspace has a valid signal+background topology but is fixed to zero in the following step and therefore has no effect on the per-bin yields.
Conditional MLE. \(\mathcal{L}^{c}\) is profiled at \(\mu = 0\) to obtain the conditional best-fit nuisance vector \(\hat{\boldsymbol{\theta}}_0^{c}\). Because every signal yield in \(\mathcal{L}^{c}\) is zero at \(\mu = 0\), this profile coincides with the maximum-likelihood estimate of the background-only fit.
Nuisance covariance. The observed Fisher information

\[V^{-1}_{ij} = -\,\frac{\partial^{2} \log\mathcal{L}^{c}}{\partial\theta_i\,\partial\theta_j} \bigg|_{(\mu,\,\boldsymbol{\theta}) = (0,\,\hat{\boldsymbol{\theta}}_0^{c})}\]

is evaluated from the Hessian provided by pyhf’s jax backend after deleting the row and column associated with \(\mu\). Its inverse \(\mathbf{V}\) is the asymptotic covariance of the nuisance parameters at the conditional MLE.
Sampling. Nuisance draws \(\tilde{\boldsymbol{\theta}} \sim \mathcal{N}(\hat{\boldsymbol{\theta}}_0^{c}, \mathbf{V})\) are taken. If \(\mathcal{L}^{\mathrm{SR}}\) has nuisance parameters that are not present in \(\mathcal{L}^{c}\) (for instance because some signal-only modifiers were excluded from the control model), the missing entries are profiled by maximising \(\mathcal{L}^{\mathrm{SR}}\) at \(\mu = 0\) with the entries shared with \(\mathcal{L}^{c}\) held at \(\tilde{\boldsymbol{\theta}}\) through equality constraints. Each accepted parameter vector is forwarded to the pyhf sampler of \(\mathcal{L}^{\mathrm{SR}}\) to draw one Poisson realisation per bin (include_auxiliary=False). Draws that would require sampling from a Poisson with a non-positive rate are silently rejected and the loop continues until number_of_samples accepted samples are collected.
Moments. With \(\tilde{n}_b\) denoting the matrix of per-bin samples, the simplified-likelihood inputs are

\[\begin{split}m_1 &= \mathbb{E}[\tilde{n}_b] , \\ \Sigma &= \mathrm{cov}(\tilde{n}_b) , \\ m_3 &= \mathbb{E}\!\left[(\tilde{n}_b - m_1)^{\,3}\right] ,\end{split}\]

estimated from the sample mean, sample covariance and sample third moment, respectively. For “default.effective_sigma” the symmetric \((m_1, \Sigma)\) summary is supplemented by the 68% sample quantiles that define the per-bin asymmetric envelope

\[\begin{split}\sigma^{+}_I &= |\,Q_{0.8413}(\tilde{n}_{b,I}) - m_{1,I}\,| , \\ \sigma^{-}_I &= |\,m_{1,I} - Q_{0.1587}(\tilde{n}_{b,I})\,| ,\end{split}\]

where \(Q_p\) denotes the empirical \(p\)-quantile, and \(\Sigma\) is reduced to the correlation matrix \(\rho_{IJ} = \Sigma_{IJ}/\sqrt{\Sigma_{II}\Sigma_{JJ}}\).
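
The covariance, sampling and moment steps above can be sketched on a toy problem with NumPy alone. This is only an illustration under stated assumptions: the two-bin linear response model, the hand-written Hessian and the helper sample_background are invented for the sketch and are not taken from spey-pyhf, where the rates and the Hessian come from the pyhf model itself.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-ins (assumptions, not spey-pyhf internals): two bins whose
# background rates depend linearly on two nuisance parameters.
nominal = np.array([50.0, 20.0])
response = np.array([[5.0, 1.0],
                     [2.0, 3.0]])      # d(rate_I) / d(theta_j)
theta_hat = np.zeros(2)                 # conditional best-fit nuisances
# Made-up negative log-likelihood Hessian; row/column 0 belongs to mu.
hessian = np.array([[4.0, 0.2, 0.1],
                    [0.2, 1.0, 0.3],
                    [0.1, 0.3, 1.0]])

# Nuisance covariance: delete the mu row and column, then invert.
fisher = np.delete(np.delete(hessian, 0, axis=0), 0, axis=1)
V = np.linalg.inv(fisher)


def sample_background(n_accepted):
    """Draw nuisances from N(theta_hat, V), then one Poisson count per bin."""
    samples = []
    while len(samples) < n_accepted:
        theta = rng.multivariate_normal(theta_hat, V)
        rates = nominal + response @ theta
        if np.any(rates <= 0.0):
            continue  # reject draws with a non-positive Poisson rate
        samples.append(rng.poisson(rates))
    return np.asarray(samples, dtype=float)


n_b = sample_background(5000)

# Moments of the per-bin samples.
m1 = n_b.mean(axis=0)
sigma = np.cov(n_b, rowvar=False)
m3 = ((n_b - m1) ** 3).mean(axis=0)

# Asymmetric envelope from the 68% sample quantiles, plus the correlation matrix.
sigma_plus = np.abs(np.quantile(n_b, 0.8413, axis=0) - m1)
sigma_minus = np.abs(m1 - np.quantile(n_b, 0.1587, axis=0))
std = np.sqrt(np.diag(sigma))
rho = sigma / np.outer(std, std)
```

The rejection loop mirrors the rule that only accepted draws count toward the requested number of samples, so the loop may run for more iterations than n_accepted.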

The resulting summary statistics are passed to the chosen simplified-likelihood backend, which evaluates the \((a, b, c, \boldsymbol{\rho})\) parameters internally according to the formulae above and returns the simplified statistical model. When save_model is provided, the moments (and the quantile envelopes for “default.effective_sigma”) are persisted to a compressed .npz file together with the channel order inferred from the underlying pyhf configuration, so that the model can be rebuilt without re-running the sampler.

class spey_pyhf.simplify.Simplify[source]#

Convert a pyhf full statistical model into the simplified likelihood framework.

The converter approximates the input full statistical model by one of three spey simplified-likelihood backends: “default.correlated_background”, “default.third_moment_expansion” or “default.effective_sigma”. The methodology — the construction of a control likelihood \(\mathcal{L}^{c}\), the multivariate-Gaussian sampling of its nuisance parameters and the Monte-Carlo extraction of the first few central moments of the per-bin background distribution — is described in detail in the spey_pyhf.simplify module documentation and follows Buckley et al., JHEP 04 (2019) 064 (arXiv:1809.05548). The asymmetric variant follows Barlow, arXiv:physics/0406120.

For details on the target simplified-likelihood backends, see the spey default plug-ins page; a user-level walk-through is also provided in the spey-pyhf online documentation.

Parameters:

statistical_model (StatisticalModel) – constructed full statistical model backed by pyhf with the jax backend enabled. The jax backend is required because the algorithm queries the Hessian of the log-likelihood through automatic differentiation.
fittype (Text, default "postfit") – expectation type used when constructing and profiling the control model. "postfit" maps to spey.ExpectationType.observed (uses the observed auxiliary data) and "prefit" to spey.ExpectationType.apriori (uses the pre-fit auxiliary data).
convert_to (Text, default "default.correlated_background") – target simplified-likelihood backend. Must be one of "default.correlated_background", "default.third_moment_expansion" or "default.effective_sigma".
number_of_samples (int, default 1000) – number of accepted Monte-Carlo samples used to estimate the background moments and, where relevant, the asymmetric quantile envelopes. Samples that would require evaluating a Poisson at a non-positive rate are rejected and do not count toward this total.
control_region_indices (List[int] or List[Text], default None) – indices or names of the control and validation regions in the background-only workspace. These are the channels into which a zero-yield signal sample is injected when constructing \(\mathcal{L}^{c}\). If None, the interface guesses CR/VR channels via the substring heuristic of guess_CRVR(). A ConversionError is raised if no CR/VR channel can be identified.
include_modifiers_in_control_model (bool, default False) – if True, the signal modifiers (and therefore their nuisance parameters) are attached to the zero-yield signal sample injected into the control regions, so that signal-induced systematics contribute to the nuisance covariance \(\mathbf{V}\) of \(\mathcal{L}^{c}\). By default modifiers are excluded.
save_model (Text, default None) –
full path to which the extracted summary statistics are persisted. The data is stored as a compressed NumPy archive (.npz); the suffix is added automatically when missing.

Reading the saved model:

The saved model can be read with NumPy’s load() function:
```
>>> import numpy as np
>>> saved_model = np.load("/PATH/TO/DIR/MODELNAME.npz")
>>> data = saved_model["data"]
```
The archive always contains:
- "covariance_matrix": \(P \times P\) covariance matrix \(\Sigma\) between bins, estimated from the accepted samples.
- "background_yields": per-bin sample mean \(m_{1,I}\). This is the simplified-likelihood background expectation, not the original background yield of the full pyhf model.
- "data": per-bin observed counts \(n_I^{\mathrm{obs}}\) read from the underlying pyhf workspace.
- "channel_order": channel-name list inferred from the pyhf configuration. The simplified backend assumes this ordering when interpreting a signal patch.
Additional keys are written depending on convert_to:
- "third_moments": per-bin diagonal third central moment \(m_{3,I}\), written when convert_to == "default.third_moment_expansion".
- "absolute_uncertainty_envelops": per-bin pairs \((\sigma^{+}_I, \sigma^{-}_I)\) extracted from the 68% sample quantiles, written when convert_to == "default.effective_sigma".
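
To make the archive layout concrete, the sketch below writes a toy archive with the keys listed above and reads it back. All numbers and channel names are made up, and the file is written directly with np.savez_compressed rather than by the converter, so details such as the dtype of "channel_order" in a real archive are an assumption here.

```python
import os
import tempfile

import numpy as np

# Toy summary statistics (made-up values, for illustration only).
toy = {
    "covariance_matrix": np.array([[9.0, 1.5], [1.5, 4.0]]),
    "background_yields": np.array([50.0, 20.0]),
    "data": np.array([52, 18]),
    "channel_order": np.array(["SR_low", "SR_high"]),  # hypothetical names
    "third_moments": np.array([0.8, 0.3]),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_model.npz")
    np.savez_compressed(path, **toy)

    # np.load returns a lazy NpzFile; index it while the file is open.
    with np.load(path) as archive:
        keys = sorted(archive.files)
        cov = archive["covariance_matrix"]
        bkg = archive["background_yields"]
        # The optional keys tell which target backend the archive was made for.
        has_third_moments = "third_moments" in archive.files
        has_envelopes = "absolute_uncertainty_envelops" in archive.files

# The correlation matrix can be recovered from the stored covariance.
std = np.sqrt(np.diag(cov))
correlation = cov / np.outer(std, std)
```

Checking archive.files before indexing avoids a KeyError when an archive was produced for a different convert_to target.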

Raises:

ConversionError – when convert_to is not one of the supported targets, or when no control region can be identified.
AssertionError – if statistical_model is not a pyhf model, or if its underlying pyhf manager does not have the jax backend enabled.

Example:

As an example, let us use the JSON files provided for the ATLAS-SUSY-2019-08 analysis, which can be found on HEPData. Once these are downloaded, they can be read and used to construct a full statistical model as follows.

>>> import json, spey
>>> with open("1Lbb-likelihoods-hepdata/BkgOnly.json", "r") as f:
...     background_only = json.load(f)
>>> with open("1Lbb-likelihoods-hepdata/patchset.json", "r") as f:
...     signal = json.load(f)["patches"][0]["patch"]

>>> pdf_wrapper = spey.get_backend("pyhf")
>>> full_statistical_model = pdf_wrapper(
...     background_only_model=background_only, signal_patch=signal
... )
>>> full_statistical_model.backend.manager.backend = "jax"

Note that patchset.json contains more than one patch, which is why only the first one is used here. The last line enables the jax backend in the pyhf interface, which is required to compute the Hessian of the statistical model used by the simplification procedure.

The "pyhf.simplify" converter can then be invoked to map this full likelihood onto a simplified-likelihood model.

>>> converter = spey.get_backend("pyhf.simplify")
>>> simplified_model = converter(
...     statistical_model=full_statistical_model,
...     convert_to="default.correlated_background",
...     control_region_indices=[
...         'WREM_cuts', 'STCREM_cuts', 'TRHMEM_cuts', 'TRMMEM_cuts', 'TRLMEM_cuts'
...     ],
... )
>>> print(simplified_model.backend_type)
>>> # "default.correlated_background"
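
The example passes control_region_indices explicitly. A plain substring match on "CR"/"VR" shows why this can matter for channel names of this kind. The helper below is a hypothetical stand-in written for this sketch; it is not the library's guess_CRVR(), whose exact matching rule may differ, and the signal-region name in the list is illustrative.

```python
from typing import List


def guess_control_channels(channel_names: List[str]) -> List[str]:
    """Hypothetical substring-based CR/VR guess (not the spey-pyhf function)."""
    tags = ("CR", "VR")
    return [
        name for name in channel_names
        if any(tag in name.upper() for tag in tags)
    ]


# Control-region names from the example above, plus one illustrative signal region.
channels = [
    "SRLMEM_mct2",
    "WREM_cuts", "STCREM_cuts", "TRHMEM_cuts", "TRMMEM_cuts", "TRLMEM_cuts",
]
guessed = guess_control_channels(channels)
missed = [name for name in channels[1:] if name not in guessed]
```

Under this assumed rule only 'STCREM_cuts' is picked up, so the remaining control regions would have to be supplied by hand, as done in the example.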

author: str = 'SpeysideHEP'#: Author of the backend

name: str = 'pyhf.simplify'#: Name of the backend

spey_requires: str = '>=0.2.0,<0.3.0'#: Spey version required for the backend

version: str = '0.2.1'#: Version of the backend

References

A. Buckley, M. Citron, S. Fichet, S. Kraml, W. Waltenberger and N. Wardle, The Simplified Likelihood Framework, JHEP 04 (2019) 064, arXiv:1809.05548. Defines the simplified likelihood, the moment-matching parameters and the Monte-Carlo extraction procedure used here.
R. Barlow, Asymmetric Errors, arXiv:physics/0406120, Sec. 3.6. Source of the variable-Gaussian effective-\(\sigma\) form consumed by “default.effective_sigma”.
E. Schanet, simplify package (eschanet/simplify). A complementary approach that emits a pyhf patch by collapsing the post-fit background into a single sample; its output can be used directly with spey-pyhf without going through this converter.
