Interface#
Bases: PyhfInterface
This backend initiates pyhf.simplemodels.uncorrelated_background, forming an uncorrelated histogram structure with the given inputs.
- Parameters:
signal_yields (List[float]) – signal yields
background_yields (List[float]) – background yields
data (List[float]) – observations
absolute_uncertainties (List[float]) – absolute uncertainties on the background
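To make the four inputs concrete, here is a minimal sketch of the kind of single-channel workspace dictionary they map onto. The helper name `build_uncorrelated_spec` is hypothetical; the channel, sample and modifier names are assumptions that mirror the `FullStatisticalModel` example in this reference, not a statement about what the backend builds internally.

```python
from typing import Dict, List


def build_uncorrelated_spec(
    signal_yields: List[float],
    background_yields: List[float],
    data: List[float],
    absolute_uncertainties: List[float],
) -> Dict:
    """Hypothetical helper: assemble a single-channel workspace dictionary."""
    n_bins = len(signal_yields)
    if not all(
        len(item) == n_bins
        for item in (background_yields, data, absolute_uncertainties)
    ):
        raise ValueError("all inputs must have one entry per bin")

    signal_sample = {
        "name": "signal",
        "data": list(signal_yields),
        # the signal strength mu scales the signal sample
        "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
    }
    background_sample = {
        "name": "background",
        "data": list(background_yields),
        # one independent (uncorrelated) uncertainty per bin
        "modifiers": [
            {
                "name": "uncorr_bkguncrt",
                "type": "shapesys",
                "data": list(absolute_uncertainties),
            }
        ],
    }
    return {
        "channels": [
            {"name": "singlechannel", "samples": [signal_sample, background_sample]}
        ],
        "observations": [{"name": "singlechannel", "data": list(data)}],
        "measurements": [
            {"name": "Measurement", "config": {"poi": "mu", "parameters": []}}
        ],
        "version": "1.0.0",
    }


spec = build_uncorrelated_spec(
    [12.0, 11.0], [50.0, 52.0], [51.0, 48.0], [3.0, 7.0]
)
```

Each bin carries its own background uncertainty, which is what "uncorrelated" refers to here.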
Compute negative log-likelihood at fixed \(\mu\) for Asimov data.
Note
The interface first calls backend-specific methods to compute the likelihood. If they are not implemented, it optimizes the objective function through the spey interface. A prescription for optimizing either the likelihood or the objective function must be available for a backend to be successfully integrated into the spey interface.
- Parameters:
poi_test (float, default 1.0) – parameter of interest, \(\mu\).
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
test_statistics (Text, default "qtilde") – test statistics.
'qtilde': (default) performs the calculation using the alternative test statistic, \(\tilde{q}_{\mu}\), see eq. (62) of [arXiv:1007.1727] (qmu_tilde()).
Warning
Note that this assumes that \(\hat\mu\geq0\), hence allow_negative_signal is assumed to be False. If this function is executed by the user, spey assumes that this is handled consistently throughout the external code. When computing p-values or upper limits on \(\mu\) through spey, this is taken care of automatically in the backend.
'q': performs the calculation using the test statistic \(q_{\mu}\), see eq. (54) of [arXiv:1007.1727] (qmu()).
'q0': performs the calculation using the discovery test statistic \(q_{0}\), see eq. (47) of [arXiv:1007.1727] (q0()).
kwargs – keyword arguments for the optimiser.
- Raises:
NotImplementedError – If the method is not available for the backend.
- Returns:
value of negative log-likelihood at POI of interest and fit parameters (\(\mu\) and \(\theta\)).
- Return type:
Tuple[float, np.ndarray]
Author of the backend
A routine to combine two statistical models.
Note
This function is only available if the backend has a specific routine for combination between same or other backends.
- Parameters:
other (BackendBase) – Statistical model object to be combined.
- Raises:
NotImplementedError – If the backend does not have a combination scheme.
- Returns:
A new statistical model created from the combination of this model and the other one.
- Return type:
BackendBase
Model configuration.
- Parameters:
allow_negative_signal (bool, default True) – If True, the \(\hat\mu\) value will be allowed to be negative.
poi_upper_bound (float, default 10.0) – upper bound for parameter of interest, \(\mu\).
- Returns:
Model configuration. Information regarding the position of POI in parameter list, suggested input and bounds.
- Return type:
ModelConfig
Citable DOI for the backend
Compute the expected value of the statistical model
- Parameters:
pars (List[float]) – nuisance parameters, \(\theta\), and parameter of interest, \(\mu\).
- Returns:
Expected data of the statistical model
- Return type:
List[float]
Currently, the Hessian of \(\log\mathcal{L}(\mu, \theta)\) is only used to compute the variance on \(\mu\). This method returns a callable function which takes fit parameters (\(\mu\) and \(\theta\)) and returns the Hessian.
- Parameters:
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
data (Union[List[float], np.ndarray], default None) – input data to fit
- Returns:
Function that takes fit parameters (\(\mu\) and \(\theta\)) and returns Hessian of \(\log\mathcal{L}(\mu, \theta)\).
- Return type:
Callable[[np.ndarray], float]
Generate function to compute \(\log\mathcal{L}(\mu, \theta)\) where \(\mu\) is the parameter of interest and \(\theta\) are nuisance parameters.
- Parameters:
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
data (Union[List[float], np.ndarray], default None) – input data to fit
- Returns:
Function that takes fit parameters (\(\mu\) and \(\theta\)) and computes \(\log\mathcal{L}(\mu, \theta)\).
- Return type:
Callable[[np.ndarray], float]
The objective function is the function on which the optimisation is performed. This function is expected to be twice the negative log-likelihood, \(-2\log\mathcal{L}(\mu, \theta)\). Additionally, if available, it can be bundled with the gradient of twice the negative log-likelihood.
- Parameters:
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
data (Union[List[float], np.ndarray], default None) – input data to fit
do_grad (bool, default True) – If True, returns the objective and its gradient as a tuple (subject to availability); if False, returns only the objective function.
- Returns:
Function which takes fit parameters (\(\mu\) and \(\theta\)) and returns either objective or objective and its gradient.
- Return type:
Callable[[np.ndarray], Union[float, Tuple[float, np.ndarray]]]
Retrieves the function to sample from.
- Parameters:
pars (np.ndarray) – fit parameters (\(\mu\) and \(\theta\))
- Returns:
Function that takes number_of_samples as input and draws that many samples from the statistical model.
- Return type:
Callable[[int], np.ndarray]
Returns True if at least one bin has non-zero signal yield.
pyhf Manager to handle the interface with pyhf
A backend specific method to minimize negative log-likelihood for Asimov data.
Note
The interface first calls backend-specific methods to compute the likelihood. If they are not implemented, it optimizes the objective function through the spey interface. A prescription for optimizing either the likelihood or the objective function must be available for a backend to be successfully integrated into the spey interface.
- Parameters:
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
test_statistics (Text, default "qtilde") – test statistics.
'qtilde': (default) performs the calculation using the alternative test statistic, \(\tilde{q}_{\mu}\), see eq. (62) of [arXiv:1007.1727] (qmu_tilde()).
Warning
Note that this assumes that \(\hat\mu\geq0\), hence allow_negative_signal is assumed to be False. If this function is executed by the user, spey assumes that this is handled consistently throughout the external code. When computing p-values or upper limits on \(\mu\) through spey, this is taken care of automatically in the backend.
'q': performs the calculation using the test statistic \(q_{\mu}\), see eq. (54) of [arXiv:1007.1727] (qmu()).
'q0': performs the calculation using the discovery test statistic \(q_{0}\), see eq. (47) of [arXiv:1007.1727] (q0()).
kwargs – keyword arguments for the optimiser.
- Raises:
NotImplementedError – If the method is not available for the backend.
- Returns:
value of negative log-likelihood and fit parameters (\(\mu\) and \(\theta\)).
- Return type:
Tuple[float, np.ndarray]
A backend specific method to minimize negative log-likelihood.
Note
The interface first calls backend-specific methods to compute the likelihood. If they are not implemented, it optimizes the objective function through the spey interface. A prescription for optimizing either the likelihood or the objective function must be available for a backend to be successfully integrated into the spey interface.
- Parameters:
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
allow_negative_signal (bool, default True) – If True, the \(\hat\mu\) value will be allowed to be negative.
kwargs – keyword arguments for the optimiser.
- Raises:
NotImplementedError – If the method is not available for the backend.
- Returns:
value of negative log-likelihood and fit parameters (\(\mu\) and \(\theta\)).
- Return type:
Tuple[float, np.ndarray]
Retrieve the statistical model container
Name of the backend
Backend specific method to compute negative log-likelihood for a parameter of interest \(\mu\).
Note
The interface first calls backend-specific methods to compute the likelihood. If they are not implemented, it optimizes the objective function through the spey interface. A prescription for optimizing either the likelihood or the objective function must be available for a backend to be successfully integrated into the spey interface.
- Parameters:
poi_test (float, default 1.0) – parameter of interest, \(\mu\).
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
kwargs – keyword arguments for the optimiser.
- Raises:
NotImplementedError – If the method is not available for the backend.
- Returns:
value of negative log-likelihood at POI of interest and fit parameters (\(\mu\) and \(\theta\)).
- Return type:
Tuple[float, np.ndarray]
Spey version required for the backend
Version of the backend
- class spey_pyhf.interface.FullStatisticalModel(signal_patch: Dict, background_only_model: str | Dict)[source]#
Bases: PyhfInterface
pyhf Interface. For details on input structure please see this link
- Parameters:
signal_patch (List[Dict]) – Patch data for the signal model. Please see this link for details on the structure of the input.
background_only_model (Dict or Text) – This input expects background-only data that describes the full statistical model for the background. It also accepts str input, which indicates the full path to the background-only JSON file.
Example:
>>> import spey

>>> background_only = {
...     "channels": [
...         {
...             "name": "singlechannel",
...             "samples": [
...                 {
...                     "name": "background",
...                     "data": [50.0, 52.0],
...                     "modifiers": [
...                         {
...                             "name": "uncorr_bkguncrt",
...                             "type": "shapesys",
...                             "data": [3.0, 7.0],
...                         }
...                     ],
...                 }
...             ],
...         }
...     ],
...     "observations": [{"name": "singlechannel", "data": [51.0, 48.0]}],
...     "measurements": [{"name": "Measurement", "config": {"poi": "mu", "parameters": []}}],
...     "version": "1.0.0",
... }
>>> signal = [
...     {
...         "op": "add",
...         "path": "/channels/0/samples/1",
...         "value": {
...             "name": "signal",
...             "data": [12.0, 11.0],
...             "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
...         },
...     }
... ]
>>> stat_wrapper = spey.get_backend("pyhf")
>>> statistical_model = stat_wrapper(
...     analysis="simple_pyhf",
...     background_only_model=background_only,
...     signal_patch=signal,
... )
>>> statistical_model.exclusion_confidence_level() # [0.9474850259721279]
- asimov_negative_loglikelihood(poi_test: float = 1.0, expected: ExpectationType = observed, test_statistics: str = 'qtilde', **kwargs) Tuple[float, ndarray]#
Compute negative log-likelihood at fixed \(\mu\) for Asimov data.
Note
The interface first calls backend-specific methods to compute the likelihood. If they are not implemented, it optimizes the objective function through the spey interface. A prescription for optimizing either the likelihood or the objective function must be available for a backend to be successfully integrated into the spey interface.
- Parameters:
poi_test (float, default 1.0) – parameter of interest, \(\mu\).
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
test_statistics (Text, default "qtilde") – test statistics.
'qtilde': (default) performs the calculation using the alternative test statistic, \(\tilde{q}_{\mu}\), see eq. (62) of [arXiv:1007.1727] (qmu_tilde()).
Warning
Note that this assumes that \(\hat\mu\geq0\), hence allow_negative_signal is assumed to be False. If this function is executed by the user, spey assumes that this is handled consistently throughout the external code. When computing p-values or upper limits on \(\mu\) through spey, this is taken care of automatically in the backend.
'q': performs the calculation using the test statistic \(q_{\mu}\), see eq. (54) of [arXiv:1007.1727] (qmu()).
'q0': performs the calculation using the discovery test statistic \(q_{0}\), see eq. (47) of [arXiv:1007.1727] (q0()).
kwargs – keyword arguments for the optimiser.
- Raises:
NotImplementedError – If the method is not available for the backend.
- Returns:
value of negative log-likelihood at POI of interest and fit parameters (\(\mu\) and \(\theta\)).
- Return type:
Tuple[float, np.ndarray]
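As an illustration of what "negative log-likelihood for Asimov data" means, the sketch below uses a toy multi-bin Poisson model without nuisance parameters. The function names are hypothetical, and generating the Asimov data at \(\mu = 0\) is an assumption made for this example; the hypothesis actually used by the backend depends on the chosen test statistic.

```python
import math
from typing import List


def poisson_nll(count: float, rate: float) -> float:
    """-log Pois(count | rate); lgamma allows non-integer Asimov counts."""
    return rate - count * math.log(rate) + math.lgamma(count + 1.0)


def asimov_nll(
    poi_test: float,
    signal: List[float],
    background: List[float],
    mu_asimov: float = 0.0,
) -> float:
    """Toy NLL at fixed mu, evaluated on Asimov data generated at mu_asimov."""
    total = 0.0
    for s, b in zip(signal, background):
        asimov_count = mu_asimov * s + b  # data replaced by its expectation
        total += poisson_nll(asimov_count, poi_test * s + b)
    return total


signal, background = [12.0, 11.0], [50.0, 52.0]
nll_at_zero = asimov_nll(0.0, signal, background)
nll_at_one = asimov_nll(1.0, signal, background)
```

Because the Asimov data equal the expectation of the generating hypothesis, the toy NLL is smallest at `poi_test == mu_asimov`.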
- author: str = 'SpeysideHEP'#
Author of the backend
- combine(other, **kwargs)[source]#
Combine full statistical models generated by pyhf interface
- Parameters:
other (FullStatisticalModel) – other statistical model to be combined with this model
kwargs –
pyhf specific inputs:
join (str, default None): How to join the two workspaces. Pick from "none", "outer", "left outer" or "right outer".
merge_channels (bool): Whether or not to merge channels when performing the combine. This is only done with the "outer", "left outer", and "right outer" options.
non-pyhf specific inputs:
update_measurements (bool, default True): If the measurement names of the two statistical models are the same, the other statistical model’s measurement name will be updated. If set to False, measurements will remain as is.
Note
This model is considered to be "left" and the other model is considered to be "right".
- Raises:
CombinationError – Raised if it is not possible to combine statistical models.
- Returns:
Combined statistical model.
- Return type:
FullStatisticalModel
- config(allow_negative_signal: bool = True, poi_upper_bound: float = 10.0) ModelConfig#
Model configuration.
- Parameters:
allow_negative_signal (bool, default True) – If True, the \(\hat\mu\) value will be allowed to be negative.
poi_upper_bound (float, default 10.0) – upper bound for parameter of interest, \(\mu\).
- Returns:
Model configuration. Information regarding the position of POI in parameter list, suggested input and bounds.
- Return type:
ModelConfig
- doi: List[str] = ['10.5281/zenodo.1169739', '10.21105/joss.02823']#
Citable DOI for the backend
- expected_data(pars: List[float]) List[float]#
Compute the expected value of the statistical model
- Parameters:
pars (List[float]) – nuisance parameters, \(\theta\), and parameter of interest, \(\mu\).
- Returns:
Expected data of the statistical model
- Return type:
List[float]
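A toy sketch of an expected-data computation is given below for a model with one multiplicative background nuisance parameter per bin. The function name, the parameter ordering (\(\mu\) first, then one \(\gamma_i\) per bin) and the omission of auxiliary data are all assumptions of this example and may differ from the ordering and content used by the backend.

```python
from typing import List


def toy_expected_data(
    pars: List[float], signal: List[float], background: List[float]
) -> List[float]:
    """Expected per-bin yields mu * s_i + gamma_i * b_i (toy parameter layout)."""
    mu, gammas = pars[0], pars[1:]
    if len(gammas) != len(background):
        raise ValueError("expected one nuisance parameter per bin")
    return [mu * s + g * b for s, g, b in zip(signal, gammas, background)]


# mu = 1 with all background nuisance parameters at their nominal value 1
expected = toy_expected_data([1.0, 1.0, 1.0], [12.0, 11.0], [50.0, 52.0])
```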
- get_hessian_logpdf_func(expected: ExpectationType = observed, data: List[float] | ndarray | None = None) Callable[[ndarray], float]#
Currently, the Hessian of \(\log\mathcal{L}(\mu, \theta)\) is only used to compute the variance on \(\mu\). This method returns a callable function which takes fit parameters (\(\mu\) and \(\theta\)) and returns the Hessian.
- Parameters:
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
data (Union[List[float], np.ndarray], default None) – input data to fit
- Returns:
Function that takes fit parameters (\(\mu\) and \(\theta\)) and returns Hessian of \(\log\mathcal{L}(\mu, \theta)\).
- Return type:
Callable[[np.ndarray], float]
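The link between the Hessian and the variance on \(\mu\) can be sketched as follows: the negative Hessian of \(\log\mathcal{L}\) at the best fit is inverted, and the \(\mu\mu\) entry of the inverse is read off. The toy quadratic log-likelihood, the finite-difference Hessian and the helper names are assumptions of this example; the backend obtains the Hessian by its own means.

```python
from typing import Callable, List


def toy_logpdf(pars: List[float]) -> float:
    """Quadratic toy log-likelihood in (mu, theta) with its maximum at the origin."""
    mu, theta = pars
    return -0.5 * (4.0 * mu * mu + 2.0 * mu * theta + 2.0 * theta * theta)


def numerical_hessian(
    func: Callable[[List[float]], float], point: List[float], step: float = 1e-3
) -> List[List[float]]:
    """Central finite-difference Hessian of func at point."""
    dim = len(point)
    hessian = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(dim):

            def shifted(di: int, dj: int) -> float:
                moved = list(point)
                moved[i] += di * step
                moved[j] += dj * step
                return func(moved)

            hessian[i][j] = (
                shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)
            ) / (4.0 * step * step)
    return hessian


def variance_on_mu(hessian: List[List[float]]) -> float:
    """(mu, mu) entry of the inverse of minus the 2x2 Hessian."""
    f00, f01 = -hessian[0][0], -hessian[0][1]
    f10, f11 = -hessian[1][0], -hessian[1][1]
    determinant = f00 * f11 - f01 * f10
    return f11 / determinant


var_mu = variance_on_mu(numerical_hessian(toy_logpdf, [0.0, 0.0]))
```

For this toy model the result is 2/7, smaller than the naive 1/4 would suggest only if the correlation with \(\theta\) were ignored in the other direction; the off-diagonal term inflates the variance relative to a fit with \(\theta\) fixed.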
- get_logpdf_func(expected: ExpectationType = observed, data: List[float] | ndarray | None = None) Callable[[ndarray], float]#
Generate function to compute \(\log\mathcal{L}(\mu, \theta)\) where \(\mu\) is the parameter of interest and \(\theta\) are nuisance parameters.
- Parameters:
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
data (Union[List[float], np.ndarray], default None) – input data to fit
- Returns:
Function that takes fit parameters (\(\mu\) and \(\theta\)) and computes \(\log\mathcal{L}(\mu, \theta)\).
- Return type:
Callable[[np.ndarray], float]
- get_objective_function(expected: ExpectationType = observed, data: List[float] | ndarray | None = None, do_grad: bool = True) Callable[[ndarray], float | Tuple[float, ndarray]]#
The objective function is the function on which the optimisation is performed. This function is expected to be twice the negative log-likelihood, \(-2\log\mathcal{L}(\mu, \theta)\). Additionally, if available, it can be bundled with the gradient of twice the negative log-likelihood.
- Parameters:
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
data (Union[List[float], np.ndarray], default None) – input data to fit
do_grad (bool, default True) – If True, returns the objective and its gradient as a tuple (subject to availability); if False, returns only the objective function.
- Returns:
Function which takes fit parameters (\(\mu\) and \(\theta\)) and returns either objective or objective and its gradient.
- Return type:
Callable[[np.ndarray], Union[float, Tuple[float, np.ndarray]]]
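The shape of such an objective function, and the effect of do_grad, can be sketched with a toy Poisson model whose only fit parameter is \(\mu\). The factory name `make_objective` and the analytic gradient are assumptions of this example; they only illustrate the return convention described above.

```python
import math
from typing import Callable, List, Tuple, Union

Objective = Callable[[List[float]], Union[float, Tuple[float, List[float]]]]


def make_objective(
    signal: List[float],
    background: List[float],
    data: List[float],
    do_grad: bool = True,
) -> Objective:
    """Return a callable computing -2 log L, optionally with its gradient."""

    def twice_nll(pars: List[float]):
        mu = pars[0]
        value, gradient = 0.0, 0.0
        for s, b, n in zip(signal, background, data):
            rate = mu * s + b
            value += 2.0 * (rate - n * math.log(rate) + math.lgamma(n + 1.0))
            gradient += 2.0 * s * (1.0 - n / rate)  # d(-2 log L)/d mu
        return (value, [gradient]) if do_grad else value

    return twice_nll


signal, background, data = [12.0, 11.0], [50.0, 52.0], [51.0, 48.0]
value_and_grad = make_objective(signal, background, data, do_grad=True)
value_only = make_objective(signal, background, data, do_grad=False)

objective_value, objective_grad = value_and_grad([1.0])
```

Bundling the value and gradient in one call lets an optimiser reuse the per-bin rates instead of evaluating the model twice.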
- get_sampler(pars: ndarray) Callable[[int], ndarray]#
Retrieves the function to sample from.
- Parameters:
pars (np.ndarray) – fit parameters (\(\mu\) and \(\theta\))
- Returns:
Function that takes number_of_samples as input and draws that many samples from the statistical model.
- Return type:
Callable[[int], np.ndarray]
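The "function that returns a function" pattern described here can be sketched with a toy Poisson model. The names `make_sampler` and `poisson_draw`, the single-parameter layout of `pars` and the fixed seed are assumptions of this example; Knuth's method is used only to keep the sketch free of third-party dependencies.

```python
import math
import random
from typing import Callable, List


def poisson_draw(rate: float, rng: random.Random) -> int:
    """Draw one Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def make_sampler(
    pars: List[float], signal: List[float], background: List[float], seed: int = 0
) -> Callable[[int], List[List[int]]]:
    """Fix the fit parameters and return a callable that draws samples."""
    mu = pars[0]  # toy layout: the only fit parameter is mu
    rates = [mu * s + b for s, b in zip(signal, background)]
    rng = random.Random(seed)

    def sampler(number_of_samples: int) -> List[List[int]]:
        return [
            [poisson_draw(rate, rng) for rate in rates]
            for _ in range(number_of_samples)
        ]

    return sampler


sampler = make_sampler([1.0], [12.0, 11.0], [50.0, 52.0])
samples = sampler(5)  # five pseudo-experiments, one count per bin
```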
- property is_alive: bool#
Returns True if at least one bin has non-zero signal yield.
- manager#
pyhf Manager to handle the interface with pyhf
- minimize_asimov_negative_loglikelihood(expected: ExpectationType = observed, test_statistics: str = 'qtilde', **kwargs) Tuple[float, ndarray]#
A backend specific method to minimize negative log-likelihood for Asimov data.
Note
The interface first calls backend-specific methods to compute the likelihood. If they are not implemented, it optimizes the objective function through the spey interface. A prescription for optimizing either the likelihood or the objective function must be available for a backend to be successfully integrated into the spey interface.
- Parameters:
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
test_statistics (Text, default "qtilde") – test statistics.
'qtilde': (default) performs the calculation using the alternative test statistic, \(\tilde{q}_{\mu}\), see eq. (62) of [arXiv:1007.1727] (qmu_tilde()).
Warning
Note that this assumes that \(\hat\mu\geq0\), hence allow_negative_signal is assumed to be False. If this function is executed by the user, spey assumes that this is handled consistently throughout the external code. When computing p-values or upper limits on \(\mu\) through spey, this is taken care of automatically in the backend.
'q': performs the calculation using the test statistic \(q_{\mu}\), see eq. (54) of [arXiv:1007.1727] (qmu()).
'q0': performs the calculation using the discovery test statistic \(q_{0}\), see eq. (47) of [arXiv:1007.1727] (q0()).
kwargs – keyword arguments for the optimiser.
- Raises:
NotImplementedError – If the method is not available for the backend.
- Returns:
value of negative log-likelihood and fit parameters (\(\mu\) and \(\theta\)).
- Return type:
Tuple[float, np.ndarray]
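The difference between the three test_statistics options can be sketched for a single-bin counting experiment without nuisance parameters, where \(\hat\mu = (n - b)/s\) is known in closed form. The function names are hypothetical and the definitions follow the standard profile-likelihood-ratio conventions of arXiv:1007.1727; this is not the backend's implementation.

```python
import math
from typing import Dict


def toy_nll(mu: float, signal: float, background: float, observed: float) -> float:
    """-log L for one Poisson bin, dropping the mu-independent constant."""
    rate = mu * signal + background
    return rate - observed * math.log(rate)


def compute_test_statistics(
    mu: float, signal: float, background: float, observed: float
) -> Dict[str, float]:
    """Evaluate q0, q_mu and qtilde_mu for a single bin (requires observed > 0)."""
    mu_hat = (observed - background) / signal  # unconstrained best fit
    nll_hat = toy_nll(mu_hat, signal, background, observed)
    nll_mu = toy_nll(mu, signal, background, observed)
    nll_zero = toy_nll(0.0, signal, background, observed)

    # discovery: only an excess counts against the background-only hypothesis
    q0 = 2.0 * (nll_zero - nll_hat) if mu_hat >= 0.0 else 0.0
    # exclusion: only a deficit relative to mu counts against mu
    q = 2.0 * (nll_mu - nll_hat) if mu_hat <= mu else 0.0
    # qtilde: as q, but the best fit is replaced by mu = 0 when mu_hat < 0
    if mu_hat > mu:
        qtilde = 0.0
    elif mu_hat < 0.0:
        qtilde = 2.0 * (nll_mu - nll_zero)
    else:
        qtilde = 2.0 * (nll_mu - nll_hat)
    return {"q0": q0, "q": q, "qtilde": qtilde}


# observed count below the background expectation, so mu_hat = -0.5
result = compute_test_statistics(mu=1.0, signal=10.0, background=50.0, observed=45.0)
```

With a downward fluctuation, `qtilde` is smaller than `q` because the reference point is moved from the negative best fit to \(\mu = 0\), which is the situation the warning about allow_negative_signal refers to.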
- minimize_negative_loglikelihood(expected: ExpectationType = observed, allow_negative_signal: bool = True, **kwargs) Tuple[float, ndarray]#
A backend specific method to minimize negative log-likelihood.
Note
The interface first calls backend-specific methods to compute the likelihood. If they are not implemented, it optimizes the objective function through the spey interface. A prescription for optimizing either the likelihood or the objective function must be available for a backend to be successfully integrated into the spey interface.
- Parameters:
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
allow_negative_signal (bool, default True) – If True, the \(\hat\mu\) value will be allowed to be negative.
kwargs – keyword arguments for the optimiser.
- Raises:
NotImplementedError – If the method is not available for the backend.
- Returns:
value of negative log-likelihood and fit parameters (\(\mu\) and \(\theta\)).
- Return type:
Tuple[float, np.ndarray]
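The role of allow_negative_signal in the minimisation can be sketched with a bounded one-dimensional search over \(\mu\) for a toy Poisson model without nuisance parameters. The golden-section search, the choice of lower bound for negative \(\mu\) and the function names are assumptions of this example and stand in for whatever optimiser the backend uses.

```python
import math
from typing import List, Tuple


def toy_nll(mu: float, signal: List[float], background: List[float], data: List[float]) -> float:
    """-log L for independent Poisson bins, dropping mu-independent constants."""
    return sum(
        mu * s + b - n * math.log(mu * s + b)
        for s, b, n in zip(signal, background, data)
    )


def minimize_toy_nll(
    signal: List[float],
    background: List[float],
    data: List[float],
    allow_negative_signal: bool = True,
    poi_upper_bound: float = 10.0,
    tolerance: float = 1e-7,
) -> Tuple[float, float]:
    """Golden-section search for mu_hat within the allowed range."""
    if allow_negative_signal:
        # keep every expected yield mu * s + b strictly positive (toy choice)
        lower = -0.99 * min(b / s for s, b in zip(signal, background) if s > 0)
    else:
        lower = 0.0
    upper = poi_upper_bound
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while upper - lower > tolerance:
        left = upper - inv_phi * (upper - lower)
        right = lower + inv_phi * (upper - lower)
        if toy_nll(left, signal, background, data) < toy_nll(right, signal, background, data):
            upper = right
        else:
            lower = left
    mu_hat = 0.5 * (lower + upper)
    return toy_nll(mu_hat, signal, background, data), mu_hat


signal, background, data = [12.0, 11.0], [50.0, 52.0], [45.0, 47.0]
nll_free, mu_free = minimize_toy_nll(signal, background, data, allow_negative_signal=True)
nll_bounded, mu_bounded = minimize_toy_nll(signal, background, data, allow_negative_signal=False)
```

Since the data lie below the background expectation, the unconstrained fit prefers a negative \(\hat\mu\), while the constrained fit ends up at the boundary \(\mu = 0\).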
- name: str = 'pyhf'#
Name of the backend
- negative_loglikelihood(poi_test: float = 1.0, expected: ExpectationType = observed, **kwargs) Tuple[float, ndarray]#
Backend specific method to compute negative log-likelihood for a parameter of interest \(\mu\).
Note
The interface first calls backend-specific methods to compute the likelihood. If they are not implemented, it optimizes the objective function through the spey interface. A prescription for optimizing either the likelihood or the objective function must be available for a backend to be successfully integrated into the spey interface.
- Parameters:
poi_test (float, default 1.0) – parameter of interest, \(\mu\).
expected (ExpectationType) – Sets which values the fitting algorithm should focus on and which p-values are to be computed.
observed: Computes the p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth (default).
aposteriori: Computes the expected p-values via the post-fit prescription, which means that the experimental data will be assumed to be the truth.
apriori: Computes the expected p-values via the pre-fit prescription, which means that the SM will be assumed to be the truth.
kwargs – keyword arguments for the optimiser.
- Raises:
NotImplementedError – If the method is not available for the backend.
- Returns:
value of negative log-likelihood at POI of interest and fit parameters (\(\mu\) and \(\theta\)).
- Return type:
Tuple[float, np.ndarray]
- spey_requires: str = '>=0.2.0,<0.3.0'#
Spey version required for the backend
- version: str = '0.2.1'#
Version of the backend
Simplified likelihoods#
Convert pyhf full statistical models into the simplified likelihood framework.
This module implements Simplify, a spey.ConverterBase
plug-in that approximates a pyhf HistFactory likelihood by one
of the three simplified-likelihood backends shipped with spey:
“default.correlated_background” – multi-bin Poisson likelihood with a multivariate-Gaussian constraint on the combined background nuisance parameters;
“default.third_moment_expansion” – extension that captures the leading skewness of the per-bin background distribution through a quadratic deformation of the expected counts;
“default.effective_sigma” – variable-Gaussian (effective-\(\sigma\)) treatment of asymmetric per-bin background uncertainties.
Mathematical setting#
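Following Buckley, Citron, Fichet, Kraml, Waltenberger and Wardle (JHEP 04 (2019) 064, arXiv:1809.05548), an experimental likelihood with \(N\) independent elementary nuisance parameters \(\boldsymbol{\delta}\) over \(P\) observed counts \(\{n_I^{\mathrm{obs}}\}\) (here \(P\) is the number of analysis bins) is approximated, in the regime \(N \geq P\), by
\[\mathcal{L}_{S}(\boldsymbol{\alpha}, \boldsymbol{\theta}) = \prod_{I=1}^{P} \mathrm{Pois}\!\left(n_I^{\mathrm{obs}} \,\middle|\, n_{s,I}(\boldsymbol{\alpha}) + a_I + b_I\,\theta_I + c_I\,\theta_I^{2}\right) \cdot \frac{\exp\!\left(-\tfrac{1}{2}\,\boldsymbol{\theta}^{T}\boldsymbol{\rho}^{-1}\boldsymbol{\theta}\right)}{\sqrt{(2\pi)^{P}\det\boldsymbol{\rho}}}\]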
where \(\boldsymbol{\alpha}\) parametrise the new-physics signal, \(n_{s,I}(\boldsymbol{\alpha})\) are the per-bin signal yields and \(\boldsymbol{\theta} = (\theta_1, \dots, \theta_P)\) are combined nuisance parameters, one per bin, that summarise the action of the elementary \(\boldsymbol{\delta}\). The latter are unit-variance, centred Gaussians whose joint correlations are encoded in the \(P \times P\) matrix \(\boldsymbol{\rho}\). The coefficients \((a_I, b_I, c_I, \rho_{IJ})\) are obtained by matching the first three central moments of the background expectation at \(\mu = 0\) to those of the full likelihood (Buckley et al. eqs. 2.6–2.8):
\[\begin{split}m_{1,I} &= a_I + c_I , \\ m_{2,IJ} &= b_I\,b_J\,\rho_{IJ} + 2\,c_I\,c_J\,\rho_{IJ}^{2} , \\ m_{3,I} &= 6\,b_I^{2}\,c_I + 8\,c_I^{3} .\end{split}\]
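Inverting these relations (eqs. 2.9–2.12) yields
\[\begin{split}c_I &= -\,\mathrm{sign}(m_{3,I})\,\sqrt{2\,m_{2,II}}\,\cos\!\left(\frac{4\pi}{3} + \frac{1}{3}\arctan\sqrt{\frac{8\,m_{2,II}^{\,3}}{m_{3,I}^{\,2}} - 1}\right) , \\ b_I &= \sqrt{m_{2,II} - 2\,c_I^{2}} , \\ a_I &= m_{1,I} - c_I , \\ \rho_{IJ} &= \frac{1}{4\,c_I\,c_J}\left(\sqrt{(b_I\,b_J)^{2} + 8\,c_I\,c_J\,m_{2,IJ}} - b_I\,b_J\right) ,\end{split}\]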
valid in the regime \(8\,m_{2,II}^{\,3} \geq m_{3,I}^{\,2}\). When \(m_{3,I} \rightarrow 0\) the quadratic correction vanishes (\(c_I \rightarrow 0\)) and the expansion collapses to the standard simplified likelihood with \(a_I = m_{1,I}\), \(b_I = \sqrt{m_{2,II}}\) and a multivariate-Gaussian constraint on \(\boldsymbol{\theta}\) whose correlation matrix is \(\rho_{IJ} = m_{2,IJ}/(b_I\,b_J)\). The latter case is implemented by “default.correlated_background”, while the full quadratic form is implemented by “default.third_moment_expansion”.
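The moment matching can be checked numerically. For a unit Gaussian \(\theta\), the quantity \(a + b\,\theta + c\,\theta^2\) has mean \(a + c\), variance \(b^2 + 2c^2\) and third central moment \(6b^2c + 8c^3\), with covariance \(b_I b_J \rho_{IJ} + 2 c_I c_J \rho_{IJ}^2\) between bins. The sketch below inverts these relations; the function names are hypothetical, and the closed forms are written down from the relations just quoted rather than taken from the backend's code.

```python
import math
from typing import Tuple


def third_moment_coefficients(
    m1: float, m2_diag: float, m3: float
) -> Tuple[float, float, float]:
    """Return (a, b, c) for one bin from its first three central moments."""
    if m3 == 0.0:
        c = 0.0  # no skewness: the quadratic term vanishes
    else:
        if 8.0 * m2_diag**3 < m3**2:
            raise ValueError("requires 8 * m2^3 >= m3^2")
        angle = 4.0 * math.pi / 3.0 + math.atan(
            math.sqrt(8.0 * m2_diag**3 / m3**2 - 1.0)
        ) / 3.0
        c = -math.copysign(1.0, m3) * math.sqrt(2.0 * m2_diag) * math.cos(angle)
    b = math.sqrt(m2_diag - 2.0 * c * c)
    a = m1 - c
    return a, b, c


def correlation(
    m2_offdiag: float, b_i: float, b_j: float, c_i: float, c_j: float
) -> float:
    """Return rho_IJ from the covariance m2_IJ and the per-bin (b, c)."""
    bb, cc = b_i * b_j, c_i * c_j
    if cc == 0.0:
        return m2_offdiag / bb  # Gaussian limit
    return (math.sqrt(bb * bb + 8.0 * cc * m2_offdiag) - bb) / (4.0 * cc)


# forward moments for a = 50, b = 5, c = 0.5:
#   m1 = 50.5, m2 = 25 + 0.5 = 25.5, m3 = 75 + 1 = 76
a, b, c = third_moment_coefficients(50.5, 25.5, 76.0)

# covariance for b = (5, 4), c = (0.5, 0.3), rho = 0.3:
#   m2_IJ = 20 * 0.3 + 2 * 0.15 * 0.09 = 6.027
rho = correlation(6.027, 5.0, 4.0, 0.5, 0.3)
```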
For asymmetric uncertainties the same combined-parameter strategy is used, but the per-bin expectation \(n^{b}_I + \theta_I\,\sigma_I\) of the standard form is replaced by the variable-Gaussian prescription of Barlow (arXiv:physics/0406120, Sec. 3.6):
so that the conditional standard deviation interpolates smoothly between the upper (\(\sigma^{+}\)) and lower (\(\sigma^{-}\)) absolute uncertainties of the bin. This is the form used by “default.effective_sigma”.
Conversion algorithm#
The converter implements the Monte-Carlo moment extraction advocated in Buckley et al. Sec. 4. For a user-supplied full statistical model \(\mathcal{L}^{\mathrm{SR}}\) (which need not be limited to signal regions, but is referred to as such here for brevity), the algorithm proceeds as follows.
Control likelihood. A control likelihood \(\mathcal{L}^{c}\) is built from the background-only
pyhfworkspace by attaching a zero-yield signal sample to every channel listed incontrol_region_indices(defaulting to a substring-based guess ofCR/VRchannels). Wheninclude_modifiers_in_control_modelis true the signal modifiers — and therefore their associated nuisance parameters — are propagated to the control sample so that signal-induced systematics contribute to the constraint covariance. Channels outsidecontrol_region_indiceskeep their background-only structure. The parameter of interest \(\mu\) is retained so that the workspace has a valid signal+background topology but is fixed to zero in the following step and therefore has no effect on the per-bin yields.Conditional MLE. \(\mathcal{L}^{c}\) is profiled at \(\mu = 0\) to obtain the conditional best-fit nuisance vector \(\hat{\boldsymbol{\theta}}_0^{c}\). Because every signal yield in \(\mathcal{L}^{c}\) is zero at \(\mu = 0\), this profile coincides with the maximum-likelihood estimate of the background-only fit.
Nuisance covariance. The observed Fisher information
\[V^{-1}_{ij} = -\,\frac{\partial^{2} \log\mathcal{L}^{c}}{\partial\theta_i\,\partial\theta_j} \bigg|_{(\mu,\,\boldsymbol{\theta}) = (0,\,\hat{\boldsymbol{\theta}}_0^{c})}\]is evaluated from the Hessian provided by
pyhf’sjaxbackend after deleting the row and column associated with \(\mu\). Its inverse \(\mathbf{V}\) is the asymptotic covariance of the nuisance parameters at the conditional MLE.Sampling. Nuisance draws \(\tilde{\boldsymbol{\theta}} \sim \mathcal{N}(\hat{\boldsymbol{\theta}}_0^{c}, \mathbf{V})\) are taken. If \(\mathcal{L}^{\mathrm{SR}}\) has nuisance parameters that are not present in \(\mathcal{L}^{c}\) (for instance because some signal-only modifiers were excluded from the control model), the missing entries are profiled by maximising \(\mathcal{L}^{\mathrm{SR}}\) at \(\mu = 0\) with the entries shared with \(\mathcal{L}^{c}\) held at \(\tilde{\boldsymbol{\theta}}\) through equality constraints. Each accepted parameter vector is forwarded to the
pyhfsampler of \(\mathcal{L}^{\mathrm{SR}}\) to draw one Poisson realisation per bin (include_auxiliary=False). Draws that would require sampling from a Poisson with a non-positive rate are silently rejected and the loop continues untilnumber_of_samplesaccepted samples are collected.Moments. With \(\tilde{n}_b\) denoting the matrix of per-bin samples, the simplified-likelihood inputs are
\[\begin{split}m_1 &= \mathbb{E}[\tilde{n}_b] , \\ \Sigma &= \mathrm{cov}(\tilde{n}_b) , \\ m_3 &= \mathbb{E}\!\left[(\tilde{n}_b - m_1)^{\,3}\right] ,\end{split}\]
estimated from the sample mean, sample covariance and sample third moment, respectively. For “default.effective_sigma” the symmetric \((m_1, \Sigma)\) summary is supplemented by the 68% sample quantiles that define the per-bin asymmetric envelope
\[\begin{split}\sigma^{+}_I &= |\,Q_{0.8413}(\tilde{n}_{b,I}) - m_{1,I}\,| , \\ \sigma^{-}_I &= |\,m_{1,I} - Q_{0.1587}(\tilde{n}_{b,I})\,| ,\end{split}\]
where \(Q_p\) denotes the empirical \(p\)-quantile, and \(\Sigma\) is reduced to the correlation matrix \(\rho_{IJ} = \Sigma_{IJ}/\sqrt{\Sigma_{II}\Sigma_{JJ}}\).
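The covariance, sampling and moment steps can be illustrated with a small NumPy sketch. Everything in it is a toy assumption made for illustration: the 3x3 `hessian`, the linear `expected_rates` model and all numerical values are hypothetical and do not come from pyhf or from the converter's actual implementation. Only the sequence of operations (drop the \(\mu\) row and column, invert, draw Gaussian nuisances, reject non-positive Poisson rates, take moments and quantiles) mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical Hessian of -log L^c at (mu, theta) = (0, theta_hat),
# with parameter order (mu, theta_1, theta_2).
hessian = np.array([[5.0, 0.3, 0.1],
                    [0.3, 4.0, 0.8],
                    [0.1, 0.8, 2.0]])
poi_index = 0
theta_hat = np.zeros(2)

# Nuisance covariance: delete the mu row and column, then invert.
fisher = np.delete(np.delete(hessian, poi_index, axis=0), poi_index, axis=1)
V = np.linalg.inv(fisher)

# Toy background model (assumption): rates respond linearly to the nuisances.
nominal = np.array([12.0, 3.0, 1.5])
response = np.array([[0.3, 0.0],
                     [0.2, 0.6],
                     [0.0, 0.9]])


def expected_rates(theta):
    """Per-bin Poisson rates of the toy model for a nuisance vector."""
    return nominal * (1.0 + response @ theta)


# Sampling: loop until `number_of_samples` draws have been accepted;
# draws with a non-positive Poisson rate in any bin are rejected.
number_of_samples = 2000
accepted = []
while len(accepted) < number_of_samples:
    theta = rng.multivariate_normal(theta_hat, V)
    rates = expected_rates(theta)
    if np.any(rates <= 0.0):
        continue
    accepted.append(rng.poisson(rates))
samples = np.asarray(accepted, dtype=float)  # shape (number_of_samples, n_bins)

# Moments: sample mean, sample covariance, per-bin third central moment.
m1 = samples.mean(axis=0)
Sigma = np.cov(samples, rowvar=False)
m3 = np.mean((samples - m1) ** 3, axis=0)

# Asymmetric envelope from the 68% sample quantiles, plus correlation matrix.
sigma_plus = np.abs(np.quantile(samples, 0.8413, axis=0) - m1)
sigma_minus = np.abs(m1 - np.quantile(samples, 0.1587, axis=0))
std = np.sqrt(np.diag(Sigma))
rho = Sigma / np.outer(std, std)
```

Because rejected draws are not counted, the loop always ends with exactly `number_of_samples` rows in `samples`, at the cost of a slightly truncated nuisance distribution when rejections are frequent.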
The resulting summary statistics are passed to the chosen
simplified-likelihood backend, which evaluates the
\((a, b, c, \boldsymbol{\rho})\) parameters internally according
to the formulae above and returns the simplified statistical model.
When save_model is provided, the moments (and the quantile envelopes
for “default.effective_sigma”) are persisted to
a compressed .npz file together with the channel order inferred from
the underlying pyhf configuration, so that the model can be
rebuilt without re-running the sampler.
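As an illustration of how per-bin \((a, b, c)\) parameters can be obtained from \((m_1, \Sigma_{II}, m_3)\), the sketch below assumes the quadratic parametrisation \(n = a + b\,x + c\,x^2\) with \(x \sim \mathcal{N}(0, 1)\) of Buckley et al. (arXiv:1809.05548). The function name `quadratic_coefficients` is hypothetical, the correlation \(\boldsymbol{\rho}\) is not covered, and the backend's own implementation may treat edge cases differently.

```python
import math


def quadratic_coefficients(m1, variance, m3):
    """Return (a, b, c) such that n = a + b*x + c*x**2, x ~ N(0, 1),
    has mean m1, the given variance and third central moment m3."""
    if m3 == 0.0:
        # No skewness: the quadratic term vanishes.
        return m1, math.sqrt(variance), 0.0
    ratio = 8.0 * variance**3 / m3**2
    if ratio < 1.0:
        raise ValueError("a real solution requires 8 * variance**3 >= m3**2")
    c = -math.copysign(1.0, m3) * math.sqrt(2.0 * variance) * math.cos(
        4.0 * math.pi / 3.0 + math.atan(math.sqrt(ratio - 1.0)) / 3.0
    )
    # Guard against a tiny negative argument caused by rounding.
    b = math.sqrt(max(variance - 2.0 * c**2, 0.0))
    a = m1 - c
    return a, b, c


a, b, c = quadratic_coefficients(m1=10.0, variance=4.0, m3=1.5)

# Moments implied by the quadratic form, for comparison with the inputs.
implied_mean = a + c
implied_variance = b**2 + 2.0 * c**2
implied_m3 = 6.0 * b**2 * c + 8.0 * c**3
```

The three `implied_*` values reproduce the inputs, which is the moment-matching condition; the constraint \(8\Sigma_{II}^3 \geq m_3^2\) marks where this parametrisation stops having a real solution.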
- class spey_pyhf.simplify.Simplify
Convert a pyhf full statistical model into the simplified likelihood framework.
The converter approximates the input full statistical model by one of three spey simplified-likelihood backends: “default.correlated_background”, “default.third_moment_expansion” or “default.effective_sigma”. The methodology (the construction of a control likelihood \(\mathcal{L}^{c}\), the multivariate-Gaussian sampling of its nuisance parameters and the Monte-Carlo extraction of the first few central moments of the per-bin background distribution) is described in detail in the spey_pyhf.simplify module documentation and follows Buckley et al., JHEP 04 (2019) 064 (arXiv:1809.05548). The asymmetric variant follows Barlow, arXiv:physics/0406120.
For details on the target simplified-likelihood backends, see the spey default plug-ins page; a user-level walk-through is also provided in the spey-pyhf online documentation.
- Parameters:
statistical_model (StatisticalModel) – constructed full statistical model backed by pyhf with the jax backend enabled. The jax backend is required because the algorithm queries the Hessian of the log-likelihood through automatic differentiation.
fittype (Text, default "postfit") – expectation type used when constructing and profiling the control model. "postfit" maps to spey.ExpectationType.observed (uses the observed auxiliary data) and "prefit" to spey.ExpectationType.apriori (uses the pre-fit auxiliary data).
convert_to (Text, default "default.correlated_background") – target simplified-likelihood backend. Must be one of "default.correlated_background", "default.third_moment_expansion" or "default.effective_sigma".
number_of_samples (int, default 1000) – number of accepted Monte-Carlo samples used to estimate the background moments and, where relevant, the asymmetric quantile envelopes. Samples that would require evaluating a Poisson at a non-positive rate are rejected and do not count toward this total.
control_region_indices (List[int] or List[Text], default None) – indices or names of the control and validation regions in the background-only workspace. These are the channels into which a zero-yield signal sample is injected when constructing \(\mathcal{L}^{c}\). If None, the interface guesses CR/VR channels via the substring heuristic of guess_CRVR(). A ConversionError is raised if no CR/VR channel can be identified.
include_modifiers_in_control_model (bool, default False) – if True, the signal modifiers (and therefore their nuisance parameters) are attached to the zero-yield signal sample injected into the control regions, so that signal-induced systematics contribute to the nuisance covariance \(\mathbf{V}\) of \(\mathcal{L}^{c}\). By default modifiers are excluded.
save_model (Text, default None) –
full path to which the extracted summary statistics are persisted. The data is stored as a compressed NumPy archive (.npz); the suffix is added automatically when missing.
Reading the saved model:
One can read the saved model using NumPy’s load() function
>>> import numpy as np
>>> saved_model = np.load("/PATH/TO/DIR/MODELNAME.npz")
>>> data = saved_model["data"]
The archive always contains:
"covariance_matrix": \(P \times P\) covariance matrix \(\Sigma\) between bins, estimated from the accepted samples.
"background_yields": per-bin sample mean \(m_{1,I}\). This is the simplified-likelihood background expectation, not the original background yield of the full pyhf model.
"data": per-bin observed counts \(n_I^{\mathrm{obs}}\) read from the underlying pyhf workspace.
"channel_order": channel-name list inferred from the pyhf configuration. The simplified backend assumes this ordering when interpreting a signal patch.
Additional keys are written depending on convert_to:
"third_moments": per-bin diagonal third central moment \(m_{3,I}\), written when convert_to == "default.third_moment_expansion".
"absolute_uncertainty_envelops": per-bin pairs \((\sigma^{+}_I, \sigma^{-}_I)\) extracted from the 68% sample quantiles, written when convert_to == "default.effective_sigma".
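The archive layout listed above can be exercised with a short round trip. This is a minimal sketch: the toy arrays, the file name `toy_model.npz` and the one-bin-per-channel layout are assumptions made for illustration, and the archive is written here by hand with NumPy rather than by the converter. Only the key names are taken from the list above.

```python
import os
import tempfile

import numpy as np

# Write a toy archive that mimics the documented keys (hypothetical values).
path = os.path.join(tempfile.mkdtemp(), "toy_model.npz")
np.savez_compressed(
    path,
    covariance_matrix=np.diag([9.0, 2.5, 1.2]),
    background_yields=np.array([12.0, 3.0, 1.5]),
    data=np.array([11, 4, 1]),
    channel_order=np.array(["SR_a", "SR_b", "SR_c"]),
    third_moments=np.array([0.5, 0.2, 0.1]),
)

# Read it back and check that the keys every archive should contain exist.
required = ["covariance_matrix", "background_yields", "data", "channel_order"]
with np.load(path) as archive:
    missing = [key for key in required if key not in archive.files]
    summary = {key: archive[key] for key in archive.files}

# The optional keys indicate which backend the archive was written for.
if "third_moments" in summary:
    target = "default.third_moment_expansion"
elif "absolute_uncertainty_envelops" in summary:
    target = "default.effective_sigma"
else:
    target = "default.correlated_background"
```

Copying the arrays into a plain dictionary inside the `with` block lets the file handle be closed before the summary statistics are used.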
- Raises:
ConversionError – when convert_to is not one of the supported targets, or when no control region can be identified.
AssertionError – if statistical_model is not a pyhf model, or if its underlying pyhf manager does not have the jax backend enabled.
Example:
As an example, let us use the JSON files provided for the ATLAS-SUSY-2019-08 analysis, which can be found in HEPData. Once these are downloaded one can read them and construct a full statistical model as follows.
>>> import json, spey
>>> with open("1Lbb-likelihoods-hepdata/BkgOnly.json", "r") as f:
...     background_only = json.load(f)
>>> with open("1Lbb-likelihoods-hepdata/patchset.json", "r") as f:
...     signal = json.load(f)["patches"][0]["patch"]
>>> pdf_wrapper = spey.get_backend("pyhf")
>>> full_statistical_model = pdf_wrapper(
...     background_only_model=background_only, signal_patch=signal
... )
>>> full_statistical_model.backend.manager.backend = "jax"
Note that patchset.json includes more than one patch, which is why only the first one is used here. The last line enables the jax backend in the pyhf interface, which is required to compute the Hessian of the statistical model used by the simplification procedure.
The "pyhf.simplify" converter can then be invoked to map this full likelihood onto a simplified-likelihood model.
>>> converter = spey.get_backend("pyhf.simplify")
>>> simplified_model = converter(
...     statistical_model=full_statistical_model,
...     convert_to="default.correlated_background",
...     control_region_indices=[
...         'WREM_cuts', 'STCREM_cuts', 'TRHMEM_cuts', 'TRMMEM_cuts', 'TRLMEM_cuts'
...     ]
... )
>>> print(simplified_model.backend_type)
>>> # "default.correlated_background"
- author: str = 'SpeysideHEP'
Author of the backend
- name: str = 'pyhf.simplify'
Name of the backend
- spey_requires: str = '>=0.2.0,<0.3.0'
Spey version required for the backend
- version: str = '0.2.1'
Version of the backend
References
A. Buckley, M. Citron, S. Fichet, S. Kraml, W. Waltenberger and N. Wardle, The Simplified Likelihood Framework, JHEP 04 (2019) 064, arXiv:1809.05548. Defines the simplified likelihood, the moment-matching parameters and the Monte-Carlo extraction procedure used here.
R. Barlow, Asymmetric Errors, arXiv:physics/0406120, Sec. 3.6. Source of the variable-Gaussian effective-\(\sigma\) form consumed by “default.effective_sigma”.
E. Schanet, simplify package (eschanet/simplify). A complementary approach that emits a pyhf patch by collapsing the post-fit background into a single sample; its output can be used directly with spey-pyhf without going through this converter.