Building a plugin

The Spey package is designed to be extensible. It only needs to know certain aspects of the data structure it is given and a prescription for forming a likelihood function.

What a plugin provides

A quick introduction to the terminology of Spey plugins:

  • A plugin is an external Python package that provides additional statistical model prescriptions to spey.

  • Each plugin may provide one (or more) statistical model prescriptions accessible directly through Spey.

  • Depending on the scope of the plugin, you may wish to provide additional (custom) operations and differentiability through automatic differentiation packages such as autograd or jax. As long as they are implemented under the predefined function names, Spey can automatically detect and use them within the interface.

Creating your Statistical Model Prescription

The first step in creating your Spey plugin is to create your statistical model interface. This is as simple as importing the abstract base class BackendBase from spey and inheriting from it. The most basic implementation of a statistical model is shown below:

>>> import spey

>>> class MyStatisticalModel(spey.BackendBase):
>>>     name = "my_stat_model"
>>>     version = "1.0.0"
>>>     author = "John Smith <john.smith@smith.com>"
>>>     spey_requires = ">=0.1.0,<0.2.0"

>>>     def __init__(self, ...):
>>>         ...

>>>     @property
>>>     def is_alive(self):
>>>         ...

>>>     def config(
...         self, allow_negative_signal: bool = True, poi_upper_bound: float = 10.0
...     ):
>>>         ...

>>>     def get_logpdf_func(
...         self, expected = spey.ExpectationType.observed, data = None
...     ):
>>>         ...

>>>     def expected_data(self, pars):
>>>         ...

BackendBase requires certain functionality from the statistical model to be implemented, but let us first go through the above class structure. Spey looks for specific metadata to track the implementation’s version, author and name. Additionally, it checks compatibility with the current Spey version to ensure that the plugin works as it should.

Note

The list of metadata that Spey is looking for:

  • name (str): Name of the plugin.

  • version (str): Version of the plugin.

  • author (str): Author of the plugin.

  • spey_requires (str): Version specifier for the Spey versions that the plugin is compatible with, e.g. spey_requires="0.0.1", spey_requires=">=0.3.3" or spey_requires=">=0.1.0,<0.2.0".

  • doi (List[str]): Citable DOI numbers for the plugin.

  • arXiv (List[str]): arXiv numbers for the plugin.

The MyStatisticalModel class has four main functionalities, namely is_alive(), config(), get_logpdf_func() and expected_data() (for detailed descriptions of these functions, see the BackendBase documentation).

  • is_alive(): This property returns a boolean indicating whether the statistical model has at least one signal bin with a non-zero yield.

  • config(): This function returns a ModelConfig object, which holds information about the model structure: the index of the parameter of interest within the parameter list (poi_index), the minimum value that the parameter of interest can take (minimum_poi), suggested initialisation parameters for the optimiser (suggested_init) and suggested bounds for the parameters (suggested_bounds). If allow_negative_signal=True, the lower bound of the POI is expected to be minimum_poi; if False, it is expected to be zero. poi_upper_bound is used to enforce an upper bound on the POI.

    Note

    Suggested bounds and initialisation values should each be a list whose length equals the total number of parameters (nuisance parameters plus parameters of interest). Initialisation values should have the type List[float] and bounds the type List[Tuple[float, float]].

  • get_logpdf_func(): Returns a callable that computes the log-likelihood for any parameter vector. Mathematically, this function should return \(\log\mathcal{L}(\mu, \theta)\) where the input array contains both the POI (\(\mu\)) and nuisance parameters (\(\theta\)). Behind the scenes, Spey uses this function within an optimization loop:

    \[(\hat{\mu}, \hat{\theta}) = \arg\min_{\mu, \theta} \left[ -\log\mathcal{L}(\mu, \theta) \right]\]

    The expected argument determines which data to use in the likelihood computation: if expected=spey.ExpectationType.observed, use actual experimental data; if expected=spey.ExpectationType.apriori, use background yields as the “observed” data. This ensures the function correctly computes both fit and Asimov likelihoods. If data is provided explicitly, it overrides the default data selection (used for Asimov data in hypothesis testing).

  • expected_data() (optional): This function is crucial for asymptotic hypothesis testing. It generates the expected value of the data for the given fit parameters, i.e. \(\theta\) and \(\mu\). If this function does not exist, exclusion limits can still be computed using the chi_square calculator; see exclusion_confidence_level().
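The four methods described above can be sketched for a Poisson counting model without installing Spey. This is a minimal illustration under stated assumptions, not Spey's own implementation: PoissonSketch is a hypothetical stand-in that does not inherit from BackendBase, config() returns a plain dictionary instead of a ModelConfig, the expected argument takes the strings "observed" and "apriori" instead of spey.ExpectationType members, and the choice of minimum_poi (the most negative \(\mu\) that keeps every expected yield non-negative) is an assumption of the sketch.

```python
import math


class PoissonSketch:
    """Hypothetical stand-in for a spey.BackendBase subclass (assumption)."""

    def __init__(self, signal, background, observed):
        self.signal = list(signal)
        self.background = list(background)
        self.observed = list(observed)

    @property
    def is_alive(self):
        # True if at least one bin has a non-zero signal yield
        return any(s > 0.0 for s in self.signal)

    def config(self, allow_negative_signal=True, poi_upper_bound=10.0):
        # Assumed definition: most negative mu keeping all yields non-negative
        minimum_poi = -min(
            (b / s for s, b in zip(self.signal, self.background) if s > 0.0),
            default=0.0,
        )
        lower = minimum_poi if allow_negative_signal else 0.0
        # A plain dict replaces ModelConfig in this sketch; one parameter (mu)
        return {
            "poi_index": 0,
            "minimum_poi": minimum_poi,
            "suggested_init": [1.0],
            "suggested_bounds": [(lower, poi_upper_bound)],
        }

    def expected_data(self, pars):
        # Expected yields mu * n_s + n_b for the given parameters
        mu = pars[0]
        return [mu * s + b for s, b in zip(self.signal, self.background)]

    def get_logpdf_func(self, expected="observed", data=None):
        # Explicit data overrides the selection made through `expected`
        if data is None:
            data = self.background if expected == "apriori" else self.observed

        def logpdf(pars):
            # Sum of Poisson log-probabilities over bins
            total = 0.0
            for n, lam in zip(data, self.expected_data(pars)):
                total += n * math.log(lam) - lam - math.lgamma(n + 1.0)
            return total

        return logpdf


model = PoissonSketch(signal=[3.0, 1.0], background=[10.0, 20.0], observed=[12, 19])
bounds_no_negative = model.config(allow_negative_signal=False)["suggested_bounds"]
apriori_logpdf = model.get_logpdf_func(expected="apriori")
```

With expected="apriori" the background yields play the role of the data, so the returned function peaks at \(\mu = 0\).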

Other available functions that can be implemented are shown in the table below. These are optional optimizations that improve computational efficiency or enable advanced features.

| Functions and Properties | Mathematical Purpose | Use Case |
|---|---|---|
| get_objective_function() | Returns \(f(\vec{p}) = -\log\mathcal{L}(\vec{p})\) and optionally its gradient \(\nabla f\). Enables first-order optimization methods that use analytical gradients instead of numerical differentiation. | Significant speedup for high-dimensional fits; essential for automatic differentiation backends |
| get_hessian_logpdf_func() | Returns the Hessian matrix \(H_{ij} = \frac{\partial^2 \log\mathcal{L}}{\partial p_i \partial p_j}\). The negative Hessian at the maximum is the observed Fisher information matrix; its inverse estimates the parameter covariance, from which parameter uncertainties are derived. | Accurate uncertainty estimation via sigma_mu(); required for confidence intervals |
| get_sampler() | Returns a function that generates pseudo-datasets by sampling from the statistical model at given parameter values. Enables toy Monte Carlo hypothesis testing (see exclusion_confidence_level() with calculator='toy'). | Toy-based exclusion limits; empirical p-value computation when asymptotic approximations are insufficient |
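To make the first two rows concrete, the sketch below writes out the objective, its analytic gradient and the Hessian for a single-parameter Poisson counting model with expected yields \(\mu n_s^i + n_b^i\). The function names, the yields and the bisection fit are illustrative assumptions; they are not the signatures Spey expects from get_objective_function() or get_hessian_logpdf_func(), and the sampler row is not covered.

```python
import math

# Illustrative yields (assumed values, not taken from any analysis)
SIGNAL = [3.0, 1.0]
BACKGROUND = [10.0, 20.0]
OBSERVED = [12.0, 19.0]


def objective_and_grad(mu):
    """Return (-log L, d(-log L)/dmu) for the Poisson counting model."""
    value, grad = 0.0, 0.0
    for s, b, n in zip(SIGNAL, BACKGROUND, OBSERVED):
        lam = mu * s + b
        value += lam - n * math.log(lam) + math.lgamma(n + 1.0)
        grad += s * (1.0 - n / lam)
    return value, grad


def hessian_logpdf(mu):
    """Second derivative of log L with respect to mu."""
    return -sum(
        n * s**2 / (mu * s + b) ** 2
        for s, b, n in zip(SIGNAL, BACKGROUND, OBSERVED)
    )


def fit_mu(lo=-3.0, hi=10.0, tol=1e-10):
    """Bisection on the gradient; valid because -log L is convex in mu."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if objective_and_grad(mid)[1] > 0.0:
            hi = mid  # objective is rising: the minimum lies to the left
        else:
            lo = mid  # objective is falling: the minimum lies to the right
    return 0.5 * (lo + hi)


mu_hat = fit_mu()
# Inverse of the negative Hessian at the maximum estimates the variance of mu
sigma_mu = 1.0 / math.sqrt(-hessian_logpdf(mu_hat))
```

Here the gradient drives the fit directly, and the curvature at the best-fit point gives the uncertainty estimate; a real backend would hand the same quantities to Spey's optimiser instead of running its own bisection.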

Attention

A simple example implementation can be found in the example-plugin repository which implements

\[\mathcal{L}(\mu) = \prod_{i\in{\rm bins}}{\rm Poiss}(n^i|\mu n_s^i + n_b^i)\]

For Spey to recognise this model, the class must be registered either as an entry point or with a decorator. The former is explained in the next section, while the latter uses the register_backend() decorator as follows:

>>> import spey

>>> @spey.register_backend
>>> class MyStatisticalModel(spey.BackendBase):
>>>     name = "my_stat_model"
>>>     ...
>>>     # rest of the implementation
>>>     ...

Notice that this method does not require a setup.py file, but the statistical model will only be available if the module that defines it is imported before calling AvailableBackends(). Hence, if the goal is to create a package that can be installed and used as a plugin, the entry point method is preferred.

Identifying and installing your statistical model

To register your statistical model with Spey, you need to create an entry point. Modern Python projects use pyproject.toml (recommended), while legacy projects may use setup.py. Both approaches are shown below.

Folder structure (same for both methods):

my_folder
├── my_subfolder
│   ├── __init__.py
│   └── mystat_model.py # this includes class MyStatisticalModel
├── pyproject.toml    # Modern approach (recommended)
└── setup.py          # Legacy approach (optional)
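For the pyproject.toml route, a declaration along the following lines should register the same backend. This is a sketch that assumes setuptools as the build backend and reuses the package name, version and dependency from the setup.py example in this section. The TOML is embedded in a Python string only so that it can be parsed and checked here; in a real project the text lives in pyproject.toml.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older interpreters

# Contents of a hypothetical pyproject.toml (assumes a setuptools build)
PYPROJECT_TOML = """
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "my-spey-plugin"
version = "1.0.0"
description = "A custom Spey statistical model"
dependencies = ["spey>=0.1.0"]

[project.entry-points."spey.backend.plugins"]
my_stat_model = "my_subfolder.mystat_model:MyStatisticalModel"

[tool.setuptools]
packages = ["my_subfolder"]
"""

config = tomllib.loads(PYPROJECT_TOML)
# Entry points are grouped by name; Spey's group is "spey.backend.plugins"
plugins = config["project"]["entry-points"]["spey.backend.plugins"]
```

The key on the left of each entry is the backend identifier and the value is the module path and class name, exactly as in the setup.py variant.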

Using setup.py (Legacy)

If you prefer the legacy approach or need maximum compatibility with older tools:

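from setuptools import setup

# Each entry maps a backend identifier to "module.path:ClassName"
stat_model_list = ["my_stat_model = my_subfolder.mystat_model:MyStatisticalModel"]

setup(
    name="my-spey-plugin",
    version="1.0.0",
    description="A custom Spey statistical model",
    packages=["my_subfolder"],  # my_subfolder is a package (it contains __init__.py)
    install_requires=["spey>=0.1.0"],
    # Spey discovers plugins through the "spey.backend.plugins" entry-point group
    entry_points={"spey.backend.plugins": stat_model_list},
)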

Parameters:

  • stat_model_list is a list of statistical models to register (can include multiple backends)

  • "my_stat_model" is the backend identifier (must match the class’s name attribute)

  • "my_subfolder.mystat_model:MyStatisticalModel" is the module path and class name

After writing setup.py, install with: pip install -e .

Both methods achieve the same result: after installation, your plugin is immediately available through Spey. Choose pyproject.toml for new projects unless you have specific legacy requirements.
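After installation, one way to confirm that the entry point was written correctly is to query the installed package metadata with the standard library. The helper below is a hypothetical convenience function, not part of Spey; it only reads the "spey.backend.plugins" group and returns an empty dictionary when no plugin is installed.

```python
import sys
from importlib.metadata import entry_points


def spey_plugin_entry_points(group="spey.backend.plugins"):
    """Map each installed entry-point name in `group` to its target string."""
    if sys.version_info >= (3, 10):
        found = entry_points(group=group)
    else:
        # Older interpreters return a dict keyed by group name
        found = entry_points().get(group, [])
    return {ep.name: ep.value for ep in found}


installed = spey_plugin_entry_points()
for name, target in sorted(installed.items()):
    print(f"{name} -> {target}")
```

If the plugin from this section is installed, the mapping should contain "my_stat_model" pointing at "my_subfolder.mystat_model:MyStatisticalModel", and the same name should then appear in the output of AvailableBackends().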

Citing Plug-ins

Since plug-ins can be built by other users, Spey provides a metadata accessor so that they can be cited properly. The get_backend_metadata() function returns the name, author, version, DOI and arXiv number of a plug-in for use in academic publications. This information can be accessed as follows:

>>> import spey
>>> spey.get_backend_metadata("my_stat_model")
>>> # {'name': 'my_stat_model',
... #  'author': 'John Smith <john.smith@smith.com>',
... #  'version': '1.0.0',
... #  'spey_requires': '>=0.1.0,<0.2.0',
... #  'doi': [],
... #  'arXiv': []}