spey.helper_functions.merge_correlated_bins

spey.helper_functions.merge_correlated_bins(background_yields: ndarray, data: ndarray, covariance_matrix: ndarray, merge_groups: List[List[int]], signal_yields: ndarray = None, return_group_indices: bool = False) -> Dict[str, ndarray]

Merge correlated bins in a histogram/cutflow.

This function takes a set of background yields, data, and a covariance matrix, and merges specified groups of bins into single bins. The merged background yields, data, and covariance matrix are returned in a dictionary. Yields and data are summed within each group. Each entry of the merged covariance matrix is the sum of the original entries over all pairs of bins drawn from the corresponding two groups.
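The summation described above can be sketched with plain NumPy. This is an illustrative re-implementation, not the library's source: the helper name `merge_bins_sketch` is hypothetical, and the sketch assumes every bin belongs to exactly one group.

```python
import numpy as np


def merge_bins_sketch(yields, covariance, merge_groups):
    """Sum yields and covariance entries over each group of bin indices."""
    yields = np.asarray(yields, dtype=float)
    covariance = np.asarray(covariance, dtype=float)

    # Indicator matrix: row g has a 1 in every column whose bin belongs to group g.
    indicator = np.zeros((len(merge_groups), len(yields)))
    for g, group in enumerate(merge_groups):
        indicator[g, group] = 1.0

    merged_yields = indicator @ yields
    # Entry (g, h) is the sum of covariance[i, j] over i in group g and j in group h.
    merged_covariance = indicator @ covariance @ indicator.T
    return merged_yields, merged_covariance


background = [10, 20, 30, 40]
cov = [
    [4, 1, 0.5, 0.2],
    [1, 3, 0.3, 0.1],
    [0.5, 0.3, 5, 0.2],
    [0.2, 0.1, 0.2, 4],
]
merged_bkg, merged_cov = merge_bins_sketch(background, cov, [[0, 1], [2, 3]])
print(merged_bkg)
print(merged_cov)
```

Writing the merge as a matrix product makes the covariance rule explicit: with indicator matrix M, the merged covariance is M Σ Mᵀ, which is the usual covariance of a linear combination of the original bins.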

Added in version 0.2.4.

Example:

>>> from spey.helper_functions import merge_correlated_bins
>>> import numpy as np
>>> background_yields = np.array([10, 20, 30, 40])
>>> data = np.array([12, 22, 32, 42])
>>> covariance_matrix = np.array(
...   [[4, 1, 0.5, 0.2],
...    [1, 3, 0.3, 0.1],
...    [0.5, 0.3, 5, 0.2],
...    [0.2, 0.1, 0.2, 4]]
... )
>>> merge_groups = [[0, 1], [2, 3]]
>>> result = merge_correlated_bins(
...     background_yields=background_yields,
...     data=data,
...     covariance_matrix=covariance_matrix,
...     merge_groups=merge_groups
... )
>>> print(result)
>>> # {
... #    'background_yields': array([30., 70.]),
... #    'data': array([34., 74.]),
... #    'covariance_matrix': array([[ 9. ,  1.1],
... #                                [ 1.1,  9.4]])
... # }
The returned dictionary contains:
  • background_yields: Merged background yields.

  • data: Merged data.

  • covariance_matrix: Merged covariance matrix.

Note

The function assumes that the input arrays are 1-dimensional and that the covariance matrix is square. It also checks for overlapping indices in merge_groups and raises an assertion error if any are found.
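The checks mentioned in this note can be reproduced on the user side before calling the function. The snippet below is a minimal sketch of such a pre-check. The helper `check_merge_inputs` is hypothetical and is not part of spey; the library's own assertions may differ in wording and order.

```python
import numpy as np


def check_merge_inputs(background_yields, data, covariance_matrix, merge_groups):
    """Hypothetical pre-check; not the library's own validation code."""
    background_yields = np.asarray(background_yields)
    data = np.asarray(data)
    covariance_matrix = np.asarray(covariance_matrix)

    assert background_yields.ndim == 1 and data.ndim == 1, "yields and data must be 1-D"
    assert len(background_yields) == len(data), "yields and data differ in length"
    assert covariance_matrix.shape == (len(data), len(data)), "covariance must be N x N"

    # A bin index may appear in at most one merge group.
    flat = [index for group in merge_groups for index in group]
    assert len(flat) == len(set(flat)), "merge_groups contain overlapping indices"


# Disjoint groups pass silently.
check_merge_inputs([10, 20, 30], [12, 22, 32], np.eye(3), [[0, 1], [2]])

# Bin 1 appears in two groups, so the overlap check fails.
try:
    check_merge_inputs([10, 20, 30], [12, 22, 32], np.eye(3), [[0, 1], [1, 2]])
    overlap_detected = False
except AssertionError:
    overlap_detected = True
```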

Warning

The function does not check for the validity of the covariance matrix (e.g., positive definiteness). It is assumed that the input covariance matrix is valid for the given background yields and data.
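Because no such check is performed, it can be worth validating the covariance matrix before merging. The sketch below shows one way to do so with NumPy. The helper `is_valid_covariance` and its tolerance `tol` are illustrative assumptions, not part of spey.

```python
import numpy as np


def is_valid_covariance(matrix, tol=1e-10):
    """Return True if `matrix` is square, symmetric and positive semi-definite."""
    matrix = np.asarray(matrix, dtype=float)
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        return False
    if not np.allclose(matrix, matrix.T):
        return False
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues.
    return bool(np.linalg.eigvalsh(matrix).min() >= -tol)


good = [[4.0, 1.0], [1.0, 3.0]]
bad = [[1.0, 2.0], [2.0, 1.0]]  # eigenvalues are 3 and -1
print(is_valid_covariance(good), is_valid_covariance(bad))
```

If the merge is the group-wise sum described in this section, it is a linear transformation of the bins, so a positive semi-definite input stays positive semi-definite after merging and checking the input matrix is sufficient.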

Parameters:
  • background_yields (np.ndarray) – background yields for each bin.

  • data (np.ndarray) – observed data for each bin.

  • covariance_matrix (np.ndarray) – covariance matrix for the bins.

  • merge_groups (list[list[int]]) – indices of bins to merge.

  • signal_yields (np.ndarray, default None) – signal yields for each bin. If provided, these will also be merged according to the specified groups.

  • return_group_indices (bool, default False) –

    If True, the output dictionary also includes the indices of the merged groups. This helps the user keep track of which bins were merged together and how the bins are reordered. New signal yields can be formed by running the following code:

    >>> new_signal_yields = [sum(np.array(signal_yields)[Gi]) for Gi in output["group_indices"]]
    

Raises:

AssertionError

  • If the lengths of data, background_yields, and signal_yields do not match.

  • If the covariance matrix is not square.

  • If there are overlapping indices in merge_groups.

Returns:

A dictionary containing the merged background yields, data, and covariance matrix, plus the merged signal yields if signal_yields is provided and the group indices if return_group_indices is True.

Return type:

dict[str, np.ndarray]