Statistical inference on Histogram#

In this tutorial, we will generate a mock histogram and see if our signal model is good enough to explain the experimental data.

Let us generate a histogram to be used in our example first. In below example, we generate three sets of data, signal (\(n_s\)), background (\(n_b\)) and observarions (\(n\)). The statistical uncertainties on the background yields are shown with black lines.

../_images/512110ee38754baa14fe8872e0f1f8dc7f409b5816869127c135177259d1e754.png

Lets construct a statistical model. Since the only source of uncertainty is statistical, we can safely assume that non of the bins are correlated in systematic level. Hence we will use "default.uncorrelated_background" model.

pdf_wrapper = spey.get_backend("default.uncorrelated_background")

stat_model = pdf_wrapper(
    signal_yields=s_and_b - b_only,
    background_yields=b_only,
    data=data,
    absolute_uncertainties=np.sqrt(b_only),
)

Now we can compute the normalised likelihood distribution for prefit and postfit statistical models.

poi = np.linspace(-0.3, 8, 50)

llhd_postfit = [stat_model.likelihood(p, return_nll=False) for p in poi]
muhat_postfit, max_llhd_postfit = stat_model.maximize_likelihood(return_nll=False)

llhd_prefit = [
    stat_model.likelihood(p, expected=spey.ExpectationType.apriori, return_nll=False)
    for p in poi
]
muhat_prefit, max_llhd_prefit = stat_model.maximize_likelihood(
    expected=spey.ExpectationType.apriori, return_nll=False
)

../_images/a198b03c6de0f16d805caa8e44a52aa5cd52a999e619bab461ffab1760aae45c.png

Now let us compute the expected and observed upper limits on signal strength (\(\mu\)) at 95% CL:

print(f"Observed upper limit on µ at 95% CL: {stat_model.poi_upper_limit():.5f}")

expected_pval = stat_model.poi_upper_limit(
    expected=spey.ExpectationType.aposteriori, expected_pvalue="1sigma"
)
muUL_apost = expected_pval[1]
print(
    "Expected upper limit on µ with ± 1σ at 95% CL: "
    + ",".join([f"{x:.3f}" for x in expected_pval])
)

Observed upper limit on µ at 95% CL: 4.66815
Expected upper limit on µ with ± 1σ at 95% CL: 2.834,3.764,5.183

We can use confidence_level flag to change the confidence level of our calculation

print(
    f"Observed upper limit on µ at 68% CL: {stat_model.poi_upper_limit(confidence_level=0.68):.5f}"
)

Observed upper limit on µ at 68% CL: 2.79069

Now let us compute the upper limit by hand:

poi = np.linspace(1, 7, 20)
poiUL = np.array(
    [
        stat_model.exclusion_confidence_level(
            poi_test=p, expected=spey.ExpectationType.aposteriori
        )
        for p in poi
    ]
)

../_images/57db90eedf02369b2028665d14d6ca6d99c9a1c904ee59323ee591cc2c642d9d.png

Additionally we can compute the exclusion limit of our signal model to see if our model is excluded.

print(
    f"Observed exclusion confidence level, CLs: {stat_model.exclusion_confidence_level()[0]:.5f}"
)

exp_cls = stat_model.exclusion_confidence_level(expected=spey.ExpectationType.aposteriori)
print(
    f"Expected exclusion confidence level, CLs ± 1σ: {exp_cls[2]:.4f} - {exp_cls[2]-exp_cls[3]:.4f} + {exp_cls[1]-exp_cls[2]:.4f}"
)

Observed exclusion confidence level, CLs: 0.13942
Expected exclusion confidence level, CLs ± 1σ: 0.1620 - 0.0972 + 0.1182

Since our model is not excluded (\(1-{\rm CL}_s < 0.95\)), we may want to see the significance of our signal yields:

print(f"Z = {stat_model.significance()[1]:.3f}")

Z = 1.185