Significance Overview

This page reviews some of the methods and tools we use to calculate significance, along with a brief overview of the cross-section calculation. It is currently written for use with MC only (expected values) and will have to be extended to describe how to obtain observed values from data.

There are several significance methods in use, most of which are ad-hoc. These include S/sqrtB, S/uncertainty(B), 1/(cross-section uncertainty), etc. None of these are advised for quoted results. In initial work it may make sense to use S/sqrtB for its simplicity, but keep in mind that none of these numbers can really be trusted. For instance, S/sqrtB generally seems to overestimate the actual significance.
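As a quick illustration of the ad-hoc estimates, here is a minimal sketch in plain C++. The function names are hypothetical, and reading "S/uncertainty(B)" as S divided by the quadrature sum of the Poisson term sqrt(B) and a systematic uncertainty sigmaB is an assumption, not something this page defines.

```cpp
#include <cassert>
#include <cmath>

// Naive significance S/sqrt(B): ignores any systematic uncertainty on B.
double simpleSignificance(double signal, double bkgd) {
    assert(bkgd > 0.0);
    return signal / std::sqrt(bkgd);
}

// S divided by the total background uncertainty, taken here (as an assumption)
// to be the Poisson term sqrt(B) and sigmaB added in quadrature.
double significanceWithSyst(double signal, double bkgd, double sigmaB) {
    assert(bkgd > 0.0 && sigmaB >= 0.0);
    return signal / std::sqrt(bkgd + sigmaB * sigmaB);
}
```

With signal = 30, bkgd = 100 and sigmaB = 10, the first function gives 3.0 and the second about 2.1, which shows how strongly the answer depends on the ad-hoc choice.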

Accepted method 1: Frequentist, binomial method

This method is taken from arXiv:physics/0702156v3. It requires an estimate of the systematic uncertainty on the summed background, together with the background yield and the signal-plus-background yield ("data") after a classifier output cut. With these inputs, the significance can be calculated with functions in ROOT's TMath. ROOT runs into problems for significances above about 7, where it lacks the numerical precision to do the calculation, and also for small uncertainties on the background. The significances we study are generally below this value, however. An example of the ROOT code for this method is shown below:
#include "TMath.h"
double sigmab = 10; // uncertainty on background remaining after classifier output cut, or other selection
double bkgd = 100; // background after classifier output cut, or other selection
double signal = 30; // signal after classifier output cut, or other selection
double tau = bkgd/(sigmab*sigmab); // ratio of the auxiliary (off) measurement to the background estimate
double noff = tau*bkgd; // equivalent number of events in the auxiliary (off) measurement
double non = signal + bkgd; // "data"
double pbi = TMath::BetaIncomplete(1. / (1. + tau), non, noff + 1); // this is the important line, which computes the p-value
double zbi = TMath::Sqrt(2)*TMath::ErfInverse(1-2*pbi); // this line converts the p-value to a significance

Note that the significance calculation uses 2*pbi inside ErfInverse. Because Sqrt(2)*ErfInverse(1-2*p) is the Gaussian quantile for an upper-tail probability p, this is the one-sided convention: pbi is treated as the probability of the background fluctuating up to at least the level of the "data". If you instead want to treat pbi as a two-sided p-value (the background fluctuating up or down), replace 2*pbi with pbi. Another thing to note about this significance, as noted in the paper mentioned earlier, is that it does not undercover; that is, the actual significance will not be lower than the expected significance we calculate here. The actual value could be higher than the expected value, however; that is, the method may over-cover, or be conservative. This method has been used for many calculations, but not yet for combined significances. Its major advantages, aside from the lack of under-coverage, are that it is simple to program and extremely fast to calculate (especially compared to method 2).
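As a cross-check that does not depend on ROOT, the calculation can be sketched in standard C++. For integer counts, the incomplete beta function used above equals the binomial tail probability P(X >= non) for X ~ Binomial(non + noff, 1/(1+tau)), so the p-value can be summed directly. The function names are hypothetical, the sketch assumes non and noff are integers, and the p-value is converted with the one-sided convention by bisection on the Gaussian tail.

```cpp
#include <cassert>
#include <cmath>

// Binomial tail P(X >= nOn) for X ~ Binomial(nOn + nOff, rho), rho = 1/(1+tau).
// For integer counts this equals the incomplete beta I_rho(nOn, nOff + 1).
double binomialPValue(int nOn, int nOff, double tau) {
    assert(nOn >= 0 && nOff >= 0 && tau > 0.0);
    const int n = nOn + nOff;
    const double logRho = std::log(1.0 / (1.0 + tau));
    const double logOneMinusRho = std::log(tau / (1.0 + tau));
    double p = 0.0;
    for (int k = nOn; k <= n; ++k) {
        // log of C(n,k) * rho^k * (1-rho)^(n-k), computed in logs to avoid overflow
        const double logTerm = std::lgamma(n + 1.0) - std::lgamma(k + 1.0)
                             - std::lgamma(n - k + 1.0)
                             + k * logRho + (n - k) * logOneMinusRho;
        p += std::exp(logTerm);
    }
    return p;
}

// One-sided conversion: find Z such that the upper Gaussian tail beyond Z equals p.
double pValueToZ(double p) {
    assert(p > 0.0 && p < 1.0);
    double lo = -10.0, hi = 10.0;
    for (int i = 0; i < 200; ++i) {
        const double mid = 0.5 * (lo + hi);
        const double tail = 0.5 * std::erfc(mid / std::sqrt(2.0));
        if (tail > p) lo = mid; else hi = mid;  // tail shrinks as Z grows
    }
    return 0.5 * (lo + hi);
}
```

For the example numbers above (non = 130, noff = 100, tau = 1), pValueToZ(binomialPValue(130, 100, 1.0)) comes out a little below 2, well under the S/sqrtB value of 3. Plain double precision limits this sketch at very small p-values, much like the ROOT version.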

Accepted method 2: Log-likelihood ratio

This method is much more complicated than the previous one, and takes much longer to generate a significance (~hours, not seconds). Using the tier 3 is highly recommended if you are going to use this method, especially for significances expected to be large (5, 6 sigma). The method generates many ensembles, each of which is varied by systematics, and a log-likelihood ratio (LLR) is calculated for each. Note that both background and signal systematics are included for this method. The ensembles are generated with and without signal included. The significance is based on the number of background-only ensembles that have an LLR value greater than the mean of the ensembles with both signal and background included. In other words, it finds the odds that background can fluctuate enough to look like signal.

The trouble is that calculating 6 sigma, for instance, requires 1000 sets with 1 million ensembles in each set, since the one-sided p-value at 6 sigma is about 1e-9. At this number of ensembles, it is more productive to generate smaller sets, each with a different random number seed, which can run separately on the tier 3 and be added together, rather than one long run. For running on the tier 3, sets of 1 million ensembles (or fewer) are currently recommended. Each set should take about half an hour to run and should not overload the memory of the computer you are using for that set.

This method is less conservative than method 1, and it also allows the user to find the significance for a combination of channels. However, again, it takes a while to generate these numbers. This is the method used at the Tevatron for the single-top observation.
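The ensemble procedure can be illustrated with a toy single-bin counting experiment in standard C++. This is only a sketch under simplifying assumptions, not the code used for the analysis: there is one bin, only the background is smeared by a Gaussian systematic (the real method also varies the signal), and all function names and seed values are hypothetical. It computes the mean LLR of the signal-plus-background ensembles, then counts background-only ensembles above that value in several independently seeded sets and adds the counts together.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Single-bin log-likelihood ratio ln[L(s+b)/L(b)] for an observed count n,
// evaluated at the nominal s and b. Larger values are more signal-like.
double logLikelihoodRatio(int n, double s, double b) {
    return n * std::log(1.0 + s / b) - s;
}

// One pseudo-experiment: smear the background by its systematic uncertainty
// (Gaussian, kept positive), add the signal, then draw a Poisson count.
int throwCount(std::mt19937& rng, double s, double b, double sigmaB) {
    std::normal_distribution<double> smear(b, sigmaB);
    const double mean = std::max(smear(rng), 1e-9) + s;
    std::poisson_distribution<int> pois(mean);
    return pois(rng);
}

// Mean LLR over signal-plus-background ensembles; used as the threshold.
double meanSignalPlusBackgroundLLR(unsigned seed, int nEnsembles,
                                   double s, double b, double sigmaB) {
    std::mt19937 rng(seed);
    double sum = 0.0;
    for (int i = 0; i < nEnsembles; ++i)
        sum += logLikelihoodRatio(throwCount(rng, s, b, sigmaB), s, b);
    return sum / nEnsembles;
}

// One independent set of background-only ensembles: count how many have an
// LLR greater than the threshold.
long countBackgroundAbove(unsigned seed, int nEnsembles, double threshold,
                          double s, double b, double sigmaB) {
    std::mt19937 rng(seed);
    long above = 0;
    for (int i = 0; i < nEnsembles; ++i)
        if (logLikelihoodRatio(throwCount(rng, 0.0, b, sigmaB), s, b) > threshold)
            ++above;
    return above;
}

// Combine several sets, each with its own seed, into one expected p-value.
double expectedPValue(int nSets, int nPerSet, double s, double b, double sigmaB) {
    assert(nSets > 0 && nPerSet > 0 && b > 0.0 && sigmaB > 0.0);
    const double threshold = meanSignalPlusBackgroundLLR(12345u, nPerSet, s, b, sigmaB);
    long above = 0;
    for (int set = 0; set < nSets; ++set)
        above += countBackgroundAbove(1000u + set, nPerSet, threshold, s, b, sigmaB);
    return static_cast<double>(above) / (static_cast<double>(nSets) * nPerSet);
}
```

For example, expectedPValue(4, 50000, 30.0, 100.0, 10.0) combines four sets of 50000 background-only ensembles. In a real job each set would run separately on the tier 3 and only the counts would be merged afterwards; the resulting p-value can then be converted to a significance with the same one-sided conversion as in method 1.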

-- JennyHolzbauer - 14 Sep 2009
Topic revision: r5 - 16 Oct 2009, TomRockwell
