Comparisons between TMVA and SPR

Two Methods of Calculating Significance in TMVA

Using TMVAnalysis and TMVApplication

The first method available in TMVA to calculate significance is to run TMVAnalysis and then TMVApplication. TMVAnalysis was run as
python TMVAnalysis.py \
   -i /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root \
   -t "TopTreeSig TopTreeBkg" -m BDT \
   -o "TMVAout.CompareTMVA.1405006.root"

The following line was used in TMVAnalysis so that all but two signal events and two background events were used for training
factory.PrepareTrainingAndTestTree( mycutSig, mycutBkg, "NSigTrain=113:NBkgTrain=1121::NSigTest=2:NBkgTest=2:SplitMode=Alternate:NormMode=NumEvents:!V" )

The output of TMVAnalysis includes the following
--- DataSet        : - Training signal entries     : 113
--- DataSet        : - Training background entries : 1121
--- DataSet        : - Testing  signal entries     : 2
--- DataSet        : - Testing  background entries : 2

TMVApplication was run as
python TMVApplication.py \
   -i /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/merged/Topology.SingleTop.1405006.FDR2.Electron.Signal.Validation.root -m BDT \
   -o Signal.Compare.1405006.root 

python TMVApplication.py \
   -i /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/merged/Topology.SingleTop.1405006.FDR2.Electron.Background.Validation.root -m BDT \
   -o Background.Compare.1405006.root 

Significance was calculated as
python Significance.py -S Signal.Compare.1405006.root -B Background.Compare.1405006.root -w 3 -o SignificanceOutput.Compare.1405006.root

The output is
==== Weighted Signal Events:  84.6193445325 Weighted Background Events:  449.649060309
==== Total Significance:  3.99055050669

===  BDT Maximum Significance (From Right):  5.54816436768
===  BDT Maximum Significance (From Left):   3.99055051804
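
The quoted total significance is numerically consistent with S/sqrt(B) computed from the weighted yields. That definition is inferred from the numbers and is not taken from Significance.py itself, so the following is only a check under that assumption; the function name is illustrative.

```python
import math

def simple_significance(signal, background):
    """Return S / sqrt(B). Assumed definition, inferred from the quoted output."""
    if background <= 0:
        raise ValueError("background yield must be positive")
    return signal / math.sqrt(background)

# Weighted yields copied from the Significance.py output above.
weighted_signal = 84.6193445325
weighted_background = 449.649060309

total = simple_significance(weighted_signal, weighted_background)
print("Total significance: %.5f" % total)
```

Under this definition the total depends only on the weighted signal and background yields of the validation sample, not on the classifier output.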

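Using TMVAnalysis and Calculating Significance Directly

The second method is to run TMVAnalysis and then calculate the significance directly from its output file. TMVAnalysis was run as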
python TMVAnalysis.py \
   -i /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.NoNeg.root \
   -t "TopTreeSig TopTreeBkg" -m BDT \
   -o "TMVAout.CompareTMVA.Method2.1405006.root"

The following line was used in TMVAnalysis so that half of the events were used for training and half for testing
factory.PrepareTrainingAndTestTree( mycutSig, mycutBkg, "NSigTrain=0:NBkgTrain=0::NSigTest=0:NBkgTest=0:SplitMode=Alternate:NormMode=NumEvents:!V" )

The output included
--- DataSet        : - Training signal entries     : 174
--- DataSet        : - Training background entries : 1677
--- DataSet        : - Testing  signal entries     : 174
--- DataSet        : - Testing  background entries : 1677

--- Factory        : -----------------------------------------------------------------------------
--- Factory        : MVA              Signal efficiency at bkg eff. (error):  |  Sepa-    Signifi-
--- Factory        : Methods:         @B=0.01    @B=0.10    @B=0.30    Area   |  ration:  cance:  
--- Factory        : -----------------------------------------------------------------------------
--- Factory        : BDT            : 0.045(15)  0.047(16)  0.052(16)  0.268  |  0.511    0.276
--- Factory        : -----------------------------------------------------------------------------

SignificanceAlt.py was run as
python SignificanceAlt.py -m BDT -i TMVAout.CompareTMVA.Method2.1405006.root -o SignificanceOutput.Compare.Method2.1405006.root -p "TMVA" -w2

The output included

==== Weighted Signal Events:  347.999909878 Weighted Background Events:  4197.4060111
==== Total Significance:  5.37141418101

===  Signal_MVA_BDT Maximum Significance (Right):  5.37141418457
===  Signal_MVA_BDT Maximum Significance (Left):   7.41332483292

Significance from SPR

The training was run as
/usr/local/bin/bin/SprBaggerDecisionTreeApp  \
  -n 1100 -l 6 -s 33 -g 1 -y '0:1' -d 5 \
  -f baggerCompare.spr -o baggerOutputTraining.Compare.root \
  data/baggerTraining.pat

The output includes the following
Total number of points read: 1238
Training data filtered by class.
Points in class 0(1):   1123
Points in class 1(1):   115
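
The class counts reported by SPR can be cross-checked against the split used in the first TMVA method. The sketch below only adds up numbers quoted on this page; it assumes that SPR class 1 is signal and class 0 is background, which the counts suggest but the output does not state.

```python
# Event counts from the first TMVAnalysis run (training + testing).
tmva_split = {
    "signal": {"train": 113, "test": 2},
    "background": {"train": 1121, "test": 2},
}
# Points per class reported by SprBaggerDecisionTreeApp.
# Assumption: class 1 = signal, class 0 = background.
spr_training = {"signal": 115, "background": 1123}
spr_total_read = 1238

matches = {}
for cls, parts in tmva_split.items():
    tmva_total = parts["train"] + parts["test"]
    matches[cls] = (tmva_total == spr_training[cls])
    print("%-10s TMVA %4d  SPR %4d  %s"
          % (cls, tmva_total, spr_training[cls],
             "match" if matches[cls] else "MISMATCH"))

counts_consistent = sum(spr_training.values()) == spr_total_read
```

The totals agree, which would be consistent with both trainings reading the same events.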

Validation was run by
/usr/local/bin/bin/SprOutputWriterApp baggerCompare.spr -y '0:1'  \
  data/baggerValidation.pat baggerOutputValidation.Compare.root

Output included
Total number of points read: 1387
Training data filtered by class.
Points in class 0(1):   1268
Points in class 1(1):   119

Significance was calculated by
python SignificanceAlt.py -m BDT -i /work/jever/pryan/SPR-3.3.1/baggerOutputValidation.Compare.root -o SignificanceOutput.Compare.SPR.1405006.root -p "SPR" -w3

Output included
===                                 Bagger
=== Method:  Bagger
    Signal Events:  119     Background Events:  1268

==== Weighted Signal Events:  84.6193494797 Weighted Background Events:  449.649265766
==== Total Significance:  3.99054982829

===  Signal_Bagger Maximum Significance (From Right):  4.29537010193
===  Signal_Bagger Maximum Significance (From Left):   4.26745414734

Conclusions

The significances from the three methods are:

  • TMVAnalysis and TMVApplication: 3.991
  • TMVAnalysis and direct calculation: 5.371
  • SPR and direct calculation: 3.991

TMVAnalysis with TMVApplication and SPR give the same total significance. TMVAnalysis with a direct calculation gives a different significance, which may be because a different sample was used for training and validation. However, this significance is not at all close to the other two, so the difference is most likely due to TestAllMethods() doing something different from the validation in TMVApplication and SPR.
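
The weighted yields behind the three totals can also be compared side by side. The sketch below re-uses only numbers quoted on this page; the dictionary layout and labels are illustrative. It expresses each method's weighted signal and background yields relative to the TMVAnalysis and TMVApplication run.

```python
# (weighted signal, weighted background), copied from the outputs above.
yields = {
    "TMVAnalysis + TMVApplication": (84.6193445325, 449.649060309),
    "TMVAnalysis + direct": (347.999909878, 4197.4060111),
    "SPR + direct": (84.6193494797, 449.649265766),
}

ref_s, ref_b = yields["TMVAnalysis + TMVApplication"]

ratios = {}
for name, (s, b) in yields.items():
    # Ratio of each yield to the reference run.
    ratios[name] = (s / ref_s, b / ref_b)
    print("%-30s S ratio = %.4f   B ratio = %.4f"
          % (name, ratios[name][0], ratios[name][1]))
```

The TMVApplication and SPR runs were evaluated on essentially identical weighted yields, while the direct calculation on the TMVAnalysis output used noticeably larger weighted signal and background yields. This may be worth keeping in mind when comparing the totals.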

-- PatRyan - 04 May 2009
Topic revision: r4 - 07 May 2009, PatRyan
 
