SPR for Version 14.05.006

Running Bagger

The bagger (decision tree) classifier was run for comparison purposes. The .pat files associated with this classifier are shown below. Note that a "_" is needed before the names of the leaves.

The contents of baggerValidation.pat are:
Tree: TopTreeBkg TopTreeSig
TreeClass: 0 1
Leaves: _HT _Jet1Pt _DeltaRJet1Jet2 _WTransverseMass
WeightVariable: _EventWeight
File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root

The contents of baggerTraining.pat are:
Tree: TopTreeBkg TopTreeSig
TreeClass: 0 1
Leaves: _HT _Jet1Pt _DeltaRJet1Jet2 _WTransverseMass
WeightVariable: _EventWeight
File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root
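
Since the two .pat files differ only in the File: line, they can be generated from one template. The following is a minimal sketch; the helper name write_pat is hypothetical and not part of SPR, and the data/ output directory is assumed from the command lines used on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of SPR): write a .pat file in the layout
# shown above. Usage: write_pat OUTPUT.pat ROOT_FILE
write_pat() {
  out=$1
  root_file=$2
  cat > "$out" <<EOF
Tree: TopTreeBkg TopTreeSig
TreeClass: 0 1
Leaves: _HT _Jet1Pt _DeltaRJet1Jet2 _WTransverseMass
WeightVariable: _EventWeight
File: $root_file
EOF
}

# Directory and file-name prefix copied from the File: lines above.
dir=/home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg
prefix=Topology.SingleTop.1405006.FDR2.Electron

mkdir -p data
write_pat data/baggerValidation.pat "$dir/$prefix.Validation.root"
write_pat data/baggerTraining.pat "$dir/$prefix.Training.NoNeg.root"
```

Keeping the tree names, leaves and weight variable in one place makes it harder for the training and validation files to drift apart.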

Run Training and Validation Simultaneously

Run the bagged decision tree classifier.
/usr/local/bin/bin/SprBaggerDecisionTreeApp \
  -n 1100 -l 6 -s 33 -g 1 -y '0:1' -d 5 \
  -f bagger.spr -t data/baggerValidation.pat -o baggerOutput.root \
  data/baggerTraining.pat
The options are as follows:
  • -n 1100: number of Bagger training cycles (the number of decision trees)
  • -l 6: minimal number of entries per tree leaf (def=0) (entries per terminal node of the decision tree)
  • -s 33: max number of sampled features (def=0 no sampling); only four leaves are available here, so the log reports that at most 4 features are resampled
  • -g 1: per-event loss for (cross-)validation is quadratic loss (y-f(x))^2
  • -y '0:1': list of input classes (0 corresponds to background and 1 to signal)
  • -d 5: frequency of print-outs for validation data
  • -f bagger.spr: store trained Bagger to file
  • -t data/baggerValidation.pat: read validation/test data from a file (must be in same format as input data!)
  • -o baggerOutput.root: output root file (in this run it holds the 1238 training events, not the validation events)
  • data/baggerTraining.pat: The file used for training

Part of the output is shown below. The training file has 1123 background events and 115 signal events, giving 1238 events in total. The validation file has 1268 background events and 119 signal events, giving 1387 events in total. The end of the output shows that 1238 events were put into the root file, which corresponds to the number of events in the training sample and not the validation sample. The output root file baggerOutput.root is the same as the output root file baggerOutputTraining.root produced below.

Parsing File: data/baggerTraining.pat
Found file: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root start: 0 end: -1 class: 0 weight: 1
A variable determined weight has been chosen, the value assigned to 
        _EventWeight
 will be used for the weight.
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root for Tree: TopTreeBkg (1123 events)
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root for Tree: TopTreeSig (115 events)
Read data from file data/baggerTraining.pat for variables "_HT" "_Jet1Pt" "_DeltaRJet1Jet2" "_WTransverseMass"
Total number of points read: 1238
Training data filtered by class.
Points in class 0(1):   1123
Points in class 1(1):   115
Parsing File: data/baggerValidation.pat
Found file: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root start: 0 end: -1 class: 0 weight: 1
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root for Tree: TopTreeBkg (1268 events)
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root for Tree: TopTreeSig (119 events)
Read validation data from file data/baggerValidation.pat for variables "_HT" "_Jet1Pt" "_DeltaRJet1Jet2" "_WTransverseMass"
Total number of points read: 1387
Validation data filtered by class.
Points in class 0(1):   1268
Points in class 1(1):   119
Optimization criterion set to Gini index  -1+p^2+q^2 
Monitoring criterion set to Fraction of correctly classified events 
Per-event loss set to Quadratic loss (y-f(x))^2 
Decision tree initialized with minimal number of events per node 6
Decision tree will resample at most 4 features.
Using a Topdown tree.
Classes for Bagger are set to 0(1) 1(1)
Bagger initialized with classes 0(1) 1(1) with cycles 1100
Validation FOM=0.0970811 at cycle 5
...
Bagger finished training with 1100 classifiers.
Feeder storing point 0 out of 1238
Feeder storing point 1000 out of 1238
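
The validation figure of merit is printed at the interval set by -d. If the stdout of the run has been saved to a file, the values can be pulled out for plotting or inspection. The sketch below assumes only the line format seen in the log above; the function name fom_by_cycle and the use of a temporary file as a stand-in for a saved log are illustrative.

```shell
#!/bin/sh
# Print "cycle FOM" pairs, one per line, from a saved SPR log.
# Matches lines of the form "Validation FOM=<value> at cycle <n>".
fom_by_cycle() {
  sed -n 's/^Validation FOM=\([^ ]*\) at cycle \([0-9][0-9]*\).*$/\2 \1/p' "$1"
}

# Demo on the single FOM line quoted in the log above.
log=$(mktemp)
printf '%s\n' 'Validation FOM=0.0970811 at cycle 5' > "$log"
fom_by_cycle "$log"
rm -f "$log"
```

For a real run the log would be captured with something like a "> bagger.log" redirection on the SprBaggerDecisionTreeApp command line and passed to fom_by_cycle.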

Run Training and Validation Separately

Run Training Only

The only difference between running training separately and running training with validation, as shown above, is the absence of the command line option -t data/baggerValidation.pat.
/usr/local/bin/bin/SprBaggerDecisionTreeApp \
  -n 1100 -l 6 -s 33 -g 1 -y '0:1' -d 5 \
  -f bagger.spr -o baggerOutputTraining.root \
  data/baggerTraining.pat

The output root file baggerOutputTraining.root is the same as baggerOutput.root produced above. The text printed to stdout is shown below.
Parsing File: data/baggerTraining.pat
Found file: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root start: 0 end: -1 class: 0 weight: 1
A variable determined weight has been chosen, the value assigned to 
        _EventWeight
 will be used for the weight.
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root for Tree: TopTreeBkg (1123 events)
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root for Tree: TopTreeSig (115 events)
Read data from file data/baggerTraining.pat for variables "_HT" "_Jet1Pt" "_DeltaRJet1Jet2" "_WTransverseMass"
Total number of points read: 1238
Training data filtered by class.
Points in class 0(1):   1123
Points in class 1(1):   115
Optimization criterion set to Gini index  -1+p^2+q^2 
Monitoring criterion set to Fraction of correctly classified events 
Per-event loss set to Quadratic loss (y-f(x))^2 
Decision tree initialized with minimal number of events per node 6
Decision tree will resample at most 4 features.
Using a Topdown tree.
Classes for Bagger are set to 0(1) 1(1)
Bagger initialized with classes 0(1) 1(1) with cycles 1100
Bagger finished training with 1100 classifiers.
Warning: file baggerOutputTraining.root will be deleted.
Feeder storing point 0 out of 1238
Feeder storing point 1000 out of 1238
Error in <TBasket::Create>: Cannot create key without file
Error in <TBasket::Create>: Cannot create key without file
Writer successfully closed.

Run Validation Only

Validation only is run with
/usr/local/bin/bin/SprBaggerDecisionTreeApp \
  -n 0 -y '0:1' \
  -r bagger.spr -o baggerOutputValidation.root \
  data/baggerValidation.pat

The option -r bagger.spr resumes running using the information stored in bagger.spr. For validation, the option -n 0 must be used. SprBaggerDecisionTreeApp accepts non-zero values for this option, but the output histogram then depends on the value of n. For a non-zero n, the output histogram also changes slightly with the value of the option -s and greatly with the value of the option -l. In addition, the -n option is not available for SprOutputWriterApp (see below).

The validation sample contains both negatively and positively weighted events and therefore has more events than the training sample, which has only positively weighted events. The stdout is shown below.

Parsing File: data/baggerValidation.pat
Found file: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root start: 0 end: -1 class: 0 weight: 1
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root for Tree: TopTreeBkg (1268 events)
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root for Tree: TopTreeSig (119 events)
Read data from file data/baggerValidation.pat for variables "_HT" "_Jet1Pt" "_DeltaRJet1Jet2" "_WTransverseMass"
Total number of points read: 1387
Training data filtered by class.
Points in class 0(1):   1268
Points in class 1(1):   119
Optimization criterion set to Gini index  -1+p^2+q^2 
Monitoring criterion set to Fraction of correctly classified events 
Per-event loss set to Quadratic loss (y-f(x))^2 
Decision tree initialized with minimal number of events per node 6
Decision tree will resample at most 4 features.
Using a Topdown tree.
Classes for Bagger are set to 0(1) 1(1)
Bagger initialized with classes 0(1) 1(1) with cycles 1100
Read saved Bagger from file bagger.spr with 1100 trained classifiers.
Bagger finished training with 2200 classifiers.
Warning: file baggerOutputValidation.root will be deleted.
Feeder storing point 0 out of 1387
Feeder storing point 1000 out of 1387
Error in <TBasket::Create>: Cannot create key without file
Error in <TBasket::Create>: Cannot create key without file
Writer successfully closed.
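
The event counts quoted on this page can be cross-checked directly from saved logs. The sketch below assumes the stdout of the training-only and validation-only runs was captured to files; the names train.log and valid.log and both function names are hypothetical, and the demo uses excerpts of the logs shown above.

```shell
#!/bin/sh
# Print the number from the "Total number of points read: N" line of a log.
points_read() {
  sed -n 's/^Total number of points read: *\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Sum the per-tree "(N events)" counts of the "Reading File:" lines.
events_per_tree_sum() {
  awk '/^Reading File: / { n = $(NF-1); sub(/^\(/, "", n); total += n }
       END { print total + 0 }' "$1"
}

# Demo: temporary files stand in for the saved stdout of the two runs.
tmp=$(mktemp -d)
f=/home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron
cat > "$tmp/train.log" <<EOF
Reading File: $f.Training.NoNeg.root for Tree: TopTreeBkg (1123 events)
Reading File: $f.Training.NoNeg.root for Tree: TopTreeSig (115 events)
Total number of points read: 1238
EOF
cat > "$tmp/valid.log" <<EOF
Reading File: $f.Validation.root for Tree: TopTreeBkg (1268 events)
Reading File: $f.Validation.root for Tree: TopTreeSig (119 events)
Total number of points read: 1387
EOF

for log in "$tmp/train.log" "$tmp/valid.log"; do
  total=$(points_read "$log")
  summed=$(events_per_tree_sum "$log")
  if [ "$total" -eq "$summed" ]; then
    echo "$(basename "$log"): $total points, consistent with per-tree counts"
  else
    echo "$(basename "$log"): mismatch, $total read vs $summed summed" >&2
  fi
done
rm -rf "$tmp"
```

A mismatch between the two numbers would suggest that a tree was skipped or that the .pat file points at a different root file than intended.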

Alternative Method to Run Validation Only

SprOutputWriterApp provides a simpler method of running only the validation. According to the README

SprOutputWriterApp reads the saved configuration of any trained classifier, computes classifier responses for the supplied data and saves output into a file. Prior to tag V05-00-00 this functionality was distributed among several executables. For example, to read the saved AdaBoost configuration from a file and apply it to data, the user would have to specify zero training cycles "-n 0" and a file to read from using "-r" option of the corresponding AdaBoost executable. SprOutputWriterApp is a unified recommended replacement.

This program is run as
/usr/local/bin/bin/SprOutputWriterApp bagger.spr -y '0:1'  \
  data/baggerValidation.pat baggerOutputValidationOW.root

Note that there is no -n option. The output from this method is identical to that of the previous method, running SprBaggerDecisionTreeApp with the option -n 0. However, the classifier histogram is named Bagger in the output root file, rather than bag as it is when running SprBaggerDecisionTreeApp.

Recommendations for Running

It is recommended to use SprBaggerDecisionTreeApp only for training and then use SprOutputWriterApp for validation. Since SprOutputWriterApp can be called in exactly the same way for any classifier, and it is not necessary to remember to use the parameter -n 0, this method is simpler and leads to less confusion.
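
The recommended two-step workflow can be wrapped in a small script. This is a sketch under assumptions: the SPR_BIN and DRY_RUN variables and the run and train_and_validate functions are illustrative names, while the command lines themselves are the ones used on this page. By default the commands are only printed, so the script can be checked before anything is executed.

```shell
#!/bin/sh
# Sketch of the recommended workflow: train with SprBaggerDecisionTreeApp,
# then validate with SprOutputWriterApp. Set DRY_RUN=0 to actually run.
SPR_BIN=${SPR_BIN:-/usr/local/bin/bin}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

train_and_validate() {
  # Step 1: train the Bagger and store it in bagger.spr.
  run "$SPR_BIN/SprBaggerDecisionTreeApp" \
    -n 1100 -l 6 -s 33 -g 1 -y 0:1 -d 5 \
    -f bagger.spr -o baggerOutputTraining.root \
    data/baggerTraining.pat || return 1
  # Step 2: apply the stored classifier to the validation sample.
  run "$SPR_BIN/SprOutputWriterApp" bagger.spr -y 0:1 \
    data/baggerValidation.pat baggerOutputValidationOW.root
}

train_and_validate
```

Stopping after a failed training step avoids running the output writer on a stale bagger.spr left over from an earlier run.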

Significance Calculation

SPR was run in training-only mode
/usr/local/bin/bin/SprBaggerDecisionTreeApp \
  -n 1100 -l 6 -s 33 -g 1 -y '0:1' -d 5 \
  -f bagger.spr -o baggerOutputTraining.root \
  data/baggerTraining.pat
and then the output writer was run for validation.
/usr/local/bin/bin/SprOutputWriterApp bagger.spr -y '0:1'  \
  data/baggerValidation.pat baggerOutputValidationOW.root

The significance was calculated using SignificanceAlt.py.

-- PatRyan - 16 Apr 2009
Topic revision: r7 - 28 Apr 2009, PatRyan
 
