Difference: AnalysisVersion1405006SPR (1 vs. 7)

Revision 7
28 Apr 2009 - PatRyan
Line: 1 to 1
 
META TOPICPARENT name="AnalysisVersion1405006"

SPR for Version 14.05.006

Line: 205 to 205
  Since SprOutputWriterApp can be called in the exact same way for any classifier and it is not necessary to remember to use the parameter -n 0, this method is simpler and leads to less confusion.
Added:
>
>

Significance Calculation

SPR was run in training-only mode
/usr/local/bin/bin/SprBaggerDecisionTreeApp  \
  -n 1100 -l 6 -s 33 -g 1 -y '0:1' -d 5 \
  -f bagger.spr -o baggerOutputTraining.root \
  data/baggerTraining.pat
and then the output writer was run for validation.
/usr/local/bin/bin/SprOutputWriterApp bagger.spr -y '0:1'  \
  data/baggerValidation.pat baggerOutputValidationOW.root
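
The two commands above can be chained in a small wrapper script. This is only a sketch: the names SPR_BIN, DRY_RUN, run, train and validate are illustrative and not part of SPR, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: train once with the bagger, then validate with the output writer.
# SPR_BIN and DRY_RUN are illustrative names, not SPR settings.
SPR_BIN=${SPR_BIN:-/usr/local/bin/bin}
DRY_RUN=${DRY_RUN:-1}

run() {
    # Always print the command; execute it only when DRY_RUN=0.
    echo "$@"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

train() {
    run "$SPR_BIN/SprBaggerDecisionTreeApp" \
        -n 1100 -l 6 -s 33 -g 1 -y '0:1' -d 5 \
        -f bagger.spr -o baggerOutputTraining.root \
        data/baggerTraining.pat
}

validate() {
    # Same argument order as the command shown above; note there is no -n.
    run "$SPR_BIN/SprOutputWriterApp" bagger.spr -y '0:1' \
        data/baggerValidation.pat baggerOutputValidationOW.root
}

# Validation runs only if training succeeded.
train && validate
```

Running it with DRY_RUN=0 executes both steps; the default dry run is meant for checking the arguments first.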

The significance was calculated using SignificanceAlt.py.
  -- PatRyan - 16 Apr 2009 \ No newline at end of file
Revision 6
22 Apr 2009 - PatRyan
Line: 1 to 1
 
META TOPICPARENT name="AnalysisVersion1405006"

SPR for Version 14.05.006

Line: 178 to 178
 

Alternative Method to Run Validation Only

Changed:
<
<
SprOutputWriterApp provides a simpler method of running only the validation. According to the README
>
>
SprOutputWriterApp provides a simpler method of running only the validation. According to the README
 
Changed:
<
<
_SprOutputWriterApp reads the saved configuration of any trained classifier,
>
>
SprOutputWriterApp reads the saved configuration of any trained classifier,
  computes classifier responses for the supplied data and saves output
Changed:
<
<
into a file. Prior to tag V05-00-00 this fuctionality was distributed among several executables. For example, to read the saved AdaBoost configuration
>
>
into a file. Prior to tag V05-00-00 this functionality was distributed among several executables. For example, to read the saved AdaBoost configuration
  from a file and apply it to data, the user would have to specify zero training cycles "-n 0" and a file to read from using "-r" option of the
Changed:
<
<
corresponding AdaBoost executable. SprOutputWriterApp is a unified recommended replacement._
>
>
corresponding AdaBoost executable. SprOutputWriterApp is a unified recommended replacement.
 

This program is run as
Line: 196 to 196
 

Note that there is no -n option. The output from this method is identical to the previous method
Changed:
<
<
of running SprBaggerDecisionTreeApp with the option -n 0. However, the classifier histogram is names Bagger in the output root file instead of bag as when running SprBaggerDecisionTreeApp.
>
>
of running SprBaggerDecisionTreeApp with the option -n 0. However, the classifier histogram is named Bagger in the output root file instead of bag as when running SprBaggerDecisionTreeApp.
 
Changed:
<
<
It is recommended to use SprOutputWriterApp since it is simpler and can be applied to the training output from all the classifiers.
>
>

Recommendations for Running

It is recommended to use SprBaggerDecisionTreeApp only for training and then use SprOutputWriterApp for validation. Since SprOutputWriterApp can be called in the exact same way for any classifier and it is not necessary to remember to use the parameter -n 0, this method is simpler and leads to less confusion.
 

-- PatRyan - 16 Apr 2009 \ No newline at end of file
Revision 5
21 Apr 2009 - PatRyan
Line: 1 to 1
 
META TOPICPARENT name="AnalysisVersion1405006"

SPR for Version 14.05.006

Line: 135 to 135
  Running validation only is done by
/usr/local/bin/bin/SprBaggerDecisionTreeApp  \
Changed:
<
<
-n 1100 -l 6 -s 33 -g 1 -y '0:1' -d 5
>
>
-n 0 -y '0:1'
  -r bagger.spr -o baggerOutputValidation.root data/baggerValidation.pat
Changed:
<
<
The option -r bagger.spr resumes running using the info stored in bagger.spr.
>
>
The option -r bagger.spr resumes running using the info stored in bagger.spr. For validation, the option -n 0 must be used. SprBaggerDecisionTreeApp accepts non-zero values for this option, but the output histogram then depends on the value of n. For a non-zero n, the output histogram changes slightly for different values of the option -s and greatly for different values of the option -l. In addition, the -n option is not available for SprOutputWriterApp (see below).
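
One way to avoid passing a non-zero -n by accident is a small guard around the validation arguments. The sketch below is an illustration only: validation_args is a hypothetical helper, not part of SPR, and it merely prints the argument list used above.

```shell
#!/bin/sh
# validation_args is a hypothetical helper, not part of SPR.
# It prints the validation arguments and refuses any non-zero cycle count.
validation_args() {
    n=${1:-0}
    if [ "$n" -ne 0 ]; then
        echo "refusing -n $n: validation must use -n 0" >&2
        return 1
    fi
    echo "-n 0 -y 0:1 -r bagger.spr -o baggerOutputValidation.root data/baggerValidation.pat"
}

# The default (no argument) is the safe choice.
validation_args

# A training-style cycle count is rejected with a non-zero exit status.
validation_args 1100 || echo "rejected"
```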
  The validation sample contains both negatively and positively weighted events and therefore has more events than the training sample, which has only positively weighted events. The stdout is shown below.
Line: 173 to 177
  Writer successfully closed.
Added:
>
>

Alternative Method to Run Validation Only

SprOutputWriterApp provides a simpler method of running only the validation. According to the README

_SprOutputWriterApp reads the saved configuration of any trained classifier, computes classifier responses for the supplied data and saves output into a file. Prior to tag V05-00-00 this fuctionality was distributed among several executables. For example, to read the saved AdaBoost configuration from a file and apply it to data, the user would have to specify zero training cycles "-n 0" and a file to read from using "-r" option of the corresponding AdaBoost executable. SprOutputWriterApp is a unified recommended replacement._

This program is run as
/usr/local/bin/bin/SprOutputWriterApp bagger.spr -y '0:1'  \
  data/baggerValidation.pat baggerOutputValidationOW.root

Note that there is no -n option. The output from this method is identical to the previous method of running SprBaggerDecisionTreeApp with the option -n 0. However, the classifier histogram is named Bagger in the output root file instead of bag as when running SprBaggerDecisionTreeApp.

It is recommended to use SprOutputWriterApp since it is simpler and can be applied to the training output from all the classifiers.
 

-- PatRyan - 16 Apr 2009 \ No newline at end of file
Revision 4
21 Apr 2009 - PatRyan
Line: 1 to 1
 
META TOPICPARENT name="AnalysisVersion1405006"

SPR for Version 14.05.006

Line: 99 to 99
  data/baggerTraining.pat
Changed:
<
<
The output file baggerOutputTraining.root is the same as baggerOutput.root produced above.
>
>
The output root file baggerOutputTraining.root is the same as baggerOutput.root produced above. The text output printed to stdout is below
Parsing File: data/baggerTraining.pat
Found file: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root start: 0 end: -1 class: 0 weight: 1
A variable determined weight has been chosen, the value assigned to 
        _EventWeight
 will be used for the weight.
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root for Tree: TopTreeBkg (1123 events)
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root for Tree: TopTreeSig (115 events)
Read data from file data/baggerTraining.pat for variables "_HT" "_Jet1Pt" "_DeltaRJet1Jet2" "_WTransverseMass"
Total number of points read: 1238
Training data filtered by class.
Points in class 0(1):   1123
Points in class 1(1):   115
Optimization criterion set to Gini index  -1+p^2+q^2 
Monitoring criterion set to Fraction of correctly classified events 
Per-event loss set to Quadratic loss (y-f(x))^2 
Decision tree initialized with minimal number of events per node 6
Decision tree will resample at most 4 features.
Using a Topdown tree.
Classes for Bagger are set to 0(1) 1(1)
Bagger initialized with classes 0(1) 1(1) with cycles 1100
Bagger finished training with 1100 classifiers.
Warning: file baggerOutputTraining.root will be deleted.
Feeder storing point 0 out of 1238
Feeder storing point 1000 out of 1238
Error in <TBasket::Create>: Cannot create key without file
Error in <TBasket::Create>: Cannot create key without file
Writer successfully closed.
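
The log above prints the optimization criterion as -1+p^2+q^2 without defining p and q. Assuming they are the two complementary class fractions in a node, so that p + q = 1 (an assumption, not stated in the log), the expression reduces to the usual two-class form of the Gini index:

```latex
-1 + p^2 + q^2 \;=\; (p+q)^2 - 2pq - 1 \;=\; -2pq
\qquad \text{when } p + q = 1 .
```

Under that assumption the criterion is 0 for a pure node (p = 0 or p = 1) and reaches its minimum of -1/2 for an evenly mixed node (p = q = 1/2).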
 

Run Validation Only

Running validation only is done by
Line: 111 to 141
 

The option -r bagger.spr resumes running using the info stored in bagger.spr.
Added:
>
>
The validation sample contains both negatively and positively weighted events and therefore has more events than the training sample, which has only positively weighted events. The stdout is shown below.

Parsing File: data/baggerValidation.pat
Found file: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root start: 0 end: -1 class: 0 weight: 1
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root for Tree: TopTreeBkg (1268 events)
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root for Tree: TopTreeSig (119 events)
Read data from file data/baggerValidation.pat for variables "_HT" "_Jet1Pt" "_DeltaRJet1Jet2" "_WTransverseMass"
Total number of points read: 1387
Training data filtered by class.
Points in class 0(1):   1268
Points in class 1(1):   119
Optimization criterion set to Gini index  -1+p^2+q^2 
Monitoring criterion set to Fraction of correctly classified events 
Per-event loss set to Quadratic loss (y-f(x))^2 
Decision tree initialized with minimal number of events per node 6
Decision tree will resample at most 4 features.
Using a Topdown tree.
Classes for Bagger are set to 0(1) 1(1)
Bagger initialized with classes 0(1) 1(1) with cycles 1100
Read saved Bagger from file bagger.spr with 1100 trained classifiers.
Bagger finished training with 2200 classifiers.
Warning: file baggerOutputValidation.root will be deleted.
Feeder storing point 0 out of 1387
Feeder storing point 1000 out of 1387
Error in <TBasket::Create>: Cannot create key without file
Error in <TBasket::Create>: Cannot create key without file
Writer successfully closed.
 

-- PatRyan - 16 Apr 2009
Revision 3
20 Apr 2009 - PatRyan
Line: 1 to 1
 
META TOPICPARENT name="AnalysisVersion1405006"

SPR for Version 14.05.006

Changed:
<
<
Run the boosted decision classifier.
/usr/local/bin/bin/SprBaggerDecisionTreeApp  \
  -n 1100 -l 6 -s 33 -g 1 -y '0:1' \
  -f bagger.spr -t data/comparisonValidation2.pat \
   data/comparisonTraining.pat -d 5 \
The options are as follows
  • -n 1100: number of Bagger training cycles
  • -l 6: minimal number of entries per tree leaf (def=0)
  • -s 33: max number of sampled features (def=0 no sampling)
  • -g 1: per-event loss for (cross-)validation is quadratic loss (y-f(x))^2
  • -y '0:1': list of input classes (I believe 0 corresponds to background and 1 to signal)
  • -f bagger.spr: store trained Bagger to file
  • -t data/baggerValidation.pat: read validation/test data from a file (must be in same format as input data!!!
  • data/baggerTraining.pat: The file used for training
  • -d 5: frequency of print-outs for validation data
>
>

Running Bagger

The bagger (decision tree) classifier was run for comparison purposes. The .pat files associated with this classifier are shown below. Note that a "_" is needed before the names of the leaves.
 

The contents of baggerValidation.pat are
Tree: TopTreeBkg TopTreeSig
TreeClass: 0 1
Leaves: _HT _Jet1Pt _DeltaRJet1Jet2 _WTransverseMass
Changed:
<
<
Weight: 1.
>
>
WeightVariable: _EventWeight
  File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root
Line: 34 to 20
  Tree: TopTreeBkg TopTreeSig
TreeClass: 0 1
Leaves: _HT _Jet1Pt _DeltaRJet1Jet2 _WTransverseMass
Changed:
<
<
Weight: 1.
>
>
WeightVariable: _EventWeight
  File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root
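
A .pat file with this layout can be generated rather than typed by hand. The sketch below assumes the five-line layout shown above is all that is needed; write_pat is a hypothetical helper, not part of SPR, and it writes to a temporary directory so nothing in data/ is touched.

```shell
#!/bin/sh
# write_pat is a hypothetical helper, not part of SPR.
# Usage: write_pat OUTPUT_PAT ROOT_FILE LEAF...
write_pat() {
    out=$1
    root_file=$2
    shift 2
    leaves=
    for name in "$@"; do
        # Per the note above, each leaf name needs a leading "_".
        leaves="$leaves _$name"
    done
    {
        echo "Tree: TopTreeBkg TopTreeSig"
        echo "TreeClass: 0 1"
        echo "Leaves:$leaves"
        echo "WeightVariable: _EventWeight"
        echo "File: $root_file"
    } > "$out"
}

tmp=$(mktemp -d)
write_pat "$tmp/baggerTraining.pat" \
    /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root \
    HT Jet1Pt DeltaRJet1Jet2 WTransverseMass
cat "$tmp/baggerTraining.pat"
```

Passing the leaf names without the underscore and letting the helper add it keeps the prefix rule in one place.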
Added:
>
>

Run Training and Validation Simultaneously

Run the boosted decision classifier.
/usr/local/bin/bin/SprBaggerDecisionTreeApp  \
  -n 1100 -l 6 -s 33 -g 1 -y '0:1' -d 5 \
  -f bagger.spr -t data/baggerValidation.pat -o baggerOutput.root \
   data/baggerTraining.pat
The options are as follows
  • -n 1100: number of Bagger training cycles (the number of decision trees)
  • -l 6: minimal number of entries per tree leaf (def=0) (entries per terminal node of the decision tree)
  • -s 33: max number of sampled features (def=0 no sampling)
  • -g 1: per-event loss for (cross-)validation is quadratic loss (y-f(x))^2
  • -y '0:1': list of input classes (0 corresponds to background and 1 to signal)
  • -d 5: frequency of print-outs for validation data
  • -f bagger.spr: store trained Bagger to file
  • -t data/baggerValidation.pat: read validation/test data from a file (must be in same format as input data!)
  • -o baggerOutput.root: output root file (is this validation or training)
  • data/baggerTraining.pat: The file used for training
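
The option list can be mirrored in a script so that each value is named where it is set. This is a sketch under the assumption that only the options listed above are needed; the variable names are illustrative and not SPR settings, and the script only prints the command.

```shell
#!/bin/sh
# Sketch only: the variable names are illustrative, not SPR settings.
cycles=1100        # -n: Bagger training cycles (number of decision trees)
min_leaf=6         # -l: minimal number of entries per tree leaf
max_features=33    # -s: max number of sampled features
loss=1             # -g: 1 = quadratic loss (y-f(x))^2
classes='0:1'      # -y: input classes, 0 = background, 1 = signal
print_every=5      # -d: frequency of print-outs for validation data

# Assemble the command as positional parameters so quoting is preserved.
set -- /usr/local/bin/bin/SprBaggerDecisionTreeApp \
    -n "$cycles" -l "$min_leaf" -s "$max_features" -g "$loss" \
    -y "$classes" -d "$print_every" \
    -f bagger.spr -t data/baggerValidation.pat -o baggerOutput.root \
    data/baggerTraining.pat

cmd="$*"
echo "$cmd"        # inspect first; run with "$@" once it looks right
```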

Part of the output is shown below. The training file has 1123 background events and 115 signal events, giving 1238 events in total. The validation file has 1268 background events and 119 signal events, giving 1387 events in total. The end of the output shows that 1238 events were put into the root file, which corresponds to the number of events in the training sample and not the validation sample. The output root file baggerOutput.root is the same as the output root file baggerOutputTraining.root produced below.

Parsing File: data/baggerTraining.pat
Found file: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root start: 0 end: -1 class: 0 weight: 1
A variable determined weight has been chosen, the value assigned to 
        _EventWeight
 will be used for the weight.
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root for Tree: TopTreeBkg (1123 events)
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root for Tree: TopTreeSig (115 events)
Read data from file data/baggerTraining.pat for variables "_HT" "_Jet1Pt" "_DeltaRJet1Jet2" "_WTransverseMass"
Total number of points read: 1238
Training data filtered by class.
Points in class 0(1):   1123
Points in class 1(1):   115
Parsing File: data/baggerValidation.pat
Found file: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root start: 0 end: -1 class: 0 weight: 1
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root for Tree: TopTreeBkg (1268 events)
Reading File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root for Tree: TopTreeSig (119 events)
Read validation data from file data/baggerValidation.pat for variables "_HT" "_Jet1Pt" "_DeltaRJet1Jet2" "_WTransverseMass"
Total number of points read: 1387
Validation data filtered by class.
Points in class 0(1):   1268
Points in class 1(1):   119
Optimization criterion set to Gini index  -1+p^2+q^2 
Monitoring criterion set to Fraction of correctly classified events 
Per-event loss set to Quadratic loss (y-f(x))^2 
Decision tree initialized with minimal number of events per node 6
Decision tree will resample at most 4 features.
Using a Topdown tree.
Classes for Bagger are set to 0(1) 1(1)
Bagger initialized with classes 0(1) 1(1) with cycles 1100
Validation FOM=0.0970811 at cycle 5
...
Bagger finished training with 1100 classifiers.
Feeder storing point 0 out of 1238
Feeder storing point 1000 out of 1238
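
The event counts in the log above can be cross-checked with a few lines of shell arithmetic. The numbers are copied from the log; the variable names are illustrative.

```shell
#!/bin/sh
# Event counts taken from the "Reading File ... (N events)" lines in the log.
train_bkg=1123
train_sig=115
valid_bkg=1268
valid_sig=119

train_total=$((train_bkg + train_sig))
valid_total=$((valid_bkg + valid_sig))

# Number of points reported by "Feeder storing point ... out of 1238".
written=1238

if [ "$written" -eq "$train_total" ]; then
    sample=training
elif [ "$written" -eq "$valid_total" ]; then
    sample=validation
else
    sample=unknown
fi

echo "training total:   $train_total"
echo "validation total: $valid_total"
echo "root file holds the $sample sample"
```

The match with the training total is what identifies baggerOutput.root as holding the training events rather than the validation events.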

Run Training and Validation Separately

Run Training Only

The only difference between running training separately and running training with validation, as shown above, is the lack of the command line option -t data/baggerValidation.pat.
/usr/local/bin/bin/SprBaggerDecisionTreeApp  \
  -n 1100 -l 6 -s 33 -g 1 -y '0:1' -d 5 \
  -f bagger.spr -o baggerOutputTraining.root \
  data/baggerTraining.pat

The output file baggerOutputTraining.root is the same as baggerOutput.root produced above.

Run Validation Only

Running validation only is done by
/usr/local/bin/bin/SprBaggerDecisionTreeApp  \
  -n 1100 -l 6 -s 33 -g 1 -y '0:1' -d 5 \
  -r bagger.spr -o baggerOutputValidation.root \
  data/baggerValidation.pat

The option -r bagger.spr resumes running using the info stored in bagger.spr.
  -- PatRyan - 16 Apr 2009 \ No newline at end of file
Revision 2
17 Apr 2009 - PatRyan
Line: 1 to 1
 
META TOPICPARENT name="AnalysisVersion1405006"

SPR for Version 14.05.006

Added:
>
>
Run the boosted decision classifier.
/usr/local/bin/bin/SprBaggerDecisionTreeApp  \
  -n 1100 -l 6 -s 33 -g 1 -y '0:1' \
  -f bagger.spr -t data/comparisonValidation2.pat \
   data/comparisonTraining.pat -d 5 \
The options are as follows
  • -n 1100: number of Bagger training cycles
  • -l 6: minimal number of entries per tree leaf (def=0)
  • -s 33: max number of sampled features (def=0 no sampling)
  • -g 1: per-event loss for (cross-)validation is quadratic loss (y-f(x))^2
  • -y '0:1': list of input classes (I believe 0 corresponds to background and 1 to signal)
  • -f bagger.spr: store trained Bagger to file
  • -t data/baggerValidation.pat: read validation/test data from a file (must be in same format as input data!!!
  • data/baggerTraining.pat: The file used for training
  • -d 5: frequency of print-outs for validation data
 
Added:
>
>
The contents of baggerValidation.pat are
Tree: TopTreeBkg TopTreeSig
TreeClass: 0 1
Leaves: _HT _Jet1Pt _DeltaRJet1Jet2 _WTransverseMass
Weight: 1.
File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Validation.root

The contents of baggerTraining.pat are
Tree: TopTreeBkg TopTreeSig
TreeClass: 0 1
Leaves: _HT _Jet1Pt _DeltaRJet1Jet2 _WTransverseMass
Weight: 1.
File: /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root
 

-- PatRyan - 16 Apr 2009 \ No newline at end of file
 