TMVA for Version 14.05.006

Ranking the Classifiers

The effectiveness of a classifier is determined by both its Signal Efficiency and its Background Rejection. Given a plot of Background Rejection vs. Signal Efficiency, there are two ways to rank the classifiers. The first is by the length of a straight line connecting the upper-right-most point of the classifier curve to the upper-right corner of the plot at (1,1); the second is by the area under the classifier curve. A shorter line or a larger area indicates a better classifier.
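
The two ranking metrics can be sketched in a few lines of Python. This is a minimal illustration, not the ranking program used for the results on this page: the function names and the sample curve are hypothetical, the area is estimated with the trapezoidal rule, and the line length is assumed to be the distance from (1,1) to the nearest point on the curve.

```python
import math

def area_under_curve(points):
    """Trapezoidal area under a list of (signal_eff, bkg_rejection) points."""
    pts = sorted(points)  # order by signal efficiency
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area

def distance_to_corner(points):
    """Shortest distance from any curve point to the ideal point (1, 1)."""
    return min(math.hypot(1.0 - x, 1.0 - y) for x, y in points)

# Hypothetical three-point curve, for illustration only
curve = [(0.0, 1.0), (0.5, 0.8), (1.0, 0.0)]
print(area_under_curve(curve))    # larger is better
print(distance_to_corner(curve))  # smaller is better
```

Sorting classifiers by descending area or by ascending distance then gives the two rankings.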

To produce the file used for classifier ranking, the python script was run over the merged root files. Note that these are the merged files containing all events, not those split into training, validation, and yield samples.

In the script, half the events were assigned to training and the other half to validation by the line
factory.PrepareTrainingAndTestTree( mycutSig, mycutBkg, "NSigTrain=0:NBkgTrain=0::NSigTest=0:NBkgTest=0:SplitMode=Alternate:NormMode=NumEvents:!V" )
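
As an illustration of an alternating half-and-half split, the sketch below deals events alternately into two lists. It is a plain-Python stand-in, not TMVA code; the function name and the choice of which list receives the first event are assumptions.

```python
def alternate_split(events):
    """Deal events alternately into a training list and a testing list."""
    training = events[0::2]  # events 0, 2, 4, ...
    testing = events[1::2]   # events 1, 3, 5, ...
    return training, testing

# Hypothetical sample of 7 event indices
train, test = alternate_split(list(range(7)))
print(train)  # [0, 2, 4, 6]
print(test)   # [1, 3, 5]
```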

The kNN classifier was not used because it generated the following fatal error message:
 <FATAL> KNN            : kNN result list is empty or has wrong size

The following command was used to execute the script:
python \
   -i /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.root \
   -t "TopTreeSig TopTreeBkg" \
   -o "TMVAout.ClassifierRanking.1405006.root"

The ranking program produces a plot showing Background Rejection vs. Signal Efficiency and ranks the classifiers. The plot and the rankings are shown below. Both rankings place PDERS, CutsGA, and Fisher in the top 3.


Area Under the Curve
Rank  Method         Area
1     PDERS          0.668661
2     CutsGA         0.659969
3     Fisher         0.643790
4     LikelihoodPCA  0.535585
5     SVM_Gauss      0.528220
6     BDT            0.493489
7     Likelihood     0.491559
8     BDTD           0.490003
9     MLP            0.434129
10    HMatrix        0.413259
11    RuleFit        0.393389
12    FDA_MT         0.355496

Length of Line to Upper Right Corner
Rank  Method         Length
1     PDERS          0.475490
2     CutsGA         0.538328
3     Fisher         0.538659
4     BDT            0.609361
5     LikelihoodPCA  0.633015
6     Likelihood     0.633601
7     SVM_Gauss      0.675473
8     MLP            0.708466
9     BDTD           0.740976
10    HMatrix        0.744161
11    FDA_MT         0.783945
12    RuleFit        0.793519

Run TMVAnalysis over Training files

TMVAnalysis was run over the Training sample of the merged root files. The following command was used to execute the python script:
python \
   -i /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/CombineSigBkg/Topology.SingleTop.1405006.FDR2.Electron.Training.NoNeg.root  \
   -t "TopTreeSig TopTreeBkg" \
   -o "TMVAout.1405006.root"

2 signal events were used for signal validation and 2 background events were used for background validation. This was achieved by the following line in TMVAnalysis:
factory.PrepareTrainingAndTestTree( mycutSig, mycutBkg, "NSigTrain=113:NBkgTrain=1121::NSigTest=2:NBkgTest=2:SplitMode=Alternate:NormMode=NumEvents:!V" )
Using 0 events for NSigTest and NBkgTest causes the program to crash. It runs successfully with 1 event, but this leads to a division by 0 in the significance calculation below. Note that NSigTrain + NSigTest and NBkgTrain + NBkgTest must be less than or equal to the actual number of signal and background events, respectively, in the root file. If a number larger than the actual number of events is used, the program dies. If an absurdly large number, such as 10000000, is used, the program uses an equal number of training and testing events.
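
The constraints described here can be checked before the option string is handed to TMVA. The helper below is a hypothetical sketch: the function name, the argument names, and the available-event counts in the example are assumptions, not taken from the analysis script. It only builds the string and enforces the rules reported in this section for explicitly specified counts.

```python
def build_split_options(n_sig_train, n_bkg_train, n_sig_test, n_bkg_test,
                        n_sig_available, n_bkg_available):
    """Return a PrepareTrainingAndTestTree option string or raise ValueError."""
    # 0 test events crashed and 1 led to a division by 0, so require 2 or more
    if n_sig_test < 2 or n_bkg_test < 2:
        raise ValueError("use at least 2 test events per sample")
    # Requested totals must not exceed what the root file contains
    if n_sig_train + n_sig_test > n_sig_available:
        raise ValueError("NSigTrain + NSigTest exceeds available signal events")
    if n_bkg_train + n_bkg_test > n_bkg_available:
        raise ValueError("NBkgTrain + NBkgTest exceeds available background events")
    # Separators (including the double colon) are kept as written on this page
    return ("NSigTrain=%d:NBkgTrain=%d::NSigTest=%d:NBkgTest=%d:"
            "SplitMode=Alternate:NormMode=NumEvents:!V"
            % (n_sig_train, n_bkg_train, n_sig_test, n_bkg_test))

# Hypothetical available counts: 115 signal and 1123 background events
options = build_split_options(113, 1121, 2, 2, 115, 1123)
print(options)
```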

TMVAnalysis outputs the root file TMVAout.1405006.root and information in the weights directory.

The text output from running the program can be found here: AnalysisTxt1405006

Run TMVApplication over Validation files

TMVApplication was run over the Validation samples (with events having negative weights). Signal and background samples were run separately using the following commands.

 python \
   -i /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/merged/Topology.SingleTop.1405006.FDR2.Electron.Signal.Validation.NoNeg.root \
   -o Signal.1405006.root 

 python \
   -i /home/root_files/single_top/TopPhysDPDMaker/14.05.006/EarlyData/merged/Topology.SingleTop.1405006.FDR2.Electron.Background.Validation.NoNeg.root \
   -o Background.1405006.root 

Note that the number of bins in the classifier outputs is 50. This is set by nbin = 50.

Calculate Significance

A script was run to calculate the significance. For now, the significance is taken as Signal/sqrt(Background). A more accurate, and more complicated, calculation of the significance will be performed in the future. Events were weighted by a factor of 3 to account for the splitting into training, validation, and yield samples. The script was run as
python -S Signal.1405006.root -B Background.1405006.root -w 3 -o SignificanceOutput.1405006.root
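
For reference, the simple significance estimate with a uniform event weight can be written out as follows. This is a minimal sketch with hypothetical names and example yields, not the script invoked above. Because both yields are scaled by the same weight w, the result is sqrt(w) times the unweighted value.

```python
import math

def significance(n_signal, n_background, weight=1.0):
    """Return S/sqrt(B) after scaling both yields by a common weight."""
    s = weight * n_signal
    b = weight * n_background
    if b <= 0:
        raise ValueError("background yield must be positive")
    return s / math.sqrt(b)

# Hypothetical yields, for illustration only
print(significance(10, 25))            # 10 / sqrt(25) = 2.0
print(significance(10, 25, weight=3))  # 30 / sqrt(75), i.e. sqrt(3) * 2.0
```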

The classifier output distributions are shown below for signal and background.

The significance was calculated for different cuts on the classifier output. This was done in two ways: by cutting as you move right across the values on the x-axis, and by cutting as you move left across the values on the x-axis. Both are shown below.
significanceBDT.Hist.Right.png significanceBDT.Hist.Left.png
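
The two scan directions can be sketched with partial sums over the histogram bins. The code below is an illustration under assumptions: the function name and the toy bin contents are hypothetical, one direction keeps everything at or above each bin and the other keeps everything at or below it, and which of these corresponds to the "Right" and "Left" plots is left open here. Cuts that leave no background are reported as None.

```python
import math

def scan_significance(sig_bins, bkg_bins, keep_above=True):
    """S/sqrt(B) for a cut placed at each bin.

    keep_above=True keeps bins i..end; keep_above=False keeps bins 0..i.
    """
    results = []
    for i in range(len(sig_bins)):
        if keep_above:
            s, b = sum(sig_bins[i:]), sum(bkg_bins[i:])
        else:
            s, b = sum(sig_bins[:i + 1]), sum(bkg_bins[:i + 1])
        # Avoid dividing by zero when no background survives the cut
        results.append(s / math.sqrt(b) if b > 0 else None)
    return results

# Toy 3-bin histograms (the analysis itself used 50 bins)
sig = [1.0, 2.0, 6.0]
bkg = [16.0, 5.0, 4.0]
above = scan_significance(sig, bkg, keep_above=True)
below = scan_significance(sig, bkg, keep_above=False)
print(above)
print(below)
```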

Comparisons with Jenny

The two analyses have complete agreement at this point.

-- PatRyan - 03 Mar 2009
Topic revision: r13 - 16 Oct 2009, TomRockwell