Analysis Comparisons for Code Version 13030
Run Over D3PDs using the MSU analysis code
The D3PDs are stored on the CERN cluster in the directory
The following samples were included in the analysis:
Only those files in each sample which ran without crashing the analysis code were used. The bad files are commented out
in the list files.
The output files from the MSU analysis code are stored on the CERN cluster in the directory
The samples in the directory
were processed without an event weight. The samples in the directory
were weighted so that the
number of events corresponds to a luminosity of 100 pb-1.
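The luminosity weighting can be sketched as follows. This is only an illustration, assuming the usual per-event weight of cross section times luminosity divided by the number of generated events. The function name, the cross section, and the event count are hypothetical placeholders, not values taken from this analysis or from the MSU analysis code.

```python
def luminosity_weight(cross_section_pb, n_generated, luminosity_pb_inv=100.0):
    """Per-event weight that scales a sample to the target luminosity.

    Assumes the standard normalisation w = sigma * L / N_generated.
    The real analysis code may apply further factors (for example
    filter efficiencies or generator weights) that are not modelled here.
    """
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    return cross_section_pb * luminosity_pb_inv / n_generated


# Hypothetical sample: 50 pb cross section, 20000 generated events.
w = luminosity_weight(cross_section_pb=50.0, n_generated=20000)
expected_events = w * 20000  # sigma * L for a luminosity of 100 pb-1
```

With such a weight, the weighted event count of a sample equals the number of events expected at 100 pb-1, independent of how many events were generated.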
Merge Signal and Background Files
The 3 signal files were combined into a single signal file called
The 3 background files were combined into a single background file called
Note that "Electron" is replaced by "Muon" for the muon chain in the file names.
The merging was done using the
routine located in the
in the MSU analysis package and
the merged files are in the directory
The merged files were divided into training, validation, and yield samples, which was achieved
by setting the
flag to 1 in the config file. The events were split according to their order in
the merged file. For example, the first event was categorized as training, the second as validation, the third as yield,
the fourth as training, and so on. Functionality to randomly assign events to the different categories will be added in the future.
Events with negative weights were excluded from the Training sample to retain compatibility with SPR, which cannot handle negative weights.
Validation samples were made both with and without negative-weight events. The files without negative-weight events are signified by the label
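The order-based split described above can be sketched in a few lines. This is not the code of the MSU analysis package. The event layout (a list of `(payload, weight)` tuples), the function name, and the dictionary keys are hypothetical, and the sketch assumes that events are first assigned by position and that negative weights are filtered afterwards.

```python
def split_events(events):
    """Assign events to samples by their order in the merged file.

    `events` is a list of (payload, weight) tuples, a hypothetical
    stand-in for the entries of the merged ROOT tree.
    """
    names = ("training", "validation", "yield")
    samples = {name: [] for name in names}
    # 1st event -> training, 2nd -> validation, 3rd -> yield, 4th -> training, ...
    for index, event in enumerate(events):
        samples[names[index % 3]].append(event)
    # Negative-weight events are dropped from training (SPR cannot handle them).
    samples["training"] = [e for e in samples["training"] if e[1] >= 0]
    # A second validation sample without negative-weight events is kept as well.
    samples["validation_noneg"] = [e for e in samples["validation"] if e[1] >= 0]
    return samples


events = [("e0", 1.0), ("e1", -1.0), ("e2", 1.0),
          ("e3", -1.0), ("e4", 1.0), ("e5", 1.0)]
samples = split_events(events)
```

A random assignment, as planned for the future, would replace `index % 3` with a draw from a seeded random number generator so that the split stays reproducible.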
Run TMVAnalysis over Training files
For comparison purposes, only the following variables were considered for training:
Jet1Pt, DeltaRJet1Jet2
TMVAnalysis.py was run over the Training sample of the merged root files. The following command was used to execute the python script:
python TMVAnalysis.py \
-S /home/root_files/single_top/TopPhysDPDMaker/13.0.30/FDR2/merged/Topology.SingleTop.13030.FDR2.Electron.Signal.Training.NoNeg.root \
-B /home/root_files/single_top/TopPhysDPDMaker/13.0.30/FDR2/merged/Topology.SingleTop.13030.FDR2.Electron.Background.Training.NoNeg.root \
-t "TopTree TopTree" \
-o TMVAout.root \
Other methods will be used in the future. Also, 1 signal event was used for signal validation
and 1 background event was used for background validation. This was achieved by the following line in TMVAnalysis.py:
factory.PrepareTrainingAndTestTree( mycutSig, mycutBkg, "NSigTrain=10000000000:NBkgTrain=100000000000:NSigTest=1:NBkgTest=1:SplitMode=Alternate:NormMode=NumEvents:!V" )
Using 0 or 1 NTest events instead of 2 events caused the program to crash. I'm not sure why using only 1 caused it to crash.
TMVAnalysis outputs the root file
and information in the
The text output from running the program can be found here: AnalysisTxt13030
Run TMVApplication over Validation files
TMVApplication.py was run over the Validation samples (with events having negative weights). Signal and background samples were run separately using the following commands:
python TMVApplication.py \
-i /home/root_files/single_top/TopPhysDPDMaker/13.0.30/FDR2/merged/Topology.SingleTop.13030.FDR2.Electron.Signal.Validation.root \
python TMVApplication.py \
-i /home/root_files/single_top/TopPhysDPDMaker/13.0.30/FDR2/merged/Topology.SingleTop.13030.FDR2.Electron.Background.Validation.root \
Note that the number of bins in the classifier outputs is 50. This is set by
nbin = 50
Significance.py was run to calculate the significance. For now, the significance is taken as Signal/sqrt(Background). A more accurate, and more complicated,
calculation of the significance will be performed in the future. Events were weighted by a factor of 3 to account for the splitting into training, validation, and yield samples.
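As a rough illustration of the Signal/sqrt(Background) estimate with the factor of 3 applied, consider the sketch below. It is not the content of Significance.py. The function name and the example weights are hypothetical, and the sketch assumes that the yields are sums of per-event weights from one of the three sub-samples.

```python
import math


def simple_significance(signal_weights, background_weights, split_factor=3.0):
    """Return Signal / sqrt(Background) from per-event weights of one sub-sample.

    split_factor rescales the sub-sample back to the full sample
    (3 for the training / validation / yield split).
    """
    s = split_factor * sum(signal_weights)
    b = split_factor * sum(background_weights)
    if b <= 0:
        raise ValueError("background yield must be positive")
    return s / math.sqrt(b)


# Hypothetical weights: 20 signal events of weight 0.5, 12 background events of weight 1.
sig = simple_significance([0.5] * 20, [1.0] * 12)
```

Because both yields scale by the same factor, the factor of 3 raises Signal/sqrt(Background) by sqrt(3) relative to the unscaled sub-sample.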
The classifier output distributions are shown below for signal and background.
The significance was calculated for different cuts on the classifier output. This was done in two ways:
once with the cut moving to the right across the values on the x-axis and once with the cut moving to the left.
Both are shown below.
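One plausible reading of the two scans is sketched below: for each bin edge, the significance is computed once from the bins kept above the cut and once from the bins kept below it. This interpretation, the function name, and the 4-bin example histograms are assumptions for illustration only. The actual histograms have 50 bins.

```python
import math


def scan_significance(signal_hist, background_hist):
    """Return Signal/sqrt(Background) per cut position for both cut directions.

    keep_right[i]: bins i..N-1 are kept (classifier output above the cut).
    keep_left[i]:  bins 0..i are kept (classifier output below the cut).
    Entries are None where the retained background is not positive.
    """
    if len(signal_hist) != len(background_hist):
        raise ValueError("histograms must have the same number of bins")

    def ratio(s, b):
        return s / math.sqrt(b) if b > 0 else None

    keep_right, keep_left = [], []
    for i in range(len(signal_hist)):
        keep_right.append(ratio(sum(signal_hist[i:]), sum(background_hist[i:])))
        keep_left.append(ratio(sum(signal_hist[:i + 1]), sum(background_hist[:i + 1])))
    return keep_right, keep_left


# Hypothetical 4-bin histograms of weighted event counts.
signal = [1.0, 2.0, 3.0, 6.0]
background = [9.0, 3.0, 3.0, 1.0]
keep_right, keep_left = scan_significance(signal, background)
```

For a classifier that places signal at high output values, the `keep_right` scan is the one expected to peak at a useful cut. The `keep_left` scan serves as a cross-check for classifiers with the opposite convention.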
Comparisons with Jenny
- 14 Nov 2008