For developers: Add new workflow
The BTV tutorial for the coffea part at notebooks/BTV_commissioning_tutorial-coffea.ipynb and the template to construct new workflow is located at src/BTVNanoCommissioning/workflows/example.py
0. Add new workflow info to workflows/__init__.py
# if the workflow is in a new files, you need to import your workflow.py
from BTVNanoCommissioning.workflows.new_workflow import (
NanoProcessor as new_WORKFLOWProcessor
)
# And then include the processor into the modules with the name of workflow. The name is used when specifying --workflow when running the jobs
workflows["name_workflow"] = new_WORKFLOWProcessor
# IF the workflow is based on the modifier to existing workflow, put the selectionModifier used in the existing workflow
workflows["ctag_ttsemilep_sf"] = partial(
CTAGWcTTValidSFProcessor, selectionModifier="semittM"
)
Notice that if you are working on a WP SFs, please put WP in the name.
1. Add histogram collections to utils/histogramming/histograms
The framework uses hist histograms, which can be easily to converted to ROOT histogram by uproot or to numpy histograms. For a quickstart to hist see this page.
There are a common axes located in utils/histogramming/axes/common.py, which are used by many of the workflows. You can define your own axes collection in the same directory and then add it to utils/histogramming/hist_helpers.py to make it available for the histogrammer. The name and dictionary key of the axis should be consistent with axes defined in the histogram.
An example collection of axes:
axes = {
"flav": hist.axis.IntCategory([0, 1, 4, 5, 6], name="flav", label="Genflavour"),
"syst": hist.axis.StrCategory([], name="syst", growth=True),
"pt": hist.axis.Regular(60, 0, 300, name="pt", label=" $p_{T}$ [GeV]"),
}
Regardless of the axis or histogram collection, the histograms need to be contained in a dict, and each histogram should at least contain as its first axis a syst_axis and a Hist.storage.Weight() as its last axis. The key for the histogram is suggested to use the format $OBJECT_$VAR, with OBJECT being an object defined during the workflow and $VAR a variable for that object, for example:
_hist_dict["mujet_pt"] = Hist.Hist(
syst_axis, flav_axis, dr_axis, Hist.storage.Weight()
) # create cutstomize histogram
After choosing and defining the axis and histogram collections you want to use in your workflow, use the histogrammer function to create the output histograms. Note that there are also common histogram collections for creating common histograms (common.py) and 4-vector histograms for a given obj_list (fourvec.py). An example histogrammer call could be:
output = histogrammer(
jet_fields=events.Jet.fields, # Pass the jet fields to check available jet variables
obj_list=["posl", "negl", "dilep", "jet0"], # Pass the objects to define 4-vector quantities for
hist_collections=["common", "fourvec", "DY"], # Choose the histogram collections to use
include_m=isMu, # Pass a collection specific kwarg
# This call uses the common axis collection
)
2. Selections: Implement selections on events (workflow/)
Create boolean arrays along event axis. Also check whether some common selctions already in utils/selection.py
## HLT- put trigger paths
triggers = [
"Mu23_TrkIsoVVL_Ele12_CaloIdL_TrackIdL_IsoVL_DZ",
"Mu12_TrkIsoVVL_Ele23_CaloIdL_TrackIdL_IsoVL_DZ",
"Mu8_TrkIsoVVL_Ele23_CaloIdL_TrackIdL_IsoVL_DZ",
]
req_trig = HLT_helper(events, triggers)
##### Add some selections
## Muon cuts
# muon twiki: https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideMuonIdRun2
muon_sel = (events.Muon.pt > 15) & (mu_idiso(events, self._campaign)) # applied selection on pT, and `mu_idiso` is predefined selection in `selection.py` which refers to cut-based tight muon ID+Iso
event_mu = events.Muon[muon_sel] # Pruned the muon collections with the selection
req_muon = ak.num(event_mu.pt) == 1 # Check each event has exact one muon
.... # other selections
## Apply all selections
event_level = (
req_trig & req_lumi & req_jet & req_muon & req_ele & req_leadlep_pt
)
In case you are modifying an existing workflow, you need to add to __init__ and in the selections create a selectionModifier:
# in init
self.selMod = selectionModifier
# In selection
if self.selMod=="WcM":
event_level = req_trig & req_lumi & req_jet & req_muon & req_ele & req_leadlep_pt& req_Wc
3. Selected objects: Pruned objects with reduced event_level
Store the selected objects to event-based arrays. If using the common histogram writer, the selected object must contain Sel, and the muon-enriched jet and soft muon are called MuonJet and SoftMu, respectively. You can then use the defined objects to derive new variables, ie.
# Keep the structure of events and pruned the object size
pruned_ev = events[event_level]
pruned_ev["SelJet"] = event_jet[event_level][:, 0]
pruned_ev["SelMuon"] = event_mu[event_level][:, 0]
pruned_ev["mujet_ptratio"] = event_mu[event_level].pt / pruned_ev.SelJet.pt # notice that the cross-object need to be created specificaly
pruned_ev["mujet_dr"] = event_mu[event_level].delta_r(pruned_ev.SelJet)
The pruned information is then stored into the defined histograms and the output arrays, and used to evaluate the event weights. In case you have a custom object for corrections, or a new common variable that you need to add, please go to dedicated section.
See details below for the usage of pruned_ev
Output section
####################
# Output #
####################
# Configure SFs - read pruned objects from the pruned_ev and apply SFs and call the systematics
weights = weight_manager(pruned_ev, self.SF_map, self.isSyst)
# Configure systematics shifts
if shift_name is None:
systematics = ["nominal"] + list(weights.variations) # nominal + weight variation systematics
else:
systematics = [shift_name] # JES/JER systematics
# Fill the weight to output arrys
if not isRealData:
pruned_ev["weight"] = weights.weight()
for ind_wei in weights.weightStatistics.keys():
pruned_ev[f"{ind_wei}_weight"] = weights.partial_weight(
include=[ind_wei]
)
# Configure histograms- fill the histograms with pruned objects
if not self.noHist:
output = histo_writter(
pruned_ev, output, weights, systematics, self.isSyst, self.SF_map
)
# Output arrays - store the pruned objects in the output arrays
if self.isArray:
array_writer(self, pruned_ev, events, weights, systematics,dataset, isRealData)
4. Setup CI pipeline .github/workflow
The actions are checking the changes would break the framework. The actions are collected in .github/workflow
You can simply include a workflow by adding the entries with name
- name: semileptonic + c ttbar workflows with correctionlib
run: |
string=$(git log -1 --pretty=format:'%s')
if [[ $string == *"ci:skip array"* ]]; then
opts=$(echo "$opts" | sed 's/--isArray //g')
fi
if [[ $string == *"ci:skip syst"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all//g')
elif [[ $string == *"ci:JEC_reduced"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst JEC_reduced/g')
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow c_ttsemilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative --campaign Summer23 --year 2023 $opts
Special commit head messages could run different commands in actions (add the flag in front of your commit) The default configureation is doing
python runner.py --workflow emctag_ttdilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative --campaign Summer23 --isArray --isSyst all
[skip ci]: not running ci at all in the commit messageci:skip array: remove--isArrayoptionci:skip syst: remove--isSyst alloptionci:JEC_reduced: change systematic option to split JEC uncertainty sources--isSyst JEC_reducedci:weight_only: change systematic option to weight only variations--isSyst weight_only
Set CI in your github account
Since the CI pipelines involve reading files via xrootd and access gitlab.cern.ch, you need to save some secrets in your forked directory.
Yout can find the secret configuration in the direcotry : Settings>>Secrets>>Actions, and create the following secrets:
GIT_CERN_SSH_PRIVATE:Create a ssh key pair with
ssh-keygen -t rsa -b 4096(do not overwrite with your local one), add the public key to your CERN gitlab accountCopy the private key to the entry
GRID_PASSWORD: Add your grid password to the entry.GRID_USERCERT&GRID_USERKEY: Encrypt your grid user certificationbase64 -i ~/.globus/userkey.pem | awk NF=NF RS= OFS=andbase64 -i ~/.globus/usercert.pem | awk NF=NF RS= OFS=and copy the output to the entry.
5. Refine used MC as input sample.py
The sample.py collects the samples (dataset name) used in the workflow. This collections are use to create the dataset json file.
data: data sample (MuonEG, Muon0…)MC: main MC used for the workflowminor_MC: minor MC samples use for background eventssyst_MC: systematic MC samples (TTbar sample mass, Hdamp … variations)
Here’s the example for BTA_ttbar
"BTA_ttbar": {
"data": ["MuonEG"],
"MC": ["TTto2L2Nu_TuneCP5_13p6TeV_powheg-pythia8"],
"minor_MC": [
"TTtoLNu2Q_TuneCP5_13p6TeV_powheg-pythia8",
"TWminusto2L2Nu_TuneCP5_13p6TeV_powheg-pythia8",
"TbarWplusto2L2Nu_TuneCP5_13p6TeV_powheg-pythia8",
"TWminustoLNu2Q_TuneCP5_13p6TeV_powheg-pythia8",
"TbarWplustoLNu2Q_TuneCP5_13p6TeV_powheg-pythia8",
"TbarBQ_t-channel_4FS_TuneCP5_13p6TeV_powheg-madspin-pythia8",
"TBbarQ_t-channel_4FS_TuneCP5_13p6TeV_powheg-madspin-pythia8",
"WWto2L2Nu_TuneCP5_13p6TeV_powheg-pythia8",
"ZZto2L2Q_TuneCP5_13p6TeV_powheg-pythia8",
"WZto3LNu_TuneCP5_13p6TeV_powheg-pythia8",
"WZtoLNu2Q_TuneCP5_13p6TeV_powheg-pythia8",
],
"syst_MC": [
"TTto2L2Nu_MT-171p5_TuneCP5_13p6TeV_powheg-pythia8",
"TTto2L2Nu_MT-175p5_TuneCP5_13p6TeV_powheg-pythia8",
"TTto2L2Nu_Hdamp-158_TuneCP5_13p6TeV_powheg-pythia8",
"TTto2L2Nu_Hdamp-418_TuneCP5_13p6TeV_powheg-pythia8",
"TTto2L2Nu_TuneCP5Down_13p6TeV_powheg-pythia8",
"TTto2L2Nu_TuneCP5Up_13p6TeV_powheg-pythia8",
],
},
Optional changes
Add workflow to scripts/suball.py
The suball.py summarize the steps to obtain the result.
In case your task requires to run several workflows, you can wrapped them as dict of the workflows
scheme = {
# scale factor workflows
"SF": ["BTA_ttbar", "BTA_addPFMuons"],
# Use for prompt data MC checks for analysis
"Validation": ["ttdilep_sf", "ctag_Wc_sf"],
# commissioning workflows
"default_comissioning": [
"ttdilep_sf",
"ttsemilep_sf",
"ctag_Wc_sf",
"ctag_DY_sf",
"QCD_sf",
"QCD_mu_sf"
],
}
Add new common variables in helper/definition.py
In the definition.py we collect the axis definition, name and label of tagger scores/input variables
disc_list=[....] # tagger score
definitions_dict = {
"DeepCSV_jetNSelectedTracks": # name used in tree
{
"displayname": "Jet N Selected Tracks", # axis name
"manual_ranges": [0.0, 25],
"ylabel_text": "Jets",
"format_unit": "2f",
"format_unit_digits": 2,
"bins": 25,
"inputVar_units": None,
},
...
}
Additional corrections and uncertainty variations not in the framework
The corrections are collected in utils/correction.py. There are two types of the variation: weight varations, i.e. SFs, ueps weight, or object energy scale/resolution variations: JES/JER. Here’s an example to add new corrections
Add new info
utils/AK4_parameter.py
"JPCalib": {
"Run2023D-22Sep2023_v1": "calibeHistoWrite_Data2023D-22Sep2023_v1.root",
"Run2023D-22Sep2023_v2": "calibeHistoWrite_Data2023D-22Sep2023_v2.root",
"MC": "calibeHistoWrite_MC2023_Summer23BPix.root",
},
Add new collections to
load_SFinutils/correction.pyDepends on corrections file type, read differently from its definition. See details in: correctionlib official, or other custom way used in coffea. This load all the correction information asevaluatorcan be use to extract weight information later
for SF in config[campaign].keys():
if SF == "DC":
continue
## pileup weight
if SF == "LUM":
## Check whether files in jsonpog-integration exist
if os.path.exists(
f"/cvmfs/cms.cern.ch/rsync/cms-nanoAOD/jsonpog-integration/POG/LUM/{year}_{campaign}"
):
correct_map["LUM"] = correctionlib.CorrectionSet.from_file(
f"/cvmfs/cms.cern.ch/rsync/cms-nanoAOD/jsonpog-integration/POG/LUM/{year}_{campaign}/puWeights.json.gz"
)
## Otherwise custom files
else:
_pu_path = f"BTVNanoCommissioning.data.PU.{campaign}"
with importlib.resources.path(
_pu_path, config[campaign]["LUM"]
) as filename:
if str(filename).endswith(".pkl.gz"):
with gzip.open(filename) as fin:
correct_map["LUM"] = cloudpickle.load(fin)[
"2017_pileupweight"
]
elif str(filename).endswith(".json.gz"):
correct_map["LUM"] = correctionlib.CorrectionSet.from_file(
str(filename)
)
elif str(filename).endswith(".histo.root"):
ext = extractor()
ext.add_weight_sets([f"* * {filename}"])
ext.finalize()
correct_map["LUM"] = ext.make_evaluator()
3.1 Add weight based correction
In the utils/correction create the reader to get the weight
Create your function to readout the weight from the evaluator stored in the correction_map, the idea is to add the weight/systematic information to the event and return to the workflow
def btagSFs(jet, correct_map, weights, SFtype, syst=False):
.....
if i == 0 and syst == False:
weights.add(SFtype, sfs)
if syst == True:
weights.add_multivariation(
SFtype,
sfs,
systlist,
np.array(list(sfs_up_all.values())),
np.array(list(sfs_down_all.values())),
)
# in case you only have the up/down variation
weights.add(
SFtype,# name of the weight
sfs,# nominal SFs
sf_unc_up,#up varition
sf_unc_down, #down varition
)
In case it’s a common correction, add to the weight_manager in utils/correction otherwise directly to the workflow (weight based)
def weight_manager(pruned_ev, SF_map, isSyst):
weights = Weights(len(pruned_ev), storeIndividual=True)
...
btagSFs(pruned_ev.SelJet, SF_map, weights, "DeepJetC", syst_wei)
btagSFs(pruned_ev.SelJet, SF_map, weights, "DeepJetB", syst_wei)
btagSFs(pruned_ev.SelJet, SF_map, weights, "DeepCSVB", syst_wei)
btagSFs(pruned_ev.SelJet, SF_map, weights, "DeepCSVC", syst_wei)
return weights
3.2 Add object variations
For the object scale / resolution variation we shift object energy/positions as a list of shifts to the original object.
The shifts is a list of shifted object after the corrctions are applied to the objects
# JES uncertainty
if "JES_Total" in jets.fields:
shifts += [
(
{
"Jet": jets.JES_Total.up, # change the objects to JES up variation
"MET": met.JES_Total.up,
},
"JESUp", # the uncertainty variation name
),
(
{
"Jet": jets.JES_Total.down,
"MET": met.JES_Total.down,
},
"JESDown",
),
]
In case the shifts are in common, put to common_shifts:
if "JME" in self.SF_map.keys():
shifts = JME_shifts(
shifts, self.SF_map, events, self._year, self._campaign, isRealData, self.isSyst,
)
else:
## Use PuppiMET if available (NanoAODv15), otherwise fall back to PFMET
if hasattr(events, "PuppiMET"):
shifts = [
(
{
"Jet": events.Jet,
"MET": events.PuppiMET,
},
None,
)
]
else:
shifts = [
(
{
"Jet": events.Jet,
"MET": events.MET,
},
None,
)
]
otherwise in your workflow process(self, events) add new shifts
def process(self, events):
events = missing_branch(events)
shifts = common_shifts(self, events)
shifts+=[({obj:variation},name)]