For users: Running the workflows


All in one script: scripts/suball.py

All the steps for the existing workflows are combined in the suball.py script. You can simply run

python scripts/suball.py --scheme ${SCHEME_FOR_STUDY} --campaign ${CAMPAIGN_FOR_SF} --year ${YEAR}  --DAS_campaign "$DATA_CAMPAIGN_RGX,$MC_CAMPAIGN_RGX"
#Example with 2023 Summer23 campaign
python scripts/suball.py --scheme default_comissioning --campaign Summer23 --year 2023  --DAS_campaign "*Run2023D*Sep2023*,*Run3Summer23BPixNanoAODv12-130X*" 
#Example with 2024 Summer24 campaign in NanoAODv15
python scripts/suball.py --scheme Validation --campaign Summer24 --year 2024  --DAS_campaign "*Run2024*BTV*,*Summer24NanoAODv15-BTV*" 

This wraps up the steps described below into a single streamlined command that produces the required outputs.

The only step you still need to do manually is to update the corrections in AK4_parameters.py (see step 1 below). Each step is explained in detail in the following sections.

0. Make the dataset json files

Use fetch.py in the scripts/ folder to obtain the sample json files for a predefined workflow with the refined MC. For more flexible usage, see the detailed description of the fetch script.

The fetch script reads the predefined data and MC dataset names and writes the json file to metadata/$CAMPAIGN/. To find the exact datasets for BTV studies, we usually need to specify the DAS_campaign.

python scripts/fetch.py -c {campaign} --year {args.year}  --from_workflow {wf} --DAS_campaign {DAS_campaign} {--overwrite} {--executor futures -w 12}
# campaign : the campaign name, e.g. Summer23, Winter22
# year : data-taking year, e.g. 2022, 2023, ...
# wf : workflow name, e.g. ttdilep_sf, ctag_Wc_sf
# DAS_campaign : campaign name pattern(s) used to search DAS for the appropriate datasets and to construct the dataset names; pass several as `campaign1,campaign2,campaign3`. Also supports "auto" (hard-coded!) if campaign and year are specified.
# overwrite (bool) : recreate the json file if it already exists
# Use `--executor futures -j 12 -w 12` to parallelize with, e.g., 12 cores
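
The wildcard patterns passed to --DAS_campaign can be previewed offline. The sketch below only illustrates glob-style matching of comma-separated patterns. It assumes shell-like wildcard semantics, the helper name select_datasets is hypothetical, and the dataset names are illustrative placeholders rather than DAS query results. How fetch.py actually queries DAS is not shown here.

```python
from fnmatch import fnmatchcase


def select_datasets(dataset_names, das_campaign):
    """Return the names matching any of the comma-separated wildcard patterns."""
    patterns = [p.strip() for p in das_campaign.split(",") if p.strip()]
    return [
        name
        for name in dataset_names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Illustrative dataset names only (assumption, not taken from DAS).
candidates = [
    "/Muon0/Run2023D-22Sep2023_v1-v1/NANOAOD",
    "/Muon0/Run2023C-22Sep2023_v1-v1/NANOAOD",
    "/TTto2L2Nu/Run3Summer23BPixNanoAODv12-130X_v1/NANOAODSIM",
]
selected = select_datasets(
    candidates, "*Run2023D*Sep2023*,*Run3Summer23BPixNanoAODv12-130X*"
)
for name in selected:
    print(name)
```

The first pattern selects the data era and the second the MC campaign, which mirrors the "$DATA_CAMPAIGN_RGX,$MC_CAMPAIGN_RGX" convention used above.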

Caution

Keep each file list below 4k files to avoid scale-out issues at various sites (file open limit).
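
If a sample json grows beyond that limit, it can be split before running. The following is a minimal sketch, assuming the json maps each dataset name to a list of file paths. The helper names count_files and split_fileset are hypothetical and not part of the framework.

```python
def count_files(fileset):
    """Total number of files in a {dataset: [files]} dictionary."""
    return sum(len(files) for files in fileset.values())


def split_fileset(fileset, max_files=4000):
    """Split {dataset: [files]} into several dicts with at most max_files files each."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    chunks, current, n_current = [], {}, 0
    for dataset, files in fileset.items():
        start = 0
        while start < len(files):
            if n_current == max_files:
                # the current chunk is full, start a new one
                chunks.append(current)
                current, n_current = {}, 0
            piece = files[start:start + (max_files - n_current)]
            current.setdefault(dataset, []).extend(piece)
            n_current += len(piece)
            start += len(piece)
    if current:
        chunks.append(current)
    return chunks


# Toy example with a limit of 4 files per chunk.
toy = {"A": [f"a{i}.root" for i in range(5)], "B": [f"b{i}.root" for i in range(3)]}
parts = split_fileset(toy, max_files=4)
print([count_files(part) for part in parts])
```

Each resulting dictionary can be written to its own json file and processed separately.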

Tip

If gfal-ls does not work on your machine, reset the gfal-python with GFAL_PYTHONBIN=/usr/bin/python3.

Quick exercise — fetch example (Summer24)

Try running the following command:

python scripts/fetch.py -c Summer24 --year 2024 -wf ttdilep_sf --DAS_campaign "*Run2024*-BTV*,*Summer24NanoAODv15-BTV*" --skipvalidation --overwrite 

If you need more information, run with the --verbose flag. Can you spot the difference if you run with --executor futures? Also try a different workflow, for example ctag_DY_sf. Do you see some samples missing? The 2024 DY samples are split by lepton flavor, so you have to replace the present samples in src/BTVNanoCommissioning/utils/sample.py with "DYto2E-4Jets_Bin-MLL-50_TuneCP5_13p6TeV_madgraphMLM-pythia8" and "DYto2Mu-4Jets_Bin-MLL-50_TuneCP5_13p6TeV_madgraphMLM-pythia8". Please also add them to src/BTVNanoCommissioning/helpers/xsection.py.

1. Correction files configurations & add new correction files (Optional)

If the correction files are not supported yet by jsonpog-integration, you can still try with custom input data.

All the lumiMask, correction files (SFs, pileup weight), and JEC, JER files are under BTVNanoCommissioning/src/data/ following the substructure ${type}/${year}_${campaign}/${files} (except DC and Prescales).

| Type      | File type              | Comments                                           |
|-----------|------------------------|----------------------------------------------------|
| DC        | .json                  | Masked good lumi-section used for physics analysis |
| Prescales | .json                  | HLT paths for prescaled triggers                   |
| PU        | .pkl.gz or .histo.root | Pileup reweight files, matched MC to data          |
| MUO       | .histo.root            | Muon ID/Iso/Reco/Trigger SFs                       |
| EGM       | .histo.root            | Electron ID/Iso/Reco/Trigger SFs                   |
| BTV       | .csv or .root          | b-tagger, c-tagger SFs                             |
| JME       | .txt                   | JER, JEC files                                     |
| JPCalib   | .root                  | Jet probability calibration, used in LTSV methods  |

Create a dict entry under correction_config with dedicated campaigns in BTVNanoCommissioning/src/utils/AK4_parameters.py.

The official correction files collected in jsonpog-integration are updated by the POGs, except lumiMask and JME, which are still updated by the BTVNanoCommissioning framework users/developers. For centrally maintained correction files, no input files have to be defined in the correction_config anymore. An example of how to implement new corrections from a POG can be found in git, and the contents of the correction files are listed in the summary.

Example of a Run 2 corrections dictionary:

"2017_UL": {
    # Same with custom config
    "DC": "Cert_294927-306462_13TeV_UL2017_Collisions17_MuonJSON.json",
    "JME": {
        "MC": "Summer19UL17_V5_MC",
        "Run2017F": "Summer19UL17_RunF_V5_DATA",
    },
    ### Alternatively (use instead of the "JME" entry above), take the txt files in https://github.com/cms-jet/JECDatabase/tree/master/textFiles
    "JME": {
        # specify the name of the JEC
        "name": "V1_AK4PFPuppi",
        # dictionary of JEC text files
        "MC": [
            "Summer23Prompt23_V1_MC_L1FastJet_AK4PFPuppi",
            "Summer23Prompt23_V1_MC_L2Relative_AK4PFPuppi",
            "Summer23Prompt23_V1_MC_L2Residual_AK4PFPuppi",
            "Summer23Prompt23_V1_MC_L3Absolute_AK4PFPuppi",
            "Summer23Prompt23_V1_MC_UncertaintySources_AK4PFPuppi",
            "Summer23Prompt23_V1_MC_Uncertainty_AK4PFPuppi",
            "Summer23Prompt23_JRV1_MC_SF_AK4PFPuppi",
            "Summer23Prompt23_JRV1_MC_PtResolution_AK4PFPuppi",
        ],
        "dataCv123": [
            "Summer23Prompt23_RunCv123_V1_DATA_L1FastJet_AK4PFPuppi",
            "Summer23Prompt23_RunCv123_V1_DATA_L2Relative_AK4PFPuppi",
            "Summer23Prompt23_RunCv123_V1_DATA_L3Absolute_AK4PFPuppi",
            "Summer23Prompt23_RunCv123_V1_DATA_L2L3Residual_AK4PFPuppi",
        ],
        "dataCv4": [
            "Summer23Prompt23_RunCv4_V1_DATA_L1FastJet_AK4PFPuppi",
            "Summer23Prompt23_RunCv4_V1_DATA_L2Relative_AK4PFPuppi",
            "Summer23Prompt23_RunCv4_V1_DATA_L3Absolute_AK4PFPuppi",
            "Summer23Prompt23_RunCv4_V1_DATA_L2L3Residual_AK4PFPuppi",
        ],
    },
    ###
    # no config needs to be specified for PU weights
    "LUM": None,
    # Alternatively (use instead of the "LUM" entry above), take a root file as input
    "LUM": "puwei_Summer23.histo.root",
    # Btag SFs - specify $TAGGER : $TYPE -> find [$TAGGER_$TYPE] in json file
    "BTV": {"deepCSV": "shape", "deepJet": "shape"},
    "roccor": None,
    # JMAR, IDs from JME - following the scheme: "${SF_name}": "${WP}"
    "JMAR": {"PUJetID_eff": "L"},
    "EGM": {
        # Electron SF - following the scheme: "${SF_name} ${year}": "${WP}"
        # https://github.com/cms-egamma/cms-egamma-docs/blob/master/docs/EgammaSFJSON.md
        "ele_ID 2017": "wp90iso",
        "ele_Reco 2017": "RecoAbove20",
    },
    "MUO": {
        # Muon SF - following the scheme: "${SF_name} ${year}": "${WP}"
        "mu_Reco 2017_UL": "NUM_TrackerMuons_DEN_genTracks",
        "mu_HLT 2017_UL": "NUM_IsoMu27_DEN_CutBasedIdTight_and_PFIsoTight",
        "mu_ID 2017_UL": "NUM_TightID_DEN_TrackerMuons",
        "mu_Iso 2017_UL": "NUM_TightRelIso_DEN_TightIDandIPCut",
    },
    # used for BTA production, jet probability
    "JPCalib": {
        "Run2022E": "calibeHistoWrite_Data2022F_NANO130X_v1.root",
        "Run2022F": "calibeHistoWrite_Data2022F_NANO130X_v1.root",
        "Run2022G": "calibeHistoWrite_Data2022G_NANO130X_v1.root",
        "MC": "calibeHistoWrite_MC2022EE_NANO130X_v1.root",
    },
},

Quick exercise — corrections example (Summer24)

Try leaving out lepton scale factors and JME corrections here:

"Summer24": {
        "DC": "Cert_Collisions2024_378981_386951_Golden.json",
        "LUM": "PU_weights_Summer24.histo.root",
        "JME": {                ###<--- Try running without it
            # TODO: JER are a placeholder for now (July 2025)
            "MC": "Summer24Prompt24_V1 Summer23BPixPrompt23_RunD_JRV1",
            "Run2024C": "Summer24Prompt24_V1",
            "Run2024D": "Summer24Prompt24_V1",
            "Run2024E": "Summer24Prompt24_V1",
            "Run2024F": "Summer24Prompt24_V1",
            "Run2024G": "Summer24Prompt24_V1",
            "Run2024H": "Summer24Prompt24_V1",
            "Run2024I": "Summer24Prompt24_V1",
        },
        "jetveto": {"Summer24Prompt24_RunBCDEFGHI_V1": "jetvetomap"},
        "MUO": {                  ###<--- Try running without it
            "mu_ID": "NUM_TightID_DEN_TrackerMuons",
            "mu_Iso": "NUM_TightPFIso_DEN_TightID",
        },
        "EGM":{
            "ele_Reco 2024 Electron-ID-SF": "",
            "ele_ID 2024 Electron-ID-SF": "wp80iso",
        },
        "muonSS": "",
        "electronSS": ["EGMScale_Compound_Ele_2024", "EGMSmearAndSyst_ElePTsplit_2024"],
    }

Can you spot the difference in distributions?
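
For this exercise it can be convenient to derive the reduced configuration from the full one instead of editing it by hand. The sketch below is only an illustration: without_corrections is a hypothetical helper, and the dictionary is a shortened stand-in for the "Summer24" entry shown above.

```python
import copy


def without_corrections(config, keys_to_drop):
    """Return a deep copy of a campaign entry with the given correction keys removed."""
    reduced = copy.deepcopy(config)
    for key in keys_to_drop:
        reduced.pop(key, None)  # silently ignore keys that are not configured
    return reduced


# Shortened stand-in for the "Summer24" entry (assumption: same key layout).
summer24 = {
    "DC": "Cert_Collisions2024_378981_386951_Golden.json",
    "LUM": "PU_weights_Summer24.histo.root",
    "JME": {"MC": "Summer24Prompt24_V1 Summer23BPixPrompt23_RunD_JRV1"},
    "MUO": {"mu_ID": "NUM_TightID_DEN_TrackerMuons"},
}
no_jme_muo = without_corrections(summer24, ["JME", "MUO"])
print(sorted(no_jme_muo))
```

The original entry is left untouched, so both variants can be run and compared.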

2. Run the workflow to get coffea files

The runner.py script handles the options to select the workflow with the dedicated configuration for each campaign. The minimum required information is

python runner.py --wf {wf} --json metadata/{args.campaign}/{types}_{args.campaign}_{args.year}_{wf}.json {overwrite} --campaign {args.campaign} --year {args.year} 

Tip

  • To just test your program, you can limit the run to one file and one chunk with the iterative executor, which avoids overwritten error messages: --max 1 --limit 1 --executor iterative

  • To run only a particular sample from your json, use --only $dataset_name, e.g. --only TT* or --only MuonEG_Run2023A

  • Change the number of scale-out jobs with -s $NJOB

  • Store the arrays by setting the flag --isArray

  • Modify the chunk size in case the jobs are too big: --chunk $N_EVENTS_PER_CHUNK

  • Sometimes the global redirector is insufficient. You can increase the number of retries (only in parsl/dask) with --retries 30, or skip the bad files with --skipbadfiles and later reprocess the missing information by creating a json with the skipped files. Methods to create these json files are discussed in step 3.
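
The minimal command and the test options from the tips can be assembled programmatically, for instance when looping over several workflows. This is a sketch under stated assumptions: runner_command is a hypothetical helper, it only builds the argument list and executes nothing, and sample_type stands for the {types} placeholder in the json file name ("data" or "MC" in the examples of this page).

```python
import shlex


def runner_command(wf, campaign, year, sample_type, test=False, only=None):
    """Compose the runner.py call as an argument list (nothing is executed)."""
    json_path = f"metadata/{campaign}/{sample_type}_{campaign}_{year}_{wf}.json"
    cmd = [
        "python", "runner.py",
        "--wf", wf,
        "--json", json_path,
        "--campaign", campaign,
        "--year", str(year),
    ]
    if test:
        # one file, one chunk, iterative executor for readable error messages
        cmd += ["--max", "1", "--limit", "1", "--executor", "iterative"]
    if only is not None:
        cmd += ["--only", only]
    return cmd


cmd = runner_command("ctag_DY_sf", "Summer24", 2024, "data", test=True, only="TT*")
print(shlex.join(cmd))  # shlex.join quotes the wildcard for the shell
```

The list can be passed to subprocess.run as is, which avoids shell expansion of patterns such as TT*.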

Other runner options

### ====> REQUIRED <=======
# --wf {validation,ttdilep_sf,ttsemilep_sf,c_ttsemilep_sf,emctag_ttdilep_sf,ctag_ttdilep_sf,ectag_ttdilep_sf,ctag_ttsemilep_sf,ectag_ttsemilep_sf,QCD_sf,QCD_smu_sf,ctag_Wc_sf,ectag_Wc_sf,ctag_DY_sf,ectag_DY_sf,BTA,BTA_addPFMuons,BTA_addAllTracks,BTA_ttbar}, --workflow {validation,ttdilep_sf,ttsemilep_sf,c_ttsemilep_sf,emctag_ttdilep_sf,ctag_ttdilep_sf,ectag_ttdilep_sf,ctag_ttsemilep_sf,ectag_ttsemilep_sf,QCD_sf,QCD_smu_sf,ctag_Wc_sf,ectag_Wc_sf,ctag_DY_sf,ectag_DY_sf,BTA,BTA_addPFMuons,BTA_addAllTracks,BTA_ttbar}
#                         Which processor to run
#   -o OUTPUT, --output OUTPUT
#                         Output histogram filename (default: hists.coffea)
#   --json SAMPLEJSON     JSON file containing dataset and file locations (default: dummy_samples.json)
#   --year YEAR           Year
#   --campaign CAMPAIGN   Dataset campaign, change the corresponding correction files

#=======Optional======
# ==> configurations for storing info
#   --isSyst {False,all,weight_only,JEC_full,JEC_reduced,JEC_reduced_JER_split,JEC_total,JP_MC}
#                         Run with systematics (default: False)
#   --isArray             Output root files
#   --noHist              Not output coffea histogram
#   --overwrite           Overwrite existing files
# ==> scale out options
#   --executor {iterative,futures,parsl/slurm,parsl/condor,parsl/condor/naf_lite,dask/condor,dask/condor/brux,dask/slurm,dask/lpc,dask/lxplus,dask/casa,condor_standalone}
#                         The type of executor to use (default: futures). Other options can be implemented. For example see https://parsl.readthedocs.io/en/stable/userguide/configuring.html
#                         - `parsl/slurm` - tested at DESY/Maxwell
#                         - `parsl/condor` - tested at DESY, RWTH
#                         - `parsl/condor/naf_lite` - tested at DESY
#                         - `dask/condor/brux` - tested at BRUX (Brown U)
#                         - `dask/slurm` - tested at DESY/Maxwell
#                         - `dask/condor` - tested at DESY, RWTH
#                         - `dask/lpc` - custom lpc/condor setup (due to write access restrictions)
#                         - `dask/lxplus` - custom lxplus/condor setup (due to port restrictions)
#   -j WORKERS, --workers WORKERS
#                         Number of workers (cores/threads) to use for multi-worker executors (e.g. futures or condor) (default: 3)
#   -s SCALEOUT, --scaleout SCALEOUT
#                         Number of nodes to scale out to if using slurm/condor. Total number of concurrent threads is ``workers x scaleout`` (default: 6)
#   --memory MEMORY       Memory used in jobs (default: 4.0)
#   --disk DISK           Disk used in jobs (default: 4)
#   --voms VOMS           Path to voms proxy, made accessible to worker nodes. By default a copy will be made to $HOME.
#   --chunk N             Number of events per process chunk
#   --retries N           Number of retries for coffea processor
#   --fsize FSIZE         (Specific for dask/lxplus file splitting, default: 50) Numbers of files processed per dask-worker
#   --index INDEX         (Specific for dask/lxplus file splitting, default: 0,0) Format: $dict_index_start,$file_index_start,$dict_index_stop,$file_index_stop. Stop indices are optional.
#                         $dict_index refers to the index of the sample dictionary in the samples json file. $file_index refers to the N-th batch
#                         of files per dask-worker, with its size being defined by the option --fsize. The job will start (stop) submission from (with) the corresponding indices.
# ==> debug option
#   --validate            Do not process, just check all files are accessible
#   --skipbadfiles        Skip bad files.
#   --only ONLY           Only process specific dataset or file
#   --limit N             Limit to the first N files of each dataset in sample JSON
#   --max N               Max number of chunks to run in total
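
Two of the scale-out options above involve a bit of bookkeeping: the total number of concurrent threads is workers x scaleout, and --index packs up to four integers into one string. The sketch below illustrates both. parse_index and total_threads are hypothetical helpers, not the runner's own parsing, and the sketch assumes the stop indices are given either both or not at all.

```python
def parse_index(index="0,0"):
    """Split '$dict_start,$file_start[,$dict_stop,$file_stop]' into integer pairs."""
    parts = [int(p) for p in index.split(",")]
    if len(parts) not in (2, 4):
        raise ValueError("expected 2 or 4 comma-separated integers")
    start = (parts[0], parts[1])
    stop = (parts[2], parts[3]) if len(parts) == 4 else None
    return start, stop


def total_threads(workers=3, scaleout=6):
    """Total concurrent threads for the -j and -s defaults listed above."""
    return workers * scaleout


print(parse_index("1,2,3,4"))
print(total_threads())
```

With the defaults (-j 3, -s 6) this gives 18 concurrent threads.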

Quick exercise — runner example (Summer24)

Try running the following command:

python runner.py --wf ctag_DY_sf --json metadata/Summer24/data_Summer24_2024_ctag_DY_sf.json --overwrite --campaign Summer24 --year 2024 --outputdir Commissioning_tutorial/ 

If you make a test run to see if the code works properly, please add --limit 1 as a flag. Otherwise, try applying scale out on some cluster.

3. Dump processed information to obtain luminosity and processed files

After obtaining the .coffea file, we can check the processed files and obtain the luminosity in the processed files.

To get the run and luminosity information for the processed events from the .coffea output files, use the scripts/dump_processed.py script. It dumps the processed luminosity sections into a json file that can be used by the brilcalc tool to calculate the processed luminosity. A list of failed lumi sections can then be obtained by comparing the original json input to the one from the .coffea files. The script reports the luminosity in /pb and writes the files skipped by the runner flag --skipbadfiles to a new json that is ready for resubmission.

python scripts/dump_processed.py -t all -c INPUT_COFFEA --json ORIGINAL_JSON_INPUT -n {args.campaign}_{args.year}_{wf}
#   -t {all,lumi,failed}, --type {all,lumi,failed}
#                         Choose the function for dump luminosity(`lumi`)/failed files(`failed`) into json
#   -c COFFEA, --coffea COFFEA
#                         Processed coffea files, splitted by ,. Wildcard option * available as well.
#   -n FNAME, --fname FNAME
#                         Output name of jsons(with _lumi/_dataset)
#   -j JSONS, --jsons JSONS
#                         Original input json files, splitted by ,. Wildcard option * available as well.

Quick exercise — postprocessing example (Summer24)

Try running the following command:

python scripts/dump_processed.py -c Commissioning_tutorial/hists_ctag_DY_sf_data_Summer24_2024_ctag_DY_sf/hists_ctag_DY_sf_data_Summer24_2024_ctag_DY_sf.coffea -n ctag_DY_sf_2024 -t lumi 

If you encounter any problems when running the script, make sure you have activated the BRIL environment properly!
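
The comparison between the original json input and the processed files, which yields the resubmission json, boils down to a per-dataset set difference. A minimal sketch of that idea follows. find_unprocessed is a hypothetical helper, both inputs are assumed to map dataset names to lists of file paths, and the file names are placeholders. The actual logic lives in scripts/dump_processed.py.

```python
def find_unprocessed(original, processed):
    """Return {dataset: [files]} listed in the input json but missing from the processed record."""
    missing = {}
    for dataset, files in original.items():
        done = set(processed.get(dataset, []))
        left = [f for f in files if f not in done]
        if left:
            missing[dataset] = left
    return missing


# Placeholder file names for illustration.
original = {
    "MuonEG_Run2023A": ["f1.root", "f2.root", "f3.root"],
    "TT": ["t1.root"],
}
processed = {
    "MuonEG_Run2023A": ["f1.root", "f3.root"],
    "TT": ["t1.root"],
}
resubmit = find_unprocessed(original, processed)
print(resubmit)
```

Datasets with nothing left to process are dropped, so the result can be used directly as the --json input of a follow-up run.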

4. Obtain data/MC plots

We can obtain data/MC plots from coffea via the scripts/plotdataMC.py plotting script. For other possible plotting scripts see plotting scripts.

You can specify -v all to plot all the variables in the .coffea file, or use wildcard options, e.g. -v "*DeepJet*" for the input variables containing DeepJet.

New: non-uniform rebinning is possible. Specify the bins as a list of edges, e.g. --autorebin 50,80,81,82,83,100.5.

python scripts/plotdataMC.py -i a.coffea,b.coffea --lumi 41500 -p ttdilep_sf -v z_mass,z_pt  
python scripts/plotdataMC.py -i "test*.coffea" --lumi 41500 -p ttdilep_sf -v z_mass,z_pt # with wildcard option need ""
Options:

  -h, --help            show this help message and exit
  --lumi LUMI           luminosity in /pb
  --com COM             sqrt(s) in TeV
  -p {ttdilep_sf,ttsemilep_sf,ctag_Wc_sf,ctag_DY_sf,ctag_ttsemilep_sf,ctag_ttdilep_sf}, --phase {ttdilep_sf,ttsemilep_sf,ctag_Wc_sf,ctag_DY_sf,ctag_ttsemilep_sf,ctag_ttdilep_sf}
                        which phase space
  --log LOG             log on y axis
  --norm NORM           Use for reshape SF, scale to same yield as no SFs case
  -v VARIABLE, --variable VARIABLE
                        variables to plot, splitted by ,. Wildcard option * available as well. Specifying `all` will run through all variables.
  --SF                  make w/, w/o SF comparisons
  --ext EXT             prefix name
  -i INPUT, --input INPUT
                        input coffea files (str), splitted different files with ','. Wildcard option * available as well.
   --autorebin AUTOREBIN
                        Rebin the plotting variables, input `int` or `list`. int: merge N bins. list of number: rebin edges(non-uniform bin is possible)
   --xlabel XLABEL      rename the label for x-axis
   --ylabel YLABEL      rename the label for y-axis
   --splitOSSS SPLITOSSS 
                        Only for W+c phase space, split opposite sign(1) and same sign events(-1), if not specified, the combined OS-SS phase space is used
   --xrange XRANGE      custom x-range, --xrange xmin,xmax
   --flow FLOW 
                        str, optional {None, 'show', 'sum'} Whether plot the under/overflow bin. If 'show', add additional under/overflow bin. If 'sum', add the under/overflow bin content to first/last bin.
   --split {flavor,sample,sample_flav}
                        Decomposition of MC samples. Default is split to jet flavor(udsg, pu, c, b), possible to split by group of MC
                        samples. Combination of jetflavor+ sample split is also possible 
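
To see what non-uniform rebinning with a list of edges does to the bin contents, here is a small stand-alone sketch. It is not how plotdataMC.py implements --autorebin. rebin_counts is a hypothetical helper, and it assumes every new edge coincides with an existing bin edge.

```python
def rebin_counts(edges, counts, new_edges):
    """Merge bin contents onto new_edges; every new edge must be an existing edge."""
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than bins")
    positions = []
    for edge in new_edges:
        if edge not in edges:
            raise ValueError(f"{edge} is not an existing bin edge")
        positions.append(edges.index(edge))
    if any(lo >= hi for lo, hi in zip(positions, positions[1:])):
        raise ValueError("new_edges must be strictly increasing")
    # bins outside [new_edges[0], new_edges[-1]] are dropped here
    return [sum(counts[lo:hi]) for lo, hi in zip(positions, positions[1:])]


edges = [0, 10, 20, 30, 40, 50]
counts = [1, 2, 3, 4, 5]
merged = rebin_counts(edges, counts, [0, 20, 30, 50])
print(merged)  # [3, 3, 9]
```

The first two bins and the last two bins are merged, while the bin from 20 to 30 keeps its own content.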

Quick exercise — plotting example (Summer24)

Try running the following command:

python scripts/plotdataMC.py -i Commissioning_tutorial/hists_ctag_DY_sf_data_Summer24_2024_ctag_DY_sf/hists_ctag_DY_sf_data_Summer24_2024_ctag_DY_sf.coffea,Commissioning_tutorial/hists_ctag_DY_sf_MC_Summer24_2024_ctag_DY_sf/hists_ctag_DY_sf_MC_Summer24_2024_ctag_DY_sf.coffea --lumi YOUR_LUMI -p ctag_DY_sf -v jet0_pt

Run with -v all if you want the full collection of variables. Try exploring different ways to present your plots, for example by splitting them by sample or using a log scale.

Reading coffea hist

Quick tutorial to go through coffea files. Example coffea files can be found in testfile/.

Structure of the file

The coffea file contains histograms wrapped in a dictionary with $dataset:{$histname:hist}, where the hist is a hist histogram that allows multidimensional binning with different datatypes, for example:

{'WW_TuneCP5_13p6TeV-pythia8': {
    'btagDeepFlavB_b_0': Hist(
        IntCategory([0, 1, 4, 5, 6], name='flav', label='Genflavour'),
        IntCategory([1, -1], name='osss', label='OS(+)/SS(-)'),
        StrCategory(['noSF'], growth=True, name='syst'),
        Regular(30, -0.2, 1, name='discr', label='btagDeepFlavB_b'),
        storage=Weight()),  # Sum: WeightedSum(value=140, variance=140)
    'btagDeepFlavB_bb_0': Hist(
        IntCategory([0, 1, 4, 5, 6], name='flav', label='Genflavour'),
        IntCategory([1, -1], name='osss', label='OS(+)/SS(-)'),
        StrCategory(['noSF'], growth=True, name='syst'),
        Regular(30, -0.2, 1, name='discr', label='btagDeepFlavB_bb'),
        storage=Weight()),  # Sum: WeightedSum(value=140, variance=140)
    'btagDeepFlavB_lepb_0': Hist(
        IntCategory([0, 1, 4, 5, 6], name='flav', label='Genflavour'),
        IntCategory([1, -1], name='osss', label='OS(+)/SS(-)'),
        StrCategory(['noSF'], growth=True, name='syst'),
        Regular(30, -0.2, 1, name='discr', label='btagDeepFlavB_lepb'),
        storage=Weight()),  # Sum: WeightedSum(value=140, variance=140)
}}

The processed files and the lumi/run information are also stored for each dataset in the file. This information is used by scripts/dump_processed.py.

The stored histograms are multidimensional histograms, with the axes such as:

Hist(
  IntCategory([0, 1, 4, 5, 6], name='flav', label='Genflavour'),# different genflavor, 0 for light, 1 for PU, 2 for c, 3 for b. Always 0 for data.
  IntCategory([1, -1], name='osss', label='OS(+)/SS(-)'),# opposite sign or same sign, only appears in W+c workflow
  StrCategory(['noSF','PUUp','PUDown'], growth=True, name='syst'),# systematics variations,
  Regular(30, -0.2, 1, name='discr', label='btagDeepFlavB_lepb'),# discriminator distribution, the last axis is always the variable
  storage=Weight()) # Sum: WeightedSum(value=140, variance=140)# value is the sum of the weights, variance is the sum of the squared weights.
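
The WeightedSum printed for a Weight() storage can be reproduced by hand: the value accumulates the event weights and the variance accumulates their squares. The sketch below uses plain Python rather than the hist library, and weighted_sum is a hypothetical helper for illustration.

```python
def weighted_sum(weights):
    """Return (value, variance) the way a weighted storage accumulates them."""
    value = sum(weights)                     # sum of weights
    variance = sum(w * w for w in weights)   # sum of squared weights
    return value, variance


# 140 entries with unit weight reproduce WeightedSum(value=140, variance=140)
print(weighted_sum([1.0] * 140))
# with non-unit weights the two numbers differ
print(weighted_sum([0.5] * 4))
```

For unweighted entries value and variance coincide, which is why both are 140 in the example above. Once scale factors or generator weights are applied they differ.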

Read coffea files and explore the histogram

from coffea.util import load
import hist  # provides hist.rebin, used below

# open the coffea file
output = load("hists_ctag_Wc_sf_VV.coffea")
# get the histogram and read the info (named h to avoid shadowing the hist module)
h = output['WW_TuneCP5_13p6TeV-pythia8']['btagDeepFlavB_lepb_0']
# adding histograms is possible if their axes are identical
histvv = (
    output['WW_TuneCP5_13p6TeV-pythia8']['btagDeepFlavB_lepb_0']
    + output['WZ_TuneCP5_13p6TeV-pythia8']['btagDeepFlavB_lepb_0']
    + output['ZZ_TuneCP5_13p6TeV-pythia8']['btagDeepFlavB_lepb_0']
)
# To get a 1D histogram, we need to reduce the dimension.
# We can specify the bin to read on each axis by index, e.g. charm jets (index 2), opposite-sign events (index 0) with noSF
axis = {'flav': 2, 'osss': 0, 'syst': 'noSF'}
hist1d = h[axis]  # --> this is the 1D histogram Hist
# you can also sum over an axis, e.g. no jet flavor split and OS+SS summed
axis = {'flav': sum, 'osss': sum, 'syst': 'noSF'}
# rebinning an axis is also possible, e.g. merge every two bins of the discriminator into one
axis = {'flav': sum, 'osss': sum, 'syst': 'noSF', 'discr': hist.rebin(2)}

Plot the histogram

You can simply plot the histogram using mplhep

import mplhep as hep
import matplotlib.pyplot as plt
# set the plot style like tdr style
plt.style.use(hep.style.ROOT)
# make 1D histogram plot
hep.histplot(hist1d)

Convert coffea hist to ROOT TH1

You can use scripts/make_template.py to convert the histograms in the coffea files into 1D/2D ROOT histograms:

python scripts/make_template.py -i $INPUT_COFFEA --lumi $LUMI_IN_invPB -o $ROOTFILE_NAME -v $VARIABLE -a $HIST_AXIS 
## Example
python scripts/make_template.py -i "testfile/*.coffea" --lumi 7650 -o test.root -v mujet_pt -a '{"flav":0,"osss":"sum"}' --mergemap 
#####
# -h, --help            show this help message and exit
#  -i INPUT, --input INPUT
#                        Input coffea file(s), can be a regular expression contains '*'
#  -v VARIABLE, --variable VARIABLE
#                        Variables to store(histogram name)
#  -a AXIS, --axis AXIS  dict, put the slicing of histogram, specify 'sum' option as string
#  --lumi LUMI           Luminosity in /pb , normalize the MC yields to corresponding luminosity
#  -o OUTPUT, --output OUTPUT
#                        output root file name
#  --mergemap MERGEMAP   Specify mergemap as dict, '{merge1:[dataset1,dataset2]...}' Also works with the json file with dict
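
The -a/--axis argument is a JSON string in which the word "sum" stands for summing over an axis. A minimal sketch of how such a string could be turned into a slicing dictionary is shown below. parse_axis is a hypothetical helper, and the actual parsing inside make_template.py may differ.

```python
import json


def parse_axis(axis_string):
    """Turn the axis JSON string into a slicing dict, mapping "sum" to the builtin sum."""
    raw = json.loads(axis_string)
    return {
        name: (sum if value == "sum" else value)
        for name, value in raw.items()
    }


axis = parse_axis('{"flav":0,"osss":"sum"}')
print(axis)
```

The resulting dictionary has the same form as the axis dictionaries used for slicing in the section on reading coffea files.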

#### EXAMPLE MERGEMAP
{
    "WJets": ["WJetsToLNu_TuneCP5_13p6TeV-madgraphMLM-pythia8"],
    "VV": [ "WW_TuneCP5_13p6TeV-pythia8", "WZ_TuneCP5_13p6TeV-pythia8", "ZZ_TuneCP5_13p6TeV-pythia8"],
    "TT": [ "TTTo2J1L1Nu_CP5_13p6TeV_powheg-pythia8", "TTTo2L2Nu_CP5_13p6TeV_powheg-pythia8"],
    "ST":[ "TBbarQ_t-channel_4FS_CP5_13p6TeV_powheg-madspin-pythia8", "TbarWplus_DR_AtLeastOneLepton_CP5_13p6TeV_powheg-pythia8", "TbarBQ_t-channel_4FS_CP5_13p6TeV_powheg-madspin-pythia8", "TWminus_DR_AtLeastOneLepton_CP5_13p6TeV_powheg-pythia8"],
    "data": [ "Muon_Run2022C-PromptReco-v1", "SingleMuon_Run2022C-PromptReco-v1", "Muon_Run2022D-PromptReco-v1", "Muon_Run2022D-PromptReco-v2"]
}
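
Conceptually, a mergemap adds up the histograms of all datasets that belong to the same group. The sketch below illustrates this with plain numbers standing in for histograms, since anything that supports + can be merged the same way. merge_by_map is a hypothetical helper, the output layout $dataset:{$histname:hist} follows the description in "Structure of the file", and the short dataset keys are placeholders.

```python
from functools import reduce
import operator


def merge_by_map(output, mergemap, histname):
    """Add up output[dataset][histname] for every group defined in mergemap."""
    merged = {}
    for group, datasets in mergemap.items():
        found = [
            output[d][histname]
            for d in datasets
            if d in output and histname in output[d]
        ]
        if found:  # skip groups without any matching dataset
            merged[group] = reduce(operator.add, found)
    return merged


# Numbers stand in for histograms; dataset keys are shortened placeholders.
output = {
    "WW": {"mujet_pt": 1.0},
    "WZ": {"mujet_pt": 2.0},
    "ZZ": {"mujet_pt": 3.0},
    "WJets_sample": {"mujet_pt": 10.0},
}
mergemap = {"VV": ["WW", "WZ", "ZZ"], "WJets": ["WJets_sample"], "TT": ["not_processed"]}
merged = merge_by_map(output, mergemap, "mujet_pt")
print(merged)
```

Groups whose datasets are absent from the coffea output are skipped rather than raising an error, which is a design choice of this sketch.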