Scale-out (Sites & file split scheme)

Scaling out can be notoriously tricky because sites differ. Coffea's integration of slurm and dask makes this considerably easier, and for some sites the "native" implementation is sufficient, e.g. Condor@DESY. However, some sites impose their own restrictions, in particular Condor@CERN and Condor@FNAL. The scale-out scheme is named as follows: $cluster_schedule_system/scheduler/site. The existing sites are documented under the sites configuration; standalone condor submission is also possible and is strongly recommended when working on lxplus.

File splitting scheme

The framework provides several file split schemes; the latter two below can be combined to speed up processing.

All in one is the default on any non-lxplus cluster. It avoids the complication of merging several files at the plotting step.
Dataset split can be used to save intermediate outputs, but the input json files need to be created by the user.
File split is the default on the lxplus cluster (see the Condor@CERN (lxplus) section). The file list of each dataset in the json file is split into chunks of a fixed number of files (--fsize), and the chunks are processed sequentially and automatically.

You can also combine dataset split and file split to parallelize the work over several processes with quick jobs (note that on lxplus you need to log in to different machines because of the port restriction).
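As an illustration of how dataset split and file split combine, the following minimal sketch enumerates the resulting work units. It is not the framework's implementation: `split_files`, the toy `fileset`, and the file names are hypothetical, and only the idea of cutting each dataset's file list into chunks of `fsize` files is taken from the text above.

```python
def split_files(fileset, fsize):
    """Yield (dataset, chunk_index, files) for every chunk of at most fsize files.

    fileset mirrors the layout of the input json: {dataset_name: [file, ...]}.
    """
    for dataset, files in fileset.items():
        for start in range(0, len(files), fsize):
            yield dataset, start // fsize, files[start:start + fsize]


# Toy input standing in for the json file (names are placeholders).
fileset = {
    "DatasetA": [f"a_{i}.root" for i in range(5)],
    "DatasetB": ["b_0.root"],
}

units = list(split_files(fileset, fsize=2))
for dataset, chunk_index, files in units:
    print(dataset, chunk_index, len(files))
```

Each tuple is an independent unit of work, so units from different datasets (or different chunks) can in principle be handed to separate processes.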

Memory usage is also worth checking when adapting the job size to a cluster. Measure it by calling memory_usage_psutil() from helpers.func.memory_usage_psutil. An example with ectag_Wc_sf is summarized below.

Type	Array+Hist	Hist only	Array Only
DoubleMuon (BTA,BTV_Comm_v2)	1243MB	848MB	1249MB
DoubleMuon (PFCands, BTV_Comm_v1)	1650MB	1274MB	1632MB
DoubleMuon (Nano_v11)	1183MB	630MB	1180MB
WJets_inc (BTA,BTV_Comm_v2)	1243MB	848MB	1249MB
WJets_inc (PFCands, BTV_Comm_v1)	1650MB	1274MB	1632MB
WJets_inc (Nano_v11)	1183MB	630MB	1180MB
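To take a comparable measurement outside the framework, a rough stand-in can be sketched with the Python standard library. This is not the framework's memory_usage_psutil(): it uses the Unix-only `resource` module instead of psutil, it reports the peak resident memory of the process rather than the current value, and `peak_memory_mb` is a hypothetical name.

```python
import resource
import sys


def peak_memory_mb():
    """Return the peak resident set size of this process in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_memory_mb()
payload = [0] * 5_000_000  # stand-in for loading arrays or filling histograms
after = peak_memory_mb()
print(f"peak memory before: {before:.0f} MB, after: {after:.0f} MB")
```

Calling such a function before and after the heavy step gives a first estimate of how much memory a job needs on the cluster.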

Sites configuration with dask/parsl scheduler

dask: Condor@FNAL (CMSLPC)

Follow the setup instructions at https://github.com/CoffeaTeam/lpcjobqueue. After starting the singularity container, run:

python runner.py --wf ttcom --executor dask/lpc

dask: Condor@CERN (lxplus)

Only one port is available per node, so you may have to try different nodes until you land on one where port 8786 is open. Other than that, no additional configuration should be necessary.

python runner.py --wf ttcom --executor dask/lxplus

Jobs are automatically split into chunks of 50 files per job to avoid failures caused by the crowded lxplus cluster. Outputs follow the naming scheme hist_$workflow_$json_$dictindex_$fileindex.coffea. The .coffea files can then be combined at the plotting level.

Caution

The optimal scaleout options on lxplus are -s 50 --chunk 50000

To cope with the unstable condor cluster and dask workers on lxplus, you can resubmit failed jobs via the --index $dictindex,$fileindex option. $dictindex is the index of the dataset in the .json dict, and $fileindex is the index of the chunk within that dataset's file list, which is split into 50 files per dask worker. The number of chunks for each dataset can be computed as math.ceil(len($filelist)/50). The job starts from the given indices.
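The bookkeeping for resubmission can be sketched as follows. This is a minimal sketch, not code from the framework: the helper names are hypothetical, indices are assumed to start at 0, and `$json` is assumed to be a plain name. Only the naming scheme and the `math.ceil(len(filelist)/50)` count come from the text above.

```python
import math


def n_chunks(filelist, files_per_job=50):
    """Number of file chunks (values of $fileindex) for one dataset."""
    return math.ceil(len(filelist) / files_per_job)


def expected_outputs(workflow, json_name, fileset, files_per_job=50):
    """Map each expected output name to its 'dictindex,fileindex' string."""
    expected = {}
    for dictindex, filelist in enumerate(fileset.values()):
        for fileindex in range(n_chunks(filelist, files_per_job)):
            name = f"hist_{workflow}_{json_name}_{dictindex}_{fileindex}.coffea"
            expected[name] = f"{dictindex},{fileindex}"
    return expected


def missing_indices(expected, existing_names):
    """Return the index strings whose output file has not been produced."""
    return [idx for name, idx in expected.items() if name not in existing_names]


# Toy input: 120 files give 3 chunks, 30 files give 1 chunk.
fileset = {"DatasetA": ["f.root"] * 120, "DatasetB": ["g.root"] * 30}
expected = expected_outputs("ttcom", "MC", fileset)
done = {"hist_ttcom_MC_0_0.coffea", "hist_ttcom_MC_1_0.coffea"}
todo = missing_indices(expected, done)
print(todo)
```

Each entry of `todo` has the form expected by `--index`, under the stated assumptions about index origin.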

Coffea-casa (Nebraska AF)

Coffea-casa is a JupyterHub-based analysis facility hosted at Nebraska. For more information and setup instructions see https://coffea-casa.readthedocs.io/en/latest/cc_user.html

After setting up and checking out this repository (either via the online terminal or the git widget utility), run:

python runner.py --wf ttcom --executor dask/casa

Authentication is handled automatically via a login auth token instead of a proxy. File paths must use "xcache" in place of the xrootd redirector; runner.py does this replacement automatically.
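The kind of path rewriting described here can be illustrated with a short sketch. This is not the code in runner.py: `to_xcache` is a hypothetical helper, the example redirector and file path are placeholders, and the assumption is that the host part of a `root://` URL is simply swapped for `xcache`.

```python
import re


def to_xcache(url, cache_host="xcache"):
    """Replace the host of a root:// URL with cache_host; leave other strings untouched."""
    return re.sub(r"^root://[^/]+/", f"root://{cache_host}/", url)


original = "root://cmsxrootd.fnal.gov//store/mc/example/file.root"
rewritten = to_xcache(original)
print(rewritten)
```

Since runner.py performs the replacement itself, a helper like this is only useful for inspecting what a given file list would look like on coffea-casa.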

parsl/dask with Condor

python runner.py --wf ttcom --executor dask/condor   # or: --executor parsl/condor

Tip

If the jobs hang for a long time, check the condor job logs in runinfo/$ID/submit_scripts/parsl.parsl.run.blockxxx.err

Standalone condor jobs@lxplus/cmsconnect

Caution

Using standalone condor jobs is strongly recommended on lxplus. See the end of this subsection for an alternative, lighter lxplus condor submitter that comes with some caveats.

You have the option to run the framework through “standalone condor jobs”, bypassing the native coffea-supported job submission system. Within each job you submit, a standalone script will execute the following on the worker node:

Set up a minimal required Python environment.
Retrieve the BTVNanoCommissioning repository, either from a git link or transferred locally.
Launch the python runner.py ... command to execute the coffea framework in the iterative executor mode.

This utility is currently adapted for the lxplus and cmsconnect condor systems. To generate jobs for launching, replace python runner.py with python condor/submitter.py, keep the existing arguments, and add the following arguments:

--jobName: Specify the desired condor job name. A dedicated folder will be generated, including all submission-related files.
--jobqueue: Indicate the condor@lxplus JobFlavour. Default is “tomorrow”.
-n --condorFileSize: Define the number of files to process per condor job (default is 50). The input file list will be divided based on this count.
--remoteRepo (optional, but recommended): Specify the path and branch of the remote repository to download the BTVNanoCommissioning repository. If not specified, the local directory will be packed and transferred as the condor input, potentially leading to higher loads for condor transfers. Use the format e.g. --remoteRepo 'https://github.com/cms-btv-pog/BTVNanoCommissioning.git -b master'.

After executing the command, a new job folder will be created and the jobs will be submitted to condor.

Example command:

python condor/submitter.py --workflow ctag_DY_sf --json metadata/Summer24/MC_Summer24_2024_ctag_DY_sf.json --campaign Summer24 --year 2024 --isArray --isSyst all --skipbadfiles --jobName condor_DYMu_MC --outputDir /eos/user/u/user/2024/DYmu_MC -n 20 --jobqueue workday

Frequent issues with standalone condor job submission

CMS Connect provides a condor interface where one can submit jobs to all resources available in the CMS Global Pool. See WorkBookCMSConnect Twiki for the instructions if you use it for the first time.
The submitted jobs require a properly set-up X509 proxy in order to use the XRootD service to access and store data. In the generated .jdl file, you may see a line configured for this purpose: use_x509userproxy = true. If you have not submitted jobs of this kind on lxplus condor before, we recommend adding the line
```
export X509_USER_PROXY=$HOME/x509up_u`id -u`
```
to .bashrc and sourcing it, so that the proxy file is stored in your AFS folder instead of in your /tmp/USERNAME folder. For submission on cmsconnect, no specific action is required.
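Before submitting, it can help to confirm where the proxy will actually be looked up. The sketch below is only an illustration under stated assumptions: the helper names and the example paths are hypothetical, and the fallback to /tmp/x509up_u<uid> when X509_USER_PROXY is unset is assumed to be the usual grid default rather than something this framework defines.

```python
import os
from pathlib import Path


def resolve_proxy_path(env=None):
    """Return the proxy path implied by the environment (Unix only)."""
    env = os.environ if env is None else env
    explicit = env.get("X509_USER_PROXY")
    if explicit:
        return Path(explicit)
    # Assumed fallback: per-user proxy file under /tmp.
    return Path("/tmp") / f"x509up_u{os.getuid()}"


def proxy_is_under_home(env=None):
    """True if the resolved proxy path lies inside $HOME."""
    env = os.environ if env is None else env
    home = env.get("HOME")
    if not home:
        return False
    return Path(home) in resolve_proxy_path(env).parents


# Placeholder environment mimicking the recommended .bashrc setting.
example_env = {
    "HOME": "/afs/cern.ch/user/u/user",
    "X509_USER_PROXY": "/afs/cern.ch/user/u/user/x509up_u1000",
}
print(resolve_proxy_path(example_env), proxy_is_under_home(example_env))
```

Calling `proxy_is_under_home()` without arguments checks the current shell environment instead of the placeholder one.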

A lighter version that works *only* on lxplus

The jobs submitted by this script rely on eos/afs being mounted on the condor nodes. This is good for processing a few hundred files only. Larger submissions are likely to make afs unstable.

It does not create a new installation of conda/mamba and instead uses a preinstalled conda env (you can replace this with your own conda/mamba path in the PATH variable, if needed).
It does not create a tarball of the BTVNanoComm code either, simply cds to the working directory in eos/afs.
It copies the proxy locally and reads it directly from the condor node.

Example

voms-proxy-init --voms cms --valid 192:00
conda activate btv_coffea   # Or conda activate /eos/home-m/milee/miniforge3/envs/btv_coffea
python condor_lxplus/submitter.py --workflow ctag_DY_sf --json fetched_list.json --campaign Summer22 --year 2022 --isArray --isSyst all --skipbadfiles --jobName condor_1 --outputDir /preferably/eos/output/directory --submit

Check outputs: check whether all jobs succeeded with:

python condor_lxplus/checkoutputs.py <job_condor_dir>

Then manually resubmit failed jobs with the newly created config.

hadd outputs: merge all outputs with:

python condor_lxplus/haddoutputs.py <job_output_dir>

Pros

This is likely faster and will run out of the box.
Proxy handling works even if condor’s native user_proxy method fails.

Cons

Relies on eos/afs mount, hence jobs will fail to run on condor nodes where the mount is unstable or when afs is overloaded. You may need to keep releasing jobs in that case.
Will not work where eos is not mounted, e.g. on CMSConnect nodes.

FAQ for submission

All jobs held: this might indicate an environment setup issue → check the condor err/out files; for parsl jobs the information is in runinfo/JOBID/submit_scripts/
Exits without complaint: this might be caused by huge memory consumption:
- Reduce --chunk; JERC variations in particular are memory intensive
- Check the memory usage by calling memory_usage_psutil
Partially failed/held:
- The files or the site could be temporarily unavailable. If the retries do not work, consider collecting the list of failed files and resubmitting.
- Errors in certain files → check the failed files and run them locally with --executor iterative