Question for Olivier: does run.sh support more than one CPU core? Can it? Should it? #1001

valassi · 2024-09-16T13:02:47Z

Hi @oliviermattelaer, just a question for you. This comes from bits and pieces of a Skype discussion with @choij1589, @Saptaparna and @roiser.

I am trying to understand whether run.sh uses more than one core. This does not seem to be the case. The gridpack I have (slightly modified to add profiling, but still) contains:

    def launch(self, nb_event, seed):
        """ launch the generation for the grid """
        print("__CUDACPP_DEBUG: GridPackCmd.launch starting")
        cudacpp_start = time.perf_counter()
        # 1) Restore the default data
        print("__CUDACPP_DEBUG: GridPackCmd.launch (1) restore_data")
        logger.info('generate %s events' % nb_event)
        logger.info('nb_core = %s' % self.options['nb_core'])
        self.set_run_name('GridRun_%s' % seed)
        if not self.readonly:
            self.update_status('restoring default data', level=None)
            misc.call([pjoin(self.me_dir,'bin','internal','restore_data'),
                         'default'], cwd=self.me_dir)

        if self.run_card['python_seed'] == -2:
            import random
            if not hasattr(random, 'mg_seedset'):
                random.seed(seed)  
                random.mg_seedset = seed
        elif self.run_card['python_seed'] > 0:
            import random
            if not hasattr(random, 'mg_seedset'):
                random.seed(self.run_card['python_seed'])  
                random.mg_seedset = self.run_card['python_seed']         
        # 2) Run the refine for the grid
        print("__CUDACPP_DEBUG: GridPackCmd.launch (2) refine4grid")
        self.update_status('Generating Events', level=None)
        #misc.call([pjoin(self.me_dir,'bin','refine4grid'),
        #                str(nb_event), '0', 'Madevent','1','GridRun_%s' % seed],
        #                cwd=self.me_dir)
        self.refine4grid(nb_event)
...
    def refine4grid(self, nb_event):
        """Special refine for gridpack run."""
        print("__CUDACPP_DEBUG: GridPackCmd.refine4grid starting")
        cudacpp_start = time.perf_counter()
        self.nb_refine += 1
        
        precision = nb_event

        self.opts = dict([(key,value[1]) for (key,value) in \
                          self._survey_options.items()])
        
        # initialize / remove lhapdf mode
        # self.configure_directory() # All this has been done before
        self.cluster_mode = 0 # force single machine

In other words, I have the impression that self.cluster_mode = 0 hardcodes the use of only one core?

Can you confirm (or otherwise clarify) please?
Thanks

oliviermattelaer · 2024-09-16T13:07:14Z

Yes, gridpacks run on a single core by design (and the code is optimised around that assumption).

valassi · 2024-09-16T13:37:22Z

Thanks Olivier!

Replying to some of the discussion on Skype (about my 'should it?' question): we could rethink this and submit multi-core generation jobs.

IMO it also depends on the experiments anyway... 4 single-core jobs (even against a shared GPU) are in principle no less efficient than 1 4-core job (provided you really launch 4 jobs). For CMS, I understand that they typically used to have GEN-SIM jobs, where the multi-core SIM part was waiting on a single-core GEN part (I am not sure if this has since changed, maybe by splitting GEN and SIM), so there the inefficiency would in any case be that a multi-core slot is used with (what is known to be) a single-core executable.
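To put a rough number on that inefficiency, here is a minimal sketch of the arithmetic. The function name `slot_efficiency` and the timings are made up for illustration and are not measurements from CMS or from any gridpack. It only assumes that GEN keeps one core busy while SIM keeps every core in the slot busy.

```python
def slot_efficiency(t_gen, t_sim, n_cores, gen_cores=1):
    """Fraction of the allocated core-time that is actually used.

    Assumes a GEN step (busy on gen_cores cores) followed by a SIM step
    (busy on all n_cores cores) inside one n_cores slot.
    """
    used = t_gen * gen_cores + t_sim * n_cores
    allocated = (t_gen + t_sim) * n_cores
    return used / allocated


if __name__ == "__main__":
    # Hypothetical timings: 1 h of single-core GEN, then 3 h of SIM, in a 4-core slot
    print(slot_efficiency(t_gen=1.0, t_sim=3.0, n_cores=4))  # 0.8125
    # Same slot if GEN could keep all 4 cores busy (e.g. 4 single-core GEN runs)
    print(slot_efficiency(t_gen=1.0, t_sim=3.0, n_cores=4, gen_cores=4))  # 1.0
```

The second call is the "provided you really launch 4 jobs" case: the slot is only fully used if all cores have a GEN process to run.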

oliviermattelaer · 2024-09-16T13:53:36Z

Rethink the strategy? Yes, we should.
Change the strategy? That's what we need to think about...

For the moment, CMS is using (afaik) a read-only gridpack, such that it launches several (typically 8) gridpack executables in parallel within the same job allocation (which asks for 8 cores).
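As a rough illustration of that pattern (not the actual CMS wrapper), the sketch below starts several independent single-core runs from one Python process, each with its own seed. The helper names are hypothetical, and the `./run.sh <nb_event> <seed>` argument order is an assumption that should be checked against the gridpack at hand.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def gridpack_command(nb_event, seed, script="./run.sh"):
    """Build the command for one gridpack run (assumed order: events, then seed)."""
    return [script, str(nb_event), str(seed)]


def plan_runs(n_jobs, nb_event, first_seed):
    """One command per core, each with a distinct seed so the samples differ."""
    return [gridpack_command(nb_event, first_seed + i) for i in range(n_jobs)]


def run_parallel(commands, max_workers):
    """Run independent commands concurrently; return their exit codes in order."""
    def run_one(cmd):
        return subprocess.run(cmd, check=False).returncode

    # Threads are enough here: each one just waits on an external process
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Hypothetical: 8 single-core runs of 1000 events each in an 8-core allocation.
    # Only the plan is printed; pass it to run_parallel(plan, 8) to execute it.
    for cmd in plan_runs(n_jobs=8, nb_event=1000, first_seed=1):
        print(" ".join(cmd))
```

Each run is still a single-core executable. The parallelism comes only from starting several of them at once, which is why the gridpack has to be usable read-only.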

With our current work, we have multiple ways to move forward from that situation:

- allow for an OpenMP execution of the gridpack (that mode would still be one executable, but using all available threads) -- no real code change needed here --
- keep the code as is, and use the same read-only framework to hit the GPU multiple times -- needs some validation to check that read-only mode works with the GPU --
- change the algorithm to lift the requirement of a single executable

roiser · 2024-09-16T14:19:54Z

I think one reason it may be interesting to rethink this is that, e.g. in an HPC environment, you get allocated N GPUs and, IIUC, those are solely available to your job. One could then of course start N*M gridpacks within the same job (pilot) submission, but this may then become difficult for the data management afterwards...
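To make the N*M bookkeeping concrete, here is a minimal sketch of how such a pilot could plan its workers. Everything in it is an assumption for illustration: the function name, the seed numbering and the directory layout are hypothetical. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable for restricting a process to given GPUs.

```python
import os


def plan_gpu_workers(n_gpus, per_gpu, first_seed, out_root="output"):
    """Assign n_gpus * per_gpu workers to GPUs, each with its own seed and output directory."""
    plan = []
    for gpu in range(n_gpus):
        for slot in range(per_gpu):
            seed = first_seed + gpu * per_gpu + slot
            plan.append({
                "seed": seed,
                # Environment for the child process: expose a single GPU to it
                "env": {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)},
                # Hypothetical layout: one directory per seed so outputs cannot collide
                "outdir": os.path.join(out_root, f"gpu{gpu}", f"seed{seed}"),
            })
    return plan


if __name__ == "__main__":
    # Hypothetical allocation: N = 2 GPUs, M = 2 gridpack runs per GPU
    for worker in plan_gpu_workers(n_gpus=2, per_gpu=2, first_seed=1):
        print(worker["seed"], worker["env"]["CUDA_VISIBLE_DEVICES"], worker["outdir"])
```

Keying the output directory on the seed is one simple way to keep the N*M outputs traceable afterwards, but it does not by itself solve the data management on the experiment side.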

valassi assigned oliviermattelaer Sep 16, 2024