Question for Olivier: does run.sh support more than one CPU core? Can it? Should it? #1001

Open
valassi opened this issue Sep 16, 2024 · 4 comments
@valassi
Member

valassi commented Sep 16, 2024

Hi @oliviermattelaer, just a question for you. This comes from bits and pieces of a Skype discussion with @choij1589 @Saptaparna @roiser.

I am trying to understand if run.sh uses more than one core. This does not seem to be the case. The gridpack I have (slightly modified with profiling, but still) has

    def launch(self, nb_event, seed):
        """ launch the generation for the grid """
        print("__CUDACPP_DEBUG: GridPackCmd.launch starting")
        cudacpp_start = time.perf_counter()
        # 1) Restore the default data
        print("__CUDACPP_DEBUG: GridPackCmd.launch (1) restore_data")
        logger.info('generate %s events' % nb_event)
        logger.info('nb_core = %s' % self.options['nb_core'])
        self.set_run_name('GridRun_%s' % seed)
        if not self.readonly:
            self.update_status('restoring default data', level=None)
            misc.call([pjoin(self.me_dir,'bin','internal','restore_data'),
                         'default'], cwd=self.me_dir)

        if self.run_card['python_seed'] == -2:
            import random
            if not hasattr(random, 'mg_seedset'):
                random.seed(seed)  
                random.mg_seedset = seed
        elif self.run_card['python_seed'] > 0:
            import random
            if not hasattr(random, 'mg_seedset'):
                random.seed(self.run_card['python_seed'])  
                random.mg_seedset = self.run_card['python_seed']         
        # 2) Run the refine for the grid
        print("__CUDACPP_DEBUG: GridPackCmd.launch (2) refine4grid")
        self.update_status('Generating Events', level=None)
        #misc.call([pjoin(self.me_dir,'bin','refine4grid'),
        #                str(nb_event), '0', 'Madevent','1','GridRun_%s' % seed],
        #                cwd=self.me_dir)
        self.refine4grid(nb_event)
...
    def refine4grid(self, nb_event):
        """Special refine for gridpack run."""
        print("__CUDACPP_DEBUG: GridPackCmd.refine4grid starting")
        cudacpp_start = time.perf_counter()
        self.nb_refine += 1
        
        precision = nb_event

        self.opts = dict([(key,value[1]) for (key,value) in \
                          self._survey_options.items()])
        
        # initialize / remove lhapdf mode
        # self.configure_directory() # All this has been done before
        self.cluster_mode = 0 # force single machine

In other words, I have the impression that self.cluster_mode = 0 hardcodes the use of only one core?

Can you confirm (or otherwise clarify) please?
Thanks

@oliviermattelaer
Member

Yes, gridpacks run on a single core by design (and the code is optimised based on that feature).

@valassi
Member Author

valassi commented Sep 16, 2024

Thanks Olivier!

Replying to some of the discussion on Skype (about my 'should it?' question): we could rethink this and submit multi-core generation.

IMO it also depends on the experiments anyway... 4 single-core jobs (even against a shared GPU) are in principle no less efficient than 1 4-core job (provided you really launch 4 jobs). For CMS, I understand that they typically used to have GEN-SIM jobs, where the multicore SIM part was waiting on a single-core GEN part (I am not sure if this has changed since, maybe by splitting GEN and SIM), so there the inefficiency would in any case be that a multicore slot is used with (what is known to be) a single-core executable.

@oliviermattelaer
Member

Rethink the strategy? Yes, we should.
Change the strategy? That's what we need to think about...

For the moment, CMS is using (afaik) a read-only gridpack, such that it launches several (typically 8) gridpack executables in parallel within the same job allocation (which asks for 8 cores).

With our current work we do have multiple ways to move forward from that situation:
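The "several single-core executables in one multi-core allocation" pattern described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `build_command` and `run_instances` are hypothetical helpers, and a Python one-liner stands in for the real gridpack executable (a real setup would presumably call something like `./run.sh` with an event count and a seed).

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_command(nb_event, seed):
    """Hypothetical command line for one single-core gridpack instance.

    Assumption: a real setup would return something like
    ["./run.sh", str(nb_event), str(seed)]; a Python one-liner stands in
    here so that the sketch runs anywhere.
    """
    code = "print('generated %d events with seed %d')" % (nb_event, seed)
    return [sys.executable, "-c", code]


def run_instances(nb_instances, nb_event, first_seed):
    """Run nb_instances independent processes concurrently, one seed each."""
    seeds = [first_seed + i for i in range(nb_instances)]

    def run_one(seed):
        result = subprocess.run(build_command(nb_event, seed),
                                capture_output=True, text=True, check=True)
        return seed, result.stdout.strip()

    # The threads only wait on the child processes; the actual work happens
    # in the children, each of which is a single-core executable.
    with ThreadPoolExecutor(max_workers=nb_instances) as pool:
        return dict(pool.map(run_one, seeds))


if __name__ == "__main__":
    # 8 instances, matching the typical 8-core allocation mentioned above.
    for seed, output in sorted(run_instances(8, 100, 1).items()):
        print(seed, output)
```

Each instance gets its own seed so that the generated samples are statistically independent; whether the instances can share one gridpack directory depends on the read-only mode working as intended.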

  1. allow for an OpenMP execution of the gridpack (so that mode would still be one executable but using all available threads) --no real code change needed here--
  2. keep the code as is, and use the same readonly framework to hit the GPU multiple times here --need some validation to check that readonly is working with GPU--
  3. change the algorithm to lift the requirement on single executable
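
For option 1, one way to picture "one executable using all available threads" is to control the thread count through the standard OpenMP environment variable `OMP_NUM_THREADS`. The sketch below only prepares such an environment; `openmp_env` is a hypothetical helper, and whether the gridpack executable actually honours the variable is an assumption, not something confirmed in this thread.

```python
import os


def openmp_env(nb_threads=None, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS set.

    OMP_NUM_THREADS is the standard OpenMP variable; whether the gridpack
    executable honours it is an assumption of this sketch.
    """
    env = dict(os.environ if base_env is None else base_env)
    if nb_threads is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        nb_threads = os.cpu_count() or 1
    env["OMP_NUM_THREADS"] = str(nb_threads)
    return env


if __name__ == "__main__":
    env = openmp_env(4)
    print(env["OMP_NUM_THREADS"])
    # A real launch might then look like this (hypothetical arguments):
    # subprocess.run(["./run.sh", "100", "1"], env=env, check=True)
```

Option 2 would instead keep one thread per executable and start several executables, so the two options differ mainly in where the parallelism lives: inside one process, or across processes.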

@roiser
Member

roiser commented Sep 16, 2024

I think one reason it may be interesting to rethink this is that, e.g., in an HPC environment you get allocated N GPUs and, IIUC, those are solely available to your job. One could then of course start N*M gridpacks within the same job (pilot) submission, but this may then make the data management afterwards difficult ...
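
To make the N*M idea and its bookkeeping cost a bit more concrete, here is a rough sketch of how the instances might be laid out, assuming each one is pinned to a GPU through the `CUDA_VISIBLE_DEVICES` variable and writes to its own directory. The function `plan_gridpack_jobs`, the directory naming and the seed scheme are all hypothetical choices made for illustration.

```python
def plan_gridpack_jobs(nb_gpus, per_gpu, first_seed=1, out_root="output"):
    """Plan nb_gpus * per_gpu gridpack instances (hypothetical layout).

    Each job gets a unique seed, a GPU assignment expressed through
    CUDA_VISIBLE_DEVICES, and its own working directory so that outputs
    from different instances never collide.
    """
    jobs = []
    for gpu in range(nb_gpus):
        for slot in range(per_gpu):
            seed = first_seed + gpu * per_gpu + slot
            jobs.append({
                "seed": seed,
                "env": {"CUDA_VISIBLE_DEVICES": str(gpu)},
                "workdir": "%s/gpu%d_seed%d" % (out_root, gpu, seed),
            })
    return jobs


if __name__ == "__main__":
    # N = 2 GPUs, M = 3 gridpacks per GPU.
    for job in plan_gridpack_jobs(2, 3):
        print(job)
```

Even in this toy form, a single pilot ends up with N*M separate output directories to collect and register, which is the data management burden mentioned above.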
