Support logging figures to multiple loggers #2138

Open
robmarkcole opened this issue Jun 27, 2024 · 10 comments
Labels
trainers PyTorch Lightning trainers

Comments

@robmarkcole
Contributor

robmarkcole commented Jun 27, 2024

Summary

Currently, logging of figures (e.g. in the Classification and Segmentation trainers) assumes the TensorBoard logger, which has the method add_figure. Other loggers have similar methods under different names and are currently not supported; e.g. MLflow uses log_figure. This feature request is to support these loggers and document which are supported.

Rationale

Many orgs use a logging service such as MLflow or wandb, and it would be nice for these to just work.

Implementation

We would probably want to abstract this for use by all trainers, but adding this method is one solution:

import matplotlib.pyplot as plt
import wandb
from torch.utils.tensorboard import SummaryWriter

# on the trainer
def log_figure(self, fig, label, step):
    experiment = self.logger.experiment
    if isinstance(experiment, SummaryWriter):  # TensorBoard
        experiment.add_figure(label, fig, global_step=step)
    elif 'WandbLogger' in type(self.logger).__name__:  # Weights & Biases
        experiment.log({label: wandb.Image(fig)})
    elif 'MLFlowLogger' in type(self.logger).__name__:  # MLflow
        # MLFlowLogger.experiment is an MlflowClient; its log_figure saves
        # the figure directly as a PNG artifact, so no temporary file is needed
        experiment.log_figure(self.logger.run_id, fig, f'{label}.png')
    plt.close(fig)

# in the step
if fig:
    self.log_figure(fig, f'image/{batch_idx}', self.global_step)

Alternatives

None

Additional information

No response

@adamjstewart
Collaborator

Another option would be to remove any figure logging from TorchGeo entirely. If users want it, they can subclass the trainer and add it. This feature has historically been a nightmare to maintain/test and has been riddled with bugs. The proposal here is to make it even more complicated.

More positively, can you start a discussion with the Lightning folks to see if there is an easier way to do this automatically? Or a way that can unify the interface somehow? This just seems like too much work to support out of the box.

@adamjstewart added the trainers (PyTorch Lightning trainers) label on Jun 29, 2024
@robmarkcole
Contributor Author

Re Lightning: I see a lot of work was done in Lightning-AI/pytorch-lightning#6227, but ultimately they decided it was too complex to maintain and closed the PR.

Perhaps, as you say, logging figures should be left to users to subclass the trainers. I think basic logging of samples from the dataset is a necessity, however.

@isaaccorley
Collaborator

isaaccorley commented Jul 22, 2024

I've actually been moving away from adding the plotting directly inside the LightningModule and using a custom Callback instead. There are methods like on_validation_batch_end which have access to the current batch, module, batch_idx, etc., and can essentially replicate what the current figure logging does.

See https://lightning.ai/docs/pytorch/stable/_modules/lightning/pytorch/callbacks/callback.html#Callback.on_validation_batch_end
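For illustration, here is a minimal sketch of that approach, assuming a TensorBoard logger and a datamodule whose val_dataset exposes the usual TorchGeo plot method (the attribute and sample key names are illustrative, not a definitive implementation):

import matplotlib.pyplot as plt
from lightning.pytorch.callbacks import Callback
from torch.utils.tensorboard import SummaryWriter

class PlotSampleCallback(Callback):
    """Plot and log the first sample of the first validation batch each epoch."""

    def on_validation_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx=0):
        if batch_idx != 0:
            return
        # Assumption: the datamodule holds a dataset with a plot method,
        # as TorchGeo datasets do
        dataset = trainer.datamodule.val_dataset
        sample = {'image': batch['image'][0].cpu(), 'mask': batch['mask'][0].cpu()}
        fig = dataset.plot(sample)
        experiment = trainer.logger.experiment
        if isinstance(experiment, SummaryWriter):
            experiment.add_figure('val/sample', fig, global_step=trainer.global_step)
        plt.close(fig)

The callback is then passed to the Trainer via callbacks=[PlotSampleCallback()], leaving the LightningModule untouched.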

@robmarkcole
Contributor Author

I agree that using the callbacks will be the most intuitive approach. Do we think it's best to utilise them, or perhaps just have a 'best practice' guide?

@isaaccorley
Collaborator

Probably better to have this in a tutorial, or to just reference the Lightning docs so we don't have to maintain them.

@robmarkcole
Contributor Author

I've been reading up on Raster Vision; perhaps we can utilise its plotting functionality with Lightning callbacks. A simple tutorial could suffice: https://docs.rastervision.io/en/stable/usage/tutorials/visualize_data_samples.html

@adamjstewart
Collaborator

Is there something wrong with TorchGeo's plotting functionality (i.e., dataset.plot(sample))? It's certainly not perfect, but I don't know if Raster Vision's is any better.

@robmarkcole
Contributor Author

Nothing wrong; I thought at some point there was a comment or issue about externalising the plotting.

@calebrob6
Member

I'm for this (even getting rid of the datamodule plots) on the condition that we write a tutorial that shows how to subclass and do this with TensorBoard. TorchGeo doesn't have to (and shouldn't try to) do everything by itself; the more we can offload to the user with tutorials, the better! This is definitely a good example, as I've had to override all of validation_step specifically to implement MLflow logging for Azure ML in my own projects.
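As a rough sketch of that pattern (not TorchGeo's actual code; plot_sample is a hypothetical helper returning a matplotlib figure), the subclass can call super() rather than rewriting the whole step, then add the MLflow call:

import matplotlib.pyplot as plt
from torchgeo.trainers import SemanticSegmentationTask

class MLflowSegmentationTask(SemanticSegmentationTask):
    """Sketch: also log a sample figure to MLflow during validation."""

    def validation_step(self, batch, batch_idx, *args, **kwargs):
        super().validation_step(batch, batch_idx, *args, **kwargs)
        if batch_idx == 0 and hasattr(self.logger, 'run_id'):
            # plot_sample is a hypothetical helper, not part of TorchGeo
            fig = self.plot_sample(batch)
            # MLFlowLogger.experiment is an MlflowClient; log_figure stores
            # the figure as a PNG artifact in the current run
            self.logger.experiment.log_figure(
                self.logger.run_id, fig, f'val_samples/epoch_{self.current_epoch}.png'
            )
            plt.close(fig)

Calling super() keeps the existing loss and metric logic intact, so the subclass only adds the figure logging.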

@robmarkcole
Contributor Author

I'm all for tutorials. Good to hear you're also an MLflow user, as it's in my stack too and I've made a bunch of customisations to support it; it would be good to share best practice.
