
Tensorboard Eval Images with TF-Vision #11270

Open
3 tasks done
RayanMoarkech opened this issue Oct 18, 2024 · 5 comments
Labels
models:official models that come under official repository type:bug Bug in the code

Comments

RayanMoarkech commented Oct 18, 2024

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am using the latest TensorFlow Model Garden release and TensorFlow 2.
  • I am reporting the issue to the correct repository. (Model Garden official or research directory)
  • I checked to make sure that this issue has not been filed already.

1. The entire URL of the file you are using

https://www.tensorflow.org/tfmodels/vision/object_detection#load_logs_in_tensorboard

2. Describe the bug

I am following this documentation: https://www.tensorflow.org/tfmodels/vision/object_detection#load_logs_in_tensorboard
When I open TensorBoard and select the Images tab, I get "No image data was found."

I also tried setting EXPERIMENT_CONFIG.task.allow_image_summary = True, but that raises an error, even with the dataset and code provided in the documentation.

The error:

ValueError: Expected scalar shape, saw shape: (1, 640, 640, 3).

The code:

model, eval_logs = tfm.core.train_lib.run_experiment(
    distribution_strategy=distribution_strategy,
    task=task,
    mode='train_and_eval',
    params=EXPERIMENT_CONFIG,
    model_dir=paths['MODEL_CHECKPOINT_PATH'],
    run_post_eval=True,
)
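The message matches what `tf.debugging.assert_scalar` raises, and `tf.summary.scalar` runs that check on its input. This suggests that some entry of the eval logs (here an image batch) is being written as a scalar summary. A minimal diagnostic sketch, assuming the eval logs are a possibly nested dict of Python numbers and NumPy arrays; `find_non_scalar_entries` and the sample logs are hypothetical, not part of TF-Vision:

```python
from typing import Any, Dict, List

import numpy as np


def find_non_scalar_entries(logs: Dict[str, Any], prefix: str = "") -> List[str]:
    """Returns the keys of all entries that a scalar summary could not write."""
    bad_keys = []
    for key, value in logs.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested log dicts, joining keys with "/".
            bad_keys.extend(find_non_scalar_entries(value, prefix=f"{full_key}/"))
        elif np.ndim(value) != 0:
            # Anything with rank > 0 (e.g. a [1, 640, 640, 3] image) is not a scalar.
            bad_keys.append(full_key)
    return bad_keys


# Hypothetical eval logs: one scalar metric and one image batch.
eval_logs = {
    "AP": 0.42,
    "image/validation_outputs/0": np.zeros((1, 640, 640, 3), dtype=np.float32),
}
print(find_non_scalar_entries(eval_logs))  # ['image/validation_outputs/0']
```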

3. Steps to reproduce

After following the documentation, train again with:

  • EXPERIMENT_CONFIG.task.allow_image_summary = True
  • The run then fails with:
ValueError: Expected scalar shape, saw shape: (1, 640, 640, 3).
[Screenshot 2024-10-18 at 1:52:38 AM]

4. Expected behavior

I would like the evaluation images to be saved to TensorBoard at every evaluation (e.g. per epoch).

5. Additional context

Let me know if you need anything else.

6. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 15.0
  • Mobile device name if the issue happens on a mobile device: N/A
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): v2.17.0-rc1-2-gad6d8cc177d 2.17.0
  • Python version: 3.10
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A (using CPU)
RayanMoarkech added the models:official (models that come under official repository) and type:bug (Bug in the code) labels Oct 18, 2024

bharatjetti (Collaborator) commented:

Hi @RayanMoarkech,
I reproduced the "No image data was found" issue. I then added the following lines to the existing show_batch function (and made the related changes needed for them to work):

with summary_writer.as_default():
    tf.summary.image(f'Image_with_bboxes_{i+1}', np.expand_dims(image, axis=0), step=train_steps)

With this change the images show up in TensorBoard. Here is the notebook I worked on; please check it. Here is the screenshot:

[Screenshot 2024-11-19 at 2:07:29 PM]
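`tf.summary.image` expects a rank-4 tensor shaped `[k, height, width, channels]` with 1 to 4 channels, which is why the snippet wraps a single image with `np.expand_dims(image, axis=0)`. A minimal sketch of that normalization step, assuming the input is either one `[h, w, c]` image or an already batched array; `to_image_batch` is a hypothetical helper, and rescaling floats above 1.0 assumes 0-255 pixel values (float image data is clipped to [0, 1] by the summary):

```python
import numpy as np


def to_image_batch(image: np.ndarray) -> np.ndarray:
    """Reshapes an image to the [k, h, w, c] layout tf.summary.image expects."""
    image = np.asarray(image)
    if image.ndim == 3:
        # Single [h, w, c] image: add a leading batch axis of size 1.
        image = np.expand_dims(image, axis=0)
    if image.ndim != 4:
        raise ValueError(f"expected a 3-D or 4-D image array, got shape {image.shape}")
    if image.shape[-1] not in (1, 2, 3, 4):
        raise ValueError(f"expected 1 to 4 channels, got {image.shape[-1]}")
    if np.issubdtype(image.dtype, np.floating) and image.max() > 1.0:
        # Assumption: float pixels are in [0, 255]; rescale so they are not clipped to white.
        image = image / 255.0
    return image


batch = to_image_batch(np.zeros((640, 640, 3), dtype=np.uint8))
print(batch.shape)  # (1, 640, 640, 3)
```

The result can then be passed as `tf.summary.image(name, batch, step=step)` inside `summary_writer.as_default()`.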

bharatjetti added the stat:awaiting response (Waiting on input from the contributor) label Nov 19, 2024

RayanMoarkech (Author) commented Nov 19, 2024

But this will not create an image log at every summary interval while training the model with:

tfm.core.train_lib.run_experiment

Correct me if I'm wrong, but I was not able to hook this into training so that an image summary is written at every summary_interval. Does this mean the workaround is manual code that I have to run at every training step I want to inspect?

Based on your screenshot, you can see the data comes from the "." run.
[Screenshot 2024-11-19 at 1:55:21 PM]
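Images produced during evaluation can only be written when evaluation actually runs. As far as I can tell, in 'train_and_eval' mode `run_experiment` calls Orbit's `Controller.train_and_evaluate` with the trainer's `validation_interval`, which trains in chunks of that size and evaluates after each chunk, so image logging would follow `validation_interval` rather than `summary_interval`. A sketch of the resulting schedule under that assumption; `eval_schedule` is a hypothetical helper:

```python
from typing import List


def eval_schedule(train_steps: int, validation_interval: int, start_step: int = 0) -> List[int]:
    """Global steps after which an evaluation (and its eval_actions) would run."""
    steps = []
    current = start_step
    while current < train_steps:
        # Train for one interval, clipped so we never go past train_steps.
        current = min(current + validation_interval, train_steps)
        steps.append(current)
    return steps


print(eval_schedule(train_steps=1000, validation_interval=300))  # [300, 600, 900, 1000]
```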

google-ml-butler (bot) removed the stat:awaiting response (Waiting on input from the contributor) label Nov 19, 2024

bharatjetti (Collaborator) commented:

Hi @RayanMoarkech,
It seems there is no issue with model training, and we can observe the results on a few example images. However, to get the full image summary automatically, could you please raise this in the TensorBoard repo? Please feel free to close this issue.

bharatjetti added the stat:awaiting response (Waiting on input from the contributor) label Dec 10, 2024
LakshmiKalaKadali removed their assignment Dec 10, 2024
RayanMoarkech (Author) commented:

I figured out how to log the images to TensorBoard. Logging eval images from TF-Vision to TensorBoard does not seem to be supported out of the box, so it needs to be implemented manually. Here it is for anyone searching:

First, enable image summaries:

EXPERIMENT_CONFIG.task.allow_image_summary = True

Then define a callable Orbit action that takes the images from the eval logs and writes them to TensorBoard. It is important to delete the images from the logs afterwards, since the training loop does not know how to write an image summary (this is the broken part that needs to be fixed internally).

import os
from typing import Dict, Union

import numpy as np
import tensorflow as tf

# paths, steps_per_loop and valid_batch_size are defined earlier in the notebook.
img_val_logs_path = os.path.join(paths['MODEL_CHECKPOINT_PATH'], 'validation')
os.makedirs(img_val_logs_path, exist_ok=True)

summary_writer = tf.summary.create_file_writer(img_val_logs_path)

# Eval logs are (possibly nested) dicts of tensors and numbers.
Output = Dict[str, Union[tf.Tensor, float, np.number, np.ndarray, 'Output']]

evaluated_step = steps_per_loop  # Not from 0 since it runs 1 time without validation

def image_eval(data: Output) -> None:
    global evaluated_step
    # Assumes one evaluation every steps_per_loop training steps.
    evaluated_step += steps_per_loop
    with summary_writer.as_default():
        for i in range(valid_batch_size):
            # pop() reads the image with bounding boxes and removes it from the logs,
            # so only scalar entries are left for the regular summary writer.
            image = data.pop(f'image/validation_outputs/{i}', None)
            if image is None:
                continue  # Fewer images than valid_batch_size in this evaluation.
            tf.summary.image(f'image_{i}', image, step=evaluated_step)

    print(f"Logged images with bounding boxes at step {evaluated_step}")

Lastly, pass the callable to tfm.core.train_lib.run_experiment:

eval_actions=[image_eval]
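The global evaluated_step counter can also live inside a small callable object; Orbit eval actions are plain callables that receive the eval logs, so an instance with `__call__` fits the same slot. A minimal sketch, assuming (as the code above does) that evaluation runs every `eval_interval` steps and that images arrive under the `image/validation_outputs/{i}` keys; `ImageEvalAction` and its `log_image` argument are hypothetical names:

```python
from typing import Any, Callable, Dict, List


class ImageEvalAction:
    """Callable eval action that moves image entries out of the eval logs."""

    def __init__(
        self,
        log_image: Callable[[str, Any, int], None],
        num_images: int,
        eval_interval: int,
        start_step: int = 0,
        key_prefix: str = "image/validation_outputs",
    ) -> None:
        self.log_image = log_image  # e.g. a wrapper around tf.summary.image
        self.num_images = num_images
        self.eval_interval = eval_interval
        self.step = start_step
        self.key_prefix = key_prefix

    def __call__(self, logs: Dict[str, Any]) -> None:
        # Assumption: this action runs once per evaluation, every eval_interval steps.
        self.step += self.eval_interval
        for i in range(self.num_images):
            # pop() both reads the image and removes it, so only scalars remain.
            image = logs.pop(f"{self.key_prefix}/{i}", None)
            if image is not None:
                self.log_image(f"image_{i}", image, self.step)


# Usage without TensorFlow: record the calls instead of writing summaries.
calls: List[tuple] = []
action = ImageEvalAction(
    lambda name, img, step: calls.append((name, step)), num_images=2, eval_interval=100
)
logs = {"AP": 0.5, "image/validation_outputs/0": "img0", "image/validation_outputs/1": "img1"}
action(logs)
print(calls, logs)  # [('image_0', 100), ('image_1', 100)] {'AP': 0.5}
```

With TensorFlow, `log_image` could be a function that calls `tf.summary.image(name, image, step=step)` inside `summary_writer.as_default()`, and the instance would be passed as `eval_actions=[action]`. Passing `start_step=steps_per_loop` reproduces the offset used in the code above.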

@bharatjetti I am not sure whether you want to reopen the issue to fix this internally. The problem is that when allow_image_summary is True, the training loop does not know how to write the images to the event files: it expects scalar values but receives an image.
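One way to picture the internal fix described here is to route each eval-log entry by rank: rank-0 values to a scalar summary and rank-4 `[k, h, w, c]` values to an image summary. A minimal sketch of that idea, not TF-Vision's actual summary code; `write_eval_logs` is hypothetical, and the writer callbacks are injected so the logic can be shown without TensorFlow. Other ranks raise so that unexpected entries fail loudly:

```python
from typing import Any, Callable, Dict

import numpy as np

SummaryFn = Callable[[str, Any, int], None]


def write_eval_logs(logs: Dict[str, Any], step: int, write_scalar: SummaryFn,
                    write_image: SummaryFn, prefix: str = "") -> None:
    """Writes scalars and image batches from a (possibly nested) logs dict."""
    for key, value in logs.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            write_eval_logs(value, step, write_scalar, write_image, prefix=f"{name}/")
        elif np.ndim(value) == 0:
            write_scalar(name, value, step)
        elif np.ndim(value) == 4:
            # [k, h, w, c] image batch, the layout image summaries expect.
            write_image(name, value, step)
        else:
            raise ValueError(f"cannot summarize {name!r} with shape {np.shape(value)}")


written = []
write_eval_logs(
    {"AP": 0.42, "image/validation_outputs/0": np.zeros((1, 8, 8, 3))},
    step=100,
    write_scalar=lambda n, v, s: written.append(("scalar", n, s)),
    write_image=lambda n, v, s: written.append(("image", n, s)),
)
print(written)  # [('scalar', 'AP', 100), ('image', 'image/validation_outputs/0', 100)]
```

Inside `summary_writer.as_default()`, `tf.summary.scalar` and `tf.summary.image` could be passed directly, since both take `(name, data, step)` as their first three positional arguments.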

RayanMoarkech reopened this Jan 28, 2025
google-ml-butler (bot) removed the stat:awaiting response (Waiting on input from the contributor) label Jan 28, 2025