Tensor Comparison #150

Open
GregHattJr opened this issue Oct 10, 2024 · 0 comments

GregHattJr commented Oct 10, 2024

Description

Enable reading binary tensor files on systems that did not produce them by addressing platform-specific issues in the current serialization process. A number of issues (detailed below) currently prevent the visualizer from consuming these .bin files. To move forward, we'll need to implement a solution in the source repository that generates these files.

Link to Existing POC PR

Issues

  1. Device-Specific Storage
    The current serialization system for DeviceStorage and MultiDeviceStorage depends on device-specific memory (e.g., GPU, FPGA). This is reflected in the following part of the code:

    TT_THROW("Device storage isn't supported");

    Serialization rejects device-resident tensors outright (the code throws), meaning that device-resident data cannot be serialized on one system and deserialized on another that lacks the same hardware.

  2. Memory Configuration
    The MemoryConfig serialized data (e.g., TensorMemoryLayout, BufferType) assumes that the target system can reconstruct the memory architecture. This occurs when writing the memory configuration to the file:

    output_stream.write(reinterpret_cast<const char*>(&layout), sizeof(Layout));
    output_stream.write(reinterpret_cast<const char*>(&storage_type), sizeof(StorageType));

    This layout may depend on the original system’s memory architecture, leading to potential deserialization issues on different systems.

  3. DistributedTensorConfig
    The DistributedTensorConfig specifies how tensors are distributed across multiple devices. The code handles this in the multi-device storage logic:

    std::size_t num_buffers = storage.num_buffers();
    output_stream.write(reinterpret_cast<const char*>(&num_buffers), sizeof(std::size_t));

    This configuration is system-dependent and cannot be easily reproduced on another system without a similar device setup.

  4. Device-Dependent Code Paths
    The MeshDevice configuration depends on the system's multi-device setup. For instance:

    if (device != nullptr) {
        tensor = tensor.to(device, memory_config);
    }

    This code assumes the presence of specific devices (e.g., MeshDevice), making it impossible to deserialize and map tensor data properly if such devices are absent.

  5. Data Types
    The code supports several data types, including hardware-specific ones like BFLOAT16. These types may not be available on all systems:

    DataType data_type;
    input_stream.read(reinterpret_cast<char*>(&data_type), sizeof(DataType));

    System dependency arises here, as some platforms may lack support for certain data types, leading to deserialization errors.

  6. Tensor Layout
    The code serializes tensor layouts (e.g., ROW_MAJOR, TILE), which may be optimized for certain hardware architectures. This layout is read and written as:

    input_stream.read(reinterpret_cast<char*>(&layout), sizeof(Layout));

    If the target system has a different memory architecture, it may not be able to reconstruct the tensor layout correctly.

  7. Device Context During Deserialization
    Deserialization depends on a device context: after the tensor data is read, the code checks for an available device and moves the tensor onto it:

    tensor = tensor.to(device, memory_config);

    Without the necessary devices, this part of the code cannot function properly, leading to deserialization failures on systems without similar hardware.

  8. Version-Specific Serialization
    The code includes version checks to ensure compatibility between different serialization versions:

    if (version_id >= 2) {
        input_stream.read(reinterpret_cast<char*>(&has_memory_config), sizeof(bool));
    }

    Mismatched versions between the writing and reading systems could result in failed or incorrect deserialization.

  9. Custom Buffers and Memory Management
    The custom buffer types OwnedBuffer and BorrowedBuffer manage memory during serialization. The buffer sizes are system-dependent:

    output_stream.write(reinterpret_cast<const char*>(&size), sizeof(size));

    These custom buffers may not translate well across systems with different memory architectures, leading to issues during deserialization.

  10. Endianness and Platform-Specific Binary Formats
    The binary format relies on system-specific properties like endianness, which are not handled explicitly in the current code:

    output_stream.write(reinterpret_cast<const char*>(&size), sizeof(size));

    This could cause byte-swapping issues when reading binary files on systems with a different endianness; see the sketch after this list for one way to pin the byte order.
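
To make items 8 and 10 concrete: the writer can commit to a fixed byte order and always emit a version field, instead of dumping raw structs in the host's native layout. The following is a minimal sketch of that idea in Python (the actual writer is C++, and the write_header/read_header names and field layout here are hypothetical, not the real .bin format):

import struct

# Hypothetical header: version_id (uint32) followed by num_buffers (uint64).
# The "<" prefix pins little-endian with no padding, so the bytes are
# identical regardless of the writing host's native byte order.
HEADER_FORMAT = "<IQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 bytes

def write_header(stream, version_id: int, num_buffers: int) -> None:
    stream.write(struct.pack(HEADER_FORMAT, version_id, num_buffers))

def read_header(stream) -> tuple[int, int]:
    version_id, num_buffers = struct.unpack(HEADER_FORMAT, stream.read(HEADER_SIZE))
    return version_id, num_buffers

A reader that sees an unknown version_id can then fail with a clear error instead of misinterpreting the remaining bytes.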

Proposal

Write All Tensors in a Host-Independent Format

Given that we cannot read the .bin file on a different host system, we need to store the tensor in a host-independent format. Currently the database logic has a conditional that writes the tensor either in the custom TTNN tensor format (producing a .bin file) or with PyTorch's save method (producing a .pt file):

def store_tensor(report_path, tensor):
    import torch

    tensors_path = report_path / TENSORS_PATH
    tensors_path.mkdir(parents=True, exist_ok=True)
    if isinstance(tensor, ttnn.Tensor):
        tensor_file_name = tensors_path / f"{tensor.tensor_id}.bin"
        if tensor_file_name.exists():
            return
        ttnn.dump_tensor(
            tensor_file_name,
            ttnn.from_device(tensor),
        )
    elif isinstance(tensor, torch.Tensor):
        tensor_file_name = tensors_path / f"{tensor.tensor_id}.pt"
        if tensor_file_name.exists():
            return
        torch.save(torch.Tensor(tensor), tensor_file_name)
    else:
        raise ValueError(f"Unsupported tensor type {type(tensor)}")

Unfortunately, simply saving the tensors as .pt is not enough to allow reading them on a different host. The tensors also need to be detached from the autograd graph and moved to host memory via the cpu() method before saving:

def store_tensor(report_path, tensor):
    import torch

    DETACH_SAVED_TENSORS = True  # TODO Read from a configuration

    tensors_path = report_path / TENSORS_PATH
    tensors_path.mkdir(parents=True, exist_ok=True)
    if isinstance(tensor, ttnn.Tensor):
        if DETACH_SAVED_TENSORS:
            tensor_file_name = tensors_path / f"{tensor.tensor_id}.pt"
        else:
            tensor_file_name = tensors_path / f"{tensor.tensor_id}.bin"

        if tensor_file_name.exists():
            return

        if DETACH_SAVED_TENSORS:
            torch_tensor = ttnn.to_torch(tensor)
            torch_tensor = torch_tensor.detach().cpu()
            torch.save(torch_tensor, tensor_file_name)
        else:
            ttnn.dump_tensor(
                tensor_file_name,
                ttnn.from_device(tensor),
            )
    elif isinstance(tensor, torch.Tensor):
        tensor_file_name = tensors_path / f"{tensor.tensor_id}.pt"
        if tensor_file_name.exists():
            return
        torch_tensor = tensor
        if DETACH_SAVED_TENSORS:
            # detach from the autograd graph and move to host memory;
            # torch.Tensor(tensor) is avoided here because it silently casts to float32
            torch_tensor = tensor.detach().cpu()
        torch.save(torch_tensor, tensor_file_name)
    else:
        raise ValueError(f"Unsupported tensor type {type(tensor)}")
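
With tensors written as detached CPU .pt files, the visualizer can read them on any host without a device context. A minimal sketch of the reading side (load_tensor is a hypothetical helper, not part of the existing code):

import torch

def load_tensor(tensor_file_name):
    # map_location="cpu" remaps any serialized storage onto host memory,
    # so the file loads even on machines without the original device.
    return torch.load(tensor_file_name, map_location="cpu")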