Commit

docs: refine docs and crop logo
lllAlexanderlll committed Jan 16, 2024
1 parent 97eb0c0 commit 64f6668
Showing 13 changed files with 83 additions and 50 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build_and_deploy_documentation.yml
@@ -21,8 +21,8 @@ jobs:
           pip install sphinx-rtd-theme sphinx-autodoc-typehints sphinx-click sphinx-automodapi texext
       - name: "Parse into HTML"
         run: |
-          sphinx-build -M html docs/source/ docs/build/
           sphinx-apidoc -o docs/source/api src/modalities
+          sphinx-build -M html docs/source/ docs/build/
       - name: Deploy to GitHub Pages
         uses: peaceiris/actions-gh-pages@v3
         with:
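The reordering above lets :bash:`sphinx-apidoc` generate the API stubs before :bash:`sphinx-build` renders the HTML, so the generated API pages end up in the build. A local equivalent of these two steps using Sphinx's Python entry points (a sketch; the paths are taken from the workflow above and assume you run it from the repository root):

.. code-block:: python

   # Local sketch of the two CI steps above, via Sphinx's Python entry points.
   from sphinx.ext.apidoc import main as apidoc_main
   from sphinx.cmd.build import build_main

   # 1) Generate API .rst stubs so the HTML build below can pick them up.
   apidoc_main(["-o", "docs/source/api", "src/modalities"])

   # 2) Render the HTML documentation.
   build_main(["-M", "html", "docs/source/", "docs/build/"])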
2 changes: 1 addition & 1 deletion config_files/config_lorem_ipsum.yaml
@@ -5,7 +5,7 @@ modalities_setup:
 
 wandb:
   project_name: modalities
-  mode: ONLINE
+  mode: OFFLINE
 
 data:
   sample_key: "input_ids"
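Switching :python:`mode` to OFFLINE keeps Weights & Biases logging local instead of streaming it to the server. As a rough illustration of what that mode means at the wandb API level (the framework's own logging wiring is not part of this diff):

.. code-block:: python

   # Illustration only: behaviour of W&B offline mode, independent of modalities.
   import wandb

   run = wandb.init(project="modalities", mode="offline")  # writes to ./wandb, no network calls
   run.log({"train/loss": 0.1})
   run.finish()

   # Offline runs can be uploaded later via the CLI, e.g.: wandb sync wandb/offline-run-*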
Binary file removed docs/logo.jpg
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -100,7 +100,7 @@
 # The name of an image file (relative to this directory) to place at the top
 # of the sidebar.
 #
-html_logo = "../logo.jpg"
+html_logo = "logo.jpg"
 
 # -- Options for HTMLHelp output ---------------------------------------------
 
6 changes: 2 additions & 4 deletions docs/source/configuration.rst
@@ -4,9 +4,7 @@
 Configuration
 ========================================================================
 
-**EDIT "docs/source/configuration.rst" IN ORDER TO MAKE CHANGES HERE**
-
-Training config is defined in yaml formatted files. See :file:`data/config_lorem_ipsum.yaml`. These configs are very explicit specifying all training parameters to keep model trainings as transparent and reproducible as possible. Each config setting is reflected in pydantic classes in :file:`src/llm_gym/config/*.py`. In the config you need to define which config classes to load in field type_hint. This specifies the concrete class. A second parameter, config, then takes all the constructor arguments for that config class. This way it is easy to change i.e. DataLoaders while still having input validation in place.
+Training config is defined in YAML-formatted files; see :file:`data/config_lorem_ipsum.yaml`. These configs are very explicit, specifying all training parameters to keep model training runs as transparent and reproducible as possible. Each config setting is reflected in pydantic classes in :file:`src/modalities/config/*.py`. In the config you define which config class to load via the field :python:`type_hint`, which specifies the concrete class. A second parameter, :python:`config`, then takes all the constructor arguments for that config class. This way it is easy to swap out components, e.g. DataLoaders, while still having input validation in place.
 
 Pydantic and ClassResolver
 ------------------------------------------------------------------------
@@ -39,7 +37,7 @@ With this we utilize Pydantics feature to auto-select a fitting type based on th
 In our implementation we go a step further, as both
 
-* a :python:`type_hint` in a :python:`BaseModel` config must be of type :python:`llm_gym.config.lookup_types.LookupEnum` and
+* a :python:`type_hint` in a :python:`BaseModel` config must be of type :python:`modalities.config.lookup_types.LookupEnum` and
 * :python:`config` is a union of allowed concrete configs of base type :python:`BaseModel`.
 
 :python:`config` hereby replaces :python:`activation_kwargs` from the example above with pydantic-validated :python:`BaseModel` configs.
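To make the :python:`type_hint` / :python:`config` pattern concrete, here is a minimal, self-contained sketch with made-up component names (it mirrors the pattern described above, not the actual modalities config classes):

.. code-block:: python

   from enum import Enum
   from typing import Union

   from pydantic import BaseModel
   from torch import nn


   class ActivationLookup(Enum):
       # Plays the role of a LookupEnum: member names map to concrete classes.
       ReLU = nn.ReLU
       GELU = nn.GELU


   class ReLUConfig(BaseModel):
       inplace: bool = False


   class GELUConfig(BaseModel):
       approximate: str = "none"


   class ActivationComponent(BaseModel):
       # type_hint selects the concrete class, config validates its constructor arguments.
       type_hint: ActivationLookup
       config: Union[ReLUConfig, GELUConfig]


   component = ActivationComponent(
       type_hint=ActivationLookup.ReLU,
       config=ReLUConfig(inplace=True),
   )
   activation = component.type_hint.value(inplace=component.config.inplace)  # nn.ReLU(inplace=True)

In modalities itself, the :python:`type_hint` string from the YAML config is resolved to the concrete class through such a LookupEnum, so this sketch only approximates that lookup step.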
11 changes: 4 additions & 7 deletions docs/source/entrypoints.rst
@@ -8,13 +8,10 @@
 Entrypoints
 =======================================================
 
-
-**EDIT "docs/source/entrypoints.rst" IN ORDER TO MAKE CHANGES HERE**
-
 We use `click <https://click.palletsprojects.com/en/>`_ as a tool to add new entry points and their CLI arguments.
 For this we have a main entry point from which all other entry points are started.
 
-The main entry point is :file:`src/llm_gym/__main__.py:main()`.
+The main entry point is :file:`src/modalities/__main__.py:main()`.
 We register other sub-entrypoints by using our main :python:`click.group`, called :python:`main`, as follows:
 
 .. code-block:: python
@@ -64,8 +61,8 @@ With
 .. code-block:: python
 
    [project.scripts]
-   llm_gym = "llm_gym.__main__:main"
+   modalities = "modalities.__main__:main"
 
-in our :file:`pyproject.toml`, we can start only main with :python:`llm_gym` (which does nothing), or a specific sub-entrypoint e.g. :bash:`llm_gym do_stuff --config_file_path config_files/config.yaml --my_cli_argument 3537`.
+in our :file:`pyproject.toml`, we can start only the main entry point with :python:`modalities` (which does nothing on its own), or a specific sub-entrypoint, e.g. :bash:`modalities do_stuff --config_file_path config_files/config.yaml --my_cli_argument 3537`.
-Alternatively, directly use :bash:`src/llm_gym/__main__.py do_stuff --config_file_path config_files/config.yaml --my_cli_argument 3537`.
+Alternatively, directly use :bash:`src/modalities/__main__.py do_stuff --config_file_path config_files/config.yaml --my_cli_argument 3537`.
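Since the registration code block above is collapsed in this diff, here is a small, hypothetical sketch of how a sub-entrypoint like :bash:`do_stuff` hangs off the main :python:`click.group` (command and option names follow the example in the text; the real commands live in :file:`src/modalities/__main__.py`):

.. code-block:: python

   import click


   @click.group()
   def main():
       """Main entry point; does nothing on its own."""


   @main.command(name="do_stuff")
   @click.option("--config_file_path", type=click.Path(exists=True), required=True)
   @click.option("--my_cli_argument", type=int, required=True)
   def entry_point_do_stuff(config_file_path: str, my_cli_argument: int):
       """A sub-entrypoint registered on the main group."""
       click.echo(f"running with {config_file_path} and {my_cli_argument}")


   if __name__ == "__main__":
       main()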
13 changes: 9 additions & 4 deletions docs/source/index.rst
@@ -1,12 +1,17 @@
 Welcome to Modalities's documentation!
 ======================================================================
 
-**EDIT "docs/source/index.rst" IN ORDER TO MAKE CHANGES HERE**
+We propose a novel training framework for Multimodal Large Language Models (LLMs) that prioritizes code readability and efficiency.
+The codebase adheres to the principles of "clean code," minimizing Lines of Code (LoC) while maintaining extensibility.
+A single, comprehensive configuration file enables easy customization of various model and training parameters.
 
-<TODO: Add abstract --> still needed: USPs, key features; include FSDP here;>
+A key innovation is the adoption of a PyTorch-native training loop integrated with Fully Sharded Data Parallel (FSDP).
+FSDP optimizes memory usage and training speed, enhancing scalability for large-scale multimodal models.
+By leveraging PyTorch's native capabilities, our framework simplifies the development process and promotes ease of maintenance.
 
-<TODO: CAN ADD LINKS TO SPECIFIC THINGS USERS CAN EXPLORE AT FIRST>
+The framework's modular design facilitates experimentation with different multimodal architectures and training strategies.
+Users can seamlessly integrate diverse datasets and model components, allowing for comprehensive exploration of multimodal learning tasks.
+The combination of clean code, minimal configuration, and PyTorch-native training with FSDP contributes to a user-friendly and efficient platform for developing state-of-the-art multimodal language models.
 
 .. note::
 
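As a rough illustration of the PyTorch-native FSDP training loop mentioned above (a toy model, random data and placeholder hyperparameters; not the framework's actual loop):

.. code-block:: python

   import os

   import torch
   import torch.distributed as dist
   from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

   # Assumes a launch via torchrun, which sets RANK, WORLD_SIZE and LOCAL_RANK.
   dist.init_process_group(backend="nccl")
   local_rank = int(os.environ["LOCAL_RANK"])
   torch.cuda.set_device(local_rank)

   model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8).cuda()
   model = FSDP(model)  # shards parameters, gradients and optimizer state across ranks

   optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
   loss_fn = torch.nn.MSELoss()

   for step in range(10):  # placeholder loop over random batches
       inputs = torch.randn(8, 32, 512, device="cuda")
       targets = torch.randn(8, 32, 512, device="cuda")
       optimizer.zero_grad()
       loss = loss_fn(model(inputs), targets)
       loss.backward()
       optimizer.step()

   dist.destroy_process_group()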
2 changes: 1 addition & 1 deletion docs/source/known_issues.rst
@@ -1,7 +1,7 @@
 Known Issues
 ==================================================================
 
-**EDIT "docs/source/known_issues.rst" IN ORDER TO MAKE CHANGES HERE**
+`GitHub Issues <https://github.com/Modalities/modalities/issues>`_
 
 1. Hardcoded dataset path :file:`/raid/s3/opengptx/mehdi/temp/temp_data/train_text_document.bin` in :file:`config/config.yaml`
 2. Dependency on Weights & Biases
Binary file added docs/source/logo.jpg
6 changes: 2 additions & 4 deletions docs/source/memmap.rst
@@ -7,8 +7,6 @@
 MemMap Datasets
 ====================================================
 
-**EDIT "docs/source/memmap.rst" IN ORDER TO MAKE CHANGES HERE**
-
 MemMapDataset Index Generator
 ------------------------------------------------------------------------------
 
@@ -18,7 +16,7 @@ The :python:`MemMapDataset` requires an index file providing the necessary point
    modalities create_memmap_index <path/to/jsonl/file>
 
-The index will be created in the same directory as the raw data file. For further options you may look into the usage documentation via :bash:`llm_gym create_memmap_index --help`.
+The index will be created in the same directory as the raw data file. For further options, see the usage documentation via :bash:`modalities create_memmap_index --help`.
 
 Packed Dataset Generator
 --------------------------------------------------------------------------------
@@ -29,7 +27,7 @@ The :python:`PackedMemMapDatasetContinuous` and :python:`PackedMemMapDatasetMega
    modalities create_packed_data <path/to/jsonl/file>
 
-The packed data file will be created in the same directory as the raw data file. For further options you may look into the usage documentation via :bash:`llm_gym create_packed_data --help`.
+The packed data file will be created in the same directory as the raw data file. For further options, see the usage documentation via :bash:`modalities create_packed_data --help`.
 
 Packed Data Format
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
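To give an intuition for what such an index provides, a purely conceptual sketch of byte-offset indexing for a JSONL file (file names are placeholders; this is neither the modalities implementation nor its on-disk format):

.. code-block:: python

   import json
   import pickle

   # Record (offset, length) for every JSON line so samples can later be read
   # directly from the raw file without scanning it from the start.
   index = []
   with open("data.jsonl", "rb") as f:
       offset = 0
       for line in f:
           index.append((offset, len(line)))
           offset += len(line)

   with open("data.idx", "wb") as f:
       pickle.dump(index, f)

   # Random access afterwards: seek to a recorded offset and parse one sample.
   with open("data.jsonl", "rb") as f:
       off, length = index[0]
       f.seek(off)
       sample = json.loads(f.read(length))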
1 change: 0 additions & 1 deletion docs/source/model_cards.rst
@@ -1,4 +1,3 @@
 Model Cards
 ====================================================
-**EDIT "docs/source/model_cards.rst" IN ORDER TO MAKE CHANGES HERE**
 <TODO>
2 changes: 0 additions & 2 deletions docs/source/quickstart.rst
@@ -1,8 +1,6 @@
 Quickstart
 ====================================================
 
-**EDIT "docs/source/quickstart.rst" IN ORDER TO MAKE CHANGES HERE**
-
 Installation
 -----------------------------------------------------
 Set up a conda environment `conda create -n modalities python=3.10 && conda activate modalities` and install the requirements `pip install -e .`.
86 changes: 62 additions & 24 deletions docs/source/vs_code_setup.rst
@@ -1,33 +1,71 @@
 VSCode Setup
 ====================================================
 
-**EDIT "docs/source/vs_code_setup.rst" IN ORDER TO MAKE CHANGES HERE**
 
+We recommend a docker environment based on the most recent PyTorch image, e.g.:
+
+.. code-block:: dockerfile
+
+   FROM pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel
+   RUN apt-get update && apt-get install -y wget openssh-client git-core bash-completion
+   RUN wget -O /tmp/git-lfs.deb https://packagecloud.io/github/git-lfs/packages/ubuntu/focal/git-lfs_2.13.3_amd64.deb/download.deb && \
+       dpkg -i /tmp/git-lfs.deb && \
+       rm /tmp/git-lfs.deb
+   RUN echo 'source /usr/share/bash-completion/completions/git' >> ~/.bashrc
+   CMD ["/bin/bash"]
+
+This works seamlessly in combination with the VSCode DevContainer extension:
+
+.. code-block:: json
+
+   {
+       "name": "Dev Container",
+       "dockerFile": "Dockerfile",
+       "runArgs": [
+           "--network",
+           "host",
+           "--gpus",
+           "all"
+       ],
+       "customizations": {
+           "vscode": {
+               "settings": {
+                   "terminal.integrated.shell.linux": "/bin/bash"
+               },
+               "extensions": [
+                   "ms-python.python"
+               ]
+           }
+       }
+   }
+
 In VSCode, add this to your :file:`launch.json`:
 
 .. code-block:: json
 
-   {
-       "name": "Torchrun Main",
-       "type": "python",
-       "request": "launch",
-       "module": "torch.distributed.run",
-       "env": {
-           "CUDA_VISIBLE_DEVICES": "0"
-       },
-       "args": [
-           "--nnodes",
-           "1",
-           "--nproc_per_node",
-           "2",
-           "--rdzv-endpoint=0.0.0.0:29503",
-           "src/modalities/__main__.py",
-           "run",
-           "--config_file_path",
-           "config_files/config.yaml",
-       ],
-       "console": "integratedTerminal",
-       "justMyCode": true,
-       "envFile": "${workspaceFolder}/.env"
-   }
+   {
+       "name": "Torchrun Train and Eval",
+       "type": "python",
+       "request": "launch",
+       "module": "torch.distributed.run",
+       "env": {
+           "CUDA_VISIBLE_DEVICES": "4,5"
+       },
+       "args": [
+           "--nnodes",
+           "1",
+           "--nproc_per_node",
+           "2",
+           "--rdzv-endpoint=0.0.0.0:29503",
+           "src/modalities/__main__.py",
+           "run",
+           "--config_file_path",
+           "config_files/config_lorem_ipsum.yaml"
+       ],
+       "console": "integratedTerminal",
+       "justMyCode": true,
+       "envFile": "${workspaceFolder}/.env",
+       "cwd": "${workspaceFolder}/modalities"
+   }
