Rename to lightning data (#7)
tchaton authored Feb 19, 2024
1 parent 7151f5c commit 1a65f19
Showing 58 changed files with 166 additions and 166 deletions.
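The change is mechanical: every occurrence of the `litdata` package name — imports, CI config, packaging manifests, and docs — becomes `lightning_data`. As a quick orientation before the per-file diff, here is a minimal sketch of what the rename means for downstream code; the bucket path is a placeholder borrowed from the README examples below, not part of this commit:

```python
# Before this commit the package was imported as `litdata`:
#   from litdata import StreamingDataset, StreamingDataLoader

# After this commit the same public API lives under `lightning_data`:
from lightning_data import StreamingDataset, StreamingDataLoader

# Usage is unchanged apart from the import path.
dataset = StreamingDataset("s3://my-bucket/my-data", shuffle=True)
dataloader = StreamingDataLoader(dataset, batch_size=64)
```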
2 changes: 1 addition & 1 deletion .github/workflows/ci-checks.yml
@@ -28,7 +28,7 @@ jobs:
uses: ./check-typing.yml
with:
actions-ref: main
import-name: "litdata"
import-name: "lightning_data"
artifact-name: dist-packages-${{ github.sha }}
testing-matrix: |
{
2 changes: 1 addition & 1 deletion .github/workflows/ci-testing.yml
@@ -70,7 +70,7 @@ jobs:
- name: Tests
run: |
-coverage run --source litdata -m pytest tests -v
+coverage run --source lightning_data -m pytest tests -v
- name: Statistics
if: success()
4 changes: 2 additions & 2 deletions MANIFEST.in
@@ -5,10 +5,10 @@ recursive-exclude __pycache__ *.py[cod] *.orig

# Include the README and CHANGELOG
include *.md
-recursive-include litdata *.md
+recursive-include lightning_data *.md

# Include the code
-recursive-include litdata *.py
+recursive-include lightning_data *.py

# Include the license file
include LICENSE
2 changes: 1 addition & 1 deletion Makefile
@@ -10,7 +10,7 @@ test: clean
pip install -q -r requirements/test.txt

# use this to run tests
-python -m coverage run --source litdata -m pytest src -v --flake8
+python -m coverage run --source lightning_data -m pytest src -v --flake8
python -m coverage report

docs: clean
26 changes: 13 additions & 13 deletions README.md
@@ -45,7 +45,7 @@ Convert your raw dataset into Lightning Streaming format using the `optimize` op

```python
import numpy as np
-from litdata import optimize
+from lightning_data import optimize
from PIL import Image


@@ -84,7 +84,7 @@ Here is an example with [AWS S3](https://aws.amazon.com/s3).
### 3. Use StreamingDataset and DataLoader

```python
-from litdata import StreamingDataset
+from lightning_data import StreamingDataset
from torch.utils.data import DataLoader

# Remote path where full dataset is persistently stored
@@ -135,7 +135,7 @@ for i in range(1000):

```python
import os
-from litdata import map
+from lightning_data import map
from PIL import Image

input_dir = "s3://my-bucket/my_images"
@@ -174,7 +174,7 @@ We have end-to-end free [Studios](https://lightning.ai) showing all the steps to
To scale data processing, create a free account on [lightning.ai](https://lightning.ai/) platform. With the platform, the `optimize` and `map` can start multiple machines to make data processing drastically faster as follows:

```python
-from litdata import optimize, Machine
+from lightning_data import optimize, Machine

optimize(
...
@@ -186,7 +186,7 @@ optimize(
OR

```python
-from litdata import map, Machine
+from lightning_data import map, Machine

map(
...
@@ -216,8 +216,8 @@ The `StreamingDataset` and `StreamingDataLoader` takes care of everything for yo
You can easily experiment with dataset mixtures using the CombinedStreamingDataset.

```python
-from litdata import StreamingDataset, CombinedStreamingDataset
-from litdata.streaming.item_loader import TokensLoader
+from lightning_data import StreamingDataset, CombinedStreamingDataset
+from lightning_data.streaming.item_loader import TokensLoader
from tqdm import tqdm
import os
from torch.utils.data import DataLoader
@@ -257,7 +257,7 @@ Note: The `StreamingDataLoader` is used by [Lit-GPT](https://github.com/Lightnin
```python
import os
import torch
-from litdata import StreamingDataset, StreamingDataLoader
+from lightning_data import StreamingDataset, StreamingDataLoader

dataset = StreamingDataset("s3://my-bucket/my-data", shuffle=True)
dataloader = StreamingDataLoader(dataset, num_workers=os.cpu_count(), batch_size=64)
@@ -280,7 +280,7 @@ for batch_idx, batch in enumerate(dataloader):
The `StreamingDataLoader` supports profiling your data loading. Simply use the `profile_batches` argument as follows:

```python
-from litdata import StreamingDataset, StreamingDataLoader
+from lightning_data import StreamingDataset, StreamingDataLoader

StreamingDataLoader(..., profile_batches=5)
```
@@ -292,7 +292,7 @@ This generates a Chrome trace called `result.json`. You can visualize this trac
Access the data you need when you need it.

```python
-from litdata import StreamingDataset
+from lightning_data import StreamingDataset

dataset = StreamingDataset(...)

@@ -304,7 +304,7 @@ print(dataset[42]) # show the 42th element of the dataset
## ✢ Use data transforms

```python
-from litdata import StreamingDataset, StreamingDataLoader
+from lightning_data import StreamingDataset, StreamingDataLoader
import torchvision.transforms.v2.functional as F

class ImagenetStreamingDataset(StreamingDataset):
@@ -326,7 +326,7 @@ for batch in dataloader:
Limit the size of the cache holding the chunks.

```python
-from litdata import StreamingDataset
+from lightning_data import StreamingDataset

dataset = StreamingDataset(..., max_cache_size="10GB")
```
@@ -338,7 +338,7 @@ When processing large files like compressed [parquet files](https://en.wikipedia
```python
from pathlib import Path
import pyarrow.parquet as pq
-from litdata import optimize
+from lightning_data import optimize
from tokenizer import Tokenizer
from functools import partial

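Taken together, the README changes above touch only import paths; the documented workflow itself is unchanged. Below is a minimal end-to-end sketch assembled from those snippets under the renamed package — the sample-generating function, the output directory, and the `chunk_bytes` value are illustrative assumptions, not part of this diff:

```python
import numpy as np
from lightning_data import optimize, StreamingDataset

# Illustrative sample generator: `optimize` calls it once per input index.
def random_image(index):
    return {
        "index": index,
        "image": np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8),
    }

if __name__ == "__main__":
    # Convert raw samples into the optimized streaming format.
    optimize(
        fn=random_image,
        inputs=list(range(1000)),
        output_dir="my_optimized_dataset",  # hypothetical local path
        chunk_bytes="64MB",                 # assumed chunking setting
    )

    # Stream it back; a local directory works like the `s3://` paths above.
    dataset = StreamingDataset("my_optimized_dataset")
    print(dataset[0]["index"])
```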
10 changes: 5 additions & 5 deletions docs/source/conf.py
@@ -21,7 +21,7 @@
SPHINX_MOCK_REQUIREMENTS = int(os.environ.get("SPHINX_MOCK_REQUIREMENTS", True))

# alternative https://stackoverflow.com/a/67692/4521646
spec = spec_from_file_location("litdata/__about__.py", os.path.join(_PATH_ROOT, "litdata", "__about__.py"))
spec = spec_from_file_location("lightning_data/__about__.py", os.path.join(_PATH_ROOT, "lightning_data", "__about__.py"))
about = module_from_spec(spec)
spec.loader.exec_module(about)

@@ -316,8 +316,8 @@ def find_source():
fname = inspect.getsourcefile(obj)
# https://github.com/rtfd/readthedocs.org/issues/5735
if any(s in fname for s in ("readthedocs", "rtfd", "checkouts")):
-# /home/docs/checkouts/readthedocs.org/user_builds/litdata/checkouts/
-# devel/litdata/utilities/cls_experiment.py#L26-L176
+# /home/docs/checkouts/readthedocs.org/user_builds/lightning_data/checkouts/
+# devel/lightning_data/utilities/cls_experiment.py#L26-L176
path_top = os.path.abspath(os.path.join("..", "..", ".."))
fname = os.path.relpath(fname, start=path_top)
else:
@@ -380,8 +380,8 @@ def find_source():
import os
import torch
-import litdata
-from litdata import StreamingDataset
+import lightning_data
+from lightning_data import StreamingDataset
"""
coverage_skip_undoc_in_source = True
File renamed without changes.
File renamed without changes.
8 changes: 4 additions & 4 deletions litdata/__init__.py → lightning_data/__init__.py
@@ -1,9 +1,9 @@
from lightning_utilities.core.imports import RequirementCache

-from litdata.processing.functions import map, optimize, walk
-from litdata.streaming.combined import CombinedStreamingDataset
-from litdata.streaming.dataloader import StreamingDataLoader
-from litdata.streaming.dataset import StreamingDataset
+from lightning_data.processing.functions import map, optimize, walk
+from lightning_data.streaming.combined import CombinedStreamingDataset
+from lightning_data.streaming.dataloader import StreamingDataLoader
+from lightning_data.streaming.dataset import StreamingDataset

__all__ = [
"LightningDataset",
File renamed without changes.
File renamed without changes.
@@ -20,21 +20,21 @@
from lightning import seed_everything
from tqdm.auto import tqdm as _tqdm

-from litdata.constants import (
+from lightning_data.constants import (
_BOTO3_AVAILABLE,
_DEFAULT_FAST_DEV_RUN_ITEMS,
_INDEX_FILENAME,
_IS_IN_STUDIO,
_LIGHTNING_CLOUD_LATEST,
_TORCH_GREATER_EQUAL_2_1_0,
)
-from litdata.processing.readers import BaseReader
-from litdata.streaming import Cache
-from litdata.streaming.cache import Dir
-from litdata.streaming.client import S3Client
-from litdata.streaming.resolver import _resolve_dir
-from litdata.utilities.broadcast import broadcast_object
-from litdata.utilities.packing import _pack_greedily
+from lightning_data.processing.readers import BaseReader
+from lightning_data.streaming import Cache
+from lightning_data.streaming.cache import Dir
+from lightning_data.streaming.client import S3Client
+from lightning_data.streaming.resolver import _resolve_dir
+from lightning_data.utilities.broadcast import broadcast_object
+from lightning_data.utilities.packing import _pack_greedily

if _TORCH_GREATER_EQUAL_2_1_0:
from torch.utils._pytree import tree_flatten, tree_unflatten, treespec_loads
@@ -22,11 +22,11 @@

import torch

-from litdata.constants import _IS_IN_STUDIO, _TORCH_GREATER_EQUAL_2_1_0
-from litdata.processing.data_processor import DataChunkRecipe, DataProcessor, DataTransformRecipe
-from litdata.processing.readers import BaseReader
-from litdata.processing.utilities import optimize_dns_context
-from litdata.streaming.resolver import (
+from lightning_data.constants import _IS_IN_STUDIO, _TORCH_GREATER_EQUAL_2_1_0
+from lightning_data.processing.data_processor import DataChunkRecipe, DataProcessor, DataTransformRecipe
+from lightning_data.processing.readers import BaseReader
+from lightning_data.processing.utilities import optimize_dns_context
+from lightning_data.streaming.resolver import (
Dir,
_assert_dir_has_index_file,
_assert_dir_is_empty,
File renamed without changes.
@@ -5,7 +5,7 @@
from subprocess import Popen # noqa: S404
from typing import Any, Callable, Optional, Tuple

-from litdata.constants import _IS_IN_STUDIO
+from lightning_data.constants import _IS_IN_STUDIO


def get_worker_rank() -> Optional[str]:
@@ -66,7 +66,7 @@ def optimize_dns(enable: bool) -> None:
):
cmd = (
f"sudo /home/zeus/miniconda3/envs/cloudspace/bin/python"
f" -c 'from litdata.processing.utilities import _optimize_dns; _optimize_dns({enable})'"
f" -c 'from lightning_data.processing.utilities import _optimize_dns; _optimize_dns({enable})'"
)
Popen(cmd, shell=True).wait() # E501

@@ -11,11 +11,11 @@
# See the License for the specific language governing permissions and
# limitations under the License.

-from litdata.streaming.cache import Cache
-from litdata.streaming.combined import CombinedStreamingDataset
-from litdata.streaming.dataloader import StreamingDataLoader
-from litdata.streaming.dataset import StreamingDataset
-from litdata.streaming.item_loader import TokensLoader
+from lightning_data.streaming.cache import Cache
+from lightning_data.streaming.combined import CombinedStreamingDataset
+from lightning_data.streaming.dataloader import StreamingDataLoader
+from lightning_data.streaming.dataset import StreamingDataset
+from lightning_data.streaming.item_loader import TokensLoader

__all__ = [
"Cache",
18 changes: 9 additions & 9 deletions litdata/streaming/cache.py → lightning_data/streaming/cache.py
@@ -15,19 +15,19 @@
import os
from typing import Any, Dict, List, Optional, Tuple, Union

-from litdata.constants import (
+from lightning_data.constants import (
_INDEX_FILENAME,
_LIGHTNING_CLOUD_LATEST,
_TORCH_GREATER_EQUAL_2_1_0,
)
-from litdata.streaming.item_loader import BaseItemLoader
-from litdata.streaming.reader import BinaryReader
-from litdata.streaming.resolver import Dir, _resolve_dir
-from litdata.streaming.sampler import ChunkedIndex
-from litdata.streaming.serializers import Serializer
-from litdata.streaming.writer import BinaryWriter
-from litdata.utilities.env import _DistributedEnv, _WorkerEnv
-from litdata.utilities.format import _convert_bytes_to_int
+from lightning_data.streaming.item_loader import BaseItemLoader
+from lightning_data.streaming.reader import BinaryReader
+from lightning_data.streaming.resolver import Dir, _resolve_dir
+from lightning_data.streaming.sampler import ChunkedIndex
+from lightning_data.streaming.serializers import Serializer
+from lightning_data.streaming.writer import BinaryWriter
+from lightning_data.utilities.env import _DistributedEnv, _WorkerEnv
+from lightning_data.utilities.format import _convert_bytes_to_int

logger = logging.Logger(__name__)

@@ -2,7 +2,7 @@
from time import time
from typing import Any, Optional

-from litdata.constants import _BOTO3_AVAILABLE
+from lightning_data.constants import _BOTO3_AVAILABLE

if _BOTO3_AVAILABLE:
import boto3
@@ -16,8 +16,8 @@

from torch.utils.data import IterableDataset

-from litdata.streaming.dataset import StreamingDataset
-from litdata.utilities.env import _WorkerEnv
+from lightning_data.streaming.dataset import StreamingDataset
+from lightning_data.utilities.env import _WorkerEnv

__NUM_SAMPLES_YIELDED_KEY__ = "__NUM_SAMPLES_YIELDED__"
__SAMPLES_KEY__ = "__SAMPLES__"
File renamed without changes.
@@ -15,11 +15,11 @@
import os
from typing import Any, Dict, List, Optional, Tuple

-from litdata.constants import _INDEX_FILENAME, _TORCH_GREATER_EQUAL_2_1_0
-from litdata.streaming.downloader import get_downloader_cls
-from litdata.streaming.item_loader import BaseItemLoader, PyTreeLoader, TokensLoader
-from litdata.streaming.sampler import ChunkedIndex
-from litdata.streaming.serializers import Serializer
+from lightning_data.constants import _INDEX_FILENAME, _TORCH_GREATER_EQUAL_2_1_0
+from lightning_data.streaming.downloader import get_downloader_cls
+from lightning_data.streaming.item_loader import BaseItemLoader, PyTreeLoader, TokensLoader
+from lightning_data.streaming.sampler import ChunkedIndex
+from lightning_data.streaming.serializers import Serializer

if _TORCH_GREATER_EQUAL_2_1_0:
from torch.utils._pytree import tree_unflatten, treespec_loads
@@ -33,16 +33,16 @@
)
from torch.utils.data.sampler import BatchSampler, Sampler

-from litdata.constants import _DEFAULT_CHUNK_BYTES, _TORCH_GREATER_EQUAL_2_1_0, _VIZ_TRACKER_AVAILABLE
-from litdata.streaming import Cache
-from litdata.streaming.combined import (
+from lightning_data.constants import _DEFAULT_CHUNK_BYTES, _TORCH_GREATER_EQUAL_2_1_0, _VIZ_TRACKER_AVAILABLE
+from lightning_data.streaming import Cache
+from lightning_data.streaming.combined import (
__NUM_SAMPLES_YIELDED_KEY__,
__SAMPLES_KEY__,
CombinedStreamingDataset,
)
-from litdata.streaming.dataset import StreamingDataset
-from litdata.streaming.sampler import CacheBatchSampler
-from litdata.utilities.env import _DistributedEnv
+from lightning_data.streaming.dataset import StreamingDataset
+from lightning_data.streaming.sampler import CacheBatchSampler
+from lightning_data.utilities.env import _DistributedEnv

if _TORCH_GREATER_EQUAL_2_1_0:
from torch.utils._pytree import tree_flatten
@@ -105,7 +105,7 @@ def __getitem__(self, index: int) -> Any:
if not _equal_items(data_1, data2):
raise ValueError(
f"Your dataset items aren't deterministic. Found {data_1} and {data2} for index {index}."
" HINT: Use the `litdata.cache.Cache` directly within your dataset."
" HINT: Use the `lightning_data.cache.Cache` directly within your dataset."
)
self._is_deterministic = True
self._cache[index] = data_1
@@ -180,7 +180,7 @@ def __call__(
) -> None:
from torch.utils.data._utils import worker

-from litdata.streaming.cache import Cache
+from lightning_data.streaming.cache import Cache

enable_profiling = self._global_rank == 0 and worker_id == 0 and _VIZ_TRACKER_AVAILABLE and self._profile

@@ -481,7 +481,7 @@ def _try_put_index(self) -> None:
class StreamingDataLoader(DataLoader):
r"""The StreamingDataLoader combines a dataset and a sampler, and provides an iterable over the given dataset.
-The :class:`~litdata.streaming.dataloader.StreamingDataLoader` supports either a
+The :class:`~lightning_data.streaming.dataloader.StreamingDataLoader` supports either a
StreamingDataset and CombinedStreamingDataset datasets with single- or multi-process loading,
customizing
loading order and optional automatic batching (collation) and memory pinning.
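The docstring edited in this last hunk notes that `StreamingDataLoader` accepts either a `StreamingDataset` or a `CombinedStreamingDataset`. A short sketch of the combined case under the new package name — the bucket paths are placeholders, and the list-of-datasets constructor mirrors the README's combined-dataset example rather than anything in this diff:

```python
from lightning_data import (
    CombinedStreamingDataset,
    StreamingDataLoader,
    StreamingDataset,
)

# Two independent streams (placeholder locations).
stream_a = StreamingDataset("s3://my-bucket/stream-a")
stream_b = StreamingDataset("s3://my-bucket/stream-b")

# Sample from both streams through a single loader.
combined = CombinedStreamingDataset([stream_a, stream_b])
dataloader = StreamingDataLoader(combined, batch_size=64, num_workers=4)

for batch in dataloader:
    ...  # training step
```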