Support remote file systems for Audio #5353

Closed
OllieBroadhurst opened this issue Dec 12, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@OllieBroadhurst

Feature request

Hi there!

It would be super cool if Audio(), and potentially other features, could read files from a remote file system.

Motivation

Large amounts of data are often stored in buckets. load_from_disk can retrieve data from cloud storage, but to my knowledge it copies the dataset across first, so if you're working on a system with a small disk (like a VM), you can run out of space very quickly.

Your contribution

Something like this (for Google Cloud Platform in this instance):

from datasets import Dataset, Audio
import gcsfs

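fs = gcsfs.GCSFileSystem()

# Placeholder paths; in practice these would point to audio files in a bucket
list_of_audio_fp = {'audio': ['1', '2', '3']}
ds = Dataset.from_dict(list_of_audio_fp)

# Proposed API: `fs` is the argument being requested here, it does not exist yet
ds = ds.cast_column("audio", Audio(sampling_rate=16000, fs=fs))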

Under the hood:

import librosa
from io import BytesIO

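def load_audio(fp, sampling_rate=None, fs=None):
    if fs is not None:
        # Read the remote file's bytes into memory so librosa can decode them
        with fs.open(fp, 'rb') as f:
            arr, sr = librosa.load(BytesIO(f.read()), sr=sampling_rate)
    else:
        # Perform existing io operations (a plain local load is a stand-in here)
        arr, sr = librosa.load(fp, sr=sampling_rate)
    return arr, sr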

Written from memory so some things could be wrong.
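
For anyone who wants to try the dispatch idea without a bucket or librosa, here is a rough, standard-library-only sketch. Everything in it is an assumption for illustration: InMemoryFS is a hypothetical stand-in for an fsspec-style filesystem such as gcsfs.GCSFileSystem (it only mimics open(path, "rb")), make_wav_bytes and read_wav are made-up helper names, and the wave module stands in for the real audio decoder. It is not how datasets implements Audio.

```python
import io
import wave


class InMemoryFS:
    """Hypothetical stand-in for an fsspec-style filesystem; only open(..., 'rb') is mimicked."""

    def __init__(self, files=None):
        self._files = dict(files or {})

    def open(self, path, mode="rb"):
        if mode != "rb":
            raise ValueError("this sketch only supports reading")
        # Return a file-like object over the stored bytes, as a remote fs would
        return io.BytesIO(self._files[path])


def make_wav_bytes(n_frames, sampling_rate):
    """Build a silent mono 16-bit WAV file in memory (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sampling_rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def read_wav(fp, fs=None):
    """Return (raw_frames, sampling_rate), reading through `fs` when one is given."""
    # Fall back to the built-in open() for local paths
    opener = fs.open if fs is not None else open
    with opener(fp, "rb") as f:
        with wave.open(f, "rb") as w:
            return w.readframes(w.getnframes()), w.getframerate()


fs = InMemoryFS({"bucket/clip.wav": make_wav_bytes(160, 16000)})
frames, sr = read_wav("bucket/clip.wav", fs=fs)
```

The point is just that the decoder only ever sees a file-like object, so nothing has to be copied to local disk first; swapping InMemoryFS for a real filesystem object should only change where the bytes come from.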

OllieBroadhurst added the enhancement (New feature or request) label Dec 12, 2022
OllieBroadhurst changed the title from "Add an fs argument to Audio" to "Support remote file systems for Audio" Dec 12, 2022
@OllieBroadhurst
Author

Just seen #5281
