
Physical layer interface

Jim Pivarski edited this page Nov 3, 2020 · 8 revisions

On this wiki page, I'll document the interface that Uproot expects from Source classes, such as HTTPSource and XRootDSource. It only describes reading, because only reading has been implemented (as of Nov 2020).

When a user opens a file with uproot4.open, the URL scheme determines which Source to use:
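As a rough illustration of scheme-based dispatch (not Uproot's actual code), here is a minimal sketch using only the standard library's urllib.parse. The mapping table is an assumption: HTTPSource and XRootDSource are the names from this page, http/https/root are the conventional URL schemes for HTTP and XRootD, and LocalFileSource is a hypothetical placeholder for whatever handles local paths.

```python
from urllib.parse import urlparse

# Hypothetical scheme -> Source-name table. "LocalFileSource" is a placeholder,
# not a real Uproot class name.
SCHEME_TO_SOURCE = {
    "http": "HTTPSource",
    "https": "HTTPSource",
    "root": "XRootDSource",
    "file": "LocalFileSource",
    "": "LocalFileSource",  # bare paths such as /data/file.root have no scheme
}


def choose_source(url):
    """Return the name of the Source class that would handle ``url``."""
    scheme = urlparse(url).scheme.lower()
    try:
        return SCHEME_TO_SOURCE[scheme]
    except KeyError:
        raise ValueError(f"no Source registered for scheme {scheme!r}") from None


print(choose_source("root://eos.example.org//data/file.root"))  # XRootDSource
```

A real dispatcher needs more care than this table: for example, a Windows path like C:\data\file.root parses with scheme "c", so it would have to be recognized as a local path before the lookup.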

uproot4.open returns a ReadOnlyDirectory, which has a file property pointing to a ReadOnlyFile, which in turn has a source property pointing to the actual Source. The Source may be stateful, with open file handles and associated threads. When any object derived from uproot4.open exits a with statement (through its __exit__) or is explicitly closed, the __exit__ call is propagated all the way down to the Source, so that it can close or shut down whatever it needs to.
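The close-propagation chain can be sketched with simplified stand-in classes. They borrow the names ReadOnlyDirectory, ReadOnlyFile, and Source from this page, but they are hypothetical minimal versions written for illustration; the real Uproot classes carry much more state and may forward the call differently.

```python
class Source:
    """Stand-in for a stateful Source (open file handles, threads)."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # A real Source would close handles and shut down threads here.
        self.closed = True


class ReadOnlyFile:
    def __init__(self, source):
        self.source = source

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Forward the exit one level down, to the Source.
        self.source.__exit__(exc_type, exc_value, traceback)

    def close(self):
        self.__exit__(None, None, None)


class ReadOnlyDirectory:
    def __init__(self, file):
        self.file = file

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Forward the exit one level down, to the ReadOnlyFile.
        self.file.__exit__(exc_type, exc_value, traceback)

    def close(self):
        self.__exit__(None, None, None)


source = Source()
with ReadOnlyDirectory(ReadOnlyFile(source)) as directory:
    assert not directory.file.source.closed
print(source.closed)  # True
```

Because each __exit__ returns None, exceptions raised inside the with block are not swallowed; they still propagate after the Source has been closed.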

The job of a Source is to deliver Chunk objects on demand. A Chunk represents physical bytes of a file: uninterpreted and, if it comes directly from a Source, possibly compressed. The data in a Chunk might not have been read yet, but they have been requested. A Chunk is defined by:

The rest of Uproot interacts with Chunk objects through their get and remainder methods, which return the raw data from the file (through the future) as a numpy.ndarray of dtype numpy.uint8. Requesting data from a Chunk blocks until its future has actually delivered.
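A minimal sketch of this behavior, assuming a Chunk is just a byte range [start, stop) plus a concurrent.futures.Future whose result is a numpy.uint8 array of length stop - start. The get(start, stop) and remainder(start) signatures here are simplified assumptions; the real Uproot methods may take additional arguments.

```python
import threading
from concurrent.futures import Future

import numpy as np


class Chunk:
    """Simplified stand-in: file bytes [start, stop), delivered by a future."""

    def __init__(self, start, stop, future):
        self.start = start
        self.stop = stop
        self.future = future

    def _raw(self):
        # Blocks until the Source has actually delivered the bytes.
        return self.future.result()

    def get(self, start, stop):
        """Return file bytes [start, stop), which must lie within this chunk."""
        if not self.start <= start <= stop <= self.stop:
            raise IndexError(f"[{start}, {stop}) not in [{self.start}, {self.stop})")
        return self._raw()[start - self.start : stop - self.start]

    def remainder(self, start):
        """Return file bytes from ``start`` to the end of this chunk."""
        return self.get(start, self.stop)


future = Future()
chunk = Chunk(100, 108, future)
# Simulate a Source thread that delivers the bytes 0.1 s later.
threading.Timer(0.1, future.set_result, args=(np.arange(8, dtype=np.uint8),)).start()
print(chunk.get(102, 104))   # blocks until delivered, then prints [2 3]
print(chunk.remainder(106))  # [6 7]
```

Note that get and remainder take file offsets, not offsets within the chunk, so callers never need to know where a chunk begins.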

The interpretation of those bytes is out of scope for the physical layer: the physical layer only needs to deliver bytes (in futures) on demand.
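To make the division of labor concrete, here is a hypothetical Source (InMemorySource is an illustrative name, not an Uproot class) that serves byte ranges of an in-memory buffer. It assumes a thread pool from concurrent.futures stands in for real file, HTTP, or XRootD I/O, and it returns one future of uint8 bytes per requested range. Interpreting those bytes, here as a big-endian 32-bit integer, happens entirely outside the Source.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


class InMemorySource:
    """Hypothetical Source serving raw byte ranges of an in-memory 'file'."""

    def __init__(self, data, num_workers=2):
        self._data = np.frombuffer(data, dtype=np.uint8)
        self._executor = ThreadPoolExecutor(max_workers=num_workers)

    def _read(self, start, stop):
        # A real Source would perform file, HTTP, or XRootD I/O here.
        return self._data[start:stop].copy()

    def request(self, ranges):
        """Submit all (start, stop) ranges at once; return one future per range."""
        return [self._executor.submit(self._read, start, stop) for start, stop in ranges]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Shut down worker threads, as a stateful Source must on close.
        self._executor.shutdown(wait=True)


with InMemorySource(b"root\x00\x00\x00\x2a") as source:
    header, payload = source.request([(0, 4), (4, 8)])
    raw = payload.result()       # uint8 bytes, still uninterpreted
    # Interpretation is not the Source's job; e.g. read a big-endian int32:
    print(raw.view(">i4")[0])    # 42
```

Submitting every range before waiting on any of them lets the Source overlap its I/O, which is the point of delivering futures rather than bytes.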
