Physical layer interface
In this wiki, I'll document the interface that Uproot expects from Source classes, such as HTTPSource and XRootDSource. This document only describes reading because only reading has been implemented (as of Nov 2020).
When a user opens a file with uproot4.open, the URL scheme determines which Source to use:
- non-URL or "file://" defaults to MemmapSource, but MultithreadedFileSource is also a good option,
- "http://" and "https://" default to HTTPSource,
- "root://" defaults to XRootDSource,
- a non-string, non-Path object with read and seek methods is handled by ObjectSource.
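The dispatch rules above can be sketched as a small lookup. This is an illustration only: choose_source_name is a hypothetical helper, it returns class names as strings rather than real Source classes, and Uproot's actual dispatch code may differ in its details (for example, in how it treats Windows drive letters).

```python
from pathlib import Path
from urllib.parse import urlparse


def choose_source_name(file_path):
    """Return the name of the default Source for file_path (hypothetical helper)."""
    # A non-string, non-Path object with read and seek methods is file-like.
    if not isinstance(file_path, (str, Path)):
        if hasattr(file_path, "read") and hasattr(file_path, "seek"):
            return "ObjectSource"
        raise TypeError("expected a str, a Path, or an object with read and seek")

    # Strings and Paths are dispatched on their URL scheme, if they have one.
    scheme = urlparse(str(file_path)).scheme.lower()
    if scheme in ("", "file"):
        return "MemmapSource"
    if scheme in ("http", "https"):
        return "HTTPSource"
    if scheme == "root":
        return "XRootDSource"
    raise ValueError(f"unrecognized URL scheme: {scheme!r}")
```

The file-like check comes first so that objects that are neither strings nor Paths never reach the URL parser.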
uproot4.open returns a ReadOnlyDirectory, which has a file property pointing to a ReadOnlyFile, which in turn has a source property pointing to the actual Source. The Source may be stateful, with open file handles and associated threads. When any object derived from uproot4.open exits a with statement (through its __exit__) or is explicitly closed, the __exit__ calls are propagated all the way down to the Source, so that it can close or shut down whatever it needs to.
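The propagation of __exit__ can be illustrated with three toy classes. This is a sketch under assumptions, not Uproot's code: ToyDirectory, ToyFile, and ToySource are hypothetical stand-ins for ReadOnlyDirectory, ReadOnlyFile, and a Source, and they model only the shutdown chain.

```python
class ToySource:
    """Stand-in for a Source that owns file handles or threads."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exception_type, exception_value, traceback):
        # A real Source would close file handles and stop threads here.
        self.closed = True


class ToyFile:
    """Stand-in for ReadOnlyFile: forwards __exit__ to its source."""

    def __init__(self, source):
        self.source = source

    def __enter__(self):
        return self

    def __exit__(self, exception_type, exception_value, traceback):
        self.source.__exit__(exception_type, exception_value, traceback)


class ToyDirectory:
    """Stand-in for ReadOnlyDirectory: forwards __exit__ to its file."""

    def __init__(self, file):
        self.file = file

    def __enter__(self):
        return self

    def __exit__(self, exception_type, exception_value, traceback):
        self.file.__exit__(exception_type, exception_value, traceback)

    def close(self):
        # Explicit closing takes the same path as leaving a with statement.
        self.__exit__(None, None, None)


with ToyDirectory(ToyFile(ToySource())) as directory:
    source = directory.file.source
    was_closed_inside = source.closed  # still open inside the block

# After the block, the __exit__ call has reached the source.
```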
The job of a Source is to deliver Chunk objects on demand. A Chunk represents physical bytes of a file, uninterpreted and (if directly from a Source) possibly compressed. The data in a Chunk might not have been read yet, but they have been requested. A Chunk is defined by:
- a pointer back to the Source,
- start and stop, the inclusive:exclusive byte positions in the file (start is included, stop is not),
- a future, which adheres to a subset of the Python 3 Future protocol. (It only has to have a result method.)
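The three ingredients of a Chunk can be written down as a small data structure. This is a minimal sketch, assuming nothing about Uproot's real classes: SketchChunk and ResolvedFuture are hypothetical names, and plain bytes stand in for the file data.

```python
from dataclasses import dataclass
from typing import Any


class ResolvedFuture:
    """Hypothetical future whose data are already available."""

    def __init__(self, data):
        self._data = data

    def result(self):
        # The only method a Chunk needs from its future.
        return self._data


@dataclass
class SketchChunk:
    source: Any  # pointer back to the Source that produced this chunk
    start: int   # first byte position in the file (inclusive)
    stop: int    # byte position just past the last byte (exclusive)
    future: Any  # any object with a result() method

    @property
    def num_bytes(self):
        # With a half-open interval, the length is simply stop - start.
        return self.stop - self.start


# Four bytes at file positions 100, 101, 102, and 103.
chunk = SketchChunk(source=None, start=100, stop=104, future=ResolvedFuture(b"root"))
```

Because only result is required, a Source that already has the bytes in memory can wrap them in a trivial future like this one, while a Source that reads in a background thread can hand out a future that completes later.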
The rest of Uproot interfaces with Chunk objects through get and remainder to get the raw data from the file (through the future) as a numpy.ndarray of dtype numpy.uint8. The act of requesting data from a Chunk blocks until its future actually delivers.
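The blocking behavior can be sketched with a future from the standard library's concurrent.futures. Several things here are assumptions made for illustration: BlockingChunk and slow_read are hypothetical names, the get and remainder signatures are simplified and may not match Uproot's, and plain bytes stand in for the numpy.uint8 array.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class BlockingChunk:
    """Hypothetical chunk covering file positions [start, stop)."""

    def __init__(self, start, stop, future):
        self.start = start
        self.stop = stop
        self.future = future

    def get(self, start, stop):
        """Return the bytes at file positions [start, stop)."""
        if not (self.start <= start <= stop <= self.stop):
            raise ValueError("requested range is outside of this chunk")
        data = self.future.result()  # blocks until the read has finished
        # Convert file positions into offsets within this chunk's data.
        return data[start - self.start : stop - self.start]

    def remainder(self, start):
        """Return the bytes from file position start to the end of the chunk."""
        return self.get(start, self.stop)


def slow_read(file_bytes, start, stop):
    time.sleep(0.05)  # stand-in for disk or network latency
    return file_bytes[start:stop]


file_bytes = bytes(range(256))  # pretend file: byte i has value i
with ThreadPoolExecutor(max_workers=1) as executor:
    # The chunk exists as soon as the read is requested, before data arrive.
    chunk = BlockingChunk(10, 20, executor.submit(slow_read, file_bytes, 10, 20))
    first = chunk.get(12, 15)   # waits for slow_read to return
    rest = chunk.remainder(15)  # the future is done, so this returns at once
```

The chunk is created immediately, and the wait happens only at the first call that needs the bytes, which is what lets a Source request many chunks up front and let the reads overlap.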
The interpretation of those bytes is out of scope for the physical layer: the physical layer only needs to deliver bytes (in futures) on demand.