ValueError: too many colons in file path #1038

Closed
matthewfeickert opened this issue Nov 21, 2023 · 7 comments · Fixed by #1022

@matthewfeickert
Member

matthewfeickert commented Nov 21, 2023

Hi. This might be more of an XRootD issue than an uproot one, but from the "Opening a file" docs I'm not sure what the limits are, so I thought I'd ask.

I don't have a good public minimal reproducible example yet, so I'll share the example that @ivukotic found today and sent me. It uses one (of many) ROOT files on the UChicago Analysis Facility that we can read with XRootD from inside a Docker container (sslhep/analysis-dask-base:latest), which provides the base environment for the k8s pod that serves the user a Jupyter Lab environment. This environment has uproot v5.1.2 and the XRootD Python bindings installed, but when we try to open the file with the following test.py

# test.py
import importlib.metadata

import uproot
import XRootD

print(f"uproot version: {uproot.__version__}")
print(f"XRootD Python bindings version: {importlib.metadata.version('XRootD')}")

xrootd_uri = "root://xcache.af.uchicago.edu:1094//root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1"
file = uproot.open(xrootd_uri)
file

we error out with

ValueError: too many colons in file path: root://xcache.af.uchicago.edu...

The full example follows; running it requires the user to have ATLAS credentials:

$ docker run --rm -ti sslhep/analysis-dask-base:latest /bin/bash
Configured GCC from: /opt/lcg/gcc/11.2.0-8a51a/x86_64-centos7/bin/gcc
Configured AnalysisBase from: /usr/AnalysisBase/24.2.26/InstallArea/x86_64-centos7-gcc11-opt
Configured PyColumnarPrototype from: /usr/tools/PyColumnarPrototypeDemo/1.0.0/InstallArea/x86_64-centos7-gcc11-opt
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > kinit [email protected]
Password for [email protected]:
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: [email protected]

Valid starting     Expires            Service principal
11/21/23 06:48:13  11/22/23 07:47:43  krbtgt/[email protected]
	renew until 11/26/23 06:47:43
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > xrdcp -f root://xcache.af.uchicago.edu:1094//root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1  .
[252.2MB/252.2MB][100%][==================================================][3.655MB/s]
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > file DAOD_PHYSLITE.34858087._000001.pool.root.1
DAOD_PHYSLITE.34858087._000001.pool.root.1: ROOT file Version 62608 (Compression: 505)
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > cat test.py
import importlib.metadata

import uproot
import XRootD

print(f"uproot version: {uproot.__version__}")
print(f"XRootD Python bindings version: {importlib.metadata.version('XRootD')}")

xrootd_uri = "root://xcache.af.uchicago.edu:1094//root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1"
file = uproot.open(xrootd_uri)
file
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > python test.py
uproot version: 5.1.2
XRootD Python bindings version: 5.4.3
Traceback (most recent call last):
  File "/analysis/test.py", line 10, in <module>
    file = uproot.open(xrootd_uri)
  File "/venv/lib/python3.9/site-packages/uproot/reading.py", line 126, in open
    file_path, object_path = uproot._util.file_object_path_split(path)
  File "/venv/lib/python3.9/site-packages/uproot/_util.py", line 314, in file_object_path_split
    raise ValueError(f"too many colons in file path: {path} for url {parsed_url}")
ValueError: too many colons in file path: root://xcache.af.uchicago.edu:1094//root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1 for url ParseResult(scheme='root', netloc='xcache.af.uchicago.edu:1094', path='//root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1', params='', query='', fragment='')
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis >

Can you tell us whether there are revisions to the naming scheme that we'll need to adopt, or whether we did something wrong?

Please attach a small ROOT file that reproduces the issue! If small and public, you can drag-and-drop it into the issue—rename the extension to "txt" so that GitHub allows it. If large, you can put it on some large-file service (e.g. Dropbox). In general, we can't access XRootD URLs (most are not public).

Let me work on this. Maybe @ivukotic can temporarily move the file to a public area?

cc @alexander-held and @oshadura in case they came across similar issues with the IRIS-HEP Analysis Grand Challenge before.

@matthewfeickert matthewfeickert added the bug (unverified) The problem described would be a bug, but needs to be triaged label Nov 21, 2023
@lobis
Collaborator

lobis commented Nov 21, 2023

Hello @matthewfeickert ,

We have been trying to simplify how paths are handled by uproot with the goal of delegating as much responsibility as possible to fsspec. We tried to maintain support for all previous usages but it appears we didn't cover all cases. I'll add a test with this particular url to make sure it keeps working in the future. (sorry for the troubles!)

The path you posted will be processed correctly as of commit 0ace10af972ba825d98c28fa7df97cd3ef0b480f. However, the corresponding PR is not yet available in the main branch because it introduces some breaking changes. We will soon produce a v5.2.0rc2 pre-release that includes this PR. In the meantime you can use the main-fsspec branch, which should correctly process this URL.

The new naming scheme is pretty simple:

  • Only files ending in exactly .root will support the path:object scheme (when you want to specify which object to read from the file url). Uproot is responsible for correctly splitting the object if present.
  • The file (without object) url is fed into fsspec which will attempt to resolve the path after applying protocol chaining. :: and :// are the special sequences that fsspec uses for protocol chaining. The resolution of this url is now a responsibility of fsspec or the particular package that implements the filesystem in question (such as fsspec-xrootd).
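A minimal sketch of the first rule above, assuming the object suffix is recognized only when the file part ends in exactly ".root". The helper name split_file_and_object is hypothetical and this is not uproot's actual implementation; it only illustrates why a name ending in ".root.1" is passed through whole.

```python
def split_file_and_object(path):
    """Split "file.root:object" into (file, object).

    Hypothetical helper, not part of uproot: it only mirrors the rule
    described above (the object suffix needs a file ending in ".root").
    """
    marker = ".root:"
    idx = path.rfind(marker)
    if idx == -1:
        # No ".root:" boundary, so the whole string is treated as the file URL.
        return path, None
    file_part = path[: idx + len(".root")]
    object_part = path[idx + len(marker):]
    return file_part, (object_part or None)


# A file ending in ".root" followed by an object path is split in two.
print(split_file_and_object("data/events.root:CollectionTree"))
# A file ending in ".root.1" has no ".root:" boundary and is left untouched.
print(split_file_and_object("root://host.example:1094//store/file.pool.root.1"))
```

Under this rule, a file whose name does not end in ".root" would need the object specified some other way than the path:object form.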

In this particular case it looks like there are two scheme (root://) sequences, so I'm not 100% sure it will work (I'm not familiar with what an XRootD URL can look like). If it doesn't work, the issue should be raised with https://github.com/CoffeaTeam/fsspec-xrootd. Please let us know either way, and I will raise the issue myself if it doesn't work.
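To illustrate how a generic URL parser sees this nested URL, here is a small sketch using only the standard library's urllib.parse. It is not the code path that uproot or fsspec takes, but it reproduces the ParseResult shown in the traceback: the cache server becomes the netloc and the whole origin URL, colons included, ends up in the path.

```python
from urllib.parse import urlparse

uri = (
    "root://xcache.af.uchicago.edu:1094//root://fax.mwt2.org:1094//pnfs/uchicago.edu/"
    "atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/"
    "DAOD_PHYSLITE.34858087._000001.pool.root.1"
)

# The outer URL is the cache server; everything after it lands in `path`.
outer = urlparse(uri)
print(outer.scheme, outer.netloc)
print(outer.path.count(":"))  # colons left over in the path component

# Stripping the leading slashes exposes the origin URL, which parses on its own.
inner = urlparse(outer.path.lstrip("/"))
print(inner.scheme, inner.netloc, inner.path)
```

Any logic that treats a colon in the path as the file/object separator will trip over the colons contributed by the embedded origin URL.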

If it doesn't work, you could also try the handler option, as in uproot.open(urlpath, handler=uproot.source.xrootd.XRootDSource), which should revert to the previous behaviour. (But this should work with the default handler; if it doesn't, it's a bug.)

@lobis lobis self-assigned this Nov 21, 2023
@matthewfeickert
Member Author

Thanks @lobis. Yeah, after installing from 0ace10a and installing fsspec-xrootd things indeed work. 👍

$ docker run --rm -ti sslhep/analysis-dask-base:latest /bin/bash
Configured GCC from: /opt/lcg/gcc/11.2.0-8a51a/x86_64-centos7/bin/gcc
Configured AnalysisBase from: /usr/AnalysisBase/24.2.26/InstallArea/x86_64-centos7-gcc11-opt
Configured PyColumnarPrototype from: /usr/tools/PyColumnarPrototypeDemo/1.0.0/InstallArea/x86_64-centos7-gcc11-opt
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > python -m pip --quiet uninstall --yes uproot
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > python -m pip --quiet install --upgrade git+https://github.com/scikit-hep/uproot5.git@0ace10af972ba825d98c28fa7df97cd3ef0b480f
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > python -m pip show uproot
Name: uproot
Version: 5.2.0rc1
Summary: ROOT I/O in pure Python and NumPy.
Home-page: 
Author: 
Author-email: Jim Pivarski <[email protected]>
License: 
Location: /venv/lib/python3.9/site-packages
Requires: awkward, fsspec, numpy, packaging, typing-extensions
Required-by: coffea
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > python -m pip install --upgrade fsspec-xrootd
Collecting fsspec-xrootd
  Downloading fsspec_xrootd-0.2.2-py3-none-any.whl.metadata (4.1 kB)
Requirement already satisfied: fsspec in /venv/lib/python3.9/site-packages (from fsspec-xrootd) (2023.10.0)
Downloading fsspec_xrootd-0.2.2-py3-none-any.whl (11 kB)
Installing collected packages: fsspec-xrootd
Successfully installed fsspec-xrootd-0.2.2
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > kinit [email protected]
Password for [email protected]: 
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: [email protected]

Valid starting     Expires            Service principal
11/21/23 09:02:28  11/22/23 10:02:19  krbtgt/[email protected]
	renew until 11/26/23 09:02:19
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > vi test.py
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > python -i test.py 
uproot version: 5.2.0rc1
XRootD Python bindings version: 5.4.3
>>> file
<ReadOnlyDirectory '/' at 0x7f3c1c2bba30>
>>> len(file["CollectionTree"].keys())
941
>>> 
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > 

So we'll be on the lookout for v5.2.0rc2!

matthewfeickert added a commit to usatlas/analysisbase-dask that referenced this issue Nov 21, 2023
* Install a precursor to uproot v5.2.0rc2 from GitHub and install
  fsspec-xrootd to provide support for xrootd access from uproot.
   - c.f. scikit-hep/uproot5#1038
   - Note: This should be changed to an install from PyPI as soon as a
     release candidate is available for stability/reproducibility.
* Rebuild lock file.
@alexander-held
Member

Out of curiosity, would uproot.open({xrootd_uri: None}) also work pre-patch and/or is that not (no longer?) recommended? That previously was the generic workaround (see e.g. #669).

@lobis
Collaborator

lobis commented Nov 21, 2023

> Out of curiosity, would uproot.open({xrootd_uri: None}) also work pre-patch and/or is that not (no longer?) recommended? That previously was the generic workaround (see e.g. #669).

Yes, this will also work, and it's the most robust way to specify the object inside the file. With the dict form there is no requirement that the file name end in .root.

@ivukotic

ivukotic commented Nov 21, 2023

So about possible paths...
In ATLAS very often files end with ".root.x" where x is an integer.
Paths that use xcache can look like:

  • root[s]://xcacheserver[:port]//root[s]://originserver:[port]/path/file
  • http[s]://xcacheserver[:port]//root[s]://originserver:[port]/path/file
  • root[s]://xcacheserver[:port]//http[s]://originserver:[port]/path/file
  • http[s]://xcacheserver[:port]//http[s]://originserver:[port]/path/file
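The four patterns above can be sketched as a single regular expression that separates the cache prefix from the origin URL. This is an illustrative helper only: the names XCACHE_URL and split_xcache_url are made up here, the example hosts are placeholders, and no XRootD or fsspec API is involved.

```python
import re

# <scheme>://<cache host[:port]>//<scheme>://<origin ...>, with schemes
# root, roots, http, or https on either side (assumed from the list above).
XCACHE_URL = re.compile(
    r"^(?P<cache>(?:roots?|https?)://[^/]+)//(?P<origin>(?:roots?|https?)://.+)$"
)


def split_xcache_url(url):
    """Return (cache_prefix, origin_url), or None if url is not cache-prefixed."""
    match = XCACHE_URL.match(url)
    if match is None:
        return None
    return match.group("cache"), match.group("origin")


print(split_xcache_url(
    "root://xcache.example.org:1094//root://origin.example.org:1094//path/file.root.1"
))
print(split_xcache_url("root://origin.example.org:1094//path/file.root.1"))
```

A plain origin URL without a cache prefix returns None, so a caller can tell the two forms apart before deciding how to handle the path.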

Best,
Ilija

@ivukotic

ivukotic commented Nov 21, 2023

Another issue is opening more than one file:

import uproot

xc='root://xcache.af.uchicago.edu:1094//'
fname_data = xc+"root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1"
fname_dat1 = xc+"root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/6c/67/DAOD_PHYSLITE.34858087._000002.pool.root.1"

tree_data = uproot.iterate(
    # all files go in one list; a second positional argument would be read as `expressions`
    [{fname_data: "CollectionTree"}, {fname_dat1: "CollectionTree"}]
)
next(tree_data)  # trigger error

ValueError: cannot produce Awkward Arrays for interpretation AsObjects(Unknown_xAOD_3a3a_MissingETAssociationMap_5f_v1) because

    xAOD::MissingETAssociationMap_v1

instead, try library="np" rather than library="ak" or globally set uproot.default_library

in file root://xcache.af.uchicago.edu:1094//root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1
in object /CollectionTree;1:METAssoc_AnalysisMET

The same happens even if I try to open a single file using the iterator...

@lobis lobis linked a pull request Nov 22, 2023 that will close this issue
@lobis lobis added bug The problem described is something that must be fixed and removed bug (unverified) The problem described would be a bug, but needs to be triaged labels Nov 22, 2023
@lobis
Collaborator

lobis commented Nov 22, 2023

I will leave this issue open in case anyone runs into the same error. It will be fixed by #1022 which will be available in the next release (5.2.0). This issue will be automatically closed when this is merged into main.
