Comparing changes

base repository: mlcommons/croissant
base: v1.0.11
head repository: mlcommons/croissant
compare: main
Showing 50 changed files with 4,707 additions and 243 deletions.
  1. +1 −0 .github/workflows/ci.yml
  2. +1 −1 README.md
  3. +880 −0 datasets/1.0/huggingface-lmms-eval-lite/metadata.json
  4. +3 −0 datasets/1.0/huggingface-lmms-eval-lite/output/gqa.jsonl
  5. +27 −25 datasets/1.0/huggingface-open-hermes/metadata.json
  6. +250 −0 datasets/1.0/huggingface-pollen-robotics-apple-storage/metadata.json
  7. +2 −0 datasets/1.0/huggingface-pollen-robotics-apple-storage/output/default.jsonl
  8. +293 −0 datasets/1.0/huggingface-rag-dataset/metadata.json
  9. +2 −0 datasets/1.0/huggingface-rag-dataset/output/question-answer.jsonl
  10. +13 −11 datasets/1.0/huggingface-squad/metadata.json
  11. +48 −46 datasets/1.0/huggingface-the-cauldron/metadata.json
  12. +6 −6 datasets/1.0/huggingface-web-of-science/metadata.json
  13. +208 −0 datasets/1.1/huggingface-baratilab-flow3d/metadata.json
  14. +2 −0 datasets/1.1/huggingface-baratilab-flow3d/output/0_0100_01.4_1.0E-4_1.0E-2.jsonl
  15. +254 −0 datasets/1.1/huggingface-pollen-robotics-apple-storage/metadata.json
  16. +2 −0 datasets/1.1/huggingface-pollen-robotics-apple-storage/output/default.jsonl
  17. +229 −0 datasets/1.1/huggingface-recipe_RL_data_roberta-base/metadata.json
  18. +2 −0 datasets/1.1/huggingface-recipe_RL_data_roberta-base/output/default.jsonl
  19. +1,871 −0 docs/croissant-spec-draft.md
  20. +1 −1 editor/Dockerfile
  21. +6 −0 editor/deploy_to_hf.sh
  22. +2 −2 editor/requirements.txt
  23. +3 −31 python/mlcroissant/mlcroissant/_src/beam.py
  24. +11 −0 python/mlcroissant/mlcroissant/_src/core/constants.py
  25. +11 −0 python/mlcroissant/mlcroissant/_src/core/context.py
  26. +8 −0 python/mlcroissant/mlcroissant/_src/core/data_types.py
  27. +1 −3 python/mlcroissant/mlcroissant/_src/core/dataclasses.py
  28. +2 −2 python/mlcroissant/mlcroissant/_src/core/json_ld_test.py
  29. +2 −0 python/mlcroissant/mlcroissant/_src/core/rdf.py
  30. +90 −0 python/mlcroissant/mlcroissant/_src/core/regex.py
  31. +60 −0 python/mlcroissant/mlcroissant/_src/core/regex_test.py
  32. +30 −45 python/mlcroissant/mlcroissant/_src/datasets.py
  33. +39 −10 python/mlcroissant/mlcroissant/_src/datasets_test.py
  34. +7 −12 python/mlcroissant/mlcroissant/_src/operation_graph/execute.py
  35. +24 −27 python/mlcroissant/mlcroissant/_src/operation_graph/operations/field.py
  36. +1 −2 python/mlcroissant/mlcroissant/_src/operation_graph/operations/filter.py
  37. +5 −1 python/mlcroissant/mlcroissant/_src/operation_graph/operations/read.py
  38. +10 −8 python/mlcroissant/mlcroissant/_src/structure_graph/base_node.py
  39. +52 −1 python/mlcroissant/mlcroissant/_src/structure_graph/nodes/field.py
  40. +8 −0 python/mlcroissant/mlcroissant/_src/structure_graph/nodes/field_test.py
  41. +5 −1 python/mlcroissant/mlcroissant/_src/structure_graph/nodes/file_set.py
  42. +12 −4 python/mlcroissant/mlcroissant/_src/structure_graph/nodes/metadata.py
  43. +23 −0 python/mlcroissant/mlcroissant/_src/structure_graph/nodes/metadata_test.py
  44. +1 −2 python/mlcroissant/mlcroissant/_src/structure_graph/nodes/record_set.py
  45. +96 −0 python/mlcroissant/mlcroissant/_src/tests/graphs/1.1/mlfield_bad_array_definition/metadata.json
  46. +2 −0 python/mlcroissant/mlcroissant/_src/tests/graphs/1.1/mlfield_bad_array_definition/output.txt
  47. +97 −0 python/mlcroissant/mlcroissant/_src/tests/graphs/1.1/mlfield_bad_array_shape/metadata.json
  48. +2 −0 python/mlcroissant/mlcroissant/_src/tests/graphs/1.1/mlfield_bad_array_shape/output.txt
  49. +1 −1 python/mlcroissant/pyproject.toml
  50. +1 −1 python/mlcroissant/recipes/tfds_croissant_builder.ipynb
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -171,6 +171,7 @@ jobs:
mypy \
networkx \
pandas \
+pillow \
pyarrow \
pytest \
pytype \
2 changes: 1 addition & 1 deletion README.md
@@ -146,7 +146,7 @@ Here is an extremely simple example of the Croissant format, with comments showi
- [Join](https://mlcommons.org/community/subscribe/) the mailing list
- Attend Croissant meetings (please joint the list to automatically receive the invite)
- [File issues for](https://github.com/mlcommons/croissant) bugs for feature requests
-- [Contribute code](https://github.com/mlcommons/croissant) (please sign the MLCommons Association CLA first!)
+- [Contribute to the code](https://github.com/mlcommons/croissant). To merge PRs, you will need to sign the MLCommons Association CLA at: https://mlcommons.org/community/subscribe/

## Integrations
