Pull requests: NVIDIA/NeMo-Curator

Pull requests list

Remove minhash conditional for 25.02
#558 opened Feb 18, 2025 by praateekmahajan
3 tasks
Create FastText classifier module
#546 opened Feb 13, 2025 by sarahyurick Draft
Hard negative mining for Retriever fine-tuning
#523 opened Feb 5, 2025 by vinay-raman
3 tasks done
Added LookUp error handling during encoding detection.
#502 opened Jan 30, 2025 by ggcr
Clean up Pandas, cuDF, Dask, and Dask-cuDF DocumentDataset type logic (label: gpuci)
#494 opened Jan 23, 2025 by sarahyurick
Standardize text_field and id_field terminology (label: gpuci)
#485 opened Jan 17, 2025 by sarahyurick
Add nemo-toolkit dependency to gpuCI (label: gpuci)
#480 opened Jan 10, 2025 by sarahyurick
Support dask_expr migration into dask.dataframe
#477 opened Jan 9, 2025 by rjzamora
3 tasks
[pre-commit.ci] pre-commit suggestions
#470 opened Jan 7, 2025 by pre-commit-ci bot
[WIP] Add RAPIDS Nightly to GPU CI (label: gpuci)
#436 opened Dec 17, 2024 by praateekmahajan Draft
3 tasks
Updating the Quick Example
#432 opened Dec 16, 2024 by stsfaroz
Add TrafilaturaExtractor class
#431 opened Dec 13, 2024 by sarahyurick
Bump nltk from 3.8.1 to 3.9 in /tutorials/dapt-curation/code (label: dependencies)
#429 opened Dec 13, 2024 by dependabot bot
Fix GPU error messages for fuzzy deduplication (label: gpuci)
#387 opened Nov 22, 2024 by sarahyurick
2 tasks done
Remove max_text_bytes_per_part (label: gpuci)
#385 opened Nov 20, 2024 by sarahyurick Draft
Create Cache class for exact, fuzzy, and semantic deduplication (label: gpuci)
#384 opened Nov 19, 2024 by sarahyurick
4 tasks done
ci: Add copyright-check workflow
#369 opened Nov 14, 2024 by ko3n1g
3 tasks
Added example notebook for translation with ct2 model. (label: documentation)
#262 opened Sep 25, 2024 by uahmed93 Draft
3 tasks
Fixed bug: changed to correct model name
#186 opened Aug 6, 2024 by ByteWrite
1 of 3 tasks