Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement TransformStream (Web Stream) for csv-parse #444

Closed
uasan opened this issue Nov 18, 2024 · 13 comments
Closed

Implement TransformStream (Web Stream) for csv-parse #444

uasan opened this issue Nov 18, 2024 · 13 comments

Comments

@uasan
Copy link
Contributor

uasan commented Nov 18, 2024

Summary

Hi, our data source is ReadableStream, due to performance we would not like to use the twoWeb or fromWeb adapters, I see in the parser there is a file
https://github.com/adaltas/node-csv/blob/master/packages/csv-parse/lib/stream.js
but it is not exported and the implementation itself is not complete, there is no error handling and no promise waiting for the chunk parsing to complete.

I can make a PR in which I will add everything necessary for its correct operation to the stream.js file and make an export, would you be against this PR?

Motivation

If the data source is a web stream, such as fetch, then a productive pipe interface is needed to parse the CSV.

Alternative

I can implement TransformStream in my application if I export from your library the transform function from the file https://github.com/adaltas/node-csv/blob/master/packages/csv-parse/lib/api/index.js

@wdavidw
Copy link
Member

wdavidw commented Nov 18, 2024

You are right, the Web Stream implementation is an undocumented API. I did it some time ago but in my memory it was functional. Errors are expected to be propagated but maybe you have something in mind that must be handle inside the transformer. I did some performance comparison and it was slower than with the Node.js stream API. I would review PR, just that the test coverage need to be complete.

@uasan
Copy link
Contributor Author

uasan commented Nov 18, 2024

PR: #445

Yes, you are right, parser errors are always sync and web streams will handle them, I made all transform and flush synchronous, they will not create extra promise objects, I also moved callbacks to closures so as not to create new functions for each chunk.

@wdavidw
Copy link
Member

wdavidw commented Nov 18, 2024

Would you mind adding some test to it. I must admit, i was kind of lazy at the time because I don't see much. It would helps because I am not too familiar with the webstream api and I remember it to be fairly complex to understand.

@uasan
Copy link
Contributor Author

uasan commented Nov 18, 2024

@wdavidw I wrote a simple test and fixed the stream closings.

@uasan
Copy link
Contributor Author

uasan commented Nov 20, 2024

I added errors handle to TransformStream

@uasan
Copy link
Contributor Author

uasan commented Nov 20, 2024

@wdavidw When is the new version planned to be published in NPM?
Thanks.

@wdavidw
Copy link
Member

wdavidw commented Nov 20, 2024

I am fighting with the latest lerna version and its version command which requires a package lock file which the CI doesn't really appreciate.

@uasan
Copy link
Contributor Author

uasan commented Nov 20, 2024

I read the specification again and did some experiments, for correct error handling it is enough to call the function controller.error(error), throwing an exception can be removed, it is unnecessary.

Waiting for the release of a new version )

@wdavidw
Copy link
Member

wdavidw commented Nov 20, 2024

Indeed, the "API Web Stream/stream web TransformStream" test behaves the same.

@uasan
Copy link
Contributor Author

uasan commented Nov 20, 2024

By the way, it will be useful for many if the documentation includes descriptions and a simple example of a web stream, I think many will be interested in such an example:

import { parse } from 'csv-parse/stream';

const response = await fetch('file.csv');

for await (const record of response.body.pipeThrough(parse())) {
   console.log(record);
}

@wdavidw
Copy link
Member

wdavidw commented Nov 21, 2024

It certainly makes sense. I was waiting for more knowledge/confidence in the implementation before pushing docs and sample. Also lack of time..

@wdavidw
Copy link
Member

wdavidw commented Nov 21, 2024

Version 5.6.0 of csv-parse is published.

@wdavidw wdavidw closed this as completed Nov 21, 2024
@uasan
Copy link
Contributor Author

uasan commented Nov 21, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants