Provide library #19
Comments
On Sat, May 29, 2021 at 12:46:07PM -0700, plata wrote:
> It should be possible to put the core functionality into a library which can be reused by other projects.
Sure, but I would have to decide on an API/ABI, preferably one that doesn't
tie me to libxml, so it does require some work. Is this something that
you actually need, or just a general idea?
Not OP, but that's something I needed to integrate rdrview into another tool I'm building. I changed the target platform to Node.js in the meantime (not only because of rdrview; some other things were missing or not production-ready yet), so now I just use Mozilla's reader instead.
Not really under my control, but I would love to see this being used by @TobiasFella in https://github.com/KDE/alligator. Background info: the Flym feed reader uses Readability4J to show the complete content for RSS feeds that only contain teaser text. I couldn't find a C/C++ library that provides the same or similar functionality.
> Not really under my control, but I would love to see this being used by @TobiasFella in https://github.com/KDE/alligator.
And is there an advantage if alligator can call rdrview as a library, as
opposed to just using the cli through system() or similar? I think you would
have to parse a lot of pages per second to see a difference in performance;
maybe you are thinking of doing prefetching of all the new pages?
I'm leaving this issue open either way. I'll try to make time for it if the
alligator devs (or anyone else) can confirm that they need it.
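
For anyone weighing the CLI route mentioned above, here is a minimal sketch of capturing a command's output from C with POSIX `popen()`, which is the "or similar" alternative to `system()` when the output is needed as a string. The function name `capture_stdout` is made up for illustration and is not part of rdrview.

```c
#define _POSIX_C_SOURCE 200809L /* for popen() and pclose() */
#include <stdio.h>
#include <stdlib.h>

/*
 * Run `cmd` through the shell and return everything it wrote to stdout
 * as a malloc'd, NUL-terminated string (the caller frees it).
 * Returns NULL if the command cannot be started, memory runs out, or
 * the command exits with a non-zero status.
 * Hypothetical helper, not part of rdrview.
 */
char *capture_stdout(const char *cmd)
{
    size_t cap = 4096, len = 0, n;
    char *buf, *tmp;
    FILE *pipe = popen(cmd, "r");

    if (!pipe)
        return NULL;
    buf = malloc(cap);
    if (!buf)
        goto fail;

    /* Always leave one byte free for the terminating NUL. */
    while ((n = fread(buf + len, 1, cap - len - 1, pipe)) > 0) {
        len += n;
        if (len + 1 == cap) {
            /* Buffer is full: double it before reading more. */
            tmp = realloc(buf, cap * 2);
            if (!tmp)
                goto fail;
            buf = tmp;
            cap *= 2;
        }
    }
    buf[len] = '\0';

    if (pclose(pipe) != 0) {
        free(buf);
        return NULL;
    }
    return buf;

fail:
    free(buf);
    pclose(pipe);
    return NULL;
}
```

With rdrview the command string would presumably look like `rdrview -H <url>`, assuming the `-H` option prints the extracted HTML to stdout as the man page describes. Since the string goes through the shell, a URL from an untrusted feed would need careful quoting, or a `fork()`/`exec()` variant that bypasses the shell.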
> Not OP, but that's something I needed to integrate rdrview into another tool I'm building. I changed the target platform to Node.js in the meantime (not only because of rdrview; some other things were missing or not production-ready yet), so now I just use Mozilla's reader instead.
Thanks for the feedback. I think it's typical for such projects to be
written in a higher-level language, like you ended up doing. Those languages
usually have a (more mature?) readability implementation available. That's
part of the reason I never bothered with this.
By the way, did you actually need rdrview as a library? Were you calling it
often enough to get performance issues from spawning processes, or was there
another reason?
I am using Haxe, which can target different languages or VMs. I wanted to use the new-ish Haxe VM HashLink for this project, but it's still very rough for sys things (handling processes, etc.; even creating a web server was pre-alpha). Having rdrview as a lib would have allowed me to write native bindings for the HashLink VM and would have eased the process, but I gave up because of all the other (current) shortcomings of HashLink for sys applications (as opposed to games, which are its primary target). I'm still using Haxe, but now targeting Node.js, which is a much better fit for this project.
I made a quick prototype where I call the executable: plata/alligator@30a5bf0. It takes some time when loading many feeds. However, I cannot tell whether this is because it's not a library, because of Internet access, or whether it's even related to rdrview at all.
This is also something I would like to have: ideally a simple function that takes an input HTML string and returns (or allocates) the output HTML string. The main reason is performance, as there is some overhead in spawning a new process each time I want to get the content of a website. Working with the strings directly also means that I can do the fetching however I want, without relying on the networking capabilities of this project (for example, when fetching hundreds of websites). C is also very portable, so adding this functionality would allow less popular languages to use this implementation.
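
To make the request above concrete, here is a rough sketch of what a string-in, string-out interface could look like. Every name in it (`rdr_status`, `rdr_extract`) is hypothetical, since rdrview exposes no such API, and the body is only a placeholder that copies its input. The point is the calling convention: the caller passes a buffer, gets back a status code, and owns the allocated result.

```c
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical; rdrview has no library API. */
enum rdr_status { RDR_OK = 0, RDR_EINVAL, RDR_ENOMEM };

/*
 * Take `len` bytes of HTML from `html`. On success, store a newly
 * allocated, NUL-terminated string in `*out`; the caller releases it
 * with free(). On failure, `*out` is set to NULL.
 */
enum rdr_status rdr_extract(const char *html, size_t len, char **out)
{
    char *copy;

    if (!out)
        return RDR_EINVAL;
    *out = NULL;
    if (!html)
        return RDR_EINVAL;

    /*
     * Placeholder body: a real implementation would parse the input
     * and run the readability heuristics here. This stub only copies
     * the input so that the ownership rules can be exercised.
     */
    copy = malloc(len + 1);
    if (!copy)
        return RDR_ENOMEM;
    memcpy(copy, html, len);
    copy[len] = '\0';

    *out = copy;
    return RDR_OK;
}
```

Returning a status and writing the result through an out-parameter keeps the signature free of parser types, so bindings for other languages would only need to handle plain C strings.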
@eafer Could you provide some hints on what would be required to build a C library in case somebody would like to give it a try? Looking at the code, it isn't really obvious to me. In the meantime, I've been looking around for other readability implementations. While there are several for Python, Java, JavaScript, etc., I've not been able to find anything for C. Also, calling an executable is not an option in my use case (I'm not allowed to do so, and it raises issues for packaging/delivery).
> @eafer Could you provide some hints on what would be required to build a C library in case somebody would like to give it a try? Looking at the code, it isn't really obvious to me.
I haven't looked at the code in a while, but I think the biggest change that
would be required is to run cleanups and bubble up errors on failure; right
now the program just exits whenever there is a problem. My biggest concern,
though, is that rdrview was always designed with the sandbox in mind, so
it's possible that there are places where I was not sufficiently careful in
the parsing. I'm also not very happy with making a library that depends on
something as big as libxml2, so I would like to let callers somehow provide
their own parser, but I don't know how realistic that is.
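
As an illustration of the first point (running cleanups and bubbling up errors instead of exiting), here is a generic sketch of the usual C pattern with a single cleanup label. It is not taken from rdrview's code; `load_document` is a made-up helper that reads a file into memory and reports failure through its return value.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read the file at `path` into a newly allocated,
 * NUL-terminated buffer stored in `*contents`. Returns 0 on success and
 * -1 on failure. It never calls exit(); whatever was acquired before
 * the failure is released on the way out.
 */
int load_document(const char *path, char **contents)
{
    int ret = -1;
    long size = 0;
    char *buf = NULL;
    FILE *fp;

    *contents = NULL;

    fp = fopen(path, "rb");
    if (!fp)
        goto out; /* a CLI-only program might simply exit(1) here */

    if (fseek(fp, 0, SEEK_END) != 0)
        goto out;
    size = ftell(fp);
    if (size < 0)
        goto out;
    rewind(fp);

    buf = malloc((size_t)size + 1);
    if (!buf)
        goto out;
    if (fread(buf, 1, (size_t)size, fp) != (size_t)size)
        goto out;
    buf[size] = '\0';

    *contents = buf;
    buf = NULL; /* ownership has moved to the caller */
    ret = 0;

out:
    free(buf); /* no-op on success, because buf was reset to NULL */
    if (fp)
        fclose(fp);
    return ret;
}
```

Each caller up the stack would then check the return value and do the same, which is the bulk of the work when a program that was written to exit on error is turned into a library.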
@eafer feel free to close this if there's no plan to provide a library.
I don't mind leaving it open in case other people show up asking for this.