-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
All the Maths #1
Comments
I'm unclear about the goal. Is it to provide BLAS and/or LAPACK/LINPACK bindings for people to use in their applications? Encourage the V8 people to work on numerical performance? Both? There's a strawman for native SIMD support and I believe Intel is committed to bringing that to V8. (Spidermonkey nightly already implements the strawman but I don't think it has made its way into a release yet. Or if it did, it's probably behind a flag.) There's also River Trail (strawman), another Intel effort that aims to bring parallel numerical computing to JS. Mozilla calls it PJS and I believe they have declared an intent to implement it but I don't know what the status is. This meta-bug is the tracking bug, if you're interested. |
@bnoordhuis I think all the enhancements that Mozilla and Intel have been working on will be great and I'm looking forward to them but R and scipy don't have them and are seeing adoption in one area that JS isn't at the moment. The only they have but we don't is access to these ancient libraries. If we were able to transpile them we might see some of that adoption, if they are slow the VM engineers will likely make them faster or work a little faster getting these other features in and improving emscripten or these libraries directly to use them. |
The obstacles to getting high performance numerical code working in JS are relatively clear:
The first three of these issues are really language design problems, and not things that we are going to solve at the node level. It is up to the ES committee to get these real performance impacting issues fixed, and the best we can do is put pressure on them to make it a priority over syntactic twiddling. Still even if the ES committee never sorts these problems out, I would like to point out that at least in regards to the first 3 points JS is certainly no worse than Python, which suffers from exactly the same draw backs. And yet, people use Python extensively for numerical computing, even though an equally strong case could be made that the semantics of the language do not make it suitable for the task. Accepting the first 3 items as a given, we are left with what I believe is the real problem here: The main blocking issue to getting numerical computing working in JavaScript is interoperability with C and Fortran code bases.Interoperability is really a broader problem that affects many applications (for example crypto could benefit a lot here as well), but it is felt in numerical computing especially acutely due to the size, sophistication and fundamental importance of legacy numerical code. In my dream of how things could work, I would love to see a more modular/npm-centric solution to this problem. Something where we could just |
I think ruling out making native bindings to existing libraries sells Node short of possibilities. Especially in terms of performance. TBH I don't care about making this browser compatible, because even with all the proposed optimizations (which will take a while to come out) it'll still be much slower than using native methods. V8 has dome some great work to make calling into C++ from JS really cheap. Here's an example that just sums an array of // run.js
var runMe = require('./build/Release/addon').runMe;
var smalloc = require('smalloc');
var stypes = smalloc.Types;
var ITER = 1e5;
var SIZE = 1024;
var b = smalloc.alloc(SIZE, {}, stypes.Uint32);
for (var i = 0; i < SIZE; i++)
b[i] = i;
function simpleSum(arr, len) {
var ret = 0;
for (var i = 0; i < len; i++)
ret += arr[i];
return ret;
}
var t = process.hrtime();
for (var i = 0; i < ITER; i++) {
//runMe(b);
//simpleSum(b, SIZE);
}
t = process.hrtime(t);
console.log(((t[0] * 1e9 + t[1]) / ITER).toFixed(1) + ' ns/op'); #include <v8.h>
#include <node.h>
using v8::FunctionCallbackInfo;
using v8::Handle;
using v8::Local;
using v8::Object;
using v8::Value;
void RunMe(const FunctionCallbackInfo<Value>& args) {
Local<Object> obj = args[0].As<Object>();
uint32_t* arr =
static_cast<uint32_t*>(obj->GetIndexedPropertiesExternalArrayData());
size_t len = obj->GetIndexedPropertiesExternalArrayDataLength();
uint32_t ret = 0;
for (size_t i = 0; i < len; i++)
ret += arr[i];
args.GetReturnValue().Set(ret);
}
void init(Handle<Object> exports) {
NODE_SET_METHOD(exports, "runMe", RunMe);
}
NODE_MODULE(addon, init) So, Now, the math libraries themselves should be able to be done without much build pain and, TBH, if computationally complex operations are being done on my server I'll care way more about taking a minute to build a library and save the CPU cycles. |
As a data scientist in learning (who use Python for that), with a huge love for JavaScript, I can name three reasons why I doubt I will ever get to use JavaScript for actual numerical computing.
|
For technical clarity, it's actually
True, and this could be a challenge. Technically someone could create their own parser and instead of operating in native JS operations they instead pass in a string. Ugly, yes, but it could get the job done. One reason why I implemented the |
I like the idea of extending the Buffer concept to implement a BigNum type, which could be further extended to implement things like Vector and Matrix types. If I recall correctly, it might even be possible to override operators at the C++ level to have the prettier syntax for working with those more complex types. |
Do you mean JS operators? |
Yeah, I thought I remembered seeing functions to define operator behaviour |
@Qard Not that I remember. @bnoordhuis You know of some such black magic? |
I don't think there's any such magic. About the only thing you can do in C++ that you can't do from JS is make objects callable as functions (like how RegExp used to be.) |
Indeed. You appear to be right. After some quick grepping of the source, I'm finding no such convenience. I must've been remembering some other VM. :( |
In ES6 it is possible to create a typed array with up to 2^53 - 1 elements. If this is not implemented in V8 (or perhaps, not implemented in the version of V8 that ships with Node) than it is worth filing a bug to make sure it gets fixed.
You should be able to access up to 2^53 - 1. Not quite 2^64 - 1, but certainly a lot more than 4 * 1024^3.
This is where things get tougher, and require tricky ESfuture-track work. The current thought on all things in this design space is found in https://github.com/nikomatsakis/typed-objects-explainer/; operator overloading is the least-developed of these proposals, unfortunately. But the good news, at least, is that everyone on TC39 agrees this must be done. It's just a matter of finding the right people and time to design an operator overloading extension to the JavaScript language, and making sure it integrates with all the related topics like value types and typed objects and user-defined literals. |
This is an accepted ESfuture-track proposal with good detail already, shipping in SpiderMonkey and with a patch open for V8 by Intel.
(See above about value types)
This gets tricky. We'd really like to avoid shared-memory multithreading and that whole concurrency model, with data races etc. But there is definitely room for improving the existing web worker and transferable infrastructure into something with more room for performance. I don't remember if this is exactly right, but I think at one point I was convinced you could allow a typed array (or similar) to be shared among multiple workers as long as only one of them had write access.
Emscripten seems perfect for this? |
I tried so hard to get emscripten running to port ancient FORTRAN code but failed to get it running on my Mac :( |
@mikeal I'd love to try hacking on this at oakland JS one week. Something to do proper vector math with overloaded operators would be swell. Perhaps a browserify transform can take care of the sugar for that. |
@thealphanerd that would be awesome :) |
Germane for the SIMD point: https://hacks.mozilla.org/2014/10/introducing-simd-js/ |
Also interested in this. I would like to explore several alternatives, including
Also been meaning to explore JuPyTer (aka IPython notebooks) and generally making JS more sci/math friendly. |
@smikes great ideas Sam but I'm curious why you think it would be easiest to do C bindings. I'm told emscripten actually can convert the Fortran code but I don't know of anyone that has written a v8 binding to a fortran library :) |
Well for starters LAPACK already ships a set of C headers : http://www.netlib.org/lapack/#_standard_c_language_apis_for_lapack And ARPACK has a C++ api, http://www.ime.unicamp.br/~chico/arpack++/ And when I was writing quantum chemistry codes (which was very long ago) we generally found it easy to call a Fortran function from C. So if the fortran library can be compiled to a native binary, then we can write C/C++ code that bridges JS -> C++ -> F90 |
I think you can forget about ARPACK, it is not important, e.g. R did not have it either until very recently, and even now only as a 3rd party package. ARPACK is also outdated and unmaintained. LAPACK is definitely needed, however, but has C versions as well (translated from Fortran). |
Does anyone know why R uses the Fortran LAPACK instead of the C version? I can't imagine it was easier for them to bind to. |
Tips:
In the library I develop, we use (part of) LAPACK, and on Windows we use the f2c translated (and postprocessed) sources, for ARPACK as well, actually. This is to allow people build with Visual Studio. |
I just mentioned the C LAPACK, because it is a way, if emscripten or your tool of choice does not support Fortran. |
I am working on building a docker image that will contain
My idea is that if we have a known good Fortran -> JS toolchain installed somewhere, then we can clone it and experiment with it. Docker is suitable for this because we don't care about native code when the final compilation target is (inherently cross-platform) javascript. |
Started a repository for the Docker image project - https://github.com/smikes/femscripten -- Dockerfile checked in and some sample code. Docker image upload in progress as of now (Sun 16 Nov 2014 17:38:38 UTC) to docker https://registry.hub.docker.com/u/smikes/femscripten/ |
The docker image for FORTRAN->JS is now working and I have tested with a subroutine from LAPACK -- a very simple one, to be sure, but in principle it should work. Come over to https://github.com/smikes/femscripten to check it out. Docker images are not yet available for download but you can build your own with the Dockerfile provided in the github repo. |
@smikes That's awesome! |
Thanks! I was not aware of that. That project even wound up with R-in-the-browser running on an iPad (see picture). That was a proof-of-concept; the input interface was via dialog boxes and there was (afaict) no support for graphical output. I understand R already has some JS-generating output tools (eg, canvas - http://rforge.net/canvas/ ) so a moderately useful r-in-browser should ---
Something a lot like an IPython notebook, but fully browser-based (or is it Jupyter now?). |
As I see there are even fundamental problems with it. The most important one is no The things you wrote are kind of easy after that. The graphics display is maybe more challenging, but R has SVG output, and graphics device can be simply implemented (kind of) as plugins, by implementing very few graphics primitives, so it should not be hard. |
Just to comment on the main thread as well, a lot numerical libraries are already available in JS. E.g. some links: Some of these are partially implemented, and non-native JS, but require C/C++ code I guess. I also don't know if they are robust, or even just usable. |
The only one of those I've checked out is It implements a foreign-function interface to a handful of functions:
But it's true, there's a lot of sci/math javascript out there that's not well publicized. (For example, there's a pure-JS in-browser 3d molecule viewer widget.) Maybe the node-forward/discussions wiki would be a good place to keep track of these? I can't commit to maintaining that list, though, so I'm not going to start it. ;-) |
Whoa! Just checked back in here and awesome progress @smikes !!! I think the next big step is to get the emscripten output into something a bit more manageable/modular. (Hopefully so it is possible to load up multiple libraries at the same time, installed via CommonJS). In an ideal world, these libraries would be installable via npm and expose a CommonJS interface. By default in a node environment they would expose a wrapper over the native lapack for your system, but if it is not present or linking fails; then they would have a pure JS fallback (which would also work in a browser). How large are the resulting code objects from femscripten out of curiosity? |
I agree that this would be awesome. We are still pretty far from that, though.
Huge, enormous -- 350k for unoptimized, unminified, down to 150k for
However, most of that data is emscripten overhead for setting up the virtual processor and memory and all that. The actual But even that's not an algorithm suitable for calling from plain javascript; it still presumes the existence of the emscripten virtual machine. What I'd like to do is get a higher-level interpretation of the fortran source -- a representation of the AST perhaps -- and try generating javascript from that. Even the LLVM bitcodes are too low level. (And I'd prefer not to have do the translation manually.) gfortran provides the option |
150K isn't that big, we're just more sensitive to this kind of thing in the JS community :) |
Great initiative everyone, interested to see where this goes. Just my .02 but a quick look shows that lapack builds produce static libs, so I'm not clear what the advantages of a fortran -> c -> emscripten transpile strategy would be. I'd have assumed userland modules wrapping lapacks native C api would be the way to integrate it. Because JS is constrained to IEEE 754 double precision floating point numbers with well-known implications, I'd also assume some number type abstraction (BigDecimal.js etc) would be needed, with functions replacing the typical arithmetic operators. |
There are several related but ultimately different goals which I'd like to support: Making FORTRAN libraries available to node programs on the serverThis is actually my area of greatest interest. Right now one thing that's painful with node is having the appropriate toolchain installed to build binary apps (python, gyp, appropriate compiler). Rather than adding to that for FORTRAN codes, I'd like to see if I can set up a docker image that provides a FORTRAN-to-native cross-compiler, and make that available as a node/docker package. It seems kind of crazy that it would be simpler to provide people with a whole virtual machine than to get them to install a FORTRAN compiler, but I guess that's life in the 21st century. ;-) Running formerly FORTRAN-only or otherwise native-only code in the browserThis is where emscripten is useful; building R or Octave or things like that and running them in the browser. Making javascript a friendly environment for new math/science code developmentI would like to see LAPACK algorithms translated into idiomatic javascript, ideally without a lot of manual intervention. I would like to make it easy to take advantage of OpenCL and SIMD efforts. The lack of operator overloading and intrinsic bignum, bigdecimal, and complex types is a problem here. Maybe a macro system such as |
Write FFI bindings. If you're using gfortran, or Intel's compiler on *nix, then it's fairly simple to call a Fortran library through the C ABI.
Doing Fortran-to-Emscripten translation with Dragonegg is risky from a maintenance perspective. While you may be able to get things to work with gcc 4.6 and llvm 3.3 (the next thing you'll need to do for non-trivial Fortran code is Emscripten-compile the libgfortran runtime library), it's unlikely that this will continue to work in the future for newer versions of gcc or llvm. See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074784.html for example, Dragonegg hasn't been a very high priority for LLVM developers since clang matured enough to stand on its own. [edit: Dragonegg is now officially unmaintained and dead http://reviews.llvm.org/rL236077] I personally don't see the point of trying to do numerically intensive scientific computing in the browser when things like http://cloud.sagemath.com, http://tmpnb.org, and www.juliabox.org exist. And just so everyone's clear, BLAS and LAPACK are primarily API specifications - the reference implementations from Netlib are rarely used for serious work. Anyone who is remotely performance sensitive uses an optimized implementation such as OpenBLAS or Intel's MKL, which use quite a bit of CPU-family-dependent SIMD assembly, cache size tuning, etc.
There are a few people who disagree with me and are working on this, you might want to have a look at https://github.com/amd/furious.js and possibly contact the author for his input here. As I see it there are language problems which you're going to have a hard time changing any time soon, and a lack of appropriate libraries which takes a lot of effort from people who really know what they're doing to overcome. I do a lot of scientific computing and numerically intensive work, and Atwood's law notwithstanding I don't meet a lot of people in the field who ever say "I wish I could write this in javascript." Javascript (or specifically asm.js) as a compilation target for the browser is interesting though, regardless of the maintenance difficulties in getting and keeping an LLVM backend for Fortran working robustly. There are a few people in the Julia community who've looked at using Emscripten as a backend target for Julia code, I'd have to go searching to check the latest status on that [edit: https://github.com/JuliaLang/julia/issues/9430]. And others have looked at embedding Julia in Node for server-side use to make it easier to call one language from the other. I think we have some more work to get ourselves using the same version of libuv (instead of our fork) for this to work more smoothly. |
Moving from the general case to the specific, there are some things currently written in C++ and FORTRAN that I would like to try running in the browser. It may be that a browser-only implementation is no better than round-tripping to the server, but that's a question that I can only answer by messing around. Specifically the codes that I want to run include
|
C and C++ you should be fine with clang to emscripten. If the Fortran code you're interested in is all Fortran 77 and self-contained, you can try f2c instead of gfortran and dragonegg, might be simpler. That won't get you to "on par with R and scipy," but it might work for your specific use case. |
on the topic of operator overloading in js http://www.2ality.com/2011/12/fake-operator-overloading.html |
IMO you don't need operator overloading in JavaScript. It should be in a different language. Consider a subset of MatLab which could compile to JavaScript and have first-class matrix operations, slices or whatever. Operator overloading is rarely useful for regular use cases, and bringing more complexity to a single language because of some edge cases doesn't make any sense. |
I think the language needs infix notation, not operator overloading.
|
@mikeal said:
R does generate SSE/SIMD instructions depending on the implementation of specific libraries or routines. But, primarily, R achieves parallel vectorization by means of CPU multiprocessing, and does so inherently any time R's native array processing syntax is used. Python also has a threading model that nicely supports non-I/O bound numerical calculations. JS will need a decent threading model to support parallel vectorization at high- and low-levels. Maybe some WebWorker + messaging-for-shared-state high-level abstractions could be provided for acheiving the heavy lifting of easy parallelism that is necessary for numerical computation. |
I am not sure what exactly you mean here, but to me all of R's array processing seems very sequential. See e.g. double binary operators here: https://github.com/wch/r-source/blob/170a2a4a030e84a2437aee1a8d15c310a3b22527/src/main/arithmetic.c#L904 and the macros here: https://github.com/wch/r-source/blob/fbf5cdf29d923395b537a9893f46af1aa75e38f3/src/include/R_ext/Itermacros.h#L32 Am I missing something? How is this inherently parallel? |
Overloading is very useful for numerical work. It's common to want to define a custom type that behaves like a number syntactically but with special behavior - arbitrary-precision arithmetic, polynomials, dual numbers for automatic differentiation, probability distributions, placeholder variables for modeling optimization problems, etc. But it's not very pleasant to do when overloading is tacked-on to a language that is otherwise not designed for it. Using JavaScript as a compilation target for a better-suited language seems like a better approach when it comes to numerical computing in the browser. It would likely save some work to pick an LLVM-based language to start from - Julia, Rust, or maybe even Swift would be reasonable choices. I've heard GHCJS works very well which would be another option, somewhat lacking in the array libraries or ease-of-use categories though.
Huh? Python's threading model for numerical calculations is "do threading (and all of the actual work) in C."
Any time R calls out to BLAS, you get multithreading (and SIMD) by using OpenBLAS or MKL. You can run into issues with excessive fork-join, but I understand packages like plyr are designed to decompose problems at a higher level to mitigate this. |
Right. But R does not use BLAS for array operations. AFAIK, these are the only operations it uses BLAS/LAPACK for: https://github.com/wch/r-source/blob/d75f39d532819ccc8251f93b8ab10d5b83aac89a/src/main/names.c#L926 (the
Yes, some packages work around sequential R, but you need to do extra work to use them. It's not that you just load the package and then array operations are parallel. IMO there is nothing inherently parallel in R or its packages. (One could actually override the default array operations to make them SIMD parallel, but I don't know of any package that does this. It is too much work, I guess.) |
Ah, good to know, you know much more about the internals of R than I do. I'm a little surprised to not see gemv or gemm (matrix-vector or matrix-matrix multiplication, for those not familiar with the Fortran acronyms http://www.netlib.org/blas/blasqr.pdf) there.
Some packages use Rcpp to write extensions in C++ where you can do whatever you want in your own code, right? But maybe it's considered bad form to have your R package substantially change default behaviors. |
You are actually right, the above ones are only the ones called directly from R code. I missed a couple more, they are here: ~/works/r-source/src/main (trunk)$ grep F77_CALL *
array.c: F77_CALL(dgemm)(transa, transb, &nrx, &ncy, &ncx, &one,
array.c: F77_CALL(zgemm)(transa, transb, &nrx, &ncy, &ncx, &one,
array.c: F77_CALL(dsyrk)(uplo, trans, &nc, &nr, &one, x, &nr, &zero, z, &nc);
array.c: F77_CALL(dgemm)(transa, transb, &ncx, &ncy, &nrx, &one,
array.c: F77_CALL(zgemm)(transa, transb, &ncx, &ncy, &nrx, &one,
array.c: F77_CALL(dsyrk)(uplo, trans, &nr, &nc, &one, x, &nr, &zero, z, &nr);
array.c: F77_CALL(dgemm)(transa, transb, &nrx, &nry, &ncx, &one,
array.c: F77_CALL(zgemm)(transa, transb, &nrx, &nry, &ncx, &one,
array.c: F77_CALL(dtrsm)("L", upper ? "U" : "L", trans ? "T" : "N", "N",
main.c: F77_CALL(intpr)("dummy", &i, &i, &i); So there is indeed So the situation is slightly better than what I said above, but not much, and array operations are certainly sequential.
Right. With some limits. (Btw. you don't need |
Okay... I only recently began prototyping stuff in R, it is hard not The literature is a little misleading claiming things in R having Performance differences must be solely interpreted R on for-loop syntax vs. On Thursday, December 11, 2014, Gabor Csardi [email protected]
|
I believe so. Plus there is the potential, that they will be parallel some time in the future. |
Great to see people working to make JavaScript better for numerical computing! I do not see any compelling argument why R, a language which I am quite familiar with as an aspiring statistician, would be better suited for numerical work than JavaScript. However, the biggest disadvantage of JS at the moment is the lack of appropriate, well maintained libraries. For example, it would be great to have a well-implemented linear-algebra package. I do not know of any such package currently. There are a couple node modules binding to C or C++ libraries (such as the mentioned eigenjs or armadillo, which I have been working on), but they are all unfinished. And the pure JS implementations often lack a lot of functionality, e.g. matrix decompositions. Concerning operator-overloading, one could use sweet.js macros to implement it, at the cost of having to compile one's code. Besides for node.js, it would be nice to have access to such libraries in the brower, too. For example, I recently wanted to create a d3 visualization plotting some simulation of logistic regression results. The ability to easily carry out the model fitting inside the browser would have been of great help in this case. So porting some of the ancient math libraries to JS via emscripten sounds like a good idea to me. For example, consider the following artificial example:
I am very new to emscripten, so I have no idea how I could port such code as to make it callable from JS, in the sense that I want to end up with a constructor function than can create 3d vectors and a Has anyone figured out how to do this? |
From the very small amount of time I've spent playing around with One of my goals with |
The nice thing is that emscripten does provide facilities to expose functions directly to JS. I created a git repo with the gsl code and my shell script which exports a large number of functions for use in JavaScript. Here is the link: I have not investigated the performance hit incurred because of the overhead of having the emscripten runtime in the back, but the claim on their website is that it should be reasonably fast. |
Oh, awesome. I hadn't found that yet. I will have to try that with the FORTRAN frontend, it may save me a lot of work. Thanks for pointing it out. |
Quick update: I asked on the emscripten repo about how to deal with structs, and meanwhile figured out how to export the C example I posted above via Embind: emscripten-core/emscripten#3083. So I do not think that there are any fundamental barriers preventing us from exposing entire C libraries to JS via emscripten. |
I played around a bit and ended up transpiling all of the complex number functionality of the GSL library into a node package. The GitHub repo with some documentation is here: https://github.com/Planeshifter/gsl-complex-js. Installation via npm: |
From a few conversations with @mikolalysenko I learned that languages like R and libraries like scipy which are known for amazing math are just binding to the same old Fortran libraries (and in a few cases C). That said, I don't think a binding layer will work for JavaScript, especially if we want to use them in the browser, but I do think we could use emscripten to transpile them in to JS, hopefully asm.js.
I got as far as writing most of a script to pull down calgo, which is laid out incredibly horribly, extract them all and generate package.json files which we could eventually checkin to git.
My attempts at getting emscripten to work ended at getting dragonegg to find the right llvm :(
Anyway, my instinct is that the best strategy is to:
Thoughts?
I know that @groundwater is interested in this as I imagine @substack and @maxogden are as well. Additionally, I saw @rwaldron throw out some links on twitter which I think were related to crazy math in JS.
The text was updated successfully, but these errors were encountered: