The Pyodide project aims to compile the CPython interpreter and the scientific Python stack to WebAssembly, making it possible to use Python libraries in the browser. Previously, numpy, pandas and matplotlib were packaged for Pyodide. In this post we outline the latest developments funded by Nexedi, which include packaging scipy and scikit-learn, improvements to the testing workflow, and the ability to install packages from custom URLs.
Using scipy and scikit-learn from JavaScript
The scipy and scikit-learn libraries have been experimentally packaged for Pyodide, and can now be run in the browser.
For instance, the following example builds a logistic regression model on the Iris dataset using scikit-learn from JavaScript:
<html>
<head><meta charset="utf-8"/></head>
<body>
<script src="https://softinst109335.host.vifib.net/public/pyodide_updated/pyodide.js"></script>
<script>
// Wait for the Pyodide runtime to initialize, then load scikit-learn.
languagePluginLoader.then(() => {
  pyodide.loadPackage(['scikit-learn']).then(() => {
    // Run Python code passed as a template string.
    pyodide.runPython(`
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
X, y = load_iris(return_X_y=True)
lr = LogisticRegression().fit(X, y)
print(f"Train score: {lr.score(X, y)}")
y_train = lr.predict(X).tolist()
`);
    // Retrieve the Python list of predictions on the JavaScript side.
    var y_train = pyodide.pyimport('y_train');
    console.log(y_train);
  });
});
</script>
</body>
</html>
Here, we download pyodide.js, initialize the environment and load the scikit-learn package built for WebAssembly (which also pulls in its dependencies, numpy and scipy). Once this succeeds, we can run any Python code using these libraries, as well as retrieve results in (or pass data from) JavaScript.
In addition, Pyodide is a Python plugin for Iodide, an interactive notebook environment that aims to facilitate scientific computing in the browser. For instance, see the Pyodide notebook demo.
While a large part of the functionality in scipy and scikit-learn works, these builds are still somewhat experimental for two reasons:
- the excessively large size of the generated scipy package (~160 MB), compared to 7 MB for numpy or 20 MB for pandas
- only partial support for some scipy modules that require Fortran extensions or LAPACK
In the following section we will discuss how scipy is compiled for WebAssembly to better understand these difficulties.
Notes on building scipy for WebAssembly
The SciPy library is one of the cornerstones of the scientific Python stack and wraps numerous numerical routines written in C, C++ and Fortran. Because of both its size and the languages used, building it for Pyodide is somewhat challenging.
First, there is currently no reliably working Fortran compiler that supports WebAssembly. flang may support it in the future; however, among other issues, it currently disables support for 32-bit architectures (i386), and WebAssembly is a 32-bit target. As a result, we use the f2c tool to convert Fortran code to C so that it can be built with the Emscripten toolchain. The limitation of this approach is that it only works for Fortran 77, not later versions. As a first step, we therefore built scipy 0.17.1 (released in 2016), which is guaranteed not to contain f90/f95 code.
In addition, scipy has a hard dependency on the BLAS and LAPACK linear algebra libraries. In this build we use the f2c versions of the reference Fortran implementations of BLAS and LAPACK, which are lighter but also significantly slower than more modern implementations such as OpenBLAS. Since Emscripten originally didn't support dynamic linking between shared libraries, we linked BLAS and LAPACK statically into every scipy module that requires them, which resulted in a scipy package ~4x larger than it would be with dynamic linking. Possible approaches to improve this situation are outlined in the conclusion section below.
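To see why static linking inflates the package, the following minimal sketch models the total size when every extension module embeds its own copy of a library, versus all modules sharing a single copy. All numbers are hypothetical and chosen only to illustrate the effect; they are not measurements of the actual scipy build.

```python
def package_size_mb(own_code_mb, lib_mb, n_modules_using_lib, static):
    """Total size of a package whose extension modules need a common library.

    With static linking each module carries its own copy of the library;
    with dynamic linking a single copy is shared by all modules.
    """
    copies = n_modules_using_lib if static else 1
    return own_code_mb + lib_mb * copies


# Hypothetical numbers, for illustration only.
own_code_mb = 20          # code belonging to the package itself
lib_mb = 5                # one copy of BLAS/LAPACK
n_modules_using_lib = 12  # extension modules that link against it

static_mb = package_size_mb(own_code_mb, lib_mb, n_modules_using_lib, static=True)
dynamic_mb = package_size_mb(own_code_mb, lib_mb, n_modules_using_lib, static=False)
print(f"static: {static_mb} MB, dynamic: {dynamic_mb} MB, "
      f"ratio: {static_mb / dynamic_mb:.1f}x")
```

The overhead grows with the number of modules that need the library, which is why a package with many LAPACK-dependent extension modules is hit particularly hard.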
Evaluating and improving test coverage
Pyodide is still in the early beta phase. Most of the functionality works, but it is not uncommon to encounter errors in some edge cases. An important focus of this work was to improve the reliability of packaged libraries.
As a first step, this means ensuring that the CPython test suite passes. Currently approximately 75% of CPython test modules pass (this number counts test modules rather than individual tests, and is therefore very approximate). This number keeps increasing, and a large fraction of the failures are due to unsupported functionality (such as multiprocessing or the use of sockets). One source of failures was system calls that are not implemented in Emscripten. We have analyzed all system calls that fall into this category and disabled them in CPython, which contributes to reducing the risk of segmentation faults.
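The caveat about counting modules rather than individual tests can be illustrated with a short sketch. The results below are made up for the example (they are not actual CPython test results); a module counts as passing only if all of its tests pass.

```python
def pass_rates(results):
    """Return (module_pass_rate, test_pass_rate).

    `results` maps a test module name to a list of booleans,
    one per individual test (True means the test passed).
    """
    modules_passed = sum(all(tests) for tests in results.values())
    tests_passed = sum(sum(tests) for tests in results.values())
    n_tests = sum(len(tests) for tests in results.values())
    return modules_passed / len(results), tests_passed / n_tests


# Hypothetical results: one failing test in one of four modules.
results = {
    "test_a": [True] * 10,
    "test_b": [True] * 9 + [False],
    "test_c": [True] * 10,
    "test_d": [True] * 10,
}

module_rate, test_rate = pass_rates(results)
print(f"modules passing: {module_rate:.1%}, tests passing: {test_rate:.1%}")
```

A single failing test marks its whole module as failed, so the module-level figure can substantially understate how many individual tests actually pass.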
A second milestone was to build the pytest package for Pyodide. This makes it possible to run test discovery and execution for the vast majority of Python projects directly in the WebAssembly environment.
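As a rough sketch of what this enables, the Python snippet below runs the test suite shipped inside an installed package through pytest's programmatic entry point. The helper name run_package_tests is hypothetical, and the sketch assumes that both pytest and the package under test have already been loaded (in Pyodide, with pyodide.loadPackage) before the function is called.

```python
def run_package_tests(package, extra_args=()):
    """Discover and run the tests shipped inside an importable package.

    Hypothetical helper: pytest is imported lazily, so it only needs
    to be available at the time the function is called.
    """
    import pytest

    # --pyargs tells pytest to treat the argument as an importable
    # module or package name rather than a file system path.
    return pytest.main(["--pyargs", package, *extra_args])
```

For example, run_package_tests("numpy.core", ["-x"]) would stop at the first failure; the return value is pytest's exit code.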
Loading pyodide packages from custom URLs
Pyodide includes a very simple packaging mechanism, where
pyodide.loadPackage('package_name')
loads the corresponding package and its dependencies from the official Pyodide repository (or optionally from the base URL where pyodide.js is located). To make using custom packages easier, we have added the ability to load packages from custom URLs. This also allows using Pyodide in a decentralized fashion, without a central package repository.
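One way to picture this is a loader that accepts either a bare package name or a full URL. The sketch below is an illustration in Python under stated assumptions, not Pyodide's actual implementation: the function name resolve_package, the example base URL, and the convention that a package is identified by a file ending in .js are all assumptions made for the example.

```python
import re

# Hypothetical base URL standing in for the default package repository.
DEFAULT_BASE_URL = "https://example.com/pyodide/"


def resolve_package(spec, base_url=DEFAULT_BASE_URL):
    """Return (package_name, url) for a bare package name or a full URL."""
    if re.fullmatch(r"[A-Za-z0-9_\-]+", spec):
        # Bare name: look the package up under the default base URL.
        return spec, f"{base_url}{spec}.js"
    match = re.fullmatch(r".*?([^/]*)\.js", spec)
    if match:
        # Custom URL: the package name is taken from the file name.
        return match.group(1), spec
    raise ValueError(f"Invalid package name or URL: {spec!r}")


print(resolve_package("numpy"))
print(resolve_package("https://example.org/custom/mypackage.js"))
```

Because a URL carries its own location, packages hosted anywhere can be loaded without being registered in a central repository.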
Conclusions and future work
In this post we have outlined some of the recent developments in Pyodide by Roman Yurchak, funded by Nexedi. Of course, many other enhancements have been made to Pyodide during this time; see the project repository (github.com/iodide-project/pyodide) for more details. We would like to thank Michael Droettboom for his advice and help in making these improvements possible, as well as the Iodide team at Mozilla in general.
Pyodide is a fairly young project, and more work is necessary before it can be reliably used in production. Some of the next steps for the developments mentioned in this post would be:
- to link BLAS/LAPACK dynamically (pyodide#240), which would help reduce the size of the scipy package in WebAssembly by an order of magnitude
- to build the latest scipy version
- to use a high-performance BLAS (pyodide#227), which should significantly improve the performance of scientific computing with numpy/scipy
If you are interested in these topics, don't hesitate to follow the development and contribute on GitHub.