I need to implement a numerical algorithm. I'd like to write all the performance-critical parts in C and the rest in python. This should produce something that works, and is maintainable, debuggable, etc, etc, while remaning fast.
Connecting numpy to C is not even remotely new or interesting. People have been doing this from the start, and there's plenty of documentation. What there isn't is a simple skeleton program that provides a working python module written in C, that exports some dummy function to take in numpy arguments, do something with them, and return something. This should build, and just work. I'd want to use this example as a starting point for any real work. Unfortunately, I could not find this, so I had to spend hours figuring out how to do the basics here. I should not have had to spend this time, so I now share this example program I came up with for others to use. Please take this and hack it up. I'm releasing it into the public domain, and am giving up all copyright.
This all works with Python 2.7.13 (Python 3 probably needs a few tweaks; patches welcome), and numpy 1.12.1 on Debian/stretch. Clearly I'm not an expert here, so if something isn't quite right, please tell me.
The python module is called tst
and exports a single function tst.foo()
.
Everything should be mostly self-explanatory.
tst.c
:
#define NPY_NO_DEPRECATED_API NPY_API_VERSION #include <Python.h> #include <numpy/arrayobject.h> #include <signal.h> const char foo_docstring[] = "This function returns the (i,j) element of the matrix\n" "\n" "i,j are given in kwargs; each is assumed to be 0 if omitted.\n" "Matrix must have ndims == 2 and (i,j) must be in-bounds (exception\n" "thrown otherwise). The returned element is converted to a double-\n" "precision float, so the input matrix does not need to have dtype=float\n"; static PyObject* foo(PyObject* NPY_UNUSED(self), PyObject* args, PyObject* kwargs) { char* keywords[] = {"arr", // optional kwargs "i", "j", NULL}; PyObject* result = NULL; // Python is silly. There's some nuance about signal handling where it sets // a SIGINT (ctrl-c) handler to just set a flag, and the python layer then // reads this flag and does the thing. Here I'm running C code, so SIGINT // would set a flag, but not quit, so I can't interrupt the C code as it's // running, which is very surprising to the user. Thus I reset the SIGINT // handler to the default, and put it back to the python-specific version // when I'm done. This ignores a potential custom python sigint handler. // Left as an excercise for the reader struct sigaction sigaction_old; if( 0 != sigaction(SIGINT, &(struct sigaction){ .sa_handler = SIG_DFL }, &sigaction_old) ) { PyErr_SetString(PyExc_RuntimeError, "sigaction() failed"); goto done; } PyArrayObject* arr; unsigned int i = 0, j = 0; // defaults, if arguments not given if(!PyArg_ParseTupleAndKeywords( args, kwargs, "O&|II", keywords, PyArray_Converter, &arr, &i, &j)) return NULL; { int ndim = PyArray_NDIM(arr); npy_intp* dims = PyArray_DIMS(arr); int typenum = PyArray_TYPE(arr); if( ndim != 2 ) { PyErr_SetString(PyExc_RuntimeError, "I assume that ndims == 2"); goto done; } if( i >= dims[0] ) { PyErr_Format(PyExc_RuntimeError, "I assume that i < shape[0], but i == %d", i); goto done; } if( j >= dims[1] ) { PyErr_Format(PyExc_RuntimeError, "I assume that j < shape[1], but j == %d", j); goto done; } if( typenum != NPY_DOUBLE ) { PyArrayObject* arr2 = (PyArrayObject*)PyArray_FromArray(arr, PyArray_DescrFromType(NPY_DOUBLE), NPY_ARRAY_ALIGNED); Py_DECREF(arr); arr = arr2; } } // Useful metadata about this matrix __attribute__((unused)) char* data0 = PyArray_DATA (arr); __attribute__((unused)) char* data1 = PyArray_BYTES (arr); __attribute__((unused)) npy_intp *strides = PyArray_STRIDES (arr); __attribute__((unused)) int ndim = PyArray_NDIM (arr); __attribute__((unused)) npy_intp* dims = PyArray_DIMS (arr); __attribute__((unused)) npy_intp itemsize = PyArray_ITEMSIZE(arr); __attribute__((unused)) int typenum = PyArray_TYPE (arr); // Two ways to grab the data out of the matrix: // // 1. Call a function. Clear what this does, but if we need to access a lot // of data in a loop, this has overhead: this function is pre-compiled, so // the overhead will not be inlined double d0 = *(double*)PyArray_GetPtr( arr, (npy_intp[]){i, j} ); // 2. inline the function ourselves. It's pretty simple double d1 = *(double*)&data0[ i*strides[0] + j*strides[1] ]; // The two methods should be identical. If not, this example is wrong, and I // barf if( d0 != d1 ) { PyErr_Format(PyExc_RuntimeError, "PyArray_GetPtr() inlining didn't work: %f != %f", d0, d1); goto done; } result = Py_BuildValue( "d", d0 ); done: Py_DECREF(arr); if( 0 != sigaction(SIGINT, &sigaction_old, NULL )) PyErr_SetString(PyExc_RuntimeError, "sigaction-restore failed"); return result; } PyMODINIT_FUNC inittst(void) { static PyMethodDef methods[] = { {"foo", (PyCFunction)foo, METH_VARARGS | METH_KEYWORDS, foo_docstring}, {} }; PyImport_AddModule("tst"); Py_InitModule3("tst", methods, "Demo module to show binding numpy to C"); // Required to avoid mysterious segfaults import_array(); }
To build this, need this setup.py
#!/usr/bin/python from setuptools import setup from distutils.core import Extension setup(name = 'tst', version = '0.1', author = 'Dima Kogan', author_email = 'dima@secretsauce.net', ext_modules = [Extension('tst', sources = ['tst.c'])])
A build:
dima@shorty:/tmp$ python setup.py build running build running build_ext building 'tst' extension creating build/temp.linux-x86_64-2.7 x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-HVkOs2/python2.7-2.7.13=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c tst.c -o build/temp.linux-x86_64-2.7/tst.o x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-HVkOs2/python2.7-2.7.13=. -fstack-protector-strong -Wformat -Werror=format-security -Wl,-z,relro -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-HVkOs2/python2.7-2.7.13=. -fstack-protector-strong -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/tst.o -o build/lib.linux-x86_64-2.7/tst.so
This builds the binary module into /tmp/build/lib.linux-x86_64-2.7
. While
testing, we can then run it by telling python to find the module there (after
testing, setup.py
can install to a more standard location). A tst.py
:
#!/usr/bin/python2 import numpy as np import numpysane as nps import sys sys.path[:0] = ('/tmp/build/lib.linux-x86_64-2.7',) import tst x0 = np.arange(12).reshape(3,4) x1 = nps.mv(x0, 0,1) # same matrix, with the axis order reversed print tst.foo(x0, i=1, j=0) print tst.foo(x1, i=0, j=1)
We create a matrix, and then an identical one with a flipped axis ordering. We
then print element (1,0)
of the first, and element (0,1)
of the second,
expecting identical results:
dima@shorty:/tmp$ python tst.py 4.0 4.0
Great! This works if we're creating a python module that we then want to call
from a python program. What if our main application is written in C also? We
then have a C application running some computations that invoke a python
interpreter, and run some python code, which in turn invokes some C code. This
is simpler in some ways, since we don't need to build our module into a shared
object. Add this main()
to the end of our tst.c
above:
int main(int argc, char **argv) { Py_SetProgramName(argv[0]); Py_Initialize(); inittst(); PySys_SetArgvEx(argc, argv, 0); PyRun_SimpleString("import numpy as np\n"); PyRun_SimpleString("import numpysane as nps\n"); PyRun_SimpleString("import tst\n"); PyRun_SimpleString("x0 = np.arange(12).reshape(3,4)\n"); PyRun_SimpleString("x1 = nps.mv(x0, 0,1)\n"); PyRun_SimpleString("print tst.foo(x0, i=1, j=0)\n"); PyRun_SimpleString("print tst.foo(x1, i=0, j=1)\n"); Py_Exit(0); return 0; }
Then build the tst.c
into an application, linking in python as a library. Note
that this bypasses setup.py
:
dima@shorty:/tmp$ gcc -o tst -Wall -Wextra `pkg-config --cflags --libs python` tst.c
Now run the application we just built directly. We are not invoking the python interpreter ourselves:
dima@shorty:/tmp$ ./tst 4.0 4.0
Oh good. Now I can actually try to get work done.