Modify CPy #3: Implementing "atleast"
Introduction
In this post we’ll create a builtin function called atleast with the
following signature: atleast(n, iterable). This function will return
True if at least n items in an iterable are truthy and False
otherwise. It extends the idea of the existing builtin called any,
which returns True if at least 1 item is truthy and False otherwise.
Here are some examples:
atleast(1, [1, 0, 1]) # True, equivalent to: any([1, 0, 1])
atleast(2, [1, 0, 1]) # True
atleast(3, [1, 0, 1]) # False
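To pin down the intended behavior before touching any C, here is a minimal pure-Python sketch. The name atleast_py is hypothetical and used only for illustration; it models the semantics described above and is not the C implementation built later in this post.

```python
def atleast_py(n, iterable):
    """Return True if at least n items of iterable are truthy (illustrative model)."""
    count = 0
    for item in iterable:
        if item:
            count += 1
            if count >= n:
                # Stop early, just as any() stops at the first truthy item.
                return True
    return False


print(atleast_py(1, [1, 0, 1]))  # True
print(atleast_py(2, [1, 0, 1]))  # True
print(atleast_py(3, [1, 0, 1]))  # False
```

Because the function returns as soon as the count is reached, it never consumes more of the iterable than it needs.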
Where to begin?
Since we are trying to extend the any builtin, the best place to
start looking is at the builtin_any function defined in
Python/bltinmodule.c. Here’s what that function looks like:
static PyObject *
builtin_any(PyObject *module, PyObject *iterable)
{
    PyObject *it, *item;
    PyObject *(*iternext)(PyObject *);
    int cmp;

    it = PyObject_GetIter(iterable);
    if (it == NULL)
        return NULL;
    iternext = *Py_TYPE(it)->tp_iternext;

    for (;;) {
        item = iternext(it);
        if (item == NULL)
            break;
        cmp = PyObject_IsTrue(item);
        Py_DECREF(item);
        if (cmp < 0) {
            Py_DECREF(it);
            return NULL;
        }
        if (cmp > 0) {
            Py_DECREF(it);
            Py_RETURN_TRUE;
        }
    }
    Py_DECREF(it);
    if (PyErr_Occurred()) {
        if (PyErr_ExceptionMatches(PyExc_StopIteration))
            PyErr_Clear();
        else
            return NULL;
    }
    Py_RETURN_FALSE;
}
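As a reading aid, the control flow of the C function above can be rendered roughly in Python. This is only a sketch: any_like is a hypothetical name, and the comments map each step to the C call it stands in for. Reference counting has no Python counterpart, so it is left out.

```python
def any_like(iterable):
    """Rough Python rendering of builtin_any's control flow (illustrative only)."""
    it = iter(iterable)      # PyObject_GetIter; raises TypeError if not iterable
    while True:
        try:
            item = next(it)  # iternext(it)
        except StopIteration:
            break            # iterator exhausted; the C code leaves its loop here
        if bool(item):       # PyObject_IsTrue(item) > 0
            return True      # Py_RETURN_TRUE
    return False             # Py_RETURN_FALSE


print(any_like([0, "", 3]))  # True
print(any_like([0, ""]))     # False
```

Any exception raised by iter, next, or bool simply propagates, which corresponds to the C function returning NULL with an error set.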
The module parameter of builtin_any isn’t used in the function body,
so we can ignore it. The iterable parameter, however, is used to
create an iterator through a call to PyObject_GetIter. Once the
iterator is made, each item it yields is passed to the function
PyObject_IsTrue, which returns -1 for errors, 0 for falsy values, and
1 for truthy values. On every execution path after the iterator is
created, its reference count is decremented exactly once with a call
to Py_DECREF. If an error occurs, NULL is returned. If a truthy value
is found, the Py_RETURN_TRUE macro is expanded. Lastly, if no errors
occurred and no truthy values were found, the Py_RETURN_FALSE macro is
expanded. As a step towards a solution we can copy this function and
make the following changes:
- define a variable n of type Py_ssize_t to represent the minimum number of truthy values and hardcode it to the value 2.
- define another variable count of type Py_ssize_t to keep track of the number of truthy values.
- add an if statement to check if count is greater than or equal to n.
- rename the function to builtin_atleast2.

static PyObject *
builtin_atleast2(PyObject *module, PyObject *iterable)
{
    Py_ssize_t n = 2, count = 0;
    PyObject *it, *item;
    PyObject *(*iternext)(PyObject *);
    int cmp;

    it = PyObject_GetIter(iterable);
    if (it == NULL)
        return NULL;
    iternext = *Py_TYPE(it)->tp_iternext;

    for (;;) {
        item = iternext(it);
        if (item == NULL)
            break;
        cmp = PyObject_IsTrue(item);
        Py_DECREF(item);
        if (cmp < 0) {
            Py_DECREF(it);
            return NULL;
        }
        if (cmp > 0) {
            if (++count >= n) {
                Py_DECREF(it);
                Py_RETURN_TRUE;
            }
        }
    }
    Py_DECREF(it);
    if (PyErr_Occurred()) {
        if (PyErr_ExceptionMatches(PyExc_StopIteration))
            PyErr_Clear();
        else
            return NULL;
    }
    Py_RETURN_FALSE;
}
In order for this new builtin to be picked up we need to add an entry
for it in the static array named builtin_methods within the same
file. This array contains items of type PyMethodDef, which is a
struct containing a name, a function pointer (stored as a PyCFunction,
even when the function actually has the PyCFunctionWithKeywords
signature), a method flag (such as METH_NOARGS, METH_O, METH_VARARGS,
or METH_VARARGS | METH_KEYWORDS), and a documentation string. For now
we can copy the entry for builtin_any, but with the name set to
atleast2 and the function set to builtin_atleast2.
static PyMethodDef builtin_methods[] = {
    ...
    {"atleast2", (PyCFunction)builtin_atleast2, METH_O, builtin_any__doc__},
    ...
};
If we recompile with ../configure && make, then the following should
work in the Python REPL.
MODIFY_CPy>>> atleast2([1, 0, 1])
True
MODIFY_CPy>>> atleast2([0, 1, 0])
False
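Since the new entry reuses METH_O, atleast2 is called with exactly one argument, just like the any entry it was copied from. As a rough illustration, assuming a stock CPython interpreter, the existing any builtin shows what that calling convention implies. The helper raises_type_error is hypothetical and exists only for this demonstration.

```python
def raises_type_error(func, *args, **kwargs):
    """Return True if calling func with the given arguments raises TypeError."""
    try:
        func(*args, **kwargs)
    except TypeError:
        return True
    return False


# One positional argument is accepted.
print(raises_type_error(any, [1, 0, 1]))
# A second positional argument is rejected before the C function runs.
print(raises_type_error(any, [1], [0]))
```

This is why the hardcoded n has to live inside the function for now: a METH_O function has no way to receive a second argument.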
The next step
Now we can work on removing the hardcoded value and instead receive it as a parameter. To do so we will need to make the following changes:
- change the method flag from METH_O to METH_VARARGS. That way our function will be called with an args tuple instead of a single Python object.
- rename the parameter from “iterable” to “args”.
- declare a pointer to PyObject named “iterable”, which acts just like the “iterable” parameter before.
- parse the “args” tuple using PyArg_ParseTuple, passing it the args, a format string, and the addresses of the variables we want to store the values in. In the format string "nO:atleast", n converts an integer to a Py_ssize_t, O stores the object unchanged, and the text after the colon is the function name used in error messages.

PyDoc_STRVAR(atleast_doc,
"atleast(n, iterable) -> bool\n\
Returns True if at least n entries are truthy, False otherwise.\n");

static PyObject *
builtin_atleast(PyObject *module, PyObject *args)
{
    Py_ssize_t n, count = 0;
    PyObject *it, *item, *iterable;
    PyObject *(*iternext)(PyObject *);
    int cmp;

    if (!PyArg_ParseTuple(args, "nO:atleast", &n, &iterable))
        return NULL;

    it = PyObject_GetIter(iterable);
    if (it == NULL)
        return NULL;
    iternext = *Py_TYPE(it)->tp_iternext;

    for (;;) {
        item = iternext(it);
        if (item == NULL)
            break;
        cmp = PyObject_IsTrue(item);
        Py_DECREF(item);
        if (cmp < 0) {
            Py_DECREF(it);
            return NULL;
        }
        if (cmp > 0) {
            if (++count >= n) {
                Py_DECREF(it);
                Py_RETURN_TRUE;
            }
        }
    }
    Py_DECREF(it);
    if (PyErr_Occurred()) {
        if (PyErr_ExceptionMatches(PyExc_StopIteration))
            PyErr_Clear();
        else
            return NULL;
    }
    Py_RETURN_FALSE;
}

static PyMethodDef builtin_methods[] = {
    ...
    {"atleast", (PyCFunction)builtin_atleast, METH_VARARGS, atleast_doc},
    ...
};
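To make the argument parsing step more concrete, the following Python sketch approximates what the PyArg_ParseTuple call with the "nO:atleast" format does. The function parse_atleast_args is hypothetical, and its error messages are placeholders rather than the exact text CPython produces.

```python
import operator
import sys


def parse_atleast_args(args):
    """Approximate PyArg_ParseTuple(args, "nO:atleast", &n, &iterable)."""
    if len(args) != 2:
        raise TypeError(f"atleast expected 2 arguments, got {len(args)}")
    # "n": only integer-like objects are accepted; floats and strings fail.
    n = operator.index(args[0])
    # A Py_ssize_t cannot hold values outside this range.
    if not -sys.maxsize - 1 <= n <= sys.maxsize:
        raise OverflowError("n does not fit in Py_ssize_t")
    # "O": the second object is passed through unchanged.
    return n, args[1]


n, iterable = parse_atleast_args((2, [1, 0, 1, 1]))
print(n, iterable)  # 2 [1, 0, 1, 1]
```

The real call reports failure by returning 0 with an exception set, which is why the C function returns NULL immediately in that case.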
Does it work?
After recompiling, we can try out our new builtin function:
MODIFY_CPy>>> atleast(2, [1, 0, 1, 1])
True
MODIFY_CPy>>> atleast(3, [1, 0, 1, 1])
True
MODIFY_CPy>>> atleast(4, [1, 0, 1, 1])
False