Introduction

In this post we’ll find a way to prepend a string to another string using the minus operator. Once implemented, the following code should print the URL for this website instead of raising a TypeError.

MODIFY_CPy>>> scheme = 'https://'
MODIFY_CPy>>> domain = 'blog.hanfox.net'
MODIFY_CPy>>> domain - scheme
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
TypeError: unsupported operand type(s) for -: 'str' and 'str'
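
Before touching the interpreter, it may help to pin down the behaviour we’re after in plain Python. The sketch below uses a hypothetical PrependableStr subclass (the name is made up for illustration, it is not part of the post’s patch) and then shows that the built-in str type itself can’t be patched from Python code, which is why we end up editing CPython.

```python
class PrependableStr(str):
    """Hypothetical str subclass where `a - b` returns b + a."""

    def __sub__(self, other):
        if isinstance(other, str):
            return PrependableStr(str(other) + str(self))
        return NotImplemented


domain = PrependableStr('blog.hanfox.net')
url = domain - 'https://'
print(url)

# Built-in types are immutable from Python code, so this assignment fails.
try:
    str.__sub__ = PrependableStr.__sub__
except TypeError as exc:
    patch_error = exc
else:
    patch_error = None
print(type(patch_error).__name__)
```

The subclass only helps for objects we wrap explicitly. Plain string literals are still exact str instances, so making them support the minus operator means changing the interpreter.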

Finding the opcode associated with the plus operator

Let’s begin by taking a look at the opcode for the plus operator. Since this operator is already used for concatenating strings, the idea is to find out how that works and then borrow from it to make prepending strings work. For this task we’ll use the dis module to disassemble a lambda function that appends “.com” to its argument and returns the result. The output shows that the associated opcode is called BINARY_ADD.

MODIFY_CPy>>> from dis import dis
MODIFY_CPy>>> dis(lambda s: s + '.com')
  4           0 LOAD_FAST                0 (s)
              2 LOAD_CONST               1 ('.com')
              4 BINARY_ADD
              6 RETURN_VALUE
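
The same information can be pulled out programmatically, which makes it easy to compare the plus and minus operators side by side. This is a small sketch using dis.get_instructions; the helper name binary_opnames is made up for this example. On the 3.7 interpreter used in this post it should report BINARY_ADD and BINARY_SUBTRACT, while CPython 3.11 and later compile both operators to a single BINARY_OP instruction, so the names depend on the version you run it with.

```python
import dis


def binary_opnames(func):
    """Return the names of the BINARY_* instructions in func's bytecode."""
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("BINARY_")
    ]


add_ops = binary_opnames(lambda s: s + '.com')
sub_ops = binary_opnames(lambda s: s - '.com')
print(add_ops, sub_ops)
```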

Where is the opcode used?

A recursive grep of the C source and header files shows this opcode appears in only five places.

$ grep -rn --include='*.[ch]' 'BINARY_ADD' Python-3.7.0
Python-3.7.0/Python/ceval.c:1265:        TARGET(BINARY_ADD) {
Python-3.7.0/Python/compile.c:908:        case BINARY_ADD:
Python-3.7.0/Python/compile.c:3090:        return BINARY_ADD;
Python-3.7.0/Python/opcode_targets.h:25:    &&TARGET_BINARY_ADD,
Python-3.7.0/Include/opcode.h:25:#define BINARY_ADD               23

Here’s an oversimplified description of each file from the output above:

  • Include/opcode.h defines the opcode constants.
  • Python/opcode_targets.h declares a static 256-element array of label addresses, one per opcode, which the evaluation loop uses for computed-goto dispatch.
  • Python/ceval.c contains the evaluation loop and related functions.
  • Python/compile.c contains the code that converts an AST to bytecode.
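
The constants from Include/opcode.h are also visible from Python through the dis module, so we can check the number without opening the header. A quick sketch: on CPython 3.7 the lookup should give 23, matching the #define in the grep output, and on versions that no longer have a BINARY_ADD opcode it gives None.

```python
import dis

# dis.opmap maps opcode names to their numeric values.
add_code = dis.opmap.get("BINARY_ADD")

if add_code is None:
    print("this interpreter has no BINARY_ADD opcode")
else:
    # dis.opname is the reverse mapping, indexed by opcode number.
    print(add_code, dis.opname[add_code])
```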

The most meaningful file here is Python/ceval.c because it has the code that actually takes the operands and produces a result depending on their types. If you want to learn more about this file, there’s a post by Yaniv Aknin from 2010 that goes into much more detail. To summarize, this file has an important function, _PyEval_EvalFrameDefault, which loops over the bytecode and runs a different block of code for each opcode. The two important opcodes for us are BINARY_ADD and BINARY_SUBTRACT. We’ll take a look at the code for these next.
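
To make the idea of an evaluation loop concrete, here is a toy stack machine in Python. It is only a sketch of the dispatch pattern, not how ceval.c is written: the run function and the (opname, arg) tuple format are invented for this example, and the real loop works on compiled bytecode, frames, and reference-counted C objects.

```python
def run(instructions):
    """Evaluate a list of (opname, arg) pairs on a value stack."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "BINARY_SUBTRACT":
            right = stack.pop()
            left = stack.pop()
            stack.append(left - right)
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opname}")
    raise ValueError("program ended without RETURN_VALUE")


program = [
    ("LOAD_CONST", "blog.hanfox"),
    ("LOAD_CONST", ".net"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
print(run(program))
```

Note that the right operand is popped first and the left one second, which mirrors the POP() and TOP() calls in the C blocks.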

Making the changes

The BINARY_ADD and BINARY_SUBTRACT blocks below are fairly similar. Both declare three pointers to PyObject, make calls to Py_DECREF and SET_TOP, check for a NULL result value, and finally make a call to DISPATCH. The difference is that BINARY_ADD has additional logic: if PyUnicode_CheckExact returns true for both operands, it performs string concatenation with unicode_concatenate instead of calling PyNumber_Add. Note that Py_DECREF(left) only appears in the else branch, because unicode_concatenate consumes the reference to its first argument. What we can do is copy this logic into BINARY_SUBTRACT, swap the operands passed to unicode_concatenate (moving the matching Py_DECREF along with them), and then test the changes.

...
TARGET(BINARY_ADD) {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *sum;

    if (PyUnicode_CheckExact(left) &&
        PyUnicode_CheckExact(right)) {
        sum = unicode_concatenate(left, right, f, next_instr);
    }
    else {
        sum = PyNumber_Add(left, right);
        Py_DECREF(left);
    }
    Py_DECREF(right);
    SET_TOP(sum);
    if (sum == NULL)
        goto error;
    DISPATCH();
}

TARGET(BINARY_SUBTRACT) {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *diff = PyNumber_Subtract(left, right);
    Py_DECREF(right);
    Py_DECREF(left);
    SET_TOP(diff);
    if (diff == NULL)
        goto error;
    DISPATCH();
}
...

Here are the exact changes made to the BINARY_SUBTRACT block:

index df5c093..14c9bd0 100644
--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -1291,8 +1291,16 @@ _PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
         TARGET(BINARY_SUBTRACT) {
             PyObject *right = POP();
             PyObject *left = TOP();
-            PyObject *diff = PyNumber_Subtract(left, right);
-            Py_DECREF(right);
+            PyObject *diff;
+            if (PyUnicode_CheckExact(left) &&
+                PyUnicode_CheckExact(right)) {
+                diff = unicode_concatenate(right, left, f, next_instr);
+            }
+            else {
+                diff = PyNumber_Subtract(left, right);
+                Py_DECREF(right);
+            }
             Py_DECREF(left);
             SET_TOP(diff);
             if (diff == NULL)
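
The semantics of the patched block can be modelled in a few lines of Python. This is only an illustration of the branching logic, with a made-up function name patched_subtract; it says nothing about reference counting, which the C code has to get right. Here `type(x) is str` stands in for PyUnicode_CheckExact, which accepts exact str objects and rejects subclasses.

```python
import operator


def patched_subtract(left, right):
    """Python model of the patched BINARY_SUBTRACT block (illustrative only)."""
    if type(left) is str and type(right) is str:
        # Operands are swapped, so `right` ends up in front of `left`.
        return right + left
    # Everything else keeps the normal subtraction behaviour.
    return operator.sub(left, right)


print(patched_subtract('blog.hanfox.net', 'https://'))
print(patched_subtract(5, 3))
```

Because the check is for exact str objects, an instance of a str subclass would still go down the ordinary subtraction path and raise a TypeError unless the subclass defines its own __sub__.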

Does it work?

After recompiling, we can retry the code that raised a type error:

MODIFY_CPy>>> scheme = 'https://'
MODIFY_CPy>>> domain = 'blog.hanfox.net'
MODIFY_CPy>>> domain - scheme
'https://blog.hanfox.net'