Modify CPy #2: Minus operator for strings
Introduction
In this post we’ll find a way to prepend a string to another string using the minus operator. Once implemented, the following code should print the URL for this website instead of raising a TypeError.
MODIFY_CPy>>> scheme = 'https://'
MODIFY_CPy>>> domain = 'blog.hanfox.net'
MODIFY_CPy>>> domain - scheme
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
TypeError: unsupported operand type(s) for -: 'str' and 'str'
Finding the opcode associated with the plus operator
Let’s begin by looking at the opcode for the plus operator. Since
this operator already concatenates strings, the idea is to find out
how that works and then borrow from it to make prepending strings
work. For this task we’ll use the dis module to disassemble a lambda
function that appends “.com” to the argument it’s passed and returns
the result. The output from running this code shows that the
associated opcode is called BINARY_ADD.
MODIFY_CPy>>> from dis import dis
MODIFY_CPy>>> dis(lambda s: s + '.com')
4 0 LOAD_FAST 0 (s)
2 LOAD_CONST 1 ('.com')
4 BINARY_ADD
6 RETURN_VALUE
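The same lookup can be done programmatically, which makes it easy to compare the plus and minus operators side by side. The following is a minimal sketch; the opnames helper is a name invented here, and the opcode names it prints depend on the interpreter version (BINARY_ADD and BINARY_SUBTRACT on 3.7, a single BINARY_OP on 3.11 and later).

```python
from dis import get_instructions


def opnames(func):
    """Return the opcode names that make up func's bytecode, in order."""
    return [instr.opname for instr in get_instructions(func)]


# Disassemble one lambda per operator so the two can be compared.
add_ops = opnames(lambda s: s + '.com')
sub_ops = opnames(lambda s: s - '.com')

# Disassembling never runs the lambda, so the unsupported
# str - str expression does not raise here.
print(add_ops)
print(sub_ops)
```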
Where is the opcode used?
A recursive grep of the C source and header files shows that this opcode appears in only five places across four files.
$ grep -rn --include='*.[ch]' 'BINARY_ADD' Python-3.7.0
Python-3.7.0/Python/ceval.c:1265: TARGET(BINARY_ADD) {
Python-3.7.0/Python/compile.c:908: case BINARY_ADD:
Python-3.7.0/Python/compile.c:3090: return BINARY_ADD;
Python-3.7.0/Python/opcode_targets.h:25: &&TARGET_BINARY_ADD,
Python-3.7.0/Include/opcode.h:25:#define BINARY_ADD 23
Here’s an oversimplified description of each file from the output above:
- Include/opcode.h defines the opcode constants.
- Python/opcode_targets.h declares a static 256-element array of jump targets, one slot per possible opcode value.
- Python/ceval.c contains the evaluation loop and related functions.
- Python/compile.c contains the code that converts an AST to bytecode.
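The constants from Include/opcode.h are also visible from Python through the dis module, so the name-to-number mapping can be checked without opening the header. This is a small sketch; BINARY_ADD is looked up with .get() because the opcode only exists on older interpreters such as 3.7.

```python
import dis

# dis.opmap maps opcode names to numbers; dis.opname is the reverse table.
load_const = dis.opmap['LOAD_CONST']
print(load_const, dis.opname[load_const])

# BINARY_ADD is defined as 23 in the 3.7.0 header shown above. Newer
# interpreters dropped it, in which case .get() returns None.
binary_add = dis.opmap.get('BINARY_ADD')
print('BINARY_ADD =', binary_add)
```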
The most meaningful file here is Python/ceval.c because it has the
code that actually takes the operands and produces a result depending
on their types. If you want to learn more about this file there’s a
post by Yaniv Aknin from 2010 that goes into much more detail. To
summarize, this file has an important function,
_PyEval_EvalFrameDefault, which executes different code depending on
the opcode. The two important opcodes for us are BINARY_ADD and
BINARY_SUBTRACT. We’ll take a look at the code for these next.
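The idea of a loop that picks a code path per opcode can be sketched in a few lines of Python. This is a toy model and not CPython's implementation: the run function, the (opcode, argument) program format, and the LOAD_NAME handling are all invented for illustration. The comments point at the C macros that play the same role in ceval.c.

```python
def run(program, names):
    """Execute a list of (opcode, argument) pairs on a value stack."""
    stack = []
    for opcode, arg in program:
        if opcode == 'LOAD_NAME':
            stack.append(names[arg])
        elif opcode == 'LOAD_CONST':
            stack.append(arg)
        elif opcode == 'BINARY_ADD':
            right = stack.pop()        # like POP()
            left = stack[-1]           # like TOP()
            stack[-1] = left + right   # like SET_TOP(sum)
        elif opcode == 'BINARY_SUBTRACT':
            right = stack.pop()
            left = stack[-1]
            stack[-1] = left - right
        elif opcode == 'RETURN_VALUE':
            return stack.pop()
        else:
            raise ValueError('unknown opcode: %s' % opcode)
    raise ValueError('program ended without RETURN_VALUE')


# Hand-written equivalent of: lambda s: s + '.com'
program = [
    ('LOAD_NAME', 's'),
    ('LOAD_CONST', '.com'),
    ('BINARY_ADD', None),
    ('RETURN_VALUE', None),
]
result = run(program, {'s': 'example'})
print(result)
```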
Making the changes
The BINARY_ADD and BINARY_SUBTRACT blocks below are fairly similar.
Both declare three pointers to PyObject, make calls to Py_DECREF and
SET_TOP, check for a NULL result value, and finally make a call to
DISPATCH. The difference is that BINARY_ADD has additional logic to
perform string concatenation if two calls to PyUnicode_CheckExact, one
for each operand, both return a truthy value. What we can do is copy
this logic into BINARY_SUBTRACT, pass right before left to
unicode_concatenate, and then test the changes. Because
unicode_concatenate consumes the reference to its first argument, the
Py_DECREF call for that operand has to move into the else branch.
...
        TARGET(BINARY_ADD) {
            PyObject *right = POP();
            PyObject *left = TOP();
            PyObject *sum;
            if (PyUnicode_CheckExact(left) &&
                     PyUnicode_CheckExact(right)) {
                sum = unicode_concatenate(left, right, f, next_instr);
            }
            else {
                sum = PyNumber_Add(left, right);
                Py_DECREF(left);
            }
            Py_DECREF(right);
            SET_TOP(sum);
            if (sum == NULL)
                goto error;
            DISPATCH();
        }

        TARGET(BINARY_SUBTRACT) {
            PyObject *right = POP();
            PyObject *left = TOP();
            PyObject *diff = PyNumber_Subtract(left, right);
            Py_DECREF(right);
            Py_DECREF(left);
            SET_TOP(diff);
            if (diff == NULL)
                goto error;
            DISPATCH();
        }
...
Here are the exact changes made to the BINARY_SUBTRACT block:
index df5c093..14c9bd0 100644
--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -1291,8 +1291,16 @@ _PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
         TARGET(BINARY_SUBTRACT) {
             PyObject *right = POP();
             PyObject *left = TOP();
-            PyObject *diff = PyNumber_Subtract(left, right);
-            Py_DECREF(right);
+            PyObject *diff;
+            if (PyUnicode_CheckExact(left) &&
+                     PyUnicode_CheckExact(right)) {
+                diff = unicode_concatenate(right, left, f, next_instr);
+            }
+            else {
+                diff = PyNumber_Subtract(left, right);
+                Py_DECREF(right);
+            }
             Py_DECREF(left);
             SET_TOP(diff);
             if (diff == NULL)
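The behaviour this patch aims for can be modelled in plain Python, which helps when reasoning about edge cases before recompiling. This is only a sketch of the intended semantics: patched_subtract and the Tagged subclass are hypothetical names, and reference counting is not modelled at all.

```python
def patched_subtract(left, right):
    """Model of the patched BINARY_SUBTRACT block (illustrative only)."""
    # type(x) is str mirrors PyUnicode_CheckExact: subclasses do not match.
    if type(left) is str and type(right) is str:
        # The operands are swapped, so the right string ends up in front.
        return right + left
    # Everything else keeps the normal subtraction behaviour.
    return left - right


class Tagged(str):
    """A str subclass, used to show that the exact-type check excludes it."""


print(patched_subtract('blog.hanfox.net', 'https://'))
print(patched_subtract(5, 3))

try:
    patched_subtract(Tagged('blog.hanfox.net'), 'https://')
except TypeError as exc:
    print('TypeError:', exc)
```

Since the patch changes only the evaluation loop, code paths that call PyNumber_Subtract directly (operator.sub, for example) would presumably still raise a TypeError for two strings.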
Does it work?
After recompiling, we can retry the code that raised a type error:
MODIFY_CPy>>> scheme = 'https://'
MODIFY_CPy>>> domain = 'blog.hanfox.net'
MODIFY_CPy>>> domain - scheme
https://blog.hanfox.net
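The same check can be wrapped in a small script that runs on both a stock and a patched interpreter. This is a sketch with a hypothetical helper name, supports_string_minus; it uses variables rather than literals so that the subtraction is evaluated when the function runs.

```python
def supports_string_minus():
    """Return True if this interpreter prepends with str - str."""
    scheme = 'https://'
    domain = 'blog.hanfox.net'
    try:
        # A stock interpreter raises TypeError on this expression.
        return (domain - scheme) == scheme + domain
    except TypeError:
        return False


print('patched build' if supports_string_minus() else 'stock build')
```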