How to swap two values effectively? Typically we use the following algorithm:
temp <- Num1
Num1 <- Num2
Num2 <- temp
But there is a famous alternative, which does not require the extra variable
temp
:
Num1 <- Num1 + Num2
Num2 <- Num1 - Num2
Num1 <- Num1 - Num2
And its sibling:
# ^ for XOR
Num1 <- Num1 ^ Num2
Num2 <- Num1 ^ Num2
Num1 <- Num1 ^ Num2
Which one is faster?
Language matters. To really research into that question, we need a serious language, close enough to hardware. C is a good choice, but modern compilers always do some optimizatiion, which might convert whatever algorithm into the optimal one. So we use assembly.
On machine level, data are basically stored in two places - registers and memory. Disk is also accessible, but system calls are required, and therefore it's not discussed here.
We first look at how to swap values in two registers:
- the
temp
algorithm:$2.76$ s - the plus-minus algorithm:
$12.18$ s - the XOR algorithm:
$8.12$ s
For each algorithm, the swapping is performed for
So we see that the temp
algorithm is really faster, and XOR-ing is also
faster than adding or subtracting.
Note that there is actually another way to swap two values on an x86_64
system, that is the command xchg
, which takes temp
algorithm.
How about swapping a register value and a memory value?
For the temp
algorithm, everything would be fine if it just
stays the same; it takes
XOR-ing would take
I thought first of all loading the memory into a temporary register would
reduce the time taken, which is tested here, but it
wouldn't; it takes
Also, note that we are sure this method is going to be slower than the temp
one.
# This is what our new XOR algorithm does.
Reg2 <- Mem
SWAP_BY_XOR Reg1 Reg2
Mem <- Reg2
Since we have already known that SWAP_BY_XOR
for registers is slower than
SWAP_BY_TEMP
for registers anyway, the algorithm here is slower than the one
replacing SWAP_BY_XOR
with SWAP_BY_TEMP
, which is obviously slower than the
temp
we used for memory.
The xchg
command still works here but in two different ways. You can put the
memory address on the left or on the right hand side, taking
Each algorithm carries swapping for
We won't even be using the plus-minus algorithm from now on, because the are a lot slower than the XOR one, not to mention its overflow vulnerability.
Since x86
instructions generally do not accept two memory addresses, if we
want to swap two values in memory, we have to load at least one of the values
into memory.
The temp
algorithm now becomes a better version. Let's loop at
the pseudo code:
Reg1 <- Mem1
Reg2 <- Mem2
Mem1 <- Reg2
Mem2 <- Reg1
We don't need to benchmark XOR and xchg
. Because both of them would
essentially do this:
Reg <- Mem1
SWAP Reg Mem2
Mem1 <- Reg
and as we have shown above, the SWAP
ing step would always be slower than the
temp
algorithm, and even if we use the temp
swapping here, it would still
be slower than our enhanced version for memory addresses.
Seems like the temp
algorithm is optimal, isn't it? Actually, no.
Consider the scenario, where you have all of your registers occupied, yet you want to swap two values stored in registers, perhaps preparing for a function call.
The temp
algorithm would then look like this:
# Push Reg3 onto the stack
PUSH Reg3
Reg3 <- Reg1
Reg1 <- Reg2
Reg2 <- Reg3
# Pop Reg3 from the stack
POP Reg3
Now the xchg
instruction has its place; we can swap two registers with out
using the third one - XCHG Reg1 Reg2
- which is free from memory access, thus
obviously faster.
If you still have all the registers occupied, but you want to swap a register
and a memory location, XOR-ing would be a reasonable choice, since it would
stays the same, while temp
requires two extra memory access.
Let's do some benchmark now. temp
: temp
is still the best option.
Use temp
.
From a programming perspective, temp
is easier to be understood, and it is
also reasonably fast. If you are still not comfortable with this answer,
wanting to find the absolutely most efficient algorithm, you should trust you
compiler, they will do the work for you.
All swapping here swap two quadwords, which is
You can try to compile the source code in this repository on Linux or MacOS with GCC, GNU make, and GNU time installed. On MS Windows, I personally recommend the MSYS2 project.
This article is written by Yushun Cheng, published under Creative Commons Attribution 4.0 International License.
All the codes, including the Makefile, are in public domain.