x86lint's Introduction

x86lint

x86lint examines x86 machine code to find suboptimal encodings and sequences. For example, add eax, 1 can be encoded with either an 8-bit or a 32-bit immediate:

83C0 01
81C0 01000000

The former takes three bytes instead of six, which can result in smaller and faster code. x86lint can help compiler writers generate better code, and it documents the complexity of x86.

Implemented analyses

  • implicit EAX
    • 81C0 00010000 instead of 05 00010000 (ADD EAX, 0x100)
  • missing LOCK prefix on CMPXCHG and XADD
  • oversized immediates
    • 81C0 01000000 instead of 83C0 01 (ADD EAX, 1)
  • strength-reduce AND with immediate to MOVZBL
  • suboptimal CMP 0
    • 83FF 00 instead of 85FF (TEST EDI, EDI)
  • suboptimal no-ops
    • multiple 90 instead of a single 66 90, etc.
  • suboptimal zero register, see #7
    • MOV EAX, 0 instead of XOR EAX, EAX
  • unnecessary REX prefix
    • XOR RAX, RAX 4831C0 instead of XOR EAX, EAX 31C0
    • 40C9 instead of C9 (LEAVE)
  • unnecessary immediate
    • C1D0 01 instead of D1D0 (RCL EAX, 1)
  • unneeded LOCK prefix on XCHG
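
As a rough illustration of the oversized-immediate analysis listed above, the following sketch checks one narrow case by hand. It is not how x86lint is implemented (x86lint decodes instructions with XED). The function name is_oversized_imm32 is hypothetical, and the check assumes a 32-bit operand size, no prefixes, and a register-direct ModRM byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of x86lint's API.
 * Returns true if insn is an unprefixed `81 /r id` ALU instruction
 * (ADD, OR, ADC, SBB, AND, SUB, XOR, CMP) on a register whose 32-bit
 * immediate would also fit the sign-extended 8-bit form `83 /r ib`. */
bool is_oversized_imm32(const uint8_t *insn, size_t len)
{
    assert(insn != NULL);
    if (len < 6 || insn[0] != 0x81) {
        return false;
    }
    if ((insn[1] >> 6) != 3) {
        return false;  /* only mod == 11b (register operand) is handled */
    }
    /* Assemble the little-endian imm32 that follows the ModRM byte. */
    uint32_t imm = (uint32_t)insn[2] | ((uint32_t)insn[3] << 8) |
                   ((uint32_t)insn[4] << 16) | ((uint32_t)insn[5] << 24);
    /* Fits in a sign-extended byte: 0..127 or -128..-1. */
    return imm <= 0x7Fu || imm >= 0xFFFFFF80u;
}
```

With this sketch, 81C0 01000000 is flagged, while 81C0 00010000 (ADD EAX, 0x100) is not, because 0x100 does not fit in a signed byte.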

Compilation

First install the Intel X86 Encoder Decoder (XED):

git clone https://github.com/intelxed/xed.git xed
git clone https://github.com/intelxed/mbuild.git mbuild
cd xed
./mfile.py install --install-dir=kits/xed-install

Next build x86lint:

git clone https://github.com/gaul/x86lint.git x86lint
cd x86lint
XED_PATH=/path/to/xed make all

Usage

x86lint is intended to be part of compiler test suites, which should #include "x86lint.h" and link against libx86lint.a. It can also read arbitrary ELF executables:

./x86lint /bin/ls

License

Copyright (C) 2018 Andrew Gaul

Licensed under the Apache License, Version 2.0

x86lint's Issues

Optimal no-ops

It is better to pad with fewer, longer no-ops than with many short ones. Some examples, listed by length in bytes:

1:  90                             nop
2:  66 90                          data16 nop
3:  0F 1F 00                       nopl  %eax, (%rax)
4:  0F 1F 40 00                    nopl  %eax, (%rax)
5:  0F 1F 44 00 00                 nopl  %eax, (%rax,%rax,1)
6:  66 0F 1F 44 00 00              nopw  %ax,  (%rax,%rax,1)
7:  0F 1F 80 00 00 00 00           nopl  %eax, (%rax)
8:  0F 1F 84 00 00 00 00 00        nopl  %eax, (%rax,%rax,1)
9:  66 0F 1F 84 00 00 00 00 00     nopw  %ax,  (%rax,%rax,1)
10: 66 2E 0F 1F 84 00 00 00 00 00  nopw  %ax,  (%rax,%rax,1)

More recent processors support efficient 15-byte no-ops as well.
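
A minimal sketch of how a code generator might apply this, assuming the ten encodings from the table above and a simple longest-first strategy. The name emit_nops is hypothetical, and the sketch does not attempt the 15-byte forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NOP_LEN 10

/* Encodings copied from the table above; row i holds the (i + 1)-byte no-op. */
static const uint8_t nops[MAX_NOP_LEN][MAX_NOP_LEN] = {
    {0x90},
    {0x66, 0x90},
    {0x0F, 0x1F, 0x00},
    {0x0F, 0x1F, 0x40, 0x00},
    {0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
    {0x66, 0x2E, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

/* Hypothetical helper: fill buf[0..len) with no-ops, longest first.
 * Returns the number of no-op instructions written. */
size_t emit_nops(uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t count = 0;
    while (len > 0) {
        size_t n = len < MAX_NOP_LEN ? len : MAX_NOP_LEN;
        memcpy(buf, nops[n - 1], n);
        buf += n;
        len -= n;
        count++;
    }
    return count;
}
```

Padding 12 bytes this way writes two instructions (a 10-byte no-op followed by 66 90) rather than twelve single-byte 90s.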

Suboptimal zero register should consider flags

Some sequences like the following:

YDIS: cmp byte ptr [rbx+rdx*1-0x1], 0x2f
YDIS: mov ecx, 0x0
YDIS: cmovz eax, ecx

use mov ecx, 0x0 instead of xor ecx, ecx because the former preserves the flags and the latter does not. Could x86lint do a limited multi-instruction analysis to avoid these false positives?
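
One possible shape for such an analysis is a forward scan within a straight-line block. This is only a sketch under simplifying assumptions: the insn_flags summary and the flags_live_after function are hypothetical, every flag-writing instruction is treated as overwriting all flags, and control flow is not followed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-instruction summary; a real tool would
 * derive this from the decoder's flag information. */
struct insn_flags {
    bool reads_flags;   /* e.g. CMOVcc, Jcc, SETcc, ADC */
    bool writes_flags;  /* e.g. CMP, ADD, XOR; assumed to overwrite all flags */
};

/* Returns true if the flags may still be read after instruction idx, in
 * which case MOV reg, 0 must not be reported as replaceable by XOR. */
bool flags_live_after(const struct insn_flags *block, size_t n, size_t idx)
{
    assert(block != NULL && idx < n);
    for (size_t i = idx + 1; i < n; i++) {
        if (block[i].reads_flags) {
            return true;   /* a later instruction consumes the flags */
        }
        if (block[i].writes_flags) {
            return false;  /* flags are overwritten before any read */
        }
    }
    return true;  /* end of block reached: conservatively assume live */
}
```

For the cmp / mov / cmovz sequence above, the scan starting after the mov reaches cmovz first, so the flags are live and the warning would be suppressed.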

SUB EAX, -128 can be shorter than ADD EAX, 128

SUB EAX, -128 can use a one-byte immediate, while ADD EAX, 128 must use a four-byte immediate, because a sign-extended 8-bit immediate covers only -128 through 127. GCC generates these when adding 127, 128, and 129 to an int32_t:

        addl    $127, %eax
        subl    $-128, %eax
        addl    $129, %eax

Suggested by Martin Möhrmann.
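
The selection GCC appears to make can be sketched as follows. The function encode_add_eax is a hypothetical illustration that targets EAX only and is not taken from GCC or x86lint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the shortest encoding that adds imm to EAX
 * into out (at least 5 bytes) and return its length in bytes. */
size_t encode_add_eax(int32_t imm, uint8_t *out)
{
    assert(out != NULL);
    if (imm >= -128 && imm <= 127) {
        out[0] = 0x83;              /* ADD r/m32, imm8 */
        out[1] = 0xC0;              /* ModRM: /0, EAX */
        out[2] = (uint8_t)imm;
        return 3;
    }
    if (imm == 128) {
        out[0] = 0x83;              /* SUB r/m32, imm8 */
        out[1] = 0xE8;              /* ModRM: /5, EAX */
        out[2] = 0x80;              /* -128 */
        return 3;
    }
    uint32_t u = (uint32_t)imm;
    out[0] = 0x05;                  /* ADD EAX, imm32 */
    out[1] = (uint8_t)u;
    out[2] = (uint8_t)(u >> 8);
    out[3] = (uint8_t)(u >> 16);
    out[4] = (uint8_t)(u >> 24);
    return 5;
}
```

ADD and SUB produce the same result here but set the carry flag differently, so a linter suggesting this rewrite would presumably need to confirm that no later instruction reads the carry flag.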

Unintentional use of x87

This is speculative, but I have noticed unexpected use of x87 floating-point on x86-64, where I assumed SSE would be more efficient since few programs take advantage of 80-bit precision. I also wonder whether there is overhead from task-switching the additional registers, or extra power usage from these legacy instructions.

Unintentional use of SSE

Some code unintentionally uses SSE:

unsigned int fp_func(unsigned int x) { return x * 0.8; }
unsigned int int_func(unsigned int x) { return x * 10 / 8; }

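The integer version is shorter and more efficient. Note that the two functions are not numerically equivalent (the first scales by 0.8, the second by 10/8); they only illustrate the difference in generated code: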

$ gcc -O2 -S -o - foo.c | as -al
...
   9 0000 89FF           movl %edi,%edi
  10 0002 660FEFC0       pxor %xmm0,%xmm0
  11 0006 F2480F2A       cvtsi2sdq %rdi,%xmm0
  11      C7
  12 000b F20F5905       mulsd .LC0(%rip),%xmm0
  12      00000000 
  13 0013 F2480F2C       cvttsd2siq %xmm0,%rax
  13      C0
  14 0018 C3             ret
...
  24 0020 8D04BF         leal (%rdi,%rdi,4),%eax
  25 0023 01C0           addl %eax,%eax
  26 0025 C1E803         shrl $3,%eax
  27 0028 C3             ret
...
  33                    .LC0:
  34 0000 9A999999       .long -1717986918
  35 0004 9999E93F       .long 1072273817

Integration with sampling profilers

This would identify specific hotspots that could benefit from optimization. Does perf have the ability to dump machine code? x86lint would need some extension to read non-ELF output.
