Comments (17)
Another example.
Objconv:
vpcmpeqq ymm2, ymm2, ymmword ptr [YMM_FMA_ONE] ; 6ED1 _ C4 E2 Ed: 29. 15, 00000000(rel)
gdb:
0: c4 e2 ed 29 (bad)
4: 15 00 00 00 00 adc eax,0x0
from uasm.
It should be:
0: c4 41 2d 58 d3 vaddpd ymm10,ymm10,ymm11
We have it as:
000000013fee1000 C4 41 AD 58 D3 vaddpd ymm10, ymm10, ymm11
Will investigate now and fix.
From: gwoltman [mailto:[email protected]]
Sent: 13 November 2016 04:37 PM
To: Terraspace/HJWasm [email protected]
Subject: [Terraspace/HJWasm] Request for gcc compatible output (#38)
One example is the instruction vaddpd ymm10, ymm10, ymm11. Output from objconv:
vaddpd ymm10, ymm10, ymm11 ; 6ECC _ C4 41 Ad: 58. D3
; Note: Prefix bit or byte has no meaning in this context
Output from https://defuse.ca/online-x86-assembler.htm#disassembly2
0: c4 41 ad 58 (bad)
4: d3 .byte 0xd3
Knights Landing will execute this code correctly. However, debugging in gdb is difficult because gdb cannot disassemble these instructions.
One of my users is getting crashes on Piledriver. I don't have access to an AMD CPU, so I don't know whether the crash is due to encodings like the one above or to a bug in my code.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub #38 , or mute the thread https://github.com/notifications/unsubscribe-auth/AQGQVNX7EV2MbAM3Dqdm6SbQHyz_zIduks5q9zy_gaJpZM4Kwu7q .
from uasm.
Working on this now.
I don’t think the bit being set would cause a crash, but for the sake of accuracy it should be fixed anyway!
From: gwoltman [mailto:[email protected]]
Sent: 13 November 2016 04:43 PM
To: Terraspace/HJWasm [email protected]
Subject: Re: [Terraspace/HJWasm] Request for gcc compatible output (#38)
from uasm.
This is what I’m getting with the latest version:
vpcmpeqq ymm2, ymm2, ymmword ptr [YMM_FMA_ONE]
000000013f2e1000 C4 E2 ED 29 15 F7 2F 00 00 vpcmpeqq ymm2, ymm2, ymmword ptr [rip+0x2ff7]
According to the manual and defuse, it should be:
c4 e2 ed 29 15 00 00 00 00
So that one seems right now?
vaddpd ymm10,ymm10,ymm11
gives us:
000000013f2e1009 C4 41 AD 58 D3 vaddpd ymm10, ymm10, ymm11
And should be:
c4 41 2d 58 d3
AD vs 2D means the W bit (bit 7 of VEX byte 3) is set. Depending on the opcode, VEX.W acts as an opcode-specific extension, is used like REX.W, or is ignored.
Specifically for vaddpd we have:
VEX.NDS.256.66.0F.WIG 58 /r
WIG: can use C5H form (if not requiring VEX.mmmmm) or VEX.W value is ignored in the C4H form of VEX prefix.
If WIG is present, the instruction may be encoded using either the two-byte form or the three-byte form of VEX. When encoding the instruction using the three-byte form of VEX, the value of VEX.W is ignored.
So that shouldn’t be a problem and should be safe there.
It would be worth testing the instruction on its own on an AMD chip, just to make sure AMD doesn't have a different take on the W bit, but I wouldn't expect it to.
from uasm.
I'll fix that anyway; it won't be hard to clear that bit when the C4 form is in question.
I just need to find out which instructions are included.
On Mon, Nov 14, 2016 at 5:35 AM, John Hankinson [email protected] wrote:
from uasm.
This code at codegen.c line 598 fixes the problem:
/* This fixes AVX REX_W wide 32 <-> 64 instructions third byte bit W*/
//lbyte &= ~EVEX_P1WMASK; //make sure it is not set if not 64 bit
//lbyte |= ((CodeInfo->pinstr->prefix) >> 8 & 0x80); // set only W bit if 64 bit
On Mon, Nov 14, 2016 at 5:40 AM, Branislav Habus [email protected] wrote:
from uasm.
The vaddpd problem is fixed. The vpcmpeqq problem is not.
I think this stretch of code:
vpxor ymm2, ymm2, ymm2
vpcmpeqq ymm3, ymm3, ymm2
vpand ymm9, ymm9, YMMWORD PTR YMM_28TH_BIT ;; Test for positive dword values in QF1 (test 28th bit)
vpcmpeqq ymm9, ymm9, ymm2
vpmovmskb rdx, ymm3
and edx, 0FFFFFFFFh ;; See if INVFAC values changed
jnz short invfac_adjust ;; Jump INVFACs need adjustment
vpmovmskb rdx, ymm9
comes out like this in objdump:
3cd5: c5 ed ef d2 vpxor %ymm2,%ymm2,%ymm2
3cd9: c4 e2 e5 29 (bad)
3cdd: da c5 fcmovb %st(5),%st
3cdf: 35 db 0d 00 00 xor $0xddb,%eax
3ce4: 00 00 add %al,(%rax)
3ce6: c4 62 b5 29 (bad)
3cea: ca c4 e1 lret $0xe1c4
3ced: 7d d7 jge 3cc6 <factor64_tf+0x35a3>
3cef: d3 83 e2 ff 75 0a roll %cl,0xa75ffe2(%rbx)
3cf5: c4 c1 7d d7 d1 vpmovmskb %ymm9,%edx
and:
vpsrlq ymm9, ymm9, 30 ;; Q1 = top bits of quotient
comes out as:
3c05: c4 c1 b5 73 (bad)
3c09: d1 1e rcrl (%rsi)
from uasm.
If I disassemble that sequence (using Intel SDE), I get the following, which seems correct:
70: vpxor ymm2, ymm2, ymm2
000000013fa31000 C5 ED EF D2 vpxor ymm2, ymm2, ymm2
71: vpcmpeqq ymm3, ymm3, ymm2
000000013fa31004 C4 E2 E5 29 DA vpcmpeqq ymm3, ymm3, ymm2
72: vpand ymm9, ymm9, YMMWORD PTR mop ;; Test for positive dword values in QF1 (test 28th bit)
000000013fa31009 C5 35 DB 0D EF 2F 00 00 vpand ymm9, ymm9, ymmword ptr [rip+0x2fef]
73: vpcmpeqq ymm9, ymm9, ymm2
000000013fa31011 C4 62 B5 29 CA vpcmpeqq ymm9, ymm9, ymm2
74: vpmovmskb rdx, ymm3
000000013fa31016 C4 E1 7D D7 D3 vpmovmskb edx, ymm3
75: and edx, 0FFFFFFFFh ;; See if INVFAC values changed
000000013fa3101b 83 E2 FF and edx, 0xffffffff
76: jnz short invfac_adjust ;; Jump INVFACs need adjustment
000000013fa3101e 75 05 jnz 0x13fa31025
77: vpmovmskb rdx, ymm9
000000013fa31020 C4 C1 7D D7 D1 vpmovmskb edx, ymm9
78: invfac_adjust:
79: vpsrlq ymm9, ymm9, 30
000000013fa31025 C4 C1 B5 73 D1 1E vpsrlq ymm9, ymm9, 0x1e
Visual Studio 2015 also agrees:
--- vcall2.asm -----------------------------------------------------------------
start:
000000013F0B1000 C5 ED EF D2 vpxor ymm2,ymm2,ymm2
000000013F0B1004 C4 E2 E5 29 DA vpcmpeqq ymm3,ymm3,ymm2
000000013F0B1009 C5 35 DB 0D EF 2F 00 00 vpand ymm9,ymm9,ymmword ptr [mop (013F0B4000h)]
000000013F0B1011 C4 62 B5 29 CA vpcmpeqq ymm9,ymm9,ymm2
000000013F0B1016 C4 E1 7D D7 D3 vpmovmskb edx,ymm3
000000013F0B101B 83 E2 FF and edx,0FFFFFFFFh
000000013F0B101E 75 05 jne invfac_adjust (013F0B1025h)
000000013F0B1020 C4 C1 7D D7 D1 vpmovmskb edx,ymm9
invfac_adjust:
So does objconv (apart from the prefix bit note, which shouldn't matter):
; Disassembly of file: vcall2.obj
; Thu Nov 17 08:53:53 2016
; Mode: 64 bits
; Syntax: MASM/ML64
; Instruction set: AVX-512, x64
option dotname
public start
public invfac_adjust
public mop
public YMM_FMA_ONE
_text SEGMENT PARA 'CODE' ; section number 1
start PROC
$$$00001 LABEL NEAR
vpxor ymm2, ymm2, ymm2 ; 0000 _ C5 ED: EF. D2
; Note: Prefix bit or byte has no meaning in this context
vpcmpeqq ymm3, ymm3, ymm2 ; 0004 _ C4 E2 E5: 29. DA
vpand ymm9, ymm9, ymmword ptr [mop] ; 0009 _ C5 35: DB. 0D, 00000000(rel)
; Note: Prefix bit or byte has no meaning in this context
vpcmpeqq ymm9, ymm9, ymm2 ; 0011 _ C4 62 B5: 29. CA
vpmovmskb rdx, ymm3 ; 0016 _ C4 E1 7D: D7. D3
and edx, 0FFFFFFFFH ; 001B _ 83. E2, FF
jnz invfac_adjust ; 001E _ 75, 05
vpmovmskb rdx, ymm9 ; 0020 _ C4 C1 7D: D7. D1
The only one that has an issue is Defuse, so I think in this case they're wrong, probably because they can't recognise the instruction with the prefix bit set :)
We'll fix that anyway, but it should all be working nonetheless.
From: gwoltman [mailto:[email protected]]
Sent: 17 November 2016 04:11 AM
To: Terraspace/HJWasm [email protected]
Cc: John Hankinson [email protected]; Comment [email protected]
Subject: Re: [Terraspace/HJWasm] Request for gcc compatible output (#38)
from uasm.
We've fixed a bunch of these encoding bits now. The following test code:
vpcmpeqq ymm3, ymm3, ymm2
vpsrlq ymm9, ymm9, 30
vpxor ymm2, ymm2, ymm2
vpcmpeqq ymm3, ymm3, ymm2
vpand ymm9, ymm9, YMMWORD PTR mop ;; Test for positive dword values in QF1 (test 28th bit)
vpcmpeqq ymm9, ymm9, ymm2
vpmovmskb rdx, ymm3
and edx, 0FFFFFFFFh ;; See if INVFAC values changed
jnz short invfac_adjust ;; Jump INVFACs need adjustment
vpmovmskb rdx, ymm9
invfac_adjust:
vpsrlq ymm9, ymm9, 30
vpslldq ymm1,ymm20, 30
VPBROADCASTD zmm10, xmm3
VPBROADCASTD zmm10, xmm14
VPBROADCASTD ymm20, dword ptr mop
vbroadcastsd zmm16{k1}, QWORD PTR mop
vbroadcastsd zmm16, REAL8 PTR mop
VPBROADCASTD ymm1, xmm2
VPBROADCASTD ymm1, dword ptr mop
VPBROADCASTD ymm9, xmm2
VPBROADCASTD zmm20, dword ptr mop
VPBROADCASTD zmm10{k2}, xmm3
vbroadcastsd zmm16{k1}, QWORD PTR mop
vbroadcastsd zmm16, REAL8 PTR mop
vpsrlq ymm9, ymm7, 3
vpsllq ymm9, ymm7, 2
vpsrlq ymm1, ymm20, 30
vpsllq ymm1, ymm20, 30
vpsrlq ymm9, ymm9, 30
vpsllq ymm9, ymm9, 30
vpslldq ymm9,ymm9, 30
vpsllw ymm8,ymm9,30
vpslld ymm9, ymm9, 30
vpsraw ymm9, ymm9, 30
vpsrad ymm9,ymm9, 30
vpsrldq ymm9, ymm9, 30
vpsrlq ymm9, ymm7, [r11]
vfnmadd231pd zmm27, zmm24, zmm23
;c4 e2 6d 29 14 25 00 ;us
;c4 e2 ed 29 15 00 00 00 00 ;should be
vpcmpeqq ymm2, ymm2, ymmword ptr [YMM_FMA_ONE]
vpcmpeqq ymm9, ymm9, ymmword ptr [YMM_FMA_ONE]
vmovapd ymm1, ymmword ptr [YMM_FMA_ONE]
vmovapd ymm1, ymm9
;c4 41 ad 58 d3 ;us
;c4 41 2d 58 d3 ;should be
vaddpd ymm10,ymm10,ymm11
vaddpd zmm17, zmm17, zmm20
vmovapd zmm16,zmm18
vmovapd zmm18,zmm16
vmulpd zmm17, zmm17, zmm20
vdivpd zmm17, zmm17, zmm20
vaddpd zmm17, zmm17, zmm10
vmovapd zmm16,zmm8
vmovapd zmm18,zmm6
vmulpd zmm17, zmm17, zmm12
vdivpd zmm17, zmm17, zmm13
vaddpd zmm3, zmm17, zmm6
vmovapd zmm16,zmm2
vmovapd zmm18,zmm6
vmulpd zmm1, zmm17, zmm4
vdivpd zmm17, zmm3, zmm13
korw k5, k1, k2
bob exec
vmovapd ymm2,YMMWORD PTR [ebx+48*SZPTR+576+16*SZPTR+24*8]
noexec vmovapd ymm2, YMM_BIGVAL
vxorpd ymm2, ymm2, ymm2
vmovapd ymm1, YMM_BIGVAL ;; Load comparison valueno base2
vxorpd ymm1, ymm1, ymm1 ;; Create comparison value
vcmppd ymm0, ymm2, ymm1, 0Ch ;; Are any carries non-zero
vmovmskpd eax, ymm0 ;; Extract 4 comparison bitsbase2
vxorpd ymm1, ymm1, ymm1 ;; High carry words are always compared to zero
vcmppd ymm0, ymm3, ymm1, 0Ch ;; Are any carries non-zero
vmovmskpd ecx, ymm0 ;; Extract 4 comparison bits
or eax, ecx
movzx eax, BYTE PTR [edi] ;; Load big vs. little flags
vmovapd ymm0, [esi] ;; Load values1ttp
vmulpd ymm0, ymm0, [ebp] ;; Mul values1 by two-to-minus-phittp
vmulpd ymm0, ymm0, YMM_BIGVAL;; Mul by FFTLEN/2
vaddpd ymm0, ymm0, ymm2 ;; x1 = values1 + carry split_upper_carry_zpad_word ttp, base2, ymm3, ymm1, ymm2, rax*2no const vmulpd ymm2, ymm3, YMM_K_LO ;; low bits of high FFT carry * k_loconst
vmulpd ymm2, ymm3, YMM_BIGVAL ;; low bits of high_FFT_carry * k_lo
vaddpd ymm0, ymm0, ymm2 ;; x1 = x1 + low bits of high_FFT_carry * k_lono const
vmulpd ymm3, ymm3, YMM_BIGVAL ;; low bits of high FFT carry * k_hiconst vmulpd ymm3, ymm3, YMM_K_TIMES_MULCONST_HI ;; low bits of high FFT carry * k_hittp
vmulpd ymm3, ymm3, YMM_BIGVAL[rax*2] ;; shift low bits of high FFT carry * k_hino ttp vmulpd ymm3, ymm3, YMM_LIMIT_INVERSE[0] ;; shift low bits of high FFT carry * k_hi
vroundpd ymm3, ymm3, 0 ;; WASTEFUL. Round (k_hi * limit_inverse) should be precomputed rounding ttp, base2, noexec, ymm0, ymm2, ymm4, rax*2
vaddpd ymm2, ymm2, ymm3 ;; Carry += shifted low bits of high_FFT_carry * k_hittp
vmulpd ymm0, ymm0, [ebp+32] ;; new value1 = val * two-to-phi ystore [rsi], ymm0 ;; Save new value1
vmovapd ymm3, ymm1 ;; Next high FFT carry = high bits of current high FFT carryttp bump rdi, 1 ;; Advance pointers bump rsi, 64ttp bump rbp, 64 sub rdx, 1 ;; Test counter jnz section_loop ;; More cache lines in section, add carry in ;; Section ended. Rotate carries again and add the new next section carry values ;; into the previously calculated next section carry values rotate_carries base2, ymm2, ymm4, ymm0, ymm1 rotate_carries noexec, ymm3, ymm5, ymm0, ymm1base2 vsubpd ymm4, ymm4, YMM_BIGVAL vaddpd ymm4, ymm4, YMM_TMP1 vaddpd ymm5, ymm5, YMM_TMP2 jmp section_start
ret
produces the following, which is gcc compliant:
Disassembly:
0: c4 e2 65 29 da vpcmpeqq ymm3,ymm3,ymm2
5: c4 c1 35 73 d1 1e vpsrlq ymm9,ymm9,0x1e
b: c5 ed ef d2 vpxor ymm2,ymm2,ymm2
f: c4 e2 65 29 da vpcmpeqq ymm3,ymm3,ymm2
14: c5 35 db 0d e4 2f 00 vpand ymm9,ymm9,YMMWORD PTR [rip+0x2fe4] # 0x3000
1b: 00
1c: c4 62 35 29 ca vpcmpeqq ymm9,ymm9,ymm2
21: c4 e1 7d d7 d3 vpmovmskb edx,ymm3
26: 83 e2 ff and edx,0xffffffff
29: 75 05 jne 0x30
2b: c4 c1 7d d7 d1 vpmovmskb edx,ymm9
30: c4 c1 35 73 d1 1e vpsrlq ymm9,ymm9,0x1e
36: 62 b1 f5 28 73 fc 1e vpslldq ymm1,ymm20,0x1e
3d: 62 72 7d 48 58 d3 vpbroadcastd zmm10,xmm3
43: 62 52 7d 48 58 d6 vpbroadcastd zmm10,xmm14
49: 62 e2 7d 28 58 25 ad vpbroadcastd ymm20,DWORD PTR [rip+0x2fad] # 0x3000
50: 2f 00 00
53: 62 e2 fd 49 19 05 a3 vbroadcastsd zmm16{k1},QWORD PTR [rip+0x2fa3] # 0x3000
5a: 2f 00 00
5d: 62 e2 fd 48 19 05 99 vbroadcastsd zmm16,QWORD PTR [rip+0x2f99] # 0x3000
64: 2f 00 00
67: c4 e2 7d 58 ca vpbroadcastd ymm1,xmm2
6c: c4 e2 7d 58 0d 8b 2f vpbroadcastd ymm1,DWORD PTR [rip+0x2f8b] # 0x3000
73: 00 00
75: c4 62 7d 58 ca vpbroadcastd ymm9,xmm2
7a: 62 e2 7d 48 58 25 7c vpbroadcastd zmm20,DWORD PTR [rip+0x2f7c] # 0x3000
81: 2f 00 00
84: 62 72 7d 4a 58 d3 vpbroadcastd zmm10{k2},xmm3
8a: 62 e2 fd 49 19 05 6c vbroadcastsd zmm16{k1},QWORD PTR [rip+0x2f6c] # 0x3000
91: 2f 00 00
94: 62 e2 fd 48 19 05 62 vbroadcastsd zmm16,QWORD PTR [rip+0x2f62] # 0x3000
9b: 2f 00 00
9e: c5 35 73 d7 03 vpsrlq ymm9,ymm7,0x3
a3: c5 35 73 f7 02 vpsllq ymm9,ymm7,0x2
a8: 62 b1 f5 28 73 d4 1e vpsrlq ymm1,ymm20,0x1e
af: 62 b1 f5 28 73 f4 1e vpsllq ymm1,ymm20,0x1e
b6: c4 c1 35 73 d1 1e vpsrlq ymm9,ymm9,0x1e
bc: c4 c1 35 73 f1 1e vpsllq ymm9,ymm9,0x1e
c2: c4 c1 35 73 f9 1e vpslldq ymm9,ymm9,0x1e
c8: c4 c1 3d 71 f1 1e vpsllw ymm8,ymm9,0x1e
ce: c4 c1 35 72 f1 1e vpslld ymm9,ymm9,0x1e
d4: c4 c1 35 71 e1 1e vpsraw ymm9,ymm9,0x1e
da: c4 c1 35 72 e1 1e vpsrad ymm9,ymm9,0x1e
e0: c4 c1 35 73 d9 1e vpsrldq ymm9,ymm9,0x1e
e6: c4 41 45 d3 0b vpsrlq ymm9,ymm7,XMMWORD PTR [r11]
eb: 62 22 bd 40 bc df vfnmadd231pd zmm27,zmm24,zmm23
f1: c4 e2 6d 29 15 26 2f vpcmpeqq ymm2,ymm2,YMMWORD PTR [rip+0x2f26] # 0x3020
f8: 00 00
fa: c4 62 35 29 0d 1d 2f vpcmpeqq ymm9,ymm9,YMMWORD PTR [rip+0x2f1d] # 0x3020
101: 00 00
103: c5 fd 28 0d 15 2f 00 vmovapd ymm1,YMMWORD PTR [rip+0x2f15] # 0x3020
10a: 00
10b: c4 c1 7d 28 c9 vmovapd ymm1,ymm9
110: c4 41 2d 58 d3 vaddpd ymm10,ymm10,ymm11
115: 62 a1 f5 40 58 cc vaddpd zmm17,zmm17,zmm20
11b: 62 a1 fd 48 28 c2 vmovapd zmm16,zmm18
121: 62 a1 fd 48 28 d0 vmovapd zmm18,zmm16
127: 62 a1 f5 40 59 cc vmulpd zmm17,zmm17,zmm20
12d: 62 a1 f5 40 5e cc vdivpd zmm17,zmm17,zmm20
133: 62 c1 f5 40 58 ca vaddpd zmm17,zmm17,zmm10
139: 62 c1 fd 48 28 c0 vmovapd zmm16,zmm8
13f: 62 e1 fd 48 28 d6 vmovapd zmm18,zmm6
145: 62 c1 f5 40 59 cc vmulpd zmm17,zmm17,zmm12
14b: 62 c1 f5 40 5e cd vdivpd zmm17,zmm17,zmm13
151: 62 f1 f5 40 58 de vaddpd zmm3,zmm17,zmm6
157: 62 e1 fd 48 28 c2 vmovapd zmm16,zmm2
15d: 62 e1 fd 48 28 d6 vmovapd zmm18,zmm6
163: 62 f1 f5 40 59 cc vmulpd zmm1,zmm17,zmm4
169: 62 c1 e5 48 5e cd vdivpd zmm17,zmm3,zmm13
16f: c5 f4 45 ea korw k5,k1,k2
173: c4 c1 7d 28 8b 00 05 vmovapd ymm1,YMMWORD PTR [r11+0x500]
17a: 00 00
17c: c4 c1 7d 28 a3 00 05 vmovapd ymm4,YMMWORD PTR [r11+0x500]
183: 00 00
185: 67 c5 fd 28 93 00 05 vmovapd ymm2,YMMWORD PTR [ebx+0x500]
18c: 00 00
18e: c5 ed 57 d2 vxorpd ymm2,ymm2,ymm2
192: c4 c1 7d 28 8b 00 05 vmovapd ymm1,YMMWORD PTR [r11+0x500]
199: 00 00
19b: c5 f5 57 c9 vxorpd ymm1,ymm1,ymm1
19f: c5 ed c2 c1 0c vcmpneq_oqpd ymm0,ymm2,ymm1
1a4: c5 fd 50 c0 vmovmskpd eax,ymm0
1a8: c5 f5 57 c9 vxorpd ymm1,ymm1,ymm1
1ac: c5 e5 c2 c1 0c vcmpneq_oqpd ymm0,ymm3,ymm1
1b1: c5 fd 50 c8 vmovmskpd ecx,ymm0
1b5: 0b c1 or eax,ecx
1b7: 67 0f b6 07 movzx eax,BYTE PTR [edi]
1bb: 67 c5 fd 28 06 vmovapd ymm0,YMMWORD PTR [esi]
1c0: 67 c5 fd 59 45 00 vmulpd ymm0,ymm0,YMMWORD PTR [ebp+0x0]
1c6: c4 c1 7d 59 83 00 05 vmulpd ymm0,ymm0,YMMWORD PTR [r11+0x500]
1cd: 00 00
1cf: c5 fd 58 c2 vaddpd ymm0,ymm0,ymm2
1d3: c4 c1 65 59 93 00 05 vmulpd ymm2,ymm3,YMMWORD PTR [r11+0x500]
1da: 00 00
1dc: c5 fd 58 c2 vaddpd ymm0,ymm0,ymm2
1e0: c4 c1 65 59 9b 00 05 vmulpd ymm3,ymm3,YMMWORD PTR [r11+0x500]
1e7: 00 00
1e9: c4 c1 65 59 9c 43 00 vmulpd ymm3,ymm3,YMMWORD PTR [r11+rax*2+0x500]
1f0: 05 00 00
1f3: c4 e3 7d 09 db 00 vroundpd ymm3,ymm3,0x0
1f9: c5 ed 58 d3 vaddpd ymm2,ymm2,ymm3
1fd: 67 c5 fd 59 45 20 vmulpd ymm0,ymm0,YMMWORD PTR [ebp+0x20]
203: c5 fd 28 d9 vmovapd ymm3,ymm1
207: c3 ret
from uasm.
New packages dated 18th Nov, with all the gcc compliance fixes included, are on the site.
from uasm.
Better. vpmuludq exhibits the same symptoms. Two examples:
vpmuludq ymm9, ymm9, ymm8
vpmuludq ymm4, ymm4, YMMWORD PTR YMM_TWO_120_MODF3
from uasm.
This one is fixed now.
We've found a few others, which we're fixing now as well. We'll upload an updated package as soon as they're all done.
From: gwoltman [mailto:[email protected]]
Sent: 18 November 2016 06:36 PM
To: Terraspace/HJWasm [email protected]
Cc: John Hankinson [email protected]; Comment [email protected]
Subject: Re: [Terraspace/HJWasm] Request for gcc compatible output (#38)
from uasm.
Packages are updated on the site. They should fix the vpmuludq issue, as well as some issues with vpermilpd, vblend and vcvtpd2ps.
from uasm.
Sorry to report these in dribs and drabs:
vpaddq ymm10, ymm10, ymm8
from uasm.
That’s ok… sorry to fix them in dribs and drabs :)
From: gwoltman [mailto:[email protected]]
Sent: 18 November 2016 10:56 PM
To: Terraspace/HJWasm [email protected]
Cc: John Hankinson [email protected]; Comment [email protected]
Subject: Re: [Terraspace/HJWasm] Request for gcc compatible output (#38)
from uasm.
Fixed, packages updated on the site.
from uasm.
I cannot find any more gcc incompatibilities in any of my assembled source files!
Well done!
from uasm.