Product Overview
The current version of the emulator corresponds to the Intel® Architecture instruction set extensions programming reference (revision 319433-052), the Intel® Advanced Vector Extensions 10 (Intel® AVX10) architecture specification, and the Intel® Advanced Performance Extensions (Intel® APX) architecture specification available on Instruction Set Architecture.
Intel SDE continues to support features from previous releases:
- Emulation support for the additional Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions present on some future Intel® processors.
- Emulation support for the Intel® Advanced Matrix Extensions (Intel® AMX) present on some future Intel® processors.
- Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions present on Intel® processors.
- Intel® Streaming SIMD Extensions 4 (Intel® SSE 4), Advanced Encryption Standard (AES), and PCLMULQDQ and the Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Advanced Vector Extensions 2 (Intel® AVX2), RTM, BMI1, and BMI2 instructions, introduced on the 4th generation Intel® Core™ processor family.
- The ADOX/ADCX instructions, introduced on the 5th generation Intel® Core™ processor family.
- Support for the Intel® Secure Hash Algorithm Extensions (Intel® SHA) present on the Intel Atom® processor.
- Support for the vector instructions for deep learning present on Intel® processors.
For more information about the Intel SHA extension and a sample application, see Instruction Set Architecture.
Intel is releasing this Intel SDE so that developers can gain familiarity with our upcoming instruction set extensions. Intel SDE can help ensure the software is ready to take advantage of the opportunities created by these new instructions in our processors. We hope that developers will explore the new instructions using the currently available compilers and assemblers.
Intel SDE is built upon the Pin dynamic binary instrumentation system and the XED encoder decoder. Pin controls the running of an application. Pin examines each static instruction in the application approximately once, as it builds traces for running. During this process, which is called instrumentation, for each instruction encountered, Pin validates with Intel SDE if this instruction should be emulated. If the instruction is to be emulated, then Intel SDE notifies Pin to skip over that instruction and instead branch to the appropriate emulation routine. It also instructs Pin how to invoke that emulation function, what arguments to pass, and more.
Intel SDE queries CPUID to figure out what features to emulate. It also modifies the output of CPUID so that compiled applications that check for the emulated features are told that those features exist.
Intel SDE comes with several useful emulator-enabled Pin tools and the XED disassembler:
- The basic emulator
- The mix histogramming tool: This Pin tool can compute histograms by dynamic instructions run, instruction length, instruction category, and ISA extension grouping. This tool can also display the top N most frequently run basic blocks and disassemble them.
- The debug-trace ASCII tracing tool: This versatile tool is useful for observing the dynamic behavior of your code. It prints the instructions that were run, and also the registers written, memory read and written, and more.
- The footprint tool: This simple tool counts how many unique 64-byte chunks of data were referenced while running the program.
- The XED command-line tool can disassemble PECOFF or ELF binary executables.
Licensing
Intel SDE is provided and supported by Intel, free of charge for any type of use, under the terms of the Intel Simplified Software License (Version: August 23, 2023).
Installation
Download and unpack the appropriate kit for your platform. Set your PATH variable to point to that directory. You can also refer to the tools in the kit using full or relative paths. Do not rearrange the files or subdirectories in the unpacked kit. If you want to move the kit directory, move everything.
Windows*: If you are using Winzip, it puts the proper permissions on the unpacked files. However, if you are using Cygwin's tar command to unpack the Windows* kit, you must run a chmod -R +x path-to-kit on the unpacked kit (where path-to-kit is the unpacked kit directory name).
Linux*: On some distributions, you must disable SE Linux to allow Intel SDE to work. On Ubuntu* systems, you must disable yama as described below.
Mac*: Intel SDE is using the MACH taskport APIs. By default, when trying to use these APIs, user-authentication is required once per a GUI session. In order to allow Pin/Intel SDE to run without this authentication you need to disable it. This is done by configuring the machine to auto-confirm takeover of the process as described in System Configuration.
Running Intel® SDE
This is the pattern for running Intel SDE:
path-to-kit/sde [sde args] -- user-application [app args]
The double dash ("--") is important. Options to Intel SDE go before the double dash. Square brackets denote optional arguments.
Important options are the short and long help messages. To see the short help message:
path-to-kit/sde -help
And to see the very long help message:
path-to-kit/sde -long-help
In the help messages, the command line options are often displayed using underscores between words, but dashes may be used instead of underscores. Often the Intel SDE help messages and this web page will refer to command line options as "knobs" for historical reasons. The short help message contains some top level analysis tools knobs as well as the list of supported CPUs.
Emulate Everything Mode
- Windows*: A file called sde-win.bat is provided in Windows* that runs a cmd.exe window under the control of Intel SDE. You can make a shortcut to it and place that shortcut on your desktop. Everything run from that window will be run under the control of Intel SDE, so you may experience a slow down even when you are not emulating anything. All it really does is:
path-to-kit/sde -- cmd.exe
- OS X® or Linux*: You can run your favorite shell under the control of Intel SDE:
path-to-kit/sde -- /bin/tcsh
And everything you run from there will be run under the control of Intel SDE.
Running the Histogram Tool
To generate the instruction mix histograms by opcode (XED iclass, the default) or instruction form (iform). As of version 4.29, the instruction length and instruction category histograms are always included.
path-to-kit/sde -mix -- user-application [args] path-to-kit/sde -mix -iform -- user-application [args]
Notes:
- The ISA extension histogram is also always computed and printed as star-prefixed rows in the histograms. ISA extensions are things like (BASE, X87, MMX, SSE, SSE2, SSE3, etc.). This is useful to see which instruction set extensions are used in your application.
- The dynamic statistics are recorded and emitted several ways: (1) per-thread, (2) per function per thread, and (3) summed for the entire run. Instruction counts by function are also emitted if symbols are found for the application.
- The output is written to a sde-mix-out.txt file in the current directory. The output file name can be changed using the -omix option:
path-to-kit/sde -mix -omix foo.out -- user-application [args]
- The top 20 basic blocks are always printed in the output with their execution weights.
- "-top_blocks N" will allow you to change 20 to N that you specifiy.
- Iforms: "Iform" is the XED term for variants of instructions. In a simple world they would be things like reg/reg or reg/mem, but things are more complicated in general. The iform names come from XED. Consider them experimental and subject to change. To see histograms by the more detailed iforms, use the "-iform" command line option.
- There are many command line options for the mix tool:
Mix knobs
-d [default 0]
Only collect dynamic profile
-demangle [default 1]
Control for attempting symbol demangling
-disas [default 0]
Show disassembly for top blocks
-disas_at_jit [default 0]
This knob does nothing anymore. DEPRECATED!
-dynamic_stats_per_block [default 0]
Print dynamic stats per block
-dynamic_stats_per_loop [default 0]
Print dynamic stats per loop
-function_call_counts [default 1]
Collect number of times each function is called.
-global_functions [default 1]
Print global functions report
-global_hot_blocks [default 1]
Print global hot blocks
-hottest_threads_order [default 0]
thread stats prints are ordered by Icount
-iform [default 0]
Compute ISA iform mix
-line_info [default 1]
Add line info to the top hot blocks
-map_all_blocks [default 0]
Map all the blocks instead of only top blocks
-mapaddr [default 0]
Emit mappings: Address -> Source file and Line
-mapaddr_top_blocks [default 0]
Emit mappings for top blocks: Address -> Source file and Line
-mix [default 0]
Compute mix histograms.
-mix_concat_bbls [default 0]
Concatenate consecutive blocks statistics
-mix_filter_no_shared_libs [default ]
Do not instrument shared libraries
-mix_filter_rtn
Routines to instrument
-mix_loops [default 0]
Supply loops statistics
-mix_loops_threads [default 1]
supply loops statistics per thread
-mix_max_cumulative [default 97]
specify maximum cumulative. stops printing functions when it reached max cumulative
-mix_omit_per_function_stats [default 0]
Omit the per-function histograms. Reduces the output file size.
-mix_omit_per_thread_stats [default 0]
Omit per-thread stats for smaller output files
-mix_opt_report [default 0]
Add optimization report messages to mix file
-mix_top_loops [default 10]
specify maximum number of top loops for which statistics are printed,
sorted by iteration count
-mix_verbose [default 0]
Add info messages to mix file
-mix_vpconflict [default 0]
Add vconflic stats to mix file
-no_shared_libs [default 0]
do not instrument shared libraries
-omix [default sde-mix-out.txt]
specify profile file name
-s [default 0]
terminate after collection of static profile for main image
-top_blocks [default 20]
specify a maximal number of top blocks for which icounts are printed
Example
Command:
% sde -mix -- mm_cmp.opt.vec.exe
Output: in sde-mix-out.txt (default file name)
# # $global-dynamic-counts # # opcode count # *isa-ext-BASE 147597 *isa-ext-MODE64 222 *isa-ext-SSE 21 ADD 3092 AND 2694 CALL_NEAR 1739 CDQE 3 CLD 35 CMOVB 800 CMOVBE 6 ... UCOMISS 14 XCHG 1 XOR 4981 ... *total 147840
Mix Accounting
The rows in the mix output histograms come in two flavors. The rows that begin with "*" are meta-categories which sum up the data in different ways. Here are descriptions of some of the meta categories:
*scalar-simd anything with the XED_ATTRIBUTE_SIMD_SCALAR including AVX and SSE operations. The instructions that operate on one vector element and whose iclass name ends with "SS" or "SD" have this attribute.
*sse-scalar any SSE instruction with the XED_ATTRIBUTE_SIMD_SCALAR
*sse-packed any SSE instruction without the XED_ATTRIBUTE_SIMD_SCALAR
*avx-scalar Any AVX instruction with the attribute XED_ATTRIBUTE_SIMD_SCALAR
*avx128 Any AVX instruction with a 128b vector length but without the XED_ATTRIBUTE_SIMD_SCALAR
*avx256 Any AVX instruction with a 256b vector length
*avx512 Any AVX instruction with a 512b vector length.
*mem-atomic Atomic memory operations
*stack-read Stack reads
*stack-write Stack writes
*iprel-read IP-relative memory reads
*iprel-write IP-relative memory writes
*mem-read-1 Memory read, 1 byte
*mem-read-2 Memory read, 2 bytes
*mem-read-4 Memory read, 4 bytes
*mem-read-8 Memory read, 8 bytes
*mem-write-1 Memory write, 1 byte
*mem-write-2 Memory write, 2 bytes
*mem-write-4 Memory write, 4 bytes
*mem-write-8 Memory write, 8 bytes
*isa-ext-BASE The "BASE" ISA-extension (generic group of instructions. Base includes much of the older instructions
*isa-ext-LONGMODE The set of instructions added with Intel64. These may be 32b or 64b instructions
*isa-set-I186 ISA "set" is a categorization of instructions in the BASE ISA-extension. I186 includes instructions introduced on the 80186 processor.
*isa-set-I386 ISA "set" is a categorization of instructions in the BASE ISA-extension. I386 includes instructions introduced on the 80386 processor.
*isa-set-I486REAL ISA "set" is a categorization of instructions in the BASE ISA-extension. I486REAL includes instructions introduced on the 80486 processor and valid in REAL mode.
*isa-set-I86 ISA "set" is a categorization of instructions in the BASE ISA-extension. I86 includes instructions introduced on the 8086 processor.
*isa-set-LONGMODE ISA "set" is a categorization of instructions in the LONGMODE ISA-extension. LONGMODE includes instructions introduced with Intel64 mode.
*isa-set-PENTIUMREAL ISA "set" is a categorization of instructions in the BASE ISA-extension. PENTIUMREAL includes instructions introduced with Pentium and valid in REAL mode.
*isa-set-PPRO ISA "set" is a categorization of instructions in the BASE ISA-extension. PPRO includes instructions introduced with the PentiumPro.
*lock_prefix Instructions with a 0xF0 LOCK prefix
*rep_prefix Instructions with a 0xF3 REP prefix
*repne_prefix Instructions with a 0xF2 REPNE prefix
*osz_prefix Instructions with a 0x66 prefix
*rex_prefix Instructions with a REX prefix (includes the following 4 cases). REX prefixes can be sued without any of the following 4 bits set as well.
*rexw_prefix Instructions with a REX prefix with the REX.W bit set
*rexr_prefix Instructions with a REX prefix with the REX.R bit set
*rexx_prefix Instructions with a REX prefix with the REX.X bit set
*rexb_prefix Instructions with a REX prefix with the REX.B bit set
*one-memops Instructions with one memory operation
*two-memops Instructions with two memory operations
*disp_only Instructions with a memory operation that addresses memory without using a base register or index register -- just a displacement.
*base_index Instructions with a memory operation that addresses meory using a base and index register, but without a displacement.
*base_index_disp Instructions with a memory operation that addresses memory using a base, index and displacement.
*scale_1 Number of instructions with a scale=1 for the index register
*scale_2 Number of instructions with a scale=2 for the index registern
*scale_4 Number of instructions with a scale=4 for the index register
*scale_8 Number of instructions with a scale=8 for the index register
*memdisp8 Memory operations with 8-bit displacements
*memdisp32 Memory operations with 32-bit displacements
Checking for Bad Pointers and Data Misalignment
Two of the more common errors when bringing up new code are (a) dereferencing bad pointers, either null pointers or pointing to inaccessible memory and (b) misaligned data accesses. Intel SDE has features to help identify these situations in programs.
The options for the pointer checker are:
-null_check [default 0]
Check memops for null references.
-null_check_out [default sde-null-check.out.txt]
Output file name for -null-check.
-ptr_breakpoint [default 0]
Make the ptr checker raise application break point on errors.
-ptr_check [default 0]
Wild pointer checker. Checks memops for accessibility.
-ptr_check_out [default sde-ptr-check.out.txt]
Output file name for -ptr-check.
-ptr_check_warn [default 0]
Make the ptr checker warn on errors. Default is do die on errors.
-ptr_raise [default 0]
Make the ptr checker raise exception on errors. Default is to do
PIN_SafeCopy on so that errors are ignored in analysis routines.
The alignment checker can give profiles of data alignment throught the program as well as when and where data accesses are misaligned.
-align_checker [default normal] Check for unaligned memory accesses mixing. Valid choices are: assert, warn, report, normal, or ignore. There are also assert-all, warn-all and report-all which watch all instructions, including those that do not require alignment. -align_checker_256b [default 0] Limit checker to only checking for 256b (32B) memory references. -align_checker_file [default sde-align-checker-out.txt] File name for messages about unaligned memory accesses. -align_checker_image [default ] Only check instructions in the named image -align_checker_prefetch [default 1] 1=check prefetches, 0=ignore prefetches. -align_checker_stderr [default 0] Attempt to write messages about unaligned data types to stderr. If disabled, then the output file is used. -align_correct [default 1] 1=Enabled, 0=Disable the alignment checker
Running the ASCII Tracing Tool
path-to-kit/sde -debugtrace -- user-application [args]
The output is written to a sde-debugtrace-out.txt file in the current directory by default. There are many options. Run 'sde -debugtrace -thelp' Pin tool option to see the choices. It prints the registers and flags modified by each instruction. It also prints the memory values read/written.
% sde -debugtrace -- il_aesdec.opt.vec.exe % cat sde-debugtrace-out.txt ... TID0: Read 7b5b5465_73745665_63746f72_5d53475d = *(UINT128*)0x7fffffffd500 TID0: INS 0x000000400b13 AVX vmovdqa xmm0, xmmword ptr [rsp+0x100] TID0: XMM0 := 7b5b5465_73745665_63746f72_5d53475d TID0: XMM0 := 1.62559e+286 1.23396e+171 (doubles) TID0: XMM0 := 1.13882e+36 1.93584e+31 4.50904e+21 9.51515e+17 (floats) TID0: Read 48692853_68617929_5b477565_726f6e5d = *(UINT128*)0x7fffffffd510 TID0: INS 0x000000400b1c AVX vmovdqa xmm1, xmmword ptr [rsp+0x110] TID0: XMM1 := 48692853_68617929_5b477565_726f6e5d TID0: XMM1 := 6.84853e+40 5.20343e+131 (doubles) TID0: XMM1 := 238753 4.25907e+24 5.61426e+16 4.74242e+30 (floats) TID0: INS 0x000000400b25 AVX vaesdec xmm0, xmm0, xmm1 TID0: XMM0 := 138ac342_faea2787_b58eb95e_b730392a TID0: XMM0 := 1.55269e-214 -1.02648e-50 (doubles) TID0: XMM0 := 3.50286e-27 -6.079e+35 -1.06338e-06 -1.05037e-05 (floats) TID0: INS 0x000000400b2a AVX vmovdqa xmmword ptr [rsp+0x120], xmm0 TID0: Write *(UINT128*)0x7fffffffd520 = 138ac342_faea2787_b58eb95e_b730392a TID0: Read 138ac342_faea2787_b58eb95e_b730392a = *(UINT128*)0x7fffffffd520 TID0: INS 0x000000400b33 AVX vmovdqu xmm0, xmmword ptr [rsp+0x120] TID0: XMM0 := 138ac342_faea2787_b58eb95e_b730392a TID0: XMM0 := 1.55269e-214 -1.02648e-50 (doubles) TID0: XMM0 := 3.50286e-27 -6.079e+35 -1.06338e-06 -1.05037e-05 (floats) TID0: INS 0x000000400b3c BASE lea rdi, ptr [r12+0x50bd40] | rdi = 0x50bd70 TID0: INS 0x000000400b44 BASE mov edx, 0x10 | rdx = 0x10 TID0: INS 0x000000400b49 BASE lea rsi, ptr [rsp+0xf0] | rsi = 0x7fffffffd4f0 TID0: INS 0x000000400b51 AVX vmovdqu xmmword ptr [rsi], xmm0 TID0: Write *(UINT128*)0x7fffffffd4f0 = 138ac342_faea2787_b58eb95e_b730392a TID0: INS 0x000000400b55 AVX vzeroupper TID0: INS 0x000000400b58 BASE call 0x400ec0 | rsp = 0x7fffffffd3f8
Using Intel® Transactional Synchronization Extensions (Intel® TSX)
Intel® TSX supports emulation of Restricted Transactional Memory (RTM) technology as of version 6.1.
The RTM options are as follows. You can see this in the long help output emitted when running "sde -long-help".
Intel(R) TSX (RTM and HLE) Options -rtm_abort_reason [default 0] Select RTM abort reason code. relevant only for abort mode -rtm_extended_abort_code [default 0] report extended abort cause codes -rtm_mode [default disabled] Select RTM mode from [disabled|abort|full|nop] -tsx [default 0] Enable TSX (RTM and HLE) functionality -tsx_cache_set_size [default 8] Number of cache lines in associative cache set (not necessarily power of 2) -tsx_cache_sets_num [default 64] Number of cache sets in cache -tsx_debug_log [default 0] define debug log details level, printed to the the log file -tsx_file_name [default sde-tsx-out.txt] TSX log file name -tsx_log_file [default 0] create a log file (does not necessarily fill it) -tsx_log_flush [default 0] flush data to the log file after each write -tsx_log_inst [default 0] print the instructions to the log file -tsx_ot_accuracy [default moderate] ownership table accuracy level : simple, moderate -tsx_ownership_size [default 16] log2(ownership table entries) default 16 -tsx_speculation_depth [default 7] Maximum speculation depth allowed -tsx_stats [default 0] Collect TSX statistics -tsx_stats_call_stack [default 0] Add call stack information to TSX statistics -tsx_stats_call_stack_size [default 10] Call stack size in TSX statistics -tsx_stats_file [default sde-tsx-stats.txt] TSX statistics file name -tsx_stats_max_abort [default 1000000] Maximum number of TSX statistics abort list -tsx_stats_top_most [default 10] Number of most common aborts TSX statistics to display
(As in all SDE knobs, you can use dashes instead of underscores.)
For RTM, there are four modes for rtm: disabled, abort, full, and nop. Disabled is the default because emulating RTM in software is very slow. The abort modes always aborts upon executing xbegin. Full is the RTM enabled mode. And NOP treats the RTM instructions like NOPs.
Intel SDE provides statistics about the use of RTM during the execution of your program:
#LIST OF RTM COUNTERS DATA PER THREAD #----------------------------------------------------------------------------- # TID XBEGIN XEND XABORT ABORTS 0 10 0 3 10 1 0 0 0 0 2 0 0 0 0 TOTAL 10 0 3 10 # COUNTERS OF TSX ABORTS PER ABORT REASON #----------------------------------------------------------------------------- # REASON RTM ABORTS HLE ABORTS ABORT_CONTENTION 0 22 UNFRIENDLY_INST 0 2 ABORT_SYNC_EXCEPTION 2 0 ABORT_CONTENTION 2 0 ABORT_XA_EXECUTED 3 0 CACHE_SET_FULL 1 0 UNFRIENDLY_INST 1 0 NESTING_TOO_DEEP 1 0 # TOP 10 GENERAL ABORTS #--------------------------------------------------------------------------- # IP COUNT INSTRUCTION DISASSEMBLY 0x0000000000e61048 4127 pause 0x0000000077351778 1483 syscall # TOP 10 CONTENTION ABORTS #--------------------------------------------------------------------------- # IP COUNT INSTRUCTION DISASSEMBLY 0x0000000000400be8 14 xchg dword ptr [rdx], eax 0x0000000000400b62 3 xrelease lock xchg dword ptr [rdx], eax 0x0000000000400a4e 2 xacquire lock xchg dword ptr [rdx], eax 0x0000000000400f5a 2 mov eax, dword ptr [rip+0x200c24] 0x0000000000400f95 1 mov dword ptr [rip+0x200be9], eax #LIST OF TSX GENERAL ABORT EVENTS #--------------------------------------------------------------------------------------------------- # TID IP DATA ADDRESS TSX TYPE REASON 0 0x0000000000400a99 0x0000000000000000 RTM ABORT_SYNC_EXCEPTION 0 0x0000000000400aab 0x0000000000000000 RTM ABORT_SYNC_EXCEPTION 0 0x0000000000400ab6 0x0000000000000000 RTM UNFRIENDLY_INST 0 N/A N/A RTM ABORT_XA_EXECUTED 0 N/A N/A RTM ABORT_XA_EXECUTED 0 0x0000000000400b3a 0x0000000000609f40 RTM CACHE_SET_FULL 0 N/A N/A RTM NESTING_TOO_DEEP 0 N/A N/A RTM ABORT_XA_EXECUTED
When transaction aborts, Intel SDE provides codes describing the causes of those aborts:
ABORT_SYNC_EXCEPTION - Failed to read\write from memory. ABORT_ASYNC_EXCEPTION - Got general exception like Interrupt or signal. ABORT_CONTENTION - Two threads colided. ABORT_XA_EXECUTED - XABORT instruction sent. CACHE_SET_FULL - Cache is full. UNFRIENDLY_INST - Unfriendly instruction for RTM. NESTING_TOO_DEEP - We reached RTM nesting limit, default 8. ABORT_LOCK_SPLIT_CACHE - Atomic instruction with split cache lines. ELIDED_LOCK_UPGRADE - Lock acquisition failed due to attempt to write an elided lock. ELIDING_WRITE_LOCK - Tried to put an elided lock on a line which is already modified speculatively. PARTIAL_ELISION_ACCESS - Found intersecting elided lock. ELIDING_AGAIN - Tried to elide the same lock twice. UNLOCK_DOESNT_RESTORE - Tried to release non matching elided lock. ELISION_CACHE_NOT_EMPTY - Elision cahce is not empty before commit. RTM_ABORT_HLE_RTM_MIX - Tried to start RTM transaction inside an HLE transaction.
Here are some simple examples of running with Intel TSX:
Running sde with full RTM support: % path-to-kit/sde -rtm-mode full -- application Running sde with both RTM and HLE support: % path-to-kit/sde -tsx -- application Running sde with TSX support and statistics: % path-to-kit/sde -tsx -tsx-stats -- application Running sde with TSX support and statistics and call stack information: % path-to-kit/sde -tsx -tsx-stats -tsx-stats-call-stack -- application
And here is a more complete example of running an RTM on Windows*:
volatile int winning_thread = -1; volatile int aborts = 0; volatile int num_ends = 0; unsigned __stdcall thread_worker(void * arg) { int id =(int) arg; unsigned int status = _xbegin(); if (status == _XBEGIN_STARTED) { for (int i=0; i<10000000; i++) winning_thread = id; num_ends++; _xend(); } else { aborts++; } return 0; } int main() { HANDLE threads[10]; for (int i=0; i<10; i++) threads[i] = (HANDLE) _beginthreadex(NULL, 0, &thread_worker, (void *)i, 0, NULL); for (int i=0; i<10; i++) WaitForSingleObject( threads[i], INFINITE ); return 0; } Compiling with a compiler from Intel: % icl /c /Foapplication.obj application.cpp % xilink.exe /MACHINE:X64 /LARGEADDRESSAWARE:NO /OUT:application.exe application.obj % path-to-kit/sde -tsx -- application.exe
Run XED Disassembler
Example:
path-to-kit/xed -i foo.exe > dis.txt
The above command writes dis.txt.
See the help message (-help) for many options.
XED prints the ISA extension for every instruction. This is useful for finding new instructions in your code. An example of the output is as follows:
% xed -i il_aesdec.opt.vec.exe > dis % cat dis SYM main: XDIS 400a40: PUSH BASE 55 push rbp XDIS 400a41: DATAXFER BASE 4889E5 mov rbp, rsp XDIS 400a44: LOGICAL BASE 4883E480 and rsp, 0xffffffffffffff80 XDIS 400a48: PUSH BASE 4154 push r12 XDIS 400a4a: PUSH BASE 4155 push r13 XDIS 400a4c: BINARY BASE 4881EC70010000 sub rsp, 0x170 XDIS 400a53: DATAXFER BASE BEFE9F9D00 mov esi, 0x9d9ffe XDIS 400a58: DATAXFER BASE BF03000000 mov edi, 0x3 XDIS 400a5d: CALL BASE E83E050000 call 0x400fa0 <__intel_new_feature_proc_init> XDIS 400a62: AVX AVX C5F8AE9C24F0000000 vstmxcsr dword ptr [rsp+0xf0] XDIS 400a6b: LOGICAL BASE 33C0 xor eax, eax XDIS 400a6d: LOGICAL BASE 818C24F000000040800000 or dword ptr [rsp+0xf0], 0x8040 XDIS 400a78: AVX AVX C5F8AE9424F0000000 vldmxcsr dword ptr [rsp+0xf0] XDIS 400a81: DATAXFER AVX C5FA6F15E7120000 vmovdqu xmm2, xmmword ptr [rip+0x12e7] XDIS 400a89: DATAXFER AVX C5FA6F0DEF120000 vmovdqu xmm1, xmmword ptr [rip+0x12ef] XDIS 400a91: DATAXFER AVX C5FA6F05F7120000 vmovdqu xmm0, xmmword ptr [rip+0x12f7] XDIS 400a99: MISC BASE 8D1400 lea edx, ptr [rax+rax*1] XDIS 400a9c: BINARY BASE FFC0 inc eax XDIS 400a9e: SHIFT BASE 48C1E204 shl rdx, 0x4 XDIS 400aa2: DATAXFER AVX C5FA7F9240395000 vmovdqu xmmword ptr [rdx+0x503940], xmm2 XDIS 400aaa: DATAXFER AVX C5FA7F8A407B5000 vmovdqu xmmword ptr [rdx+0x507b40], xmm1 XDIS 400ab2: DATAXFER AVX C5FA7F8240BD5000 vmovdqu xmmword ptr [rdx+0x50bd40], xmm0 XDIS 400aba: DATAXFER AVX C5FA7F9250395000 vmovdqu xmmword ptr [rdx+0x503950], xmm2 XDIS 400ac2: DATAXFER AVX C5FA7F8A507B5000 vmovdqu xmmword ptr [rdx+0x507b50], xmm1 XDIS 400aca: DATAXFER AVX C5FA7F8250BD5000 vmovdqu xmmword ptr [rdx+0x50bd50], xmm0 XDIS 400ad2: BINARY BASE 3D00020000 cmp eax, 0x200 XDIS 400ad7: COND_BR BASE 72C0 jb 0x400a99 ...
Debugging Emulated Code
Intel SDE provides support for debugging application with emulated code. A description on using the system debugger is available here.
Intel® AVX and Intel® SSE Transition Checking
It is recommended that a VZEROALL or a VZEROUPPER be inserted between code that uses Intel SSE and code that uses 256b Intel AVX instructions. Intel SDE can check for Intel SSE instructions followed by Intel AVX instructions without an intervening zeroing instruction and check in the reverse order.
- Use the -ast Pin tool knob
- Use the -oast filename to specify a filename other than avx-sse-transition.out. When using the sde driver, -oast implies -ast.
path-to-kit/sde -oast filename.out -- user-application [args]
Example:
Command:
% sde -ast -- mm_256_cmpouunord_ps.opt.vec.exe
Output: in "avx-sse-transition.out"
Dynamic Dynamic AVX to SSE SSE to AVX Static Dynamic BlockPC Transition Transitio Icount Executions Icount ================ ============ ============ ======== ========== ======== # TID 0 400993 1 0 16 1 16 4009f2 6 6 4 6 24 4009da 7 7 4 7 28 # SUMMARY # AVX_to_SSE_transition_instances: 14 # SSE_to_AVX_transition_instances: 13 # Dynamic_insts: 147841 # AVX_to_SSE_instances/instruction: 0.0001 # SSE_to_AVX_instances/instruction: 0.0001 # AVX_to_SSE_instances/100instructions: 0.0095 # SSE_to_AVX_instances/100instructions: 0.0088
In this case, the program counter locations implicated the isnan() calls in the following code:
source1 = _mm256_loadu_ps(s1);
source2 = _mm256_loadu_ps(s2);
dest=_mm256_cmpunord_ps(source1,source2);
_mm256_storeu_ps((float*) d, dest);
for (i = 0; i < 8; i++) {
if (isnan(s1[i]) || isnan(s2[i])) {
e[i] = -1;
}
else {
e[i] = 0;
}
}
Using Intel SDE for Program Record and Replay
Intel SDE now incorporates the pinplay technology for program record and replay.
Information on this technology is available in the pinplay article Program Record/Replay Toolkit
The basic command for creating a pinball (pinplay checkpoint) is:
% path-to-kit/sde -log -log:basename <dir>/<name> -- user-application [args]
The basic command for replaying a pinball (captured from a 64 bits application) is:
% path-to-kit/sde -replay -replay:basename <dir>/<name> -replay:addr_trans -- path-to-kit/intel64/nullapp
All the other Intel SDE knobs like -mix are available also in the replay mode.
Using Intel SDE for Emulating Control-flow Enforcement Technology
Intel control-flow enforcement technology (CET) was described in a technology preview in the ISA extensions page.
Intel SDE now provides a way to emulate the user-space aspects of this technology and the readiness of the software compiled with CET stack checks or CET indirect branch checks. Intel SDE supports running the application on existing hosts (Linux and Windows) and provides ways to reduce false reports due to running with the system legacy runtime libraries (which were not compiled with CET).
Instructions for running application under Intel SDE with control-flow enforcement technology are available in the article, Emulating Applications with Intel SDE and Control Flow Enforcement Technology.
System Configuration
Linux*
On an Ubuntu* system, the yama feature disables processes from using ptrace attach to the parent process. Intel SDE is using this feature to inject itself into the process. To disable yama on the system run the following as root:
# echo 0 > /proc/sys/kernel/yama/ptrace_scope
This change takes effect until the next reboot. To make this change permanent add it to the init scripts of the system.
Mac*
Intel SDE is using taskport API to inject itself to the application process (whether in attach mode or in launch mode). This results with a popup window to confirm that it is allowed to take control of another process. This happens only the first time that Intel SDE is used on a GUI session. However, when running on non-GUI sessions, (for example, the SSH session) the popup never appears and immediately fails. To resolve this issue, a one-time configuration is needed so that the operating does not display this popup and automatically confirms the takeover of the process.
You need to perform the following procedure:
System Integration Protection
System Integrity Protection is a security technology in macOS* El Capitan (10.11) and later, which restricts the root user account and limits the actions that the root user can perform on protected parts of the Mac operating system.
System Integrity Protection includes protection for these parts of the system:
In order to run Intel SDE on applications, which are protected by the system integrity, you must disable it.
For disabling/enabling SIP, follow the instructions at the end of How to Modify System Integrity Protection.
To learn more about system integrity policy and its impact, see About System Integrity Protection on Your Mac.
Code Sample: AES-128 Encryption and Decryption Routines
This sample code provides a set of C routines that demonstrate encryption and decryption routines using AES-128 in ECB mode. These samples are licensed under Intel Simplified Software License (Version August 2021).
Frequently Asked Questions
How do I download Intel SDE?
Intel SDE is available on the Download Page.
How do I ask questions and get support?
The Intel AVX and CPU Instructions Forum has been set up to address questions. Intel engineers will be monitoring and available to answer user questions.
Q: What are the system requirements?
Intel SDE runs on IA-32 or Intel® 64 processors running Windows or Linux or OS X operating systems.
What are the CPUID requirements?
Pin and Intel SDE require a Pentium® 4 processor or later.
What about running IA-32 architecture applications on Intel® 64 processor platforms?
This is supported.
What about precise SSE exception handling in MXCSR?
Intel SDE tries to accurately set the MXCSR exception flags, but unmasked floating point exceptions are not supported.
What happens when my program dereferences inaccessible memory for emulated instructions?
Intel SDE will crash. You can use sde -ptr-check -- your app to get a more verbose error message. Alternatively, you can use sde -trace-execute -- your app to get a dump of the instructions that ran in your task to find out what instruction was last run. Then you can use sde -debugtrace -- your app to look for the last write to the registers involved in the effective address computation for that last run instruction.
Does Intel SDE handle cygwin/cygdrive/c paths or symlinks?
No, because Pin does not handle them, Intel SDE cannot handle them.
Where can I learn more about Pin?
Pin - A Dynamic Binary Instrumentation Tool and the Groups.io group "pinheads". The release notes for Pin are available here. It provides additional information and restrictions about using Intel SDE, which is built on Pin.
Where can I learn more about mix and debugtrace?
The basic sources for debugtrace and mix are available in Pin kits on the Pin website. I've modified them slightly to invoke Intel SDE and to print out the new registers.
What are the licensing terms?
See the download page.
How can I get Intel SDE to work on Ubuntu 10 or later?
There is a known problem with using Intel SDE on Linux systems that prevents the use of ptrace attach via the sysctl /proc/sys/kernel/yama/ptrace_scope. In this case Pin is not able to use its default (parent) injection mode. To resolve this, run the following echo command as root. (SDE does not need to run as root.)
$ echo 0 > /proc/sys/kernel/yama/ptrace_scope
Primary Technology Contact
Ady Tal: Ady is a senior software engineer in Intel Software and Services Group. Ady joined Intel in 1996. Ady works on emulation of new instructions in support of the compiler, architecture and the enabling teams.
Additional Resources
- Dump the security configuration of the machine into a file by:
% security authorizationdb read system.privilege.taskport > /some/file
- Edit this file and replace the following key/value from 'true' to 'false':
<key>authenticate-user</key> <true/>
- Set the configuration back into the machine by:
% sudo security authorizationdb write system.privilege.taskport < /some/file
- Now the machine is ready.
- /System
- /usr
- /bin
- /sbin
- Apps that are pre-installed with OS X
Related Content
Pin - A Dynamic Binary Instrumentation Tool
GTPin - A Dynamic Binary Instrumentation Framework