Differences

This shows you the differences between two versions of the page.

en:multiasm:papc:chapter_6_11 [2026/06/22 17:02] pczekalski
en:multiasm:papc:chapter_6_11 [2026/06/22 17:06] (current) pczekalski
Line 153: Line 153:
  
 ===== SSE ===== ===== SSE =====
-The SSE is a large set of instructions that implement SIMD processing for floating-point calculations and increase the size and number of registers. The abbreviation SSE comes from the name Streaming SIMD Extensions. As the number of instructions introduced across all SSE versions exceeds a few hundred, we present a general overview of each SSE version and detailed information on selected interesting instructions.
+The SSE is a large set of instructions that implement SIMD processing for floating-point calculations and increase the size and number of registers. The abbreviation SSE comes from the name Streaming SIMD Extensions. As the number of instructions introduced across all SSE versions exceeds a few hundred, we present a general overview of each SSE version and detailed information on selected instructions of interest.
 The first group of SSE instructions defines a new vector data type containing four single-precision floating-point numbers. It's easy to calculate that it requires the 128-bit registers. These new registers, named XMM0-XMM7, are distinct from previously implemented registers, so SSE floating-point operations do not conflict with MMX and FPU operations. The first group of SSE instructions defines a new vector data type containing four single-precision floating-point numbers. It's easy to calculate that it requires the 128-bit registers. These new registers, named XMM0-XMM7, are distinct from previously implemented registers, so SSE floating-point operations do not conflict with MMX and FPU operations.
 ==== Data transfer ==== ==== Data transfer ====
Line 302: Line 302:
 </figure> </figure>
  
-There are also advanced shuffle, insert, and extract instructions that make it possible to manipulate the positions of data of various types. The type of the data is specified with the suffix of the mnemonic: b - bytes, w - words, d - doublewords, q - quadwords, ps - single precision and pd - double precision elements. Although these instructions behave the same for the integer and floating-point data elements, formally, those operating with integers begin with the letter "P". A few examples are shown in the following figures.
+There are also advanced shuffle, insert, and extract instructions that enable manipulation of the positions of data of various types. The type of the data is specified with the suffix of the mnemonic: b - bytes, w - words, d - doublewords, q - quadwords, ps - single precision and pd - double precision elements. Although these instructions behave the same for the integer and floating-point data elements, formally, those operating with integers begin with the letter "P". A few examples are shown in the following figures.
  
-The blending instructions copy elements of vectors, mixing two sources into the destination. The **blendps**, **blendpd** and **pblendw** conditionally copy elements from vector X or Y. The mask is specified as the third, immediate value. The behaviour of **blendpd** is shown in fig. {{ref>sse4blendpd}}
+The blending instructions copy elements of vectors, mixing two sources into the destination. The **blendps**, **blendpd** and **pblendw** conditionally copy elements from vector X or Y. The mask is specified as the third, immediate value. The behaviour of **blendpd** is shown in figure {{ref>sse4blendpd}}
 <figure sse4blendpd> <figure sse4blendpd>
 {{ :en:multiasm:cs:sse4blendpd.png?400 |Illustration of an example of packed blending instruction}} {{ :en:multiasm:cs:sse4blendpd.png?400 |Illustration of an example of packed blending instruction}}
Line 310: Line 310:
 </figure> </figure>
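The immediate-mask selection described for **blendpd** can be modelled in plain C. This is only a scalar sketch of the lane selection, not the instruction itself: the function name ''model_blendpd'' and the array representation of an XMM register are assumptions made for illustration. Bit //i// of the immediate decides whether lane //i// of the destination is replaced by the source lane or left unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of BLENDPD dst, src, imm8.
   An XMM register is represented here as two doubles (lane 0 = low quadword).
   Bit i of imm8 set   -> lane i is copied from src.
   Bit i of imm8 clear -> lane i of dst is left unchanged. */
void model_blendpd(double dst[2], const double src[2], uint8_t imm8)
{
    for (int i = 0; i < 2; i++) {
        if ((imm8 >> i) & 1u) {
            dst[i] = src[i];
        }
    }
}

/* Small usage example: mask 10b takes only the high lane from src. */
void model_blendpd_example(void)
{
    double x[2] = { 1.0, 2.0 };
    const double y[2] = { 10.0, 20.0 };

    model_blendpd(x, y, 0x2);

    assert(x[0] == 1.0);   /* low lane kept from the destination */
    assert(x[1] == 20.0);  /* high lane copied from the source   */
}
```

Only the two lowest bits of the immediate matter in this model, because a 128-bit register holds two double-precision elements.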
  
-The instructions **blendvps**, **blendvpd** and **pblendvb** operate similarly, but the condition is specified as the sign bit of the corresponding elements of the third implied argument stored in XMM0. The behaviour of **blendvpd** is shown in fig. {{ref>sse4blendvpd}}
+The instructions **blendvps**, **blendvpd** and **pblendvb** operate similarly, but the condition is specified as the sign bit of the corresponding elements of the third implied argument stored in XMM0. The behaviour of **blendvpd** is shown in figure {{ref>sse4blendvpd}}
 <figure sse4blendvpd> <figure sse4blendvpd>
 {{ :en:multiasm:cs:sse4blendvpd.png?400 |Illustration of an example of packed blending instruction}} {{ :en:multiasm:cs:sse4blendvpd.png?400 |Illustration of an example of packed blending instruction}}
Line 316: Line 316:
 </figure> </figure>
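The variable blend controlled by XMM0 can be sketched in the same way. The following C function is a scalar model under stated assumptions: the name ''model_blendvpd'' is hypothetical, and the implied XMM0 operand is passed explicitly as two raw 64-bit patterns. Only the most significant (sign) bit of each mask element is inspected; the remaining bits are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of BLENDVPD dst, src with the mask held in XMM0.
   xmm0[] holds the raw 64-bit patterns of the two mask elements.
   Sign bit (bit 63) set   -> lane is copied from src.
   Sign bit (bit 63) clear -> lane of dst is left unchanged. */
void model_blendvpd(double dst[2], const double src[2], const uint64_t xmm0[2])
{
    for (int i = 0; i < 2; i++) {
        if ((xmm0[i] >> 63) & 1u) {
            dst[i] = src[i];
        }
    }
}

/* Small usage example: only the low mask element has its sign bit set. */
void model_blendvpd_example(void)
{
    double x[2] = { 1.0, 2.0 };
    const double y[2] = { 10.0, 20.0 };
    const uint64_t mask[2] = { 0x8000000000000000ull, 0x7FFFFFFFFFFFFFFFull };

    model_blendvpd(x, y, mask);

    assert(x[0] == 10.0);  /* sign bit set: taken from the source    */
    assert(x[1] == 2.0);   /* sign bit clear: destination lane kept  */
}
```

Because the test is on the sign bit rather than on the numeric value, a mask element holding negative zero also selects the source lane in this model.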
  
-The set of extract instructions includes **pextrb**, **pextrw**, **pextrd**, **pextrq** and **extractps**. They take one element of the vector from the XMM register and store it in a CPU register or in memory. The offset of the element is specified with an immediate constant. The behaviour of **extractps** is shown in fig. {{ref>sse4extractps}}
+The set of extract instructions includes **pextrb**, **pextrw**, **pextrd**, **pextrq** and **extractps**. They take one element of the vector from the XMM register and store it in a CPU register or in memory. The offset of the element is specified with an immediate constant. The behaviour of **extractps** is shown in figure {{ref>sse4extractps}}
 <figure sse4extractps> <figure sse4extractps>
 {{ :en:multiasm:cs:sse4extractps.png?400 |Illustration of an example of extract instruction}} {{ :en:multiasm:cs:sse4extractps.png?400 |Illustration of an example of extract instruction}}
Line 322: Line 322:
 </figure> </figure>
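As a rough illustration of the extract operation, the C sketch below models **extractps** on a vector of four single-precision elements. The function name ''model_extractps'' is an assumption, and the model presumes the usual IEEE 754 32-bit representation of ''float''. The two lowest bits of the immediate select the element, and the result is the raw 32-bit pattern of that element, as it would appear in a general-purpose register or in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of EXTRACTPS r/m32, xmm, imm8.
   The XMM register is represented as four floats (element 0 = lowest).
   Bits 1:0 of imm8 select the element; its bit pattern is returned
   unchanged, without any floating-point to integer conversion. */
uint32_t model_extractps(const float src[4], uint8_t imm8)
{
    uint32_t bits;
    memcpy(&bits, &src[imm8 & 3u], sizeof bits);
    return bits;
}

/* Small usage example: element 2 holds 3.0f, encoded as 0x40400000. */
void model_extractps_example(void)
{
    const float v[4] = { 1.0f, 2.0f, 3.0f, 4.0f };

    assert(model_extractps(v, 2) == 0x40400000u);
}
```

The copy through ''memcpy'' is deliberate: the value is moved as bits, not converted, which is why 3.0 arrives as 0x40400000 rather than as the integer 3.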
  
-The insert instructions are **pinsrb**, **pinsrd** and **pinsrq**. They operate in an opposite way to extract instructions. They take an element from memory or a general-purpose register and insert it into the XMM register at the position specified with a constant immediate. The behaviour of **pinsrd** is shown in fig. {{ref>sse4pinsrd}}
+The insert instructions are **pinsrb**, **pinsrd** and **pinsrq**. They operate in an opposite way to extract instructions. They take an element from memory or a general-purpose register and insert it into the XMM register at the position specified with a constant immediate. The behaviour of **pinsrd** is shown in figure {{ref>sse4pinsrd}}
 <figure sse4pinsrd> <figure sse4pinsrd>
 {{ :en:multiasm:cs:sse4pinsrd.png?400 |Illustration of an example of an insert instruction}} {{ :en:multiasm:cs:sse4pinsrd.png?400 |Illustration of an example of an insert instruction}}
Line 366: Line 366:
  
 ===== AVX ===== ===== AVX =====
-AVX is the abbreviation of Advanced Vector Extensions. The AVX implements larger 256-bit YMM registers as extensions of XMM. In 64-bit processors, the number of YMM registers is increased to 16. Many SSE instructions are expanded to handle operations on new, larger data types without modifying the mnemonics. The most important improvement in the instruction set of x64 processors is the implementation of RISC-like instructions in which the destination operand can differ from two source operands. A three-operand SIMD instruction format is called the VEX coding scheme. The AVX2 extension implements additional SIMD instructions for operations on 256-bit registers. The AVX-512 extends the register size to 512 bits. An interesting, comprehensive description of a variety of x64 AVX instructions is available on website ((https://www.officedaytime.com/simd512e/)).
+AVX is the abbreviation of Advanced Vector Extensions. The AVX implements larger 256-bit YMM registers as extensions of XMM. In 64-bit processors, the number of YMM registers is increased to 16. Many SSE instructions are extended to support operations on new, larger data types without changing the mnemonics. The most important improvement in the instruction set of x64 processors is the implementation of RISC-like instructions in which the destination operand can differ from two source operands. A three-operand SIMD instruction format is called the VEX coding scheme. The AVX2 extension implements additional SIMD instructions for operations on 256-bit registers. The AVX-512 extends the register size to 512 bits. An interesting, comprehensive description of a variety of x64 AVX instructions is available on the website ((https://www.officedaytime.com/simd512e/)).
en/multiasm/papc/chapter_6_11.txt · Last modified: by pczekalski
CC Attribution-Share Alike 4.0 International