en:multiasm:paarm:chapter_5_10 [2026/06/21 21:32] pczekalski\\
en:multiasm:paarm:chapter_5_10 [2026/06/21 21:34] (current) pczekalski
Line 84:
''<fc #800000>FADD</fc> <fc #008000>V0</fc>.<fc #808000>4S</fc>, <fc #008000>V1</fc>.<fc #808000>4S</fc>, <fc #008000>V2</fc>.<fc #808000>4S</fc> <fc #6495ed>@ Add 4 single-precision floats from V1 and V2, store in V0.</fc>''\\
''<fc #800000>ADD</fc> <fc #008000>V0</fc>.<fc #808000>8H</fc>, <fc #008000>V1</fc>.<fc #808000>8H</fc>, <fc #008000>V2</fc>.<fc #808000>8H</fc> <fc #6495ed>@ Add 8 halfwords (16-bit integers) elementwise.</fc>''\\
''<fc #800000>MUL</fc> <fc #008000>V0</fc>.<fc #808000>16B</fc>, <fc #008000>V1</fc>.<fc #808000>16B</fc>, <fc #008000>V2</fc>.<fc #808000>16B</fc> <fc #6495ed>@ Multiply 16 bytes (8-bit integers) elementwise.</fc>''
  
Elementwise operations are performed by lane: each element occupies its own lane and is processed independently of its neighbours. Take ''<fc #800000>ADD</fc> <fc #008000>V0</fc>.<fc #808000>8H</fc>, <fc #008000>V1</fc>.<fc #808000>8H</fc>, <fc #008000>V2</fc>.<fc #808000>8H</fc>'' as an example. The ''<fc #808000>8H</fc>'' arrangement specifier divides each 128-bit register into eight distinct 16-bit elements. ''<fc #800000>ADD</fc>'' is an integer instruction, so each of the eight results is the sum of the corresponding 16-bit integers from ''<fc #008000>V1</fc>'' and ''<fc #008000>V2</fc>''. A sum that does not fit in 16 bits wraps around within its own lane. The picture below shows which element of vector ''<fc #008000>V1</fc>'' is added to which element of vector ''<fc #008000>V2</fc>'' and where each result is stored in vector ''<fc #008000>V0</fc>''.
Line 137:
The instruction ''<fc #800000>LD1R</fc> {<fc #008000>V0</fc>.<fc #808000>4S</fc>}, [<fc #008000>X1</fc>]'' loads one 32-bit value (integer or single-precision floating-point) from the address held in ''<fc #008000>X1</fc>'' and replicates it into all four 32-bit lanes of ''<fc #008000>V0</fc>''. As a result, all vector elements contain the same value.
  
The instruction ''<fc #800000>LD1</fc> {<fc #008000>V0</fc>.<fc #808000>4S</fc>, <fc #008000>V1</fc>.<fc #808000>4S</fc>}, [<fc #008000>X1</fc>]'' loads eight consecutive 32-bit values. The first four fill the lanes of ''<fc #008000>V0</fc>'' and the next four fill the lanes of ''<fc #008000>V1</fc>''. The data are loaded sequentially as x0, x1, x2, x3 and so on, where x stands for any chosen data type; ''<fc #800000>LD1</fc>'' preserves the order in which the elements are stored in memory. The ''<fc #800000>LD2</fc>'', ''<fc #800000>LD3</fc>'' and ''<fc #800000>LD4</fc>'' instructions work similarly, but they treat memory as interleaved structures of two, three or four elements and de-interleave them into separate vector registers. The ''<fc #800000>LD2</fc>'' instruction loads two-element structures into two vector registers, so exactly two registers must always be listed.
If the structures in memory are ordered x0, y0, x1, y1, x2, y2, …, ''<fc #800000>LD2</fc>'' loads the x elements into the first listed register and the y elements into the second. This instruction is used, for example, to process stereo audio data, where x may be the left audio channel and y the right:\\
''<fc #800000>LD2    </fc> {<fc #008000>V0</fc>.<fc #808000>4S</fc>, <fc #008000>V1</fc>.<fc #808000>4S</fc>}, [<fc #008000>X1</fc>]''
  
Line 145:
 </figure> </figure>
  
The ''<fc #800000>LD3</fc>'' instruction loads memory data as three-element structures, such as the RGB components of an image. For this instruction, exactly three vector registers must be listed. Assuming the data in memory are r0, g0, b0, r1, g1, b1, r2, g2, b2, …, the r (red) components are loaded into the first listed register, the g (green) components into the second and the b (blue) components into the third, last listed register.\\
''<fc #800000>LD3    </fc> {<fc #008000>V0</fc>.<fc #808000>4S</fc>, <fc #008000>V1</fc>.<fc #808000>4S</fc>, <fc #008000>V2</fc>.<fc #808000>4S</fc>}, [<fc #008000>X1</fc>]''
  
en/multiasm/paarm/chapter_5_10.txt · Last modified: by pczekalski
CC Attribution-Share Alike 4.0 International