
Project Information

This content was implemented under the following project:

Consortium Partners


Erasmus+ Disclaimer
This project has been co-funded by the European Union.
Views and opinions expressed are, however, those of the author or authors only and do not necessarily reflect those of the European Union or the Foundation for the Development of the Education System. Neither the European Union nor the entity providing the grant can be held responsible for them.

Copyright Notice
This content was created by the MultiASM Consortium 2023–2026.
The content is copyrighted and distributed under CC BY-NC Creative Commons Licence and is free for non-commercial use.

CC BY-NC

In case of commercial use, please get in touch with a MultiASM Consortium representative.

Introduction

This manual is intended to help students bootstrap into assembler programming across a variety of applications. It presents practical exercises in a hands-on lab format, often also covering toolchain configuration. Some sections describe specific hardware, such as the remote IoT and remote ARM laboratories. Others assume the student owns or has access to a PC and can install software.

ARM and Mobiles

ARM processors are omnipresent, ranging from simple IoT devices and mobile phones to laptops, notebooks, and workstations.
Because of this variety, we had to select one platform for a practical introduction and experimentation.
To present both hardware interfacing and programming, the obvious choice is the Raspberry Pi. The following chapters present laboratory details and scenarios.

Follow the links below to the lab descriptions and scenarios:

SUT's ARM laboratory

RTU's ARM laboratory

Programming in Assembler for Embedded Systems

Assembler programming for embedded systems is presented in two ways: on-site programming of devices connected directly to the development computer (usually via USB), and an integrated solution for IoT laboratories, the VREL NextGen software for remote experimentation.

Local development

Local development requires installing the development toolchain. A common scenario is to use Visual Studio Code, a compiler and, usually, a plugin dedicated to a selected platform, e.g. AVR Assembler Toolbox.

Remote development

Remote development uses a ready-made development platform accessible only via a web browser. The device can be observed only via a live video stream, which introduces limitations such as latency and the lack of physical access to the device (e.g., pushing a reset button is impossible).
Users connect to the system with a web browser and, all remotely, develop the software in the browser, compile it and upload it to the microcontroller.

The following chapters present additional information on using the VREL NextGen remote labs system for assembler programming.

Introduction to the Arduino Uno programming in Assembler

The following chapter assumes that you are familiar with basic assembler operations for AVR microcontrollers. Below, we explain the most important construction elements and assembler instructions for controlling the GPIOs of the Arduino Uno (figure 2), which is based on the ATmega328P microcontroller.

Figure 2: Arduino Uno development board

Template for the assembler code

Using plain assembler (rather than C++ mixed with assembler) requires a specific application layout: the program must be located (loaded) in Flash memory starting exactly at address 0x0000, where execution of the application begins after reset.

    .org 0x0000
    rjmp start
 
start:
...

It is common practice to start with rjmp (relative jump), which makes it easier to place data before the start of the code. It is also good “embedded” practice to keep it even if it does not really jump anywhere, as in this example, where start follows immediately. Forgetting it may cause trouble later, when you decide to declare some data there: the CPU would then try to execute the data as instructions.

Memory Map

The code (Flash) and the data (SRAM) are assigned separate address spaces, which also affects how assembler source code is constructed. Figure 3 presents the ATmega328P memory map. When using fully manual memory control, e.g., when the source code does not use .section, it is necessary to tell the assembler explicitly where to place the code, variables, and other memory-related components. Details are discussed in the subsection that follows.

Figure 3: Arduino Uno (ATmega328P) memory map

Source code needs to use explicit declarations to tell the AVR-GCC toolkit how to handle the contents: whether it is code or data, whether a variable is read-only, whether it should persist in EEPROM, and so on. There are two possible approaches to it: one is to use section declarations (.section), the other is to handle addresses manually.
There are five commonly used .section declarations, as presented in table 1. Each of them does the job of .org <address> in a more elegant way: e.g. .section .data is equivalent to .org 0x800100 (the start of internal SRAM), so one does not need to remember the addresses.

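Table 1: Sections reference in AVR-GCC and ATmega328P
Section | Content | Location | Volatile (lost at power-off)?
.text | Instructions / code | Flash | No
.data | Initialized globals | SRAM (initial values copied from Flash) | Yes
.bss | Zeroed / uninitialized globals | SRAM | Yes
.rodata | Constants / strings | SRAM by default in AVR-GCC (copied from Flash like .data); use .progmem.data to keep constants in Flash only | Yes
.eeprom | Long-term storage | EEPROM | No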

We mentioned before that .section .data is equivalent to .org 0x800100. Why .org 0x800100 instead of just .org 0x0100?
This is because Flash, EEPROM, and SRAM are separate address spaces that all start at 0x0000 (see figure 3), and you need to tell the linker (via source code) which memory block you are referring to. Writing .org 0x0100 may be misleading: the toolchain will assume the address is in Flash instead of SRAM.
For this reason, the AVR-GCC toolchain (assembler and linker) handles the Harvard architecture of the Arduino Uno (ATmega328P) with virtual memory offsets: 0x000000 denotes Flash, 0x800000 denotes the data space (with the built-in SRAM starting at 0x800100), and 0x810000 denotes EEPROM. Details are presented in table 2.

Table 2: ATmega328P AVR-GCC virtual memory offsets and their real mapping in hardware
Memory type | GCC internal offset | Hardware address
Flash | 0x000000 | 0x0000
Data space start (registers, I/O) | 0x800000 | 0x0000
SRAM (internal) | 0x800100 | 0x0100
EEPROM | 0x810000 | 0x0000
In some of the following chapters, we present “naked” code that uses only .org 0x0000, for simplicity. We do this only when the code contains no variables and everything is stored in Flash.
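To make the offsets in table 2 concrete, here is a minimal host-side C sketch (it runs on a PC, not on the AVR). It classifies a toolchain address and recovers the hardware address. The names mem_kind and gcc_to_hw are hypothetical, and the sketch assumes only the three offsets listed in table 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address spaces distinguished by the AVR-GCC toolchain (table 2). */
typedef enum { MEM_FLASH, MEM_DATA, MEM_EEPROM } mem_kind;

#define DATA_OFFSET   0x800000UL  /* data space: registers, I/O, SRAM */
#define EEPROM_OFFSET 0x810000UL  /* EEPROM */

/* Classify a toolchain (virtual) address and store the real hardware address. */
static mem_kind gcc_to_hw(uint32_t gcc_addr, uint16_t *hw_addr)
{
    assert(hw_addr != NULL);
    if (gcc_addr >= EEPROM_OFFSET) {
        *hw_addr = (uint16_t)(gcc_addr - EEPROM_OFFSET);
        return MEM_EEPROM;
    }
    if (gcc_addr >= DATA_OFFSET) {
        *hw_addr = (uint16_t)(gcc_addr - DATA_OFFSET);
        return MEM_DATA;
    }
    *hw_addr = (uint16_t)gcc_addr;  /* no offset: Flash */
    return MEM_FLASH;
}
```

For example, gcc_to_hw(0x800100, &hw) reports the data space with hw = 0x0100, which is the start of internal SRAM.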

To summarise briefly, in the most common scenario the code is placed in Flash memory, while variables are in SRAM. Appropriate .org directives (or .section declarations) ensure the correct placement of the content that follows. It is possible to write code without using sections, but that makes it unnecessarily complicated. Whenever you declare variables, it is advised to use sections to make the code cleaner and easier to understand. If your code is as simple as setting a GPIO output and uses no variables (everything is in Flash), you may omit the .section declarations.

The sample code below declares a 16-bit variable named 'analogue_value' stored in SRAM. Note the use of .section:

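.section .data              ; the linker places .data at the start of SRAM (0x800100)
                            ; no .org needed: inside a section, .org counts from the section start
analogue_value:
    .skip 2                 ; reserve 2 bytes for a 16-bit variable
 
.section .text
.org 0x0000
    rjmp main
 
main:
    ; sample value to store: 0x03FF (1023)
    ldi r24, 0xFF           ; low byte
    ldi r25, 0x03           ; high byte
 
    ; store it to SRAM, low byte at the lower address
    sts analogue_value, r24
    sts analogue_value + 1, r25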
 
loop:
    rjmp loop            ; Dummy loop

GPIO and Ports

The Arduino Uno exposes 20 GPIOs (D0-D13 and A0-A5) that can serve as binary inputs and outputs or analogue inputs, and many of them provide advanced, hardware-accelerated functions, such as UART, SPI, I2C, PWM, and ADC. Not every pin is fully “general-purpose”, however: the ATmega328P has no pin multiplexer, so functions such as UART, I2C, SPI, PWM and ADC are bound to particular GPIOs and cannot be moved to other pins.

On the hardware level, GPIO pins are grouped into 3 “ports” (figure 4), and this is how you access them:
- Port B: GPIO D8-D13 (bits 0-5),
- Port C: analogue pins A0-A5 (bits 0-5),
- Port D: GPIO D0-D7 (bits 0-7).

Each bit of a port corresponds to a single GPIO pin, e.g. bit 5 (the 6th bit, counting from zero) of Port B corresponds to GPIO D13, which is also connected to the built-in LED.
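The pin-to-port grouping can be expressed as a small lookup. This is a host-side C sketch that assumes the Arduino numbering in which A0-A5 can also be used as pins 14-19. The type port_bit and the function uno_pin are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Port letter and bit number that drive a given Arduino Uno pin. */
typedef struct { char port; int bit; } port_bit;

/* pin: 0-13 for D0-D13, 14-19 for A0-A5 */
static port_bit uno_pin(int pin)
{
    assert(pin >= 0 && pin <= 19);
    if (pin <= 7)
        return (port_bit){ 'D', pin };       /* D0-D7  -> PD0-PD7 */
    if (pin <= 13)
        return (port_bit){ 'B', pin - 8 };   /* D8-D13 -> PB0-PB5 */
    return (port_bit){ 'C', pin - 14 };      /* A0-A5  -> PC0-PC5 */
}
```

For instance, uno_pin(13) yields Port B, bit 5, which is the built-in LED.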

Figure 4: Arduino ports
Some GPIOs have extra features (as presented in figure 4), such as hardware-accelerated PWM, I2C, serial (UART) or SPI. PWM is useful for simulating an analogue output, e.g., to control LED brightness, as we show in the following sections.

IO Registers
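Each port is assigned three 8-bit registers (9 in total), where x stands for B, C or D:
- DDRx (Data Direction Register): a bit set to 1 makes the pin an output, 0 makes it an input,
- PORTx (Data Register): sets the level of an output pin, or enables the internal pull-up resistor of an input pin,
- PINx (Input Pins Register): reads the current logic level of the pins.

Core I/O registers and their IDs
To operate on I/O registers, the developer must either include a file with their definitions or (when programming in pure assembler) declare them on their own.
Table 3 lists the I/O registers used to control GPIO (Ports B, C and D) and their I/O addresses: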

Table 3: I/O registers and their addresses (IDs)
Name | Address (I/O) | Description
PINB | 0x03 | Input pins register (Port B)
DDRB | 0x04 | Data direction register (Port B)
PORTB | 0x05 | Output register / pull-up enable (Port B)
PINC | 0x06 | Input pins register (Port C)
DDRC | 0x07 | Data direction register (Port C)
PORTC | 0x08 | Output register / pull-up enable (Port C)
PIND | 0x09 | Input pins register (Port D)
DDRD | 0x0A | Data direction register (Port D)
PORTD | 0x0B | Output register / pull-up enable (Port D)

The easiest way is to declare constants (substituted with their values at assembly time) before the code starts. They do not exist in memory, so they do not disturb code placement or execution:

; I/O registers
.equ PINB,  0x03
.equ DDRB,  0x04
.equ PORTB, 0x05
.equ PINC,  0x06
.equ DDRC,  0x07
.equ PORTC, 0x08
.equ PIND,  0x09
.equ DDRD,  0x0A
.equ PORTD, 0x0B
 
; your code starts here
    .org 0x0000
    rjmp start
 
start:
...
.equ is converted into a value and substituted in the code during assembly; thus, it does not exist in the final, compiled binary code.
Depending on the assembler you use, there are two syntax variants: the GNU assembler used by AVR-GCC expects .equ PINB, 0x03, while the Atmel/Microchip AVR Assembler expects .equ PINB = 0x03.

Below are sections representing common usage scenarios for GPIO control.

GPIO Control Assembler Instructions

There is a set of assembler instructions that operate on ports (I/O registers), as shown in table 4. These instructions control each GPIO pin: they set GPIO outputs HIGH (1) or reset them LOW (0), either individually or as a group (a whole port) at once. They also check input values when GPIOs are configured as inputs. Some applications, however, use hardware acceleration instead of manual switching: for example, a PWM signal can be generated by dedicated timer hardware, as described further, which is far more precise than toggling a bit in a loop and does not load the CPU.

Assembler-level operations on ports are much faster (roughly 50 times) than the Arduino C++ functions digitalRead(), digitalWrite() and similar.
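Table 4: Common GPIO-related I/O instructions
Instruction | Description
SBI | Set bit in I/O register (I/O addresses 0-31 only)
CBI | Clear bit in I/O register (I/O addresses 0-31 only)
SBIS | Skip next instruction if bit in I/O register is set (1)
SBIC | Skip next instruction if bit in I/O register is clear (0)
IN | Read an I/O register into a general-purpose register (R0-R31)
OUT | Write a general-purpose register to an I/O register
ANDI | AND a register (R16-R31) with a constant: clears (masks out) bits
ORI | OR a register (R16-R31) with a constant: sets bits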

A common scenario for manual control of a GPIO pin is to first configure the GPIO as an input or an output (using the corresponding DDRx register), and then set (SBI) or reset (CBI) the output, check the input (SBIS, SBIC), read the whole register (IN) or write the whole register (OUT).
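The effect of these instructions on a register value can be mimicked with plain C bit arithmetic. The sketch below is a host-side illustration only, and set_bit, clear_bit and bit_is_set are hypothetical helpers. On the AVR, SBI and CBI modify one bit of an I/O register in a single instruction. ANDI and ORI, by contrast, work on a general-purpose register (R16-R31), so the value must first be read with IN, then modified, and finally written back with OUT.

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent of SBI (or IN + ORI + OUT): force one bit to 1. */
static uint8_t set_bit(uint8_t reg, int bit)
{
    assert(bit >= 0 && bit < 8);
    return (uint8_t)(reg | (1u << bit));
}

/* Equivalent of CBI (or IN + ANDI + OUT): force one bit to 0. */
static uint8_t clear_bit(uint8_t reg, int bit)
{
    assert(bit >= 0 && bit < 8);
    return (uint8_t)(reg & ~(1u << bit));
}

/* The test performed by SBIS / SBIC: 1 if the bit is set, 0 if it is clear. */
static int bit_is_set(uint8_t reg, int bit)
{
    assert(bit >= 0 && bit < 8);
    return (reg >> bit) & 1u;
}
```

For example, set_bit(0x00, 5) gives 0x20, which is the DDRB value that makes PB5 (GPIO 13) an output.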

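IN and OUT instructions operate on whole 8-bit registers rather than on single bits. They are general-purpose instructions that cover the entire range of I/O registers (0-63), not only the DDRx, PORTx, and PINx registers mentioned above. SBI, CBI, SBIS and SBIC, in contrast, reach only I/O registers 0-31, which include all the registers in table 3. Registers beyond the I/O range, such as the UART and Timer1 registers used later, must be accessed with LDS and STS using their data-space addresses. Operating on 8 bits at once is faster than setting or reading them individually.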
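This chapter uses two address notations. Table 3 gives I/O addresses, as required by IN, OUT, SBI and CBI. Table 12 and the PWM example use data-space addresses, as required by LDS and STS. On the ATmega328P, the data-space address of an I/O register is its I/O address plus 0x20. The host-side C sketch below encodes that rule and the 0-31 reach of the single-bit instructions. The names io_to_data and bit_instr_reachable are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* I/O registers are also mapped into the data space, 0x20 bytes higher. */
static uint16_t io_to_data(uint8_t io_addr)
{
    assert(io_addr < 64);            /* IN/OUT reach I/O addresses 0-63 only */
    return (uint16_t)(io_addr + 0x20);
}

/* SBI, CBI, SBIS and SBIC can only address the lower 32 I/O registers. */
static bool bit_instr_reachable(uint8_t io_addr)
{
    return io_addr < 32;
}
```

For example, DDRB has I/O address 0x04 and data-space address 0x24. TCCR0A has data-space address 0x44 (I/O address 0x24), so it can be written with OUT but not with SBI.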

Code Examples

Below are common scenarios implemented in assembler that will help you to understand the code and start programming.

Use GPIO As Digital Output

In this scenario, we use a GPIO as a digital output. The simplest way is to use the built-in LED to get instantly observable results.
The built-in LED is connected to GPIO13 (D13) and is controlled via Port B, bit 5 (zero-based indexing; see figure 4). The LED is on when GPIO13 is HIGH (1) and off when it is LOW (0). It is also convenient to declare a bit number representing the built-in LED position in Port B, so that we can use an identifier instead of a number, such as .equ PB5, 5.

This code flashes the built-in LED.

.equ DDRB,  0x04
.equ PORTB, 0x05
.equ PB5, 5                 ; PB5 is GPIO 13, and it is a built-in LED
    .org 0x0000
    rjmp RESET

Step 1 - configure GPIO13 (PortB, bit 5) as output, using DDRB register:

RESET:
    ldi r16, 1 << PB5        ; Set bit 5
    out DDRB, r16            ; Set PB5 as output

Then, in an endless loop, turn the LED on and off by setting and clearing bit 5 of PORTB directly with sbi and cbi:

LOOP:
    sbi PORTB, PB5           ; Turn LED on (GPIO 13 HIGH)
    rcall delay
    cbi PORTB, PB5           ; Turn LED off (GPIO 13 LOW)
    rcall delay
    rjmp LOOP

This implementation of the delay is based on calculating the CPU cycles used to execute the following algorithm:

delay:
    ldi r20, 43     ; Outer loop
outer_loop:
    ldi r18, 250    ; Mid loop
mid_loop:
    ldi r19, 250    ; Inner loop
inner_loop:
    dec r19
    brne inner_loop
    dec r18
    brne mid_loop
    dec r20
    brne outer_loop
    ret

The instructions used in these loops are listed in table 5, along with the number of cycles each one takes:

Table 5: Selected AVR instruction timings
Instruction | Cycles
ldi | 1
dec | 1
brne | 2 (branch taken), 1 (not taken)
ret | 4

The inner loop runs exactly 250 times. dec takes 1 cycle in every pass. brne takes 2 cycles in the 249 passes where it jumps back, and 1 cycle in the last one. Adding the ldi r19 that initialises the counter, the exact number of cycles is:

$$C_{inner} = 1 + 250 \cdot 1 + 249 \cdot 2 + 1 \cdot 1 = 750$$

The total for the inner loop is then 750 clock cycles of the ATmega328P MCU.

The mid loop also runs 250 times. Each pass costs the complete inner loop plus dec r18 (1 cycle). brne again takes 2 cycles in 249 passes and 1 cycle in the last one, and ldi r18 adds 1 cycle:

$$C_{mid} = 1 + 250 \cdot (C_{inner} + 1) + 249 \cdot 2 + 1 \cdot 1 = 1 + 250 \cdot 751 + 498 + 1 = 188250$$

Thus, at the level of the mid loop, the algorithm consumes 188250 cycles.

The outer loop runs 43 times and executes the complete mid loop in each pass:

$$C_{outer} = 1 + 43 \cdot (C_{mid} + 1) + 42 \cdot 2 + 1 \cdot 1 = 1 + 43 \cdot 188251 + 84 + 1 = 8094879$$

The final cost of the loops is 8094879 cycles. The final ret uses an extra 4 cycles, so the total cost of the delay routine is:

$$C = C_{outer} + 4 = 8094883$$

The 3 cycles of the rcall that invokes the routine are not counted.

The ATmega328P on the Arduino Uno runs at 16 MHz; thus, each cycle takes 1/16000000 of a second. Overall, the execution time is:

$$t = \frac{8094883}{16000000}\,\mathrm{s} \approx 0.506\,\mathrm{s}$$

This is about 0.5 s (506 ms, to be precise). It is not perfect, but good enough for this approach.

This kind of delay function works, but it has drawbacks. It is blocking, and it is energy-inefficient. Most of all, it is laborious: you need to analyse your algorithm instruction by instruction and calculate the total number of ticks. There are better solutions, such as using timers.
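The cycle count derived above can be cross-checked by simulating the delay routine on a PC. This C sketch replays the three nested dec/brne loops and charges the costs from table 5. It assumes the counters behave like 8-bit registers and, like the calculation in the text, it does not count the rcall. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cycle costs from table 5 (ATmega328P). */
enum { LDI = 1, DEC = 1, BRNE_TAKEN = 2, BRNE_NOT_TAKEN = 1, RET = 4 };

/* Cost of one "dec rX / brne label" pair; the branch is taken while rX != 0. */
static uint32_t dec_brne(uint8_t *counter)
{
    *counter = (uint8_t)(*counter - 1);   /* 8-bit wrap-around, like the AVR */
    return DEC + (*counter != 0 ? BRNE_TAKEN : BRNE_NOT_TAKEN);
}

/* Replay the delay routine instruction by instruction (rcall not included). */
static uint32_t delay_cycles(uint8_t outer_init, uint8_t mid_init, uint8_t inner_init)
{
    uint32_t cycles = 0;
    uint8_t r20, r18, r19;

    cycles += LDI; r20 = outer_init;              /* ldi r20, 43  */
    do {
        cycles += LDI; r18 = mid_init;            /* ldi r18, 250 */
        do {
            cycles += LDI; r19 = inner_init;      /* ldi r19, 250 */
            do {
                cycles += dec_brne(&r19);         /* inner_loop   */
            } while (r19 != 0);
            cycles += dec_brne(&r18);             /* end of mid_loop */
        } while (r18 != 0);
        cycles += dec_brne(&r20);                 /* end of outer_loop */
    } while (r20 != 0);
    return cycles + RET;                          /* final ret */
}
```

delay_cycles(43, 250, 250) returns 8094883, and dividing by 16000000 gives about 0.506 s. Changing the first argument is a quick way to estimate other outer-loop counts before editing the assembler code.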

Use GPIO as Digital Input

GPIOs may be used as inputs, e.g., to check whether a button was pressed. A common scenario is that a button shorts to GND, which requires a pull-up resistor (either external or internal). For the internal pull-up, it is necessary to explicitly enable it in assembler code.

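Configuring a GPIO as an input with pull-up takes two steps:
- clear the pin's bit in the DDRx register (input mode),
- set the pin's bit in the PORTx register (internal pull-up enabled).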

Reading the value of a GPIO is as simple as reading the corresponding bit of the PINx register: when the GPIO is HIGH, the bit is 1; when the GPIO is LOW, the bit is 0. When the GPIO value controls the program flow, it is more convenient (as well as faster and more memory-efficient) to use instructions that skip based on the PINx bit value, such as SBIC and SBIS.

The example below shows an Arduino Uno with a button connected to GPIO 8, controlling the built-in LED connected to GPIO 13. On button press, the LED turns on; on release, it turns off. The circuit schematic is presented in figure 5.

Figure 5: Circuit for GPIO input handling

Declare the registers and bit positions: both GPIOs are on Port B, bit 0 (GPIO 8) and bit 5 (GPIO 13), as presented in figure 4.

.equ DDRB,   0x04   ; Data Direction Register Port B
.equ PORTB,  0x05   ; Data Register Port B
.equ PINB,   0x03   ; Input Pins Address Port B
 
; Bit Positions
.equ PB0,    0      ; Pin 8 is Port B, Bit 0
.equ PB5,    5      ; Pin 13 is Port B, Bit 5
 
.section .text
.org 0x0000         
    rjmp main

Configure GPIO 13 as output, GPIO 8 as input, and enable the internal pull-up resistor for GPIO 8.

main:
    sbi   DDRB, PB5     ; Set PB5 (GPIO 13) as Output
    cbi   DDRB, PB0     ; Set PB0 (GPIO 8) as Input
    sbi   PORTB, PB0    ; Enable Internal Pull-up on PB0 (GPIO 8)

This section is a simple push-switch implementation. Instead of reading the whole PINB register with in and testing bit 0, we use the sbic instruction. It tests a single bit of an I/O register and skips the next instruction if the bit is clear. So if PB0=0, the rjmp led_off is skipped and execution continues at the led_on label. Note that PB0=0 means that the button is pressed, not released; otherwise the pull-up keeps the input HIGH.

loop:
 
    sbic  PINB, PB0     ; Skip next instruction if PB0 (GPIO 8) is LOW (Button Pressed)
    rjmp  led_off       ; If High (Not Pressed), go to led_off
 
led_on:
    sbi   PORTB, PB5    ; Set Pin 13 High
    rjmp  loop          ; Jump back to start of loop
 
led_off:
    cbi   PORTB, PB5    ; Set Pin 13 Low
    rjmp  loop          ; Jump back to start of loop
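Each pass of the loop can be written as a pure C function of the PINB and PORTB values, which may help when reasoning about the active-low button. This host-side sketch assumes the wiring from figure 5: the button connects GPIO 8 to GND, and the internal pull-up is enabled. The name button_loop_step is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PB0 0   /* button input, GPIO 8 */
#define PB5 5   /* built-in LED, GPIO 13 */

/* One pass of the assembler loop: returns the new PORTB value. */
static uint8_t button_loop_step(uint8_t pinb, uint8_t portb)
{
    if (pinb & (1u << PB0))                         /* PB0 HIGH: not pressed, sbic does not skip */
        return (uint8_t)(portb & ~(1u << PB5));    /* led_off: cbi PORTB, PB5 */
    return (uint8_t)(portb | (1u << PB5));         /* led_on:  sbi PORTB, PB5 */
}
```

The sketch changes only bit 5 of PORTB, just as sbi and cbi do. Writing a whole byte to PORTB with out could accidentally clear bit 0 and switch off the pull-up on the button input.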

Use Serial Port for Tracing

The Arduino Uno has no direct debugging capabilities, such as step-by-step execution, so tracing is used to monitor program execution. The board has no rich user interface, such as a display, either. One tracing method is to send information via the serial port; it can then be visualised on the developer's computer using any serial port monitor tool.

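UART uses two pins:
- RX (receive): GPIO 0 (PD0),
- TX (transmit): GPIO 1 (PD1).
On the Arduino Uno, these pins are also connected to the on-board USB-to-serial converter, so the data reaches the PC via the USB cable.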

While it is possible to implement the full serial protocol using GPIOs alone (so-called soft-serial), here we use the hardware UART, which is controlled by several registers, as shown in table 6.

Table 6: Serial port (UART) related registers
Register | Address (data space) | Official name | Common name | Bits | Description
UDR0 | 0xC6 | USART I/O Data Register | Data register / TX-RX buffer | 7:0 | Write to transmit data, read to receive data
UCSR0A | 0xC0 | USART Control and Status Register A | Status register | RXC0, TXC0, UDRE0, FE0, DOR0, UPE0, U2X0, MPCM0 | Status flags (ready, complete, errors, speed mode)
UCSR0B | 0xC1 | USART Control and Status Register B | Control register | RXCIE0, TXCIE0, UDRIE0, RXEN0, TXEN0, UCSZ02, RXB80, TXB80 | Enable TX/RX, interrupts, 9-bit mode
UCSR0C | 0xC2 | USART Control and Status Register C | Configuration / frame register | UMSEL01:0, UPM01:0, USBS0, UCSZ01:0, UCPOL0 | Frame format (mode, parity, stop bits, data size)
UBRR0L | 0xC4 | USART Baud Rate Register Low | Baud rate register (low) | 7:0 | Lower byte of the baud rate divider
UBRR0H | 0xC5 | USART Baud Rate Register High | Baud rate register (high) | 3:0 | Upper 4 bits of the 12-bit baud rate divider

In the example below, we use TX only, to send data from the MCU to the developer's PC. Let's start with declarations for the registers and flag bits used during serial transmission:

.equ UBRR0H, 0xC5
.equ UBRR0L, 0xC4
.equ UCSR0A, 0xC0
.equ UCSR0B, 0xC1
.equ UCSR0C, 0xC2
.equ UDR0,   0xC6
 
.equ TXEN0, 3      ; bit 3 of UCSR0B enables the transmitter
.equ UDRE0, 5      ; bit 5 of UCSR0A is set when the transmit buffer is empty

Then let's define the message “Hello World”. The trailing bytes 13 and 10 (CR LF) are the Windows-standard end-of-line sequence, and the string is 0-terminated. The message is placed in Flash right after the initial rjmp, which jumps over it.

.org 0x0000
    rjmp reset
message:
    .byte 'H','e','l','l','o',' ','W','o','r','l','d',13,10,0
    .balign 2                ; keep the following instructions word-aligned

The following section, starting at the reset label, initialises the serial port for 9600 bps:

reset:
    ldi r16, hi8(103)
    sts UBRR0H, r16          ; write the high byte first
    ldi r16, lo8(103)
    sts UBRR0L, r16

The value 103 is loaded into the 12-bit UBRR0 register: the high byte into UBRR0H and the low byte into UBRR0L. This prescaler (baud rate divider) is calculated according to the formula presented in figure 6.

Figure 6: UART prescaler equation

$$UBRR = \frac{F_{CPU}}{16 \cdot BAUD} - 1 = \frac{16000000}{16 \cdot 9600} - 1 \approx 103.17 \rightarrow 103$$

Here F_CPU is 16 MHz for a regular Arduino Uno (ATmega328P). Note that, with UBRR = 103, the actual baud rate is 16000000 / (16 · 104) ≈ 9615 bps, not exactly 9600 bps. A tolerance of up to 2% is acceptable (here, the error is 0.16%).
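The divider and the resulting error can be checked with a few lines of host-side C. This sketch assumes normal-speed asynchronous mode (U2X0 = 0), as in the example above, and rounds the divider to the nearest integer. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* UBRR = F_CPU / (16 * baud) - 1, with the division rounded to nearest. */
static uint16_t ubrr_value(uint32_t f_cpu, uint32_t baud)
{
    assert(baud > 0);
    return (uint16_t)((f_cpu + 8UL * baud) / (16UL * baud) - 1);
}

/* Baud rate actually produced by a given UBRR value. */
static double actual_baud(uint32_t f_cpu, uint16_t ubrr)
{
    return (double)f_cpu / (16.0 * (ubrr + 1));
}

/* Relative error, in percent, between the actual and the requested baud rate. */
static double baud_error_percent(uint32_t f_cpu, uint32_t baud)
{
    return (actual_baud(f_cpu, ubrr_value(f_cpu, baud)) / baud - 1.0) * 100.0;
}
```

For 9600 bps this yields UBRR = 103 and an error of about +0.16%. For 115200 bps at 16 MHz, the same formula gives UBRR = 8 with an error of about -3.5%, which is outside the 2% tolerance mentioned above.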

The next step is to enable the UART transmitter:

ldi r16, (1 << TXEN0)
sts UCSR0B, r16

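and configure the frame format (8 data bits, no parity, 1 stop bit, or 8N1 for short, the most common case):

ldi r16, (1 << 2) | (1 << 1)   ; UCSZ01 = 1, UCSZ00 = 1: 8 data bits (other bits 0: asynchronous, no parity, 1 stop bit)
sts UCSR0C, r16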
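The byte values written to UCSR0B and UCSR0C can be checked on a PC with plain C bit shifts. This sketch uses the bit positions from the ATmega328P datasheet: TXEN0 = 3 in UCSR0B, and UCSZ01 = 2 and UCSZ00 = 1 in UCSR0C. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in UCSR0B and UCSR0C. */
enum { TXEN0 = 3, UCSZ00 = 1, UCSZ01 = 2 };

/* UCSR0B: transmitter enabled, receiver and interrupts disabled. */
static uint8_t ucsr0b_tx_only(void)
{
    return (uint8_t)(1u << TXEN0);
}

/* UCSR0C: asynchronous, no parity, 1 stop bit, 8 data bits (8N1). */
static uint8_t ucsr0c_8n1(void)
{
    return (uint8_t)((1u << UCSZ01) | (1u << UCSZ00));
}
```

Both values are constant: the transmitter-enable byte is 0x08, and the 8N1 frame byte is 0x06.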

Now it is time to send the string to the transmitter, byte by byte. The address of the string is loaded into the Z register (ZH and ZL, respectively) using ldi. The string is processed character by character until the terminating 0 is encountered.

main:
    ldi ZH, hi8(message)     ; Load high byte of message address into ZH (Z pointer → flash)
    ldi ZL, lo8(message)     ; Load low byte of message address into ZL
 
send_loop:
    lpm r18, Z+              ; Load next byte from program memory (message) into r18, then increment pointer
    cpi r18, 0               ; Check end of string
    breq main                ; If the end of the string is reached, start sending the whole "Hello World" again

The next character can be written to the transmit buffer only when the buffer is empty, i.e. when bit UDRE0 in register UCSR0A is set (1); otherwise, the program must wait. UCSR0A lies outside the I/O range reachable by sbis. For this reason, the register is first loaded with lds, and its bit is then tested with sbrs (skip if bit in general-purpose register is set). The next byte (character) can then be written to UDR0:

wait_udre:
    lds r19, UCSR0A          ; Load Serial port status register into r19
    sbrs r19, UDRE0          ; Check if buffer is ready to accept next byte
    rjmp wait_udre           ; If not ready, keep waiting
 
    sts UDR0, r18            ; Write character from r18 to UART data register (start transmission)
    rjmp send_loop           ; And process next character

Use of Timers to Generate PWM

Timers are handy for measuring time, waiting for a delay, or executing delayed tasks either once or periodically. The latter is very helpful for generating a PWM signal (a square wave with a controllable duty cycle). Such a signal controls the amount of energy delivered via the GPIO to an externally connected device, e.g., to set an LED's brightness. It is roughly equivalent to an analogue output.

The ATMega328P has 3 timers: one is high-precision (16-bit), and two are low-precision (8-bit). Details are presented in the table 7.
The timer counts “ticks”. A “tick” comes either directly from the clock (16 MHz) or through a prescaler that “slows it down”. Timers 0 and 1 share a common prescaler, and Timer 2 has an independent prescaler with more granularity. See table 7 for the valid prescalers of each timer, and table 8 for the frequency and period values for each prescaler. The general formula for the timer speed is given by the following equation (figure 7):

Figure 7: Timer speed formula, based on prescaler value, for Arduino Uno working at 16MHz

$$f_{tick} = \frac{16\,\mathrm{MHz}}{prescaler}, \qquad T_{tick} = \frac{1}{f_{tick}}$$

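Additionally, Timer 2 has an extra feature: instead of the internal clock, it can be clocked from an external 32.768 kHz watch crystal and thus work as a real-time clock (RTC). On the Arduino Uno, however, the pins needed for it (TOSC1/TOSC2) are used by the 16 MHz main clock source.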

Table 7: ATmega328P timers
Timer | Size | Channels & pins (PWM) | Valid prescalers | Common uses
Timer 0 | 8-bit | Ch A: Pin 6, Ch B: Pin 5 | 1, 8, 64, 256, 1024 | Used by Arduino for millis() and delay().
Timer 1 | 16-bit | Ch A: Pin 9, Ch B: Pin 10 | 1, 8, 64, 256, 1024 | High precision, long intervals, servo control.
Timer 2 | 8-bit | Ch A: Pin 11, Ch B: Pin 3 | 1, 8, 32, 64, 128, 256, 1024 | Audio (tone) generation, real-time clocks.
Table 8: Prescalers and related frequencies and periods for a typical 16 MHz clock
Prescaler | Tick frequency | Tick period
1 | 16 MHz | 0.0625 µs
8 | 2 MHz | 0.5 µs
32 (Timer 2 only) | 500 kHz | 2.0 µs
64 | 250 kHz | 4.0 µs
128 (Timer 2 only) | 125 kHz | 8.0 µs
256 | 62.5 kHz | 16.0 µs
1024 | 15.625 kHz | 64.0 µs
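Table 8 follows directly from the formula in figure 7. The host-side C sketch below recomputes it and also encodes the valid prescalers from table 7. The names tick_hz, tick_ns and prescaler_valid are hypothetical, and a 16 MHz clock is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define F_CPU_HZ 16000000UL  /* Arduino Uno clock */

/* Tick frequency in Hz for a given prescaler. */
static uint32_t tick_hz(uint32_t prescaler)
{
    assert(prescaler > 0);
    return F_CPU_HZ / prescaler;
}

/* Tick period in nanoseconds (exact for the prescalers in table 8). */
static double tick_ns(uint32_t prescaler)
{
    assert(prescaler > 0);
    return 1e9 * (double)prescaler / (double)F_CPU_HZ;
}

/* 1 if the prescaler is valid for the timer (0, 1 or 2), per table 7. */
static int prescaler_valid(int timer, uint32_t p)
{
    switch (p) {
    case 1: case 8: case 64: case 256: case 1024:
        return 1;
    case 32: case 128:
        return timer == 2;  /* Timer 2 only */
    default:
        return 0;
    }
}
```

For example, tick_hz(64) is 250000, and tick_ns(1024) is 64000 ns, i.e. 64 µs.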

The PWM frequency is commonly expressed as the number of ticks the timer counts per cycle. The highest value the counter reaches in a cycle is referred to as the TOP value.

Each timer in the ATmega328P has 2 channels: A and B. Channels are hardwired to GPIO pins, and you cannot change their assignments. Channels share the same base frequency, but the duty cycle can be controlled separately for each channel.

A common use of timers is to generate a signal that controls standard analogue servomotors. Servos operate at 50 Hz (a 20 ms period). A common configuration for the Arduino Uno is to use Timer 1, because its 16-bit resolution improves duty-cycle accuracy. At 16 MHz, a prescaler of 64 is used, so the timer runs at 250000 ticks per second.
To get 50 Hz, one period must last 250000/50 = 5000 ticks. Since the counter starts from 0, TOP is set to 5000 - 1 = 4999.
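The TOP calculation, and the compare value used later for a 2 ms pulse, can be wrapped in two small host-side C functions. This sketch makes the same assumptions as the text: a 16 MHz clock, Timer1, and a compare value equal to the pulse length in ticks. The names pwm_top and pwm_compare_us are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* TOP for a PWM frequency: the counter runs 0..TOP, i.e. TOP + 1 ticks per period. */
static uint16_t pwm_top(uint32_t f_cpu, uint32_t prescaler, uint32_t f_pwm)
{
    assert(prescaler > 0 && f_pwm > 0);
    uint32_t ticks = f_cpu / prescaler / f_pwm;   /* ticks per PWM period */
    assert(ticks >= 2 && ticks <= 65536);         /* TOP must fit in 16 bits */
    return (uint16_t)(ticks - 1);
}

/* Compare value for a pulse width given in microseconds. */
static uint16_t pwm_compare_us(uint32_t f_cpu, uint32_t prescaler, uint32_t pulse_us)
{
    assert(prescaler > 0);
    return (uint16_t)((uint64_t)f_cpu / prescaler * pulse_us / 1000000u);
}
```

pwm_top(16000000, 64, 50) gives 4999, and pwm_compare_us(16000000, 64, 2000) gives 500; these are the values used in the Timer1 example. A 1 ms pulse would need 250.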

Each timer is controlled by a group of registers, which we call “The Big Five”. Timer applications go far beyond generating a PWM signal, so their configuration settings are complex. Here we focus only on the PWM application and on Timer1. The other timers (Timer0 and Timer2) have similar functions, composition and control. They differ, e.g., in the number of registers: Timer1 needs two 8-bit registers (the high and low parts of the value) for each 16-bit setting, while Timer0 and Timer2 use a single 8-bit register. Table 9 lists these registers with their purposes and meanings; they are explained further below.

Table 9: ATmega328P Timer1 registers (The Big Five)
Register name | Size | Full name | Role | Meaning / purpose
TCCR1A / TCCR1B | 8-bit (each) | Timer/Counter Control Registers A & B | The Manager | Sets mode, pin behaviour, and prescaler.
TCNT1 (H/L) | 16-bit | Timer/Counter Register 1 | The Stopwatch | Holds the actual live count (0 to TOP).
OCR1A (H/L) | 16-bit | Output Compare Register A | The Trigger | Defines the duty cycle (when the pin toggles).
ICR1 (H/L) | 16-bit | Input Capture Register 1 | The Ceiling | Defines the frequency (the TOP value in Mode 14).
TIMSK1 / TIFR1 | 8-bit (each) | Timer Interrupt Mask & Flag Registers | The Notification | Handles interrupts and status flags.

The Manager
These registers control the timer's behaviour and functions (refer to table 10).

The Stopwatch
This register is mainly read: it holds the timer's current count. Note that it changes very quickly, independently of the main code. You can also write to it to force a cycle change, e.g., to synchronise the timer.
As Timer1 is 16-bit, there are two registers, representing the upper (TCNT1H) and lower (TCNT1L) parts of the 16-bit value. Timer0 and Timer2, being 8-bit timers, have only a single Stopwatch register (TCNT0 and TCNT2, respectively).

The Trigger
These registers store the compare values, i.e. the values the Stopwatch is compared against. Timer0 and Timer2 have one 8-bit register per channel, so 2 per timer: one for channel A and one for channel B. Timer1 has two per channel. E.g. for Timer1, channel A, the registers are OCR1AH (the high part of the 16-bit compare value) and OCR1AL (the low part). For channel B, those are OCR1BH and OCR1BL, respectively.

The Ceiling (TOP)
The TOP registers (the Input Capture Register, which we call the Ceiling) define the maximum value of the Stopwatch register and thus the frequency. In Mode 14, used here, the timer simply counts from 0 up to the TOP value, and after reaching TOP, it resets to 0 on the next tick.

Note that counting starts at zero: if the desired number of ticks per cycle is, e.g., 5000, the TOP value is 4999, not 5000.

Again, Timer1 has two registers (ICR1H and ICR1L, the high and low parts, respectively). Timer0 and Timer2 have no Input Capture Register: depending on the mode, their TOP is either fixed at 0xFF or taken from the OCRxA register.

The Notification
These registers control the timer interrupts and report timer events (flags). We do not use interrupts for PWM; therefore, their description is omitted.

Table 10: The Manager register details: how to control Timer1
Register | Bit | Name | Value (example) | Description
TCCR1A | 7 | COM1A1 | 1 | Compare Output Mode A bit 1: set for non-inverting PWM.
TCCR1A | 6 | COM1A0 | 0 | Compare Output Mode A bit 0: combined with bit 7 to control Pin 9.
TCCR1A | 5 | COM1B1 | 0 | Compare Output Mode B bit 1: controls Pin 10 behaviour.
TCCR1A | 4 | COM1B0 | 0 | Compare Output Mode B bit 0: combined with bit 5 to control Pin 10.
TCCR1A | 3 | - | 0 | Reserved: always write 0.
TCCR1A | 2 | - | 0 | Reserved: always write 0.
TCCR1A | 1 | WGM11 | 1 | Waveform Generation Mode bit 1: part of Mode 14 selection.
TCCR1A | 0 | WGM10 | 0 | Waveform Generation Mode bit 0: part of Mode 14 selection.
TCCR1B | 7 | ICNC1 | 0 | Input Capture Noise Canceler: 1 enables a noise filter (used for sensors).
TCCR1B | 6 | ICES1 | 0 | Input Capture Edge Select: selects the trigger edge for capture (rising/falling).
TCCR1B | 5 | - | 0 | Reserved: always write 0.
TCCR1B | 4 | WGM13 | 1 | Waveform Generation Mode bit 3: part of Mode 14 selection.
TCCR1B | 3 | WGM12 | 1 | Waveform Generation Mode bit 2: part of Mode 14 selection.
TCCR1B | 2 | CS12 | 0 | Clock Select bit 2: high bit of the prescaler (gearbox).
TCCR1B | 1 | CS11 | 1 | Clock Select bit 1: middle bit of the prescaler.
TCCR1B | 0 | CS10 | 1 | Clock Select bit 0: low bit of the prescaler.

Bits WGM13, WGM12, WGM11 and WGM10 are analysed together: they form a 4-bit value representing the mode. Mode 14 is Fast PWM with TOP defined by ICR1. In binary, 14 is 1110, so WGM13=1, WGM12=1, WGM11=1 and WGM10=0.
The CS bits define the prescaler value, as presented in table 11.

Table 11: CS bits and their meaning for prescaler definition for Timer1
CS12 | CS11 | CS10 | Prescaler (gear) | Ticks per second (at 16 MHz) | Description
0 | 0 | 0 | No clock | 0 | Timer is stopped (off).
0 | 0 | 1 | clk/1 | 16,000,000 | No division. 1 tick = 1 CPU cycle.
0 | 1 | 0 | clk/8 | 2,000,000 | Timer ticks once every 8 CPU cycles.
0 | 1 | 1 | clk/64 | 250,000 | Our choice for 50 Hz.
1 | 0 | 0 | clk/256 | 62,500 | Used for medium-speed pulses.
1 | 0 | 1 | clk/1024 | 15,625 | Used for very slow events or long delays.
1 | 1 | 0 | External T1 | N/A | Timer ticks on a falling edge of Pin D5.
1 | 1 | 1 | External T1 | N/A | Timer ticks on a rising edge of Pin D5.
When writing to a pair of timer registers that represent the upper and lower parts of a 16-bit value (e.g. ICR1H and ICR1L), it is obligatory to write the high part first, then the low part. When reading such a pair, read the low part first.
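The byte values loaded into the Timer1 registers in the example below can be derived on a PC from the bit positions in table 10. This host-side C sketch computes the control bytes for Mode 14 with prescaler 64. It also splits TOP and the compare value into their high and low bytes, which on the device must be written high byte first. The helper names are hypothetical; hi8 and lo8 only mimic the GNU assembler operators of the same name.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in TCCR1A and TCCR1B (table 10). */
enum { WGM10 = 0, WGM11 = 1, COM1A1 = 7,
       CS10 = 0, CS11 = 1, WGM12 = 3, WGM13 = 4 };

/* Split a 16-bit value for the H/L register pairs. */
static uint8_t hi8(uint16_t v) { return (uint8_t)(v >> 8); }
static uint8_t lo8(uint16_t v) { return (uint8_t)(v & 0xFF); }

/* TCCR1A: non-inverting output on channel A, WGM11 = 1 (lower half of Mode 14). */
static uint8_t tccr1a_mode14(void)
{
    return (uint8_t)((1u << COM1A1) | (1u << WGM11));
}

/* TCCR1B: WGM13 = WGM12 = 1 (upper half of Mode 14), CS12:0 = 011b (clk/64). */
static uint8_t tccr1b_mode14_clk64(void)
{
    return (uint8_t)((1u << WGM13) | (1u << WGM12) | (1u << CS11) | (1u << CS10));
}
```

It confirms TCCR1A = 0x82 and TCCR1B = 0x1B. ICR1 = 4999 splits into 0x13 and 0x87, and OCR1A = 500 splits into 0x01 and 0xF4.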

In assembler code, registers are referred to by their addresses. It is, however, more convenient to declare symbolic names for them (e.g. with .equ). Table 12 presents the full list of timer-related registers with their data-space addresses, which are used with lds and sts.

Table 12: Timer-related registers
Timer | Register | Address (data space) | Brief description
Timer 0 (8-bit) | TCCR0A | 0x44 | Control Reg A: sets PWM mode and pin behaviour.
Timer 0 (8-bit) | TCCR0B | 0x45 | Control Reg B: sets prescaler (the gearbox).
Timer 0 (8-bit) | TCNT0 | 0x46 | Stopwatch: the actual 8-bit live count.
Timer 0 (8-bit) | OCR0A | 0x47 | Trigger A: duty cycle for Pin 6.
Timer 0 (8-bit) | OCR0B | 0x48 | Trigger B: duty cycle for Pin 5.
Timer 0 (8-bit) | TIMSK0 | 0x6E | Interrupt Mask: enables timer-specific alarms.
Timer 0 (8-bit) | TIFR0 | 0x35 | Interrupt Flag: shows if a timer event occurred.
Timer 1 (16-bit) | TCCR1A | 0x80 | Control Reg A: mode and pin behaviour (Ch A & B).
Timer 1 (16-bit) | TCCR1B | 0x81 | Control Reg B: mode and prescaler.
Timer 1 (16-bit) | TCCR1C | 0x82 | Control Reg C: Force Output Compare bits.
Timer 1 (16-bit) | TCNT1H | 0x85 | Stopwatch High: bits 8-15 of the count.
Timer 1 (16-bit) | TCNT1L | 0x84 | Stopwatch Low: bits 0-7 of the count.
Timer 1 (16-bit) | ICR1H | 0x87 | Ceiling High: bits 8-15 of the frequency TOP.
Timer 1 (16-bit) | ICR1L | 0x86 | Ceiling Low: bits 0-7 of the frequency TOP.
Timer 1 (16-bit) | OCR1AH | 0x89 | Trigger A High: bits 8-15 of duty cycle, Pin 9.
Timer 1 (16-bit) | OCR1AL | 0x88 | Trigger A Low: bits 0-7 of duty cycle, Pin 9.
Timer 1 (16-bit) | OCR1BH | 0x8B | Trigger B High: bits 8-15 of duty cycle, Pin 10.
Timer 1 (16-bit) | OCR1BL | 0x8A | Trigger B Low: bits 0-7 of duty cycle, Pin 10.
Timer 1 (16-bit) | TIMSK1 | 0x6F | Interrupt Mask: enables Timer 1 alarms.
Timer 1 (16-bit) | TIFR1 | 0x36 | Interrupt Flag: shows Timer 1 status/events.
Timer 2 (8-bit) | TCCR2A | 0xB0 | Control Reg A: mode and pin behaviour.
Timer 2 (8-bit) | TCCR2B | 0xB1 | Control Reg B: prescaler and mode bits.
Timer 2 (8-bit) | TCNT2 | 0xB2 | Stopwatch: the actual 8-bit live count.
Timer 2 (8-bit) | OCR2A | 0xB3 | Trigger A: duty cycle for Pin 11.
Timer 2 (8-bit) | OCR2B | 0xB4 | Trigger B: duty cycle for Pin 3.
Timer 2 (8-bit) | ASSR | 0xB6 | Asynchronous Status: used for 32.768 kHz watch crystals.
Timer 2 (8-bit) | TIMSK2 | 0x70 | Interrupt Mask: enables Timer 2 alarms.
Timer 2 (8-bit) | TIFR2 | 0x37 | Interrupt Flag: shows Timer 2 status/events.
System | GTCCR | 0x43 | General Timer Control: synchronises timers by halting and resetting their prescalers.

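To use timers for PWM generation, one must configure the following (in order):
1. Set the channel's GPIO pin as an output (DDRx).
2. Set the frequency: write TOP to ICR1 (high byte first).
3. Set the duty cycle: write the compare value to OCR1A (high byte first).
4. Select the PWM mode and the output behaviour in TCCR1A (the remaining WGM13:12 mode bits are in TCCR1B).
5. Select the prescaler in TCCR1B, which starts the timer.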

Example for the use of timers

The example below implements a standard servo PWM signal (50 Hz) with a 10% duty cycle.

The code contains only the minimal set of register declarations used to control Timer1 for PWM. Note that, once configured, the timer generates the PWM signal independently of the CPU. In the final loop, the CPU does nothing but run a dummy loop, and the signal is produced solely by the timer hardware, asynchronously to the code. The configuration process is presented in figure 8.

Figure 8: Timer configuration steps
/* 
 * ATmega328P 50Hz PWM via Timer 1
 * No includes - Manual Address Mapping
 */
 
/* Register Addresses */
.equ DDRB,    0x24      /* Port B Direction Register */
.equ TCCR1A,  0x80      /* Control Register A */
.equ TCCR1B,  0x81      /* Control Register B */
.equ ICR1H,   0x87      /* TOP Value (High) */
.equ ICR1L,   0x86      /* TOP Value (Low) */
.equ OCR1AH,  0x89      /* Duty Cycle (High) */
.equ OCR1AL,  0x88      /* Duty Cycle (Low) */
 
.org 0x0000
    rjmp reset
 
reset:

Configure pin 9 (PB1, the Timer1 channel A output OC1A) as an output.

    ; Configure PIN 9 as output (Timer1, channel A)
    ldi r16, (1 << 1)
    sts DDRB, r16

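Set the TOP value of Timer1 (register ICR1) to 4999 (0x1387), so the counter counts from 0 to 4999. With a prescaler of 64, one tick lasts 4 µs (16 MHz / 64 = 250 kHz), so 5000 ticks give a period of 20 ms (50 Hz). Note that 16-bit registers must be written high byte first, then low byte.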

    ; Set frequency to 50Hz
    ; Prescaler is 64, ICR1 (TOP) is set to 4999d=0x1387
    ldi r16, 0x13       ; High byte of 4999
    sts ICR1H, r16
    ldi r16, 0x87       ; Low byte of 4999
    sts ICR1L, r16

Set the trigger (comparator register OCR1A) to 500 (0x01F4), which corresponds to 2 ms (500 ticks of 4 µs, i.e., 10% of 5000). Timer1 continuously compares the counter (TCNT1) with OCR1A; when the counter reaches 500, the hardware switches the output on GPIO 9 from 1 to 0. The opposite switch, from 0 to 1, is done automatically by Timer1 when the counter wraps around from TOP to 0.

    ; Set triggers (comparators) to 10% of TOP
    ; 500d=0x01F4 to OCR1A
    ldi r16, 0x01       ; High byte of 500
    sts OCR1AH, r16
    ldi r16, 0xF4       ; Low byte of 500
    sts OCR1AL, r16

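Configure Timer1 to work in Mode 14 (Fast PWM with TOP defined by ICR1: a cyclical square wave whose duty cycle is controlled via the triggers/comparators). The mode is selected with four WGM bits, split between TCCR1A (WGM11, WGM10) and TCCR1B (WGM13, WGM12).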

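    ; Set timer to operate as Fast PWM, Mode 14 (TOP = ICR1)
    ; Mode 14 -> WGM13:0 = 1110b = 14d
    ; TCCR1A holds WGM11 (bit 1) = 1 and WGM10 (bit 0) = 0;
    ; WGM13 and WGM12 are set in TCCR1B below
    ; COM1A1 (bit 7) = 1: clear pin on match, set at BOTTOM (non-inverting)
    ldi r16, (1 << 7) | (1 << 1)
    sts TCCR1A, r16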

Set the prescaler to 64; writing a non-zero clock select value starts Timer1 immediately.

    ; Start timer with prescaler=64
    ; WGM13=1, WGM12=1, CS11=1, CS10=1
    ldi r16, (1 << 4) | (1 << 3) | (1 << 1) | (1 << 0)
    sts TCCR1B, r16

And then do nothing: the loop is a dummy, as all the work is handled by Timer1, and the CPU is free to handle something else.

loop:
    rjmp loop           ; The CPU does nothing! 
                        ; The Timer1 hardware toggles the pin forever.
                        ; It is done asynchronously to the main code!
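The arithmetic behind the ICR1 and OCR1A values can be double-checked on a PC. The C sketch below is illustrative only: the function names timer1_top and timer1_compare and the F_CPU_HZ constant are hypothetical, and it assumes the Arduino Uno's 16 MHz clock and Timer1 in Fast PWM Mode 14, where one period lasts TOP + 1 ticks.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 MHz system clock (Arduino Uno). */
#define F_CPU_HZ 16000000UL

/* Hypothetical helper: TOP value for ICR1 giving the requested PWM
 * frequency in Fast PWM Mode 14 (counter runs 0..TOP, TOP + 1 ticks). */
uint16_t timer1_top(uint32_t prescaler, uint32_t pwm_hz)
{
    assert(prescaler != 0 && pwm_hz != 0);
    uint32_t ticks = F_CPU_HZ / prescaler / pwm_hz; /* ticks per period */
    assert(ticks >= 2 && ticks <= 65536UL);         /* TOP must fit 16 bits */
    return (uint16_t)(ticks - 1);
}

/* Hypothetical helper: compare value (OCR1A) for a pulse of pulse_us
 * microseconds at the given prescaler. */
uint16_t timer1_compare(uint32_t prescaler, uint32_t pulse_us)
{
    assert(prescaler != 0 && F_CPU_HZ / prescaler != 0);
    uint32_t tick_ns = 1000000000UL / (F_CPU_HZ / prescaler); /* 4000 ns at /64 */
    return (uint16_t)(pulse_us * 1000UL / tick_ns);
}
```

For example, timer1_top(64, 50) gives 4999 (0x1387) and timer1_compare(64, 2000) gives 500 (0x01F4), the values loaded in the code above. Strictly speaking, in non-inverting Fast PWM the pin stays high for OCR1A + 1 ticks, a 4 µs difference that is negligible for a servo.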

An oscilloscope connected to GPIO pin 9 shows the signal presented in figure 9.

Figure 9: PWM signal observed on GPIO 9
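The names of the registers TCCR1A and TCCR1B may be confusing: A and B may suggest that they refer to channel A and channel B. This is NOT the case: both registers control Timer1 as a whole. TCCR1A holds the compare output mode bits of both channels (COM1A, COM1B) and the lower mode bits (WGM11, WGM10), while TCCR1B holds the upper mode bits (WGM13, WGM12) and the prescaler (clock select) bits.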
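To illustrate the note above, the sketch below shows how the 4-bit waveform generation mode number is split between the two registers. It is a host-side C sketch, not part of the AVR program; the helper names tccr1a_value and tccr1b_value are hypothetical, while the bit positions follow the ATmega328P datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Timer1 bit positions (ATmega328P datasheet). */
#define COM1A1 7  /* TCCR1A */
#define WGM11  1  /* TCCR1A */
#define WGM10  0  /* TCCR1A */
#define WGM13  4  /* TCCR1B */
#define WGM12  3  /* TCCR1B */
#define CS11   1  /* TCCR1B */
#define CS10   0  /* TCCR1B */

/* Hypothetical helper: TCCR1A = compare output bits + WGM11:10 of the mode. */
uint8_t tccr1a_value(uint8_t mode, uint8_t com_bits)
{
    assert(mode <= 15);
    return (uint8_t)(com_bits | (((mode >> 1) & 1u) << WGM11)
                              | ((mode & 1u) << WGM10));
}

/* Hypothetical helper: TCCR1B = WGM13:12 of the mode + 3-bit clock select. */
uint8_t tccr1b_value(uint8_t mode, uint8_t clock_select)
{
    assert(mode <= 15 && clock_select <= 7);
    return (uint8_t)((((mode >> 3) & 1u) << WGM13)
                   | (((mode >> 2) & 1u) << WGM12) | clock_select);
}
```

For Mode 14 with COM1A1 set and a prescaler of 64 (CS11 and CS10), the helpers give 0x82 for TCCR1A and 0x1B for TCCR1B, i.e., the same bytes as (1 << 7) | (1 << 1) and (1 << 4) | (1 << 3) | (1 << 1) | (1 << 0) in the assembler code.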

Reading analogue values

Reading from an analogue input is not as straightforward as reading a digital input. The built-in ADC has 10-bit resolution and 6 input channels (A0-A5). It also needs a reference voltage, which is configurable: the 5V power supply (AVcc), the internal 1.1V source, or an external reference voltage connected to the AREF input pin.
The inputs are connected to the ADC through a multiplexer, so only one input can be serviced at a time (there is only one converter, shared by all channels). After switching inputs, the first reading may be invalid due to the measurement method.
The ADC computes its result from the input voltage Vgpio and the reference voltage Vref, using the formula presented in figure 10.

Figure 10: ADC value calculation based on the input voltage and reference voltage

Technically, the ADC contains a small sample-and-hold capacitor (14 pF according to the ATmega328P datasheet) that has to be charged from the input for every sample. For this reason, measuring sources with high output impedance (the datasheet recommends 10 kΩ or less) can yield inaccurate readings, so the first ADC reading is commonly discarded, and the best practice is to take multiple measurements and calculate an average.
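The discard-and-average practice can be sketched in C as follows. This is an illustration under assumptions: the function name adc_average_discard_first is hypothetical, and the samples are passed as an array so the logic can be tested on a PC; on the MCU they would come from consecutive conversions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: average a burst of raw 10-bit ADC readings,
 * ignoring samples[0], which may be disturbed by the sample-and-hold
 * capacitor settling (e.g., right after switching channels). */
uint16_t adc_average_discard_first(const uint16_t *samples, size_t count)
{
    assert(samples != NULL && count >= 2);
    uint32_t sum = 0;                 /* 32-bit accumulator avoids overflow */
    for (size_t i = 1; i < count; ++i)
        sum += samples[i];
    size_t used = count - 1;
    return (uint16_t)((sum + used / 2) / used); /* rounded integer mean */
}
```

With the burst 1023, 500, 502, 504, 498, the first, disturbed value is ignored and the function returns 501. Summing into a 32-bit variable is a deliberate choice, since a sum of 65 or more 10-bit samples can exceed the 16-bit range.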

From the assembler developer's point of view, it is more important that the ADC reading is a value between 0 and 1023 (10-bit resolution). To convert the ADC reading to the voltage measured on the ADC input (Vinput), the formula presented in figure 11 applies:

Figure 11: Conversion formula from ADC reading to value represented in Volts
Note that the formula uses Vref, the reference voltage. One therefore needs to know the current ADC configuration, i.e., whether Vref is the 5V power supply, the internal 1.1V source, or an external voltage provided via the AREF pin, and pass the appropriate value to the equation.
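The conversion can be sketched in integer arithmetic, which suits an 8-bit MCU without floating-point hardware. The sketch assumes the datasheet relation ADC = Vin · 1024 / Vref (some texts divide by 1023 instead), works in millivolts, and uses hypothetical function names; it is not taken from the original material.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: raw 10-bit reading -> input voltage in mV,
 * Vin = ADC * Vref / 1024 (datasheet form). */
uint16_t adc_to_millivolts(uint16_t adc, uint16_t vref_mv)
{
    assert(adc <= 1023 && vref_mv > 0);
    return (uint16_t)(((uint32_t)adc * vref_mv) / 1024u);
}

/* Hypothetical inverse: expected reading for a given input voltage,
 * clamped to the 10-bit maximum of 1023. */
uint16_t millivolts_to_adc(uint16_t vin_mv, uint16_t vref_mv)
{
    assert(vref_mv > 0);
    uint32_t adc = ((uint32_t)vin_mv * 1024u) / vref_mv;
    return (uint16_t)(adc > 1023u ? 1023u : adc);
}
```

For instance, a reading of 512 with Vref = 5000 mV corresponds to 2500 mV, and 1023 with the internal 1.1 V reference corresponds to about 1098 mV.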

Analogue reading requires setting up several ADC-related registers, as presented in table 13. These registers are mapped to the extended I/O memory area (addresses above 0x5F), so they must be accessed using the lds and sts instructions rather than in and out.

Table 13: ADC-related registers used for reading the analogue values of GPIOs
Register (Address) Bit Name Description
ADMUX (0x7C) 7 REFS1 Reference Selection Bit 1
6 REFS0 Reference Selection Bit 0 (01 = AVcc)
5 ADLAR Left Adjust Result (1 = Left, 0 = Right)
4 - Reserved
3 MUX3 Analog Channel Selection Bit 3
2 MUX2 Analog Channel Selection Bit 2
1 MUX1 Analog Channel Selection Bit 1
0 MUX0 Analog Channel Selection Bit 0 (0000 = A0)
ADCSRA (0x7A) 7 ADEN ADC Enable (Must be 1)
6 ADSC ADC Start Conversion (Write 1 to start)
5 ADATE ADC Auto Trigger Enable
4 ADIF ADC Interrupt Flag
3 ADIE ADC Interrupt Enable
2 ADPS2 ADC Prescaler Select Bit 2
1 ADPS1 ADC Prescaler Select Bit 1
0 ADPS0 ADC Prescaler Select Bit 0 (111 = by 128)
ADCSRB (0x7B) 7 - Reserved
6 ACME Analog Comparator Multiplexer Enable
5 - Reserved
4 - Reserved
3 - Reserved
2 ADTS2 ADC Auto Trigger Source Bit 2
1 ADTS1 ADC Auto Trigger Source Bit 1
0 ADTS0 ADC Auto Trigger Source Bit 0
ADCL (0x78) 7..0 ADC[7:0] 10-bit Result, low byte (read ADCL first, then ADCH)
ADCH (0x79) 1..0 ADC[9:8] 10-bit Result, high bits (with ADLAR = 0)
DIDR0 (0x7E) 5:0 ADC5D:ADC0D Digital Input Disable (1 = Disable Buffer)
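The register values used in the sample code below can be derived from the bit positions in table 13. The C sketch is a host-side illustration with hypothetical helper names: admux_value composes ADMUX, and adc_result joins ADCL and ADCH for both result alignments (ADLAR = 0 and ADLAR = 1).

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from table 13 (ATmega328P). */
#define REFS0 6
#define ADLAR 5
#define ADEN  7
#define ADPS2 2
#define ADPS1 1
#define ADPS0 0

/* Hypothetical helper: refs = REFS1:0 (00 = AREF, 01 = AVcc,
 * 11 = internal 1.1 V), channel = 0..5 for A0..A5. */
uint8_t admux_value(uint8_t refs, uint8_t left_adjust, uint8_t channel)
{
    assert(refs <= 3 && channel <= 5);
    return (uint8_t)((refs << REFS0) | ((left_adjust ? 1u : 0u) << ADLAR)
                     | channel);
}

/* Hypothetical helper: join ADCL and ADCH (read in that order). */
uint16_t adc_result(uint8_t adcl, uint8_t adch, uint8_t left_adjust)
{
    uint16_t raw = (uint16_t)((adch << 8) | adcl);
    /* ADLAR = 1: result sits in bits 15..6, so shift it down by 6. */
    return left_adjust ? (uint16_t)(raw >> 6) : (uint16_t)(raw & 0x03FFu);
}
```

admux_value(1, 0, 0) yields 0x40, the same value as (1 << REFS0) used for AVcc and A0, and the ADCSRA setting with ADEN and a prescaler of 128 equals 0x87.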

An algorithm for reading an analogue value from a selected input is implemented as follows:

Disabling the digital input buffer of the analogue pin (register DIDR0) is optional but highly recommended: if the input voltage is around 2.5V (midway between LOW and HIGH) and the pin is still active as a digital input (in parallel to the analogue one), the input buffer may draw noticeably more current from the power source and also increase the analogue signal noise due to frequent switching between LOW and HIGH.

The sample code below configures the ADC, reads from the A0 input, and stores the value as a 16-bit value in the adc_storage variable.

; --- Register Definitions (ATmega328P) ---
.equ ADCL,   0x78
.equ ADCH,   0x79
.equ ADCSRA, 0x7A
.equ ADCSRB, 0x7B
.equ ADMUX,  0x7C
.equ DIDR0,  0x7E
 
; --- Bit Definitions ---
.equ REFS0,  6      ; Reference selection bit 0
.equ ADEN,   7      ; ADC Enable
.equ ADSC,   6      ; ADC Start Conversion
.equ ADPS2,  2      ; Prescaler bit 2
.equ ADPS1,  1      ; Prescaler bit 1
.equ ADPS0,  0      ; Prescaler bit 0
 
; --- Data Segment ---
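; No .org here: inside a section it is relative to the section start,
; and the linker already places .data in SRAM
.section .data
adc_storage: .space 2   ; Reserve 2 bytes in SRAM for the 10-bit result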

Now the setup part: connect A0 to the ADC via the multiplexer, select the power supply (5V, AVcc) as the reference voltage, and set the ADC clock using a prescaler of 128, which gives 16MHz / 128 = 125kHz. Also, disable the digital input buffer of A0 to save energy and lower noise.

; --- Code Segment ---
.section .text
.global main
 
main:
    ; setup multiplexer to use A0 (0000) 
    ; and AVcc (powering voltage +5V) as reference
    ; It is done with the ADMUX register
    ldi r24, (1 << REFS0)
    sts ADMUX, r24
 
    ; setup prescaler and enable ADC.
    ; prescaler is 128 (16MHz/128 = 125kHz)
    ldi r24, (1 << ADEN) | (1 << ADPS2) | (1 << ADPS1) | (1 << ADPS0)
    sts ADCSRA, r24
 
    ; and disable A0 GPIO as a digital input (analogue still works)
    ; good practice
    ldi r24, 0x01
    sts DIDR0, r24

Start a conversion by setting the ADSC bit (6) of ADCSRA to 1. The ADC needs some time to sample the value and complete the conversion (a normal conversion takes 13 ADC clock cycles; the first one after enabling the ADC takes 25), so the ADC hardware clears the ADSC bit when the result is ready to read. Here, we do not use any interrupts, just busy polling.

loop:
    ; start ADC conversion
    lds r24, ADCSRA
    ori r24, (1 << ADSC)
    sts ADCSRA, r24
 
wait_adc:
    ; poll the ADSC bit
    ; when conversion is ready, ADC clears this bit
    lds r24, ADCSRA
    sbrc r24, ADSC
    rjmp wait_adc

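The converted value is stored in the ADCL and ADCH registers, and it is crucial to keep the reading order: low byte first, then high byte. Reading ADCL blocks updates of both registers until ADCH is read, which guarantees that both bytes come from the same conversion.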

    ; read conversion result
    ; IMPORTANT: Read Low Byte first to lock the values
    lds r18, ADCL       ; r18 = Low Byte
    lds r19, ADCH       ; r19 = High Byte
 
    ; save to memory
    ldi r26, lo8(adc_storage)
    ldi r27, hi8(adc_storage)
    st X+, r18          ; Store low byte
    st X, r19           ; Store high byte
 
    rjmp loop           ; Repeat indefinitely
You always need to read the low byte of the result (ADCL), then the high (ADCH), not the opposite!
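Why the order matters can be shown with a small host-side model of the locking mechanism described in the ATmega328P datasheet: once ADCL is read, the data registers are not updated until ADCH is read, and a conversion that completes in the meantime is lost. The adc_model type and its functions are hypothetical and only simulate this behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ADC data register (ADCH:ADCL). */
typedef struct {
    uint16_t result;  /* current content of ADCH:ADCL */
    bool locked;      /* set by reading ADCL, cleared by reading ADCH */
} adc_model;

/* A conversion finishes: the result is stored only if not locked,
 * otherwise it is discarded (lost), as the datasheet describes. */
void adc_conversion_done(adc_model *m, uint16_t value)
{
    if (!m->locked)
        m->result = value;
}

/* Reading ADCL returns the low byte and locks the registers. */
uint8_t adc_read_adcl(adc_model *m)
{
    m->locked = true;
    return (uint8_t)(m->result & 0xFFu);
}

/* Reading ADCH returns the high bits and unlocks the registers. */
uint8_t adc_read_adch(adc_model *m)
{
    m->locked = false;
    return (uint8_t)(m->result >> 8);
}
```

If the register holds 255 (0x0FF) and a conversion producing 256 (0x100) completes between the two reads, reading ADCL first still returns the consistent value 255, whereas reading ADCH first combines the old high byte with the new low byte and yields 0, which belongs to neither conversion.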

Speed vs Quality

The ADC converts an analogue value to its digital representation using a sample-and-hold capacitor. Charging and discharging the capacitor takes time and depends on the impedance of the analogue signal's source. The general rule is that the faster the conversion, the lower the quality and the larger the error. Conversion speed is controlled with the prescaler value (bits ADPS2, ADPS1, and ADPS0 of the ADCSRA register), which divides the system clock (16MHz) to slow down the conversion process. The prescaler values and the related ADC clock frequencies and periods are presented in table 14.

Table 14: Prescaler values and related conversion times
ADPS2 ADPS1 ADPS0 Division Factor ADC Clock (16 MHz) Clock Period (1/f)
0 0 0 2 8 MHz 0.125 µs
0 0 1 2 8 MHz 0.125 µs
0 1 0 4 4 MHz 0.25 µs
0 1 1 8 2 MHz 0.5 µs
1 0 0 16 1 MHz 1.0 µs
1 0 1 32 500 kHz 2.0 µs
1 1 0 64 250 kHz 4.0 µs
1 1 1 128 125 kHz 8.0 µs

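The ATmega328P datasheet recommends an ADC clock between 50kHz and 200kHz to obtain full 10-bit resolution; with a 16MHz system clock, only the division factor of 128 (125kHz) falls within this range. Faster ADC clocks (e.g., 250kHz) shorten the conversion but reduce its accuracy.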
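Choosing the division factor can be expressed as a small calculation. The C sketch below assumes a 200 kHz upper limit for full resolution (from the datasheet) and a normal conversion length of 13 ADC clock cycles; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define F_CPU_HZ 16000000UL        /* Arduino Uno system clock */
#define ADC_MAX_CLOCK_HZ 200000UL  /* limit for full 10-bit resolution */

/* Hypothetical helper: smallest ADPS division factor (2..128) that keeps
 * the ADC clock at or below the limit; 0 if even /128 is too fast. */
uint8_t adc_pick_division(uint32_t f_cpu_hz)
{
    for (uint16_t factor = 2; factor <= 128; factor *= 2)
        if (f_cpu_hz / factor <= ADC_MAX_CLOCK_HZ)
            return (uint8_t)factor;
    return 0;
}

/* Hypothetical helper: duration of a normal conversion
 * (13 ADC clock cycles) in nanoseconds. */
uint32_t adc_conversion_ns(uint32_t f_cpu_hz, uint8_t division)
{
    assert(division != 0 && f_cpu_hz != 0);
    return (uint32_t)(13ULL * division * 1000000000ULL / f_cpu_hz);
}
```

For the 16 MHz Arduino Uno, adc_pick_division returns 128, and one conversion then takes 104 µs (104000 ns), i.e., roughly 9600 conversions per second.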

SUT AVR Assembler Laboratory Node Hardware Reference

Introduction

Each laboratory node is equipped with an Arduino Uno R3 development board, based on the ATmega328P MCU. It also has two extension boards:

There are 10 laboratory nodes. They can be used independently, but to enable collaboration, the nodes are interconnected symmetrically via GPIOs, as described below.

Hardware reference

Table 15 lists all hardware components and their details. Note that some elements are physically present, but their use is not supported via the remote lab, e.g., buttons and a buzzer.
The node is depicted in figure 12, and a visual schematic of its interface is presented in figure 13. The schematic presents only the components used in scenarios and accessible via the VREL NextGen environment (controllable and observable via the video stream), omitting unused components such as buttons, a buzzer, and a potentiometer.

Figure 12: AVR (Arduino Uno) SUT Node
Figure 13: SUT node's visual interface components schematic
Table 15: AVR (Arduino Uno) SUT Node Hardware Details
Component ID Component Hardware Details (controller) Control method GPIOs (as mapped to the Arduino Uno) Remarks
D1 LED (red) direct via GPIO binary (0→on, 1→off) GPIO13
D2 LED (red) direct via GPIO binary (0→on, 1→off) GPIO12
D3 LED (red) direct via GPIO binary (0→on, 1→off) GPIO11
D4 LED (red) direct via GPIO binary (0→on, 1→off) GPIO10 shared with the interconnection to another node
LED4 4x 7-segment display indirect, via two 74HC595 shift registers serial load to 2 registers, daisy-chained GPIO8 - serial data input of the controller
GPIO7 - shift clock: shifts the data internally on the rising edge (writes the next bit and shifts the data serially)
GPIO4 - latch: transfers the shifted data to the display outputs
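The daisy-chained serial load can be modelled on a PC to show where each byte ends up. This sketch assumes 74HC595-compatible shift registers, treats GPIO8 as serial data, GPIO7 as the shift clock, and GPIO4 as the latch (storage clock), and shifts the most significant bit first; the type and function names are hypothetical, and the model does not drive real pins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of two daisy-chained 8-bit shift registers.
 * Bits 7..0: first chip (fed by GPIO8); bits 15..8: second chip. */
typedef struct {
    uint16_t shift;   /* internal shift stages */
    uint16_t outputs; /* latched outputs driving the display */
} hc595_chain;

/* One rising edge on the shift clock (GPIO7): all bits move one stage
 * on, and the data pin (GPIO8) enters stage 0 of the first chip. */
void chain_clock(hc595_chain *c, int data_bit)
{
    c->shift = (uint16_t)((c->shift << 1) | (data_bit ? 1u : 0u));
}

/* Latch edge (assumed GPIO4): copy all shift stages to the outputs at once. */
void chain_latch(hc595_chain *c) { c->outputs = c->shift; }

/* Send 16 bits, MSB first, then latch. */
void chain_send(hc595_chain *c, uint8_t first, uint8_t second)
{
    uint16_t word = (uint16_t)((first << 8) | second);
    for (int i = 15; i >= 0; --i)
        chain_clock(c, (word >> i) & 1);
    chain_latch(c);
}
```

After sending 0xC0 and then 0x01, the first byte has travelled through the first register into the second one, so it ends up in the upper 8 bits of the chain. Which byte selects the digit and which one drives the segments depends on the board wiring, which is not specified here.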

Communication

Devices (laboratory nodes) are interconnected in pairs, so it is possible to work in groups and implement scenarios involving more than one device:

Interconnections are symmetrical, so device 1 can send data to device 2 and vice versa (similar to serial communication). Note that analogue inputs are also part of the interconnection interface. See figure 14 for details.

Figure 14: SUT AVR nodes interconnection diagram

The in-series resistors protect the Arduino boards' outputs from excessive current when both pins are configured as outputs with opposite logic states.

The capacitors on the analogue lines filter the PWM signal, providing a stable voltage for the analogue-to-digital converter to measure.
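The expected measurement on the receiving side can be estimated with a short calculation. The sketch assumes that D6 (OC0A, Timer0 in 8-bit non-inverting Fast PWM) of one node drives A5 of its partner through the RC filter, that both the PWM high level and Vref equal a 5 V supply, and that ripple and the drop on the resistors are negligible; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ideal average voltage (mV) of 8-bit non-inverting
 * Fast PWM, where the pin is high for OCR0A + 1 of every 256 ticks. */
uint16_t pwm_average_mv(uint8_t ocr0a, uint16_t vcc_mv)
{
    return (uint16_t)(((uint32_t)ocr0a + 1u) * vcc_mv / 256u);
}

/* Hypothetical helper: ideal ADC reading at the receiver with Vref = Vcc:
 * (OCR0A + 1) * 1024 / 256 = (OCR0A + 1) * 4, capped at 1023. */
uint16_t expected_adc(uint8_t ocr0a)
{
    uint16_t adc = (uint16_t)(((uint16_t)ocr0a + 1u) * 4u);
    return adc > 1023u ? 1023u : adc;
}
```

With OCR0A = 127, the pin is high for 128 of 256 ticks, which gives about 2500 mV and an ideal reading of 512. Note that OCR0A = 0 still produces a one-tick pulse per period in Fast PWM, so a true 0 V output requires disabling the compare output (COM0A bits) and driving the pin low.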

Table 16: AVR (Arduino Uno) SUT Node Interconnections
Arduino Uno pin name AVR pin name Alternate function Comment
D2 PD2 INT0 Interrupt input
D5 PD5 T1 Timer/counter input
D6 PD6 OC0A PWM output to generate analogue voltage
D9 PB1 OC1A Digital output / Timer output
D10 PB2 OC1B Digital output / Timer output
A5 PC5 ADC5 Analogue input

Such a connection makes it possible to implement a variety of scenarios:

Nodes are interconnected in pairs: 1-2, 3-4, 5-6, 7-8, 9-10. Scenarios involving data transmission between MCUs require booking and using the correct nodes for sending and receiving messages.


Programming in Assembler for x64

Introduction to VS

Creating a project in Visual Studio (VS) with MASM source; assembling, debugging, the disassembly window, the register view, and the memory view (data section).

Introduction to Linux assembly programming

NASM