Bootloaders, Linker Scripts, and the Boot Process

You write code that does int main() and somehow it runs when the chip powers on. For most firmware engineers, that is the whole story. Then one day you need to add a bootloader, or place a buffer in DTCM RAM, or recover a brick, and the gap in your knowledge becomes a wall. This article is the wall-removal exercise.

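We focus on ARM Cortex-M because it is the most common modern target and because the boot process is straightforward enough to fit in one article. RISC-V follows the same broad pattern (reset vector, startup code, main), although its reset address is implementation-defined and the startup code, not the hardware, sets the stack pointer. AVR is simpler. Cortex-A boots through a multi-stage chain that deserves its own piece.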

What happens when you press reset

flowchart TD
    Reset(["Reset pressed or applied"]) --> Power["Chip powers up,<br/>internal POR completes"]
    Power --> ReadVT["CPU reads address 0x00000000<br/>= initial stack pointer"]
    ReadVT --> ReadPC["CPU reads address 0x00000004<br/>= reset handler address"]
    ReadPC --> SetSP["Set SP register"]
    SetSP --> Jump["Jump to reset handler"]
    Jump --> Init["Reset_Handler runs:<br/>copy .data, zero .bss,<br/>initialise libc"]
    Init --> Main["Call main"]
    Main --> Loop(["Your code runs"])

The Cortex-M boot sequence. Six steps from power-on to your code running. The hardware steps are fixed by the architecture; what you control is the contents of the table at the start of flash and the startup code it points to.

Step by step:

  1. Power-on reset (POR) brings the chip out of an unpowered state. Voltage stabilises, internal clocks come up.
  2. The CPU is hard-wired to fetch the first 32-bit word of memory and load it into the stack pointer (SP). On Cortex-M, this is at address 0x00000000.
  3. It then fetches the second word (address 0x00000004) and jumps to that address. This word is the address of the reset handler.
  4. The reset handler is C or assembly code from the startup file supplied by your chip vendor or toolchain. It copies initialised globals from flash into RAM (the .data section), zero-fills the .bss section (zero-initialised and uninitialised globals), and runs any static constructors (for C++).
  5. The reset handler then calls main().
  6. main() is your code.

The interesting machinery is between steps 2 and 5. Everything else is fixed by the chip designers.
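
Step 4 can be sketched in portable C. On a real target the five boundaries come from linker-script symbols; here they are passed in as plain pointers so the logic can be run on a host. The function name startup_init_memory and its parameter names are hypothetical, and real startup files often do the same work in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring what a reset handler does before main().
 * On a real target the pointers would be linker-script symbols: the load
 * address of .data in flash, the start/end of .data in RAM, and the
 * start/end of .bss in RAM. All ranges are assumed to be word-aligned.
 */
void startup_init_memory(const uint32_t *data_load,
                         uint32_t *data_start, uint32_t *data_end,
                         uint32_t *bss_start, uint32_t *bss_end)
{
    assert(data_load != NULL);
    assert(data_start <= data_end && bss_start <= bss_end);

    /* Copy initialised globals from their flash image into RAM. */
    for (uint32_t *dst = data_start; dst < data_end; ++dst) {
        *dst = *data_load++;
    }

    /* Zero-fill .bss so zero-initialised globals really start at zero. */
    for (uint32_t *dst = bss_start; dst < bss_end; ++dst) {
        *dst = 0U;
    }
}
```

Only after both loops have run is it safe to call code that relies on global variables, which is why this happens before main().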

The vector table

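The vector table sits at the very start of flash on every Cortex-M chip. Its size depends on how many interrupts the chip implements, typically a few hundred bytes up to about 1 KB. It contains pointers to handler functions for every exception and interrupt the chip can generate. The first two entries are the initial stack pointer and the reset handler; the rest are the remaining system exception handlers (NMI, HardFault, SysTick and so on) followed by the peripheral interrupt handlers.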

// Excerpt from a typical startup file (simplified)
#include <stdint.h>

extern uint32_t _estack;        // top of stack, defined in linker script

void Reset_Handler(void);
void NMI_Handler(void);
void HardFault_Handler(void);
// ... many more handlers

// "used" keeps the table even though no C code references it;
// the second const marks the array itself as read-only.
__attribute__((section(".isr_vector"), used))
const void* const g_pfnVectors[] = {
    (void*)&_estack,         // 0x00000000: initial SP
    (void*)Reset_Handler,    // 0x00000004: reset
    (void*)NMI_Handler,      // 0x00000008: NMI
    (void*)HardFault_Handler,// 0x0000000C: hard fault
    // ... peripheral interrupts
};

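The section attribute tells the compiler to emit this array into a named section, .isr_vector. The linker script then places that section at the very start of flash. On STM32 parts that is 0x08000000, which the chip aliases to address 0x00000000 when it boots from flash, so the CPU finds the table where it expects it.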
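
To make the layout concrete, here is a small sketch of how entry positions in the table translate to byte offsets. The index values are the standard Armv7-M (Cortex-M3/M4/M7) system exception numbers; the enum and function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Positions of the 16 system entries on Armv7-M (Cortex-M3/M4/M7).
 * Cortex-M0/M0+ implements fewer faults; those slots are reserved there. */
enum vector_index {
    VEC_INITIAL_SP = 0,
    VEC_RESET      = 1,
    VEC_NMI        = 2,
    VEC_HARDFAULT  = 3,
    VEC_MEMMANAGE  = 4,
    VEC_BUSFAULT   = 5,
    VEC_USAGEFAULT = 6,
    /* 7..10 reserved */
    VEC_SVCALL     = 11,
    VEC_DEBUGMON   = 12,
    /* 13 reserved */
    VEC_PENDSV     = 14,
    VEC_SYSTICK    = 15,
    VEC_FIRST_IRQ  = 16    /* peripheral interrupt 0 */
};

/* Byte offset of a table entry from the start of the vector table. */
uint32_t vector_offset(uint32_t index)
{
    assert(index <= UINT32_MAX / 4U);   /* guard against overflow */
    return index * 4U;
}

/* Byte offset of the entry for peripheral interrupt number irqn (0-based). */
uint32_t irq_vector_offset(uint32_t irqn)
{
    return vector_offset((uint32_t)VEC_FIRST_IRQ + irqn);
}
```

For instance, vector_offset(VEC_SYSTICK) is 0x3C, and the first peripheral interrupt lives at offset 0x40.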

The linker script

A linker script is a small file in its own DSL that tells the linker where to place each section of the output binary. Most embedded projects have one, usually with a name like STM32F411xC_FLASH.ld. People treat it as magic; it is mostly a memory map.

/* Memory regions */
MEMORY
{
    FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 512K
    RAM (xrw)   : ORIGIN = 0x20000000, LENGTH = 128K
}

/* Top of stack at top of RAM */
_estack = ORIGIN(RAM) + LENGTH(RAM);

SECTIONS
{
    /* Vector table at start of flash */
    .isr_vector :
    {
        KEEP(*(.isr_vector))
    } > FLASH

    /* Code follows */
    .text :
    {
        *(.text*)
        *(.rodata*)
    } > FLASH

    /* Initialised data: stored in flash, copied to RAM at startup */
    _sidata = LOADADDR(.data);
    .data :
    {
        . = ALIGN(4);
        _sdata = .;
        *(.data*)
        . = ALIGN(4);       /* keeps word-wise copy loops in bounds */
        _edata = .;
    } > RAM AT> FLASH

    /* Zero-initialised data */
    .bss :
    {
        . = ALIGN(4);
        _sbss = .;
        *(.bss*)
        *(COMMON)
        . = ALIGN(4);
        _ebss = .;
    } > RAM
}

The linker script answers three questions for every byte of your program: where in flash does it live, where in RAM (if anywhere) does it run from, and what symbols mark its boundaries? The reset handler uses those boundary symbols (_sidata, _sdata, _edata, _sbss, _ebss) to copy and zero the right ranges.

Customisations you might write a linker script for:

  • Reserve the first 32 KB of flash for a bootloader, place application at 0x08008000.
  • Place a fast-access buffer in DTCM RAM (Cortex-M7) instead of regular SRAM.
  • Pin a specific function to a known address so a bootloader can call it.
  • Allocate a region for non-volatile config that survives reset.
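
The arithmetic behind the first customisation is simple enough to check in a few lines of C. This is only a sketch: region_t and split_flash are hypothetical names, and the numbers are the ones from the example memory map above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A contiguous memory region, as in a linker script MEMORY entry. */
typedef struct {
    uint32_t origin;
    uint32_t length;
} region_t;

/*
 * Hypothetical helper: carve boot_len bytes off the front of flash for a
 * bootloader and give the remainder to the application.
 * Returns false if the bootloader would leave no room for an application.
 */
bool split_flash(region_t flash, uint32_t boot_len,
                 region_t *boot, region_t *app)
{
    assert(boot != NULL && app != NULL);

    if (boot_len == 0U || boot_len >= flash.length) {
        return false;
    }
    boot->origin = flash.origin;
    boot->length = boot_len;
    app->origin  = flash.origin + boot_len;
    app->length  = flash.length - boot_len;
    return true;
}
```

With 512 KB of flash at 0x08000000 and a 32 KB bootloader, the application region starts at 0x08008000 and is 480 KB long. Those two values are what would go into the FLASH entry of the application's own linker script.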

Bootloaders

A bootloader is a small program that runs first, decides what to do, and optionally hands off to a larger application. Three reasons to have one:

Field firmware updates

The bootloader can receive a new firmware image (over UART, USB, BLE, WiFi) and write it to flash, then jump to it. Without a bootloader, updating firmware requires physical access to the SWD pins. With one, you can update the device over the air or via USB.

Multiple application images

Two application slots: one running, one for the next update. The bootloader decides which to boot based on a flag. Failed updates can roll back to the previous version.

Recovery

If the application fails to boot N times, the bootloader can fall back to a known-good recovery firmware that lets you re-flash without bricking the device.
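
The slot selection and fallback logic described above might look roughly like this. It is a sketch under stated assumptions, not how any particular bootloader does it: the boot_state_t layout, MAX_BOOT_ATTEMPTS, and the idea that the application clears boot_attempts once it is healthy are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { BOOT_SLOT_A, BOOT_SLOT_B, BOOT_RECOVERY } boot_target_t;

/* Hypothetical persistent state, e.g. kept in a reserved flash page. */
typedef struct {
    uint8_t active_slot;    /* 0 = slot A, 1 = slot B */
    uint8_t boot_attempts;  /* bumped by the bootloader, cleared by a healthy app */
    bool    slot_valid[2];  /* e.g. result of a CRC or signature check */
} boot_state_t;

#define MAX_BOOT_ATTEMPTS 3U   /* the "N" in "fails to boot N times" */

boot_target_t choose_boot_target(boot_state_t *st)
{
    assert(st != NULL && st->active_slot < 2U);

    uint8_t active = st->active_slot;
    uint8_t other  = (uint8_t)(1U - active);

    /* Normal case: active image is valid and has attempts left. */
    if (st->slot_valid[active] && st->boot_attempts < MAX_BOOT_ATTEMPTS) {
        st->boot_attempts++;
        return (active == 0U) ? BOOT_SLOT_A : BOOT_SLOT_B;
    }

    /* Active image is bad or keeps failing: retire it and roll back. */
    st->slot_valid[active] = false;
    if (st->slot_valid[other]) {
        st->active_slot   = other;
        st->boot_attempts = 1U;
        return (other == 0U) ? BOOT_SLOT_A : BOOT_SLOT_B;
    }

    /* Nothing bootable left: fall back to the recovery firmware. */
    return BOOT_RECOVERY;
}
```

Retiring the failed slot before rolling back stops the bootloader from bouncing between two broken images forever; once both are retired it ends up in recovery.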

The bootloader handoff is straightforward:

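#include <stdint.h>
#include "stm32f4xx.h"   // CMSIS device header for your chip (STM32F4 shown as an example)

void jump_to_application(uint32_t app_addr) {
    // Read application's stack pointer (first word) and entry point (second)
    // before touching SP, so no local is accessed through the old stack later
    uint32_t app_sp = *((volatile uint32_t*)app_addr);
    uint32_t app_pc = *((volatile uint32_t*)(app_addr + 4));
    void (*entry)(void) = (void(*)(void))app_pc;

    // Disable interrupts
    __disable_irq();

    // Set vector table to application's location and let the write take effect
    SCB->VTOR = app_addr;
    __DSB();
    __ISB();

    // Set SP, then jump
    __set_MSP(app_sp);
    entry();   // does not return
}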

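The application's startup code does its own initialisation as if it were running standalone. The bootloader just sets up the environment so the application can take over. One catch: the handoff calls __disable_irq() and never re-enables interrupts, so the application must call __enable_irq() itself once its own setup is done.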
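
Before jumping, it is worth checking that the two words the bootloader is about to trust look plausible. The sketch below is one possible sanity check, assuming the caller knows the application's flash range and the chip's RAM range; app_image_looks_valid and its parameters are hypothetical, and it is no substitute for a CRC or signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sanity-check the first two vector table entries of an application image.
 * Ranges are [start, end): end is one past the last valid byte.
 */
bool app_image_looks_valid(const uint32_t *vectors,
                           uint32_t flash_start, uint32_t flash_end,
                           uint32_t ram_start, uint32_t ram_end)
{
    assert(vectors != NULL);

    uint32_t sp = vectors[0];   /* initial stack pointer */
    uint32_t pc = vectors[1];   /* reset handler address */

    /* Erased flash reads as all ones: no image present. */
    if (sp == 0xFFFFFFFFU || pc == 0xFFFFFFFFU) {
        return false;
    }

    /* The stack grows down, so the initial SP may equal the end of RAM. */
    if (sp <= ram_start || sp > ram_end || (sp & 3U) != 0U) {
        return false;
    }

    /* Cortex-M only executes Thumb code, so bit 0 of the vector must be set. */
    if ((pc & 1U) == 0U) {
        return false;
    }

    uint32_t target = pc & ~1U;
    return target >= flash_start && target < flash_end;
}
```

A bootloader would typically call this with the application's base address cast to a pointer, and stay in update mode if it returns false.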

The DFU and USB-DFU protocols

Most STM32 chips ship with a bootloader pre-programmed in ROM (ST calls it system memory). On parts with a USB peripheral it typically includes USB DFU (Device Firmware Update): pull the BOOT0 pin high during reset and the chip enumerates as a USB DFU device. Tools like dfu-util can then flash a binary directly without any special hardware. This is how most USB-equipped STM32 boards are reflashed when SWD is not available.

ESP32 chips have a similar mechanism: hold GPIO 0 low during reset to enter ROM bootloader mode, then flash over the UART with esptool.py. Most ESP32 development boards do this automatically: an auto-reset circuit driven by the USB-serial adapter's DTR and RTS lines toggles EN and GPIO 0 for you.

What actually goes wrong

  • Stack pointer mis-set. Forgot to update _estack after changing RAM size; SP points outside RAM; first push crashes.
  • Vector table misalignment. Cortex-M requires the vector table to be aligned to its size rounded up to a power of two. Custom-placed tables that violate this fail silently.
  • BSS not zeroed. Custom startup code that forgets to zero .bss leaves zero-initialised globals holding whatever was in RAM: unpredictable on cold boot, stale values from the previous run on warm reset.
  • Data not copied. Same idea for .data: if you forget to copy initialised globals from flash to RAM, every int x = 5; reads wrong. (A const int lives in flash as .rodata and is unaffected.)
  • Bootloader does not relocate VTOR. Application's interrupts go to the bootloader's old vector table instead of its own; weird interrupt behaviour ensues.
  • Watchdog enabled in bootloader, not fed by application. Application boots, runs briefly, then resets. Easy to miss because everything looks fine for a few seconds.

Frequently Asked Questions

Do I need to write my own bootloader?

Almost certainly not for hobby projects. Use the chip's ROM bootloader (DFU on STM32, esptool on ESP32) or a popular open-source one (MCUboot is the standard for Zephyr / nRF). Writing your own is a learning exercise; using a battle-tested one is the right move for production.

What is MCUboot?

An open-source secure bootloader for ARM Cortex-M and RISC-V. Supports image signing, encryption, two-slot updates, rollback. Used by Zephyr, Apache Mynewt, and increasingly elsewhere. If you need a serious bootloader, start there.

How big is the vector table?

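A Cortex-M vector table has 16 system entries (initial SP, reset, and the other core exceptions) plus one entry per peripheral interrupt the chip implements, each 4 bytes. A typical Cortex-M3/M4/M7 part lands at around 100–250 entries, so 400 bytes to 1 KB; small Cortex-M0 parts have fewer. The startup file supplied for your chip matches it exactly.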
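
Both the table size and the alignment rule from "What actually goes wrong" follow from the interrupt count. Here is a minimal sketch, assuming the usual Cortex-M3/M4/M7 rule of a 128-byte minimum alignment rounded up to the next power of two that covers the table; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 16 system entries plus one per peripheral interrupt, 4 bytes each. */
uint32_t vector_table_size_bytes(uint32_t num_irqs)
{
    assert(num_irqs < 1024U);   /* arbitrary sanity bound for this sketch */
    return (16U + num_irqs) * 4U;
}

/* Smallest power of two >= table size, with a floor of 128 bytes. */
uint32_t vector_table_alignment(uint32_t num_irqs)
{
    uint32_t size  = vector_table_size_bytes(num_irqs);
    uint32_t align = 128U;

    while (align < size) {
        align <<= 1;
    }
    return align;
}

/* True if addr is a legal place for a table with num_irqs interrupts. */
bool vtor_address_ok(uint32_t addr, uint32_t num_irqs)
{
    return (addr % vector_table_alignment(num_irqs)) == 0U;
}
```

A chip with 82 peripheral interrupts, for example, has a 392-byte table that must sit on a 512-byte boundary, so an application at 0x08008000 is fine while one at 0x08008100 is not.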

Why does my linker complain about "region FLASH overflowed"?

Your binary is bigger than the flash region you defined. Either the chip has less flash than you specified, or your code grew too large. Check the linker output for the size summary; reduce code size or migrate to a larger chip.

Can I run code from RAM instead of flash?

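Yes. Some code (interrupt handlers, time-critical loops) benefits from running in RAM because flash access on some chips has wait states. Mark the function with an attribute such as __attribute__((section(".ramfunc"))) and add a matching section to the linker script that is stored in flash but linked to run at a RAM address, the same way .data is. The startup code then has to copy it on boot; some vendor linker scripts place such a section inside .data so the existing copy loop covers it.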
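
On the C side this is a one-line annotation. The sketch below assumes a GCC-compatible compiler and a linker script that collects .ramfunc into a RAM-resident section; the RAMFUNC macro and checksum_words are hypothetical names. Built for a host, the attribute just places the function in an extra section and the code runs as usual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Place a function in the .ramfunc section on GCC-compatible ELF
 * toolchains; expand to nothing elsewhere so the code still builds. */
#if defined(__GNUC__) && defined(__ELF__)
#define RAMFUNC __attribute__((section(".ramfunc")))
#else
#define RAMFUNC
#endif

/* Example of a tight loop one might want to run from RAM. */
RAMFUNC uint32_t checksum_words(const uint32_t *buf, size_t n)
{
    assert(buf != NULL || n == 0U);

    uint32_t sum = 0U;
    for (size_t i = 0U; i < n; ++i) {
        sum += buf[i];   /* unsigned addition wraps modulo 2^32 */
    }
    return sum;
}
```

On the target, the function is only safe to call after the startup code has copied the section into RAM.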
