ULP coprocessor programming

Warning

The ULP coprocessor programming approach described here is experimental. Once binutils support for the ULP is complete, this preprocessor-based approach will probably be deprecated. We welcome discussion of and contributions to ULP programming tools.

The ULP coprocessor is a simple FSM designed to perform measurements using the ADC, temperature sensor, and external I2C sensors while the main processors are in deep sleep mode. It can access the RTC_SLOW_MEM memory region and registers in the RTC_CNTL, RTC_IO, and SARADC peripherals. It uses fixed-width 32-bit instructions and 32-bit memory addressing, and has 4 general purpose 16-bit registers.

The ULP coprocessor doesn’t have a dedicated binutils port yet. It can be programmed by embedding assembly-like macros into an ESP32 application. Here is an example of how this can be done:

const ulp_insn_t program[] = {
    I_MOVI(R3, 16),         // R3 <- 16
    I_LD(R0, R3, 0),        // R0 <- RTC_SLOW_MEM[R3 + 0]
    I_LD(R1, R3, 1),        // R1 <- RTC_SLOW_MEM[R3 + 1]
    I_ADDR(R2, R0, R1),     // R2 <- R0 + R1
    I_ST(R2, R3, 2),        // R2 -> RTC_SLOW_MEM[R3 + 2]
    I_HALT()
};
size_t load_addr = 0;
size_t size = sizeof(program)/sizeof(ulp_insn_t);
ulp_process_macros_and_load(load_addr, program, &size);
ulp_run(load_addr);

The program array is an array of ulp_insn_t, i.e. ULP coprocessor instructions. Each I_XXX preprocessor define translates into a single 32-bit instruction. Arguments of these defines can be register numbers (R0 to R3) and literal constants. See the ULP coprocessor instruction defines section for descriptions of the instructions and the arguments they take.

Load and store instructions use addresses expressed in 32-bit words. Address 0 corresponds to the first word of RTC_SLOW_MEM (which is address 0x50000000 as seen by the main CPUs).
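
As a quick illustration of the word addressing described above, the following sketch converts between ULP word addresses and the byte addresses seen by the main CPUs. The helper names ulp_word_to_cpu_addr and cpu_addr_to_ulp_word are hypothetical and are not part of the ULP API; only the base address 0x50000000 comes from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Base address of RTC_SLOW_MEM as seen by the main CPUs (from the text). */
#define RTC_SLOW_MEM_BASE 0x50000000u

/* Hypothetical helper: ULP word address -> main CPU byte address. */
static inline uint32_t ulp_word_to_cpu_addr(uint32_t word_addr)
{
    return RTC_SLOW_MEM_BASE + word_addr * (uint32_t)sizeof(uint32_t);
}

/* Hypothetical helper: main CPU byte address -> ULP word address.
 * The address must be word aligned and not below the base address. */
static inline uint32_t cpu_addr_to_ulp_word(uint32_t cpu_addr)
{
    assert(cpu_addr >= RTC_SLOW_MEM_BASE && (cpu_addr & 3u) == 0);
    return (cpu_addr - RTC_SLOW_MEM_BASE) / (uint32_t)sizeof(uint32_t);
}
```

For instance, word address 16, used in the example above, corresponds to byte address 0x50000040.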

To generate branch instructions, special M_ preprocessor defines are used. The M_LABEL define marks a branch target; the label identifier is a 16-bit integer. The M_Bxxx defines generate branch instructions whose target is a particular label.

Implementation note: each M_Bxxx define is translated into two ulp_insn_t values: a token which contains the label number, followed by the actual branch instruction. The ulp_process_macros_and_load function resolves the label number to an address, modifies the branch instruction to use the correct address, and removes the extra ulp_insn_t token which contains the label number.
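
The label resolution described in the implementation note can be sketched as a two-pass walk over the token stream. This is a simplified host-side model, not the code of ulp_process_macros_and_load: the tok_t type, the resolve_labels and find_label functions, the MAX_LABELS limit, and the negative return values are all hypothetical, and the model stores an absolute word address where the real function patches the encoded branch fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for ulp_insn_t; NOT the real encoding. */
typedef enum { TOK_INSN, TOK_LABEL, TOK_BRANCH } tok_kind_t;

typedef struct {
    tok_kind_t kind;    /* real instruction, label token, or branch token */
    uint16_t   label;   /* label number, used by TOK_LABEL and TOK_BRANCH */
    int32_t    target;  /* resolved word address, filled in for branches  */
} tok_t;

#define MAX_LABELS 16   /* arbitrary limit for this sketch */

/* Look up a label number; returns its word address or -1 if undefined. */
static int32_t find_label(const uint16_t *ids, const int32_t *addrs,
                          size_t n, uint16_t label)
{
    for (size_t k = 0; k < n; k++) {
        if (ids[k] == label) {
            return addrs[k];
        }
    }
    return -1;
}

/* Returns the number of real instructions written to out, or
 * -1 on a duplicate label, -2 on an undefined label, -3 on too many labels.
 * out must have room for n_in entries. */
static int resolve_labels(const tok_t *in, size_t n_in, tok_t *out)
{
    uint16_t ids[MAX_LABELS];
    int32_t  addrs[MAX_LABELS];
    size_t   n_labels = 0;
    int32_t  pc = 0;

    assert(in != NULL && out != NULL);

    /* Pass 1: a label refers to the next real instruction. */
    for (size_t i = 0; i < n_in; i++) {
        if (in[i].kind == TOK_INSN) {
            pc++;
        } else if (in[i].kind == TOK_LABEL) {
            if (find_label(ids, addrs, n_labels, in[i].label) >= 0) {
                return -1;
            }
            if (n_labels == MAX_LABELS) {
                return -3;
            }
            ids[n_labels] = in[i].label;
            addrs[n_labels] = pc;
            n_labels++;
        }
    }

    /* Pass 2: drop the tokens, patch the instruction after a branch token. */
    size_t n_out = 0;
    int pending = 0;
    uint16_t pending_label = 0;
    for (size_t i = 0; i < n_in; i++) {
        if (in[i].kind == TOK_BRANCH) {
            pending = 1;
            pending_label = in[i].label;
        } else if (in[i].kind == TOK_INSN) {
            out[n_out] = in[i];
            if (pending) {
                int32_t addr = find_label(ids, addrs, n_labels, pending_label);
                if (addr < 0) {
                    return -2;
                }
                out[n_out].target = addr;
                pending = 0;
            }
            n_out++;
        }
    }
    return (int)n_out;
}
```

Two passes are used so that a branch can refer to a label defined later in the array. The output array is shorter than the input because label and branch tokens are dropped.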

Here is an example of using labels and branches:

const ulp_insn_t program[] = {
    I_MOVI(R0, 34),         // R0 <- 34
    M_LABEL(1),             // label_1
    I_MOVI(R1, 32),         // R1 <- 32
    I_LD(R1, R1, 0),        // R1 <- RTC_SLOW_MEM[R1]
    I_MOVI(R2, 33),         // R2 <- 33
    I_LD(R2, R2, 0),        // R2 <- RTC_SLOW_MEM[R2]
    I_SUBR(R3, R1, R2),     // R3 <- R1 - R2
    I_ST(R3, R0, 0),        // R3 -> RTC_SLOW_MEM[R0 + 0]
    I_ADDI(R0, R0, 1),      // R0++
    M_BL(1, 64),            // if (R0 < 64) goto label_1
    I_HALT(),
};
RTC_SLOW_MEM[32] = 42;
RTC_SLOW_MEM[33] = 18;
size_t load_addr = 0;
size_t size = sizeof(program)/sizeof(ulp_insn_t);
ulp_process_macros_and_load(load_addr, program, &size);
ulp_run(load_addr);
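
To make the effect of the program above easier to follow, here is a host-side C model of the same loop. It is only a sketch: the function name model_label_example is hypothetical, an ordinary array stands in for RTC_SLOW_MEM, and only the 16 LSBs of each stored word are modelled (see the I_ST description for the full layout of a stored word).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the label/branch example; mem stands in for
 * RTC_SLOW_MEM and must hold at least 64 words. */
static void model_label_example(uint32_t *mem, size_t n_words)
{
    assert(mem != NULL && n_words >= 64);

    uint16_t r0 = 34;                                /* I_MOVI(R0, 34)      */
    do {                                             /* M_LABEL(1)          */
        uint16_t r1 = (uint16_t)(mem[32] & 0xFFFFu); /* I_MOVI(R1, 32), I_LD */
        uint16_t r2 = (uint16_t)(mem[33] & 0xFFFFu); /* I_MOVI(R2, 33), I_LD */
        uint16_t r3 = (uint16_t)(r1 - r2);           /* I_SUBR(R3, R1, R2)  */
        mem[r0] = r3;                                /* I_ST(R3, R0, 0)     */
        r0++;                                        /* I_ADDI(R0, R0, 1)   */
    } while (r0 < 64);                               /* M_BL(1, 64)         */
}
```

With the inputs 42 and 18 used above, the model leaves the value 24 in words 34 to 63 and does not modify words 32 and 33.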

Functions

esp_err_t ulp_process_macros_and_load(uint32_t load_addr, const ulp_insn_t *program, size_t *psize)

Resolve all macro references in a program and load it into RTC memory.

Return
  • ESP_OK on success
  • ESP_ERR_NO_MEM if an auxiliary temporary structure cannot be allocated
  • one of ESP_ERR_ULP_xxx if the program is not valid or cannot be loaded
Parameters
  • load_addr: address where the program should be loaded, expressed in 32-bit words
  • program: ulp_insn_t array with the program
  • psize: size of the program, expressed in 32-bit words

esp_err_t ulp_run(uint32_t entry_point)

Run the program loaded into RTC memory.

Return
  • ESP_OK on success
Parameters
  • entry_point: entry point, expressed in 32-bit words

Error codes

ESP_ERR_ULP_BASE 0x1200

Offset for ULP-related error codes

ESP_ERR_ULP_SIZE_TOO_BIG (ESP_ERR_ULP_BASE + 1)

Program doesn’t fit into RTC memory reserved for the ULP

ESP_ERR_ULP_INVALID_LOAD_ADDR (ESP_ERR_ULP_BASE + 2)

Load address is outside of RTC memory reserved for the ULP

ESP_ERR_ULP_DUPLICATE_LABEL (ESP_ERR_ULP_BASE + 3)

More than one label with the same number was defined

ESP_ERR_ULP_UNDEFINED_LABEL (ESP_ERR_ULP_BASE + 4)

Branch instruction references an undefined label

ESP_ERR_ULP_BRANCH_OUT_OF_RANGE (ESP_ERR_ULP_BASE + 5)

Branch target is out of range of B instruction (try replacing with BX)
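
When logging a failure it can help to turn these codes back into their names. The sketch below repeats the values listed above so that it compiles on its own; on target they would come from the ULP header instead. The helper ulp_err_name is hypothetical and not part of the ULP API.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the list above. */
#define ESP_ERR_ULP_BASE                0x1200
#define ESP_ERR_ULP_SIZE_TOO_BIG        (ESP_ERR_ULP_BASE + 1)
#define ESP_ERR_ULP_INVALID_LOAD_ADDR   (ESP_ERR_ULP_BASE + 2)
#define ESP_ERR_ULP_DUPLICATE_LABEL     (ESP_ERR_ULP_BASE + 3)
#define ESP_ERR_ULP_UNDEFINED_LABEL     (ESP_ERR_ULP_BASE + 4)
#define ESP_ERR_ULP_BRANCH_OUT_OF_RANGE (ESP_ERR_ULP_BASE + 5)

/* Sanity check of the arithmetic: the last code is 0x1205. */
static_assert(ESP_ERR_ULP_BRANCH_OUT_OF_RANGE == 0x1205,
              "unexpected ULP error code value");

/* Hypothetical helper: name of a ULP error code, or NULL for other codes. */
static const char *ulp_err_name(int err)
{
    switch (err) {
    case ESP_ERR_ULP_SIZE_TOO_BIG:        return "ESP_ERR_ULP_SIZE_TOO_BIG";
    case ESP_ERR_ULP_INVALID_LOAD_ADDR:   return "ESP_ERR_ULP_INVALID_LOAD_ADDR";
    case ESP_ERR_ULP_DUPLICATE_LABEL:     return "ESP_ERR_ULP_DUPLICATE_LABEL";
    case ESP_ERR_ULP_UNDEFINED_LABEL:     return "ESP_ERR_ULP_UNDEFINED_LABEL";
    case ESP_ERR_ULP_BRANCH_OUT_OF_RANGE: return "ESP_ERR_ULP_BRANCH_OUT_OF_RANGE";
    default:                              return NULL;
    }
}
```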

ULP coprocessor registers

The ULP coprocessor has 4 16-bit general purpose registers. All registers have the same functionality, with one exception: the R0 register is used by some of the compare-and-branch instructions as a source register.

These definitions can be used for all instructions which require a register.

R0 0

general purpose register 0

R1 1

general purpose register 1

R2 2

general purpose register 2

R3 3

general purpose register 3

ULP coprocessor instruction defines

I_DELAY(cycles_) { .delay = {\ .opcode = OPCODE_DELAY, \ .unused = 0, \ .cycles = cycles_ } }

Delay (nop) for a given number of cycles

I_HALT() { .halt = {\ .unused = 0, \ .opcode = OPCODE_HALT } }

Halt the coprocessor

I_END(wake) { .end = { \ .wakeup = wake, \ .unused = 0, \ .sub_opcode = SUB_OPCODE_END, \ .opcode = OPCODE_END } }

End program.

If wake == 1, wake up main CPU.

I_ST(reg_val, reg_addr, offset_) { .st = { \ .dreg = reg_val, \ .sreg = reg_addr, \ .unused1 = 0, \ .offset = offset_, \ .unused2 = 0, \ .sub_opcode = SUB_OPCODE_ST, \ .opcode = OPCODE_ST } }

Store value from register reg_val into RTC memory.

The value is written to the address calculated by adding the value of the reg_addr register and the offset_ field (this offset is expressed in 32-bit words). The 32 bits written to RTC memory are built as follows:

  • 5 MSBs are zero
  • next 11 bits hold the PC of current instruction, expressed in 32-bit words
  • next 16 bits hold the actual value to be written

RTC_SLOW_MEM[addr + offset_] = { 5’b0, insn_PC[10:0], val[15:0] }
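
The layout above can be expressed with a few shifts and masks. The following is a minimal sketch for code running on the main CPU that wants to build or take apart such a word; the helper names ulp_st_word, ulp_word_value, and ulp_word_pc are hypothetical and not part of the ULP API.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 32-bit word as described: { 5'b0, insn_PC[10:0], val[15:0] }. */
static inline uint32_t ulp_st_word(uint16_t insn_pc, uint16_t val)
{
    assert(insn_pc <= 0x7FFu);              /* the PC field is 11 bits wide */
    return ((uint32_t)insn_pc << 16) | val;
}

/* Extract the 16-bit value held in the LSBs of a stored word. */
static inline uint16_t ulp_word_value(uint32_t word)
{
    return (uint16_t)(word & 0xFFFFu);
}

/* Extract the PC (in 32-bit words) of the instruction that stored the word. */
static inline uint16_t ulp_word_pc(uint32_t word)
{
    return (uint16_t)((word >> 16) & 0x7FFu);
}
```

A main CPU reader that only cares about the stored value should therefore mask the word with 0xFFFF rather than use it directly.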

I_LD(reg_dest, reg_addr, offset_) { .ld = { \ .dreg = reg_dest, \ .sreg = reg_addr, \ .unused1 = 0, \ .offset = offset_, \ .unused2 = 0, \ .opcode = OPCODE_LD } }

Load value from RTC memory into reg_dest register.

Loads the 16 LSBs of the RTC memory word whose address is the sum of the value in reg_addr and offset_.

I_WR_REG(reg, low_bit, high_bit, val) {.wr_reg = {\ .addr = reg & 0xff, \ .periph_sel = SOC_REG_TO_ULP_PERIPH_SEL(reg), \ .data = val, \ .low = low_bit, \ .high = high_bit, \ .opcode = OPCODE_WR_REG } }

Write literal value to a peripheral register

reg[high_bit : low_bit] = val

This instruction can access RTC_CNTL_, RTC_IO_, and SENS_ peripheral registers.

I_RD_REG(reg, low_bit, high_bit, val) {.wr_reg = {\ .addr = reg & 0xff, \ .periph_sel = SOC_REG_TO_ULP_PERIPH_SEL(reg), \ .unused = 0, \ .low = low_bit, \ .high = high_bit, \ .opcode = OPCODE_RD_REG } }

Read from peripheral register into R0

R0 = reg[high_bit : low_bit]

This instruction can access RTC_CNTL_, RTC_IO_, and SENS_ peripheral registers.
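
The reg[high_bit : low_bit] notation used for I_WR_REG and I_RD_REG denotes a bit field that includes both end bits. The sketch below models that notation on a plain 32-bit value; it is not how the hardware is implemented, any limits the hardware places on field width are not modelled, and the function names field_mask, field_write, and field_read are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering bits low..high inclusive of a 32-bit register. */
static inline uint32_t field_mask(unsigned low, unsigned high)
{
    assert(low <= high && high < 32);
    unsigned width = high - low + 1;
    uint32_t ones = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << low;
}

/* Model of reg[high : low] = val; returns the new register value. */
static inline uint32_t field_write(uint32_t reg, unsigned low, unsigned high,
                                   uint32_t val)
{
    uint32_t mask = field_mask(low, high);
    return (reg & ~mask) | ((val << low) & mask);
}

/* Model of R0 = reg[high : low]. */
static inline uint32_t field_read(uint32_t reg, unsigned low, unsigned high)
{
    return (reg & field_mask(low, high)) >> low;
}
```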

I_BL(pc_offset, imm_value) { .b = { \ .imm = imm_value, \ .cmp = B_CMP_L, \ .offset = abs(pc_offset), \ .sign = (pc_offset >= 0) ? 0 : 1, \ .sub_opcode = SUB_OPCODE_B, \ .opcode = OPCODE_BRANCH } }

Branch relative if R0 is less than the immediate value.

pc_offset is expressed in words and can be from -127 to 127. imm_value is a 16-bit value to compare R0 against.

I_BGE(pc_offset, imm_value) { .b = { \ .imm = imm_value, \ .cmp = B_CMP_GE, \ .offset = abs(pc_offset), \ .sign = (pc_offset >= 0) ? 0 : 1, \ .sub_opcode = SUB_OPCODE_B, \ .opcode = OPCODE_BRANCH } }

Branch relative if R0 is greater than or equal to the immediate value.

pc_offset is expressed in words and can be from -127 to 127. imm_value is a 16-bit value to compare R0 against.
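
The I_BL and I_BGE defines store pc_offset as a magnitude (abs(pc_offset)) plus a separate sign bit. The sketch below mirrors that encoding on the host and adds the range check implied by the -127 to 127 limit. The b_offset_t type and both function names are hypothetical; the real fields live inside ulp_insn_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical holder for the two fields set by I_BL and I_BGE. */
typedef struct {
    uint8_t offset;  /* magnitude of the jump, in words */
    uint8_t sign;    /* 0 = forward, 1 = backward       */
} b_offset_t;

/* Encode pc_offset; returns 0 on success, -1 if it is out of range. */
static int b_offset_encode(int pc_offset, b_offset_t *enc)
{
    assert(enc != NULL);
    if (pc_offset < -127 || pc_offset > 127) {
        return -1;
    }
    enc->offset = (uint8_t)abs(pc_offset);
    enc->sign = (pc_offset >= 0) ? 0 : 1;
    return 0;
}

/* Recover the signed offset from the two fields. */
static int b_offset_decode(b_offset_t enc)
{
    return enc.sign ? -(int)enc.offset : (int)enc.offset;
}
```

A target further away than 127 words cannot be encoded this way, which is the situation reported as ESP_ERR_ULP_BRANCH_OUT_OF_RANGE.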

I_BXR(reg_pc) { .bx = { \ .dreg = reg_pc, \ .addr = 0, \ .unused = 0, \ .reg = 1, \ .type = BX_JUMP_TYPE_DIRECT, \ .sub_opcode = SUB_OPCODE_BX, \ .opcode = OPCODE_BRANCH } }

Unconditional branch to absolute PC, address in register.

reg_pc is the register which contains address to jump to. Address is expressed in 32-bit words.

I_BXI(imm_pc) { .bx = { \ .dreg = 0, \ .addr = imm_pc, \ .unused = 0, \ .reg = 0, \ .type = BX_JUMP_TYPE_DIRECT, \ .sub_opcode = SUB_OPCODE_BX, \ .opcode = OPCODE_BRANCH } }

Unconditional branch to absolute PC, immediate address.

Address imm_pc is expressed in 32-bit words.

I_BXZR(reg_pc) { .bx = { \ .dreg = reg_pc, \ .addr = 0, \ .unused = 0, \ .reg = 1, \ .type = BX_JUMP_TYPE_ZERO, \ .sub_opcode = SUB_OPCODE_BX, \ .opcode = OPCODE_BRANCH } }

Branch to absolute PC if ALU result is zero, address in register.

reg_pc is the register which contains address to jump to. Address is expressed in 32-bit words.

I_BXZI(imm_pc) { .bx = { \ .dreg = 0, \ .addr = imm_pc, \ .unused = 0, \ .reg = 0, \ .type = BX_JUMP_TYPE_ZERO, \ .sub_opcode = SUB_OPCODE_BX, \ .opcode = OPCODE_BRANCH } }

Branch to absolute PC if ALU result is zero, immediate address.

Address imm_pc is expressed in 32-bit words.

I_BXFR(reg_pc) { .bx = { \ .dreg = reg_pc, \ .addr = 0, \ .unused = 0, \ .reg = 1, \ .type = BX_JUMP_TYPE_OVF, \ .sub_opcode = SUB_OPCODE_BX, \ .opcode = OPCODE_BRANCH } }

Branch to absolute PC if ALU overflow, address in register

reg_pc is the register which contains address to jump to. Address is expressed in 32-bit words.

I_BXFI(imm_pc) { .bx = { \ .dreg = 0, \ .addr = imm_pc, \ .unused = 0, \ .reg = 0, \ .type = BX_JUMP_TYPE_OVF, \ .sub_opcode = SUB_OPCODE_BX, \ .opcode = OPCODE_BRANCH } }

Branch to absolute PC if ALU overflow, immediate address

Address imm_pc is expressed in 32-bit words.

I_ADDR(reg_dest, reg_src1, reg_src2) { .alu_reg = { \ .dreg = reg_dest, \ .sreg = reg_src1, \ .treg = reg_src2, \ .unused = 0, \ .sel = ALU_SEL_ADD, \ .sub_opcode = SUB_OPCODE_ALU_REG, \ .opcode = OPCODE_ALU } }

Addition: dest = src1 + src2

I_SUBR(reg_dest, reg_src1, reg_src2) { .alu_reg = { \ .dreg = reg_dest, \ .sreg = reg_src1, \ .treg = reg_src2, \ .unused = 0, \ .sel = ALU_SEL_SUB, \ .sub_opcode = SUB_OPCODE_ALU_REG, \ .opcode = OPCODE_ALU } }

Subtraction: dest = src1 - src2

I_ANDR(reg_dest, reg_src1, reg_src2) { .alu_reg = { \ .dreg = reg_dest, \ .sreg = reg_src1, \ .treg = reg_src2, \ .unused = 0, \ .sel = ALU_SEL_AND, \ .sub_opcode = SUB_OPCODE_ALU_REG, \ .opcode = OPCODE_ALU } }

Logical AND: dest = src1 & src2

I_ORR(reg_dest, reg_src1, reg_src2) { .alu_reg = { \ .dreg = reg_dest, \ .sreg = reg_src1, \ .treg = reg_src2, \ .unused = 0, \ .sel = ALU_SEL_OR, \ .sub_opcode = SUB_OPCODE_ALU_REG, \ .opcode = OPCODE_ALU } }

Logical OR: dest = src1 | src2

I_MOVR(reg_dest, reg_src) { .alu_reg = { \ .dreg = reg_dest, \ .sreg = reg_src, \ .treg = 0, \ .unused = 0, \ .sel = ALU_SEL_MOV, \ .sub_opcode = SUB_OPCODE_ALU_REG, \ .opcode = OPCODE_ALU } }

Copy: dest = src

I_LSHR(reg_dest, reg_src, reg_shift) { .alu_reg = { \ .dreg = reg_dest, \ .sreg = reg_src, \ .treg = reg_shift, \ .unused = 0, \ .sel = ALU_SEL_LSH, \ .sub_opcode = SUB_OPCODE_ALU_REG, \ .opcode = OPCODE_ALU } }

Logical shift left: dest = src << shift

I_RSHR(reg_dest, reg_src, reg_shift) { .alu_reg = { \ .dreg = reg_dest, \ .sreg = reg_src, \ .treg = reg_shift, \ .unused = 0, \ .sel = ALU_SEL_RSH, \ .sub_opcode = SUB_OPCODE_ALU_REG, \ .opcode = OPCODE_ALU } }

Logical shift right: dest = src >> shift

I_ADDI(reg_dest, reg_src, imm_) { .alu_imm = { \ .dreg = reg_dest, \ .sreg = reg_src, \ .imm = imm_, \ .unused = 0, \ .sel = ALU_SEL_ADD, \ .sub_opcode = SUB_OPCODE_ALU_IMM, \ .opcode = OPCODE_ALU } }

Add register and an immediate value: dest = src + imm

I_SUBI(reg_dest, reg_src, imm_) { .alu_imm = { \ .dreg = reg_dest, \ .sreg = reg_src, \ .imm = imm_, \ .unused = 0, \ .sel = ALU_SEL_SUB, \ .sub_opcode = SUB_OPCODE_ALU_IMM, \ .opcode = OPCODE_ALU } }

Subtract an immediate value from register: dest = src - imm

I_ANDI(reg_dest, reg_src, imm_) { .alu_imm = { \ .dreg = reg_dest, \ .sreg = reg_src, \ .imm = imm_, \ .unused = 0, \ .sel = ALU_SEL_AND, \ .sub_opcode = SUB_OPCODE_ALU_IMM, \ .opcode = OPCODE_ALU } }

Logical AND register and an immediate value: dest = src & imm

I_ORI(reg_dest, reg_src, imm_) { .alu_imm = { \ .dreg = reg_dest, \ .sreg = reg_src, \ .imm = imm_, \ .unused = 0, \ .sel = ALU_SEL_OR, \ .sub_opcode = SUB_OPCODE_ALU_IMM, \ .opcode = OPCODE_ALU } }

Logical OR register and an immediate value: dest = src | imm

I_MOVI(reg_dest, imm_) { .alu_imm = { \ .dreg = reg_dest, \ .sreg = 0, \ .imm = imm_, \ .unused = 0, \ .sel = ALU_SEL_MOV, \ .sub_opcode = SUB_OPCODE_ALU_IMM, \ .opcode = OPCODE_ALU } }

Copy an immediate value into register: dest = imm

I_LSHI(reg_dest, reg_src, imm_) { .alu_imm = { \ .dreg = reg_dest, \ .sreg = reg_src, \ .imm = imm_, \ .unused = 0, \ .sel = ALU_SEL_LSH, \ .sub_opcode = SUB_OPCODE_ALU_IMM, \ .opcode = OPCODE_ALU } }

Logical shift left register value by an immediate: dest = src << imm

I_RSHI(reg_dest, reg_src, imm_) { .alu_imm = { \ .dreg = reg_dest, \ .sreg = reg_src, \ .imm = imm_, \ .unused = 0, \ .sel = ALU_SEL_RSH, \ .sub_opcode = SUB_OPCODE_ALU_IMM, \ .opcode = OPCODE_ALU } }

Logical shift right register value by an immediate: dest = src >> imm

M_LABEL(label_num) { .macro = { \ .label = label_num, \ .unused = 0, \ .sub_opcode = SUB_OPCODE_MACRO_LABEL, \ .opcode = OPCODE_MACRO } }

Define a label with number label_num.

This is a macro which doesn’t generate a real instruction. The token generated by this macro is removed by the ulp_process_macros_and_load function. A label defined using this macro can be used in the branch macros defined below.

M_BL(label_num, imm_value) M_BRANCH(label_num), \ I_BL(0, imm_value)

Macro: branch to label label_num if R0 is less than immediate value.

This macro generates two ulp_insn_t values separated by a comma, and should be used when defining contents of ulp_insn_t arrays. First value is not a real instruction; it is a token which is removed by ulp_process_macros_and_load function.

M_BGE(label_num, imm_value) M_BRANCH(label_num), \ I_BGE(0, imm_value)

Macro: branch to label label_num if R0 is greater than or equal to immediate value

This macro generates two ulp_insn_t values separated by a comma, and should be used when defining contents of ulp_insn_t arrays. First value is not a real instruction; it is a token which is removed by ulp_process_macros_and_load function.

M_BX(label_num) M_BRANCH(label_num), \ I_BXI(0)

Macro: unconditional branch to label

This macro generates two ulp_insn_t values separated by a comma, and should be used when defining contents of ulp_insn_t arrays. First value is not a real instruction; it is a token which is removed by ulp_process_macros_and_load function.

M_BXZ(label_num) M_BRANCH(label_num), \ I_BXZI(0)

Macro: branch to label if ALU result is zero

This macro generates two ulp_insn_t values separated by a comma, and should be used when defining contents of ulp_insn_t arrays. First value is not a real instruction; it is a token which is removed by ulp_process_macros_and_load function.

M_BXF(label_num) M_BRANCH(label_num), \ I_BXFI(0)

Macro: branch to label if ALU overflow

This macro generates two ulp_insn_t values separated by a comma, and should be used when defining contents of ulp_insn_t arrays. First value is not a real instruction; it is a token which is removed by ulp_process_macros_and_load function.

Defines

RTC_SLOW_MEM ((uint32_t*) 0x50000000)

RTC slow memory, 8 KB in size
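
Since load addresses and program sizes are expressed in 32-bit words, it can be useful to convert the size of RTC slow memory into words as well. The sketch below does that and checks whether a program would fit. It assumes the stated size means 8 * 1024 bytes; the macro names and the ulp_range_fits helper are hypothetical, and the part of RTC memory actually reserved for the ULP may be smaller than the whole region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the stated size of RTC slow memory is 8 * 1024 bytes. */
#define RTC_SLOW_MEM_SIZE_BYTES (8u * 1024u)
#define RTC_SLOW_MEM_SIZE_WORDS (RTC_SLOW_MEM_SIZE_BYTES / sizeof(uint32_t))

static_assert(RTC_SLOW_MEM_SIZE_WORDS == 2048, "8 KB is 2048 32-bit words");

/* Hypothetical check: does a program of size_words words, loaded at word
 * address load_addr, lie entirely inside RTC slow memory? */
static int ulp_range_fits(uint32_t load_addr, size_t size_words)
{
    if (load_addr >= RTC_SLOW_MEM_SIZE_WORDS) {
        return 0;
    }
    return size_words <= RTC_SLOW_MEM_SIZE_WORDS - load_addr;
}
```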