There are some errors in these tables. See the trace outputs in the Example Analyses section below for details.
The cycle counts shown for instructions in PM0044 section 7 are one less than the actual counts because the first decode cycle of an instruction normally overlaps with the last execution cycle of the preceding instruction.
If timings are important in your application, stall cycles can be reported as error/warning events.
0> show error
  Error: non-classified [on/ON]
    [...]
    Error: stm8 [off/OFF]
      Warning: pipeline [unset/OFF]
        Warning: decode_stall [unset/OFF]
        Warning: fetch_stall [unset/OFF]
    [...]
These are off by default but may be enabled as required, either as a group:
0> set error pipeline
or individually:
0> set error decode_stall on
0> set error fetch_stall on
The simulator is able to generate detailed analyses of execution showing timings for each instruction executed including pipeline overlaps and stalls. This is controlled via the pipetrace feature of the STM8 CPU module. The output is in the form of a self-contained HTML document that can be opened with a browser or imported into other application documentation.
To generate a pipeline analysis:
0> set hw cpu pipetrace title "..."
0> set hw cpu pipetrace style "url"
0> set hw cpu pipetrace start "path"
0> set hw cpu pipetrace fold [on|off]
0> set hw cpu pipetrace pause
0> set hw cpu pipetrace data "text"
0> set hw cpu pipetrace resume
0> set hw cpu pipetrace stop
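As an illustration of how the commands above fit together, the following minimal Python sketch builds the console lines for one trace session. The helper name `pipetrace_commands`, the file name `div-trace.html` and the use of `run` to execute the code of interest are assumptions made for this example, not part of the simulator. The sketch also assumes that `start` opens the output file and `stop` closes it, and it refuses strings containing double quotes because the simulator's quoting rules are not described here.

```python
def pipetrace_commands(title, path, body, style=None, fold=None):
    """Return simulator console lines that wrap `body` in a pipetrace session.

    `body` is a list of console commands that execute the code to be traced;
    what goes in it is up to the caller (hypothetical in this sketch).
    """
    for text in (title, path, style):
        if text is not None and '"' in text:
            raise ValueError("embedded double quotes are not handled by this sketch")

    prefix = "set hw cpu pipetrace"
    lines = [f'{prefix} title "{title}"']
    if style is not None:
        lines.append(f'{prefix} style "{style}"')
    lines.append(f'{prefix} start "{path}"')
    if fold is not None:
        lines.append(f"{prefix} fold {'on' if fold else 'off'}")
    lines.extend(body)
    lines.append(f"{prefix} stop")
    return lines


if __name__ == "__main__":
    # "run" is an assumed placeholder; substitute whatever commands
    # execute the instructions you want to analyse.
    for line in pipetrace_commands("DIV timing", "div-trace.html", ["run"]):
        print(line)
```

The printed lines can then be entered at the simulator prompt in order. The optional `style`, `fold`, `pause`, `data` and `resume` commands are left to the caller because their exact effect depends on the trace being produced.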
These are taken from the examples in ST's “PM0044 Programming Manual” section “5.3 Pipelined execution examples” and are generated by the test ucsim/stm8.src/test/stm8-cycles using the “pipetrace” functionality described above.
Note that there are some errors in the examples in section 5.3. These are noted in the output below and the differences confirmed on HW.
The DIV instruction is special in that it takes a variable number of cycles and is interruptible.
Other instructions, each run individually starting from an empty pipeline and showing the overlap with the following instruction.
Actual cycle counts may be obtained from hardware for comparison using a combination of stm8-gdb, openocd and an STLink or other openocd/SWIM compatible debugger. Set the master and CPU clocks to be equivalent and use one of the target's timers to count cycles.
For instance:
$ openocd -f interface/stlink.cfg -f target/stm8s003.cfg &
$ stm8-gdb
[...]
(gdb) target extended-remote :3333
(gdb) set $DM_CSR2 = 0x7f99
(gdb) set $DM_ENFCTR = 0x7f9a
(gdb) set $CLK_CKDIVR  = 0x50c6
(gdb) set $CLK_PCKENR1 = 0x50c7
(gdb) set $TIM2_CR1  = 0x5300
(gdb) set $TIM2_EGR  = 0x5306
(gdb) set $TIM2_CNTR  = 0x530c
(gdb) set $TIM2_CNTRH = 0x530c
(gdb) set $TIM2_CNTRL = 0x530d
(gdb) set $TIM2_PSCR = 0x530e
(gdb) define cycles
    dont-repeat
    # Freeze TIM2 when CPU is stalled by DM
    set {unsigned char}$DM_ENFCTR = 0xfd
    # Set HSIDIV = 0, CPUDIV = 0
    set {unsigned char}$CLK_CKDIVR = 0x00
    # Set TIM2 prescaler to 0 so f_CK_CNT matches f_MASTER (and hence f_CPU)
    set {unsigned char}$TIM2_PSCR = 0x00
    # Clear count and update config
    set {unsigned char}$TIM2_EGR = 1
    set {unsigned char}$TIM2_CNTRH = 0xff
    set {unsigned char}$TIM2_CNTRL = 0xff
    # Enable counter
    set {unsigned char}$TIM2_CR1  = 0x01
    # Enable clock gate
    set {unsigned char}$CLK_PCKENR1 = 0x20
    # Set PC
    # N.B. Do not attempt to flush the decoder by writing to DM_CSR2. It upsets
    # openocd which is then unable to set breakpoints.
    set $pc = $arg0
    #set {unsigned char}$DM_CSR2 = 0x81
    # Set a HW breakpoint, run, then clear
    monitor bp $arg1 1 hw
    cont
    monitor rbp $arg1
    set $_tmp = {unsigned short}$TIM2_CNTR
    disass/r $arg0,$arg1
    printf "%u cycles\n", $_tmp
end
(gdb) document cycles
Set PC to the first address, set a HW break at the second address,
run and report how many cycles (as reported by $TIM2_CNTR) it took.
The target is assumed to be halted initially.
end
(gdb) monitor reset halt
target halted due to debug-request, pc: 0x00008000
(gdb) x/3i 0x811c
   0x811c:      ldw X,#0xfc00 ;0xfc00
   0x811f:      ld A,#0x80 ;0x80
   0x8121:      div X,A
(gdb) cycles 0x811c 0x8122
target halted due to debug-request, pc: 0x00008000
breakpoint set at 0x00008122
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00008122 in ?? ()
Dump of assembler code from 0x811c to 0x8122:
   0x0000811c:  ae fc 00        ldw X,#0xfc00 ;0xfc00
   0x0000811f:  a6 80   ld A,#0x80 ;0x80
   0x00008121:  62      div X,A
End of assembler dump.
14 cycles
Don't forget that there is an initial pipeline fetch cycle before the first instruction can be decoded, that there may be stall cycles, that consecutive instructions (mostly) overlap by one cycle (which the timings given in PM0044 assume), and that any interrupts that could fire during the measurement should be disabled.
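The bookkeeping described here can be sketched as a small Python function. This is only a rough model derived from the statements in this document, not a cycle-exact predictor: it assumes every instruction overlaps the next by exactly one cycle, that stall cycles are known and supplied by the caller, and that no interrupts occur. The function name `estimate_cycles` and the cycle values in the example are illustrative assumptions.

```python
def estimate_cycles(pm0044_cycles, stalls=0, from_empty_pipeline=True):
    """Estimate total cycles for a straight-line instruction sequence.

    pm0044_cycles: per-instruction counts as listed in PM0044, which already
                   assume a one-cycle overlap with the preceding instruction.
    stalls:        extra stall cycles (for example as reported by the
                   decode_stall and fetch_stall warnings).
    """
    if not pm0044_cycles:
        return 0
    # Each instruction really takes one cycle more than its PM0044 count, but
    # that cycle is hidden by overlap for every instruction except the first.
    total = sum(pm0044_cycles) + 1 + stalls
    if from_empty_pipeline:
        # One fetch cycle before the first instruction can be decoded.
        total += 1
    return total


if __name__ == "__main__":
    # Two instructions with illustrative PM0044 counts of 2 and 1 cycles.
    print(estimate_cycles([2, 1]))  # 5
```

An estimate like this is useful only as a sanity check against the timer reading; where the two disagree, the pipetrace output shows where the extra cycles come from.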