Developing for the Mitsubishi M32R/X/D targets
The following documentation discusses the Mitsubishi M32R/X/D processors.
Compiler support for M32R/X/D targets
For a list of available generic compiler options, see GNU CC command
options and Option summary for GCC in Using GNU CC in GNUPro Compiler Tools.
The following M32R/X/D-specific command-line options are supported.
Generate code for the M32R processor (including
M32R/D).
Generate code for the M32R/X processor.
Assume all objects live in the lower 16MB of
memory (so that their addresses can be loaded with the ld24
instruction), and assume all subroutines are reachable with the bl
instruction. This is the default.
The
addressability of a particular object can be set with the model
attribute in the source code. See M32R/X/D-specific
compiling attributes.
Assume objects may be anywhere in the 32 bit
address space (the compiler will generate seth/add3 instructions
to load their addresses), and assume all subroutines are reachable with
the bl instruction.
Assume objects may be anywhere in the 32 bit
address space (the compiler will generate seth/add3 instructions
to load their addresses), and assume subroutines may not be reachable with
the bl instruction (the compiler will generate the much slower
seth/add3/jl instruction sequence).
-msdata=none
Disable use of the small data area. Variables will be put into one of
.data, .bss, or .rodata (unless the section attribute has been specified).
This is the default.
The small data area consists of sections .sdata and .sbss. Objects may be
explicitly put in the small data area with the section attribute using one
of these sections.
-msdata=sdata
Put small global and static data in the small data area, but do not
generate special code to reference them. This is normally only used to
build system libraries. It enables them to be used with both -msdata=none
and -msdata=use.
-msdata=use
Put small global and static data in the small data area, and generate
special instructions to reference them.
-G num
Put global and static objects less than or equal to num bytes into the
small data or bss sections instead of the normal data or bss sections. The
default value of num is 8.
The -msdata option must be set to one of sdata or use for this option to
have any effect.
All modules should be compiled with the same -G num value. Compiling with
different values of num may or may not work; if it does not work, the
linker will give an error message. Incorrect code will not be generated.
Preprocessor symbol issues for M32R/X/D targets
By default, the compiler defines the __M32R__ preprocessor symbol.
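As a small illustration, the symbol can be used to guard target-specific
code. The following is only a sketch: the function name target_name is
hypothetical, and only the test for __M32R__ comes from the text above.

```c
/* Returns a label for the target this file was compiled for.
   target_name is a hypothetical helper used only for illustration. */
const char *target_name(void)
{
#ifdef __M32R__
    return "m32r";   /* built by a compiler that defines __M32R__ */
#else
    return "other";  /* any other compiler or target */
#endif
}
```

Built with an M32R/X/D compiler, the first branch is selected; built with
any other compiler, the second branch is selected.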
M32R/X/D-specific compiling attributes
The following M32R/X/D-specific attributes are supported. Names may be
surrounded with double underscores to avoid namespace pollution. For
example, __interrupt__ can also be used for interrupt. See Declaring
attributes of functions and Specifying attributes of variables in
Extensions to the C language family in Using GNU CC in GNUPro Compiler
Tools for more information.
Indicates the specified function is an interrupt handler.
The compiler will generate prologue and epilogue sequences
appropriate for an interrupt handler.
Use this attribute on the M32R/X/D to set the addressability of an object,
and the code generated for a function. The identifier <model-name> is one
of small, medium, or large, representing each of the code models.
Small
model objects live in the lower 16MB of memory (so that their addresses
can be loaded with the ld24 instruction), and are callable with
the bl instruction.
Medium
model objects may live anywhere in the 32 bit address space (the compiler
will generate seth/add3 instructions to load their addresses),
and are callable with the bl instruction.
Large
model objects may live anywhere in the 32 bit address space (the compiler
will generate seth/add3 instructions to load their addresses),
and may not be reachable with the bl instruction (the compiler
will generate the much slower seth/add3/jl instruction sequence).
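The following sketch shows one way the model attribute might be written in
source code. The macro M32R_MODEL and the names far_counter and
bump_far_counter are assumptions made for this illustration; the macro
expands to nothing on other targets so that the file still compiles there.

```c
/* M32R_MODEL(m) is a hypothetical wrapper: it applies the model attribute
   only when the compiler defines __M32R__. */
#ifdef __M32R__
#define M32R_MODEL(m) __attribute__((model(m)))
#else
#define M32R_MODEL(m)
#endif

/* Hypothetical object that may live anywhere in the 32-bit address space. */
int far_counter M32R_MODEL(large);

/* Hypothetical function declared with the small model. */
int bump_far_counter(void) M32R_MODEL(small);

int bump_far_counter(void)
{
    /* On the M32R/X/D the address of far_counter would be formed with
       seth/add3, because the object is declared with the large model. */
    return ++far_counter;
}
```

Per-object attributes like these can be combined with the default model, so
that only the objects that need wider addressing pay for it.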
ABI summary for M32R/X/D targets
The following documentation discusses the Application Binary Interface
(ABI) for the M32R/X/D processors.
Data types and alignment for M32R/X/D targets
Table 1 shows the data type sizes for M32R/X/D processors.
Data type sizes for M32R/X/D processors
The stack is aligned to a four-byte boundary. Data is aligned to one byte
for characters (including structures and unions made entirely of chars),
to two bytes for shorts (including structures and unions made entirely of
shorts), and to four bytes for everything else.
Allocation rules for structures and unions for M32R/X/D targets
The following rules apply to the allocation of structure and union members
in memory.
-
Structure and union packing can be controlled by attributes specified in
the source code. In the absence of any attributes, however, the following
rules are obeyed:
-
Fields that are shorts are aligned to 2-byte boundaries. Fields that are
ints, longs, floats, doubles and long longs are aligned to 4-byte
boundaries; char fields are not aligned.
-
Composite fields (ones that are themselves structures or unions) are
aligned to the greatest alignment requirement of any of their component
fields. So if a field is a structure that contains a char, a short and an
int, the field will be aligned to a 4-byte boundary because of the int.
-
Bit fields are packed in a big-endian fashion, and they are aligned so that
they will not cross boundaries of their type. For instance, consider the
following example structure.
struct { int a:2, b:31;} s = { 0x1, 0x3};
Such input is stored in memory as the following code example shows.
.byte 0x40
.zero 3
.byte 0x0
.byte 0x0
.byte 0x0
.byte 0x6
So the a field is stored in the top two bits of the first byte, with the
most significant bit of a being stored in the most significant bit of the
byte. The bottom six bits of that byte and the next three bytes are all
padding, so that the next bit field b does not cross a word boundary.
Consider the following example structure.
struct { short c:2, d:2, e:13; } s = { 0x2, 0x3, 0xf};
Such input is stored in memory as the following code example shows.
.byte 0xb0
.zero 1
.byte 0x0
.byte 0x78
So the c and d fields are both held in the same byte, but the e field
starts two bytes further on, so that it will not cross a two-byte boundary.
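The packing rule can be checked with a small host-side model. The sketch
below is not part of the toolchain; pack_field and its parameters are
hypothetical names. It places each field most significant bit first and
skips to the next boundary of the declared type when a field would
otherwise cross it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack one bit-field into buf, most significant bit first.
 *   bitpos    running bit offset from the start of the structure
 *   unit_bits size in bits of the declared type (32 for int, 16 for short)
 *   width     width of the bit-field in bits
 * buf must be zero-initialized and large enough for the result.
 */
void pack_field(uint8_t *buf, unsigned *bitpos, unsigned unit_bits,
                unsigned width, uint32_t value)
{
    unsigned used = *bitpos % unit_bits;
    unsigned i;

    assert(width >= 1 && width <= unit_bits && unit_bits <= 32);

    /* A field may not cross a boundary of its type: pad to the next unit. */
    if (used + width > unit_bits)
        *bitpos += unit_bits - used;

    for (i = 0; i < width; i++) {
        unsigned pos = *bitpos + i;
        if ((value >> (width - 1 - i)) & 1u)
            buf[pos / 8] |= (uint8_t)(0x80u >> (pos % 8));
    }
    *bitpos += width;
}
```

Packing a (2 bits, value 0x1) and b (31 bits, value 0x3) with a 32-bit unit
gives the bytes 0x40, 0, 0, 0, 0, 0, 0, 0x06. Packing c, d and e from the
second structure with a 16-bit unit gives 0xb0, 0, 0, 0x78, which matches
the layout described above.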
Fields in
unions are treated in the same way as fields in structures. A union is
aligned to the greatest alignment requirement of any of its members.
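The alignment rules for ordinary (non-bit-field) members can be modeled
with a short host-side calculation. This is a sketch under stated
assumptions: the names m32r_kind and m32r_layout are hypothetical, the
sizes used (1, 2, 4 and 8 bytes) are assumed rather than taken from the
table above, and rounding the total size up to the structure alignment is
the usual C convention rather than something this section states.

```c
#include <stddef.h>

/* Hypothetical field kinds for the layout model. */
enum m32r_kind {
    M32R_CHAR,   /* char: 1 byte, not aligned */
    M32R_SHORT,  /* short: 2 bytes, 2-byte aligned */
    M32R_WORD,   /* int, long, float: 4 bytes, 4-byte aligned */
    M32R_DWORD   /* double, long long: 8 bytes, 4-byte aligned */
};

static size_t kind_size(enum m32r_kind k)
{
    switch (k) {
    case M32R_CHAR:  return 1;
    case M32R_SHORT: return 2;
    case M32R_WORD:  return 4;
    default:         return 8;
    }
}

static size_t kind_align(enum m32r_kind k)
{
    switch (k) {
    case M32R_CHAR:  return 1;
    case M32R_SHORT: return 2;
    default:         return 4;   /* nothing is aligned beyond 4 bytes */
    }
}

/* Fills offsets[] for n fields, stores the structure alignment in
   *align_out, and returns the total size rounded up to that alignment. */
size_t m32r_layout(const enum m32r_kind *fields, size_t n,
                   size_t *offsets, size_t *align_out)
{
    size_t offset = 0, max_align = 1, i;

    for (i = 0; i < n; i++) {
        size_t a = kind_align(fields[i]);
        offset = (offset + a - 1) / a * a;   /* round up to the alignment */
        offsets[i] = offset;
        offset += kind_size(fields[i]);
        if (a > max_align)
            max_align = a;
    }
    *align_out = max_align;
    return (offset + max_align - 1) / max_align * max_align;
}
```

For a structure holding a char, a short and an int, this model yields
offsets 0, 2 and 4, an alignment of 4, and a size of 8 bytes.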
CPU registers for M32R/X/D targets
The following registers are specific to the M32R/X/D processors.
Used for passing arguments to functions. Additional arguments are passed on
the stack (see below). r0 and r1 are also used to return the result of
function calls. The values of these registers are not preserved across
function calls.
Temporary registers for expression evaluation.
The values of these registers are not preserved across function calls.
r4 is reserved for use as a temporary register in the prologue.
r6 is also reserved for use as a temporary in the Position Independent Code
(PIC) calling sequence (if ever necessary) and may not be used in the
function calling sequence or prologue of functions.
r7 is also used as the static chain pointer in nested functions (a GNU C
extension) and may not be used in the function calling sequence or prologue
of functions. In other contexts it is used as a temporary register.
Temporary registers for expression evaluation.
The values of these registers are preserved across function calls.
Temporary register for expression evaluation. Its value is preserved across
function calls. It is also reserved for use as a potential global pointer.
Reserved for use as the frame pointer if one
is needed. Otherwise it may be used for expression evaluation. Its value
is preserved across function calls.
Link register. This register contains the return
address in function calls. It may also be used for expression evaluation
if the return address has been saved.
Stack pointer.
This register is not preserved across function
calls.
The carry bit of the psw is not preserved
across function calls.
The stack frame for M32R/X/D targets
Stack frames for M32R/X/D processors follow these rules.
-
The stack grows downwards from
high addresses to low addresses.
-
A leaf function need not allocate
a stack frame if it does not need one.
-
A frame pointer need not be allocated.
-
The stack pointer shall always
be aligned to 4-byte boundaries.
-
The register save area shall be
aligned to a 4-byte boundary.
Stack frames for functions that take a fixed number of arguments use the
definitions and allocations shown in M32R/X/D stack frames for functions
that take a fixed number of arguments. The frame pointer (FP) points to the
same location as the stack pointer (SP).
Stack frames for functions taking a variable number of arguments use the
definitions and allocations shown in M32R/X/D stack frames for functions
that take a variable number of arguments. The frame pointer (FP) points to
the same location as the stack pointer (SP).
Argument passing for M32R/X/D targets
Arguments are passed to a function in registers first, and then in memory
once the argument-passing registers are used up. Each register is assigned
an argument until all are used. Unused argument registers have undefined
values on entry. The following rules must be adhered to.
-
An argument that is less than or equal to 8 bytes in size is passed in
registers if available. However, if such an argument is a composite
structure (one with more than one field and greater than 4 bytes in size),
it is also passed on the stack, in addition to being passed in the
registers. An argument that is greater than 8 bytes in size is always
passed by reference, which means that a copy of the argument is placed on
the stack and a pointer to that copy is passed in the register.
-
If a data type would overflow
the register arguments, then it is passed in registers and memory. A long
long data type passed in r3 would be passed in r3
and in the first 4 bytes of the stack.
-
Arguments passed on the stack
begin at sp with respect to the caller.
-
Each argument passed on the stack
is aligned on a 4 byte boundary.
-
Space for all arguments is rounded
up to a multiple of 4 bytes.
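The last two rules amount to a simple offset calculation for the arguments
that end up on the stack. The sketch below models only that part; the
function names are hypothetical, and it does not decide which arguments go
in registers or are passed by reference.

```c
#include <stddef.h>

/* Round n up to the next multiple of 4. */
static size_t round_up4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Given the sizes of n arguments passed on the stack, store each
   argument's offset from sp (as seen by the caller) in offsets[] and
   return the total space needed, rounded up to a multiple of 4 bytes. */
size_t m32r_stack_arg_layout(const size_t *sizes, size_t n, size_t *offsets)
{
    size_t offset = 0, i;

    for (i = 0; i < n; i++) {
        offset = round_up4(offset);   /* each argument is 4-byte aligned */
        offsets[i] = offset;
        offset += sizes[i];
    }
    return round_up4(offset);
}
```

For stack arguments of 1, 2, 8 and 3 bytes, the model gives offsets 0, 4, 8
and 16, and a total of 20 bytes.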
Function return values for M32R/X/D targets
Integers, floating point values, and aggregates of 8 bytes or less are
returned in register r0 (and r1 if necessary).
Aggregates larger than 8 bytes are returned by having the caller pass the
address of a buffer to hold the value in r0 as an invisible first argument.
All arguments are then shifted down by one. The address of this buffer is
returned in r0.
Startup code for M32R/X/D targets
Before the main function can be called, startup code must be run that does
four things:
-
Defines the _start symbol
-
Initializes the stack pointer
-
Zeroes the bss section
-
Runs constructors for any global objects that have them
The default startup code is shown in the following example. It is part of
the libgloss/m32r/crt0.S file in the source tree. The best way to write
your own startup code is to take the following example and modify it to
suit your needs.
.balign 4
.global _start
_start:
ld24 sp, _stack
ldi fp, #0
# Clear the BSS. Do it in two parts for efficiency: longwords first
# for most of it, then the remaining 0 to 3 bytes.
ld24 r2, __bss_start ; R2 = start of BSS
ld24 r3, _end ; R3 = end of BSS + 1
sub r3, r2 ; R3 = BSS size in bytes
mv r4, r3
srli r4, #2 ; R4 = BSS size in longwords (rounded down)
ldi r1, #0 ; clear R1 for longword store
addi r2, #-4 ; account for pre-inc store
beqz r4, .Lendloop1 ; any more to go?
.Lloop1:
st r1, @+r2 ; yep, zero out another longword
addi r4, #-1 ; decrement count
bnez r4, .Lloop1 ; go do some more
.Lendloop1:
and3 r4, r3, #3 ; get no. of remaining BSS bytes to clear
addi r2, #4 ; account for pre-inc store
beqz r4, .Lendloop2 ; any more to go?
.Lloop2:
stb r1, @r2 ; yep, zero out another byte
addi r2, #1 ; bump address
addi r4, #-1 ; decrement count
bnez r4, .Lloop2 ; go do some more
.Lendloop2:
# Run code in the .init section.
# This will queue the .fini section to be run with atexit.
bl __init
# Call main, then exit.
bl main
bl exit
# If that fails just loop.
.Lexit:
bra .Lexit
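For readers who find C easier to follow, the BSS-clearing part of the
startup code can be paraphrased as below. This is a host-side sketch, not
the code that runs on the target: the function name clear_region is
hypothetical, and start and end stand in for __bss_start and _end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero the bytes in [start, end): whole longwords first, then the
   remaining 0 to 3 bytes, mirroring the two loops in the startup code. */
void clear_region(unsigned char *start, unsigned char *end)
{
    size_t size  = (size_t)(end - start);  /* sub  r3, r2     */
    size_t words = size >> 2;              /* srli r4, #2     */
    size_t rest  = size & 3u;              /* and3 r4, r3, #3 */
    const uint32_t zero = 0;

    while (words-- > 0) {                  /* .Lloop1: longword stores */
        memcpy(start, &zero, sizeof zero);
        start += sizeof zero;
    }
    while (rest-- > 0)                     /* .Lloop2: byte stores */
        *start++ = 0;
}
```

The memcpy call is used here only so that the sketch does not depend on the
alignment of the host buffer; the assembler version uses a plain st
instruction.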
Producing S-records for M32R/X/D targets
The following command reads the contents of hello.x, converts the code and
data into S-records, and puts the result into hello.srec.
m32r-elf-objcopy -O srec hello.x hello.srec
The following example shows the first few lines of the resulting hello.srec
S-record file.
S00D000068656C6C6F2E7372656303
S11801002D7F2E7F1D8FF000E0006DF4FE0000FEFE001B281F54
S11801158D2EEF2DEF1FCEEF1000006D00F000E20075C0E300C8
S118012A75F4032214835402610042FCF000B0840003216244B4
S118013FFFB094FFFF84C300034204F000B08400042102420148
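Each S-record ends with a checksum byte: the one's complement of the low
byte of the sum of the count, address and data bytes. The following sketch
checks one record; the function name srec_checksum_ok is hypothetical, and
the record is assumed to be passed without a trailing newline.

```c
#include <stddef.h>
#include <string.h>

/* Convert one hexadecimal digit, or return -1 if it is not one. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Returns 1 if the record's length and checksum are consistent, else 0. */
int srec_checksum_ok(const char *line)
{
    size_t len = strlen(line), i;
    unsigned sum = 0, count = 0;

    /* Shortest record: 'S', type, count (2), address (4), checksum (2). */
    if (len < 10 || line[0] != 'S')
        return 0;

    for (i = 2; i + 1 < len; i += 2) {
        int hi = hex_val(line[i]), lo = hex_val(line[i + 1]);
        if (hi < 0 || lo < 0)
            return 0;
        if (i == 2)
            count = (unsigned)(hi * 16 + lo);   /* bytes after the count */
        sum += (unsigned)(hi * 16 + lo);
    }
    if (len != 4 + 2 * (size_t)count)
        return 0;

    /* Count + address + data + checksum must sum to 0xFF in the low byte. */
    return (sum & 0xFFu) == 0xFFu;
}
```

Applied to the first record above, S00D000068656C6C6F2E7372656303, the
check succeeds; the data bytes of that header record spell the file name
hello.srec.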
Assembler support for M32R/X/D targets
For a list of available generic assembler options, see Command-line options
in Using as in GNUPro Utilities. In addition, the following
M32R/X/D-specific command-line options are supported.
Support the extended m32rx instruction set
Try to combine instructions in parallel (M32R/X
only)
-warn-explict-parallel-conflicts
-no-warn-explict-parallel-conflicts
Warn (or do not warn, with -no-warn-explict-parallel-conflicts or -Wnp)
when parallel instructions conflict. The default is to issue the warning.
Warn (or do not warn, with -no-warn-unmatched-high or -Wnuh) if a high or
shigh relocation has no matching low relocation. The default is no warning.
Syntax for M32R/X/D is based on the syntax in Mitsubishi's M32R Family
Software Manual.
The M32R/X/D assembler supports ; (semicolon) and # (pound) as comment
characters. Both are line comment characters when used in column zero. The
semicolon may also be used to start a comment anywhere within a line.
Specify that two instructions are executed in
parallel by placing them on the same line, separated by ||.
Consider the following example input.
mv r1,r2 || mv r2,r1
These two instructions are executed in parallel.
A second syntax explicitly specifies that two instructions are executed
sequentially.
Specify that two instructions are executed sequentially by placing them on
the same line, separated by ->. This is useful when assembling with
optimization turned on and you explicitly want to state that two
instructions are to be executed sequentially and not in parallel.
Consider the following example input.
mv r1,r2 -> ld r1,@r2
The mv r1,r2 instruction is executed first, and then the ld r1,@r2
instruction is executed.
Register names for M32R/X/D targets
You can use the predefined symbols r0 through r15 to refer to the M32R/X/D
registers. You can also use sp as an alias for r15, lr as an alias for r14,
and fp as an alias for r13.
The M32R/X/D also has predefined symbols for the control registers and
status bits described in Symbols and usage for M32R/X/D processors.
Symbols and usage for M32R/X/D processors
Symbol   Usage
psw      Processor status word (alias for cr0)
cbr      Condition bit register (alias for cr1)
spi      Interrupt stack pointer (alias for cr2)
spu      User stack pointer (alias for cr3)
bpc      Backup program counter (alias for cr6)
Addressing modes for M32R/X/D targets
The assembler understands the following addressing modes for the M32R/X/D.
The Rn symbol in the following examples refers to any of the specifically
numbered registers or register pairs, but not the control registers.
Symbols and addressing modes for the M32R/X/D processors
Register indirect with post-increment
Register indirect with post-decrement
Register indirect with pre-decrement
Register indirect with displacement
PC relative address (for branch or rep)
Floating point for M32R/X/D targets
Although the M32R/X/D has no hardware floating point, the .float and
.double directives generate IEEE-format floating-point values for
compatibility with other development tools.
Pseudo opcodes for M32R/X/D targets
M32R/X/D processors use one pseudo opcode.
Create a <label> label with the value of the next instruction that follows
the pseudo opcode. Unlike normal labels, the label created with .debugsym
does not force the next instruction to be aligned to a 32-bit boundary (in
other words, it does not generate a nop if the previous instruction is a
16-bit instruction and the instruction that follows is also a 16-bit
instruction).
Opcodes for M32R/X/D targets
For detailed information on the M32R/X/D machine instruction set, see M32R
Family Software Manual. The GNU assembler implements all the standard
M32R/X/D opcodes.
The assembler
does not support the :8 or :24 syntax for explicitly
specifying the size of the branch instruction. Instead, the assembler supports
the .s suffix to specify a short branch, and the .l
suffix to specify a long branch.
For example,
bra label:8 becomes bra.s label and bra label:24
becomes bra.l label.
The assembler does not support the :8 or :16 syntax for explicitly
specifying the size of an immediate constant. Instead, the assembler
supports the ldi8 and ldi16 mnemonics. For example, ldi r0, 1:8 becomes
ldi8 r0, 1 and ldi r0, 1:16 becomes ldi16 r0, 1.
Synthetic instructions for M32R/X/D targets
Synthetic instructions are aliases for existing instructions. They provide
an additional and often simpler way to specify an instruction.
Synthetic instructions for M32R/X/D processors
bcl label [24-bit offset]
bnc label [24-bit offset]
bncl label [8-bit offset]
bncl label [24-bit offset]
bra label [24-bit offset]
ldi reg, #const [8-bit constant]
ldi reg, #const [16-bit constant]
Writing assembler code for M32R/X/D targets
The best way to write assembler code is to write a small C program, compile
it with the -S flag, and study the assembler code GCC produces.
The assembler code in the following example (hello.s) is from the hello.c
example. It was created with m32r-elf-gcc -S -O2 hello.c. See Using as in
GNUPro Utilities for more information on GNU assembler directives, or
pseudo-opcodes. See the M32R Family Software Manual for more information on
the instruction set and syntax.
.section .rodata
.balign 4
.LC0:
.string "hello world!\n"
.balign 4
.LC1:
.string "%d + %d = %d\n"
.section .text
.balign 4
.global main
.type main,@function
main:
; BEGIN PROLOGUE ; vars= 0, regs= 2, args= 0, extra= 0
push r8
push lr
; END PROLOGUE
ld24 r8,#a
ldi r4,#3
st r4,@(r8)
ld24 r0,#.LC0
bl printf
ld24 r0,#.LC1
ld r1,@(r8)
ld24 r4,#c
ldi r2,#4
add3 r3,r1,#4
st r3,@(r4)
bl printf
; EPILOGUE
pop lr
pop r8
jmp lr
.Lfe1:
.size main,.Lfe1-main
.comm a,4,4
.comm c,4,4
.ident "GCC: (GNU) 2.7-m32r-970408"
To assemble the hello.s file, use the following input.
m32r-elf-as hello.s -o hello.o
The following are some tips for assembler programmers.
-
To clear the CBR register,
just one instruction can be used:
where Rx is an arbitrary register. Note that the operation does not destroy
the contents of Rx. The previous code example is smaller than the following
code:
cmpi Rx,#0 total 6 bytes and destroys Rx.
-
To set the CBR register, there are several methods. First, consider the
following example input.
ldi Rx,#-1
addv R0,R0 total 4 bytes
Alternatively, consider the following example input.
addx R0,R0 total 4 bytes
The previous code examples are smaller than the following code example:
cmpi Rx,#1 total 6 bytes
-
To set a comparison result to a register, there are some idioms for the
M32R. For instance, consider the following example input.
(a) ... flag = (x == 0); ...
mvfc Rx,CBR total 4 bytes
(b) ... flag = !(x op 0); ...
To get the inverted result of a comparison, first set CBR using one of the
methods above, then use the following example input.
subx Rx,Rx
addi Rx,#1 total 4 bytes
The previous example gives better results than the following code.
mvfc Rx,CBR
xor3 Rx,Rx,#1 total 6 bytes
Note: The subx Rx,Rx operation is equivalent to the following code.
neg Rx,Rx
M32R/X/D-specific assembler error messages
The following error messages may occur for M32R/X/D processors during
assembly.
The instruction is misspelled or there is a syntax error somewhere.
Error: expression too complex
Error: unresolved expression that must be resolved
The instruction contains an expression that is too complex; no relocation
exists to handle it.
Error: relocation overflow
The instruction contains an expression that is too large to fit in the
field.
Linker support for M32R/X/D targets
For a list of available generic linker options, see Linker scripts in Using
ld in GNUPro Utilities. In addition, the following M32R/X/D-specific
command-line option is supported.
Specify the initial value for the stack pointer. This assumes the
application loads the stack pointer with the value of _stack in the startup
code.
The initial value for the stack pointer is defined in the linker script
with the PROVIDE linker command. This allows the user to specify a new
value on the command line with the standard linker option --defsym.
Linker script for M32R/X/D targets
The GNU linker uses a linker script to determine how to process each
section in an object file, and how to lay out the executable. The linker
script is a declarative program consisting of a number of directives. For
instance, the ENTRY() directive specifies the symbol in the executable that
will be the executable's entry point. Since linker scripts can be
complicated to write, the linker includes one built-in script that defines
the default linking process. For the M32R/X/D tools, the following example
shows the default script.
Although this script is somewhat lengthy, it is a generic script that will
support all ELF situations. In practice, generation of sections like
.rela.dtors is unlikely when compiling using embedded ELF tools.
OUTPUT_FORMAT("elf32-m32r", "elf32-m32r", "elf32-m32r")
OUTPUT_ARCH(m32r)
ENTRY(_start)
SEARCH_DIR( <installation directory path> );
SECTIONS
{
/* Read-only sections, merged into text segment: */
. = 0x200000;
.interp : { *(.interp) }
.hash : { *(.hash) }
.dynsym : { *(.dynsym) }
.dynstr : { *(.dynstr) }
.rel.text : { *(.rel.text) }
.rela.text : { *(.rela.text) }
.rel.data : { *(.rel.data) }
.rela.data : { *(.rela.data) }
.rel.rodata : { *(.rel.rodata) }
.rela.rodata : { *(.rela.rodata) }
.rel.got : { *(.rel.got) }
.rela.got : { *(.rela.got) }
.rel.ctors : { *(.rel.ctors) }
.rela.ctors : { *(.rela.ctors) }
.rel.dtors : { *(.rel.dtors) }
.rela.dtors : { *(.rela.dtors) }
.rel.init : { *(.rel.init) }
.rela.init : { *(.rela.init) }
.rel.fini : { *(.rel.fini) }
.rela.fini : { *(.rela.fini) }
.rel.bss : { *(.rel.bss) }
.rela.bss : { *(.rela.bss) }
.rel.plt : { *(.rel.plt) }
.rela.plt : { *(.rela.plt) }
.init : { *(.init) } =0
.plt : { *(.plt) }
.text :
{
*(.text)
/* .gnu.warning sections are handled specially by
elf32.em. */
*(.gnu.warning)
*(.gnu.linkonce.t*)
} =0
_etext = .;
PROVIDE (etext = .);
.fini : { *(.fini) } =0
.rodata : { *(.rodata) *(.gnu.linkonce.r*) }
.rodata1 : { *(.rodata1) }
/* Adjust the address for the data segment. We want to
adjust up to the same address within the page on the
next page up. */
. = ALIGN(32) + (ALIGN(8) & (32 - 1));
.data :
{
*(.data)
*(.gnu.linkonce.d*)
CONSTRUCTORS
}
.data1 : { *(.data1) }
.ctors : { *(.ctors) }
.dtors : { *(.dtors) }
.got : { *(.got.plt) *(.got) }
.dynamic : { *(.dynamic) }
/* We want the small data sections together, so
single-instruction offsets can access them all, and
initialized data all before uninitialized, so we can
shorten the on-disk segment size. */
.sdata : { *(.sdata) }
_edata = .;
PROVIDE (edata = .);
__bss_start = .;
.sbss : { *(.sbss) *(.scommon) }
.bss : { *(.dynbss) *(.bss) *(COMMON) }
_end = . ;
PROVIDE (end = .);
/* Stabs debugging sections. */
.stab 0 : { *(.stab) }
.stabstr 0 : { *(.stabstr) }
.stab.excl 0 : { *(.stab.excl) }
.stab.exclstr 0 : { *(.stab.exclstr) }
.stab.index 0 : { *(.stab.index) }
.stab.indexstr 0 : { *(.stab.indexstr) }
.comment 0 : { *(.comment) }
/* DWARF debug sections.
Symbols in the .debug DWARF section are relative to the
beginning of the section so we begin .debug at 0. It's
not clear yet what needs to happen for the others. */
.debug 0 : { *(.debug) }
.debug_srcinfo 0 : { *(.debug_srcinfo) }
.debug_aranges 0 : { *(.debug_aranges) }
.debug_pubnames 0 : { *(.debug_pubnames) }
.debug_sfnames 0 : { *(.debug_sfnames) }
.line 0 : { *(.line) }
PROVIDE (_stack = 0x3ffffc);
}
Debugger support for M32R/X/D targets
GDB's built-in software simulation of the M32R/X/D processor allows the
debugging of programs compiled for the M32R/X/D without requiring any
access to actual hardware. Activate this mode in GDB by typing target sim.
Then load code into the simulator by typing load and debug it in the normal
fashion.
For the available generic debugger options, see Debugging with GDB in
GNUPro Debugging Tools. There are no M32R/X/D-specific debugger
command-line options.
Cygnus Insight is the graphical user interface (GUI) for the GNUPro
debugger. See Working with Cygnus Insight, the visual debugger in GETTING
STARTED.
Standalone simulator for M32R/X/D targets
The simulator supports the general registers (r0 to r15), the control
registers (psw, cbr, spi, spu, and bpc), and the accumulator. The simulator
allocates a contiguous chunk of memory starting at address 0. The default
memory size is 8 MB.
Three run-time command-line options are available with the simulator: -t,
-v, and -p.
Simulator cycle counts in the following examples are not intended to be
extremely accurate. Use them with caution.
-
The -t command-line option to the standalone simulator turns on
instruction-level tracing, as shown in the following segment.
0x00011c ld24 sp,0x100000 dr <- 0x100000
0x000120 ldi fp,0 dr <- 0x0
0x000122 nop
0x000124 ld24 r2,0x75c0 dr <- 0x75c0
0x000128 ld24 r3,0x75f4 dr <- 0x75f4
0x00012c sub r3,r2 dr <- 0x34
0x00012e mv r4,r3 dr <- 0x34
0x000130 srli r4,0x2 dr <- 0xd
0x000132 ldi r1,0 dr <- 0x0
0x000134 addi r2,-4 dr <- 0x75bc
0x000136 nop
0x000138
. . .
The -v command-line option prints some simple statistics.
hello world!
3 + 4 = 7
Total: 3808 insns
Fill nops: 609
The -p command-line option prints profiling statistics.
Hello world!
3 + 4 = 7
Instruction Statistics
Total: 3796 insns
add: 75: *****
add3: 123: ********
and: 3:
and3: 61: ****
or: 28: *
or3: 3:
addi: 222: ***************
bc8: 9:
bc24: 3:
beq: 23: *
beqz: 131: ********
bgez: 8:
bgtz: 2:
blez: 42: **
bltz: 6:
bnez: 252: *****************
bl8: 11:
bl24: 82: *****
bnc8: 52: ***
bne: 1:
bra8: 29: *
bra24: 9:
cmp: 28: *
cmpu: 34: **
cmpui: 2:
jl: 7:
jmp: 100: ******
ld: 93: ******
ld-d: 277: ******************
ldb: 77: *****
ldb-d: 6:
ldh-d: 38: **
ldub: 23: *
lduh-d: 23: *
ld-plus: 158: **********
ld24: 55: ***
ldi8: 163: ***********
ldi16: 5:
mv: 282: *******************
neg: 26: *
nop: 584: ****************************************
sll: 3:
sll3: 7:
slli: 25: *
srai: 25: *
srli: 35: **
st: 52: ***
st-d: 195: *************
stb: 27: *
stb-d: 4:
sth: 25: *
sth-d: 11:
st-plus: 13:
st-minus: 164: ***********
sub: 52: ***
trap: 2:
Memory Access Statistics
Total read: 1891 accesses
Total write: 491 accesses
QI read: 83: **
QI write: 31: *
HI read: 38: *
HI write: 36: *
SI read: 528: *****************
SI write: 424: **************
UQI read: 23:
UHI read: 23:
USI read: 1196: ****************************************
Model m32r/d timing information:
Taken branches: 532
Untaken branches: 237
Cycles stalled due to branches: 1064
Cycles stalled due to loads: 670
Total cycles (approx): 4946
Fill nops: 584
Overlays for the M32R/X/D targets
Overlays are sections of code or data that are loaded as part of a single
memory image but are run or used at a common memory address. At run time,
an overlay manager copies the sections in and out of the runtime memory
address. This approach can be useful, for example, when a certain region of
memory is faster than another.
A simple, portable runtime overlay manager is provided in the examples
directory. To access the examples directory, follow the instructions for
installing the entire source tree. The full path will be:
/usr/cygnus/m32r-<yymmdd>/src/examples
Replace <yymmdd> with the release date found on the CD.
The sample overlay manager may be used as is, or as a prototype to develop
a third-party overlay manager (or to adapt an existing one for use with the
GDB debugger). It is intended to be extremely simple and easy to
understand, but not particularly sophisticated.
The overlay manager has a single entry point, a function called
OverlayLoad(ovly_number). It looks up the overlay in a table called
_ovly_table to find the corresponding section's load address and runtime
address; then it copies the section from its load address into its runtime
address. OverlayLoad must be called before code or data in an overlay
section can be used by the program. It is up to the programmer to keep
track of which overlays have been loaded. The _ovly_table table is built by
the linker from information provided by the programmer in the linker
script; see the example with Linker script with
overlays for M32R/X/D targets.
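The copy step performed by an overlay manager can be sketched on a host
machine as follows. This is not the source of the sample overlay manager:
the structure ovly_entry, the function overlay_load_sketch, and the use of
a mapped flag to record which overlay currently occupies a region are all
assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* One table entry per overlay (hypothetical host-side layout). */
struct ovly_entry {
    unsigned char *vma;        /* runtime address shared by a group */
    size_t size;               /* size of the overlay section in bytes */
    const unsigned char *lma;  /* load address of this overlay's image */
    int mapped;                /* nonzero while the overlay is in place */
};

/* Copy overlay n from its load address to its runtime address.
   Returns 0 on success, or -1 if n is not a valid overlay number. */
int overlay_load_sketch(struct ovly_entry *table, size_t count, size_t n)
{
    size_t i;

    if (n >= count)
        return -1;

    /* Any other overlay that runs at the same address is displaced. */
    for (i = 0; i < count; i++)
        if (i != n && table[i].vma == table[n].vma)
            table[i].mapped = 0;

    memcpy(table[n].vma, table[n].lma, table[n].size);
    table[n].mapped = 1;
    return 0;
}
```

With two entries that share one runtime buffer, loading overlay 0 and then
overlay 1 leaves the buffer holding the second image, with only the second
entry marked as mapped.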
The example program contains four overlay sections, which are mapped into
two runtime regions of memory. Sections .ovly0 and .ovly1 are both mapped
into the region starting at 0x300000, and sections .ovly2 and .ovly3 are
both mapped into the region starting at 0x380000.
Linker script with overlays for M32R/X/D targets
Building a program with overlays requires a customized linker script. Our
example program is built with the script m32rtext.ld, found in the
examples/overlay directory. This is just a modified version of the default
linker script, with two parts added.
The first added part describes the overlay sections, and must be located in
the SECTIONS block, before the .text and .data sections. Here we use the
linker command OVERLAY, which allows the specification of groups of
sections sharing a common runtime address range.
{
OVERLAY 0x300000 : AT (0x400000)
{
.ovly0 { foo.o(.text) }
.ovly1 { bar.o(.text) }
}
OVERLAY 0x380000 : AT (0x480000)
{
.ovly2 { baz.o(.text) }
.ovly3 { grbx.o(.text) }
}
[...]
The OVERLAY command has two arguments: first, the base address where all of
the overlay sections link and run; second, the address where the first
overlay section loads.
In the example, the .ovly1 section will load at 0x400000 + SIZEOF(.ovly0).
For a full description of the OVERLAY linker command, see Output section
type and Overlay description in Linker scripts in Using ld in GNUPro
Utilities.
The OVERLAY command is really just a syntactic convenience. If you need
finer control over where the individual sections will be loaded, you can
use the following example syntax.
SECTIONS
{
.ovly0 0x300000 : AT (0x400000) { foo.o(.text) }
.ovly1 0x300000 : AT (0x410000) { bar.o(.text) }
.ovly2 0x380000 : AT (0x420000) { baz.o(.text) }
.ovly3 0x380000 : AT (0x430000) { grbx.o(.text) }
[...]
The second addition to the linker script actually builds the _ovly_table
table, which will be used by the sample runtime overlay manager. This table
has several entries for each overlay, and must be located somewhere in the
.data section:
{
[...]
_ovly_table = .;
LONG(ABSOLUTE(ADDR(.ovly0)));
LONG(SIZEOF(.ovly0));
LONG(LOADADDR(.ovly0));
LONG(0);
LONG(ABSOLUTE(ADDR(.ovly1)));
LONG(SIZEOF(.ovly1));
LONG(LOADADDR(.ovly1));
LONG(0);
LONG(ABSOLUTE(ADDR(.ovly2)));
LONG(SIZEOF(.ovly2));
LONG(LOADADDR(.ovly2));
LONG(0);
LONG(ABSOLUTE(ADDR(.ovly3)));
LONG(SIZEOF(.ovly3));
LONG(LOADADDR(.ovly3));
LONG(0);
_novlys = .;
LONG((_novlys - _ovly_table) / 16);
[...]
}
Example
overlay program for M32R/X/D targets
The example program has four
functions: foo , bar , baz , and grbx
. Each is in a separate overlay section. The foo and bar
functions are both linked to run at the 0x300000 address, while
the baz and grbx functions are both linked to run
at the 0x380000 address.
The main program
calls OverlayLoad once before calling each of the overlaid
functions, giving it the overlay number of the respective overlay. The
overlay manager, using the _ovly_table table that was built
by the linker script, copies each overlaid function into the appropriate
region of memory before it is called.
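The copy step described above can be sketched in C. This is only a host-side
illustration, not the ovlymgr.c shipped with the example: the memory
array, the small offsets standing in for addresses, and the overlay_load and
regions_overlap names are all hypothetical. The row layout follows the LONG()
order in the linker script (run address, size, load address), and the fourth
word, which the script initializes to 0, is treated here as a "mapped" flag.

```c
#include <assert.h>
#include <string.h>

/* Column layout of one table row, matching the LONG() order in the
   linker script: run address, size, load address, mapped flag. */
enum { VMA, SIZE, LMA, MAPPED };

/* Hypothetical stand-in for target memory so the sketch runs on a host.
   The "addresses" below are offsets into this array. */
static unsigned char memory[256];

/* Hypothetical table; on the target the linker script builds it. */
static unsigned long ovly_table[][4] = {
    /* VMA   SIZE  LMA   MAPPED */
    { 0x00, 0x10, 0x80, 0 },   /* overlay 0, runs in region 0x00 */
    { 0x00, 0x10, 0x90, 0 },   /* overlay 1, shares region 0x00  */
    { 0x40, 0x10, 0xa0, 0 },   /* overlay 2, runs in region 0x40 */
};
static const unsigned long novlys = sizeof ovly_table / sizeof ovly_table[0];

/* Return nonzero if the runtime regions of overlays a and b overlap. */
static int regions_overlap(unsigned long a, unsigned long b)
{
    unsigned long a_lo = ovly_table[a][VMA], a_hi = a_lo + ovly_table[a][SIZE];
    unsigned long b_lo = ovly_table[b][VMA], b_hi = b_lo + ovly_table[b][SIZE];
    return a_lo < b_hi && b_lo < a_hi;
}

/* Map overlay n. Returns 1 on success, 0 if n is out of range. */
int overlay_load(unsigned long n)
{
    unsigned long i;

    if (n >= novlys)
        return 0;
    if (ovly_table[n][MAPPED])
        return 1;                       /* already resident */

    /* Mapping one overlay unmaps any overlay that it displaces. */
    for (i = 0; i < novlys; i++)
        if (i != n && regions_overlap(i, n))
            ovly_table[i][MAPPED] = 0;

    /* Stay inside the simulated memory. */
    assert(ovly_table[n][VMA] + ovly_table[n][SIZE] <= sizeof memory);
    assert(ovly_table[n][LMA] + ovly_table[n][SIZE] <= sizeof memory);

    /* Copy the overlay from its load address to its run address. */
    memcpy(&memory[ovly_table[n][VMA]], &memory[ovly_table[n][LMA]],
           ovly_table[n][SIZE]);
    ovly_table[n][MAPPED] = 1;
    return 1;
}
```

A caller would invoke overlay_load(0) before calling a function in overlay 0,
in the same way that the example program calls OverlayLoad before foo.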
To compile and link the example overlay manager, use the following
command.
m32r-elf-gcc -g -Tm32rdata.ld -oovlydata maindata.c ovlymgr.c
Debugging the overlay program for M32R/X/D
targets
Using GDB's
built-in overlay support, it is possible to debug this program even though
several of the functions share an address range. After loading the program,
give GDB the overlay auto command. GDB then detects the actions
of the overlay manager on the target, and can step into overlaid functions,
show appropriate backtraces, and so on. If a symbol is in an overlay that
is not currently mapped, GDB will access the symbol from its load address
instead of the mapped runtime address (which would currently be holding
something else from another overlay).
In the following
example, the foo and bar functions are in different
overlays which run at the same address. The example shows the use of GDB's
overlay debugging to step into and debug them.
Reading symbols from ovlydata...done.
(gdb) target sim
Connected to the simulator.
(gdb) load
Loading section .ovly0, size 0x28 lma 0x400000
Loading section .ovly1, size 0x28 lma 0x400028
Loading section .ovly2, size 0x28 lma 0x480000
Loading section .ovly3, size 0x28 lma 0x480028
Loading section .data00, size 0x4 lma 0x440000
Loading section .data01, size 0x4 lma 0x440004
Loading section .data02, size 0x4 lma 0x4c0000
Loading section .data03, size 0x4 lma 0x4c0004
Loading section .init, size 0x1c lma 0x208000
Loading section .text, size 0xa3c lma 0x20801c
Loading section .fini, size 0x14 lma 0x208a58
Loading section .rodata, size 0x24 lma 0x208a6c
Loading section .data, size 0x374 lma 0x208ab0
Loading section .ctors, size 0x8 lma 0x208e24
Loading section .dtors, size 0x8 lma 0x208e2c
Start address 0x20801c
Transfer rate: 30240 bits in <1 sec.
(gdb) overlay auto
(gdb) overlay list
No sections are mapped.
(gdb) info address foo
Symbol "foo" is a function at address 0x300000,
loaded at 0x400000 in overlay section .ovly0.
(gdb) info symbol 0x300000
foo in unmapped overlay section .ovly0
bar in unmapped overlay section .ovly1
(gdb) info address bar
Symbol "bar" is a function at address 0x300000,
loaded at 0x400028 in overlay section .ovly1.
(gdb) break main
Breakpoint 1 at 0x20839c: file maindata.c, line 12.
(gdb) run
Starting program: ovlydata
Breakpoint 1, main () at maindata.c:12
12 if (!OverlayLoad(0))
(gdb) next
14 if (!OverlayLoad(4))
(gdb) next
16 a = foo(1);
(gdb) overlay list
Section .ovly0, loaded at 00400000 - 00400028, mapped at 00300000 - 00300028
Section .data00, loaded at 00440000 - 00440004, mapped at 00340000 - 00340004
(gdb) info symbol 0x300000
foo in mapped overlay section .ovly0
bar in unmapped overlay section .ovly1
The overlay
containing the foo function is now mapped.
(gdb) step
foo (x=1) at foo.c:5
5 if (x)
(gdb) x /i $pc
0x300008 <foo+8>: ld r4, @fp || nop
(gdb) print foo
$1 = {int (int)} 0x300000 <foo>
(gdb) print bar
$2 = {int (int)} 0x400028 <*bar*>
GDB uses labels
such as <*bar*> (with asterisks) to distinguish overlay
load addresses from the symbol's runtime address (where it will be when
used by the program).
(gdb) disassemble
Dump of assembler code for function foo:
0x300000 <foo>: st fp,@-sp -> addi sp,-4
0x300004 <foo+4>: mv fp,sp -> st r0,@fp
0x300008 <foo+8>: ld r4,@fp || nop
0x30000c <foo+12>: beqz r4,0x30001c <foo+28>
0x300010 <foo+16>: ld24 r4,0x340000 <foox>
0x300014 <foo+20>: ld r5,@r4 -> mv r0,r5
0x300018 <foo+24>: bra 0x300020 <foo+32> -> bra 0x300020 <foo+32>
0x30001c <foo+28>: ldi r0,0 -> bra 0x300020 <foo+32>
0x300020 <foo+32>: add3 sp,sp,4
0x300024 <foo+36>: ld fp,@sp+ -> jmp lr
End of assembler dump.
(gdb) disassemble bar
Dump of assembler code for function bar:
0x400028 <*bar*>: st fp,@-sp -> addi sp,-4
0x40002c <*bar+4*>: mv fp,sp -> st r0,@fp
0x400030 <*bar+8*>: ld r4,@fp || nop
0x400034 <*bar+12*>: beqz r4,0x400044 <*bar+28*>
0x400038 <*bar+16*>: ld24 r4,0x340000 <foox>
0x40003c <*bar+20*>: ld r5,@r4 -> mv r0,r5
0x400040 <*bar+24*>: bra 0x400048 <*bar+32*> -> bra 0x400048 <*bar+32*>
0x400044 <*bar+28*>: ldi r0,0 -> bra 0x400048 <*bar+32*>
0x400048 <*bar+32*>: add3 sp,sp,4
0x40004c <*bar+36*>: ld fp,@sp+ -> jmp lr
End of assembler dump.
Since the overlay
containing bar is not currently mapped, GDB finds bar
at its load address, and disassembles it there.
(gdb) finish
Run till exit from #0 foo (x=1) at foo.c:5
0x2083cc in main () at maindata.c:16
16 a = foo(1);
Value returned is $3 = 324
(gdb) next
17 if (!OverlayLoad(1))
(gdb) next
19 if (!OverlayLoad(5))
(gdb) next
21 b = bar(1);
(gdb) overlay list
Section .ovly1, loaded at 00400028 - 00400050, mapped at 00300000 - 00300028
Section .data01, loaded at 00440004 - 00440008, mapped at 00340000 - 00340004
(gdb) info symbol 0x300000
foo in unmapped overlay section .ovly0
bar in mapped overlay section .ovly1
(gdb) step
bar (x=1) at bar.c:5
5 if (x)
(gdb) x /i $pc
0x300008 <bar+8>: ld r4,@fp || nop
Now bar
is mapped, and foo is not. Even though the PC is at the same
address as before, GDB recognizes that we are in bar rather
than foo.
(gdb) disassemble
Dump of assembler code for function bar:
0x300000 <bar>: st fp,@-sp -> addi sp,-4
0x300004 <bar+4>: mv fp,sp -> st r0,@fp
0x300008 <bar+8>: ld r4,@fp || nop
0x30000c <bar+12>: beqz r4,0x30001c <bar+28>
0x300010 <bar+16>: ld24 r4,0x340000 <barx>
0x300014 <bar+20>: ld r5,@r4 -> mv r0,r5
0x300018 <bar+24>: bra 0x300020 <bar+32> -> bra 0x300020 <bar+32>
0x30001c <bar+28>: ldi r0,0 -> bra 0x300020 <bar+32>
0x300020 <bar+32>: add3 sp,sp,4
0x300024 <bar+36>: ld fp,@sp+ -> jmp lr
End of assembler dump.
(gdb) finish
Run till exit from #0 bar (x=1) at bar.c:5
0x208400 in main () at maindata.c:21
21 b = bar(1);
Value returned is $4 = 309
Also in this
example, the bazx and grbxx variables are both mapped
to the same runtime address. We will see that with the automatic overlay
debugging mode, GDB always knows which variable is using that address.
(gdb) info addr bazx
Symbol "bazx" is static storage at address 0x3c0000,
loaded at 0x4c0000 in overlay section .data02.
(gdb) info sym 0x3c0000
bazx in unmapped overlay section .data02
grbxx in unmapped overlay section .data03
(gdb) info addr grbxx
Symbol "grbxx" is static storage at address 0x3c0000,
loaded at 0x4c0004 in overlay section .data03.
(gdb) break baz
Breakpoint 2 at 0x380008: file baz.c, line 5.
(gdb) break grbx
Breakpoint 3 at 0x380008: file grbx.c, line 5.
The two breakpoints
are actually set at the same address, yet GDB will correctly distinguish
between them when it hits them. If only one overlay function has a breakpoint
on it, GDB will not stop at that address in other overlay functions.
(gdb) cont
Continuing.
Breakpoint 2, baz (x=1) at baz.c:5
5 if (x)
(gdb) print &bazx
$5 = (int *) 0x3c0000
(gdb) x /d &bazx
0x3c0000 <bazx>: 317
(gdb) print &grbxx
$6 = (int *) 0x4c0004
(gdb) cont
Continuing.
Breakpoint 3, grbx (x=1) at grbx.c:5
5 if (x)
(gdb) print &grbxx
$7 = (int *) 0x3c0000
(gdb) x /d &grbxx
0x3c0000 <grbxx>: 435
(gdb) print &bazx
$8 = (int *) 0x4c0000
(gdb) x /d &bazx
0x4c0000 <*bazx*>: 317
GDB
overlay support for M32R/X/D targets
GDB provides special functionality
for debugging a program that is linked using the overlay mechanism of the
GNU linker. In such programs, an overlay corresponds to a section with
a load address that is different from its runtime address. GDB can provide
manual overlay debugging for any program linked in such a way
(provided that the overlays all reside somewhere in memory). Automatic
overlay debugging is also provided.
Manual
mode commands for M32R/X/D targets
The following commands are
for manual mode for the overlay manager.
overlay map <section-name>
overlay unmap <section-name>
The manual mode requires input from the user
to specify what overlays are mapped into their runtime address regions
at any given time. The overlay map command informs GDB that
the overlay has been mapped by the target into its shared runtime address
range. The overlay unmap command informs GDB that the overlay
is no longer resident in its runtime address region, and must be accessed
from the load-time address region. If two overlays share the same runtime
address region, then mapping one implies unmapping the other.
Auto
mode commands for M32R/X/D targets
The following commands are
for automatic mode for the overlay manager.
Automatic overlay debugging support in GDB works
with the runtime overlay manager provided in the examples directory.
When
this mode is activated, GDB will automatically read and interpret the data
structures maintained in target memory by the overlay manager. To learn
what overlays are mapped at any time, use the overlay list command.
Whenever
the target program is allowed to run (for example, by the step command),
GDB will refresh its overlay map by reading from the target's overlay tables.
The
automatic mapping may be temporarily overridden by the overlay map
and overlay unmap commands, but these mappings will last only
until the next time the target is allowed to run. To explicitly take control
of GDB's overlay mapping, switch to the overlay manual mode.
Debugging
with overlays for M32R/X/D targets
When GDB's overlay support
(either manual or auto) is active, GDB's concept of a symbol's address
is controlled by which overlays are mapped into which memory regions. For
instance, if you print a variable that is in an overlay which
is currently mapped (located in its runtime address region), GDB will fetch
the variable's memory from the runtime address. If the variable's overlay
is currently not mapped, GDB will fetch it from its load-time address.
Similarly,
if you disassemble a function that is in an unmapped overlay, or use a
symbol's address to examine memory, GDB will fetch the memory from the
symbol's load-time address range instead of the runtime range. If GDB's
output contains labels that are relative to an overlay's load-time address
instead of the runtime address, the labels will be distinguished as the
following example shows.
(gdb) x /x foo
0x300000 <foo>: 0x2d7f4ffc
(gdb) overlay unmap .ovly0
(gdb) x /x foo
0x400000 <*foo*>: 0x2d7f4ffc
The asterisks
(* ) around the foo label may be interpreted as meaning
that this is where foo is, but not where it will be when it
is in use by the target program.
The info address
command can tell you what overlay a symbol is in, as well
as where it is loaded and mapped. The info symbol command can
list all of the symbols that are mapped to an address.
(gdb) info addr foo
Symbol "foo" is a function at address 0x300000,
loaded at 0x400000 in overlay section .ovly0.
(gdb) info symbol 0x300000
foo in mapped overlay section .ovly0
bar in unmapped overlay section .ovly1
Breakpoints
for M32R/X/D targets
So long as the overlay sections
are located in RAM rather than ROM, GDB can set breakpoints in them. The
breakpoints work by inserting trap instructions into the load-time address
region. When the overlay is mapped into the runtime region, the trap instructions
are mapped along with it, and when executed, cause the target program to
break out to the debugger. If the overlay regions are located in ROM, you
can only set breakpoints in them after they have been mapped into the runtime
region in RAM.
Developing
for the M32R/D targets
The following documentation
discusses the M32R/D processor.
Compiler
support for M32R/D targets
The following documentation
discusses the GNU compiler usage for M32R/D processors.
By default,
the compiler defines the __M32R__ preprocessor symbol.
For a list
of available generic compiler options, see GNU
CC command options in Using
GNU CC in GNUPro Compiler Tools. The following M32R/D-specific
command-line options are supported.
-mmodel=small
Assume all objects live in the lower 16MB of memory
(so that their addresses can be loaded with the ld24 instruction),
and assume all subroutines are reachable with the bl instruction. This
is the default.
The
addressability of a particular object can be set with the model
attribute in the source code. See M32R/D-specific
attributes for compiling.
-mmodel=medium
Assume objects may be anywhere in the 32-bit
address space (the compiler will generate seth/add3 instructions
to load their addresses), and assume all subroutines are reachable with
the bl instruction.
-mmodel=large
Assume objects may be anywhere in the 32-bit
address space (the compiler will generate seth/add3 instructions
to load their addresses), and assume subroutines may not be reachable with
the bl instruction (the compiler will generate the much slower
seth/add3/jl instruction sequence).
-msdata=none
Disable use of the small data area. Variables
will be put into one of .data, .bss, or .rodata
(unless the section attribute has been specified). This is the
default. The small data area consists of sections .sdata and
.sbss. Objects may be explicitly put in the small data area
with the section attribute using one of these sections.
-msdata=sdata
Put small global and static data in the small
data area, but do not generate special code to reference them. This is
normally only used to build system libraries. It enables them to be used
with both the -msdata=none and -msdata=use options.
-msdata=use
Put small global and static data in the small
data area, and generate special instructions to reference them.
-G num
Put global and static objects less than or equal
to num bytes into the small data or bss sections instead
of the normal data or bss sections. The default value of num
is 8.
The
-msdata option must be set to one of sdata or use
for this option to have any effect.
All
modules should be compiled with the same -G num value.
Compiling with different values of num may or may not work;
if it does not work, the linker will give an error message. Incorrect code
will not be generated.
M32R/D-specific
attributes for compiling
The following M32R/D-specific
attributes are supported. Names may be surrounded with double-underscores
to avoid
namespace pollution. For example __interrupt__
can also be used for interrupt. See also Declaring
attributes of functions and Specifying
attributes of variables in Extensions
to the C language family in Using
GNU CC in GNUPro Compiler Tools.
interrupt
Indicates that the specified function is an interrupt
handler. The compiler will generate prologue and epilogue sequences appropriate
for an interrupt handler.
model (<model-name>)
Use this attribute on the M32R/D to set the addressability
of an object, and the code generated for a function. The identifier <model-name>
is one of small, medium, or large,
representing each of the code models.
Small
model objects live in the lower 16MB of memory (so that their addresses
can be loaded with the ld24 instruction), and are callable with
the bl instruction.
Medium
model objects may live anywhere in the 32 bit address space (the compiler
will generate seth/add3 instructions to load their addresses),
and are callable with the bl instruction.
Large
model objects may live anywhere in the 32 bit address space (the compiler
will generate seth/add3 instructions to load their addresses),
and may not be reachable with the bl instruction (the compiler
will generate the much slower seth/add3/jl instruction sequence).
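As an illustration of how these attributes might appear in source code, the
following sketch wraps them in macros that expand only when __M32R__ is
defined, so the same file still builds with a host compiler. The macro names
(M32R_MODEL, M32R_INTERRUPT) and the variables and handler are hypothetical,
chosen only for this example.

```c
/* Expand to the M32R attributes only when compiling for that target
   (the compiler defines __M32R__); expand to nothing elsewhere so the
   sketch still builds on a host compiler. */
#ifdef __M32R__
#define M32R_MODEL(name) __attribute__ ((__model__ (name)))
#define M32R_INTERRUPT   __attribute__ ((__interrupt__))
#else
#define M32R_MODEL(name)
#define M32R_INTERRUPT
#endif

/* Hypothetical counter that may live anywhere in the 32-bit address
   space, so its address is loaded with seth/add3. */
int far_counter M32R_MODEL (large) = 0;

/* Hypothetical flag kept in the lower 16MB, reachable with ld24. */
int near_flag M32R_MODEL (small) = 0;

/* Hypothetical interrupt handler; on the target the compiler emits an
   interrupt-style prologue and epilogue for it. */
void M32R_INTERRUPT timer_tick (void)
{
    far_counter++;
    near_flag = 1;
}
```

The double-underscore spellings are used inside the macros to avoid clashing
with any user macro named model or interrupt.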
ABI
summary for M32R/D targets
The following documentation
describes the Application Binary Interface (ABI)
for the M32R/D processor.
Data
types and alignment for M32R/D targets
Data
type sizes for the M32R/D processor
The stack is
aligned to a four-byte boundary. One-byte alignment is used for characters
(including structures and unions made entirely of chars), two-byte alignment
for shorts (including structures and unions made entirely of shorts), and
four-byte alignment for everything else.
Allocation
rules for structures and unions for M32R/D targets
The following rules apply
to the allocation of structure and union members in memory.
-
Structure and union packing can
be controlled by attributes specified in the source code. In the absence
of any attributes however, the following rules are obeyed:
-
Fields that are shorts are aligned
to 2 byte boundaries. Fields that are ints, longs, floats, doubles and
long longs are aligned to 4 byte boundaries. Char fields are not aligned.
-
Composite fields (i.e., ones that
are themselves structures or unions) are aligned to the greatest alignment
requirement of any of their component fields. So if a field is a structure
that contains a char, a short and an int, the field will be aligned to
a 4-byte boundary because of the int.
-
Bit fields are packed in a big-endian
fashion, and they are aligned so that they will not cross boundaries of
their type.
For example, this structure:
struct { int a:2, b:31; } s = { 0x1, 0x3 };
is laid out as follows:
.byte 0x40
.zero 3
.byte 0x0
.byte 0x0
.byte 0x0
.byte 0x6
The a field is stored in the top two bits of the first byte,
with the most significant bit of a in the most
significant bit of the byte. The bottom six bits of that byte and the next
three bytes are all padding, so that the next bit-field, b, does
not cross a word boundary.
Similarly, this structure:
struct { short c:2, d:2, e:13; } s = { 0x2, 0x3, 0xf };
is laid out as follows:
.byte 0xb0
.zero 1
.byte 0x0
.byte 0x78
Fields c and d are both held in the same byte, but
field e starts two bytes further on, so that it does not cross
a two-byte boundary.
Fields in
unions are treated in the same way as fields in structures. A union is
aligned to the greatest alignment requirement of any of its members.
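The bit-field rules above can be checked with a small host-side model. The
following sketch is not compiler code. It only applies the two stated rules
(big-endian packing, and no field crossing a boundary of its declared type)
to a byte image. The BitLayout type and the layout_init and layout_add
functions are hypothetical names introduced for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Running state while laying out consecutive bit-fields. */
typedef struct {
    uint8_t  bytes[16];   /* image of the structure, zero-initialized   */
    unsigned bit_pos;     /* next free bit, counted from MSB of byte 0  */
} BitLayout;

void layout_init(BitLayout *l)
{
    memset(l, 0, sizeof *l);
}

/* Append one bit-field of `width` bits (at most 32) whose declared type
   is `type_bits` wide: 16 for short, 32 for int. */
void layout_add(BitLayout *l, unsigned type_bits, unsigned width,
                uint32_t value)
{
    unsigned i;

    /* Skip to the next type boundary if the field would cross it. */
    if (l->bit_pos % type_bits + width > type_bits)
        l->bit_pos += type_bits - l->bit_pos % type_bits;

    /* Big-endian packing: the field's most significant bit goes into
       the first free (most significant) bit position. */
    for (i = 0; i < width; i++) {
        unsigned bit = (unsigned)(value >> (width - 1 - i)) & 1u;
        unsigned pos = l->bit_pos + i;

        if (bit && pos / 8 < sizeof l->bytes)
            l->bytes[pos / 8] |= (uint8_t)(0x80u >> (pos % 8));
    }
    l->bit_pos += width;
}
```

Feeding this model the fields of the first structure (a:2 = 0x1, b:31 = 0x3,
both of type int) gives the bytes 40 00 00 00 00 00 00 06, and the fields of
the second structure (c:2 = 0x2, d:2 = 0x3, e:13 = 0xf, all of type short)
give b0 00 00 78.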
CPU
registers for M32R/D targets
The following documentation
details the registers for M32R/D processors.
Used for passing arguments to functions. Additional
arguments are passed on the stack (see The
stack frame for M32R/D targets). Registers r0 and r1 are also
used to return the result of function calls. The values of these registers
are not preserved across function calls.
Temporary registers for expression evaluation.
The values of these registers are not preserved across function calls.
r4 is reserved for use as a temporary register in the prologue.
r6 is also reserved for use as a temporary in the Position Independent Code
(PIC) calling sequence (if ever necessary) and may not be used in the function
calling sequence or prologue of functions.
r7 is also used as the static chain pointer in nested functions (a GNU C
extension) and may not be used in the function calling sequence or prologue
of functions. In other contexts it is used as a temporary register.
Temporary registers for expression evaluation.
The values of these registers are preserved across function calls.
Temporary register for expression evaluation.
Its value is preserved across function calls. It is also reserved for use
as a potential global pointer.
Reserved for use as the frame pointer if one
is needed. Otherwise it may be used for expression evaluation. Its value
is preserved across function calls.
Link register. This register contains the return
address in function calls. It may also be used for expression evaluation
if the return address has been saved.
Stack pointer.
This register is not preserved across function
calls.
The carry bit of the psw is not preserved
across function calls.
The
stack frame for M32R/D targets
Stack frame information follows
for the M32R/D processor.
-
The stack grows downwards from
high addresses to low addresses.
-
A leaf function need not allocate
a stack frame if it does not need one.
-
A frame pointer need not be allocated.
-
The stack pointer shall always
be aligned to 4-byte boundaries.
-
The register save area shall
be aligned to a 4-byte boundary.
FP
points to the same location as SP.
Argument
passing for M32R/D processors
Arguments are passed to a
function using first registers and then memory if the argument passing
registers are used up. Each register is assigned an argument until all
are used. Unused argument registers have undefined values on entry. The
following rules must be adhered to.
-
An argument that is less than
or equal to 8 bytes in size is passed in registers if available. However,
if such an argument is a composite structure (one with more than one field
and greater than 4 bytes in size) it is also passed on the stack, in addition
to being passed in the registers. An argument that is greater than 8
bytes in size is always passed by reference, which means that a copy of
the argument is placed on the stack and a pointer to that copy is passed
in the register.
-
If a data type would overflow
the register arguments, then it is passed in registers and memory. A long
long data type passed in r3 would be passed in r3
and in the first 4 bytes of the stack.
-
Arguments passed on the stack
begin at sp with respect to the caller.
-
Each argument passed on the stack
is aligned on a 4-byte boundary.
-
Space for all arguments is rounded
up to a multiple of 4 bytes.
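The register and stack assignment described by these rules can be modelled
with a short host-side sketch. It assumes four argument registers, r0 through
r3, which the long long example above implies but this section does not list
explicitly. It deliberately leaves out two details: the extra stack copy made
for small composite structures, and the location of the copy made for
by-reference arguments. The ArgState and ArgPlace types and the place_arg
function are hypothetical names.

```c
/* Assumption: four argument registers (r0..r3), 4-byte words, and an
   8-byte limit for passing by value, as the rules above describe. */
#define NUM_ARG_REGS 4
#define WORD_BYTES   4u
#define MAX_BY_VALUE 8u

typedef struct {
    int next_reg;       /* next free argument register, 0..NUM_ARG_REGS */
    int stack_bytes;    /* bytes of outgoing stack arguments so far     */
} ArgState;

typedef struct {
    int by_reference;   /* 1 if a pointer to a copy is passed instead   */
    int first_reg;      /* first register used, or -1 if none           */
    int reg_words;      /* words placed in registers                    */
    int stack_offset;   /* offset from the caller's sp, or -1 if none   */
    int stack_words;    /* words placed on the stack                    */
} ArgPlace;

/* Assign the next argument of `size_bytes` bytes to registers first,
   then to the stack once the registers are used up. */
ArgPlace place_arg(ArgState *st, unsigned size_bytes)
{
    ArgPlace p = { 0, -1, 0, -1, 0 };
    unsigned words, i;

    if (size_bytes > MAX_BY_VALUE) {
        p.by_reference = 1;
        size_bytes = WORD_BYTES;        /* only the pointer is passed */
    }
    words = (size_bytes + WORD_BYTES - 1) / WORD_BYTES;

    for (i = 0; i < words; i++) {
        if (st->next_reg < NUM_ARG_REGS) {
            if (p.first_reg < 0)
                p.first_reg = st->next_reg;
            st->next_reg++;
            p.reg_words++;
        } else {
            if (p.stack_offset < 0)
                p.stack_offset = st->stack_bytes;
            st->stack_bytes += (int) WORD_BYTES;
            p.stack_words++;
        }
    }
    return p;
}
```

With three int arguments followed by a long long, this model places the
long long half in r3 and half in the first 4 bytes of the stack, which matches
the case described in the list above.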
Function
return values for M32R/D processors
Integers, floating point
values, and aggregates of 8 bytes or less are returned in register
r0 (and r1 if necessary).
Aggregates
larger than 8 bytes are returned by having the caller pass the address
of a buffer to hold the value in r0 as an "invisible" first
argument. All arguments are then shifted down by one. The address of this
buffer is returned in r0.
Startup
code for M32R/D targets
Before the main
function can be called, startup code must run that does four things:
-
Contains the _start symbol
-
Initializes the stack pointer
-
Zeroes the bss section
-
Runs constructors for any global
objects that have them
The default
startup code is shown in the following example of the libgloss/m32r/crt0.S
file. The best way to write your own startup code is to take this and
modify it to suit your needs.
.balign 4
.global _start
_start:
ld24 sp, _stack
ldi fp, #0
# Clear the BSS. Do it in two parts for efficiency: longwords first
# for most of it, then the remaining 0 to 3 bytes.
ld24 r2, __bss_start ; R2 = start of BSS
ld24 r3, _end ; R3 = end of BSS + 1
sub r3, r2 ; R3 = BSS size in bytes
mv r4, r3
srli r4, #2 ; R4 = BSS size in longwords (rounded down)
ldi r1, #0 ; clear R1 for longword store
addi r2, #-4 ; account for pre-inc store
beqz r4, .Lendloop1 ; any more to go?
.Lloop1:
st r1, @+r2 ; yep, zero out another longword
addi r4, #-1 ; decrement count
bnez r4, .Lloop1 ; go do some more
.Lendloop1:
and3 r4, r3, #3 ; get no. of remaining BSS bytes to clear
addi r2, #4 ; account for pre-inc store
beqz r4, .Lendloop2 ; any more to go?
.Lloop2:
stb r1, @r2 ; yep, zero out another byte
addi r2, #1 ; bump address
addi r4, #-1 ; decrement count
bnez r4, .Lloop2 ; go do some more
.Lendloop2:
# Run code in the .init section.
# This will queue the .fini section to be run with atexit.
bl __init
# Call main, then exit.
bl main
bl exit
# If that fails just loop.
.Lexit:
bra .Lexit
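For readers who find the two-part BSS clearing loop easier to follow in C,
here is an equivalent host-side sketch. It is an illustration of the same
technique, not a replacement for crt0.S. The clear_bss function name and its
pointer parameters are hypothetical, standing in for the __bss_start and _end
symbols.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero the half-open range [start, end) the way the startup code does:
   whole 4-byte longwords first, then the remaining 0 to 3 bytes. */
void clear_bss(unsigned char *start, unsigned char *end)
{
    size_t size  = (size_t)(end - start);  /* BSS size in bytes          */
    size_t words = size >> 2;              /* longwords, rounded down    */
    size_t tail  = size & 3;               /* remaining bytes to clear   */
    const uint32_t zero = 0;
    unsigned char *p = start;

    while (words-- > 0) {
        memcpy(p, &zero, sizeof zero);     /* one longword store         */
        p += sizeof zero;
    }
    while (tail-- > 0)
        *p++ = 0;                          /* one byte store             */
}
```

memcpy is used for the longword store so that the sketch does not depend on
the alignment of start when run on a host.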
Assembler
features for the M32R/D targets
The following documentation
discusses the assembler issues for the M32R/D processor.
For a list
of available generic assembler options, see Command-line
options in Using as
in GNUPro Utilities. In addition, the following M32R/D-specific
command-line options are supported.
-warn-unmatched-high
-Wuh
Warn (or do not warn, using -no-warn-unmatched-high
or -Wnuh) if a high or shigh relocation
has no matching low relocation. The default is no warning.
The M32R/D
assembler syntax is based on the syntax in Mitsubishi's M32R Family
Software Manual.
The M32R/D
assembler supports ; (semi-colon) and # (pound) as comment characters.
Both are line comment characters when used in column zero. The
semi-colon may also be used to start a comment anywhere within a line.
Register
names for the M32R/D targets
You can use the r0
through r15 predefined symbols to refer to the M32R/D registers.
You can also use sp as an alias for
r15, lr as an alias for r14, and fp
as an alias for r13.
The M32R/D
also has predefined symbols for the following control registers and status
bits.
Predefined
symbols and usage for M32R/D processors
Symbol | Usage
psw | Processor status word (alias for cr0)
cbr | Condition bit register (alias for cr1)
spi | Interrupt stack pointer (alias for cr2)
spu | User stack pointer (alias for cr3)
bpc | Backup program counter (alias for cr6)
Addressing
modes for M32R/D targets
See Symbols
and addressing modes for the M32R/D processors for the addressing modes
for the M32R/D. The Rn symbol in that table refers
to any of the specifically numbered registers or register pairs, but not
the control registers.
Symbols
and addressing modes for the M32R/D processors
Register indirect with post-increment
Register indirect with post-decrement
Register indirect with pre-decrement
Register indirect with displacement
PC relative address (for branch or rep)
Floating
point for M32R/D targets
Although the M32R/D has no
hardware floating point, the .float and
.double directives generate IEEE-format floating-point values
for compatibility with other development tools.
Pseudo
opcodes for M32R/D targets
M32R/D processors use one
pseudo opcode.
.debugsym <label>
Create a label <label>
with the value of the next instruction that follows the pseudo op. Unlike
normal labels, the label created with .debugsym does not force
the next instruction to be aligned to a 32-bit boundary (i.e., it does
not generate a nop if the previous instruction is a 16-bit instruction
and the instruction that follows is also a 16-bit instruction).
Opcodes
for M32R/D targets
For detailed information
on the M32R/D machine instruction set, see M32R Family Software Manual.
The assembler implements all the standard M32R/D opcodes.
The assembler
does not support the :8 or :24 syntax for explicitly
specifying the size of the branch instruction. Instead, the assembler supports
the .s suffix to specify a short branch, and the .l
suffix to specify a long branch. For example, bra label:8
becomes bra.s label and bra label:24 becomes bra.l
label.
The assembler
does not support the :8 or :16 syntax for explicitly
specifying the size of an immediate constant. Instead, the assembler supports
the ldi8 and ldi16 mnemonics. For example, ldi
r0, 1:8 becomes ldi8 r0, 1 and ldi r0, 1:16
becomes ldi16 r0, 1.
Synthetic
instructions for M32R/D targets
Synthetic instructions are
aliases for existing instructions. They provide an additional and often
simpler way to specify an instruction. See Synthetic
instructions for M32R/D processors.
Synthetic
instructions for M32R/D processors
Synthetic instruction | Real instruction
bnc.l label | bnc label [24-bit offset]
bra.l label | bra label [24-bit offset]
ldi8 reg, #const | ldi reg, #const [8-bit constant]
ldi16 reg, #const | ldi reg, #const [16-bit constant]
Writing
assembler code for M32R/D targets
The best way to write assembler
code is to write a small C program, compile it with the -S flag,
and study the assembler code GCC produces.
See Using
as
in GNUPro Utilities for more information on GNU assembler
directives, or pseudo-opcodes.
See the M32R Family Software Manual for more information
on the instruction set, and syntax.
The following
example shows the hello.s assembler code from the hello.c
example. It was created with m32r-elf-gcc -S -O2 hello.c.
.section .rodata
.balign 4
.LC0:
.string "hello world!\n"
.balign 4
.LC1:
.string "%d + %d = %d\n"
.section .text
.balign 4
.global main
.type main,@function
main:
; BEGIN PROLOGUE ; vars= 0, regs= 2, args= 0, extra= 0
push r8
push lr
; END PROLOGUE
ld24 r8,#a
ldi r4,#3
st r4,@(r8)
ld24 r0,#.LC0
bl printf
ld24 r0,#.LC1
ld r1,@(r8)
ld24 r4,#c
ldi r2,#4
add3 r3,r1,#4
st r3,@(r4)
bl printf
; EPILOGUE
pop lr
pop r8
jmp lr
.Lfe1:
.size main,.Lfe1-main
.comm a,4,4
.comm c,4,4
.ident "GCC: (GNU) 2.7-m32r-970408"
To assemble
the hello.s file, enter:
m32r-elf-as hello.s -o hello.o
The following
are some tips for assembler programmers:
-
To clear the CBR register,
just one instruction can be used:
where Rx is an arbitrary register. Note that the operation does not
destroy the contents of Rx. The previous code example is smaller than
the following code:
ldi Rx,#1
cmpi Rx,#0 total 6 bytes and destroys Rx.
-
To set the CBR register,
there are several methods:
ldi Rx,#-1
addv R0,R0 total 4 bytes
addx R0,R0 total 4 bytes
The
previous code examples are smaller than the following code example:
cmpi Rx,#1 total 6 bytes
To set a comparison result to
a register, there are some idioms for the M32R.
(a) ... flag = (x == 0); ...
cmpui Rx,#1
mvfc Rx,CBR total 4 bytes
(b) ... flag = !(x op 0); ...
To
get the inverted result of a comparison, first set CBR using one
of the methods above, then use:
addi Rx,#1 total 4 bytes
rather
than the following code:
xor3 Rx,Rx,#1 total 6 bytes
The subx Rx,Rx operation is equivalent
to:
neg Rx,Rx
Inserting
assembly instructions into C code for M32R/D targets
Assembly code can be embedded
in C or C++ code with the asm keyword. There are two forms of
asm: simple and extended. The simple form
uses the following syntax.
asm ("assembly code");
C string concatenation
works with asm, so more complicated expressions can be spread
out over several lines, as in the following example.
asm (
".global foo\n"
"foo:\n"
".word 42\n"
);
This example
creates a variable called foo with the value of 42, and is
intended to be compiled outside of any function definition.
Another way
to write that would be:
asm ("
.global foo
foo:
.word 42
");
Warning!
The simple form is
only for cases where the compiler doesn't need to know what values are
being used and what values are being modified by the assembly code. This
is because the contents of the assembly code are hidden from GCC's data-flow
analysis. GCC does not parse the assembly code; it merely copies it verbatim
to the output file.
Using the extended
form of asm, you can specify the operands of the instruction
using C expressions. You need not guess which registers or memory locations
will contain the data you want to use. Its syntax has the following form.
asm ("assembly code" : outputs : inputs : clobbers);
The inputs
and clobbers are optional in an extended asm. The outputs are optional
too, but then the asm is no longer an "extended asm" and is rather a "simple
asm".
outputs
is a comma-separated list of C expressions that are the results of the assembly code. The syntax is a string containing the "operand constraint" followed by a C expression in parentheses.
inputs
syntax is identical to the syntax of outputs.
clobbers
is a comma-separated list of registers that are modified by the assembly code but aren't listed in the outputs. If memory is or may be modified, specify "memory" in the clobbers section.
The following
example shows an asm statement that adds two values together.
int
add (int arg1, int arg2)
{
asm ("add %0, %1" : "+r" (arg1) : "r" (arg2));
return arg1;
}
The statement was constructed with the following procedure.
-
1. The text to create the assembler instruction is the first part of the asm statement.
The registers containing the arguments, however, are unknown to the programmer, so they are given placeholders, as in the following example.
"add %0, %1"
Specify the values of these placeholders in numerical order, starting from 0, immediately after the assembler instruction, as in the following example.
"add %0, %1" arg1 arg2
This is wrong in several ways. First, the syntax specifies that C variables and expressions must be enclosed in parentheses, as in the following example.
"add %0, %1" (arg1) (arg2)
Second, there must be a colon between the assembler text and the placeholders, as in the following example.
"add %0, %1" : (arg1) (arg2)
Third, each placeholder should be separated from the next by a comma, as in the following example.
"add %0, %1" : (arg1), (arg2)
Specify the constraints for the placeholders. These constraints use the same syntax as the constraints found on machine patterns in the m32r.md file. A constraint is a sequence of letters enclosed within double quotes that specifies what kind of thing the placeholder can be. For a complete list of letters, see Simple constraints.
Both arguments should be in registers (since the add instruction only takes register arguments), so the statement now resembles the following example.
"add %0, %1" : "r" (arg1), "r" (arg2)
Use an extra constraint on (arg1) to let the compiler know that (arg1) is not only used as an input to the instruction, but also holds the instruction's output. This is done in two parts.
First, the constraint must include the + character to show that the register is both read and written by the instruction, as in the following example.
"add %0, %1" : "+r" (arg1), "r" (arg2)
Second, the syntax specifies that all placeholders that are outputs of the instruction must be specified first; then a colon must appear, and then any placeholders that are just inputs can appear, as in the following example. The comma is removed, since the colon takes its place.
"add %0, %1" : "+r" (arg1) : "r" (arg2)
That is the complete asm statement.
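The same procedure can be applied to another two-operand instruction. The sketch below is illustrative rather than taken from the toolchain sources: it assumes the M32R sub instruction subtracts its second register operand from its first, and that the compiler predefines the __m32r__ macro when targeting the M32R. The function name sub_values is hypothetical. On any other host, the function falls back to plain C so that it still compiles and runs.

```c
/* Subtract b from a with an extended asm on M32R; plain C elsewhere. */
int sub_values(int a, int b)
{
#if defined(__m32r__)   /* assumed predefined macro for M32R targets */
    /* %0 is both read and written ("+r"); %1 is a register input ("r"). */
    __asm__ ("sub %0, %1" : "+r" (a) : "r" (b));
#else
    a -= b;   /* fallback so the sketch also builds on a non-M32R host */
#endif
    return a;
}
```

Because a is listed as a read-write output, the compiler knows its value changes and will not reuse a stale copy after the statement.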
For more information on extended asm, see Alternate keywords in Using GNU CC in GNUPro Compiler Tools.
M32R/D-specific assembler error messages
The following error messages may occur for M32R/X/D processors during assembly.
The instruction is misspelled or there is a syntax error somewhere.
Error: expression too complex
Error: unresolved expression that must be resolved
The instruction contains an expression that is too complex; no relocation exists to handle it.
Error: relocation overflow
The instruction contains an expression that is too large to fit in the field.
Producing S-records for M32R/D targets
The following command reads the contents of the hello.x file, converts the code and data into S-records, and puts the result into the hello.srec file.
m32r-elf-objcopy -O srec hello.x hello.srec
The first few lines of hello.srec are in the following example.
S00D000068656C6C6F2E7372656303
S11801002D7F2E7F1D8FF000E0006DF4FE0000FEFE001B281F54
S11801158D2EEF2DEF1FCEEF1000006D00F000E20075C0E300C8
S118012A75F4032214835402610042FCF000B0840003216244B4
S118013FFFB094FFFF84C300034204F000B08400042102420148
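Each record above ends in a checksum byte. As a rough illustration, the following sketch checks one record under the usual S-record convention, in which the byte count, address, data, and checksum bytes sum to 0xFF modulo 256. The function names are hypothetical and are not part of objcopy or any GNUPro tool.

```c
#include <stddef.h>
#include <string.h>

/* Convert one hexadecimal digit to its value, or return -1 if it is not one. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Return 1 if the record's checksum byte is consistent, 0 otherwise. */
int srec_checksum_ok(const char *line)
{
    size_t len = strlen(line);
    unsigned int sum = 0;
    int count = -1;
    size_t i;

    if (len < 4 || line[0] != 'S' || (len % 2) != 0)
        return 0;

    /* Bytes start after the two-character record type ("S0", "S1", ...). */
    for (i = 2; i + 1 < len; i += 2) {
        int hi = hex_digit(line[i]);
        int lo = hex_digit(line[i + 1]);
        if (hi < 0 || lo < 0)
            return 0;
        if (count < 0)
            count = hi * 16 + lo;      /* first byte is the byte count */
        sum += (unsigned int)(hi * 16 + lo);
    }

    /* The count covers the address, data, and checksum bytes. */
    if (count < 0 || (size_t)count != (len - 4) / 2)
        return 0;

    /* Count, address, data, and checksum sum to 0xFF modulo 256. */
    return (sum & 0xFFu) == 0xFFu;
}
```

Applied to the first record above, whose data bytes spell hello.srec, the function returns 1; changing any digit of the record makes it return 0.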
Linker issues for M32R/D targets
For a list of available generic linker options, see Linker scripts in Using ld in GNUPro Utilities. In addition, the following M32R/D-specific command-line option is supported.
Specify the initial value for the stack pointer. This assumes the application loads the stack pointer with the value of _stack in the startup code.
The initial value for the stack pointer is defined in the linker script with the PROVIDE linker command. This allows the user to specify a new value on the command line with the standard linker option --defsym.
Linker script for the M32R/D targets
The GNU linker uses a linker script to determine how to process each section in an object file, and how to lay out the executable. The linker script is a declarative program consisting of a number of directives. For instance, the ENTRY() directive specifies the symbol in the executable that will be the executable's entry point. Since linker scripts can be complicated to write, the linker includes one built-in script that defines the default linking process.
For the M32R/D tools, the following example shows the default script. Although the script is somewhat lengthy, it is a generic script that will support all ELF situations. In practice, generation of sections like .rela.dtors is unlikely when compiling using embedded ELF tools.
OUTPUT_FORMAT("elf32-m32r",
"elf32-m32r", "elf32-m32r")
OUTPUT_ARCH(m32r)
ENTRY(_start)
SEARCH_DIR( <installation directory path> );
SECTIONS
{
/* Read-only sections, merged into text segment: */
. = 0x200000;
.interp : { *(.interp) }
.hash : { *(.hash) }
.dynsym : { *(.dynsym) }
.dynstr : { *(.dynstr) }
.rel.text : { *(.rel.text) }
.rela.text : { *(.rela.text) }
.rel.data : { *(.rel.data) }
.rela.data : { *(.rela.data) }
.rel.rodata : { *(.rel.rodata) }
.rela.rodata : { *(.rela.rodata) }
.rel.got : { *(.rel.got) }
.rela.got : { *(.rela.got) }
.rel.ctors : { *(.rel.ctors) }
.rela.ctors : { *(.rela.ctors) }
.rel.dtors : { *(.rel.dtors) }
.rela.dtors : { *(.rela.dtors) }
.rel.init : { *(.rel.init) }
.rela.init : { *(.rela.init) }
.rel.fini : { *(.rel.fini) }
.rela.fini : { *(.rela.fini) }
.rel.bss : { *(.rel.bss) }
.rela.bss : { *(.rela.bss) }
.rel.plt : { *(.rel.plt) }
.rela.plt : { *(.rela.plt) }
.init : { *(.init) } =0
.plt : { *(.plt) }
.text :
{
*(.text)
/* .gnu.warning sections are handled specially by
elf32.em. */
*(.gnu.warning)
*(.gnu.linkonce.t*)
} =0
_etext = .;
PROVIDE (etext = .);
.fini : { *(.fini) } =0
.rodata : { *(.rodata) *(.gnu.linkonce.r*) }
.rodata1 : { *(.rodata1) }
/* Adjust the address for the data segment. We want to
adjust up to the same address within the page on the
next page up. */
. = ALIGN(32) + (ALIGN(8) & (32 - 1));
.data :
{
*(.data)
*(.gnu.linkonce.d*)
CONSTRUCTORS
}
.data1 : { *(.data1) }
.ctors : { *(.ctors) }
.dtors : { *(.dtors) }
.got : { *(.got.plt) *(.got) }
.dynamic : { *(.dynamic) }
/* We want the small data sections together, so
single-instruction offsets can access them all, and
initialized data all before uninitialized, so we can
shorten the on-disk segment size. */
.sdata : { *(.sdata) }
_edata = .;
PROVIDE (edata = .);
__bss_start = .;
.sbss : { *(.sbss) *(.scommon) }
.bss : { *(.dynbss) *(.bss) *(COMMON) }
_end = . ;
PROVIDE (end = .);
/* Stabs debugging sections. */
.stab 0 : { *(.stab) }
.stabstr 0 : { *(.stabstr) }
.stab.excl 0 : { *(.stab.excl) }
.stab.exclstr 0 : { *(.stab.exclstr) }
.stab.index 0 : { *(.stab.index) }
.stab.indexstr 0 : { *(.stab.indexstr) }
.comment 0 : { *(.comment) }
/* DWARF debug sections.
Symbols in the .debug DWARF section are relative to the
beginning of the section so we begin .debug at 0. It's
not clear yet what needs to happen for the others. */
.debug 0 : { *(.debug) }
.debug_srcinfo 0 : { *(.debug_srcinfo) }
.debug_aranges 0 : { *(.debug_aranges) }
.debug_pubnames 0 : { *(.debug_pubnames) }
.debug_sfnames 0 : { *(.debug_sfnames) }
.line 0 : { *(.line) }
PROVIDE (_stack = 0x3ffffc);
}
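The data-segment adjustment in the script above, . = ALIGN(32) + (ALIGN(8) & (32 - 1)), can be hard to read at first. The following sketch restates it in C, assuming that ALIGN(n) rounds the current location counter up to the next multiple of n and that n is a power of two. The function names are illustrative only.

```c
/* ALIGN(n): round the location counter up to a multiple of n
   (n is assumed to be a power of two). */
unsigned long align_up(unsigned long dot, unsigned long n)
{
    return (dot + n - 1) & ~(n - 1);
}

/* . = ALIGN(32) + (ALIGN(8) & (32 - 1)); */
unsigned long data_segment_start(unsigned long dot)
{
    return align_up(dot, 32) + (align_up(dot, 8) & (32 - 1));
}
```

For example, with the location counter at 0x208a90, the expression yields 0x208ab0. This agrees with the load listing in the overlay debugging example, where .rodata ends at 0x208a90 and .data is placed at 0x208ab0, assuming the script used there keeps the same expression.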
Debugger issues with M32R/D targets
For the available generic debugger options, see Debugging with GDB in GNUPro Debugging Tools. There are no M32R/D-specific debugger command-line options.
Cygnus Insight is the graphical user interface (GUI) for the GNUPro debugger. See Working with Cygnus Insight, the visual debugger in GETTING STARTED.
There are three ways for GDB to talk to an M32R/D target: through the built-in simulator, through a remote target board with a remote stub linked directly to the user program, and through a remote target board with the remote stub already loaded independently. See the following documentation for details.
-
Simulator
GDB's built-in software simulation of the M32R/D processor allows the debugging of programs compiled for the M32R/D without requiring any access to actual hardware. Activate this mode in GDB by using the target sim command. Then load code into the simulator by using the load command and debug it in the normal fashion.
-
Remote target board, with remote stub linked directly to user program
To use this mode, the program being debugged must have the remote debugging protocol subprogram linked directly into it.
The program is then downloaded to the target board by GDB, using the target mon2000 <devicename> command, where <devicename> is a serial device such as /dev/ttya (Unix) or com2 (Windows 95). After being downloaded, the program must be running and it must execute the following function calls into the remote debugging subprogram:
breakpoint();
If GDB is running on a Unix host computer, start the target program by using the run command at the (gdb) prompt. Then interrupt GDB by typing several Ctrl-c (^C) characters. However, if GDB is being run on a Microsoft Windows 95 host computer, you must exit from GDB and connect to the M32R/D EVA target board with a terminal program such as Kermit or HyperTerminal. Use the Return key to get the ROM monitor's ok prompt; then use the go command and press the Return key, as the following example input shows.
ok go
Then exit from the terminal program and start up GDB again. It is then possible to connect GDB to the target using GDB's remote protocol, with the command target remote <devicename> where <devicename>, as before, is the name of a serial device. The debugging session can then proceed. GDB will initially report that the program has received a SIGTRAP while executing the call to the breakpoint() function. From there you can continue or single-step to get back into your own program.
-
Remote target board, remote stub already loaded independently
In this mode, it is assumed that the remote protocol subprogram is already running on the target board. Use the gdb command to start debugging, then use the target remote <devicename> command, where <devicename> is a serial device such as /dev/ttya (Unix) or com2 (Windows 95), and then download your program and begin debugging it. Downloading is six to seven times faster using this method.
Note:
When using the remote target, GDB does not accept the run command. However, since downloading the program has the side effect of setting the PC to the start address, you can start your program by using the continue command.
Stand-alone simulator for M32R/D targets
The simulator supports the r0 to r15 general registers, the psw, cbr, spi, spu, and bpc control registers, and the accumulator. The simulator allocates a contiguous chunk of memory starting at address 0. Default memory size is 8 MB.
Three run-time command-line options are available with the simulator: -t, -v, and -p.
Warning!
Simulator cycle counts are not intended to be extremely accurate in the following script examples. Use them with caution.
-
The -t command-line option to the stand-alone simulator turns on instruction-level tracing, as shown in the following segment:
0x00011c ld24 sp,0x100000 dr <- 0x100000
0x000120 ldi fp,0 dr <- 0x0
0x000122 nop
0x000124 ld24 r2,0x75c0 dr <- 0x75c0
0x000128 ld24 r3,0x75f4 dr <- 0x75f4
0x00012c sub r3,r2 dr <- 0x34
0x00012e mv r4,r3 dr <- 0x34
0x000130 srli r4,0x2 dr <- 0xd
0x000132 ldi r1,0 dr <- 0x0
0x000134 addi r2,-4 dr <- 0x75bc
0x000136 nop
0x000138 . . .
-
The -v command-line option prints
some simple statistics:
hello world!
3 + 4 = 7
Total: 3808 insns
Fill nops: 609
-
The -p command-line option prints profiling statistics.
% m32r-elf-run -p hello.x
Hello world!
3 + 4 = 7
Instruction Statistics
Total: 3796 insns
add: 75: *****
add3: 123: ********
and: 3:
and3: 61: ****
or: 28: *
or3: 3:
addi: 222: ***************
bc8: 9:
bc24: 3:
beq: 23: *
beqz: 131: ********
bgez: 8:
bgtz: 2:
blez: 42: **
bltz: 6:
bnez: 252: *****************
bl8: 11:
bl24: 82: *****
bnc8: 52: ***
bne: 1:
bra8: 29: *
bra24: 9:
cmp: 28: *
cmpu: 34: **
cmpui: 2:
jl: 7:
jmp: 100: ******
ld: 93: ******
ld-d: 277: ******************
ldb: 77: *****
ldb-d: 6:
ldh-d: 38: **
ldub: 23: *
lduh-d: 23: *
ld-plus: 158: **********
ld24: 55: ***
ldi8: 163: ***********
ldi16: 5:
mv: 282: *******************
neg: 26: *
nop: 584: ****************************************
sll: 3:
sll3: 7:
slli: 25: *
srai: 25: *
srli: 35: **
st: 52: ***
st-d: 195: *************
stb: 27: *
stb-d: 4:
sth: 25: *
sth-d: 11:
st-plus: 13:
st-minus: 164: ***********
sub: 52: ***
trap: 2:
Memory Access Statistics
Total read: 1891 accesses
Total write: 491 accesses
QI read: 83: **
QI write: 31: *
HI read: 38: *
HI write: 36: *
SI read: 528: *****************
SI write: 424: **************
UQI read: 23:
UHI read: 23:
USI read: 1196: ****************************************
Model m32r/d timing information:
Taken branches: 532
Untaken branches: 237
Cycles stalled due to branches: 1064
Cycles stalled due to loads: 670
Total cycles (approx): 4946
Fill nops: 584
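The figures in the sample profile can be cross-checked with simple arithmetic. The sketch below is an inference from the numbers shown above, not a documented simulator formula: in this sample, the approximate cycle total equals the instruction count minus the fill nops, plus the two stall counts. The function name is hypothetical.

```c
/* Approximate cycle total reconstructed from the sample -p figures.
   This relationship is inferred from the numbers shown, not taken
   from simulator documentation. */
unsigned long approx_cycles(unsigned long insns, unsigned long fill_nops,
                            unsigned long branch_stalls,
                            unsigned long load_stalls)
{
    return insns - fill_nops + branch_stalls + load_stalls;
}
```

With the sample values (3796 instructions, 584 fill nops, 1064 branch stall cycles, 670 load stall cycles), the result is 4946, matching the reported total. The branch stall figure is also exactly twice the 532 taken branches in this sample.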
Overlays for M32R/D targets
Overlays are sections of code or data that are loaded as part of a single memory image but are run or used at a common memory address. At run time, an overlay manager copies the sections in and out of the runtime memory address. This approach can be useful, for example, when a certain region of memory is faster than another.
See the following documentation for more details on using overlays for the M32R/D processor.
Sample runtime overlay manager for M32R/D
A simple, portable runtime overlay manager is provided in the examples directory. To access the examples directory, use the following path (<yymmdd> is replaced with the release date found on the CD).
/usr/cygnus/m32r-<yymmdd>/src/examples
The sample overlay manager may be used as is, or as a prototype to develop a third-party overlay manager (or to adapt an existing one for use with the GDB debugger). It is intended to be extremely simple, easy to understand, and not particularly sophisticated.
The overlay manager has a single entry point: the OverlayLoad(ovly_number) function. It looks up the overlay in the _ovly_table table to find the corresponding section's load address and runtime address; then it copies the section from its load address into its runtime address. OverlayLoad must be called before code or data in an overlay section can be used by the program. It is up to the programmer to keep track of which overlays have been loaded. The _ovly_table table is built by the linker from information provided by the programmer in the linker script; see Linker script for overlays for the M32R/D targets.
The example program contains four overlay sections, which are mapped into two runtime regions of memory. Sections .ovly0 and .ovly1 are both mapped into the region starting at 0x300000, and sections .ovly2 and .ovly3 are both mapped into the region starting at 0x380000.
Linker script for overlays for the M32R/D targets
Building a program with overlays requires a customized linker script. An example program is built with the m32rtext.ld script, found in the examples/overlay directory. It is a modified version of the default linker script, with two parts added.
The first added part describes the overlay sections, and must be located in the SECTIONS block, before the .text and .data sections. It uses the OVERLAY linker command, which allows the specification of groups of sections sharing a common runtime address range.
SECTIONS
{
OVERLAY 0x300000 : AT (0x400000)
{
.ovly0 { foo.o(.text) }
.ovly1 { bar.o(.text) }
}
OVERLAY 0x380000 : AT (0x480000)
{
.ovly2 { baz.o(.text) }
.ovly3 { grbx.o(.text) }
}
[...]
The OVERLAY command has two arguments: first, the base address where all of the overlay sections link and run; second, the address where the first overlay section loads. In the example, the .ovly1 section will load at 0x400000 + SIZEOF(.ovly0). For a full description of the OVERLAY linker command, see Output section type and Overlay description in Using ld in GNUPro Utilities.
The OVERLAY command is really just a syntactic convenience. For finer control over where the individual sections will load, use the following syntax.
SECTIONS
{
.ovly0 0x300000 : AT (0x400000) { foo.o(.text) }
.ovly1 0x300000 : AT (0x410000) { bar.o(.text) }
.ovly2 0x380000 : AT (0x420000) { baz.o(.text) }
.ovly3 0x380000 : AT (0x430000) { grbx.o(.text) }
[...]
The second addition to the
linker script actually builds the _ovly_table table, which the
sample runtime overlay manager uses. This table has several entries for
each overlay, and must be located somewhere in the .data section.
.data :
{
[...]
_ovly_table = .;
LONG(ABSOLUTE(ADDR(.ovly0)));
LONG(SIZEOF(.ovly0));
LONG(LOADADDR(.ovly0));
LONG(0);
LONG(ABSOLUTE(ADDR(.ovly1)));
LONG(SIZEOF(.ovly1));
LONG(LOADADDR(.ovly1));
LONG(0);
LONG(ABSOLUTE(ADDR(.ovly2)));
LONG(SIZEOF(.ovly2));
LONG(LOADADDR(.ovly2));
LONG(0);
LONG(ABSOLUTE(ADDR(.ovly3)));
LONG(SIZEOF(.ovly3));
LONG(LOADADDR(.ovly3));
LONG(0);
_novlys = .;
LONG((_novlys - _ovly_table) / 16);
[...]
}
The example program has four functions: foo, bar, baz, and grbx. Each is in a separate overlay section. Functions foo and bar are both linked to run at address 0x300000, while functions baz and grbx are both linked to run at 0x380000. The main program calls OverlayLoad once before calling each of the overlaid functions, giving it the overlay number of the respective overlay. The overlay manager, using the _ovly_table table that was built up by the linker script, copies each overlaid function into the appropriate region of memory before it is called.
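The copy step described above can be sketched in a few lines of C. This is not the ovlymgr.c source from the examples directory; it is a minimal host-side model, assuming one table entry per overlay that holds the same four values the linker script emits (runtime address, size, load address, and a mapped flag). The struct, field, and function names are hypothetical, and ordinary pointers stand in for the fixed target addresses.

```c
#include <stddef.h>
#include <string.h>

/* One table entry per overlay, mirroring the four LONG values emitted
   for each overlay in the linker script (field names are hypothetical). */
struct ovly_entry {
    unsigned char *vma;        /* runtime (mapped) address */
    size_t size;               /* section size in bytes */
    const unsigned char *lma;  /* load address */
    int mapped;                /* nonzero while resident at vma */
};

/* Copy overlay number n to its runtime address and update the mapped
   flags. Returns 1 on success, 0 if n is out of range. */
int overlay_load(struct ovly_entry *table, size_t count, size_t n)
{
    size_t i;

    if (n >= count)
        return 0;
    if (table[n].mapped)
        return 1;                       /* already resident */

    /* Any other overlay sharing this runtime address is displaced. */
    for (i = 0; i < count; i++)
        if (i != n && table[i].vma == table[n].vma)
            table[i].mapped = 0;

    memcpy(table[n].vma, table[n].lma, table[n].size);
    table[n].mapped = 1;
    return 1;
}
```

Keeping a mapped flag in the table is a design choice of this sketch; it lets the loader skip a redundant copy and gives a debugger something to read when it needs to know which overlay currently occupies a region.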
To compile and link the example overlay manager, use the following example input.
m32r-elf-gcc -g -Tm32rdata.ld -oovlydata maindata.c ovlymgr.c
Debugging the example program for M32R/D targets
Using GDB's built-in overlay support, we can debug this program even though several of the functions share an address range. After loading the program, give GDB the command overlay auto. GDB then detects the actions of the overlay manager on the target, and can step into overlaid functions, show appropriate backtraces, and so on. If a symbol is in an overlay that is not currently mapped, GDB will access the symbol from its load address instead of the mapped runtime address (which would currently be holding something else from another overlay).
In the following example, the foo and bar functions are in different overlays that run at the same address. We will use GDB's overlay debugging to step into and debug them.
Reading symbols from ovlydata...done.
(gdb) target sim
Connected to the simulator.
(gdb) load
Loading section .ovly0, size 0x28 lma 0x400000
Loading section .ovly1, size 0x28 lma 0x400028
Loading section .ovly2, size 0x28 lma 0x480000
Loading section .ovly3, size 0x28 lma 0x480028
Loading section .data00, size 0x4 lma 0x440000
Loading section .data01, size 0x4 lma 0x440004
Loading section .data02, size 0x4 lma 0x4c0000
Loading section .data03, size 0x4 lma 0x4c0004
Loading section .init, size 0x1c lma 0x208000
Loading section .text, size 0xa3c lma 0x20801c
Loading section .fini, size 0x14 lma 0x208a58
Loading section .rodata, size 0x24 lma 0x208a6c
Loading section .data, size 0x374 lma 0x208ab0
Loading section .ctors, size 0x8 lma 0x208e24
Loading section .dtors, size 0x8 lma 0x208e2c
Start address 0x20801c
Transfer rate: 30240 bits in <1 sec.
(gdb) overlay auto
(gdb) overlay list
No sections are mapped.
(gdb) info address foo
Symbol "foo" is a function at address 0x300000,
loaded at 0x400000 in overlay section .ovly0.
(gdb) info symbol 0x300000
foo in unmapped overlay section .ovly0
bar in unmapped overlay section .ovly1
(gdb) info address bar
Symbol "bar" is a function at address 0x300000,
loaded at 0x400028 in overlay section .ovly1.
(gdb) break main
Breakpoint 1 at 0x20839c: file maindata.c, line 12.
(gdb) run
Starting program: ovlydata
Breakpoint 1, main () at maindata.c:12
12 if (!OverlayLoad(0))
(gdb) next
14 if (!OverlayLoad(4))
(gdb) next
16 a = foo(1);
(gdb) overlay list
Section .ovly0, loaded at 00400000 - 00400028, mapped at 00300000 - 00300028
Section .data00, loaded at 00440000 - 00440004, mapped at 00340000 - 00340004
(gdb) info symbol 0x300000
foo in mapped overlay section .ovly0
bar in unmapped overlay section .ovly1
The overlay containing the
foo function is now mapped.
(gdb) step
foo (x=1) at foo.c:5
5 if (x)
(gdb) x /i $pc
0x300008 <foo+8>: ld r4, @fp || nop
(gdb) print foo
$1 = {int (int)} 0x300000 <foo>
(gdb) print bar
$2 = {int (int)} 0x400028 <*bar*>
GDB uses labels such as <*bar*> (with asterisks) to distinguish overlay load addresses from the symbol's runtime address (where it will be when used by the program).
(gdb) disassemble
Dump of assembler code for function foo:
0x300000 <foo>: st fp,@-sp -> addi sp,-4
0x300004 <foo+4>: mv fp,sp -> st r0,@fp
0x300008 <foo+8>: ld r4,@fp || nop
0x30000c <foo+12>: beqz r4,0x30001c <foo+28>
0x300010 <foo+16>: ld24 r4,0x340000 <foox>
0x300014 <foo+20>: ld r5,@r4 -> mv r0,r5
0x300018 <foo+24>: bra 0x300020 <foo+32> -> bra 0x300020 <foo+32>
0x30001c <foo+28>: ldi r0,0 -> bra 0x300020 <foo+32>
0x300020 <foo+32>: add3 sp,sp,4
0x300024 <foo+36>: ld fp,@sp+ -> jmp lr
End of assembler dump.
(gdb) disassemble bar
Dump of assembler code for function bar:
0x400028 <*bar*>: st fp,@-sp -> addi sp,-4
0x40002c <*bar+4*>: mv fp,sp -> st r0,@fp
0x400030 <*bar+8*>: ld r4,@fp || nop
0x400034 <*bar+12*>: beqz r4,0x400044 <*bar+28*>
0x400038 <*bar+16*>: ld24 r4,0x340000 <foox>
0x40003c <*bar+20*>: ld r5,@r4 -> mv r0,r5
0x400040 <*bar+24*>: bra 0x400048 <*bar+32*> -> bra 0x400048 <*bar+32*>
0x400044 <*bar+28*>: ldi r0,0 -> bra 0x400048 <*bar+32*>
0x400048 <*bar+32*>: add3 sp,sp,4
0x40004c <*bar+36*>: ld fp,@sp+ -> jmp lr
End of assembler dump.
Since the overlay containing
bar is not currently mapped, GDB finds bar at its
load address, and disassembles it there.
(gdb) finish
Run till exit from #0 foo (x=1) at foo.c:5
0x2083cc in main () at maindata.c:16
16 a = foo(1);
Value returned is $3 = 324
(gdb) next
17 if (!OverlayLoad(1))
(gdb) next
19 if (!OverlayLoad(5))
(gdb) next
21 b = bar(1);
(gdb) overlay list
Section .ovly1, loaded at 00400028 - 00400050, mapped at 00300000 - 00300028
Section .data01, loaded at 00440004 - 00440008, mapped at 00340000 - 00340004
(gdb) info symbol 0x300000
foo in unmapped overlay section .ovly0
bar in mapped overlay section .ovly1
(gdb) step
bar (x=1) at bar.c:5
5 if (x)
(gdb) x /i $pc
0x300008 <bar+8>: ld r4,@fp || nop
Now bar is mapped,
and foo is not. Even though the PC is at the same address as
before, GDB recognizes that we are in bar rather than
foo.
(gdb) disassemble
Dump of assembler code for function bar:
0x300000 <bar>: st fp,@-sp -> addi sp,-4
0x300004 <bar+4>: mv fp,sp -> st r0,@fp
0x300008 <bar+8>: ld r4,@fp || nop
0x30000c <bar+12>: beqz r4,0x30001c <bar+28>
0x300010 <bar+16>: ld24 r4,0x340000 <barx>
0x300014 <bar+20>: ld r5,@r4 -> mv r0,r5
0x300018 <bar+24>: bra 0x300020 <bar+32> -> bra 0x300020 <bar+32>
0x30001c <bar+28>: ldi r0,0 -> bra 0x300020 <bar+32>
0x300020 <bar+32>: add3 sp,sp,4
0x300024 <bar+36>: ld fp,@sp+ -> jmp lr
End of assembler dump.
(gdb) finish
Run till exit from #0 bar (x=1) at bar.c:5
0x208400 in main () at maindata.c:21
21 b = bar(1);
Value returned is $4 = 309
The bazx and
grbxx variables are now both mapped to the same runtime address.
With the automatic overlay debugging mode, GDB always knows which variable
is using an address.
(gdb) info addr bazx
Symbol "bazx" is static storage at address 0x3c0000,
loaded at 0x4c0000 in overlay section .data02.
(gdb) info sym 0x3c0000
bazx in unmapped overlay section .data02
grbxx in unmapped overlay section .data03
(gdb) info addr grbxx
Symbol "grbxx" is static storage at address 0x3c0000,
loaded at 0x4c0004 in overlay section .data03.
(gdb) break baz
Breakpoint 2 at 0x380008: file baz.c, line 5.
(gdb) break grbx
Breakpoint 3 at 0x380008: file grbx.c, line 5.
The two breakpoints are actually
set at the same address, yet GDB will correctly distinguish between them
when it hits them. If only one overlay function has a breakpoint on it,
GDB will not stop at that address in other overlay functions.
(gdb) cont
Continuing.
Breakpoint 2, baz (x=1) at baz.c:5
5 if (x)
(gdb) print &bazx
$5 = (int *) 0x3c0000
(gdb) x /d &bazx
0x3c0000 <bazx>: 317
(gdb) print &grbxx
$6 = (int *) 0x4c0004
(gdb) cont
Continuing.
Breakpoint 3, grbx (x=1) at grbx.c:5
5 if (x)
(gdb) print &grbxx
$7 = (int *) 0x3c0000
(gdb) x /d &grbxx
0x3c0000 <grbxx>: 435
(gdb) print &bazx
$8 = (int *) 0x4c0000
(gdb) x /d &bazx
0x4c0000 <*bazx*>: 317
GDB overlay support for M32R/D targets
GDB provides special functionality for debugging a program that is linked using the overlay mechanism of ld, the GNU linker. In such programs, an overlay corresponds to a section with a load address that is different from its runtime address. GDB can provide manual overlay debugging for any program linked in such a way (provided that the overlays all reside somewhere in memory). Automatic overlay debugging is also provided.
Manual mode commands for M32R/D targets
The following commands are for manual mode for the overlay manager.
overlay map <section-name>
overlay unmap <section-name>
The manual mode requires input from the user to specify what overlays are mapped into their runtime address regions at any given time. The overlay map command informs GDB that the overlay has been mapped by the target into its shared runtime address range. The overlay unmap command informs GDB that the overlay is no longer resident in its runtime address region, and must be accessed from the load-time address region. If two overlays share the same runtime address region, then mapping one implies unmapping the other.
Auto mode commands for M32R/D targets
The following commands are for automatic mode for the overlay manager.
Automatic overlay debugging support in GDB works with the runtime overlay manager provided in the examples directory. When this mode is activated, GDB will automatically read and interpret the data structures maintained in target memory by the overlay manager. To learn what overlays are mapped at any time, use the overlay list command.
Whenever the target program is allowed to run (by the step command, for example), GDB will refresh its overlay map by reading from the target's overlay tables.
The automatic mapping may be temporarily overridden by the overlay map and overlay unmap commands, but these mappings will last only until the next time the target is allowed to run. To explicitly take control of GDB's overlay mapping, switch to the overlay manual mode.
Debugging with overlays for M32R/D targets
When GDB's overlay support (either manual or auto) is active, GDB's concept of a symbol's address is controlled by which overlays are mapped into which memory regions. For instance, if you print a variable that is in an overlay which is currently mapped (located in its runtime address region), GDB will fetch the variable's memory from the runtime address. If the variable's overlay is currently not mapped, GDB will fetch it from its load-time address.
Similarly, if you disassemble a function that is in an unmapped overlay, or use a symbol's address to examine memory, GDB will fetch the memory from the symbol's load-time address range instead of the runtime range. If GDB's output contains labels that are relative to an overlay's load-time address instead of the runtime address, the labels will be distinguished as the following example shows.
(gdb) x /x foo
0x300000 <foo>: 0x2d7f4ffc
(gdb) overlay unmap .ovly0
(gdb) x /x foo
0x400000 <*foo*>: 0x2d7f4ffc
The asterisks (*) around the foo label may be interpreted as meaning that this is where foo is, but not where it will be when it is in use by the target program.
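The address selection rule described in this section reduces to a single choice between two base addresses. The following is a small illustrative model of that rule, not GDB's implementation; the struct and function names are hypothetical.

```c
/* Hypothetical record for a symbol that lives in an overlay section. */
struct ovly_symbol {
    unsigned long vma;     /* section runtime address */
    unsigned long lma;     /* section load address */
    unsigned long offset;  /* symbol offset within the section */
};

/* Address a debugger would read from: the runtime address when the
   overlay is mapped, the load address otherwise. */
unsigned long effective_address(const struct ovly_symbol *sym, int mapped)
{
    return (mapped ? sym->vma : sym->lma) + sym->offset;
}
```

Using the addresses from the example session, foo in .ovly0 resolves to 0x300000 while mapped and to 0x400000 after overlay unmap, which is the address GDB labels with asterisks.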
The info address command can tell you what overlay a symbol is in, as well as where it is loaded and mapped. The info symbol command can list all of the symbols that are mapped to an address.
(gdb) info addr foo
Symbol "foo" is a function at address 0x300000,
loaded at 0x400000 in overlay section .ovly0.
(gdb) info symbol 0x300000
foo in mapped overlay section .ovly0
bar in unmapped overlay section .ovly1
Breakpoints for M32R/D targets
So long as the overlay sections
are located in RAM rather than ROM, GDB can set breakpoints in them. The
breakpoints work by inserting trap instructions into the load-time address
region. When the overlay is mapped into the runtime region, the trap instructions
are mapped along with it, and when executed, cause the target program to
break out to the debugger. If the overlay regions are in ROM, you can only
set breakpoints in them after they have been mapped into the runtime region
in RAM.