Profiler Concepts


Table of Contents

17 Introduction
17.1 Related Documents
17.2 Measurable Events
17.3 What is an Event?
17.4 Which Event statistics are measured?
17.4.1.1 Count
17.4.1.1.1 Function count
17.4.1.1.2 Line count
17.4.1.1.3 Regular variable / state variable count
17.4.1.1.4 Analog AUX count
17.4.1.1.5 State / OS task / digital AUX state count
17.4.1.2 Period
17.4.1.2.1 Function period
17.4.1.2.2 Regular variable / state variable write period
17.4.1.2.3 State / OS task / digital AUX state period
17.4.1.3 Net Time
17.4.1.4 Gross Time
17.4.1.5 Call Time
17.4.1.6 Outside Time
17.5 What kind of a tool is required for profiling?
17.5.1 How long (time) can the profiler session be?
17.5.1.1 Trace data from the CPU
17.5.1.2 Trace buffer size
17.5.1.3 Upload while sampling
17.5.1.4 OCT program flow reconstruction complexity
17.6 Can profiler be used on all CPUs?
17.6.1 On-Chip Trace (OCT)
17.6.1.1 Program execution trace
17.6.1.2 Data trace
17.6.1.3 Instrumentation trace
17.6.1.4 ARM Naming
17.6.1.5 Nexus Naming
18 Data profiling
18.1 How it works
18.2 Requirements
18.2.1.1 Working around the number of hardware comparators
18.3 When is it used
18.3.1 Monitoring state variables
18.3.2 Monitoring regular variables
18.4 Profiling sub-variables
18.4.1 Endian issues
18.5 Instrumentation Trace Encoding
18.5.1.1 No encoding
18.5.1.2 Multiple, Stop.Start.Id.Little Endian data
18.5.1.3 Multiple, Stop.Start.Big Endian data.Id
18.5.1.4 Multiple, StopStart.Toggle.Big Endian data.Id
18.5.1.5 Single, Big Endian data.Id
19 Execution profiling
19.1 Requirements
19.2 When is it used
19.2.1 Identifying performance bottlenecks
19.2.2 Identify execution time deviations
19.2.3 Identify invocation period
19.3 Entry/Exit Mode
19.3.1 How it works
19.3.1.1 Advantages
19.3.1.2 Disadvantages
19.3.2 Assumptions
19.3.2.1 Assembler routines
19.3.3 Possible issues
19.3.3.1 Identifying high-level function exit points
19.3.3.2 Inaccurate trace
19.4 Range Mode
19.4.1 How it works
19.4.1.1 Advantages
19.4.1.2 Disadvantages
19.4.2 Assumptions
19.5 General Considerations
19.5.1 On-Chip Trace FIFO Overflows
20 Task profiling
20.1 Multi-tasking concepts
20.1.1 Task control block (TCB)
20.1.2 Task creation and termination
20.1.3 Task activation and deactivation
20.1.4 Task and Interrupt levels
20.1.4.1 Context nesting
20.2 Detecting task events
20.2.1 Detecting task creation and termination
20.2.2 Detecting task activation and deactivation
20.2.3 Detecting interrupt entry and exit
20.2.4 Detecting other OS events
20.3 Execution profiling in a multitasking environment
20.4 Possible issues
20.4.1 Identification of task activation
20.4.2 Data and Program trace synchronization
20.4.2.1 Nexus bandwidth consideration (advanced)
20.4.3 Task Termination / Stack Kill
20.5 OSEK/ORTI
20.5.1 Possible issues
20.5.1.1 Untraceable ID problem
20.5.1.2 Task identification requires more bits than data/instrumentation trace supports
20.5.2 Task and IRQ level definitions
20.5.3 Task Termination / Stack Kill
20.5.4 Context nesting detection
20.5.4.1 OTM message generation
20.5.4.2 Nexus program/data trace synchronization
20.5.4.3 Task Switches
20.5.4.4 Tasks
20.5.4.4.1 WaitEvent
20.5.4.4.2 TerminateTask
20.5.4.5 Interrupt Services
20.5.4.6 ORTI file adjustment
20.5.4.6.1 Create vs_SIGNAL_ declaration entries
20.5.4.6.2 Create vs_SIGNAL_ implementation entries
20.5.4.6.3 Create vs_OSSIGNAL and vs_SIGNAL_vs_OSSIGNAL declaration entry
20.5.4.6.4 Create vs_OSSIGNAL_ and vs_SIGNAL_vs_OSSIGNAL implementation entries
20.5.4.7 winIDEA Profiler configuration
20.5.4.7.1 OS Configuration
20.5.4.7.2 Profiler Configuration
20.6 Custom operating system
21 Profiling via instrumentation
21.1 Preparing for instrumentation
21.2 Execution profiling via instrumentation
21.2.1 Preparing instrumentation IDs
21.2.2 Instrumenting the code
21.3 OS Event profiling via instrumentation
21.4 Monitoring user data via instrumentation



17Introduction

The Profiler is a real-time analysis tool which uses acquired low-level trace information to derive a higher-level analysis of the application's real-time behavior.

17.1Related Documents

See the Analyzer interface manual for instructions on using the profiler.

Emulation Technical Notes for the respective CPU provide more specific information. Not all CPUs and emulation tools provide all the profiling capabilities which are discussed in this document.

17.2Measurable Events

Profiler can determine:

How many times an event occurred

How long the event lasted

What were the minimum, maximum and average event durations

What’s the period (time) between consecutive occurrences of the event – minimum, maximum and average

The sequence of various events

17.3What is an Event?

An Event can be any of these:

Writing a variable with a certain value – data profiling

Execution of a program function – execution profiling

Activation of an OS task – task profiling (measured through data trace or code instrumentation)

Auxiliary event measured on an AUX line – AUX profiling

17.4Which Event statistics are measured?

We measure counts, values and several timing measurements (period, net time, gross time, call time and inactive time). Each type of statistic is explained in more detail further in this chapter.


Depending on the area type (functions, lines, regular variables, analog AUX, state variables, OS objects, digital AUX and entry/exit variables), a subset of these statistics is available: count, period, net time, gross time, call time, inactive time and value.











As an example, consider this simple program, in which we examine the functions main, f and g and the variables varF and stateF. Program execution is recorded from the first source line of the main function; in addition, the regular variable varF and the state variable stateF are recorded.

int varF = 0;
enum State { EVEN_STATE, ODD_STATE } stateF;

void g()
{
 if (varF%2 == 0)
 {
   stateF = EVEN_STATE;
 }
 else
 {
   stateF = ODD_STATE;
 }
}

void f()
{
 varF++;
 g();
}

void main()
{
 int I;
 for (I = 0; I < 3; ++I)
   f();
}

The profiler has recorded these events (function and data events are displayed separately for clarity):

Area   Event   Time    State of main   State of f   State of g
f      Entry   1 us    Suspended       Active       Inactive
g      Entry   3 us    Suspended       Suspended    Active
g      Exit    5 us    Suspended       Active       Inactive
f      Exit    6 us    Active          Inactive     Inactive
f      Entry   7 us    Suspended       Active       Inactive
g      Entry   9 us    Suspended       Suspended    Active
g      Exit    11 us   Suspended       Active       Inactive
f      Exit    12 us   Active          Inactive     Inactive
f      Entry   14 us   Suspended       Active       Inactive
g      Entry   16 us   Suspended       Suspended    Active
g      Exit    18 us   Suspended       Active       Inactive
f      Exit    19 us   Active          Inactive     Inactive
main   Exit    20 us   Inactive        Inactive     Inactive

The term Suspended means that the function is on the stack, but has called another function.


Area     Event   Time    Value of varF   State of stateF
varF     Write   2 us    1               Unknown state
stateF   Write   4 us    1               ODD_STATE
varF     Write   8 us    2               ODD_STATE
stateF   Write   10 us   2               EVEN_STATE
varF     Write   15 us   3               EVEN_STATE
stateF   Write   17 us   3               ODD_STATE

Trace modules typically produce events for data writes only.

17.4.1.1Count

17.4.1.1.1Function count

indicates the number of recorded entries into the function. If the function was executing but its entry was not recorded, the count does not increase. Function exits are not considered.

Function main was executed, but the entry was not recorded, so the count remains 0.

Function f is entered 3 times, therefore the count is 3.

Function g is entered 3 times, therefore the count is 3.

17.4.1.1.2Line count

indicates the number of times the first instruction of the source line was executed.

17.4.1.1.3Regular variable / state variable count

indicates the number of writes to the variable (no value distinction).

varF was written to 3 times, therefore the count is 3.

stateF was written to 3 times as well, therefore the count is 3.

17.4.1.1.4Analog AUX count

indicates the number of different levels measured (no level distinction).

17.4.1.1.5State / OS task / digital AUX state count

indicates the number of times a state was entered. It is possible that a variable enters the same state twice in a row (writes of the same value occur) – in this case winIDEA offers a setting that determines whether such successive writes are counted or dismissed.

stateF has 2 possible states - ODD_STATE and EVEN_STATE.

ODD_STATE was entered 2 times, therefore the count is 2.

EVEN_STATE was entered 1 time, therefore the count is 1.

17.4.1.2Period

provides time measurement between events of the same type (difference between the timestamps of the two events) and offers the following information:

average period = (sum of all periods) / (period count)

maximum period (timing measurement, as well as information when it occurred)

minimum period (timing measurement, as well as information when it occurred)

At least two events of the same type are required for a period measurement (time before the first event and after the last event is ignored, as it could provide false information).

17.4.1.2.1Function period

Function period measures the time between function entries.

Period of the function main is not calculated, as there are no function entries recorded.
Period of the function f is calculated as follows:
Average period: ((7 us - 1 us) + (14 us - 7 us)) / 2 = 6.5 us
Maximum period: 7 us (between 7 us and 14 us)
Minimum period: 6 us (between 1 us and 7 us)
Period information is available for function g as well, but is omitted in this example.

17.4.1.2.2Regular variable / state variable write period

Variable write period measures the time between writes to the variable.

17.4.1.2.3State / OS task / digital AUX state period

State period measures time between successive entries to the same state.


17.4.1.3Net Time

Net Time indicates the time spent in the body of the function – where the function state is Active.

In this case, this is the sum of intervals (1,2) + (3,4) + (5,6) + (7,8). The Net Time is thus 4 us.

17.4.1.4Gross Time

Gross Time indicates the time spent in the body of the function and in the called functions – where the function state is Active or Suspended.

In this case, this is the sum of intervals (1,4) + (5,8). The Gross Time is thus 6 us.

17.4.1.5Call Time

Call Time is the difference between function entry and exit. In an RTOS application, execution can be forcefully removed from the normal program flow (a context switch). The function's Gross Time is not affected, but the Call Time includes the time spent in other contexts.

In this example this time is identical to Gross Time, but if the example is enhanced with an OS task switch:

Area   Event   Time      State of f
main   Entry   0 us      Inactive
f      Entry   1 us      Active
g      Entry   2 us      Suspended
g      Exit    3 us      Active
f      Exit    4 us      Inactive
f      Entry   5 us      Active
TASK   Other   5.5 us    Active, out of context
TASK   MAIN    15.5 us   Active
g      Entry   16 us     Suspended
g      Exit    17 us     Active
f      Exit    18 us     Inactive
main   Exit    19 us     Inactive

In this case the call time for the second invocation is (5,18) = 13 us. In total (with the first call) it is 16 us.

17.4.1.6Outside Time

Outside Time indicates the time spent outside the body of the function – where the function state is Inactive.

In this case, this is the sum of intervals (0,1) + (4,5) + (8,9). The Outside Time is thus 3 us.

17.5What kind of a tool is required for profiling?

To capture all events, the tool must provide:

real-time trace capability

high precision time measurement

a large enough recording buffer to capture the required sequence of events

17.5.1How long (time) can the profiler session be?

This depends on several factors:

17.5.1.1Trace data from the CPU

can be generated at a high bandwidth. On an ICE system this can exceed 400 MB/s, and on a modern, wide OCT port even more. If this data is unfiltered, it will quickly fill any trace buffer.

On an ICE system, qualifiers (online address comparators) are used to record only events of interest and discard all other bus activity.

On an OCT system, the OCT qualifier is configured to report only program trace for example, or just data accesses to a certain variable.

17.5.1.2Trace buffer size

If the trace data bandwidth is filtered down to 50 MB/s, a 1 GB trace buffer will record 20 s (1 GB / 50 MB/s = 20 s).

17.5.1.3Upload while sampling

If an optimal hardware configuration (including the PC) is used, over 40 MB/s can be streamed to the PC via high-speed USB2. If trace data bandwidth is filtered below this, a profiler session can last indefinitely.

17.5.1.4OCT program flow reconstruction complexity

OCT program trace must be reconstructed by the PC. If strong compression is used, more processing is required for every trace message. If the PC cannot keep up, the profiler session will end. A faster PC might improve this.

17.6Can profiler be used on all CPUs?

The CPU activity of interest must be visible to the tool. This is possible on CPUs where:

CPU core bus is visible externally – typically older CPUs with no on-chip memory or newer 8 and 16-bit CPUs which can be put (by an ICE) into special emulation mode

CPU’s on-chip trace provides the required information – newer CPUs with on-chip trace port (ETM, Nexus, etc.)

17.6.1On-Chip Trace (OCT)

On-chip trace is typically used on single-chip CPUs, which keep all required memory inside the CPU package, or where the on-chip CPU pipeline and cache would obscure the real CPU activity.

The CPU employs compression technology to reduce the traffic over the OCT port. Where a classical ICE approach would use 70 CPU pins, the OCT port usually does fine with 16 or fewer. The OCT port does not show addresses and data in the classical sense, but rather sends messages which the trace tool then uses to reconstruct the activity in the CPU.

A typical program trace message is a branch message, which can say as much as “a branch was executed after 4 sequential instructions”. The tool must then reconstruct the program flow from its knowledge of the downloaded code and the CPU state from the previous message.
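To illustrate the reconstruction principle, the sketch below replays a list of simplified branch messages against a start address. The message layout, the fixed 4-byte instruction size and the addresses are illustrative only; real OCT protocols (ETM, Nexus) additionally compress direct-branch targets, which the tool resolves from the downloaded code image.

#include <stdint.h>
#include <stdio.h>

/* simplified branch message: "instr_count sequential instructions were
   executed, then a branch to target was taken" */
typedef struct { uint32_t instr_count; uint32_t target; } BranchMsg;

void reconstruct(uint32_t pc, const BranchMsg *msgs, int n)
{
 for (int i = 0; i < n; ++i)
 {
   for (uint32_t k = 0; k < msgs[i].instr_count; ++k)
   {
     printf("executed 0x%08lX\n", (unsigned long)pc);   /* sequential instruction */
     pc += 4;                                           /* assumed fixed instruction size */
   }
   printf("branch   0x%08lX -> 0x%08lX\n", (unsigned long)pc, (unsigned long)msgs[i].target);
   pc = msgs[i].target;                                 /* continue at the branch target */
 }
}

int main(void)
{
 /* "a branch was executed after 4 sequential instructions" */
 const BranchMsg trace[] = { { 4, 0x40002194u }, { 2, 0x400021A8u } };
 reconstruct(0x40002000u, trace, 2);
 return 0;
}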

17.6.1.1Program execution trace

can be highly compressed and therefore only a few package pins are required to stream it out. Many modern CPUs provide it.

17.6.1.2Data trace

is less compressible and would require too many package pins to make all data accesses visible. A compromise approach is to configure the CPU’s OCT module (at runtime) to show only select accesses, for example only write accesses to a certain variable. This is sufficient for data and task profiling, but unfortunately many silicon vendors choose not to implement data trace at all.

17.6.1.3Instrumentation trace

is similar to data trace, but the trace message is explicitly generated by execution of a dedicated CPU op-code or by writing a special register. To generate instrumentation messages, the application must be modified (instrumented) to generate them at appropriate locations.

On ARM CPUs this is called ITM.

On CPUs with a Nexus OCT port, this is called OTM. On most PowerPC processors this message is emitted when the MMU configuration is modified, so its usage is restricted to applications which do not make use of the MMU.
Some newer Nexus CPUs also implement the DQM protocol which is, much like the ARM’s ITM, dedicated to instrumentation trace.

The size of the data transmitted in an instrumentation message depends on the CPU. Sizes range from 8 to 32 bits.

Instrumentation trace requires less data to be transmitted over the OCT port (no address is given, just the data value) and is by nature generated less frequently than a regular memory write. Most CPUs with program trace also implement instrumentation trace, without having to increase the number of OCT pins.
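As a concrete illustration, on an ARM Cortex-M device an instrumentation message is generated by a single write to an ITM stimulus port. A minimal sketch using the CMSIS helper ITM_SendChar (the vendor CMSIS device header, here called device.h, is an assumption):

#include "device.h"   /* assumption: vendor CMSIS device header providing the ITM definitions */

void signal_event(unsigned char id)
{
 /* ITM_SendChar writes ITM stimulus port 0; each write emits one
    instrumentation trace packet which the trace tool time-stamps */
 ITM_SendChar(id);
}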

17.6.1.4ARM Naming

ETM = always program trace, sometimes also data trace

DWT = data trace

ITM = instrumentation trace

HTM = AHB bus trace

The term ETM refers to an OCT ‘Macrocell’, but is for historical reasons often used to describe the entire OCT system as well as the OCT external port.

17.6.1.5Nexus Naming

level 1 = no trace, just debugging, usually IEEE1149 JTAG

level 2 = program trace

level 2+ = program trace and instrumentation trace (OTM)

level 3 = program, instrumentation and data trace

level 4 = level 3, plus capability to ‘emulate’ memory locations where the external tool supplies the data to the requested memory location.

18Data profiling

18.1How it works

The profiler is configured to record write accesses to a specific memory location. Whenever the location is written, the value and the time are recorded.

18.2Requirements

The memory access (address and data value) must be visible. This is the case on a CPU with core bus visibility or a CPU with on-chip Data Trace or Instrumentation Trace.

18.2.1.1Working around the number of hardware comparators

On OCT systems the on-chip data address comparators are used to qualify the generated data. These comparators are usually few in number. If more variables are profiled than there are comparators, the profiler can merge multiple areas of interest into a single range.

This provides the capacity to profile an unlimited number of variables, but if frequently written variables lie in the merged range, the data trace bandwidth will increase, which could lead to trace port overflow. In such a case the data variable layout in the application could be rearranged so that the variables of interest are located closer together, as sketched below.
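A minimal sketch of such a rearrangement, assuming a GCC-style section attribute (the section name is illustrative and must be placed by the linker script): grouping the profiled variables into one dedicated data section keeps them adjacent, so a single address-range comparator can cover all of them.

#define PROFILED __attribute__((section(".profiled_vars")))

PROFILED volatile unsigned short g_State;     /* state variable of interest */
PROFILED volatile unsigned int   g_Counter;   /* regular variable of interest */
PROFILED volatile unsigned char  g_Mode;      /* another profiled variable */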

18.3When is it used

18.3.1Monitoring state variables

A state variable is a (global) program variable which indicates the state of the application. It will typically assume only a small number of different values. Transition to any state is considered an Event and statistics are maintained for every state.

Example: state of a traffic light. For every state (Red, Green, flashing,…) the count/duration/period statistics are provided.
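A minimal sketch of such a state variable (the names are illustrative); every write to it is one event which the profiler attributes to the corresponding state:

typedef enum { LIGHT_RED, LIGHT_GREEN, LIGHT_FLASHING } LightState;

volatile LightState g_LightState = LIGHT_RED;

void on_timer_tick(void)
{
 g_LightState = LIGHT_GREEN;   /* profiler: state LIGHT_GREEN entered */
}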

18.3.2Monitoring regular variables

A regular variable can assume many different values. Every change is considered an Event. Only a single set of statistics (count, period) is maintained for the entire variable.

Example: readout of a temperature sensor. The A/D converter typically supplies thousands of different values, and statistics about every individual temperature value are not of interest. The profiler will show how the temperature changed over time, the rate of change, etc.

18.4Profiling sub-variables

If a data object is composed of several distinct sub-items (bit fields,…), a part of the data object can be extracted and profiled separately.

struct

{

 int b1:3;

 int b2:2;

 int b3:5;

 int b4:4;

} S;                  

In the above example a 16-bit variable holds several bit fields. When any bit-field is accessed, the CPU will write the entire 16-bit variable and the entire access will be visible to the trace.

To profile just b2, which is 2 bits wide and placed at bit 3 of the S variable, the data area configuration must specify a bit offset of 3 and a size of 2 bits.
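Independent of the dialog settings, the value reported for b2 corresponds to this mask-and-shift operation on the traced 16-bit write (a minimal sketch; the helper name is illustrative):

#include <stdint.h>

uint16_t extract_b2(uint16_t traced_value)
{
 return (uint16_t)((traced_value >> 3) & 0x3u);   /* bit offset 3, width 2 */
}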

18.4.1Endian issues

Care should be taken to correctly interpret data layout on big and little endian systems. Consider this example:

union

{

 char c[4];

 long l;

} U;

On a little endian machine U.c[0] is the LSB of U.l. To profile it, the following configuration would be used:

On a big endian machine U.c[0] is the MSB of U.l. To profile it, the following configuration would be used:

Note: this example illustrates the effects of endian ordering. In the above case U.c[0] could be specified as the variable itself. To profile non-C data constructs for which no debug information is available, however, the memory layout must be considered.
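The effect can be verified with a small test program (a sketch assuming a 32-bit long, as on the embedded targets discussed):

#include <stdio.h>

union { char c[4]; long l; } U;

int main(void)
{
 U.l = 0x11223344L;
 /* little endian: prints 0x44 (LSB of U.l); big endian: prints 0x11 (MSB of U.l) */
 printf("U.c[0] = 0x%02X\n", (unsigned)(unsigned char)U.c[0]);
 return 0;
}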

18.5Instrumentation Trace Encoding

Some CPUs which have no data trace implement instrumentation trace, which allows the application to send a signal to the profiler.

Since instrumentation trace is limited to 8 bits on some CPUs, a single message cannot transmit all the necessary data.

To circumvent this limitation, the profiler supports instrumentation trace encoding – a dedicated sequence which allows transmission of more than 8 bits using a sequence of instrumentation trace messages.

The concept of an ID is introduced; using it, the application can signal, for example, a task switch using ID=0 and an IRQ execution using ID=1.

The following encodings are supported:

18.5.1.1No encoding

No encoding is used. Every value seen on the instrumentation trace is used directly.

18.5.1.2Multiple, Stop.Start.Id.Little Endian data

Multiple instrumentation messages are combined using this encoding:


Bit         7      6       5         4         3      2      1      0
First       STOP   START   ID/DATA   ID/DATA   DATA   DATA   DATA   DATA
Following   STOP   0       DATA      DATA      DATA   DATA   DATA   DATA

ID size is defined in the encoding configuration dialog. These sizes are available:

0 – no ID is used, 6 data bits are available in the first message

1 – a one-bit ID is used, 5 data bits in the first message

2 – a two-bit ID is used, 4 data bits in the first message.

DATA is transmitted in little endian format (least significant bits first).

Example:

Transmit a value of 0x1234, with a 2-bit ID = 1


Bit     7   6   5        4        3   2   1   0   VAL
First   0   1   0 (ID)   1 (ID)   0   1   0   0   0x4
2nd     0   0   1        0        0   0   1   1   0x234
3rd     1   0   0        0        0   1   0   0   0x1234
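A minimal sketch of this encoding in C, matching the example above (the emit function stands in for the CPU-specific instrumentation write; a 2-bit ID is assumed):

#include <stdint.h>
#include <stdio.h>

static void emit(uint8_t msg) { printf("0x%02X\n", msg); }   /* placeholder for the real instrumentation write */

void encode_le(uint32_t value, uint8_t id)
{
 /* first message: STOP=0, START=1, 2-bit ID, 4 data bits (least significant first) */
 uint8_t first = 0x40u | (uint8_t)((id & 0x3u) << 4) | (uint8_t)(value & 0xFu);
 value >>= 4;
 if (value == 0u) { emit(first | 0x80u); return; }   /* single message: STOP=1 */
 emit(first);

 /* following messages carry 6 data bits each; STOP=1 marks the last one */
 while (value != 0u)
 {
   uint8_t msg = (uint8_t)(value & 0x3Fu);
   value >>= 6;
   if (value == 0u) { msg |= 0x80u; }
   emit(msg);
 }
}

int main(void)
{
 encode_le(0x1234u, 1u);   /* emits 0x54, 0x23, 0x84 - the three messages shown above */
 return 0;
}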

18.5.1.3Multiple, Stop.Start.Big Endian data.Id

This encoding is identical to Multiple, Stop.Start.Id.Little Endian data, except that data is transmitted in big endian order.


Bit     7   6   5        4        3   2   1   0   VAL
First   0   1   0 (ID)   1 (ID)   0   0   0   1   0x1
2nd     0   0   0        0        1   0   0   0   0x48
3rd     1   0   1        1        0   1   0   0   0x1234

18.5.1.4Multiple, StopStart.Toggle.Big Endian data.Id

This encoding is a standard extension used with ORTI operating systems. For more information refer also to the section Task identification requires more bits than data/instrumentation trace supports.


Bit         7   6        5      4      3         2         1         0
First       1   0        DATA   DATA   DATA      DATA      DATA      DATA
Following   0   TOGGLE   DATA   DATA   DATA      DATA      DATA      DATA
Last        1   1        DATA   DATA   ID/DATA   ID/DATA   ID/DATA   ID/DATA

TOGGLE – changes with each subsequent message

Note: a minimum of two messages is required with this encoding even for a single bit value

Example:


Bit     7   6   5   4   3   2   1        0        VAL
First   1   0   0   0   0   1   0        0        0x1
2nd     0   0   1   0   0   0   1        1        0x48
3rd     1   1   0   1   0   0   0 (ID)   1 (ID)   0x1234

18.5.1.5Single, Big Endian data.Id

This encoding uses a single message. Compared to No encoding, the ID can be transmitted.


Bit       7      6      5      4      3         2         1         0
Message   DATA   DATA   DATA   DATA   ID/DATA   ID/DATA   ID/DATA   ID/DATA

Example:

Transmit a value of 0x34, with a 2-bit ID = 1


Bit       7   6   5   4   3   2   1        0        VAL
Message   1   1   0   1   0   0   0 (ID)   1 (ID)   0x1
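With a 2-bit ID, this encoding amounts to a single shift-and-or (a minimal sketch; the same construction is used by the OTM_Signal example later in this document):

#include <stdint.h>

uint8_t encode_single(uint8_t data, uint8_t id)   /* 2-bit ID assumed */
{
 return (uint8_t)((data << 2) | (id & 0x3u));     /* data = 0x34, id = 1 -> 0xD1 (1101 0001) */
}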


19Execution profiling

19.1Requirements

Program execution must be visible. This is the case on a CPU with core bus visibility or a CPU with on-chip Program Trace.

19.2When is it used

19.2.1Identifying performance bottlenecks

In any application only a very small percentage of program functions will have a large impact on overall performance. Once these are located, optimization of their code or the algorithm can yield significant performance gain.

The profiler can be configured to profile only suspected functions or all functions. The results will show which functions execute frequently (and are perhaps candidates for inlining) and which take a large percentage of execution time (and should be manually optimized or revised).

19.2.2Identify execution time deviations

Sometimes a function is expected to execute within a narrow time frame. The profiler provides minimum and maximum execution times for all invocations of the function – which can confirm this assumption or show which sequence of events led the function to deviate from it.

Example: starting and completing an A/D conversion is expected to take no less than 5 us (the minimum time for the CPU's ADC to stabilize) and no more than 7 us (the total time we allow the function to perform all the necessary setup and cleanup actions).

19.2.3Identify invocation period

A function might be required to execute a specified number of times every second. The profiler provides minimum and maximum times between any two consecutive invocations of the function – which can confirm this assumption or show which sequence of events caused the deviation.

Example: Checking a motion sensor is required at least once per second, but checking it more often than every 800 ms wastes energy.

19.3Entry/Exit Mode

19.3.1How it works

The profiler is configured to record executions of instructions at a specific function’s entry and exit point. When any of these instructions is executed, the instruction address and the time are recorded.

Example: a function like this:

int min(int a, int b)

{

 return (a < b) ? a : b;

}

Yields code like this:

          min

          {

40002194   mr            r0,r3

40002198   mr            r3,r4

          return (a < b) ? a : b;

4000219C   cmp           7,0,r4,r0

400021A0   bclr          4,29

400021A4   mr            r3,r0

          min_EXIT_

          }

400021A8   blr          

When this function is profiled, executions of instructions on addresses 40002194 and 400021A8 will be recorded.

19.3.1.1Advantages

The information obtained in Entry/Exit mode is most accurate.

If the profiler hardware has access to real-time program flow (ICE bus access), hardware filtering can be used to reduce the amount of trace information considerably.

Only a few functions can be selected for profiling. If these follow the Assumptions below, the rest of the application can use optimizations and techniques which would break the Entry/Exit algorithm.

19.3.1.2Disadvantages

This mode relies heavily on accurate Exit information. High compiler optimizations can obscure this and render this mode unusable.

19.3.2Assumptions

Execution profiling analysis assumes a regular function entry and exit sequence. This application:

void g()

{

}

void f()

{

 g();

}

is expected to yield this sequence:

f

g

g_EXIT_

f_EXIT_

A sequence like the one below (where exit from g() is not detected) is incorrect and the profiler will abort with an incorrect entry/exit sequence error:

f

g

f_EXIT_

19.3.2.1Assembler routines

Routines written in assembly language can have an arbitrary number of return points and lack the symbolic information which would allow their automatic identification.

To be able to profile such routines, the exits can be configured manually in the profiler configuration dialog, or preferably, each exit from the routine is given a symbolic name following the exit naming convention.

Example:

MyRoutine:

 cmp R0,#3

 ble L1

MyRoutine_EXIT_:

 blr

L1:

 sub R0,#1

MyRoutine_EXIT_2:

 blr

Exits named in this manner are detected automatically.

19.3.3Possible issues

19.3.3.1Identifying high-level function exit points

Function entry and exit information is obtained from the download file which contains symbolic information. While function entry points are always well defined, only a few compilers generate information on where a function exits (note that this is not necessarily the last byte in the function space).

To identify the exit points, the function code is analyzed at download time. Locations where exits have been identified are given an artificial symbol named <function name>_EXIT_[<index>], for example:

min_EXIT_

min_EXIT_2

If an exit is incorrectly identified, the profiler results can be incorrect, but usually the profiler session will fail due to incorrect function entry/exit sequence.

Correctness of function exit identification can be checked in the disassembly window. If a discrepancy is detected, technical support should be notified. Until a solution is provided, the exit can be specified manually in the profiler configuration dialog for the function.

19.3.3.2Inaccurate trace

Ideally the CPU would provide accurate execution information, but this is almost never the case. One of the most common deviations is an op-code refetch after an interrupt service. In this case the CPU fetches an op-code and indicates that it is being executed, but a pending interrupt causes this instruction to be cancelled. After the interrupt returns, the instruction is fetched and executed again. This can give a false impression that the op-code was executed twice.

Example:

          min

          {

40002194   mr   r0,r3  <- this op-code is interrupted by an IRQ and refetched later

40002198   mr   r3,r4

The resulting trace then appears like this:

min

IRQHandler

IRQHandler_EXIT_

min

To the profiler, this is indistinguishable from a function which would look like this:

void min()

{

 IRQHandler();

 min(); // recursive call

}

Through code analysis and understanding of the trace protocol such situations are filtered out, but locating all such deviations of the CPU is a trial and error process.

19.4Range Mode

19.4.1How it works

The profiler monitors the continuous stream of program flow. When the PC moves from the body of one function to another, the profiler considers this a call to a function or a return from the current function (depending on the type of instruction which caused the change of flow), as sketched below.
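A minimal sketch of the attribution idea: every executed address is mapped to the function whose address range contains it. The ranges and names below are illustrative (the min range reuses the listing from the Entry/Exit example).

#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t start, end; const char *name; } FuncRange;

static const FuncRange ranges[] =
{
 { 0x40002194u, 0x400021ACu, "min"  },
 { 0x400021ACu, 0x40002200u, "main" },   /* illustrative range */
};

static const char *find_function(uint32_t pc)
{
 for (unsigned i = 0; i < sizeof(ranges) / sizeof(ranges[0]); ++i)
 {
   if (pc >= ranges[i].start && pc < ranges[i].end)
     return ranges[i].name;
 }
 return "<unknown>";   /* PC outside all known functions */
}

int main(void)
{
 /* the PC moving from main into min is treated as a call, back as a return */
 printf("%s\n", find_function(0x400021B0u));   /* main */
 printf("%s\n", find_function(0x40002198u));   /* min  */
 return 0;
}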

19.4.1.1Advantages

Quality of debug information is not as critical as in Entry/Exit mode.

Compiler and OS optimizations do not affect the profiler.

Task switches do not need to be recorded. If data trace is not available, the Range mode profiler still works.

Function tail optimizations are supported:

tail-merge: this optimization effectively moves part of function (A)'s code body into another function (B). In range mode, execution in function B is attributed to function B instead of the optimized function A.

tail-call: this optimization occurs when function (A) calls another function (B) just before it exits. Instead of using a call op-code, a branch is used. When function B returns, function A effectively returns too.

Note: profiling tail-optimized functions requires much more recorded data and incurs a lot of additional processing. It is strongly recommended to disable such optimizations on the functions which are profiled.

19.4.1.2Disadvantages

Full program flow access is required, which is impractical for bus trace systems.

19.4.2Assumptions

Debug information must provide accurate enough information about function location and size.

The profiler assumes that a function is always entered at the same (starting) address. Routines which are reported as functions but can be entered anywhere in their body are detected at run time and are considered non-functions. One typical example is an interrupt vector table, which is entered at the address that matches the active interrupt. Some compilers incorrectly report such tables as functions.

For non-functions the only measurable quantity is the Net time spent in the body.

19.5General Considerations

19.5.1On-Chip Trace FIFO Overflows

If the application code uses dense indirect branches (return from function, call function via pointer, frequent IRQs), the OCT FIFO cannot keep up with the amount of generated messages and on-chip trace overflows are reported.

Possible solutions:

Check if the CPU can use a wider OCT port. This is configurable on some CPUs

Check if the OCT clock can be increased or double data rate / half rate clocking can be used. This is configurable on some CPUs.

Check if the Profiler configuration dialog provides ‘Stall CPU to avoid overflows’ option (or an option with equivalent meaning). If the option is available and checked, the internal on-chip trace logic will stall the CPU until there is free space available for a new message in the on-chip trace FIFO.

Note that depending on the CPU and the application, the run time performance can be affected by this option significantly. If absolutely no impact on real-time execution is required, then alternative solutions must be used.

This option is available depending on the microcontroller architecture.

Check if compiler optimizations are the cause. Especially when optimizing for size, the compiler can move typical function prolog and epilog code into a separate routine – but this triples the number of direct and indirect branch messages for a simple function.

Use code instrumentation as explained in the section Execution profiling via instrumentation.

20Task profiling

20.1Multi-tasking concepts

In a multitasking environment, usually provided by an operating system, multiple operations appear to execute in parallel. Every such operation is called a Task. In reality only one task is active at a time. The operating system decides which task should run based on task priorities, states, synchronization etc.

20.1.1Task control block (TCB)

For a task to be independent of other tasks, it keeps its own set of registers. When the task is executing, these are the regular CPU core registers.

When the OS deactivates a task, all registers are saved to a structure called task control block which is kept in global memory of the OS. The registers include the program counter and the stack pointer.

When the OS activates the task again, the registers previously saved in the task control block are restored to CPU registers and execution resumes at the point where it was previously interrupted.

20.1.2Task creation and termination

An application still starts with a single task - at function ‘main’, but can then create more tasks. To create a task, it calls into the OS and specifies the task control function and usually the stack size for the task. This function behaves just like function ‘main’. When it exits, the task terminates.

void main()

{

 OSCreateTask(Task1ControlFunc, 100); // create a task, specify control func and stack size


// perform main task activities

 while (WorkToBeDoneInMain)

 {

   DoMainWork();

 }

}


void Task1ControlFunc()        // this function is called first after task is created

{

 while (WorkToBeDoneInATask)

 {

   DoTaskWork();

 }

}                              

The OSCreateTask will:

Create a new task control block

Set TCB’s initial value of the program counter to address of Task1ControlFunc

Allocate 100 bytes of stack space

Set TCB’s initial value of the stack pointer to the address of allocated stack

Put the (pointer to the) TCB in the list of tasks for the scheduler to process.

When Task1ControlFunc ultimately exits, the OS will perform a cleanup:

Remove the TCB from the scheduler list

Free the allocated stack space

Free the TCB structure

Even though the real task lifetime extends from the call to OSCreateTask to the final cleanup, it is usually considered to span from the control function's entry to its exit. This function is entered and exited only once.

20.1.3Task activation and deactivation

To execute multiple tasks in 'parallel', the operating system switches between tasks. Typically every task is given only a few milliseconds to run in order to achieve an illusion of parallelism.

When a task is allowed to run, it is activated and considered active. When the OS scheduler decides that a different task should run, this task is deactivated.

In the lifetime of a task, it can be activated and deactivated many times.

20.1.4Task and Interrupt levels

When regular application code is executing, it is considered to run on task level. This execution can be preempted by interrupts, which can in turn be preempted by higher priority interrupts.

Unlike task IDs, interrupt IDs have an inactive state. When the IRQ ID assumes this value, this indicates that this IRQ level is not active.

Area      Event   Context   IRQ level   Comment
main      Entry   Default   0 / none    main enters, OS not started yet
StartOS   Entry   Default   0 / none    OS is started, no task activated yet
TASK      0       TSK:0     1 / Task    Task 0 is activated
TASK      1       TSK:1     1 / Task    Task 0 is deactivated, Task 1 is activated
IRQ0      1       IRQ0      2 / IRQ0    IRQ0 preempts Task 1
IRQ1      1       IRQ1      3 / IRQ1    IRQ1 preempts IRQ0
IRQ1      0       IRQ0      2 / IRQ0    IRQ1 exits, IRQ0 resumes
IRQ0      0       TSK:1     1 / Task    IRQ0 exits, Task 1 resumes
TASK      0       TSK:0     1 / Task    Task 1 is deactivated, Task 0 is activated

The profiler maintains a separate context for every task (above, for Task 0 and Task 1) and a separate context for every IRQ level above task level. If different IRQ0 IDs are used by the application, all activity is stored in the IRQ0 context regardless of the ID - an IRQ cannot preempt an executing IRQ handler of the same level.

20.1.4.1Context nesting

To ensure correct identification of context nesting, the OS must also signal the manner in which an IRQ exits and how a task deactivates.

These are the different paths:

return to a preempted task/IRQ (RET):

When IRQ exits normally via a return from interrupt instruction.

return to OS scheduler (RET_OS), typically:

When task is terminated (e.g. inside TerminateTask function)

When task enters a wait (e.g. inside WaitEvent function)

When ISR exits to scheduler (when another IRQ is pending or the servicing of the IRQ has made a higher priority task ready to run)

An example for ORTI is given in the section Context nesting detection.

20.2Detecting task events

20.2.1Detecting task creation and termination

In a static embedded OS (number of tasks known at compile time), each task usually has a dedicated control function. In this case the creation and termination can be observed by execution profiling the task control function.

The gross time will match the total time spent executing the task.

The count will match the number of times the task has been created. It will usually be just one.

20.2.2Detecting task activation and deactivation

Determining the active task is the key to profiling in a multitasking environment. Besides giving information about task activation frequency, run time etc., it is mandatory for execution profiling.

The OS scheduler always keeps track of the active task in a global OS variable, usually a pointer to the active task’s control block.

By data profiling this variable, both the task profiling and the execution profiling are made possible.

If the CPU does not provide data trace, the value of this variable (or the task ID) must be signaled using instrumentation trace.

20.2.3Detecting interrupt entry and exit

To correctly identify a task's and a function's own run times, the activity performed in interrupts must not be included in their run time. If interrupts are signaled, the profiler is able to distinguish one from the other.

Some OSes signal the IRQ in a similar fashion as the active task ID. If this is the case, the signaling object must be designated as an IRQ level object of corresponding priority.

If no signaling is provided by the OS, it can be added by instrumenting the code. The instrumentation should signal IRQ entry as close as possible to start of IRQ handler and IRQ exit as close as possible to the return from it.

char g_cIRQ = 0;

void IRQHandler()

{

// indicate that IRQ is executing

 g_cIRQ = 1;


// perform IRQ service

 ...


// indicate that IRQ is exiting

 g_cIRQ = 0;

}

20.2.4Detecting other OS events

If the OS provides signaling for other events, like the service it is currently executing, regular data profiling is used to capture these signals.

20.3Execution profiling in a multitasking environment

Because the OS scheduler can interrupt a regular program flow, the execution profiling assumption no longer stands. This code:

void main()

{

 OSCreateTask(Task1ControlFunc, 100); // create a task, specify control func and stack size

// perform main task activities

 while (WorkToBeDoneInMain)

 {

   DoMainWork();

 }

}

void Task1ControlFunc()        // this function is called first after task is created

{

 while (WorkToBeDoneInATask)

 {

   DoTaskWork();

 }

}                              

can very well generate this event sequence (the leading number is the timestamp of the event):

10 DoMainWork

20 DoTaskWork

30 DoMainWork_EXIT_

40 DoMainWork

50 DoTaskWork_EXIT_


This will cause a multitasking-unaware profiler to abort with an invalid function entry/exit sequence error.

If however the active task is traced along, the same sequence now looks like this:

5  TASK: 0        // ID of the main task

10 DoMainWork

15 TASK: 1        // ID of Task1ControlFunc’s task as it got activated

20 DoTaskWork

25 TASK: 0        // ID of the main task

30 DoMainWork_EXIT_

40 DoMainWork

45 TASK: 1        // ID of Task1ControlFunc

50 DoTaskWork_EXIT_

The execution profiler can now look at each task separately and the regular entry/exit sequence is used again:

TASK: 0

10 DoMainWork

Suspend from 15-25

30 DoMainWork_EXIT_

40 DoMainWork

Suspend from 45-


TASK: 1

20 DoTaskWork

Suspend from 25-45

50 DoTaskWork_EXIT_

This ‘rearrangement’ also affects the timestamps and considers the suspended time.

20.4Possible issues

20.4.1Identification of task activation.

If data trace is available, it’s just a matter of identifying the global OS variable which keeps the active task identification.

If data trace is not available, the OS itself must provide active task signaling using instrumentation trace, which can be realized only by the OS vendor.

Some operating systems provide a mechanism called pre/post task hook, where the application registers a function with the OS, which the OS will call whenever a task switch occurs:

void PreTaskHook(TaskType TaskID)

{

 GenerateInstrumentationMessage(TaskID);  // generate an instrumentation

                                          // message with the value given in the parameter

}


20.4.2Data and Program trace synchronization

On on-chip trace configurations, the data/OTM and program trace streams are not completely synchronized. While a data/OTM message is generated immediately, the program flow accumulates highly compressed information, which is transmitted via a trace message only when an indirect jump is taken.

Thus it can appear that a data access was performed much before the code that generates it was executed. If the function which generates the data/OTM must also be profiled, then program message flush must be enforced by modifying program code. Mostly this can be achieved by simply calling an empty void function prior to the data/OTM generation.

void Empty() {}; // just return


void PreTaskHook(TaskType TaskID)

{

 Empty();       // call Empty function

 GenerateInstrumentationMessage(TaskID);  

}

Note: make sure that the compiler optimizations don’t inline the Empty function.

20.4.2.1Nexus bandwidth consideration (advanced)

Multiple consecutive indirect branches could fill the on-chip trace FIFO, which would cause the profiling session to abort. With careful analysis of the code, some savings can be obtained.

If the code prior to GenerateInstrumentationMessage() call just returned from some function, the program trace was already flushed and the call to Empty() is not required.

20.4.3Task Termination / Stack Kill

The execution profiler expects functions which are entered to also exit. An operating system can however determine that the entire task should be terminated while the task's stack is still alive. In such a case the OS does not return from the scheduler to complete the remaining functions. From the perspective of the profiler, these functions are merely suspended and will continue once the task is resumed.

To compensate for this effect, the profiler can be configured to be aware of these stack killers. When execution in the stack killer is detected, all active functions in the task are considered preemptively terminated.

20.5OSEK/ORTI

The internal layout of an OSEK operating system is described by an ORTI file. This file describes amongst other things how many tasks there are (all tasks are known at compile time) and how to determine which task is active.

Per ORTI standard, a RUNNINGTASK ‘object’ defines the active/running task. This example defines two tasks named Task_1 and Task_2, plus a state where no task is running – NO_TASK.

IMPLEMENTATION OS_XY {

  OS {

     ENUM  [

        "NO_TASK" = 0xFFFF,

        "Task_1" = 0,

        "Task_2" = 1,

     ] RUNNINGTASK, "Running Task Identification";

   ...

Later in the file (in the information section), the location of the RUNNINGTASK object is defined.

OS xx {

  RUNNINGTASK = "osActiveTaskIndex";

These two definitions mean:

To find out which task is active, evaluate "osActiveTaskIndex"

If the evaluation yields a value of 0, Task_1 is active; if it is 1, Task_2 is active; if it is 0xFFFF, NO_TASK is active.

When the CPU is stopped, the active task is obtained by reading the osActiveTaskIndex variable.

When the CPU is running (during a profiling session), the value of osActiveTaskIndex is obtained by tracing write accesses to it.

20.5.1Possible issues

In the above example, the variable osActiveTaskIndex keeps the ID of the active task. As far as the OS is concerned, this variable is just overhead. The optimal concept for it is to:

Have a TCB for every task

Have a global pointer to the active TCB

Task ID is then just a member in the TCB structure:

struct TCB

{

 unsigned short ID; // ID of the task

 unsigned long StackBase;

 unsigned long PC;

 unsigned long SP;

 ...

};

TCB tcbTask_1, tcbTask_2; // TCBs for the two tasks


TCB * pActiveTCB; // global variable pointing to the active task’s control block

Switching to a different task is now just a matter of setting the pointer to point to a different TCB. However the following problems can occur:

20.5.1.1Untraceable ID problem

To get the task ID, some OSes report this in the ORTI file:

OS xx {

  RUNNINGTASK = "pActiveTCB->ID";

  ...

This can be evaluated when the CPU is stopped, but when pActiveTCB is changed at runtime, the trace sees only the new value of pActiveTCB, not the value of the ID.

Solution 1

Have the OS report the ID via global variable as per usual practice. This can sometimes be forced by OS configuration option, or by a fix provided by the OS vendor.

Solution 2

The ORTI file must be modified (manually or by OS vendor fix), where the value of the pointer identifies the task:

IMPLEMENTATION xx {

  OS {

     TOTRACE ENUM  [

        "NO_TASK" = 0,

        "Task_1" = “&tcbTask_1”,

        "Task_2" = “&tcbTask_2”,

     ]


OS xx {

  RUNNINGTASK = "pActiveTCB";

  ...

In this case a change of the value of pActiveTCB is traced and the value of the pointer can be matched directly to the task.

This can lead to the next problem:

20.5.1.2Task identification requires more bits than data/instrumentation trace supports

If the active task is identified by value of a pointer, full pointer value must be visible. Problems arise when:

Tracing a 16-bit pointer on 8-bit CPU, where the value is transmitted in two (not necessarily consecutive) writes.

Data trace is not available, and instrumentation trace is used for signaling, but the instrumentation trace is only 8-bits wide.

Solution 1

Have the OS report the ID via global variable as per usual practice. This can sometimes be forced by OS configuration option, or by a fix provided by the OS vendor.

Solution 2

Implement further instrumentation to transmit all 32 bits through several instrumentation messages. Since this incurs much more time and code size overhead than the recommended solution, it should be avoided if possible.

An extension to OSEK and ORTI has been implemented by several OS vendors which allows the OS to inform the tool via ORTI file on how the task switches are signaled. Contact your OS vendor to verify if these extensions have been implemented in your version.

The ORTI file using extension could look like this:

OS VendorX_OS {

 RUNNINGTASK = "pActiveTcb";

 RUNNINGISR2 = "activeCat2IsrId";

 RUNNINGISR  = "activeCat1IsrId";

 vs_SIGNAL_RUNNINGTASK = "OTM.1.2.0";

 // RUNNINGTASK signaling uses OTM version 1

 // ID size = 2 bits, ID = 0

 vs_SIGNAL_RUNNINGISR2 = "OTM.1.2.1";

 // RUNNINGISR2 signaling uses OTM version 1

 // ID size = 2 bits, ID = 1

 vs_SIGNAL_RUNNINGISR = "OTM.1.2.2";

 // RUNNINGISR signaling uses OTM version 1

 // ID size = 2 bits, ID = 2

};

20.5.2Task and IRQ level definitions

Per default these levels are used for ORTI objects:

Object

IRQ level

RUNNINGTASK

Task

RUNNINGISR2

IRQ(0)

RUNNINGISR

IRQ(1)

If different object naming is used or IRQ levels are different, the default configuration can be changed in OS Setup.

20.5.3Task Termination / Stack Kill

An OSEK task must per specification end with a call to TerminateTask function.

To correctly profile functions within tasks which are (periodically) created and terminated, the actual TerminateTask function, or the OS scheduler function must be specified as stack killer.

While TerminateTask is standardized by OSEK API, most implementers define this as a preprocessor macro and the actual function has a different name. To find out the actual name of the function, it is usually best to inspect the disassembly code and use the symbolic name displayed there.

20.5.4Context nesting detection

This is a use case on an MCU which does not support data trace, with an OSEK OS with basic signaling.

20.5.4.1OTM message generation

OTM messages are generated by writing a value to the PID register. Each OTM message can send 8 bits of data.

ORTI signaling extensions specify two protocols for sending a value plus an ID.

The user application implements three signaling objects/IDs:

Tasks (RUNNINGTASK)

Category 1 ISRs (RUNNINGISR)

Category 2 ISRs (RUNNINGISR2)

Maximum number of distinct values within an ID is 14.

For this purpose the simpler and faster v0 signaling protocol can be used. 2 bits are used to encode the ID, 6 bits remain for the value.

In addition (to allow context nesting profiling), the vs_OSSIGNAL signaling must be implemented.

This ID designation was chosen:

Object

ID

RUNNINGTASK

0

RUNNINGISR2

1

RUNNINGISR

2

vs_OSSIGNAL

3

20.5.4.2Nexus program/data trace synchronization

Due to the nature of the Nexus trace, additional measures must be taken to flush the pending program trace message before the OTM message is generated. This is ensured by taking an indirect jump before writing the PID register, best achieved by calling an empty function.

Signaling code could look like this:

#define ID_TASK 0

#define ID_ISR2 1

#define ID_ISR1 2

#define ID_OSSIGNAL 3

void PTM_Flush() {};

void OTM_Signal(int value, int ID)

{

 PTM_Flush();

 register int otm = (value << 2) | ID;

 _asm("mtpid %otm");

}

Note: For performance reasons the OTM_Signal should be inlined, but not PTM_Flush.

20.5.4.3Task Switches

Prior to activation of a task, the OS calls this function:

os_result_t OS_KernActivateTask(os_taskid_t t)

{

 OTM_Signal(t, ID_TASK); // signal new task ID = t

 ...

The newly active task ID must be signaled via OTM as shown in the code above.

20.5.4.4Tasks

A task should be considered suspended when it is preempted by an ISR. No special provision in task routines is required.

A task should be considered deactivated when it enters a wait (WaitEvent call), or when it terminates. For this purpose two instrumentation points are required:

20.5.4.4.1WaitEvent

The WaitEvent routine is instrumented on entry (NO_TASK + RET_OS) and on exit (the actual task ID):

os_result_t WaitEvent(os_eventmask_t e)

{

 os_result_t r;

 os_uint8_t t_TaskId;

 GetTaskID(&t_TaskId); // remember active task ID 

 // PTM_Flush() not required after just returning

  OTM_Signal(NO_TASK, ID_TASK);

  OTM_Signal(RET_OS, ID_OSSIGNAL);

 
 r = WaitEvent1(e); // renamed original WaitEvent

 
 // PTM_Flush() not required after just returning

  OTM_Signal(t_TaskId, ID_TASK);

 
 return r;
}

20.5.4.4.2TerminateTask

void TerminateTask(...)

{

  OTM_Signal(NO_TASK, ID_TASK);

  OTM_Signal(RET_OS, ID_OSSIGNAL);

...

}

20.5.4.5Interrupt Services

The application uses two interrupt categories: Cat1 and Cat2, which call functions OS_Cat1_Entry and OS_Cat2_Entry respectively. These functions should:

Signal activation of the ISR

Call the application function which handles the ISR

Signal RET or RET_OS via vs_OSSIGNAL, depending on where they are exiting

Note: since the ISR signal occurs within the body of OS_Cat<X>_Entry, this function cannot be profiled, as the entry, body and exit of the function would appear to the profiler to occur in different contexts.
For this reason the whole processing is moved to the OS_Cat<X>_Entry_1 function. OS_Cat<X>_Entry() serves as a simple ISR signaling wrapper.

void OS_Cat1_Entry(int iid)

{

 OTM_Signal(iid, ID_ISR1);              // signal new ISR1 ID = t

 OS_Cat1_Entry_1(iid);                  // call application ISR function

}

void OS_Cat1Exit()

{

  if (..) // decide whether to call OS_CommonIsrExit

    goto OS_CommonIsrExit;

  OTM_Signal(RET_OS, ID_OSSIGNAL); // ISR exiting, returning to scheduler

  goto OS_Dispatch;

}

void OS_CommonIsrExit()

{

  OTM_Signal(RET, ID_OSSIGNAL); // ISR is terminating, returning to previous context

  ...

}

And likewise for OS_Cat2_Entry/Exit.

20.5.4.6ORTI file adjustment

The original ORTI file provides information about task IDs and ISR IDs. To allow the tool to automatically recognize task and ISR events, the ORTI signaling extensions must be implemented manually.

20.5.4.6.1Create vs_SIGNAL_ declaration entries

The IMPLEMENTATION section holds definitions for RUNNINGTASK:

ENUM [ "TASK_a" = 10,... ] RUNNINGTASK, "Running task identification";

And RUNNINGISR2:

ENUM ["NO_ISR" = 0, ...] RUNNINGISR2, "Running ISR identification";

These mappings allow 1:1 usage for OTM signaled values.

Thus vs_SIGNAL_RUNNINGTASK is a copy of RUNNINGTASK with the addition of a NO_TASK value, which is generated artificially by manual instrumentation to split task activation at WaitEvent calls. For NO_TASK the value 0x3F is designated - the highest signaling value allowed in a 6-bit OTM.

ENUM [ " TASK_a" = 10,..., "NO_TASK" = 0x3F] vs_SIGNAL_RUNNINGTASK, "Running task identification";

Because two ISR categories are profiled separately, the RUNNINGISR2 is split into two objects. All Category 1 ISR IDs are moved to RUNNINGISR object.

ENUM [        "NO_ISR" = 0,

       "ISR_2_a" = 4,

       "ISR_2_b" = 6,

       etc

       ] RUNNINGISR2, "Running ISR identification";

ENUM [        "NO_ISR" = 0,

       "ISR_1_a " = 1,

       "ISR_1_b " = 2,

       etc

       ] RUNNINGISR, "Running ISR1 identification";

Signaling objects are an exact copy:

ENUM [        "NO_ISR" = 0,

       "ISR_2_a" = 4,

       "ISR_2_b" = 6,

       etc

       ] vs_SIGNAL_RUNNINGISR2, "Running ISR identification";

20.5.4.6.2Create vs_SIGNAL_ implementation entries

In the OS section the signaling objects are defined like this:

OS

{

 RUNNINGTASK = "OS_taskCurrent";

 vs_SIGNAL_RUNNINGTASK = "OTM.0.2.0";

// RUNNINGTASK signaling uses OTM version 0, ID size = 2 bits, ID = 0


 RUNNINGISR2 = "OS_isrCurrent";

 vs_SIGNAL_RUNNINGISR2 = "OTM.0.2.1";

// RUNNINGISR2 signaling uses OTM version 0, ID size = 2 bits, ID = 1


 RUNNINGISR = "OS_isrCurrent";

 vs_SIGNAL_RUNNINGISR = "OTM.0.2.2";

// RUNNINGISR signaling uses OTM version 0, ID size = 2 bits, ID = 2  

20.5.4.6.3Create vs_OSSIGNAL and vs_SIGNAL_vs_OSSIGNAL declaration entry

IMPLEMENTATION

{

  OS

  {

    ENUM [ "RET" = 0, "RET_OS" = 1] vs_OSSIGNAL, "OS Signaling";

    ENUM [ "RET" = 0, "RET_OS" = 1] vs_SIGNAL_vs_OSSIGNAL, "OS Signaling";

20.5.4.6.4Create vs_OSSIGNAL_ and vs_SIGNAL_vs_OSSIGNAL implementation entries

OS

{

  vs_OSSIGNAL = "0";  // in this OTM case no variable exists

  vs_SIGNAL_vs_OSSIGNAL = "OTM.0.2.3"; // OTM version 0, ID size = 2 bits, ID = 3

20.5.4.7winIDEA Profiler configuration

20.5.4.7.1OS Configuration

In winIDEA/Debug/Operating System/OSEK the OS should be configured like this (this is the default configuration):

Tasks:

Note: Tasks too can define a default value. This would typically be the task which runs when no other task is running e.g. IDLE, BGND,…
 
ISRs (Cat1)

 
ISRs2 (Cat2)

OSSignal

20.5.4.7.2Profiler Configuration

In the profiler configuration specify the Functions to be profiled and check the OS Objects option.

In the Advanced configuration, ensure that the Ignore context reactivation option is set.


20.6Custom operating system

If your operating system is not explicitly supported by winIDEA, support can be added by manually generating an ORTI file, which describes the OS – especially the active task identification.

This example assumes that there are 2 tasks in the OS and that a global variable g_byActiveTaskID always holds the value of the active task (0 for no task, 1 for "Task1" and 2 for "Task2")

IMPLEMENTATION MyORTI {
 OS {
  ENUM UINT8 [
   "NO_TASK" = "0",
   "Task1" = "1",
   "Task2" = "2"
  ] RUNNINGTASK, "Running task";
 };
};
OS MyOS {
 RUNNINGTASK = "g_byActiveTaskID";
};

Detailed ORTI specification is available at: http://portal.osek-vdx.org/files/pdf/specs/

21Profiling via instrumentation

Some CPUs have restricted trace capabilities due to

Small trace port bandwidth (SWV on ARM, LPD-SFT on RH850, low pin-count ETM and Nexus implementations)

On-Chip Trace Buffer only – the size is typically too small for a longer session.

No trace port or OCTB at all

Profiling on such CPUs can still be realized by:

instrumenting the application to generate a signal only on points of interest – thus reducing the trace bandwidth

using an emulation adapter which uses a free port, on a bigger package device, to signal events to the tool.

iSYSTEM provides a set of header and C files which are included in the user application. The header files and samples utilizing them are available together with other winIDEA examples, which can be installed by choosing Help / Install Examples.


21.1Preparing for instrumentation

A set of header and C files is provided by iSYSTEM, which should be included in the user application. Select the instrumentation file according to your CPU architecture, the instrumentation type and the compiler being used. For example:

isystem_profile_V850Fx4L_UTP_GHS.h holds macros for:

The V850Fx4L CPU

using the UserTracePort emulation adapter

with GHS compiler

isystem_profile_RH850_LPDSFT_DD.h holds macros for

the RH850 CPU

using the LPD SofTrace port

with DiabData compiler

isystem_profile_noop.h holds macros which generate no instrumentation code. It can be used in case where existing instrumentation should be disabled, without modifying the source code.

Controlling the instrumentation is best realized by creating a common header file, which is included by application files that will use instrumentation:

/* isystem.h */

#ifdef DEBUG // this is defined when in debug builds

#include "isystem_profile_RH850_LPDSFT_DD.h"

#else

#include "isystem_profile_noop.h"

#endif


21.2Execution profiling via instrumentation

21.2.1Preparing instrumentation IDs

For all functions which will be profiled, a constant ID must be defined and placed in file isystem_profile_functions.h. This file is implicitly included by the aforementioned include files.

The function IDs should start at value 1, and be named:

ipf_<function name>

Example

#define ipf_MyFunction 1

#define ipf_MyOtherFunction 2

#define ipf_AndAnotherFunction 3

If this naming convention is followed, the profiler will be able to associate the observed ID with an existing program function. Otherwise the full ID name will be displayed.

21.2.2Instrumenting the code

Functions to be profiled must add instrumentation macros at the function entry and exit.

Example

int MyFunction(int a)

{

 return a * 2;

}

Such function must be instrumented like this:

int MyFunction(int a)

{

 isystem_profile_func_entry(MyFunction);


 int Result = a * 2;


 isystem_profile_func_exit(); // place instrumentation before return statement

 return Result;

}

Note: the function instrumentation macros already prepend the ipf_ prefix. Simply use the function name as a parameter to the isystem_profile_func_entry macro.

21.3OS Event profiling via instrumentation

The operating system code must be instrumented to signal the new active task ID at the moment of context switch.

Example

os_result_t OS_KernActivateTask(os_taskid_t t)

{

 isystem_profile_task(t); // the value of ‘t’ is signaled

 ...

The interrupts handled by the OS can be instrumented similarly:

Example

void OS_IRQ_Entry(int iid)

{

 isystem_profile_irq(iid);      // signal new ISR

 OS_IRQ_Handler(iid);           // call the OS handler

 isystem_profile_irq(0);

}

Such modifications are OS specific and should be implemented by consulting the OS vendor.

See the AUTOSAR RTOS manual for information on how to realize OS event profiling on an AUTOSAR OS.

21.4Monitoring user data via instrumentation

Variables which should be monitored can be signaled via instrumentation too. Place the instrumentation macro at appropriate point(s) in the application.

Example

int MyFunction(int a)

{

...

 isystem_profile_user(a);       // transmit the value of 'a'

...

}





















Disclaimer: iSYSTEM assumes no responsibility for any errors which may appear in this document, reserves the right to change devices or specifications detailed herein at any time without notice, and does not make any commitment to update the information herein.

© iSYSTEM . All rights reserved.