Ticket Name: TDA4VMXEVM: TDA4x Latency Measurement queries
Query Text:
Part Number: TDA4VMXEVM Hello, We are working with TDA4 PSDK 7.0 on a custom board. We need to calculate the latency of the application from the camera node to the end of processing, and we have some queries. Is there any facility in the existing OpenVX framework to measure the latency of the application at runtime? We have a requirement to send the latency out continuously as a debug parameter. Is there a provision to attach a timestamp and a frame count to a captured frame so that they can be passed to succeeding nodes? A similar implementation was available in the TDA2 Vision SDK framework. Also, in the current OpenVX framework, if all of a node's buffers are already consumed and a new input arrives, does that node drop the frame? Is there any facility to count frames dropped in the execution pipeline? Can it happen that frames enqueued in one sequence come out at the end of the chain in a different sequence, due to core unavailability or other reasons? Regards, Swapnil N
Responses:
Hello, My responses to your questions are below. Let me know if you have any follow-ups to these questions:
1. Regarding measuring the latency, we have APIs, used in our sample applications, that query the node latencies as well as the overall latency of the application. For an example of this, you can review the multi cam application at the location vision_apps/apps/basic_demos/app_multi_cam. In this application the APIs are invoked by the "print statistics" command. These APIs are called appPerfStatsPrintAll and tivx_utils_graph_perf_print.
2. Yes, by default the capture node timestamps the output images and passes these along the OpenVX graph to subsequent images. These timestamps can be queried from the OpenVX data object by using the vxQueryReference API with the TIVX_REFERENCE_TIMESTAMP enum. If you are not using the capture node, you can set a timestamp directly from the application using the tivxSetReferenceAttribute API.
3. Regarding the frame count, frame drops, etc., the capture node can be queried either to print this information or to retrieve it into a local variable. This can be done by using tivxNodeSendCommand with either the enumeration TIVX_CAPTURE_PRINT_STATISTICS or TIVX_CAPTURE_GET_STATISTICS.
4. Regarding the policy for frame dropping, the frames will drop in the capture node rather than in downstream nodes.
5. Regarding the question "Can it happen that once we enqueue frames in a sequence in the chain but at the end that sequence might change due to core unavailability or other reasons?", I am not aware of any reason that this would occur. Subsequent nodes in the OpenVX graph will process frames in the order they are sent from the capture node.
Regards, Lucas
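The points about frame drops and frame ordering can also be cross-checked at the end of the pipeline. The sketch below is a minimal, standalone illustration in plain C, assuming each frame carries a monotonically increasing 32-bit frame number. The `seq_check_t` type and its functions are hypothetical names introduced here and are not part of TIOVX; the capture node statistics mentioned above remain the source of truth for drops at capture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sink-side checker (not a TIOVX API). */
typedef struct {
    uint32_t expected;      /* next frame number we expect to see        */
    uint32_t dropped;       /* frame numbers skipped (gaps in sequence)  */
    uint32_t out_of_order;  /* frames that arrived with an older number  */
    int      started;       /* 0 until the first frame has been observed */
} seq_check_t;

static void seq_check_init(seq_check_t *s)
{
    assert(s != NULL);
    s->expected = 0u;
    s->dropped = 0u;
    s->out_of_order = 0u;
    s->started = 0;
}

/* Feed the frame number of every frame that reaches the end of the graph. */
static void seq_check_update(seq_check_t *s, uint32_t frame_no)
{
    assert(s != NULL);
    if (!s->started) {
        /* The first frame only establishes the starting point. */
        s->started = 1;
        s->expected = frame_no + 1u;
        return;
    }

    /* Signed difference so that a 32-bit counter wrap is tolerated. */
    int32_t delta = (int32_t)(frame_no - s->expected);
    if (delta < 0) {
        /* Older than expected: a reordering. Note that this frame was
         * already counted as dropped when the gap was first seen. */
        s->out_of_order++;
    } else {
        s->dropped += (uint32_t)delta;
        s->expected = frame_no + 1u;
    }
}
```

With the frame numbers 10, 11, 14, 13, 15 this reports two skipped numbers and one out-of-order arrival. If ordering is preserved as described above, `out_of_order` should stay at zero.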
Hi Lucas, We have tried the options suggested above. Please find our observations below:
1. Timestamp and frame number details: After using the API's vxQueryReference and tivxNodeSendCommand in the application after dequeue, we were able to get the timestamp and frame number for the captured frame. However, we need these values inside a node. We have the camera node connected to certain subsequent processing nodes and a TCP/IP node in the end. Now, we want to get the latency value and frame number inside the TCP/IP node and transmit those values out over the ethernet. We tried to use these API's inside TCP node but they do not work inside the target.c files. Do we have any facility to get these values in the TCP node?
2. Latency value prints: We have been using the API appPerfStatsPrintAll in order to get the performance statistics. We see a lot of prints depicting CPU and memory utilization but not the latency. Please find the UART log for same below:

Summary of CPU load,
====================
CPU: mpu1_0: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_0: TOTAL LOAD = 11.93 % ( HWI = 1.83 %, SWI = 0.65 % )
CPU: mcu3_0: TOTAL LOAD = 0. 8 % ( HWI = 0. 4 %, SWI = 0. 2 % )
CPU: mcu3_1: TOTAL LOAD = 0. 8 % ( HWI = 0. 5 %, SWI = 0. 2 % )
CPU: C66X_1: TOTAL LOAD = 29.73 % ( HWI = 0.32 %, SWI = 0.13 % )
CPU: C66X_2: TOTAL LOAD = 48.30 % ( HWI = 0.32 %, SWI = 0.11 % )
CPU: C7X_1: TOTAL LOAD = 26.21 % ( HWI = 0.12 %, SWI = 0. 6 % )

HWA performance statistics,
===========================
HWA: VISS: LOAD = 7.20 % ( 513 MP/s )
HWA: LDC : LOAD = 7.81 % ( 201 MP/s )
HWA: MSC0: LOAD = 3.46 % ( 337 MP/s )
HWA: DOF : LOAD = 9.41 % ( 123 MP/s )

DDR performance statistics,
===========================
DDR: READ BW: AVG = 782 MB/s, PEAK = 1065 MB/s
DDR: WRITE BW: AVG = 799 MB/s, PEAK = 1101 MB/s
DDR: TOTAL BW: AVG = 1581 MB/s, PEAK = 2166 MB/s

Detailed CPU performance/memory statistics,
===========================================
PERF STATS: ERROR: Invalid command (cmd = 00000003, prm_size = 260 B
CPU: mcu2_0: TASK: IPC_RX: 0. 4 %
CPU: mcu2_0: TASK: REMOTE_SRV: 0. 3 %
CPU: mcu2_0: TASK: TIVX_CPU: 0.28 %
CPU: mcu2_0: TASK: TIVX_NF: 0. 0 %
CPU: mcu2_0: TASK: TIVX_LDC1: 1.32 %
CPU: mcu2_0: TASK: TIVX_MSC1: 1.10 %
CPU: mcu2_0: TASK: TIVX_MSC2: 0. 0 %
CPU: mcu2_0: TASK: TIVX_SDE: 0. 0 %
CPU: mcu2_0: TASK: TIVX_DOF: 2. 7 %
CPU: mcu2_0: TASK: TIVX_VISS1: 1.36 %
CPU: mcu2_0: TASK: TIVX_CAPT1: 0.43 %
CPU: mcu2_0: TASK: TIVX_CAPT2: 0. 0 %
CPU: mcu2_0: TASK: TIVX_DISP1: 0. 0 %
CPU: mcu2_0: TASK: TIVX_DISP2: 0. 0 %
CPU: mcu2_0: TASK: TIVX_VDEC1: 0. 0 %
CPU: mcu2_0: TASK: TIVX_VDEC2: 0. 0 %
CPU: mcu2_0: HEAP: DDR_SHARED_MEM: size = 16777216 B, free = 14251264 B ( 84 % unused)
CPU: mcu2_0: HEAP: DDR_NON_CACHE_M: size = 67108864 B, free = 47210496 B ( 6 % unused)
CPU: mcu3_0: TASK: IPC_RX: 0. 0 %
CPU: mcu3_0: TASK: REMOTE_SRV: 0. 0 %
CPU: mcu3_0: TASK: IPC_TEST_RX: 0. 0 %
CPU: mcu3_0: TASK: IPC_TEST_TX: 0. 0 %
CPU: mcu3_0: TASK: IPC_TEST_TX: 0. 0 %
CPU: mcu3_0: TASK: IPC_TEST_TX: 0. 0 %
CPU: mcu3_0: TASK: IPC_TEST_TX: 0. 0 %
CPU: mcu3_0: TASK: IPC_TEST_TX: 0. 0 %
CPU: mcu3_0: TASK: IPC_TEST_TX: 0. 0 %
CPU: mcu3_0: HEAP: DDR_SHARED_MEM: size = 2097152 B, free = 2097152 B (100 % unused)
CPU: mcu3_1: TASK: IPC_RX: 0. 0 %
CPU: mcu3_1: TASK: REMOTE_SRV: 0. 0 %
CPU: mcu3_1: TASK: IPC_TEST_RX: 0. 0 %
CPU: mcu3_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: mcu3_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: mcu3_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: mcu3_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: mcu3_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: mcu3_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: mcu3_1: HEAP: DDR_SHARED_MEM: size = 2097152 B, free = 2097152 B (100 % unused)
CPU: C66X_1: TASK: IPC_RX: 0. 2 %
CPU: C66X_1: TASK: REMOTE_SRV: 0. 0 %
CPU: C66X_1: TASK: TIVX_CPU: 29.23 %
CPU: C66X_1: TASK: IPC_TEST_RX: 0. 0 %
CPU: C66X_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: C66X_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: C66X_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: C66X_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: C66X_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: C66X_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: C66X_1: HEAP: DDR_SHARED_MEM: size = 16777216 B, free = 16755712 B ( 99 % unused)
CPU: C66X_1: HEAP: L2_MEM: size = 229376 B, free = 229376 B (100 % unused)
CPU: C66X_1: HEAP: DDR_SCRATCH_MEM: size = 50331648 B, free = 50331648 B ( 14 % unused)
CPU: C66X_2: TASK: IPC_RX: 0. 5 %
CPU: C66X_2: TASK: REMOTE_SRV: 0. 0 %
CPU: C66X_2: TASK: TIVX_CPU: 53.52 %
CPU: C66X_2: TASK: IPC_TEST_RX: 0. 0 %
CPU: C66X_2: TASK: IPC_TEST_TX: 0. 0 %
CPU: C66X_2: TASK: IPC_TEST_TX: 0. 0 %
CPU: C66X_2: TASK: IPC_TEST_TX: 0. 0 %
CPU: C66X_2: TASK: IPC_TEST_TX: 0. 0 %
CPU: C66X_2: TASK: IPC_TEST_TX: 0. 0 %
CPU: C66X_2: TASK: IPC_TEST_TX: 0. 0 %
CPU: C66X_2: HEAP: DDR_SHARED_MEM: size = 16777216 B, free = 16670720 B ( 99 % unused)
CPU: C66X_2: HEAP: L2_MEM: size = 229376 B, free = 229376 B (100 % unused)
CPU: C66X_2: HEAP: DDR_SCRATCH_MEM: size = 50331648 B, free = 50331648 B ( 14 % unused)
CPU: C7X_1: TASK: IPC_RX: 0. 2 %
CPU: C7X_1: TASK: REMOTE_SRV: 0. 0 %
CPU: C7X_1: TASK: TIVX_CPU: 25.98 %
CPU: C7X_1: TASK: IPC_TEST_RX: 0. 0 %
CPU: C7X_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: C7X_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: C7X_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: C7X_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: C7X_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: C7X_1: TASK: IPC_TEST_TX: 0. 0 %
CPU: C7X_1: HEAP: DDR_SHARED_MEM: size = 268435456 B, free = 249736192 B ( 13 % unused)
CPU: C7X_1: HEAP: L3_MEM: size = 8159232 B, free = 2916352 B ( 35 % unused)
CPU: C7X_1: HEAP: L2_MEM: size = 491520 B, free = 491520 B (100 % unused)
CPU: C7X_1: HEAP: L1_MEM: size = 16384 B, free = 0 B ( 0 % unused)
CPU: C7X_1: HEAP: DDR_SCRATCH_MEM: size = 234881024 B, free = 233655484 B ( 8 % unused)

Kindly let us know how we can proceed. Thank you, Jyoti
Hi Jyoti, Thanks for the feedback. Lucas will help you with this next week. Regards Karthik
Hi Jyoti, My responses are below: 1. Regarding querying the timestamp within the node, the timestamp is a property of the object descriptor itself and is called "timestamp". For instance, if you have an object descriptor called "obj_desc", you can get the associated timestamp value from "obj_desc->timestamp". 2. For printing the latency of the nodes in the application, did you try to use the tivx_utils_graph_perf_print API as well? This API is used in the vision_apps/apps for reference. This will provide the average, min and max latency of each of the nodes in the graph. Regards, Lucas
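Once the timestamp has been read inside the node as `obj_desc->timestamp`, it still has to be serialized before a TCP/IP node can send it out. The following is a rough sketch of one way to do that in plain C. The `demo_obj_desc_t` struct is a stand-in that models only the `timestamp` field (the real TIOVX object descriptor has more fields), and the 12-byte big-endian record layout, `LATENCY_RECORD_SIZE`, and `pack_latency_record` are assumptions made for illustration, not an existing protocol or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the descriptor; only the field used here is modelled. */
typedef struct {
    uint64_t timestamp;   /* capture timestamp, read as obj_desc->timestamp */
} demo_obj_desc_t;

/* Hypothetical wire format: 8-byte timestamp followed by 4-byte frame number. */
#define LATENCY_RECORD_SIZE 12u

/* Writes the record in big-endian (network) byte order.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t pack_latency_record(const demo_obj_desc_t *obj_desc,
                                  uint32_t frame_no,
                                  uint8_t *out, size_t out_size)
{
    assert(obj_desc != NULL && out != NULL);
    if (out_size < LATENCY_RECORD_SIZE) {
        return 0u;
    }

    uint64_t ts = obj_desc->timestamp;
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(ts >> (56 - 8 * i));            /* MSB first */
    }
    for (int i = 0; i < 4; i++) {
        out[8 + i] = (uint8_t)(frame_no >> (24 - 8 * i));  /* MSB first */
    }
    return LATENCY_RECORD_SIZE;
}
```

Writing the bytes explicitly, rather than copying the struct, keeps the payload independent of the endianness and padding rules of whichever core runs the node.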
Hi Lucas, We have tried the options suggested above. Please find our observations below: 1. Timestamp and frame number details: We are able to access the capture timestamp in the TCP/IP node successfully. We also need the frame number that is updated in the capture node to be available in subsequent nodes, so that we can access it in the TCP/IP node. Could you suggest any existing approach to access the frame number? Can we add an extra parameter (similar to timestamp) to the object descriptor and pass it on to the successive nodes? 2. Latency value prints: We already used the API "tivx_utils_graph_perf_print" to get the node-level performance timings. We have also used the API "appPerfPointPrint" to print FPS and latency. However, that latency is not calculated over the complete execution of one frame; it is an average captured during the enqueue and dequeue process. We require the complete application latency (system-level latency). Regards, Jyoti Patil
Hello Jyoti, On #1, yes, you are free to add a new parameter to the object descriptor for the frame number and pass this along to successive nodes. This should allow you to access the frame number in the TCP/IP node. On #2, one approach is to dequeue the final output of the graph within the application and query the capture timestamp associated with that object. You can then take the current time within the application and subtract the capture timestamp from it to get the system-level latency. Regards, Lucas
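The subtraction described for #2 can be wrapped in a small accumulator that also tracks the minimum, maximum, and average system-level latency. This is a minimal sketch in plain C under two assumptions: both timestamps are in microseconds, and both come from the same clock. In a real application `capture_ts_us` would be the value queried from the dequeued output (for example via vxQueryReference with TIVX_REFERENCE_TIMESTAMP) and `now_us` would be sampled right after the dequeue. The `latency_stats_t` type and its functions are hypothetical names, not TIOVX APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical running statistics for end-to-end latency. */
typedef struct {
    uint64_t min_us;
    uint64_t max_us;
    uint64_t sum_us;
    uint32_t count;
} latency_stats_t;

static void latency_stats_init(latency_stats_t *st)
{
    assert(st != NULL);
    st->min_us = UINT64_MAX;
    st->max_us = 0u;
    st->sum_us = 0u;
    st->count = 0u;
}

/* Records one frame and returns its latency in microseconds. */
static uint64_t latency_stats_add(latency_stats_t *st,
                                  uint64_t capture_ts_us, uint64_t now_us)
{
    assert(st != NULL);
    /* Clamp to 0 if the clocks disagree, instead of underflowing. */
    uint64_t lat = (now_us >= capture_ts_us) ? (now_us - capture_ts_us) : 0u;

    if (lat < st->min_us) { st->min_us = lat; }
    if (lat > st->max_us) { st->max_us = lat; }
    st->sum_us += lat;
    st->count++;
    return lat;
}

/* Average latency so far, or 0 when no frame has been recorded. */
static uint64_t latency_stats_avg(const latency_stats_t *st)
{
    assert(st != NULL);
    return (st->count != 0u) ? (st->sum_us / st->count) : 0u;
}
```

Calling `latency_stats_add` once per dequeued output gives a per-frame value that can be sent out as a debug parameter, while the accumulated fields summarize the run.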