Ticket Name: Linux/TDA2: About the efficiency of the tda2 DDR bus

Query Text:
Part Number: TDA2
Tool/software: Linux
Our current problem: when the DVR function is always on and we add other functions, the efficiency of the DDR bus drops. This degrades the performance of the other CPUs, so the TDA2Sx chip cannot be fully utilized.
Question 1: Our hardware design gives the TDA2 a DDR bandwidth of more than 8,000 MB/s, so why is the combined DDR bandwidth of the EMIF1 and EMIF2 ports shown in the figure below only just over 4,000 MB/s? What are the main factors that influence it? (We do not use ECC DDR.)
Question 2: How can we optimize the IVA module so that its DDR traffic drops to about 500 MB/s? As shown in the red box below, IVA accesses take up 55% of the total DDR bandwidth. Our current DVR solution captures 4 channels of 1280x720p at 30 fps, stitches the 4 raw images into one 2560x1440 image, feeds that image to the encode module to generate an H.264 stream, and finally packages the H.264 stream as an MP4 file.

Responses:
Hi Feng, As a rule of thumb, DDR efficiency is about 60% of the ideal throughput. So with 2 interleaved EMIFs you would typically see 2 x 532 x 2 x 4 x 0.6 = 5107.2 MBps. In your case it seems that the IVA loading drops as additional initiators are added. You can try placing a bandwidth regulator on the IVA to set a minimum bandwidth for it. You will find more details in Section 3.1, Bandwidth Regulator, of the application note www.ti.com/.../sprabx1a.pdf. Also see the example of the IVA bandwidth regulator in Section 4.2, Pseudo Real-Time Video Subsystem Performance (fps), of www.ti.com/.../sprabx0.pdf. Thanks and Regards, Piyali
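
The rule-of-thumb arithmetic in the reply above can be sketched in C. Reading the factors as number of EMIFs, DDR clock in MHz, two transfers per clock, and a 4-byte data bus is an assumption, since the reply gives only the numbers. The function names are hypothetical and not part of any TI SDK.

```c
#include <assert.h>

/* Ideal DDR throughput in MB/s.
 * Assumed reading of the factors: interfaces x clock (MHz)
 * x 2 transfers per clock (double data rate) x bus width in bytes. */
double ddr_ideal_mbps(unsigned num_emifs, double clock_mhz, unsigned bus_bytes)
{
    return (double)num_emifs * clock_mhz * 2.0 * (double)bus_bytes;
}

/* Practical throughput: ideal throughput scaled by an efficiency factor
 * (0.6 is the rule of thumb quoted in the thread). */
double ddr_practical_mbps(unsigned num_emifs, double clock_mhz,
                          unsigned bus_bytes, double efficiency)
{
    assert(efficiency > 0.0 && efficiency <= 1.0);
    return ddr_ideal_mbps(num_emifs, clock_mhz, bus_bytes) * efficiency;
}
```

With the values from the reply, `ddr_ideal_mbps(2, 532.0, 4)` gives 8512 MB/s and `ddr_practical_mbps(2, 532.0, 4, 0.6)` gives about 5107.2 MB/s.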

Hi Piyali, Regarding the bandwidth regulator settings: we are currently using the default settings in TI's SDK, and we have not confirmed whether the feature is turned on. Do you mean that our software needs to change the SDK's default settings? Thanks and Regards,

Hi Feng, Yes, that's right. The default SDK does not set the bandwidth regulator, because the setting depends on the final use case. To set it, see the reference API Utils_setBWRegulator in vision_sdk\links_fw\src\rtos\utils_common\src\tda2xx\utils_l3_emif_bw.c. Thanks and Regards, Piyali

Hi Piyali, We have debugged the corresponding API but have not achieved the expected results. Given the screenshots above and our DVR solution, we want to know why the IVA module takes up 1574 MB/s of bandwidth. Can you explain the specific process so that we can evaluate where to cut requirements or optimize the configuration?
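
As a baseline for the 1574 MB/s figure, the DDR traffic of a single pass over the stitched frame can be estimated. This is a minimal sketch: the 1.5 bytes per pixel (a YUV 4:2:0 layout such as NV12) is an assumption, since the thread does not state the pixel format, and the function name is hypothetical.

```c
#include <assert.h>

/* MB/s (10^6 bytes per second) needed to move every frame across DDR once.
 * bytes_per_pixel is an assumption supplied by the caller (1.5 for YUV 4:2:0). */
double frame_pass_mbps(unsigned width, unsigned height,
                       double bytes_per_pixel, double fps)
{
    assert(bytes_per_pixel > 0.0 && fps > 0.0);
    return (double)width * (double)height * bytes_per_pixel * fps / 1.0e6;
}
```

Under that assumption, `frame_pass_mbps(2560, 1440, 1.5, 30.0)` is about 165.9 MB/s, so the measured IVA traffic corresponds to several passes over each frame rather than one.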

Hi Feng, Can you please help us understand the debugging you have done so far? The way to approach it is to first determine the IVA bandwidth at which you achieve the expected IVA fps. You can measure this in a lightly loaded system. Once you have this number, program it into the BW regulator so that the IVA gets the required priority when other initiators are generating DDR traffic. If this does not help, check whether any initiators in the system generate high peak traffic (typical candidates are GPU, VPE, BB2D). Such initiators generate peak traffic and finish early even though their average BW requirement may not be high. Placing a bandwidth limiter on these IPs then helps to hold them to their average bandwidth so that they do not impact other initiators. Thanks and Regards, Piyali
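
The distinction between average and peak traffic drawn in the reply above can be illustrated with a small sketch. It assumes per-window byte counts are available from some profiling source; the `BwStats` type, the `bw_stats` function, and the sampling scheme are hypothetical and not part of the Vision SDK.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double avg_mbps;   /* average over all sampling windows */
    double peak_mbps;  /* highest single-window bandwidth */
} BwStats;

/* bytes_per_window[i] is the number of bytes an initiator moved during
 * sampling window i; window_us is the window length in microseconds.
 * One byte per microsecond equals 1 MB/s (10^6 bytes per second). */
BwStats bw_stats(const unsigned long *bytes_per_window, size_t count,
                 double window_us)
{
    BwStats s = {0.0, 0.0};
    double total_bytes = 0.0;

    assert(window_us > 0.0);
    for (size_t i = 0; i < count; ++i) {
        double mbps = (double)bytes_per_window[i] / window_us;
        if (mbps > s.peak_mbps) {
            s.peak_mbps = mbps;
        }
        total_bytes += (double)bytes_per_window[i];
    }
    if (count > 0) {
        s.avg_mbps = total_bytes / ((double)count * window_us);
    }
    return s;
}
```

An initiator that moves 4000 bytes in one 1-microsecond window and then idles for three windows has a peak of 4000 MB/s but an average of only 1000 MB/s, which is the pattern a bandwidth limiter is meant to smooth out.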

Hi Feng, We haven't heard back from you on this one. I hope you have been able to proceed. Thanks and Regards, Piyali

Hi Piyali, Is your recommendation meant for the situation where some modules use a high instantaneous bandwidth when accessing DDR? If the average DDR bandwidth of some modules is high, your method may not work. If we reduce the average bandwidth of the IVA module, for example down to 500 MB/s, we are worried that the encoding frame rate will drop. Do you agree? Thanks and Regards,

Hi Feng, You are right. If the sum of the average DDR traffic of all the modules is higher than what the device can support, you need to reduce the DDR traffic, either through spec trade-offs or, to a certain extent, by making better use of internal memories alongside DDR memory. If the average total BW is lower than the practical limit of the DDR throughput, a combination of bandwidth limiters for the peak traffic generators, priority settings, and bandwidth regulators for the traffic that needs to be boosted should help. It is not clear from your post what you have tried so far. If you help us understand which settings you have tried and what impact they had, we can help you better. Thanks and Regards, Piyali
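
The first check described in the reply above, whether the summed average traffic fits under the practical DDR limit, can be sketched as follows. The function names are hypothetical, and the per-initiator averages would have to come from measurements on the actual system.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of the average DDR bandwidth of every initiator, in MB/s. */
double total_avg_mbps(const double *avg_mbps, size_t count)
{
    double total = 0.0;
    for (size_t i = 0; i < count; ++i) {
        total += avg_mbps[i];
    }
    return total;
}

/* Returns 1 when the summed average traffic fits under the practical
 * DDR limit, so limiters, priorities and regulators can help.
 * Returns 0 when the traffic itself has to be reduced. */
int fits_ddr_budget(const double *avg_mbps, size_t count,
                    double practical_limit_mbps)
{
    assert(practical_limit_mbps > 0.0);
    return total_avg_mbps(avg_mbps, count) <= practical_limit_mbps;
}
```

The 5107.2 MBps rule-of-thumb figure from earlier in the thread is a natural value for `practical_limit_mbps`.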

Hi Piyali, Attached is the code we used to configure bandwidth; please help us check whether it can be optimized. In our tests, after modifying the bus priority configuration, the bandwidth the IVA module uses through DMA appears to be higher, yet the IVA module drops frames. In addition, the previous reply mentioned that DDR can reach 60% of the ideal throughput. How can we test this? Please provide a test plan.
utils_l3_emif_bw.c
/*
Copyright (c) [2012 - 2017] Texas Instruments Incorporated

All rights reserved not granted herein.

Limited License.

 Texas Instruments Incorporated grants a world-wide, royalty-free, non-exclusive
 license under copyrights and patents it now or hereafter owns or controls to
 make,  have made, use, import, offer to sell and sell ("Utilize") this software
 subject to the terms herein.  With respect to the foregoing patent license,
 such license is granted  solely to the extent that any such patent is necessary
 to Utilize the software alone.  The patent license shall not apply to any
 combinations which include this software, other than combinations with devices
 manufactured by or for TI ("TI Devices").  No hardware patent is licensed
 hereunder.

 Redistributions must preserve existing copyright notices and reproduce this
 license (including the above copyright notice and the disclaimer and
 (if applicable) source code license limitations below) in the documentation
 and/or other materials provided with the distribution

 Redistribution and use in binary form, without modification, are permitted
 provided that the following conditions are met:

 * No reverse engineering, decompilation, or disassembly of this software
   is permitted with respect to any software provided in binary form.

 * Any redistribution and use are licensed by TI for use only with TI Devices.

 * Nothing shall obligate TI to provide you with source code for the software
   licensed and provided to you in object code.

 If software source code is provided to you, modification and redistribution of
 the source code are permitted provided that the following conditions are met:

 * Any redistribution and use of the source code, including any resulting
   derivative works, are licensed by TI for use only with TI Devices.

 * Any redistribution and use of any object code compiled from the source code
   and any resulting derivative works, are licensed by TI for use only with TI
   Devices.

 Neither the name of Texas Instruments Incorporated nor the names of its
 suppliers may be used to endorse or promote products derived from this software
 without specific prior written permission.

 DISCLAIMER.

 THIS SOFTWARE IS PROVIDED BY TI AND TI'S LICENSORS "AS IS" AND ANY EXPRESS OR
 IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
 IN NO EVENT SHALL TI AND TI'S LICENSORS BE LIABLE FOR ANY DIRECT, INDIRECT,
 INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
 LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
 OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

/**
 *******************************************************************************
 *
 * \file utils_l3_emif_bw.c
 *
 * \brief This file has the implementation of the APIs to config bandwidth
 *        related controls at L3 and EMIF
 *
 * \version 0.0 (Dec 2013) : [KC] First version
 *
 *******************************************************************************
 */

/*******************************************************************************
 *  INCLUDE FILES
 *******************************************************************************
 */
#include <src/rtos/utils_common/include/utils_l3_emif_bw.h>

#define DISPC_GLOBAL_MFLAG_ATTRIBUTE              (volatile UInt32*)(0x5800185C)
#define DISPC_GFX_MFLAG_THRESHOLD                 (volatile UInt32*)(0x58001860)
#define DISPC_VID1_MFLAG_THRESHOLD                (volatile UInt32*)(0x58001864)
#define DISPC_VID2_MFLAG_THRESHOLD                (volatile UInt32*)(0x58001868)
#define DISPC_VID3_MFLAG_THRESHOLD                (volatile UInt32*)(0x5800186C)

#define DMM_EMERGENCY                             (volatile UInt32*)(0x4E000020)
#define DMM_PEG_PRIO_0_ADDR                       (volatile UInt32*)(0x4E000620)
#define CTRL_CORE_EMIF_INITIATOR_PRIORITY_1_ADDR  (volatile UInt32*)(0x4A002420)

Int32 Utils_setDssMflagMode(Utils_DssMflagMode mode)
{
    *(DISPC_GLOBAL_MFLAG_ATTRIBUTE)
        =  (UInt32)(((UInt32)mode & (UInt32)0x3U) /* MFLAG_CTRL */
          |
           ((UInt32)1U << (UInt32)2U)) /* MFLAG_START
                   * 0x1: Even at the beginning of the frame when the DMA
                   *      buffer is empty, MFLAG_CTRL bitfield is used to
                   *      determine how MFLAG signal for each pipeline shall be
                   *      driven.
                   */
           ;
    return SYSTEM_LINK_STATUS_SOK;
}

Int32 Utils_setDssMflagThreshold(System_DssDispcPipes displayPipeId,
                        UInt32 thresHigh,
                        UInt32 thresLow)
{
    UInt32 value;
    Int32 status=SYSTEM_LINK_STATUS_SOK;
    volatile UInt32 *pReg;

    value = (thresLow & (UInt32)0xFFFFU) | ((thresHigh & (UInt32)0xFFFFU) << (UInt32)16U);

    switch(displayPipeId)
    {
        case SYSTEM_DSS_DISPC_PIPE_VID1:
            pReg = DISPC_VID1_MFLAG_THRESHOLD;
            break;
        case SYSTEM_DSS_DISPC_PIPE_VID2:
            pReg = DISPC_VID2_MFLAG_THRESHOLD;
            break;
        case SYSTEM_DSS_DISPC_PIPE_VID3:
            pReg = DISPC_VID3_MFLAG_THRESHOLD;
            break;
        case SYSTEM_DSS_DISPC_PIPE_GFX1:
            pReg = DISPC_GFX_MFLAG_THRESHOLD;
            break;
        default:
            status = SYSTEM_LINK_STATUS_EFAIL;
            break;
    }

    if(status==SYSTEM_LINK_STATUS_SOK)
    {
        *pReg = value;
    }

    return status;
}

Int32 Utils_setDmmPri(Utils_DmmInitiatorId initiatorId, UInt32 priValue)
{
    volatile UInt32 *pPegPrioReg = DMM_PEG_PRIO_0_ADDR;
    Int32 status = SYSTEM_LINK_STATUS_SOK;
    UInt32 index;
    UInt32 shift;

    index = initiatorId / (UInt32)8U;
    shift  = (initiatorId % (UInt32)8U) * (UInt32)4U;

    /* reject out-of-range initiator IDs; valid register indices are 0..7 */
    if(index >= (UInt32)8U)
    {
        status = SYSTEM_LINK_STATUS_EFAIL;
    }
    else
    {
        priValue = (UInt32)0x8U | (priValue & (UInt32)0x7U);

        pPegPrioReg[index] = priValue << shift;

        Vps_printf(" DMM_PEG_PRIO_%d (0x%08x) = 0x%08x\n",
                index,
                &pPegPrioReg[index],
                pPegPrioReg[index]
            );
    }

    return status;
}

Int32 Utils_setDmmMflagEmergencyEnable(Bool enable)
{
    UInt32 value;

    value = *DMM_EMERGENCY;

    if(enable)
    {
        value |= (UInt32)0x1U;
    }
    else
    {
        value &= ~(UInt32)0x1U;
    }


    *DMM_EMERGENCY = value;

    return SYSTEM_LINK_STATUS_SOK;
}

Int32 Utils_setEmifPri(Utils_EmifInitiatorId initiatorId, UInt32 priValue)
{
    volatile UInt32 *pEmifPrioReg = CTRL_CORE_EMIF_INITIATOR_PRIORITY_1_ADDR;
    Int32 status = SYSTEM_LINK_STATUS_SOK;
    UInt32 index;
    UInt32 shift;

    index = initiatorId / (UInt32)8U;
    shift  = (initiatorId % (UInt32)8U) * (UInt32)4U;

    /* reject out-of-range initiator IDs; valid register indices are 0..6 */
    if(index >= (UInt32)7U)
    {
        status = SYSTEM_LINK_STATUS_EFAIL;
    }
    else
    {

        priValue = (priValue & (UInt32)0x7U);

        /* clear field */
        pEmifPrioReg[index] &= ~((UInt32)0x7U << shift);

        /* set field */
        pEmifPrioReg[index] |= (priValue << shift);

        Vps_printf(" CTRL_CORE_EMIF_INITIATOR_PRIORITY_%d (0x%08x) = 0x%08x\n",
                index + (UInt32)1U,
                &pEmifPrioReg[index],
                pEmifPrioReg[index]
            );
    }

    return status;
}

#define L3_BW_LIMITER_BANDWIDTH_FRACTIONAL_GPU_P1 (volatile UInt32*)(0x44805B08)
#define L3_BW_LIMITER_BANDWIDTH_INTEGER_GPU_P1    (volatile UInt32*)(0x44805B0C)
#define L3_BW_LIMITER_WATERMARK_0_GPU_P1          (volatile UInt32*)(0X44805B10)
#define L3_BW_LIMITER_CLEARHISTORY_GPU_P1         (volatile UInt32*)(0X44805B14)

#define L3_BW_LIMITER_BANDWIDTH_FRACTIONAL_GPU_P2 (volatile UInt32*)(0x44805C08)
#define L3_BW_LIMITER_BANDWIDTH_INTEGER_GPU_P2    (volatile UInt32*)(0x44805C0C)
#define L3_BW_LIMITER_WATERMARK_0_GPU_P2          (volatile UInt32*)(0X44805C10)
#define L3_BW_LIMITER_CLEARHISTORY_GPU_P2         (volatile UInt32*)(0X44805C14)

Int32 Utils_setBWLimiter(Utils_DmmInitiatorId initiatorId, UInt32 BW_valueInMBps)
{
   UInt32 BW;
   UInt32 BW_int;
   UInt32 BW_frac;

   BW = (UInt32) (BW_valueInMBps/(UInt32)8.3125);
   BW_int = (BW & (UInt32)0xFFFFFFE0U) >> (UInt32)5U;
   BW_frac = (BW & (UInt32)0x1FU);

   if (UTILS_DMM_INITIATOR_ID_GPU_P1 == initiatorId)
   {
       *L3_BW_LIMITER_BANDWIDTH_FRACTIONAL_GPU_P1 = BW_frac;
       *L3_BW_LIMITER_BANDWIDTH_INTEGER_GPU_P1 = BW_int;
       *L3_BW_LIMITER_WATERMARK_0_GPU_P1= (UInt32)0U;
       *L3_BW_LIMITER_CLEARHISTORY_GPU_P1 = (UInt32)1U;
   }

   if (UTILS_DMM_INITIATOR_ID_GPU_P2 == initiatorId)
   {
       *L3_BW_LIMITER_BANDWIDTH_FRACTIONAL_GPU_P2 = BW_frac;
       *L3_BW_LIMITER_BANDWIDTH_INTEGER_GPU_P2 = BW_int;
       *L3_BW_LIMITER_WATERMARK_0_GPU_P2 = (UInt32)0U;
       *L3_BW_LIMITER_CLEARHISTORY_GPU_P2 = (UInt32)1U;
   }

    return SYSTEM_LINK_STATUS_SOK;
}

#define L3_BW_REGULATOR_BANDWIDTH_IVA             (volatile UInt32*)(0x44805008)
#define L3_BW_REGULATOR_WATERMARK_IVA             (volatile UInt32*)(0x4480500C)
#define L3_BW_REGULATOR_CLEARHISTORY_IVA          (volatile UInt32*)(0x44805014)

Int32 Utils_setBWRegulator(Utils_DmmInitiatorId initiatorId, UInt32 BW_valueInMBps)
{
   UInt32 BW, WM;

   BW = (UInt32) (BW_valueInMBps/(UInt32)8.3125);
   WM = (UInt32) (BW_valueInMBps * (UInt32)1); /* 1 MicroSec window */

   if (UTILS_DMM_INITIATOR_ID_IVA == initiatorId)
   {
       *L3_BW_REGULATOR_BANDWIDTH_IVA = BW;
       *L3_BW_REGULATOR_WATERMARK_IVA = WM;
       *L3_BW_REGULATOR_CLEARHISTORY_IVA = (UInt32)1U;
   }

    return SYSTEM_LINK_STATUS_SOK;
}
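
One detail of the attached file worth checking is the expression `BW_valueInMBps/(UInt32)8.3125`: the cast truncates 8.3125 to 8 before the division, so the bandwidth field is computed as an integer division by 8. The sketch below compares that with a division by 8.3125 done in integer arithmetic (8.3125 = 133/16). It is a host-side illustration only, assuming the intended divisor really is 8.3125; the function names are hypothetical and no registers are touched.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth field as computed in the attached file:
 * (uint32_t)8.3125 is 8, so this divides by 8. */
uint32_t bw_field_as_posted(uint32_t mbps)
{
    return mbps / (uint32_t)8.3125;
}

/* Division by 8.3125 without floating point: 8.3125 == 133 / 16. */
uint32_t bw_field_scaled(uint32_t mbps)
{
    return (uint32_t)(((uint64_t)mbps * 16u) / 133u);
}

/* Watermark as in the attached file: bandwidth in MB/s times the window
 * length in microseconds (the file uses a 1 microsecond window). */
uint32_t watermark_field(uint32_t mbps, uint32_t window_us)
{
    assert(window_us > 0u);
    return mbps * window_us;
}
```

For the 1574 MB/s figure discussed in this thread, `bw_field_as_posted(1574u)` yields 196 while `bw_field_scaled(1574u)` yields 189, so the value programmed by the posted expression is roughly 4% higher than a division by 8.3125 would give.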

Hi Feng, I see you have attached the Vision SDK API files and not the actual configuration you have used in your system. I did not quite understand this comment: "when modifying the bus priority configuration, it seems that the bandwidth of the IVA module using DMA is higher, and the IVA module will have frame loss." If raising the IVA priority increases its BW to the level at which the IVA can meet the use case requirements, you should not observe frame drops. You should plan your use case for less than 60% of the ideal DDR throughput. For data measured in a bare-metal environment without software overheads, see section 19.2 of the device's performance application note, www.ti.com/.../sprac21.pdf. Thanks and Regards, Piyali