Ticket Name: Linux/TDA2: About the efficiency of the TDA2 DDR bus

Query Text:
Part Number: TDA2
Tool/software: Linux

Our current problem: while the DVR function is running, adding other functions lowers the efficiency of the DDR bus. This degrades the performance of the other CPUs, and the performance of the TDA2Sx chip cannot be fully utilized.

Question 1: Our hardware design gives the TDA2 a DDR bandwidth of more than 8,000 MB/s, so why is the combined DDR bandwidth of the EMIF1 and EMIF2 ports shown in the figure below only just over 4,000 MB/s? What are the main factors that affect it? (We do not use ECC DDR.)

Question 2: How can we optimize the IVA module so that its DDR traffic drops to about 500 MB/s? As shown in the red box below, IVA accesses account for 55% of the total DDR bandwidth. Our current DVR solution captures 4 channels of 1280x720p@30 fps, stitches the 4 raw images into a single 2560x1440 image, feeds it to the encoder to produce an H.264 stream, and finally packages the H.264 stream as an MP4 file.

Responses:

Hi Feng,

As a rule of thumb, DDR efficiency is about 60% of the ideal throughput. So with 2 interleaved EMIFs you would typically see 2 x 532 x 2 x 4 x 0.6 = 5107.2 MBps (2 EMIFs x 532 MHz x 2 transfers per clock x 4 bytes per transfer x 60% efficiency).

In your case, it seems the IVA throughput goes down as additional initiators become active. You can try placing a bandwidth regulator on the IVA to guarantee it a minimum bandwidth. You can find more details in Section 3.1, Bandwidth Regulator, of the www.ti.com/.../sprabx1a.pdf application note. Also, see the example of the IVA bandwidth regulator in Section 4.2, Pseudo Real-Time Video Subsystem Performance (fps), of www.ti.com/.../sprabx0.pdf.

Thanks and Regards,
Piyali

Hi Piyali,

Regarding the bandwidth regulator settings: we are currently using the default settings in TI's SDK, and we have not confirmed whether the feature is turned on.
Do you mean that our software needs to change the SDK's default settings?

Thanks and Regards,

Hi Feng,

Yes, that's right. The default SDK does not configure the bandwidth regulator, because the right setting depends on the final use case. You can use the reference API Utils_setBWRegulator in vision_sdk\links_fw\src\rtos\utils_common\src\tda2xx\utils_l3_emif_bw.c to set it.

Thanks and Regards,
Piyali

Hi Piyali,

We have debugged with the corresponding API but have not achieved the expected results. Given the screenshots above and our DVR solution, we want to understand why the IVA module takes up 1574 MB/s of bandwidth. Can you explain the specific process so that we can evaluate where we can cut down the requirements or optimize the configuration?

Hi Feng,

Can you please help us understand the debugging you have done so far? The way to approach this is to first determine the IVA bandwidth at which you achieve the expected IVA FPS. You can measure this in a lightly loaded system. Once you have this number, program it into the BW regulator for the IVA, so that the IVA gets the required priority in the system when other initiators are generating DDR traffic.

If this does not help, check whether any initiators in the system generate high peak traffic (typical candidates are the GPU, VPE, and BB2D). These initiators generate bursts of peak traffic and finish early, even though their average BW requirement may not be high. Placing a bandwidth limiter on these IPs then keeps them at their average bandwidth without impacting other initiators.

Thanks and Regards,
Piyali

Hi Feng,

We haven't heard back from you on this one. I hope you have been able to proceed.

Thanks and Regards,
Piyali

Hi Piyali,

Is your recommendation meant for modules that briefly use very high DDR bandwidth? If some modules have a high average DDR bandwidth, your method may not work.
If we reduce the IVA module's average, for example down to 500 MB/s, we are worried that the encoding frame rate will drop. Do you agree?

Thanks and Regards,

Hi Feng,

You are right. If the sum of the average DDR traffic of all the modules is higher than what the device can support, you need to reduce the DDR traffic, either through specification trade-offs or, to a certain extent, by making better use of internal memories alongside DDR. If the average total BW is lower than the practical limit of the DDR throughput, the solution is a combination of bandwidth limiters for the peak traffic generators, priority settings, and bandwidth regulators for the traffic that needs to be boosted.

It is not clear from your post what you have tried so far. If you help us understand which settings you have tried and what their impact was, we can help you better.

Thanks and Regards,
Piyali

Hi Piyali,

The attachment contains the code we use to configure bandwidth; please help us see whether it can be optimized. In our tests, when we modify the bus priority configuration, the bandwidth used by the IVA module's DMA appears higher, but the IVA module drops frames. In addition, the previous reply mentioned that DDR can reach 60% of the ideal throughput. How can we test this? Please provide a test plan.

utils_l3_emif_bw.c

/*
Copyright (c) [2012 - 2017] Texas Instruments Incorporated

All rights reserved not granted herein.

Limited License.

Texas Instruments Incorporated grants a world-wide, royalty-free, non-exclusive license under copyrights and patents it now or hereafter owns or controls to make, have made, use, import, offer to sell and sell ("Utilize") this software subject to the terms herein. With respect to the foregoing patent license, such license is granted solely to the extent that any such patent is necessary to Utilize the software alone.
The patent license shall not apply to any combinations which include this software, other than combinations with devices manufactured by or for TI ("TI Devices"). No hardware patent is licensed hereunder.

Redistributions must preserve existing copyright notices and reproduce this license (including the above copyright notice and the disclaimer and (if applicable) source code license limitations below) in the documentation and/or other materials provided with the distribution

Redistribution and use in binary form, without modification, are permitted provided that the following conditions are met:

* No reverse engineering, decompilation, or disassembly of this software is permitted with respect to any software provided in binary form.

* Any redistribution and use are licensed by TI for use only with TI Devices.

* Nothing shall obligate TI to provide you with source code for the software licensed and provided to you in object code.

If software source code is provided to you, modification and redistribution of the source code are permitted provided that the following conditions are met:

* Any redistribution and use of the source code, including any resulting derivative works, are licensed by TI for use only with TI Devices.

* Any redistribution and use of any object code compiled from the source code and any resulting derivative works, are licensed by TI for use only with TI Devices.

Neither the name of Texas Instruments Incorporated nor the names of its suppliers may be used to endorse or promote products derived from this software without specific prior written permission.

DISCLAIMER.

THIS SOFTWARE IS PROVIDED BY TI AND TI'S LICENSORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL TI AND TI'S LICENSORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

/**
 *******************************************************************************
 *
 * \file utils_l3_emif_bw.c
 *
 * \brief This file has the implementation of the APIs to configure bandwidth
 *        related controls at L3 and EMIF
 *
 * \version 0.0 (Dec 2013) : [KC] First version
 *
 *******************************************************************************
 */

/*******************************************************************************
 *  INCLUDE FILES
 *******************************************************************************
 */
#include <src/rtos/utils_common/include/utils_l3_emif_bw.h>

#define DISPC_GLOBAL_MFLAG_ATTRIBUTE             (volatile UInt32*)(0x5800185C)
#define DISPC_GFX_MFLAG_THRESHOLD                (volatile UInt32*)(0x58001860)
#define DISPC_VID1_MFLAG_THRESHOLD               (volatile UInt32*)(0x58001864)
#define DISPC_VID2_MFLAG_THRESHOLD               (volatile UInt32*)(0x58001868)
#define DISPC_VID3_MFLAG_THRESHOLD               (volatile UInt32*)(0x5800186C)
#define DMM_EMERGENCY                            (volatile UInt32*)(0x4E000020)
#define DMM_PEG_PRIO_0_ADDR                      (volatile UInt32*)(0x4E000620)
#define CTRL_CORE_EMIF_INITIATOR_PRIORITY_1_ADDR (volatile UInt32*)(0x4A002420)

Int32 Utils_setDssMflagMode(Utils_DssMflagMode mode)
{
    *(DISPC_GLOBAL_MFLAG_ATTRIBUTE) =
        (UInt32)(((UInt32)mode & (UInt32)0x3U)  /* MFLAG_CTRL */
        | ((UInt32)1U << (UInt32)2U))           /* MFLAG_START
                                                  * 0x1: Even at the beginning of the frame when the DMA
                                                  * buffer is empty, MFLAG_CTRL bitfield is used to
                                                  * determine how MFLAG signal for each pipeline shall be
                                                  * driven.
                                                  */
        ;

    return SYSTEM_LINK_STATUS_SOK;
}

Int32 Utils_setDssMflagThreshold(System_DssDispcPipes displayPipeId,
                                 UInt32 thresHigh,
                                 UInt32 thresLow)
{
    UInt32 value;
    Int32 status = SYSTEM_LINK_STATUS_SOK;
    volatile UInt32 *pReg;

    /* Low threshold in bits [15:0], high threshold in bits [31:16] */
    value = (thresLow & (UInt32)0xFFFFU)
          | ((thresHigh & (UInt32)0xFFFFU) << (UInt32)16U);

    switch(displayPipeId)
    {
        case SYSTEM_DSS_DISPC_PIPE_VID1:
            pReg = DISPC_VID1_MFLAG_THRESHOLD;
            break;
        case SYSTEM_DSS_DISPC_PIPE_VID2:
            pReg = DISPC_VID2_MFLAG_THRESHOLD;
            break;
        case SYSTEM_DSS_DISPC_PIPE_VID3:
            pReg = DISPC_VID3_MFLAG_THRESHOLD;
            break;
        case SYSTEM_DSS_DISPC_PIPE_GFX1:
            pReg = DISPC_GFX_MFLAG_THRESHOLD;
            break;
        default:
            status = SYSTEM_LINK_STATUS_EFAIL;
            break;
    }

    if(status == SYSTEM_LINK_STATUS_SOK)
    {
        *pReg = value;
    }

    return status;
}

Int32 Utils_setDmmPri(Utils_DmmInitiatorId initiatorId, UInt32 priValue)
{
    volatile UInt32 *pPegPrioReg = DMM_PEG_PRIO_0_ADDR;
    Int32 status = SYSTEM_LINK_STATUS_SOK;
    UInt32 index;
    UInt32 shift;

    /* 8 initiators per register, 4 bits per initiator */
    index = initiatorId / (UInt32)8U;
    shift = (initiatorId % (UInt32)8U) * (UInt32)4U;

    /* Reject initiator IDs beyond the last PEG_PRIO register */
    if(index >= (UInt32)8U)
    {
        status = SYSTEM_LINK_STATUS_EFAIL;
    }
    else
    {
        priValue = (UInt32)0x8U | (priValue & (UInt32)0x7U);

        pPegPrioReg[index] = priValue << shift;

        Vps_printf(" DMM_PEG_PRIO_%d (0x%08x) = 0x%08x\n",
                   index,
                   &pPegPrioReg[index],
                   pPegPrioReg[index]
                  );
    }

    return status;
}

Int32 Utils_setDmmMflagEmergencyEnable(Bool enable)
{
    UInt32 value;

    value = *DMM_EMERGENCY;

    if(enable)
    {
        value |= (UInt32)0x1U;
    }
    else
    {
        value &= ~(UInt32)0x1U;
    }

    *DMM_EMERGENCY = value;

    return SYSTEM_LINK_STATUS_SOK;
}

Int32 Utils_setEmifPri(Utils_EmifInitiatorId initiatorId, UInt32 priValue)
{
    volatile UInt32 *pEmifPrioReg = CTRL_CORE_EMIF_INITIATOR_PRIORITY_1_ADDR;
    Int32 status = SYSTEM_LINK_STATUS_SOK;
    UInt32 index;
    UInt32 shift;

    /* 8 initiators per register, 4 bits per initiator */
    index = initiatorId / (UInt32)8U;
    shift = (initiatorId % (UInt32)8U) * (UInt32)4U;

    /* Reject initiator IDs beyond the last EMIF_INITIATOR_PRIORITY register */
    if(index >= (UInt32)7U)
    {
        status = SYSTEM_LINK_STATUS_EFAIL;
    }
    else
    {
        priValue = (priValue & (UInt32)0x7U);

        /* clear field */
        pEmifPrioReg[index] &= ~((UInt32)0x7U << shift);

        /* set field */
        pEmifPrioReg[index] |= (priValue << shift);

        Vps_printf(" CTRL_CORE_EMIF_INITIATOR_PRIORITY_%d (0x%08x) = 0x%08x\n",
                   index + (UInt32)1U,
                   &pEmifPrioReg[index],
                   pEmifPrioReg[index]
                  );
    }

    return status;
}

#define L3_BW_LIMITER_BANDWIDTH_FRACTIONAL_GPU_P1 (volatile UInt32*)(0x44805B08)
#define L3_BW_LIMITER_BANDWIDTH_INTEGER_GPU_P1    (volatile UInt32*)(0x44805B0C)
#define L3_BW_LIMITER_WATERMARK_0_GPU_P1          (volatile UInt32*)(0X44805B10)
#define L3_BW_LIMITER_CLEARHISTORY_GPU_P1         (volatile UInt32*)(0X44805B14)

#define L3_BW_LIMITER_BANDWIDTH_FRACTIONAL_GPU_P2 (volatile UInt32*)(0x44805C08)
#define L3_BW_LIMITER_BANDWIDTH_INTEGER_GPU_P2    (volatile UInt32*)(0x44805C0C)
#define L3_BW_LIMITER_WATERMARK_0_GPU_P2          (volatile UInt32*)(0X44805C10)
#define L3_BW_LIMITER_CLEARHISTORY_GPU_P2         (volatile UInt32*)(0X44805C14)

Int32 Utils_setBWLimiter(Utils_DmmInitiatorId initiatorId, UInt32 BW_valueInMBps)
{
    UInt32 BW;
    UInt32 BW_int;
    UInt32 BW_frac;

    /* MB/s / 8.3125, with 8.3125 = 133 / 16. A cast such as (UInt32)8.3125
     * evaluates to 8 in C, so the ratio is applied in integer form. */
    BW = (BW_valueInMBps * (UInt32)16U) / (UInt32)133U;
    BW_int = (BW & (UInt32)0xFFFFFFE0U) >> (UInt32)5U;
    BW_frac = (BW & (UInt32)0x1FU);

    if (UTILS_DMM_INITIATOR_ID_GPU_P1 == initiatorId)
    {
        *L3_BW_LIMITER_BANDWIDTH_FRACTIONAL_GPU_P1 = BW_frac;
        *L3_BW_LIMITER_BANDWIDTH_INTEGER_GPU_P1 = BW_int;
        *L3_BW_LIMITER_WATERMARK_0_GPU_P1 = (UInt32)0U;
        *L3_BW_LIMITER_CLEARHISTORY_GPU_P1 = (UInt32)1U;
    }

    if (UTILS_DMM_INITIATOR_ID_GPU_P2 == initiatorId)
    {
        *L3_BW_LIMITER_BANDWIDTH_FRACTIONAL_GPU_P2 = BW_frac;
        *L3_BW_LIMITER_BANDWIDTH_INTEGER_GPU_P2 = BW_int;
        *L3_BW_LIMITER_WATERMARK_0_GPU_P2 = (UInt32)0U;
        *L3_BW_LIMITER_CLEARHISTORY_GPU_P2 = (UInt32)1U;
    }

    return SYSTEM_LINK_STATUS_SOK;
}

#define L3_BW_REGULATOR_BANDWIDTH_IVA    (volatile UInt32*)(0x44805008)
#define L3_BW_REGULATOR_WATERMARK_IVA    (volatile UInt32*)(0x4480500C)
#define L3_BW_REGULATOR_CLEARHISTORY_IVA (volatile UInt32*)(0x44805014)

Int32 Utils_setBWRegulator(Utils_DmmInitiatorId initiatorId, UInt32 BW_valueInMBps)
{
    UInt32 BW, WM;

    /* Same MB/s / 8.3125 conversion as Utils_setBWLimiter() */
    BW = (BW_valueInMBps * (UInt32)16U) / (UInt32)133U;
    WM = (UInt32) (BW_valueInMBps * (UInt32)1); /* 1 MicroSec window */

    if (UTILS_DMM_INITIATOR_ID_IVA == initiatorId)
    {
        *L3_BW_REGULATOR_BANDWIDTH_IVA = BW;
        *L3_BW_REGULATOR_WATERMARK_IVA = WM;
        *L3_BW_REGULATOR_CLEARHISTORY_IVA = (UInt32)1U;
    }

    return SYSTEM_LINK_STATUS_SOK;
}

Hi Feng,

I see you have attached the Vision SDK API file, not the actual configuration you used in your system.

I did not quite understand this comment: "when modifying the bus priority configuration, it seems that the bandwidth of the IVA module using DMA is higher, and the IVA module will have frame loss." If raising the IVA priority increases its BW to the level at which the IVA meets the use-case requirements, you should not observe frame drops.

Plan for your use case to need less than 60% of the ideal DDR throughput. You can look at section 19.2 of the device performance application note, www.ti.com/.../sprac21.pdf, for data measured in a bare-metal environment without software overheads.

Thanks and Regards,
Piyali