Ticket Name: TDA2SX: TCP sending large data can cause packet loss at a fixed location
Query Text:
Part Number: TDA2SX
Other Parts Discussed in Thread: TDA2

Hi team,

The customer is using the TDA2's Ethernet feature. In PROCESS_radar_SDK the NDK runs on IPU1_1 by default, but the customer now needs to transfer large amounts of data to a PC over TCP, so they configured the NDK to run on the A15_0 core. Link negotiation at 1000 Mbps is normal and the TCP connection is established normally, but throughput is too low. A Wireshark capture shows that when 4 MB of data is sent, consecutive packets are lost between 90 kB and 100 kB into the data, and these packets are then retransmitted repeatedly. Apart from these losses, the rest of the data is transmitted normally.

They mainly made the following changes:

1. In the cfg.mk file in the tda2xx_cascade_BIOS_radar directory, set NDK_PROC_TO_USE=A15_0.
2. Regenerated the associated priv.c and priv.h files in usecases, mainly changing NetworkTx (IPU1_1) to NetworkTx (A15).
3. Adjusted NDK_DATA_SIZE and A15_0_DATA_SIZE as appropriate in the file mem_segment_definition_BIOS.xs, as follows:

/*
 *  ======== mem_segment_definition.xs ========
 *  ======== Single file for the memory map configuration of all cores =========
 */
function getMemSegmentDefinition_external(core)
{
    KB=1024;
    MB=KB*KB;
    DDR3_ADDR = 0x80000000;
    DDR3_SIZE = 512*MB;
    /*
     * In case of ECC_FFI_INCLUDE, DDR3_BASE_ADDR_1 and DDR3_BASE_SIZE_1
     * are hard-coded in as values of gIpcNonCachedDataAddr and
     * gIpcNonCachedDataSize in Ipu1_0.cfg
     * If this DDR3_BASE_SIZE_0 is changed, update Ipu1_0.cfg
     */
    DDR3_BASE_ADDR_0 = DDR3_ADDR;
    DDR3_BASE_SIZE_0 = 507*MB;
    /* The start address of the second mem section should be 16MB aligned.
     * This alignment is a must as a single 16MB mapping is used for EVE
     * to map SR0, REMOTE_LOG_MEM sections.
     * tlb_config_eveX.c need to be modified otherwise
     */
    DDR3_BASE_ADDR_1 = DDR3_BASE_ADDR_0 + DDR3_BASE_SIZE_0;
    DDR3_BASE_SIZE_1 = DDR3_SIZE - DDR3_BASE_SIZE_0;
    //if(core=="ipu1_1" || core=="ipu1_0" || core=="ipu2" || core=="a15_0")
    //{
    /* for ipu1_0, ipu1_1, ipu2 DDR3_BASE_ADDR_1 should be
     * in non-cached virtual address of
     * DDR3_BASE_ADDR_1 + 512*MB
     */
    DDR3_BASE_ADDR_1 = DDR3_BASE_ADDR_1+512*MB;
    //}
    DSP1_L2_SRAM_ADDR = 0x40800000;
    DSP1_L2_SRAM_SIZE = 288*KB;
    DSP2_L2_SRAM_ADDR = 0x41000000;
    DSP2_L2_SRAM_SIZE = 288*KB;
    EVE1_SRAM_ADDR = 0x42000000;
    EVE1_SRAM_SIZE = 1*MB;
    EVE2_SRAM_ADDR = 0x42100000;
    EVE2_SRAM_SIZE = 1*MB;
    EVE3_SRAM_ADDR = 0x42200000;
    EVE3_SRAM_SIZE = 1*MB;
    EVE4_SRAM_ADDR = 0x42300000;
    EVE4_SRAM_SIZE = 1*MB;
    TOTAL_MEM_SIZE = (DDR3_SIZE);
    /* First 512 MB - cached */
    /* EVE vecs space should be align with 16MB boundary, and if possible try to fit
     * the entire vecs+code+data in 16MB section. In this case a single TLB map would
     * be enough to map vecs+code+data of an EVE.
     * tlb_config_eveX.c need to be modified if any of these EVE memory sections or
     * SR1_FRAME_BUFFER_MEM section is modified.
     */
    /* EVE self-branch instruction block - EVE1_VECS
     * In SBL, EVE self-branch instruction is inserted @ 0x80000000 if no AppImage for EVE.
     * This could overwrites the code/data loaded at 0x80000000.
     * So Reserving a small memory block in the beginning of the DDR @0x8000 0000 for
     * EVE self-branch instruction if no AppImage for EVE.
     * If EVE enabled, then the EVE VECS/DATA/CODE is placed @0x8000 0000,
     * and hence we did not observe any issue.
     * If EVE is disabled, then also DO NOT remove this EVE1_VECS section @0x80000000,
     * if no AppImage for EVE. This could overwrites the code/data loaded at 0x80000000
     */
    EVE1_VECS_SIZE = 0.5*MB;
    EVE1_CODE_SIZE = 2*MB;
    EVE1_DATA_SIZE =13.5*MB;
    EVE2_VECS_SIZE = 0.5*MB;
    EVE2_CODE_SIZE = 2*MB;
    EVE2_DATA_SIZE =13.5*MB;
    EVE3_VECS_SIZE = 0.5*MB;
    EVE3_CODE_SIZE = 2*MB;
    EVE3_DATA_SIZE =13.5*MB;
    EVE4_VECS_SIZE = 0.5*MB;
    EVE4_CODE_SIZE = 2*MB;
    EVE4_DATA_SIZE =13.5*MB;
    NDK_DATA_SIZE = 8*MB;
    if(java.lang.System.getenv("ECC_FFI_INCLUDE")=="yes")
    {
        IPU1_1_CODE_SIZE = 2.5*MB;
        IPU1_1_DATA_SIZE = 12.5*MB;
        IPU1_0_CODE_SIZE = 6*MB;
        IPU1_0_DATA_SIZE = 12*MB;
    }
    else
    {
        IPU1_1_CODE_SIZE = 2.5*MB;
        IPU1_1_DATA_SIZE = 12.5*MB;
        IPU1_0_CODE_SIZE = 6*MB;
        IPU1_0_DATA_SIZE = 16*MB;
    }
    IPU2_CODE_SIZE = 2*MB;
    IPU2_DATA_SIZE = 7*MB;
    DSP1_CODE_SIZE = 6*MB;
    DSP1_DATA_SIZE = 14*MB;
    DSP1_DATA_SIZE_2 = 1*MB;
    DSP2_CODE_SIZE = 4*MB;
    DSP2_DATA_SIZE = 14*MB;
    DSP2_DATA_SIZE_2 = 1*MB;
    /* A15_0_CODE_SIZE reduced since it is not used in .bld file.
     * Check .bld for details. Originally 2 + 14 MB.
     */
    A15_0_DATA_SIZE = 43.5*MB;
    if(java.lang.System.getenv("OPENCL_INCLUDE") == "yes")
    {
        A15_0_DATA_SIZE_INC = 101*MB /* in MB */
        A15_0_DATA_SIZE = (A15_0_DATA_SIZE + A15_0_DATA_SIZE_INC);
    }
    if(java.lang.System.getenv("ECC_FFI_INCLUDE")=="yes")
    {
        /* Ensure ECC regions are 64kB aligned */
        SR1_FRAME_BUFFER_SIZE = 297.5*MB;
        SR1_BUFF_ECC_ASIL_SIZE = 1*MB;
        SR1_BUFF_ECC_QM_SIZE = 40*MB;
        SR1_BUFF_NON_ECC_ASIL_SIZE = 1*MB;
    }
    else
    {
        SR1_BUFF_ECC_ASIL_SIZE = 4*KB;
        SR1_BUFF_ECC_QM_SIZE = 4*KB;
        SR1_BUFF_NON_ECC_ASIL_SIZE = 4*KB;
        SR1_FRAME_BUFFER_SIZE = 305.5*MB - (SR1_BUFF_ECC_ASIL_SIZE + SR1_BUFF_ECC_QM_SIZE + SR1_BUFF_NON_ECC_ASIL_SIZE);
        if(java.lang.System.getenv("OPENCL_INCLUDE") == "yes")
        {
            SR1_FRAME_BUFFER_SIZE = SR1_FRAME_BUFFER_SIZE - A15_0_DATA_SIZE_INC;
        }
    }
    /* Second 512 MB - non-cached */
    /* The start address of the second mem section should be 16MB aligned.
     * This alignment is a must as a single 16MB mapping is used for EVE
     * to map SR0, REMOTE_LOG_MEM sections.
     * tlb_config_eveX.c need to be modified otherwise
     */
    REMOTE_LOG_SIZE = 160*KB;
    SYSTEM_IPC_SHM_SIZE = 480*KB;
    SYSTEM_AUTOSAR_IPC_SHM_SIZE = 512*KB;
    LINK_STATS_SIZE = 256*KB;
    HDVPSS_DESC_SIZE = 1024*KB;
    SR0_SIZE = 128*KB;
    OPENVX_SHM_SIZE = 1984*KB;
    EEPROM_PARAM_SIZE = 64*KB;
    if((java.lang.System.getenv("OPENCL_INCLUDE") == "yes"))
    {
        /* when OpenCL is enabled we need more SR0 space
         */
        SR0_SIZE = 2*MB;
    }

4. MAC0 is used and a static IP is set.

Issue: If the NDK is configured to run on IPU1_1, Gigabit network communication is normal: with 1 MB of data per frame, consecutive transmission does not lose any packets. If the NDK is instead configured to run on the A15 core with the above configuration, the Gigabit link still comes up and connects properly. However, when 4 frames of 1 MB each are transmitted consecutively, packets are lost every time between the 90 KB and 100 KB offsets of the first 1 MB frame. The lost packets are eventually recovered by TCP's retransmission mechanism, but this causes a drop in throughput.

In the Wireshark capture, 192.168.31.100 is the PC.

So the customer would like to know what they should be aware of when moving the NDK to A15_0. Is there anything wrong in the above configuration?

Thanks.
Best Regards,
Cherry
Responses:
Hi, are there any updates? Best, Cherry
Hi,

Some updates:

1) The customer originally sent the data in the NetworkTxLink_tskMain function in networkTxLink_tsk.c with the following code:

UInt8* chBuf[2];
for(c=0;c<2;c++)
{
    chBuf[0] = (UInt8*)((uint32_t*)(pADCFrames->bufferList.buffers[pADCFrames->rIndex+c*2]->payload))[0];
    chBuf[1] = (UInt8*)((uint32_t*)(pADCFrames->bufferList.buffers[pADCFrames->rIndex+c*2+1]->payload))[0];
    for(i=0; i<256; i++)
    {
        TcpSendToAll(chBuf[0], 4096);
        TcpSendToAll(chBuf[1], 4096);
        chBuf[0] += 4096;
        chBuf[1] += 4096;
    }
}

The above code sends a total of 4 MB of data (2 x 256 x 2 x 4096 bytes) and causes packet loss.

2) If Task_sleep is added to the send loop so that the task sleeps intermittently, no packet loss occurs:

UInt8* chBuf[2];
for(c=0;c<2;c++)
{
    chBuf[0] = (UInt8*)((uint32_t*)(pADCFrames->bufferList.buffers[pADCFrames->rIndex+c*2]->payload))[0];
    chBuf[1] = (UInt8*)((uint32_t*)(pADCFrames->bufferList.buffers[pADCFrames->rIndex+c*2+1]->payload))[0];
    for(i=0; i<256; i++)
    {
        TcpSendToAll(chBuf[0], 4096);
        TcpSendToAll(chBuf[1], 4096);
        chBuf[0] += 4096;
        chBuf[1] += 4096;
        Task_sleep(1);
    }
}

The customer adjusted the sleep interval. For example, sending two 4096-byte chunks and then sleeping for 1 ms gives no packet loss, but the transfer rate is slow. Sending 8 consecutive 4096-byte chunks and then sleeping for 1 ms raises the rate to 160 Mbps. They also tried lowering the priority of the networkTxLink_tsk task and increasing the TCP TX buffer size, but neither resolved the issue. So far, adding Task_sleep is the only way they have found to avoid packet loss while raising throughput.

How can they avoid packet loss and increase throughput without adding Task_sleep?

Thanks and Regards,
Cherry
Hi Stanley, are there any updates? Thanks and regards, Cherry