Ticket Name: Compiler/TDA2: Incorrect memory accesses after enabling and disabling MMU | |
Query Text: | |
Part Number: TDA2 Tool/software: TI C/C++ Compiler

Hello, We have software that runs on an A15 core and does work based on a message sent through the mailbox queues from the other A15 core in the TDA2xx. We are using the mmu_a15_data_validation_app_main example in starterware_01_05... as a reference. The software does not use interrupts and is entirely polling-based. We're using MMU and cache to speed up some functions but not all. When we enable and disable the MMU as in the provided starterware MMU example, we occasionally see odd behavior when running software on subsequent commands. Below is an example of our software flow.

    uint32 message;

    main()
    {
        //1. MMU config and init same as example
        while(1)
        {
            //2. wait for command through mailbox
            wait_for_new_message();

            //3. print and parse message from mailbox
            UARTprintf(" message:%d \n", message);

            if(message == 1)
            {
                UARTprintf(" Branch 1 \n");
                /// 5. do something, no MMU
            }
            if(message == 2)
            {
                UARTprintf(" Branch 2 \n");
                /// 6. do something, no MMU
            }
            else if(message == 4)
            {
                UARTprintf(" Branch 4 \n");
                /// 4. enable/disable MMU around functions
                MMUA15Enable(&gMmuTable);
                function();
                MMUA15Disable();
            }
        }
    }

In the above example we print the message prior to the IF checks and confirm that it's as expected. The IF checks, however, and subsequently the branch that is entered, do not match, because we end up seeing a different Branch print statement. We've narrowed down the cause of the behavior to enabling and disabling the MMU. If we never enable/disable the MMU then we don't see an issue. Enabling and disabling the MMU even just once can cause the above odd results, but it doesn't always happen. Essentially, we change software and compile, and if we see an issue then the issue is repeatable on each system reboot and shows up in exactly the same way. If we change software, compile, and don't see an issue after running each Branch, including ones with MMU enabled/disabled, then we don't ever get the weird behavior.
Adding more print statements or any other inconsequential software can cause the odd behavior to change, like entering a different wrong Branch than before, or make the issue go away completely. So there also seems to be a relation to the compiling. We've seen this happen with multiple variables including pointers and not just the top level message variable. Within each top level Branch can be additional sub-branches that are based on other variables. These sub-branches have also shown the similar odd behavior where we print a variable and it looks correct but then the IF-checks result in the wrong branch. When it's a pointer that is affected we'll usually get a core halt since it's now accessing wrong memory. For some of the pointers that have had issues we only load them once at bootup and somehow they get modified after enabling/disabling MMU in a Branch that doesn't even access those specific pointers. By printing the pointer value before accessing it we confirmed that these bootup-loaded pointers are being changed. These are pointers that are properly loaded and working correctly for multiple loops before we run a branch with MMU. Debugging this has been difficult since any change in code can change the behavior. However we have been able to rule out that our software is doing any of this and any issue seems to only come up after running a branch that enables/disables MMU. We are using the mmu_a15_data_validation_app_main example in starterware_01_05... as a reference and we're calling the same exact functions and it is being configured and initialized the same way. Our compile environment uses the default make rules that come with the starterware folder. What are the potential reasons why enabling and disabling MMU would cause the above results? Why would modifying the software and recompiling it change the behavior when what is modified are only print statements? Thank you. | |
Responses: | |
Hi, It seems that you are not taking care of cache maintenance when you disable and re-enable MMU. Behavior may differ when you change print statements depending on contents that were present in cache. Can you add appropriate cache APIs and then try. Regards, Rishabh | |
Hello Rishabh, Thank you for your reply. While debugging we confirmed that both the MMU Enable and Disable functions have cache routines inside them. Those cache functions appear to be for maintenance, and we didn't see any other cache routines used inside the example source, mmu_a15_data_validation_app_main.c. Are there any other specific cache maintenance steps that should be done both before and after enabling/disabling MMU? Below are copies of the enable and disable functions being used.

    void MMUA15Disable(void)
    {
        uint32_t cacheType;

        /* Check if MMU is already disabled */
        if (0U != MMUA15IsEnabledASM())
        {
            /* Get the cache enabled type */
            cacheType = CACHEA15GetEnabled();

            if (0U != (((uint32_t) CACHE_A15_TYPE_L1D) & cacheType))
            {
                /* Writeback Invalidate All Data Cache */
                CACHEA15WriteBackAndInvalidateAll();
                /* Drain Write Buffer */
                CACHEA15Wait();
                /* Disable L1 Data Cache */
                CACHEA15Disable(CACHE_A15_TYPE_L1D);
            }

            if (0U != (((uint32_t) CACHE_A15_TYPE_L1I) & cacheType))
            {
                /* Invalidate I Cache */
                CACHEA15InvalidateL1IAll();
                /* Disable L1 Instruction Cache */
                CACHEA15Disable(CACHE_A15_TYPE_L1I);
            }

            /* Disable MMU */
            MMUA15DisableASM();
            /* Invalidate all TLBs */
            MMUA15TLBInvalidateAll();
            /* Enable Cache */
            CACHEA15Enable(cacheType);
        }
    }

    void MMUA15Enable(const mmuA15ModuleTable_t *mmuTable)
    {
        uint32_t cacheType;

        /* Check if MMU is already enabled */
        if ((0U == MMUA15IsEnabledASM()) && (NULL != mmuTable))
        {
            /* Get the cache enabled type */
            cacheType = CACHEA15GetEnabled();

            if (0U != (((uint32_t) CACHE_A15_TYPE_ALLI) & cacheType))
            {
                /* Invalidate I Cache */
                CACHEA15InvalidateL1IAll();
                /* Disable All Instruction Cache */
                CACHEA15Disable(CACHE_A15_TYPE_ALLI);
            }

            /* Invalidate all TLBs */
            MMUA15TLBInvalidateAll();
            /* Enable MMU */
            MMUA15EnableASM(mmuTable);
            /* Enable Cache */
            CACHEA15Enable(cacheType);
        }
    }

Thank you. | |
Hi, It seems that you are hitting the below A15 MMU issue: The invalidate instruction is treated by A15 as a clean/invalidate instruction. Therefore, calls to Cache_inv()/Cache_invAll() will behave like Cache_wbInv()/Cache_wbInvAll() on A15. Can you try the MMU enable/disable after disabling all cache at beginning of main and see if you are able to reproduce odd behavior. Regards, Rishabh | |
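Editor's note: the behavior described in the reply above can be illustrated with a toy model. The sketch below is not A15 or starterware code. It models one word of DDR and one write-back cache line, and compares a pure invalidate with an invalidate that also cleans (writes back) a dirty line. All names (`toy_mem_t`, `scenario`, and the helpers) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: one word of DDR and one cache line that may shadow it. */
typedef struct {
    uint32_t ddr;    /* value held in external memory */
    uint32_t line;   /* value held in the cache line */
    bool     valid;  /* line currently holds a copy of the word */
    bool     dirty;  /* line was written locally and not yet written back */
} toy_mem_t;

/* Local CPU write through a write-back cache: only the line is updated. */
static void cpu_write(toy_mem_t *m, uint32_t v)
{
    assert(m != NULL);
    m->line = v;
    m->valid = true;
    m->dirty = true;
}

/* Another bus master (other core, DMA) writes DDR directly. */
static void remote_write(toy_mem_t *m, uint32_t v)
{
    assert(m != NULL);
    m->ddr = v;
}

/* Local CPU read: a hit returns the line, a miss refills it from DDR. */
static uint32_t cpu_read(toy_mem_t *m)
{
    assert(m != NULL);
    if (!m->valid) {
        m->line = m->ddr;
        m->valid = true;
        m->dirty = false;
    }
    return m->line;
}

/* Pure invalidate: the line is dropped, DDR is left untouched. */
static void invalidate_only(toy_mem_t *m)
{
    assert(m != NULL);
    m->valid = false;
    m->dirty = false;
}

/* Clean and invalidate: a dirty line is written back before being dropped. */
static void clean_and_invalidate(toy_mem_t *m)
{
    assert(m != NULL);
    if (m->valid && m->dirty) {
        m->ddr = m->line;
    }
    m->valid = false;
    m->dirty = false;
}

/* A local dirty line exists, DDR is then updated remotely, then the
 * local core "invalidates" and reads. Returns the value the core sees. */
static uint32_t scenario(bool invalidate_also_cleans)
{
    toy_mem_t m = { .ddr = 0, .line = 0, .valid = false, .dirty = false };

    cpu_write(&m, 1);      /* local core leaves a dirty line holding 1 */
    remote_write(&m, 2);   /* other core updates DDR to 2 */

    if (invalidate_also_cleans) {
        clean_and_invalidate(&m);
    } else {
        invalidate_only(&m);
    }
    return cpu_read(&m);
}
```

In this model `scenario(false)` returns 2 (the remote update survives), while `scenario(true)` returns 1 because the stale dirty line is written back over the newer DDR contents. Whether this is what happens in the poster's system is not established by the thread; the model only shows why an invalidate that behaves like a write-back-invalidate can overwrite memory when dirty lines are present.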
Rishabh, As part of the MMU init and configuration cache is checked to see if it's On. Are you saying to disable all cache before or after this step?

    /* Check if cache is already enabled */
    cacheEnabled = CACHEA15GetEnabled();

    /* In case cache is disabled, invalidate and enable it */
    if (CACHE_A15_TYPE_ALL != cacheEnabled)
    {
        CACHEA15InvalidateL1DAll();
        CACHEA15InvalidateL1IAll();
        CACHEA15Enable(CACHE_A15_TYPE_ALL);
    }

The above is done before MMU module is initialized and configured with first level descriptors. Thank you. | |
Hello Rishabh, Update on my previous question. I ended up running the function: CACHEA15Disable(CACHE_A15_TYPE_ALL); both before and after the MMU/cache init and config. When I ran that function before the init/config software, the error came up in the same way as I've seen it. When I ran that function after the init/config software, the system doesn't error like before. However, the system is also much slower, which I assume is because cache is disabled. Can you explain what you meant with the error "The invalidate instruction is treated by A15 as a clean/invalidate instruction. Therefore, calls to Cache_inv()/Cache_invAll() will behave like Cache_wbInv()/Cache_wbInvAll() on A15."? What are fixes or workarounds to this issue that would still allow us to use cache to speed up some routines? Also, this version doesn't show an error, and if the cause is cache like you mentioned then that makes sense since it's disabled. However, we have seen the system "fix" itself when we try some things only to have it break again on the next software change. I'm hoping that's not what happened during this test. Thank you. | |
Hi, There are two parts to this problem. The first is the workaround for the cache problem. Imagine a scenario where you have two cores, say an M4 and an A15, and both want to update a shared region x. On the M4, the sequence should be to first update x and then do a cache write-back. On the A15, on the other hand, the sequence will be to first read x and then do a cache invalidate. This will make sure that the cache is never dirty. The second is making MMU enable -> disable -> enable work. You have to look at the practical need for this. Regards, Rishabh | |
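Editor's note: the write-back / read-then-invalidate sequence in the reply above can be sketched with a toy two-core model. This is not starterware code. The types and helpers (`toy_line_t`, `toy_system_t`, `producer_publish`, `consumer_fetch`) are hypothetical, and each core is reduced to a single cache line over one shared DDR word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One cache line per core, shadowing a single shared DDR word. */
typedef struct {
    uint32_t value;
    bool     valid;
    bool     dirty;
} toy_line_t;

typedef struct {
    uint32_t   ddr;       /* shared region "x" in external memory */
    toy_line_t producer;  /* cache line of the core that updates x */
    toy_line_t consumer;  /* cache line of the core that reads x */
} toy_system_t;

/* Cached write: only the writer's line changes. */
static void line_write(toy_line_t *l, uint32_t v)
{
    assert(l != NULL);
    l->value = v;
    l->valid = true;
    l->dirty = true;
}

/* Write-back: push a dirty line out to DDR, keep the line valid. */
static void line_write_back(toy_system_t *s, toy_line_t *l)
{
    assert(s != NULL && l != NULL);
    if (l->valid && l->dirty) {
        s->ddr = l->value;
        l->dirty = false;
    }
}

/* Invalidate: drop the line so the next read refills from DDR. */
static void line_invalidate(toy_line_t *l)
{
    assert(l != NULL);
    l->valid = false;
    l->dirty = false;
}

/* Cached read: hit returns the line, miss refills from DDR. */
static uint32_t line_read(const toy_system_t *s, toy_line_t *l)
{
    assert(s != NULL && l != NULL);
    if (!l->valid) {
        l->value = s->ddr;
        l->valid = true;
        l->dirty = false;
    }
    return l->value;
}

/* Producer side: update x, then write back. */
static void producer_publish(toy_system_t *s, uint32_t v)
{
    line_write(&s->producer, v);
    line_write_back(s, &s->producer);
}

/* Consumer side: read x, then (optionally) invalidate so that no
 * copy of x lingers in the consumer's cache until the next read. */
static uint32_t consumer_fetch(toy_system_t *s, bool invalidate_after_read)
{
    uint32_t v = line_read(s, &s->consumer);
    if (invalidate_after_read) {
        line_invalidate(&s->consumer);
    }
    return v;
}
```

In this model, a consumer that invalidates after every read sees each newly published value, while one that skips the invalidate keeps returning the first value it cached. The model says nothing about line sizes, alignment, or which starterware call performs each step on real hardware.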
Hello Rishabh, For the first part, memory coherency, it was understood that by enabling and disabling MMU we would accomplish what you describe. If you're suggesting we manually perform those updates to synchronize DDR and cache, what functions are available to do what you describe? I'm not sure I understand what you mean with the second part regarding the practical need of enabling/disabling MMU. We wanted to follow the starterware MMU example that showed a time improvement and so we implemented MMU in the same way. We have looked at enabling MMU once at bootup and never disabling but like you describe in part one, there are a number of memory coherency issues for a multi-core system. The cache functions provided in starterware, filename cache_a15.c, did not help resolve memory coherency issues. So that's why I ask, what functions are available and should be used to resolve what you describe in part one? Thank you. | |
Hi, Enabling and disabling MMU is a very crude method of maintaining coherency. In RTOS systems where multiple tasks are running on multiple cores, you can't suddenly disable MMU as it will stop everything running on that particular core. Hence you need to use cache maintenance APIs. For cache APIs available on A15 you should refer to cache_a15.h. Starterware examples are standalone applications written to demonstrate the use of the peripherals and provide a quick start to the user. Until now you have used cache APIs along with MMU enable/disable and that has created problems. In case you see any cache coherency issue after removing MMU enable/disable, let me know. Also, we migrated from starterware to PDK last year and you must be using quite an old release. My suggestion would be to pick the latest Vision SDK release from software-dl.ti.com/.../index_FDS.html to begin your development. Regards, Rishabh | |
Rishabh, Thanks for the reply. I can see how it may be a crude method. I also understand the error you describe for your RTOS example but I don't believe our implementation is the same. This A15 core that we enable/disable MMU follows the example flow that I posted in the original question. MMU gets disabled on that core only when that core is finished. When that core finishes is when a reply gets sent to other core(s) indicating that data is ready. So other core(s) are not asynchronously accessing data and therefore the data that's accessed should be correct. If there's a better way to achieve what we want though then we're certainly open to doing that instead. You suggest we use cache maintenance APIs, I ask again, which functions would help achieve our intended flow as I described in my original question? With or without MMU, can you help narrow down what we need to do to achieve the performance improvements that are shown in the mmu_a15_data_validation_app_main example? If you suggest an alternate flow than what I posted, can you describe that flow? Regarding the PDK you recommend, what example(s) should be used as a reference? I last downloaded and worked with PDK_03_02_00_00 and didn't see equivalent examples to starterware. Thank you. | |
Hi, You should have the following flow: Main -> Invalidate All Cache -> Enable L1D/L2 Cache -> Enable MMU. After this you can access shared memory regions/perform operations that need coherency in following order: perform x -> invalidate cache. Hope this helps. Regards, Rishabh | |
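Editor's note: a minimal host-side sketch of the boot-time ordering suggested above, using the function names that appear earlier in this thread. The bodies below are stand-in stubs that only record call order; on target these functions come from the starterware/PDK sources. The `step_t` trace, the empty `mmuA15ModuleTable_t` definition, the numeric value of `CACHE_A15_TYPE_ALL`, and the `boot_time_setup` name are all placeholders invented for this illustration. Passing `CACHE_A15_TYPE_ALL` (rather than a narrower L1D/L2 selection) is an assumption borrowed from the init snippet posted earlier.

```c
#include <assert.h>
#include <stdint.h>

/* --- Host-side stand-ins (placeholders, not the real definitions) --- */
typedef enum {
    STEP_INV_L1D,
    STEP_INV_L1I,
    STEP_CACHE_ENABLE,
    STEP_MMU_ENABLE
} step_t;

#define TRACE_MAX 8
static step_t g_trace[TRACE_MAX];
static int    g_trace_len;

static void trace(step_t s)
{
    assert(g_trace_len < TRACE_MAX);
    g_trace[g_trace_len++] = s;
}

typedef struct { int placeholder; } mmuA15ModuleTable_t; /* stand-in type */
#define CACHE_A15_TYPE_ALL 0x7U                          /* placeholder value */

static void CACHEA15InvalidateL1DAll(void) { trace(STEP_INV_L1D); }
static void CACHEA15InvalidateL1IAll(void) { trace(STEP_INV_L1I); }

static void CACHEA15Enable(uint32_t cacheType)
{
    (void) cacheType;
    trace(STEP_CACHE_ENABLE);
}

static void MMUA15Enable(const mmuA15ModuleTable_t *mmuTable)
{
    (void) mmuTable;
    trace(STEP_MMU_ENABLE);
}

static mmuA15ModuleTable_t gMmuTable; /* configured elsewhere on target */

/* --- The ordering from the reply: done once, at the start of main --- */
static void boot_time_setup(void)
{
    /* 1. Invalidate all cache while it holds nothing worth keeping. */
    CACHEA15InvalidateL1DAll();
    CACHEA15InvalidateL1IAll();

    /* 2. Enable the caches. */
    CACHEA15Enable(CACHE_A15_TYPE_ALL);

    /* 3. Enable the MMU and leave it enabled for the rest of the run. */
    MMUA15Enable(&gMmuTable);
}
```

With this ordering the MMU is never toggled inside the command loop; coherency for shared regions is then handled per operation with cache maintenance calls, as the reply describes.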
Rishabh, Thanks for the suggestion, will start looking into implementing that flow. Are there preferred cache APIs to invalidate cache? I've had trouble when using CACHEA15InvalidateL1IAll() and not sure if there are any others that you recommend. Thank you. | |
Hi, What issue did you face with CACHEA15InvalidateL1IAll()? Regards, Rishabh | |
Also, the A15 MMU app is present here in PDK: PROCESSOR_SDK_VISION_03_05_00_00\ti_components\drivers\pdk_01_10_01_06\packages\ti\csl\example\mmu\a15_data_validation. Please note that new customers are strongly recommended to use PDK. Regards, Rishabh | |
Rishabh, It was used in an earlier test where MMU was enabled only once at bootup and never disabled. I used it to try and sync ddr with cache and it seemed to work the first time I used it since other cores saw the new data. The second time though I didn't see the output properly update. There aren't many examples that use the cache APIs and with the initial errors we were getting it seemed more straightforward to implement what the mmu_a15_data_validation_app_main example does. Please let me know if there is additional information on the specific starterware cache APIs other than cache_a15.c and cache_a15.h. Thank you. | |
Rishabh, Thanks for listing the path to the example. I took a quick look and the example looks identical to what we're doing now. I'll start migrating to the new PDK system but it seems that the problem we have would also come up in PDK? Thank you. | |
Yes the issue you are facing will come up with PDK as well. For additional information on A15 cache you should refer to ARM documentation. Regards, Rishabh | |