timonel-py alpha! #28
Replies: 7 comments 10 replies
-
Cool, great start! I'll take a look at it these days. Although Python may not be my greatest strength at the moment, I will also try to collaborate where I can. I really liked your presentation page. |
Beta Was this translation helpful? Give feedback.
-
I've been looking at the format of packets sent by the controller to the Timonel device, and I have some questions. Some of these come from a reverse engineering of the esp32 host code, and some from the timonel bootloader itself Doc for READEEPR describes the TX content as having a checksum but the the timonel bootloader doesn't seem to check it. In fact four of the commands (READEEPR. WRITEEPR, WRITPAGE and STPGADDR) are documented as having TX checksums and I've having some trouble seeing exactly what they do. I what checksums are for in principle, by the way, not sure about some of the detail here. ANy chance you can clarify this? I can see that the bootloader can retry packet if the returned checksum in RX packet is unexepected - that's not the problem. Is there are reason why the WRITPAGE command has different endianness to READEEPR and WRITEEPR? WRITPAGE has LSB first, and the EEPR commands, also STPGADDR have the MSB of address first. On a casual reading of Reply_WRITPAGE() in the bootloader, I can see some logic to check ranges (line 532 in current master branch, starting #if CHECK_PAGE_IX). I'm a bit confused about this, and it's certainly possible I've missed something, but if there's a safety rollback at line 528, reply[1] is set to zero. Now, if reply[1] also being used as checksum return, how can a caller tell the difference between an error condition and value where the return checksum is legitimately zero? As I say, it's possible I'm misreading this, and it's the first time I've studied the way ATTINY bootloaders do self-programming, so may have missed something. Any clarifications would be appreciated. Thanks! |
Beta Was this translation helpful? Give feedback.
-
I'm afraid I have not so good news, with respect to ...
I'm starting to find the possible reasons for the inconsistency. Just by modifying the Reply_READFLSH function endianness, the bootloader size increases 10 bytes ... With the current byte order:
You get:
However, reversing the order:
you get:
The tests were done with the bootloader version of GitHub's v1.6 branch, compiled with avr-gcc version 8.3.0 for Windows. I think the changes need a bit more analysis before going forward. If you have a suggestion to achieve endianness consistency between functions, but without increasing the bootloader size, please let me know so I can test it. |
Beta Was this translation helpful? Give feedback.
-
OCD? I'll show you OCD! (smile) Consider a function that stores a value in an address (like POKE in days of BASIC). And let's imagine that, like Timonel, the address is stored as the second and third bytes of a data packet. Well, [1] and [2] anyway, because we're zero based. That's not identical to Timonel but it's close enough to look at the code that is generated. WARNING - this code not actually TESTED, I'm just looking at code generation! Reading AVR assembler gives me a headache, and obviously there's no point in highly optimised code that fails. So this is designed to make us think about code size, that's all. 3 versions of the function
/* endian.c */
#include <stdint.h>
void PokeMSBFirst(uint8_t *command, uint8_t val) {
uint16_t addr = (command[1] << 8) | (command[2]);
*((uint8_t*)addr) = val;
}
void PokeLSBFirst(uint8_t *command, uint8_t val) {
uint16_t addr = (command[2] << 8) | (command[1]);
*((uint8_t*)addr) = val;
}
void PokeNative(uint8_t *command, uint8_t val) {
uint16_t addr = *(uint16_t*)&command[1];
*((uint8_t*)addr) = val;
} Unravelling compilers is never an easy job, but if you compile with avr-gcc -Wa,-adhln -g -c endian.c > endian.lst then PokeNative is smaller. If we put in optimisations such as -O2, which lets face it is more realistic avr-gcc -O2 -Wa,-adhln -g -c endian.c > endian.lst Everything is smaller, but the PokeLSBFirst and PokeNative are smallest and generate identical code, which shows how clever compilers can be. I am using avr-gcc 5.4.0 and your stats may be different. Is this useful? I'd be nterested to see your thoughts. Furthermore on the subject of optimisation, if we assume the the code doesn't need to be made re-entrant/multi-threaded, then if you stop passing "command" as a stack parameter, but have it as a global buffer shared by all the apps. I am willing to bet that code gets smaller and faster again, because you can read command[1] and command[2] from static locations known at link time. If you've been brought up on modern architectures and programming styles there may be a revulsion to shared global memory, but sometimes programming a microcontroller needs skills that were last used in the 60s with FORTRAN programs. |
Beta Was this translation helpful? Give feedback.
-
I'm back, only on weekends until June 30, I'll be disappearing intermittently due to some deadlines I have :-) Thanks for the clarification on your avr-gcc configuration, those things happen. Anyway, for completeness's sake, I have tested your suggestions. My tests only consisted of doing the replacements you suggested, compiling, and verifying the size obtained (and that the function works of course). I have not gone down to analyze the results at assembler level because I honestly do not have the time to do it and also, somehow, I do not think it is my function to review what the compiler does. "I want to believe" that the GNU guys do things well and better than me. This stance, as well as the choice of C, have been decisions made quite early in this and other projects, because of accessibility and maintainability reasons, and possibly influenced a bit by this Aussie dude's remarks. This, of course, does not prevent me from being open to all suggestions that come as a ready-to-implement recipe to get smaller code, or other ideas (like the replacement of "+" by "|" that you suggested before), or better, if directly you implement it and open a PR. About the tests (I opened a temp "alt" branch for them): 1-I tried making "command" a global var to avoid passing it as a parameter, risking myself being thrown to the stake by anti-global-variable purists. But the compiler seems to have done a good job with the current code since the hex size obtained is exactly the same. 2-I also tried to make the "mem_pack" struct global just in case, again with bad results in terms of size: 2018 bytes instead of 1534 for the "test-comm" setup. 3-Moving 4-Regarding Now, going back to the original motivation for all this, perhaps it would better to change the WRITPAGE endianness if the code size does not increase, given that it is the only one different from the others. I'm going to do these tests today to see how it goes, then I'll let you know. Regarding doing the opposite, reversing the byte order of READEEPR, WRITEEPR, READFLSH, and STPGADDR, I have not managed to do it without increasing size on flash, if you find a way to do it, we can go that way, just let me know. |
Beta Was this translation helpful? Give feedback.
-
Absolutely! Thanks for looking at this! On 20 Jun 2021 15:00, Gustavo Casanova ***@***.***> wrote:
I'm back, only on weekends until June 30, I'll be disappearing intermittently due to some deadlines I have :-)
Thanks for the clarification on your avr-gcc configuration, those things happen. Anyway, for completeness's sake, I have tested your suggestions. My tests only consisted of doing the replacements you suggested, compiling, and verifying the size obtained (and that the function works of course). I have not gone down to analyze the results at assembler level because I honestly do not have the time to do it and also, somehow, I do not think it is my function to review what the compiler does. "I want to believe" that the GNU guys do things well and better than me. This stance, as well as the choice of C, have been decisions made quite early in this and other projects, because of accessibility and maintainability reasons, and possibly influenced a bit by this Aussie dude's remarks. This, of course, does not prevent me from being open to all suggestions that come as a ready-to-implement recipe to get smaller code, or other ideas (like the replacement of "+" by "|" that you suggested before), or better, if directly you implement it and open a PR.
About the tests (I opened a temp "alt" branch for them):
1-I tried making "command" a global var to avoid passing it as a parameter, risking myself being thrown to the stake by anti-global-variable purists. But the compiler seems to have done a good job with the current code since the hex size obtained is exactly the same.
2-I also tried to make the "mem_pack" struct global just in case, again with bad results in terms of size: 2018 bytes instead of 1534 for the "test-comm" setup.
3-Moving const __flash uint8_t *mem_position = (uint8_t*)(*(uint16_t*)&command[1]); before the reply[] allocation doesn't change a single byte on the iHex size either, unfortunately.
4-Regarding uint8_t reply[reply_len]; it is true what you say about it being a variable, therefore less efficient than a constant. In fact, I have tried with uint8_t reply_len = (SLV_PACKET_SIZE + 2) and "test-comm" compiles in 1522 bytes, 12 less than with the former. The point is that there is a variable there for a reason, if you take a look at it, its value comes from the I2C master command, so it allows it to control the packets' size returned by the bootloader. This is so to be able to somehow "fall back" to smaller sizes in case of noise and recurring errors. It's a feature that I wouldn't want to lose. Luckily, as you say, modern compilers allow it, otherwise, it would have to be done by hand with a dynamic arrangement, or similar.
Now, going back to the original motivation for all this, perhaps it would better to change the WRITPAGE endianness if the code size does not increase, given that it is the only one different from the others. I'm going to do these tests today to see how it goes, then I'll let you know.
Regarding doing the opposite, reversing the byte order of READEEPR, WRITEEPR, READFLSH, and STPGADDR, I have not managed to do it without increasing size on flash, if you find a way to do it, we can go that way, just let me know.
—You are receiving this because you authored the thread.Reply to this email directly, view it on GitHub, or unsubscribe.
|
Beta Was this translation helpful? Give feedback.
-
I spent a big chunk of this weekend doing several cross tests, and I am finally concluding on the bootloader functions' transmission endianness. And the winner is (drum roll) ... Little-endian everywhere! After creating two Timonel versions on the "Alt" branch, a BE one, and a LE one, and compiling all the configurations of both with So, if you agree with this, I would close this point and move on. I'm going to change the v1.6 bootloader code before its release, along with a new version of the I2C master TimonelTwiM library, to be compatible with this change. If you want to, you could also change your documentation to reflect this. These are the iHex sizes obtained with the BE Timonel version for all configs:
And with the LE version:
|
Beta Was this translation helpful? Give feedback.
-
https://github.com/prandeamus/timonel-py
I've created a proof of concept to show how a PC can be used to upload data to Timonel! Incomplete (very much so), but has some functionality and I can connect t a device, read status and device capabilities, read flash and read and write EEPROM.
This is very much pre-alpha, so just about anything may change, but there's an app that demonstrates what works.
Beta Was this translation helpful? Give feedback.
All reactions