I'm in the process of designing and building a small GardenSpaceProgram (GSP) mission to test out winter stuff like survival heaters, more on that in the next post.
During this, i finally realised that, as i have no missions active at the moment, this is the optimal time to take some lessons learned about RF24 communication and how i use it for the GSP and redesign the protocol and everything that works with it. Remember, i can only change "mission control" stuff directly, but not missions that have already launched. While i have demonstrated the ability to run remote firmware upgrades, i really hate the idea of having potential mission failures by changing protocols mid-mission and not being able to recover from a problem.
While the current protocol is very reliable, its data throughput is abysmal. This isn't a problem for things like sending a temperature measurement per minute, but for downloading images or uploading new programs it gives me a double whammy. Not only do i have to wait for hours, twiddling my thumbs, but downloading an image requires the Raspberry Pi to be on, gobbling up precious power (which in turn can lead to overheating of the probe during summer, ...). There are multiple parts to solving this problem.
Data rates, radio channels and Duplex modes
Currently, the network runs on the lowest data rate for increased reliability and range. I'd certainly like that to stay the same, at least for basic commands and low data rate science and engineering data. The basic reason behind that is that it has been proven to work in past missions and it's the safest option.
The three available data rates are 250 kilobits per second and 1 or 2 megabits per second. The last one is eight times faster than the first one. So an image that would take an hour to download could be downloaded in less than 8 minutes instead.
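To put numbers on that, here is a minimal sketch that scales a known transfer time from one data rate to another. The function name and the constants are made up for this post, and it assumes the transfer is limited purely by the raw on-air bit rate, so it ignores protocol overhead, retries and turnaround delays. Treat the result as a best case.

```c
#include <stdint.h>

/* Raw on-air bit rates of the three modes, in bits per second. */
#define RATE_250KBPS 250000UL
#define RATE_1MBPS   1000000UL
#define RATE_2MBPS   2000000UL

/* Scale a transfer time measured at old_rate to what it would be at
 * new_rate. ASSUMPTION: the transfer is limited only by the bit rate,
 * not by protocol overhead or waiting for replies. */
static uint32_t scaled_seconds(uint32_t seconds, uint32_t old_rate, uint32_t new_rate)
{
    /* 64-bit intermediate so seconds * old_rate cannot overflow. */
    return (uint32_t)(((uint64_t)seconds * old_rate) / new_rate);
}
```

Calling scaled_seconds(3600, RATE_250KBPS, RATE_2MBPS) gives 450 seconds, which is the 7.5 minutes behind the "less than 8 minutes" figure.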
But really, there is no reason not to use a second data channel for downloading huge data sets and uploading new firmware, as long as i add the option to fall back to the slow channel. This would most likely be implemented as a direct link without routing (but with the potential of a store-and-forward system).
This will probably be implemented by giving the Raspberry Pi its own RF24 transceiver and using the serial link to the Arduino for command and control and as a slow backup for data exchange.
As an added bonus, i could even run multiple high speed downlinks from multiple probes at the same time if i use different channels and have enough receivers at "mission control".
Currently, the whole network is half-duplex, meaning only one station can talk at any moment. This means that only one packet can travel the network at any given time. Most of the time, this is fine, but again, transmitting a huge data set also disturbs the communication in the rest of the network. And if it's a downlink using relay stations, it slows down a huge amount, because only one packet can travel the network at any given time and the sender has to wait before sending the next packet.
And exactly because of this problem, the current implementation for such a downlink is to send a request from mission control for a data packet, then wait for the answer before sending the next request, which slows things down a whole lot more. The new implementation will send the request for a big block of the data on the slow channel, then the probe will stream down the whole block on the high speed direct channel as fast as possible.
Given those improvements, the same one hour download i mentioned above as an example could now probably be done in two or three minutes, maybe even slightly faster. Suddenly this opens up the possibility to download short movies from the probe or even sort-of-realtime images (during summer months when we've got the power budget).
Current and new protocols
Let's look at the definition of a packet in the current gen firmware:
#define P_LINKSENDER 0
#define P_NEXTLINK1 1
#define P_NEXTLINK2 2
#define P_NEXTLINK3 3
#define P_NEXTLINK4 4
#define P_NEXTLINK5 5
#define P_REALSENDER 6
#define P_REALRECEIVER 7
#define P_COMMAND 8
#define P_MEMORYOFFSET 9
#define P_PAYLOADLENGTH 11
#define P_DATA 12
#define P_CHECKSUM1 28
#define P_CHECKSUM2 29
As you can see, the first 8 bytes are routing information. Then we have a byte to tell us the type of command, two bytes for "memory offset", one byte of payload length, 16 bytes of data and two checksum bytes.
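Since the offsets jump from 9 to 11, here is a small sketch of how the two "memory offset" bytes could be read out of a packet buffer, plus the data size derived from the offsets. The helper name and the OLD_ constants are mine, and the byte order (high byte first) is an assumption; the real firmware may well do it the other way round.

```c
#include <stdint.h>

#define P_MEMORYOFFSET  9    /* two bytes: offsets 9 and 10 */
#define P_PAYLOADLENGTH 11
#define P_DATA          12
#define P_CHECKSUM1     28
#define P_CHECKSUM2     29

#define OLD_PACKET_SIZE (P_CHECKSUM2 + 1)       /* 30 bytes in total */
#define OLD_DATA_SIZE   (P_CHECKSUM1 - P_DATA)  /* 16 bytes of data  */

/* Read the 16-bit memory offset from a packet buffer.
 * ASSUMPTION: high byte at offset 9, low byte at offset 10. */
static uint16_t get_memory_offset(const uint8_t *packet)
{
    return (uint16_t)(((uint16_t)packet[P_MEMORYOFFSET] << 8) |
                      packet[P_MEMORYOFFSET + 1]);
}
```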
The "memory offset" bytes are only used for a few commands, like reading and writing to EEPROM and are otherwise unused.
My goal here is to maximise the number of bytes available for payload data, but to still maintain the ability to route packets through a number of stations (maybe even an arbitrary number, instead of the 5 station limit we currently have).
Let's see, if every station holds a routing table, then i only need to identify the original sender and the final destination, right? Except no, that's not enough, because a station would forward the message, hear the next station repeat the packet and try to forward that copy to the next station. So we still need one byte to identify the current forwarding station of the packet, i.e. the thislink field. The reason i chose to use the sender instead of the receiver as the third identifier is simple: this would allow me to set up multiple routing paths.
So we can reduce the routing header to this:
#define P_THISLINK 0
#define P_REALSENDER 1
#define P_REALRECEIVER 2
The routing table would include these 3 bytes as an identifier for "we need to forward this packet" as well as a routing delay (or zero for no delay). This way, important packets could be forwarded twice or more via different paths - especially useful when i have to recover communications or as a backup for important non-repeatable measurements.
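A rough sketch of what such a routing table lookup might look like. None of this exists yet: the struct, the field names, the delay unit (milliseconds) and both function names are placeholders i made up for illustration. It also assumes a forwarding station writes its own ID into the thislink byte before retransmitting.

```c
#include <stddef.h>
#include <stdint.h>

#define P_THISLINK     0
#define P_REALSENDER   1
#define P_REALRECEIVER 2

/* One routing table entry (hypothetical layout). */
typedef struct {
    uint8_t  thislink;      /* station we heard the packet from    */
    uint8_t  realsender;    /* original sender                     */
    uint8_t  realreceiver;  /* final destination                   */
    uint16_t delay_ms;      /* routing delay, 0 = forward at once  */
} route_entry_t;

/* Return the matching entry, or NULL if this station should not
 * forward the packet at all. */
static const route_entry_t *route_lookup(const route_entry_t *table,
                                         size_t count,
                                         const uint8_t *packet)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].thislink == packet[P_THISLINK] &&
            table[i].realsender == packet[P_REALSENDER] &&
            table[i].realreceiver == packet[P_REALRECEIVER]) {
            return &table[i];
        }
    }
    return NULL;
}

/* ASSUMPTION: before retransmitting, the forwarding station marks
 * itself as the current link so the next hop can match on it. */
static void route_mark_forwarded(uint8_t *packet, uint8_t my_id)
{
    packet[P_THISLINK] = my_id;
}
```

Because the lookup matches on the station the packet was heard from, a station hearing the next hop repeat the packet finds no entry for that copy and stays quiet. Two stations can still both have an entry for the same incoming packet with different delays, which gives the multi-path forwarding.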
But there are still a few ways we can optimize. Let's assume that only commands that need a memory offset will include one, so we put that into the data payload itself. The same goes for the payload length, which may be fixed for some commands. As an added bonus, data packets that need more than two bytes for a memory offset are now easier to implement.
There is more: Currently, we use both a CRC16 checksum in hardware and two additional bytes for software checksums. I'm pretty sure we can do away with one of those methods. I haven't yet completely decided which one to keep, but the software checksums would be the better ones in my case, since i also use the same protocol over serial which also needs checksums. Still, turning off the hardware checksums should give us the option to move the software checksum up two bytes.
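For illustration, here is one way a two-byte software checksum could work for both radio and serial. I'm using Fletcher-16 purely as a stand-in because it naturally produces two bytes; it is not necessarily the algorithm in the current firmware, and the helper names are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16 over len bytes. Stand-in algorithm, see note above. */
static uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + data[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

/* Write the checksum into the last two bytes of the packet.
 * len is the full packet length including the two checksum bytes. */
static void checksum_write(uint8_t *packet, size_t len)
{
    uint16_t c = fletcher16(packet, len - 2);
    packet[len - 2] = (uint8_t)(c & 0xFF);  /* first checksum byte  */
    packet[len - 1] = (uint8_t)(c >> 8);    /* second checksum byte */
}

/* Return 1 if the last two bytes match the rest of the packet. */
static int checksum_ok(const uint8_t *packet, size_t len)
{
    uint16_t c = fletcher16(packet, len - 2);
    return packet[len - 2] == (c & 0xFF) && packet[len - 1] == (c >> 8);
}
```

The same two functions would work on a serial frame, since they only care about a byte buffer and its length.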
So, the final packet definition would look something like this:
#define P_THISLINK 0
#define P_REALSENDER 1
#define P_REALRECEIVER 2
#define P_COMMAND 3
#define P_DATA 4
#define P_CHECKSUM1 30
#define P_CHECKSUM2 31
This new design would increase the available payload size from 16 to 26 bytes. And it allows multi-path routing without the sender having to play an active role, which is very good in case of error recovery.
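As a sanity check on those numbers, a small sketch of how a packet could be filled in with the new layout. PACKET_SIZE, DATA_SIZE and packet_build are names i made up for this example, and the checksum bytes are simply left at zero for a separate step to fill in. It also assumes the original sender puts its own ID into both thislink and realsender.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define P_THISLINK     0
#define P_REALSENDER   1
#define P_REALRECEIVER 2
#define P_COMMAND      3
#define P_DATA         4
#define P_CHECKSUM1    30
#define P_CHECKSUM2    31

#define PACKET_SIZE (P_CHECKSUM2 + 1)       /* 32 bytes in total   */
#define DATA_SIZE   (P_CHECKSUM1 - P_DATA)  /* 26 bytes of payload */

/* Fill in header and payload of a fresh packet. Unused payload bytes
 * and the checksum bytes are zeroed. Returns 0 on success, -1 if the
 * payload does not fit. */
static int packet_build(uint8_t *packet, uint8_t self, uint8_t receiver,
                        uint8_t command, const uint8_t *data, size_t len)
{
    if (len > DATA_SIZE) {
        return -1;
    }
    memset(packet, 0, PACKET_SIZE);
    packet[P_THISLINK]     = self;  /* ASSUMPTION: sender is first link */
    packet[P_REALSENDER]   = self;
    packet[P_REALRECEIVER] = receiver;
    packet[P_COMMAND]      = command;
    if (len > 0) {
        memcpy(&packet[P_DATA], data, len);
    }
    return 0;
}
```

A command that needs a memory offset or a length would simply spend the first few of those 26 data bytes on it.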
One thing that just occurred to me: Station IDs are a byte, but i can reasonably assume that i'm not going to have more than 127 stations in the network at any one time. I could reserve the uppermost bit as a sort of "emergency" bit. If a station goes into safe mode (many future missions will have safe mode implemented), it could just set the uppermost bit of its station id. This would change the lookup in the routing tables of receiving stations. If i set up the routing tables correctly, this should turn on multi-path packet relay for that sender, as well as giving me here in mission control a safe mode warning.
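The bit fiddling for this is trivial. A quick sketch, with macro and function names that are placeholders of my own:

```c
#include <stdint.h>

#define STATION_EMERGENCY_BIT 0x80u  /* uppermost bit of the ID byte */
#define STATION_ID_MASK       0x7Fu  /* lower 7 bits: the actual ID  */

/* ID a station would send with while it is in safe mode. */
static uint8_t station_set_emergency(uint8_t id)
{
    return (uint8_t)(id | STATION_EMERGENCY_BIT);
}

/* Strip the emergency bit to get the plain station ID back. */
static uint8_t station_base_id(uint8_t id)
{
    return (uint8_t)(id & STATION_ID_MASK);
}

/* Non-zero if the ID carries the emergency bit. */
static int station_in_safe_mode(uint8_t id)
{
    return (id & STATION_EMERGENCY_BIT) != 0;
}
```

Since station 0x12 in safe mode shows up as 0x92, it matches a different set of routing table entries without any extra logic in the forwarding stations.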
Hmmm, much to think about and to implement and to test. I'll be back with more details, experience and the inevitable grumblings about the inevitable failure of some of those experiments in a future article...