COBS revisited

Today I pick an algorithm I like because of its usefulness and genious simplicity and make a version which uses C++ .
For source code along with some test see here [4].

consistent overhead byte stuffing

The idea is simple:
Take a given sequence of bytes (stream) and replace all elements which are zero. Then one can use zero as a delimiter to separate two or more sequences of bytes of that kind.” . All credits go to the authors of this byte stuffing algorithm [1]. I also recommend to get used to the wikipedia article [2] and an implementation in C [3].
This procedure is useful for example when sending binary data frames over wire like with RS232 based communication devices [3]. But with what data is zero replaced ? COBS replaces zeros with the number of non zero bytes that follow plus one ( so called code byte), because in case of no nonzero bytes that follow zero has to be excluded. Example:

 
input:     0,10,20,30,0,40,50,60,0,0,70,80,90,100
output:  1,3,10,20,30,3,40,50,60,1,4,70,80,90,100

It turns out that the output sequence is by one element larger (front byte) than the input sequence, as long as no block larger than 254 bytes without any zero bytes occurs. In that case COBS usese 0xFF as a special code byte to express that 254 bytes follow without a trailing zero byte.

encode

Because we are using C++ it feels quite natural to operate on containers. We can define a sequence of bytes as a std::vector of uint8_t. This reflects that the practical use case of COBS is to modify a continuous stream of bytes (8 Bit integer). Yet it is possible to make a version which operates on other containers too. The exact size of the output sequence is not known beforehand, but it dependends on the nature of input data. For that reason I return a new sequence by value here.
In order to eliminate zero bytes we first have to to std::find their positions. Then std::distance is nearly the code byte, we only had to add one. If no zero byte within a block of 254 elements is found, then the iterator is clipped and is not incremented.
A special case occurs if the last element of the input sequence is zero. In that case we have to manually push_back 1 to the end of the output sequence.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
typedef std::vector ByteSequence;
 
ByteSequence cobs_encode(const ByteSequence &input)
{
  ByteSequence output;
  auto next_zero_byte = input.begin();
  auto previous_zero_byte = input.begin();
 
  while(next_zero_byte != input.end() )
  {
 
    next_zero_byte = std::find(next_zero_byte,
                               input.end(),
                               uint8_t(0));
 
    auto dist = std::distance(previous_zero_byte,next_zero_byte);
 
    // clip to  max distance:
    dist = dist < 254 ? dist: 254;
 
    if(dist == 254) next_zero_byte = previous_zero_byte + 254; 
 
    // add code byte to output:
    output.push_back(dist+1);  
 
    //insert block of bytes between to code bytes , e.g two zeros:
    output.insert(output.end(), previous_zero_byte, next_zero_byte); 
 
    if(   dist != 254
          && next_zero_byte != input.end() )
    {
      // if we found a zero byte we move iterator to prepare for next std::find :
      std:: advance(next_zero_byte,1);//next_zero_byte++;  
    }
 
    previous_zero_byte = next_zero_byte;
  }
 
  // last element is zero , add 1 to output: 
  if(input[input.size()-1] == uint8_t(0)) output.push_back(uint8_t(1)); 
 
 
  return(output);
}

decode

In case of decoding we know the positions of the code bytes and while iterating over these positions we insert the bytes between two code bytes at the end of the output sequence followed by a zero byte. If a previous code byte is 0xFF we simply skip pushing a zero byte to the end of the output sequence.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
ByteSequence cobs_decode(const ByteSequence &input )
{
  ByteSequence output;
 
  auto next_code_byte = input.begin();
  auto previous_code_byte = input.begin();
 
  while(next_code_byte != input.end() )
  {
    std::advance(next_code_byte,*next_code_byte);
 
    output.insert(output.end(),previous_code_byte+1,next_code_byte);
 
    if(    *previous_code_byte != 0xFF
           && next_code_byte != input.end())
    {
      //restore zero byte only in case if code byte was not 0xFF :
      output.push_back(0); 
    }
 
    previous_code_byte = next_code_byte;
 
 
 
  }
 
  return(output);
}

references / further reading

[1] www.stuartcheshire.org/papers/COBSforToN.pdf
[2] https://en.wikipedia.org/wiki/Consistent_Overhead_Byte_Stuffing
[3] http://www.jacquesf.com/2011/03/consistent-overhead-byte-stuffing/
[4] cobs implementation in C++

Leave a Reply