COBS revisited


Today I pick an algorithm I like for its usefulness and ingenious simplicity, and implement it in C++.
The source code, along with some tests, can be found at [4].

consistent overhead byte stuffing

The idea is simple:
“Take a given sequence of bytes (stream) and replace all elements which are zero. Then one can use zero as a delimiter to separate two or more sequences of bytes of that kind.” All credit goes to the authors of this byte stuffing algorithm [1]. I also recommend reading the Wikipedia article [2] and looking at an implementation in C [3].
This procedure is useful, for example, when sending binary data frames over a wire, as with RS232-based communication devices [3]. But what replaces a zero? COBS replaces each zero with the number of non-zero bytes that follow, plus one (the so-called code byte). The "plus one" is needed because a zero may be followed directly by another zero, and a code byte of zero must never occur. An extra code byte at the front of the output does the same job for the bytes before the first zero. Example:

 
input:     0,10,20,30,0,40,50,60,0,0,70,80,90,100
output:  1,4,10,20,30,4,40,50,60,1,5,70,80,90,100

It turns out that the output sequence is one element longer (the leading code byte) than the input sequence, as long as no run of 254 or more non-zero bytes occurs. In that case COBS uses 0xFF as a special code byte to express that 254 non-zero bytes follow without a trailing zero byte.

encode

Because we are using C++ it feels natural to operate on containers. We can define a sequence of bytes as a std::vector of uint8_t. This reflects that the practical use case of COBS is to modify a contiguous stream of bytes (8-bit integers). It is also possible to write a version which operates on other containers. The exact size of the output sequence is not known beforehand; it depends on the input data. For that reason I return a new sequence by value here.
In order to eliminate zero bytes we first have to std::find their positions. The std::distance from the previous zero is then nearly the code byte; we only have to add one. If no zero byte is found within a block of 254 elements, the iterator is clipped to the end of that block and is not moved past a zero.
A special case occurs if the last element of the input sequence is zero. In that case we have to manually push_back 1 to the end of the output sequence.

#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

typedef std::vector<uint8_t> ByteSequence;
 
ByteSequence cobs_encode(const ByteSequence &input)
{
  ByteSequence output;
  auto next_zero_byte = input.begin();
  auto previous_zero_byte = input.begin();
 
  while(next_zero_byte != input.end() )
  {
 
    next_zero_byte = std::find(next_zero_byte,
                               input.end(),
                               uint8_t(0));
 
    auto dist = std::distance(previous_zero_byte,next_zero_byte);
 
    // clip to  max distance:
    dist = dist < 254 ? dist: 254;
 
    if(dist == 254) next_zero_byte = previous_zero_byte + 254; 
 
    // add code byte to output:
    output.push_back(dist+1);  
 
    // insert block of bytes between two code bytes, e.g. two zeros:
    output.insert(output.end(), previous_zero_byte, next_zero_byte); 
 
    if(   dist != 254
          && next_zero_byte != input.end() )
    {
      // if we found a zero byte we move iterator to prepare for next std::find :
      std:: advance(next_zero_byte,1);//next_zero_byte++;  
    }
 
    previous_zero_byte = next_zero_byte;
  }
 
  // last element is zero, add 1 to output (the empty() check avoids reading from an empty input):
  if(!input.empty() && input.back() == uint8_t(0)) output.push_back(uint8_t(1));
 
 
  return(output);
}

decode

When decoding, the first byte is a code byte, and each code byte tells us where the next one is. While iterating over these positions we append the bytes between two code bytes to the output sequence, followed by a zero byte. If the previous code byte is 0xFF, or if we have reached the end of the input, we skip pushing the zero byte.

ByteSequence cobs_decode(const ByteSequence &input )
{
  ByteSequence output;
 
  auto next_code_byte = input.begin();
  auto previous_code_byte = input.begin();
 
  while(next_code_byte != input.end() )
  {
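    // a code byte of 0, or one that points past the end, means the frame is malformed; stop decoding:
    if(   *next_code_byte == 0
       || *next_code_byte > std::distance(next_code_byte, input.end()) ) break;

    std::advance(next_code_byte,*next_code_byte);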
 
    output.insert(output.end(),previous_code_byte+1,next_code_byte);
 
    if(    *previous_code_byte != 0xFF
           && next_code_byte != input.end())
    {
      //restore zero byte only in case if code byte was not 0xFF :
      output.push_back(0); 
    }
 
    previous_code_byte = next_code_byte;
 
 
 
  }
 
  return(output);
}
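
To show how the zero delimiter is used once the data is encoded, here is a minimal, self-contained sketch of framing several messages into one stream and splitting them again. It does not reuse the functions above: encodeFrame, decodeFrame, packStream and unpackStream are names made up for this illustration, and the encoder is the common byte-by-byte variant, which emits one extra code byte after a full 254 byte block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<uint8_t> Bytes;

// Byte-by-byte COBS encoder (illustrative sketch, not the code from [4]).
Bytes encodeFrame(const Bytes& in)
{
    Bytes out(1, 0);           // placeholder for the first code byte
    std::size_t codePos = 0;   // index of the code byte currently being filled
    uint8_t code = 1;          // 1 + number of non-zero bytes in the current block

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] != 0) { out.push_back(in[i]); ++code; }
        if (in[i] == 0 || code == 0xFF)
        {
            out[codePos] = code;      // close the current block
            codePos = out.size();
            out.push_back(0);         // placeholder for the next code byte
            code = 1;
        }
    }
    out[codePos] = code;
    return out;
}

// Decodes one frame (without delimiter). Returns false if the frame is malformed.
bool decodeFrame(const Bytes& in, Bytes& out)
{
    out.clear();
    if (in.empty()) return false;
    std::size_t i = 0;
    while (i < in.size())
    {
        const uint8_t code = in[i];
        if (code == 0 || i + code > in.size()) return false;
        out.insert(out.end(), in.begin() + i + 1, in.begin() + i + code);
        i += code;
        if (code != 0xFF && i < in.size()) out.push_back(0);  // restore the zero
    }
    return true;
}

// Concatenates the encoded messages, each terminated by a zero delimiter.
Bytes packStream(const std::vector<Bytes>& messages)
{
    Bytes stream;
    for (std::size_t m = 0; m < messages.size(); ++m)
    {
        const Bytes frame = encodeFrame(messages[m]);
        assert(std::find(frame.begin(), frame.end(), uint8_t(0)) == frame.end());
        stream.insert(stream.end(), frame.begin(), frame.end());
        stream.push_back(0);
    }
    return stream;
}

// Splits the stream at the zero delimiters and decodes every frame.
std::vector<Bytes> unpackStream(const Bytes& stream)
{
    std::vector<Bytes> messages;
    Bytes::const_iterator first = stream.begin();
    while (first != stream.end())
    {
        Bytes::const_iterator last = std::find(first, stream.end(), uint8_t(0));
        Bytes decoded;
        if (decodeFrame(Bytes(first, last), decoded)) messages.push_back(decoded);
        if (last == stream.end()) break;
        first = last + 1;
    }
    return messages;
}
```

Because an encoded frame never contains a zero, a receiver that joins the stream in the middle of a frame only has to wait for the next zero to be in sync again.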

references / further reading

[1] www.stuartcheshire.org/papers/COBSforToN.pdf
[2] https://en.wikipedia.org/wiki/Consistent_Overhead_Byte_Stuffing
[3] http://www.jacquesf.com/2011/03/consistent-overhead-byte-stuffing/
[4] cobs implementation in C++

easterStats with C++


A new year has begun, new plans for the next holidays were made... I take a look at the calendar and funny questions come to mind:

“When a child is born on Easter Sunday 2017, how long does the child have to wait until it can celebrate its birthday on Easter Sunday again?”

After digging into this I realized that I could explore some of the STL algorithms and do some basic statistical analysis along the way.

when is easter ?

The definition is: “Easter falls on the first Sunday after the first full moon following the vernal (spring) equinox.” (The equinox is a day in the year on which daytime and night-time are of equal length. This happens twice a year, once in spring and once in autumn.)

It turns out that the earliest possible date is March 22nd and the latest possible date is April 25th. Although this definition is quite clear, one cannot put it directly into code.

In 1800 C. F. Gauss [1] developed a formula that returns the Easter (Sun)day for a given year.
In the following I will use a modified version of the Easter formula which was introduced by Lichtenberg [2]. You can find the source code at [3].

For example:

int eDay = getEasterDay(2017);

returns 47, which means the “47th of March”, which is in fact the 47 – 31 = 16th of April.
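
For readers who do not want to open the repository, here is a sketch of how such a function could look. It follows the supplemented Gauss formula as it is commonly stated after Lichtenberg; the name easterDayOfMarch and the helper toMonthDay are my own choices for this illustration, and the implementation in [3] may differ in detail.

```cpp
#include <cassert>

// Easter Sunday as "day of March" (32 means April 1st) for a Gregorian year.
// Sketch of the supplemented Gauss formula; all divisions are integer divisions.
int easterDayOfMarch(int year)
{
    assert(year >= 1582);                                    // range used in this post
    const int k  = year / 100;                               // secular number
    const int m  = 15 + (3 * k + 3) / 4 - (8 * k + 13) / 25; // secular moon shift
    const int s  = 2 - (3 * k + 3) / 4;                      // secular sun shift
    const int a  = year % 19;                                // moon parameter
    const int d  = (19 * a + m) % 30;                        // seed for first spring full moon
    const int r  = (d + a / 11) / 29;                        // calendar correction
    const int og = 21 + d - r;                               // Easter limit (full moon date)
    const int sz = 7 - (year + year / 4 + s) % 7;            // first Sunday in March
    const int oe = 7 - (og - sz) % 7;                        // days from limit to Sunday
    return og + oe;
}

// Converts the "day of March" into a calendar month (3 or 4) and day.
void toMonthDay(int dayOfMarch, int& month, int& day)
{
    if (dayOfMarch > 31) { month = 4; day = dayOfMarch - 31; }
    else                 { month = 3; day = dayOfMarch; }
}
```

For 2017 the intermediate values are og = 42 and oe = 5, which gives the 47 from the example above.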

easter data

For each possible Easter day we calculate the list of years in which Easter falls on that day, and we store this distribution in a std::map. The range of years runs from 1582 to 2582.

 
std::map<int,std::vector<int> > eDayDistribution;
 
for(int e = 22 ; e < 57; ++e)
{
  eDayDistribution[e] = getListOfYears(e,1582,2582);  
}

The function getListOfYears loops over the range of years and compares against the given easter day like this:

std::vector<int> getListOfYears(int easter_day,int start_year , int end_year )
{
 std::vector<int> listOfYears;
 for(int year = start_year ; year < end_year; ++year)
 {
	if( easter_day == getEasterDay(year))
	{
		listOfYears.push_back(year);
	}
 }
 
return (listOfYears);
}

easter day repeat interval

This allows us to answer the question from the introduction like this:

std::vector<int> list_16th_of_April = eDayDistribution[47];
auto pos_of_2017 = std::find(list_16th_of_April.begin(),
                             list_16th_of_April.end(),
                             2017);
int result =  (*(pos_of_2017+1) - *pos_of_2017);

In this case it takes 11 years from 2017 until Easter Sunday falls on the 16th of April again.

From this data we can also plot the absolute frequency of each Easter day, simply by plotting the size of the vector ( eDayDistribution[e].size() ) against e.

std::adjacent_difference

The Easter day repeat interval in the previous example is the difference of adjacent elements. To compute it for the entire vector, the STL provides an algorithm called std::adjacent_difference (declared in the header numeric).
Because we let the algorithm overwrite the elements of the vector, we first make a copy of the map using the assignment operator and then apply the algorithm to each vector in the copy:

 
std::map<int, std::vector<int> > eDayDiffDistribution = eDayDistribution;
 
for(size_t e = 22 ; e < 57; ++e)
{
 std::adjacent_difference(eDayDiffDistribution[e].begin(),
			  eDayDiffDistribution[e].end(), 
		          eDayDiffDistribution[e].begin());
}

The first element of the resulting vector is the unchanged first value, because n elements yield only n-1 differences (here eDayDistribution[e].size()-1). As a side effect we can reconstruct the original vector by summing up the differences if needed.
For April 16th it looks like this:
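
A small excerpt can be reproduced with the three consecutive April 16th Easter years 2006, 2017 and 2028. The sketch below is self-contained and writes the differences into a separate vector instead of overwriting a copy; repeatIntervals, restoreYears and demoRepeatIntervals are names chosen for this illustration only.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// First element stays the first year, the rest are the gaps between neighbours.
std::vector<int> repeatIntervals(const std::vector<int>& years)
{
    std::vector<int> diffs(years.size());
    std::adjacent_difference(years.begin(), years.end(), diffs.begin());
    return diffs;
}

// std::partial_sum is the inverse operation: it adds the gaps up again.
std::vector<int> restoreYears(const std::vector<int>& diffs)
{
    std::vector<int> years(diffs.size());
    std::partial_sum(diffs.begin(), diffs.end(), years.begin());
    return years;
}

void demoRepeatIntervals()
{
    std::vector<int> years;
    years.push_back(2006);
    years.push_back(2017);
    years.push_back(2028);

    const std::vector<int> diffs = repeatIntervals(years);  // 2006, 11, 11
    assert(diffs[0] == 2006 && diffs[1] == 11 && diffs[2] == 11);
    assert(restoreYears(diffs) == years);
}
```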

statistical data

min, max

So what are the minimum and maximum number of years we have to wait? Let us use the STL algorithms std::min_element and std::max_element.
Note that we must skip the first element of the vector, because it does not contain a difference. These algorithms return an iterator pointing to the minimum or maximum, and by dereferencing the iterator we get the value.

auto min = std::min_element(eDayDiffDistribution[e].begin()+1,
                            eDayDiffDistribution[e].end());
 
auto max = std::max_element(eDayDiffDistribution[e].begin()+1,
                            eDayDiffDistribution[e].end());

mean
How long do we have to wait on average? The work of looping over the elements and calculating the sum is taken over by std::accumulate.

int numberOfData = eDayDiffDistribution[e].size()-1;
auto sum = std::accumulate(eDayDiffDistribution[e].begin()+1,
                           eDayDiffDistribution[e].end(),
                           0.0);
double mean= sum/ numberOfData;

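The third argument to accumulate is the start value of the sum. Its type is also the type in which the sum is accumulated, so passing 0.0 (instead of 0) makes the result a double.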

standard deviation
std::accumulate uses a default computation kernel: a = a + b, with 0.0 as the initial value of the sum. An overload of std::accumulate takes a binary operation function object that is applied instead. With the help of a lambda we can rewrite the sum like this (the first lambda parameter is the running sum and must be a double, otherwise it is truncated in every step):

auto sum =  std::accumulate(eDayDiffDistribution[e].begin()+1,
                            eDayDiffDistribution[e].end(),
                            0.0,
                            [](double a, int b) 
                              { 
                               return (a+b);
                              }
                            );

The standard deviation of a distribution [4] is based on the sum of the squared differences between each data value and the mean value. So we only have to rewrite the lambda function to compute the standard deviation. Note that we capture the previously calculated mean value.

double  sum_diff_squared = std::accumulate(eDayDiffDistribution[e].begin()+1,
                                           eDayDiffDistribution[e].end(),
                                           0.0,
                                           [mean](double a, int b)
                                             {
                                               return (a + pow((b-mean),2.0) );
                                             }
					    );
double std_dev =  sqrt( sum_diff_squared/(numberOfData -1));

The same applies to other statistical moments based on the mean value, like skewness:

// define a lambda function for "skewness":
auto diffCubicOp = [&mean](double a,int b)
                  {
                   return( a + pow((b-mean),3.0) );
                  };	
 
double sum_diff_cubic = std::accumulate(eDayDiffDistribution[e].begin()+1,eDayDiffDistribution[e].end(),0.0,diffCubicOp);
 
double skewness =sum_diff_cubic/((numberOfData-1)*pow(std_dev,3));
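
To try the three accumulate calls outside of the Easter data, here is a self-contained sketch that applies the same formulas to an arbitrary vector of intervals. The struct Stats and the function describe are hypothetical names for this example; the sample values used to check it are made up.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

struct Stats
{
    double mean;
    double stdDev;    // sample standard deviation, divides by (n - 1)
    double skewness;  // same normalisation as in the snippet above
};

Stats describe(const std::vector<int>& data)
{
    assert(data.size() >= 2);   // (n - 1) must not be zero
    const double n = static_cast<double>(data.size());

    const double mean = std::accumulate(data.begin(), data.end(), 0.0) / n;

    // the accumulator is a double, only the data values are ints
    const double sumSquared = std::accumulate(data.begin(), data.end(), 0.0,
        [mean](double acc, int x) { return acc + std::pow(x - mean, 2.0); });
    const double stdDev = std::sqrt(sumSquared / (n - 1.0));

    const double sumCubic = std::accumulate(data.begin(), data.end(), 0.0,
        [mean](double acc, int x) { return acc + std::pow(x - mean, 3.0); });
    // note: if all values are equal, stdDev is 0 and the skewness is not a number
    const double skewness = sumCubic / ((n - 1.0) * std::pow(stdDev, 3.0));

    Stats result = { mean, stdDev, skewness };
    return result;
}
```

For the sample 5, 6, 11, 11, 17 the mean is 10 and the squared differences add up to 92, so the variance is 92 / 4 = 23.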

median
Sometimes one prefers the median over the mean value of a distribution for robustness reasons: outliers, for example, influence it far less. To calculate the median we have to sort the list of data and then take the value in the middle of the sorted list. If the list contains an even number of data points, we take the average of the two values left and right of the “middle”. Knowing this, it sounds straightforward to use std::sort. But it turns out that it is not necessary to sort the entire container. It suffices if the elements left of the middle element are not greater, and the elements right of it are not less, than the middle element. For that purpose we can use std::nth_element, which does exactly this partial sort. We only have to specify the nth element as our middle element. If the size of the container is odd, we need only one call to std::nth_element. Otherwise we run it a second time for the element directly before the middle one and average the two values.

// T is the container type for the working copy, e.g. median<std::vector<int> >(v.begin(), v.end());
// the range must not be empty
template<class T,class Iter>
double median(const Iter begin, const Iter  end)
{
 T tmp(begin,end);
 std::nth_element(tmp.begin(),
                  tmp.begin()+tmp.size()/2,
                  tmp.end());
 
 double ret = tmp[tmp.size()/2];
 if(tmp.size() % 2 == 0 ) // second pass and average
  {
   // for an even size the two middle elements are at size/2-1 and size/2
   std::nth_element(tmp.begin(),
                    (tmp.begin()+tmp.size()/2)-1,
                    tmp.end());
 
   ret =( ret + tmp[ (tmp.size()/2)-1 ] )/2.0;
  }
  return ret;
 
 }
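
The second pass can also be avoided: after the first std::nth_element call every element left of the middle is already less than or equal to it, so the lower middle value is simply the maximum of the left part. The following self-contained variant sketches that idea; medianOf is a name chosen for this example and it swaps the second std::nth_element for std::max_element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Takes the data by value, because std::nth_element reorders the elements.
double medianOf(std::vector<int> values)
{
    assert(!values.empty());
    const std::size_t mid = values.size() / 2;

    std::nth_element(values.begin(), values.begin() + mid, values.end());
    double result = values[mid];

    if (values.size() % 2 == 0)
    {
        // everything in [begin, begin + mid) is <= values[mid],
        // so the largest of them is the other middle element
        const int lower = *std::max_element(values.begin(), values.begin() + mid);
        result = (result + lower) / 2.0;
    }
    return result;
}
```

For the values 4, 1, 3, 2 this yields (2 + 3) / 2 = 2.5.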

summary

references / further reading

[1] https://en.wikipedia.org/wiki/Easter
[2] H.Lichtenberg: “Zur Interpretation der Gauß’schen Osterformel und ihrer Ausnahmeregeln”, Historia Mathematica 24 (1997), S.441-44
[3] https://github.com/vlovo/easterStats
[4] https://en.wikipedia.org/wiki/Standard_deviation

ProtoMQTT::four()


glue it all together

In the last post I already introduced the class RobotCtrl. It holds the protocol buffer data structure defined in [1] as a class member, as well as the MQTT client handle.
As a reaction to an incoming message the robot can move to a given position.

class RobotCtrl
{
public:
 
 RobotCtrl(std::string robotName);
 ~RobotCtrl();
 
 void setClient(MQTTClient &client);
 
 int move(std::vector<double> position);
 
 int publishMessage();
 
 static int onMessageArrived(void* context, 
                             char* topic,
                             int tlen,
                             MQTTClient_message *msg);
 
 
private:
 RobotMsg mRobotMessage;
 MQTTClient mMQTTClient;
 std::vector<double> mPosition;
};

publish me

To push a message to the connected broker we have to serialize the proto message to a byte sequence, which is then taken over by the MQTTClient_message.
Protocol buffers offer a bunch of methods for this task, e.g. SerializeToArray, or, if you prefer to convert to std::string first, SerializeAsString(). Both are shown below. Beforehand we set the current timestamp as described in the previous post [2].

 
int RobotCtrl::publishMessage( )
{   
 int rc = 0;
 
 ptime t0 = microsec_clock::local_time();
 std::string  timeString  = to_simple_string(t0);
 
 mRobotMessage.set_timestamp(timeString);
 
 MQTTClient_message pubmsg = MQTTClient_message_initializer;         
 
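 // serialize into a buffer we own (the payload pointer of a fresh message is NULL)
 std::vector<char> buffer(mRobotMessage.ByteSize());
 mRobotMessage.SerializeToArray(buffer.data(),(int)buffer.size());
 pubmsg.payload = buffer.data();
 pubmsg.payloadlen = (int)buffer.size();
 
// shown as example to serialize to std::string;
// the string must stay alive until MQTTClient_publish has been called
 std::string serialized = mRobotMessage.SerializeAsString();
 pubmsg.payload = (void*)serialized.data(); 
 pubmsg.payloadlen =(int)serialized.size();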
 
 
if(nullptr != mMQTTClient)
{
 MQTTClient_deliveryToken dt =0; 
 std::string topic = "Robo/data";
 rc = MQTTClient_publish(mMQTTClient,
                         topic.c_str(),
                         pubmsg.payloadlen,
                         pubmsg.payload,
                         pubmsg.qos,
                         pubmsg.retained,
                         &dt);
 
}
else
{
    rc = -1;
}
 
return (rc);
}

In this case the MQTT message is sent with QoS (quality of service level) zero, which leads to fire-and-forget behaviour: the message won’t be acknowledged by the receiver, nor stored and redelivered by the sender. A good summary of QoS can be found at [3].

someone´s calling

The class RobotCtrl provides a static method onMessageArrived, which has to be registered for MQTT subscribe notifications, see [4].
By casting the void* context parameter to a RobotCtrl pointer we can effectively use the methods of the actual object.
The arrived MQTTClient_message can be parsed directly into a RobotMsg object by using the ParseFromArray member function.

 
static int onMessageArrived(void* context, char* topic , int tlen, MQTTClient_message *msg)
{
 RobotCtrl *parent = (RobotCtrl*) context;
 
 std::string _topic(topic);
 RobotMsg _msg;
 
 _msg.ParseFromArray(msg->payload,msg->payloadlen);
 
 std::vector<double> p(_msg.position_size(),0);
 
 memcpy(p.data(),
       _msg.mutable_position()->mutable_data(),
       _msg.position_size()*sizeof(double));
 
 
// setter with index and value 
//_msg.set_position(0,double(0.0));
 
// dynamically add element 
//  _msg.add_position(double(0.0));
 
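 parent->move(p);
 
 // when we report the message as handled, paho expects us to free message and topic
 MQTTClient_freeMessage(&msg);
 MQTTClient_free(topic);
 
 return(1);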
}

In this simple demo an arrived message is interpreted as a command to move to the new position in the included message.

accessing repeated fields of protocol buffer

The protocol buffer API provides direct access to the memory of a repeated field (dynamic array), so it is possible to copy all elements to a std::vector. This is done by calling mutable_data() on the object returned by mutable_position(), as in the memcpy call above. In a use case where index-based access is preferred one can use set_position(), and/or add_position() to add elements at the end of the buffer.
The commanded position is then processed by a call to the object's move function.

let’s loop it

In our main function the only thing left is to subscribe to a topic and then start a little loop to keep the process running, in which we send out the current message every 500 ms.

 
int ec = MQTTClient_subscribe(mqttClient,"Robo/Input",0);
 
for(;;)
{
  Sleep(500);
  robotControler.publishMessage();
}

With this post the small blog series about protocol buffers in combination with mqtt ends. Full source code can be found at [5].

references / further reading

[1] http://techblog.boptics.de/protomqttone/
[2] http://techblog.boptics.de/protomqttthree/
[3] http://www.hivemq.com/blog/mqtt-essentials-part-6-mqtt-quality-of-service-levels
[4] http://techblog.boptics.de/protomqttthree/
[5] https://github.com/vlovo/ProtoMQTT/

ProtoMQTT::three()


what´s the time

In the last two posts everything was set up to use protocol buffers and an example message was defined [1],[2]. The robot message contains a data field timestamp, which is useful for checking the order of messages (if they come from the same source) or for implementing some kind of archival storage. By design this data field is defined as a string, because first it should be easy for humans to read and interpret, and second it should follow some kind of standard. We follow ISO 8601 and use boost::posix_time from the Boost.Date_Time library [3]. The following code demonstrates how it is used.

#include "boost/date_time/posix_time/posix_time.hpp" 
 
using namespace boost::posix_time;
 
void demoPosixTime()
{   
 
  ptime t0 = microsec_clock::local_time();
 
  std::string  timeString  = to_simple_string(t0);
 
  std::string  timeIsoString = to_iso_string(t0);
 
  ptime t1 = time_from_string(timeString);
 
  ptime t2 = from_iso_string(timeIsoString);
 
  if( t0 == t1 && t1 == t2 )   std::cout << "t0 ,t1 and t2 are equal" << "\n";
 
  ptime t3 = from_iso_string("20161120T170143.558219");  
 
  if(t3 > t1 )
  {
	std::cout << " t3 is a new message \n";
  }
  else
  {
        std::cout << "t3 is a message from the past \n";
  }
  return ;
}

Just include posix_time.hpp and use the boost::posix_time namespace for less typing. You start by constructing a ptime object from a clock. There are two clocks available, microsec_clock and second_clock, which differ in time resolution. Note: if you have strict requirements for your time resolution, please test on your system first!
From each clock we can get the current time, for example local_time() as in the example. A second option is universal_time() for getting UTC [4]. There are two free functions available which convert the ptime object into a string representation. to_simple_string gives a nicely readable notation of the time, whereas to_iso_string encodes to the more compact (5 bytes shorter than to_simple_string) ISO 8601 string.
The reverse operation can be achieved by calling from_iso_string for the ISO string, or time_from_string for the simple string. Unfortunately the names in the API are not symmetric in this case. Do not mix the operations! Calling from_iso_string with a simple string will lead to an error.

Now, the big advantage of obtaining a ptime object from, let's say, a string is that you can compare two time points, because the comparison operators are well defined for ptime objects.

setting up MQTT

First we have to create an MQTTClient handle with the library function MQTTClient_create. We need to specify a serverURL to which we want to connect later. For the moment this server, also called a broker, is a publicly accessible one at iot.eclipse.org with default port 1883. Note: there is no user / password authentication and there is no security/encryption like TLS, so be careful with your secret robot messages. To identify the client we pass a clientID into that function too. It must be a UTF-8 encoded string of no more than 23 characters. For example, we could use the MAC address of one of our network adapters here.

#include  "MQTTClient.h" 
 
RobotCtrl robotControler("Kraftwerk"); 
 
MQTTClient mqttClient;
 
std::string  clientID=  "RoboClient"; // do not use colons inside
 
int rc = MQTTClient_create(&mqttClient,
                          "tcp://iot.eclipse.org:1883", 
                           clientID.c_str(),
                           MQTTCLIENT_PERSISTENCE_DEFAULT, 
                           NULL);
 
rc = MQTTClient_setCallbacks(mqttClient,
                            &robotControler,
                            0,
                            RobotCtrl::onMessageArrived,
                            0);

! Note: it turns out that if any colon appears in the client ID, the connection to the broker will fail.

In case we want to receive a message (subscribe), the paho lib can invoke a user-defined callback. The callback has to be set BEFORE the connection to the broker takes place.
Unfortunately MQTTClient_setCallbacks accepts only pointers to plain functions, so we cannot pass a pointer to a member function of our object robotControler, which is responsible for controlling the robot, using the protocol buffer message and publishing to the broker.
So only global or static functions are possible for the callback. The callback function can therefore be implemented as a static function of our RobotCtrl class like this:

 
// note: "static" appears only on the declaration inside the class, not on this definition
int RobotCtrl::onMessageArrived(void* context, char* topic , int tlen, MQTTClient_message *msg)
{
 RobotCtrl *caller = (RobotCtrl*) context;
 std::string _topic(topic);
 if( "right_topic" == _topic)
 {
    // process message
 }
 return(1);
}

By casting the context pointer to our RobotCtrl, we can actually use the registered object to invoke its methods here. I will come back to that function later.
! Note: if you return 0 from this function, the callback is invoked again with the same message. So a value not equal to zero indicates successful message processing.
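
The pattern behind this (a static member function that receives the object as a void* context and forwards to a normal member function) does not depend on MQTT. The following self-contained sketch shows it with a made-up C-style client: FakeClient, fakeSetCallback, fakeDeliver and the class Robot are hypothetical stand-ins for this illustration and are not part of the paho API.

```cpp
#include <cassert>
#include <string>

// Hypothetical C-style callback type: only plain function pointers fit here.
typedef int (*MessageCallback)(void* context, const char* topic);

// Hypothetical stand-in for a C library client handle.
struct FakeClient
{
    void*           context;
    MessageCallback callback;
};

void fakeSetCallback(FakeClient& client, void* context, MessageCallback cb)
{
    client.context  = context;
    client.callback = cb;
}

// The "library" calls back with the context pointer it was given.
int fakeDeliver(FakeClient& client, const char* topic)
{
    assert(client.callback != nullptr);
    return client.callback(client.context, topic);
}

class Robot
{
public:
    Robot() : mHandled(0) {}

    // static: has no this pointer, so it converts to a plain function pointer
    static int onMessage(void* context, const char* topic)
    {
        assert(context != nullptr);
        Robot* self = static_cast<Robot*>(context);
        return self->handle(topic);       // forward to the real object
    }

    int handled() const { return mHandled; }

private:
    int handle(const std::string& topic)
    {
        if (topic == "Robo/Input") ++mHandled;
        return 1;                         // non-zero: message processed
    }

    int mHandled;
};
```

Registering works as with the real client: the address of the object goes in as context and &Robot::onMessage as the callback.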

connect me

For the connection we need to populate the connectOptions struct. For convenience the lib provides a default initializer here.
If the connection is terminated unexpectedly, the broker publishes the client's last will, which is defined in MQTTClient_willOptions. So any listener who subscribed to the topic “Robo/disconnect” will receive the message “R2D2 disconnected” right away.

 
MQTTClient_connectOptions opts = MQTTClient_connectOptions_initializer;
MQTTClient_willOptions wopts = MQTTClient_willOptions_initializer;
 
opts.keepAliveInterval = 20; // client checks connection every 20s
opts.cleansession = 1;
opts.connectTimeout = 2;
opts.will = &wopts;   
opts.will->message = "R2D2 disconnected";
opts.will->qos = 1;
opts.will->retained = 0;
opts.will->topicName = "Robo/disconnect";
 
rc = MQTTClient_connect(mqttClient, &opts);

The connect function returns MQTTCLIENT_SUCCESS on success.

references / further reading

[1] http://techblog.boptics.de/protomqttone/
[2] http://techblog.boptics.de/protomqtttwo/
[3] http://www.boost.org/doc/libs/1_55_0/doc/html/date_time/posix_time.html
[4] https://en.wikipedia.org/wiki/Coordinated_Universal_Time

ProtoMQTT::two()


define what to say

The fundamental concept behind protocol buffers is to define a data structure containing all the information you would like to exchange. This data structure is described in a special format, in a so-called .proto file.
After that the protocol buffer compiler translates this .proto file into a C++ class which allows access to the data and, in addition, provides serializing/deserializing operations.
Because .proto files are general and descriptive, they can be translated to C#, Go, Java and Python out of the box, too.
That means once a common data structure or message is defined, different languages can talk to each other.
So let's jump into an example. Let's say we want to design a message a robot can send out to the world. Our .proto file RobotMsg.proto could look like this:

message RobotMsg{
 
 required int32 messageId = 1;
 required string deviceName = 2;
 required string  timestamp = 3;
 
 
 enum RobotStates{
	  Unkown =0;
          Error=1;
          Connected=2;
          Idle=3;
          Moving=4;
      	  };
 
  required RobotStates robotState = 4;
  repeated double position = 5;
  optional int32  digitalInputBitMask= 6;
 
}

We see that POD types like int32 or double are available, and composite types like enum, too. A dynamic array is defined by the keyword repeated. Each data member is labeled with a unique numbered tag.

forever or compatible

Sometimes it is necessary to update your proto message, for example by adding a data member. The keyword optional gives you the opportunity to keep things compatible with older versions. The required keyword, on the other hand, makes things stay forever. Data fields with the repeated keyword are optional by nature, because an array can have zero elements. In our case some robots have a digital input extension and some do not, so the bitmask is labeled optional.

I recommend taking a look at the official protocol buffer documentation [1] for further reading.

let’s generate

If we do not want to run the protocol buffer compiler by hand every time, we can make use of the cmake integration. The following lines in our CMakeLists.txt will do the job.

PROTOBUF_GENERATE_CPP(PROTO_SRCS PROTO_HDRS RobotMsg.proto)
include_directories(${CMAKE_CURRENT_BINARY_DIR})

That means our RobotMsg.proto file is fed to the compiler, which generates two files: RobotMsg.pb.h and RobotMsg.pb.cc. These files are accessible through the PROTO_SRCS and PROTO_HDRS variables.
The generated files are placed into our build tree directory. Therefore it is very important to add CMAKE_CURRENT_BINARY_DIR to our include directories. Of course we have to compile the generated source file and add it to our project; see the full CMakeLists.txt on github [2].

Hello Robot

Now we are ready to use our RobotMsg in our C++ code:

#include <iostream>
#include "RobotMsg.pb.h"
 
int main()
{
 
 RobotMsg msg;
 msg.set_devicename(std::string("R2D2"));
 msg.set_robotstate(RobotMsg_RobotStates_Connected);
 
 std::cout << "Hello Robot: size is "  << sizeof(msg) << "\n";
 std::cout << "Hello Robot: byte size is " << msg.ByteSize() << "\n";
 std::cout << "Hello Robot: " << msg.devicename()  <<  "\n";
 
 return(0);
}

Here we create the RobotMsg object on the stack and use setters/ getters for the data members we defined in the .proto file.

references / further reading

[1] https://developers.google.com/protocol-buffers/docs/proto

[2] https://github.com/vlovo/ProtoMQTT.git

ProtoMQTT::one()


In this series of blog posts I am going to demo the combination of mainly two libraries / technologies which I came across recently and which I find very useful. I am going to use google protocol buffers (protobuf) together with MQTT (paho mqtt lib). As the project evolves there will probably be other libraries to mention. My toolchain is as follows: CMake, Visual Studio 2010. The source code will be available on my github repo [1].

building protocol buffers

We start with building protocol buffers. After downloading from their github repo [2] we find a CMakeLists.txt in the subfolder cmake. So we can run cmake-gui.exe and set the source dir to the subfolder cmake.

Here are some hints:

- I recommend creating the build directory as a subfolder named vsprojects, because it turns out that the built-in cmake command find_package looks in exactly this directory by default.
- There is a default option to select static linking to the MS runtime libraries, which can cause trouble when linking to protocol buffers in your own projects. I prefer dynamic linking to the runtime.
- If cloned from github, the build test option does not work because the gmock framework is missing.

My cmake settings look like this.

Then simply build release and debug configurations.

consuming protocol buffers

Consuming protocol buffers in your project is super easy. In our CMakeLists.txt we do

SET(PROTOBUF_SRC_ROOT_FOLDER "X:/protobuf")  
find_package(Protobuf REQUIRED)
include_directories(${PROTOBUF_INCLUDE_DIRS})

That means after giving a hint to the root folder, find_package does the job and gives us the PROTOBUF_INCLUDE_DIRS and PROTOBUF_LIBRARIES variables. The latter will be used for the linker configuration later on.

building paho mqtt library

After downloading from the github repo [3] it turns out that they have cmake support to build the library. Unfortunately the cmake project generation is broken for Windows, as noted in pull request #141. In addition, the build is broken on older compilers like VS 2010 because declarations are mixed with statements, as noted in pull request #183. You can find the corrected CMakeLists.txt and source files in the subfolder patches/paho on my github repo.

There is the opportunity to build the paho mqtt lib with SSL support which requires openSSL, but for now I leave it out.

consuming paho mqtt c library

The library comes in two flavours: one supports synchronous and the other asynchronous operation. For now we want to consume both, and our CMakeLists.txt for that looks like this:

find_path(PAHO_MQTT_INCLUDE_DIR MQTTClient.h)
find_library(PAHO_MQTT_SYNC_LIBRARY NAMES paho-mqtt3c.lib)
find_library(PAHO_MQTT_ASYNC_LIBRARY NAMES paho-mqtt3a.lib)
SET(PAHO_MQTT_LIBRARIES ${PAHO_MQTT_SYNC_LIBRARY} ${PAHO_MQTT_ASYNC_LIBRARY})
include_directories(${PAHO_MQTT_INCLUDE_DIR})

summary

I showed how to configure and build protobuf and the paho lib for VS2010. I hope this will save someone some time when doing the same. At the end of the day, I can compile and link against the two libs in my own project with:

add_executable(ProtoMQTT main.cpp ${PROTO_SRCS})
target_link_libraries(ProtoMQTT ${PROTOBUF_LIBRARIES} ${PAHO_MQTT_LIBRARIES})

references / further reading
[1] https://github.com/vlovo/ProtoMQTT.git
[2] https://github.com/google/protobuf.git
[3] https://github.com/eclipse/paho.mqtt.c

C++ STL unicode encoding conversion with STL strings


In a recent issue of MSDN magazine I found a good article (https://msdn.microsoft.com/magazine/mt763237?MC=CCPLUS&MC=Windows) explaining how to convert between utf8 and utf16 encoded strings by using Win32 API functions. Besides a well-written overview it provides usable C++ code for download.

To write platform independent code, I sum up here how far we can get with C++11 and the standard library. Note: the code compiles and runs with C++0x capable compilers like VS 2010.

storage type

To store utf8 encoded strings we can use std::string. To store utf16 encoded strings we should use std::u16string, which is a basic_string with the underlying type char16_t. In contrast to std::wstring, whose character size depends on the platform, std::u16string is the same on all platforms.

utf8 -> utf16

#include <locale>
#include <codecvt>
#include <string>

 
typedef std::codecvt_utf8_utf16<char16_t>  conversionFacet;


std::u16string Utf8ToUtf16(const std::string& utf8)
{
 std::u16string utf16;

 std::wstring_convert<conversionFacet, char16_t> converter;

 utf16 = converter.from_bytes(utf8);

 return(utf16);

}

The workhorse here is the template class std::wstring_convert. Despite its name it can be used not only with std::wstring but, thanks to its template parameter, also with char16_t.

As its first template parameter it takes a conversion facet. In this case we use the template class std::codecvt_utf8_utf16.

The inverse conversion can be implemented in a similar way, just by calling the member function to_bytes.

utf16 -> utf8

std::string Utf16ToUtf8(const std::u16string& utf16)
{
 std::wstring_convert<conversionFacet, char16_t> converter;

 // to_bytes throws std::range_error if the input is not valid utf16.
 return converter.to_bytes(utf16);
}
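
A short round trip can serve as a sanity check for the two functions above. The following is a minimal sketch: the helper name roundTripsCleanly is made up for illustration, and the two converters are repeated so that the block compiles on its own. Note that std::wstring_convert and the <codecvt> facets are deprecated since C++17, so newer compilers may emit warnings.

```cpp
#include <codecvt>
#include <locale>
#include <string>

typedef std::codecvt_utf8_utf16<char16_t> conversionFacet;

std::u16string Utf8ToUtf16(const std::string& utf8)
{
    std::wstring_convert<conversionFacet, char16_t> converter;
    return converter.from_bytes(utf8);
}

std::string Utf16ToUtf8(const std::u16string& utf16)
{
    std::wstring_convert<conversionFacet, char16_t> converter;
    return converter.to_bytes(utf16);
}

// Hypothetical helper: true if utf8 -> utf16 -> utf8 reproduces the input.
bool roundTripsCleanly(const std::string& utf8)
{
    return Utf16ToUtf8(Utf8ToUtf16(utf8)) == utf8;
}
```

For the byte string "\xC3\xA4\xE2\x82\xAC\xF0\x9F\x98\x80" (the characters ä, € and U+1F600), Utf8ToUtf16 yields four char16_t code units, because the last character lies outside the basic multilingual plane and needs a surrogate pair.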

summary

With C++11 we can use std::string, std::u16string and std::u32string for better platform independent unicode support.

I showed a simple example of how to convert between utf8 and utf16 encodings. One can implement other conversions in a similar way, for example utf16 -> utf32.
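
As a sketch of such a further conversion, utf16 can be turned into utf32 by going through utf8 as an intermediate step. This assumes the standard facet std::codecvt_utf8<char32_t> for the second leg; the function name Utf16ToUtf32 is chosen here for illustration only.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Sketch: utf16 -> utf8 -> utf32 using two standard conversion facets.
std::u32string Utf16ToUtf32(const std::u16string& utf16)
{
    // First leg: utf16 code units to utf8 bytes.
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> toUtf8;
    // Second leg: utf8 bytes to one char32_t per code point.
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> toUtf32;

    return toUtf32.from_bytes(toUtf8.to_bytes(utf16));
}
```

The detour costs an extra temporary string, but it keeps the code within the same std::wstring_convert pattern used for the other two conversions.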

boost::format your string

Do you use sprintf in your code? Have you seen it even in code that is called “written in C++”? Well, ok, it is a way to format your numbers into a string, and perhaps it is liked most for its convenience and ease of use. There are some reasons not to use it, which you learn by experience or by researching the web.

Recently I started using boost::format as an alternative to sprintf.

Suppose you have to program some hardware device, for example a motion controller, by sending a dedicated ASCII character string over some wire interface to trigger a certain action, like

std::string  command = "PAX=10000;SPX=30000;AMX=100000;BGX";
int ret = device.sendCommand(command);

Here the character X usually selects one of several motion stages, and both it and any of the numbers can change during runtime. So how to do this?

sprintf

char command[128];  // must be large enough for the longest command
const char* axis = "X";
sprintf(command,"PA%s=%d;SP%s=%d;AM%s=%d;BG%s",
                axis,
                pos,
                axis,
                speed,
                axis,
                acc,
                axis);
int ret = device.sendCommand(std::string(command));

stringstream

std::stringstream stream;
std::string axis ="X";
stream << "PA" 
       << axis 
       << "="
       << pos
       << ";" 
       << "SP"
       << axis
       << "="
       << speed
       << ";"
       << "AM"
       << axis
       << "="
       << acc
       << ";"
       << "BG"
       << axis;
int ret = device.sendCommand(stream.str());

Using a stringstream object is definitely type safe C++, but it gets unwieldy and error prone very quickly, especially in this case. You get the point why many people go for sprintf.

boost::format

With boost::format it turns out that we can implement the example like this:

boost::format  cmdMove("PA%1%=%2%;SP%1%=%3%;AM%1%=%4%;BG%1%");
std::string command = (cmdMove % axis % pos % speed % acc).str();

int ret = device.sendCommand(command);

First we create a boost::format object, which we can reuse or keep as a member in our command class etc. The constructor takes a format string in which the variable parts are marked by so-called positional arguments (%1%, %2%, ...), which allows for reuse and reordering. When assigning the command string, the arguments are fed into the object with the % operator. The argument axis (%1%) is used four times in this case, which leads to less typing and cleaner code.

With boost::format you can also use POSIX printf style format specifications, but in a type safe way.

You can find more info at http://www.boost.org/doc/libs/1_58_0/libs/format/

string to number – the C++ way

The recent post “atof – root of some evil” showed that one has to be careful when using atof. Since it is a function from the standard C library, what does C++ really have to offer here? Well, of course it has no ready-made answer for code that is used in a multilanguage environment, but it gives you a good mechanism to write your own conversion.

Basic

 
#include  <sstream> 
float stringToFloat(const char *in)
{
  float val = 0.0f;
  std::istringstream inputStream(in);
  inputStream >> val;
  return(val);
};

The point is that you create an istringstream object from your buffer and use the extraction operator >> to get the float value.

Something failed ..?
You can use the fail member function of istringstream to check for failure.

 
#include <sstream>
#include <stdexcept>
float stringToFloat(const char *in)
{
  bool failed = false;
  float val = 0.0f;
  std::istringstream inputStream(in);
  failed = ( inputStream >> val).fail();
  // std::runtime_error is portable; std::exception has no string constructor in standard C++
  if(true == failed) throw std::runtime_error("extraction failed");
  return(val);
};
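
Note that fail() only reports that no number could be extracted at all; an input like "12kg" still yields 12. If that matters, one possible refinement is to check that nothing but whitespace is left in the stream. This is a sketch under that assumption; the name stringToFloatStrict is made up here, and std::runtime_error is used as a portable exception type.

```cpp
#include <sstream>
#include <stdexcept>

// Sketch: like stringToFloat, but rejects trailing garbage such as "12kg".
float stringToFloatStrict(const char *in)
{
  float val = 0.0f;
  std::istringstream inputStream(in);

  if ((inputStream >> val).fail())
    throw std::runtime_error("extraction failed");

  // Try to read one more non-whitespace character; success means garbage.
  char leftover = 0;
  if (inputStream >> leftover)
    throw std::runtime_error("unexpected characters after number");

  return val;
}
```

Leading and trailing whitespace is still accepted, because both extractions skip it by default.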

Better than atof ?

The big advantage now is that the istringstream object has its own locale, and nobody can change it from outside with the C library setlocale. Wow, perfect encapsulation! The stream takes a copy of the global C++ locale when it is constructed; unless someone called std::locale::global, this is the classic “C” locale with the point as decimal separator.
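
If the conversion must not depend on anything the host application does to the global C++ locale either, the stream can be pinned to the classic locale explicitly. A minimal sketch; the function name stringToFloatClassic is an assumption for illustration.

```cpp
#include <locale>
#include <sstream>
#include <stdexcept>

// Sketch: always parse with '.' as decimal separator.
float stringToFloatClassic(const char *in)
{
  float val = 0.0f;
  std::istringstream inputStream(in);
  // Replace whatever locale the stream inherited by the classic "C" locale.
  inputStream.imbue(std::locale::classic());
  if ((inputStream >> val).fail())
    throw std::runtime_error("extraction failed");
  return val;
}
```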

Locale invariant

If you like to make the conversion more “general”, you can check for a comma in the input string and change the locale of the istringstream like this:

#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// class for decimal numbers with comma
class UseCommaAsSeparator: public std::numpunct<char> 
{
  protected: char do_decimal_point() const { return ','; } 
};

float stringToFloat(const char *in)
{
  bool failed = false;
  float val = 0.0f;
  std::istringstream inputStream(in);

  if( inputStream.str().find(",") != std::string::npos) 
  {
    // the locale takes ownership of the facet and deletes it later
    std::locale loc = std::locale(std::locale(),new UseCommaAsSeparator);
    inputStream.imbue(loc);
  }
  failed = ( inputStream >> val).fail();
  if(true == failed) throw std::runtime_error("extraction failed");
  return(val);
};

By overriding the do_decimal_point() member of std::numpunct<char> and imbuing the stream with a locale that carries this facet, the conversion is now independent of the decimal separator in the input string.

Template programming ?

Are you looking for an application of C++ template programming? Here comes a good candidate, because you do not only want to convert to float, but also to double, to int etc.! So let's make a nice template function.

#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// class for decimal numbers with comma
class UseCommaAsSeparator: public std::numpunct<char> 
{
  protected: char do_decimal_point() const { return ','; } 
};

template <typename T >
T stringToNumber(const char *in)
{
  bool failed = false;
  T val = T();
  std::istringstream inputStream(in);
  if( inputStream.str().find(",") != std::string::npos) 
  {
    std::locale loc = std::locale(std::locale(),new UseCommaAsSeparator);
    inputStream.imbue(loc);
  }
  failed = ( inputStream >> val).fail();
  if(true == failed) throw std::runtime_error("extraction failed");
  return(val);
};

It is really typesafe !

Ok let’s use our template function.

unsigned char  number =0;
number = stringToNumber<unsigned char>("42");
std::cout << number << "\n";

Not surprisingly, the output is not 42.

Our expectation that the data type unsigned char represents a number is wrong. It represents a character, so the type safe >> operator of istringstream behaves accordingly and extracts only the first character, in this case ‘4’. Its ASCII code is decimal 52, so if we cast number to, for example, int, we get 52 on the console output. Neither is what we want.

Using unsigned char to represent 8 bit unsigned data is very common. We can implement a template specialization for this data type like this:

template<>
unsigned char stringToNumber<unsigned char>(const char *in)
{
  unsigned char val=0;
  val= static_cast<unsigned char> (stringToNumber<int>(in));
  return (val);
};

atof – root of some evil

So when it comes to string to number conversion, the atof, atoi … functions from the good old C standard library come in very handy. Even if you are a C++ programmer, it is easy as 123 to write for example:

float f = atof(stringbuffer);

Localisation
There is more than one reason to do better than that! But let's talk about localisation. What! You are a hardcore C programmer doing fancy low-level stuff? Never care about languages? Never interact with users? But guys, you could be nailed down anyway!

Use case
Imagine you develop a nice small library doing the coolest bit banging algorithm or controlling the next mission to Mars. You need to read/write a few parameters from an ASCII text file (e.g. XML or ini style). So let's say the file looks like this:

[magic numbers]
P1 = 1.23
P2 = 0.234

So you code it straightforwardly, or even just modify existing code:

#include <stdlib.h>
float getMagicNumber(char *filename )
{
 char *stringbuffer=NULL;
 stringbuffer = readFromFile("P2",filename);
 float f = atof(stringbuffer);
 return (f);

};

The others

So now your code runs well, you implement some test clients and are ready to ship the library to the customer.

A few months later a phone call comes in saying your algorithm failed badly on a simple use case. Fortunately, you did implement file logging, and there you see that parameter P2 was set to f = 0.0 during getMagicNumber. What happened?

After a couple of days (if you are lucky) of investigation and testing it is quite clear:

The customer's application uses the function call:

 char *currentCategory = setlocale (LC_ALL,"");

to adapt to the current environment, which was set to French because they are now selling their product in France. Unfortunately this changes the locale of the entire program (process), and unfortunately in French the decimal separator is a comma, so atof stops at the point and drops everything after it.
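
The effect can be reproduced in a few lines. The sketch below switches the numeric locale, calls atof and restores the previous setting. The helper name atofUnderLocale is made up, and the locale name to pass (for example "fr_FR.UTF-8" on Linux or "French" on Windows) depends on what is installed on the system, so the function reports whether the switch succeeded.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses text with atof while localeName is active.
// Returns false if the requested locale is not available on this system.
bool atofUnderLocale(const char* localeName, const char* text, double& result)
{
    // Remember the current setting; copy it, because setlocale may reuse its buffer.
    const std::string previous = std::setlocale(LC_NUMERIC, NULL);

    if (std::setlocale(LC_NUMERIC, localeName) == NULL)
        return false;

    result = std::atof(text);

    // Restore the setting that was active before the call.
    std::setlocale(LC_NUMERIC, previous.c_str());
    return true;
}
```

With a French locale active, parsing "0.234" should give 0.0, while "0,234" gives the intended value.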

Doing it quick

If you really, really want to stay with atof, you have to check the current locale setting (to be precise, the decimal separator) like:


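#include <locale.h>
#include <string>

std::string getCurrentSeparator()
{ 
 struct lconv * lc;
 lc=localeconv();
 // localeconv returns a pointer, so the member is accessed with ->
 return(std::string(lc->decimal_point));
};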

and take this into account.
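
One way to take it into account is to replace the point in the input by the current separator before calling atof. This is only a sketch under the assumption that the input always uses a point and contains no thousands separators; the function name atofWithPoint is invented here.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Returns the decimal separator of the current C locale.
std::string getCurrentSeparator()
{
    const struct lconv* lc = std::localeconv();
    return std::string(lc->decimal_point);
}

// Hypothetical helper: input uses '.', atof gets what the locale expects.
double atofWithPoint(const char* in)
{
    std::string text(in);
    const std::string::size_type pos = text.find('.');
    if (pos != std::string::npos)
        text.replace(pos, 1, getCurrentSeparator());
    return std::atof(text.c_str());
}
```

The library then no longer depends on which locale the host application has selected, at the price of one string copy per conversion.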
