Category Archives: strings

C++ STL unicode encoding conversion with STL strings

In a recent issue of MSDN Magazine I found a good article (https://msdn.microsoft.com/magazine/mt763237?MC=CCPLUS&MC=Windows) explaining how to convert between the UTF-8 and UTF-16 encodings of STL strings by using Win32 API functions. Besides a well-written overview, it provides usable C++ code for download.

To write platform-independent code, I sum up here how far we can get with C++11 and the standard library alone. Note: the code compiles and runs with C++0x-capable compilers like VS 2010.

storage type

To store UTF-8 encoded strings we can use std::string. To store UTF-16 encoded strings we should use std::u16string, which is std::basic_string with the underlying character type char16_t. In contrast to std::wstring, whose wchar_t is 16 bits wide on Windows but typically 32 bits wide on Linux, std::u16string is the same on all platforms.
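To make the difference between the storage types concrete, here is a small sketch that stores the same two characters in each string type. The variable names are made up for this illustration. Note that size() counts code units, not characters.

```cpp
#include <string>

// U+20AC EURO SIGN lies inside the Basic Multilingual Plane.
const std::string    euroUtf8  = "\xE2\x82\xAC";      // 3 code units (bytes)
const std::u16string euroUtf16 = u"\u20AC";           // 1 code unit

// U+1F600 GRINNING FACE lies outside the Basic Multilingual Plane.
const std::string    faceUtf8  = "\xF0\x9F\x98\x80";  // 4 code units (bytes)
const std::u16string faceUtf16 = u"\U0001F600";       // 2 code units (surrogate pair)
const std::u32string faceUtf32 = U"\U0001F600";       // 1 code unit
```

So a single character can occupy one to four elements of a std::string and one or two elements of a std::u16string, while a std::u32string always uses exactly one element per code point.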

utf8 -> utf16

#include <codecvt>
#include <locale>
#include <string>

// Facet that converts between UTF-8 bytes and UTF-16 code units.
typedef std::codecvt_utf8_utf16<char16_t> conversionFacet;

std::u16string Utf8ToUtf16(const std::string& utf8)
{
    std::wstring_convert<conversionFacet, char16_t> converter;

    // from_bytes() throws std::range_error if the input is not valid UTF-8.
    return converter.from_bytes(utf8);
}

The workhorse here is the class template std::wstring_convert. Despite its name, it can be used not only with std::wstring: thanks to its second template parameter it also works with char16_t.

As its first template parameter it takes a conversion facet. In this case we use the class template std::codecvt_utf8_utf16 as the conversion facet.

The inverse conversion can be implemented in a similar way, just by calling the member function to_bytes.

utf16 -> utf8

std::string Utf16ToUtf8(const std::u16string& utf16)
{
    std::wstring_convert<conversionFacet, char16_t> converter;

    // to_bytes() throws std::range_error if the input is not valid UTF-16.
    return converter.to_bytes(utf16);
}
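Both conversions throw std::range_error on malformed input when the converter is constructed without an error string, as it is here. The following is a minimal sketch of how a caller might deal with that. The helper names TryUtf8ToUtf16 and RoundTrips are invented for this example. Keep in mind that <codecvt> and std::wstring_convert have been deprecated since C++17, so newer compilers may emit warnings.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

typedef std::codecvt_utf8_utf16<char16_t> Utf8Utf16Facet;

// Hypothetical helper: reports malformed UTF-8 through the return value
// instead of letting the exception escape.
bool TryUtf8ToUtf16(const std::string& utf8, std::u16string& utf16)
{
    std::wstring_convert<Utf8Utf16Facet, char16_t> converter;
    try
    {
        utf16 = converter.from_bytes(utf8);
        return true;
    }
    catch (const std::range_error&)
    {
        utf16.clear();
        return false;
    }
}

// Hypothetical helper: converts to UTF-16 and back and checks
// that the original bytes are reproduced.
bool RoundTrips(const std::string& utf8)
{
    std::u16string utf16;
    if (!TryUtf8ToUtf16(utf8, utf16))
    {
        return false;
    }
    std::wstring_convert<Utf8Utf16Facet, char16_t> converter;
    return converter.to_bytes(utf16) == utf8;
}
```

A lone byte such as "\xFF" can never occur in UTF-8, so TryUtf8ToUtf16 rejects it, while a valid sequence like "\xE2\x82\xAC" (the euro sign) survives the round trip.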

 

summary

With C++11 we can use std::string, std::u16string and std::u32string to get better platform-independent Unicode support.

I showed a simple example of converting between the UTF-8 and UTF-16 encodings. Other conversions, for example UTF-16 -> UTF-32, can be implemented in a similar way.
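As one possible sketch of such a further conversion, the facet std::codecvt_utf8<char32_t> converts between UTF-8 bytes and UTF-32 code units. The standard facets only convert to and from byte strings, so this sketch goes from UTF-16 to UTF-32 by way of UTF-8. The function names are my own choice, and this detour is an assumption of the example, not the only way to do it.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// UTF-8 bytes -> UTF-32 code units.
std::u32string Utf8ToUtf32(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    return converter.from_bytes(utf8);
}

// UTF-16 -> UTF-32, using UTF-8 as the intermediate byte representation.
std::u32string Utf16ToUtf32(const std::u16string& utf16)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> toUtf8;
    return Utf8ToUtf32(toUtf8.to_bytes(utf16));
}
```

A character outside the Basic Multilingual Plane, which needs a surrogate pair in UTF-16, ends up as a single element in the resulting std::u32string.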

 

 

 

boost::format your string

Do you use sprintf in your code? Have you seen it even in code that is called “written in C++”? Well, OK, it is a way to format your numbers into a string, and it is probably liked most for its convenience and ease of use. There are some reasons not to use it, which you learn by experience or by researching the web.

Recently I started using boost::format as an alternative to sprintf.

Suppose you have to program a hardware device, for example a motion controller, by sending a dedicated ASCII character string over some wire interface to trigger a certain action, like

std::string  command = "PAX=10000;SPX=30000;AMX=100000;BGX";
int ret = device.sendCommand(command); 

Here the character X usually selects one of several motion stages, and the axis as well as any of the numbers can change at runtime. So how do we do this?

sprintf

char command[128];  // must be large enough for the formatted command
const char* axis = "X";
sprintf(command, "PA%s=%d;SP%s=%d;AM%s=%d;BG%s",
                axis,
                pos,
                axis,
                speed,
                axis,
                acc,
                axis);
int ret = device.sendCommand(std::string(command));
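One of the classic reasons to be careful with sprintf is that it does not know how large the target buffer is. A minimal sketch of a somewhat safer variant uses std::snprintf and checks its return value. The function name FormatMoveCommandPrintf and the buffer size of 64 are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: builds the motion command with snprintf and
// returns an empty string if formatting fails or the buffer is too small.
std::string FormatMoveCommandPrintf(const char* axis, int pos, int speed, int acc)
{
    char buffer[64];
    const int needed = std::snprintf(buffer, sizeof(buffer),
                                     "PA%s=%d;SP%s=%d;AM%s=%d;BG%s",
                                     axis, pos, axis, speed, axis, acc, axis);

    // snprintf returns the length the full result would have had,
    // so a value >= sizeof(buffer) means the output was truncated.
    if (needed < 0 || static_cast<std::size_t>(needed) >= sizeof(buffer))
    {
        return std::string();
    }
    return std::string(buffer);
}
```

This avoids the buffer overrun, but the format string and the argument types still have to be matched by hand, and the compiler is not required to check them.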

stringstream

std::stringstream stream;
std::string axis = "X";
stream << "PA"
       << axis
       << "="
       << pos
       << ";"
       << "SP"
       << axis
       << "="
       << speed
       << ";"
       << "AM"
       << axis
       << "="
       << acc
       << ";"
       << "BG"
       << axis;
int ret = device.sendCommand(stream.str());

Using a stringstream object is definitely type-safe C++, but it gets unwieldy and error-prone very quickly, especially in this case. You get the point why many people go for sprintf.

boost::format

With boost::format we can implement the example like this:

// requires #include <boost/format.hpp>
boost::format cmdMove("PA%1%=%2%;SP%1%=%3%;AM%1%=%4%;BG%1%");
std::string command = (cmdMove % axis % pos % speed % acc).str();

int ret = device.sendCommand(command);

First we create a boost::format object, which we can reuse or keep as a member of our command class, etc. The constructor takes a format string in which the variable parts are marked by so-called positional arguments, which allows for reuse and reordering. When assigning the command string, the arguments are fed into the object with the % operator. The argument axis (%1%) is used four times in this case, which leads to less typing and cleaner code.

With boost::format you can also use POSIX printf-style format specifications, but in a type-safe way.

You can find more info  at http://www.boost.org/doc/libs/1_58_0/libs/format/