C++ STL: Unicode encoding conversion with STL strings

In a recent issue of MSDN Magazine I found a good article (https://msdn.microsoft.com/magazine/mt763237?MC=CCPLUS&MC=Windows) explaining how to convert between UTF-8 and UTF-16 encoded STL strings by using Win32 API functions. Besides a well-written overview, it provides usable C++ code for download.

To write platform-independent code, I sum up here how far we can get with C++11 and the standard library alone. Note: the code compiles and runs with C++0x-capable compilers such as VS 2010.

storage type

To store UTF-8 encoded strings we can use std::string. To store UTF-16 encoded strings we should use std::u16string, which is std::basic_string with the underlying character type char16_t. In contrast to std::wstring, whose wchar_t is 16 bits wide on Windows but typically 32 bits wide on Linux and macOS, std::u16string uses the same code unit type on all platforms.
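To make the difference between the storage types concrete, here is a small sketch that holds the same two characters, the euro sign (U+20AC) and an emoji (U+1F600), in all three string types. The function names are made up for this illustration; the UTF-8 bytes are spelled out as hex escapes so that the example does not depend on the encoding of the source file.

```cpp
#include <string>

// Hypothetical sample data, not part of the conversion code in this post.

// UTF-8: U+20AC takes 3 bytes, U+1F600 takes 4 bytes.
std::string SampleUtf8()
{
    return "\xE2\x82\xAC\xF0\x9F\x98\x80";
}

// UTF-16: U+20AC is one code unit, U+1F600 needs a surrogate pair.
std::u16string SampleUtf16()
{
    return u"\u20AC\U0001F600";
}

// UTF-32: every code point is exactly one code unit.
std::u32string SampleUtf32()
{
    return U"\u20AC\U0001F600";
}
```

Calling size() on the three results gives 7, 3 and 2 respectively, because size() counts code units of the underlying type and not characters.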

utf8 -> utf16

#include <codecvt>
#include <locale>
#include <string>

// Facet that converts between UTF-8 bytes and UTF-16 code units.
typedef std::codecvt_utf8_utf16<char16_t> conversionFacet;

std::u16string Utf8ToUtf16(const std::string& utf8)
{
    // from_bytes() throws std::range_error if the input is not valid UTF-8.
    std::wstring_convert<conversionFacet, char16_t> converter;
    return converter.from_bytes(utf8);
}

The workhorse here is the class template std::wstring_convert. Despite its name, it is not limited to std::wstring: its second template parameter selects the wide character type, so it works with char16_t as well.

As its first template parameter it takes a conversion facet. In this case we use the class template std::codecvt_utf8_utf16.

The inverse conversion can be implemented in a similar way, just by calling the member function to_bytes.
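One detail worth knowing: when std::wstring_convert is default-constructed, from_bytes and to_bytes report a failed conversion by throwing std::range_error. The following is a minimal sketch of a wrapper that turns this into a return value. TryUtf8ToUtf16 is a hypothetical helper name of my own, not something from the standard library or from the MSDN article. Note also that std::wstring_convert and <codecvt> were deprecated in C++17, so newer compilers may emit warnings for this code.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 to UTF-16 and reports failure
// through the return value instead of an exception.
bool TryUtf8ToUtf16(const std::string& utf8, std::u16string& utf16)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
    try
    {
        utf16 = converter.from_bytes(utf8);
        return true;
    }
    catch (const std::range_error&)
    {
        // The input was not valid UTF-8; leave the output empty.
        utf16.clear();
        return false;
    }
}
```

Whether to catch the exception close to the conversion, as here, or let it propagate to the caller is a design choice; for input that comes from files or the network, a checked variant like this is often more convenient.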

utf16 -> utf8

std::string Utf16ToUtf8(const std::u16string& utf16)
{
    // to_bytes() throws std::range_error if the input is not valid UTF-16.
    std::wstring_convert<conversionFacet, char16_t> converter;
    return converter.to_bytes(utf16);
}
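As a quick sanity check, the two functions can be combined into a round trip: converting a valid UTF-8 string to UTF-16 and back should yield the original bytes. The sketch below repeats both functions so that it compiles on its own; RoundTripsUnchanged is a hypothetical helper added only for this illustration.

```cpp
#include <codecvt>
#include <locale>
#include <string>

typedef std::codecvt_utf8_utf16<char16_t> conversionFacet;

std::u16string Utf8ToUtf16(const std::string& utf8)
{
    std::wstring_convert<conversionFacet, char16_t> converter;
    return converter.from_bytes(utf8);
}

std::string Utf16ToUtf8(const std::u16string& utf16)
{
    std::wstring_convert<conversionFacet, char16_t> converter;
    return converter.to_bytes(utf16);
}

// Hypothetical check: true if UTF-8 -> UTF-16 -> UTF-8 reproduces the input.
// Throws std::range_error if the input is not valid UTF-8.
bool RoundTripsUnchanged(const std::string& utf8)
{
    return Utf16ToUtf8(Utf8ToUtf16(utf8)) == utf8;
}
```

For example, the UTF-8 bytes "\xE2\x82\xAC" (the euro sign) become a single UTF-16 code unit, while "\xF0\x9F\x98\x80" (U+1F600) becomes a surrogate pair of two code units; both convert back to the same bytes.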

 

summary

With C++11 we can use std::string, std::u16string and std::u32string for better platform-independent Unicode support.

I showed a simple example of converting between the UTF-8 and UTF-16 encodings. Other conversions, for example UTF-16 -> UTF-32, can be implemented in a similar way.
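As one possible way to do the UTF-16 -> UTF-32 conversion with the same tools, the sketch below goes through UTF-8 as an intermediate step: the first converter turns UTF-16 into UTF-8 bytes, and a second one based on std::codecvt_utf8<char32_t> turns those bytes into UTF-32. The function name Utf16ToUtf32 is my own choice, and the detour through UTF-8 is picked for simplicity, not for speed. Since C++17 these facilities are deprecated, so recent compilers may warn about them.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Sketch: UTF-16 -> UTF-32 via an intermediate UTF-8 string.
// Throws std::range_error if the input is not valid UTF-16.
std::u32string Utf16ToUtf32(const std::u16string& utf16)
{
    // Step 1: UTF-16 code units -> UTF-8 bytes.
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> toUtf8;
    const std::string utf8 = toUtf8.to_bytes(utf16);

    // Step 2: UTF-8 bytes -> UTF-32 code points.
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> toUtf32;
    return toUtf32.from_bytes(utf8);
}
```

A surrogate pair in the input ends up as a single char32_t in the result, so the returned string can be shorter than the UTF-16 input.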

 

 

 
