For languages other than C++, Ice encodes strings in their native Unicode representation, so applications can transparently use characters from non-English alphabets. For C++, however, the string encoding depends on which mapping is chosen for a particular string: the default mapping to std::string or the alternative mapping to std::wstring (see Section 6.6.1). This section explains how strings are encoded by the Ice run time, and how you can achieve automatic conversion of strings into a particular encoding.
On the wire, Ice transmits all strings as Unicode strings in UTF‑8 encoding (see Chapter 38). However, the native C++ representation for strings that contain non-English characters depends on the platform, as well as on locale settings and whether you use the narrow or wide mapping for Slice strings. By default, the Ice run time encodes strings as follows:
• Narrow strings (that is, strings mapped to std::string) are presented to the application in UTF‑8 encoding and, similarly, the application is expected to provide narrow strings in UTF‑8 encoding to the Ice run time for transmission.
• Wide strings (that is, strings mapped to std::wstring) are automatically encoded as Unicode by the Ice run time as appropriate for the platform. For example, for AIX in 32‑bit mode, the Ice run time converts between UTF‑8 and UTF‑16 in big-endian representation whereas, for AIX in 64‑bit mode, the Ice run time converts between UTF‑8 and UTF‑32 in big-endian representation.
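The platform dependence of the wide mapping comes down to the width of wchar_t. The following is a minimal sketch, not the Ice run time's actual implementation, of a UTF‑8 to wide-string conversion that emits UTF‑16 surrogate pairs where wchar_t is 16 bits wide and a single UTF‑32 unit otherwise. The function name utf8ToWide is hypothetical, and validation of malformed input is deliberately minimal (for example, overlong sequences are not rejected).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Ice): decode UTF-8 and re-encode it in the
// wide encoding that matches the width of wchar_t on this platform.
std::wstring utf8ToWide(const std::string& utf8)
{
    std::wstring out;
    std::size_t i = 0;
    while(i < utf8.size())
    {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        unsigned long cp;   // decoded Unicode code point
        std::size_t len;    // number of bytes in this UTF-8 sequence
        if(lead < 0x80)                { cp = lead;        len = 1; }
        else if((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if(i + len > utf8.size())
        {
            throw std::runtime_error("truncated UTF-8 sequence");
        }
        for(std::size_t k = 1; k < len; ++k)
        {
            unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if((c & 0xC0) != 0x80)
            {
                throw std::runtime_error("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        if(sizeof(wchar_t) == 2 && cp > 0xFFFF)
        {
            // 16-bit wchar_t: encode as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        }
        else
        {
            // 32-bit wchar_t (UTF-32), or a BMP character in UTF-16.
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

A character outside the Basic Multilingual Plane therefore occupies two elements of the resulting std::wstring on a platform with a 16‑bit wchar_t, and one element on a platform with a 32‑bit wchar_t.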
The default behavior of the run time can be changed by providing application-specific string converters. If you install such converters, all Slice strings are passed to the appropriate converter when they are marshaled and unmarshaled. String converters therefore allow you to convert all strings transparently into their native representation without having to insert explicit conversion calls whenever a string crosses a Slice interface boundary.
You can install string converters on a per-communicator basis when you create a communicator by setting the stringConverter and wstringConverter members of the InitializationData structure (see Section 32.3). Any strings that use the default (std::string) mapping are passed through the specified stringConverter, and any strings that use the wide (std::wstring) mapping are passed through the specified wstringConverter.
The string converters are defined as follows:
namespace Ice {

// Output buffer that receives the UTF-8 result of a conversion.
class ICE_API UTF8Buffer {
public:
    virtual Byte* getMoreBytes(size_t howMany,
                               Byte* firstUnused) = 0;
    virtual ~UTF8Buffer() {}
};

// Converter between UTF-8 and the native encoding of charT strings.
template<typename charT>
class BasicStringConverter : public IceUtil::Shared {
public:
    virtual Byte*
    toUTF8(const charT* sourceStart, const charT* sourceEnd,
           UTF8Buffer&) const = 0;

    virtual void fromUTF8(const Byte* sourceStart,
                          const Byte* sourceEnd,
                          std::basic_string<charT>& target) const = 0;
};

typedef BasicStringConverter<char> StringConverter;
typedef IceUtil::Handle<StringConverter> StringConverterPtr;

typedef BasicStringConverter<wchar_t> WstringConverter;
typedef IceUtil::Handle<WstringConverter> WstringConverterPtr;

}
As you can see, both narrow and wide string converters are simply templates with either a narrow or a wide character (char or wchar_t) as the template parameter.
If you have a string converter installed, the Ice run time calls the toUTF8 member function whenever it needs to convert a native string into UTF‑8 representation for transmission. The sourceStart and sourceEnd pointers point at the first byte and one-beyond-the-last byte of the source string, respectively. The implementation of toUTF8 must return a pointer to the first unused byte following the converted string.
Your implementation of toUTF8 must allocate the returned string by calling the getMoreBytes member function of the UTF8Buffer class that is passed as the third argument. (getMoreBytes throws a MemoryLimitException if it cannot allocate enough memory.) The firstUnused parameter must point at the first unused byte of the allocated memory region. You can make several calls to getMoreBytes to incrementally allocate memory for the converted string. If you do, getMoreBytes may relocate the buffer in memory. (If it does, it copies the part of the string that was converted so far into the new memory region.) The function returns a pointer to the first unused byte of the (possibly relocated) memory.
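The allocation protocol described above can be sketched as follows. This is an illustration under stated assumptions, not code from the Ice distribution: the types in namespace sketch are stand-ins that mirror the declarations shown earlier so that the example compiles without the Ice headers, VectorBuffer is a hypothetical buffer (in a real application the Ice run time supplies the UTF8Buffer), and the example assumes the native narrow encoding is ISO 8859‑1 (Latin‑1). It also assumes that passing a null firstUnused pointer on the first call is acceptable.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch
{

typedef unsigned char Byte;

// Stand-in for Ice::UTF8Buffer, mirroring the declaration shown earlier.
class UTF8Buffer
{
public:
    virtual Byte* getMoreBytes(std::size_t howMany, Byte* firstUnused) = 0;
    virtual ~UTF8Buffer() {}
};

// Hypothetical buffer backed by a std::vector; it exists only to drive
// this sketch.
class VectorBuffer : public UTF8Buffer
{
public:
    virtual Byte* getMoreBytes(std::size_t howMany, Byte* firstUnused)
    {
        // Drop whatever the caller left unused from the previous request.
        if(firstUnused != 0)
        {
            _bytes.resize(static_cast<std::size_t>(firstUnused - &_bytes[0]));
        }
        std::size_t pos = _bytes.size();
        _bytes.resize(pos + howMany); // may relocate; earlier bytes are copied
        return &_bytes[pos];
    }

    // Return the converted bytes up to (not including) end.
    std::string str(const Byte* end) const
    {
        return std::string(_bytes.begin(), _bytes.begin() + (end - &_bytes[0]));
    }

private:
    std::vector<Byte> _bytes;
};

// Convert Latin-1 to UTF-8, requesting memory in small chunks.
Byte* latin1ToUTF8(const char* sourceStart, const char* sourceEnd,
                   UTF8Buffer& buffer)
{
    const std::size_t chunk = 16;
    Byte* p = buffer.getMoreBytes(chunk, 0);
    Byte* limit = p + chunk;
    for(const char* s = sourceStart; s != sourceEnd; ++s)
    {
        if(limit - p < 2) // a Latin-1 character needs at most two UTF-8 bytes
        {
            p = buffer.getMoreBytes(chunk, p); // p may now point elsewhere
            limit = p + chunk;
        }
        Byte c = static_cast<Byte>(*s);
        if(c < 0x80)
        {
            *p++ = c;
        }
        else
        {
            *p++ = static_cast<Byte>(0xC0 | (c >> 6));
            *p++ = static_cast<Byte>(0x80 | (c & 0x3F));
        }
    }
    return p; // first unused byte following the converted string
}

}
```

Note that the loop never keeps a pointer into the buffer across a call to getMoreBytes other than the one the call returns, because any earlier pointer may be invalid after a relocation.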
Conversion with toUTF8 can fail because no more memory is available, in which case you should throw a MemoryLimitException. Conversion can also fail because the encoding of the source string is internally incorrect. In that case, you should throw a StringConversionFailed exception from toUTF8.
During unmarshaling, the Ice run time calls the fromUTF8 member function on the corresponding string converter. The function converts a UTF‑8 string into its native form as a std::string or std::wstring, depending on the converter. (The string into which the function must place the converted characters is passed to fromUTF8 as the target parameter.)
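As an illustration of the reverse direction, here is a minimal sketch of a narrow fromUTF8-style conversion, again assuming the native encoding is ISO 8859‑1 (Latin‑1). The free function utf8ToLatin1 and the ConversionFailed exception are hypothetical stand-ins defined only for this example; this section does not state which exception a real fromUTF8 implementation should throw for input it cannot represent.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

namespace sketch
{

typedef unsigned char Byte;

// Hypothetical stand-in for the exception a converter throws on bad input.
struct ConversionFailed : std::runtime_error
{
    explicit ConversionFailed(const std::string& why) : std::runtime_error(why) {}
};

// Convert UTF-8 to Latin-1, storing the result in target.
void utf8ToLatin1(const Byte* sourceStart, const Byte* sourceEnd,
                  std::string& target)
{
    target.clear();
    target.reserve(static_cast<std::size_t>(sourceEnd - sourceStart));
    for(const Byte* s = sourceStart; s != sourceEnd; ++s)
    {
        if(*s < 0x80)
        {
            // ASCII: one byte, copied unchanged.
            target += static_cast<char>(*s);
        }
        else if((*s == 0xC2 || *s == 0xC3) && s + 1 != sourceEnd &&
                (s[1] & 0xC0) == 0x80)
        {
            // Two-byte sequence encoding U+0080..U+00FF.
            target += static_cast<char>(((*s & 0x03) << 6) | (s[1] & 0x3F));
            ++s; // skip the continuation byte just consumed
        }
        else
        {
            throw ConversionFailed(
                "character outside Latin-1 or malformed UTF-8");
        }
    }
}

}
```

Unlike toUTF8, this direction needs no buffer protocol: the converted characters are simply appended to the target string supplied by the caller.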