Accent problems with Freeze::map
Hi!
I am trying to implement an application using Freeze, and I am having problems with accented characters such as "é" or "è". When I insert a string as a key (or as a value) in a Freeze::map, I cannot retrieve that string through Freeze anymore (it comes back empty), although the insertion did reach the database, as I can check with db_dump.

I am not sure whether this is a problem with Freeze, with Berkeley DB, with my locale settings, or some conflict between UTF-8 and ISO-8859-15, but in any case I cannot make it work. Below is the information that may be relevant.

I am using Linux, kernel 2.4 (Mandrake 9.1). It seems to me that this Mandrake distribution has trouble with encodings, but I don't know enough about the subject yet.

The following code shows the problem:

```cpp
#include <Freeze/Freeze.h>
#include <StringStringMap.h> // generated by slice2freeze (see below)
#include <iostream>

int main(int argc, char* argv[])
{
    Ice::CommunicatorPtr communicator = Ice::initialize(argc, argv);
    // Open the database environment in directory "db" and the database "simple"
    Freeze::DBEnvironmentPtr dbEnv = Freeze::initialize(communicator, "db");
    Freeze::DBPtr simpleDB = dbEnv->openDB("simple", true);

    StringStringMap Map(simpleDB);
    Map.clear();
    Map.insert(std::make_pair("yes", "elephant"));
    Map.insert(std::make_pair("non", "élephant")); // value with an accented first letter

    // Print every key/value pair stored in the map
    for (StringStringMap::const_iterator p = Map.begin(); p != Map.end(); ++p)
    {
        std::cout << p->first << " " << p->second << std::endl;
    }

    simpleDB->close();
    dbEnv->close();
    communicator->destroy();
    return 0;
}
```

When I run this code, the output is:

```
non
yes elephant
```

instead of:

```
non élephant
yes elephant
```

The command "db_dump -p simple" outputs:

```
VERSION=3
format=print
type=btree
db_pagesize=4096
HEADER=END
 \0a<Key>non</Key>
 \0a<Value>\e9lephant</Value>
 \0a<Key>yes</Key>
 \0a<Value>elephant</Value>
DATA=END
```

The "StringStringMap" class was generated with:

```
slice2freeze --dict StringStringMap,string,string StringStringMap
```

My C++ compiler is gcc 3.3.
The "locale" command outputs:

```
LANG=fr_CH.ISO-8859-15
LC_CTYPE=fr_CH.ISO-8859-15
LC_NUMERIC=fr_CH.ISO-8859-15
LC_TIME=fr_CH.ISO-8859-15
LC_COLLATE=fr_CH.ISO-8859-15
LC_MONETARY=fr_CH.ISO-8859-15
LC_MESSAGES=fr_CH.ISO-8859-15
LC_PAPER=fr_CH.ISO-8859-15
LC_NAME=fr_CH.ISO-8859-15
LC_ADDRESS=fr_CH.ISO-8859-15
LC_TELEPHONE=fr_CH.ISO-8859-15
LC_MEASUREMENT=fr_CH.ISO-8859-15
LC_IDENTIFICATION=fr_CH.ISO-8859-15
LC_ALL=
```

I hope I have given you enough information. Thank you in advance for any hint.

Last edited by sylvain : 08-10-2003 at 10:27 AM.
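One way to narrow down the UTF-8 versus ISO-8859-15 question is to look at the actual bytes of the string literal before they ever reach Freeze. Below is a minimal hex-dump sketch; the name `hexBytes` is a hypothetical helper, not part of Ice. If the source file was saved as ISO-8859-15, "élephant" typically starts with the single byte e9 (matching the `\e9` in the db_dump output); if it was saved as UTF-8, it starts with the two bytes c3 a9.

```cpp
#include <string>

// Hypothetical helper: returns the bytes of s as lowercase hex pairs separated
// by spaces, e.g. "\xE9l" -> "e9 6c". Each char is cast to unsigned char first
// so that bytes >= 0x80 are not sign-extended where char is signed.
std::string hexBytes(const std::string& s)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        unsigned char byte = static_cast<unsigned char>(s[i]);
        if (i > 0)
            out += ' ';
        out += digits[byte >> 4];   // high nibble
        out += digits[byte & 0x0F]; // low nibble
    }
    return out;
}
```

For example, adding `std::cout << hexBytes("élephant") << std::endl;` to the program above shows which encoding the compiler actually received for the literal.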
Quote:
élephant Quote:
```cpp
(...)
std::string ss = "élephant";
std::wstring ws(ss.begin(), ss.end()); // in unicode format
...
Map.insert(std::make_pair("yes", "elephant"));
Map.insert(std::make_pair("non", IceUtil::wstringToString(ws)));
(...)
```

But the output is still wrong. By the way, this:

```cpp
std::string ss = "élephant";
std::wstring ws(ss.begin(), ss.end()); // in unicode format
...
std::string s = IceUtil::wstringToString(ws); // s now holds "élephant" in utf-8
std::cout << s << std::endl;
```

prints "élephant" as well...

Quote:
Thank you.
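One detail worth checking in the conversion above: `std::wstring ws(ss.begin(), ss.end())` converts each `char` to `wchar_t` directly. On platforms where `char` is signed (gcc on x86 Linux, for instance), the byte 0xE9 for 'é' is the `char` value -23, so the resulting `wchar_t` is sign-extended and the wide string no longer holds U+00E9. Whether this is what went wrong here is an assumption; the thread does not confirm it. The sketch below widens through `unsigned char` instead; `latin1ToWide` is a hypothetical helper name. It is exact for ISO-8859-1; ISO-8859-15 differs at eight byte values (0xA4 is the euro sign, for example), although 'é' is 0xE9 in both.

```cpp
#include <string>

// Hypothetical helper: widens an ISO-8859-1 (Latin-1) string to a wide string
// holding the same code points. Casting to unsigned char first matters: with a
// signed char, the byte 0xE9 would otherwise be sign-extended instead of
// becoming the code point 0xE9.
std::wstring latin1ToWide(const std::string& s)
{
    std::wstring out;
    out.reserve(s.size());
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        unsigned char byte = static_cast<unsigned char>(s[i]);
        out += static_cast<wchar_t>(byte); // Latin-1 byte value == Unicode code point
    }
    return out;
}
```

In the snippet above, `std::wstring ws = latin1ToWide(ss);` would replace the iterator-range constructor.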
Quote:
The reason all this doesn't work is that you are using "élephant" in ISO format, whereas the XML encoding in Freeze expects it in Unicode (UTF-8). You can either use an editor that supports Unicode, enter the escape sequence that represents "élephant" in Unicode, or look for a method that converts ISO to Unicode.

Note that future versions of Freeze will be less sensitive to such problems, once we use binary encodings for Freeze. However, it would still be wrong, because you are using ISO strings where Ice expects Unicode.

Cheers,
Marc
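For the last option, here is a minimal sketch of an ISO-8859-15 to UTF-8 converter, assuming the input really is ISO-8859-15 as the `locale` output suggests. The function names are hypothetical helpers, not Ice or Freeze APIs. ISO-8859-15 matches ISO-8859-1 except at eight byte values, which are remapped explicitly; every resulting code point is below U+10000, so at most three UTF-8 bytes are needed per character.

```cpp
#include <string>

// Hypothetical helper: maps one ISO-8859-15 byte to its Unicode code point.
// ISO-8859-15 is identical to ISO-8859-1 (whose bytes equal their code points)
// except for the eight values listed here.
unsigned long iso885915ToCodePoint(unsigned char b)
{
    switch (b)
    {
    case 0xA4: return 0x20AC; // euro sign
    case 0xA6: return 0x0160; // S with caron
    case 0xA8: return 0x0161; // s with caron
    case 0xB4: return 0x017D; // Z with caron
    case 0xB8: return 0x017E; // z with caron
    case 0xBC: return 0x0152; // OE ligature
    case 0xBD: return 0x0153; // oe ligature
    case 0xBE: return 0x0178; // Y with diaeresis
    default:   return b;
    }
}

// Appends the UTF-8 encoding of a code point below U+10000 (enough for every
// ISO-8859-15 character) to out.
void appendUtf8(unsigned long cp, std::string& out)
{
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);                        // 0xxxxxxx
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));          // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));        // 10xxxxxx
    }
    else
    {
        out += static_cast<char>(0xE0 | (cp >> 12));         // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F)); // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));        // 10xxxxxx
    }
}

// Converts an ISO-8859-15 string to UTF-8, byte by byte.
std::string iso885915ToUtf8(const std::string& in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i)
        appendUtf8(iso885915ToCodePoint(static_cast<unsigned char>(in[i])), out);
    return out;
}
```

With it, the insert from the first post could become `Map.insert(std::make_pair("non", iso885915ToUtf8("élephant")));`, again assuming the source file is saved as ISO-8859-15. The escape-sequence option amounts to writing the UTF-8 bytes directly, as in `"\xc3\xa9lephant"`, which does not depend on the editor's encoding.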
Obviously, I should dive a little bit more into the subject...
But there is something I don't understand. It seems that the Ice runtime correctly received the accented word "élephant", because in the database the entry for "élephant" is "\0a<Value>\e9lephant</Value>", with \e9 for the é (00E9 is the Unicode code point for 'é', isn't it?), while the entry for "elephant" is "\0a<Value>elephant</Value>".

So why does iterating through the map retrieve an empty string for the value "\e9lephant"? This string was written by the Ice runtime, so the Ice runtime should be able to read it back. Because of the ISO/Unicode mismatch I don't expect to get exactly "élephant" back, but maybe something like "lephant"... I also tried accented words such as "mariés", where the accented letter is not the first one, but I still get an empty string. Where have my non-accented characters gone?

Regards,
Sylvain
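A plausible explanation for the empty string in this first case, offered as an assumption rather than a confirmed diagnosis: `\e9` is the single ISO-8859-15 byte for 'é', not its UTF-8 form; U+00E9 is encoded in UTF-8 as the two bytes c3 a9. On its own, 0xE9 is the lead byte of a three-byte UTF-8 sequence, and the following 'l' (0x6C) is not a continuation byte, so the stored text is not well-formed UTF-8. An XML parser that expects UTF-8 may then reject the whole value rather than drop only the 'é', which would give "" instead of "lephant". The sketch below, using the hypothetical name `isValidUtf8`, checks well-formedness following the RFC 3629 rules.

```cpp
#include <string>

// Hypothetical helper: returns true if s is well-formed UTF-8 according to
// RFC 3629 (correct lead and continuation bytes, no overlong encodings,
// no UTF-16 surrogates, no code points above U+10FFFF).
bool isValidUtf8(const std::string& s)
{
    std::string::size_type i = 0;
    while (i < s.size())
    {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::string::size_type extra; // continuation bytes expected after lead
        unsigned long cp;             // code point being decoded
        if (lead < 0x80)                       { extra = 0; cp = lead; }
        else if (lead >= 0xC2 && lead <= 0xDF) { extra = 1; cp = lead & 0x1F; }
        else if (lead >= 0xE0 && lead <= 0xEF) { extra = 2; cp = lead & 0x0F; }
        else if (lead >= 0xF0 && lead <= 0xF4) { extra = 3; cp = lead & 0x07; }
        else return false; // 0x80-0xC1 and 0xF5-0xFF never start a sequence

        if (s.size() - i - 1 < extra)
            return false;  // sequence truncated at end of string
        for (std::string::size_type k = 1; k <= extra; ++k)
        {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                return false; // expected a continuation byte 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000))
            return false;     // overlong encoding
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;     // surrogate or beyond the Unicode range
        i += extra + 1;
    }
    return true;
}
```

Both "\xE9lephant" and "mari\xE9s" fail this check, while "\xC3\xA9lephant" passes, which is why the position of the accented letter would not matter under this explanation.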
Just for your information, in case this is relevant:

I tested a little further with a Java client that sends a string to a (C++) servant, which inserts it into a StringStringMap the same way as the code in my first post and then prints the whole content of the Freeze::map. The Java client is just a (Swing) text field and a button; when the button is clicked, the content of the text field is sent to the servant using a simple operation from an Ice interface: "void add(string s)". My Java system is the one bundled with the Sun NetBeans 3.5 IDE (so it is JSDK/JRE 1.4.2).

When I enter "élephant", the servant says it received "élephant" and puts it in the database, but it is unable to retrieve it. More precisely, the value returned is an empty string (not a garbled string). The same happens for words whose accented characters are not at the beginning. A "db_dump -p" of the database file outputs "\0a<Value>\c3\a9lephant</Value>".

I don't know exactly how my Java handles the accented character or which charset it uses, but I think it should be independent of the system charset... Anyway, as far as I understand, the problem is the same: the Ice runtime writes something to the database that it cannot read back. If I write a distributed program, how can I ensure that all the clients are running on systems with the correct charset?

Regards,
Sylvain

Last edited by sylvain : 08-11-2003 at 06:18 AM.
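On the last question: the `\c3\a9` in this dump is exactly the UTF-8 encoding of U+00E9, so the Java client sent well-formed UTF-8 regardless of the local system charset. For C++ clients, one common approach (a general technique, not something this thread says Ice prescribes) is to convert text from the local charset to UTF-8 at the boundary, before passing it to Ice. The sketch below does this with the POSIX `iconv(3)` interface as declared by glibc on Linux (some other platforms declare the input pointer as `const char**`); `toUtf8` is a hypothetical helper name.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts `in`, encoded in the charset named by `from`
// (e.g. "ISO-8859-15"), to UTF-8 using iconv. Throws std::runtime_error for an
// unknown charset or for input that is invalid in the source charset.
std::string toUtf8(const std::string& in, const char* from)
{
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1) // iconv_open signals failure with (iconv_t)-1
        throw std::runtime_error("iconv_open failed");

    std::string out;
    char buf[256];
    char* inPtr = const_cast<char*>(in.data()); // glibc takes char**, input is not modified
    std::size_t inLeft = in.size();
    while (inLeft > 0)
    {
        char* outPtr = buf;
        std::size_t outLeft = sizeof buf;
        std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
        out.append(buf, static_cast<std::size_t>(outPtr - buf)); // keep converted bytes
        if (rc == static_cast<std::size_t>(-1) && errno != E2BIG)
        {
            // EILSEQ or EINVAL: invalid or incomplete input sequence
            iconv_close(cd);
            throw std::runtime_error("invalid input for source charset");
        }
        // E2BIG only means buf was full: loop again with an empty buffer
    }

    // Flush any pending shift state (a no-op for stateless charsets).
    char* outPtr = buf;
    std::size_t outLeft = sizeof buf;
    iconv(cd, 0, 0, &outPtr, &outLeft);
    out.append(buf, static_cast<std::size_t>(outPtr - buf));

    iconv_close(cd);
    return out;
}
```

To convert from whatever charset the user's locale specifies, call `setlocale(LC_ALL, "")` once at startup and pass `nl_langinfo(CODESET)` (declared in `<langinfo.h>`) as `from`; with the `fr_CH.ISO-8859-15` locale shown earlier, that yields "ISO-8859-15".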
Hi,
I've been able to reproduce this problem. It looks like an issue with Xerces-C++, which is used in the XML encoding of Freeze maps. As a workaround, you can use the binary encoding by specifying the --binary option to slice2freeze. I will reply again once I've resolved this issue.

- Mark