Mono/2008-03-19
[8:59] Scouse Linden: babbage is on his way
[8:59] Babbage Linden: hello
[8:59] Phantom Ninetails: Oh wow there he is
[8:59] Entering god mode, level 200
[8:59] Scouse Linden: Its only 8:59 :)
[8:59] Phantom Ninetails: Well I was talking with Periapse a little while ago and that's what we figured
[9:00] You shout: i've turned off scripts while we hold the office hours
[9:00] Phantom Ninetails: The wiki says it starts at 8
[9:00] Scouse Linden: oh damn daylight savings.....
[9:00] Scouse Linden: It hasnt changed for us
[9:00] Phantom Ninetails: Ah ^_^
[9:01] Babbage Linden: let me ping zero, we wanted to talk to him about httprequest and unicode
[9:01] Phantom Ninetails: :)
[9:01] Scouse Linden: Since no one else is around, if you've got anything phantom go ahead and ask
[9:01] Phantom Ninetails: Nope I've got nothing
[9:02] Scouse Linden: I guess others probably turned up and left by now
[9:02] Babbage Linden: yep, let me ping the group
[9:02] Phantom Ninetails: Actually not many showed up at 8 either ;)
[9:02] Scouse Linden: yeh it has slowed down
[9:02] Scouse Linden: I bet it wouldnt be the same if we did it on the main grid
[9:02] Phantom Ninetails: Only Periapse Linden and Becky Pippen showed up before
[9:02] Phantom Ninetails: lol, yeah you might get more there
[9:03] Babbage Linden: maybe we should ping the group on aditi too?
[9:04] Phantom Ninetails: Couldn't hurt
[9:04] Scouse Linden: Go ahead!
[9:04] Scouse Linden: What things are the most important to you to get fixed next?
[9:05] Phantom Ninetails: Control event.
[9:05] Scouse Linden: Fixed it today
[9:05] Phantom Ninetails: Yay :)
[9:05] Scouse Linden: Infact I was writing tests for events
[9:05] Scouse Linden: I'll put them on the wiki
[9:05] Scouse Linden: It will be fixed in the next refresh
[9:05] Phantom Ninetails: Alrighty :)
[9:07] Babbage Linden: we've uncovered some weird lsl2 behaviour recently
[9:07] Phantom Ninetails: Weird?
[9:07] Babbage Linden: see the language test 2
[9:07] Babbage Linden: we added tests for while, for and do while
[9:07] Babbage Linden: a couple seemed odd in LSL2
[9:08] Scouse Linden: specifically if ([1]) is false, if ([1,2]) is true
[9:08] Phantom Ninetails: That does indeed seem weird
[9:08] Babbage Linden: it's one of the cases in which we wonder whether being compatible is the correct approach
[9:09] Phantom Ninetails: Heh
[9:09] Babbage Linden: we probably should be, but it seems very odd
[9:10] Phantom Ninetails: Yeah.. That certainly looks like a difficult choice
[9:10] Babbage Linden: the other case is where LSL2 allows you to have some control flows return values, but not others
[9:10] Babbage Linden: that can be caught by the compiler
[9:10] Babbage Linden: and you can always fix it
[9:10] Babbage Linden: so it's less bad
[9:10] Phantom Ninetails: SVC-1421?
[9:11] Babbage Linden: you can't catch the if(list) at compile time easily
[9:11] Scouse Linden: https://wiki.secondlife.com/wiki/Event_test_script
[9:11] Babbage Linden: hi zero
[9:11] Phantom Ninetails: Greetings Zero
[9:11] Babbage Linden: welcome
[9:11] Zero Linden: oh - we're in text
[9:11] Babbage Linden: yes
[9:11] Babbage Linden: maybe something about scripting? ;-)
[9:11] Zero Linden: scripting - ah yes - I remember scripting while out on the Serengeti!
[9:12] Phantom Ninetails: Welcome back Periapse :)
[9:12] Babbage Linden: we're communicating in text as it's the medium of code
[9:12] Babbage Linden: hi peri
[9:12] Periapse Linden: lol -- hello again phantom
[9:12] Phantom Ninetails: :)
[9:12] Babbage Linden: so, the issue we'd like to discuss with zero
[9:12] Babbage Linden: (and we'd welcome input from you too phantom)
[9:12] Phantom Ninetails: ^_^
[9:12] Babbage Linden: is a problem with llHTTPRequest specifically
[9:13] Babbage Linden: and UTF8 handling in general
[9:13] Periapse Linden: zero -- babbage turned off scripts, so the chair will be wonky
[9:13] Babbage Linden: Mono uses UTF16 to represent strings internally
[9:13] Zero Linden: yes it will!
[9:13] Babbage Linden: so, we need to convert all UTF8 strings to UTF16 at some point
[9:13] Babbage Linden: if the UTF8 is invalid, that conversion fails
[9:14] Zero Linden: well - what do we do with LL strings now?
[9:14] Babbage Linden: and by default Mono returns NULL strings
[9:14] Zero Linden: Ah -
[9:14] Babbage Linden: at the moment LSL will accept borked UTF8
[9:14] Babbage Linden: and just pass it around
[9:14] Zero Linden: hmmm.... so the first question to ask is if mono handles the Astral Planes correctly
[9:14] Zero Linden: loves Unicode terminology
[9:14] Babbage Linden: and render what it can when you try to output it
[9:14] Scouse Linden: Some web servers give us back latin 1 despite asking for utf-8
[9:15] Babbage Linden: yes, the bug we found
[9:15] Babbage Linden: was a server reporting UTF-8
[9:15] Zero Linden: Scouse - no! really, can you point me to such a site?
[9:15] Scouse Linden: Mono refuses to parse latin 1 as utf-8
[9:15] Babbage Linden: but returning latin-1
[9:15] Zero Linden: well, Latin-1 isn't UTF-8 and
[9:15] Zero Linden: and can't be parsed as such at all
[9:15] Zero Linden: it is invalid
[9:15] Scouse Linden: exactly
[9:15] Babbage Linden: so, the llHTTPRequest gets a null conversion
[9:15] Babbage Linden: doesn't touch the latin 1
[9:15] Scouse Linden: (under mono)
[9:15] Zero Linden: so, we check, in llHTTPRequest, what the returned charset is and check that
[9:16] Babbage Linden: then Mono can't convert it to UTF16 and so fails
[9:16] Zero Linden: Wait, - is there a case where the site is giving us UTF-8, claiming it is UTF-8, but isn't actually UTF-8?
[9:16] Scouse Linden: yes
[9:16] Babbage Linden: it claims it's giving us UTF8
[9:16] Babbage Linden: but then gives us latin 1
[9:17] Zero Linden: well then - we *should* treat that string as empty
[9:17] Babbage Linden: but, with LSL2, we currently don't
[9:17] Zero Linden: there is nothing you can do
[9:17] Zero Linden: really?
[9:18] Scouse Linden: Browsers typically replace the text with �
[9:18] Babbage Linden: so, we can fix llHTTPRequest
[9:18] Babbage Linden: to reject bad UTF8 after conversion
[9:18] Babbage Linden: but, there are other ways to give borked UTF8 to LSL
[9:18] Zero Linden: I'm checking the source - does it not validate if the result claims to be UTF8?
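The failure described above (a body labelled UTF-8 that is really Latin-1) can be sketched in a few lines. This is only an illustration in Python, not the Mono or simulator code; `decode_strict` is a hypothetical helper, and returning `None` merely stands in for the NULL string that Mono is said to return.

```python
# Illustration only: Python stands in for the Mono/glib conversion.
# "è" is the single byte 0xE8 in Latin-1 but the two bytes 0xC3 0xA8 in UTF-8.
latin1_body = "caffè".encode("latin-1")  # bytes a mislabelled server might send
utf8_body = "caffè".encode("utf-8")      # what a correct server would send


def decode_strict(body: bytes):
    """Hypothetical helper: decoded text, or None if body is not valid UTF-8."""
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        # Analogous to the conversion failing and yielding a NULL string.
        return None


strict_result = decode_strict(latin1_body)  # invalid as UTF-8, so None
utf8_result = decode_strict(utf8_body)      # valid, so the text comes back
```

The Latin-1 bytes are not merely "different text": they are not a legal UTF-8 sequence at all, which is why a strict converter has nothing to return.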
[9:18] Babbage Linden: and we currently just pass it around and then throw it at the screen
[9:19] Babbage Linden can't remember
[9:19] Zero Linden: which won't render correctly
[9:19] Babbage Linden: it was a while ago
[9:19] Scouse Linden: http://www.salahzar.info/lsl/feedburner.php?feed=http://www.beppegrillo.it/index.xml the URL, if you change the character set it displays the characters correctly
[9:19] Zero Linden: there is no fallback to Latin-1
[9:19] Scouse Linden: I personally think we should try to do like the browsers and mark invalid characters
[9:19] Babbage Linden: so, should we try to work around broken UTF8
[9:20] Babbage Linden: try to fix it so it turns in to some reasonable UTF16 when passed on Mono
[9:20] Zero Linden: okay, so first off
[9:20] Babbage Linden: or, just sulk and replace strings that return NULL when passed to Mono
[9:20] Babbage Linden: with some "Invalid UTF8" string?
[9:20] Babbage Linden: (which would be a change in behaviour to LSL2)
[9:21] Zero Linden: well, first off, in that case we should not support that feed
[9:21] Zero Linden: it is broken according to at least three RFCs!
[9:21] Babbage Linden: ok, so that is easy for the llHTTPRequest case
[9:21] Babbage Linden: try to validate the UTF8
[9:22] Babbage Linden: convert the status to 40X
[9:22] Zero Linden: Now, when Doug did the UTF8 conversion to LSL - he intended that non-UTF-8 strings would be considered ILLEGAL
[9:22] Babbage Linden: that is not currently the case
[9:22] Zero Linden: and I don't know if other functions will do bad things
[9:22] Babbage Linden: the other one we found that could generate borked UTF8
[9:22] Babbage Linden: was llBase64ToString
[9:23] Zero Linden: *THAT* function is EVIL
[9:23] Zero Linden: :-)
[9:23] Zero Linden: So - let's look at the idea of "fixing" it
[9:23] Babbage Linden: that has no out of band error
[9:23] Babbage Linden: no status we can set
[9:23] Zero Linden: so - you have an LSL string with arbitrary binary data in it, say from llBase64ToString
[9:23] Babbage Linden: our only option is to return some "special string"
[9:23] Zero Linden: how would we fix it?
[9:23] Babbage Linden: or cause a fault in the script
[9:23] Babbage Linden: (LSL error semantics are horrible)
[9:24] Zero Linden: so that it can become a valid Unicode string, that Mono can deal with as UTF-16?
[9:24] Babbage Linden: so, we ask Mono for the UTF16 string representation
[9:24] Babbage Linden: check the return
[9:24] Babbage Linden: if it's NULL
[9:24] Zero Linden: You can't really, since you don't know the encoding.....
[9:24] Zero Linden: BUT
[9:24] Babbage Linden: use some error string
[9:25] Zero Linden: you could pick some arbitrary 8-bit character set which has all 256 characters legal, and which 'round trips into and out of Unicode
[9:25] Babbage Linden: (if it was C# we could throw an exception)
[9:25] Zero Linden: and just DECLARE it to be in that character set
[9:26] Zero Linden: That will work, but will be unreliable....
[9:27] Zero Linden: In other words - if the llBase64Decode
[9:27] Zero Linden: results in a sequence that happens to be UTF-8 valid, then you get the sequence of bytes you decoded
[9:27] Zero Linden: if it doesn't, then you get re-interpretation in another charset, which results in a UTF-8/UTF-16 string that represents
[9:27] Zero Linden: a different sequence of bytes
[9:28] Zero Linden: so - you could do that, but it would break llBase64Decode anyway
[9:28] Zero Linden: So, I don't see a "fall back" position that works for llBase64Decode
[9:28] Babbage Linden: no
[9:28] Scouse Linden: Its pretty evil
[9:28] Babbage Linden: so, in general
[9:29] Babbage Linden: the goal was to disallow invalid UTF8 in LSL anyway
[9:29] Zero Linden: Well - we should state (even if we don't) that the Base64 functions ONLY work on Unicode text
[9:29] Zero Linden: not arbitrary binary data
[9:29] Babbage Linden: so if we start converting invalid strings to error strings
[9:29] Zero Linden: now
[9:29] Babbage Linden: that may actually be better than what we have
[9:29] Zero Linden: you could create a class of objects, "binary string"
[9:30] Babbage Linden: (although it's different behaviour)
[9:30] Zero Linden: well - you could get away with it perhaps, with these semantics
[9:30] Zero Linden: all the string functions
[9:30] Zero Linden: on binary strings produce binary strings
[9:30] Babbage Linden: " Converts a Base 64 string to a conventional string. If the conversion creates any unprintable characters, they are converted to question marks"
[9:30] Zero Linden: but as soon as the binary string is mixed with a normal string
[9:30] Babbage Linden: from the LSL wiki
[9:31] Zero Linden: the binary string is interpreted as UTF-8 and if invalid is ""
[9:31] Zero Linden: well, Babbage, "unprintable" in what charset? what charset are you considering the byte stream of the conversion to be in
[9:31] Zero Linden: UTF-8?
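Zero's point about the "declare it to be some 8-bit charset" fallback being unreliable can be illustrated as follows. This is a sketch under assumptions: Latin-1 is picked here as the arbitrary 8-bit charset (the chat does not name one), and `decode_with_fallback` is a hypothetical helper, not anything in LSL or Mono.

```python
def decode_with_fallback(data: bytes) -> str:
    """Hypothetical: try UTF-8 first, else reinterpret the bytes as Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00..0xFF to U+0000..U+00FF, so this never fails.
        return data.decode("latin-1")


valid = bytes([0xC3, 0xA9])    # happens to be valid UTF-8 (the character "é")
invalid = bytes([0xC3, 0x28])  # 0xC3 must be followed by a continuation byte

a = decode_with_fallback(valid)    # one character, decoded as UTF-8
b = decode_with_fallback(invalid)  # two characters, decoded as Latin-1

# Once stored as UTF-8 again, only the first case still holds the decoded bytes.
a_bytes = a.encode("utf-8")  # identical to `valid`
b_bytes = b.encode("utf-8")  # 0xC3 0x83 0x28, no longer equal to `invalid`
```

Which interpretation a caller gets depends on whether the bytes happen to be valid UTF-8, so the resulting string cannot be relied on to carry the original byte sequence.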
[9:31] Zero Linden: so say
[9:31] Babbage Linden: so, there is a (defacto) specification atm it seems
[9:32] Babbage Linden: even if the ? chars are currently coming from trying to display the invalid chars
[9:32] Babbage Linden: instead of due to the conversion
[9:32] Zero Linden: "Converts a Base 64 encoded string into its decoded bytes, which are then interpreted as a UTF-8 string. Invalid UTF-8 sequences in this byte stream will be replaced with question marks"
[9:32] Simil Miles: Hi all
[9:32] Phantom Ninetails: Greetings
[9:32] Zero Linden: actually, I think Unicode has a "this was a conversion error" character
[9:32] Babbage Linden: (most browsers do that)
[9:33] Babbage Linden: replace borked chars with "question mark in a diamond"
[9:33] Zero Linden: thats the Unicode thingy
[9:33] Babbage Linden: emulating that was scouse's favoured approach
[9:34] Babbage Linden: given that we don't have good ways of reporting errors, i think it may be the best way forward
[9:34] Zero Linden: I think that is reasonable
[9:34] Babbage Linden: returning "" or setting an error and stopping the script seem horrible in terms of compatibility
[9:35] Zero Linden: and the right thing to do in that HTTP case - the server says explicitly that the result is UTF-8 (I checked with Curl)
[9:35] Zero Linden: and so we have no choice but to believe it ---
[9:35] Zero Linden: so - if it fails conversion - replace them characters
[9:35] Zero Linden: rather, them bytes
[9:35] Babbage Linden: great
[9:35] Babbage Linden: so now we just need a gadget to do the replacement
[9:35] Babbage Linden: we'll have a look in glib
[9:35] Zero Linden: what is the mono method that interprets a vector of bytes into a string?
[9:36] Babbage Linden: it's just mono_string_new
[9:36] Babbage Linden: which calls the glib UTF8 to UTF16 function under the hood
[9:37] Zero Linden: ah and that function doesn't have a "what to do with bad encodings" option?
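The "replace them bytes" approach agreed on here, with invalid sequences turned into the "question mark in a diamond" character (U+FFFD), can be sketched as below. Python's built-in `errors="replace"` decoding is used purely as a stand-in; how the replacement would be done ahead of `mono_string_new` is not settled in this chat, and `sanitize_utf8` is a hypothetical name.

```python
REPLACEMENT = "\ufffd"  # Unicode REPLACEMENT CHARACTER, the diamond question mark


def sanitize_utf8(data: bytes) -> bytes:
    """Hypothetical: return valid UTF-8, with bad bytes replaced by U+FFFD."""
    text = data.decode("utf-8", errors="replace")
    return text.encode("utf-8")


mislabelled = b"caff\xe8 latte"  # Latin-1 bytes claimed to be UTF-8
repaired = sanitize_utf8(mislabelled).decode("utf-8")  # strict decode now succeeds

# After sanitising, the UTF-8 to UTF-16 conversion has nothing left to reject.
utf16 = repaired.encode("utf-16-le")

# Text that was already valid passes through unchanged.
untouched = sanitize_utf8("caffè latte".encode("utf-8"))
```

The bad byte is lost, but everything around it survives, which matches what browsers show for the mislabelled feed.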
[9:37] Babbage Linden: no, it just reports an error if the stream is invalid UTF8
[9:38] Babbage Linden: (so we need to massage the vector of bytes before hand)
[9:38] Zero Linden: U+FFFD
[9:38] Zero Linden: That's easy - I can write that with you if you need
[9:38] Scouse Linden: �
[9:38] Zero Linden: I have UTF-8 encoding in my blood
[9:38] Babbage Linden: ok, cool
[9:38] Zero Linden: �
[9:38] Babbage Linden: we'll have a sniff around
[9:39] Zero Linden: bizarre that on the Mac that doesn't print correctly
[9:39] Scouse Linden: There are glib functions apparently which do the stuff we need
[9:39] Scouse Linden: according to the folks in the mono channel
[9:39] Babbage Linden: and if we can't find anything suitable we'll take you up on the offer
[9:39] Babbage Linden: can we use the httprequest conversion dodad?
[9:39] Zero Linden: that uses APR, which is a wrapper on iconv
[9:39] Babbage Linden: "convert potentially borken utf8 in to mostly fixed utf8"
[9:39] Babbage Linden: that's the thingy
[9:40] Babbage Linden: i'll have a look in to iconv too
[9:40] Zero Linden: hmmm... I notice that we ask to do the transcode no matter what the charset
[9:40] Zero Linden: so we ask to transcode from utf-8 to utf-8!
[9:40] Zero Linden: which is good!
[9:41] Babbage Linden: does that behaviour sound reasonable to everyone?
[9:41] Babbage Linden: (you should only notice if you do something other than displaying the strings)
[9:41] Scouse Linden: yes
[9:41] Babbage Linden: (like emailing them or HTTP requesting them)
[9:42] Babbage Linden: silence == consent
[9:42] Zero Linden: well - it will cause llBase64Encode(llBase64Decode(s)) to not be the identity function
[9:43] Zero Linden: but I think this is reasonable
[9:43] Zero Linden: the other option, the BinaryString class option
[9:43] Zero Linden: would preserve this
[9:43] Scouse Linden: We might have to
[9:43] Phantom Ninetails: lol, well I've never used any of the functions being discussed here so.. But yeah seems like a good idea
[9:45] Babbage Linden doesn't like the idea of defining a new, non standard string class to support that corner case
[9:45] Zero Linden: I think it might be okay to break llBase64Decode in that way
[9:45] Zero Linden: if you generate a non-UTF-8 string that way now, almost nothing else in the LSL library will work right
[9:45] Babbage Linden: you probably didn't mean to do it if you do now
[9:46] Babbage Linden is tempted to go ahead with this plan and deal with the screaming later
[9:47] Phantom Ninetails: lol
[9:47] Babbage Linden: ok, so that's enough of that i think
[9:47] Babbage Linden: thanks for coming along zero
[9:47] Scouse Linden: yep
[9:47] Babbage Linden: has anyone got anything else?
[9:47] Zero Linden: welcome
[9:47] Siann Beck: Any word on the db update?
[9:48] Babbage Linden: peri?
[9:48] Babbage Linden: do you know about that?
[9:49] Scouse Linden: So that means we don't know
[9:49] Babbage Linden: joel was leading the charge on the agni db update
[9:49] Siann Beck: Last Friday Peri said he was going to talk to the Havok4 team about it; I just wondered if there was any news.
[9:49] Babbage Linden: i'll ask him about that
[9:49] Siann Beck: OK
[9:50] Periapse Linden: Sorry, I was in IM.
[9:50] Scouse Linden: did you get the question?
[9:50] Periapse Linden: No, I haven't talked to Sidewinder yet. I'll ping him now. But I don't think the update is likely, as havok4 is trying hard to get off of aditi and onto the maingrid
[9:51] Siann Beck: OK
[9:51] Siann Beck: And when can we expect a Mono refresh?
[9:51] Babbage Linden: tomorrow hopefully
[9:51] Siann Beck: Cool
[9:51] Scouse Linden: Tuesday at the latest
[9:51] Babbage Linden: (us brits are on holiday on friday and monday, so we want to give you some new code before we push off)
[9:51] Scouse Linden: (if for some reason tomorrows fails)
[9:52] Siann Beck: Ah, I see. What holiday is it?
[9:52] Scouse Linden: easter
[9:52] Babbage Linden: easter friday and monday
[9:52] Siann Beck: Oh, duh! I knew that :)
[9:52] Phantom Ninetails: I've got easter coming too :P
[9:52] Babbage Linden: so, peri may be running an office hours on friday
[9:52] Periapse Linden: I will run the office hour on Friday as normal
[9:52] Babbage Linden: from the land of state/religion separation
[9:52] Zero Linden: oy - so apr-iconv is a thin wrapper around only 3/4 of iconv.
[9:53] Zero Linden: and iconv the lib has "transliterate"
[9:53] Zero Linden: as an option
[9:53] Zero Linden: which we can get by converting to "UTF-8//TRANSLIT"
[9:53] Zero Linden: which we can sneak past APR's call I bet
[9:53] Zero Linden: BUT
[9:53] Zero Linden: the command line iconv has exactly the feature we want
[9:54] Zero Linden: the question is - how does it do it...
[9:54] Babbage Linden wonders whether we can call iconv directly
[9:55] Babbage Linden wonders whether we do elsewhere already
[9:56] Babbage Linden: ok, are we all done?
[9:57] Babbage Linden: anything else?
[9:57] Zero Linden: may not need to
[9:57] Siann Beck: Well, I don't know what I missed, so I'll just wait for the transcript.
[9:57] Phantom Ninetails: Eerie silence. I guess it's just about over
[9:57] Zero Linden: the one non-wrapped call isn't needed, now I see
[9:57] Periapse Linden: Siann, I just talked to sidewinder
[9:58] Periapse Linden: He canceled the refresh because the beta grid would have to be down for 48 hours
[9:58] Siann Beck: I see.
[9:58] Periapse Linden: So I think we'll do it when havok4 leaves the grid
[9:58] Phantom Ninetails: I find it odd that I became able to come on
[9:58] Siann Beck: OK. Well, the object I need to test doesn't really have anything extraordinary, so it's probably fine.
[9:59] Periapse Linden: But we can refresh specific people, so as to allow people to log on to aditi
[9:59] Phantom Ninetails: Ah
[9:59] Babbage Linden: i'll do the chat log peri
[9:59] Phantom Ninetails: That'd probably explain it then :)
[9:59] Siann Beck: I wondered about that, since I've seen people here who rezzed after the last update.
[9:59] Periapse Linden: Thanks, Babbage.
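The consequence raised at 9:42, that a Base64 decode followed by an encode stops being the identity once invalid UTF-8 is replaced with U+FFFD, can be sketched as follows. The two Python helpers are hypothetical stand-ins for the LSL Base64 functions discussed above and assume the replacement behaviour the group settled on; they are not the real implementations.

```python
import base64


def base64_to_string(b64: str) -> str:
    """Hypothetical stand-in: decode Base64, then read the bytes as UTF-8,
    replacing invalid sequences with U+FFFD."""
    return base64.b64decode(b64).decode("utf-8", errors="replace")


def string_to_base64(s: str) -> str:
    """Hypothetical stand-in: encode the string's UTF-8 bytes as Base64."""
    return base64.b64encode(s.encode("utf-8")).decode("ascii")


# Base64 of real text versus Base64 of arbitrary binary data.
text_b64 = base64.b64encode("héllo".encode("utf-8")).decode("ascii")
binary_b64 = base64.b64encode(bytes([0xFF, 0x00, 0x80])).decode("ascii")

text_round_trip = string_to_base64(base64_to_string(text_b64))      # unchanged
binary_round_trip = string_to_base64(base64_to_string(binary_b64))  # changed
```

Text survives the round trip; arbitrary binary data does not, which is the trade-off accepted in place of a separate binary string class.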