Mono/2008-03-19

From Second Life Wiki
[8:59]  Scouse Linden: babbage is on his way
[8:59]  Babbage Linden: hello
[8:59]  Phantom Ninetails: Oh wow there he is
[8:59]  Entering god mode, level 200
[8:59]  Scouse Linden: It's only 8:59 :)
[8:59]  Phantom Ninetails: Well I was talking with Periapse a little while ago and that's what we figured
[9:00]  You shout: i've turned off scripts while we hold the office hours
[9:00]  Phantom Ninetails: The wiki says it starts at 8
[9:00]  Scouse Linden: oh damn daylight savings.....
[9:00]  Scouse Linden: It hasn't changed for us
[9:00]  Phantom Ninetails: Ah ^_^
[9:01]  Babbage Linden: let me ping zero, we wanted to talk to him about httprequest and unicode
[9:01]  Phantom Ninetails: :)
[9:01]  Scouse Linden: Since no one else is around, if you've got anything phantom go ahead and ask
[9:01]  Phantom Ninetails: Nope I've got nothing
[9:02]  Scouse Linden: I guess others probably turned up and left by now
[9:02]  Babbage Linden: yep, let me ping the group
[9:02]  Phantom Ninetails: Actually not many showed up at 8 either ;)
[9:02]  Scouse Linden: yeh it has slowed down
[9:02]  Scouse Linden: I bet it wouldn't be the same if we did it on the main grid
[9:02]  Phantom Ninetails: Only Periapse Linden and Becky Pippen showed up before
[9:02]  Phantom Ninetails: lol, yeah you might get more there
[9:03]  Babbage Linden: maybe we should ping the group on aditi too?
[9:04]  Phantom Ninetails: Couldn't hurt
[9:04]  Scouse Linden: Go ahead!
[9:04]  Scouse Linden: What things are the most important to you to get fixed next?
[9:05]  Phantom Ninetails: Control event.
[9:05]  Scouse Linden: Fixed it today
[9:05]  Phantom Ninetails: Yay :)
[9:05]  Scouse Linden: In fact I was writing tests for events
[9:05]  Scouse Linden: I'll put them on the wiki
[9:05]  Scouse Linden: It will be fixed in the next refresh
[9:05]  Phantom Ninetails: Alrighty :)
[9:07]  Babbage Linden: we've uncovered some weird lsl2 behaviour recently
[9:07]  Phantom Ninetails: Weird?
[9:07]  Babbage Linden: see the language test 2
[9:07]  Babbage Linden: we added tests for while, for and do while
[9:07]  Babbage Linden: a couple seemed odd in LSL2
[9:08]  Scouse Linden: specifically if ([1]) is false, if ([1,2]) is true
[9:08]  Phantom Ninetails: That does indeed seem weird
[9:08]  Babbage Linden: it's one of the cases in which we wonder whether being compatible is the correct approach
[9:09]  Phantom Ninetails: Heh
[9:09]  Babbage Linden: we probably should be, but it seems very odd
[9:10]  Phantom Ninetails: Yeah.. That certainly looks like a difficult choice
[9:10]  Babbage Linden: the other case is where LSL2 allows you to have some control flows return values, but not others
[9:10]  Babbage Linden: that can be caught by the compiler
[9:10]  Babbage Linden: and you can always fix it
[9:10]  Babbage Linden: so it's less bad
[9:10]  Phantom Ninetails: SVC-1421?
[9:11]  Babbage Linden: you can't catch the if(list) at compile time easily
[9:11]  Scouse Linden: https://wiki.secondlife.com/wiki/Event_test_script
[9:11]  Babbage Linden: hi zero
[9:11]  Phantom Ninetails: Greetings Zero
[9:11]  Babbage Linden: welcome
[9:11]  Zero Linden: oh - we're in text
[9:11]  Babbage Linden: yes
[9:11]  Babbage Linden: maybe something about scripting? ;-)
[9:11]  Zero Linden: scripting - ah yes - I remember scripting while out on the Serengeti!
[9:12]  Phantom Ninetails: Welcome back Periapse :)
[9:12]  Babbage Linden: we're communicating in text as it's the medium of code
[9:12]  Babbage Linden: hi peri
[9:12]  Periapse Linden: lol -- hello again phantom
[9:12]  Phantom Ninetails: :)
[9:12]  Babbage Linden: so, the issue we'd like to discuss with zero
[9:12]  Babbage Linden: (and we'd welcome input from you too phantom)
[9:12]  Phantom Ninetails: ^_^
[9:12]  Babbage Linden: is a problem with llHTTPRequest specifically
[9:13]  Babbage Linden: and UTF8 handling in general
[9:13]  Periapse Linden: zero -- babbage turned off scripts, so the chair will be wonky
[9:13]  Babbage Linden: Mono uses UTF16 to represent strings internally
[9:13]  Zero Linden: yes it will!
[9:13]  Babbage Linden: so, we need to convert all UTF8 strings to UTF16 at some point
[9:13]  Babbage Linden: if the UTF8 is invalid, that conversion fails
[9:14]  Zero Linden: well - what do we do with LL strings now?
[9:14]  Babbage Linden: and by default Mono returns NULL strings
[9:14]  Zero Linden: Ah -
[9:14]  Babbage Linden: at the moment LSL will accept borked UTF8
[9:14]  Babbage Linden: and just pass it around
[9:14]  Zero Linden: hmmm.... so the first question to ask is if mono handles the Astral Planes correctly
[9:14]  Zero Linden: loves Unicode terminology
[9:14]  Babbage Linden: and render what it can when you try to output it
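A minimal Python sketch of the conversion Babbage describes, assuming a strict decoder can stand in for Mono's UTF-8 to UTF-16 step. The function names are hypothetical, and `None` plays the role of the NULL string Mono returns. It also illustrates the point behind Zero's "Astral Planes" question: a character outside the Basic Multilingual Plane needs a surrogate pair, i.e. two UTF-16 code units, so a UTF-16 string implementation has to handle those correctly.

```python
from typing import Optional


def to_utf16_or_none(data: bytes) -> Optional[bytes]:
    """Hypothetical stand-in for a strict UTF-8 -> UTF-16 conversion.

    Returns the UTF-16 (little-endian, no BOM) encoding of the text, or None
    when the input is not valid UTF-8, mirroring a NULL result.
    """
    try:
        text = data.decode("utf-8")  # strict: raises on any invalid sequence
    except UnicodeDecodeError:
        return None
    return text.encode("utf-16-le")


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    print(to_utf16_or_none("héllo".encode("utf-8")) is not None)  # valid UTF-8
    print(to_utf16_or_none(b"h\xe9llo"))  # lone 0xE9 lead byte: invalid, so None
    print(utf16_code_units("\U0001F600"))  # astral-plane character: a surrogate pair
```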
[9:14]  Scouse Linden: Some web servers give us back latin 1 despite asking for utf-8
[9:15]  Babbage Linden: yes, the bug we found
[9:15]  Babbage Linden: was a server reporting UTF-8
[9:15]  Zero Linden: Scouse - no! really, can you point me to such a site?
[9:15]  Scouse Linden: Mono refuses to parse latin 1 as utf-8
[9:15]  Babbage Linden: but returning latin-1
[9:15]  Zero Linden: well, Latin-1 isn't UTF-8
[9:15]  Zero Linden: and can't be parsed as such at all
[9:15]  Zero Linden: it is invalid
[9:15]  Scouse Linden: exactly
[9:15]  Babbage Linden: so, the llHTTPRequest gets a null conversion
[9:15]  Babbage Linden: doesn't touch the latin 1
[9:15]  Scouse Linden: (under mono)
[9:15]  Zero Linden: so, we check, in llHTTPRequest, what the returned charset is and check that
[9:16]  Babbage Linden: then Mono can't convert it to UTF16 and so fails
[9:16]  Zero Linden: Wait - is there a case where the site is giving us data, claiming it is UTF-8, but it isn't actually UTF-8?
[9:16]  Scouse Linden: yes
[9:16]  Babbage Linden: it claims it's giving us UTF8
[9:16]  Babbage Linden: but then gives us latin 1
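To make the mislabelled-feed case concrete: text encoded as Latin-1 is generally not valid UTF-8, so a strict UTF-8 decode fails, while decoding with the charset the server actually used recovers it. This matches Scouse's observation that switching the browser's character set displays the feed correctly. The sample word is illustrative only, not taken from the feed, and `decode_as` is a hypothetical helper.

```python
from typing import Optional


def decode_as(data: bytes, charset: str) -> Optional[str]:
    """Strictly decode data in the given charset; return None on failure."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


# "città" as a Latin-1 server would send it: à is the single byte 0xE0.
body = "città".encode("latin-1")
print(body)                        # b'citt\xe0'
print(decode_as(body, "utf-8"))    # None: 0xE0 starts a 3-byte sequence that never arrives
print(decode_as(body, "latin-1"))  # città
```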
[9:17]  Zero Linden: well then - we *should* treat that string as empty
[9:17]  Babbage Linden: but, with LSL2, we currently don't
[9:17]  Zero Linden: there is nothing you can do
[9:17]  Zero Linden: really?
[9:18]  Scouse Linden: Browsers typically replace the text with �
[9:18]  Babbage Linden: so, we can fix llHTTPRequest
[9:18]  Babbage Linden: to reject bad UTF8 after conversion
[9:18]  Babbage Linden: but, there are other ways to give borked UTF8 to LSL
[9:18]  Zero Linden: I'm checking the source - does it not validate if the result claims to be UTF8?
[9:18]  Babbage Linden: and we currently just pass it around and then throw it at the screen
[9:19]  Babbage Linden can't remember
[9:19]  Zero Linden: which won't render correctly
[9:19]  Babbage Linden: it was a while ago
[9:19]  Scouse Linden: http://www.salahzar.info/lsl/feedburner.php?feed=http://www.beppegrillo.it/index.xml is the URL; if you change the character set, it displays the characters correctly
[9:19]  Zero Linden: there is no fallback to Latin-1
[9:19]  Scouse Linden: I personally think we should try to do like the browsers and mark invalid characters
[9:19]  Babbage Linden: so, should we try to work around broken UTF8
[9:20]  Babbage Linden: try to fix it so it turns into some reasonable UTF16 when passed to Mono
[9:20]  Zero Linden: okay, so first off
[9:20]  Babbage Linden: or, just sulk and replace strings that return NULL when passed to Mono
[9:20]  Babbage Linden: with some "Invalid UTF8" string?
[9:20]  Babbage Linden: (which would be a change in behaviour to LSL2)
[9:21]  Zero Linden: well, first off, in that case we should not support that feed
[9:21]  Zero Linden: it is broken according to at least three RFCs!
[9:21]  Babbage Linden: ok, so that is easy for the llHTTPRequest case
[9:21]  Babbage Linden: try to validate the UTF8
[9:22]  Babbage Linden: convert the status to 40X
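A sketch of the first llHTTPRequest fix proposed here: validate a body that claims to be UTF-8 and, if it is not valid, turn the response into an error. The function name, the `(status, body)` pair, and the `error_status` parameter are all hypothetical; the log only says the status would become "40X", so the exact code is left to the caller rather than guessed.

```python
from typing import Tuple


def check_utf8_response(status: int, body: bytes, error_status: int) -> Tuple[int, str]:
    """Hypothetical validation step for a response declared as UTF-8.

    Valid UTF-8 is passed through (decoded to str) with the original status;
    invalid UTF-8 yields error_status and an empty body instead of a broken string.
    """
    try:
        return status, body.decode("utf-8")
    except UnicodeDecodeError:
        return error_status, ""
```

The caller supplies the 4xx value, for example `check_utf8_response(200, raw_body, error_status=400)`, where 400 is only a placeholder.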
[9:22]  Zero Linden: Now, when Doug did the UTF8 conversion to LSL - he intended that non-UTF-8 strings would be considered ILLEGAL
[9:22]  Babbage Linden: that is not currently the case
[9:22]  Zero Linden: and I don't know if other functions will do bad things
[9:22]  Babbage Linden: the other one we found that could generate borked UTF8
[9:22]  Babbage Linden: was llBase64ToString
[9:23]  Zero Linden: *THAT* function is EVIL
[9:23]  Zero Linden: :-)
[9:23]  Zero Linden: So - let's look at the idea of "fixing" it
[9:23]  Babbage Linden: that has no out of band error
[9:23]  Babbage Linden: no status we can set
[9:23]  Zero Linden: so - you have an LSL string with arbitrary binary data in it, say from llBase64ToString
[9:23]  Babbage Linden: our only option is to return some "special string"
[9:23]  Zero Linden: how would we fix it?
[9:23]  Babbage Linden: or cause a fault in the script
[9:23]  Babbage Linden: (LSL error semantics are horrible)
[9:24]  Zero Linden: so that it can become a valid Unicode string, that Mono can deal with as UTF-16?
[9:24]  Babbage Linden: so, we ask Mono for the UTF16 string representation
[9:24]  Babbage Linden: check the return
[9:24]  Babbage Linden: if it's NULL
[9:24]  Zero Linden: You can't really, since you don't know the encoding.....
[9:24]  Zero Linden: BUT
[9:24]  Babbage Linden: use some error string
[9:25]  Zero Linden: you could pick some arbitrary 8-bit character set which has all 256 characters legal, and which 'round trips into and out of Unicode
[9:25]  Babbage Linden: (if it was C# we could throw an exception)
[9:25]  Zero Linden: and just DECLARE it to be in that character set
[9:26]  Zero Linden: That will work, but will be unreliable....
[9:27]  Zero Linden: In other words - if the llBase64Decode
[9:27]  Zero Linden: results in a sequence that happens to be UTF-8 valid, then you get the sequence of bytes you decoded
[9:27]  Zero Linden: if it doesn't, then you get re-interpretation in another charset, which results in a UTF-8/UTF-16 string that represents
[9:27]  Zero Linden: a different sequence of bytes
[9:28]  Zero Linden: so - you could do that, but it would break llBase64Decode anyway
[9:28]  Zero Linden: So, I don't see a "fall back" position that works for llBase64Decode
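Zero's point can be checked with a short sketch that uses Latin-1 as the "declared" 8-bit charset: Latin-1 maps all 256 byte values to U+0000 through U+00FF, so every byte sequence decodes and round-trips through Unicode. The helper name is hypothetical. Bytes that happen to be valid UTF-8 come back unchanged, but anything else is reinterpreted, and the resulting string's UTF-8 form is a different byte sequence from the one that was decoded.

```python
def decode_with_fallback(data: bytes) -> str:
    """Hypothetical: decode as UTF-8, else declare the bytes to be Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")  # never fails: all 256 byte values are defined


for original in (b"caf\xc3\xa9", b"\xff\x00\x80"):
    text = decode_with_fallback(original)
    reencoded = text.encode("utf-8")
    print(original, "->", reencoded, "same bytes:", reencoded == original)
```

The first input is valid UTF-8 and survives; the second is decoded as Latin-1, so its UTF-8 re-encoding gains extra bytes and no longer matches the original.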
[9:28]  Babbage Linden: no
[9:28]  Scouse Linden: Its pretty evil
[9:28]  Babbage Linden: so, in general
[9:29]  Babbage Linden: the goal was to disallow invalid UTF8 in LSL anyway
[9:29]  Zero Linden: Well - we should state (even if we don't) that the Base64 functions ONLY work on Unicode text
[9:29]  Zero Linden: not arbitrary binary data
[9:29]  Babbage Linden: so if we start converting invalid strings to error strings
[9:29]  Zero Linden: now
[9:29]  Babbage Linden: that may actually be better than what we have
[9:29]  Zero Linden: you could create a class of objects, "binary string"
[9:30]  Babbage Linden: (although it's different behaviour)
[9:30]  Zero Linden: well - you could get away with it perhaps, with these semantics
[9:30]  Zero Linden: all the string functions
[9:30]  Zero Linden: on binary strings produce binary strings
[9:30]  Babbage Linden: " Converts a Base 64 string to a conventional string. If the conversion creates any unprintable characters, they are converted to question marks"
[9:30]  Zero Linden: but as soon as the binary string is mixed with a normal string
[9:30]  Babbage Linden: from the LSL wiki
[9:31]  Zero Linden: the binary string is interpreted as UTF-8 and if invalid is ""
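Zero's "binary string" idea can be sketched as a small Python class. Everything here is a hypothetical illustration of the semantics described in the log, not an existing LSL or Mono type: the class name is invented, and `+` (concatenation) and slicing stand in for "all the string functions". Operations between binary strings stay binary; mixing one with an ordinary string interprets the bytes as UTF-8, yielding "" if they are invalid.

```python
from __future__ import annotations


class BinaryString:
    """Hypothetical byte-preserving string type following the semantics above."""

    def __init__(self, data: bytes) -> None:
        self.data = data

    def as_text(self) -> str:
        # Interpret the bytes as UTF-8; invalid UTF-8 becomes "".
        try:
            return self.data.decode("utf-8")
        except UnicodeDecodeError:
            return ""

    def __add__(self, other: BinaryString | str) -> BinaryString | str:
        if isinstance(other, BinaryString):
            return BinaryString(self.data + other.data)  # binary + binary -> binary
        return self.as_text() + other  # binary + normal -> normal string

    def __radd__(self, other: str) -> str:
        return other + self.as_text()  # normal + binary -> normal string

    def __getitem__(self, index: slice) -> BinaryString:
        # Only slices are supported; indexing bytes with an int would give an int.
        if not isinstance(index, slice):
            raise TypeError("BinaryString supports slicing only")
        return BinaryString(self.data[index])  # substrings stay binary
```

This design would keep arbitrary bytes intact until they meet ordinary text, at the cost of a second, non-standard string type in the language.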
[9:31]  Zero Linden: well, Babbage, "unprintable" in what charset? what charset are you considering the byte stream of the conversion to be in
[9:31]  Zero Linden: UTF-8?
[9:31]  Zero Linden: so say
[9:31]  Babbage Linden: so, there is a (defacto) specification atm it seems
[9:32]  Babbage Linden: even if the ? chars are currently coming from trying to display the invalid chars
[9:32]  Babbage Linden: instead of due to the conversion
[9:32]  Zero Linden: "Converts a Base 64 encoded string into its decoded bytes, which are then interpreted as a UTF-8 string. Invalid UTF-8 sequences in this byte stream will be replaced with question marks"
[9:32]  Simil Miles: Hi all
[9:32]  Phantom Ninetails: Greetings
[9:32]  Zero Linden: actually, I think Unicode has a "this was a conversion error" character
[9:32]  Babbage Linden: (most browsers do that)
[9:33]  Babbage Linden: replace borked chars with "question mark in a diamond"
[9:33]  Zero Linden: thats the Unicode thingy
[9:33]  Babbage Linden: emulating that was scouse's favoured approach
[9:34]  Babbage Linden: given that we don't have good ways of reporting errors, i think it may be the best way forward
[9:34]  Zero Linden: I think that is reasonable
[9:34]  Babbage Linden: returning "" or setting an error and stopping the script seem horrible in terms of compatibility
[9:35]  Zero Linden: and the right thing to do in that HTTP case - the server says explicitly that the result is UTF-8 (I checked with Curl)
[9:35]  Zero Linden: and so we have no choice but to believe it ---
[9:35]  Zero Linden: so - if it fails conversion - replace them characters
[9:35]  Zero Linden: rather, them bytes
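The agreed behaviour for the HTTP case (believe the declared UTF-8, and replace the bytes that fail conversion) can be sketched with Python's built-in `errors="replace"` decoding, which substitutes U+FFFD REPLACEMENT CHARACTER. This only stands in for whatever Mono/glib mechanism is eventually used, and the function name is hypothetical. The number of U+FFFD characters emitted for a malformed run depends on the decoder's replacement policy, so other implementations may differ.

```python
REPLACEMENT = "\ufffd"


def decode_declared_utf8(body: bytes) -> str:
    """Decode a body declared as UTF-8, replacing invalid bytes with U+FFFD."""
    return body.decode("utf-8", errors="replace")


text = decode_declared_utf8(b"citt\xe0 e caf\xc3\xa9")
print(text)                     # the stray 0xE0 becomes U+FFFD; "café" is intact
print(text.count(REPLACEMENT))
```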
[9:35]  Babbage Linden: great
[9:35]  Babbage Linden: so now we just need a gadget to do the replacement
[9:35]  Babbage Linden: we'll have a look in glib
[9:35]  Zero Linden: what is the mono method that interprets a vector of bytes into a string?
[9:36]  Babbage Linden: it's just mono_string_new
[9:36]  Babbage Linden: which calls the glib UTF8 to UTF16 function under the hood
[9:37]  Zero Linden: ah, and that function doesn't have a "what to do with bad encodings" option?
[9:37]  Babbage Linden: no, it just reports an error if the stream is invalid UTF8
[9:38]  Babbage Linden: (so we need to massage the vector of bytes beforehand)
[9:38]  Zero Linden: U+FFFD
[9:38]  Zero Linden: That's easy - I can write that with you if you need
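Since the strict conversion behind mono_string_new reportedly just fails on invalid input, the bytes need massaging first. Here is a hand-rolled Python sketch of such a pre-pass; `sanitize_utf8` is a hypothetical name, and it is not the glib or iconv routine mentioned in the log. It validates each sequence against the RFC 3629 byte ranges (rejecting overlong forms, UTF-16 surrogates and code points above U+10FFFF) and uses a deliberately simple policy: one U+FFFD per byte that cannot start a valid sequence, which may emit more replacement characters than other decoders do for the same input. The output is always valid UTF-8, so a strict converter will accept it.

```python
REPLACEMENT_UTF8 = b"\xef\xbf\xbd"  # U+FFFD encoded as UTF-8


def _sequence_length(data: bytes, i: int) -> int:
    """Length of the valid UTF-8 sequence starting at data[i], or 0 if invalid."""
    lead = data[i]
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        length, lo, hi = 2, 0x80, 0xBF
    elif lead == 0xE0:
        length, lo, hi = 3, 0xA0, 0xBF  # excludes overlong 3-byte forms
    elif 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        length, lo, hi = 3, 0x80, 0xBF
    elif lead == 0xED:
        length, lo, hi = 3, 0x80, 0x9F  # excludes surrogates U+D800..U+DFFF
    elif lead == 0xF0:
        length, lo, hi = 4, 0x90, 0xBF  # excludes overlong 4-byte forms
    elif 0xF1 <= lead <= 0xF3:
        length, lo, hi = 4, 0x80, 0xBF
    elif lead == 0xF4:
        length, lo, hi = 4, 0x80, 0x8F  # caps code points at U+10FFFF
    else:
        return 0  # 0x80..0xC1 and 0xF5..0xFF never start a valid sequence
    if i + length > len(data):
        return 0  # truncated sequence
    # The second byte has a lead-specific range; later bytes are plain continuations.
    if not lo <= data[i + 1] <= hi:
        return 0
    for k in range(i + 2, i + length):
        if not 0x80 <= data[k] <= 0xBF:
            return 0
    return length


def sanitize_utf8(data: bytes) -> bytes:
    """Return valid UTF-8, replacing each byte that cannot start a valid sequence."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = _sequence_length(data, i)
        if n:
            out += data[i:i + n]  # copy the valid sequence unchanged
            i += n
        else:
            out += REPLACEMENT_UTF8  # replace the offending byte and resync
            i += 1
    return bytes(out)


if __name__ == "__main__":
    clean = sanitize_utf8(b"citt\xe0 \xed\xa0\x80 ok")
    print(clean.decode("utf-8"))  # a strict decode now succeeds
```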
[9:38]  Scouse Linden: �
[9:38]  Zero Linden: I have UTF-8 encoding in my blood
[9:38]  Babbage Linden: ok, cool
[9:38]  Zero Linden: �
[9:38]  Babbage Linden: we'll have a sniff around
[9:39]  Zero Linden: bizarre that on the Mac that doesn't print correctly
[9:39]  Scouse Linden: There are glib functions apparently which do the stuff we need
[9:39]  Scouse Linden: according to the folks in the mono channel
[9:39]  Babbage Linden: and if we can't find anything suitable we'll take you up on the offer
[9:39]  Babbage Linden: can we use the httprequest conversion doodad?
[9:39]  Zero Linden: that uses APR, which is a wrapper on iconv
[9:39]  Babbage Linden: "convert potentially broken utf8 into mostly fixed utf8"
[9:39]  Babbage Linden: that's the thingy
[9:40]  Babbage Linden: i'll have a look in to iconv too
[9:40]  Zero Linden: hmmm... I notice that we ask to do the transcode no matter what the charset
[9:40]  Zero Linden: so we ask to transcode from utf-8 to utf-8!
[9:40]  Zero Linden: which is good!
[9:41]  Babbage Linden: does that behaviour sound reasonable to everyone?
[9:41]  Babbage Linden: (you should only notice if you do something other than displaying the strings)
[9:41]  Scouse Linden: yes
[9:41]  Babbage Linden: (like emailing them or HTTP requesting them)
[9:42]  Babbage Linden: silence == consent
[9:42]  Zero Linden: well - it will cause llBase64Encode(llBase64Decode(s)) to not be the identity function
[9:43]  Zero Linden: but I think this is reasonable
[9:43]  Zero Linden: the other option, the BinaryString class option
[9:43]  Zero Linden: would preserve this
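Zero's caveat can be demonstrated in a few lines. The sketch uses Python's base64 module as a stand-in for the LSL Base64 functions (the helper names are hypothetical) and assumes the decoded bytes are repaired with U+FFFD replacement, as agreed earlier in the log. Input whose decoded bytes are valid UTF-8 survives the round trip; input that decodes to invalid UTF-8 comes back as different Base64.

```python
import base64


def decode_to_text(b64: str) -> str:
    """Stand-in for a Base64-to-string call that repairs invalid UTF-8 with U+FFFD."""
    return base64.b64decode(b64).decode("utf-8", errors="replace")


def encode_text(text: str) -> str:
    """Stand-in for the string-to-Base64 direction (text encoded as UTF-8)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


for s in ("aGVsbG8=", "/wA="):  # "hello" (valid UTF-8) and the bytes FF 00 (invalid)
    print(s, "->", encode_text(decode_to_text(s)))
```

Here "/wA=" decodes to the bytes FF 00; the 0xFF is replaced by U+FFFD (EF BF BD in UTF-8), so the re-encoded result is "77+9AA==" rather than the original input.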
[9:43]  Scouse Linden: We might have to
[9:43]  Phantom Ninetails: lol, well I've never used any of the functions being discussed here so.. But yeah seems like a good idea
[9:45]  Babbage Linden doesn't like the idea of defining a new, non standard string class to support that corner case
[9:45]  Zero Linden: I think it might be okay to break llBase64Decode in that way
[9:45]  Zero Linden: if you generate a non-UTF-8 string that way now, almost nothing else in the LSL library will work right
[9:45]  Babbage Linden: you probably didn't mean to do it if you do now
[9:46]  Babbage Linden is tempted to go ahead with this plan and deal with the screaming later
[9:47]  Phantom Ninetails: lol
[9:47]  Babbage Linden: ok, so that's enough of that i think
[9:47]  Babbage Linden: thanks for coming along zero
[9:47]  Scouse Linden: yep
[9:47]  Babbage Linden: has anyone got anything else?
[9:47]  Zero Linden: welcome
[9:47]  Siann Beck: Any word on the db update?
[9:48]  Babbage Linden: peri?
[9:48]  Babbage Linden: do you know about that?
[9:49]  Scouse Linden: So that means we don't know
[9:49]  Babbage Linden: joel was leading the charge on the agni db update
[9:49]  Siann Beck: Last Friday Peri said he was going to talk to the Havok4 team about it; I just wondered if there was any news.
[9:49]  Babbage Linden: i'll ask him about that
[9:49]  Siann Beck: OK
[9:50]  Periapse Linden: Sorry, I was in IM.
[9:50]  Scouse Linden: did you get the question?
[9:50]  Periapse Linden: No, I haven't talked to Sidewinder yet. I'll ping him now. But I don't think the update is likely, as havok4 is trying hard to get off of aditi and onto the main grid
[9:51]  Siann Beck: OK
[9:51]  Siann Beck: And when can we expect a Mono refresh?
[9:51]  Babbage Linden: tomorrow hopefully
[9:51]  Siann Beck: Cool
[9:51]  Scouse Linden: Tuesday at the latest
[9:51]  Babbage Linden: (us brits are on holiday on friday and monday, so we want to give you some new code before we push off)
[9:51]  Scouse Linden: (if for some reason tomorrow's fails)
[9:52]  Siann Beck: Ah, I see. What holiday is it?
[9:52]  Scouse Linden: easter
[9:52]  Babbage Linden: easter friday and monday
[9:52]  Siann Beck: Oh, duh! I knew that :)
[9:52]  Phantom Ninetails: I've got easter coming too :P
[9:52]  Babbage Linden: so, peri may be running an office hours on friday
[9:52]  Periapse Linden: I will run the office hour on Friday as normal
[9:52]  Babbage Linden: from the land of state/religion separation
[9:52]  Zero Linden: oy - so apr-iconv is a thin wrapper around only 3/4 of iconv.
[9:53]  Zero Linden: and iconv the lib has "transliterate"
[9:53]  Zero Linden: as an option
[9:53]  Zero Linden: which we can get by converting to "UTF-8//TRANSLIT"
[9:53]  Zero Linden: which we can sneak past APR's call I bet
[9:53]  Zero Linden: BUT
[9:53]  Zero Linden: the command line iconv has exactly the feature we want
[9:54]  Zero Linden: the question is - how does it do it...
[9:54]  Babbage Linden wonders whether we can call iconv directly
[9:55]  Babbage Linden wonders whether we do elsewhere already
[9:56]  Babbage Linden: ok, are we all done?
[9:57]  Babbage Linden: anything else?
[9:57]  Zero Linden: may not need to
[9:57]  Siann Beck: Well, I don't know what I missed, so I'll just wait for the transcript.
[9:57]  Phantom Ninetails: Eerie silence. I guess it's just about over
[9:57]  Zero Linden: the one non-wrapped call isn't needed, now I see
[9:57]  Periapse Linden: Siann, I just talked to sidewinder
[9:58]  Periapse Linden: He canceled the refresh because the beta grid would have to be down for 48 hours
[9:58]  Siann Beck: I see.
[9:58]  Periapse Linden: So I think we'll do it when havok4 leaves the grid
[9:58]  Phantom Ninetails: I find it odd that I became able to come on
[9:58]  Siann Beck: OK. Well, the object I need to test doesn't really have anything extraordinary, so it's probably fine.
[9:59]  Periapse Linden: But we can refresh specific people, so as to allow people to log on to aditi
[9:59]  Phantom Ninetails: Ah
[9:59]  Babbage Linden: i'll do the chat log peri
[9:59]  Phantom Ninetails: That'd probably explain it then :)
[9:59]  Siann Beck: I wondered about that, since I've seen people here who rezzed after the last update.
[9:59]  Periapse Linden: Thanks, Babbage.