Mono/2008-03-19

From Second Life Wiki
[8:59]  Scouse Linden: babbage is on his way
[8:59]  Babbage Linden: hello
[8:59]  Phantom Ninetails: Oh wow there he is
[8:59]  Entering god mode, level 200
[8:59]  Scouse Linden: Its only 8:59 :)
[8:59]  Phantom Ninetails: Well I was talking with Periapse a little while ago and that's what we figured
[9:00]  You shout: i've turned off scripts while we hold the office hours
[9:00]  Phantom Ninetails: The wiki says it starts at 8
[9:00]  Scouse Linden: oh damn daylight savings.....
[9:00]  Scouse Linden: It hasnt changed for us
[9:00]  Phantom Ninetails: Ah ^_^
[9:01]  Babbage Linden: let me ping zero, we wanted to talk to him about httprequest and unicode
[9:01]  Phantom Ninetails: :)
[9:01]  Scouse Linden: Since no one else is around, if you've got anything phantom go ahead and ask
[9:01]  Phantom Ninetails: Nope I've got nothing
[9:02]  Scouse Linden: I guess others probably turned up and left by now
[9:02]  Babbage Linden: yep, let me ping the group
[9:02]  Phantom Ninetails: Actually not many showed up at 8 either ;)
[9:02]  Scouse Linden: yeh it has slowed down
[9:02]  Scouse Linden: I bet it wouldnt be the same if we did it on the main grid
[9:02]  Phantom Ninetails: Only Periapse Linden and Becky Pippen showed up before
[9:02]  Phantom Ninetails: lol, yeah you might get more there
[9:03]  Babbage Linden: maybe we should ping the group on aditi too?
[9:04]  Phantom Ninetails: Couldn't hurt
[9:04]  Scouse Linden: Go ahead!
[9:04]  Scouse Linden: What things are the most important to you to get fixed next?
[9:05]  Phantom Ninetails: Control event.
[9:05]  Scouse Linden: Fixed it today
[9:05]  Phantom Ninetails: Yay :)
[9:05]  Scouse Linden: In fact I was writing tests for events
[9:05]  Scouse Linden: I'll put them on the wiki
[9:05]  Scouse Linden: It will be fixed in the next refresh
[9:05]  Phantom Ninetails: Alrighty :)
[9:07]  Babbage Linden: we've uncovered some weird lsl2 behaviour recently
[9:07]  Phantom Ninetails: Weird?
[9:07]  Babbage Linden: see the language test 2
[9:07]  Babbage Linden: we added tests for while, for and do while
[9:07]  Babbage Linden: a couple seemed odd in LSL2
[9:08]  Scouse Linden: specifically if ([1]) is false, if ([1,2]) is true
[9:08]  Phantom Ninetails: That does indeed seem weird
[9:08]  Babbage Linden: it's one of the cases in which we wonder whether being compatible is the correct approach
[9:09]  Phantom Ninetails: Heh
[9:09]  Babbage Linden: we probably should be, but it seems very odd
[9:10]  Phantom Ninetails: Yeah.. That certainly looks like a difficult choice
[9:10]  Babbage Linden: the other case is where LSL2 allows you to have some control flows return values, but not others
[9:10]  Babbage Linden: that can be caught by the compiler
[9:10]  Babbage Linden: and you can always fix it
[9:10]  Babbage Linden: so it's less bad
[9:10]  Phantom Ninetails: SVC-1421?
[9:11]  Babbage Linden: you can't catch the if(list) at compile time easily
[9:11]  Scouse Linden: https://wiki.secondlife.com/wiki/Event_test_script
[9:11]  Babbage Linden: hi zero
[9:11]  Phantom Ninetails: Greetings Zero
[9:11]  Babbage Linden: welcome
[9:11]  Zero Linden: oh - we're in text
[9:11]  Babbage Linden: yes
[9:11]  Babbage Linden: maybe something about scripting? ;-)
[9:11]  Zero Linden: scripting - ah yes - I remember scripting while out on the Serengeti!
[9:12]  Phantom Ninetails: Welcome back Periapse :)
[9:12]  Babbage Linden: we're communicating in text as it's the medium of code
[9:12]  Babbage Linden: hi peri
[9:12]  Periapse Linden: lol -- hello again phantom
[9:12]  Phantom Ninetails: :)
[9:12]  Babbage Linden: so, the issue we'd like to discuss with zero
[9:12]  Babbage Linden: (and we'd welcome input from you too phantom)
[9:12]  Phantom Ninetails: ^_^
[9:12]  Babbage Linden: is a problem with llHTTPRequest specifically
[9:13]  Babbage Linden: and UTF8 handling in general
[9:13]  Periapse Linden: zero -- babbage turned off scripts, so the chair will be wonky
[9:13]  Babbage Linden: Mono uses UTF16 to represent strings internally
[9:13]  Zero Linden: yes it will!
[9:13]  Babbage Linden: so, we need to convert all UTF8 strings to UTF16 at some point
[9:13]  Babbage Linden: if the UTF8 is invalid, that conversion fails
[9:14]  Zero Linden: well - what do we do with LL strings now?
[9:14]  Babbage Linden: and by default Mono returns NULL strings
[9:14]  Zero Linden: Ah -
[9:14]  Babbage Linden: at the moment LSL will accept borked UTF8
[9:14]  Babbage Linden: and just pass it around
[9:14]  Zero Linden: hmmm.... so the first question to ask is if mono handles the Astral Planes correctly
[9:14]  Zero Linden: loves Unicode terminology
[9:14]  Babbage Linden: and render what it can when you try to output it
[9:14]  Scouse Linden: Some web servers give us back latin 1 despite asking for utf-8
[9:15]  Babbage Linden: yes, the bug we found
[9:15]  Babbage Linden: was a server reporting UTF-8
[9:15]  Zero Linden: Scouse - no! really, can you point me to such a site?
[9:15]  Scouse Linden: Mono refuses to parse latin 1 as utf-8
[9:15]  Babbage Linden: but returning latin-1
[9:15]  Zero Linden: well, Latin-1 isn't UTF-8 and
[9:15]  Zero Linden: and can't be parsed as such at all
[9:15]  Zero Linden: it is invalid
[9:15]  Scouse Linden: exactly
[9:15]  Babbage Linden: so, the llHTTPRequest gets a null conversion
[9:15]  Babbage Linden: doesn't touch the latin 1
[9:15]  Scouse Linden: (under mono)
[9:15]  Zero Linden: so, we check, in llHTTPRequest, what the returned charset is and check that
[9:16]  Babbage Linden: then Mono can't convert it to UTF16 and so fails
[9:16]  Zero Linden: Wait, - is there a case where the site is giving us UTF-8 , claiming it is UTF-8, but isn't actually UTF-8?
[9:16]  Scouse Linden: yes
[9:16]  Babbage Linden: it claims it's giving us UTF8
[9:16]  Babbage Linden: but then gives us latin 1
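The failure described above can be reproduced outside Second Life. The following is a minimal Python sketch, not the simulator's code: the helper name `decode_declared_utf8` and the sample text are illustrative assumptions, and returning `None` merely stands in for the NULL string that the chat says Mono returns.

```python
def decode_declared_utf8(body: bytes):
    """Strictly decode bytes that a server labelled as UTF-8.

    Returns the decoded text, or None when the bytes are not valid UTF-8
    (standing in for the NULL string described in the chat).
    """
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Hypothetical response bodies: the same word in two encodings.
utf8_body = "perché".encode("utf-8")      # really UTF-8
latin1_body = "perché".encode("latin-1")  # Latin-1 mislabelled as UTF-8

print(decode_declared_utf8(utf8_body))    # decodes normally
print(decode_declared_utf8(latin1_body))  # strict decoding is rejected
```

A lone Latin-1 accented byte looks like the start of a multi-byte UTF-8 sequence with no valid continuation, which is why a strict decoder rejects it.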
[9:17]  Zero Linden: well then - we *should* treat that string as empty
[9:17]  Babbage Linden: but, with LSL2, we currently don't
[9:17]  Zero Linden: there is nothing you can do
[9:17]  Zero Linden: really?
[9:18]  Scouse Linden: Browsers typically replace the text with �
[9:18]  Babbage Linden: so, we can fix llHTTPRequest
[9:18]  Babbage Linden: to reject bad UTF8 after conversion
[9:18]  Babbage Linden: but, there are other ways to give borked UTF8 to LSL
[9:18]  Zero Linden: I'm checking the source - does it not validate if the result claims to be UTF8?
[9:18]  Babbage Linden: and we currently just pass it around and then throw it at the screen
[9:19]  Babbage Linden can't remember
[9:19]  Zero Linden: which won't render correctly
[9:19]  Babbage Linden: it was a while ago
[9:19]  Scouse Linden: http://www.salahzar.info/lsl/feedburner.php?feed=http://www.beppegrillo.it/index.xml the URL, if you change the character set it displays the characters correctly
[9:19]  Zero Linden: there is no fallback to Latin-1
[9:19]  Scouse Linden: I personally think we should try to do like the browsers and mark invalid characters
[9:19]  Babbage Linden: so, should we try to work around broken UTF8
[9:20]  Babbage Linden: try to fix it so it turns in to some reasonable UTF16 when passed on Mono
[9:20]  Zero Linden: okay, so first off
[9:20]  Babbage Linden: or, just sulk and replace strings that return NULL when passed to Mono
[9:20]  Babbage Linden: with some "Invalid UTF8" string?
[9:20]  Babbage Linden: (which would be a change in behaviour to LSL2)
[9:21]  Zero Linden: well, first off, in that case we should not support that feed
[9:21]  Zero Linden: it is broken according to at least three RFCs!
[9:21]  Babbage Linden: ok, so that is easy for the llHTTPRequest case
[9:21]  Babbage Linden: try to validate the UTF8
[9:22]  Babbage Linden: convert the status to 40X
[9:22]  Zero Linden: Now, when Doug did the UTF8 conversion to LSL - he intended that non-UTF-8 strings would be considered ILLEGAL
[9:22]  Babbage Linden: that is not currently the case
[9:22]  Zero Linden: and I don't know if other functions will do bad things
[9:22]  Babbage Linden: the other one we found that could generate borked UTF8
[9:22]  Babbage Linden: was llBase64ToString
[9:23]  Zero Linden: *THAT* function is EVIL
[9:23]  Zero Linden: :-)
[9:23]  Zero Linden: So - let's look at the idea of "fixing" it
[9:23]  Babbage Linden: that has no out of band error
[9:23]  Babbage Linden: no status we can set
[9:23]  Zero Linden: so - you have an LSL string with arbitrary binary data in it, say from llBase64ToString
[9:23]  Babbage Linden: our only option is to return some "special string"
[9:23]  Zero Linden: how would we fix it?
[9:23]  Babbage Linden: or cause a fault in the script
[9:23]  Babbage Linden: (LSL error semantics are horrible)
[9:24]  Zero Linden: so that it can become a valid Unicode string, that Mono can deal with as UTF-16?
[9:24]  Babbage Linden: so, we ask Mono for the UTF16 string representation
[9:24]  Babbage Linden: check the return
[9:24]  Babbage Linden: if it's NULL
[9:24]  Zero Linden: You can't really, since you don't know the encoding.....
[9:24]  Zero Linden: BUT
[9:24]  Babbage Linden: use some error string
[9:25]  Zero Linden: you could pick some arbitrary 8-bit character set which has all 256 characters legal, and which 'round trips into and out of Unicode
[9:25]  Babbage Linden: (if it was C# we could throw an exception)
[9:25]  Zero Linden: and just DECLARE it to be in that character set
[9:26]  Zero Linden: That will work, but will be unreliable....
[9:27]  Zero Linden: In otherwords - if the llBase64Decode
[9:27]  Zero Linden: results in a sequence that happens to be UTF-8 valid, then you get the sequence of bytes you decoded
[9:27]  Zero Linden: if it doesn't, then you get re-interpretation in another charset, which results in a UTF-8/UTF-16 string that represents
[9:27]  Zero Linden: a different sequence of bytes
[9:28]  Zero Linden: so - you could do that, but it would break llBase64Decode anyway
[9:28]  Zero Linden: So, I don't see a "fall back" position that works for llBase64Decode
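The fallback Zero describes, and why it is unreliable, can be sketched in Python. This is an illustration under assumptions: Latin-1 is chosen here as the "arbitrary 8-bit character set" because every byte value maps to a code point, and the function name and sample bytes are hypothetical.

```python
def decode_with_latin1_fallback(raw: bytes) -> str:
    """Decode as UTF-8 when valid, otherwise reinterpret as Latin-1.

    Latin-1 assigns a character to all 256 byte values, so the fallback
    never fails; it just may not mean what the sender intended.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


valid = "é".encode("utf-8")   # b'\xc3\xa9', valid UTF-8
invalid = b"\xff\xfe\x41"     # not valid UTF-8

text_a = decode_with_latin1_fallback(valid)
text_b = decode_with_latin1_fallback(invalid)

# Valid input survives a trip back to UTF-8 bytes unchanged...
print(text_a.encode("utf-8") == valid)
# ...but the reinterpreted input comes back as a different byte sequence.
print(text_b.encode("utf-8") == invalid)
```

Which path a given input takes depends only on whether its bytes happen to form valid UTF-8, so a caller cannot tell from the resulting string whether the original bytes were preserved.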
[9:28]  Babbage Linden: no
[9:28]  Scouse Linden: Its pretty evil
[9:28]  Babbage Linden: so, in general
[9:29]  Babbage Linden: the goal was to disallow invalid UTF8 in LSL anyway
[9:29]  Zero Linden: Well - we should state (even if we don't) that the Base64 functions ONLY work on Unicode text
[9:29]  Zero Linden: not arbitrary binary data
[9:29]  Babbage Linden: so if we start converting invalid strings to error strings
[9:29]  Zero Linden: now
[9:29]  Babbage Linden: that may actually be better than what we have
[9:29]  Zero Linden: you could create a class of objects, "binary string"
[9:30]  Babbage Linden: (although it's different behaviour)
[9:30]  Zero Linden: well - you could get away with it perhaps, with these semantics
[9:30]  Zero Linden: all the string functions
[9:30]  Zero Linden: on binary strings produce binary strings
[9:30]  Babbage Linden: " Converts a Base 64 string to a conventional string. If the conversion creates any unprintable characters, they are converted to question marks"
[9:30]  Zero Linden: but as soon as the binary string is mixed with a normal string
[9:30]  Babbage Linden: from the LSL wiki
[9:31]  Zero Linden: the binary string is interpreted as UTF-8 and if invalid is ""
[9:31]  Zero Linden: well, Babbage, "unprintable" in what charset? what charset are you considering the byte stream of the conversion to be in
[9:31]  Zero Linden: UTF-8?
[9:31]  Zero Linden: so say
[9:31]  Babbage Linden: so, there is a (defacto) specification atm it seems
[9:32]  Babbage Linden: even if the ? chars are currently coming from trying to display the invalid chars
[9:32]  Babbage Linden: instead of due to the conversion
[9:32]  Zero Linden: "Converts a Base 64 encoded string into its decoded bytes, which are then interpreted as a UTF-8 string. Invalid UTF-8 sequences in this byte stream will be replaced with question marks"
[9:32]  Simil Miles: Hi all
[9:32]  Phantom Ninetails: Greetings
[9:32]  Zero Linden: actually, I think Unicode has a "this was a conversion error" character
[9:32]  Babbage Linden: (most browsers do that)
[9:33]  Babbage Linden: replace borked chars with "question mark in a diamond"
[9:33]  Zero Linden: thats the Unicode thingy
[9:33]  Babbage Linden: emulating that was scouse's favoured approach
[9:34]  Babbage Linden: given that we don't have good ways of reporting errors, i think it may be the best way forward
[9:34]  Zero Linden: I think that is reasonable
[9:34]  Babbage Linden: returning "" or setting an error and stopping the script seem horrible in terms of compatibility
[9:35]  Zero Linden: and the right thing to do in that HTTP case - the server says explicitly that the result is UTF-8 (I checked with Curl)
[9:35]  Zero Linden: and so we have no choice but to believe it ---
[9:35]  Zero Linden: so - if it fails conversion - replace them characters
[9:35]  Zero Linden: rather, them bytes
[9:35]  Babbage Linden: great
[9:35]  Babbage Linden: so now we just need a gadget to do the replacement
[9:35]  Babbage Linden: we'll have a look in glib
[9:35]  Zero Linden: what is the mono method that interprets a vector of bytes into a string?
[9:36]  Babbage Linden: it's just mono_string_new
[9:36]  Babbage Linden: which calls the glib UTF8 to UTF16 function under the hood
[9:37]  Zero Linden: ah, and that function doesn't have a "what to do with bad encodings" option?
[9:37]  Babbage Linden: no, it just reports an error if the stream is invalid UTF8
[9:38]  Babbage Linden: (so we need to massage the vector of bytes before hand)
[9:38]  Zero Linden: U+FFFD
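The "massage the vector of bytes beforehand" step can be sketched as follows. This is a Python illustration rather than the glib or iconv code discussed in the chat; the function name `sanitize_utf8` is hypothetical, and it relies on Python's built-in `errors="replace"` decoding, which substitutes U+FFFD for byte sequences it cannot decode.

```python
REPLACEMENT = "\ufffd"  # Unicode REPLACEMENT CHARACTER


def sanitize_utf8(raw: bytes) -> bytes:
    """Return valid UTF-8 bytes, with undecodable input replaced by U+FFFD."""
    text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


dirty = b"caf\xe9 ok"         # Latin-1 'é' in a stream labelled UTF-8
clean = sanitize_utf8(dirty)

print(clean.decode("utf-8"))  # strict decoding now succeeds
```

Already valid input passes through unchanged, so a step like this could sit in front of a strict converter without affecting well-formed strings.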
[9:38]  Zero Linden: That's easy - I can write that with you if you need
[9:38]  Scouse Linden: �
[9:38]  Zero Linden: I have UTf-8 encoding in my blood
[9:38]  Babbage Linden: ok, cool
[9:38]  Zero Linden: �
[9:38]  Babbage Linden: we'll have a sniff around
[9:39]  Zero Linden: bizarre that on the Mac that doesn't print correctly
[9:39]  Scouse Linden: There are glib functions apparently which do the stuff we need
[9:39]  Scouse Linden: according to the folks in the mono channel
[9:39]  Babbage Linden: and if we can't find anything suitable we'll take you up on the offer
[9:39]  Babbage Linden: can we use the httprequest conversion doodad?
[9:39]  Zero Linden: that uses APR, which is a wrapper on iconv
[9:39]  Babbage Linden: "convert potentially borken utf8 in to mostly fixed utf8"
[9:39]  Babbage Linden: that's the thingy
[9:40]  Babbage Linden: i'll have a look in to iconv too
[9:40]  Zero Linden: hmmm... I notice that we ask to do the transcode no matter what the charset
[9:40]  Zero Linden: so we ask to transcode from utf-8 to utf-8!
[9:40]  Zero Linden: which is good!
[9:41]  Babbage Linden: does that behaviour sound reasonable to everyone?
[9:41]  Babbage Linden: (you should only notice if you do something other than displaying the strings)
[9:41]  Scouse Linden: yes
[9:41]  Babbage Linden: (like emailing them or HTTP requesting them)
[9:42]  Babbage Linden: silence == consent
[9:42]  Zero Linden: well - it will cause llBase64Encode(llBase64Decode(s)) to not be the identity function
[9:43]  Zero Linden: but I think this is reasonable
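The loss of the round-trip property can be shown with a short Python sketch. The two helpers below only imitate the proposed semantics (decode Base64, interpret as UTF-8, replace invalid bytes with U+FFFD); they are assumptions for illustration, not the LSL functions themselves.

```python
import base64


def base64_to_string(b64: str) -> str:
    """Decode Base64, then read the bytes as UTF-8 with U+FFFD replacement."""
    raw = base64.b64decode(b64)
    return raw.decode("utf-8", errors="replace")


def string_to_base64(text: str) -> str:
    """Encode a string's UTF-8 bytes as Base64."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


text_b64 = string_to_base64("hello")                            # Unicode text
binary_b64 = base64.b64encode(b"\xff\x00\x80").decode("ascii")  # arbitrary bytes

round_trip_text = string_to_base64(base64_to_string(text_b64))
round_trip_binary = string_to_base64(base64_to_string(binary_b64))

print(round_trip_text == text_b64)      # text payloads survive
print(round_trip_binary == binary_b64)  # binary payloads do not
```

Payloads that were valid UTF-8 to begin with are unaffected; only Base64 data that decodes to non-UTF-8 bytes changes on the way through.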
[9:43]  Zero Linden: the other option, the BinaryString class option
[9:43]  Zero Linden: would preserve this
[9:43]  Scouse Linden: We might have to
[9:43]  Phantom Ninetails: lol, well I've never used any of the functions being discussed here so.. But yeah seems like a good idea
[9:45]  Babbage Linden doesn't like the idea of defining a new, non standard string class to support that corner case
[9:45]  Zero Linden: I think it might be okay to break llBase64Decode in that way
[9:45]  Zero Linden: if you generate a non-UTF-8 string that way now, almost nothing else in the LSL library will work right
[9:45]  Babbage Linden: you probably didn't mean to do it if you do now
[9:46]  Babbage Linden is tempted to go ahead with this plan and deal with the screaming later
[9:47]  Phantom Ninetails: lol
[9:47]  Babbage Linden: ok, so that's enough of that i think
[9:47]  Babbage Linden: thanks for coming along zero
[9:47]  Scouse Linden: yep
[9:47]  Babbage Linden: has anyone got anything else?
[9:47]  Zero Linden: welcome
[9:47]  Siann Beck: Any word on the db update?
[9:48]  Babbage Linden: peri?
[9:48]  Babbage Linden: do you know about that?
[9:49]  Scouse Linden: So that means we don't know
[9:49]  Babbage Linden: joel was leading the charge on the agni db update
[9:49]  Siann Beck: Last Friday Peri said he was going to talk to the Havok4 team about it; I just wondered if there was any news.
[9:49]  Babbage Linden: i'll ask him about that
[9:49]  Siann Beck: OK
[9:50]  Periapse Linden: Sorry, I was in IM.
[9:50]  Scouse Linden: did you get the question?
[9:50]  Periapse Linden: No, I haven't talked to Sidewinder yet. I'll ping him now. But I don't think the update is likely, as havok4 is trying hard to get off of aditi and onto the maingrid
[9:51]  Siann Beck: OK
[9:51]  Siann Beck: And when can we expect a Mono refresh?
[9:51]  Babbage Linden: tomorrow hopefully
[9:51]  Siann Beck: Cool
[9:51]  Scouse Linden: Tuesday at the latest
[9:51]  Babbage Linden: (us brits are on holiday on friday and monday, so we want to give you some new code before we push off)
[9:51]  Scouse Linden: (if for some reason tomorrows fails)
[9:52]  Siann Beck: Ah, I see. What holiday is it?
[9:52]  Scouse Linden: easter
[9:52]  Babbage Linden: easter friday and monday
[9:52]  Siann Beck: Oh, duh! I knew that :)
[9:52]  Phantom Ninetails: I've got easter coming too :P
[9:52]  Babbage Linden: so, peri may be running an office hours on friday
[9:52]  Periapse Linden: I will run the office hour on Friday as normal
[9:52]  Babbage Linden: from the land of state/religion separation
[9:52]  Zero Linden: oy - so apr-iconv is a thin wrapper around only 3/4 of iconv.
[9:53]  Zero Linden: and iconv the lib has "transliterate"
[9:53]  Zero Linden: as an option
[9:53]  Zero Linden: which we can get by converting to "UTF-8//TRANSLIT"
[9:53]  Zero Linden: which we can sneak past APR's call I bet
[9:53]  Zero Linden: BUT
[9:53]  Zero Linden: the command line iconv has exactly the feature we want
[9:54]  Zero Linden: the question is - how does it do it...
[9:54]  Babbage Linden wonders whether we can call iconv directly
[9:55]  Babbage Linden wonders whether we do elsewhere already
[9:56]  Babbage Linden: ok, are we all done?
[9:57]  Babbage Linden: anything else?
[9:57]  Zero Linden: may not need to
[9:57]  Siann Beck: Well, I don't know what I missed, so I'll just wait for the transcript.
[9:57]  Phantom Ninetails: Eerie silence. I guess it's just about over
[9:57]  Zero Linden: the one non-wrapped call isn't needed, now I see
[9:57]  Periapse Linden: Siann, I just talked to sidewinder
[9:58]  Periapse Linden: He canceled the refresh because the beta grid would have to be down for 48 hours
[9:58]  Siann Beck: I see.
[9:58]  Periapse Linden: So I think we'll do it when havok4 leaves the grid
[9:58]  Phantom Ninetails: I find it odd that I became able to come on
[9:58]  Siann Beck: OK. Well, the object I need to test doesn't really have anything extraordinary, so it's probably fine.
[9:59]  Periapse Linden: But we can refresh specific people, so as to allow people to log on to aditi
[9:59]  Phantom Ninetails: Ah
[9:59]  Babbage Linden: i'll do the chat log peri
[9:59]  Phantom Ninetails: That'd probably explain it then :)
[9:59]  Siann Beck: I wondered about that, since I've seen people here who rezzed after the last update.
[9:59]  Periapse Linden: Thanks, Babbage.