Open Source Meeting/2010-01-29

From Second Life Wiki
Jump to: navigation, search

Transcript

[#SNOW-431] Boost downloads for Snowglobe 1.3 aren't really 1.39.0

[#SNOW-435] LLTextureCache : mHeaderMutex is recursively locked sometimes + main loop time out paused when it shouldn't

  • [14:12] Merov Linden: yes, I think that, in essence, he is right
  • [14:12] Thickbrick Sleaford: depending on if that patch has a part in causing the deadlocks
  • [14:13] Merov Linden: yes but, as you know, this patch was added to prevent other problems
  • [14:13] Merov Linden: namely: unguarded access to shared data
  • [14:13] Merov Linden: so it seems that, before that patch, unguarded access was the norm and that (mostly by luck), it worked
  • [14:14] Thickbrick Sleaford: recursive locks will make the pattern of unlock; call a locking method; lock; pattern less problematic - but it's asking for other problems
  • [14:14] Merov Linden: now that we lock mutexes when we should, we get dead locking
  • [14:14] Mojito Sorbet: The clean way out is don't share the data
  • [14:15] Merov Linden: we don't really share except the worker queue
  • [14:15] Merov Linden: we use an implementation of worker threads basically
  • [14:15] Robin Cornelius: its decode workers that wing a hit with a network packet update for a texture with non zero discard that are the killers
  • [14:16] Merov Linden: could you elabrate Robin?
  • [14:16] Robin Cornelius: i can but it does not help the issue ;-). ...
  • [14:16] Merov Linden: I didn't think the decode thread was the problem
  • [14:16] Robin Cornelius: one of them is a decode callback
  • [14:16] Robin Cornelius: one of them is typicaly a network packet update
  • [14:17] Robin Cornelius: and there is a readcache responder as well thrown in there
  • [14:17] Robin Cornelius: basicly there are two critical mutexes here and they are locked in opposite orders in some code paths wrt the main thread and a worker thread doing *something*
  • [14:18] Robin Cornelius: and it does not appear to be trival to fix the order of either of them
  • [14:18] Merov Linden: hmmm...
  • [14:18] Morgaine Dinova: Welcome to the obnoxious world of threads and shared memory concurrency. It's not for no reason that I've always been proposing a disjoint processes model for the viewer.
  • [14:19] Merov Linden: spawn processes everywhere?
  • [14:19] Robin Cornelius: even with disjointed processes you still need IPC and that requires sharing some data somewhere to do anything useful and that requires locking
  • [14:19] Mojito Sorbet: Another way out is a big moby mutex that locks everything. This was used in TOPS-20 once, when the developers could not get the file system to work.
  • [14:20] Morgaine Dinova: You're doing something similar to that model with your plugin-api. There's lots of variants, but they all avoid the basic problem.
  • [14:20] Merov Linden: yeah: add an item to the queue, take an item off the queue
  • [14:20] Mojito Sorbet: As long as the queued work items are isolated, there should not be a problem,
  • [14:20] Thickbrick Sleaford: Robin: I got the impression that in most of LLTextureFetch mQueueMutes is locked before mNetworkQueueMutex. Which parts don't follow that?
  • [14:21] Morgaine Dinova: Robin: you don't need locking at all if you structure it right. But that's not a quick hack.
  • [14:21] Robin Cornelius: so doWork() locks the mWorkMutex and later the mQueueMutex and main thread locks the mQueueMutex and later the work mutex
  • [14:21] Mojito Sorbet: I think they need a quick hack for now, and the big fix in 2.0
  • [14:21] Morgaine Dinova: Agreed
  • [14:22] Pixel Gausman: Merov: i wanted to add SNOW-464 to the end of the list of bugs to discuss.
  • [14:22] JIRA helper: http://jira.secondlife.com/browse/SNOW-464

[#SNOW-464] Small texture issue on OpenSim

  • [14:22] Merov Linden: well, we have 3 solutions for a "quick fix":
  • [14:23] Merov Linden: 1. roll back all the lltexturefetch "fixes" we did in 1.3 (we did 2 actually)
  • [14:23] Merov Linden: obvisouly, we didn't fix things right and 1.2 doesn't lock/freeze as much
  • [14:23] Merov Linden: 2. merge the lltexturefetch.cpp from Viewer 2
  • [14:24] Robin Cornelius: is viewer 2 fixed here?
  • [14:24] Merov Linden: the code is almost the same except for some key refactoring that does thing the way Aleric wants
  • [14:24] Merov Linden: the problem is that Viewer 2 has *also* some dead locking issues :(
  • [14:24] Robin Cornelius: meh, 3?
  • [14:25] Merov Linden: 3. don't use threading for lltexturefetch at all
  • [14:25] Pixel Gausman: how hard is number 3?
  • [14:25] Thickbrick Sleaford: 3 would be sad
  • [14:25] Pixel Gausman nods
  • [14:25] Eddi Decosta: hello all! ㋡
  • [14:26] Merov Linden: there's an argument for this when intantiating the fetch queue
  • [14:26] Robin Cornelius: 2 makes some sense, as at least we can help fight the remaining issues rather than work on obsolete code
  • [14:26] Jonathan Yap: How much longer would textures take to load for option #3
  • [14:26] Merov Linden: but I agree with Thickbrick: it's sad
  • [14:26] Merov Linden: we have threaded fetch of texture in Snowglobe since 1.0 so it's really giving up turning it off in 1.3
  • [14:27] Robin Cornelius: there is a 4th option for a bit of a hack
  • [14:27] Mojito Sorbet: The downloading part of the fetching can be easily paralleled. You are waiting for the network most of the time.
  • [14:27] Robin Cornelius: deadlocks are occuring in a few known places
  • [14:27] Merov Linden: Jonathan: hard to tell "how much longer", slower for sure
  • [14:28] Merov Linden: Mojito: "easily" is misleading here, believe me :)
  • [14:28] Robin Cornelius: so in some of the worker threads a is_locked() test could be performed, then bail that iteration if it is locked() as we know this would deadlock in reailty, so just dump the worker and let it have another go next time it gets to the queue to be processed
  • [14:28] Jonathan Yap: In my rough testing of udp vs http texture loading I noticed a 2x speedup, so maybe the user experience wouldn't be that much worse
  • [14:29] Mojito Sorbet: Robin's suggestion is widely used in database transactions
  • [14:29] Merov Linden thinks...
  • [14:29] Pixel Gausman: so one thing abt taking the fetch from Viewer 2.0, if we had that, we could be working on shaking the bugs out of it, which might help viewer 2.0, which is a good thing
  • [14:30] Merov Linden: Pixel: true
  • [14:30] Merov Linden: this is one think that makes me want go that down that road
  • [14:31] Robin Cornelius: and may be we could apply the is_locked() hack to the new code and everyone is happy ;-)
  • [14:31] Thickbrick Sleaford: how big a diff is there between the two versions of LLtextureFetch ?
  • [14:31] Merov Linden: there are however oddities that I need to surgically cut around short of merging lots and lots of Viewer 2 stuff
  • [14:31] Merov Linden: lltexturefetch.cpp diff itself is not that big
  • [14:31] Merov Linden already went through the whole thing a couple of times
  • [14:32] Pixel Gausman: Merov: maybe you just do the whole viewer 2.0 merge over the weekend. :)
  • [14:32] Merov Linden: there are some changes though that are related to refectoring in the image list
  • [14:32] Merov Linden: Pixel: thank you so much! I didn't think about that :P
  • [14:33] Thickbrick Sleaford: we already have LLMutex::isLocked()
  • [14:33] Merov Linden: so, may be I should follow that path (solution 2 - merge Viewer 2.0 fetch more completely)
  • [14:34] Robin Cornelius: s/fetch more//
  • [14:34] Pixel Gausman: Merov: only you can judge how risky that is.
  • [14:34] Jonathan Yap: It seems like that work would be done in the near future anyhow?
  • [14:34] Merov Linden: I guess I can judge the risk without making a fair patch first
  • [14:35] Merov Linden: Jonathan: I was planning to rebase Snowglobe off Viewer 2, *not* merge Viewer 2
  • [14:35] Merov Linden: it's a bit of a different mind set or stance on the whole process
  • [14:35] Mojito Sorbet: You would not want to lose anything that was in SG1 but not in Vwr2
  • [14:36] Merov Linden: but, yes. I want to get 1.3 out so that I can work on this rebasing ASAP
  • [14:36] Pixel Gausman: Merov: definitely rebase
  • [14:36] Merov Linden: Mojito: a *lot* of what we did in Snowglobe has been merged into Viewer 2.0
  • [14:37] Mojito Sorbet: Thats good
  • [14:37] Robin Cornelius: Thickbrick, did you test Alerics v2 patch on 196?
  • [14:37] Mojito Sorbet: If it was just me, Id go for the V2 rebasing as the way out
  • [14:37] Eddi Decosta: hi OO ㋡
  • [14:37] Opensource Obscure: hello hello everybody
  • [14:37] Thickbrick Sleaford: I did test aleric's patch, but only on single-core linux, where I don't get deadlocks anyway
  • [14:37] Sahkolihaa Contepomi waves.
  • [14:37] Merov Linden: well, that's a way to just "kill" 1.3...
  • [14:37] Pixel Gausman: hi OO
  • [14:38] Merov Linden: or declare it "failed" in some way
  • [14:38] Mojito Sorbet: Everyone knows you have to do it *eventually*. Why not now?
  • [14:38] Robin Cornelius: i've been getting deadlocks on my single core system so even that is not immune
  • [14:38] Robin Cornelius: (pre alerics patch)
  • [14:38] Thickbrick Sleaford: I did get deadlocks on single core windows, but not linux for some reason.
  • [14:39] Sahkolihaa Contepomi got them on his dual-core.
  • [14:39] Merov Linden: ok, wait: we need to agree on how we work on this so we have a patch toward "progress" here
  • [14:39] Thickbrick Sleaford: I think more people testing aleric's patch from -196 on various systems and reporting back would help
  • [14:39] Robin Cornelius: +1
  • [14:40] Merov Linden: +1 on testing Aleric's patch
  • [14:40] Mojito Sorbet: More dadta always helps
  • [14:40] Mojito Sorbet: data
  • [14:40] Merov Linden: I'm also going to create that lltexturefetch.cpp "Viewer 2" patch
  • [14:40] Merov Linden: we'll see how that works
  • [14:41] Pixel Gausman: Robin: we just fixed the texture bake issue i was discussing in IRC last nite. for *once* it was OpenSim, and not the viewer.
  • [14:41] Merov Linden: if all fail in, say, 1 week, we can just fall back on "kill threading for 1.3"
  • [14:41] Robin Cornelius: Pixel great
  • [14:41] Merov Linden: how does that sound?
  • [14:42] Robin Cornelius: like a plan
  • [14:42] Merov Linden: ok, thanks :)
  • [14:42] Merov Linden: now, next agenda item
  • [14:42] Merov Linden: Snowglobe 1.4: what is that?
  • [14:42] Sahkolihaa Contepomi blinks.
  • [14:43] Pixel Gausman: Is Snowglobe 1.4 a rebased viewer 2.0?
  • [14:43] Morgaine Dinova: Snowglobe with pie?
  • [14:43] Pixel Gausman: or do we not know yet?
  • [14:43] Merov Linden: ok, after 1.3, as you all know, I'll be working on Snowglobe 2.0
  • [14:43] Sahkolihaa Contepomi: That would be Snowglobe 2.0 I think. :p
  • [14:43] Sahkolihaa Contepomi: Heh, pie.
  • [14:43] Merov Linden: the rebasing of Snowglobe off Viewer 2.0
  • [14:43] Mojito Sorbet: Ok, so SG1.4 is based on Vwr 2.0?
  • [14:43] Robin Cornelius: there may be a few extra commits to trunk that could form a 1.4 if they are worth while and of cause depending on how long 2.0 takes ;-)
  • [14:43] Merov Linden: there are a bunch of strictly internal issues I need to fix LL side
  • [14:44] Merov Linden: all the export and assemblies and hg things
  • [14:44] Pixel Gausman: bidness goo
  • [14:44] Sahkolihaa Contepomi: 1.4 = test run of phonon? :p
  • [14:44] Merov Linden: before I can even get something GPL safe
  • [14:44] Morgaine Dinova: Wouldn't it make sense for Snowglobe 2.0 to be the one with Viewer2.0 rebasing?
  • [14:44] Merov Linden: that'll take me a couple of weeks where, basically, I'll go dark :/
  • [14:45] Opensource Obscure: I like The Cure
  • [14:45] Opensource Obscure: sorry.
  • [14:45] Mojito Sorbet: Having the version numbers both start with "2" helps clarify what is going on
  • [14:45] Merov Linden: Morgaine: I don't understand your question
  • [14:45] Pixel Gausman imagines Merov locked in an office with linden feeding pizza under the door
  • [14:45] Merov Linden: Pixel: that is *exactly* the way they're describing it around here
  • [14:46] Sahkolihaa Contepomi is suddenly reminded of an image he saw on the Red hat 9 installer stating someone having to code to receive pizza.
  • [14:46] Morgaine Dinova: Merov: just choice of numbers. I thought you were going for rebasing in a 1.* release
  • [14:46] Pixel Gausman: Merov: ah, so my bribe to lljoe worked!
  • [14:46] Merov Linden: anyway, it's not because I work on that that all work on Snowglobe 1.x branch need to stop
  • [14:46] Merov Linden: although there's much less incentive to do big rework I understand
  • [14:47] Mojito Sorbet: People can continue to try new ideas on 1.x while Merov gets 2,0 ready. Then merge them later
  • [14:47] Merov Linden: the 1.x trunk of Snowglobe will remain open for commit
  • [14:47] Merov Linden: and, also, to make a difference with the 1.3 branch builds, I had to rename it something, hence 1.4
  • [14:47] Merov Linden: Mojito: yes, they can
  • [14:48] Merov Linden: I'm all for it
  • [14:48] Pixel Gausman: Merov: but should new features or complicated code be held off on?
  • [14:48] Merov Linden: though, as I said, I can understand it's not super motivating
  • [14:48] Thickbrick Sleaford: but are you planning a 1.4 release, or is it just a branch name?
  • [14:48] Mojito Sorbet: Depends on how major of an internal reorg 2,0 is
  • [14:48] Merov Linden: Thickbrick: it's just the branch name for now
  • [14:48] Thickbrick Sleaford: ok, that makes sense
  • [14:49] Merov Linden: and a back up in case my rebasing effor flunk up a big way
  • [14:49] Mojito Sorbet: If the diff between V1 and V2 is mostly in GUI layouts, then you probably do not want to do a lot of work in that area.
  • [14:49] Merov Linden: which is always a possibility
  • [14:49] Merov Linden: Mojito: true
  • [14:50] Mojito Sorbet: But if the diff is mainly GUI, then full stema ahead on under-the-covers type work. Like tecture loading
  • [14:50] Merov Linden: also, one thing I wanted to run to you guys is the following:
  • [14:50] Merov Linden: I'm planning to do the grunt work of cherry picking all those little fixes/patches that are Snow specific and not in Viewer 2
  • [14:50] Pixel Gausman: Merov: so let's assume you are wildly successful in your rebasing..... will the business goo hold u up from pushing your rebase out publiclly?
  • [14:51] Merov Linden: mostly because I have access to the internal JIRA so I know which one were merged, which one were not and why
  • [14:51] Merov Linden: Pixel: for the moment, I think I'll be a hero if I make the date the business guys ask me to make!
  • [14:52] Opensource Obscure: :)
  • [14:52] Merov Linden: so I've no worry that the business goo will be an issue...
  • [14:52] Merov Linden: more like "business oil" right now...
  • [14:52] Merov Linden: or grease
  • [14:52] Merov Linden: not sire
  • [14:52] Pixel Gausman: Merov: that's good, me thinks
  • [14:52] Merov Linden: sure
  • [14:52] Morgaine Dinova: Hmmm ... that seems to indicate that Viewer2.0 release is pretty soon :-)
  • [14:53] Merov Linden: anyway, I still haven't made my pledge
  • [14:54] Merov Linden: my pledge is: for some features (like OGP login or SOCKS5) support, I was thinking to hold off and wait for the Snowglobe 2.0 trunk to go live and ask your help to merge the features
  • [14:54] Pixel Gausman: the shoe drops
  • [14:54] Merov Linden: bees with shoes?
  • [14:55] Pixel Gausman: Merov: i'm reasonable. i can understand that.
  • [14:55] Pixel Gausman: can you promise me one thing?
  • [14:55] Merov Linden is grateful
  • [14:55] Merov Linden listen
  • [14:55] Pixel Gausman: that you won't ship a version of Snowglobe w/o OGP?
  • [14:55] Merov Linden: ship like: make it an official version?
  • [14:56] Pixel Gausman: yes
  • [14:56] Merov Linden: you have my words
  • [14:56] Pixel Gausman: kewl, i'll help in every way possible with getting OGP working on a viewer 2.0 based snowglobe
  • [14:56] Merov Linden: the trunk (source code) will go "live" though before we have such an official version
  • [14:57] Pixel Gausman: understand, and there will be daily builds
  • [14:57] Merov Linden: o/
  • [14:57] Pixel Gausman: i just dont want ogp to fall out of a release.
  • [14:57] Merov Linden: we don't want either :)
  • [14:57] Thickbrick Sleaford: it seems OGP is starting to get some momentum just now
  • [14:58] Merov Linden: wow! ok, well thanks Pixel, that's a big help for me
  • [14:58] Morgaine Dinova: The OGP stuff should be factored out into a demand-loadable .so / DLL anyway. There's no need for a protocol stack to be deeply entangled with the bulk of an app.
  • [14:58] Pixel Gausman: Thick: yeah, it's awesome that hurliman has been asking so many good questions
  • [14:59] Merov Linden: ok, we basically went through the whole agenda :)
  • [14:59] Merov Linden: we have 30 sec left for Pixel's question :)
  • [14:59] Pixel Gausman: i just wanted Thick to see http://jira.secondlife.com/browse/SNOW-464
  • [14:59] JIRA helper: [#SNOW-464] Small texture issue on OpenSim
  • [15:00] Thickbrick Sleaford: oh, I saw that, but have no idea
  • [15:00] Pixel Gausman: but since we're redoodling the texture fetch to viewer 2.0, it can wait until after that to see if it still happens
  • [15:00] Sahkolihaa Contepomi: Robin, how has the phonon support being going? Getting disappointed of not having any streaming media here. xP
  • [15:00] Pixel Gausman: ok, it smelled a little like SNOW-157
  • [15:00] most textures don't load logged on an OpenSim
  • [15:01] Thickbrick Sleaford: Pixel, it looks more decoding-related to me
  • [15:01] Pixel Gausman: Merov: so i think i'll retest after you do your texture fetch upgrade
  • [15:01] Merov Linden: Pixel: got that
  • [15:01] Pixel Gausman: mmmm.. ok.
  • [15:01] Pixel Gausman: that's all for me.
  • [15:02] Merov Linden: cool: it's 3:01pm and we're done!
  • [15:02] Merov Linden: :)
  • [15:02] Pixel Gausman: bon weekend!
  • [15:02] Merov Linden: thanks all for coming and for your help on 1.3

Generated with SLog Wikifier