User:Which Linden/Office Hours/2007 Nov 8
Revision as of 11:20, 10 November 2007 by Which Linden (talk | contribs) (Transcript of office hours Nov 8)
Transcript of Which Linden's office hours:
[11:07] | Phli Foxchase: | does the rolling restart begin ? |
[11:08] | Which Linden: | I think the rolling restart happens this afternoon |
[11:08] | Wyn Galbraith: | Hard to get here. |
[11:08] | Unable to create requested object. Please try again. | |
[11:08] | Which Linden: | Hey Wyn |
[11:08] | Rex Cronon: | i had no problem getting here |
[11:10] | Wyn Galbraith: | Darn my feet wings won't turn off. |
[11:10] | Phli Foxchase: | :) |
[11:10] | Phli Foxchase: | you're burning !! |
[11:10] | Phli Foxchase: | Hi Zero :) |
[11:10] | Morgaine Dinova: | I was "Unable to load landmark" for Which's place, oddly. The slurl you posted worked though, Which. |
[11:10] | Which Linden: | Good morning all. |
[11:11] | Phli Foxchase: | Two Lindens for the price of one |
[11:11] | Which Linden: | Whoa, It's Zero! |
[11:11] | Zero Linden: | I promise to keep quiet! |
[11:12] | Wyn Galbraith: | Wow, two Zeros in one day. |
[11:12] | Phli Foxchase: | the troubles come from the rolling retart ? |
[11:12] | Zero Linden: | Rex - the asset system is having some trouble - so the other sim couldn't get to it, but this sim could... |
[11:12] | Sabina Stenvaag thought there was always something weird going on in SL! | |
[11:12] | Zero Linden: | perhaps it was cached here |
[11:12] | Phli Foxchase: | restart* |
[11:13] | Which Linden: | I'm sorry, I'll be a little distracted for a bit -- I'm trying to get Chet in here |
[11:13] | Wyn Galbraith: | I had problems TPing here. |
[11:13] | JetZep Zabelin: | hi |
[11:13] | Which Linden: | Also my Mac is randomly freezing. WTF |
[11:13] | Morgaine Dinova: | Why would a restart invalidate a landmark that's been operating for months though? Coords are almost identical to Which's slurl |
[11:13] | Rex Cronon: | hi everybody |
[11:14] | Morgaine Dinova: | Hi Rex |
[11:14] | Phli Foxchase: | the update has been delayed three times |
[11:14] | Wyn Galbraith: | That means Where's the File, right Which? |
[11:14] | Which Linden: | Yes, where's the damn file that's messing up my mac?!!? |
[11:14] | Tillie Ariantho: | ao off |
[11:15] | Morgaine Dinova: | Sai had to roll back his Leopard to get Inkscape to work. |
[11:15] | Which Linden: | OK, well, does anyone have questions about my recent project update to sldev? |
[11:15] | Which Linden: | It might be too recent for any of that |
[11:16] | Rex Cronon: | zero, i just wish that i would be able to know what sims don't have problems with the asset server |
[11:16] | Zero Linden: | Rex - so would we! |
[11:16] | JetZep Zabelin: | =) |
[11:17] | Rex Cronon: | isn't there a way to test? |
[11:18] | Wyn Galbraith: | Lag test |
[11:18] | Tillie Ariantho doesnt start her Mac again before a Leopard patch... .) | |
[11:20] | Which Linden: | OK, well, let's try and get these office hours on the road? |
[11:20] | Which Linden: | ! |
[11:20] | Wyn Galbraith: | Just when I was going to sneak out to check the laundry. |
[11:20] | Squirrel Wood: | /ao off |
[11:20] | Wyn Galbraith is suppose to be doing domestic stuff. | |
[11:20] | Wyn Galbraith: | :P |
[11:20] | Squirrel Wood: | Yellow! |
[11:20] | Wyn Galbraith: | Green, Squirrel! |
[11:21] | Which Linden: | Ha ha. Well, the news from my project update is that a dude from IBM is going to be joining the Certified HTTP project |
[11:21] | Which Linden: | This would be Chet |
[11:21] | Which Linden: | He is cool |
[11:21] | Phli Foxchase: | cool |
[11:21] | Which Linden: | OK, I wasn't sure if that was going to be a big deal or not |
[11:22] | Which Linden: | Not a big deal from my perspective, obviously |
[11:22] | Phli Foxchase: | You're working to change the protocols ? |
[11:22] | JetZep Zabelin: | sorry its all over my head so it sounds like a great big deal to me ;) |
[11:22] | Morgaine Dinova: | I just read your post Which, welcome Chet :-) |
[11:23] | Which Linden: | He's logging in now |
[11:23] | Wyn Galbraith: | Can we give him noob nuggies? |
[11:23] | Phli Foxchase: | :) |
[11:23] | Which Linden: | Ha ha, uh, maybe |
[11:24] | Which Linden: | He's been following along from home with our chttp development |
[11:24] | Morgaine Dinova: | Well the standard MO for AWG newbies at Zha's is a dunk in the pond. That could work here too :-)) |
[11:25] | Which Linden: | Here he is. Seeping Blister everybody! |
[11:25] | Morgaine Dinova: | Ew, I hope the av isn't in character with the name ;-) |
[11:26] | Which Linden: | Maybe we could get Flea Bussy to cook something appropriate up |
[11:26] | Morgaine Dinova: | Hi Seep :-) |
[11:26] | Seeping Blister: | one presumes not, except on bad days |
[11:26] | Which Linden have offered friendship to Seeping Blister | |
[11:27] | Which Linden: | So, OK, the next thing we're gonna do for CHTTP is garbage collection and a MySQL persistence layer |
[11:27] | Morgaine Dinova: | Stir Fry deployed, ready for mission. |
[11:27] | Which Linden: | Ha welcome Morgaine |
[11:27] | Which Linden: | We currently have a "test" persistence layer that sucks |
[11:27] | Which Linden: | It is really slow |
[11:27] | Wyn Galbraith: | Welcome newone. |
[11:28] | Tillie Ariantho: | Running on the wiki server, I assume. .) |
[11:29] | Which Linden: | The tests I just committed run 100000 permutations of crash/reload, and they're still running, two days later |
[11:29] | Which Linden: | So we're almost done with the first phase of certified http |
[11:29] | Which Linden: | CHTTP-1 |
[11:29] | Zero Linden: | what would the acceptable time be? |
[11:29] | Tillie Ariantho: | =) |
[11:30] | Which Linden: | Zero: minutes, ideally |
[11:30] | Morgaine Dinova: | Which: do you have a feel for the proportion of transactions that are likely to go through chttp? If it's not a large proportion, then it probably doesn't matter from a scalability standpoint how the garbage collection is done. And the opposite would be true too, it could matter immensely. |
[11:30] | Which Linden: | It takes about a second for each iteration right now, that's probably way longer than it needs to be |
[11:30] | Zero Linden counts the zeros... | |
[11:30] | Which Linden: | Morgaine: well, um, all transactions should go through chttp in the end |
[11:30] | Tillie Ariantho: | hm... yes, that sounds pretty slow. |
[11:30] | Zero Linden divides.... | |
[11:30] | Morgaine Dinova: | Which: ouch |
[11:31] | Wyn Galbraith: | Overflow. |
[11:31] | Which Linden: | All L$ transactions |
[11:31] | Morgaine Dinova: | We'd better pay attention then ... :P |
[11:31] | Which Linden: | It's not that bad -- 1 second to transfer L$ aint that terrible |
[11:31] | Zero Linden: | 166 crash/restores a second would be impressive! |
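The timing figures in this exchange can be checked with a few lines of arithmetic. This is only a sketch of the numbers quoted in the chat: the 10-minute budget is an assumption, inferred from the fact that it reproduces Zero's figure of 166 per second, and the function names are made up for illustration.

```python
# Figures quoted in the chat; nothing here is measured.
PERMUTATIONS = 100_000          # crash/reload permutations in the test run
SECONDS_PER_ITERATION = 1.0     # "about a second for each iteration"
ASSUMED_BUDGET_MINUTES = 10     # assumption: "minutes, ideally" read as 10 minutes


def total_hours(permutations: int, seconds_each: float) -> float:
    """Wall-clock hours for a sequential run of the whole test suite."""
    return permutations * seconds_each / 3600


def required_rate(permutations: int, budget_minutes: float) -> float:
    """Crash/restore cycles per second needed to finish within the budget."""
    return permutations / (budget_minutes * 60)


print(round(total_hours(PERMUTATIONS, SECONDS_PER_ITERATION), 1))    # 27.8
print(round(required_rate(PERMUTATIONS, ASSUMED_BUDGET_MINUTES), 1))  # 166.7
```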
[11:32] | Which Linden: | I think that the performance of the test is not indicative of the runtime performance |
[11:32] | Which Linden: | It's an upper bound maybe |
[11:32] | Which Linden: | But it would be nice to, you know, not have to schedule a run of the tests |
[11:33] | Morgaine Dinova: | GC isn't a problem as long as you have storage for oldish records. If you're constrained on storage then the dealloc points move down toward the alloc points, and you can get bad things happening for scalability. |
[11:33] | Which Linden: | To give you an idea of why I think GC is important, each CHTTP message takes up 2 oplogs, one on the client side, and one on the server. |
[11:34] | Which Linden: | I think our current L$ transaction rate is 6 per second |
[11:34] | Which Linden does the math | |
[11:35] | Which Linden: | That's 15.5 million oplogs a month |
[11:35] | Which Linden: | I guess that's not too many, actually |
[11:35] | Which Linden: | We could store that if each one was < 10k |
[11:35] | Morgaine Dinova: | Well our population scaling factor is only 200-1000 depending which figures you take (and that's without factoring in other reductions), so it's still only 6k alloc/dealloc a second. That's peanuts really. |
[11:35] | Which Linden: | And this is across all the agent hosts |
[11:36] | Which Linden: | So it's only 1 million per |
[11:36] | Zero Linden: | right - because as the number of agents increase, the number of agent stores increases (and presumably the number of escrow services) |
[11:37] | Morgaine Dinova: | Zero: escrow services are going to be statically allocated on region servers? |
[11:37] | Which Linden: | Then you sort of have the question: is keeping these records around slowing performance at all? |
[11:37] | Zero Linden: | so you could take the number of transactions on the grid today / the number of agent hosts today and have a rough rate |
[11:37] | Morgaine Dinova: | s/region/agent/ |
[11:37] | Zero Linden: | Dunno - which? |
[11:38] | Which Linden: | Morgaine: probably our initial implementation will have the escrow servers == the agent servers |
[11:38] | Morgaine Dinova: | Sounds like escrow services should be in their own pool, since we can't really second-guess what will happen. |
[11:38] | Which Linden: | But we'll be careful to abstract that so that we can do their own pool if necessary. |
[11:38] | Which Linden: | OH, yeah, the escrow |
[11:39] | Which Linden: | Is going to increase the number of chttp messages by a factor of..... 6 or so? |
[11:39] | Tillie Ariantho: | Think so. |
[11:40] | Which Linden: | So, 60 GB on each agent host |
[11:40] | Which Linden: | agent store |
[11:40] | Which Linden: | per month |
[11:40] | Which Linden: | So, yeah, we can probably do without GC |
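The storage estimate above can be reproduced as follows. This is a rough sketch using only the numbers mentioned in the chat; it assumes a 30-day month and reads "10k" as 10,000 bytes, and the function names are illustrative. Note that 6 transactions per second over 30 days gives about 15.5 million when each transaction is counted once; counting two oplogs per message would roughly double that.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumption: 30-day month


def monthly_transactions(per_second: float) -> int:
    """Grid-wide transactions in one month at a constant rate."""
    return int(per_second * SECONDS_PER_MONTH)


def monthly_storage_gb(records_per_host: int, escrow_factor: int,
                       bytes_per_record: int) -> float:
    """Decimal gigabytes kept per host per month if nothing is collected."""
    return records_per_host * escrow_factor * bytes_per_record / 1e9


# 6 L$ transactions per second, as quoted in the chat.
print(monthly_transactions(6))                     # 15552000

# About 1 million records per agent host, a factor of 6 for escrow
# messages, and at most 10,000 bytes per oplog record.
print(monthly_storage_gb(1_000_000, 6, 10_000))    # 60.0
```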
[11:40] | Zero Linden: | but what is the expiry? Isn't one day enough? |
[11:41] | Which Linden: | Right, well expiry *is* GC |
[11:41] | Morgaine Dinova: | As long as you use a dynamic pool (logical one) that sysadmins can pop another server into, should be fine. These numbers aren't all that bad, and very low in a web context. |
[11:41] | Which Linden: | But a lot of that information is redundant, so it just seems nicer to package that up in a log record and shuttle that off to a massive archive under a mountain somewhere |
[11:42] | Which Linden: | Seems cleaner somehow |
[11:42] | Zero Linden wonders if you can define a programming language with semantics of "expiry = GC" | |
[11:42] | Zero Linden: | "Oh - you assigned that variable 2 minutes ago... it's gone now!" |
[11:42] | Morgaine Dinova: | That's just the log though, for rollback and for roll-forward on error. The actual active records are in memory, otherwise performance will be dire. |
[11:43] | Tillie Ariantho: | true. |
[11:43] | Which Linden: | In the escrow (and CHTTP for that matter), there's a clear point where you can say "OK, we both agree on this transaction/message" |
[11:44] | Which Linden: | So after that point, why bother keeping the details of that txn/message? |
[11:45] | Which Linden: | (or rather, why bother keeping them live on the host?) |
[11:45] | Morgaine Dinova: | "Dire" aka you'll need more servers and your disk platters will run hotter. It's not a huge problem I think, but even from just a latency aspect, you really don't want active transaction records to go to disk, logging aside. (Logging can be async) |
[11:46] | Tillie Ariantho: | Don't know, any of that information required for hunting down bugs, fraud etc.? |
[11:46] | Morgaine Dinova: | Yeah Tillie, but just the log |
[11:46] | Which Linden: | So, we might implement the GC so that it *doesn't* delete them but instead just marks them. |
[11:46] | Seeping Blister: | One thing I think would be really good, is if CHTTP were sufficiently robust and low-overhead, that one could -imagine- it being used -instead- of normal HTTP for most traffic. |
[11:46] | Which Linden: | Morgaine: agreed about going to disk |
[11:46] | Seeping Blister: | I'm not saying that for something like SL's needs, that would happen |
[11:47] | Morgaine Dinova: | Half mark, the other half when the logger saves it to disk platter (not just to disk buffers). |
[11:47] | Seeping Blister: | but I work with a (shall we say) (ahem) different sort of customer, and those guys, well, it'd be nice if a certain set of failures just stopped being an issue for them at -all- |
[11:49] | Which Linden: | So, I think it's not in dispute that we want/need the step where the client DELETEs the server's X-Message-Url |
[11:49] | Morgaine Dinova: | Seeping: yeah, it's an interesting topic, we were talking about it in an early meeting here. Even the most noddy UI action that pokes REST could do with being immune to double clicks for example. But will it be efficient enough? |
[11:49] | Tillie Ariantho: | yes, sure not. Clients should not be able to delete stuff just like that. |
[11:50] | Morgaine Dinova: | As always, separate mechanism from policy :-) Let chttp be selectable by policy, don't hardwire it in :-) |
[11:50] | Which Linden: | Right, so what the server does when you DELETE the X-Message-Url is policy-dependent |
[11:50] | Seeping Blister: | well, I certainly think it's a viable target. And that's part of why GCing completed tran-logs is something I'm eager to make sure is working well. |
[11:51] | Morgaine Dinova: | Ooooh ... now that's an interesting question, Which! in other words, how to avoid needing to look up policy. LOL, good one :-) |
[11:51] | Morgaine Dinova: | CAPs neg I guess. |
[11:51] | Which Linden: | For the escrow servers, as you say, it might make sense that the tombstoning (i.e. the DELETE) is just a marking that says "done" |
[11:52] | Tillie Ariantho: | yup. |
[11:52] | Morgaine Dinova nods | |
[11:52] | Which Linden: | But if you're using CHTTP to transport images or large files, you want that tombstoning to mean "free the disk!" |
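The idea that the meaning of DELETE on the message URL depends on policy could look roughly like the sketch below. It is a toy in-memory illustration, not the CHTTP implementation: `TombstonePolicy`, `MessageStore`, and their methods are hypothetical names, and a real server would sit behind an HTTP handler and a persistent store.

```python
from enum import Enum


class TombstonePolicy(Enum):
    MARK_DONE = "mark"        # escrow-style: keep the record, flag it finished
    FREE_STORAGE = "free"     # large-payload style: release the space at once


class MessageStore:
    """Hypothetical in-memory stand-in for a server's message records."""

    def __init__(self, policy: TombstonePolicy) -> None:
        self.policy = policy
        self.records: dict[str, dict] = {}

    def put(self, message_url: str, body: bytes) -> None:
        self.records[message_url] = {"body": body, "done": False}

    def delete(self, message_url: str) -> bool:
        """Handle the client's DELETE; returns False for an unknown URL."""
        record = self.records.get(message_url)
        if record is None:
            return False
        if self.policy is TombstonePolicy.FREE_STORAGE:
            del self.records[message_url]      # "free the disk!"
        else:
            record["done"] = True              # tombstone only marks "done"
        return True
```

Keeping the policy as a constructor argument follows Morgaine's point about separating mechanism from policy: the DELETE handling stays the same and only the effect on storage changes.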
[11:52] | Seeping Blister: | yes, that is certainly a way of doing it. A large exchange I've worked with, they don't bother to truncate their DB logs until the end of each trading day. |
[11:52] | Seeping Blister: | after all, what's a little GB of disk, between friends. |
[11:52] | Which Linden: | ha ha |
[11:53] | Seeping Blister: | So the trans that are committed, are effectively tombstoned, and nobody cares -- they just live there on disk until the end of the trading day. |
[11:53] | Morgaine Dinova: | Yeah, but that's disk space. I'm thinking about the active records in memory, because we don't want disk latency to appear in the transaction performance. |
[11:53] | Which Linden: | Ah, Morgaine, I see now |
[11:54] | Which Linden: | Well, we kinda have to flush to disk in case of crash |
[11:54] | Morgaine Dinova: | Yep |
[11:54] | Morgaine Dinova: | But that can be async. |
[11:54] | Tillie Ariantho: | Or for upgrade purposes / fixes ... |
[11:54] | Wyn Galbraith: | What's a few Gs between friends. |
[11:54] | Which Linden: | Right |
[11:54] | Tillie Ariantho: | some kind of suspend-to-disk :) |
[11:55] | Seeping Blister: | oh, you have to flush to disk at every syncpoint or commit point so that if you crash you can recover. |
[11:55] | Tillie Ariantho: | Hm, true, too. |
[11:55] | Seeping Blister: | if you have battery-backup (or flash backup) you can put your log there, but otherwise, there's no way to avoid flushing at every recovery point |
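Flushing at every commit point, as described above, might be sketched like this. It is a minimal append-only log under stated assumptions: the JSON-lines format and the function names `append_committed` and `recover` are invented for illustration, and `os.fsync` only asks the operating system to push data to the device, so what actually reaches the platter still depends on the hardware.

```python
import json
import os


def append_committed(log_path: str, record: dict) -> None:
    """Append one record and do not return until it has been synced."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()                # move Python's buffer into the OS
        os.fsync(log.fileno())     # ask the OS to write it to the device


def recover(log_path: str) -> list[dict]:
    """Replay every record that was synced before a crash."""
    if not os.path.exists(log_path):
        return []
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

The `fsync` call is where the latency discussed in the chat comes from: the caller waits for the device on every commit, which is the cost of being able to recover after a crash.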
[11:55] | Which Linden: | I actually don't know how MySQL handles this for committed transactions |
[11:55] | Morgaine Dinova: | Seep: eek. better to buy an extra UPS for the escrow servers :-) |
[11:55] | Which Linden: | presumably it does "the right thing" |
[11:56] | Tillie Ariantho: | But hardware can fail, then UPS doesnt help anything. |
[11:56] | Seeping Blister: | heh. In the coming flash-memory storage-class-memory nirvana, no worries, you'll only have a few. |
[11:56] | Morgaine Dinova: | Tillie: replication across more than one disk |
[11:56] | Which Linden: | I think we can't completely insulate ourselves from all failures, we can only reduce the probability |
[11:56] | Tillie Ariantho: | No I mean without disk access ... |
[11:56] | Morgaine Dinova: | If you're really paranoid, multiple controllers. |
[11:57] | Which Linden: | And multiple cpus in multiple cities.... |
[11:57] | Tillie Ariantho: | For databases you log the SQL statements to another filesystem where all the 'transactions' are played into another db... maybe need some 'copying' here, too. |
[11:57] | Which Linden: | Slaving |
[11:58] | Seeping Blister: | well, there's also the interesting issue of whether you want to be immune to machine-level failures -- whether you want to replicate your tranlogs across more than one machine (or, shudder, datacenter) but I believe we (Ryan/Aaron) are assuming that that sort of question is out-of-scope for this particular effort. |
[11:58] | Tillie Ariantho: | hm |
[11:58] | Morgaine Dinova: | Tillie: I see your point .... but ultimately there can be no perfect solution if the position is that ANYTHING can fail. Just look at MTBF's and draw a line somewhere. |
[11:58] | Which Linden: | Yeah, we're assuming that our persistent store is magically awesome |
[11:58] | Which Linden: | We'll probably be confronted with harsh reality at some point |
[11:59] | Tillie Ariantho: | That's why we should think of it now. .P |
[11:59] | Which Linden: | Circling back to Zero's point about GC vs. expiry. I reverse my previous stance where I said they were the same |
[11:59] | Morgaine Dinova: | Google has a great article on disk MTBF's and related issues, about 9-12 months ago. |
[11:59] | Tillie Ariantho: | At least THINK about it so the architecture isn't expandable for that later on .) |
[12:00] | Which Linden: | I'm open to suggestions, Tillie. I couldn't think of anything that wouldn't add horrible complexity |
[12:00] | Which Linden: | Horrible complexity == never gonna get finished |
[12:00] | Morgaine Dinova: | Yep |
[12:01] | Morgaine Dinova: | Google's failure analysis may be directly relevant here, since both use commodity PCs |
[12:01] | Which Linden: | So, I think that that assumption is reasonable |
[12:02] | Which Linden: | the assumption being that we will have robust databases |
[12:02] | Which Linden: | Such things already exist |
[12:02] | Tillie Ariantho: | Saving the data to disk on the running server might be a bad idea for performance ... |
[12:03] | Seeping Blister: | One way of thinking of the resilience issue for tran-logs, is that you already have the problem, and you're already solving it. |
[12:03] | Seeping Blister: | for agent databases. |
[12:03] | Which Linden: | Well, it's bad for latency, but the throughput can be reasonable |
[12:03] | Seeping Blister: | The logs are really recovery information for in-progress trans against agent-dbs. |
[12:03] | Seeping Blister: | so they don't need to be any more resilient than the agent-dbs themselves. |
[12:03] | Which Linden: | True. in some cases they can be the same as the agent dbs |
[12:03] | Seeping Blister: | or, better, they can be made exactly as resilient as the agent-dbs. |
[12:03] | Morgaine Dinova: | So, we have separate policies for transaction expiry and for cleanup/GC. Sounds clean. |
[12:04] | Which Linden: | Right, so expiry means "couldn't finish" |
[12:04] | Which Linden: | GC means "finished" |
[12:04] | Seeping Blister: | eXXactly |
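The distinction just agreed on (expiry means "couldn't finish", GC means "finished") can be illustrated with a small sweep over transaction records. This is a sketch only: the `Txn` record, the `sweep` function, and the one-day timeout used in the example call (echoing Zero's earlier question) are illustrative assumptions, not part of the project.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    txn_id: str
    started_at: float         # seconds since some epoch
    confirmed: bool = False   # both sides agree the transaction is complete


def sweep(txns, now: float, timeout_seconds: float):
    """Sort record ids into (collectable, expired, pending)."""
    collectable, expired, pending = [], [], []
    for txn in txns:
        if txn.confirmed:
            collectable.append(txn.txn_id)   # finished: safe to GC or archive
        elif now - txn.started_at > timeout_seconds:
            expired.append(txn.txn_id)       # never finished: needs follow-up
        else:
            pending.append(txn.txn_id)       # still in flight: leave alone
    return collectable, expired, pending


ONE_DAY = 24 * 60 * 60
records = [Txn("a", 0.0, confirmed=True), Txn("b", 0.0), Txn("c", 90_000.0)]
print(sweep(records, now=100_000.0, timeout_seconds=ONE_DAY))
# (['a'], ['b'], ['c'])
```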
[12:04] | Which Linden: | So we should send an email upon expiration |
[12:04] | Tillie Ariantho: | So what about 3 servers .... one in front, more or less a proxy of already finished requests, and two behind, who both get all the requests that are not completely handled yet. Both identical except that one responds back to proxy which responds back to the client, and one that does not... that one running only for 'backup purposes' it does 'saves to disk' instead of 'response back'? |
[12:04] | Which Linden: | Actually we don't really have a good story for expiry in our current implementation, we gotta add that |
[12:05] | Morgaine Dinova: | Story? You're using Fit storyboarding for test-driven dev? :-)))) |
[12:05] | Which Linden: | We're using JIRA. :-) |
[12:05] | Morgaine Dinova: | Ah, hehe |
[12:06] | Which Linden: | Tillie: having trouble parsing that |
[12:06] | Tillie Ariantho: | fit storyboarding? ,) |
[12:06] | Tillie Ariantho: | Which: huh? ,) |
[12:06] | Tillie Ariantho: | TOO MANY WORDS ERROR? ,) |
[12:06] | Which Linden: | You're proposing a replication scheme, I presume? |
[12:06] | Seeping Blister: | hey, guys, gotta run, it's been a pleasure to chat, seeya online later. |
[12:06] | Tillie Ariantho: | yep |
[12:06] | Which Linden: | That avoids the latency of syncing to disk? |
[12:07] | Wyn Galbraith: | Bye Seeping. |
[12:07] | Rex Cronon: | bye seeping |
[12:07] | Which Linden: | Seeya seeping |
[12:07] | Wyn Galbraith: | Such a lovely image, Seeping Blister |
[12:07] | Morgaine Dinova: | Tillie: the expiry and GC policies were reverse-timed there, and Which's GC deleted the record before it was parsed :-))) |
[12:07] | Tillie Ariantho: | hum |
[12:07] | Which Linden: | I should wrap this up but I just want to finish discussing this replication scheme here |
[12:08] | Zero Linden: | gotta run - see ya'll - thanks Which for holding these! |
[12:08] | Rex Cronon: | bye zero |
[12:08] | Wyn Galbraith: | Bye Zero. |
[12:08] | Which Linden: | Thanks for coming Zero! |
[12:08] | Morgaine Dinova: | See you Zoro! |
[12:08] | Morgaine Dinova: | Oops, Zero :-) |
[12:08] | Tillie Ariantho: | Yes, we should consult some database design manuals for that and then talk about it again. .) |
[12:09] | Which Linden: | OK! I still think that it's impossible to avoid waiting for the sync, just on theoretical grounds |
[12:09] | Which Linden: | But it's certainly possible to make that sync be quick |
[12:09] | Which Linden: | Anyway, feel free to bother me later or bring it up in office hours next week |
[12:09] | Tillie Ariantho: | How do databases prevent data loss? Especially the ones with lots of transactions, without losing much performance by writing the data to disk...? |
[12:09] | Morgaine Dinova: | Well I've been involved in Quest db front-end replication (to give an ISP some backend scalability), but databases aren't really my forte. |
[12:10] | Morgaine Dinova: | Too hard for me ;-) |
[12:10] | Entering god mode, level 200 | |
[12:10] | Tillie Ariantho: | Maybe we need someone from oracle... or maybe some IBM guy who worked on DB2? .P |
[12:10] | Rex Cronon: | u use RAID? |
[12:10] | Morgaine Dinova: | Rex: front-end |
[12:11] | Tillie Ariantho: | I don't know which kind of hardware the Lindens want to use for that. .) |
[12:12] | Tillie Ariantho: | On companies like the one I am working for would use SANs nor NASs connected via er.. fiber channel? .) |
[12:12] | Tillie Ariantho: | In |
[12:13] | Tillie Ariantho: | Might be too expensive for LL, maybe. |
[12:13] | Morgaine Dinova: | Well what scares me is that, while SL's d/b requirements are to a large extent just storage (and hence noddy), once VW search engines get in on the act, both the complexity of queries and the sheer rate is going to go astronomic. That's going to be painful. |
[12:14] | Tillie Ariantho: | hum |
[12:14] | Which Linden: | Hopefully we'll have web services to ease the pain |
[12:15] | Morgaine Dinova: | They only scale the front-end access mechanism though. It's the backend that worries me. |
[12:15] | Which Linden: | Argh where'd my response go? |
[12:15] | Tillie Ariantho: | I still think some replication is required. |
[12:16] | Which Linden: | Anyhow, my office hours are over, in case someone was waiting for the signal |
[12:16] | Which Linden: | Just hanging out now |
[12:16] | Morgaine Dinova: | Thanks Which. Lot of food for thought. |
[12:16] | Which Linden: | Morgaine: yeah, but if there's a search engine doing all sorts of crawling/indexing, then it hits the cached front-end and doesn't add extra backend load |
[12:17] | Which Linden: | And plus its index then prevents others from adding even more back-end load |
[12:17] | Wyn Galbraith has to run. | |
[12:17] | Which Linden: | See ya Wyn! |
[12:17] | Which Linden: | Thanks so much for coming! |