Simulator User Group/Transcripts/2012.05.08
|Prev 2012.05.04||Next 2012.05.11|
List of Speakers
|Andrew Linden||Ash Qin||Cytrax|
|Draconis Neurocam||DrFran Babcock||FadeOut Razorfen|
|Jonathan Yap||Kallista Destiny||Kelly Linden|
|Motor Loon||Nalates Urriah||NeoBokrug Elytis|
|Qie Niangao||Rex Cronon||Sahkolihaa Contepomi|
|Simon Linden||SnowLeopard Pearl||Sopherian Yumako|
|Squirrel Wood||TankMaster Finesmith||TheBlack Box|
[12:01] TankMaster Finesmith: same here in the pacific NW
[12:01] Motor Loon: crappy in Denmark... as usual...
[12:01] Kallista Destiny: Same in SoCal
[12:01] Kallista Destiny: Good that is
[12:02] Simon Linden: It's nice up in northern CA too
[12:03] Andrew Linden: I don't have any news. I'm just working on misc bugs for pathfinding.
[12:03] Andrew Linden: Perhaps Simon or Kelly have news.
[12:03] Simon Linden: There was a server update this morning
[12:03] Rex Cronon: greetings everybody
[12:03] Simon Linden: Details on the release are in the forum: http://community.secondlife.com/t5/Second-Life-Server/Deploys-for-the-week-of-2012-05-07/td-p/1524173
[12:04] Sopherian Yumako: hai.
[12:04] Simon Linden: The RC channels will get updated tomorrow
[12:04] Cytrax: its nice that llGetRegionAgents() is now on the main channel
[12:04] Kelly Linden: DRTSIM-141 got promoted w/ llGetAgentList and PRIM_SLICE and HTTP_BODY_MAXLENGTH
[12:04] Rex Cronon: hi
[12:04] Motor Loon: yep, real nice stuff
[12:05] Kelly Linden: llGetAgentList has a bug in it if attached: SCR-311 which will be fixed in a future release.
[12:05] JIRA-helper: http://jira.secondlife.com/browse/SCR-311
[#SCR-311] llGetAgentList() with scope AGENT_LIST_PARCEL or AGENT_LIST_PARCEL_OWNER returns empty list when attached to avatar
[12:06] Rex Cronon: that is when it is very useful :(
[12:06] Motor Loon: no known bugs when used in a "regular" object right?
[12:06] Rex Cronon: .
[12:06] Kelly Linden: Correct.
[12:06] Motor Loon: Alrighty
[12:07] Andrew Linden: Ok, I guess the table is open.
[12:07] Motor Loon: or... the grill is
[12:07] Sopherian Yumako: haha
[12:08] Draconis Neurocam: is there some technical reason why llGetParcelPrimOwners has a max of 100 strides, i know places where over 100 different people own prims
[12:08] Motor Loon: barbecue wolf
[12:08] Kallista Destiny: Ummm Wolfgang, rare please
[12:08] DrFran Babcock: he looks well done!
[12:08] Object: Hello, Avatar!
[12:08] Simon Linden: That sounds like a list or script size limit, Draconis
[12:08] Sopherian Yumako: he looks kinda burned to me
[12:08] Sopherian Yumako: lol
[12:09] Draconis Neurocam: i can fit over 200 unique keys in a mono script, could possibly be an lsl one
[12:10] Andrew Linden: I'm guessing whoever wrote llGetParcelPrimOwners() figured... 100 seems like a big enough limit
[12:10] Motor Loon: I've seen sims lately having large issues getting connection back with attached sims. Like a region crashes, comes back up... but then you cant see it from other sims - nor fly over the simcrossing... but you can TP direct to it with a landmark..
[12:10] TheBlack Box: ready wolfgang ?
[12:11] Simon Linden: Motor - so they are both running, but not visible from the other?
[12:11] Vincent Nacon: too much fire
[12:11] Simon Linden: too much lighter fluid
[12:11] Vincent Nacon: muhaha! but hang on, I got the fire for it
[12:11] Motor Loon: yes simon
[12:11] Motor Loon: I had Motorworld do it today... took a ticket to get it working normal again
[12:12] Sahkolihaa Contepomi: Syncing issues I've seen as well.
[12:12] Motor Loon: but we've normally only seen it happen on rare occasions... but these days it seems to be almost every time there's a simcrash
[12:12] Motor Loon: yeah sync'ing
[12:12] Sahkolihaa Contepomi: They seem to be cropping up far more again.
[12:12] Vincent Nacon: ahh too big
[12:12] Andrew Linden: Motor Loon, how long were the neighboring regions not connecting to each other before they were fixed? minutes? hours? or days?
[12:13] Motor Loon: today hours...
[12:14] NeoBokrug Elytis: Yeah, at least an hour.
[12:15] Andrew Linden: Hrm... Motorworld has one neighbor to the north: Death Valley
[12:15] Motor Loon: yes
[12:15] Simon Linden: Are other people seeing this too on other regions, or is it just that one spot?
[12:15] NeoBokrug Elytis: It happened all over the Wastelands today
[12:15] Cytrax: i have seen it before myself
[12:15] Motor Loon: I tried restarting Motorworld again... and also tried restarting Death Valley... neither helped on the situation
[12:15] Motor Loon: so we did a ticket... and later it got sorted out
[12:15] Draconis Neurocam: ive seen sims with random net time spikes recently, but it was not connected to anything, so ive no idea if it is related
[12:16] Andrew Linden: What time was that event Motor?
[12:16] NeoBokrug Elytis: It just took time for it to come back. Some regions you could see into, some you couldn't. It was like a one-way mirror.
[12:16] Motor Loon: hm.. well, my ticket was made 2012-05-08 04:27:42
[12:17] Andrew Linden: Oh, that early, huh?
[12:17] Motor Loon: I haven't seen it happen during "normal" restarts... it's usually when a sim has crashed in a griefing attack or something similar
[12:17] Andrew Linden: That was before the rolling restart.
[12:18] Motor Loon: yes or around it
[12:18] Motor Loon: It was an attack today, I know because I was the one cleaning up all our other sims afterwards
[12:19] Simon Linden: Do you know how they attacked?
[12:19] Andrew Linden: Yeah, I see several crashes in a row for Motorworld, starting around 04:49
[12:19] Andrew Linden: that was the attack?
[12:20] Motor Loon: The usual simcrasher arsenal I guess... hard to say for us... once they launch them the sim goes down, and by the time it comes back the stuff is gone.
[12:20] Motor Loon: I've also seen a lot of repeat-crashing... you know... where it tries to start back up and immediately goes down again and keeps doing that until your simguardian rolls it back.. or whatever it does
[12:20] Rex Cronon: possibly temp replicators?
[12:21] Motor Loon: I'd guess physics bombs or something like that
[12:21] Andrew Linden: Well, there is a problem where the "region presence" data is cached for a while (30 min? an hour?) such that
[12:21] Sahkolihaa Contepomi: A couple of sims I help maintain (FN ones) have simply crashed several times during the weekend with no reason behind them.
[12:21] Motor Loon: on the other sims it was just the usual petty stuff... particle spammers, self replicators and noisemakers... thats easy to deal with
[12:21] Rex Cronon: physics by themselves have a very hard time crashing a sim
[12:22] Andrew Linden: if a region comes up and asks for the presence info of its neighbors, and they are down, it can cache that state for that cache period
[12:22] Andrew Linden: even if they come up a few minutes later
[12:22] Vincent Nacon: maybe there was something physics-related with some state changes repeatedly?
[12:23] Andrew Linden: however, that is mostly a problem for regions that have been down for days, or never up, and suddenly are added to the world, and their neighbors take a while to connect because they have cached data for their list of neighbors
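The caching behaviour Andrew describes can be illustrated with a minimal sketch. It is not Linden Lab's code: the class name, the lookup callback, and the 30-minute cache period are assumptions for illustration only (Andrew says later that the real duration and cache location are unknown). It shows how a "neighbor is down" answer, once cached, keeps being returned after the neighbor has come back up.

```python
import time


class PresenceCache:
    """Caches neighbor lookups for ttl seconds, including 'down' answers."""

    def __init__(self, lookup, ttl, clock=time.monotonic):
        self._lookup = lookup   # hypothetical call to the presence service
        self._ttl = ttl         # cache period in seconds (assumed value)
        self._clock = clock
        self._entries = {}      # region name -> (is_up, time it was cached)

    def is_up(self, region):
        now = self._clock()
        cached = self._entries.get(region)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]    # possibly stale answer
        state = self._lookup(region)
        self._entries[region] = (state, now)
        return state


def demo():
    actual = {"Death Valley": False}   # neighbor is down at first
    now = [0.0]                        # fake clock, in seconds
    cache = PresenceCache(lambda r: actual[r], ttl=1800, clock=lambda: now[0])

    results = [cache.is_up("Death Valley")]      # looked up while down
    actual["Death Valley"] = True                # neighbor comes back up
    now[0] = 300                                 # five minutes later
    results.append(cache.is_up("Death Valley"))  # cached 'down' is returned
    now[0] = 1801                                # cache period has passed
    results.append(cache.is_up("Death Valley"))  # fresh lookup
    return results
```

In this sketch the second lookup still reports the neighbor as down, and only the lookup after the cache period sees it as up.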
[12:23] Motor Loon: yeah thats not the case here
[12:24] Motor Loon: Do you have the ability to monitor a sim that's being crashed a lot? - to find out how they do it I mean?
[12:24] Kallista Destiny: I would think that when the second region came it, it would ask its neighbor who would get the hint. "Hey someone's there."
[12:24] Motor Loon: Motorworld usually gets hit once a day... some days even more
[12:24] Kallista Destiny: s/it/up/
[12:25] Simon Linden: We can get lots of log info, Motor
[12:25] Rex Cronon: does motorworld allow rezzing?
[12:25] Motor Loon: Would be nice to see some of these "weapons" taken out... the little stuff is easy to deal with ourself
[12:25] Motor Loon: yes
[12:25] Motor Loon: rex
[12:26] Motor Loon: Motorworld has free rezzing, scripts and object entry allowed. With autoreturn on.
[12:26] Motor Loon: Like any of our public sims.
[12:26] Vincent Nacon: or sandbox
[12:27] Vincent Nacon: if I may ask, what was the issue with yesterday's downtime?
[12:28] Andrew Linden: Yeah, Kallista, I've been thinking about trying to fix that neighbor awareness problem. It bites me sometimes when I'm bringing up some test regions that are neighbors.
[12:28] Simon Linden: I never heard the cause of that one
[12:28] Vincent Nacon: could be hardwares?
[12:28] Andrew Linden: Simon, the duration of the cache is unknown. I'm not even sure where it is being cached... local squid or some other place.
[12:29] FadeOut Razorfen: not just sandboxes need rez you know
[12:29] Simon Linden: oh, so it's something in the data path between the 2 regions?
[12:30] Kallista Destiny: I'm sure it does
[12:30] Meeter: Timecheck : User Group is half over
[12:30] Simon Linden: I wonder if we could tweak some headers and make that behave a bit better
[12:30] Andrew Linden: No, the region asks the "region presence server" via http for its neighbors. It may go through squid to get it.
[12:31] Motor Loon: Would certainly be nice if you could fix or improve on it... it's bad enough the sim goes down, but having to deal with problems for hours afterwards is a pain
[12:31] Simon Linden: oh right, that makes sense
[12:32] Andrew Linden: Motor, the problem started before the crashes? The timestamp of your ticket was in Pacific time, right?
[12:32] Kallista Destiny: when it knows its neighbors does it contact them directly saying "let the bells ring and the chimes chime, I'm here, I'm here" <- best Jim Backus voice
[12:33] TankMaster Finesmith: maybe its getting a message the neighboring region is down/unavailable, but when the neighboring region is back up, it doesn't get the message so it doesn't know until its next scheduled update
[12:33] Motor Loon: no I heard there was an attack going on, and when I went to the sims Motorworld was already crashed... so I cleaned up the others and waiting for Motorworld to come back...
[12:33] Andrew Linden: yes Kallista, however if a region doesn't know about its neighbor yet (via region presence) then it refuses to connect.
[12:34] Andrew Linden: It isn't as simple as just accepting all connecting neighbors blindly, because that can cause problems if there are "dupes"
[12:34] Motor Loon: Our sensor system showed the sim being online, but I couldn't access it nor see it from Death Valley.
[12:34] Andrew Linden: where there happen to be two simulator processes running the same region
[12:34] Andrew Linden: which can happen on a very big grid sometimes
[12:34] Kallista Destiny: Ahhhh.... i see the problem, It should go ask again then.
[12:34] Andrew Linden: a simhost might momentarily fall off the net, for some network glitch
[12:35] Andrew Linden: if the time is long enough then it will get replaced
[12:35] Andrew Linden: and if it suddenly comes back... then you've got two of them
[12:35] Vincent Nacon: Reboot!
[12:35] Andrew Linden: anyway, things get complicated when you're managing a big grid
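The "dupes" problem and the reason a region refuses unknown neighbors can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual region presence service: the registry class, its methods, and the host names are all hypothetical.

```python
class PresenceRegistry:
    """Hypothetical stand-in for the region presence data."""

    def __init__(self):
        self._current = {}   # region name -> host currently registered

    def register(self, region, host):
        # A replacement process overwrites the earlier registration.
        self._current[region] = host

    def accepts(self, region, host):
        # Connect only to the process that presence knows about.
        return self._current.get(region) == host


def demo():
    registry = PresenceRegistry()
    registry.register("Motorworld", "simhost-a")
    # simhost-a falls off the net long enough to be replaced
    registry.register("Motorworld", "simhost-b")
    # simhost-a suddenly comes back: two processes claim the same region
    return (
        registry.accepts("Motorworld", "simhost-a"),    # stale duplicate
        registry.accepts("Motorworld", "simhost-b"),    # registered process
        registry.accepts("Death Valley", "simhost-c"),  # not known yet
    )
```

Accepting every connecting neighbor blindly would let the stale duplicate in, which is why a check of this kind sits in front of the connection, at the cost of refusing a legitimate neighbor that presence does not list yet.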
[12:36] Motor Loon: .....For the next 3 nights expect issues from this...
[Posted 9:45am PDT, 8 May 2012] We will be performing scheduled maintenance today, Wednesday and Thursday of this week (May 8, May 9, May 10), beginning at 6:00pm PDT each day. Each maintenance is scheduled to last around 8 hours.
[12:36] NeoBokrug Elytis: Hahah wow.
[12:36] Motor Loon: oh joy
[12:36] Vincent Nacon: yeah
[12:36] Sopherian Yumako: yea
[12:36] Qie Niangao: yeah, was hoping to hear more about that
[12:36] Sopherian Yumako: me too
[12:36] Vincent Nacon: 8 hours of joy! 3 day a row
[12:36] Sopherian Yumako: what are the problems going on right now that they have to do this everyday 8 hours?
[12:37] Motor Loon: Guess I'll be getting my house cleaned up and the dishes done then
[12:37] Vincent Nacon: too bad it wasn't on the weekend, people would have loved it
[12:37] Sopherian Yumako: haha oh yea
[12:37] Andrew Linden: I think part of that is an operating system upgrade on some hosts, not network level maintenance, but I'm not sure
[12:37] Simon Linden: I think that's for some servers to get re-imaged with a more updated version of the OS
[12:37] Vincent Nacon: ah figured
[12:38] Simon Linden: If it goes smoothly, all it means is some restarts of regions.
[12:38] Andrew Linden: we're definitely working on migrating to later versions of debian, but there will be a few upgrades along the way before we arrive at debian/squeeze
[12:38] Simon Linden: They get shutdown, start up on another server, then the now-empty servers can be re-imaged and brought back online
[12:39] TankMaster Finesmith: fun times
[12:39] Simon Linden: That's repeated for every server that needs it ... usually in batches
[12:39] Sopherian Yumako: is this also the reasons why logins are getting disabled lately for some time?
[12:39] Simon Linden: No, that's unrelated to logins
[12:40] Simon Linden: Yesterday they were stopped when something broke ... I don't know the details on what exactly happened
[12:40] Vincent Nacon: ah
[12:40] Sopherian Yumako: oke thank you
[12:40] Vincent Nacon: don't buy cheap parts then
[12:40] Vincent Nacon: muhaha!
[12:40] Sopherian Yumako: lol
[12:41] TankMaster Finesmith: SSDs are not cheap...
[12:41] Motor Loon: LL aint using SSD's are they?
[12:41] Andrew Linden: Yeah, we don't mix re-imaging the login servers with the simhost servers. We're only working on one group of servers at a time for OS upgrades.
[12:41] TankMaster Finesmith: for some things, yes, like the asset server
[12:41] Qie Niangao: Thanks for that info. That sounds less scary.
[12:41] Andrew Linden: Not yet Motor.
[12:42] Andrew Linden: I think we benchmarked some hardware that had SSD's.
[12:42] Andrew Linden: Dunno what the results were.
[12:42] Motor Loon: "fast as hell" likely
[12:42] Motor Loon: but price reflects it ofcourse
[12:43] Vincent Nacon: actually, it'll be fast as CPU allow...
[12:43] Motor Loon: [12:39] SnowLeopard Pearl: no motor ive filed over 90 offline and sim crashed reports in the past 30 hours
[12:40] SnowLeopard Pearl: half of regents sims at 5am estern time -3 hours on Second Life time were stuck hung or crashed
[12:44] Ash Qin: I would imagine it depends also on whether there is much I/O, if you're only reading/writing to a few select files - It's likely going to be cached in memory anyway and you're not going to see significant improvements.
[12:44] Simon Linden: 5am eastern is 2am SL time, so that was well before this morning's server rollout
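Simon's conversion can be checked with a short sketch. It assumes daylight time was in effect on 8 May 2012, so Eastern is UTC-4 and SL time (Pacific) is UTC-7. The function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for May 2012, when daylight time was in effect (assumption).
EDT = timezone(timedelta(hours=-4), "EDT")
PDT = timezone(timedelta(hours=-7), "PDT")  # SL time follows Pacific time


def eastern_to_slt(hour, minute=0):
    """Convert a wall-clock time on 8 May 2012 from Eastern to SL time."""
    eastern = datetime(2012, 5, 8, hour, minute, tzinfo=EDT)
    return eastern.astimezone(PDT)


slt = eastern_to_slt(5)
print(slt.strftime("%H:%M %Z"))  # prints "02:00 PDT"
```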
[12:44] Motor Loon nods
[12:45] Ash Qin: Assuming the benchmarks are running load testing programs against the software stack on that hardware.
[12:45] Simon Linden: Our servers do a lot of I/O for lots of files, fwiw
[12:45] Squirrel Wood: then they will benefit from ssds
[12:45] Squirrel Wood: especially if its compressible data
[12:45] Motor Loon: well, SSD mostly READS fast don't they? not so much writing
[12:45] Ash Qin: I would be concerned about reliability when it comes to SSDs.
[12:46] Squirrel Wood: 500/500 r/w :p
[12:46] Squirrel Wood: sata 6G
[12:46] Squirrel Wood: or PCIe.
[12:46] Vincent Nacon: sata for raid
[12:46] Squirrel Wood: reliability: much better than hdds :p
[12:46] NeoBokrug Elytis: Just put it all in ram, but all the ram ever.
[12:46] Jonathan Yap: Writing is a lot faster to a SSD, too
[12:46] Vincent Nacon: PCIe for singles
[12:46] TankMaster Finesmith: http://vr-zone.com/articles/seagate-is-readying-pulsar.2-ssd-with-12gbps-sas-interface/15801.html
[12:46] Squirrel Wood: intel ssds (520 series) give 5 years warranty
[12:46] TankMaster Finesmith: theres a new higher speed SAS port coming out
[12:47] Ash Qin: Squirrel, my past RAID with SSDs failed multiple times in a few months when running an intensive mysql server on it
[12:47] NeoBokrug Elytis: *buy
[12:47] Vincent Nacon: who's gonna pay for it, Squirrel?
[12:47] Jonathan Yap: You can find excellent SSD reviews here http://www.anandtech.com/
[12:47] Squirrel Wood: ash, were you by chance using OCZ ssds?
[12:47] Ash Qin: Admittedly, they were all from a single brand, so not representative of all SSDs.
[12:47] Ash Qin: No.
[12:47] Simon Linden: If we start using SSDs, it won't be putting new disks into old hardware ... it would be fully new rack servers. So they will have updated CPUs, more RAM, etc ... it's all a matter of budget at some point
[12:48] Motor Loon: ofcourse
[12:48] Andrew Linden: Finally the disk read/write speeds are speeding up. Those speeds didn't keep up with CPU clock speedups
[12:48] Simon Linden: They will be faster than what we have now, in any case
[12:48] TankMaster Finesmith: intel has the lowest failure rate by a good margin over the other brands
[12:48] Vincent Nacon: yeah
[12:48] Andrew Linden: meanwhile CPU speeds have sorta flattened out and the disk speeds are increasing
[12:48] Squirrel Wood: http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-520-series.html for reference.
[12:49] Squirrel Wood: put one of these in an old laptop at work. bootup times from turn on to productivity went from 20 minutes to 2.5 minutes.
[12:49] Andrew Linden: hrm... I just asked an operations engineer. We'll be testing some hardware with SSD's.
[12:49] TankMaster Finesmith: nice
[12:49] Andrew Linden: Probably after we get to debian/squeeze
[12:49] TankMaster Finesmith: should make region restarts faster
[12:50] Vincent Nacon: what's the typical HDD size on sim servers?
[12:50] Vincent Nacon: err... actually what's the typical HDD size it really needs?
[12:50] Nalates Urriah: Any idea how big the asset database is?
[12:51] Vincent Nacon: I don't mean asset database though
[12:51] TankMaster Finesmith: more than a few MB, nal :P
[12:51] Squirrel Wood: fact is, bigger ssds perform better as they write to multiple channels.
[12:51] Andrew Linden: the drives are actually much bigger than we need... they range from 80GB to 250GB, I think.
[12:51] Nalates Urriah: :P I want to get a copy on a USB stick...
[12:51] Andrew Linden: larger drives wouldn't help us, but faster I/O would, yes.
[12:51] Vincent Nacon: but they need what?
[12:51] Vincent Nacon: 4 or 5 gb?
[12:52] TankMaster Finesmith: larger drives would have the dual advantage of longer wear time and higher write speeds
[12:52] Andrew Linden: they probably need 40 GB or so, mostly from crash logs and data cache
[12:52] Vincent Nacon: ahh ok
[12:53] Vincent Nacon: maybe market for 40 or 80 SDD models?
[12:53] Vincent Nacon: SSD*
[12:53] Squirrel Wood: intel has 60GB ones.
[12:53] Vincent Nacon: that'll do
[12:53] Kallista Destiny: 50% margin
[12:53] Squirrel Wood: but ya. larger ones increase lifetime as the cells get written to less often
[12:54] Jonathan Yap: Does linux support TRIM?
[12:54] Andrew Linden: Hrm.. dunno the size of our asset system
[12:54] Andrew Linden: but we did have a successful garbage collection pass of our asset system
[12:54] Andrew Linden: and the final pile was about 15% of the size of what we started with
[12:54] Squirrel Wood: ssds for data centers: http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-710-series.html
[12:54] Draconis Neurocam: couple years ago it was 7 terabytes and doubling yearly, but i don't know if that has changed
[12:55] Nalates Urriah: How long sis that take?
[12:55] Meeter: Timecheck : User Group is almost over
[12:55] Ash Qin: Is the lab also going to be testing hybrid SSD systems?
[12:55] TankMaster Finesmith: i havent really seen any advantages to hybrid drives
[12:55] Nalates Urriah: sis=did
[12:55] TankMaster Finesmith: (and yes, ive used them)
[12:55] Andrew Linden: the garbage collection took years to get right, it was started and stopped a few times as bugs were discovered
[12:55] Andrew Linden: then we punted the problem for a while (postponed indefinitely)
[12:56] Andrew Linden: and got back to it
[12:56] Vincent Nacon: HDD for less used asset and SSD for high traffic items?
[12:56] Nalates Urriah: Whoa... no small task
[12:56] Andrew Linden: in the end, I think the actual scan took... a couple weeks
[12:56] Andrew Linden: maybe a month
[12:56] Simon Linden: That'd be pretty hard to sort, Vincent, but I can imagine putting some stuff into one device and OS and apps in another
[12:57] Andrew Linden: much of the final work was testing scans to see if we thought it would work
[12:58] Simon Linden: There's actually been a big garbage collection project going on recently ... it was getting close to completing soon, I think
[12:58] Rex Cronon: why did it take weeks? u could have done it in parallel, in a few hours
[12:58] Andrew Linden: I asked for a size estimate of our post-garbage-collected asset system, but my contact appears to be AFK
[12:58] Ash Qin: hybrid ssd drives would be capable of doing that technically, since they cache the regularly accessed/modified data and data that is irregularly accessed is on the hdd.
[12:58] Andrew Linden: Simon it did complete.
[12:58] Squirrel Wood: I just hope you recycle the unused electrons ^^
[12:59] Ash Qin: Also capable of far larger disk sizes than SSD currently.
[12:59] Andrew Linden: last I heard (two weeks ago) the scan was complete and the garbage pile had gone for 100 hours without a single asset request
[12:59] Ash Qin: Wow.
[12:59] Simon Linden: That's great
[13:00] Andrew Linden: Anyway, the asset system used to be a very scary monster that we worried about a lot, but we think we know how to handle it these days.
[13:00] Nalates Urriah: Pretty awesome
[13:00] Meeter: Thank you for coming to the Server User Group
[13:00] Squirrel Wood: ossm :)
[13:00] Motor Loon: Just feed it lots of raw meat and we're all safe
[13:01] Nalates Urriah: Thanks Andrew ... and Simon
[13:01] Andrew Linden: Yeah, I remember a time... 2007 and 2008, where we were pretty worried about how to scale the asset system.
[13:01] TankMaster Finesmith: thx for your time, simon and andrew
[13:01] Andrew Linden: Cheers. Thanks for coming.
[13:01] Ash Qin: Thanks!
[13:01] Kallista Destiny: Thank you for a most interesting meeting
[13:02] Rex Cronon: tc andrew, simon, kelly and everybody
[13:02] Qie Niangao: Thanks... have fun all.
[13:02] Rex Cronon: have a nice day all:)
[13:02] Sopherian Yumako: Take care everybody
[13:02] Rex Cronon: can't tp:*
[13:03] Rex Cronon: :(
[13:03] Simon Linden: Thanks everyone for coming today