Difference between revisions of "Rolling restart"

From Second Life Wiki
(Replaced content with "{{#Widget:Redirect|url=/t5/English-Knowledge-Base/Grid-status/ta-p/837021}}")
 
{{KBmaster}}
{{#Widget:Redirect|url=/t5/English-Knowledge-Base/Grid-status/ta-p/837021}}
 
== What is a "rolling restart"? ==
 
The Second Life world has thousands upon thousands of Regions running on a great many servers. Because of the sheer logistics involved in managing them, and because our Residents would prefer Second Life to have as much uptime as possible, it's infeasible to restart every Region at the same time when an upgrade needs to be deployed.
 
[[File:Region_restart.png]]
 
Think of a rolling restart like a wave: it doesn't occur everywhere simultaneously, but travels from one place to another. While some servers are restarting, others were restarted several minutes earlier and are about to come back online. Thus, only a portion of Second Life is down at any one time. [http://status.secondlifegrid.net/2009/10/26/post782/ For example]:
 
: ''As with all of our server deploys, each region will be restarted once during one of the rolling restart periods. Most regions should be down no more than 5-10 minutes, although some fraction of the regions will take 20-30 minutes to upgrade. If your region stays down for more than 30 minutes, please contact support. Each region will receive warnings starting 5 minutes before that region is restarted.''
 
Rolling restarts usually apply to ''all'' of Second Life, including Private Regions.
 
Specific details of how a rolling restart proceeds are sometimes announced on our [http://status.secondlifegrid.net/ Grid Status Reports], and more details are on the [http://forums.secondlife.com/forumdisplay.php?f=348 Server Deploys forum].
 
== Technical Details<ref>[https://wiki.secondlife.com/wiki/Beta_Server_Office_Hours/Minutes/2010-02-11 Lil Linden's Office Hour talk]</ref> ==
 
First, some definitions:
 
Colo -- a colocation site which holds many racks of sims
<br>
Sim (or sim host) -- a computer (or server)
<br>
Simulator -- the binary that runs on a sim host
<br>
Regions -- run by a simulator
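
The way these terms nest can be sketched as a small data model. This is only an illustration: the class names, field names, and example values are hypothetical, and the source does not say how many simulators run on a sim host or how many regions a simulator runs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Simulator:
    """The binary that runs on a sim host; it runs regions."""
    version: str  # hypothetical field, not mentioned in the source
    regions: List[str] = field(default_factory=list)


@dataclass
class SimHost:
    """A "sim": a computer (server) located in a colo."""
    name: str
    simulators: List[Simulator] = field(default_factory=list)


@dataclass
class Colo:
    """A colocation site holding many racks of sim hosts."""
    name: str
    hosts: List[SimHost] = field(default_factory=list)

    def regions(self) -> List[str]:
        """Every region run by every simulator on every host in this colo."""
        return [r for h in self.hosts for s in h.simulators for r in s.regions]


# Made-up example data, for illustration only.
colo = Colo("example-colo", [
    SimHost("sim1", [Simulator("1.0", ["Region A", "Region B"])]),
    SimHost("sim2", [Simulator("1.0", ["Region C"])]),
])
print(colo.regions())
```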
 
 
Here's the deploy process in a nutshell:
<br>
1. New server code is prepared.
 
2. Deploy day rolls around.
 
3. The server code is compiled and the binaries are put into a tarball.
 
4. The tarball is put on the Asset System.
 
5. For each colo, a BitTorrent tracker is started and the sim hosts start downloading the tarball. That usually takes less than 1 hour.
 
6. A command is sent out to have each sim unpack the tarball and get it ready for prime time.  That takes another hour or so per colo.
 
7. When it's time for the rolling restart, the grid is put into what's called "Startup Mode". That means that when a sim goes down, the system won't try to bring its regions up on another host. This way, each region only gets restarted once.
 
8. While in startup mode, 200 sims are selected at a time and the following steps are taken:
<br>
8a. The simulator is shut down.
<br>
8b. The new binaries are installed.
<br>
8c. The simulator is restarted.
 
This is repeated until all of the approximately 6000 sims have been processed.
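
The batching in step 8 can be sketched roughly as follows. This is a minimal illustration, not the actual deploy tooling: the function names and the three callbacks are hypothetical stand-ins for steps 8a to 8c, and handling the hosts within a batch one after another is a simplifying assumption. Only the batch size of 200 and the figure of roughly 6000 sims come from the description above.

```python
from typing import Callable, Iterator, List, Sequence

BATCH_SIZE = 200  # from the description: 200 sims are selected at a time


def batches(hosts: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    for first in range(0, len(hosts), size):
        yield list(hosts[first:first + size])


def rolling_restart(
    hosts: Sequence[str],
    shut_down: Callable[[str], None],  # hypothetical hook for step 8a
    install: Callable[[str], None],    # hypothetical hook for step 8b
    start: Callable[[str], None],      # hypothetical hook for step 8c
    size: int = BATCH_SIZE,
) -> int:
    """Restart every host exactly once, one batch at a time.

    Returns the number of batches processed.
    """
    count = 0
    for batch in batches(hosts, size):
        for host in batch:
            shut_down(host)  # 8a. the simulator is shut down
            install(host)    # 8b. the new binaries are installed
            start(host)      # 8c. the simulator is restarted
        count += 1
    return count


# Demo with made-up host names; the callbacks only record what would happen.
log = []
hosts = [f"sim{i}" for i in range(6000)]
n = rolling_restart(
    hosts,
    shut_down=lambda h: log.append(("shut_down", h)),
    install=lambda h: log.append(("install", h)),
    start=lambda h: log.append(("start", h)),
)
print(n)  # 6000 / 200 = 30 batches
```

With exactly 6000 sims and 200 per batch, the loop runs 30 times, which is why only a small slice of the grid is down at any moment.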
 
== References ==
<references />

Latest revision as of 07:07, 6 December 2011

Redirecting to http://community.secondlife.com/t5/English-Knowledge-Base/Grid-status/ta-p/837021