sldev-traffic no 4
sldev discussions through March 23, 2007
- 1 Intro
- 2 Build Errata on Source Download Page
- 3 Unofficial SVN
- 4 Help Prioritize JIRAs
- 5 Forking Big Discussions
- 6 Performance part 1: Profiling
- 7 Performance part 2: Multithreading
- 8 Performance part 3: Decoded Image Cache Reset on Teleport
- 9 Performance part 4: JPEG2000 Decoding and Image Caching
- 10 Performance part 5: Performance Degrading Over Time
- 11 Performance part 6: How Low Do You Go?
- 12 gcc4 Fix
- 13 XUL Editor
- 14 GPL'd Submissions
- 15 Unit Test Harness: Patch 4
- 16 QuickTime Replacement
Intro
The big discussions revolved around performance this week. I've tried to break what mostly happened in two hugely intertwined threads into component parts. Feel free to be extra picky and edit the page if you think I've obfuscated anything by breaking apart the topics.
I'm also pulling more from the IRC and Rob Lanphier's (Linden) in-world office hours than in the past. Look for more of this in the future. And please don't say anything in #opensl or at Rob's office that you don't want repeated without noting it in advance. :) --Soft Noel 14:33, 25 March 2007 (PDT)
Build Errata on Source Download Page
There's an additional column on the source download page for bugs that prevent building or running the viewer. If you run into a build issue, please get it into JIRA and add the bug in the new field. If you're stuck, check back to see if a bug has been added.
Unofficial SVN
Dzonatas Sol announced a Subversion repository for the viewer, including all existing public source drops. She also added an introduction on getting the source and compiling it. The repository is located here:
Rob Lanphier (Linden) noted that the official LL repository was in the works, and that at least initially it would follow a method much like the one Dzonatas was using, relying on the same source drops seen in the zips and tarballs. During his Monday office hours, Rob indicated that he knew the internal and external repository system was unusual, but hoped the hybrid approach would work until LL can sort out a better approach.
The LL repository will initially be hosted with wush.net, a professional svn hosting service. Rob indicated that he wanted to see if it's something capable enough that LL's devs would feel comfortable using it for day-to-day work.
Help Prioritize JIRAs
There are too many JIRAs for them all to be fully evaluated by LL with current resources. Please remember to vote on issues of importance. Who votes is every bit as important as the number of votes. If you're a productive voice on the mailing list, you can be an especially productive voter as well.
Forking Big Discussions
Rob Lanphier (Linden) requested that discussions generating more than 5 replies in a day be redirected to the JIRA to make the list easier to follow. If a topic dies down after a couple days, it's okay to bring the discussion back to the list, starting with a summary of the JIRA discussion.
Topics not specifically related to Second Life development should be moved to the JIRA immediately.
There was a small amount of follow-up discussion on the SLDev Talk page. Dzonatas Sol seconded the suggestion while Iron Perth suggested that the wiki and JIRA were better used for documentation and bugs, but would be obstacles to discussion. Rob Linden suggested that forums could be used as an alternative target if appropriate.
Of note, there was a slight drop-off in list subscriptions in the past week as the texture discussion traffic increased, as mentioned during Rob's Friday office hours.
Performance part 1: Profiling
Laurent Laborde (kerunix Flan) posted a hierarchical profile of the viewer running on his MacBook Pro and asked for help interpreting the output. Tleiades offered a few notes. In summary: unsurprisingly, a large amount of time is spent in OpenGL, and Tleiades noted that nVidia has a profiling tool that may shed extra light on where that time is going. There is a lot of dynamic memory freeing (presumably allocation too -- Soft) going on, and creating resource pools could help performance. LLOctreeTraveler::traverse may be a candidate for optimization -- it would be helpful to investigate how frequently render states change, and how many render batches are dispatched.
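To illustrate the resource pool idea mentioned in the notes, here is a minimal sketch. It is not taken from the viewer source: `SimplePool` and `ScratchBuffer` are hypothetical names, and the sketch ignores thread safety and object reset, both of which a real pool would have to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical pool: hands out objects of type T and recycles them instead of
// returning memory to the heap on every release.
template <typename T>
class SimplePool {
public:
    // Take an object from the free list, or allocate one if the list is empty.
    // Recycled objects are NOT reset; callers must reinitialize them.
    std::unique_ptr<T> acquire() {
        if (free_.empty()) {
            ++heapAllocations_;
            return std::make_unique<T>();
        }
        std::unique_ptr<T> obj = std::move(free_.back());
        free_.pop_back();
        return obj;
    }

    // Return an object to the pool so a later acquire() can reuse it.
    void release(std::unique_ptr<T> obj) {
        assert(obj != nullptr);
        free_.push_back(std::move(obj));
    }

    std::size_t heapAllocations() const { return heapAllocations_; }
    std::size_t idleObjects() const { return free_.size(); }

private:
    std::vector<std::unique_ptr<T>> free_;
    std::size_t heapAllocations_ = 0;
};

// Hypothetical payload standing in for a frequently created viewer object.
struct ScratchBuffer {
    float data[64] = {};
};
```

With a pool like this, a loop that acquires and releases a `ScratchBuffer` every frame touches the heap once rather than once per frame. Whether that matters for the viewer would have to be confirmed by profiling.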
Laurent ran a subsequent profile, noted that QuickTime's MCIdle appeared to be run more often than needed, and quoted some material from the Apple developer documentation detailing ways of reducing the load here. Of particular interest may be QTGetTimeUntilNextTask, which returns the number of milliseconds until an MCIdle call is actually needed again. There was no followup discussion on this point.
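The throttling idea can be sketched without touching QuickTime at all. The code below is a simulation under stated assumptions: `FakeMovie`, `timeUntilNextTaskMs`, and `runLoop` are hypothetical stand-ins, not the QuickTime API, and the 33 ms period is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a media player. It is NOT the QuickTime API; it only
// mimics the idea of "tell me how long until you next need servicing".
struct FakeMovie {
    std::int64_t nextDueMs = 0;  // simulated time at which work is next needed
    std::int64_t periodMs = 33;  // assumed: one video frame roughly every 33 ms
    int idleCalls = 0;

    // Milliseconds until servicing is needed again (0 if already due).
    std::int64_t timeUntilNextTaskMs(std::int64_t nowMs) const {
        return nowMs >= nextDueMs ? 0 : nextDueMs - nowMs;
    }

    // Do the periodic work and schedule the next deadline.
    void idle(std::int64_t nowMs) {
        ++idleCalls;
        nextDueMs = nowMs + periodMs;
    }
};

// Run a simulated main loop that ticks every frameMs for durationMs.
// When throttle is true, idle() is skipped unless the movie reports it is due.
int runLoop(FakeMovie& movie, std::int64_t durationMs, std::int64_t frameMs, bool throttle) {
    assert(frameMs > 0);
    for (std::int64_t now = 0; now < durationMs; now += frameMs) {
        if (!throttle || movie.timeUntilNextTaskMs(now) == 0) {
            movie.idle(now);
        }
    }
    return movie.idleCalls;
}
```

In this simulation a loop ticking every 5 ms for one second calls `idle()` 200 times unthrottled and 29 times throttled. The real saving in the viewer would depend on how expensive each MCIdle call is.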
Laurent did another profile showing some curious STL behavior, but Dave Parks (Linden) noted that the behavior in question did not exist in release builds, only in debug.
- I confirmed my mistake. The behaviour do not exist in release builds. Kerunix Flan 20:41, 25 March 2007 (PDT)
Performance part 2: Multithreading
Skal Tura asked whether any work was happening on multithreading and asked why it was difficult to do. Tleiades gave a brief primer about the types of work that could and couldn't be parallelized, the difficulties in adding parallelization to an existing object without fine-grained resource locking, and noted cases where threading could hurt performance instead of helping. Tleiades also speculated that the true bottleneck for the viewer may not be the CPU but the amount of GPU setup work, if render state is changed too often or tiny work packets are being submitted. Tleiades emphasized the importance of measuring and isolating bottlenecks before trying to optimize.
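As a rough illustration of the render-state concern, the sketch below counts state changes in a list of draws, then groups draws by state and merges neighbours into larger batches. `DrawCall` and all function names are hypothetical; nothing here reflects how the viewer's pipeline actually stores or submits geometry.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical draw record: stateId stands for a texture/shader/blend combination.
struct DrawCall {
    int stateId;
    int triangleCount;
};

// Count how often the state differs from the previous draw, including the first bind.
std::size_t countStateChanges(const std::vector<DrawCall>& calls) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < calls.size(); ++i) {
        if (i == 0 || calls[i].stateId != calls[i - 1].stateId) {
            ++changes;
        }
    }
    return changes;
}

// Group draws that share a state so each state is bound once.
// Only valid for draws whose order does not matter (e.g. opaque geometry).
std::vector<DrawCall> sortByState(std::vector<DrawCall> calls) {
    std::stable_sort(calls.begin(), calls.end(),
                     [](const DrawCall& a, const DrawCall& b) { return a.stateId < b.stateId; });
    return calls;
}

// Merge adjacent draws with the same state into one larger batch.
std::vector<DrawCall> mergeAdjacent(const std::vector<DrawCall>& calls) {
    std::vector<DrawCall> merged;
    for (const DrawCall& c : calls) {
        assert(c.triangleCount >= 0);
        if (!merged.empty() && merged.back().stateId == c.stateId) {
            merged.back().triangleCount += c.triangleCount;
        } else {
            merged.push_back(c);
        }
    }
    return merged;
}
```

Counting before and after is the point: in line with the advice to measure first, a counter like `countStateChanges` would show whether state churn is large enough to be worth attacking.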
Zack Geers noted that the render pipeline is all in one thread, and other threads include disk IO and jpeg2000 decoding. (Note: You can see where all of the threads are kicked off at the top of main() in viewer.cpp; it's a little bit finer-grained than this -- Soft) Zack said he believed some thread-safety work was being done in advance of adding threading to the render pipeline.
Dave Parks (Linden) stepped in with an extensive post about the performance work being done. From the end:
In summary, the candidates for optimization are (in no particular order):
- Put LLPipeline::renderGeom on its own thread
- LLVertexBuffer::clientCopy - make it optimal and find the optimal location from which to call it
- LLVolumeGeometryManager::rebuildGeom - build better batches, build them faster
- LLOctreeTraveler::traverse - the faster the tree is, the better
Honorable mention but slightly out of my domain for discussion:
- Particle simulation (can't touch particle rendering, sorry). I ported the particle rendering to point sprites, but it turns out 90% of the particle systems out there won't port, so it was a wasted effort.
- Avatar animation - I'm sure there's some low hanging fruit in the avatar animation system. In fact, one dev here claims it used to be a ton faster than it is, so there might be a one line bug in there slowing things down.
- Flexible object updates - Ugh.
With all that said, Dave implored people to make stability issues a priority. Previously, Lindens have urged the importance of researching crashes in list members' own builds. Some crashes may have never yet been seen by a Linden with a debugger active. This is one area where list members can really help.
Performance part 3: Decoded Image Cache Reset on Teleport
In the multithreading thread, Skal Tura asked why all textures were apparently loaded again when teleporting to a new location and suggested that the slow rate at which the images were reloaded pointed to bad caching. Tateru Nino confirmed the observation by pointing to the place in the code where the cache was effectively invalidated after teleporting. Steve (Linden) stepped in with clarification and details about the in-memory cache policy, and reminded list members to look at the First Look branch when doing their own research. He suggested that a good problem to tackle would be that of teleporting to a new location, waiting ~30s, then teleporting back. The system has dropped the in-memory decoded image cache by this point. He also noted that the focus cannot be only on memory, but must also include the cost of tracking a very large number of in-memory textures.
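One way to think about the teleport-and-return case is a decoded-image cache that evicts by memory budget and recency instead of being cleared wholesale. The sketch below is an assumption-laden illustration, not the viewer's cache policy: `DecodedImageCache` is a hypothetical class, and plain strings stand in for texture IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical decoded-image cache with a byte budget and least-recently-used eviction.
class DecodedImageCache {
public:
    explicit DecodedImageCache(std::size_t budgetBytes) : budget_(budgetBytes) {}

    // Insert or replace an image, then evict old entries until within budget.
    // The newest entry is always kept, even if it alone exceeds the budget.
    void put(const std::string& id, std::vector<unsigned char> pixels) {
        erase(id);
        used_ += pixels.size();
        order_.emplace_front(id, std::move(pixels));
        index_[id] = order_.begin();
        while (used_ > budget_ && order_.size() > 1) {
            const std::string victim = order_.back().first;  // copy before erasing
            erase(victim);
        }
    }

    // Return the pixels if cached and mark the entry as recently used.
    const std::vector<unsigned char>* get(const std::string& id) {
        auto it = index_.find(id);
        if (it == index_.end()) return nullptr;
        order_.splice(order_.begin(), order_, it->second);
        return &it->second->second;
    }

    std::size_t usedBytes() const { return used_; }

private:
    using Entry = std::pair<std::string, std::vector<unsigned char>>;

    void erase(const std::string& id) {
        auto it = index_.find(id);
        if (it == index_.end()) return;
        assert(used_ >= it->second->second.size());
        used_ -= it->second->second.size();
        order_.erase(it->second);
        index_.erase(it);
    }

    std::size_t budget_;
    std::size_t used_ = 0;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Under such a policy, textures from the previous location survive a teleport until memory pressure pushes them out. Steve's caveat still applies: the bookkeeping for many entries has its own cost, which this sketch does not measure.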
Performance part 4: JPEG2000 Decoding and Image Caching
Laurent Laborde noted that the jpeg2000 decoding cost a lot of CPU time, and other list members confirmed the observation. There was discussion about whether the viewer used a service thread for jpeg2000 decoding, and several list members confirmed that it did, with Callum Lerwick and Dave Parks (Linden) noting that the improvement was new to the First Look branch. Dzonatas Sol noted that jpeg2000 was designed to incorporate threading though it didn't exist in the current implementation. She pointed to Openjpeg improvements for some work already begun on speeding up jpeg2000 decoding.
Callum Lerwick noted that the KDU jpeg2000 decoder had functionality enabling it to decode for a number of milliseconds and then return, functionality openjpeg lacks. Douglas Soo (Linden) noted that he had added this himself as the threading quantum was large on some OSes, and on systems with more active threads than cores, there would be marked choppiness without voluntarily relinquishing a time slice. Adding this functionality to openjpeg could help considerably on low-end systems if openjpeg is decoding images to completion and enjoying preferential scheduling owing to its irregular load.
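The "decode for a number of milliseconds and then return" behaviour can be sketched as a time-budgeted loop. This is an illustration only: `SlicedDecoder` is hypothetical, it assumes the image splits into equal work units, and it does not use any KDU or openjpeg call.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical incremental decoder. The image is assumed to split into equal
// work units (for example tiles); real decoders may not divide this cleanly.
class SlicedDecoder {
public:
    using Clock = std::chrono::steady_clock;

    explicit SlicedDecoder(std::size_t totalUnits) : total_(totalUnits) {}

    // Decode until finished or until the budget runs out. At least one unit is
    // processed per call so progress is guaranteed. Returns true once done.
    bool decodeFor(std::chrono::milliseconds budget) {
        const auto deadline = Clock::now() + budget;
        do {
            if (done_ == total_) return true;
            decodeOneUnit();
        } while (Clock::now() < deadline);
        return done_ == total_;
    }

    std::size_t unitsDone() const { return done_; }

private:
    // Placeholder for the real per-unit decoding work.
    void decodeOneUnit() {
        assert(done_ < total_);
        ++done_;
    }

    std::size_t total_;
    std::size_t done_ = 0;
};
```

A worker thread would call `decodeFor` with a small budget and yield between calls, giving up its time slice voluntarily as Douglas described. The unit granularity sets the worst-case overrun, since a unit in progress is never interrupted.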
From here, the thread bloomed into a very extensive discussion about how already-decoded images could be cached and whether they should be. It was noted that the textures are all of similar fixed sizes, which could allow a specialized fragmentation-resistant cache layout, and that they all had unique IDs already, which made one proposed relational database or any kind of hashing unnecessary for their tracking.
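A fixed-size slot layout is one reading of the fragmentation-resistant idea. The sketch below is an assumption, not a design proposed on the list: `SlotCache` is hypothetical, it handles a single slot size, and it looks entries up by ID through an ordered map, so no hash function is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical cache made of equal-sized slots in one contiguous buffer.
// Because every slot has the same size, freeing and reusing slots cannot fragment it.
class SlotCache {
public:
    SlotCache(std::size_t slotBytes, std::size_t slotCount)
        : slotBytes_(slotBytes), storage_(slotBytes * slotCount) {
        // Push indices in reverse so slot 0 is handed out first.
        for (std::size_t i = slotCount; i > 0; --i) freeSlots_.push_back(i - 1);
    }

    // Reserve a slot for id (or return its existing slot); nullptr when full.
    unsigned char* reserve(const std::string& id) {
        auto found = slotOf_.find(id);
        if (found != slotOf_.end()) return slotPtr(found->second);
        if (freeSlots_.empty()) return nullptr;
        const std::size_t slot = freeSlots_.back();
        freeSlots_.pop_back();
        slotOf_[id] = slot;
        return slotPtr(slot);
    }

    // Give the slot held by id back to the free list.
    void release(const std::string& id) {
        auto found = slotOf_.find(id);
        if (found == slotOf_.end()) return;
        freeSlots_.push_back(found->second);
        slotOf_.erase(found);
    }

    std::size_t freeSlotCount() const { return freeSlots_.size(); }

private:
    unsigned char* slotPtr(std::size_t slot) {
        assert((slot + 1) * slotBytes_ <= storage_.size());
        return storage_.data() + slot * slotBytes_;
    }

    std::size_t slotBytes_;
    std::vector<unsigned char> storage_;
    std::vector<std::size_t> freeSlots_;
    std::map<std::string, std::size_t> slotOf_;  // ordered lookup by ID
};
```

Since textures come in a few similar sizes, a fuller version would presumably keep one such cache per size class. That extension is left out here.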
As an alternative to decoded image caching, list members also explored the idea of caching images in an alternative file format, either as a permanent local replacement for jpeg2000, or as an interim format for lower resolution images while the full resolution images were still being decoded.
Discussion continued on the texture cache talk page.
Performance part 5: Performance Degrading Over Time
In the multithreading thread, Skal Tura noted that SL takes more and more memory over time, and runs perceptibly slower. No measurements were given. SLDev listmembers came down on both sides of whether this performance degradation existed. Skal Tura noted that he runs for ten hours or more at a time, and by this point there were visible pauses on a very high end machine.
Performance part 6: How Low Do You Go?
There was discussion about SL targeting very low-end machines, and about whether it would be advantageous to raise the bar a bit. Laptops, low-end Macs still on the market today, international (third-world) users and those on fixed budgets were all mentioned. Some list members also noted that they were sticking with earlier versions of Windows and wanted to avoid anything that might force them to upgrade.
gcc4 Fix
Ismail Dönmez followed the First Look 188.8.131.52315 release with a minor patch for gcc4 support, quipping that someone at Linden Lab must hate gcc4. Jesse Nesbitt noted that getting these into the JIRA would help. Ismail obliged and Tofu Linden accepted and applied it:
https://jira.secondlife.com/browse/VWR-255 (no internal JIRA)
XUL Editor
Vincent Capuano inquired about whether there was an approach to editing the XUL (user interface markup) XML files other than hand-hacking XML. John Hurliman said LL developers had previously stated that they edited by hand. Tinker LaFollette and Synthalor Mandelbrot suggested XML editors that might be a step above straight editing:
http://www.activestate.com/products/komodo_edit/ (all platforms)
GPL'd Submissions
This thread began in the week prior, but was still continuing when SLDev-Traffic 3 was written. It has since tapered off.
There was extensive discussion about whether and how Linden Lab could accept patches with a GPL license attached. A prior message from Rob Linden was quoted wherein he thanked Alynna for a submission but noted that the viewer has GPL-incompatible components, and so the patch could not be incorporated. Further confusion seemed to stem from contributors not understanding dual licensing: Linden Lab providing the viewer source to the public under the GPL doesn't necessarily mean LL will use the GPL license for their own builds, which include non-free libraries.
Discussion continued in-world during Rob Lanphier's (Linden) Wednesday office hours. Dzonatas Sol wanted guidance on whether she could contribute some GPLd code. Rob said the code could be used by Dzonatas, but not submitted to Linden Lab under GPL. Discussion continued about whether the code, owned by Dzonatas Sol, could be relicensed in a way compatible with LL's needs and Dzonatas' needs alike. Rob asked Dzonatas to lay out the issue in the JIRA and then forward the request to LL licensing.
On-list again, Jason Giglio listed the non-free libraries that would currently prevent distributing an all-free version of the client. These included:
- APR
- Cg
- FMOD
- Kakadu
- OpenSSL
- SmartHeap
- SpeedTree
Callum Lerwick detailed substitutions and work-arounds for most of the above, pointing to APR and OpenSSL as the most substantial blockers.
Argent Stonecutter noted that QuickTime was not on the list, but probably should be. While GPL allows linking to libraries distributed with the OS, QT is only distributed with OS X, not with Windows.
Unit Test Harness: Patch 4
Gaurav Sharma posted the fourth Adroit unit test harness. This week it properly appears in the list archives:
QuickTime Replacement
Linden Lab are investigating a replacement for QuickTime under Linux, currently GStreamer, as mentioned during Rob Lanphier's (Linden) Friday office hours. One obstacle is that many .mov files actually use the Sorenson video codec rather than MPEG-4 H.264, as many users had believed. Current .mov files are mostly H.264, but there may be other catches.