Difference between revisions of "User:Qie Niangao/Optimizing Parcels for Search"

From Second Life Wiki

Revision as of 08:47, 30 May 2008

This article is a compendium of information from a "sticky" posted on the Resident Answers Forum, started by Cristalle Karami with generous contributions from other residents.


Posted by Phil Deakins

The idea of creating hundreds of prims, all with different names and descriptions, to gain in the All search is of limited use. The webpage has a limit on the number of object details that it contains. E.g. mine has 148 objects listed, and there are a lot more on the parcel that are set for sale. 148 is a strange number to choose, but I have to assume that it's the limit. So not every object is listed or taken into account by the ranking algorithm.

Somebody mentioned that the Title is an important ranking factor, and that's true. But it has nothing to do with object names/titles. It's the HTML page's Title, which isn't seen by the user, and in SL it's the parcel's name. SL also repeats it at the top of the viewable page in an HTML heading tag, which adds to its ranking power.

Google puts very heavy weight on page Titles (that's an HTML tag, and isn't seen in a browser), and on inbound links (IBLs), to the extent that those 2 things are all that is needed to rank pages very highly in Google. In fact IBLs are all that is needed. I won't go into the technical reasons for it in this post - unless asked. A page has just one Title (in SL, it's the parcel name), and multiple IBLs (that's why IBLs have more ranking power than Titles).

In Google, page content isn't even looked at if they can get a large enough set of results from those 2 things.

The SL system will be a bit different. There are 4 basic ranking factors that I can see:-

  1. HTML page Title tag (parcel name)
  2. HTML page Description tag (parcel description) - (the Google engine ignores this)
  3. HTML page content (names and descriptions of objects on the parcel)
  4. IBLs (a simple count of IBLs)

SL will be able to adjust the weight given to each of the factors. They said that they would be tweaking it, and I'm sure that's what they mean.

SL mentioned things like word proximity (how close the words in the searchterm are on the webpage), and they may have adjustment knobs for things like that as well, but those are the 4 basic ranking factors.

Since the number of objects listed in the webpage for a parcel is limited, and since the heart of Google's concept is to put a great deal of weight on IBLs, effort is better spent by increasing the number of IBLs, although using objects shouldn't be overlooked, because it does get rankable content onto the webpage - up to a certain limit.


Posted by Phil Deakins

How it works, in a nutshell - I hope:

There are a number of ranking factors, and each of them can be given a different weight. For instance, a single IBL could count for more than the searchterm (the words you searched on) in the page, and it probably does. A score is calculated for each matching page, and they are listed according to that score.

A simplistic way of looking at the page score calculation is something like this. For every link that points to the page, add 1. For every instance of the searchterm in the page, add 0.75. If the full searchterm is in the page's Title, add 5. And then list the pages in score order.

That should give you an idea of what's going on when a search query is being processed but, in reality, it's not as simplistic as that.
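To make the arithmetic concrete, here is a minimal Python sketch of that simplistic calculation. The numbers (1, 0.75 and 5) are only the illustrative ones from the post, not SL's or Google's real weights, and the function names and the page dictionaries are made up for the example.

```python
def simple_score(title, body, inbound_links, searchterm):
    """Toy page score using the illustrative numbers from the post."""
    term = searchterm.lower()
    score = 1.0 * inbound_links               # +1 for every link that points to the page
    score += 0.75 * body.lower().count(term)  # +0.75 for every instance of the searchterm
    if term in title.lower():                 # +5 if the full searchterm is in the Title
        score += 5.0
    return score


def rank(pages, searchterm):
    """Return page titles in score order, highest first."""
    scored = [
        (simple_score(p["title"], p["body"], p["ibls"], searchterm), p["title"])
        for p in pages
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


# Hypothetical parcels, purely for illustration.
pages = [
    {"title": "Blue Widgets Store", "body": "blue widgets and more blue widgets", "ibls": 3},
    {"title": "Gadget Mall", "body": "blue widgets", "ibls": 10},
]
print(rank(pages, "blue widgets"))
```

With these made-up numbers the second parcel comes out ahead despite not having the searchterm in its Title, simply because it has more IBLs.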

This is Google technology, and Google weighs page Titles (the parcel name) heavily, so the chances are that SL does too, and it's likely to be a default weighting. Also, pages will match if they have only 1 of the words from the searchterm in them, or if they have more of the words but not next to each other, and so on. The proximity of the words in the page is calculated and pages that have the exact phrase, for instance, will score better than those that don't.

In Google, the weight of a word or phrase is decreased with each instance of it in the page, until any more instances of it won't add anything to the score. I.e. the score for the page increases by a smaller and smaller amount with each instance of the word, until any more instances add nothing to the score. Imo, that's likely to be in the SL system, so adding hundreds of prims with the same phrase in them is likely to be a gross waste of prims. Adding some for various phrases - yes.
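A rough Python sketch of that diminishing-returns idea is below. The halving factor and the cutoff are assumptions picked only to show the shape of the curve; the real values (and the real formula) are not known.

```python
def diminishing_score(occurrences, first_weight=1.0, decay=0.5, floor=0.01):
    """Sum per-occurrence weights that shrink each time and eventually stop counting.

    first_weight, decay and floor are hypothetical values, not anything published.
    """
    total = 0.0
    weight = first_weight
    for _ in range(occurrences):
        if weight < floor:  # any more instances add nothing to the score
            break
        total += weight
        weight *= decay     # each further instance is worth less than the last
    return total


for n in (1, 3, 7, 100):
    print(n, diminishing_score(n))
```

Under these assumed values, 100 instances score exactly the same as 7, which is the point about hundreds of identical prims being wasted.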

So the way to improve rankings is:

  1. Craft the parcel's name and description carefully, with the most important searchterm(s) in the name, and the most important one at the front, because that's what is put into the page's important Title tag.
  2. Get people to add the place to their Picks, and to LM it. That increases the IBLs, which probably weigh heavier than the words on the page, simply because they can be seen as votes for the place - that's something that Google's whole system is based on. I've tested the words in people's Picks, and they aren't used, so getting them to change the place name to suitable target phrases is no good. IBLs are merely counted, as was stated in the blog.
  3. Add various searchterms in the names and descriptions of the objects on the land, and make sure that the objects are set to show in search. That gets those phrases onto the page where they will improve the page's score, so add at least several instances of each phrase. Existing objects can have the names and descriptions modified for the purpose. It doesn't necessarily need a load of extra prims.

Keep on with 2 and 3 until you move up the rankings. Don't forget that there may be quite a gap before you catch the ones ahead of you, and they may be doing the same things too.

The All search is updated every 12 hours, so it never takes long for changes to show any effect.


Posted by Phil Deakins

Word repetitions on the page: From the way Google described how they treat them, it didn't sound like the number of repetitions that count, before any further instances are of no value, is anywhere near as low as 2. AltaVista used to count only 2 instances of a word on a page, but Google counts more - lowering the ranking value for each one.

Proximity: Proximity is how close together the words in the searchterm occur on the page. The closer the words are together, the better they will score. Perfect proximity is when the exact searchterm (phrase) is on the page, and we can ensure perfect proximity by naming and describing the parcel and its objects accordingly.
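One way to picture proximity is as the shortest run of words on the page that contains every word of the searchterm. The Python sketch below measures that; it is only an illustration of the idea, not how the search engine is known to compute it, and the function names and the 0-to-1 scale are assumptions. Note that it ignores word order, so it treats "widgets blue" the same as "blue widgets".

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def smallest_span(page_text, searchterm):
    """Length in words of the shortest run containing every search word, or None."""
    words = tokenize(page_text)
    wanted = set(tokenize(searchterm))
    if not wanted:
        return None
    best = None
    last_seen = {}  # most recent position of each wanted word
    for i, word in enumerate(words):
        if word in wanted:
            last_seen[word] = i
            if len(last_seen) == len(wanted):
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


def proximity(page_text, searchterm):
    """1.0 when the search words sit side by side, lower as they spread out, 0.0 if any is missing."""
    span = smallest_span(page_text, searchterm)
    if span is None:
        return 0.0
    return len(set(tokenize(searchterm))) / span


print(proximity("Cheap blue widgets for sale", "blue widgets"))
print(proximity("blue hats and red widgets", "blue widgets"))
```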

Prominence: The nearer the top of the page a word/phrase is, the higher its prominence, and the better it scores. In the case of the parcel name (as used in the page's Title tag), the nearer the front of the name, the higher its prominence is, and the better it scores.
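As a rough illustration of prominence applied to a parcel name, the Python sketch below gives a phrase full marks when it starts the name and less the further back it appears. The linear 0-to-1 scale and the function name are assumptions made for the example; the real weighting is not known.

```python
def prominence(text, phrase):
    """1.0 if the phrase starts the text, lower the later it starts, 0.0 if absent.

    The linear scale is a hypothetical choice for illustration only.
    """
    words = text.lower().split()
    target = phrase.lower().split()
    if not target:
        return 0.0
    for start in range(len(words) - len(target) + 1):
        if words[start:start + len(target)] == target:
            return 1.0 - start / len(words)
    return 0.0


# Hypothetical parcel names.
print(prominence("Blue Widgets Emporium", "blue widgets"))
print(prominence("Emporium of Blue Widgets", "blue widgets"))
```

The first name puts the searchterm at the front and scores higher than the second, which matches the advice to lead the parcel name with the most important searchterm.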


Posted by Kitty Barnett

The/a "master page" (it looks like the one I found is the root of the crawler, but there could be more of them I guess) has 6 links:

  • agents
  • classifieds
  • events
  • groups
  • regions
  • traffic

(Which would suggest that places are only found through avie picks, the region page, or the traffic pages if they have enough traffic to warrant listing there)

Agents: this one is a bit odd. It has a major header for each day of the week, each followed by 30 links to pages full of avie names.

Classifieds: subcategorized into All and 13 "Top XXX" categories (would suggest that the top 10 paying classifieds used to get a total of 14 incoming links and so on, but all the end pages linked to don't exist anymore so it's essentially useless)

Events: a page with 4 links to all events happening on that day (at this time May 19th through May 22nd)

Groups: 256 links (based on the first two "letters" of their UUID)

Regions: 16 links (based on the first "letter" of their UUID)

Traffic: 12 links of "Top XXX" categories (I already found out about this a while ago so that was to be expected)
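The Groups and Regions counts fit with index pages split by UUID prefix: one hexadecimal character gives 16 possible pages and two give 16 x 16 = 256. The Python sketch below shows that kind of bucketing. It assumes the "letters" are the leading hex digits of the UUID's usual text form, and the function name and sample UUIDs are made up.

```python
import uuid
from collections import defaultdict


def bucket_by_prefix(ids, prefix_len):
    """Group UUIDs by the first prefix_len hex characters of their text form."""
    buckets = defaultdict(list)
    for item in ids:
        buckets[str(item)[:prefix_len].lower()].append(item)
    return dict(buckets)


# Made-up UUIDs, purely for illustration.
ids = [
    uuid.UUID("a1b2c3d4-0000-0000-0000-000000000000"),
    uuid.UUID("a1ffffff-0000-0000-0000-000000000000"),
    uuid.UUID("0fb2c3d4-0000-0000-0000-000000000000"),
]

print(sorted(bucket_by_prefix(ids, 1)))  # region-style pages: at most 16 ** 1 of them
print(sorted(bucket_by_prefix(ids, 2)))  # group-style pages: at most 16 ** 2 of them
```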