Archive for the ‘Search Engine Optimization (SEO)’ Category

How Can Lower Page Rank Web Pages Achieve Better Search Engine Positions?

June 1st, 2009 No comments

I have been using dawjee.com to analyse search terms for a while now – toying with keyword suggestions and looking for interesting patterns or anomalies in the result data (e.g. pages with query string data often report an erroneous Google Page Rank of zero).

After discussing some findings with a colleague, I realised a lot of what I’m learning will also be of use and/or interest to other webmasters. I shall therefore catalogue all future findings in a series of articles / blog posts.

This first article shall observe the current Google results for the term “dandruff”.

At the time of writing, the top 3 results are:

Position 1

Title : Dandruff – Wikipedia, the free encyclopedia

URL : http : // en . wikipedia.org/wiki/Dandruff

PR 6

Position 2

Title: Dandruff – MayoClinic.com

URL: http : // www . mayoclinic.com/health/dandruff/DS00456

PR 4

Position 3

Title: Dandruff

URL: http : // www . coolnurse.com/dandruff.htm

PR 5

(PR = Google Page Rank)

What’s interesting is that P2 (Position 2) has a lower PR (Google Page Rank) than P3 (Position 3). Both web pages contain “Dandruff” in their TITLE tags and URLs, so a cursory glance would suggest that P3 should have a better (i.e. lower) rank than P2.

Both pages are on topic and contain genuine information about dandruff. P2 is only a small page and is part of a full article (i.e. many small interlinked dandruff pages), while P3 is a full article on a single page.

The keyword densities are 4.86% for P2 and 6.39% for P3. Neither page makes obvious use of contrived keyword stuffing.
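For readers who want to reproduce figures like these, here is a minimal sketch of one common way to compute keyword density: occurrences of the keyword divided by the total word count, as a percentage. This definition, the function name keyword_density and the sample text are my own assumptions; the tool used above may count words differently.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Return the share of words in `text` equal to `keyword`, as a percentage."""
    # Lower-case the text and split it into simple word tokens.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return 100.0 * hits / len(words)


# Hypothetical sample text, not taken from either page.
sample = "Dandruff is common. Mild dandruff can be treated with a gentle shampoo."
print(f"{keyword_density(sample, 'dandruff'):.2f}%")
```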

However, if we drill down into the data for each result (use the small magnifying glass next to each result, or follow the link in the Resources list below), we’ll find some significant differences in their hosts and backlinks.

P2 has 6 Google backlinks, with 4,990 to its host, while P3 has 3 and 270 respectively.

P2 also dwarfs P3 in terms of the number of pages Google has indexed (Pages in Host), and has a higher host Page Rank of 7 (vs 5 for P3).

To summarise, P2 may have a lower page rank, but it has a few more links and its home page is overwhelmingly more popular than P3’s. It is also part of a set of pages about dandruff. Instead of having the full article on a single page, it has been split among many. P2 therefore has several interlinked pages about dandruff.

It therefore appears that the pages hosted on popular (in terms of search engines) web sites can rank better than other pages, even if they have lower Google Page ranks. Splitting up articles into many small pages helps as well.

How to use Robot.txt

June 1st, 2009 No comments

Search engines use programs called robots (also known as spiders) that automatically visit web pages on the Internet and collect information from them.

You can create a plain text file named robots.txt on your site and declare in it the parts of the site that you do not want robots to visit. That way, some or all of the site’s content can be kept out of search engines, or a designated search engine can be limited to the content you specify. The robots.txt file must be placed in the root directory of the site.

When a search robot (some call it a search spider) visits a site, it first checks whether robots.txt exists in the site’s root directory. If it does, the robot determines the scope of its visit from the contents of the file; if the file does not exist, the robot simply crawls along the links.
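Because the file always lives in the root directory, its address can be derived from any page URL on the same host. A minimal sketch follows; the helper name robots_url and the example.com address are purely illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Build the robots.txt address for the host that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/help/index.html"))
```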

robots.txt file format:

The “robots.txt” file contains one or more records separated by blank lines (lines are terminated by CR, CR/NL, or NL). Each line of a record has the following format:

“<field>: <optionalspace> <value> <optionalspace>”.

In the file, you can use # for comments, following the same convention as in UNIX. A record usually starts with one or more User-agent lines, followed by a number of Disallow lines, as follows:

User-agent:

The value of this field is the name of the search engine robot. If the “robots.txt” file contains several User-agent records, more than one robot is subject to the restrictions in the file. The file must contain at least one User-agent record. If the value is set to *, the record is valid for any robot; the “robots.txt” file can contain only one “User-agent: *” record.

Disallow:

The value of this field describes a URL that should not be visited. It can be a complete path or only the first part of one; any URL that begins with the Disallow value will not be visited by the robot. For example, “Disallow: /help” blocks search engines from both /help.html and /help/index.html, whereas “Disallow: /help/” allows robots to visit /help.html but not /help/index.html. An empty Disallow record means that all parts of the site may be visited. The “/robots.txt” file must contain at least one Disallow record. If “/robots.txt” is an empty file, the site is open to all search engine robots.
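The difference between “/help” and “/help/” is easy to check with Python’s standard urllib.robotparser module. This is only a sketch: the robot name AnyBot and the helper build_parser are made up for the illustration, and real crawlers may apply additional rules of their own.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


without_slash = build_parser("User-agent: *\nDisallow: /help\n")
with_slash = build_parser("User-agent: *\nDisallow: /help/\n")

# Print, for each path, whether each variant lets the robot fetch it.
for path in ("/help.html", "/help/index.html"):
    print(
        path,
        without_slash.can_fetch("AnyBot", path),
        with_slash.can_fetch("AnyBot", path),
    )
```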

Examples of robots.txt usage:

Example 1. Forbid all search engines from visiting any part of the site:

User-agent: *
Disallow: /

Example 2. Allow all robots to visit everything (an empty “/robots.txt” file has the same effect):

User-agent: *
Disallow:

Example 3. Forbid access by one particular search engine:

User-agent: BadBot
Disallow: /

Example 4. Allow access by one particular search engine only:

User-agent: baiduspider
Disallow:

User-agent: *
Disallow: /

Example 5. A simple example. In this case the site has three directories that search engines should not visit. Note that each directory must be declared on a line of its own, not written as “Disallow: /cgi-bin/ /tmp/”. Also, the * after User-agent has a special meaning (“any robot”), so lines such as “Disallow: /tmp/*” or “Disallow: *.gif” cannot be used in the file.

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
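To see how a robot picks the record that applies to it, the sketch below feeds Example 4 to Python’s standard urllib.robotparser and asks it about two robot names. The page /index.html is an arbitrary path chosen for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Example 4: only baiduspider may visit the site.
ROBOTS_TXT = """\
User-agent: baiduspider
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# baiduspider matches its own record; every other robot falls back to "*".
print(parser.can_fetch("baiduspider", "/index.html"))
print(parser.can_fetch("BadBot", "/index.html"))
```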

Robot special parameters:

1. Google

Allow Googlebot:

If you want to block all robots other than Googlebot from accessing your pages, you can use the following syntax:

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

Googlebot follows the lines directed at itself, rather than the lines directed at all robots.

“Allow” extension:

Googlebot recognizes the “Allow” extension of the robots.txt standard. Other search engine robots may not recognize this extension, so check with the other search engines you are interested in. An “Allow” line works in the same way as a “Disallow” line: simply list the directories or pages you want to allow.

You can also use “Disallow” and “Allow” together. For example, to block all pages in a subdirectory except one, you can use the following entries:

User-Agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html

These entries block all pages in the folder1 directory except myfile.html.

If you want to block Googlebot but allow another Google robot (such as Googlebot-Mobile), you can use an “Allow” rule to give that robot access. For example:

User-agent: Googlebot
Disallow: /

User-agent: Googlebot-Mobile
Allow:

Using * to match a sequence of characters:

You can use an asterisk (*) to match a sequence of characters. For example, to block access to all subdirectories that begin with “private”, use the following entries:

User-Agent: Googlebot
Disallow: /private*/

To block access to all URLs that contain a question mark (?), you can use the following entries:

User-agent: *
Disallow: /*?*

Using the $ character to match the end of the URL:

You can use the $ character to specify matching at the end of the URL. For example, to block URLs that end with .asp, you can use the following entries:

User-Agent: Googlebot
Disallow: /*.asp$

You can use this pattern matching in combination with the Allow directive. For example, if a ? indicates a session ID, you may want to exclude all URLs that contain one, so that Googlebot does not crawl duplicate pages. However, URLs that end with a ? may be the version of the page that you do want included. In this case, the robots.txt file can be set up as follows:

User-agent: *
Allow: /*?$
Disallow: /*?

The “Disallow: /*?” line blocks any URL that contains a ? (specifically, it blocks any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).

The “Allow: /*?$” line allows any URL that ends with a ? (specifically, it allows any URL that begins with your domain name, followed by any string, followed by a question mark, with no characters after the question mark).
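The wildcard rules above can be sketched in a few lines of Python. This is a hand-written illustration, not Googlebot’s actual implementation: the function names pattern_to_regex and is_allowed are my own, and the tie-breaking used here (the longest matching pattern wins, and Allow wins a tie) is an assumption based on how Google describes rule precedence.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with * and a trailing $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Decide whether `path` may be crawled under (directive, pattern) rules."""
    best = None  # (pattern length, is_allow) of the most specific match so far
    for directive, pattern in rules:
        if pattern and pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    # A path that matches no rule is allowed.
    return True if best is None else best[1]


SESSION_RULES = [("Allow", "/*?$"), ("Disallow", "/*?")]
FOLDER_RULES = [("Disallow", "/folder1/"), ("Allow", "/folder1/myfile.html")]

for path in ("/page?", "/page?sid=1", "/page"):
    print(path, is_allowed(path, SESSION_RULES))
for path in ("/folder1/myfile.html", "/folder1/other.html"):
    print(path, is_allowed(path, FOLDER_RULES))
```

With these rules, a URL that merely ends in a question mark stays crawlable while any URL carrying a session ID after the question mark is blocked, which is the behaviour described above.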

Sitemap:

A newer way of supporting sitemaps is to include a direct link to the sitemap file in the robots.txt file.

Like this:

Sitemap: http://www.eastsem.com/sitemap.xml

The search engine companies that currently support this are Google, Yahoo, Ask and MSN.

However, I would still suggest submitting the sitemap to Google Sitemaps as well, which has many features for analysing the state of your links.
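A robot that supports this line simply reads the address out of robots.txt. As a small sketch, Python’s standard urllib.robotparser (version 3.8 or later) exposes the listed sitemaps through site_maps(); the Disallow rule in the sample file is only there to show that the Sitemap line stands apart from the User-agent records.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/

Sitemap: http://www.eastsem.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when no Sitemap line is present.
sitemaps = parser.site_maps() or []
print(sitemaps)
```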

Robots.txt benefits:

1. Almost all search engine spiders follow the crawl rules in robots.txt. By convention, a spider’s entry point to a website is the site’s robots.txt file, provided, of course, that the file exists. If a site has no robots.txt, the spider is sent to a 404 error page. Related studies suggest that if the site uses a custom 404 error page, the spider may treat that page as the robots.txt file, even though it is not a plain text file. This can cause serious problems when the spider indexes the site and can affect how the site’s pages are included in search engines.

2. Robots.txt can stop unwanted robots from taking up valuable server bandwidth. Email retrievers, for example, are meaningless for the majority of sites. Image strippers are another example: for most non-graphics websites they have little value, yet they consume a considerable amount of bandwidth.

3. Robots.txt can stop search engines from crawling and indexing non-public pages, such as the site’s back-end programs and administration pages. Sites also tend to have temporary pages while they are in operation; if robots.txt is not configured, search engines will index even those temporary files.

4. For content-rich sites with many pages, configuring robots.txt is even more important, because search engine spiders often put tremendous pressure on a website: spider visits can arrive like a flood and, if left unchecked, can even affect normal visits to the site.

5. Similarly, if a site contains duplicate content, using robots.txt to keep some of the pages from being indexed can help the site avoid search engine penalties for duplicate content, so that the site’s ranking is not affected.

The risks associated with robots.txt and their solutions:

1. As with everything, robots.txt also brings a certain degree of risk: it points out the site’s directory structure and the location of private data to attackers. Provided the web server’s security is configured properly this is not a serious problem, but it does make an attack easier for those with ill intent.

For example, if the site’s private data is reached through www.yourdomain.com/private/index.html, the settings in robots.txt might be as follows:

User-agent: *
Disallow: /private/

In this way, an attacker only has to look at robots.txt to learn where the content you want to hide is located, and can then enter www.yourdomain.com/private/ in the browser to visit the content you did not want made public. The usual approaches to this situation are as follows:

Set access permissions on the content under /private/ and password-protect it, so that attackers cannot get in.

Another approach is to rename the directory’s default document, index.html, to something else, for example abc-protect.html, so that the address of the content becomes www.yourdomain.com/private/abc-protect.html. At the same time, create a new index.html file with content along the lines of “you do not have permission to access this page”. An attacker who does not know the actual file name then cannot reach the private content.

2. If the settings are wrong, search engines will delete all of the site’s indexed data.

User-agent: *
Disallow: /

The code above bans all search engines from indexing the site.

Currently, the vast majority of search engine robots comply with the rules of robots.txt. The Robots META tag is not yet as widely supported, but support is gradually increasing. The well-known search engine Google fully supports it, and Google has also added an “archive” directive that controls whether Google keeps a snapshot of the page. For example:

<META NAME="googlebot" CONTENT="index,follow,noarchive">

This means: crawl the pages of the site and follow the links on them, but do not keep a Google snapshot of the pages.
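To check which directives a page actually sends, the tag can be read back with Python’s standard html.parser module. This is a minimal sketch; the class name RobotsMetaParser is my own, and it only looks at the META tag whose NAME matches the robot you ask about.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of a robots-style META tag for one robot name."""

    def __init__(self, robot_name="robots"):
        super().__init__()
        self.robot_name = robot_name.lower()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == self.robot_name:
            content = attributes.get("content", "")
            self.directives += [
                item.strip().lower() for item in content.split(",") if item.strip()
            ]


parser = RobotsMetaParser("googlebot")
parser.feed('<META NAME="googlebot" CONTENT="index,follow,noarchive">')
print(parser.directives)
```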

Source
How to use Robot.txt

How to Increase Page Rank: Another Google Myth Revealed

May 25th, 2009 No comments

In the SEO world there is an addiction: the green blocks that appear on the Google toolbar, known as Page Rank or PR. Due to the recent update of this counting system, which is powered by Google, it’s time to review the real understanding of PR. Many webmasters focus on this misinformation, which doesn’t really help them get a top ranking.

Let’s look at the base of information about Google PR. Much has been written and discussed. Google has used PR as a form of ranking and, for a long time, it was seen as a major factor in getting results. If you search for “increase page rank” you will find many articles, features and opinions.

Enter 2007 and beyond: Google Page Rank has given rise to services that sell links based on PR. Some content sites with good PR are selling focused keyword text links to those who want links, as a way of getting a higher ranking. Google has come out saying this is essentially an attempt to buy a ranking and manipulate the search results.

Submission to search directories is also seen as a valid way of gaining backlinks. Pricing can be based on the PR of the home page. Here is a statement you need to remember:

“Google ranks pages, not web sites”

Remember when doing submissions to directory sites and, more importantly, paid listings: it’s the Page Rank of the PAGE that the link is on that is important. Keep in mind that if the internal page has a good ranking, it should send traffic to your site too.

It is possible for internal pages to gain a higher Page Rank than the home page. However, keep in mind that if a site has Page Rank (of some amount) then it is seen as a trusted source of information, both by visitors (if they are webmasters) and by Google.

So is there a way to increase PR and your ranking?

Firstly, don’t focus on PR; it can be fickle and misleading. It is subject to Google changing the rules, which they have done several times, at least once a year. If you focus on creating reasons for the Google search engine spider to come back, that by itself will help with PR and ranking.

The simplest thing you can do is change and update your pages. Keeping your pages fresh will be seen as new content and bring back the Google spider. The more this occurs, the more likely your Page Rank will increase.

This is why blogs (in general) do well in natural search results: they provide regularly changing information.

This may seem blunt, but Page Rank is not the key to rankings. You are best focused on building a site that is 100% search engine friendly and on getting links; then you can’t go wrong.


Source
How to Increase Page Rank: Another Google Myth Revealed

Does your domain name prevent your website from getting high rankings?

June 3rd, 2008 No comments

The top level domain of your website can have an influence on your website rankings. Last week, many websites with a special top level domain were delisted from Google’s search results.

No more visitors from Google. What has happened?

Last week, many webmasters observed that all of their websites with an .info domain name disappeared from Google’s search results.

Some websites reported traffic drops from several hundred visitors per day to zero visitors per day. It seemed that all websites that used the .info top level domain had been removed from Google’s index.

A few days later, the websites with the .info domains reappeared in Google’s search results.

Why did this happen?

It looks as if Google updated its filters for special domain names and went a little too far. Earlier this year, the head of Google’s anti-spam team made the following statement:

“A top-level domain (TLD registry) will offer domains for under $4. The result will be another TLD blighted by spammy domain registrations.”

Domain names with a .info ending have been available for 99 cents for some time. It’s likely that a great many .info domain names have been purchased for spamming purposes.

Google might have intended to block .info domains that spam and a bug in the algorithm wiped all .info domains from Google’s results. Fortunately, Google’s engineers fixed the bug within days.

What does this mean for your website?

Filtering all .info domains just because many of them are used for spamming is a very drastic measure. Although Google doesn’t do this, it’s clear that there is some kind of filter for these domains.

If you want to succeed with your online business, it might be better to use a .com domain or the local top level domain of your country instead of a .info domain.

How to get Google Sitelinks for your website

March 16th, 2008 No comments

Many webmasters wonder how they can make Google display additional Sitelinks for their websites. What exactly are Sitelinks, how can you get them and are they worth the effort?

What are Google Sitelinks?

Google Sitelinks are a collection of links that appears below the result of a website. These additional links link to main pages of the website. They are randomly and automatically chosen by Google’s algorithm.

As an example, here are the sitelinks that you get for HP.com when you search for “HP“:

[Image: sitelinks.gif]

Sitelinks only appear for general search terms. You’ll get Sitelinks if you search for “HP” but you won’t get Sitelinks if you search for a term like “HP printer supplies”. Sitelinks show up most often for searches on brand names.

Which links does Google use for the Sitelinks?

Google seems to use the first level links on a website for the Sitelinks. That means that all links that are not present on the homepage of your site won’t be used as Sitelinks.

The links should be descriptive text links or image links with a descriptive IMG ALT attribute. JavaScript or Flash links are not considered for Sitelinks. Google uses 2 to 8 links for the Sitelinks of a website. Unfortunately, it’s unclear how Google assigns the number of links to each website.

The text that is used for the Sitelinks can be the text that is used for the link (anchor text) on the homepage or the title of the linked page. It seems that Google prefers links that appear at the top of a web page.

How can you get Sitelinks for your website?

Unfortunately, there is nothing certain about Google’s Sitelinks. The following factors seem to influence whether Google displays Sitelinks or not:

1.       Your website must have a stable #1 ranking for the searched keyword. Other websites don’t seem to get Sitelinks.

2.       Your website must be at least 2 years old. It seems that younger websites don’t get Sitelinks.

3.       The number of searches and the number of clicks that your website gets for a certain keyword seem to be considered. Keywords that aren’t searched often enough don’t get Sitelinks. It also seems that your website has to get many clicks for the searched keyword.

4.       The number of links that point to your website with the searched keyword as the anchor text seems to influence the creation of Sitelinks. Sitelinks only seem to appear for the main keywords of a website, not for all keywords for which a website is listed.

If your website meets these criteria Google might assign Sitelinks to your website for your most important keywords.

Sitelinks can be a nice addition for searches for general keywords but they usually won’t appear for searches that consist of two to four words. These words are the most important keywords for website promotion and search engine optimization.

Do Banner Ads Count as Backlinks?

March 3rd, 2008 No comments

Banner ads are bright, flashy, and in-your-face. They demand that people sit up and take notice of your website or product. But are they really the best source of advertising?

With other ads (text ads, for example, or static image ads) there are not one, but two benefits. First, of course, people who actually click through the ads will end up at your site and may purchase your product. That is well and good, of course, and highly desirable. But it's not all they offer.

Even if surfers don't click through to your site, text ads and static image ads still offer a very real benefit: they count as backlinks with most search engines. And backlinks are a key factor in getting your website higher on the Search Engine Results Page (SERP). Even if no one ever clicks on your ads, they're still serving a vital purpose.

Banner ads don't offer that. They irritate more people than they entice, for starters, and while a conversion ratio of one click per thousand views might be fine for a text ad, where you had a secondary benefit, it's pretty pathetic for a banner ad. At that conversion ratio, you would almost have to limit banner ads to places with hundreds of thousands of visitors per day, if you wanted them to have any worthwhile effect.

And search engines don't count banner ads as backlinks. This is because most banner ads (the ones that move or change) have no text for the search engine to read. Spiders can't crawl banner ads. And so you deprive yourself of what might have been a valuable resource: the ability to make one ad count for two purposes.

New technologies detect black-hat SEO methods

February 2nd, 2008 No comments

Search engine optimization methods are divided in two categories: black hat SEO and white hat SEO. Both methods can help you to get high rankings on search engines.

However, one method is likely to get your website banned on search engines and recent developments indicate that websites that use that method will be in trouble soon.

What is white hat SEO?

White hat SEO means that the webmaster doesn’t try to trick search engines. White hat SEO means playing by the rules. Web pages that are created with white-hat SEO methods are beneficial to web surfers, search engines and webmasters.

What is black hat SEO?

Black hat SEO attempts to improve rankings in ways that are disapproved of by the search engines, or involve deception. These methods include cloaking, doorway pages, hidden text, etc.

Google and other search engines have made it clear that they penalize websites that use black hat SEO methods when they detect them.

Black-hat SEO methods seem to work. So why not use them?

Some black-hat SEO methods can lead to good results. There are quite a few webmasters who obtained high rankings for their web pages although they optimized them with methods that were not approved by Google and the other search engines.

You have probably also seen some web pages in the search results that looked strange or hardly related to what you’ve actually searched. So do black-hat SEO methods seem to work? Should you use them?

Nearly all black-hat SEO methods have been detected by search engines sooner or later. Javascript redirects or doorway pages used to work in the past but nowadays, these methods are usually the ticket to the land of banned websites.

While some cloaking methods continue to work at this time (if your competitors don’t peach on you), it’s likely that Google can detect them soon. The same is true for paid links. Some paid links can still not be detected by Google but it’s only a matter of time until Google has the algorithms that can.

You might get in trouble even if you used black-hat methods years ago

The problem is that things that cannot be detected by Google now might be detected by Google tomorrow. And Google might also be able to find out what you did in the past.

A good example for a spam filter that also considers things that have been done in the past is the WikiScanner. WikiScanner can find manipulations that have been made in the past and it can also associate anonymous changes to the people and companies who made these “anonymous” changes.

Combine such a spam scanner with a web page archive like Archive.org and you have an easy way to track the spam history of a web page.

Things that you have done in the past might backfire on you.

Don’t use black-hat SEO methods. As technical possibilities evolve, it’s very likely that these methods will be detected even if you don’t use them anymore. It’s better to use tools that focus on white-hat SEO methods.