Establishing a Sitemap

June 3rd, 2009 No comments

The aim of establishing a Web site is to increase visibility and user traffic.  Search engine optimization is one way to increase website traffic.  Another is to use a Sitemap, which lets you tell a search engine which pages to include or index.  The Sitemap concept was first developed by Google; Yahoo and MSN have also recently agreed to adopt the standard.  This week we look at the Sitemap standard.

The need for a standard

Search engines use spiders to crawl the Internet and add the web pages they find to an index database.  This process needs a lot of resources, and sometimes a page you want indexed is overlooked and never included.  Google's Googlebot spider, which searches the web for new and changed pages, then records and classifies them, is a typical example.

A Sitemap gives a site a new way to state clearly which pages should be indexed, as well as which pages have been added.  Basically, it provides a communication channel between the search engine and the Web site.  In theory, its aim is to ease the burden on search engine spiders by reducing the resources they need, but a Sitemap cannot replace the current crawling process.

About Sitemaps

A Sitemap is an XML document that contains a list of a web site's URLs and their associated attributes; it describes in detail what should be indexed for a particular site.  A Sitemap must be UTF-8 encoded.  The following elements are required in a Sitemap XML file:

  • <urlset> – the opening and closing tag of the Sitemap file; the opening tag must include the namespace (xmlns) attribute.
  • <url> – the file contains one of these elements for every page.
  • <loc> – specifies the actual address of the page.  It is a child element of <url>.

The following elements are optional:

  • <lastmod> – a child element of <url>.  It specifies when the page was last updated.
  • <changefreq> – a child element of <url>.  It specifies how frequently the page is updated (always, hourly, daily, weekly, monthly, yearly, or never).
  • <priority> – a child element of <url>.  It specifies the importance of the page relative to the site's other pages; valid values are 0.0 to 1.0, and the default value is 0.5.

The following sample Sitemap shows how these elements are applied to a single page.  It lists the site's home page and specifies its last update time, update frequency and priority.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.test.com/</loc>
    <lastmod>2006-11-20</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.3</priority>
  </url>
</urlset>
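A file like the sample can also be produced by a script rather than by hand.  The following is only a minimal sketch using Python's standard xml.etree.ElementTree module; the function name build_sitemap and the dictionary layout of each page are assumptions made for illustration, not part of the Sitemap standard.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build Sitemap XML text from a list of dicts (hypothetical layout).

    Each dict must have "loc" and may have "lastmod", "changefreq", "priority".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # Write the required <loc> first, then any optional elements present.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


xml_text = build_sitemap([
    {"loc": "http://www.test.com/", "lastmod": "2006-11-20",
     "changefreq": "daily", "priority": 0.3},
])
print(xml_text)
```

The returned text should be written to disk as UTF-8, since the standard requires that encoding.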

You decide where the Sitemap file is located, but its location determines the set of URLs it may contain.  For example, if the sample Sitemap above is located at http://www.test.com/sitemap.xml, then it may contain any URL that begins with http://www.test.com/.  We therefore recommend storing the Sitemap file in the site's root directory.  A Sitemap file should not exceed 10MB; a large file can be compressed with gzip.
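The location and size rules can be checked with a few lines of code.  This is a rough sketch only: the helper names are hypothetical, and it assumes the 10MB limit is measured on the UTF-8 encoded file.

```python
import gzip

MAX_SITEMAP_BYTES = 10 * 1024 * 1024  # the 10MB limit mentioned in the text


def in_scope(sitemap_url, page_url):
    """Return True if page_url lies under the directory holding the Sitemap."""
    base = sitemap_url.rsplit("/", 1)[0] + "/"
    return page_url.startswith(base)


def within_size_limit(xml_text):
    """Assumption: the limit applies to the UTF-8 encoded bytes."""
    return len(xml_text.encode("utf-8")) <= MAX_SITEMAP_BYTES


def gzip_sitemap(xml_text):
    """Return the gzip-compressed bytes of the Sitemap text."""
    return gzip.compress(xml_text.encode("utf-8"))


print(in_scope("http://www.test.com/sitemap.xml", "http://www.test.com/about.html"))
print(in_scope("http://www.test.com/blog/sitemap.xml", "http://www.test.com/about.html"))
```

The second call shows why the root directory is the recommended location: a Sitemap stored under /blog/ could not list pages outside that directory.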

Creating a Sitemap

Since Sitemaps are XML-based, you can easily create and edit them with any text editor, but specialized tools are also available.  The following list gives some of the current tools:

  • Map node: a tool for generating and validating Sitemap XML documents.
  • Gsitemap: a Sitemap generation tool built on the .NET Framework.
  • GSiteCrawler: a Windows tool for generating Sitemap files.
  • phpSitemapNG: a free Sitemap generator written in PHP.
  • Google Sitemap Generator: a Python script that can be used to generate Sitemap files.

Notifying search engines

Once a Sitemap file has been created, it should be submitted to the search engines.  Each search engine has its own interface for submitting a Sitemap.  Google’s Webmaster Tools includes a page for submitting Sitemaps; you must register an account before you can use it.  Yahoo also has a free Sitemap submission page, and it likewise requires a registered account.  Other search engines will follow the example of Google, Yahoo and MSN and provide similar functions.

Other tools

Crawling pages for a search engine index is slow and needs a lot of resources.  Sitemaps give a site a way to specify what search engines should include.  They are simple XML text files that can be written by hand, and there are also many tools that can help you create one.  For now, they only supplement the existing crawling process.


Categories: General Search Engine Optimize Tags:

Advanced Link Text Strategies

June 2nd, 2009 No comments

There are some advanced things you can do when considering your link text that will allow you to gain even more benefit from it. Let’s look at some ideas and suggestions, and think about some higher-level strategies revolving around link text.

Place Plain Text Around Links

Placing some plain text next to your links is better than just having bare links all over the place. Many search engines use the text surrounding a link to help figure out what a link is about. I would not go so far as to place plain text in your navigation menu, and it may not look good down near the bottom of your page near the copyright notice, but where possible it’s always a good idea to add a line of descriptive text next to your link.

Varying Your Destination URL?

There has been a lot of debate in the last few years about whether, when you are out there submitting your links, asking for links, and linking your own pages up, you should vary the destination URL. What I mean is, ‘domain.com’ and ‘www.domain.com’ usually point to the exact same place. Also, depending on your website’s setup, something like ‘domain.com/index.php’ or ‘domain.com/index.htm’ points to the same place too.

So should you be changing up the destination URL constantly? No. Should you change it up a bit here and there? Yes. You should find the scheme that you prefer most, and stick with that 80-90% of the time. So if you decide that you like the ‘domain.com’ version better than the ‘www.domain.com’ version, then stick with that most of the time. Sticking with the same scheme will give you better search engine results in general; however, you should vary it by using the different flavors of your URL at least 10% or so of the time.

How Many Links Per Page?

This is certainly a good question, and something that really deserves its own page. However, with the present state of the web, I would personally not put more than 30 links on one page at any given time. Now there may be some exceptions to this, for instance your own sitemap page, or a page that has 50 links that all point to other areas of your site. That’s not a big deal in the eyes of the search engines. What I would avoid, however, is placing large numbers of links to external sites on one big page. You don’t want your page to be viewed as some sort of link farm, right?

What If My Link Has To Be My Name?

Blog comments are a common place where links are expected to be a name. Many blog systems allow you to attach a link to the name you use to post comments. This free link of sorts tends to be highly abused, however, and blog owners do not look too kindly upon people who jam a bunch of keywords in as their ‘name’ when posting comments. This smells like spam, and they will delete and/or blacklist you in a second. However, you can often give yourself a ‘title’ type of name. As long as you are backing it up with a long, useful, relevant comment that blows away all the other comments on the page, most blog owners would be happy to put up a comment like that.

Examples of good names to use would be ‘Sewing Sally’ instead of just ‘Sally’. Yes, you are still calling yourself Sally, but at least you now have the keyword ‘sewing’ added to the link as well. Other examples that you can try would be things like ‘Cooking Fan’, ‘Joe the Plumber’, or ‘Marketing Guru’.

Have you found this article useful? If so, visit http://www.EasyNetSuccess.com for many more useful marketing strategies.
Categories: Link Popularity Tags:

How Can Lower Page Rank Web Pages Achieve Better Search Engine Positions?

June 1st, 2009 No comments

I have been using dawjee.com to analyse search terms for a while now – toying with keyword suggestions and looking for interesting patterns or anomalies in the result data (e.g. pages with query string data often report an erroneous Google Page Rank of zero).

After discussing some findings with a colleague, I realised a lot of what I’m learning will also be of use and/or interest to other webmasters. I shall therefore catalogue all future findings in a series of articles / blog posts.

This first article shall observe the current Google results for the term “dandruff”.

At the time of writing, the top 3 results are:

Position 1

Title: Dandruff – Wikipedia, the free encyclopedia

URL: http://en.wikipedia.org/wiki/Dandruff

PR 6

Position 2

Title: Dandruff – MayoClinic.com

URL: http://www.mayoclinic.com/health/dandruff/DS00456

PR 4

Position 3

Title: Dandruff

URL: http://www.coolnurse.com/dandruff.htm

PR 5

(PR = Google Page Rank)

What’s interesting is that P2 (Position 2) has a lower PR (Google Page Rank) than P3 (Position 3). Both web pages contain Dandruff in their TITLE tags and URLs. So a cursory glance would suggest that P3 should have a better (i.e. lower) rank than P2.

Looking at both pages, both are on topic and contain genuine information about dandruff. P2 is only a small page, and is part of a full article (i.e. many small interlinked dandruff pages), while P3 is a full article on a single page.

The keyword densities are 4.86% for P2 and 6.39% for P3. Neither page makes obvious use of contrived keyword stuffing.
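For readers who want to reproduce this kind of figure, keyword density is commonly taken as the share of a page's words that are the keyword.  The sketch below assumes that definition and a simple word-splitting rule; the exact formula behind the percentages quoted above is not stated, so results may differ, and the sample sentence is invented for illustration.

```python
import re


def keyword_density(text, keyword):
    """Return the percentage of words in text that equal keyword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return 100.0 * hits / len(words)


# Invented sample text: 10 words, 3 of them the keyword.
sample = "Dandruff is common. Dandruff shampoo can help reduce dandruff flakes."
print(round(keyword_density(sample, "dandruff"), 2))
```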

However, if we drill down into the data for each result (use the small magnifying glass next to each result, or follow the link in the Resources list below), we’ll find some significant differences in their hosts and backlinks.

P2 has 6 Google backlinks, with 4,990 to its host, while P3 has 3 and 270 respectively.

P2 also dwarfs P3 in terms of the number of pages Google has indexed (Pages in Host), and also has a higher host page rank of 7 (vs 5 for P3).

To summarise, P2 may have a lower page rank, but it has a few more links and its home page is overwhelmingly more popular than P3’s. It is also part of a set of pages about dandruff. Instead of having the full article on a single page, it has been split among many. P2 therefore has several interlinked pages about dandruff.

It therefore appears that the pages hosted on popular (in terms of search engines) web sites can rank better than other pages, even if they have lower Google Page ranks. Splitting up articles into many small pages helps as well.

Increase your Link Popularity and Search Engine Ranking

June 1st, 2009 No comments

Have you ever tried to exchange links with other websites? The process usually requires finding a website that has a link directory and then contacting the webmaster by email to request a link exchange. If you are lucky, they may respond to your email within a week or two depending on how busy they are. If they do reply with their link information, you will then need to manually add them to your link directory page. Then there is the process of checking for dead links on a weekly basis to make sure you are not giving away free links. ExpressLinkExchange does all of this for you.

Their programmers developed ExpressLinkExchange to be a fully automated system that allows you to get new links to your website while you are working, playing golf, out to lunch, even while you are sleeping! Once you subscribe to their service, within 15 minutes you can have up to 7,841 of their members linking to your website. Setup is made easy by using their Easy Setup Wizard.

All you have to do is enter your website address, website title, a small description of your website, and choose a directory category for your website to display in. Then you have the option of using their code to create a links page on your website, or allowing them to host a links page for you on their free web hosting service. Finally, you add a link to your new links page to your home page, and you are done!

If you prefer to exchange links exclusively with websites that are relevant to your website theme, you can use their Exclusion Filters™ to deny exchanges with websites and even entire categories that you do not wish to exchange links with. Some members use the filters to gradually increase their incoming links by releasing one category each week or so. You can also choose to either automatically accept link exchanges with all new members as they subscribe or make them wait until you approve each new member. All of this is done from within your member control panel.

To get started and get 1000’s of inbound links to your website instantly, visit http://www.ExpressLinkExchange.com now!

Shanon Sandquist is CEO of Homeworkers Publications, a company that has been in business for six years.

Categories: Link Popularity Tags:

How to use robots.txt

June 1st, 2009 No comments

Search engines use a program called a robot (also known as a spider) to visit web pages on the Internet automatically and gather page information.

You can create a plain text file named robots.txt on your site and declare in it the parts of the site that you do not want robots to visit.  That way, some or all of the site's content is left out of the search engines, or a designated search engine includes only the content you specify.  The robots.txt file should be placed in the site's root directory.

When a search robot (some call it a search spider) visits a site, it first checks whether robots.txt exists in the site's root directory.  If it exists, the robot determines the scope of its visit according to the contents of the file; if the file does not exist, the robot simply crawls along the links.
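The check a well-behaved robot performs can be sketched with Python's standard urllib.robotparser module.  The robots.txt content, the robot name ExampleBot and the example.com URLs are made up for illustration; a real crawler would download the file from the site's root directory instead of using a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "ExampleBot", "http://www.example.com/index.html"))
print(allowed(ROBOTS_TXT, "ExampleBot", "http://www.example.com/private/data.html"))
# An empty file places no restrictions on the robot.
print(allowed("", "ExampleBot", "http://www.example.com/private/data.html"))
```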

robots.txt file format:

The “robots.txt” file contains one or more records separated by blank lines (terminated by CR, CR/NL, or NL).  Each line of a record has the following format:

“<field>:<optionalspace><value><optionalspace>”.

In this file you can use # for comments, following the same convention as in UNIX.  A record usually starts with one or more User-agent lines, followed by a number of Disallow lines, as described below:

User-agent:

The value of this field is the name of a search engine robot.  If the “robots.txt” file has more than one User-agent record, more than one robot is restricted by the protocol.  The file must contain at least one User-agent record.  If the value is set to *, the record applies to any robot; the “robots.txt” file can contain only one “User-agent: *” record.

Disallow:

The value of this field describes a URL that should not be visited.  The URL can be a complete path or just a prefix; any URL that begins with the Disallow value will not be visited by the robot.  For example, “Disallow: /help” blocks search engines from both /help.html and /help/index.html, while “Disallow: /help/” allows robots to visit /help.html but not /help/index.html.  An empty Disallow record means that all parts of the site may be visited.  The “/robots.txt” file must contain at least one Disallow record.  If “/robots.txt” is an empty file, the site is open to all search engine robots.
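The prefix rule is easy to get wrong, so here is a small sketch of it.  The function name is_disallowed is hypothetical, and the code covers only the plain prefix matching described above, not any search-engine-specific extensions.

```python
def is_disallowed(path, disallow_values):
    """Return True if path begins with any non-empty Disallow value."""
    return any(bool(value) and path.startswith(value) for value in disallow_values)


print(is_disallowed("/help.html", ["/help"]))         # blocked by the prefix /help
print(is_disallowed("/help/index.html", ["/help"]))   # blocked by the prefix /help
print(is_disallowed("/help.html", ["/help/"]))        # not under the /help/ directory
print(is_disallowed("/help/index.html", ["/help/"]))  # under the /help/ directory
print(is_disallowed("/help.html", [""]))              # empty Disallow blocks nothing
```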

Examples of robots.txt usage:

Example 1. Ban all search engines from visiting any part of the site:

User-agent: *
Disallow: /

Example 2. Allow all robots to visit everything (alternatively, create an empty “/robots.txt” file):

User-agent: *
Disallow:

Example 3. Prohibit access by one particular search engine:

User-agent: BadBot
Disallow: /

Example 4. Permit access by one particular search engine only:

User-agent: baiduspider
Disallow:

User-agent: *
Disallow: /

Example 5. A simple example.  In this case the site has three directories that search engines are not allowed to access, so search engines will not visit them.  Note that each directory must be declared on its own line and not written as “Disallow: /cgi-bin/ /tmp/”.  Also, the * after “User-agent:” has a special meaning, “any robot”; it is not a general wildcard, so records such as “Disallow: /tmp/*” or “Disallow: *.gif” cannot appear in the file.

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
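Example files like these can be tried out before they are uploaded.  The sketch below feeds Examples 4 and 5 to Python's standard urllib.robotparser; the robot name OtherBot and the tested paths are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE_4 = """\
User-agent: baiduspider
Disallow:

User-agent: *
Disallow: /
"""

EXAMPLE_5 = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""


def load(robots_txt):
    """Parse robots.txt text and return the ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


example_4 = load(EXAMPLE_4)
print(example_4.can_fetch("baiduspider", "/index.html"))  # the permitted robot
print(example_4.can_fetch("OtherBot", "/index.html"))     # every other robot

example_5 = load(EXAMPLE_5)
print(example_5.can_fetch("OtherBot", "/tmp/cache.html"))
print(example_5.can_fetch("OtherBot", "/index.html"))
```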

Robot-specific parameters:

1. Google

Allow Googlebot:

If you want to block all robots other than Googlebot from accessing your pages, you can use the following syntax:

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

Googlebot follows the lines addressed to itself, rather than the lines addressed to all robots.

“Allow” extension:

Googlebot recognizes an extension to the robots.txt standard called “Allow”.  Other search engines' robots may not recognize this extension, so check with the other search engines you are interested in.  An “Allow” line works just like a “Disallow” line: simply list the directory or page you want to allow.

You can also use “Disallow” and “Allow” together.  For example, to block all the pages in a subdirectory except one, you can use the following entries:

User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html

These entries block all the pages in the folder1 directory except myfile.html.

If you want to block Googlebot but allow another of Google’s robots (such as Googlebot-Mobile), you can use an “Allow” rule to grant that robot access.  For example:

User-agent: Googlebot
Disallow: /

User-agent: Googlebot-Mobile
Allow:

Using * to match a sequence of characters:

You can use an asterisk (*) to match a sequence of characters.  For example, to block access to all subdirectories that begin with “private”, use the following entries:

User-agent: Googlebot
Disallow: /private*/

To block access to all URLs that contain a question mark (?), you can use the following entries:

User-agent: *
Disallow: /*?*

Using $ to match the end of a URL:

You can use the $ character to specify matching at the end of the URL.  For example, to block URLs that end with .asp, you can use the following entries:

User-agent: Googlebot
Disallow: /*.asp$

You can use this pattern matching in combination with the Allow directive.  For example, if a ? indicates a session ID, you may want to exclude all URLs that contain one, to ensure that Googlebot does not crawl duplicate pages.  However, URLs that end with a ? may be the version of the page that you do want included.  In this case, the robots.txt file can be set up as follows:

User-agent: *
Allow: /*?$
Disallow: /*?

The “Disallow: /*?” line blocks any URL that contains a ? (specifically, it blocks every URL that begins with your domain name, followed by any string, followed by a question mark (?), followed by any string).

The “Allow: /*?$” line allows any URL that ends with a ? (specifically, it allows every URL that begins with your domain name, followed by any string, followed by a question mark (?), with no characters after the question mark).
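To see how the * and $ patterns behave, they can be translated into regular expressions.  This is only an illustrative sketch, not Googlebot's actual implementation: the function names are hypothetical, and the conflict rule used here (the longest matching pattern wins, with Allow winning a tie) is an assumption stated for the example.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern that uses * and $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin with ".*" in place of every *.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Decide whether path may be crawled.

    rules is a list of (directive, pattern) pairs such as ("Disallow", "/*?").
    Assumption: the longest matching pattern wins and Allow wins ties.
    """
    best_length, allowed = -1, True
    for directive, pattern in rules:
        if pattern and pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            longer = len(pattern) > best_length
            tie = len(pattern) == best_length and is_allow
            if longer or tie:
                best_length, allowed = len(pattern), is_allow
    return allowed


RULES = [("Allow", "/*?$"), ("Disallow", "/*?")]
print(is_allowed("/page.html", RULES))          # no question mark in the URL
print(is_allowed("/page.html?", RULES))         # ends with a question mark
print(is_allowed("/page.html?sid=123", RULES))  # text follows the question mark
```

Note that the path passed in must include the query string, since the patterns are meant to match the question mark.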

Sitemap:

A newer way of supporting Sitemaps is to include a direct link to the Sitemap file in the robots.txt file.

Like this:

Sitemap: http://www.eastsem.com/sitemap.xml

The search engine companies that currently support this are Google, Yahoo, Ask and MSN.

However, I would still suggest submitting the Sitemap to Google Sitemaps as well, which offers many features for analyzing the state of your links.
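A robot that supports this directive only has to scan robots.txt for Sitemap lines.  A minimal sketch follows, with a hypothetical function name and a made-up Disallow record around the Sitemap line from the text.

```python
def sitemap_urls(robots_txt):
    """Return the URLs given on Sitemap lines of a robots.txt text."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        field, separator, value = line.partition(":")
        if separator and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


ROBOTS_WITH_SITEMAP = """\
User-agent: *
Disallow: /tmp/

Sitemap: http://www.eastsem.com/sitemap.xml
"""
print(sitemap_urls(ROBOTS_WITH_SITEMAP))
```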

Benefits of robots.txt:

1. Almost all search engine spiders follow the crawl rules in robots.txt, and by convention a spider enters a Web site through the site's robots.txt, provided, of course, that the file exists.  If a site has no robots.txt, the spider is redirected to the 404 error page.  Studies have shown that if the site uses a custom 404 error page, the spider will treat that page as the robots.txt, even though it is not a plain text file.  This causes big problems when the spider indexes the site and affects which of the site's pages the search engine includes.

2. robots.txt can stop unnecessary robots from taking up valuable server bandwidth.  Examples are email retrievers, which are meaningless for most sites, and image strippers, which have little value for most non-graphics sites but consume a considerable amount of bandwidth.

3. robots.txt can stop search engines from crawling and indexing non-public pages, such as the site's back-end programs and administration pages.  In fact, some sites generate temporary pages while they run, and if robots.txt is not configured, search engines will index even those temporary files.

4. For sites that are rich in content and have many pages, configuring robots.txt is even more important, because search engine spiders often put tremendous pressure on such a site: spider visits arrive like a flood and, if left unchecked, can even affect normal access to the site.

5. Similarly, if a site has duplicate content, using robots.txt to keep some of the pages from being indexed and included by search engines avoids search engine penalties for duplicate content and ensures that the site's ranking is not affected.

Risks associated with robots.txt and solutions:

1. Like everything, robots.txt has a downside: it also brings a certain degree of risk, because it points out the site's directory structure and the location of private data to attackers as well.  With a properly configured Web server this is not a serious problem, but it does make an attack easier for those with ill intent.

For example, if the site's private data is accessed through www.yourdomain.com/private/index.html, then the settings in robots.txt may be as follows:

User-agent: *
Disallow: /private/

In this way, an attacker only has to look at robots.txt to know where the content you want to hide is located, and can enter www.yourdomain.com/private/ in a browser to visit content we did not want made public.  The usual countermeasures for this situation are as follows:

Set access permissions on the /private/ content and password-protect it, so that attackers cannot get in.

Another approach is to rename the directory's default document, index.html, to something else, for example abc-protect.html, so that the address of the content becomes www.yourdomain.com/private/abc-protect.html.  At the same time, create a new index.html file whose content is along the lines of “you do not have permission to access this page”, so that an attacker, not knowing the actual file name, cannot access the private content.

2. If the settings are wrong, search engines will delete all of the site's indexed data.

User-agent: *
Disallow: /

The code above bans all search engines from indexing the site's data.

Currently, the vast majority of search engine robots comply with the rules of robots.txt.  The Robots META tag is not yet as widely supported, although support is gradually increasing.  The well-known search engine GOOGLE supports it fully, and GOOGLE has also added a directive, “archive”, that controls whether GOOGLE keeps a snapshot of the page.  For example:

<META NAME="googlebot" CONTENT="index,follow,noarchive">

This means: crawl the pages of the site and follow the links in them, but do not keep a GOOGLE snapshot of the pages.
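How a robot might read such a tag can be sketched with Python's standard html.parser module.  The class name RobotsMetaParser is hypothetical, and the sketch only collects the directives; what a robot does with them is up to that robot.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of robots/googlebot META tags in a page."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            self.directives[name] = [
                item.strip().lower() for item in content.split(",") if item.strip()
            ]


meta_parser = RobotsMetaParser()
meta_parser.feed('<META NAME="googlebot" CONTENT="index,follow,noarchive">')
print(meta_parser.directives)
```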

ShareThis

Source
How to use Robot.txt

How to use Robot.txt

June 1st, 2009 No comments

Procedures through a search engine robot (also known as spider), automatically visit web pages on the Internet and obtain information page.

You can on your site to create a plain text file robots.txt, statements in this document in the site visit do not want to be part robot, so that the site of some or all of the content on the search engines do not have to be included, or designated search engine only the contents of the specified record.  robots.txt file should be placed on the root site.

When a search robot (some call search spiders) to visit a site, it will first check the site under the root directory of the existence of robots.txt, if it exists, the search robot will be in accordance with the contents of the document to determine the visit scope; If the file does not exist, then the search robots to crawl along the link.

robots.txt file:

“Robots.txt” file that contains one or more of the records through the blank lines to separate (by CR, CR / NL, or NL as at the end), each record format is as follows:

“<field>: <optionalspace> <value> <optionalspace>”.

In that paper, you can use # for comments, the specific methods and the use of UNIX in the same practice.  The document is usually recorded in one or more lines of User-agent year, followed by a number of Disallow lines, as follows:

User-agent:

The value of the search engine robot is used to describe the name of the “robots.txt” file, if the number of User-agent records that have more than one robot will be subject to the restrictions of the agreement, the document, at least There is a User-agent record.  If the value is set to *, the agreements are valid for any robot in the “robots.txt” file, “User-agent: *” This can only have a record.

Disallow:

The values do not wish to be used to describe a visit to the URL, the URL can be a complete path, it could be a part of, any Disallow the URL at the beginning will not be access to the robot.  For example, “Disallow: / help” on / help.html and / help / index.html search engines are not allowed to visit, and the “Disallow: / help /” allows robot to visit / help.html, and not be able to access / help / index . html.  Disallow any record is empty, that all parts of the site are allowed to be visited, in the “/ robots.txt” file, at least record a Disallow.  If the “/ robots.txt” is an empty file, then for all the search engine robot, the site is open.

For example the use of robots.txt file:

Example 1. A ban on all search engines to visit any part of the site to download the robots.txt file User-agent: * Disallow: /

Example 2. To allow the robot to visit all (or also can be used to build an empty file “/ robots.txt” file) User-agent: * Disallow:

Example 3. To prohibit access to a search engine User-agent: BadBotDisallow: /

Example 4. Permit to visit a search engine User-agent: baiduspiderDisallow: User-agent: * Disallow: /

Example 5. A simple example in this case, there are three directory of the Web site search engine to limit access so that search engines will not visit the three directories.  It should be noted that a directory for each statement should be kept separate, and not written in “Disallow: / cgi-bin / / tmp /”.  User-agent: * after a special meaning, representing “any robot”, so the document can not be “Disallow: / tmp / *” or “Disallow: *. gif” it was recorded there.  User-agent: * Disallow: / cgi-bin/Disallow: / tmp / Disallow: / ~ joe /

Robot special parameters:

1. Google

Allow Googlebot:

If you want to block in addition to all the roaming outside Googlebot to access your pages, you can use the following syntax:

User-agent: Disallow: /

User-agent: Googlebot

Disallow:

Googlebot follows the line of its own point, rather than point to the line of all robots.

“Allow” extension:

Googlebot identifiable as “Allow” extension of the robots.txt standard.  Other search engine bots may not be able to identify this extension, so please use your interesting to find other search engines.  ”Allow” the role of line with the principle of “Disallow” line, like.  You want to allow only listed in the directory or page you.

You can also use “Disallow” and the “Allow”.  For example, to intercept a subdirectory in the page other than all the pages, you can use the following entries:

User-Agent: Googlebot

Disallow: / folder1 /

Allow: / folder1/myfile.html

These entries will be in addition to intercept folder1 directory of all the pages outside myfile.html.

If you want to block Google’s Googlebot and allow the other robots (such as Googlebot-Mobile), can use the “Allow” rules to allow access to the robots.  For example:

User-agent: Googlebot

Disallow: /

User-agent: Googlebot-Mobile

Allow:

Use * to match its character sequence:

You can use an asterisk (*) to match the character sequence.  For example, to block all private visit at the beginning of the subdirectory, use the following entries:

User-Agent: Googlebot

Disallow: / private * /

To block all contain a question mark (?) Visit the web site, you can use the following entries:

User-agent: *

Disallow: / *?  *

Using the $ character matches the end of the URL

You can use the $ character the end of the URL specified with the matching characters.  For example, to block to. Asp at the end of the URL, you can use the following entries:

User-Agent: Googlebot

Disallow: / *. asp $

You can match this model used in conjunction with the Allow directive.  For example, if?  That a session ID, you can exclude all of the URL contains the ID to ensure that Googlebot will not crawl duplicate pages.  However, in order to?  At the end of the URL may be that you want to include the version of the page.  In this case, the robots.txt file can be set as follows:

User-agent: *

Allow: / *?  $

Disallow: / *?

Disallow: / *?  And his party will block contains?  Website (specifically, it will block all your domain name at the beginning, followed by any string, followed by a question mark (?), And then the string is arbitrary URL).

Allow: / *?  $ And his party will be allowed to contain any?  At the end of the web site (specifically, it would allow to include all your domain name at the beginning, followed by any string, followed by a question mark (?), There is no question mark after the character of the site).

Sitemap Site Map:

Site Map for the support of the new approach is the robots.txt file, including direct links sitemap file.

Like this:

Sitemap: http://www.eastsem.com/sitemap.xml

Expressed support for the current search engine company Google, Yahoo, Ask and MSN.

However, I would suggest or submit to Google Sitemap, which features a lot of links you can analyze the state of

Robots.txt benefits:

1. Almost all search engine spiders follow the crawl rules in robots.txt. By convention, the first thing a spider requests when it enters a website is the site's robots.txt, provided of course that the file exists. If the site has no robots.txt, the spider is redirected to a 404 error page. Studies suggest that if the site uses a custom 404 error page, the spider may treat that page as the robots.txt, even though it is not a plain text file. This causes serious problems when the spider indexes the site and affects which of the site's pages the search engine includes.

2. Robots.txt can stop unwanted crawlers from occupying valuable server bandwidth. Email retrievers are one example: for most sites, this type of crawler is meaningless. Image strippers are another: for most non-graphics websites they have little value but consume a considerable amount of bandwidth.

3. Robots.txt can stop search engines from crawling and indexing non-public pages, such as the site's back-end programs and administration pages. In fact, some sites generate temporary pages during operation, and if robots.txt is not configured, search engines will index even those temporary files.

4. For content-rich sites with many pages, configuring robots.txt matters even more, because search engine spiders often put tremendous pressure on a website: spider visits arrive like a flood and, if left unchecked, can even affect normal access to the site.

5. Similarly, if a site contains duplicate content, using robots.txt to keep some of the pages from being indexed avoids duplicate-content penalties from the search engine, so that the site's ranking is not affected.
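Points 2 and 3 above can be tried out with Python's standard urllib.robotparser module. This is only a sketch: the agent name EmailCollector, the /admin/ and /tmp/ paths, and the allowed helper are hypothetical and chosen purely for illustration. It also shows only what a well-behaved spider would conclude from the rules; a crawler that ignores robots.txt is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the agent name and paths are made up for illustration.
robots_txt = """\
User-agent: EmailCollector
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Ask whether a rule-following spider may fetch the given path."""
    return parser.can_fetch(agent, "http://www.yourdomain.com" + path)


print(allowed("EmailCollector", "/index.html"))  # False: this agent is banned entirely
print(allowed("Googlebot", "/index.html"))       # True: public page
print(allowed("Googlebot", "/admin/login.html")) # False: administration pages are excluded
print(allowed("Googlebot", "/tmp/draft.html"))   # False: temporary pages are excluded
```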

Risks associated with robots.txt and their solutions:

1. Like everything, robots.txt has a downside and brings a certain degree of risk: it also shows attackers the site's directory structure and the location of private data.  Provided the Web server's security is configured properly this is not a serious problem, but it does make an attack easier for those with ill intent.

For example, if the site's private data is reached through www.yourdomain.com/private/index.html, then the settings in robots.txt may be as follows:

User-agent: *

Disallow: /private/

In this way, an attacker only has to look at robots.txt to learn where the content you want to hide is located, and can then enter www.yourdomain.com/private/ in a browser to reach the content we did not want to make public.  The usual approaches to this situation are as follows:

Set access permissions on the /private/ content and protect it with a password, so that attackers cannot get in.

Another approach is to rename the directory's default document from index.html to something else, for example abc-protect.html, so that the address of the content becomes www.yourdomain.com/private/abc-protect.html. At the same time, create a new index.html file whose content is along the lines of “you do not have permission to access this page”, so that an attacker who does not know the actual file name cannot reach the private content.

2. If the settings are wrong, the search engines will delete all of the site's indexed data.

User-agent: *

Disallow: /

The above code bans all search engines from indexing any of the site's data.
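Both risks can be checked before a robots.txt goes live. The sketch below is one possible approach, not a standard tool: disallowed_paths lists what any visitor can read out of the file (risk 1), and blocks_site_root uses Python's standard urllib.robotparser to warn when the rules would shut a spider out of the site root (risk 2). Both function names and the AnyBot agent name are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def disallowed_paths(robots_txt: str) -> list:
    """Collect every non-empty Disallow value, exactly as any visitor could."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def blocks_site_root(robots_txt: str, agent: str = "AnyBot") -> bool:
    """Return True if a rule-following spider may not even fetch '/'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "http://www.yourdomain.com/")


private_rules = "User-agent: *\nDisallow: /private/\n"
block_all_rules = "User-agent: *\nDisallow: /\n"

print(disallowed_paths(private_rules))    # ['/private/'] is visible to everyone
print(blocks_site_root(private_rules))    # False: only /private/ is excluded
print(blocks_site_root(block_all_rules))  # True: the whole site is excluded
```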

Currently, the vast majority of search engine robots comply with the rules of robots.txt. The Robots META tag is not yet as widely supported, but support is gradually increasing. The well-known search engine Google, for example, supports it fully and has also added the “archive” directive, which controls whether Google keeps a snapshot of the page.  For example:

<META NAME="googlebot" CONTENT="index,follow,noarchive">

This means: crawl the pages of the site and follow the links on them, but do not keep a Google snapshot of the page.
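To see which directives a page actually declares, the tag can be read back with Python's standard html.parser module. This is a minimal sketch: the RobotsMetaParser class name and the sample HTML are assumptions for illustration, and it only looks at META tags named "robots" or "googlebot".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from robots-related META tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives[name] = [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


html = '<html><head><META NAME="googlebot" CONTENT="index,follow,noarchive"></head></html>'
meta_parser = RobotsMetaParser()
meta_parser.feed(html)

print(meta_parser.directives)  # {'googlebot': ['index', 'follow', 'noarchive']}
```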


Source
How to use Robot.txt

How to Make More Money by Improving Your Google Page Rank

May 31st, 2009 No comments
Page rank is all about the number of links between your website and other relevant websites. Google is actually interested in links between individual pages of a site rather than just a domain.

The page rank of a web page can be seen in the page rank bar on the Google Toolbar. It actually takes 10 links to reach PR1, and then a lot more to reach 2 and up. That is not all, because it also depends on the Page Rank of the page that links to your page. Nobody knows the exact formula Google uses to determine your page rank, but there are ways to improve your rank and thus build your business through more traffic to your site.

Try out these five Google tips to help get more traffic to your website:

1. How can you increase your Google page rank? By increasing the number of relevant websites that link back to yours. Keep in mind that the links do not need to point to your site in general, but to your specific pages. This is why it’s called page rank.

TIP: Avoid using link farms or Free For All (FFA) pages!

2. Use keyphrases and not keywords. Keywords are not specific enough anymore to help you rank well in Google due to the large number of websites competing for traffic. Avoid wasting your time optimizing your pages using keywords only. By using specific keyphrases where there is less competition you have a much better chance to rank well in Google and to also improve the quality of your website’s traffic.

3. Google likes to see a lot of copy on your web pages. The copy text should contain your keyphrases, but should also be informative and helpful. Why would someone visit your website if you just list your keyphrases? The content of your website should encourage your visitors to laugh, think, get upset, and stay informed. But most importantly, it should bring them back for more.

TIP: Interesting, informative copy is much more attractive to Google than useless nonsense.

4. The easiest way to find quality backlinks for your site is by installing the free Google Toolbar. With it you can find out the Page Rank (PR) and backlinks for any site you visit and then easily determine if it is worth the time and effort to link to that site.

5. Choose a short catchy domain name. If possible, find a keyword for your domain, one that people will remember and that is easy to type into Google or a browser to find what they’re looking for on the Internet.

We are talking about putting in some time and effort, but if you want it bad enough then you have to pay the price. Those who take the time to optimize their pages for Google and achieve a high page rank are the ones with all the traffic.

Copyright 2008 Joe Rispoli


Source
How to Make More Money by Improving Your Google Page Rank