- What is a Robots.txt file?
- How do we use the Robots.txt file?
- How do we create a robots.txt file?
- What advantages can we get from this file?
- Which pages should you include in the Robots.txt file, and which should you not?
- Where should this file be placed?
- How does it lead to better visibility, and how can it lead to a drop in ranking?
- Is there any relation between Google algorithm updates and the Robots.txt file?
What Is Robots.txt and How Can We Use It?
ROBOTS.TXT is a protocol for guiding the spiders of search engines about which parts of a website we want them to crawl.
The site owner decides what information and instructions to give search engine spiders when they visit the website. This file is always placed at the root of the website hierarchy. When robots/spiders visit, finding a robots.txt file means the website owner wants to give them special instructions before they crawl the website; finding no robots.txt file means the owner wants them to crawl the entire website with no special instructions.
How to create a robots.txt file
The robots.txt file is an ASCII text file that we place at the root of the website domain. Writing a robots file is extremely simple. All we need is an ASCII text editor; on Windows, Notepad is an ASCII text editor, so you can use Notepad to write your robots file.
A robots file contains the names of spiders, the files we don't want them to crawl, and other special instructions. We can use the wildcard character asterisk (*) when we want to address all spiders at once.
The structure of Robots.txt:
1. User-agent line: holds the name of the crawlers we want to instruct. The value of this field can't be left empty; if you want to address all bots, simply put the asterisk wildcard.
2. Directive line: the action or instruction we want to give to those bots.
Here we instruct all bots to stay out of the cgi-bin directory.
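The snippet itself appears to be missing from the article; a minimal reconstruction of what is described would be:

```
User-agent: *
Disallow: /cgi-bin/
```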
Or if we want to restrict only certain bots, we can write them as in the example below.
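That example is also missing here; assuming the standard name of Google's image crawler, Googlebot-Image, it would read:

```
User-agent: Googlebot-Image
Disallow: /
```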
This robots file addresses Google's image bot and tells it that it is restricted from accessing the entire website.
A configuration error in the robots.txt file can lead to a fall in website revenue due to dropped visibility in the SERPs.
Some configuration errors we can list:
Use of the asterisk wildcard inside a Disallow/Allow directive line: the original robots exclusion standard does not support wildcards there (some modern crawlers, such as Googlebot, do understand them, but you cannot rely on every bot doing so).
Use of a trailing forward slash, as in "Disallow: /temp/", limits the rule to URLs beginning with that exact prefix; matching simply terminates there and does not extend to other strings.
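The prefix behavior can be illustrated in a commented snippet (the file names are hypothetical):

```
User-agent: *
# "/temp/" matches by prefix:
#   /temp/report.html  -> blocked (starts with /temp/)
#   /temp.html         -> NOT blocked (does not start with /temp/)
Disallow: /temp/
```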
Dangling pages are one of the major ways PageRank evaporates. Blocking a URL in the robots file only means that crawlers do not crawl that webpage, so none of its outbound links are considered; but incoming links to that URL still pass link juice, and the resulting dangling page causes a fall in PageRank.
Many developers and designers neglect this file, and sometimes by mistake they put only a forward slash in Disallow; that single forward slash blocks every bot from crawling/indexing/visiting your website.
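That misconfiguration, sketched out:

```
User-agent: *
# A lone slash matches every URL on the site -- nothing gets crawled
Disallow: /
```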
Advantages of robots.txt:
Use of the Crawl-delay directive solves server overloading issues. It is useful when aggressive bots or site-mirroring bots affect web server performance.
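A sketch of the directive (note that Crawl-delay is non-standard: crawlers such as Bingbot and Yandex honor it, while Google ignores it):

```
User-agent: *
# Ask compliant bots to wait 10 seconds between requests
Crawl-delay: 10
```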
We can block the URLs which lead to duplication and so tell search engines which URL contains the unique, original content.
If features like your internal search are creating additional, messy URLs that you don't want Google to see, block them in your robots.txt file.
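A sketch of such a rule, assuming the internal search lives under a hypothetical /search/ path:

```
User-agent: *
# Hypothetical path: keep auto-generated search-result URLs out of the index
Disallow: /search/
```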
Adding the sitemap location in robots.txt makes it easy for search engines to index all pages of the website.
Syntax for adding sitemap:
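The syntax line itself appears to be missing; the standard form is a Sitemap directive with an absolute URL (the domain is taken from the example later in this article):

```
Sitemap: http://www.websitedesign4seo.com/sitemap.xml
```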
Where to place this file?
Put the Robots.txt file in the main (root) directory. It is important to put the file in the proper directory because engines look first in the main directory, e.g. http://www.websitedesign4seo.com/robots.txt. If a user agent or search engine cannot find the file, it will assume that all the web pages may be indexed. We also have the meta robots tag, with values such as follow, nofollow, index, and noindex, which we can set in the content attribute of a meta tag added to the head section of a webpage.
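The meta robots tag mentioned above can be sketched like this (the chosen values are just one illustrative combination):

```html
<head>
  <!-- Ask engines to index this page but not follow its links -->
  <meta name="robots" content="index, nofollow">
</head>
```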
Here we can conclude that pages we want crawlers to crawl must not be marked as Disallow in robots.txt; there is another directive called Allow, which we can use specifically to instruct spiders to crawl particular pages.
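A sketch combining the two directives (the paths are hypothetical, and note that Allow is an extension honored by major engines such as Google and Bing rather than part of the original standard):

```
User-agent: *
# Block the whole hypothetical /private/ directory...
Disallow: /private/
# ...but explicitly let spiders crawl one page inside it
Allow: /private/public-page.html
```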
Which pages should you include in the Robots.txt file, and which should you not?
Pages which are under construction, and directories which we do not want crawled by bots, so that they do not get indexed in the search engine's database. Sometimes we also want to hide contact or mail address details — in simple terms, for the sake of privacy, so they are not shown to search engines — and for that we use Robots.txt.
On the other hand, the files, pages, and information which we do want crawled by search engines, for the purpose of indexation in the search engine's database, should not be put in the robots.txt file.
How does it lead to better visibility, and how does it lead to a drop in ranking?
The robots.txt file is just guidance for search crawlers. It tells crawlers which pages to ignore and which to consider. So if we put the wrong guidance about our website there, crawlers will ignore our best pages; hence we face a fall in ranking, and it may even lead to no ranking at all.
So we should take care with this file for the sake of visibility. Although it does not by itself produce better rankings, it can be the cause of a drop in ranking.
Is there any relation between the Google ranking algorithm and the Robots.txt file?
There is no direct relation between the robots.txt file and Google's ranking algorithms. But with smart use of this file, we can avoid Google penalties.
Duplicate content penalty: we can identify pages that are duplicates of other pages and then disallow them in the robots.txt file to keep them from being indexed by search engines.
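A sketch of that idea, assuming a hypothetical /print/ directory containing printer-friendly copies of the main pages:

```
User-agent: *
# Hypothetical path: printer-friendly duplicates of the main content
Disallow: /print/
```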