What is robots.txt?

rsplajit 100 Post Club
edited January 2012 in On Site Optimization
Hello Friends,
Can you please explain what robots.txt is? (Please give an example.)

Thanks

Posts

  • localnumberone 100 Post Club
    edited December 2011
    The robots.txt file is a text file that tells crawlers which parts of a website they may crawl and which they should stay out of.

    Example:

    User-agent: *
    Disallow: /temp/

    This tells all crawlers (*) not to crawl anything under /temp/.
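A minimal sketch of how a crawler might read the two rules above, assuming Python's standard-library urllib.robotparser as the parser, the placeholder domain www.example.com and a made-up bot name AnyBot. Real search engines use their own parsers, which can differ in details. It also shows that, for this parser, a blank line between User-agent and Disallow ends the group, so the Disallow rule is silently dropped.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post, written as one group (no blank line inside it).
rules = """\
User-agent: *
Disallow: /temp/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse text directly instead of downloading it

# "*" matches every crawler, so any bot name (AnyBot is made up) gets the same answer.
print(parser.can_fetch("AnyBot", "https://www.example.com/index.html"))       # True
print(parser.can_fetch("AnyBot", "https://www.example.com/temp/draft.html"))  # False

# Same rules with a blank line in between: this parser treats the blank
# line as the end of the group, so the Disallow line is ignored.
split = RobotFileParser()
split.parse(["User-agent: *", "", "Disallow: /temp/"])
print(split.can_fetch("AnyBot", "https://www.example.com/temp/draft.html"))   # True
```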
  • generic Registered Users
    edited December 2011
    Could you please explain it in more detail? I am a beginner and still learning, so please help me.
  • johnmayers Registered Users
    edited December 2011
    When a search engine crawler comes to your site, it looks for a special file called robots.txt. That file tells the search engine spider which pages of your site it may crawl and which pages it should ignore.
  • alancore2duo Registered Users
    edited December 2011
    It communicates with search engine spiders and tells them whether you have set any restrictions.
  • Thuhihup Registered Users
    edited December 2011
    There is a hidden, relentless force that permeates the web and its billions of web pages and files, unbeknownst to the majority of us sentient beings. I'm talking about search engine crawlers and robots here. Every day hundreds of them go out and scour the web, whether it's Google trying to index the entire web or a spam bot collecting every email address it can find for less than honorable intentions. As site owners, what little control we have over what robots are allowed to do when they visit our sites exists in a magical little file called "robots.txt."
  • kenontot Registered Users
    edited December 2011
    Robots.txt is a text file that you put on your site to tell search robots which pages you would like them not to visit. The structure of a robots.txt file is quite simple: it is a list of user agents, each followed by the files and directories that agent should not crawl.
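A minimal sketch of how such a list behaves, again assuming Python's urllib.robotparser, a placeholder domain and made-up paths. Each Disallow value is matched as a prefix of the URL path, so it can name a single file or a whole directory, and a value without a trailing slash also covers anything that merely starts with the same characters.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one file and two directory-style prefixes are off limits.
rules = """\
User-agent: *
Disallow: /private.html
Disallow: /tmp/
Disallow: /cgi-bin
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Made-up paths on a placeholder domain.
paths = ["/private.html", "/tmp/cache.html", "/tmpfiles/a.html",
         "/cgi-binary/run.html", "/index.html"]
# Map each path to whether a generic crawler may fetch it.
verdicts = {p: parser.can_fetch("AnyBot", "https://www.example.com" + p) for p in paths}

for path, allowed in verdicts.items():
    print(f"{path:22} {'allowed' if allowed else 'blocked'}")
```

With these rules, /private.html, /tmp/cache.html and /cgi-binary/run.html are blocked, while /tmpfiles/a.html and /index.html are allowed: /cgi-bin has no trailing slash, so it also catches /cgi-binary/, whereas /tmp/ does not catch /tmpfiles/.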
  • rogercroft 100 Post Club
    edited December 2011
    The basic purpose of robots.txt is to stop search engine bots or crawlers from visiting certain pages or areas of your website. In other words, you can instruct search engine bots how to crawl your website.
  • sky2c Registered Users
    edited January 2012
    Hello Friends,

    "Robots.txt" is a regular text file that, through its name, has special meaning to most "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots not to crawl certain files or directories within your site, or not to crawl your site at all. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. Robots.txt lets you tell Google just that.

    Thanks
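A minimal sketch of the /images example from the post above, assuming Python's urllib.robotparser and the placeholder domain www.example.com. A group addressed to Googlebot by name applies only to that crawler, and a crawler that matches no group is allowed everywhere. Search engines' own parsers may match names differently; the version-stripping shown in the last line is this parser's behavior.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Placeholder URLs for illustration.
logo = "https://www.example.com/images/logo.png"
home = "https://www.example.com/index.html"

print(parser.can_fetch("Googlebot", logo))      # False: Googlebot is kept out of /images/
print(parser.can_fetch("Googlebot", home))      # True: the rest of the site is open
print(parser.can_fetch("Bingbot", logo))        # True: no group names Bingbot
print(parser.can_fetch("Googlebot/2.1", logo))  # False: this parser ignores the "/2.1" part
```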
  • John123Wills Registered Users
    edited January 2012
    Web robots (also known as Web Wanderers, Crawlers, or Spiders) are programs that traverse the Web automatically. Search engines such as Google use them to index web content, spammers use them to scan for email addresses, and they have many other uses.
  • tinil Registered Users
    edited January 2012
    A robots.txt file restricts access to your site by search engine robots that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages.
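A toy sketch of the order of operations described above, under loud assumptions: the site is a hypothetical in-memory dictionary of pages and links rather than real HTTP, the bot name ExampleBot and the domain are made up, and Python's urllib.robotparser stands in for a search engine's own parser. The crawler reads robots.txt once, then checks every URL against it before visiting.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

BASE = "https://www.example.com"  # placeholder domain (assumption)

# Hypothetical site: robots.txt text plus a map of page -> links on that page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""
LINKS = {
    "/": ["/about.html", "/admin/", "/blog/"],
    "/about.html": ["/"],
    "/admin/": ["/admin/users.html"],
    "/admin/users.html": [],
    "/blog/": ["/blog/post1.html", "/admin/"],
    "/blog/post1.html": [],
}


def crawl(robots_txt, links, agent="ExampleBot", start="/"):
    """Breadth-first crawl that consults robots.txt before each page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # step 1: read robots.txt first

    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        if not parser.can_fetch(agent, BASE + path):
            continue  # step 2: skip disallowed pages (their links are never followed)
        visited.append(path)  # a real crawler would download the page here
        for link in links.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl(ROBOTS_TXT, LINKS))
# ['/', '/about.html', '/blog/', '/blog/post1.html']
```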
  • tinilxyz Registered Users
    edited January 2012
    The robots exclusion protocol (REP), or robots.txt is a text file webmasters create to instruct robots (typically search engine robots) on how to crawl & index pages on their website.
  • johnny12345 Registered Users
    edited January 2012
    robots.txt is a solution that helps publishers control which content on their websites Google's indexing spiders can see.
  • jennasherly 100 Post Club
    edited January 2012
    Blocking secure and confidential pages through the robots.txt file will prevent search engines from crawling those pages.
  • giaiphapsovnet Registered Users
    edited January 2012
    Bumping this for you; wishing you brisk sales.
  • Chazujek Registered Users
    edited January 2012
    Web indexing robots are used by many search engines such as Google, Inktomi, AltaVista and others. These web indexing robots are also known as spiders. These spiders are the tools search engines use to harvest data for their indexes. When you submit your website to the engines, you are effectively asking them to send their web indexing robot to your website so that it can be crawled and added to their database.
  • optimizenewyork Registered Users
    edited January 2012
    The robots.txt file sits in the root directory of a website and is created to direct the activity of search engine crawlers or spiders.
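Because crawlers only look for the file at the root of a host, the robots.txt address for any page can be derived from the page URL alone; a file placed in a subdirectory such as /blog/robots.txt is not consulted. A minimal sketch using Python's urllib.parse, where the helper name robots_url and the example URLs are made up for illustration. Each host, including each subdomain, gets its own file.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the host serving page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep scheme and host (with any port); replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/blog/2012/01/post.html?ref=home"))
# https://www.example.com/robots.txt
print(robots_url("http://shop.example.com:8080/cart#top"))
# http://shop.example.com:8080/robots.txt
```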
  • Rishi tomar Registered Users
    edited January 2012
    If a site has duplicate content or pages (for example, one version meant for crawlers and one for visitors), we can use robots.txt to keep the duplicate content from being shown to crawlers.
  • markalter Registered Users
    edited January 2012
    A robots.txt file is a special text file that is always located in the root directory of your Web server. When a robot visits your site, it identifies itself by a name known as its User-agent.
  • adskumar Registered Users
    edited January 2012
    Robots.txt is a text file that is mainly used to allow or disallow crawling of parts of a site.
  • harish1402 500 Post Club
    edited January 2012
    The robots.txt file sits in the root of a site and tells search engines which files not to crawl. Some search engines will still list your URLs as URL-only listings even if you block them with robots.txt. So I think this file is important for a website.
  • JudgeDread Registered Users
    edited January 2012
    You can also declare the location of your sitemap.xml in robots.txt, using a Sitemap: line.
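A minimal sketch of a Sitemap line and of reading it back, assuming Python 3.8 or later (where urllib.robotparser gained site_maps()) and a placeholder sitemap URL. The Sitemap line is not tied to any User-agent group, so it can go anywhere in the file.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules; the sitemap URL is an assumption for illustration.
rules = """\
User-agent: *
Disallow: /tmp/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']

# A file with no Sitemap line: site_maps() returns None rather than [].
bare = RobotFileParser()
bare.parse(["User-agent: *", "Disallow: /tmp/"])
print(bare.site_maps())  # None
```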
  • parker1234 Registered Users
    edited January 2012
    Robots.txt rules come in two kinds, Allow and Disallow. A Disallow rule prevents search engines from crawling the matching pages; anything that is not disallowed is allowed by default.
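A minimal sketch of Allow and Disallow working together, assuming Python's urllib.robotparser, a placeholder domain and made-up paths. One caution: this parser applies the first rule that matches, while Google documents using the most specific (longest) matching rule, so listing the narrower Allow line before the broader Disallow line gives the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: open one file inside an otherwise blocked directory.
rules = """\
User-agent: *
Allow: /private/press-kit.html
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

base = "https://www.example.com"  # placeholder domain
print(parser.can_fetch("AnyBot", base + "/private/press-kit.html"))  # True: Allow matches first
print(parser.can_fetch("AnyBot", base + "/private/notes.html"))      # False: Disallow matches
print(parser.can_fetch("AnyBot", base + "/blog/"))                   # True: no rule, allowed by default
```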
  • spider Registered Users
    edited January 2012
    Robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. Obeying robots.txt is by no means mandatory for search engines, but they generally obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall or a kind of password protection). Putting up a robots.txt file is like putting a “Please, do not enter” note on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter.
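To illustrate the point that robots.txt is a public note rather than a lock: the file is served like any other page, so anyone can read which paths it lists, and a secret path placed in a Disallow line is thereby advertised. A minimal sketch with a hypothetical helper disallowed_paths and made-up rules; it is a deliberately simple line scanner, not a full robots.txt parser.

```python
def disallowed_paths(robots_txt):
    """List every Disallow value in a robots.txt file (hypothetical helper)."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Made-up rules for illustration.
rules = """\
User-agent: *
Disallow: /admin/        # control panel
Disallow: /backup-2011/
Allow: /
"""

print(disallowed_paths(rules))  # ['/admin/', '/backup-2011/']
```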
  • infobanc Registered Users
    edited January 2012
    The robots exclusion standard, more commonly known as robots.txt, is a text file placed in the root directory of a website. The robots.txt file is a convention created to direct the activity of search engine crawlers or web spiders.
  • raafialone Registered Users
    edited January 2012
    Robots.txt is a file that gives instructions to search engine bots, and it is very important. If your robots.txt says that an area of the website should not be crawled, Google will not crawl that area.
  • drtact Registered Users
    edited January 2012
    It lets crawlers crawl only the files and folders we need crawled, for security purposes.
  • jas2011 100 Post Club
    edited January 2012
    Robots.txt is a file we create to tell crawlers which pages of a website should be crawled and which should not, and sometimes it's used for security purposes.
  • MiaJones Registered Users
    edited January 2012
    Robots.txt is a file where you list the pages of your website that you don't want crawlers to crawl. Whenever a crawler visits your website, it first fetches your robots.txt file.
  • aartikrypton Registered Users
    edited January 2012
    Hi,

    Nice post, and thanks for this info.
  • kevinthomas Registered Users
    edited January 2012
    Hi, I am Kevin Thomas. If you find an error, then you use robots.txt.