Can anybody help with this please?

Discussion in 'Software' started by jodevizes, Jan 1, 2011.

  1. jodevizes

    jodevizes Private E-2

    Hi,
    I was looking at AWStats for my website, and under Robots/Visitors I saw these strange entries:

    bot[\s_+:,\.\;\/\\-]

    [\s_+:,\.\;\/\\-]bot

    discovery

    checker

    These are in addition to the many unknown robots identified by 'bot', by an empty user-agent string, or by a hit on robot.txt.
    I suspect these may not be good guys, but is there any way to stop them wandering around my website?
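The bracketed entries look like regular expressions rather than robot names. Assuming AWStats applies them case-insensitively to each visitor's User-Agent string to flag robots it cannot name (an assumption; the thread does not confirm how AWStats matches them), this Python sketch shows which kinds of hypothetical user agents would fall under each entry:

```python
import re

# The four entries from the AWStats report, treated as regular expressions.
# Assumption: they are matched case-insensitively against the User-Agent string.
GENERIC_ROBOT_PATTERNS = [
    r"bot[\s_+:,\.\;\/\\-]",  # "bot" followed by whitespace or one of _+:,.;/\-
    r"[\s_+:,\.\;\/\\-]bot",  # "bot" preceded by one of those characters
    r"discovery",
    r"checker",
]


def matching_patterns(user_agent: str) -> list[str]:
    """Return every generic robot pattern that matches the given User-Agent."""
    return [p for p in GENERIC_ROBOT_PATTERNS
            if re.search(p, user_agent, re.IGNORECASE)]


# Hypothetical user agents, for illustration only.
for ua in [
    "ExampleBot/1.0 (+http://example.com/bot)",
    "SiteLinkChecker/2.0",
    "ContentDiscovery/0.3",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]:
    print(ua, "->", matching_patterns(ua))
```

An ordinary browser string matches none of them, which is why these entries only collect visitors that describe themselves as some kind of bot, checker or discovery tool.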
     
  2. chaslang

    chaslang MajorGeeks Admin - Master Malware Expert Staff Member

    I don't believe these are malware or anything to worry about. As far as I know, those are just scripts used to detect bots. I will move this thread to the Software Forum, where other people who are admins of websites can add their comments.
     
  3. Caliban

    Caliban I don't need no steenkin' title!

    Greetings, jodevizes.

    You could use 'User-agent' and 'Disallow' directives. For example, in your robots.txt file:

    Code:
    User-agent: checker
    Disallow: /
    Be advised: I hope the 'robot.txt' in your post is a typo - the file must be named 'robots.txt' (sans quotes).
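To see what that rule means to a crawler that honours robots.txt, Python's standard-library urllib.robotparser can evaluate it. This is only a sketch: the agent names are hypothetical, and each crawler implements its own name matching (Python's parser checks whether the rule's name appears in the agent's name token).

```python
from urllib.robotparser import RobotFileParser

# The rule from the post above: only robots named "checker" are told to stay out.
ROBOTS_TXT = """\
User-agent: checker
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical agent names; can_fetch() reports what a compliant crawler would decide.
for agent in ["checker", "Googlebot", "SomeOtherBot/1.0"]:
    print(agent, parser.can_fetch(agent, "/index.html"))
```

Agents not named in the file fall through to "allowed", so a named rule on its own says nothing to crawlers you have not listed.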
     
  4. pattyandme

    pattyandme Private E-2

    Honestly, I don't think robots are even a consideration when it comes to search engines. For SEO they mostly read the page itself for what's on it; the robots statements are deprecated tech.
    Robots were once used, but major engines ignore the statement altogether.
     
  5. Caliban

    Caliban I don't need no steenkin' title!

    Agreed, when it comes to search engines. Unfortunately, however, many webstat analysis programs still use the robots protocol for sampling purposes - the really good ones can differentiate between useful site hits and random crawlers, but they are few and far between.
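One simple heuristic of this kind, which the first post also mentions (unknown robots "identified by ... hit on robot.txt"), is to treat any client that requests robots.txt as a robot. The sketch below illustrates the idea on a few made-up access-log lines. It assumes the Apache/Nginx "combined" log format and identifies a client by its (IP address, User-Agent) pair; it is not how any particular stats package is actually implemented.

```python
import re
from collections import Counter

# Assumption: log lines use the Apache/Nginx "combined" format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def classify_hits(lines):
    """Count hits from clients that fetched /robots.txt versus everyone else."""
    parsed = [m for m in map(LOG_LINE.match, lines) if m]
    robots = {(m["ip"], m["agent"]) for m in parsed if m["path"] == "/robots.txt"}
    counts = Counter()
    for m in parsed:
        counts["robot" if (m["ip"], m["agent"]) in robots else "other"] += 1
    return counts


# Made-up sample lines (documentation IP ranges, hypothetical agents).
SAMPLE = [
    '192.0.2.10 - - [01/Jan/2011:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 45 "-" "SomeCrawler/1.0"',
    '192.0.2.10 - - [01/Jan/2011:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "SomeCrawler/1.0"',
    '198.51.100.7 - - [01/Jan/2011:10:05:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

print(classify_hits(SAMPLE))  # Counter({'robot': 2, 'other': 1})
```

A crawler that never asks for robots.txt slips straight into "other", which is one reason the better analysers combine several signals.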
     
  6. jodevizes

    jodevizes Private E-2

    Thank you very much for your helpful replies; you have put my mind at rest. There are so many bad guys around that it is hard to know who is who.

    If I try the 'User-agent' 'Disallow', will that stop all the crawlers or just the unnamed ones?
     
  7. Caliban

    Caliban I don't need no steenkin' title!

    In theory, the user-agent wildcard '*' followed by the 'Disallow: /' directive should tell all robots to ignore your site. In practice, however, robots can ignore any robots.txt. 'Good' robots (Google, etc.) tend to obey your rules; bad robots just ignore them and search for email addresses, etc.
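A minimal sketch of the wildcard rule, again using Python's urllib.robotparser to stand in for a compliant crawler. The agent names are hypothetical. The key point is in the comment: the check only happens if the robot chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# robots.txt asking every robot to stay out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, agent: str, path: str = "/") -> bool:
    """What a compliant crawler would conclude before fetching `path`.

    Nothing forces a robot to call this; a harvester can simply skip it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical agent names; every one that asks is refused.
for agent in ["Googlebot", "checker", "SomeHarvester/0.1"]:
    print(agent, may_fetch(BLOCK_ALL, agent, "/contact.html"))
```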

    Bottom line: don't depend on a robots.txt file for site protection; that role must be taken on by a good firewall and server-side exclusion rules.
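Server-side exclusion is normally configured in the web server or firewall itself. To show the idea without assuming any particular server, here is a hypothetical Python WSGI middleware that refuses requests whose User-Agent is empty or matches a block list. Both the block list and the choice to refuse empty agents are assumptions; refusing empty agents may also turn away some legitimate tools. Because the User-Agent is self-reported, this only stops robots that identify themselves honestly, which is why IP-based firewall rules are still needed.

```python
import re
from typing import Callable, Iterable

# Hypothetical block list; build your own from what your logs actually show.
BLOCKED_AGENTS = re.compile(r"checker|discovery", re.IGNORECASE)


def block_bad_agents(app: Callable) -> Callable:
    """WSGI middleware: answer 403 to empty or blocked User-Agent strings."""
    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        agent = environ.get("HTTP_USER_AGENT", "")
        if not agent or BLOCKED_AGENTS.search(agent):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_bad_agents(hello_app)


def status_for(agent: str) -> str:
    """Call the wrapped app with a fake request and return its status line."""
    seen = []
    app({"HTTP_USER_AGENT": agent}, lambda status, headers: seen.append(status))
    return seen[0]


for ua in ["SiteLinkChecker/2.0", "Mozilla/5.0", ""]:
    print(repr(ua), status_for(ua))
```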
     
