Friday, July 6, 2012

Build A Computer Robot

Some computer robots are good and some are evil.


Computer robots (also called bots, crawlers or spiders) are software programs that roam the Internet, collecting information from web pages. They are not viruses because they do not inject software into other computers; they only gather information. The software runs on the robot owner's computer and is never installed on anyone else's system. Some robots gather information that the owners of the visited web pages would rather keep private, but computer robots also have several benign uses, and there are some you would probably welcome as visitors to your business web page.


Instructions

1. Create a clear algorithm first. Rushing through this step is the most common mistake bot writers make. Decide exactly what you want the bot to do in every situation, precisely which information it should collect and when it should stop collecting. The typical structure of a bot is a single loop: take the next entry from the input list of directories; for each file in that directory, collect the information you are looking for; if you find links to new directories, add them to the list, always checking for duplicates; repeat until the directory list is empty. Record the items you find in a form that is easy to retrieve later. A minimal sketch of the loop follows below.
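
The sketch below shows this single loop in Ruby, one of the languages recommended in Step 2. The starting address, the choice of page titles as the information to collect, and the use of a simple regular expression instead of a real HTML parser are all assumptions made for illustration; adapt them to your own bot.

    require 'set'
    require 'open-uri'
    require 'uri'

    start_url = 'http://example.com/'   # hypothetical starting point
    to_visit  = [start_url]             # the work queue (the "directory list")
    visited   = Set.new                 # duplication check
    found     = []                      # items collected, kept for easy retrieval

    until to_visit.empty?
      url = to_visit.shift
      next if visited.include?(url)
      visited << url

      begin
        page = URI.open(url).read
      rescue StandardError
        next                            # skip pages that cannot be fetched
      end

      # Collect the information you are looking for; here, the page title.
      found << page[%r{<title>(.*?)</title>}im, 1]

      # Queue any newly discovered links, avoiding duplicates.
      page.scan(/href="([^"]+)"/i).flatten.each do |link|
        begin
          absolute = URI.join(url, link).to_s
        rescue StandardError
          next
        end
        # Stay within the starting site so the crawl stays bounded.
        next unless absolute.start_with?(start_url)
        to_visit << absolute unless visited.include?(absolute)
      end
    end

    puts found.compact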


2. Choose a language and encode the algorithm. You can write a computer robot in any language, but some make the job much easier. If you are an expert in a particular language, that is the obvious choice. If you are comfortable in several languages, or will be learning one just to write the bot, the usual choices among bot writers are Perl and Ruby. Both are freely available on the web, relatively easy to learn, fast enough for the job and full of features that make bot writing simple for non-experts. For example, both languages can capture all the files in a directory with a single instruction.
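
In Ruby, for instance, the built-in Dir.glob lists every matching file in one line (Perl's glob does the same). The path and file pattern below are only placeholders:

    # List every HTML file under a (hypothetical) site directory.
    pages = Dir.glob('/var/www/mysite/**/*.html')
    pages.each { |path| puts path }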


3. Test the bot. It is easy to create a bot that runs without crashing but does not collect the proper information. Test the bot first on your own website, where you know exactly which information should be collected; the more additional websites you can test the bot on, the better.
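
One simple check is to compare the bot's output against a list of items you know your own site contains. The sketch below assumes the bot collects page titles, as in the earlier example; both lists are placeholders to be replaced with your real expectations and the bot's actual results.

    require 'set'

    # Placeholder data: what your own site should yield versus what the bot returned.
    expected  = Set['Home', 'About Us', 'Contact']
    collected = Set['Home', 'About Us']

    missing = expected - collected   # items the bot failed to find
    extra   = collected - expected   # items it should not have collected

    if missing.empty? && extra.empty?
      puts 'PASS: the bot collected exactly what was expected'
    else
      puts "FAIL: missing #{missing.to_a.inspect}, unexpected #{extra.to_a.inspect}"
    end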
