How Do Search Engine Robots Work and Read a Website?

Last Updated on July 31, 2021 by Subhash Jain

When a search engine robot visits a web page, it looks at:

  • the visible text on the page,
  • the content of the various tags in the page’s source code (title tag, meta tags, etc.), and
  • the hyperlinks on the page.

From the words and links the robot finds, the search engine decides what the web page is about.
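
To make the three inputs above concrete, here is a minimal Python sketch, using only the standard library’s `html.parser`, that pulls the title, the named meta tags, the link targets and the visible text out of a page. The class name `RobotView` and the sample page are hypothetical, and real search engine robots do far more than this, so treat it as an illustration of what a robot has to work with rather than how any particular engine parses pages.

```python
from html.parser import HTMLParser


class RobotView(HTMLParser):
    """Hypothetical helper: collects the title, named meta tags,
    link targets and visible text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}        # meta name (lower-cased) -> content
        self.links = []       # href of every <a> tag, in page order
        self.text = []        # visible text fragments
        self._in_title = False
        self._skip = 0        # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return  # ignore whitespace between tags
        if self._in_title:
            self.title += data
        elif not self._skip:
            self.text.append(data)


page = """<html><head><title>Blue Widgets</title>
<meta name="description" content="Hand-made blue widgets.">
</head><body><h1>Our widgets</h1>
<p>See the <a href="/catalog.html">catalog</a>.</p>
<script>var x = 1;</script></body></html>"""

view = RobotView()
view.feed(page)
view.close()
print(view.title)  # Blue Widgets
print(view.meta)   # {'description': 'Hand-made blue widgets.'}
print(view.links)  # ['/catalog.html']
print(view.text)   # ['Our widgets', 'See the', 'catalog', '.']
```

Note that the script contents never reach `view.text`: like the robot, the sketch cares about the words a visitor would read, not the code that runs on the page.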

Many factors are used to work out what “matters”, and each search engine has its own algorithm for evaluating and processing the information. Depending on how the search engine has configured its robot, the information is indexed and then stored in the search engine’s database.

The information stored in the databases then becomes part of the search engine and directory ranking process. When a visitor submits a query, the search engine digs through its database to produce the final listing displayed on the results page.
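
The split between indexing pages and answering queries is commonly built around an inverted index: a table from each word to the pages that contain it. The Python sketch below is a toy version under that assumption. The page texts are hypothetical, and no ranking is applied, whereas real search engines weigh many factors when ordering the matches.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: each word maps to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted.
    This only filters; real engines also rank the matches."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical pages standing in for what a robot has collected.
pages = {
    "/widgets.html": "Hand-made blue widgets for sale",
    "/gadgets.html": "Blue gadgets and red widgets",
    "/about.html": "About our company",
}
index = build_index(pages)
print(search(index, "blue widgets"))  # ['/gadgets.html', '/widgets.html']
print(search(index, "red"))           # ['/gadgets.html']
print(search(index, "flash"))         # []
```

The expensive work (reading every page) happens once at indexing time, so answering a query only needs a few lookups and a set intersection.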

Search engine databases are updated at varying times. Once your site is in a search engine’s database, its robots keep visiting periodically to pick up any changes to your pages and to make sure they have the latest information.

How often you are visited depends on how each search engine schedules its robot visits, so it varies from one search engine to another.

Sometimes a robot cannot access the website it is trying to visit. If your site is down, or is experiencing huge amounts of traffic, the robot may not be able to reach it. When this happens, your pages may not be re-indexed until the robot’s next successful visit, so how long that takes depends on how often the robot visits your site. In most cases, a robot that cannot access your pages will try again later, hoping your site will be accessible then.
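
The “try again later” behavior can be pictured as a retry loop with a growing delay. The Python sketch below is an assumption about how such a loop might look, not a description of any search engine’s actual schedule: `fetch_with_retry`, the three attempts and the 2-second base delay are all hypothetical, and a real robot would more likely put the page back in its crawl queue than wait in place. The fetch and sleep functions are parameters so the behavior can be tried without a network.

```python
import time
import urllib.request


def urlopen_fetch(url):
    """Default fetcher: download the page body with a 10-second timeout."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def fetch_with_retry(url, attempts=3, base_delay=2.0,
                     fetch=urlopen_fetch, sleep=time.sleep):
    """Try to fetch url up to `attempts` times, doubling the wait after
    each failure (exponential backoff). Returns the body, or None if the
    site stayed unreachable and should be revisited on a later crawl."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            # URLError, HTTPError (e.g. 503), timeouts and connection
            # errors are all subclasses of OSError.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None


# Simulate a site that is down for the first two requests.
calls = []


def flaky_site(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("site down")
    return b"<html>ok</html>"


waits = []
body = fetch_with_retry("http://www.example.com/", fetch=flaky_site,
                        sleep=waits.append)
print(body)   # b'<html>ok</html>'
print(waits)  # [2.0, 4.0]
```

Doubling the delay keeps a struggling site from being hit repeatedly while it is overloaded, which is the same reason robots back off when a server is slow.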

Submitting your site to the major search engines helps with the “can’t find it” problem. Even having links pointing back to your site can be enough to attract the search engine robots. Google, for example, suggests that you may not need to submit your pages at all: it will find your site as long as at least one other site on the web links to it.

If the robots can find your site but can’t make sense of it, you may need to look at the content and technology used on your pages. Frames, Flash, dynamically generated pages, and invalid HTML source code can cause problems when a search engine robot tries to access your pages. Although some search engines (e.g. Google and AllTheWeb) are beginning to index dynamically generated pages and Flash, using these technologies can still hinder your pages from being indexed by search engine robots.

Search engine robots cannot read text that is embedded in images. ALT image text is an important way to help the robots “read” your images, and websites with extensive images rely heavily on ALT text to present their content.
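
To check how much of an image-heavy page a robot can actually “read”, you can list each image’s ALT text and flag images that have none. This Python sketch uses the standard library’s `html.parser`; the class name `AltTextAudit` and the sample markup are hypothetical. An empty `alt=""` is the usual choice for purely decorative images such as spacers, so treat the flagged list as images to review rather than as errors.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Hypothetical helper: lists the ALT text a robot can read and the
    images that give it nothing to read."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []  # non-empty ALT text, in page order
        self.missing = []    # src of images with no or empty ALT text

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        alt = (attrs.get("alt") or "").strip()
        if alt:
            self.alt_texts.append(alt)
        else:
            self.missing.append(attrs.get("src") or "(no src)")


audit = AltTextAudit()
audit.feed('<p><img src="logo.png" alt="Acme Widgets logo">'
           '<img src="banner.jpg"><img src="spacer.gif" alt=""></p>')
audit.close()
print(audit.alt_texts)  # ['Acme Widgets logo']
print(audit.missing)    # ['banner.jpg', 'spacer.gif']
```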

How Can You Get the Most Out of Indexing?
If you know what to “feed” the spidering robots, you will improve your search engine ranking.

Having a website full of good content is the major factor. Search engines exist to serve their visitors, not to rank your website, so present your site in the way that will be most useful to the search engine visitor. Each search engine has its own idea of what is important in a page, but they all value text highly. Making sure that the text on your pages includes your most important keyword phrases will help the search engines evaluate the content of those pages.

Making sure that you have good title and meta tags will further assist the search engines in understanding what your page is about.

Another important consideration is keeping all of your pages within a small number of “clicks” of your top page, because many robots will not follow links more than two or three levels deep.
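
Click depth can be measured with a breadth-first search from the top page, where a page’s depth is the fewest clicks needed to reach it. In this Python sketch the link map is written out by hand (in practice it would come from crawling your own site), the page paths are hypothetical, and the default limit of 3 clicks is an assumption based on the “two or three levels” figure above. It also reports pages that nothing reachable links to, since a robot starting at the top page would never find them.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search from the top page: each page's depth is the
    minimum number of clicks needed to reach it by following links."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit = shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links, home, max_depth=3):
    """Return (pages deeper than max_depth, pages unreachable from home)."""
    depths = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    deep = sorted(p for p, d in depths.items() if d > max_depth)
    unreachable = sorted(all_pages - set(depths))
    return deep, unreachable


# Hypothetical site: each page maps to the pages it links to.
site = {
    "/": ["/products", "/about"],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/products/widgets/blue"],
    "/products/widgets/blue": ["/products/widgets/blue/specs"],
    "/orphan": ["/"],
}
print(click_depths(site, "/")["/products/widgets/blue/specs"])  # 4
print(too_deep(site, "/"))  # (['/products/widgets/blue/specs'], ['/orphan'])
```

Fixing a flagged page usually means linking to it from a shallower page, such as the top page or a sitemap.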

Testing Your Website For Search Engine Robot Accessibility
To get an idea of just what a search engine robot “sees” on a web page, try the Sim Spider tool. You may be surprised at how different your site looks to the robot. You can find the tool at:

http://www.searchengineworld.com/cgi-bin/sim_spider.cgi

The results show the text and ALT image text on your page. If your entire website is built in Flash, you will see nothing at all, because robots don’t understand Flash movies.

The Bottom Line
When it comes to search engine robots, keep it simple. Lots of good content and text, hyperlinks the robots can follow, well-optimized pages, topical links pointing back to your site, and a sitemap will help ensure the best results when the robots come visiting.
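
The article does not say which kind of sitemap it means. One widely supported form is an XML sitemap following the sitemaps.org protocol, which lists your page URLs in `<loc>` elements so robots can find pages they might otherwise miss. The Python sketch below generates a minimal one with `xml.etree.ElementTree`, assuming that format; the URLs are hypothetical, and optional fields such as `<lastmod>` are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap listing each URL in its own <url><loc>."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes special characters such as & automatically.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/catalog.html?sort=name&page=2",
])
print(sitemap_xml)
```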

Resources
SpiderSpotting – Search Engine Watch
http://searchenginewatch.com/webmasters/spiders.html

Robotstxt.org
List of robots and protocols for setting up a robots.txt file.
http://www.robotstxt.org/

Spider-Food
Tutorials, forums and articles about Search Engine spiders and Search Engine Marketing.
http://spider-food.net/

Spiderhunter.com
Articles and resources about tracking Search Engine spiders.
http://www.spiderhunter.com/

Sim Spider Search Engine Robot Simulator
Search Engine World has a spider that simulates what the Search Engine robots read from your website.
http://www.searchengineworld.com/cgi-bin/sim_spider.cgi
