Bots Compliance tasks

Search engines offer webmasters extensive information about how websites are indexed:
Google Search Console and educational resources.
Bing Webmaster help and how-to.
Yahoo Webmaster resources.
Yandex Webmaster Tools.

Search Engines Resources

This information covers, among other things:

  • Detailed webmaster help
  • File formats Google can index
  • Non-indexable file formats
  • HTML markup
  • Schemas

Indexable file formats

Google explains precisely which file formats Googlebot can index.

Apart from HTML, XML and WML pages, Googlebot indexes a long list of formats, some of which are:

  • .pdf (Portable Document Format, from Adobe or other tools), with OCR applied as needed
  • Microsoft Office files (Word, Excel, PowerPoint)
  • OpenOffice files (text, spreadsheet, presentation)
  • .rtf (rich text format)
  • Text files (.txt and various source files)
  • .tex (TeX/LaTeX)
  • .swf (Adobe Flash), Silverlight and other rich media files, as far as text can be extracted from them
  • .svg (Scalable Vector Graphics), an image format written as plain text. Search engines do not yet index the content of other image and video formats.
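The list above can be turned into a quick check of which files on a site fall into a listed format. This is only a sketch: the extension set below is an assumption built from the list (the specific Office and OpenOffice extensions are not spelled out in the text), and `is_listed_format` is a hypothetical helper name. Google's own page remains the reference.

```python
from pathlib import PurePosixPath

# Assumed extensions for the formats listed above; not an official or complete list.
LISTED_EXTENSIONS = {
    ".html", ".htm", ".xml",
    ".pdf",
    ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",  # Microsoft Office
    ".odt", ".ods", ".odp",                              # OpenOffice
    ".rtf", ".txt", ".tex", ".swf", ".svg",
}


def is_listed_format(path):
    """Return True when the path's extension is in the assumed list."""
    return PurePosixPath(path).suffix.lower() in LISTED_EXTENSIONS


site_files = ["/docs/Report.PDF", "/media/clip.mp4", "/img/logo.svg"]
needs_matching_text = [f for f in site_files if not is_listed_format(f)]
```

Files that end up in `needs_matching_text` are candidates for the matching-text advice in the next section.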

Non-indexable file formats

2. Non-indexable file formats should be accompanied by matching text (this might require some creativity). For a video or rich media file, for example, inserting the full text of the animation in the page, or offering it for download, may allow the content to be fully indexed.

3. All META tags dedicated to robots should have proper content. These are easy, systematic tasks: filling in the TITLE, DESCRIPTION and ROBOTS meta tags, and adding a REL="nofollow" attribute to external links.
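Because these tasks are systematic, they lend themselves to an automated check. The sketch below uses Python's standard `html.parser` to report a missing TITLE, DESCRIPTION or ROBOTS entry and to list external links that lack rel="nofollow". The class and function names, the sample page and the host `www.example.com` are all illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RobotsMetaAudit(HTMLParser):
    """Collect the TITLE text, named META tags, and external links lacking nofollow."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.title = ""
        self.meta = {}               # lower-cased meta name -> content
        self.followed_external = []  # external hrefs without rel="nofollow"
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            host = urlparse(attrs["href"]).netloc
            rel = (attrs.get("rel") or "").lower().split()
            # A link is external when it names a host other than our own.
            if host and host != self.site_host and "nofollow" not in rel:
                self.followed_external.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_page(html, site_host):
    """Return (missing tag names, external links without nofollow)."""
    parser = RobotsMetaAudit(site_host)
    parser.feed(html)
    parser.close()
    missing = [n for n in ("description", "robots") if not parser.meta.get(n)]
    if not parser.title.strip():
        missing.insert(0, "title")
    return missing, parser.followed_external


SAMPLE_PAGE = """
<html><head>
<title>Bots compliance tasks</title>
<meta name="description" content="Making pages indexable by search robots.">
</head><body>
<a href="https://other.example/page">partner</a>
<a href="https://other.example/ad" rel="nofollow">sponsor</a>
<a href="/contact.html">contact</a>
</body></html>
"""
missing, followed = audit_page(SAMPLE_PAGE, "www.example.com")
```

For the sample page, `missing` names the absent ROBOTS meta tag and `followed` holds the one external link without nofollow. Relative links are treated as internal.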

HTML markup advice

4. The HTML tag hierarchy must obey W3C rules. These rules are complex, and browsers follow them to display web pages. A limited set of simple rules covers headings: one H1 tag for the page's main heading, one or more H2 tags, and one or more H3 tags within each H2 section, as needed.

5. Scripts might hide content from search engines. Scripting interacts with text and stylesheets, which can affect whether and how search engines index the text. A few rules apply.
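The simplified heading rules can be checked mechanically. The following sketch is not a W3C validator; it only assumes the reading given above (exactly one H1, and no heading more than one level deeper than the one before it). `HeadingOutline` and `heading_problems` are hypothetical names.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Record the level (1 to 6) of every heading tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return a list of messages describing breaks of the simplified rules."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    problems = []
    h1_count = parser.levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    previous = 0
    for level in parser.levels:
        # A heading may go at most one level deeper than its predecessor.
        if level > previous + 1:
            problems.append(f"h{level} appears where at most h{previous + 1} is expected")
        previous = level
    return problems


GOOD_OUTLINE = "<h1>Guide</h1><h2>Images</h2><h3>ALT text</h3><h2>Video</h2>"
BAD_OUTLINE = "<h1>Guide</h1><h3>ALT text</h3><h1>Another title</h1>"
good_report = heading_problems(GOOD_OUTLINE)
bad_report = heading_problems(BAD_OUTLINE)
```

An empty list means the page follows the simplified outline. The second sample is flagged both for its two H1 tags and for the H3 that skips the H2 level.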

More about Schemas

Bing offers a page dedicated to schemas, titled "Marking up your site: overview".

A word about schemas: schemas are intended for structured data on the Internet, on web pages, in email messages, and beyond. They form a vocabulary usable with many different encodings. They cover entities, relationships and actions. Thus, schemas help clarify the content indexed by bots. Search engines also recommend submitting a video sitemap.
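JSON-LD is one of the encodings in which the schema.org vocabulary can be embedded in a page. As a minimal sketch, the hypothetical helper below builds a script element for a single item. The `Article` type and its `headline` and `inLanguage` properties come from schema.org; the values are placeholders.

```python
import json


def json_ld_script(item_type, **properties):
    """Return a JSON-LD script element describing one schema.org item."""
    data = {"@context": "https://schema.org", "@type": item_type}
    data.update(properties)
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # "</" inside a script element could end it early, so escape the slash.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


snippet = json_ld_script(
    "Article",
    headline="Bots compliance tasks",
    inLanguage="en",
)
```

The resulting `snippet` can be placed in the page's HEAD or BODY. Which properties a given search engine actually uses is documented on its own pages.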

Yandex resources

Yandex is a leading search engine in Russia and is available in both English and Russian. Yandex offers eBooks, demos, images, tutorials and various WordPress and Weebly resources. Among other resources, it offers quick SEO guides for each main search engine and various webmaster tools through Yandex Webmaster.

Here we discuss the technical aspects of making web pages and web files compliant so that robots can index them.

For custom advice, consulting and services, see Compliance vs SEO

Technical advice

Images

IMG tag (GIF, JPG, PNG): when filled in, the ALT attribute describes the image to search engines.
Background images: use the title attribute to do so (the contextmenu attribute, once proposed for HTML5, is now obsolete).
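Images without a description are easy to find automatically. The sketch below lists IMG tags whose ALT attribute is missing or blank; the class name and sample markup are illustrative assumptions. An empty ALT is legitimate for purely decorative images, so the output is a list to review rather than a list of errors.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Collect the src of every IMG tag with no usable ALT text."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            # Treat an absent, empty or whitespace-only ALT as missing.
            if not (attrs.get("alt") or "").strip():
                self.missing_alt.append(attrs.get("src") or "(no src)")


def images_missing_alt(html):
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return parser.missing_alt


SAMPLE_IMAGES = """
<img src="logo.png" alt="Company logo">
<img src="chart.gif">
<img src="spacer.png" alt=" ">
"""
to_review = images_missing_alt(SAMPLE_IMAGES)
```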

Video

Google recommends marking up the video content in the body of the page with schema.org. Doing so provides information that allows search engines to index the video. The title attribute may also help describe a video; the contextmenu attribute is now obsolete in HTML. Google adds: "Search engines and other sites can recognise it and may use it to improve the display of video content on a page or in search results". See the answer.
Search engines also recommend submitting a sitemap of the videos.

Do NOT use images to display important names or text. Robots do not recognise the text contained in graphics. Instead, use the ALT attribute if the content cannot include regular HTML, or use a stylesheet to place textual content, written within HTML tags, ON the image.
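A video sitemap is an ordinary XML sitemap with extra video elements for each page. The sketch below builds one with Python's standard `xml.etree.ElementTree`. It assumes the sitemaps.org namespace and Google's video extension namespace with its `thumbnail_loc`, `title`, `description` and `content_loc` tags; check each search engine's current documentation before relying on them. The URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"
VIDEO_TAGS = ("thumbnail_loc", "title", "description", "content_loc")


def video_sitemap(entries):
    """Serialise a list of dicts (page URL plus video fields) as sitemap XML."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("video", VIDEO_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["page"]
        video = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
        for tag in VIDEO_TAGS:
            ET.SubElement(video, f"{{{VIDEO_NS}}}{tag}").text = entry[tag]
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = video_sitemap([
    {
        "page": "https://www.example.com/videos/demo.html",
        "thumbnail_loc": "https://www.example.com/thumbs/demo.jpg",
        "title": "Demo",
        "description": "A short product demonstration.",
        "content_loc": "https://www.example.com/media/demo.mp4",
    }
])
```

The resulting string can be saved as an XML file and submitted through each search engine's webmaster tools.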
