XING Devblog

The war of semantics – that is no war

Website text exists to convey information, and HTML offers various elements to give that text “more meaning”: a paragraph, a heading, a list, a blockquote, a link or, thanks to HTML5, a navigation, a footer, an article or one of the other new elements. From a machine’s perspective (a search engine’s, say), however, each of these elements merely contains a collection of numbers and letters. On their own, machines can’t really tell what such a collection stands for.

Microformats

Tantek Çelik and Dan Cederholm set up microformats.org in 2005 based on the credo “humans first, machines second”. Their aim is to help people get better search results by making content easier for machines to interpret.

To this end, extra markup is used to make a page’s content “more understandable” for machines. Reserved class names play the central role in microformats, and many people avoid them because they mean more markup work. This was also a subject of debate at XING, but in the end microformats were given the green light. The first part of the platform to use microformats was the user profiles, which use hCard and hCalendar to be more specific.

Here’s what a business card looks like when using microformats:

<div class="vcard">
  <img id="photo" src="userPic.jpg" class="photo" width="140" height="185" alt="userName">
  <!-- "n" marks the structured name that holds given-name and family-name -->
  <a class="url fn n" href="http://www.cool.inc">
    <span class="given-name">Max</span> <span class="family-name">Mustermann</span>
  </a>
  <div class="org">Cool Inc.</div>
  <a class="email" href="mailto:dude@cool.tld">dude@cool.tld</a>
  <div class="adr">
    <!-- hCard reads the address type from a "type" sub-property, not from a type attribute -->
    <span class="type">work</span>
    <div class="street-address">My Street 1</div>
    <span class="postal-code">20345</span> <span class="locality">Hamburg</span>
    <span class="country-name">Germany</span>
  </div>
  <div class="tel">+49-04-123456789</div>
</div>
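
The profiles also use hCalendar. As a rough sketch of how that format marks up an event with reserved class names (the event, its date and its location are made up for illustration and are not taken from XING’s markup):

```html
<!-- Hypothetical event; only the class names come from hCalendar -->
<div class="vevent">
  <span class="summary">Frontend Meetup</span>
  <!-- The machine-readable ISO 8601 date sits in the title attribute -->
  <abbr class="dtstart" title="2012-09-20T19:00:00+02:00">20 September 2012, 7 pm</abbr>
  <span class="location">Hamburg</span>
</div>
```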

Microdata

The arrival of HTML5 made us rethink our approach to semantics. The idea was to use less extra markup and more attributes on existing markup instead. Google, Bing and Yahoo! shared this idea when they created schema.org in 2011. Both microformats and microdata require a vocabulary. One such vocabulary is data-vocabulary.org, but the Googles and Bings of this world decided to create their own, which you can find at schema.org.

Microdata was recently added to the events, companies and jobs sections on XING.

Microdata offers a key benefit: it needs less markup and provides far finer gradations in certain areas, which in turn helps with semantics. A location, for example, can be expressed far more precisely with the schema.org vocabulary than with just an .adr (perhaps in combination with .geo), as would be the case with microformats. Examples of location types are “Landform”, “LocalBusiness” and “TouristAttraction”, to name but a few.
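
As a sketch of what such a more specific location might look like, the following snippet marks up a company as a schema.org LocalBusiness with a postal address and geo coordinates. The company name reuses the example from above, and the coordinates are only approximate, illustrative values for Hamburg:

```html
<!-- Illustrative values; only the types and property names come from schema.org -->
<div itemscope itemtype="http://schema.org/LocalBusiness">
  <span itemprop="name">Cool Inc.</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">My Street 1</span>
    <span itemprop="postalCode">20345</span> <span itemprop="addressLocality">Hamburg</span>
  </div>
  <!-- meta elements carry values that should not be visible on the page -->
  <div itemprop="geo" itemscope itemtype="http://schema.org/GeoCoordinates">
    <meta itemprop="latitude" content="53.55">
    <meta itemprop="longitude" content="9.99">
  </div>
</div>
```

Swapping the itemtype for a different place type such as TouristAttraction would change the meaning without adding any markup.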

Here’s what a business card looks like when using microdata:

<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Max Mustermann</span>
  <img src="janedoe.jpg" itemprop="image" alt="Max Mustermann" />
  <span itemprop="jobTitle">Head of Chaos</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">
      My Street 1
    </span>
    <span itemprop="addressLocality">Hamburg</span>,
    <span itemprop="postalCode">20345</span>
  </div>
  <span itemprop="telephone">+49-04-123456789</span>
  <a href="mailto:dude@cool.tld" itemprop="email">
    dude@cool.tld
  </a>
  <a href="http://www.cool.inc" itemprop="url">cool.inc</a>
</div>

One downside of microdata is its low adoption rate. Microformats, by contrast, have proven popular, and Google uses them in its search results, e.g. for star ratings.

<div class="hreview">
  <span class="rating">
  <span class="value">8</span> of <span class="best">10</span>
  </span>
</div>

This could result in something like this:

[Image: a rich snippet in Google with rating stars]

This is what semantically marked-up content can lead to: beautiful rating stars.

To achieve the same thing with microdata, you’ll need something like this:

<p itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating">
  <span itemprop="ratingValue">5</span> (<span itemprop="ratingCount">3</span> ratings)
</p>
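
Note that the snippet above describes an aggregate over several ratings and assumes an enclosing item, whereas the hReview example holds a single rating of 8 out of 10. A closer microdata counterpart might look like the following sketch, which assumes schema.org’s Review and Rating types. The reviewed item “Cool Inc.” is only a placeholder:

```html
<div itemscope itemtype="http://schema.org/Review">
  <!-- Placeholder for whatever is being reviewed -->
  <span itemprop="itemReviewed" itemscope itemtype="http://schema.org/Thing">
    <span itemprop="name">Cool Inc.</span>
  </span>
  <span itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">
    <span itemprop="ratingValue">8</span> of <span itemprop="bestRating">10</span>
  </span>
</div>
```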

So microformats are doing well and have reached the seven-year mark. In fact, they are the most popular way to structure page content: around 70% of structured content uses microformats. Microdata simply isn’t picking up as much speed, despite being promoted by Google et al.

So why choose microdata?

That’s a valid question. Why choose microdata when microformats are already well established? Well, the answer is simply that microdata will eventually come out on top. If a search engine or, in this case, three search engines come up with and recommend a technique, you can be fairly sure that it’s going to become an established standard sooner or later.

Isn’t it worth waiting until microdata has been universally adopted? I don’t think so. As was the case with microformats, developers need to be a bit daring and give this new technology a try, because that gives search engine operators an incentive to use microdata for more and better indexing. If developers hadn’t started using microformats on their sites over five years ago, they would never have gotten off the ground. So let’s all start using microdata on our sites. It’ll be like saying to search engine operators “Hey! We’ve got something for you, so come and get it!”

Aside from the streamlined and more precise markup…

There are the successors to microformats

Inspired by this new development in semantic markup, the makers of microformats came up with a more modern version. Yes, you guessed it: microformats 2. Its approach is similar to that of microdata. Microdata can easily be checked for correct nesting with the online tool Live Microdata: all you need to do is enter your microdata and you’ll receive a JSON object that’s so well presented that you can quickly identify any errors.

Microformats 2 works along similar lines. It uses class name prefixes to generate a hierarchy within the content’s markup. Here’s a list of the prefixes in the new format:

  • h-* for root class names (e.g. h-card, h-event)
  • p-* for simple properties (e.g. p-name, p-summary)
  • u-* for URL properties (e.g. u-url, u-photo)
  • dt-* for date time properties (e.g. dt-start, dt-bday)
  • e-* for element tree properties (e.g. e-content)
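
To see the prefixes in use, the business card from the hCard example could be written in microformats 2 roughly like this. It is a sketch that assumes the h-card and h-adr property names from the microformats 2 vocabulary, with the same made-up contact details as before:

```html
<div class="h-card">
  <!-- u-* properties take their value from src or href -->
  <img class="u-photo" src="userPic.jpg" alt="Max Mustermann">
  <a class="p-name u-url" href="http://www.cool.inc">Max Mustermann</a>
  <div class="p-org">Cool Inc.</div>
  <a class="u-email" href="mailto:dude@cool.tld">dude@cool.tld</a>
  <!-- A nested root class (h-adr) creates the hierarchy -->
  <div class="p-adr h-adr">
    <div class="p-street-address">My Street 1</div>
    <span class="p-postal-code">20345</span> <span class="p-locality">Hamburg</span>
    <span class="p-country-name">Germany</span>
  </div>
  <div class="p-tel">+49-04-123456789</div>
</div>
```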

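Here’s a minimal example using microformats 2, in which name, URL and photo are not marked up explicitly but inferred by the parser from the link and the image: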

<a class="h-card" href="http://www.cool.inc">
 <img alt="Max Mustermann" src="userPic.jpg" />
</a>

And here’s the JSON object I mentioned above:

{
  "items": [{ 
    "type": ["h-card"],
    "properties": {
      "name": ["Max Mustermann"],
      "url": ["http://www.cool.inc"],
      "photo": ["userPic.jpg"]
    }
  }]
}

Wow, that looks really cool now. You can test your microformats 2 online with this testing parser that uses node.js.

And this is very similar to the microdata output, as you can see below:

{
  "items": [
    {
      "type": [
        "http://schema.org/Person"
      ],
      "properties": {
        "name": [
          "Max Mustermann"
        ],
        "image": [
          "http://example.com/janedoe.jpg"
        ],
        "url": [
          "http://www.cool.inc/"
        ]
      }
    }
  ]
}

Summary

No matter which format you use, structured markup really helps to give a page’s content more meaning. Microformats and microformats 2 manage it through special class names, which is far from ideal. Microformats 2 reduces the amount of extra markup required, but it is still in its infancy. Microdata is more “streamlined” and “only” uses additional attributes permitted in HTML5 that don’t “cause any damage”.

Microformats are already well established, but it took several years before they became universal. Microdata, i.e. the vocabulary generated by Google, Bing and Yahoo!, is still at an early stage and not currently used on a large scale. Having said that, if these three search engines continue to use the microdata specification, it’s bound to catch on at some point so it makes sense for developers to jump on board early, which will in turn encourage search engines to join the club and improve their indexing.

About the author

Nils Lauk works as a Frontend Architect at XING. He loves semantics and accessibility. He's also a big fan of microformats.

