XING Devblog

How to write an OAuth implementation

This is the second part of a two-part series on our own OAuth implementation. Read the first part to find out why we decided to write our own implementation.

After we had decided that it was best to write our own OAuth implementation, we started with some planning. We identified three properties that our new implementation should fulfil.

  1. Compatibility with the “oauth” gem
  2. Run-time efficiency
  3. Fixing the bug

Our biggest fear was that certain requests that were previously accepted would get rejected by our new implementation. We thought that we might get a little detail in the implementation wrong or that the gem allowed certain edge cases we didn’t know about.

Fail fast and without impact

We chose to run the new implementation in parallel with the current one and use our live API traffic to figure out how good the new implementation was. To do this, we kept the “oauth” gem in charge of deciding whether a request was valid. After the gem had made its judgement, we called our own implementation and compared the results.

As we didn’t know how efficient our own implementation would be, we decided to test only a very small percentage of the overall traffic against it. To limit the traffic, we used a very simple tool: the Kernel#rand method. Of course we knew that this isn’t an exact way to select a fixed percentage of the traffic, but it proved to be a good estimate and was easy and fast to implement.

check_new_oauth_implementation(old_oauth) if rand < 0.01

We set up statsd buckets to measure the results of our comparison. To display the data and track our progress, we configured graphs in Graphite.
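As a rough illustration of the sampling and counting described above, here is a minimal sketch. The COUNTERS hash is a hypothetical in-memory stand-in for the statsd buckets, and the names record_comparison, sampled?, :total, :match and :mismatch are made up for this example; none of them come from the original code.

```ruby
# Hypothetical in-memory counters standing in for statsd buckets.
COUNTERS = Hash.new(0)

# Counts one comparison between the verdict of the old and the new implementation.
def record_comparison(old_valid, new_valid, counters)
  counters[:total] += 1
  counters[old_valid == new_valid ? :match : :mismatch] += 1
end

# Returns true for roughly `rate` of all calls (rate between 0.0 and 1.0).
# `random` only needs to respond to #rand, so a seeded Random can be passed in.
def sampled?(rate, random = Random)
  random.rand < rate
end

# Simulate 100,000 requests with a seeded generator and a 1% sample rate.
rng = Random.new(42)
100_000.times do
  next unless sampled?(0.01, rng)

  # Both verdicts are hard-coded to true here; a real check would pass
  # the results of the two implementations.
  record_comparison(true, true, COUNTERS)
end

puts "sampled #{COUNTERS[:total]} of 100000 requests"
```

The sampled count lands near 1% of the traffic but not exactly on it, which is the imprecision mentioned above.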

[Figure: first-attempt]

After we had put everything in place, we deployed our first and very naive version. The Graphite graph validated our assumption: our implementation was naive. For the majority of the requests it processed, the new implementation came to the wrong result. The first orange line (“r1”) in the graph shows that.

Now that we knew that the implementation was wrong, we wanted to find out what it did wrong. The graph didn’t show the causes of the errors. That’s why we wanted to log differences in the Signature Base String.

We chose Redis to store the data, because it’s fast, it was really easy to implement a ring buffer and we were already using Redis to store other data. To implement the ring buffer, we only had to combine the lpush and ltrim commands.

def check_new_oauth_implementation(old_oauth)
  # ...

  if old_oauth.result != new_oauth.result
    log_data = {
                 old: old_oauth.basestring,
                 new: new_oauth.basestring
               }.to_json
    # redis manages an established "hiredis" connection
    redis.lpush('oauth_basestring_mismatch', log_data)
    redis.ltrim('oauth_basestring_mismatch', 0, 99)
  end
end
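To see why lpush followed by ltrim behaves like a ring buffer, here is a small Redis-free sketch. FakeList is a hypothetical in-memory stand-in that imitates only the list semantics of LPUSH, LTRIM and LRANGE for a single key. It is not part of any Redis client library.

```ruby
# Hypothetical in-memory stand-in for a single Redis list.
class FakeList
  def initialize
    @items = []
  end

  # Like LPUSH: insert at the head of the list, return the new length.
  def lpush(value)
    @items.unshift(value)
    @items.length
  end

  # Like LTRIM: keep only the elements in the inclusive range start..stop.
  def ltrim(start, stop)
    @items = @items[start..stop] || []
    'OK'
  end

  # Like LRANGE: read the inclusive range; -1 refers to the last element.
  def lrange(start, stop)
    @items[start..stop] || []
  end
end

list = FakeList.new
150.times do |i|
  list.lpush("entry-#{i}")
  list.ltrim(0, 99) # drop everything beyond the 100 newest entries
end

entries = list.lrange(0, -1)
puts entries.length # 100
puts entries.first  # entry-149 (newest)
puts entries.last   # entry-50 (oldest entry still kept)
```

Because new entries go in at the head and the list is cut back to indices 0 to 99 after every push, the log never holds more than 100 mismatches and the oldest ones fall off the end.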

With the help of this log we were able to spot some of the mistakes that the implementation made, and we deployed the next version (“r2” line). Unfortunately, this version made things worse, as the graph clearly shows. Whoops! We decided to call it a day and continue the next morning.

[Figure: fixing-bugs-increasing-traffic]

The next day we fixed the stupid bug that we had introduced (“r3”) and continued to monitor the contents of the Redis-based ring buffer. You can see the progress from that day in the next picture. With releases “r4” and “r5” we fixed more implementation differences.

Once the “valid” line matched the “total” line and the “invalid” line went flat, we were sure that our implementation was good enough.

Is it efficient?

As a next step, we wanted to find out how efficient our implementation was.

We raised the threshold in our “rand” switch to test a higher percentage of our traffic, and started to monitor the performance implications in Logjam.

Logjam is the tool that we use at XING to find performance hot spots and errors in our services. It’s similar to New Relic, with the added benefit of being open source and hackable. Logjam is developed and maintained by Stefan Kaes, a XING colleague.

It uses the time_bandits gem to measure how much time a request spends in each part of our Rails code.

For the XING API we implemented a custom time_bandits consumer that was responsible for measuring the time spent verifying OAuth signatures. We closely monitored this time as we gradually increased the number of requests that the new implementation was receiving (“r6” and “r7” lines).
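The consumer itself isn’t shown here. As a rough stand-in that does not use the time_bandits API, the following sketch accumulates the time spent in a block with Ruby’s monotonic clock. The class name OAuthVerificationTimer and its methods are hypothetical and exist only for this illustration.

```ruby
# Hypothetical timer; a plain Ruby stand-in, not a time_bandits consumer.
class OAuthVerificationTimer
  attr_reader :calls, :total_ms

  def initialize
    @calls = 0
    @total_ms = 0.0
  end

  # Runs the block, adds its duration to the running total and
  # returns whatever the block returned.
  def measure
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @total_ms += elapsed * 1000.0
    @calls += 1
  end
end

timer = OAuthVerificationTimer.new
# The block body is a placeholder for the actual signature verification.
result = timer.measure { :verified }
puts "#{timer.calls} call(s), #{timer.total_ms.round(3)} ms in total"
```

A monotonic clock is used instead of Time.now so that adjustments to the system clock can’t distort the measurement.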

Fortunately, there was no serious performance impact. The next day, we were confident enough to check 100% of our requests against both OAuth implementations.

Yay, bugs!

Even with 100% of the API traffic, the graphs showed no spikes in errors. At first we were really happy about this, but soon we became skeptical. There had to be some edge cases that we had missed so far, we thought. When we once again had a look at the contents of our ring-buffer log, we knew that our gut feeling was right.

The Signature Base String comparison showed that some consumers were duplicating OAuth parameters. The same OAuth parameter was sent both in the form-encoded body and in the query part of the request URI.

POST /v1/users/me/conversations?oauth_consumer_key=barbaz
Content-Type: application/x-www-form-urlencoded

oauth_consumer_key=barbaz

So far we had assumed that parameters in the body and in the query part of the URI get combined for the Signature Base String calculation. But how exactly? We thought that each parameter would appear only once, so that duplicating it would have no visible effect on the Signature Base String.

We were wrong. As it turned out, both occurrences have to be included in the Signature Base String. Those were the kind of implementation details we were afraid of.

Expectation:

POST&/v1/users/me&oauth_consumer_key=barbaz

Reality:

POST&/v1/users/me&oauth_consumer_key=barbaz&oauth_consumer_key=barbaz
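The behaviour above can be reproduced with a small sketch of the parameter normalization step, written from our reading of RFC 5849, section 3.4.1.3. It is simplified: it ignores parameters from the Authorization header and does not drop oauth_signature. The method names percent_encode and normalized_parameters are hypothetical and not taken from our implementation or from the “oauth” gem.

```ruby
require 'uri'

# Percent-encodes every byte except the unreserved characters
# (letters, digits, "-", ".", "_", "~"), using uppercase hex digits.
def percent_encode(value)
  value.to_s.b.gsub(/[^A-Za-z0-9\-._~]/) { |byte| format('%%%02X', byte.ord) }
end

# Collects name/value pairs from the query string and the form-encoded
# body, keeps duplicates, sorts by name and then by value, and joins them.
def normalized_parameters(query_string, form_body)
  pairs = URI.decode_www_form(query_string.to_s) +
          URI.decode_www_form(form_body.to_s)

  pairs
    .map { |name, value| [percent_encode(name), percent_encode(value)] }
    .sort
    .map { |name, value| "#{name}=#{value}" }
    .join('&')
end

puts normalized_parameters('oauth_consumer_key=barbaz', 'oauth_consumer_key=barbaz')
# prints "oauth_consumer_key=barbaz&oauth_consumer_key=barbaz"
```

The important detail is that the pairs are collected in an array, not merged into a hash, so a parameter sent twice shows up twice in the result.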

Being really sure

We continued to run the old and new implementation in parallel for several weeks. During this time we found some even more remote edge cases. We had to consult the RFC a few times, and some of these resulted in further refinements, but we had done enough to wipe away our doubts. We decided to put the new implementation in charge of checking the authenticity of all API requests.

About the author

Jan Ahrens works as a Software Engineer at XING. Most of the time you see a Vim window on his screen. He's secretly in love with Haskell and on his way to becoming a Foosball champion.

