Sunday, July 3, 2011

Understanding the event driven model for concurrency

For some time now there’s been an idea going around that the concurrency model most of us are used to for web applications is not the best fit when there are very large numbers of concurrent clients.

The model I’m talking about is, of course, the One Request - One Thread model. When I started reading about it, the main concern raised about this model was that it can’t scale well to a very large number of simultaneous requests because of the cost of threads. That cost shows up mainly as memory consumption (each thread needs its own stack) and context switching. Another concern is the complexity of programming with threads.
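To make the contrast concrete, here is a minimal sketch of the One Request - One Thread model in plain Java. The class name `ThreadPerConnectionServer` is hypothetical and the handler just echoes a single line; the point is that every accepted socket gets its own thread, which spends most of its life blocked in `readLine()` waiting for its client.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch of One Request - One Thread: one thread per connection.
class ThreadPerConnectionServer {
    private final ServerSocket serverSocket;

    ThreadPerConnectionServer(int port) throws IOException {
        serverSocket = new ServerSocket(port); // port 0 = pick a free port
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    // The accept loop runs on its own thread and hands each client to a new thread.
    void start() {
        Thread acceptor = new Thread(() -> {
            while (!serverSocket.isClosed()) {
                try {
                    Socket client = serverSocket.accept();
                    new Thread(() -> handle(client)).start();
                } catch (IOException e) {
                    return; // server socket was closed
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    // Blocking handler: this thread waits in readLine() until the client sends a line.
    private void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream()));
             PrintWriter out = new PrintWriter(c.getOutputStream(), true)) {
            String line = in.readLine();
            if (line != null) {
                out.println("echo: " + line);
            }
        } catch (IOException ignored) {
            // connection dropped; the thread simply ends
        }
    }

    void stop() throws IOException {
        serverSocket.close();
    }
}
```

With ten thousand idle clients this design keeps ten thousand threads (and stacks) alive, and the scheduler has to switch between all of them, which is exactly the cost described above.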

I then started to look at the alternative that everyone on the web was talking about (mainly because of Node.js): the Evented, or Event Driven, model.

So here I will try to give a simple introduction to how this model works.

The general idea is to use the Reactor Pattern.

The Reactor Pattern normally works as a single-threaded service loop which, on every iteration, checks its registered handles for I/O events and dispatches those events to the appropriate handlers.

The loop must keep running and must never perform blocking operations. The only time it blocks is while it waits for I/O events; as soon as one arrives, the loop must handle it quickly and move on.
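The loop just described can be sketched with Java NIO, whose `Selector` plays the role of the event demultiplexer. This is a minimal sketch, not EventMachine’s code: `Handler` and `MiniReactor` are hypothetical names. Each registered channel carries its handler as the key’s attachment, and `runOnce` performs one iteration: block in `select` until something is ready (or the timeout passes), then dispatch every ready key to its handler.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

// Hypothetical handler type: invoked by the reactor when its handle is ready.
interface Handler {
    void handle(SelectionKey key) throws IOException;
}

// Minimal single-threaded reactor: one Selector, one handler per registered handle.
class MiniReactor {
    private final Selector selector;

    MiniReactor() throws IOException {
        selector = Selector.open();
    }

    // Register a handle for the given interest ops, attaching its handler to the key.
    void register(SelectableChannel channel, int ops, Handler handler) throws IOException {
        channel.configureBlocking(false); // the loop itself must never block on a read
        channel.register(selector, ops, handler);
    }

    // One iteration: the only blocking call is select(); handlers must return quickly.
    int runOnce(long timeoutMillis) throws IOException {
        int ready = selector.select(timeoutMillis);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove(); // the selector does not clear handled keys by itself
            if (key.isValid()) {
                ((Handler) key.attachment()).handle(key);
            }
        }
        return ready;
    }

    void close() throws IOException {
        selector.close();
    }
}
```

A real reactor would call `runOnce` inside a `while` loop. If one handler blocked, for example on a slow disk read, every other client would stall with it, which is why the handlers have to be fast.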

Let’s first look at an example using EventMachine (a Ruby implementation of the Reactor Pattern), and then see what the Reactor looks like inside.

The example is a server that, when it receives data on a connection, queries MongoDB and sends the query result back to the client:


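  require 'eventmachine'
  require 'em-mongo'

  module Server
    # Called by EventMachine each time data arrives on this connection.
    def receive_data(received)
      db = EM::Mongo::Connection.new.db('db')
      collection = db.collection('test')

      # Non-blocking: find returns immediately and the block runs later,
      # when the query result arrives.
      collection.find('_id' => 123) do |res|
        send_data res.inspect
      end
    end
  end

  # Start the reactor and register a server on port 3001 that uses Server
  # to handle each accepted connection.
  EM.run do
    EM.start_server '0.0.0.0', 3001, Server
  end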

OK, so what does this code do? First, the EM.run call starts the Reactor. Then EM.start_server creates an evented server on port 3001 and adds its socket to the Reactor’s list of descriptors (the server is created and registered before the loop starts iterating).

Now the reactor starts looping: it goes through its descriptors and waits until one of them changes state. Right now the only descriptor is the server socket, so the event loop sits waiting.

If we now connect to port 3001, for example with Telnet, the Reactor detects this and checks what kind of I/O readiness event happened on the server socket (here, an “accept ready” event). It accepts the connection and registers a new descriptor for the new socket. Then the reactor goes back to waiting for events.
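The steps so far (register the listening socket before the loop starts, wait, then react to “accept ready” by registering the new socket) can be sketched in Java NIO as follows. This is an illustration under assumptions, not EventMachine’s implementation: it binds to `127.0.0.1` on an ephemeral port instead of `0.0.0.0:3001`, and `AcceptingReactor` and `descriptorCount` are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Hypothetical sketch of the start_server step and the accept-ready event.
class AcceptingReactor {
    private final Selector selector;
    private final ServerSocketChannel server;

    AcceptingReactor() throws IOException {
        selector = Selector.open();
        // Like EM.start_server: bind and register the listening socket
        // before the loop starts iterating.
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    // One iteration: wait for events; on "accept ready", register the new
    // connection as another descriptor, interested in read events.
    void runOnce(long timeoutMillis) throws IOException {
        selector.select(timeoutMillis);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key.isValid() && key.isAcceptable()) {
                SocketChannel client = server.accept();
                if (client != null) {
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                }
            }
        }
    }

    // Number of descriptors currently registered with the reactor.
    int descriptorCount() {
        return selector.keys().size();
    }
}
```

Before any client connects there is one registered descriptor (the server socket); after the accept-ready event is handled there are two.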

When the connection from the telnet session is accepted, EventMachine creates a new instance of the Server module for it (more precisely, an instance of an anonymous class that includes the module).

If we now send some text from the telnet session, the reactor detects that our newest descriptor is ready for an I/O read operation and processes it accordingly: it calls the handler, which in our case is the receive_data method of the new Server instance.
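Creating one handler object per connection and routing read-ready events to it can be sketched the same way. `ServerConnection` is a hypothetical Java stand-in for the Ruby Server module, with `receiveData`/`sendData` mirroring `receive_data`/`send_data`, and `EchoReactor` is likewise a hypothetical name. For brevity the sketch assumes each read delivers a whole message and each small reply is written in a single call.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical per-connection handler, one instance per accepted socket.
class ServerConnection {
    private final SocketChannel channel;

    ServerConnection(SocketChannel channel) {
        this.channel = channel;
    }

    // Called by the reactor when this connection's descriptor is read-ready.
    void receiveData(String received) throws IOException {
        sendData("got: " + received);
    }

    // Simplification: assumes a small reply fits in the socket buffer in one write.
    void sendData(String data) throws IOException {
        channel.write(ByteBuffer.wrap(data.getBytes(StandardCharsets.UTF_8)));
    }
}

class EchoReactor {
    private final Selector selector;
    private final ServerSocketChannel server;

    EchoReactor() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    void runOnce(long timeoutMillis) throws IOException {
        selector.select(timeoutMillis);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (!key.isValid()) continue;
            if (key.isAcceptable()) {
                SocketChannel ch = server.accept();
                if (ch == null) continue;
                ch.configureBlocking(false);
                // New handler instance for this connection, attached to its key.
                ch.register(selector, SelectionKey.OP_READ, new ServerConnection(ch));
            } else if (key.isReadable()) {
                SocketChannel ch = (SocketChannel) key.channel();
                ByteBuffer buf = ByteBuffer.allocate(1024);
                int n = ch.read(buf);
                if (n < 0) { // peer closed the connection
                    key.cancel();
                    ch.close();
                } else if (n > 0) {
                    String text = new String(buf.array(), 0, n, StandardCharsets.UTF_8);
                    ((ServerConnection) key.attachment()).receiveData(text);
                }
            }
        }
    }
}
```

The first iteration handles the accept and creates the `ServerConnection`; a later iteration sees the same socket as read-ready and dispatches to that instance’s `receiveData`.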

Next, when the data arrives in the receive_data method, we make a simple query to MongoDB. Again, this operation is non-blocking. Internally, em-mongo talks to MongoDB through its own EventMachine connection, which registers another descriptor in the reactor, so there are now 3 descriptors registered. When the query has finished and the data is available, that descriptor signals the event, the reactor notices it and calls the handler for the event: in this case, the block passed to find. In that callback we send the data returned from MongoDB to our connected socket with the send_data method.
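The key point of the MongoDB call is that the request returns immediately and the result is delivered later, through a callback, when another descriptor becomes readable. The sketch below assumes a `java.nio.channels.Pipe` as a stand-in for the MongoDB socket (there is no real database here), and `CallbackReactor`/`registerCallback` are hypothetical names; the `Consumer` plays the role of the block passed to find.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical sketch: a reply arrives on its own descriptor (a Pipe standing in
// for the database socket) and the reactor runs the stored callback.
class CallbackReactor {
    private final Selector selector;

    CallbackReactor() throws IOException {
        selector = Selector.open();
    }

    // Registers interest in the reply and returns immediately; nobody waits here.
    void registerCallback(Pipe.SourceChannel replyChannel, Consumer<String> callback) throws IOException {
        replyChannel.configureBlocking(false);
        replyChannel.register(selector, SelectionKey.OP_READ, callback);
    }

    @SuppressWarnings("unchecked")
    void runOnce(long timeoutMillis) throws IOException {
        selector.select(timeoutMillis);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (!key.isValid() || !key.isReadable()) continue;
            Pipe.SourceChannel ch = (Pipe.SourceChannel) key.channel();
            ByteBuffer buf = ByteBuffer.allocate(256);
            int n = ch.read(buf);
            if (n <= 0) continue;
            key.cancel(); // this sketch expects a single reply per query
            String result = new String(buf.array(), 0, n, StandardCharsets.UTF_8);
            ((Consumer<String>) key.attachment()).accept(result);
        }
    }
}
```

The callback never runs at registration time; it only runs inside a later loop iteration, once the simulated database has written its answer.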

To better understand how the Reactor works, here are some extracts from the EventMachine Java implementation.

The main class of the EventMachine Java implementation (the one used with JRuby; I understand it better than the C++ version used with standard Ruby) is the EmReactor class, and its main method is run, which runs the event loop:

  public void run() {
      try {
          mySelector = Selector.open();
          bRunReactor = true;
      } catch (IOException e) {
          throw new RuntimeException("Could not open selector", e);
      }

      while (bRunReactor) {
          runLoopbreaks();
          if (!bRunReactor) break;

          runTimers();
          if (!bRunReactor) break;

          removeUnboundConnections();
          checkIO();
          addNewConnections();
          processIO();
      }

      close();
  }

In this code we can see the main parts of the Reactor Pattern:

First, the loop itself, which runs indefinitely (until bRunReactor becomes false).

And the last three lines of the while loop correspond to the steps we have been talking about:

checkIO(): This method “listens” for I/O events on the registered descriptors, blocking until an I/O channel is ready for something. In Java it is implemented with Selector.select(timeout).

addNewConnections(): This method registers with the reactor any new connections created since the last iteration, for example the MongoDB connection opened while handling our query.

processIO(): This method loops over the descriptors that are in a ready state, checks what each one is ready for (write, read, etc.) and dispatches the event to the corresponding handler.
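Putting the three steps together, here is a simplified sketch of one loop iteration in the same order as EmReactor.run(). It is not EventMachine’s code: `ReactorLoop`, `ReadyHandler` and `addConnection` are hypothetical names, and timers and loop breaks are left out. In this sketch new channels are only queued by `addConnection` and get registered at a fixed point in the iteration, by `addNewConnections()`; `checkIO()` also avoids a long blocking wait while registrations are pending, so they are not delayed.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical handler type, not EventMachine's API.
interface ReadyHandler {
    void onReady(SelectionKey key) throws IOException;
}

class ReactorLoop {
    private static final class Pending {
        final SelectableChannel channel;
        final int ops;
        final ReadyHandler handler;

        Pending(SelectableChannel channel, int ops, ReadyHandler handler) {
            this.channel = channel;
            this.ops = ops;
            this.handler = handler;
        }
    }

    private final Selector selector;
    private final List<Pending> newConnections = new ArrayList<>();

    ReactorLoop() throws IOException {
        selector = Selector.open();
    }

    // Queues a connection; it is registered later by addNewConnections().
    void addConnection(SelectableChannel channel, int ops, ReadyHandler handler) {
        newConnections.add(new Pending(channel, ops, handler));
    }

    // checkIO(): wait up to the timeout for registered descriptors to become ready,
    // but only poll if there are queued connections waiting to be registered.
    void checkIO(long timeoutMillis) throws IOException {
        if (newConnections.isEmpty()) {
            selector.select(timeoutMillis);
        } else {
            selector.selectNow();
        }
    }

    // addNewConnections(): register everything queued since the last iteration.
    void addNewConnections() throws IOException {
        for (Pending p : newConnections) {
            p.channel.configureBlocking(false);
            p.channel.register(selector, p.ops, p.handler);
        }
        newConnections.clear();
    }

    // processIO(): dispatch each ready descriptor to its handler.
    void processIO() throws IOException {
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key.isValid()) {
                ((ReadyHandler) key.attachment()).onReady(key);
            }
        }
    }

    // One iteration, in the same order as the last three calls in EmReactor.run().
    void iterate(long timeoutMillis) throws IOException {
        checkIO(timeoutMillis);
        addNewConnections();
        processIO();
    }

    int descriptorCount() {
        return selector.keys().size();
    }
}
```

A queued connection is not watched until the next addNewConnections() call, and its first events are dispatched on the iteration after that.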

So that’s it. Those are the basics of how the Reactor Pattern works.
