Wednesday, April 17, 2013

Pro Spring Security and OAUTH 2

OAuth and Spring Security

My upcoming Pro Spring Security is heavily focused on the inner workings of the Spring Security core framework and how everything fit together under the hood.

And although I do cover very important providers for authentication and authorization (including LDAP, Database, CAS, OpenID, etc) I don’t cover another important provider which is OAuth. The reason it is not covered in the book is because it is not part of the core of Spring Security but instead it is part of the Spring Security extensions project. However I know OAuth is a very popular authorization protocol and I will cover it here as a complement to the information on the book.

In this Post I want to introduce you to using OAuth with Spring Security. Many of the concepts will not be straightforward to understand, and I recommend you to read the book Pro Spring Security to understand the architecture and design of Spring Security and how it works internally.

First let’s take an overall look at the OAuth 2 protocol. I won’t be talking anything about the original OAuth 1 in this post.

OAuth 2 is an authorization protocol that specifies the ways in which authorization can be granted to certain clients to access a determined set of resources.

Directly taken from the specification, we can see in the following paragraph that OAuth 2 defines a specific set of Roles interacting in the security process:

resource owner
     An entity capable of granting access to a protected resource.
     When the resource owner is a person, it is referred to as an end-
     user.
 
resource server
     The server hosting the protected resources, capable of accepting
     and responding to protected resource requests using access tokens.
 
client
     An application making protected resource requests on behalf of the
     resource owner and with its authorization.  The term client does
     not imply any particular implementation characteristics (e.g.
     whether the application executes on a server, a desktop, or other
     devices).
 
authorization server
     The server issuing access tokens to the client after successfully
     authenticating the resource owner and obtaining authorization.

The overall interaction that happens in the protocol is also specified in the specification, and it goes something like this:

You can see the interaction between the 4 roles defined by the protocol.

The main idea to take in mind in the traditional scenario is that an application (the Client) will try to access another application’s secured resources (Resource Server) in the name of a user (Resource Owner). In order to do so, the user needs to prove its identity, and to do so it contacts the Authorization Server presenting its credentials. The Authorization Server authenticates the user and grants the client/user combination a token that can be used to access the particular resource. In many cases the Authorization Server and Resource Server live in the same place. For example if you try to access your facebook account from another application, you will get redirected to the facebook login (Authorization Server) and then you will grant certain permissions to be able to access your facebook wall (a resource in the Resource Server).

Ok, let’s see how all that fits in the Spring Security OAuth project. There will be a couple of things that don’t work exactly the way that sites like Facebook or Twitter work, but we get to that later.

I will create both a OAuth service provider (Authentication Server but also Resource Server) and a OAuth client (Client Application) The first thing I’ll do is to check out the source of the project from Github, which I normally do with open source projects.

git clone git://github.com/SpringSource/spring-security-oauth.git

Then checkout the latest stable tag with git checkout 1.0.1.RELEASE

Next we will be playing with the OAuth 2 samples that come with the project.

I build everything with the command mvn clean install -P bootstrap

We will then open the project sparklr in the samples/oauth2 folder. This project will serve as both the Authorization Server and Resource Server from our previous definitions. Sparklr is a project that offers pictures secured with OAuth.

We will also open the tonr application in the same directory. This is the OAuth client application.

Both projects are built from the previous maven build step. I took both war files and copy them to a Tomcat 7’s webapps directory that I have in my computer under the names sparklr2.war and tonr2.war. The I started Tomcat and visited the address http://localhost:8080/tonr2/. I was received by the page in the following figure:

If I click in the “sparklr pics” tab I am presented with a login screen particular to the tonr application (The Client in this case has a password of its own):

I login with the default username and password and I am redirected to the sparklr login screen now.

Remember this is the Authorization Server as well as the Resource Server, where the resources are pictures.

Again I login with the default username and password. This will ask for confirmation that I am allowing Tonr to access my personal resources on my behalf.

I click on Authorize. I am redirected back to Tonr and now I can see my pictures stored on Sparklr.

Ok, that is the functionality working. Now let’s see how it all happens under the hood.

When we visit the URL http://localhost:8080/tonr2/sparklr/photos by clicking on the tab, the request will arrive at the filter OAuth2ClientContextFilter which is defined by the xml namespaced element <oauth:client id="oauth2ClientFilter" /> in the spring-servlet.xml of the tonr project, which is referenced in the <http> element at the top of said file. Like this:

<http access-denied-page="/login.jsp?authorization_error=true" xmlns="http://www.springframework.org/schema/security">

                <intercept-url pattern="/sparklr/**" access="ROLE_USER" />

                <intercept-url pattern="/facebook/**" access="ROLE_USER" />

                <intercept-url pattern="/**" access="IS_AUTHENTICATED_ANONYMOUSLY" />

                <form-login authentication-failure-url="/login.jsp?authentication_error=true" default-target-url="/index.jsp"

                        login-page="/login.jsp" login-processing-url="/login.do" />

                <logout logout-success-url="/index.jsp" logout-url="/logout.do" />

                <anonymous />

                <custom-filter ref="oauth2ClientFilter" after="EXCEPTION_TRANSLATION_FILTER" />

        </http>

You can see that the access needed for accessing anything with the URL /sparklr/** requires a user with Role ROLE_USER. That means that when trying to access that URL before being logged in an AccessDeniedException will be thrown. The AccessDeniedException will be catched by the OAuth2ClientContextFilter which won’t do anything at all with this exception but rethrow it to be handled by the standard exception handling process of Spring Security.

The application is automatically redirected to the login.jsp.

After we login and still trying to reach the URL /sparklr/photos, the OAuth2ClientContextFilter is reached again. The request will get all the way through the SparklrServiceImpl which is the service in charge of reaching to Sparklr to retrieve the photos. This class will use a RestOperations instance (in particular an instance of OAuth2RestTemplate) to try to retrieve the photos from the remote Sparklr service using the following URL http://localhost:8080/sparklr2/photos?format=xml

This is what the OAuth2RestTemplate does internally when trying to contact the given URL:

This is what the OAuth2RestTemplate does internally when trying to contact the given URL:

It will first try to obtain the Access Token (in the form of an instance of OAuth2AccessToken) from the client context. In particular from the configured DefaultOAuth2ClientContext.

As we still don't have any access token for Sparklr, there is nothing stored on the DefaultOAuth2ClientContext. So the access token will be null at this point.

Next a new AccessTokenRequest will be obtained from the DefaultOAuth2ClientContext. The AccessTokenRequest and the DefaultOAuth2ClientContext are configured automatically by Spring at startup by the use of the namespaced bean definition <oauth:rest-template resource="sparklr" />. That definition will be parsed by the class RestTemplateBeanDefinitionParser.

Next as there is no access token on the context, an instance AccessTokenProvider will be used to try and retrieve the token. In particular an instance of AccessTokenProviderChain will be used. The AccessTokenProviderChain’s obtainAccessToken method will be called with the AccessTokenRequest and a AuthorizationCodeResourceDetails instance. This instance is configured by the resource attribute in the <oauth:rest-template> bean that references a <oauth:resource> element. Like this:

<oauth:resource id="sparklr" type="authorization_code" client-id="tonr" client-secret="secret"

                access-token-uri="${accessTokenUri}" user-authorization-uri="${userAuthorizationUri}" scope="read,write" />

<oauth:rest-template resource="sparklr" />

The <oauth:resource> element will be parsed by the class ResourceBeanDefinitionParser at startup. In this particular case this will create an instance of the class AuthorizationCodeResourceDetails. In this instance, preconfigured client-id and client-secret will be set. Also configured in this instance are the required URLs for authenticating and requesting the Access Token.

The AccessTokenProviderChain’s obtainAccessToken method will go through the configured AccessTokenProvider instances to try and obtain the token. In the current example the only configured AccessTokenProvider is an instance of AuthorizationCodeAccessTokenProvider. This class will try to make a POST call to the URL http://localhost:8080/sparklr2/oauth/authorize to get the authorization code for this client/user.

The call to the /sparklr2/oauth/authorize endpoint will arrive on the FilterSecurityInterceptor for the Sparklr application. The interceptor will notice that the required URL requires a user with role ROLE_USER (in Sparklr) to be logged in to be able to access the contents from this endpoint. This is enforced by the following chunk of XML in the spring-servlet.xml of the Sparklr application (In particular look at the line that intercepts the URL “/oauth/**”):

<http access-denied-page="/login.jsp?authorization_error=true" disable-url-rewriting="true"

                xmlns="http://www.springframework.org/schema/security">

                <intercept-url pattern="/oauth/**" access="ROLE_USER" />

                <intercept-url pattern="/**" access="IS_AUTHENTICATED_ANONYMOUSLY" />

                <form-login authentication-failure-url="/login.jsp?authentication_error=true" default-target-url="/index.jsp"

                        login-page="/login.jsp" login-processing-url="/login.do" />

                <logout logout-success-url="/index.jsp" logout-url="/logout.do" />

                <anonymous />

        </http>

When the interceptor realizes that the security constraint is not met, it will throw an AccessDeniedException. This exception will be picked up by the ExceptionTranslationFilter which will at the end of a couple more redirections, will redirect to the URL http://localhost:8080/sparklr2/login.jsp.

This redirect will be received by the AuthorizationCodeAccessTokenProvider in the Tonr application which will then throw the exception UserRedirectRequiredException. That exception will be then be captured by the Oauth2ClientContextFilter which will make the call to the redirect URL returned by Sparklr and adding the required extra parameters. Something like: http://localhost:8080/sparklr2/login.jsp?response_type=code&client_id=tonr&scope=read+write&state=dWby7l&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2Ftonr2%2Fsparklr%2Fphotos

We are presented now with the login form for Sparklr. When we click on the login button, the following happens:

Username and password will be send to the URL /sparklr2/login.do. This URL is configured to be managed by the UsernamePasswordAuthenticationFilter when the attribute login-processing-url="/login.do" is configured on the <form-login> element. The UsernamePasswordAuthenticationFilter will extract the username and password from the request and will call the ProviderManager implementation of AuthenticationManager.

The AuthenticationManager will find the DaoAuthenticationProvider that is configured with in memory user details thanks to the XML element show next:

        <authentication-manager alias="authenticationManager" xmlns="http://www.springframework.org/schema/security">

                <authentication-provider>

                        <user-service id="userDetailsService">

                                <user name="marissa" password="koala" authorities="ROLE_USER" />

                                <user name="paul" password="emu" authorities="ROLE_USER" />

                        </user-service>

                </authentication-provider>

        </authentication-manager>

The DaoAuthenticationProvider will be able then to find the user marissa with the corresponding role ROLE_USER and authenticate her.

After succesfully authenticating in Sparklr, the framework redirects to the  /sparklr2/oauth/authorize URL that was previously requested. This time access to the endpoint will be granted for the user that had just logged in.

The /oauth/authorize request in Sparklr is handled by the class org.springframework.security.oauth2.provider.endpoint.AuthorizationEndpoint that comes with the installation of the core Spring Security OAuth project. The use of this endpoint class is configured automatically when the following XML configuration is used:

<oauth:authorization-server client-details-service-ref="clientDetails" token-services-ref="tokenServices"

                user-approval-handler-ref="userApprovalHandler">

                <oauth:authorization-code />

                <oauth:implicit />

                <oauth:refresh-token />

                <oauth:client-credentials />

                <oauth:password />

        </oauth:authorization-server>

When that xml element (<oauth:authorization-server> ) is used, many things happen when it is parsed by the class AuthorizationServerBeanDefinitionParser which is a complex parser that handles many different elements and options. Some of them we will see later, but for now we are only looking at the AuthorizationEndpoint.

The AuthorizationEndpoint maps the request /oauth/authorize as I have just said before. The first thing it will do is to check the kind of petition that the client is doing, basically looking at the request param response_type and checking if its value is either “code” or “token” to know if an authorization code or an access token is being requested. For the current call, the value of this parameter is “code” as you may remember.

The next thing that will happen is that the AuthorizationEndpoint will create an instance of org.springframework.security.oauth2.provider.AuthorizationRequest. The main part of the AuthorizationRequest, is the use of the clientId parameter to retrieve the client details of the connecting client. In this requets, the clientId is “tonr” and the details are retrieved from an InMemoryClientDetailsService that get configured by the XML element:

<oauth:client-details-service id="clientDetails">

                <oauth:client client-id="my-trusted-client" authorized-grant-types="password,authorization_code,refresh_token,implicit"

                        authorities="ROLE_CLIENT, ROLE_TRUSTED_CLIENT" scope="read,write,trust" access-token-validity="60" />

                <oauth:client client-id="my-trusted-client-with-secret" authorized-grant-types="password,authorization_code,refresh_token,implicit"

                        secret="somesecret" authorities="ROLE_CLIENT, ROLE_TRUSTED_CLIENT" />

                <oauth:client client-id="my-client-with-secret" authorized-grant-types="client_credentials" authorities="ROLE_CLIENT"

                        scope="read" secret="secret" />

                <oauth:client client-id="my-less-trusted-client" authorized-grant-types="authorization_code,implicit"

                        authorities="ROLE_CLIENT" />

                <oauth:client client-id="my-less-trusted-autoapprove-client" authorized-grant-types="implicit"

                        authorities="ROLE_CLIENT" />

                <oauth:client client-id="my-client-with-registered-redirect" authorized-grant-types="authorization_code,client_credentials"

                        authorities="ROLE_CLIENT" redirect-uri="http://anywhere?key=value" scope="read,trust" />

                <oauth:client client-id="my-untrusted-client-with-registered-redirect" authorized-grant-types="authorization_code"

                        authorities="ROLE_CLIENT" redirect-uri="http://anywhere" scope="read" />

                <oauth:client client-id="tonr" resource-ids="sparklr" authorized-grant-types="authorization_code,implicit"

                        authorities="ROLE_CLIENT" scope="read,write" secret="secret" />

        </oauth:client-details-service>

As you can see in the XML chunk, “tonr” is configured as a client. When we retrieve the details of the client we get an instance of the ClientDetails object. ClientDetails source is shown next:

package org.springframework.security.oauth2.provider;

import java.io.Serializable;

import java.util.Collection;

import java.util.Map;

import java.util.Set;

import org.springframework.security.core.GrantedAuthority;

/**

 * Client details for OAuth 2

 *

 * @author Ryan Heaton

 */

public interface ClientDetails extends Serializable {

        /**

         * The client id.

         *

         * @return The client id.

         */

        String getClientId();

        /**

         * The resources that this client can access. Can be ignored by callers if empty.

         *

         * @return The resources of this client.

         */

        Set<String> getResourceIds();

        /**

         * Whether a secret is required to authenticate this client.

         *

         * @return Whether a secret is required to authenticate this client.

         */

        boolean isSecretRequired();

        /**

         * The client secret. Ignored if the {@link #isSecretRequired() secret isn't required}.

         *

         * @return The client secret.

         */

        String getClientSecret();

        /**

         * Whether this client is limited to a specific scope. If false, the scope of the authentication request will be

         * ignored.

         *

         * @return Whether this client is limited to a specific scope.

         */

        boolean isScoped();

        /**

         * The scope of this client. Empty if the client isn't scoped.

         *

         * @return The scope of this client.

         */

        Set<String> getScope();

        /**

         * The grant types for which this client is authorized.

         *

         * @return The grant types for which this client is authorized.

         */

        Set<String> getAuthorizedGrantTypes();

        /**

         * The pre-defined redirect URI for this client to use during the "authorization_code" access grant. See OAuth spec,

         * section 4.1.1.

         *

         * @return The pre-defined redirect URI for this client.

         */

        Set<String> getRegisteredRedirectUri();

        /**

         * Get the authorities that are granted to the OAuth client. Note that these are NOT the authorities that are

         * granted to the user with an authorized access token. Instead, these authorities are inherent to the client

         * itself.

         *

         * @return The authorities.

         */

        Collection<GrantedAuthority> getAuthorities();

        /**

         * The access token validity period for this client. Null if not set explicitly (implementations might use that fact

         * to provide a default value for instance).

         *

         * @return the access token validity period

         */

        Integer getAccessTokenValiditySeconds();

        /**

         * The refresh token validity period for this client. Zero or negative for default value set by token service.

         *

         * @return the refresh token validity period

         */

        Integer getRefreshTokenValiditySeconds();

        /**

         * Additional information for this client, not neeed by the vanilla OAuth protocol but might be useful, for example,

         * for storing descriptive information.

         *

         * @return a map of additional information

         */

        Map<String, Object> getAdditionalInformation();

}

After the AuthorizationRequest object is created, it is checked to see if it has been approved. As it is just created, it has not been yet approved and the AuthorizationEndpoint will forward the request to the URL forward:/oauth/confirm_access which is internally configured by default in the framework.

 The confirm_access URL needs to be handled in the application that we are using as authentication server, which is in this case is the Sparklr application. So in the Sparklr application the AccessConfirmationController is in charge of handling the request to the /oauth/confirm_access URL. It handles this URL by showing the page access_confirmation.jsp that contains the form for authorizing or denying access to the tonr client to the sparklr server

Now when you click on Authorize, the following happens:

A POST call is made to the URL /oauth/authorize in the Sparklr application. This call is handled by the class org.springframework.security.oauth2.provider.endpoint.AuthorizationEndpoint, in particular by the approveOrDeny method.

The approveOrDeny method will extract the AuthorizationRequest that is requesting access. The AuhtorizationRequest interface looks like:

package org.springframework.security.oauth2.provider;

import java.util.Collection;

import java.util.Map;

import java.util.Set;

import org.springframework.security.core.GrantedAuthority;

/**

 * Base class representing a request for authorization. There are convenience methods for the well-known properties

 * required by the OAUth2 spec, and a set of generic authorizationParameters to allow for extensions.

 *

 * @author Ryan Heaton

 * @author Dave Syer

 * @author Amanda Anganes

 */

public interface AuthorizationRequest {

        public static final String CLIENT_ID = "client_id";

        public static final String STATE = "state";

        public static final String SCOPE = "scope";

        public static final String REDIRECT_URI = "redirect_uri";

        public static final String RESPONSE_TYPE = "response_type";

        public static final String USER_OAUTH_APPROVAL = "user_oauth_approval";

        public Map<String, String> getAuthorizationParameters();

        

        public Map<String, String> getApprovalParameters();

        public String getClientId();

        public Set<String> getScope();

        public Set<String> getResourceIds();

        public Collection<GrantedAuthority> getAuthorities();

        public boolean isApproved();

        public boolean isDenied();

        public String getState();

        public String getRedirectUri();

        public Set<String> getResponseTypes();

}

The redirectUri is extracted and the approval process is evaluated again, this time approving the AuthorizationRequest.

The request is also evaluated to make sure it is requesting the authorization code (by calling the method getResponseTypes on the AuthorizationRequest) and not the token. The framework will generate a new authorization code as requested. Then the approveOrDeny method will finish its work by returning a new instance of RedirectView that will be handled as a redirection back to the tonr application. In particular to the URL http://localhost:8080/tonr2/sparklr/photos;jsessionid=03B2E814391E010B3D1210241ECF6C0A?code=vqMbuf&state=aTSlVl in my current example.

The redirect arrives back on Tonr. Now Tonr will process the URL again, but this time it will try to obtain the access token in order to continue with the processing. As before, Tonr will use the OAuth2RestTemplate in combination with the AuthorizationCodeAccessTokenProvider to try and retrieve the access token from Sparklr. In particular they will make a request to the URL /sparklr/oauth/token with the parameters {grant_type=’authorization_code’, redirect_uri=’http://localhost:8080/tonr2/sparklr/photos’, code=xxxx

That call to Sparklr will be handled by the class org.springframework.security.oauth2.provider.endpoint.TokenEndpoint. This endpoint will create a new instance of OAuth2AccessToken and generate a response with that Bearer token.

Again Tonr receives this response and now tries again to make the call to Sparklr this time it will pass the Bearer access token, received in the previous step, as part of the Authorization header in the request.

That is all Sparklr needs to verify the request. The OAuth2AuthenticationProcessingFilter will extract the token from the request and with the help of the Oauth2AuthenticationManager will validate and authenticate the token.

When the token is verified, the call will finally arrive at the PhotoController which is in charge of serving the photos. Later in the week I will add a couple of UML diagrams that will help understand the relationship between the major classes involved in the OAuth for Spring Security project.

To understand the internal details of how the Spring Security framework work on the core you can check out the book

Sunday, February 10, 2013

Duck typing in Scala (kind of). Structural Typing.

Scala offers a functionality known as Structural Types which allows to set a behaviour very similar to what dynamic languages allow to do when they support Duck Typing (http://en.wikipedia.org/wiki/Duck_typing)

The main difference is that it is a type safe, static typed implementation checked up at compile time. This means that you can create a function (or method) that receives an expected duck. But at compile time it would be checked that anything that is passed can actually quack like a Duck.

Here is an example. Let’s say we want to create a function that expects anything that can quack like a duck. This is how we would do it in Scala with Structural Typing:

def quacker(duck: {def quack(value: String): String}) {
  println (duck.quack("Quack"))
}


You can see that in the definition of the function we are not expecting a particular class or type. We are specifying an Structural Type which in this case means that we are expecting any type that has a method with the signature quack(string: String): String.

So all the following examples will work with that function:

object BigDuck {
  def quack(value: String) = {
    value.toUpperCase
  }
}

object SmallDuck {
  def quack(value: String) = {
    value.toLowerCase
  }
}

object IamNotReallyADuck {
  def quack(value: String) = {
    "prrrrrp"
  }
}

quacker(BigDuck)
quacker(SmallDuck)
quacker(IamNotReallyADuck)


You can see that there is no interface or anything being implemented by any of the three objects we have defined. They simply have to define the method quack in order to work in our function.

If you run the code above you get the output:

QUACK
quack
prrrrrp


If on the other hand you try to create an object without a quack method and try to call the function with that object you would get a compile error. For example trying to do this:

object NoQuaker {

}

quacker(NoQuaker)


You would get the error:

error: type mismatch; found : this.NoQuaker.type required: AnyRef{def quack(value: String): String} quacker(NoQuaker)

Also, you don’t even need to create a new type or class. You could use AnyRef to create an object with the quack method. Like this:

val x = new AnyRef {
  def quack(value: String) = {
    "No type needed "+ value
  }
}


and you can use that object to call the function:

quacker(x)

You can also specify in the function that expects the structural type, the the parameter object must respond to more than one method. Like this:

def quacker(duck: {def quack(value: String): String; def walk(): String}) {
  println (duck.quack("Quack"))
}


There you are saying that any object you pass to the function needs to respond to both methods quack and walk. This is also checked at compile time.

Under the covers the use of Structural Types in this way will be handled by reflection. This means that it is a more expensive operation than the standard method call. So use only when it actually makes sense to use it.

Tuesday, February 5, 2013

Understanding Currying with Scala

Learning Scala is not an easy task. There are many things to learn. Some of them are very different and some of them seem complex.

Currying is a technique not necessarily complex, but is one that you probably are not familiar with if you come from Java background as it is not a straightforward technique to apply in that language. Currying allows to turn a function that expects two arguments into a function that expects only one, and that function returns a function that expects the second argument. Creating basically a chain of functions.

Let’s see with a simplistic example how it works.

Let’s say we have this two simple classes:

case class Message(value: String){

}

case class Endpoint(prompt: String){
 def send(m: Message) {
   println(this.prompt + " " + m.value)
 }
}
 
Now let’s pretend we want to send a Message to an Endpoint in a general function. We can create a function that looks like this:

def routeTo(m:Message, e:Endpoint) = {
  e.send(m)
}
To call this function we would do something like:

routeTo(Message("hola"),Endpoint("sending"))
(You could ask why we just don’t create the endpoint and call send with the message like Endpoint(“sending”).send(Message(“hola”)) and there is no reason in this example. Simply to illustrate currying).

Now let’s suppose we want to send the same message to different endpoints. For that we could use the currying technique. We will basically turn our function into a function that returns another function:

def route(m:Message) = {
  (e: Endpoint) => e.send(m)
}
 
You can see we have created a function that just receives the Message and returns a function that receives the Endpoint and sends the message to that endpoint.

Now we could do something like the following:

val routeCiao = route(Message("ciao"))

routeCiao(Endpoint("sending again"))
routeCiao(Endpoint("sending over again"))
You can see that we are assigning the return of route(Message(“ciao”)), which is a function (Endpoint => Unit) to the val routeCiao. In practice this is basically returning the function (e: Endpoint) => e.send(“ciao”) as the value of the variable m is replaced by “ciao” when calling to the route function.

Then we are simply calling the returned function passing the value of the endpoint as the only argument. So this function is executed and the message is sent to the endpoint.

You could of course call the currying function without storing the mid result in a val. Like this:

route(Message("hi"))(Endpoint("sender"))
Scala supports a more compact way to defining currying functions. We could replace the route function with the following:

def route(m:Message)(e:Endpoint) = {
   e.send(m)
}
The effect is the same if you call like this: route(Message("hi"))(Endpoint("sender")) However to call and store the mid result in a val you need to do it like this:

val routeCiao = route(Message("ciao")) _


Note the underscore at the end of the assignment.

On programmers and programming languages. Your opinion?

This is a small article in which I want to talk about a couple of issues that I sometimes talk about with my colleagues at work, but most of the time keep to myself. These are two subjects that I would like to know the general software community’s opinion about.

The first thing is regarding the role of software developer and using different programming languages in your work.

The second is regarding what makes a good programmer.


Let’s start with the first one:

I consider myself a software developer, more than a Java developer, Ruby developer or any other language specific developer. I think this is a very important thing in this field of work. To be able to produce results in any language that you work with, without having prejudices against them.

I am not saying here that you need to like all programming languages, but what I am saying is that you need to consider that most programming languages have good characteristics and advantages over other languages, no matter if those other languages are your favorite ones.

I am not saying either that you should be an expert on every language out there, but I do think that you should try to be good in at least two languages that are different enough between them to involve certain change of thinking going from one to the other.

It is also fine to be an expert in a particular language, specializing in it. But this doesn’t mean disregard the other ones as useless. I’m talking about this for two main reasons. First because in my current job and in my current position we were very recently looking for Java and Ruby developers. Not Java developers and Ruby developers. But a Java/Ruby developer to work in both languages at the same time. The other reason is that I usually write in this blog about different things in technology and mainly about Ruby and Java as they are the languages I know better.

In both scenarios (interviews and blog writing) the majority of the people that works in one of this languages disregard the other one as complete horrible. The Ruby people say that they will never work with Java as it is cumbersome, full of boilerplate, full of XML, horrible syntax, too many frameworks to care about, boring typing, and a big etc. The Java people say that Ruby is a toy language, some people say working with it is not even programming, they say is just too slow to do anything useful, that everything is a mess without types, that metaprogramming is magic that you just don’t know what is happening and another big list of etc.

So it is really hard to find developers that work and enjoy working with this two different languages (and probably between any two different languages). Sometimes it feels like they don’t even want to give the opportunity to the other language to prove its worthiness and is simply discarded from the beginning. Sometimes I really think it might be a case of laziness to actually try and learn something new and different when you are in the comfort of what you already know. I, on the other hand find this learning really interesting and definitely worth my time as I think it makes me better as an overall developer and help me understand the goods and bads of every language, because I can tell you, both Java and Ruby are very good languages in their own way.


The second point I wanted to talk about is regarding on what makes a good programmer.

What brings me to this particular subject is the way I see myself and the way I see all developers I work with or the ones I interview or even the ones whose code I look on different open source projects.

I can’t really answer what a good programmer is as I think it can be different things to different people. As an example, I try to use good programming practices a lot in my day to day work, I try to use and enforce the use and benefits of TDD on my colleagues. I try to enforce the use of simple principles like DRY, SRP, and also the readability and conciseness of the code that I and my colleagues write. I also think that I find good solutions to the day to day tasks that I have, and I try to think of the best possible ways to solve the problems at hand. So in that sense, I consider myself a good programmer.

However, from a different perspective I would say I am not really good at all. This perspective comes into place in let’s say, for example, an environment where I need to develop really super efficient algorithms using mathematical efficient formulas to solve problems. It comes to mind maybe things like 3D gaming, Artificial intelligence and the like.

As an example, I was recently solving a couple of problems in TopCoder. One of the problems took me like 1 hour to solve I would say. It was an implementation that if I don’t remember wrong had a couple of for loops (one nested inside the other) and 3 or 4 if statements. I remember it also had a HashMap structure for storing some middle results of the final computation. I was happy that I solved, but I knew it didn’t look great (not the code itself, which was OK, but the solution as a whole), so I went on to see other people’s solutions. One of the solutions I found was 2 lines of code with a couple of mathematical formulas that really didn’t make sense to me at that point, and that I had no idea how that small thing could do all that my 30 line algorithm did. So from that point of view, I would say that the other guy is definitely a better programmer than me.

So sometimes when I am interviewing people at my job, asking my questions (What is Dependency Injection, What is AOP useful for, Explain how you use Strategy pattern, etc etc...) if they don’t have any idea of this I always wonder if they might be amazing at this other programming skills that I maybe don’t have. But I guess I can’t really know, as we are looking for a particular profile, and if it doesn’t have the skills we require, we say he is not a good programmer and he doesn’t get hired.
That happened very recently, some colleague was interviewing this Computer Science PhD who didn’t answer any of this questions, and so he was rejected as he was no good at all. But I ended up wondering. If he’s got a PhD he needs to be good right?. Maybe he is just good at different things and not at what we are looking for.

Friday, September 14, 2012

JRuby transparently running methods asynchronously. Combining Ruby metaprogramming techiniques and Java concurrent Future

JRuby is great, it offers an opportunity to combine my two favorite languages and here is another great way of combining Java power with Ruby beauty and convenience.

In this case I created a small gem in a couple of hours (still not really well tested, just some simple unit tests) that allows to use some of the nice metaprogramming techniques from Ruby to transparently execute methods asynchronously, wrapping their return value in a java.util.concurrent.Future, so that when we access any method of the returned object, the future's get method will be called to make sure we have access to the value only when we really need it.

What follows is the source code of the main file in the Gem. Where all the relevant logic is:

require 'java'
java_import 'java.util.concurrent.ExecutorService'
java_import 'java.util.concurrent.Executors'
java_import 'java.util.concurrent.Future'
java_import 'java.util.concurrent.TimeUnit'
java_import 'java.util.concurrent.Callable'

module Futurizeit
  module ClassMethods
    def futurize(*methods)
      Futurizeit.futurize(self, *methods)
    end
  end

  def self.included(klass)
    klass.extend(ClassMethods)
  end

  def self.executor
    @executor ||= Executors.newFixedThreadPool(10)
  end

  def self.futurize(klass, *methods)
    klass.class_eval do
      methods.each do |method|
        alias :"non_futurized_#{method}" :"#{method}"
        define_method :"#{method}" do |*args|
          @future = Futurizeit.executor.submit(CallableRuby.new { self.send(:"non_futurized_#{method}", *args) })
          Futuwrapper.new(@future)
        end
      end
    end
  end
end

module Futurizeit
  class Futuwrapper < BasicObject
    def initialize(future)
      @future = future
    end

    def method_missing(method, *params)
      instance = @future.get
      instance.send(method, *params)
    end
  end

  class CallableRuby
    include Callable

    def initialize(&block)
      @block = block
    end

    def call
      @block.call
    end
  end
end
The functionality can be used in two ways, including the module in a class and calling the macro method futurize on the class, or from the outside calling the Futurizeit.futurize method directly passing a class and the instance methods of that class that we want to run asynchronously.

The way it works is straightforward:

First it creates an alias to the original instance method called "non_futurized_xxx" where xxx is the name of the original method. Then it defines a new method with the original name. This method will create a CallableRuby object which implements (include the module) the Java Callable interface.

This CallableRuby instance is then submitted to a preconfigured ExecutorService. The ExecutorService will create a Future internally and return it inmediately. We the wrap this Future in a Futurewrapper instance.

The Futurewrapper is the object that will be returned by the method. When we try to access any method on this wrapper, it will internally call the future's get method which in turn will return the actual instance that the original method would have returned without the futurizing feature.

Following is the RSpec test that tests the current functionality:

require '../lib/futurizeit'

class Futurized
  def do_something_long
    sleep 3
    "Done!"
  end
end

class FuturizedWithModuleIncluded
  include Futurizeit
  def do_something_long
    sleep 3
    "Done!"
  end
  futurize :do_something_long
end


describe "Futurizer" do
  before(:all) do
    Futurizeit::futurize(Futurized, :do_something_long)
  end

  it "should wrap methods in futures and return correct values" do
    object = Futurized.new
    start_time = Time.now.sec
    value = object.do_something_long
    end_time = Time.now.sec
    (end_time - start_time).should < 2
    value.to_s.should == 'Done!'
  end

  it "should allow calling the value twice" do
     object = Futurized.new
     value = object.do_something_long
     value.to_s.should == 'Done!'
     value.to_s.should == 'Done!'
   end

  it "should increase performance a lot parallelizing work" do
    object1 = Futurized.new
    object2 = Futurized.new
    object3 = Futurized.new
    start_time = Time.now.sec
    value1 = object1.do_something_long
    value2 = object2.do_something_long
    value3 = object3.do_something_long
    value1.to_s.should == 'Done!'
    value2.to_s.should == 'Done!'
    value3.to_s.should == 'Done!'
    end_time = Time.now.sec
    (end_time - start_time).should < 4
  end

  it "should work with class including module" do
      object = FuturizedWithModuleIncluded.new
      start_time = Time.now.sec
      value = object.do_something_long
      end_time = Time.now.sec
      (end_time - start_time).should < 2
      value.to_s.should == 'Done!'
    end

  after(:all) do
    Futurizeit.executor.shutdown
  end
end


All the code is in https://github.com/calo81/futurizeit

Friday, September 7, 2012

Setting up a Hadoop virtual cluster with Vagrant

Usually for testing and using virtual machines, I go online, download the iso image of the machine I want to install, start Virtual Box, tell it to init from the iso, and install the OS manually, and then install the applications I want to use. It is a boring and tedious process but I never really cared very much about However recently I discovered the power of Vagrant and also Puppet. They allow me to automate all the steps I used to manually make before.

Here I test drive the process of automatically configuring a Hadoop cluster in virtual machines for a fully distributed mode.

First of all make sure you have Ruby installed. I’m testing with Ruby 1.9.3. You should also have Virtual Box installed. I have version 4.1.

Then from the command line install the vagrant gem:

gem install vagrant

Vagrant is a great tool that allow us to manage our Virtual Box machines using the command line and simple configuration files.

First we will install a linux Ubuntu virtual machine (or a box as it is called in vagrant)

vagrant box add base-hadoop http://files.vagrantup.com/lucid64.box

Then we go to a directory where we want to have our “workspace” and also the directory to create the vagrant configuration file for our new box and execute. This will create a Vagrantfile file with the vagrant configuration.

vagrant init base-hadoop

The virtual machine is ready to be started up now. You can start it by doing:

vagrant up

That is the virtual machine running. You can connect to it with ssh. type

vagrant ssh

Next step is to download Puppet. Do that going to the URL http://puppetlabs.com/misc/download-options/

Puppet is a tool that allow us to automate the process of provisioning servers. We will use it to manage our virtual machines, installing the required software on them and executing the required services.

So we create a directory where we are going to put our manifests (puppet configuration files)

mkdir manifests

in that new directory we create a file called base-hadoop.pp with the following content:

group { "puppet":
  ensure => "present",
}
 
In the Vagrantfile file that got created previously we uncomment the lines that look like:

config.vm.provision :puppet do |puppet|
     puppet.manifests_path = "manifests"
     puppet.manifest_file  = "base-hadoop.pp"
  end


The next thing we need to do is tell puppet to install Java in our servers. for that we open the base-hadoop.pp file and add the following:

exec { 'apt-get update':
    command => 'apt-get update',
}

package { "openjdk-6-jdk" :
   ensure => present
  require => Exec['apt-get update']
}


Next thing we need to install hadoop. For this we will create a new puppet module. A puppet module is used to encapsulate resources that belong to the same component.

We execute

mkdir -p modules/hadoop/manifests

Then we create an init.pp in this new manifests directory with the following content:

class hadoop {
 $hadoop_home = "/opt/hadoop"

exec { "download_hadoop":
command => "wget -O /tmp/hadoop.tar.gz http://apache.mirrors.timporter.net/hadoop/common/hadoop-1.0.3/hadoop-1.0.3.tar.gz",
path => $path,
unless => "ls /opt | grep hadoop-1.0.3",
require => Package["openjdk-6-jdk"]
}

exec { "unpack_hadoop" :
  command => "tar -zxf /tmp/hadoop.tar.gz -C /opt",
  path => $path,
  creates => "${hadoop_home}-1.0.3",
  require => Exec["download_hadoop"]
}
}


We have done a few things here, and they are almost self-explanatory. We are basically setting a variable to point to our hadoop installation. We are downloading Hadoop’s binaries from its Apache location and we are extracting it into the specified hadoop_home directory.

We need to add our new module to the main puppet configuration file. We add the following line at the top of the base-hadoop.pp file:

include hadoop

Then we add this new modules path to our Vagrantfile. So now our puppet section looks like:

config.vm.provision :puppet do |puppet|
     puppet.manifests_path = "manifests"
     puppet.manifest_file  = "base-hadoop.pp"
     puppet.module_path = "modules"
  end


We execute the following to reload the vagrant machine:

vagrant reload

That command will reload the vagrant machine and execute the puppet recipes. That will install the required software needed.

We will need a cluster of virtual machines. Vagrant supports that. we open our Vagrantfile and replace the content with the following:

Vagrant::Config.run do |config|
  config.vm.box = "base-hadoop"
  config.vm.provision :puppet do |puppet|
     puppet.manifests_path = "manifests"
     puppet.manifest_file  = "base-hadoop.pp"
     puppet.module_path = "modules"
  end
 
  config.vm.define :master do |master_config|
    master_config.vm.network :hostonly, "192.168.1.10"
  end

  config.vm.define :backup do |backup_config|
    backup_config.vm.network :hostonly, "192.168.1.11"
  end
 
  config.vm.define :hadoop1 do |hadoop1_config|
    hadoop1_config.vm.network :hostonly, "192.168.1.12"
  end
 
  config.vm.define :hadoop2 do |hadoop2_config|
    hadoop2_config.vm.network :hostonly, "192.168.1.13"
  end
 
  config.vm.define :hadoop3 do |hadoop3_config|
    hadoop3_config.vm.network :hostonly, "192.168.1.14"
  end
end


After this we execute:

vagrant up

That will start and provision all the servers. That will take a while

But we are not ready. Next we need to configure the hadoop cluster. In the directory modules/hadoop we create another directory called files. Here we will create the needed configuration files for our hadoop cluster.

we create the following files:

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <configuration>
  <property>
   <name>fs.default.name</name>
   <value>hdfs://master:9000</value>
   <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.</description>
  </property>
 </configuration>


hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>The actual number of replications can be specified when the file is created.</description>
 </property>
</configuration>
 


mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
  <description>The host and port that the MapReduce job tracker runs at.</description>
 </property>
</configuration>
 


masters

192.168.1.11

slaves

192.168.1.12 192.168.1.13 192.168.1.14

We then need to tell puppet to copy these files to our cluster. So we modify our init.pp file in the hadoop puppet module to contain the following:

class hadoop {
 $hadoop_home = "/opt/hadoop"

exec { "download_hadoop":
command => "wget -O /tmp/hadoop.tar.gz http://apache.mirrors.timporter.net/hadoop/common/hadoop-1.0.3/hadoop-1.0.3.tar.gz",
path => $path,
unless => "ls /opt | grep hadoop-1.0.3",
require => Package["openjdk-6-jdk"]
}

exec { "unpack_hadoop" :
  command => "tar -zxf /tmp/hadoop.tar.gz -C /opt",
  path => $path,
  creates => "${hadoop_home}-1.0.3",
  require => Exec["download_hadoop"]
}
file {
  "${hadoop_home}-1.0.3/conf/slaves":
  source => "puppet:///modules/hadoop/slaves",
  mode => 644,
  owner => root,
  group => root,
  require => Exec["unpack_hadoop"]
 }
 
file {
  "${hadoop_home}-1.0.3/conf/masters":
  source => "puppet:///modules/hadoop/masters",
  mode => 644,
  owner => root,
  group => root,
  require => Exec["unpack_hadoop"]
 }

file {
  "${hadoop_home}-1.0.3/conf/core-site.xml":
  source => "puppet:///modules/hadoop/core-site.xml",
  mode => 644,
  owner => root,
  group => root,
  require => Exec["unpack_hadoop"]
 }
 
file {
  "${hadoop_home}-1.0.3/conf/mapred-site.xml":
  source => "puppet:///modules/hadoop/mapred-site.xml",
  mode => 644,
  owner => root,
  group => root,
  require => Exec["unpack_hadoop"]
 }
 
 file {
  "${hadoop_home}-1.0.3/conf/hdfs-site.xml":
  source => "puppet:///modules/hadoop/hdfs-site.xml",
  mode => 644,
  owner => root,
  group => root,
  require => Exec["unpack_hadoop"]
 }
}
 


We then execute:

vagrant provision

And we get these files copied to all our servers.

We need to setup ssh password-less communication between our servers. We modify our hadoop-base.pp and leave like this:

file {
  "/root/.ssh/id_rsa":
  source => "puppet:///modules/hadoop/id_rsa",
  mode => 600,
  owner => root,
  group => root,
  require => Exec['apt-get update']
 }
 
file {
  "/root/.ssh/id_rsa.pub":
  source => "puppet:///modules/hadoop/id_rsa.pub",
  mode => 644,
  owner => root,
  group => root,
  require => Exec['apt-get update']
 }

ssh_authorized_key { "ssh_key":
    ensure => "present",
    key    => "AAAAB3NzaC1yc2EAAAADAQABAAABAQCeHdBPVGuSPVOO+n94j/Y5f8VKGIAzjaDe30hu9BPetA+CGFpszw4nDkhyRtW5J9zhGKuzmcCqITTuM6BGpHax9ZKP7lRRjG8Lh380sCGA/691EjSVmR8krLvGZIQxeyHKpDBLEmcpJBB5yoSyuFpK+4RhmJLf7ImZA7mtxhgdPGhe6crUYRbLukNgv61utB/hbre9tgNX2giEurBsj9CI5yhPPNgq6iP8ZBOyCXgUNf37bAe7AjQUMV5G6JMZ1clEeNPN+Uy5Yrfojrx3wHfG40NuxuMrFIQo5qCYa3q9/SVOxsJILWt+hZ2bbxdGcQOd9AXYFNNowPayY0BdAkSr",
    type   => "ssh-rsa",
    user   => "root",
    require => File['/root/.ssh/id_rsa.pub']
}
 


We are ready to run our hadoop cluster now. For that, once again we modify the init.pp file in the hadoop puppet module, we add the following at the end, before closing the hadoop class:

 file {
  "${hadoop_home}-1.0.3/conf/hadoop-env.sh":
  source => "puppet:///modules/hadoop/hadoop-env.sh",
  mode => 644,
  owner => root,
  group => root,
  require => Exec["unpack_hadoop"]
 }
 


The haddop-env.sh file is the original one but we have uncommented the JAVA_HOME setting and pointed it to the correct Java installation.

We can give different names to each host in the Vagrantfile. For that we replace its contents with the following:


Vagrant::Config.run do |config|
  config.vm.box = "base-hadoop"
  config.vm.provision :puppet do |puppet|
     puppet.manifests_path = "manifests"
     puppet.manifest_file  = "base-hadoop.pp"
     puppet.module_path = "modules"
  end
 
  config.vm.define :backup do |backup_config|
    backup_config.vm.network :hostonly, "192.168.1.11"
    backup_config.vm.host_name = "backup"
  end
 
  config.vm.define :hadoop1 do |hadoop1_config|
    hadoop1_config.vm.network :hostonly, "192.168.1.12"
    hadoop1_config.vm.host_name = "hadoop1"
  end
 
  config.vm.define :hadoop2 do |hadoop2_config|
    hadoop2_config.vm.network :hostonly, "192.168.1.13"
    hadoop2_config.vm.host_name = "hadoop2"
  end
 
  config.vm.define :hadoop3 do |hadoop3_config|
    hadoop3_config.vm.network :hostonly, "192.168.1.14"
    hadoop3_config.vm.host_name = "hadoop3"
  end

  config.vm.define :master do |master_config|
    master_config.vm.network :hostonly, "192.168.1.10"
    master_config.vm.host_name = "master"
  end

end


Let’s do “vagrant reload” and wait for all systems to reload.

We have provisioned ur systems. Let’s go to our master node and start everything:

vagrant ssh master

then when we are logged in we go to /opt/hadoop-1.0.3/bin

and do:

sudo ./hadoop namenode -format

sudo ./start-all.sh

We have started now our hadoop cluster. Now we can visit http://192.168.1.10:50070/ to access our master node and see that our hadoop cluster is indeed running.

All the files for this example (except for the box itself) exist in git@github.com:calo81/vagrant-hadoop-cluster.git for free use.