Saturday, February 18, 2012

Basic searching in Ruby with Solr

Solr is a server application built on top of the Apache Lucene search engine. It offers an HTTP interface for storing and querying data.

Internally, Solr (and Lucene, the engine that powers it) works roughly by indexing Documents for later searching and retrieval. A Document is described by a collection of Fields, and each of these fields can be individually indexed and/or stored in the index.

The index can be built in different ways, and the way it is built is mainly determined by the analyzer used for each field: an analyzer simply determines how a particular field will be indexed.
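To make the role of an analyzer more concrete, here is a minimal sketch in plain Ruby. It is not how Solr or Lucene implement analysis; the method names analyze and build_index are made up for this illustration, and the "analyzer" only lowercases the text and splits it into words.

```ruby
# Toy analyzer (an illustration, not Solr's): lowercases the text
# and splits it into alphanumeric word tokens.
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Toy inverted index: maps each token to the ids of the documents
# whose text contains that token.
def build_index(docs)
  index = {}
  docs.each do |id, text|
    analyze(text).uniq.each do |token|
      (index[token] ||= []) << id
    end
  end
  index
end

docs = { '1' => 'Die Hard 3', '2' => 'Lethal Weapon' }
index = build_index(docs)
p analyze('Die Hard 3')   # the tokens the toy analyzer produces
p index['hard']           # ids of the documents containing "hard"
```

A different analyzer (for example one that does not lowercase) would produce different tokens, and therefore a different index for the same documents.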

Of course there is a lot of complexity involved in all this, but this is a basic tutorial, and a basic but functional search solution can be built using defaults for most options.

This tutorial will build a search of movies by title and/or actor using Ruby and Solr. I will assume you already have Ruby and the gem tool installed.

1. Download and install Solr:
 wget http://apache.mirrors.timporter.net/lucene/solr/3.5.0/apache-solr-3.5.0-src.tgz

2. Decompress it:
 tar zxvf apache-solr-3.5.0-src.tgz

3. Modify the index to accept the kind of documents we want (movies).
 
In our example we will be able to query movies by title and actors. The index will also store a summary of the movie, although it won’t be searchable by it. So we will have three Fields in our Document representing the movie, plus an id Field that identifies it. To reflect this go to the directory:

cd apache-solr-3.5.0/solr/example/solr/conf/

Then open the file schema.xml with your favorite editor, go down to the field definitions (the <fields> section) and replace the ones that are there with the following:


<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="actor" type="text_general" indexed="true" stored="true" multiValued="true"/>
  <field name="summary" type="text_general" indexed="false" stored="true"/>
</fields>



Here we are specifying that our movie Documents will have these four fields. We can see that the type we are using for all of them except "id" is "text_general". Going up in the schema.xml file we can find a description of what being "text_general" means.




So this is a default analyzer, provided out of the box, that will be good enough for our purposes (and for many purposes).

The other two things worth mentioning in our field definitions are that the "actor" field is multivalued, meaning that we can associate more than one actor with the field, and that the "summary" field is stored but not indexed. This means that the content of the field will be stored (so it can be retrieved when documents are retrieved) but it is not indexed (we can't search on this field).
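The difference between indexed and stored, and what a multivalued field looks like, can be sketched with a toy in-memory model. This is plain Ruby and not Solr; the ToyIndex class, the TOY_SCHEMA constant and the toy_demo method are hypothetical names used only for this illustration.

```ruby
# Toy in-memory model (NOT Solr): each field is configured as
# indexed (searchable) and/or stored (returned with the document).
class ToyIndex
  def initialize(schema)
    @schema = schema   # field name => { :indexed => bool, :stored => bool }
    @postings = {}     # [field, token] => ids of matching documents
    @stored = {}       # document id => stored field values
  end

  def add(doc)
    id = doc[:id]
    @stored[id] = { :id => id }
    @schema.each do |field, opts|
      next unless doc.key?(field)
      @stored[id][field] = doc[field] if opts[:stored]
      next unless opts[:indexed]
      # Array() lets single-valued and multivalued fields share one code path.
      Array(doc[field]).each do |value|
        value.downcase.scan(/[a-z0-9]+/).each do |token|
          key = [field, token]
          @postings[key] = (@postings[key] || []) | [id]
        end
      end
    end
  end

  # Returns the stored fields of every document matching the word in the field.
  def search(field, word)
    ids = @postings.fetch([field, word.downcase], [])
    ids.map { |id| @stored[id] }
  end
end

TOY_SCHEMA = {
  :title   => { :indexed => true,  :stored => true },
  :actor   => { :indexed => true,  :stored => true },
  :summary => { :indexed => false, :stored => true }
}

def toy_demo
  index = ToyIndex.new(TOY_SCHEMA)
  index.add({ :id => '1', :title => 'Die Hard 3',
              :actor => ['Bruce Willis', 'Samuel Jackson'],
              :summary => 'Great movie' })
  index
end

p toy_demo.search(:actor, 'Jackson')   # found; the summary comes back with it
p toy_demo.search(:summary, 'great')   # nothing: summary is stored, not indexed
```

Searching on either actor finds the movie and returns its summary, while searching on the summary itself finds nothing, which mirrors the behaviour described for the schema above.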

Ok, so this is all the configuration we need in Solr. Let's start the server now. From the directory apache-solr-3.5.0/example, execute:

java -jar start.jar

That will start the server, which listens on port 8983 by default.

Ok, so let's move to the Ruby side now. We will create a little program that will index a couple of movies, and then search to find them. First install the needed gem:

gem install rsolr

Then let's create a Movie class in a file named "moviesearcher.rb":


class Movie
  attr_accessor :id, :title, :actors, :summary

  def initialize
    @actors = []
  end
end


And now let’s create the indexer and searcher classes in the same file:

Indexer:

require 'rsolr'

class Indexer
  def initialize
    @solr = RSolr.connect :url => 'http://localhost:8983/solr/collection1/'
  end

  # Adds every movie to the index, then commits so the documents become searchable.
  def index(movies)
    movies.each do |movie|
      @solr.add :id => movie.id.to_s, :title => movie.title, :actor => movie.actors
    end
    @solr.update :data => '<commit/>'
  end
end
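Note that the Indexer sends only the id, title and actor fields, so the summary never reaches Solr. If you also want the summary stored, one option is to build the document hash in a small helper. The sketch below is only an illustration: movie_to_doc and sample_movie are hypothetical names, and the Movie class is repeated just to keep the example self-contained.

```ruby
# Same Movie class as in the tutorial, repeated so this sketch runs on its own.
class Movie
  attr_accessor :id, :title, :actors, :summary

  def initialize
    @actors = []
  end
end

# Hypothetical helper: turns a Movie into the hash that would be sent to Solr.
# The summary is included only when the movie actually has one.
def movie_to_doc(movie)
  doc = { :id => movie.id.to_s, :title => movie.title, :actor => movie.actors }
  doc[:summary] = movie.summary unless movie.summary.nil?
  doc
end

# Builds a sample movie for the demonstration below.
def sample_movie
  movie = Movie.new
  movie.id = 1
  movie.title = 'Die Hard 3'
  movie.actors << 'Bruce Willis'
  movie.summary = 'Great movie'
  movie
end

p movie_to_doc(sample_movie)
```

Inside Indexer#index, the inline hash could then be replaced with something like @solr.add movie_to_doc(movie). That is a design choice rather than something the tutorial requires.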


Searcher:


class Searcher
  def initialize
    @solr = RSolr.connect :url => 'http://localhost:8983/solr/collection1/'
  end

  # Prefix search on title and actor. The term is downcased because
  # the indexed text is lowercased.
  def search(term)
    term = term.downcase
    # Boolean operators must be upper case (OR) in the Lucene query syntax.
    response = @solr.get 'select', :params => {:q => "title:#{term}* OR actor:#{term}*"}
    response["response"]["docs"]
  end
end
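The Searcher interpolates the term directly into the query string, so a term containing characters that are special in the Lucene query syntax (such as : or *) would change the meaning of the query. A minimal sketch of building the same prefix query with escaping follows. The names LUCENE_SPECIAL, escape_term and build_query are made up for this example, and the escaping is hand-written rather than taken from rsolr.

```ruby
# Characters that have a special meaning in the Lucene query syntax.
LUCENE_SPECIAL = /[+\-&|!(){}\[\]^"~*?:\\\/]/

# Prefixes every special character with a backslash.
def escape_term(term)
  term.gsub(LUCENE_SPECIAL) { |char| "\\#{char}" }
end

# Builds the prefix query used by the Searcher from a downcased, escaped term.
def build_query(term)
  escaped = escape_term(term.downcase)
  "title:#{escaped}* OR actor:#{escaped}*"
end

puts build_query('Die')
puts build_query('die:hard')
```

The string returned by build_query could be passed as the :q parameter in Searcher#search. Terms containing spaces are not handled by this sketch.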


That’s it.

Let’s test it in irb:




1.9.2-p290 :001 > require './moviesearcher'
=> true
1.9.2-p290 :013 >   movie_1 = Movie.new
=> #<Movie:0x... @actors=[]>
1.9.2-p290 :014 > movie_1.actors << 'Bruce Willis'
=> ["Bruce Willis"]
1.9.2-p290 :015 > movie_1.actors << "Samuel Jackson"
=> ["Bruce Willis", "Samuel Jackson"]
1.9.2-p290 :016 > movie_1.id = '1'
=> "1"
1.9.2-p290 :017 > movie_1.title='Die Hard 3'
=> "Die Hard 3"
1.9.2-p290 :018 > movie_2 = Movie.new
=> #<Movie:0x... @actors=[]>
1.9.2-p290 :019 > movie_2.actors << 'Mel Gibson'
=> ["Mel Gibson"]
1.9.2-p290 :020 > movie_2.actors << 'Danny Glover'
=> ["Mel Gibson", "Danny Glover"]
1.9.2-p290 :021 > movie_2.id = '2'
=> "2"
1.9.2-p290 :022 > movie_2.title = 'Lethal Weapon'
=> "Lethal Weapon"
1.9.2-p290 :041 >   movie_1.summary = "Great movie"
=> "Great movie"
1.9.2-p290 :042 > movie_2.summary = 'Another great movie'
=> "Another great movie"

Indexing

1.9.2-p290 :061 > idxr=Indexer.new
1.9.2-p290 :080 >   idxr.index [movie_1,movie_2]
=> {"responseHeader"=>{"status"=>0, "QTime"=>50}}

Searching

1.9.2-p290 :085 >   searcher = Searcher.new
1.9.2-p290 :086 > searcher.search 'Die'
=> [{"id"=>"1", "title"=>"Die Hard 3", "actor"=>["Bruce Willis", "Samuel Jackson"]}]
1.9.2-p290 :090 >   searcher.search 'Bru'
=> [{"id"=>"1", "title"=>"Die Hard 3", "actor"=>["Bruce Willis", "Samuel Jackson"]}]

1.9.2-p290 :091 > searcher.search 'Glo'
=> [{"id"=>"2", "title"=>"Lethal Weapon", "actor"=>["Mel Gibson", "Danny Glover"]}]





Sunday, February 5, 2012

Private Keys, Public Keys and Certificates

This is a quick tutorial that will cover how to:

- Generate a private key

- Generate a .crt certificate with that private key

- Extract the public key from the certificate

- Sign a file with the private key and verify the signature with the public key

- Import the private key and certificate into a Java keystore


1. Generate a private key


openssl genrsa -out private.key 1024


2. Generate certificate


openssl req -new -x509 -days 365 -key private.key -out certificate.crt


That certificate is a self-signed certificate, good enough to distribute around for testing.


3. Extract public key from certificate


openssl x509 -in certificate.crt -pubkey -noout > public.key


The -noout option keeps the certificate itself out of the output, so public.key contains just the public key. Without it, the file would contain both the public key and the certificate, and you would need to edit it to remove the certificate part.
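For readers who prefer to stay in Ruby, the first three steps can also be sketched with the openssl library from Ruby's standard library. This is an illustration under assumptions, not a replacement for the commands above: the method names, the common name example.test, the 2048-bit key size and the SHA256 digest are all choices made for this example.

```ruby
require 'openssl'

# Step 1 equivalent: generate an RSA private key.
# 2048 bits is an assumption of this sketch (the command above uses 1024).
def generate_key(bits = 2048)
  OpenSSL::PKey::RSA.new(bits)
end

# Step 2 equivalent: build a self-signed certificate valid for `days` days.
def self_signed_cert(key, common_name = 'example.test', days = 365)
  name = OpenSSL::X509::Name.parse("/CN=#{common_name}")
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2                  # the value 2 means X.509 v3
  cert.serial = 1
  cert.subject = name
  cert.issuer = name                # self-signed: issuer equals subject
  cert.public_key = key.public_key
  cert.not_before = Time.now
  cert.not_after = cert.not_before + days * 24 * 60 * 60
  cert.sign(key, OpenSSL::Digest.new('SHA256'))
  cert
end

# Step 3 equivalent: extract the public key from the certificate as PEM text.
def public_key_pem(cert)
  cert.public_key.to_pem
end

key = generate_key
cert = self_signed_cert(key)
puts public_key_pem(cert)
```

The PEM text printed at the end plays the same role as the contents of the public.key file.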


4. Sign a file with the private key

openssl dgst -sha1 -sign private.key -out file_to_sign.sha1 file_to_sign


5. Verify the signature with the public key


openssl dgst -sha1 -verify public.key -signature file_to_sign.sha1 file_to_sign
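The sign and verify round trip of steps 4 and 5 can also be sketched in Ruby with the standard openssl library. The method names sign_data and valid_signature? are hypothetical, the key is generated in memory instead of being read from private.key, and SHA256 is used instead of SHA1 as an assumption of this sketch.

```ruby
require 'openssl'

# Signs the data with the private key, as in step 4 (SHA256 instead of SHA1).
def sign_data(private_key, data)
  private_key.sign(OpenSSL::Digest.new('SHA256'), data)
end

# Checks the signature against the data with the public key, as in step 5.
def valid_signature?(public_key, signature, data)
  public_key.verify(OpenSSL::Digest.new('SHA256'), signature, data)
end

key = OpenSSL::PKey::RSA.new(2048)   # throwaway key for the demonstration
signature = sign_data(key, 'contents of file_to_sign')

puts valid_signature?(key.public_key, signature, 'contents of file_to_sign')
puts valid_signature?(key.public_key, signature, 'tampered contents')
```

The first check succeeds and the second fails, because the signature only matches the exact data that was signed.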


6. Import the private key and certificate into a Java keystore


First we generate a p12 file:


openssl pkcs12 -export -in certificate.crt -inkey private.key > server.p12


Then we import this into the keystore:


keytool -importkeystore -srckeystore server.p12 -destkeystore keystore.jks -srcstoretype pkcs12
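The p12 file from the first half of this step can also be produced from Ruby with the standard openssl library. The sketch below is an illustration under assumptions: the method names, the password changeit and the friendly name server are made up, and a throwaway key and certificate are generated in memory instead of reading private.key and certificate.crt. The keytool import itself is not covered.

```ruby
require 'openssl'

# Creates a throwaway key and self-signed certificate for the demonstration.
def throwaway_key_and_cert
  key = OpenSSL::PKey::RSA.new(2048)
  name = OpenSSL::X509::Name.parse('/CN=example.test')
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2
  cert.serial = 1
  cert.subject = name
  cert.issuer = name
  cert.public_key = key.public_key
  cert.not_before = Time.now
  cert.not_after = cert.not_before + 365 * 24 * 60 * 60
  cert.sign(key, OpenSSL::Digest.new('SHA256'))
  [key, cert]
end

# Bundles the private key and certificate into a password-protected PKCS#12 object.
def build_p12(password, friendly_name, key, cert)
  OpenSSL::PKCS12.create(password, friendly_name, key, cert)
end

# Writes the PKCS#12 bundle to disk in binary (DER) form.
def write_p12(path, p12)
  File.open(path, 'wb') { |file| file.write(p12.to_der) }
end

key, cert = throwaway_key_and_cert
p12 = build_p12('changeit', 'server', key, cert)
puts p12.to_der.bytesize   # size of the bundle in bytes
```

Calling write_p12('server.p12', p12) would save the bundle to a file, which could then be passed to the keytool command above as the source keystore.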