I've used Lucene for many years on many different projects. Thanks to Dave Balmain, the library has been ported to Ruby, and the port seems to be even faster than the original. It's called ferret. I've just started playing with it, and so far it looks very promising.
Some time ago, Zenspider posted a neat trick for searching the ri database. Since Ruby 1.8.5, this database has grown significantly, and every installed gem extends it with its own documentation. Walking through all these YAML files and grepping for something interesting can take a while.
This particular task fits nicely with what ferret can offer.
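For comparison, the brute-force approach can be sketched roughly as follows. This is only an illustration, not Zenspider's original script: the method name `grep_ri_files` and the example directory are assumptions made up for this sketch. It shows why the scan is slow, since every search re-reads every file.

```ruby
require "find"

# Walk every file under the given directories and return the paths whose
# contents match the pattern. Nothing is cached, so each search pays the
# full cost of reading the whole documentation tree again.
def grep_ri_files(dirs, pattern)
  matches = []
  dirs.each do |dir|
    next unless File.directory?(dir)
    Find.find(dir) do |path|
      next unless File.file?(path)
      # Read as binary so unusual encodings in the files do not raise.
      matches << path if File.binread(path) =~ pattern
    end
  end
  matches
end

# Hypothetical usage; the directory below is only an example location.
# puts grep_ri_files(["/usr/share/ri"], /kill/i)
```

An index avoids this repeated work by reading the files once and answering later queries from a prebuilt lookup structure.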
Install ferret:
$ sudo gem install ferret
Build index using the following script (ri_indexer):
$ cat ri_indexer
#!/usr/bin/env ruby
require "rdoc/ri/ri_driver"
require "rubygems"
require "ferret"
require "find"
require "yaml"
include Ferret

INDEX_FILE = File.expand_path('~/.ri_index')

# :name is read back by ri_search; :content is indexed but not stored.
fis = Index::FieldInfos.new
fis.add_field :name, :term_vector => :no
fis.add_field :content, :store => :no
fis.create_index(INDEX_FILE)

index = I.new(:path => INDEX_FILE, :create => true)
dirs = RI::Paths::PATH
dirs.each do |dir|
  Find.find(dir) do |fn|
    next unless File.file?(fn)
    doc = YAML.load(File.read(fn))
    # Skip entries that carry no documentation comment.
    next unless doc.respond_to?(:comment)
    next unless doc.comment
    index << {
      :name => doc.full_name,
      :content => doc.comment.map{|f|f.body if f.respond_to?(:body)}.join("\n")
    }
  end
end
index.optimize
index.close
$ ./ri_indexer
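To give a feel for what the indexing step buys, here is a toy inverted index in plain Ruby. It is a sketch for illustration only and says nothing about ferret's actual internals: the `ToyIndex` class, its tokenizing rule, and the sample documents are all assumptions invented for this example.

```ruby
# Toy inverted index: maps each lowercased word to the names of the
# documents that contain it. Illustration only, not ferret's internals.
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  # Add a document given as a hash with :name and :content keys.
  def <<(doc)
    terms = doc[:content].downcase.scan(/[a-z0-9_]+/).uniq
    terms.each { |term| @postings[term] << doc[:name] }
    self
  end

  # Return the names of the documents that contain the given word.
  def search(word)
    @postings.fetch(word.downcase, [])
  end
end

index = ToyIndex.new
index << { :name => "Process::kill", :content => "Send the given signal to a process" }
index << { :name => "String#split", :content => "Divides str into substrings" }
p index.search("signal")  # prints ["Process::kill"]
```

The work of reading and tokenizing happens once, at indexing time; a lookup afterwards is a single hash access rather than a scan over every file.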
Now you can use this script for searching (ri_search):
$ cat ri_search
#!/usr/bin/env ruby
require "rubygems"
require "ferret"
require "find"
require "rdoc/ri/ri_driver"
include Ferret

INDEX_FILE = File.expand_path('~/.ri_index')

query = ARGV.join(' ')
# Empty ARGV so RiDriver.new does not parse the query as ri options.
ARGV.clear
RI::Options.instance.use_stdout = true
ri = RiDriver.new
index = I.new(:path => INDEX_FILE)
index.search_each(query) do |id, score|
  puts
  begin
    # Look up the stored name and let ri print its documentation.
    ri.get_info_for(index[id][:name])
  rescue Exception
    puts $!.message
  end
end
$ ./ri_search kill
This is much faster than the original script. The script also accepts quite sophisticated query expressions. For example:
$ ./ri_search rescue AND public
$ ./ri_search +split -String
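The meaning of the `+` and `-` prefixes can be sketched with a toy matcher in plain Ruby. This is a simplified model, not ferret's query parser: the method name `toy_match?` is hypothetical, the case-insensitive comparison is an assumption, and operators such as `AND` are not handled at all.

```ruby
# Toy matcher for "+required -prohibited optional" style queries.
# Illustration only: a real query parser supports far more syntax.
def toy_match?(query, text)
  words = text.downcase.scan(/[a-z0-9_]+/)
  required, prohibited, optional = [], [], []
  query.split.each do |token|
    case token
    when /\A\+(.+)/ then required << $1.downcase
    when /\A-(.+)/  then prohibited << $1.downcase
    else optional << token.downcase
    end
  end
  # Any prohibited term present rules the text out.
  return false if prohibited.any? { |t| words.include?(t) }
  # Every required term must be present.
  return false unless required.all? { |t| words.include?(t) }
  # Without required terms, at least one optional term must match.
  required.any? || optional.any? { |t| words.include?(t) }
end

puts toy_match?("+split -String", "split the array into parts")  # prints true
puts toy_match?("+split -String", "split the string")            # prints false
```

Under this model, `+split -String` keeps entries that mention "split" while dropping any that also mention "string".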
Refer to ferret's Trac site for more information about this wonderful library.