Search::Estraier perl module with example scripts.
Please use the latest Hyper Estraier with the latest Search::Estraier. Search-Estraier-0.08.tar.gz (38 Kb)
Latest source is always available from the Subversion repository.
Search::Estraier - pure perl module to use Hyper Estraier search engine
use Search::Estraier;

# connect to the node, creating it if it doesn't exist yet
my $node = new Search::Estraier::Node(
    url            => 'http://localhost:1978/node/test',
    user           => 'admin',
    passwd         => 'admin',
    create         => 1,
    label          => 'Label for node',
    croak_on_error => 1,
);

# build a document from attributes and text sentences
my $doc = new Search::Estraier::Document;
$doc->add_attr('@uri', "http://estraier.gov/example.txt");
$doc->add_attr('@title', "Over the Rainbow");
$doc->add_text("Somewhere over the rainbow. Way up high.");
$doc->add_text("There's a land that I heard of once in a lullaby.");

# register the document; with croak_on_error, failures die inside the eval
die "error: ", $node->status, "\n" unless (eval { $node->put_doc($doc) });
use Search::Estraier;

my $node = new Search::Estraier::Node(
    url            => 'http://localhost:1978/node/test',
    user           => 'admin',
    passwd         => 'admin',
    croak_on_error => 1,
);

# create a search condition
my $cond = new Search::Estraier::Condition;
$cond->set_phrase("rainbow AND lullaby");

# search this node only (depth 0, no meta search)
my $nres = $node->search($cond, 0);

if (defined($nres)) {
    print "Got ", $nres->hits, " results\n";

    # loop over the documents returned in this result
    for my $i ( 0 .. $nres->doc_num - 1 ) {
        my $rdoc = $nres->get_doc($i);
        print "URI: ", $rdoc->attr('@uri'), "\n";
        print "Title: ", $rdoc->attr('@title'), "\n";
        print $rdoc->snippet, "\n";
    }
} else {
    die "error: ", $node->status, "\n";
}
This module is an implementation of the node API of Hyper Estraier. Since it's a perl-only module with dependencies only on standard perl modules, it will run on all platforms on which perl runs. It doesn't require compilation or Hyper Estraier development files on the target machine.
It is implemented as multiple packages which closely resemble the Ruby implementation. It also includes methods to manage nodes.
There are a few examples in the scripts directory of this distribution.
These methods should really move somewhere else.
Remove multiple whitespace characters from a string, as well as whitespace at the beginning or end.
my $text = $self->_s("  this  is a text  ");
# $text is now 'this is a text'
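A minimal stand-alone sketch of the clean-up described above. `squeeze_ws` is a hypothetical name, not the module's `_s`, and the regexes are an assumption about how such normalisation can be done rather than a copy of the module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-alone equivalent of the whitespace clean-up described
# for _s; the real helper's regexes may differ.
sub squeeze_ws {
    my ($text) = @_;
    return $text unless defined $text;
    $text =~ s/\s+/ /g;     # turn every run of whitespace into one space
    $text =~ s/^ | $//g;    # drop the single leading and trailing space, if any
    return $text;
}

print squeeze_ws("  this   is\ta\n text  "), "\n";    # prints "this is a text"
```

Tabs and newlines are folded into plain spaces as well, which is what makes the result safe to use as a single-line value.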
This class implements Document, which is a single item in Hyper Estraier.
It is a collection of:
attributes: 'key' => 'value' pairs which can later be used for filtering of results. You can add common filters to attrindex in estmaster's _conf file for better performance. See attrindex in the Hyper Estraier P2P Guide.
vectors: also 'key' => 'value' pairs.
texts: text which will be used to create the searchable corpus of your index and included in snippet output.
hidden texts: text which will be searchable, but will not be included in the snippet.
Create a new document, either empty or from a draft.
my $doc = new Search::Estraier::Document;
my $doc2 = new Search::Estraier::Document( $draft );
Add an attribute.
$doc->add_attr( name => 'value' );
Delete an attribute by setting its value to undef:
$doc->add_attr( name => undef );
Add a sentence of text.
$doc->add_text('this is example text to display');
Add vectors
$doc->add_vectors(
    'vector_name' => 42,
    'another'     => 12345,
);
Set the substitute score
$doc->set_score(12345);
Get the substitute score
print $doc->score;
Get the ID number of the document. If the object has never been registered, -1 is returned.
print $doc->id;
Returns array with attribute names from document object.
my @attrs = $doc->attr_names;
Returns value of an attribute.
my $value = $doc->attr( 'attribute' );
Returns array with text sentences.
my @texts = $doc->texts;
Return the whole text as a single scalar.
my $text = $doc->cat_texts;
Dump draft data from document object.
print $doc->dump_draft;
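To make the idea of a draft concrete, here is a hedged sketch of building and reading one. The layout used (attribute `name=value` lines, an empty line, then one text sentence per line, with hidden sentences prefixed by a tab) is an assumption about the Hyper Estraier draft format, and `make_draft`/`parse_draft` are hypothetical helpers, not `dump_draft` or the `Document` constructor. Skipping undef attributes follows the ChangeLog note that Hyper Estraier 1.0.6 doesn't like attributes with no value.

```perl
use strict;
use warnings;

# Hypothetical helpers; the draft layout is an assumption:
#   name=value      one line per attribute
#   (empty line)    separates the header from the body
#   sentence        one visible text sentence per line
#   <TAB>sentence   hidden text (searchable, not shown in snippets)
sub make_draft {
    my (%doc) = @_;
    my $attrs = $doc{attrs} || {};
    my $draft = '';
    for my $name (sort keys %$attrs) {
        next unless defined $attrs->{$name};    # undef attributes are skipped
        $draft .= "$name=$attrs->{$name}\n";
    }
    $draft .= "\n";
    $draft .= "$_\n"   for @{ $doc{texts}  || [] };
    $draft .= "\t$_\n" for @{ $doc{hidden} || [] };
    return $draft;
}

sub parse_draft {
    my ($draft) = @_;
    my %doc = (attrs => {}, texts => [], hidden => []);
    my $in_body = 0;
    for my $line (split /\n/, $draft) {
        if (!$in_body) {
            if ($line eq '') { $in_body = 1; next }    # end of header
            next if $line =~ /^%/;                     # control lines are ignored here
            my ($name, $value) = split /=/, $line, 2;
            $doc{attrs}{$name} = $value;
        }
        elsif ($line =~ /^\t(.*)$/) {
            push @{ $doc{hidden} }, $1;                # tab prefix marks hidden text
        }
        elsif ($line ne '') {
            push @{ $doc{texts} }, $line;
        }
    }
    return \%doc;
}

print make_draft(
    attrs => { '@uri' => 'http://estraier.gov/example.txt', '@title' => 'Over the Rainbow' },
    texts => ['Somewhere over the rainbow.'],
);
```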
Empty the document object
$doc->delete;
This function is an addition to the original Ruby API, and since it was included in the C wrappers it's here as a convenience. Document objects which go out of scope will be destroyed automatically.
my $cond = new Search::Estraier::Condition;
$cond->set_phrase('search phrase');
$cond->add_attr('@URI STRINC /~dpavlin/');
$cond->set_order('@mdate NUMD');
$cond->set_max(42);
$cond->set_options( 'SURE' );
$cond->set_options( qw/AGITO NOIDF SIMPLE/ );
Possible options are:
SURE: check every N-gram
USUAL: check every second N-gram
FAST: check every third N-gram
AGITO: check every fourth N-gram
NOIDF: don't perform TF-IDF tuning
SIMPLE: use simplified query phrase
Skipping N-grams will speed up search, but reduce accuracy. Every call to set_options will reset previous options.
This behaviour changed in version 0.04 of this module. It's backwards compatible.
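Since options are returned in numerical form, it may help to see how option names can be folded into a bit mask. This is a sketch: the bit values are assumed to mirror Hyper Estraier's ESTCOND* flags, and `options_to_int` is a hypothetical helper rather than this module's code. Starting from 0 on each call reproduces the "every call resets previous options" behaviour.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Assumed flag values, mirroring Hyper Estraier's ESTCOND* constants.
my %OPTION_BITS = (
    SURE   => 1 << 0,     # check every N-gram
    USUAL  => 1 << 1,     # check every second N-gram
    FAST   => 1 << 2,     # check every third N-gram
    AGITO  => 1 << 3,     # check every fourth N-gram
    NOIDF  => 1 << 4,     # don't perform TF-IDF tuning
    SIMPLE => 1 << 10,    # use simplified query phrase
);

# Hypothetical helper: start from 0 on every call (so earlier options are
# replaced, not merged) and OR in the bit of each named option.
sub options_to_int {
    my $bits = 0;
    for my $name (@_) {
        croak "unknown option '$name'" unless exists $OPTION_BITS{$name};
        $bits |= $OPTION_BITS{$name};
    }
    return $bits;
}

print options_to_int(qw/AGITO NOIDF SIMPLE/), "\n";    # prints 1048
```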
Return search phrase.
print $cond->phrase;
Return search result order.
print $cond->order;
Return search result attrs.
my @cond_attrs = $cond->attrs;
Return maximum number of results.
print $cond->max;
-1 is returned for an uninitialized value; 0 means unlimited.
Return options for this condition.
print $cond->options;
Options are returned in numerical form.
Set the number of skipped documents from the beginning of results
$cond->set_skip(42);
Similar to offset in an RDBMS.
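Because skip works like an RDBMS offset, paging through results is simple arithmetic. The sketch below assumes that max counts the documents returned after the skipped ones; `page_window` is a hypothetical helper whose results would be passed to `$cond->set_skip` and `$cond->set_max`.

```perl
use strict;
use warnings;

# Hypothetical helper: map a 1-based page number to set_skip/set_max values,
# assuming max counts the documents returned after the skipped ones.
sub page_window {
    my ($page, $per_page) = @_;
    die "page must be 1 or greater\n" if $page < 1;
    return (skip => ($page - 1) * $per_page, max => $per_page);
}

my %w = page_window(3, 20);
print "set_skip($w{skip}) set_max($w{max})\n";    # prints "set_skip(40) set_max(20)"
```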
Return skip for this condition.
print $cond->skip;
$cond->set_distinct('@author');
Return distinct attribute
print $cond->distinct;
Filter out some links when searching.
The argument is an array of link numbers, starting with 0 (current node).
$cond->set_mask(qw/0 1 4/);
my $rdoc = new Search::Estraier::ResultDocument(
    uri      => 'http://localhost/document/uri/42',
    attrs    => {
        foo => 1,
        bar => 2,
    },
    snippet  => 'this is a text of snippet',
    keywords => "this\tare\tkeywords",
);
Return URI of result document
print $rdoc->uri;
Returns array with attribute names from result document object.
my @attrs = $rdoc->attr_names;
Returns value of an attribute.
my $value = $rdoc->attr( 'attribute' );
Return snippet from result document
print $rdoc->snippet;
Return keywords from result document
print $rdoc->keywords;
my $res = new Search::Estraier::NodeResult(
    docs  => \@array_of_rdocs,
    hints => \%hash_with_hints,
);
Return number of documents
print $res->doc_num;
This will return the real number of documents (limited by max).
If you want to get the total number of hits, see hits.
Return single document
my $doc = $res->get_doc( 42 );
Returns undef if document doesn't exist.
Return a specific hint from results.
print $res->hint( 'VERSION' );
Possible hints are: VERSION, NODE, HIT, HINT#n, DOCNUM, WORDNUM, TIME, LINK#n, VIEW.
A more perlish version of hint. This one returns a hash.
my %hints = $res->hints;
Syntactic sugar for the total number of hits for this query
print $res->hits;
It's the same as
print $res->hint('HIT');
but shorter.
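The relation between hint, hints and hits can be illustrated with a small parser. This sketch assumes each hint arrives as a line holding the key, a tab, and the value (a HINT#n value may itself contain tabs, so only the first tab splits). `parse_hints` is hypothetical and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "KEY<TAB>value" lines into a hash,
# splitting only on the first tab so multi-field values stay intact.
sub parse_hints {
    my ($block) = @_;
    my %hints;
    for my $line (split /\n/, $block) {
        my ($key, $value) = split /\t/, $line, 2;
        next unless defined $value;    # ignore lines without a tab
        $hints{$key} = $value;
    }
    return %hints;
}

# made-up sample data
my %hints = parse_hints("VERSION\t1.0\nHIT\t2\nHINT#1\trainbow\t2\n");
print "hits: $hints{HIT}\n";    # the same value hits() is sugar for
```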
my $node = new Search::Estraier::Node;
or optionally with url as parameter
my $node = new Search::Estraier::Node( 'http://localhost:1978/node/test' );
or in more verbose form
my $node = new Search::Estraier::Node(
    url            => 'http://localhost:1978/node/test',
    user           => 'admin',
    passwd         => 'admin',
    create         => 1,
    label          => 'optional node label',
    debug          => 1,
    croak_on_error => 1
);
with the following arguments:
url: URL to node
user: username for node server authentication
passwd: password for authentication
create: create the node if it doesn't exist
label: optional label for the new node if create is used
debug: dumps a lot of debugging output
croak_on_error: very helpful during development. It will croak on all errors instead of silently returning -1 (which is the convention of the Hyper Estraier API in other languages).
Specify URL to node server
$node->set_url('http://localhost:1978');
Specify proxy server to connect to node server
$node->set_proxy('proxy.example.com', 8080);
Specify timeout of connection in seconds
$node->set_timeout( 15 );
Specify name and password for authentication to node server.
$node->set_auth('clint','eastwood');
Return status code of last request.
print $node->status;
-1 means connection failure.
Add a document
$node->put_doc( $document_draft ) or die "can't add document";
Return true on success or false on failure.
Remove a document
$node->out_doc( document_id ) or die "can't remove document";
Return true on success or false on failure.
Remove a registered document using its URI
$node->out_doc_by_uri( 'file:///document/uri/42' ) or die "can't remove document";
Return true on success or false on failure.
Edit attributes of a document
$node->edit_doc( $document_draft ) or die "can't edit document";
Return true on success or false on failure.
Retrieve a document
my $doc = $node->get_doc( document_id ) or die "can't get document";
Return true on success or false on failure.
Retrieve a document by URI
my $doc = $node->get_doc_by_uri( 'file:///document/uri/42' ) or die "can't get document";
Return true on success or false on failure.
Retrieve the value of an attribute from an object
my $val = $node->get_doc_attr( document_id, 'attribute_name' ) or die "can't get document attribute";
Retrieve the value of an attribute from an object specified by URI
my $val = $node->get_doc_attr_by_uri( 'file:///document/uri/42', 'attribute_name' ) or die "can't get document attribute";
Extract document keywords
my $keywords = $node->etch_doc( document_id ) or die "can't etch document";
Extract keywords of a document specified by URI
my $keywords = $node->etch_doc_by_uri( 'file:///document/uri/42' ) or die "can't etch document";
Return true on success or false on failure.
Get ID of document specified by URI
my $id = $node->uri_to_id( 'file:///document/uri/42' );
This method won't croak, even if using croak_on_error.
Private function used for implementing get_doc, get_doc_by_uri, etch_doc, etch_doc_by_uri, get_doc_attr, get_doc_attr_by_uri and uri_to_id.
# this will decode received draft into Search::Estraier::Document object
my $doc = $node->_fetch_doc( id => 42 );
my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42' );
my $doc = $node->_fetch_doc( id => 42, etch => 1 );
my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', etch => 1 );
my $doc = $node->_fetch_doc( id => 42, attr => '@mdate' );
my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', attr => '@mdate' );
my $id = $node->_fetch_doc( uri => 'file:///document/uri/42', path => '/uri_to_id', chomp_resbody => 1 );
my $node_name = $node->name;
my $node_label = $node->label;
my $documents_in_node = $node->doc_num;
my $words_in_node = $node->word_num;
my $node_size = $node->size;
Search documents which match a condition
my $nres = $node->search( $cond, $depth );
$cond is a Search::Estraier::Condition object, while $depth specifies the depth for meta search.
The function returns a Search::Estraier::NodeResult object.
Return URI encoded string generated from Search::Estraier::Condition
my $args = $node->cond_to_query( $cond, $depth );
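As an illustration of what "URI encoded string generated from a condition" means, here is a sketch that encodes a plain hash standing in for a Condition. The parameter names (phrase, attr1..attrN, order, max, options, skip, depth) are assumptions about the search protocol, and `cond_hash_to_query` and `uri_escape_simple` are hypothetical; the module itself may order or escape parameters differently.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
sub uri_escape_simple {
    my ($value) = @_;
    my $s = "$value";
    utf8::encode($s);    # escape UTF-8 bytes, not characters
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical sketch: a plain hash stands in for Search::Estraier::Condition;
# the parameter names are assumed, not taken from the module.
sub cond_hash_to_query {
    my ($cond, $depth) = @_;
    my @pairs;
    push @pairs, ['phrase', $cond->{phrase}] if defined $cond->{phrase};
    my $i = 1;
    push @pairs, ['attr' . $i++, $_] for @{ $cond->{attrs} || [] };
    for my $key (qw/order max options skip/) {
        push @pairs, [$key, $cond->{$key}] if defined $cond->{$key};
    }
    push @pairs, ['depth', $depth] if defined $depth;
    return join '&',
        map { uri_escape_simple($_->[0]) . '=' . uri_escape_simple($_->[1]) } @pairs;
}

print cond_hash_to_query({ phrase => 'rainbow AND lullaby', max => 42 }, 0), "\n";
```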
This is a method which uses LWP::UserAgent to communicate with the Hyper Estraier node master.
my $rv = $node->shuttle_url( $url, $content_type, $req_body, \$resbody );
$resheads and $resbody booleans control whether response headers and/or the response body will be saved within the object.
Set width of snippets in results
$node->set_snippet_width( $wwidth, $hwidth, $awidth );
$wwidth specifies the whole width of a snippet. It's 480 by default. If it's 0, the snippet is not sent with results. If it is negative, the whole document text is sent instead of the snippet.
$hwidth specifies the width of strings from the beginning of the text. The default value is 96. A negative or zero value keeps the previous value.
$awidth specifies the width of strings around each highlighted word. It's 96 by default. If a negative or zero value is provided, the previous value is kept unchanged.
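The rules above can be summarised as a tiny state machine. This is a hypothetical stand-in (`set_snippet_width_sketch`, `snippet_mode` and `%width` are not part of the module) that only models the documented defaults and the "keep previous value" behaviour; it sends nothing to a node.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the width bookkeeping described above.
my %width = (wwidth => 480, hwidth => 96, awidth => 96);    # documented defaults

sub set_snippet_width_sketch {
    my ($wwidth, $hwidth, $awidth) = @_;
    $width{wwidth} = $wwidth;                                    # any value is stored
    $width{hwidth} = $hwidth if defined $hwidth && $hwidth > 0;  # <= 0 keeps old value
    $width{awidth} = $awidth if defined $awidth && $awidth > 0;  # <= 0 keeps old value
    return %width;
}

# What a given whole width means for the results.
sub snippet_mode {
    my $w = $width{wwidth};
    return $w > 0 ? 'snippet' : $w == 0 ? 'none' : 'whole text';
}

set_snippet_width_sketch(200, 0, 30);
printf "%s: wwidth=%d hwidth=%d awidth=%d\n", snippet_mode(), @width{qw/wwidth hwidth awidth/};
```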
Manage users of node
$node->set_user( 'name', $mode );
$mode can be one of:
0: delete account
1: set administrative rights for user
2: set user account as guest
Return true on success, otherwise false.
Manage node links
$node->set_link('http://localhost:1978/node/another', 'another node label', $credit);
If $credit is negative, the link is removed.
my @admins = @{ $node->admins };
Return array of users with admin rights on node
my @guests = @{ $node->guests };
Return array of users with guest rights on node
my @links = @{ $node->links };
Return array of links for this node
Return cache usage for a node
my $cache = $node->cacheusage;
Set actions on the Hyper Estraier node master (estmaster process)
$node->master( action => 'sync' );
All available actions are documented in http://hyperestraier.sourceforge.net/nguide-en.html#protocol
You could call those directly, but you don't have to. I hope.
Set information for node
$node->_set_info;
Clear information for node
$node->_clear_info;
On the next call to name, label, doc_num, word_num or size, node info will be fetched again from Hyper Estraier.
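The _set_info/_clear_info pair follows a common lazy-caching pattern: fetch node information on first use, reuse it, and drop it when a command may have changed it. The class below is a hypothetical illustration of that pattern (`LazyNodeInfo` is not part of Search::Estraier), with a code reference standing in for the HTTP request.

```perl
use strict;
use warnings;

package LazyNodeInfo;    # hypothetical class, not part of Search::Estraier

sub new {
    my ($class, %args) = @_;
    # fetch: code ref standing in for the request that loads node info
    return bless { fetch => $args{fetch}, info => undef }, $class;
}

sub _info {
    my ($self) = @_;
    $self->{info} ||= $self->{fetch}->();    # fetch once, then reuse
    return $self->{info};
}

sub name    { $_[0]->_info->{name} }
sub doc_num { $_[0]->_info->{doc_num} }

sub clear_info { $_[0]->{info} = undef }     # force a re-fetch on next access

package main;

my $calls = 0;
my $info  = LazyNodeInfo->new(
    fetch => sub { $calls++; return { name => 'test', doc_num => 10 + $calls } },
);
print $info->name, " has ", $info->doc_num, " documents\n";    # one fetch for both
```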
Nothing.
http://hyperestraier.sourceforge.net/
Hyper Estraier Ruby interface on which this module is based.
Hyper Estraier now also has a pure-perl binding included in its distribution. It's a faster way to access databases directly if you are not running the estmaster P2P server.
Dobrica Pavlinusic, <dpavlin@rot13.org>
Robert Klep <robert@klep.name> contributed refactored search code
Copyright (C) 2005-2006 by Dobrica Pavlinusic
This library is free software; you can redistribute it and/or modify it under the GPL v2 or later.
2008-01-20 16:51:47 dpavlin r199
/trunk/lib/Search/Estraier.pm: version bump [0.09]
2008-01-20 16:50:59 dpavlin r198
/trunk/t/5_Node.t: fix RT #32457: Victim of Test-Simple 0.74
2007-01-05 22:19:01 dpavlin r197
/trunk/scripts/est-spider: don't exit from sub with next
2006-11-26 12:06:08 dpavlin r196
/trunk/scripts/est-spider: added --skip-images option
2006-11-14 16:39:08 dpavlin r195
/trunk/scripts/dbi-indexer.pl: added new --dbi and --quiet command-line options, saner defaults
2006-11-11 23:34:55 dpavlin r194
/trunk/lib/Search/Estraier.pm, /trunk/MANIFEST, /trunk/lib, /trunk/Makefile.PL, /trunk/Estraier.pm, /trunk/lib/Search: reorg directory structure to lib/Search/Estraier.pm
2006-11-05 16:28:59 dpavlin r193
/cpan/0.08: CPAN release 0.08
2006-11-05 16:28:31 dpavlin r192
/trunk/t/5_Node.t: fix warning
2006-11-05 16:26:57 dpavlin r191
/trunk/Estraier.pm: bump version to 0.08
2006-11-05 16:25:56 dpavlin r190
/trunk/t/5_Node.t: actually wrap $cond->set_distinct in ok() :-)
2006-11-05 16:23:03 dpavlin r189
/trunk/t/5_Node.t: test set_distinct
2006-11-05 16:08:08 dpavlin r188
/trunk/t/2_Condition.t: added tests for set_distinct and distinct
2006-11-05 16:01:36 dpavlin r187
/trunk/t/5_Node.t: added tests for set_score and score
2006-11-05 15:53:13 dpavlin r186
/trunk/Estraier.pm: removed debugging output
2006-11-05 15:53:01 dpavlin r185
/trunk/t/1_Document.t: add tests for set_score and score
2006-11-04 13:10:29 dpavlin r184
/trunk/Estraier.pm: set_distinct and set_score patch from Mikio Hirabayashi <mikio@users.sourceforge.net>
2006-08-31 14:43:06 dpavlin r183
/trunk/scripts/dbi-indexer.pl: separate authorisation for estraier (estuser,estpasswd) and database (dbuser,dbpasswd)
2006-08-26 22:35:15 dpavlin r182
/trunk/scripts/est-spider: --force will now skip checking of mtime (as it should)
2006-08-26 22:33:34 dpavlin r181
/trunk/scripts/est-spider: remove script from html when convertin it to text
2006-08-26 22:30:13 dpavlin r180
/trunk/scripts/est-spider: fix deletection of external binaries
2006-08-25 11:59:04 dpavlin r179
/trunk/scripts/est-spider: parse windows help file index (hhc) if available for page titles
2006-08-15 16:38:06 dpavlin r178
/trunk/scripts/est-spider: sync master at end of indexing
2006-08-06 19:29:28 dpavlin r177
/cpan/0.07: release 0.07 to CPAN
2006-08-06 18:43:58 dpavlin r176
/trunk/Estraier.pm: release 0.07
2006-08-06 18:38:51 dpavlin r175
/trunk/MANIFEST: fix manifest
2006-08-06 18:15:56 dpavlin r174
/trunk/Estraier.pm: fixed docs
2006-08-06 18:15:11 dpavlin r173
/trunk/Estraier.pm, /trunk/t/2_Condition.t, /trunk/t/5_Node.t: added $cond->set_mask
2006-08-06 17:20:09 dpavlin r172
/trunk/t/9_pod-coverage.t: test pod coverage
2006-08-06 17:19:51 dpavlin r171
/trunk/t/9_pod.t, /trunk/t/99_pod.t: rename pod test
2006-08-06 17:15:01 dpavlin r170
/trunk/t/5_Node.t: test $nres->hint()
2006-08-06 16:42:39 dpavlin r169
/trunk/t/5_Node.t: test error handling of $node->get_doc
2006-08-06 16:42:06 dpavlin r168
/trunk/Estraier.pm: test error handling of $node->get_doc
2006-08-06 16:29:34 dpavlin r167
/trunk/t/5_Node.t: test $cond->skip
2006-08-06 12:48:02 dpavlin r166
/trunk/t/1_Document.t, /trunk/Estraier.pm: add_vectors added [0.07_3] and fixed vector handling which was broken
2006-08-06 12:19:37 dpavlin r165
/trunk/scripts/cpanest: create index if it doesn't exist
2006-08-06 12:19:19 dpavlin r164
/trunk/Estraier.pm: documentation improvements
2006-08-02 21:51:31 dpavlin r163
/trunk/Makefile.PL: added cover target to run Devel::Cover
2006-06-27 22:50:25 dpavlin r162
/trunk/Makefile.PL: fine-tune cpan target
2006-06-27 22:38:20 dpavlin r161
/cpan/0.07_2: CPAN pre-release 0.07_2
2006-06-24 15:34:42 dpavlin r160
/trunk/Estraier.pm, /trunk/t/5_Node.t: added cacheusage, version bumped to 0.07_2
2006-05-25 19:18:14 dpavlin r159
/trunk/scripts/dbi-indexer.pl: added command-line options and debug levels with increasing verbosity
2006-05-22 14:48:14 dpavlin r158
/trunk/Makefile.PL: new target to make cpan distribution
2006-05-22 14:43:56 dpavlin r157
/cpan/0.07_1: CPAN release 0.07_1
2006-05-22 14:42:10 dpavlin r156
/trunk/t/5_Node.t, /trunk/scripts/cpanest, /trunk/Estraier.pm: pre-release of 0.07_1
2006-05-18 14:31:42 dpavlin r155
/trunk/Estraier.pm, /trunk/t/5_Node.t: bugfix: set_skip now really work
2006-05-16 16:05:23 dpavlin r154
/trunk/Estraier.pm: send correct Content-type for set_user
2006-05-16 16:01:09 dpavlin r153
/trunk/t/5_Node.t: added usage of EST_USER and EST_PASSWD enviroment variables for credentials if they exists (otherwise, it will fallback to admin:admin)
2006-05-16 12:11:39 dpavlin r152
/trunk/t/5_Node.t: skip tests if estmaster isn't running, optional way to test against remove Hyper Estraier server using:
ESTMASTER_URI=http://estraier.example.com:1978 make test
2006-05-16 11:39:53 dpavlin r151
/trunk/Estraier.pm, /trunk/t/5_Node.t: added _clear_info which is called in cases where comands modify stats about node (which will force re-read of those data from Hyper Estraier on next request), explanded test suite
2006-05-15 22:26:08 dpavlin r150
/trunk/Estraier.pm, /trunk/t/5_Node.t: call _set_info to refresh data about node after calling out_doc*
2006-05-15 22:11:22 dpavlin r149
/trunk/Estraier.pm: refresh _set_info after sync
2006-05-15 22:06:14 dpavlin r148
/trunk/t/5_Node.t: extended node tests
2006-05-10 21:41:35 dpavlin r147
/trunk/scripts/dbi-indexer.pl: added pk_col to config hash
2006-05-10 21:33:32 dpavlin r146
/trunk/scripts/dbi-indexer.pl: display rows/s rate
2006-05-10 21:09:05 dpavlin r145
/trunk/scripts/dbi-indexer.pl: added db_encoding
2006-05-10 20:31:02 dpavlin r144
/trunk/scripts/dbi-indexer.pl: create node if needed, moved config into hash
2006-05-10 16:54:23 dpavlin r143
/cpan/0.06: CPAN release 0.06
2006-05-10 14:57:50 dpavlin r142
/trunk/MANIFEST, /trunk/Estraier.pm: getting ready for 0.06
2006-05-10 14:52:28 dpavlin r141
/trunk/scripts/estcp-mt.pl, /trunk/scripts/estcp.pl: estcp scripts cleanup for creating nodes (they now copy source label too)
2006-05-10 14:08:34 dpavlin r140
/trunk/Estraier.pm, /trunk/t/5_Node.t: fix interaction of create and croak_on_error, added tests for it
2006-05-10 13:45:08 dpavlin r139
/trunk/Estraier.pm, /trunk/t/5_Node.t: added create and label to new Search::Estraier::Node, so that nodes will be automatically created if needed.
2006-05-10 13:34:17 dpavlin r138
/trunk/t/5_Node.t: better explanation of 46 nodes limit (with just 1024 file descriptors) in Hyper Estraier.
2006-05-09 14:05:57 dpavlin r137
/trunk/t/5_Node.t: test whole new ->master API except for shutdown, backup and logrtt (those operations are specific to Hyper Estraier installation and sysadmins might not appreciate tests which mess system services)
2006-05-09 14:03:36 dpavlin r136
/trunk/Estraier.pm: fix parametar handling for ->master, chomp response body so that it doesn't end with lf (so you can include it in messages)
2006-05-09 12:42:39 dpavlin r135
/trunk/Estraier.pm, /trunk/t/5_Node.t: fixes and tweaks for master
2006-05-09 12:21:26 dpavlin r134
/trunk/t/5_Node.t, /trunk/Estraier.pm: added Search::Estraier::Node->master to controll estmaster and beginning of tests for it
2006-05-08 21:34:00 dpavlin r133
/trunk/scripts/estcp.pl: copy with admin priviledges
2006-05-08 21:33:37 dpavlin r132
/trunk/Estraier.pm: document and actually implement (but, used in examples, uf, uf) shortcut to specify user and passwd directly to Search::Estraier::Node instead of calling set_auth afterwards.
2006-05-08 21:05:32 dpavlin r131
/trunk/scripts/dbi-indexer.pl: added example script to index DBI table
2006-05-08 20:47:48 dpavlin r130
/trunk/t/5_Node.t: fix number of tests to skip
2006-05-08 12:01:00 dpavlin r129
/trunk/t/5_Node.t: test search without results
2006-05-08 12:00:43 dpavlin r128
/trunk/Estraier.pm: removed old implementation of search in favor of refactored code contributed by Robert Klep
2006-05-06 22:09:01 dpavlin r127
/trunk/scripts/bench_search.pl: script to benchmark old and new implementation of search
2006-05-06 21:38:14 dpavlin r126
/trunk/Estraier.pm: Better implementation of search by Robert Klep <robert@klep.name>
2006-05-03 14:25:40 dpavlin r125
/cpan/0.05: CPAN release 0.05
2006-05-03 14:24:56 dpavlin r124
/trunk/MANIFEST: remove foo (how did it get there?)
2006-05-03 14:23:55 dpavlin r123
/trunk/MANIFEST, /trunk/Estraier.pm: prepare for relese 0.05
2006-05-02 10:19:47 dpavlin r122
/trunk/Estraier.pm: fixed warning
2006-04-17 10:38:17 dpavlin r121
/trunk/scripts/est-spider: skip directories without --all
2006-04-17 10:34:14 dpavlin r120
/trunk/scripts/est-spider: dump_draft is now triggered by --debug not --verbose
2006-04-17 10:31:11 dpavlin r119
/trunk/scripts/est-spider: added croak_on_error, fixed filename and filetype handling (so indexing now actually store content again), added --all option to index file paths of all files
2006-04-16 23:22:54 dpavlin r118
/trunk/scripts/est-spider: dump statistics at end
2006-03-12 19:43:21 dpavlin r117
/trunk/t/5_Node.t: fix skip count
2006-03-12 18:43:24 dpavlin r116
/trunk/Estraier.pm: implemeted Search::Estraier::Condition set_skip and skip (which requires HyperEstraier 1.1.4)
2006-03-12 18:42:34 dpavlin r115
/trunk/t/5_Node.t: added tests for get_doc
2006-03-12 15:26:32 dpavlin r114
/cpan/0.04: CPAN release 0.04
2006-03-12 15:25:06 dpavlin r113
/trunk/MANIFEST: updated
2006-03-12 15:20:06 dpavlin r112
/trunk/Estraier.pm: version 0.04 ready for CPAN
2006-02-21 15:41:57 dpavlin r111
/trunk/Estraier.pm: store all values from _set_info in $self->{inform}
2006-02-21 15:40:54 dpavlin r110
/trunk/t/5_Node.t: support 0 sa word_num (if HyperEstraier didn't have time to sync to disk)
2006-02-20 21:21:04 dpavlin r109
/trunk/t/5_Node.t: fix test if no test2 exists
2006-02-19 17:13:57 dpavlin r108
/trunk/Estraier.pm: fix typo
2006-02-19 17:01:49 dpavlin r107
/trunk/t/5_Node.t, /trunk/Estraier.pm: added node methods admins, guests and links, set_link now refresh info
2006-02-19 14:26:21 dpavlin r106
/trunk/t/5_Node.t: another fix for last few tests
2006-02-19 13:50:50 dpavlin r105
/trunk/t/5_Node.t: fix tests without node test1
2006-02-19 13:50:36 dpavlin r104
/trunk/Makefile.PL: make ChangeLog from svk
2006-01-28 20:44:15 dpavlin r103
/trunk/Estraier.pm, /trunk/t/5_Node.t: uri_to_id doesn't croak, even when used with croak_on_error
2006-01-28 19:46:20 dpavlin r102
/trunk/scripts/example_searcher.pl, /trunk/Estraier.pm, /trunk/scripts/example_indexer.pl: more documentation update
2006-01-28 19:43:23 dpavlin r101
/trunk/scripts/example_indexer.pl, /trunk/scripts/example_searcher.pl: updated example scripts
2006-01-28 19:41:59 dpavlin r100
/trunk/Estraier.pm, /trunk/t/5_Node.t: added $res->hits to get number of hits from estmaster hints
2006-01-28 19:19:25 dpavlin r99
/trunk/t/1_Document.t: removed debugging output
2006-01-28 19:18:13 dpavlin r98
/trunk/Estraier.pm, /trunk/t/2_Condition.t: improved $cond->set_options to support one or more arguments and reset options on each call, e.g. $cond->set_options('SURE') or $cond->set_options(qw/SURE NOIDF/)
2006-01-28 18:19:47 dpavlin r97
/trunk/Estraier.pm, /trunk/t/1_Document.t: another fix for empty values
2006-01-28 17:58:22 dpavlin r96
/trunk/Estraier.pm: fix for 0 values
2006-01-28 17:55:48 dpavlin r95
/trunk/t/1_Document.t: test handling of attributes with value 0
2006-01-28 17:38:00 dpavlin r94
/trunk/t/5_Node.t: cleanup test
2006-01-28 16:43:45 dpavlin r93
/trunk/Estraier.pm: Hyper Estraier 1.0.6 doesn't like attributes with no value (undef in perl), so we skip them in dump_draft
2006-01-26 15:29:20 dpavlin r92
/trunk/t/5_Node.t: fix number of tests skipped if test node is missing
2006-01-26 01:53:58 dpavlin r91
/trunk/t/5_Node.t, /trunk/Estraier.pm: added hints to return all hints from server
2006-01-26 01:53:29 dpavlin r90
/trunk/scripts/est-spider: created separate filter_to_pages sub, added text/postscript support via pstotext
2006-01-25 23:38:57 dpavlin r89
/trunk/scripts/est-spider: removed dependency on (optional in the first place) native HyperEstraier module
2006-01-21 18:25:09 dpavlin r88
/trunk/scripts/estcp.pl: fix URL extraction, make it less chatty (without -d flag)
2006-01-21 17:37:07 dpavlin r87
/trunk/scripts/estcp-mt.pl: fixed node URL extraction, put -1 marker on queue at end so that threads will finish after all documents are processed
2006-01-19 14:33:33 dpavlin r86
/trunk/scripts/estcp-mt.pl: multi-threaded version of estcp
2006-01-17 15:00:50 dpavlin r85
/trunk/scripts/estcp.pl: create desintaion node if it doesn't exist
2006-01-17 11:43:38 dpavlin r84
/trunk/scripts/estcp.pl: don't parse draft into document and back
2006-01-17 00:41:18 dpavlin r83
/trunk/scripts/estcp.pl: fixed to stop coping and prevent cumulation of results
2006-01-17 00:17:50 dpavlin r82
/trunk/scripts/estcp.pl: much better output of progress (requires Time::HiRes)
2006-01-17 00:03:45 dpavlin r81
/trunk/Estraier.pm: allow null (undef in perl) values. Hyper Estraier seems to store them, so we should also support them.
2006-01-16 23:08:07 dpavlin r80
/trunk/scripts/estcp.pl: copy Hyper Estraier index from one node to another
2006-01-16 21:47:21 dpavlin r79
/trunk/t/5_Node.t: test fix
2006-01-16 21:42:09 dpavlin r78
/trunk/Estraier.pm, /trunk/t/5_Node.t: added croak_on_error
2006-01-16 21:34:14 dpavlin r77
/trunk/scripts/est-spider, /trunk/Estraier.pm: fix warning if called without $node->set_auth (anonymous access)
2006-01-16 21:19:44 dpavlin r76
/trunk/Estraier.pm: return $node->{status} and $node->{status_message} if request wasn't succesful
2006-01-16 21:18:50 dpavlin r75
/cpan/0.03: CPAN release 0.03
2006-01-09 15:28:24 dpavlin r74
/trunk/Estraier.pm: 0.03 final
2006-01-09 15:26:50 dpavlin r73
/trunk/t/5_Node.t: fix to work without Hyper Estraier server installed
2006-01-09 15:22:43 dpavlin r72
/trunk/t/5_Node.t: tests now check if nodes 'test1' and 'test2' exists, and it they doesn't skip tests which connect to HyperEstraier
2006-01-09 15:22:05 dpavlin r71
/trunk/Estraier.pm: fix set_link content-type
2006-01-08 16:50:34 dpavlin r70
/cpan, /cpan/0.03_1: 0.03_1
2006-01-08 16:49:53 dpavlin r69
/trunk/MANIFEST, /trunk/Estraier.pm: 0.03_1 on the way to CPAN
2006-01-08 00:13:09 dpavlin r68
/trunk/scripts/example_searcher.pl, /trunk/Estraier.pm, /trunk/scripts/example_indexer.pl: two simple examples included under synopsis in documentation
2006-01-07 23:50:51 dpavlin r67
/trunk/MANIFEST: add META.yml
2006-01-07 23:48:59 dpavlin r66
/trunk/MANIFEST: added example scripts
2006-01-07 23:48:16 dpavlin r65
/trunk/Estraier.pm: add optional node parametar to new Search::Estraier::Node
2006-01-07 23:46:10 dpavlin r64
/trunk/scripts/cpanest, /trunk/scripts/est-spider, /trunk/scripts: added example scripts to crawl filesystem and index cpan
2006-01-07 16:19:31 dpavlin r63
/trunk/Estraier.pm: fix warning
2006-01-07 02:40:57 dpavlin r62
/trunk/Estraier.pm: requre just uri for ResultDocument, all other parametars are optional
2006-01-07 01:21:28 dpavlin r61
/trunk/Estraier.pm: transfer depth to cond_to_query
2006-01-07 00:00:15 dpavlin r60
/trunk/t/1_Document.t, /trunk/Estraier.pm: added few checks to better handle empty documents, array return is not enforced any more.
2006-01-06 23:29:58 dpavlin r59
/trunk/Makefile.PL, /trunk/Estraier.pm: replaced my broken socket code with LWP::UserAgent (as should I really done from beginning)
2006-01-06 21:05:05 dpavlin r58
/trunk/Estraier.pm: fix _set_info size (multiple nls)
2006-01-06 20:58:26 dpavlin r57
/trunk/Estraier.pm, /trunk/t/5_Node.t: added set_link (not working?) and moved debug to option
2006-01-06 20:45:48 dpavlin r56
/trunk/Estraier.pm, /trunk/t/5_Node.t: added set_user
2006-01-06 20:39:58 dpavlin r55
/trunk/t/5_Node.t, /trunk/Estraier.pm: move set_info to private _set_info, added set_snippet_width
2006-01-06 18:35:53 dpavlin r54
/trunk/Makefile.PL: don't leave Makefile.old after make clean (we really need to remove Makefile before clean target does mv Makefile Makefile.old)
2006-01-06 14:39:45 dpavlin r53
/trunk/t/4_NodeResult.t, /trunk/t/5_Node.t, /trunk/Estraier.pm: search work (Content-type and attributes fix), NodeResult->doc_num now return proper number of hits (and not index of last one which isi doc_num - 1)
2006-01-06 14:10:29 dpavlin r52
/trunk/Estraier.pm: search which works
2006-01-06 13:19:50 dpavlin r51
/trunk/t/5_Node.t, /trunk/Estraier.pm: cond_to_query needed for search (which is under construction)
2006-01-06 12:48:14 dpavlin r50
/trunk/t/5_Node.t, /trunk/Estraier.pm: added uri_escape where needed, fix edit_doc test
2006-01-06 12:40:23 dpavlin r49
/trunk/t/5_Node.t, /trunk/Makefile.PL, /trunk/Estraier.pm: added get_doc_attr and get_doc_attr_by_uri by (again) extending _fetch_doc, fixed etch_doc (typo in name)
2006-01-06 02:07:10 dpavlin r48
/trunk/Estraier.pm, /trunk/t/5_Node.t: added name, label, doc_num, word_num and size properties for which I had to implement set_info.
2006-01-06 01:51:28 dpavlin r47
/trunk/Estraier.pm: more checks, but still no hope for edit_doc
2006-01-06 01:40:04 dpavlin r46
/trunk/t/5_Node.t: small tweaks and corrections to tests
2006-01-06 01:36:09 dpavlin r45
/trunk/t/5_Node.t, /trunk/Estraier.pm: uri_to_id and important fix for _fetch_doc
2006-01-06 01:12:10 dpavlin r44
/trunk/t/5_Node.t, /trunk/Estraier.pm: added etch_doc and etch_doc_by_uri by extending _fetch_doc
2006-01-06 00:04:28 dpavlin r43
/trunk/Estraier.pm, /trunk/t/5_Node.t: better error messages, added get_doc and get_doc_by_uri
2006-01-05 23:38:32 dpavlin r42
/trunk/Estraier.pm, /trunk/t/5_Node.t: edit_doc, add massive amount of vertical whitespace to make source more readable
2006-01-05 23:32:31 dpavlin r41
/trunk/Estraier.pm, /trunk/t/5_Node.t: out_doc, out_doc_by_uri
2006-01-05 23:00:22 dpavlin r40
/trunk/t/5_Node.t, /trunk/Estraier.pm: a lot of tuning and fixes, and put_doc which works!
2006-01-05 22:36:10 dpavlin r39
/trunk/Estraier.pm, /trunk/t/5_Node.t: added optional parametar to Node to turn on debugging (which isn't documented and probably won't be because it spits output using warn)
2006-01-05 22:27:03 dpavlin r38
/trunk/Estraier.pm, /trunk/t/5_Node.t: more fun with http
2006-01-05 22:16:21 dpavlin r37
/trunk/Estraier.pm, /trunk/t/5_Node.t: much better error messages
2006-01-05 21:51:54 dpavlin r36
/trunk/Makefile.PL, /trunk/Estraier.pm: base64 encode basic auth data. Oh, why didn't I just use LWP?
2006-01-05 21:51:29 dpavlin r35
/trunk/t/5_Node.t: shuttle_url test which fails (and it shouldn't)
2006-01-05 21:09:53 dpavlin r34
/trunk/MANIFEST: fix tests
2006-01-05 17:54:18 dpavlin r33
/trunk/Makefile.PL, /trunk/Estraier.pm: implemetation of shuttle_url (using IO::Socket::INET instead of LWP for speed)
2006-01-05 15:38:34 dpavlin r32
/trunk/Estraier.pm, /trunk/t/5_Node.t: status
2006-01-05 15:36:25 dpavlin r31
/trunk/t/5_Node.t, /trunk/Estraier.pm: set_auth
2006-01-05 15:33:48 dpavlin r30
/trunk/Estraier.pm, /trunk/t/5_Node.t: set_timeout
2006-01-05 15:30:35 dpavlin r29
/trunk/t/5_Node.t, /trunk/Estraier.pm: set_url, set_proxy
2006-01-05 15:21:41 dpavlin r28
/trunk/t/5_Node.t: test
2006-01-05 15:21:29 dpavlin r27
/trunk/Estraier.pm: begin work on Search::Estraier::Node
2006-01-05 15:05:58 dpavlin r26
/trunk/t/1_Document.t: cleanup
2006-01-05 15:01:56 dpavlin r25
/trunk/Estraier.pm, /trunk/t/4_NodeResult.t: implemented Search::Estraier::NodeResult
2006-01-05 14:33:05 dpavlin r24
/trunk/Estraier.pm: cleanup
2006-01-05 14:30:42 dpavlin r23
/trunk/t/3_ResultDocument.t, /trunk/Estraier.pm: finished ResultDocument
2006-01-05 13:55:55 dpavlin r22
/trunk/t/1_document.t, /trunk/t/1_Document.t,/trunk/t/2_condition.t, /trunk/t/2_Condition.t: change case of test files
2006-01-05 13:55:17 dpavlin r21
/trunk/t/3_ResultDocument.t: test
2006-01-05 13:55:06 dpavlin r20
/trunk/Estraier.pm: begin work on Search::HyperEstraier::ResultDocument
2006-01-04 23:10:48 dpavlin r19
/trunk/t/2_condition.t, /trunk/Estraier.pm: finished Condition adding orders, attrs, max and options
2006-01-04 22:48:29 dpavlin r18
/trunk/Estraier.pm, /trunk/t/2_condition.t: phrase
2006-01-04 22:46:16 dpavlin r17
/trunk/t/2_condition.t: missing test for set_options (in last commit)
2006-01-04 22:43:24 dpavlin r16
/trunk/Estraier.pm, /trunk/t/2_condition.t: Search::Estraier::Condition, new, set_phrase, set_order, set_max
2006-01-04 22:24:57 dpavlin r15
/trunk/MANIFEST, /trunk/Estraier.pm, /trunk/t/2_condition.t: begin work on Search::Estraier::Condition, _s moved to Search::Estraier which other modules inherit
2006-01-04 21:51:01 dpavlin r14
/trunk/t/1_document.t, /trunk/Estraier.pm: new Document now accepts draft.
2006-01-04 19:37:38 dpavlin r13
/trunk/Estraier.pm: added implementation of dump_draft
2006-01-04 19:28:30 dpavlin r12
/trunk/t/1_document.t, /trunk/Estraier.pm: added cat_texts
2006-01-04 15:50:08 dpavlin r11
/trunk/t/1_document.t, /trunk/Estraier.pm: fix texts
2006-01-04 15:48:00 dpavlin r10
/trunk/t/1_document.t, /trunk/Estraier.pm: demonstrate bug with texts
2006-01-04 15:28:39 dpavlin r9
/trunk/t/1_document.t, /trunk/Makefile.PL, /trunk/Estraier.pm: added texts, fixed add_attr to delete atributes, tests now pass
2006-01-04 15:04:58 dpavlin r8
/trunk/Estraier.pm, /trunk/t/1_document.t: added $doc->attr('name'), fixed $doc->add_attr('name','value');
2006-01-04 14:57:27 dpavlin r7
/trunk/Estraier.pm: added attr_names
2006-01-04 14:48:11 dpavlin r6
/trunk/Estraier.pm, /trunk/t/1_document.t: added id, documentation, rename of vars in test
2006-01-04 14:38:35 dpavlin r5
/trunk/Estraier.pm: add_text, add_hidden_text
2006-01-04 13:33:07 dpavlin r4
/trunk/Estraier.pm, /trunk/t/1_document.t: added $doc->delete and internal _s
2006-01-04 13:13:06 dpavlin r3
/trunk/MANIFEST, /trunk/Makefile.PL: fix for path modifications
2006-01-04 13:11:43 dpavlin r2
/trunk/t, /trunk/t/1_document.t, /trunk/MANIFEST, /trunk/Makefile.PL, /trunk/t/99_pod.t, /trunk/Estraier.pm: begin work on pure perl implementation of HyperEstraier module
2006-01-04 13:11:32 dpavlin r1
/trunk: Directory for svk import.