Download this archive and install it. The module is also available on CPAN, so you can install it from the CPAN shell instead.
jsFind-0.06.tar.gz (34 Kb)
The latest source is always available from the Subversion repository.
jsFind - generate index for full text search engine in JavaScript
    use jsFind;

    my $t = new jsFind(B => 4);
    my $f = 1;
    foreach my $k (qw{minima ut dolorem sapiente voluptatem}) {
        $t->B_search(
            Key    => $k,
            Data   => {
                "path" => { t => "word $k", f => $f },
            },
            Insert => 1,
            Append => 1,
        );
    }
This module can be used to create index files for jsFind, a powerful tool for adding a search engine to a CDROM archive or catalog without requiring the user to install anything.
The main differences between this module and the scripts delivered with jsFind are:
search.html page
You can also examine the examples which come as tests with this module, for example t/04words.t or t/10homer.t.
The jsFind search engine was written by Shawn Garbett from eLucid Software. The search engine itself is a small piece of JavaScript (1.2 with level 2 DOM). It is easily customizable to fit into an existing set of HTML pages. This JavaScript searches an XML index dataset for the appropriate links, and can filter and sort the results.
The JavaScript code distributed with this module is based on version 0.0.3, which was current when development of this module started. Various changes were made to the JavaScript code to fix bugs, add features and remove warnings. For a complete list, see the Changes file which comes with the distribution.
This module has been tested using html/test.html with the following browsers:
using DOM 2 document.implementation.createDocument
using ActiveX Microsoft.XMLDOM or MSXML2.DOMDocument
using DOM 2 document.implementation.createDocument
using experimental iframe implementation which is much slower than other methods.
If searching doesn't work for your combination of operating system and browser, please open the html/test.html file and wait a while. It will search the sample file included with the distribution and report the results. Reports which include the test's debugging output are welcome.
jsFind is the module implementing the methods which you, the user, are going to use to create indexes.
Create a new tree. Arguments are B, which is the maximum number of keys in each node, and an optional Root node. Each root node may have child nodes.
All nodes are objects from jsFind::Node.
my $t = new jsFind(B => 4);
Search, insert, append or replace data in B-Tree
    $t->B_search(
        Key    => 'key value',
        Data   => {
            "path" => {
                "t" => "title of document",
                "f" => 99,
            },
        },
        Insert => 1,
        Append => 1,
    );
Semantics:
If the key is not found, insert it iff the Insert argument is present.
If the key is found, replace the existing data iff the Replace argument is present, or add the new datum to the existing data iff the Append argument is present.
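The rules above can be illustrated without the module itself. The following is only a sketch: upsert is a hypothetical helper, a plain hash stands in for the B-Tree, and giving Replace precedence when both Replace and Append are passed is an assumption, since the text above does not say which one wins.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of jsFind: applies the Insert / Replace /
# Append rules to a plain hash that stands in for the B-Tree.
sub upsert {
    my ($index, %arg) = @_;
    my ($key, $data) = @arg{qw(Key Data)};

    if (!exists $index->{$key}) {
        # Key not found: store it only when Insert is requested.
        $index->{$key} = { %$data } if $arg{Insert};
    }
    elsif ($arg{Replace}) {
        # Key found: Replace overwrites the stored data
        # (assumed to win over Append when both are given).
        $index->{$key} = { %$data };
    }
    elsif ($arg{Append}) {
        # Key found: Append adds the new entries to the stored data.
        $index->{$key}{$_} = $data->{$_} for keys %$data;
    }
    return $index->{$key};
}

my %index;
upsert(\%index, Key => 'homer', Data => { 'a.html' => { t => 'A', f => 1 } },
    Insert => 1, Append => 1);
upsert(\%index, Key => 'homer', Data => { 'b.html' => { t => 'B', f => 2 } },
    Insert => 1, Append => 1);

# Both documents are now listed under the same key.
print join(',', sort keys %{ $index{homer} }), "\n";
```

Passing both Insert and Append, as the module's own examples do, means a key is created on first sight and extended on every later call.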
Return B (maximum number of keys)
my $max_size = $t->B;
Returns root node
my $root = $t->root;
Returns true if the node is overfull
if ($node->node_overfull) { something }
Returns your tree as a formatted string.
my $text = $root->to_string;
Mostly useful for debugging, as the output leaves much to be desired.
Create Graphviz graph of your tree
my $dot_graph = $root->to_dot;
Create xml index files for jsFind. This should be called after your B-Tree has been filled with data.
    $root->to_jsfind(
        dir            => '/full/path/to/index/dir/',
        data_codepage  => 'ISO-8859-2',
        index_codepage => 'UTF-8',
        output_filter  => sub {
            my $t = shift || return;
            $t =~ s/è/e/;
            return $t;    # assumption: the filter hands back the modified text
        },
    );
All options except dir are optional.
Returns number of nodes in created tree.
Options:
dir: Full path to the directory for the index (which will be created if needed).
data_codepage: If your input data isn't in ISO-8859-1 encoding, you will have to specify this option.
index_codepage: If your index encoding is not UTF-8, use this option. If you are not using the supplied JavaScript search code, or your browser is terribly broken and thinks that the index shouldn't be in UTF-8 encoding, use this option to specify the encoding for the created XML index.
output_filter: This is just a draft of documentation for an option which is not implemented!
Code ref to a sub which can modify the resulting XML file for a node. The data will be encoded in index_codepage, and you have to take care not to break the XML structure. Calling xmllint on your resulting index (like t/90xmllint.t does in this distribution) is a good idea after using this option.
This option is also the right place to plug in an unaccenting function using Text::Unaccent.
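As an illustration of what such an unaccenting filter might look like, here is a sketch that swaps Text::Unaccent for the core Unicode::Normalize module. It is an assumption that the filter would receive a decoded character string and return the modified text; since the option is not implemented, the real calling convention may differ.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical filter: strips accents by decomposing each character and
# dropping the combining marks. Assumes a decoded character string.
my $unaccent = sub {
    my $t = shift;
    return unless defined $t;
    $t = NFD($t);          # "e with grave" becomes "e" + combining grave
    $t =~ s/\p{Mn}+//g;    # remove the combining (nonspacing) marks
    return $t;
};

print $unaccent->("cr\x{e8}me"), "\n";
```

Only combining marks are removed, so XML markup characters pass through unchanged.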
This is an internal function to recode the charset.
It will also try to decode entities in the data using HTML::Entities.
Each node has k key-data pairs, with B <= k <= 2B, and each has k+1 subnodes, which might be null.
The node is a blessed reference to a list with three elements:
($keylist, $datalist, $subnodelist)
each of which is a reference to a list.
The null node is represented by a blessed reference to an empty list.
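To make the layout concrete, here is a small sketch of the same representation. Sketch::Node is a hypothetical package written for this illustration, not the real jsFind::Node, and using undef for missing subnodes is an assumption.

```perl
use strict;
use warnings;

package Sketch::Node;    # hypothetical name, not the real jsFind::Node

sub new {
    my ($class, @lists) = @_;
    # With no arguments this yields the null node: a blessed empty list.
    return bless [@lists], $class;
}

sub is_empty { my $self = shift; return !@$self }

sub size {
    my $self = shift;
    return $self->is_empty ? 0 : scalar @{ $self->[0] };
}

package main;

my $leaf = Sketch::Node->new(
    [ 'dolorem', 'minima' ],       # $keylist: k keys
    [ { f => 1 }, { f => 2 } ],    # $datalist: one datum per key
    [ undef, undef, undef ],       # $subnodelist: k+1 slots (undef assumed for "null")
);
my $null = Sketch::Node->new;

printf "%d keys, %d subnode slots\n", $leaf->size, scalar @{ $leaf->[2] };
```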
Create a new node
my $node = new jsFind::Node ($keylist, $datalist, $subnodelist);
You can also omit the argument list to create an empty node.
my $empty_node = new jsFind::Node;
Locate key in node using linear search. This should probably be replaced by binary search for better performance.
my ($found, $index) = $node->locate_key($key, $cmp_coderef);
Argument $cmp_coderef is an optional reference to a custom comparison operator.
Returns (1, $index) if $key[$index] eq $key.
Returns (0, $index) if key could be found in $subnode[$index].
In scalar context, just returns 1 or 0.
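The return convention described above can be sketched as a stand-alone function. locate_key_sketch is a hypothetical name; it works on a plain sorted array of keys rather than on a node object, and it assumes the comparison callback behaves like cmp (negative, zero or positive).

```perl
use strict;
use warnings;

# Hypothetical stand-alone lookup; $keys is a reference to a sorted array.
sub locate_key_sketch {
    my ($keys, $key, $cmp) = @_;
    $cmp ||= sub { $_[0] cmp $_[1] };    # default: string comparison

    my $i = 0;
    for my $k (@$keys) {
        my $c = $cmp->($key, $k);
        return wantarray ? (1, $i) : 1 if $c == 0;    # exact match at $i
        last if $c < 0;                               # $key sorts before $k
        $i++;
    }
    return wantarray ? (0, $i) : 0;    # not here: look in subnode $i
}

my @keys = qw(dolorem minima sapiente);
my ($found, $index) = locate_key_sketch(\@keys, 'minima');
my ($f2, $i2)       = locate_key_sketch(\@keys, 'nemo');
print "$found $index / $f2 $i2\n";    # prints "1 1 / 0 2"
```

Because the keys are sorted, the loop could be replaced by a binary search without changing the return values.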
Creates new empty node
    $node = $root->emptynode;
    $new_node = $node->emptynode;
Test if node is empty
if ($node->is_empty) { something }
Return the $i-th key from the node
my $key = $node->key($i);
Return the $i-th data from the node
my $data = $node->data($i);
Set the key-data pair for the $i-th element in the node
    $node->kdp_replace($i, "key value" => {
        "data key 1" => "data value 1",
        "data key 2" => "data value 2",
    });
Insert key/data pair in tree
$node->kdp_insert("key value" => "data value");
No return value.
Adds new data keys and values to the $i-th element in the node
    $node->kdp_append($i, "key value" => {
        "added data key" => "added data value",
    });
Set new or return existing subnode
    # return 4th subnode
    my $my_node = $node->subnode(4);
    $node->subnode(5, $my_node);
Test if node is leaf
if ($node->is_leaf) { something }
Return number of keys in the node
my $nr = $node->size;
Split node into two halves so that keys 0 .. $n-1 are in one node and keys $n+1 ... $size are in the other.
my ($left_node, $right_node, $kdp) = $node->halves($n);
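The split can be pictured with plain arrays. In this sketch halves_sketch is a hypothetical helper, subnodes are left out for brevity, and treating $kdp as the key-data pair at position $n is an assumption based on the key ranges given above.

```perl
use strict;
use warnings;

# Hypothetical helper working on plain arrays instead of node objects.
sub halves_sketch {
    my ($keys, $data, $n) = @_;
    my $last = $#$keys;

    my $left = {
        keys => [ @{$keys}[ 0 .. $n - 1 ] ],
        data => [ @{$data}[ 0 .. $n - 1 ] ],
    };
    my $right = {
        keys => [ @{$keys}[ $n + 1 .. $last ] ],
        data => [ @{$data}[ $n + 1 .. $last ] ],
    };
    # Assumed: the pair at position $n separates the two halves.
    my $kdp = [ $keys->[$n], $data->[$n] ];

    return ($left, $right, $kdp);
}

my ($l, $r, $kdp) = halves_sketch([qw(a b c d e)], [ 1 .. 5 ], 2);
print "@{ $l->{keys} } | $kdp->[0] | @{ $r->{keys} }\n";    # prints "a b | c | d e"
```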
Dumps tree as string
my $str = $root->to_string;
Recursively walk the nodes of the tree
Escape <, >, & and " to produce valid XML
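A minimal sketch of this kind of escaping follows. xml_escape_sketch is a hypothetical name, and the module's own routine may differ in detail.

```perl
use strict;
use warnings;

my %entity = (
    '<' => '&lt;',
    '>' => '&gt;',
    '&' => '&amp;',
    '"' => '&quot;',
);

# Hypothetical helper: one pass over the text, so an "&" produced by an
# earlier replacement is never escaped a second time.
sub xml_escape_sketch {
    my $text = shift;
    $text =~ s/([<>&"])/$entity{$1}/g;
    return $text;
}

print xml_escape_sketch('a < b && "c" > d'), "\n";
```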
Convert number to base x (used for jsFind index filenames).
my $n = $tree->base_x(50);
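The conversion itself is ordinary repeated division. The sketch below uses a hypothetical base_x_sketch function, and its default alphabet of digits followed by lowercase letters (base 36) is an assumption taken from the changelog note about removing capital letters; the module's actual digit set may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-alone conversion; the default alphabet is an assumption.
sub base_x_sketch {
    my ($n, $digits) = @_;
    $digits ||= [ 0 .. 9, 'a' .. 'z' ];
    my $base = @$digits;

    my $out = '';
    do {
        $out = $digits->[ $n % $base ] . $out;    # prepend the lowest digit
        $n   = int($n / $base);
    } while ($n > 0);
    return $out;
}

print base_x_sketch(50), "\n";    # 50 = 1*36 + 14, printed as "1e"
```

Lowercase-only names stay unambiguous on file systems that do not distinguish case.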
Create jsFind xml files
my $nr=$tree->to_jsfind('/path/to/index','0');
Returns number of elements created
jsFind web site http://www.elucidsoft.net/projects/jsfind/
B-Trees in perl web site http://perl.plover.com/BTree/
This module web site http://www.rot13.org/~dpavlin/jsFind.html
Mark-Jason Dominus <mjd@pobox.com> wrote BTree.pm, which was the base for this module
Shawn P. Garbett <shawn@elucidsoft.net> wrote jsFind
Dobrica Pavlinusic <dpavlin@rot13.org> wrote this module
Copyright (C) 2004 by Dobrica Pavlinusic
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
2008-02-24 20:04:59 dpavlin r43
/trunk/t/04words.t: fix tests
2005-10-19 23:54:21 dpavlin r42
/trunk/html/js/usage.js: cludge for Opera (no header update, ugly, should have own id)
2005-10-19 23:06:03 dpavlin r41
/trunk/html/search.html: fixes dreaded "Operation aborted" message on Internet Explorer by playing nice and not trying to modify any elements which are not yet drawn. So, we fire search onLoad, as we should (to catch results and debug div, for example).
2005-10-19 10:25:46 dpavlin r40
/trunk/t/90xmllint.t: better xmllint testing
2004-12-19 23:26:23 dpavlin r39
/trunk/jsFind.pm: support for older Export which doesn't export 'import'. This should make jsFind functional on perl 5.8.2 (on Darwin for example)
2004-10-31 01:34:01 dpavlin r38
/trunk/MANIFEST: update of MANIFEST before release
2004-10-30 21:48:31 dpavlin r37
/trunk/t/10homer.t: homer test can now index any text file supplied as first (and only) argument in command line. This is useful to generate test data from other sources. I don't recommend input files which are not plain 7-bit ASCII, because generated JavaScript array might have wrong encoding for 8-bit characters. This is serious problem. However, since JavaScript comparison and sort order are locale dependent, it's much easier to use something like Text::Unaccent on input data than to fix sort/comparison order (which could also be done, see my js_locale project)
2004-10-30 20:50:39 dpavlin r36
/trunk/jsFind.pm, /trunk/t/06base_x.t, /trunk/t/06base62.t, /trunk/html/js/search.js, /trunk/t/10homer.t: remove all capital letters from base62 encoding which made it base 36 I guess. They are not supported under Windows (because filesystem is case preserving, you can burn files with capital and lower letters, but browser will read wrong one)
2004-10-24 11:13:22 dpavlin r35
/trunk/t/10homer.t, /trunk/t/04words.t, /trunk/jsFind.pm, /trunk/t/03insert.t, /trunk/t/05entities.t: new version 0.06 with API change on to_jsfind. It also documents (still unimplemented) output_filter option.
2004-10-10 05:11:29 dpavlin r34
/trunk/Changes, /trunk/MANIFEST.SKIP, /trunk/README, /trunk/MANIFEST, /trunk/Makefile.PL: last changes before packaging 0.05
2004-10-10 05:10:25 dpavlin r33
/trunk/jsFind.pm: Version 0.05: much better documentation, no API change
2004-10-10 05:07:37 dpavlin r32
/trunk/html/search.html: create debug box on bottom of page
2004-10-10 04:53:22 dpavlin r31
/trunk/html/js/search.js, /trunk/html/js/usage.js: experimental iframe implementation (mostly for Opera 7.54 without Java)
2004-10-10 01:19:36 dpavlin r30
/trunk/html/search.html: make XHTML 1.0 strict
2004-10-08 12:22:18 dpavlin r29
/trunk/html/test.html: generate color report, simplify code
2004-10-07 22:47:39 dpavlin r28
/trunk/t/10homer.t: create additional homer_freq.txt debug file
2004-10-07 22:43:49 dpavlin r27
/trunk/html/test.html: re-wrote test to create output with document.createElement, much better output, print just failed tests and some statistics (more verbose output of correct tests can be turned on by uncommenting one line)
2004-10-07 22:41:16 dpavlin r26
/trunk/html/js/search.js: create debug with appendChild, great speedup for test
2004-10-07 17:30:03 dpavlin r25
/trunk/html/test.html, /trunk/html/search.html, /trunk/t/10homer.t: create elements in test.html (instead of updating innerHTML), 10homer.t will now split document in lines instead of paragraphs.
2004-10-07 16:26:31 dpavlin r24
/trunk/Makefile.PL, /trunk/MANIFEST: fix MANIFEST, better clean
2004-10-07 16:21:54 dpavlin r23
/trunk/t/04words.t, /trunk/t/10homer.pl, /trunk/t/03insert.t, /trunk/t/05entities.t, /trunk/t/10homer.t: convert all print STDERR to diag, rename homer test to end with .t (woops), create homer_words.txt and homer_text.txt, preserve order when inserting into index (it doesn't really matter for index, but it's nice to have debugging output which is correct)
2004-10-06 15:39:56 dpavlin r22
/trunk/MANIFEST: added Homer to manifest
2004-10-06 15:33:33 dpavlin r21
/trunk/html/test.html, /trunk/t/10homer.pl, /trunk/t/homer.txt, /trunk/t/10ulyss.pl: added Homer's The Odyssey as test data
2004-10-04 19:51:05 dpavlin r20
/trunk/html/test.html, /trunk/t/10ulyss.pl: start of tests for jsFind
2004-10-03 21:26:37 dpavlin r19
/trunk/html/js/usage.js: added try{} around header.replaceData which doesn't exists in IE5.5, better scope header variable, Robert Avilov re-wrote form handler (thanks).
2004-10-03 21:24:57 dpavlin r18
/trunk/html/search.html, /trunk/html/js/search.js: re-wrote debug function to prevent Internet Explorer from returning "operation aborted", move xmldoc to outer scope to prevent garbage collector dereferencing it before XML loads (SeaMonkey/FireFox).
2004-09-13 14:36:06 dpavlin r17
/trunk/t/99pod.t: check if Test::Pod exists before testing
2004-09-05 18:00:46 dpavlin r16
/trunk/MANIFEST, /trunk/t/06base62.t: added forgot base62 test
2004-09-05 17:57:21 dpavlin r15
/trunk/Makefile.PL, /trunk/t/04words.t, /trunk/jsFind.pm: version 0.04: fix bug when creating jsFind index files without first encoding numbers in base62
2004-08-28 15:19:22 dpavlin r14
/trunk/Makefile.PL, /trunk/jsFind.pm, /trunk/README: final touches before first release to CPAN
2004-08-28 14:37:13 dpavlin r13
/trunk/MANIFEST: updated manifest
2004-08-28 14:31:58 dpavlin r12
/trunk/Makefile.PL, /trunk/jsFind.pm: documentation improvements
2004-07-26 20:30:12 dpavlin r11
/trunk/t/05entities.t, /trunk/jsFind.pm: to_jsfind will try to decode entities from data, and recode then to target encoding (UTF-8 by default)
2004-07-26 20:17:57 dpavlin r10
/trunk/t/90xmllint.t: xmllint test for produces jsFind index files
2004-07-21 23:37:49 dpavlin r9
/trunk/jsFind.pm: Version 0.02: API extension: to_jsfind now accepts also data and xml encoding as optional parametars
2004-07-21 15:44:15 dpavlin r8
/trunk/jsFind.pm: more fixes
2004-07-21 15:34:03 dpavlin r7
/trunk/jsFind.pm: documentaton fix and better info message
2004-07-20 17:59:47 dpavlin r6
/trunk/html/search.html, /trunk/html/js/usage.js: cleanup and fix
2004-07-20 17:47:30 dpavlin r5
/trunk/jsFind.pm, /trunk/MANIFEST: B_search documentation, updated MANIFEST
2004-07-20 17:08:06 dpavlin r4
/trunk/html/js/getargs.js, /trunk/t/04words.t, /trunk/html/search.html, /trunk/t/03insert.t, /trunk/html/js/usage.js, /trunk/html/js/search.js: support for searching more than one index from same page (using index_name variable which is actually a name of directory in which index is located)
2004-07-20 17:07:20 dpavlin r3
/trunk/Makefile.PL: make html documentation with "make html"
2004-07-11 21:15:44 dpavlin r2
/trunk/html/js/search.js: print "Noting Found" in all cases (especially on W2K, don't know why)
2004-07-11 20:18:25 dpavlin r1
/trunk/t, /trunk/html, /trunk/MANIFEST, /trunk/t/04words.t, /trunk/Changes, /trunk/t/02btree.t, /trunk/t/99pod.t, /trunk/t/03insert.t, /trunk/html/search.html, /trunk/html/js, /trunk/t/01load.t, /trunk/Makefile.PL, /trunk/html/js/getargs.js, /trunk/jsFind.pm, /trunk/README, /trunk/html/js/usage.js, /trunk/html/js/search.js, /trunk: initial import into subversion of version 0.1