
Search Engines Archives

June 26, 2007

Apache Lucene

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform applications.

Apache Lucene is an open source project available for free download.

Features

Lucene offers powerful features through a simple API:

Scalable, High-Performance Indexing

* over 20 MB/minute on a 1.5 GHz Pentium M
* small RAM requirements -- only 1 MB of heap
* incremental indexing as fast as batch indexing
* index size roughly 20-30% of the size of the indexed text
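To make the indexing terms above concrete, here is a minimal sketch of a positional inverted index in Java. It is not Lucene's API: the class `TinyIndex` and its methods are hypothetical, and the tokenizer is a crude stand-in for a real analyzer. Adding a document only appends to the existing postings instead of rebuilding them, which hints at why incremental indexing can be cheap; the stored word positions also make simple phrase queries possible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical positional inverted index; not Lucene's API. */
class TinyIndex {
    // term -> (docId -> positions where the term occurs in that document)
    private final Map<String, TreeMap<Integer, List<Integer>>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Lower-cases and splits on non-letters/digits; a crude stand-in for an analyzer. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Indexes one document; existing postings are only appended to, never rebuilt. */
    int add(String text) {
        int docId = nextDocId++;
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos);
        }
        return docId;
    }

    /** Ids of documents containing the term, in ascending order. */
    List<Integer> docsFor(String term) {
        TreeMap<Integer, List<Integer>> p = postings.get(term.toLowerCase(Locale.ROOT));
        return p == null ? new ArrayList<>() : new ArrayList<>(p.keySet());
    }

    /** Ids of documents in which the phrase's words occur at consecutive positions. */
    List<Integer> phrase(String text) {
        List<String> terms = tokenize(text);
        List<Integer> hits = new ArrayList<>();
        if (terms.isEmpty() || !postings.containsKey(terms.get(0))) return hits;
        for (Map.Entry<Integer, List<Integer>> e : postings.get(terms.get(0)).entrySet()) {
            int doc = e.getKey();
            for (int start : e.getValue()) {
                boolean match = true;
                for (int i = 1; i < terms.size() && match; i++) {
                    TreeMap<Integer, List<Integer>> p = postings.get(terms.get(i));
                    List<Integer> positions = (p == null) ? null : p.get(doc);
                    match = positions != null && positions.contains(start + i);
                }
                if (match) {
                    hits.add(doc);
                    break; // one occurrence per document is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("Apache Lucene is a search library");
        index.add("Egothor is a search engine");
        System.out.println(index.docsFor("search"));       // [0, 1]
        System.out.println(index.phrase("search engine"));  // [1]
    }
}
```

Real engines store postings on disk in compressed segments; the in-memory maps here only illustrate the shape of the data.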

Powerful, Accurate and Efficient Search Algorithms

* ranked searching -- best results returned first
* many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
* fielded searching (e.g., title, author, contents)
* date-range searching
* sorting by any field
* multiple-index searching with merged results
* allows simultaneous update and searching
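Ranked searching means that every matching document receives a relevance score and results come back best first. The sketch below assumes a common TF-IDF variant (square-rooted term frequency times a logarithmic inverse document frequency); Lucene's actual scoring formula differs in its details, and the `TfIdfRanker` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical TF-IDF ranker; Lucene's real scoring formula differs in detail. */
class TfIdfRanker {
    private final List<Map<String, Integer>> termCounts = new ArrayList<>(); // one map per document
    private final Map<String, Integer> docFreq = new HashMap<>();             // term -> #docs containing it

    private static String[] words(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    /** Adds a document; its id is its insertion order, starting at 0. */
    void add(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words(text)) {
            if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
        }
        for (String w : counts.keySet()) docFreq.merge(w, 1, Integer::sum);
        termCounts.add(counts);
    }

    /** Returns ids of documents matching any query term, highest score first. */
    List<Integer> search(String query) {
        int n = termCounts.size();
        double[] scores = new double[n];
        for (String w : words(query)) {
            Integer df = docFreq.get(w);
            if (df == null) continue; // term (or empty token) not in the collection
            double idf = 1.0 + Math.log((double) n / df); // rarer terms weigh more
            for (int d = 0; d < n; d++) {
                int tf = termCounts.get(d).getOrDefault(w, 0);
                if (tf > 0) scores[d] += Math.sqrt(tf) * idf; // damp repeated terms
            }
        }
        List<Integer> hits = new ArrayList<>();
        for (int d = 0; d < n; d++) {
            if (scores[d] > 0) hits.add(d);
        }
        // List.sort is stable, so equal scores keep ascending doc-id order.
        hits.sort(Comparator.comparingDouble((Integer d) -> scores[d]).reversed());
        return hits;
    }

    public static void main(String[] args) {
        TfIdfRanker ranker = new TfIdfRanker();
        ranker.add("lucene search library");
        ranker.add("search engine search engine");
        ranker.add("java library");
        System.out.println(ranker.search("search engine")); // [1, 0]
    }
}
```

Document 1 wins because it contains both query terms, and "engine" is rarer in the collection than "search", so it contributes a larger IDF weight.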

Cross-Platform Solution

* Available as Open Source software under the Apache License, which lets you use Lucene in both commercial and Open Source programs
* 100%-pure Java
* Index-compatible implementations available in other programming languages

Lucene implementations in languages other than Java:

CLucene - Lucene implementation in C++
dotLucene - Lucene implementation in .NET
Lucene4c - Lucene implementation in C
LuceneKit - Lucene implementation in Objective-C (Cocoa/GNUstep support)
Lupy - Lucene implementation in Python (RETIRED)
NLucene - another Lucene implementation in .NET (out of date)
Zend Search - Lucene implementation in the Zend Framework for PHP 5
Plucene - Lucene implementation in Perl
KinoSearch - a new Lucene implementation in Perl
PyLucene - GCJ-compiled version of Java Lucene integrated with Python
MUTIS - Lucene implementation in Delphi
Ferret - Lucene implementation in Ruby

Lucene Project Home Page
http://lucene.apache.org/java/

Download Lucene Search Engine
http://lucene.apache.org/java/docs/developer-resources.html

Lucene Documentation
http://lucene.apache.org/java/docs/api/index.html

Egothor Search Engine

Egothor is an Open Source, high-performance, full-featured text search engine written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform applications. It can be configured as a standalone engine, a metasearcher, or a peer-to-peer hub, and it can also be used as a library by any application that needs full-text search.

Key features of egothor

* Written in Java for cross-platform compatibility.
* New dynamization algorithm for fast index updating.
* Fully 64-bit kernel.
* Queries can be solved in parallel.
* Recognizes the most common file formats: HTML, PDF, PS, and Microsoft's DOC and XLS.
* High-capacity robot (crawler) that respects the robots.txt convention.
* Efficient compression methods: Golomb, Elias-gamma, and block coding.
* Based on the extended Boolean model, which can operate as either the vector model or the Boolean model.
* Universal stemmer that processes any language.
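Posting lists are usually compressed by storing the gaps between sorted document ids, which are small numbers, instead of the ids themselves. As a sketch of one of the codes listed above, the hypothetical `GammaPostings` class applies Elias-gamma coding to such gaps. The bits are kept in a `String` for readability, whereas a real engine would pack them into bytes; Golomb and block coding are not shown. `String.repeat` assumes Java 11 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical Elias-gamma coder for posting-list gaps; bits kept as a '0'/'1' string. */
class GammaPostings {
    /** Appends the Elias-gamma code of n (n >= 1): floor(log2 n) zeros, then n in binary. */
    static void gamma(int n, StringBuilder out) {
        if (n < 1) throw new IllegalArgumentException("Elias-gamma needs n >= 1");
        String bin = Integer.toBinaryString(n); // always starts with '1'
        out.append("0".repeat(bin.length() - 1)).append(bin);
    }

    /** Encodes strictly increasing doc ids (>= 0) as gamma-coded gaps. */
    static String encode(int[] docIds) {
        StringBuilder out = new StringBuilder();
        int prev = -1; // so the first gap is docId + 1, which is >= 1
        for (int id : docIds) {
            if (id <= prev) throw new IllegalArgumentException("ids must be strictly increasing");
            gamma(id - prev, out);
            prev = id;
        }
        return out.toString();
    }

    /** Inverse of encode; assumes well-formed input. */
    static int[] decode(String bits) {
        List<Integer> ids = new ArrayList<>();
        int i = 0;
        int prev = -1;
        while (i < bits.length()) {
            int zeros = 0;
            while (bits.charAt(i) == '0') { zeros++; i++; } // unary length prefix
            int gap = Integer.parseInt(bits.substring(i, i + zeros + 1), 2);
            i += zeros + 1;
            prev += gap;
            ids.add(prev);
        }
        return ids.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] ids = {3, 4, 9, 10};
        String bits = encode(ids);
        System.out.println(bits);                          // 001001001011
        System.out.println(Arrays.toString(decode(bits))); // [3, 4, 9, 10]
    }
}
```

For the ids 3, 4, 9, 10 the gaps are 4, 1, 5, 1, and the encoded form takes 12 bits instead of the 128 bits needed for four 32-bit ints.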

The engine implements a special algorithm that lowers the demand for I/O operations during index updating. Other Open Source packages do not offer this feature and use suboptimal solutions.
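Egothor's own dynamization algorithm is not described here, so the sketch below instead shows the classic logarithmic method (Bentley and Saxe), a well-known way to make a static sorted structure updatable while bounding merge work. It keeps sorted runs whose sizes are powers of two and merges them like carries in binary addition, so each key is copied about log2(n) times over n inserts. The `LogarithmicIndex` class and its `mergedKeys` counter (a rough proxy for I/O) are hypothetical and are not taken from Egothor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration of the classic logarithmic method (Bentley-Saxe).
 * Not Egothor's algorithm.
 */
class LogarithmicIndex {
    // levels.get(i) is either null or a sorted run of exactly 2^i keys
    private final List<int[]> levels = new ArrayList<>();
    private long mergedKeys = 0; // keys copied during merges, a rough proxy for I/O

    void insert(int key) {
        int[] carry = {key};
        int level = 0;
        while (true) {
            if (level == levels.size()) levels.add(null);
            int[] existing = levels.get(level);
            if (existing == null) {      // free slot: the carry settles here
                levels.set(level, carry);
                return;
            }
            carry = merge(existing, carry); // two runs of 2^level keys -> one of 2^(level+1)
            mergedKeys += carry.length;
            levels.set(level, null);
            level++;
        }
    }

    /** Queries must consult every non-empty run. */
    boolean contains(int key) {
        for (int[] run : levels) {
            if (run != null && Arrays.binarySearch(run, key) >= 0) return true;
        }
        return false;
    }

    long mergedKeys() {
        return mergedKeys;
    }

    /** Standard merge of two sorted arrays. */
    private static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) out[k++] = a[i] <= b[j] ? a[i++] : b[j++];
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        LogarithmicIndex index = new LogarithmicIndex();
        for (int key : new int[] {5, 1, 7, 3, 2, 8, 6, 4}) index.insert(key);
        System.out.println(index.mergedKeys()); // 24
        System.out.println(index.contains(6));  // true
        System.out.println(index.contains(9));  // false
    }
}
```

Inserting 8 keys copies 24 keys in total (8 × log2 8), whereas rebuilding a single sorted array after every insert would copy 1 + 2 + ... + 8 = 36; the difference grows quickly with n.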

Since egothor 1.x, the engine can solve queries in parallel. We are not aware of any other Open Source package that supports the same feature in its kernel.
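The text does not say how Egothor parallelizes queries. One simple approach, shown here purely as an assumption, is to split the index into partitions, evaluate a Boolean AND query on each partition concurrently, and merge the results. The `ParallelSearch` class is hypothetical and relies on Java parallel streams, which may or may not actually use several threads for very small inputs.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical partitioned index searched in parallel; not Egothor's implementation. */
class ParallelSearch {
    /** Boolean AND over one partition (term -> global doc ids). */
    static Set<Integer> searchPartition(Map<String, Set<Integer>> partition, List<String> terms) {
        Set<Integer> result = null;
        for (String t : terms) {
            Set<Integer> docs = partition.getOrDefault(t, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs); // intersect with this term's postings
            }
            if (result.isEmpty()) break; // no later term can add matches
        }
        return result == null ? Set.of() : result;
    }

    /** Searches all partitions concurrently (via a parallel stream) and merges the hits. */
    static List<Integer> search(List<Map<String, Set<Integer>>> partitions, List<String> terms) {
        return partitions.parallelStream()
                .flatMap(p -> searchPartition(p, terms).stream())
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> a = Map.of("lucene", Set.of(0, 2), "java", Set.of(0, 1, 2));
        Map<String, Set<Integer>> b = Map.of("lucene", Set.of(4), "java", Set.of(3, 4));
        System.out.println(search(List.of(a, b), List.of("lucene", "java"))); // [0, 2, 4]
    }
}
```

Because each partition holds disjoint documents, the per-partition results can simply be concatenated and sorted; ranked queries would additionally need a merge by score.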

The kernel is fully 64-bit; there are no other technical limits.

Egothor Search Engine Home Page
http://www.egothor.org/

Download Egothor Search Engine
http://www.egothor.org/download.shtml

August 2, 2007

regain

regain is a search engine similar to web search engines like Google, except that instead of searching the web, you search your own files and documents. With regain you can search through large amounts of data (several gigabytes!) in a split second!

This is possible because regain uses a search index. regain crawls through your files or web pages, extracts all the text, and puts it into a smart search index. All of this happens in the background, so when you search for something, you get the results immediately.
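The crawl, extract, and index loop described above can be sketched with the standard `java.nio.file` API. This is not regain's code: the hypothetical `FileCrawler` only reads `.txt` files, where regain extracts text from many document formats, and it keeps the index in memory rather than in a persistent search index. `Files.readString` assumes Java 11 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical crawler: walks a directory tree and builds an in-memory word -> files index. */
class FileCrawler {
    static Map<String, Set<Path>> crawl(Path root) {
        Map<String, Set<Path>> index = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            // 1. Crawl: collect regular .txt files below root (stand-in for real format detection).
            List<Path> files = paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".txt"))
                    .collect(Collectors.toList());
            for (Path file : files) {
                // 2. Extract: plain text needs no parsing; PDF, DOC, etc. would need a parser.
                String text = Files.readString(file, StandardCharsets.UTF_8);
                // 3. Index: map each lower-cased word to the set of files containing it.
                for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                    if (!word.isEmpty()) index.computeIfAbsent(word, k -> new TreeSet<>()).add(file);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Set<Path>> index = crawl(Paths.get(args.length > 0 ? args[0] : "."));
        System.out.println(index.size() + " distinct words indexed");
    }
}
```

Once the index is built, a lookup such as `index.get("lucene")` is a single map access, which is why searching feels instant even though crawling takes much longer.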

There are two versions of regain: the desktop search and the server search. The desktop search is meant for a normal desktop computer and offers fast search of documents or intranet web pages. The server search is installed on web servers and provides search functionality for a website or for intranet file servers.

regain is written in Java and therefore runs on all Java-compatible platforms (among others Windows, Linux, Mac OS, and Solaris). The server search works with JavaServer Pages (JSPs) and a tag library; the desktop search comes with its own small web server.

regain is released under the open source LGPL (GNU Lesser General Public License), so it may be used free of charge without any time limit.

regain project home page
http://regain.sourceforge.net/

Download regain Search Engine
http://regain.sourceforge.net/download.php

regain documentation
http://regain.sourceforge.net/docs.php

About Search Engines

This page contains an archive of all entries posted to Open Source Java Community and OpenJDK Resources (latest news, podcasts, updates, downloads) in the Search Engines category. They are listed from oldest to newest.
