1. Multi-threaded indexing with a single shared IndexWriter object
This approach is slow, mainly because of the synchronized block in this method:
- public void addDocument(Document doc, Analyzer analyzer) throws IOException {
- SegmentInfo newSegmentInfo = buildSingleDocSegment(doc, analyzer);
- synchronized (this) {
- ramSegmentInfos.addElement(newSegmentInfo);// this statement is very costly
- maybeFlushRamSegments();
- }
- }
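2. Multi-threaded indexing: write to a RAMDirectory first, then write to an FSDirectory in one pass
How it works: documents are first written to a RAMDirectory; once 1000 Documents have accumulated, they are written to the FSDirectory.
When several threads run this, it throws java.lang.NullPointerException in large numbers.
My multi-threaded indexing class is IndexWriterServer (the object is initialized only once, at server startup):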
- public class IndexWriterServer{
- private static IndexWriter indexWriter = null;
- //private String indexDir ;// index directory
- private static CJKAnalyzer analyzer = null;
- private static RAMDirectory ramDir = new RAMDirectory();
- private static IndexWriter ramWriter = null;
- private static int diskFactor = 0;// number of Documents currently held in memory
- private static long ramToDistTime = 0;// time needed to write from memory to disk
- private int initValue = 1000;// number of in-memory Documents that triggers a write to disk
- private static IndexItem []indexItems = null;
- public IndexWriterServer(String indexDir){
- initIndexWriter(indexDir);
- }
- public void initIndexWriter(String indexDir){
- boolean create = false;// whether to create a new index
- analyzer = new CJKAnalyzer();
- Directory directory = this.getDirectory(indexDir);
- // check whether the directory already contains an index
- if(!IndexReader.indexExists(indexDir)){
- create = true;
- }
- indexWriter = getIndexWriter(directory,create);
- try{
- ramWriter = new IndexWriter(ramDir, analyzer, true);
- }catch(Exception e){
- logger.info(e);
- }
- indexItems = new IndexItem[initValue+2];
- }
- /**
- * Build the index for a single Item
- */
- public boolean generatorItemIndex(IndexItem item, Current __current) throws DatabaseError, RuntimeError{
- boolean isSuccess = true;// whether indexing succeeded
- try{
- Document doc = getItemDocument(item);
- ramWriter.addDocument(doc);// key line: this is where the exception is thrown
- indexItems[diskFactor] = item;// kept for data mining
- diskFactor ++;
- if((diskFactor % initValue) == 0){
- ramToDisk(ramDir,ramWriter,indexWriter);
- //ramWriter = new IndexWriter(ramDir, analyzer, true);
- diskFactor = 0;
- // data mining
- isSuccess = MiningData();
- }
- doc = null;
- logger.info("generator index item link:" + item.itemLink +" success");
- }catch(Exception e){
- logger.info(e);
- e.printStackTrace();
- logger.info("generator index item link:" + item.itemLink +" failure");
- isSuccess = false;
- }finally{
- item = null;
- }
- return isSuccess;
- }
- public void ramToDisk(RAMDirectory ramDir, IndexWriter ramWriter,IndexWriter writer){
- try{
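- ramWriter.close();// key line: this sets fileMap to null
- ramWriter = new IndexWriter(ramDir, analyzer, true);// rebuild the ramWriter because its fileMap is now null, but this does not seem to help much (note: the assignment targets the method parameter, which shadows the static field of the same name)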
- Directory ramDirArray[] = new Directory[1];
- ramDirArray[0] = ramDir;
- mergeDirs(writer, ramDirArray);
- }catch(Exception e){
- logger.info(e);
- }
- }
- /**
- * Write the in-memory index data to disk
- * @param writer
- * @param ramDirArray
- */
- public void mergeDirs(IndexWriter writer,Directory[] ramDirArray){
- try {
- writer.addIndexes(ramDirArray);
- //optimize();
- } catch (IOException e) {
- logger.info(e);
- }
- }
- }
The main cause is probably this: when ramWriter.close() is called, the close() method of RAMDirectory in Lucene 2.1
- public final void close() {
- fileMap = null;
- }
sets fileMap to null. When multiple threads then call ramWriter.addDocument(doc), execution eventually reaches this RAMDirectory method:
- public IndexOutput createOutput(String name) {
- RAMFile file = new RAMFile(this);
- synchronized (this) {
- RAMFile existing = (RAMFile)fileMap.get(name);// fileMap is null here, hence the NullPointerException
- if (existing!=null) {
- sizeInBytes -= existing.sizeInBytes;
- existing.directory = null;
- }
- fileMap.put(name, file);
- }
- return new RAMOutputStream(file);
- }
Note: a web search suggests that this is a Lucene bug (http://www.opensubscriber.com/message/java-user@lucene.apache.org/6227647.html), but no solution seems to have been given there.
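The buffer-then-flush idea from this section can be sketched without Lucene to show one way of keeping the swap safe. This is only a sketch under stated assumptions: BufferedBatchIndexer, flushThreshold, ramBuffer and disk are hypothetical names, strings stand in for Documents, and plain lists stand in for the RAMDirectory and the disk index. The point it illustrates is that the add, the flush and the replacement of the buffer all happen under one lock, and that the new buffer is assigned to the field, so no thread can write into a buffer that has already been retired.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in: strings play the role of Documents, lists the role of indexes. */
final class BufferedBatchIndexer {
    private final Object lock = new Object();
    private final int flushThreshold;
    private List<String> ramBuffer = new ArrayList<>();   // stand-in for the RAM index
    private final List<String> disk = new ArrayList<>();  // stand-in for the disk index
    private int flushCount = 0;

    BufferedBatchIndexer(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Adds one document and flushes the buffer once the threshold is reached. */
    void add(String doc) {
        synchronized (lock) {
            ramBuffer.add(doc);
            if (ramBuffer.size() >= flushThreshold) {
                flushLocked();
            }
        }
    }

    /** Flushes whatever is still buffered, for example at shutdown. */
    void flush() {
        synchronized (lock) {
            if (!ramBuffer.isEmpty()) {
                flushLocked();
            }
        }
    }

    // Caller must hold the lock: copy to "disk" first, then replace the FIELD.
    private void flushLocked() {
        disk.addAll(ramBuffer);
        ramBuffer = new ArrayList<>();
        flushCount++;
    }

    int diskSize() { synchronized (lock) { return disk.size(); } }

    int flushCount() { synchronized (lock) { return flushCount; } }

    /** Starts the given number of threads, each adding perThread documents. */
    static BufferedBatchIndexer runDemo(int threads, int perThread, int threshold) {
        BufferedBatchIndexer indexer = new BufferedBatchIndexer(threshold);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    indexer.add("doc-" + id + "-" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        indexer.flush();
        return indexer;
    }

    public static void main(String[] args) {
        BufferedBatchIndexer indexer = runDemo(4, 2500, 1000);
        System.out.println("documents on disk: " + indexer.diskSize());  // 10000
        System.out.println("flushes: " + indexer.flushCount());          // 10
    }
}
```

The trade-off is that every add is serialized on the lock, much like the shared IndexWriter in approach 1, so this sketch favors correctness over throughput.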
3. Multi-threaded indexing with one IndexWriter object per thread. Each IndexWriter is bound to its own FSDirectory object, and each FSDirectory is bound to a unique local disk directory. A separate thread (the monitor thread) watches the indexing threads: when an indexing thread has finished, it sends an object to the monitor thread's queue with queue.add(directory);. The monitor thread's queue is a global object. When the queue's size() > 20, the monitor thread merges those 20 index directories into the real index directory with indexWriter.addIndexes(dirs);, and once the merge is done it deletes the directories that have been merged.
This approach also has a few bugs:
a. The merge thread is slower than the indexing threads, so the number of directories keeps growing.
b. It frequently reports an error like this:
2007-06-08 10:49:18 INFO [Thread-2] (IndexWriter.java:1070) - java.io.FileNotFoundException: /home/spider/luceneserver/merge/item_d28686afe01f365c5669e1f19a2492c8/_1.cfs (No such file or directory)
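The monitor-thread design above can be sketched with a bounded BlockingQueue, which also gives a simple answer to bug (a): when the merge side falls behind, put() blocks the indexing threads instead of letting directories pile up. This is a sketch under assumptions, not the original code: MergeMonitor, submit, runMonitor and the STOP sentinel are hypothetical names, directory names stand in for the FSDirectory objects, and the merge step only records the batch size where real code would call indexWriter.addIndexes(dirs). It merges in batches of exactly 20 rather than waiting for size() > 20. It does not attempt to explain or fix the FileNotFoundException from (b).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: directory names stand in for the per-thread FSDirectory objects. */
final class MergeMonitor {
    static final int BATCH_SIZE = 20;
    private static final String STOP = "__stop__";  // sentinel that ends the monitor loop

    private final BlockingQueue<String> queue;
    private final List<Integer> mergedBatchSizes = new ArrayList<>();

    MergeMonitor(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Called by an indexing thread; blocks while the queue is full (backpressure). */
    void submit(String dirName) throws InterruptedException {
        queue.put(dirName);
    }

    /** Monitor loop: collect BATCH_SIZE directories, merge them, repeat until STOP arrives. */
    void runMonitor() throws InterruptedException {
        List<String> batch = new ArrayList<>();
        while (true) {
            String dir = queue.take();
            if (STOP.equals(dir)) {
                break;
            }
            batch.add(dir);
            if (batch.size() == BATCH_SIZE) {
                merge(batch);
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            merge(batch);  // merge the final partial batch
        }
    }

    private void merge(List<String> batch) {
        // Real code would call indexWriter.addIndexes(dirs) here and delete
        // the merged directories only after that call has returned.
        mergedBatchSizes.add(batch.size());
    }

    /** Runs producers against one monitor thread and returns the merged batch sizes. */
    static List<Integer> runDemo(int producers, int dirsPerProducer, int capacity) {
        MergeMonitor monitor = new MergeMonitor(capacity);
        Thread monitorThread = new Thread(() -> {
            try {
                monitor.runMonitor();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        monitorThread.start();
        List<Thread> workers = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            Thread worker = new Thread(() -> {
                try {
                    for (int i = 0; i < dirsPerProducer; i++) {
                        monitor.submit("item_" + id + "_" + i);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
            monitor.submit(STOP);
            monitorThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return monitor.mergedBatchSizes;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(3, 15, 5));  // prints [20, 20, 5]
    }
}
```

A small queue capacity keeps the number of unmerged directories bounded at the cost of stalling the indexing threads whenever the merge is the slower side.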
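4. Single-threaded indexing. After tuning a few parameters it is also very fast (indexing one record takes roughly 6-30 ms). A single thread seems to be enough for typical needs. The parameters are:
private int mergeFactor = 100;// number of segments accumulated on disk before they are merged automatically
private int maxMergeDocs = 1000;// upper bound on the number of documents in a merged segment (the in-memory flush threshold is maxBufferedDocs)
private int minMergeDocs = 1000;// removed in Lucene 2.0
private int maxFieldLength = 2000;// maximum number of terms indexed per field, which caps how much of an article is indexed
private int maxBufferedDocs = 10000;// do not set this one, otherwise automatic merging no longer happens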
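Conclusion: multi-threaded indexing with Lucene has some problems. Unless you have special requirements, single-threaded performance is nearly always sufficient.
If single-threaded speed does not meet your needs, you can run several application instances, each bound to its own FSDirectory, and at search time query those index directories over RMI.
Key code on the RMI server side: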
- private void initRMI(){
- // first, the security configuration
- if (System.getSecurityManager() == null) {
- System.setSecurityManager( new RMISecurityManager() );
- }
- // register
- startRMIRegistry(serverUrl);
- SearcherWork searcherWork = new SearcherWork("//" + serverUrl + "/" + bindName, directory);
- searcherWork.run();
- }
- public class SearcherWork {
- // Logger
- private static Logger logger = Logger.getLogger(SearcherWork.class);
- private String serverUrl =null;
- private Directory directory =null;
- public SearcherWork(){
- }
- public SearcherWork(String serverUrl, Directory directory){
- this.serverUrl = serverUrl;
- this.directory = directory;
- }
- public void run(){
- try{
- Searchable searcher = new IndexSearcher(directory);
- SearchService service = new SearchService(searcher);
- Naming.rebind(serverUrl, service);
- logger.info("RMI Server bind " + serverUrl + " success");
- }catch(Exception e){
- logger.info(e);
- System.out.println(e);
- }
- }
- }
- public class SearchService extends RemoteSearchable implements Searchable {
- public SearchService (Searchable local) throws RemoteException {
- super(local);
- }
- }
Key code on the client side:
- RemoteLuceneConnector rlc= new RemoteLuceneConnector();
- RemoteSearchable[] rs= rlc.getRemoteSearchers();
- MultiSearcher multi = new MultiSearcher(rs);
- Hits hits = multi.search(new TermQuery(new Term("content","中國")));
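The bind-and-lookup flow behind the server and client fragments above can be tried with the JDK's RMI classes alone. The following is a minimal sketch, not the Lucene setup: ShardSearch, InMemoryShard, RmiShardDemo and the shard names are hypothetical, an in-memory list of strings stands in for each index, and the client loop plays the role that MultiSearcher plays in the original. It assumes everything runs in one JVM on one host, so it pins RMI to the loopback address and uses the local registry handle; a real client would obtain the registry with LocateRegistry.getRegistry(host, port).

```java
import java.rmi.NoSuchObjectException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical remote interface; Lucene's Searchable plays this role in the code above. */
interface ShardSearch extends Remote {
    List<String> search(String term) throws RemoteException;
}

/** Hypothetical in-memory "index": returns the stored lines that contain the term. */
final class InMemoryShard implements ShardSearch {
    private final List<String> docs;

    InMemoryShard(List<String> docs) {
        this.docs = docs;
    }

    @Override
    public List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}

final class RmiShardDemo {
    /** Binds two shards in a local registry, then searches both through their stubs. */
    static List<String> searchAllShards(String term) {
        // Assumption for this demo: one host only, so pin RMI to loopback.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        ShardSearch shardA = new InMemoryShard(Arrays.asList("lucene indexing", "thread pool"));
        ShardSearch shardB = new InMemoryShard(Arrays.asList("lucene rmi search", "queue"));
        String[] names = {"shardA", "shardB"};
        Registry registry = null;
        try {
            // Server side: registry on an ephemeral port, one exported stub per shard.
            registry = LocateRegistry.createRegistry(0);
            registry.rebind(names[0], UnicastRemoteObject.exportObject(shardA, 0));
            registry.rebind(names[1], UnicastRemoteObject.exportObject(shardB, 0));

            // Client side: look up each shard by name and concatenate the hits.
            List<String> all = new ArrayList<>();
            for (String name : names) {
                ShardSearch remote = (ShardSearch) registry.lookup(name);
                all.addAll(remote.search(term));
            }
            return all;
        } catch (Exception e) {
            throw new IllegalStateException("RMI demo failed", e);
        } finally {
            // Unexport everything so that the JVM can exit.
            unexportQuietly(shardA);
            unexportQuietly(shardB);
            if (registry != null) {
                unexportQuietly(registry);
            }
        }
    }

    private static void unexportQuietly(Remote remote) {
        try {
            UnicastRemoteObject.unexportObject(remote, true);
        } catch (NoSuchObjectException ignored) {
            // The object was never exported; nothing to clean up.
        }
    }

    public static void main(String[] args) {
        System.out.println(searchAllShards("lucene"));
    }
}
```

Unlike the server fragment, this sketch installs no security manager and does not go through Naming.rebind with a URL, which keeps it independent of a fixed port.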