Archive of UserLand's first discussion group, started October 5, 1998.

Re: GDBs and memory

Author: Lixian B. Chiu
Posted: 2/23/1999; 9:09:22 AM
Topic: GDBs and memory
Msg #: 3153 (In response to 3122)
Prev/Next: 3152 / 3154

I modified the search engine to index Chinese pages, and so far it works fine, but I have hit a major roadblock. I am working on a project to index the 25 official books of Chinese history, and they are huge: the total comes to somewhere around 800,000,000 Chinese characters. Since each Chinese character is 2 bytes, that works out to about 1,600,000,000 ASCII characters' worth of text.

The problem is that I constantly run out of memory when I try to index the site. I have set the memory allocation to around 50 MB (I use a Mac), but I have never gotten it to index more than 2,000,000 Chinese characters. I checked my code very carefully and didn't find anything that would cause a memory leak. So I decided to test the built-in "English" indexer on a large site (I wrote a script to translate some of the Chinese pages into meaningless English pages), and I found that even the built-in indexer has the same problem when indexing an extremely large site.
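For scale, here is a rough back-of-the-envelope calculation (written as a Python sketch, purely illustrative; the variable names are mine, and the 2-bytes-per-character figure assumes the double-byte Chinese encoding described above):

    chinese_chars = 800000000          # total characters across the 25 books
    bytes_per_char = 2                 # double-byte Chinese encoding
    corpus_bytes = chinese_chars * bytes_per_char    # about 1,600,000,000 bytes of raw text

    memory_allocation = 50 * 1024 * 1024             # the roughly 50 MB I give the application
    indexed_so_far = 2000000 * bytes_per_char        # the most I have managed to index

    print(corpus_bytes / memory_allocation)          # roughly 30x the memory allocation
    print(corpus_bytes / indexed_so_far)             # 400x what I have gotten through so far

So even the raw text is about 30 times the memory I can allocate, and I have only managed about 1/400th of the corpus before running out.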

Any help?

This page was archived on 6/13/2001; 4:48:04 PM.

© Copyright 1998-2001 UserLand Software, Inc.