The San Antonio Web Development Group Message Board

Question about in-memory data structure, shared memory functions

A former member
Post #: 2
Hi all,

I keep wanting to come to the meetings, but I never get the chance between working full time, raising a 4-year-old, and studying for a master's degree. I hope you will not hold that against me and will still try to help me with the issue below.

I am working on a project that does a lot of word processing. Some of the processing requires looking up certain values for each word in a bunch of different documents and performing some calculations. The lookup data has about 650,000 distinct words. Since there are a lot of lookups in each execution (likely 10,000+), I am trying to avoid querying a database for the values. Instead, I tried using an associative array with the data.
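To make the access pattern concrete, here is a minimal sketch of the kind of lookup involved. The sample words, their values, and the `sumWeights` helper are made up for illustration; the real table has about 650,000 entries and the real calculations are more involved.

```php
<?php
// Hypothetical miniature of the lookup table: word => numeric value.
$lookup = [
    'apple'  => 1.5,
    'banana' => 2.25,
    'cherry' => 0.75,
];

// Illustrative helper (not part of the real project): sums the values
// of every word in a text that exists in the lookup table.
function sumWeights(string $text, array $lookup): float
{
    $total = 0.0;
    // Lowercase, split on whitespace, and drop empty tokens.
    $words = preg_split('/\s+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // isset() is a hash lookup on the array; unknown words are skipped.
        if (isset($lookup[$word])) {
            $total += $lookup[$word];
        }
    }
    return $total;
}

echo sumWeights('Apple banana apple kiwi', $lookup), "\n";
```

Each lookup is a hash access on the array, so the per-word cost is small once the array is in memory; the expensive part is getting the array there in the first place.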

Loading the array from file takes about 25 seconds, so I tried using the shared memory functions (shm_attach, shm_put_var, shm_get_var, etc.) to ensure the array is constructed only once and then shared among all users of the system. However, I am finding that it still takes 15-20 seconds to get the array with shm_get_var, even though it is already in memory. I think this is because the value is kept serialized in the memory block (which is about 80 MB) and has to be unserialized each time a script loads it. I am including the code I wrote for this below my signature. It is part of a class used statically for some of the document processing.
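Separately from the class code, one rough way to check the serialization suspicion outside of shared memory would be to time serialize() and unserialize() on an array of similar shape. This is only a sketch: the synthetic keys, the `$count` value, and the timing approach are assumptions for illustration, and the numbers will vary by machine and PHP version.

```php
<?php
// Build a synthetic word => float array. $count is kept small here so the
// script runs quickly; raise it toward 650000 to approximate the real data.
$count = 50000;
$data = [];
for ($i = 0; $i < $count; $i++) {
    $data['word' . $i] = $i * 0.001;
}

// Serialize once. As far as I understand, shm_put_var() does comparable
// work when it stores a variable.
$start = microtime(true);
$blob = serialize($data);
$serializeSeconds = microtime(true) - $start;

// Unserialize once. If the suspicion is right, shm_get_var() pays a
// comparable cost on every read.
$start = microtime(true);
$copy = unserialize($blob);
$unserializeSeconds = microtime(true) - $start;

printf("entries: %d, serialized size: %d bytes\n", count($copy), strlen($blob));
printf("serialize: %.4f s, unserialize: %.4f s\n", $serializeSeconds, $unserializeSeconds);
```

If the unserialize time at full size comes out close to the 15-20 seconds seen with shm_get_var, that would support the idea that the delay comes from rebuilding the array rather than from reading the memory block.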

Do any of you happen to have a better idea for sharing this large in-memory data structure among all users of the system? Or do you see any errors in my code that could be causing the delay? I appreciate your assistance.

Thanks,
David

protected static $_idfMemKey = array('/tmp/doc.txt','i', 1);
protected static $_idfDataFile = '/config/word_data.txt';
protected static $_idfMemSize = 80000000;
protected static $_shmId = null;
protected static $_idfData = null;

protected static function _loadIdf(){
    // Already loaded in this request: nothing to do.
    if(is_array(self::$_idfData) && !empty(self::$_idfData)) return true;

    // Key file that ftok() uses to derive a key identifying the memory block resource.
    if(!@file_exists(APP_DIR . self::$_idfMemKey[0])){
        $writeFile = @fopen(APP_DIR . self::$_idfMemKey[0], 'w') or die("Can't open temp file for writing");
        fclose($writeFile);
    }
    $shm_key = ftok(APP_DIR . self::$_idfMemKey[0], self::$_idfMemKey[1]);

    // Open the memory block with the resource key retrieved.
    self::$_shmId = shm_attach($shm_key);

    self::$_idfData = @shm_get_var(self::$_shmId, self::$_idfMemKey[2]);

    if(is_array(self::$_idfData) && !empty(self::$_idfData)) return true;

    // Data not in memory: remove the block and re-create it at full size.
    shm_remove(self::$_shmId);
    self::closeShm();
    self::$_shmId = shm_attach($shm_key, self::$_idfMemSize);

    $readFile = @fopen(APP_DIR . self::$_idfDataFile, 'r') or die("Can't open IDF file for reading");

    // Parse one "term value" pair per line.
    $idfTerms = array();
    while (($buffer = fgets($readFile, 4096)) !== false) {
        $offset = strpos($buffer, ' ');
        if ($offset === false) continue; // skip lines with no separator
        $term = substr($buffer, 0, $offset);
        $idfTerms[$term] = floatval(substr($buffer, $offset + 1));
    }
    fclose($readFile);

    if (!shm_put_var(self::$_shmId, self::$_idfMemKey[2], $idfTerms)) {
        echo 'Failed to store variable "$idfTerms".<br/>';
    }

    // Read the value back to confirm it was stored.
    self::$_idfData = @shm_get_var(self::$_shmId, self::$_idfMemKey[2]);

    return is_array(self::$_idfData) && !empty(self::$_idfData);
}

public static function closeShm(){
    if(self::$_shmId !== null){
        shm_detach(self::$_shmId);
        self::$_shmId = null; // avoid detaching the same handle twice
    }
}