I'm writing a maintenance script in PHP. The script keeps about 100,000 key-value pairs in an associative array and compares a bunch of other data against that array.
The keys are 12- or 16-byte hexadecimal strings.
The values are arrays containing 1-10 strings. Each string is around 50 bytes.
I'm populating my array by reading a text file line by line with fgets() in a loop.
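For concreteness, the population loop looks roughly like this. The actual file format isn't shown here; this sketch assumes one tab-separated "key<TAB>string" pair per line, with repeated keys accumulating strings into the value array:

```php
<?php
// Sketch of the population loop. The file format is an assumption:
// one "key<TAB>string" pair per line; repeated keys collect multiple
// strings into one value array.
function load_map($path) {
    $map = array();
    $fp = fopen($path, 'r');
    while (($line = fgets($fp)) !== false) {
        list($key, $value) = explode("\t", rtrim($line, "\r\n"), 2);
        if (!isset($map[$key])) {
            $map[$key] = array();
        }
        $map[$key][] = $value;
    }
    fclose($fp);
    return $map;
}
```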
All is fine until I hit about 44,000 keys; after that, the memory usage suddenly skyrockets.
No matter how much I increase the memory limit (and I'm reluctant to give it any more than 256MB at the moment), the memory usage increases exponentially until it hits the new limit. This is weird!
The following is a table with the number of keys on the left and the memory usage on the right.
Keys    Memory usage (bytes)
10000    6668460
20000   12697828
30000   18917768
40000   25045068
41000   25658148
42000   26760304
43000   27350368
44000   27920400
45000   33438520
46000   77800344
47000  114203960
48000  161989660
49000  168419992
50000  206265572

Fatal error: Allowed memory size of 268435456 bytes exhausted
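The figures above were logged with memory_get_usage(). A growth curve of the same shape can be reproduced with a self-contained synthetic sketch (dummy 16-character hex keys and ~50-byte string values standing in for the real data):

```php
<?php
// Synthetic reproduction of the measurement: insert keys/values shaped
// like the real data and log memory_get_usage() at regular intervals.
// The data itself is dummy; only the key/value sizes match the question.
$map = array();
for ($i = 1; $i <= 50000; $i++) {
    $key = substr(md5((string)$i), 0, 16);   // 16-char hex key
    $map[$key] = array(str_repeat('x', 50)); // one ~50-byte string value
    if ($i % 10000 === 0) {
        printf("%d\t%d\n", $i, memory_get_usage());
    }
}
```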
As you can see, memory usage is consistent at 620-660 bytes per key until I reach 44,000 keys. After that it suddenly begins to climb, reaching over 4KB per key at 50,000 keys. This is very strange, because the sizes of my keys and values stay roughly the same throughout.
It seems like I'm hitting some sort of internal limit on the number of keys I can have in an array, beyond which it all becomes very inefficient.
If I can maintain a memory usage of 620-660 bytes per key (which sounds reasonable given the usual overhead of using an array), my entire dataset should fit in approx. 64MB of memory and remain easily accessible when I need to reference it later in the same script. That was my assumption when I first started writing the script. It's a maintenance script run from the CLI, so it's OK to use 64MB of memory from time to time.
But if the memory usage keeps increasing like the above, I'll have no choice but to offload the key-value dataset to an external daemon like Memcached, Redis, or an SQL database, and the network overhead will greatly slow down the maintenance script.
What I tried so far:

SplFixedArray: ruled out, because my keys are not numeric (and can't be converted to numbers within the integer range) and the array needs to be mutable.

The test server is a virtual machine running Ubuntu 12.04 LTS, 32-bit, with PHP 5.3.10-1ubuntu3.9.
Any ideas?
Thanks!
It's the garbage collection, I think. At some point you're using operations that allocate temporary space which then cannot be freed during the hard work, so PHP eats up your memory for no real purpose.
When I faced this problem, I eventually concluded that garbage is collected only at specific events, such as exiting a function. So what you should try is splitting the job into several smaller steps and letting your variables "relax" between them: write a function that processes only a thousand elements at a time, then call it again to continue where it left off.
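A minimal sketch of that batching idea, assuming the big array is in a variable I'll call $map (the function and variable names here are illustrative, not from the original script):

```php
<?php
// Do the per-key work inside a function so temporaries go out of scope
// between calls and can be reclaimed. $map stands in for the question's
// big associative array; process_batch() is a hypothetical helper.
function process_batch(array $keys, $offset, $batchSize) {
    $end = min($offset + $batchSize, count($keys));
    for ($i = $offset; $i < $end; $i++) {
        // ... compare / process $keys[$i] here; any temporaries created
        // in this scope are released when the function returns.
    }
    return $end; // next offset to resume from
}

$map = array('deadbeefcafef00d' => array('example')); // stand-in data
$allKeys = array_keys($map);
$offset = 0;
while ($offset < count($allKeys)) {
    $offset = process_batch($allKeys, $offset, 1000);
    gc_collect_cycles(); // PHP >= 5.3: force collection of cyclic garbage
}
```

gc_collect_cycles() only helps with cyclic references; the main benefit here comes from letting each batch's temporaries fall out of function scope.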
Hope this helps.