Well, I'm back after Christmas. I've been trying to focus on family rather than work, but I have had some time to work on my new project. I'm writing a new profiling tool, or rather rewriting an old tool from scratch with the original developer's permission.
I'm taking things slowly. The first phase is to load the original trace files into a new, more efficient memory model. I'm focusing on speed and memory efficiency, so I have, initially at least, abandoned an object model for the profiling events (method entry, exit, etc.) and gone for a straight byte array. I am paying the cost of constantly encoding and decoding ints and longs to and from bytes, but modern processors and JITs seem to gobble this stuff up with ease.
One major problem is how to allocate the array: in one big chunk, or in multiple smaller arrays? If the latter, how can they be easily stitched together?
I have been experimenting with smaller arrays, in fact a two-dimensional byte array where the second dimension is allocated "on demand" and both dimensions are referenced with a single pointer. It turns out to be quite simple and the costs are not too great (~25%). It gives me much more flexibility for a reasonable overhead.
Essentially you take a single int pointer and AND it with a far mask and a near mask, yielding the indices for the first and second dimensions.
For example:
final static int FAR_MASK = 0x7f800000; // The first 7 means we don't go negative
final static int NEAR_MASK = 0x007fffff; // Effectively the inverse of the above
.....
public int readTreeInt(int offset) {
    int value = 0;
    value |= (tree[(offset & FAR_MASK) >>> 23][offset++ & NEAR_MASK] & 0xff) << 24;
    value |= (tree[(offset & FAR_MASK) >>> 23][offset++ & NEAR_MASK] & 0xff) << 16;
    value |= (tree[(offset & FAR_MASK) >>> 23][offset++ & NEAR_MASK] & 0xff) << 8;
    value |= tree[(offset & FAR_MASK) >>> 23][offset & NEAR_MASK] & 0xff;
    return value;
}
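For anyone wanting to try the idea, here is a minimal self-contained sketch of the same scheme with a write side and on-demand segment allocation. The class name SegmentedBytes and the methods writeInt, readInt and ensureSegment are made up for illustration and are not taken from the actual tool. The masks are the ones shown above.

```java
// Sketch only: SegmentedBytes, writeInt, readInt and ensureSegment are hypothetical names.
final class SegmentedBytes {
    static final int FAR_MASK  = 0x7f800000; // 8 bits below the sign bit select the segment
    static final int NEAR_MASK = 0x007fffff; // low 23 bits index within a segment
    static final int NEAR_BITS = 23;

    // Only the first dimension exists up front; segments are allocated on demand.
    private final byte[][] tree = new byte[(FAR_MASK >>> NEAR_BITS) + 1][];

    // Returns the segment holding 'offset', allocating it on first touch.
    private byte[] ensureSegment(int offset) {
        int far = (offset & FAR_MASK) >>> NEAR_BITS;
        if (tree[far] == null) {
            tree[far] = new byte[NEAR_MASK + 1];
        }
        return tree[far];
    }

    // Writes 'value' big-endian, one byte at a time, so it may straddle two segments.
    void writeInt(int offset, int value) {
        for (int shift = 24; shift >= 0; shift -= 8) {
            ensureSegment(offset)[offset & NEAR_MASK] = (byte) (value >>> shift);
            offset++;
        }
    }

    // Reads a big-endian int; assumes the bytes were written earlier.
    int readInt(int offset) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            byte b = tree[(offset & FAR_MASK) >>> NEAR_BITS][offset & NEAR_MASK];
            value = (value << 8) | (b & 0xff);
            offset++;
        }
        return value;
    }
}
```

Because the far and near indices are recomputed for every byte, a value written at offset NEAR_MASK - 1 lands with two bytes in segment 0 and two in segment 1, and reading it back needs no special case.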
On each new event read (it could be 5 or 10 depending on potential event size) I check with another mask whether I'm getting close to the end (<32K) of the current "near" array. If so, I just pre-allocate the next array segment.
In this example the masks yield a 256 x 8M layout, or the maximum 2G space, seamlessly accessed by a single pointer. It's reminiscent of the old Intel 8086 memory model, I'm sure you'll agree.
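One way that "close to the end" check could look is sketched below. The TAIL_MASK value and the names SegmentPreallocator, nearSegmentEnd and preallocate are assumptions for illustration. The mask covers bits 15 to 22 of the near index, so when they are all set the offset lies in the last 32K bytes of its segment.

```java
// Sketch only: TAIL_MASK and all names here are hypothetical, not from the actual tool.
final class SegmentPreallocator {
    static final int FAR_MASK  = 0x7f800000;
    static final int NEAR_MASK = 0x007fffff;
    // Bits 15..22 of the near index. All set means the last 32K of the segment.
    static final int TAIL_MASK = 0x007f8000;

    // True when 'offset' falls within the final 32K bytes of its segment.
    static boolean nearSegmentEnd(int offset) {
        return (offset & TAIL_MASK) == TAIL_MASK;
    }

    // Allocates the following segment ahead of time if the offset is near the end.
    static void preallocate(byte[][] tree, int offset) {
        int next = ((offset & FAR_MASK) >>> 23) + 1;
        if (nearSegmentEnd(offset) && next < tree.length && tree[next] == null) {
            tree[next] = new byte[NEAR_MASK + 1];
        }
    }
}
```

A single AND and compare keeps the check cheap enough to run on every event, and the bounds test on next stops it from running off the end of the first dimension.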
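The arithmetic behind those numbers can be checked with a few lines. This is just a sketch, and the class and method names (MaskArithmetic, segmentCount and so on) are made up for the purpose.

```java
// Sketch only: class and method names are hypothetical.
final class MaskArithmetic {
    static final int FAR_MASK  = 0x7f800000;
    static final int NEAR_MASK = 0x007fffff;

    // 8 far bits give 256 possible segments.
    static int segmentCount() { return (FAR_MASK >>> 23) + 1; }

    // 23 near bits give 8M bytes per segment.
    static int segmentSize() { return NEAR_MASK + 1; }

    // 256 * 8M = 2G, computed as a long because it does not fit in an int.
    static long capacity() { return (long) segmentCount() * segmentSize(); }

    // First-dimension index for a given offset.
    static int farIndex(int offset) { return (offset & FAR_MASK) >>> 23; }

    // Second-dimension index for a given offset.
    static int nearIndex(int offset) { return offset & NEAR_MASK; }
}
```

The largest positive int, 0x7fffffff, maps to far index 255 and near index 0x7fffff, which is the very last byte of the 2G space.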