CSE 613 Final Report
I’ve been asked, as a final assignment for CSE613, Parallel Programming, to convert existing code for the Barnes-Hut n-body simulation to an implementation prepared with a “holie” shared global virtual address space. This implementation employs the use of a tagging of the shared global virtual memory space at regular intervals to aid in experimenting with a new parallelization technique. The implementation seeks to introduce logic to all shared global memory access operations that skip over and preserve this recognizable underlying tag pattern. The hope is that this regular pattern will allow the shared global space of the application to be reconstructed and unified faster across cores running the code in parallel in different spaces.
From a developer perspective, memory should behave as expected. The programmer may allocate or delete blocks of shared global memory that can be arbitrarily read or modified. The computation will behave in the expected way, computing the proper result of the code. From a system perspective, however, behind-the-scenes work will be done. In a concept similar to the mapping of virtual memory to physical memory, an additional indirection step will be introduced to all accesses of the virtual memory that effectively sidestep around a repeating “tag-id pair” in the shared global virtual memory. At regular intervals, a special bit-pattern tag will exist in the space, paired with an index that reveals the position of the subsequent chunk of virtual memory within the overall shared global memory space. The tag and index pair will be spaced appropriately such that at least one is guaranteed to exist in each cache block. The tag will help identify it, and the id will allow cores caching the data to “piece together” the global space that has become distributed across the cores. The implementation must guarantee that no memory query or modification will alter the tag or the id. The implementation must further ensure that the result of the computation is exactly as if the tag-id pairs are not present.
Note that this virtual memory tagging scheme is general. It may be applied to anything using the notion of random access virtual memory. For this reason, I’m seeking a general solution to avoiding modification of the tags. I want an easily applied process that implements this for the Barnes Hut code, but also one that might be applied easily to any C or C++ program that can benefit from the possibilities of this cache-aware, pattern-based tagging scheme. My aim is to override the concepts of “memory access” with special, centralized routines that operate knowing only addresses and block-sizes of interest. These routines will intercept virtual memory access and perform the needed skipping behind the scenes. Existing C or C++ code need only introduce minimal #include directives at the start of the include tree, automatically modifying the behavior of memory operations in the global shared address space to account for the new patterned tagging scheme.
This “concept” based coding generality can be achieved in modern C++ through use of its “template” system. Templates overcome the tendency of code routines to be tied with particular “types.” They allow the programmer to disconnect their classes and routines from the data types and bind them instead toward a higher level of concept-based operation. For example, several object types have the concept of multiplication. Integer and floating point types can be multiplied, but also can vectors of several dimensions, matrices, colors, and the like. One might even define the notion of “multiplication” for an arbitrary object represented in C++ through use of the “operator overloading” feature. Given this, it’s often useful to write code that works only knowing concepts such as “multiplication.” The function might apply to anything that can be “multiplied,” regardless of what the object being multiplied is. For this reason, a programmer should have the ability to write such a function only once without creating flavors for each particular type unless an explicit deviance from the normal multiplication behavior is necessary (often referred to as a specialization). The details of the multiplication are left for the object to resolve. Templates allow for this by abstracting the type, turning it into what is essentially a type variable (“typename/class”). Templates are “instantiated” for particular types, and the compiler will generate efficient code for particular type instantiations provided that the classes, types, or constants instantiating the template can successfully resolve the concepts used by the template. A template multiplying a “Color” object by a double won’t make sense if Color has no meaning for multiplication, but if Color implements the “multiplication” concept through operator overloading, it can be used without issue in the template.
In C and C++, “memory access” is itself a concept. Assuming that the primary “dereference” operation (->), and “address of” operation (&) can be overloaded in C++ in a general way, I attempt to provide a solution that replaces the objects in shared global memory space with new objects behaving with a modified concept of memory access. These special global shared objects in Barnes Hut have overloaded memory access operations that are aware of and attempt to avoid the tags within the virtual memory.
To get a working prototype, I’ve strayed from the initial ideals. I originally wanted a solution involving a simple change of the #include statements, adding only a new header responsible for overriding memory access operations. I wanted only to add this header at the top of the include tree of Barnes Hut. Unfortunately, I haven’t been able to create a header that completely overrides all subsequent shared global addressing operations.
Special class and structure types are defined in Barnes Hut at points before the true application logic that are used within the global shared memory of the application. To the best of my knowledge and research, I can not find a way to override the addressing operations of programmer-defined types before first encountering their declaration within the source. This means that I must introduce code changes after the declaration of programmer types that direct the application to override memory access operations for the specific type.
I’ve settled on a solution that uses a template class to exploit the common concept of “memory address” operations across all types. Memory operations such as address-of, pointer arithmatic, and address dereference are common in both programmer-defined and built-in types. I’ve created a template class, which I call “AddressManager,” that deals only with the concepts of memory-related operations for any general class or structure type that it is instantiated with.
This solution requires that the declaration of any type instances in shared-global-memory, whether static data or globally accessible heap-space data, built-in or custom types, be changed to declarations of the AddressManager class object, instantiated with the type being replaced. This introduces only a change in declaration of global-shared data structures within the original source code. The aim of the AddressManager is to change the behavior of all “addressing” concepts in a general way, making all memory-related operations aware of the patterned tagging scheme through use of operator overloading in the AddressManager class only. This class is declared in a single AddressManager.cpp/h file pair that implements a thin wrapper around any class, structure, or basic type, intercepting the concepts of all address operations. The template class can wrap not only the shared global object types, but also can wrap arbitrarily deep pointers to such object types.
If a particular object type or pointer to an object type lives in shared global memory, changing the declaration of this object in the code with an instance of an AddressManager template instantiated with the type instead will overload the address-of, dereference, increment/decrement and modification operations to account for the underlying patterned address space.
The presence of a significant amount of pointer arithmetic in Barnes Hut left me worried that I might miss subtle operations performed on addresses in relatively indirect ways. For example, parts of Barnes Hut code look as follows:
for (nn = Subp(n); nn < Subp(n) + NSUB; nn++)
{
if (*nn != NULL)
{
walksub(*nn, dsq / 4.0, ProcessId);
}
}
Here, ‘nn’ is a double-indirect pointer to a programmer-defined ‘node’ type, and the address is being incremented, compared against an address with a constant added to it, and dereferenced. If the incremental addition on the address isn’t accounting for the tagged pattern properly, if the dereferenced data passed to walksub doesn’t maintain knowledge that it is part of the shared global memory, the application will cause issue. Luckily, the template solution exposed situations such as this at compile-time, and once I started the conversion process changing globally shared pointers into AddressManager-wrapped objects, compiler warnings directed me toward more subtle areas that needed declaration conversion as well. These included sub-declarations in the tree node data structures and the like.
Since Barnes Hut employed a parallelism technique that relied on pre-allocated shared global memory, I’ve maintained this assumption in order to isolate the region of global shared memory that must be tagged. I allocate the shared block of data that AddressManager objects deal with in a single allocation phase before launching the parallel execution of the application.
[ Finish discussing batch alloc ? to prepare the tags. ]
[ Discuss notes about work to overcome size issue of the node that exceeds 16 words]
[ Discuss the modulo arithmatic worked out for skipping the tagged areas of memory, factoring in the possibility of issues with datatypes larger than the virtual page size – tag size. Discuss associated compromises ]
[ Discuss status of compilation ~ compiling on Windows but with mismatched results from Unix. Current state: pulled back over to a unix system ( a mac ) ) pending re- compilation and compute result comparison. ]
on August 9th, 2011 at 6:15 am
Great One…
I must say, its worth it! My link, http://burnett.sier.no/,thanks haha…