SimonHF's Blog

Just another site

libsxe, shared-memory, and parallel state-driven algorithms February 27, 2011

I recently came across the following paper: “Memory Models: A Case for Rethinking Parallel Languages and Hardware” by Sarita V. Adve and Hans-J. Boehm

The paper starts off:

The era of parallel computing for the masses is here, but writing correct parallel programs remains far more difficult than writing sequential programs. Aside from a few domains, most parallel programs are written using a shared-memory approach. The memory model, which specifies the meaning of shared variables, is at the heart of this programming model. Unfortunately, it has involved a tradeoff between programmability and performance, and has arguably been one of the most challenging and contentious areas in both hardware architecture and programming language specification. Recent broad  community-scale efforts have finally led to a convergence in this debate, with popular languages such as Java and C++ and most hardware vendors publishing compatible memory model specifications. Although this convergence is a dramatic improvement, it has exposed fundamental shortcomings in current popular languages and systems that prevent achieving the vision of structured and safe parallel programming. This paper discusses the path to the above convergence, the hard lessons learned, and their implications. …

And then introduces the idea of “disciplined shared-memory models”:

Moving forward, we believe a critical research agenda to enable “parallelism for the masses” is to develop and promote disciplined shared-memory models that:
• are simple enough to be easily teachable to undergraduates; i.e., minimally provide sequential consistency to programs that obey the required discipline;
• enable the enforcement of the discipline; i.e., violations of the discipline should not have undefined or horrendously complex semantics, but should be caught and returned back to the programmer as illegal;
• are general-purpose enough to express important parallel algorithms and patterns; and
• enable high and scalable performance.

This is interesting because libsxe has a disciplined shared-memory model which goes a long way towards fulfilling the criteria above in the form of the sxe pool library. So what is a sxe pool and how does it offer us a disciplined shared-memory model?

The sxe pool library was invented for different reasons than offering a disciplined shared-memory model. A shared-memory option was added later as a pool construction option. In short, sxe pools offer a way to create C arrays of structs with the following generic benefits:
• The size of the array is persisted
• Each element of the array gets its own state which is persisted outside the element struct in a linked list
• Each element of the array gets its own timestamp which is persisted outside the element struct in a linked list
• Each element is accessed using regular & concise C code, e.g. myarray[myindex].mystructmember

The sxe pool library caller can manipulate the state & timestamp element properties using the following API:

sxe_pool_get_number_in_state(void * array, unsigned state)
sxe_pool_index_to_state(void * array, unsigned id)
sxe_pool_set_indexed_element_state(void * array, unsigned id, unsigned old_state, unsigned new_state)
sxe_pool_set_oldest_element_state(void * array, unsigned old_state, unsigned new_state)
sxe_pool_get_oldest_element_index(void * array, unsigned state)
sxe_pool_get_oldest_element_time(void * array, unsigned state)
sxe_pool_get_element_time_by_index(void * array, unsigned element)
sxe_pool_touch_indexed_element(void * array, unsigned id)

Converting the sxe pool library API to support shared-memory was relatively simple. The sxe_pool_new() function got an option to share the pool memory. The API functions to change the pool element state use atomic assembler instructions if the pool was constructed as a shared-memory pool. It’s also interesting to note that sxe pools can be shared between processes as well as between threads in the same process. This is because the sxe pool library internal implementation avoids absolute pointers; which is also something that I encourage from libsxe developers and C developers in general.

This API is “general-purpose enough to express important parallel algorithms and patterns” and most interestingly is the same API whether the algorithm is threaded or not. It’s also “simple enough to be easily teachable to undergraduates” or even junior developers as we have found out at Sophos. The atomic sxe_pool_set_[indexed|oldest]_element_state() API functions “enable the enforcement of the discipline” by requiring both the old state and the new state of the array element; if the developer supplies the wrong old state then sxe pool will assert. Because the sxe pool library manages the element states itself then an assert is very unlikely when using a single pool. However, more complicated algorithms often make use of chains of pools in order to implement multiplexing and/or combining of parallel results, etc. In these cases, it is common to keep references to pool array element indexes and/or pool array element states in the caller supplied pool element structs. Finally, by implementing algorithms using the sxe pool API then it is possible to “enable high and scalable performance” using a minimum of simple to understand C source code. The developer is forced into thinking about the algorithm as a state model which often simplifies the hardest problems. And the generic part of the implementation complexity — e.g. locking, shared memory, keeping state, double linked lists, timeout handling, & memory allocation — is all handled by the sxe pool library and backed by automated tests with 100% library code coverage. The resulting performance is excellent as can be seen by the figures published in earlier blog entries; tens of thousands of network requests per second per core.

As you can see, the sxe pool is an incredibly powerful and code saving and code simplifying generic data structure. It’s a sort of Swiss Army knife for parallel, event driven algorithms. In a future article I’ll show some of the implementation patterns.


%d bloggers like this: