forked from skaphan/stmmap
-
Notifications
You must be signed in to change notification settings - Fork 0
STM (Software Transactional Memory) implementation in C, based on memory-mapping.
License
wyklq/stmmap
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
/* README Copyright 2009 Shel Kaphan This file is part of stmmap. stmmap is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. stmmap is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with stmmap. If not, see <http://www.gnu.org/licenses/>. */ INSTALLATION ============ The primary product here is a library you can use to get Software Transactional Memory in your C and C++ programs. Actually it is two libraries, the difference being in the optional memory allocator that is included. One of them, libstm.a, supports transactions in only one thread per process. Processes can share memory, with transactions, between them. The other one, libstm-th.a, supports transactions in many threads per process (and still allows multiple processes too). If you want to use regular pointers in the objects in shared memory, you should stick to libstm.a, because you can control the location where the one and only shared memory region will live. If you use libstm-th.a, the shared memory region will be mapped at a different address for each thread doing transactions. So you will either need to avoid storing pointers in shared objects, or you will need to use a smart pointer class like offset_ptr, as supplied by the Boost C++ libraries (www.boost.org). Libstm-th.a makes use of offset_ptr. Because of offset_ptr, libstm-th.a needs to use some C++, so if you don't want any C++ in your program, you must stick to libstm.a. Note, though, that the C++ doesn't escape stmmap's internals, so though it requires a bit of C++ at compile-time, no APIs are affected and no runtime libraries are required. There is a Makefile that builds the libraries and two tests: stmtest1 and stmtest2. these are, respectively,a single-threaded and a multi-threading-capable version of the same thing. They run the test case in example.c, which is just a test, not part of the core package. Example.c also shows how to set up and call the stm package. To test stmtest1, you need to run multiple copies of it at the same time, in different processes. Here is a manifest of the files and what they do: Essentials: stm.[ch] core STM functionality atomic-compat.[ch] atomic operations needed by stm.c. Currently uses atomic-builtins. Support: (you can use stm.c without any of this if you want) stmalloc.[ch] memory allocator suitable for stmmap's shared segments. segalloc.[ch] C implementation of low level memory allocator used by stmalloc.c segalloc.{cpp,h} C++ implementation of low level memory allocator used by stmalloc.c AVLtree.[ch] C AVL tree implementation that supports memory allocator AVLtree.{cpp,hpp} C++ AVL tree implementation that supports memory allocator Tests & Configuration: Makefile autoconfigure.c The Makefile uses this To use stmmap-th.a and the C++ versions of the memory allocator, you will need the Boost C++ library available at www.boost.org. The only thing from there that is used is offset_ptr, and it is self-contained as a header file. No libraries are required. Offset_ptr provides position-independent "smart pointers" that make it possible to have the same object mapped into the address space in multiple locations and still have its pointers work correctly. This is used in the multi-threading version of stmmap. It is only used in the memory allocator. The header files your program will need are stm.h and stmalloc.h. Stmalloc.h is only needed if you are using the included memory allocator. You may need to port atomic-compat.[ch] to your OS environment, as different OS versions have different atomic primitives. (Please send me the changes!) Right now it uses the atomic builtins as specified by Intel and implemented in gcc. Or you can use the set that Apple provides. Example.c gives a very simple example of how to use this package. In fact, this is what I used to debug it. THEORY OF OPERATION =================== The purpose of this package is to provide "Software Transactional Memory" functionality in C and C++, without requiring much extra complexity for the programmer. It is built on top of file-mapping -- in particular Unix's mmap() system call. One or more shared memory areas are opened by each thread or process that uses this package. When no transaction is in progress, each shared area is potentially visible to any processes that access that segment of mapped memory. When a transaction is in progress, the shared area is access-protected, and the first access by the process performing the transaction to any page is trapped. Other processes are not affected. A signal handler keeps track of the accessed pages. Upon first access in a transaction, a private copy of each page is made so that any modifications are not visible to any other process. If, on first access to a page within a transaction, it can be determined that another process has modified that page since the transaction started, the transaction is aborted. When a transaction is committed, there is an attempt to establish ownership of all modified pages and then re-map those private pages as shared. This package uses "optimistic locking" in that the modified pages are only locked at the end of the transaction, during the commit process. At commit time, there is another check to see if any other processes have modified the pages that have been accessed during the transaction, by means of a transaction ID associated with each page. If they have, the transaction is aborted and all changes discarded. Though this system works using mapped files, no I/O needs to occur normally -- the writes just affect the mapped file's memory buffers. This mechanism provides read consistency in that if a transaction A succeeds, it is guaranteed that no other transaction B will have modified the pages accessed by transaction A. However, there is no guarantee the transaction will succeed. Transactions that are automatically aborted are also automatically retried, with a backoff mechanism, until they succeed. The access control on shared segments between (not during) transactions can be specified for each shared segment. If access is permitted, reads and writes to shared segments not during transactions are unchecked and there is no guarantee of consistency (so this is generally not recommended!). Transactions are composable -- that is, they can be nested. This is so that complex transactions can be built up out of simpler ones. Processes can have a number of shared memory areas, limited by the number of file descriptors and virtual memory available. A recent (December '09) addition to stmmap is that it now works at both the thread and process level. If you use it to communicate between single-threaded processes, you have the option of setting things up so that the shared segment is mapped at the same virtual address in any process that accesses it. This means you can use ordinary pointers within the shared segment and it will be correct in all processes that map that segment at the same virtual address. When you use this package in a multi-threaded mode, each thread that maps a shared segment will map it at a different virtual address. The process of which the threads are part therefore has the same shared segment open multiple times, by multiple threads. This implies that you cannot use pointers in the ordinary way within a shared segment. Although the STM core mechanism is the same in single- and multi-threaded mode, the memory allocator that is provided is different in each case. In the multi-threaded version, it uses the Boost C++ Library's "offset_ptr", which provides pointer-like functionality but will work correctly no matter where the object of which they are a part is mapped in virtual space. If you use the multi-threaded version of stmmap, you may find this, or something like it, useful. However, in either single- or multh-threaded usage, these shared data structures should not refer outside the shared area to objects that only exist in the private parts of a process's address space! You can mix and match, and have multiple threads in multiple processes all sharing the same memory, safely, by using stmmap. Another limitation of this package is that it operates on the OS page level. That is, only one process can write to a page during a transaction, and any other thread's transaction that accesses the same page will have to be aborted and retried. This may not be as bad as it sounds if you code your transactions to be relatively short. The arbitration between processes seeking to own a page during a transaction is on a first-come first-served basis. Since pages are always locked in order of virtual address, there can be no deadlock. As soon as a transaction tries to obtain a page that has been modified by another process's transaction, it aborts. Another restriction is that this package will only work on systems where the contents of shared, mapped files are immediately visible to all processes that have them mapped shared. This could possibly fail on some systems without sufficient cache coherency, for example. This package does *not* depend on private mappings being kept up to date with the current contents of a file. This occurs in some OS versions but not others. It does depend on writes into private pages not being visible to other mappings, and it uses "copy-on-write" semantics of private mapping to ensure that private pagesare private and will not be arbitrarily overwritten with data from another process or thread. Warning: since transactions will be retried until they succeed, any variables outside the explicitly shared memory segment(s) that are referenced within a transaction must be handled with care. In particular, you should not modify anything (outside the shared memory area) that you access earlier in the same transaction. If the transaction is retried, the reference will pick up the value set in a previous try. You can set and then use a variable in the same transaction, but you can't read a variable that was initialized prior to a transaction, then set it to a new value, and expect that to work right if retries are necessary. It is helpful to think of the code in a transaction as being the body of a loop, that you don't know how many times is going to be executed. Only information in the shared segments is managed transactionally. ISSUES ====== Is Private Mapping Private? --------------------------- stmmap depends heavily on mmap(). Some features of mmap() are implemented differently on different systems. Of particular concern is the behavior of private mapping, which you get with the MAP_PRIVATE flag. stmmap uses this when a transaction accesses pages in memory. Some systems treat MAP_PRIVATE as a true private mapping, whether reading or writing. Most, on the other hand, treat the mapping as if it were shared until there is a write, at which time they make a private copy. Here, for example, is a quote from http://docs.hp.com/en/5992-3373/ch10s03.html: "In case of mmap(2), if MAP_PRIVATE mapping is created for a file for which MAP_SHARED exists, a separate copy of the page is created for MAP_PRIVATE only when it first writes to the page. As long as MAP_PRIVATE reads, it shares the page with MAP_SHARED mapping. That is, updates made by shared mapping will be visible to private mapping until private mapping writes. This change makes HP-UX mmap(2) compliant with industry standard, thus helping application portability." Also, from Understanding the Linux Kernel, 3rd Edition, by Daniel P. Bovet abd Marco Cesati http://my.safaribooksonline.com/0596005652: "[...] private mapping is more efficient than shared mapping. But each write operation on a privately mapped page will cause it to stop mapping the page in the file. Thus, a write does not change the file on disk, nor is the change visible to any other processes that access the same file. However, pages of a private memory mapping that have not been modified by the process are affected by file updates performed by other processes." However, experimentally, Mac OSX does not behave this way. Private mappings are actually private on Mac OSX even when only reading a privately mapped file. The Makefile autoconfigures itself to compile the source files the right way depending on the specific behavior of the system it is compiling on. The relevant compiler flag is -DPRIVATE_MAPPING_IS_PRIVATE, which must be defined for systems that behave like Mac OSX does, but may be undefined for others (Linux, FreeBSD, etc.) Debugging --------- stmmap makes liberal use of the SIGBUS signal. This can make it difficult to debug programs which use stmmap unless you can get your debugger to pass through the SIGBUS signals without getting all confused about it. gdb, in particular, requires some coaxing. There is a bit of code in example.c that was needed to debug while using stmmap on Mac OS X. It is likely to be needed on any Mach system. Also, when using gdb on any system, you should type this line before your code calls on anything in stmmap: handle SIGBUS nostop noprint pass PROS AND CONS ============= Here are some pros and cons of this approach: Pros ---- * Very easy to use * Avoids accidental memory sharing if used in single-threaded process mode. * No extra code executed on memory reads and writes, except the first access to each page during a transaction. * Enables STM in C-like languages Cons ---- * Can be a little tricky to do the right thing with non-transactionally controlled data in the body of a transaction -- until you "get it." * Coarse grain approach -- pages supported by OS -- leads to more retried transactions because of more contention. * System call overhead in mmap() is non-trivial. * Requires language such as C/C++ in which memory and file-mapped pages are directly accessible. * Debuggers may have a hard time dealing with a program that depends on SIGBUS to work at all! HOPES AND DREAMS ================ Right now mmap() allows you to map a given region of memory to a region in a file. It would be useful if there were a call that did the complementary operation -- taking the contents of memory as the source and associating that to a region of a file (basically what pwrite() does...). Alternatively, simply the ability to re-map a region (a set of pages) of virtual memory to some other virtual address would be very useful. Either way, this could avoid some memory-to-memory copying in the stmmap implementation. Shared memory mapping that is backed only by swap space and not by a file could also be a good thing, but there needs to be some way to name it. (Mac OS has this). Someone should fix the standard behavior of mmap() so that there is a way to get either copy-on-write or private mapping, and not to confuse the two as appears to be the present situation. ACKNOWLEDGEMENTS ================ Thanks to Bryan Woods who convinced me a multi-threaded version of stmmap could work and showed me offset_ptr, and who also came up with a way to make it possible to use gdb to debug programs that use stmmap. Shel Kaphan, Oct. 17, 2009
About
STM (Software Transactional Memory) implementation in C, based on memory-mapping.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published