Software Transactional Memory in GCC
p

Concurrency is Important

Concurrency is Hard

Transactional Memory

  • Goal: How do we do concurrency better?
    • Traditional locks
    • Shared-nothing asynchronous message passing? (cough erlang cough)

Red Black Trees & Concurrency

inside square

Transactional Memory

100
101
102
103
104
105
106
/* Move item from one list to another */
int move(list *from, list *to) {
    __transaction_atomic {
        node *n = pop(from);
        push(to, n);
    }
}

Transactional Memory

  • First-class language support for concurrency

  • Added in GCC 4.7

  • Either everything in a transaction happens atomically, or nothing does

  • Data oriented, not process oriented (like OpenMP)

Locks vs Transactional Memory

Stoplights vs Roundabouts

Transactional Memory

  • Optimistic progress

  • Fine grained conflict detection

  • Simple programming model

Why do YOU care

  • Early days

    lots of opportunity to contribute

  • Growing interest

    IBM and Intel adding hardware support

  • Better Programming Model

    first class language support

  • Another tool in the concurrency toolbox

Disadvantages

  • The cost of a collision can be significant

Disadvantages

  • The cost of a collision can be significant
  • Overhead of transactional monitoring

  • Difficult to debug

  • Can still forget to wrap something in a transaction

Better Programming Model

  • No deadlocks

    Nested Transactions are fine!

  • return in the middle of a function

  • similar to Resource Acquisition Is Initialization (RAII)

  • Compiler/Runtime determine conflict regions

STM in Java Languages

CCSTM

Scala STM

  • Lots of STM support in JVM languages

  • Easier to encapsulate object changes than memory changes

  • Used in Clojure/Scala/et. al.

GCC Implementation

New Language Keywords

GCC STM Wiki

  • __transaction_atomic { … }

  • __transaction_relaxed { … }

  • __transaction_cancel

  • attribute((transaction_safe))

  • attribute((transaction_pure))

GCC Coode

Bulk of the changes in:

gcc/trans-mem.c

gcc/trans-mem.h

Simplest Transaction

100
101
102
103
104
void testfunc(int *x, int *y) {
    __transaction_atomic {
        *x += *y;
    }
}

Compile with “-fgnu-tm”

GCC -fdump-tree-tmlower

testfunc (int * x, int * y)
{
  int D.1894;
  int D.1893;
  int D.1892;

  __transaction_atomic  // SUBCODE=[ GTMA_HAVE_LOAD GTMA_HAVE_STORE ]
  try
    {
      D.1892 = *x;
      D.1893 = *y;
      D.1894 = D.1892 + D.1893;
      *x = D.1894;
    }
  finally
    {
      __builtin__ITM_commitTransaction ();
    }
  return;
}

Assembly

    .globl _testfunc
_testfunc:
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %rbx
    subq    $24, %rsp
    movq    %rdi, -24(%rbp)
    movq    %rsi, -32(%rbp)
    movl    $43, %edi
    movl    $0, %eax
    call    __ITM_beginTransaction
    andl    $2, %eax
    testl   %eax, %eax
    je  L2
L1: movq    -24(%rbp), %rax
    movl    (%rax), %edx
    movq    -32(%rbp), %rax
    movl    (%rax), %eax
    addl    %eax, %edx
    movq    -24(%rbp), %rax
    movl    %edx, (%rax)
    call    __ITM_commitTransaction
    jmp L1
    movq    -24(%rbp), %rax
    movq    %rax, %rdi
    call    __ITM_RU4
    movl    %eax, %ebx
    movq    -32(%rbp), %rax
    movq    %rax, %rdi
    call    __ITM_RU4
    addl    %ebx, %eax
    movl    %eax, %edx
    movq    -24(%rbp), %rax
    movl    %edx, %esi
    movq    %rax, %rdi
    call    __ITM_WU4
    call    __ITM_commitTransaction
L2: addq    $24, %rsp
    popq    %rbx
    popq    %rbp
    ret

What are the calls to _ITM*?

  • Calls to libitm.so
  • Tracks every read and write happening in a transaction
  • Detects conflicting transactions and rolls back
  • ** libitm.so can be replaced! **

  • ABI is (somewhat) standardized

  • Intel ABI Spec

libitm

Interface:

libitm/libitm.h

Provides transaction safe versions of:

  • malloc
  • free
  • memset
  • memcmp

Requires combined compiler/library change to support more!

Unsafe Calling

100
101
102
103
104
int testfunc(char *a, char *b) {
    __transaction_atomic {
        return strcmp(a,b);
    }
}

test2.c:3:9: error: unsafe function call ‘strcmp’ within atomic transaction

100
101
102
103
104
int testfunc(char *a, char *b) {
    __transaction_relaxed {
        return strcmp(a,b);
    }
}

Simplest Transaction Safe Function

100
101
102
103
__attribute__((transaction_safe)) 
void testfunc(int *x, int *y) {
    *x += *y;
}

Standard Assembly

    .globl _testfunc
_testfunc:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    %rdi, -8(%rbp)
    movq    %rsi, -16(%rbp)
    movq    -8(%rbp), %rax
    movl    (%rax), %edx
    movq    -16(%rbp), %rax
    movl    (%rax), %eax
    addl    %eax, %edx
    movq    -8(%rbp), %rax
    movl    %edx, (%rax)
    popq    %rbp
    ret

AND Transaction Safe Assembly

    .globl __ZGTt8testfunc
__ZGTt8testfunc:
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %rbx
    subq    $24, %rsp
    movq    %rdi, -24(%rbp)
    movq    %rsi, -32(%rbp)
    movq    -24(%rbp), %rax
    movq    %rax, %rdi
    call    __ITM_RU4
    movl    %eax, %ebx
    movq    -32(%rbp), %rax
    movq    %rax, %rdi
    call    __ITM_RU4
    addl    %ebx, %eax
    movl    %eax, %edx
    movq    -24(%rbp), %rax
    movl    %edx, %esi
    movq    %rax, %rdi
    call    __ITM_WU4
    addq    $24, %rsp
    popq    %rbx
    popq    %rbp
    ret

GCC 4.7 Notes

"The support is experimental.

In particular, this also means that several parts of the implementation are not yet optimized.

If you observe performance that is lower than expected, you should not assume that transactional memory is inherently slow; instead, please just file a bug."

GCC Codebase - as a user

GCC Codebase - as a developer

Hardware Support

Hardware Support

  • Makes use of “snoopy cache”

  • The cache already detects conflicting cores reading/writing memory

  • Size of transaction is limited to cache size

  • Can detect NON-transactional conflicts within a transaction

Hardware TM issues

  • Syscalls

  • Interrupts

  • Debugger traps

  • Context Switches

  • Limited cache size

Intel Hardware Support

Transactional Synchronization Extensions (TSX) in Haswell (2013)

Handle cases where a conflict is not wrapped in a transaction

Two flavors:

  • Hardware Lock Elision (HLE)

  • Restricted Transactional Memory (RTM)

  • Haswell simulator is available

Intel Spec

Intel Hardware Lock Elision

Backward Compatible Extension

  • XACQUIRE prefix before a lock instruction

  • XRELEASE prefix before unlock instruction

  • Rolls back if conflict is detected

  • Started adding support in gcc 4.8 (-mrtm)

Intel Restricted Transactional Memory

  • XBEGIN

  • XEND

  • XABORT

  • XBEGIN includes pointer to fallback function

IBM Power8 TM Support

  • tbegin
  • tend
  • tabort
  • tsuspend
  • tresume
  • TDOOMED status bit

  • Transaction EXception And Summary Register (TEXASR)

  • Expose speculative vs committed state

Power.org

GCC Performance

GCC Performance

GCC Performance

GCC Performance

Conclusion

Transactional Memory

  • Its hot

  • It needs work