Merge branch ‘futex-fixes’ (futex fixes from Thomas Gleixner) [Linux 3.15]

This Linux kernel change "Merge branch ‘futex-fixes’ (futex fixes from Thomas Gleixner)" is included in the Linux 3.15 release. This change is authored by Linus Torvalds <torvalds [at] linux-foundation.org> on Thu Jun 5 12:31:32 2014 -0700. The commit for this change in Linux stable tree is 1c5aefb (patch). Other info about this change: Merge: 54539cd 54a2178

Merge branch 'futex-fixes' (futex fixes from Thomas Gleixner)

Merge futex fixes from Thomas Gleixner:
 "So with more awake and less futex wreckaged brain, I went through my
  list of points again and came up with the following 4 patches.

  1) Prevent pi requeueing on the same futex

     I kept Kees check for uaddr1 == uaddr2 as a early check for private
     futexes and added a key comparison to both futex_requeue and
     futex_wait_requeue_pi.

     Sebastian, sorry for the confusion yesterday night.  I really
     misunderstood your question.

     You are right the check is pointless for shared futexes where the
     same physical address is mapped to two different virtual addresses.

  2) Sanity check atomic acquisiton in futex_lock_pi_atomic

     That's basically what Darren suggested.

     I just simplified it to use futex_top_waiter() to find kernel
     internal state.  If state is found return -EINVAL and do not bother
     to fix up the user space variable.  It's corrupted already.

  3) Ensure state consistency in futex_unlock_pi

     The code is silly versus the owner died bit.  There is no point to
     preserve it on unlock when the user space thread owns the futex.

     What's worse is that it does not update the user space value when
     the owner died bit is set.  So the kernel itself creates observable
     inconsistency.

     Another "optimization" is to retry an atomic unlock.  That's
     pointless as in a sane environment user space would not call into
     that code if it could have unlocked it atomically.  So we always
     check whether there is kernel state around and only if there is
     none, we do the unlock by setting the user space value to 0.

  4) Sanitize lookup_pi_state

     lookup_pi_state is ambigous about TID == 0 in the user space value.

     This can be a valid state even if there is kernel state on this
     uaddr, but we miss a few corner case checks.

     I tried to come up with a smaller solution hacking the checks into
     the current cruft, but it turned out to be ugly as hell and I got
     more confused than I was before.  So I rewrote the sanity checks
     along the state documentation with awful lots of commentry"

* emailed patches from Thomas Gleixner <tglx@linutronix.de>:
  futex: Make lookup_pi_state more robust
  futex: Always cleanup owner tid in unlock_pi
  futex: Validate atomic acquisition in futex_lock_pi_atomic()
  futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)

There is no are 0 lines of Linux source code added/deleted in this change. Code changes to Linux kernel are as follows.

 kernel/futex.c | 134 +++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 106 insertions(+), 28 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e1cb1ba..de938d2 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -743,10 +743,58 @@ void exit_pi_state_list(struct task_struct *curr)
    raw_spin_unlock_irq(&curr->pi_lock);
 }

+/*
+ * We need to check the following states:
+ *
+ *      Waiter | pi_state | pi->owner | uTID      | uODIED | ?
+ *
+ * [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
+ * [2]  NULL   | ---      | ---       | >0        | 0/1    | Valid
+ *
+ * [3]  Found  | NULL     | --        | Any       | 0/1    | Invalid
+ *
+ * [4]  Found  | Found    | NULL      | 0         | 1      | Valid
+ * [5]  Found  | Found    | NULL      | >0        | 1      | Invalid
+ *
+ * [6]  Found  | Found    | task      | 0         | 1      | Valid
+ *
+ * [7]  Found  | Found    | NULL      | Any       | 0      | Invalid
+ *
+ * [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
+ * [9]  Found  | Found    | task      | 0         | 0      | Invalid
+ * [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid
+ *
+ * [1] Indicates that the kernel can acquire the futex atomically. We
+ * came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.
+ *
+ * [2] Valid, if TID does not belong to a kernel thread. If no matching
+ *      thread is found then it indicates that the owner TID has died.
+ *
+ * [3] Invalid. The waiter is queued on a non PI futex
+ *
+ * [4] Valid state after exit_robust_list(), which sets the user space
+ * value to FUTEX_WAITERS | FUTEX_OWNER_DIED.
+ *
+ * [5] The user space value got manipulated between exit_robust_list()
+ * and exit_pi_state_list()
+ *
+ * [6] Valid state after exit_pi_state_list() which sets the new owner in
+ * the pi_state but cannot access the user space value.
+ *
+ * [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.
+ *
+ * [8] Owner and user space value match
+ *
+ * [9] There is no transient state which sets the user space TID to 0
+ * except exit_robust_list(), but this is indicated by the
+ * FUTEX_OWNER_DIED bit. See [4]
+ *
+ * [10] There is no transient state which leaves owner and user space
+ * TID out of sync.
+ */
 static int
 lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
-       union futex_key *key, struct futex_pi_state **ps,
-       struct task_struct *task)
+       union futex_key *key, struct futex_pi_state **ps)
 {
    struct futex_pi_state *pi_state = NULL;
    struct futex_q *this, *next;
@@ -756,12 +804,13 @@ void exit_pi_state_list(struct task_struct *curr)
    plist_for_each_entry_safe(this, next, &hb->chain, list) {
        if (match_futex(&this->key, key)) {
            /*
-            * Another waiter already exists - bump up
-            * the refcount and return its pi_state:
+            * Sanity check the waiter before increasing
+            * the refcount and attaching to it.
             */
            pi_state = this->pi_state;
            /*
-            * Userspace might have messed up non-PI and PI futexes
+            * Userspace might have messed up non-PI and
+            * PI futexes [3]
             */
            if (unlikely(!pi_state))
                return -EINVAL;
@@ -769,44 +818,70 @@ void exit_pi_state_list(struct task_struct *curr)
            WARN_ON(!atomic_read(&pi_state->refcount));

            /*
-            * When pi_state->owner is NULL then the owner died
-            * and another waiter is on the fly. pi_state->owner
-            * is fixed up by the task which acquires
-            * pi_state->rt_mutex.
-            *
-            * We do not check for pid == 0 which can happen when
-            * the owner died and robust_list_exit() cleared the
-            * TID.
+            * Handle the owner died case:
             */
-           if (pid && pi_state->owner) {
+           if (uval & FUTEX_OWNER_DIED) {
                /*
-                * Bail out if user space manipulated the
-                * futex value.
+                * exit_pi_state_list sets owner to NULL and
+                * wakes the topmost waiter. The task which
+                * acquires the pi_state->rt_mutex will fixup
+                * owner.
                 */
-               if (pid != task_pid_vnr(pi_state->owner))
+               if (!pi_state->owner) {
+                   /*
+                    * No pi state owner, but the user
+                    * space TID is not 0. Inconsistent
+                    * state. [5]
+                    */
+                   if (pid)
+                       return -EINVAL;
+                   /*
+                    * Take a ref on the state and
+                    * return. [4]
+                    */
+                   goto out_state;
+               }
+
+               /*
+                * If TID is 0, then either the dying owner
+                * has not yet executed exit_pi_state_list()
+                * or some waiter acquired the rtmutex in the
+                * pi state, but did not yet fixup the TID in
+                * user space.
+                *
+                * Take a ref on the state and return. [6]
+                */
+               if (!pid)
+                   goto out_state;
+           } else {
+               /*
+                * If the owner died bit is not set,
+                * then the pi_state must have an
+                * owner. [7]
+                */
+               if (!pi_state->owner)
                    return -EINVAL;
            }

            /*
-            * Protect against a corrupted uval. If uval
-            * is 0x80000000 then pid is 0 and the waiter
-            * bit is set. So the deadlock check in the
-            * calling code has failed and we did not fall
-            * into the check above due to !pid.
+            * Bail out if user space manipulated the
+            * futex value. If pi state exists then the
+            * owner TID must be the same as the user
+            * space TID. [9/10]
             */
-           if (task && pi_state->owner == task)
-               return -EDEADLK;
+           if (pid != task_pid_vnr(pi_state->owner))
+               return -EINVAL;

+       out_state:
            atomic_inc(&pi_state->refcount);
            *ps = pi_state;
-
            return 0;
        }
    }

    /*
     * We are the first waiter - try to look up the real owner and attach
-    * the new pi_state to it, but bail out when TID = 0
+    * the new pi_state to it, but bail out when TID = 0 [1]
     */
    if (!pid)
        return -ESRCH;
@@ -839,6 +914,9 @@ void exit_pi_state_list(struct task_struct *curr)
        return ret;
    }

+   /*
+    * No existing pi state. First waiter. [2]
+    */
    pi_state = alloc_pi_state();

    /*
@@ -959,7 +1037,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
     * We dont have the lock. Look up the PI state (or create it if
     * we are the first waiter):
     */
-   ret = lookup_pi_state(uval, hb, key, ps, task);
+   ret = lookup_pi_state(uval, hb, key, ps);

    if (unlikely(ret)) {
        switch (ret) {
@@ -1565,7 +1643,7 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
             * rereading and handing potential crap to
             * lookup_pi_state.
             */
-           ret = lookup_pi_state(ret, hb2, &key2, &pi_state, NULL);
+           ret = lookup_pi_state(ret, hb2, &key2, &pi_state);
        }

        switch (ret) {

Leave a Reply

Your email address will not be published. Required fields are marked *