MIT 6.5830 simpleDB Lab4

Exercise 1

需要完成的是为事务加锁，涉及到修改BufferPool.java中的getPage()、unsafeReleasePage()和holdsLock()方法。实验手册建议定义一个LockManager类来维护事务和锁的状态。

实验手册为了实现事务的ACID，要求实现共享锁和排他锁，而实验代码已经提供了Permissions类，刚好对应只读操作和读写操作。

getPage代码：

    public Page getPage(TransactionId tid, PageId pid, Permissions perm)
            throws TransactionAbortedException, DbException {
        // TODO: some code goes here
        try {
            if (!lockManager.acquireLock(tid, pid, perm, 0)) {
                throw new TransactionAbortedException();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new TransactionAbortedException();
        }

        Page page = bufferPool.get(pid);
        if (page != null) {
            bufferPool.remove(pid);
            bufferPool.put(pid, page);
            return page;
        }
        if (bufferPool.size() >= numPages) {
            evictPage();
        }
        DbFile dbFile = Database.getCatalog().getDatabaseFile(pid.getTableId());
        page = dbFile.readPage(pid);
        bufferPool.put(pid, page);
        return page;
    }

在方法的开始增加了先获得锁的步骤。

unsafeReleasePage代码：

    public void unsafeReleasePage(TransactionId tid, PageId pid) {
        // TODO: some code goes here
        // not necessary for lab1|lab2
        lockManager.releaseLock(tid, pid);
    }

调用lockManager的方法即可。

holdsLock代码：

    public boolean holdsLock(TransactionId tid, PageId p) {
        // TODO: some code goes here
        // not necessary for lab1|lab2
        return lockManager.holdsLock(tid, p);
    }

调用lockManager的方法即可。

需要定义一个内部类实现LockManager，注意acquireLock和releaseLock方法需要用synchronized关键字修饰，保证线程安全。

public class BufferPool {
    // Other existing codes...
    private final LockManager lockManager = new LockManager();
    // Other existing codes...
    private class LockManager {
        private final ConcurrentHashMap<PageId, ConcurrentHashMap<TransactionId, PageLock>> lockTable;

        public LockManager() {
            lockTable = new ConcurrentHashMap<>();
        }

        public class PageLock {
            private final PageId pid;
            private final TransactionId tid;
            private final Permissions perm;

            public PageLock(PageId pid, TransactionId tid, Permissions perm) {
                this.pid = pid;
                this.tid = tid;
                this.perm = perm;
            }
        }

        public synchronized boolean acquireLock(TransactionId tid, PageId pid, Permissions perm, int retry)
                throws TransactionAbortedException, InterruptedException {
            if (retry == 3) {
                return false;
            }
            ConcurrentHashMap<TransactionId, PageLock> locks = lockTable.get(pid);
            if (locks == null) {
                locks = new ConcurrentHashMap<>();
                PageLock lock = new PageLock(pid, tid, perm);
                locks.put(tid, lock);
                lockTable.put(pid, locks);
                return true;
            }
            
            if (locks.get(tid) == null) {
                if (perm == Permissions.READ_WRITE) {
                    wait(100);
                    return acquireLock(tid, pid, perm, retry + 1);
                } else if (perm == Permissions.READ_ONLY) {
                    if (locks.size() > 1) {
                        PageLock lock = new PageLock(pid, tid, perm);
                        locks.put(tid, lock);
                        return true;
                    } else {
                        Collection<PageLock> values = locks.values();
                        for (PageLock lock : values) {
                            if (lock.perm == Permissions.READ_WRITE) {
                                wait(100);
                                return acquireLock(tid, pid, perm, retry + 1);
                            } else {
                                PageLock newLock = new PageLock(pid, tid, perm);
                                locks.put(tid, newLock);
                                return true;
                            }
                        }
                    }
                }
            } else {
                if (perm == Permissions.READ_ONLY) {
                    locks.remove(tid);
                    PageLock lock = new PageLock(pid, tid, perm);
                    locks.put(tid, lock);
                    return true;
                } else {
                    if (locks.get(tid).perm == Permissions.READ_WRITE) {
                        return true;
                    } else {
                        if (locks.size() > 1) {
                            wait(100);
                            return acquireLock(tid, pid, perm, retry + 1);
                        } else {
                            locks.remove(tid);
                            PageLock lock = new PageLock(pid, tid, perm);
                            locks.put(tid, lock);
                            return true;
                        }
                    }
                }
            }
            return false;
        }

        public synchronized void releaseLock(TransactionId tid, PageId pid) {
            if (holdsLock(tid, pid)) {
                ConcurrentHashMap<TransactionId, PageLock> locks = lockTable.get(pid);
                locks.remove(tid);
                if (locks.isEmpty()) {
                    lockTable.remove(pid);
                }
                this.notifyAll();
            }
        }

        public boolean holdsLock(TransactionId tid, PageId pid) {
            if (lockTable.get(pid) == null) {
                return false;
            }
            return lockTable.get(pid).get(tid) != null;
        }
    }
}

acquireLock的步骤是：

开始
  |
  v
检查重试次数(retry)是否等于3
  |
  |--> 是：返回false
  |
  v
获取锁表(lockTable)中对应的锁(locks)
  |
  |--> 如果locks为null，说明无锁
  |       |
  |       v
  |       创建新的锁，添加到锁表中，返回true
  |
  v
检查locks中是否存在tid的锁
  |
  |--> 如果不存在
  |       |
  |       |--> 如果perm为READ_WRITE
  |       |       |
  |       |       v
  |       |       等待100ms，递归调用acquireLock函数，增加重试次数
  |       |
  |       |--> 如果perm为READ_ONLY
  |               |
  |               |--> 如果locks的大小大于1，说明上锁的是共享锁
  |               |       |
  |               |       v
  |               |       创建新的锁，添加到锁表中，返回true
  |               |
  |               |--> 如果locks的大小等于1
  |                       |
  |                       |--> 如果存在的锁是排他锁
  |                       |       |
  |                       |       v
  |                       |       等待100ms，递归调用acquireLock函数，增加重试次数
  |                       |
  |                       |--> 如果存在的锁是共享锁
  |                               |
  |                               v
  |                               创建新的锁，添加到锁表中，返回true
  |
  |--> 如果存在
          |
          |--> 如果perm为READ_ONLY
          |       |
          |       v
          |       移除旧的锁，创建新的锁，添加到锁表中，返回true
          |
          |--> 如果perm为READ_WRITE
                  |
                  |--> 如果tid的锁的perm为READ_WRITE，保持即可
                  |       |
                  |       v
                  |       返回true
                  |
                  |--> 如果tid的锁的perm不为READ_WRITE
                          |
                          |--> 如果locks的大小大于1，则存在多个共享锁
                          |       |
                          |       v
                          |       等待100ms，递归调用acquireLock函数，增加重试次数
                          |
                          |--> 如果locks的大小等于1，直接升级为排他锁
                                  |
                                  v
                                  移除旧的锁，创建新的锁，添加到锁表中，返回true

单元测试和Exercise 2一并测试。

Exercise 2

需要实现的是释放锁的步骤。实验手册提示在HeapFile.iterator()、HeapFile.insertTuple()和HeapFile.deleteTuple()中使用的是BufferPool.getPage()即可，此外查找插入元组的空闲槽位时如果在当前页面上没有找到空闲槽位可以立即释放锁，不会影响事务的ACID。

iterator之前的实现就是正确的，其余两个方法代码：

    public List<Page> insertTuple(TransactionId tid, Tuple t)
            throws DbException, IOException, TransactionAbortedException {
        // TODO: some code goes here
        // not necessary for lab1
        ArrayList<Page> pages = new ArrayList<Page>();
        for (int i = 0; i < numPages(); i++) {
            HeapPageId pid = new HeapPageId(getId(), i);
            HeapPage page = (HeapPage) Database.getBufferPool().getPage(tid, pid, Permissions.READ_WRITE);
            if (page.getNumUnusedSlots() == 0) {
                Database.getBufferPool().unsafeReleasePage(tid, pid);
                continue;
            }
            page.insertTuple(t);
            pages.add(page);
            return pages;
        }

        BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream(file, true));
        byte[] emptyData = HeapPage.createEmptyPageData();
        bw.write(emptyData);
        bw.close();
        HeapPage page = (HeapPage) Database.getBufferPool().getPage(tid, new HeapPageId(getId(), numPages() - 1), Permissions.READ_WRITE);
        page.insertTuple(t);
        page.markDirty(true, tid);
        pages.add(page);
        return pages;
    }

    public List<Page> deleteTuple(TransactionId tid, Tuple t) throws DbException,
            TransactionAbortedException {
        // TODO: some code goes here
        // not necessary for lab1
        HeapPage page = (HeapPage) Database.getBufferPool().getPage(tid, t.getRecordId().getPageId(), Permissions.READ_WRITE);
        page.deleteTuple(t);
        page.markDirty(true, tid);
        return Collections.singletonList(page);
    }

增加了markDirty调用，并且insertTuple中如果getNumUnusedSlots为0时直接释放锁。

执行单元测试：

ant runtest -Dtest=LockingTest

结果应该是successful的。

Exercise 3

需要实现NO STEAL策略，修改evictPage()方法，使其绝不会驱逐脏页，如果缓冲区中页面全为脏页则抛出DbException。

evictPage方法代码：

    private synchronized void evictPage() throws DbException {
        // TODO: some code goes here
        // not necessary for lab1
        Set<PageId> pids = new HashSet<>(bufferPool.keySet());
        for (PageId pid : pids) {
            if (bufferPool.get(pid).isDirty() == null) {
                bufferPool.remove(pid);
                return;
            }
        }
        throw new DbException("All pages are dirty");
    }

Exercise 4

需要实现BufferPool.java中的transactionComplete()方法，一共两个版本。当参数为提交时，需要将事务关联的脏页写入磁盘，当参数为中止时，撤销脏页的修改。执行transactionComplete之后，脏页将不再为脏，锁也将释放。

transactionComplete方法代码：

    public void transactionComplete(TransactionId tid) {
        // TODO: some code goes here
        // not necessary for lab1|lab2
        transactionComplete(tid, true);
    }

    public void transactionComplete(TransactionId tid, boolean commit) {
        // TODO: some code goes here
        // not necessary for lab1|lab2
        if (commit) {
            try {
                flushPages(tid);
            } catch (IOException e) {
                e.printStackTrace();
            }
        } else {
            rollBack(tid);
        }
    }

    public synchronized void rollBack(TransactionId tid) {
        Set<PageId> pids = new HashSet<>(bufferPool.keySet());
        for (PageId pid : pids) {
            Page page = bufferPool.get(pid);
            if (tid.equals(page.isDirty())) {
                int tableId = pid.getTableId();
                DbFile table = Database.getCatalog().getDatabaseFile(tableId);
                Page readPage = table.readPage(pid);
                bufferPool.remove(pid, page);
                bufferPool.put(pid, readPage);
            }
        }
        lockManager.lockTable.keySet().removeIf(pid -> lockManager.holdsLock(tid, pid));
    }

    public synchronized void flushPages(TransactionId tid) throws IOException {
        // TODO: some code goes here
        // not necessary for lab1|lab2
        Set<PageId> pids = new HashSet<>(bufferPool.keySet());
        for (PageId pid : pids) {
            Page page = bufferPool.get(pid);
            if (tid.equals(page.isDirty())) {
                flushPage(pid);
            }
        }
        lockManager.lockTable.keySet().removeIf(pid -> lockManager.holdsLock(tid, pid));
    }

无参版本之间调用有参版本且参数为true，也就是提交事务。有参版本当参数为提交时调用flushPages()方法，将事务相关的脏页写入磁盘；参数为中止时，调用rollBack()方法，将事务相关的脏页删除并重新读入磁盘上的版本。两个参数都需要完成后释放事务相关的锁。

Exercise 5

死锁是多线程编程中的一种现象，指的是两个或多个线程因为互相持有和等待对方持有的资源而无法继续执行的情况。为了避免死锁，可以采取以下几种策略：

加锁顺序：如果多个线程需要加锁多个资源，应该保证它们按照相同的顺序加锁。例如，如果线程A先获取资源1，然后获取资源2，而线程B先获取资源2，然后获取资源1，那么当线程A持有资源1并等待资源2时，线程B持有资源2并等待资源1，就会发生死锁。
资源分配图：可以创建一个资源分配图来检测死锁。如果图中存在一个循环，那么就存在死锁。在分配资源时，可以检查资源分配图，确保不会出现循环。
超时等待：当一个线程尝试获取资源时，如果该资源已经被其他线程持有，则该线程可以等待一段时间，而不是无限期等待。如果等待超时，则可以放弃获取该资源，并尝试获取其他资源。
死锁检测和恢复：在某些情况下，可以通过检测死锁并采取措施来恢复。例如，可以选择一个或多个线程，强制释放它们持有的资源，然后重新尝试获取资源。
避免资源竞争：尽量减少资源的使用，或者避免多个线程同时访问相同资源。可以通过增加资源数量、优化资源分配策略等方式来减少资源竞争。
使用乐观锁或无锁算法：在某些情况下，可以使用乐观锁或无锁算法来避免死锁。乐观锁假设冲突很少发生，只在提交时检查冲突；无锁算法则通过原子操作或其他机制来避免线程间的竞争。
使用高级并发工具：现代编程语言和框架提供了高级的并发工具，如Java中的 ReentrantLock 和 Semaphore，这些工具可以更有效地管理和避免死锁。

前面的代码中已经使用了重试3次的策略来获取锁，相当于一个简单的超时等待策略。

执行单元测试和系统测试：

ant runtest -Dtest=TransactionTest
ant runsystest -Dtest=AbortEvictionTest