[Operating System] {ud923} P4L2: Distributed File Systems
- Nelson, Michael N., et al.
"Caching in the Sprite Network File System".
ACM Transactions on Computer Systems, Vol. 6, No. 1, February 1988, Pages 134-154.
Visual Metaphor
VFS: virtual file system
Distributed File Systems
multiple machines involved in the delivery of the file system service together form a distributed file system.
client => local cache
server => data storage
1 client + 1 server => a mini distributed file system
DFS Models
in this lesson, the focus is "1 client + 1 server on different machines"
Google/Facebook... use "both" (7th line)
Remote File Service: Extremes
Remote File Service A Compromise
Stateless vs. Stateful File Server
state information examples:
which clients access which file,
how many different clients are serviced
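a rough sketch of the kind of state a stateful server might keep. the class `StatefulServer` and its methods are hypothetical names made up for illustration, not taken from any real file server:

```python
from collections import defaultdict


class StatefulServer:
    """Toy stateful file server: remembers who has which file open."""

    def __init__(self):
        # file path -> set of client ids that currently have it open
        self.openers = defaultdict(set)

    def open(self, client_id, path):
        self.openers[path].add(client_id)

    def close(self, client_id, path):
        self.openers[path].discard(client_id)
        if not self.openers[path]:
            del self.openers[path]  # forget files nobody has open

    def clients_of(self, path):
        """Which clients access this file."""
        return set(self.openers.get(path, set()))

    def client_count(self):
        """How many different clients are being serviced."""
        return len(set().union(*self.openers.values()))
```

a stateless server would keep none of this, so every request would have to carry everything needed to serve it.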
Caching State in a DFS
File Sharing Semantics on a DFS
Transactional guarantees
=> the file system needs to export some interfaces/APIs
=> so that clients can specify which collection of files or which collection of operations should be treated as a single transaction
=> the file system can then guarantee that all those changes are atomically committed and atomically made visible in the file system.
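a minimal in-memory sketch of what such an interface could look like. `TransactionalFS`, `Transaction`, and the `begin`/`write`/`commit`/`abort` calls are hypothetical names chosen for illustration, not an API of any real DFS:

```python
class TransactionalFS:
    """Toy in-memory file system with all-or-nothing multi-file updates."""

    def __init__(self):
        self.files = {}  # path -> contents visible to every client

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, fs):
        self.fs = fs
        self.pending = {}  # staged writes, invisible until commit

    def write(self, path, data):
        self.pending[path] = data

    def commit(self):
        # One dict.update stands in for an atomic commit here; a real
        # system would need logging or locking to get the same effect.
        self.fs.files.update(self.pending)
        self.pending = {}

    def abort(self):
        # Drop the staged writes; nothing ever became visible.
        self.pending = {}


fs = TransactionalFS()
tx = fs.begin()
tx.write("/a", "1")
tx.write("/b", "2")
visible_before_commit = dict(fs.files)  # still empty at this point
tx.commit()
```

the client groups the writes by issuing them on the same transaction object; other clients see either none of them or all of them.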
File vs Directory Service
Replication and Partitioning
these two techniques can be combined: the files are partitioned across different groups or volumes,
and each of these groups is then replicated, potentially with a different degree of replication.
For instance, you can have partitions of read-only files versus files that are also written to, and you can replicate the read-only files to a greater degree.
Or you can have smaller partitions of more frequently accessed files versus larger partitions that hold more files that are accessed less frequently.
You can then use a different degree of replication for the partition with the more frequently accessed files,
so that overall each machine handles approximately the same number of expected client requests.
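a small sketch of the load-balancing idea. the partition names and request counts are made-up numbers, and the function assumes requests spread evenly over the replicas of a partition:

```python
def requests_per_machine(partitions):
    """Expected requests per machine for each partition.

    partitions: name -> (expected_requests, replicas). Requests are assumed
    to be spread evenly over the replicas of a partition.
    """
    return {
        name: reqs / replicas
        for name, (reqs, replicas) in partitions.items()
    }


# Hypothetical workload: a small hot partition and a large cold one.
partitions = {
    "hot_small": (9000, 3),   # frequently accessed, replicated 3 times
    "cold_large": (3000, 1),  # rarely accessed, single copy
}
load = requests_per_machine(partitions)
```

with these numbers each machine ends up with the same expected load, which is the goal of picking the replication degree per partition.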
Total files formula:
- files_stored_per_machine * number_of_machines
Percentage lost formula:
- (files_lost_per_single_failure / total_files) * 100
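the two formulas as a short worked example. the numbers (3 machines, 100 files each) are hypothetical, and the setup is assumed to be purely partitioned, so one machine failure loses exactly that machine's files:

```python
def total_files(files_stored_per_machine, number_of_machines):
    """Total files when each machine stores a distinct partition."""
    return files_stored_per_machine * number_of_machines


def percentage_lost(files_lost_per_single_failure, total):
    """Share of all files lost when a single machine fails, in percent."""
    return files_lost_per_single_failure / total * 100


# Hypothetical setup: 3 machines, 100 distinct files on each.
total = total_files(100, 3)
lost_pct = percentage_lost(100, total)
print(total, round(lost_pct, 1))
```

with full replication instead, every machine would hold the same files, so the total would be just the per-machine count and a single failure would lose nothing.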
Networking File System (NFS) Design
NFS Versions
NFS v4 is stateful, which by design allows it to support operations like client caching and file locking
NFS allows files to be modified => NOT immutable
Distributed system => no guarantee that an update to a file will immediately be visible => NOT Unix
for both session and periodic: some elements of the sharing semantics that NFS supports are session-like or periodic-like,
and whether it behaves with session or periodic semantics really depends on how NFS is configured.
So by default, NFS is really neither => not purely session-based or periodic-based
Sprite Distributed File System
Sprite DFS Access Pattern Analysis
based on these observations, they first made the following decisions.
write-back on close, which is what appears in session semantics, is not really necessary.
we don't really have too many sharing situations, and most of the data will get deleted anyway.
so forcing the data to be written back to the server when the file is closed doesn't seem useful.
these decisions are not really friendly to concurrency, but they observed that file sharing is very rare => that's okay. no need to optimize for concurrent situations.
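to see why skipping write-back on close can pay off, here is a toy sketch of a client cache that delays write-back instead. the class name, the `delay` parameter, and the explicit `now` timestamps are assumptions made for illustration; this is not Sprite's actual implementation:

```python
class DelayedWriteBackCache:
    """Toy client cache that writes dirty data back only after a delay."""

    def __init__(self, delay):
        self.delay = delay
        self.dirty = {}          # path -> (data, time of last write)
        self.server = {}         # stands in for the remote server's storage
        self.server_writes = 0   # how many write-backs actually happened

    def write(self, path, data, now):
        self.dirty[path] = (data, now)

    def delete(self, path):
        # Data deleted while still dirty never travels to the server.
        self.dirty.pop(path, None)
        self.server.pop(path, None)

    def tick(self, now):
        """Flush everything that has been unmodified for at least `delay`."""
        for path, (data, written_at) in list(self.dirty.items()):
            if now - written_at >= self.delay:
                self.server[path] = data
                self.server_writes += 1
                del self.dirty[path]


cache = DelayedWriteBackCache(delay=30)
cache.write("/tmp/scratch", "short-lived", now=0)
cache.write("/home/notes", "kept", now=0)
cache.delete("/tmp/scratch")   # deleted before the delay expires
cache.tick(now=30)
```

only the file that survived the delay reaches the server; with write-back on close both files would have been sent.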
Sprite DFS From Analysis to Design
File Access Operations in Sprite