Paraphrasing the first paragraph of the paper, NFS is a
The design and implementation details of the Virtual Filesystem (VFS) are the most important part of this paper.
The NFS design is broken into three major parts: the protocol, the server side, and the client side.
The NFS protocol runs on top of RPC (Remote Procedure Call). Using RPC simplifies the protocol's definition, organization, and implementation. RPC calls are synchronous: the client blocks until the server returns a result. This makes a remote operation behave much like a call into a local filesystem.
RPC is independent of the transport protocol. NFS uses UDP (User Datagram Protocol) over IP (Internet Protocol) for transport.
RPC is built on top of XDR (External Data Representation). XDR provides a data definition language with a C-like syntax.
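XDR also fixes how values are laid out on the wire. As a rough illustration, the sketch below hand-encodes two basic XDR types in Python. The encoding rules (32-bit big-endian integers, variable-length byte strings prefixed with their length and zero-padded to a multiple of four bytes) come from the XDR standard rather than from the summary above, and the helper names `xdr_uint` and `xdr_opaque` are made up for this example.

```python
import struct


def xdr_uint(value):
    """Encode an unsigned 32-bit integer as 4 big-endian bytes."""
    return struct.pack(">I", value)


def xdr_opaque(data):
    """Encode variable-length bytes: length, payload, zero padding to 4 bytes."""
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * pad


# Hypothetical argument list for a lookup-style call: a numeric stand-in for
# the directory handle followed by the file name.
args = xdr_uint(42) + xdr_opaque(b"notes.txt")
```

Because every field has a fixed, byte-order-independent layout, machines with different native representations can decode the same message.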
The protocol is stateless. Every call to the server contains all the information required to complete the call, and the server does not maintain any client state. Statelessness makes crash recovery very simple. If the server crashes, it performs no recovery; clients simply retry their requests until the server responds. If a client crashes, no recovery is needed on either side. If state were maintained, recovery would have complicated the design of both the server and the client.
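A toy model may make the statelessness argument more concrete. In the sketch below, `StatelessServer` and `read_all` are hypothetical names, the "disk" is just a dictionary, and a server restart is simulated by building a fresh server object before every call. It is an illustration of the idea, not the NFS implementation.

```python
class StatelessServer:
    """Toy server: holds file data only, never any per-client state."""

    def __init__(self, files):
        self.files = files  # file handle -> bytes (stands in for stable storage)

    def read(self, fh, offset, count):
        # The request names the file, the position, and the length, so the
        # server needs no open-file table and no per-client cursor.
        return self.files[fh][offset:offset + count]


def read_all(server_factory, fh, chunk):
    """Read a whole file; the client, not the server, tracks the offset."""
    offset, out = 0, b""
    while True:
        server = server_factory()  # simulate a server reboot before each call
        piece = server.read(fh, offset, chunk)
        if not piece:
            return out
        out += piece
        offset += len(piece)


disk = {1: b"hello world"}
content = read_all(lambda: StatelessServer(disk), 1, 4)
```

Since each `read` is self-describing, the client gets the same result whether or not the server restarted between calls.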
Signature | Returns | Description |
---|---|---|
null() | () | does nothing; used to ping the server and measure round-trip time |
lookup(dirfh, name) | (fh, attr) | returns a new file handle and attributes for the named file in the directory |
create(dirfh, name, attr) | (newfh, attr) | creates a new file; returns its handle and attributes |
remove(dirfh, name) | (status) | removes file from directory |
getattr(fh) | (attr) | returns file attributes, similar to the stat call |
setattr(fh, attr) | (attr) | sets file attributes |
read(fh, offset, count) | (attr, data) | returns up to count bytes of data starting at offset |
write(fh, offset, count, data) | (attr) | writes count bytes of data at offset from the beginning of the file; returns file attributes |
rename(dirfh, name, tofh, toname) | (status) | renames the file name in dirfh to toname in tofh |
link(dirfh, name, tofh, toname) | (status) | creates toname in tofh as a link to the file name in dirfh |
symlink(dirfh, name, string) | (status) | creates a symlink in dirfh with value string |
readlink(fh) | (string) | returns the string associated with symlink |
mkdir(dirfh, name, attr) | (fh, newattr) | creates a new directory in dirfh |
rmdir(dirfh, name) | (status) | removes empty directory with name in dirfh |
readdir(dirfh, cookie, count) | (entries) | returns up to count bytes of directory entries from dirfh. Each entry consists of a file name, a file id, and an opaque pointer to the next directory entry, called a cookie. A readdir call with a zero cookie returns entries starting with the first entry |
statfs(fh) | (fsstats) | returns filesystem information |
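
The readdir cookie in the table above is what lets a client page through a large directory without the server remembering where the client stopped. The sketch below is a simplified model: the cookie is a plain list index (real cookies are opaque to the client), `count` limits the number of entries rather than bytes, and the function names are invented for the example.

```python
def readdir(entries, cookie, count):
    """Return up to `count` entries starting at position `cookie`.

    Each returned entry is (name, fileid, next_cookie).
    """
    batch = []
    for i in range(cookie, min(cookie + count, len(entries))):
        name, fileid = entries[i]
        batch.append((name, fileid, i + 1))
    return batch


def list_directory(entries, count):
    """Client loop: start with cookie 0, then resume from the last cookie seen."""
    names, cookie = [], 0
    while True:
        batch = readdir(entries, cookie, count)
        if not batch:
            return names
        names.extend(name for name, _, _ in batch)
        cookie = batch[-1][2]


directory = [("a.txt", 10), ("b.txt", 11), ("c.txt", 12)]
names = list_directory(directory, 2)
```

The client carries the resume point in every request, which fits the stateless design described earlier.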
Because the server does not keep any client state, every change must be persisted to disk before the call returns. For a write call, the data block, all modified indirect blocks, and the inode block must all be flushed to storage before the server replies to the client.
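In user-space terms, "flush before replying" resembles writing and then calling `fsync` before returning a result. The sketch below uses only standard `os` calls; `stable_write` is a hypothetical helper, and it only approximates what a kernel NFS server does with data, indirect, and inode blocks.

```python
import os


def stable_write(path, offset, data):
    """Write `data` at `offset` and return the file size only after fsync."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than asked; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS to push data and metadata to storage before "replying".
        os.fsync(fd)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)
```

The cost of this design is latency: the caller waits for storage on every write instead of relying on delayed write-back.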
To support a stateless server, the inode implementation was updated. Each inode now carries a generation number, and each filesystem has a filesystem id. The inode number, generation number, and filesystem id together make up the file handle for a file.
Every time an inode is freed, its generation number is incremented. This way, if a file is deleted while a client still holds its handle, the server can tell that the handle refers to an old inode.
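The generation-number check can be sketched in a few lines. `FileHandle`, `InodeTable`, and `StaleHandle` below are illustrative names, and the table is an in-memory stand-in for the on-disk inode structures; the point is only to show how a reused inode number is told apart from the file a client originally looked up.

```python
from collections import namedtuple

# A handle is the triple described above.
FileHandle = namedtuple("FileHandle", "fsid inode generation")


class StaleHandle(Exception):
    """Raised when a handle refers to an inode that has since been freed."""


class InodeTable:
    def __init__(self, fsid):
        self.fsid = fsid
        self.generation = {}  # inode number -> current generation
        self.live = set()     # inode numbers currently in use

    def allocate(self, inode):
        self.generation.setdefault(inode, 0)
        self.live.add(inode)
        return FileHandle(self.fsid, inode, self.generation[inode])

    def free(self, inode):
        self.live.discard(inode)
        self.generation[inode] += 1  # invalidates every outstanding handle

    def check(self, fh):
        if (fh.fsid != self.fsid
                or fh.inode not in self.live
                or self.generation.get(fh.inode) != fh.generation):
            raise StaleHandle(fh)
        return fh.inode


table = InodeTable(fsid=7)
old = table.allocate(42)   # client looks up a file
table.free(42)             # file is removed
new = table.allocate(42)   # inode number 42 is reused for another file
```

Here `table.check(new)` succeeds, while `table.check(old)` raises `StaleHandle` even though both handles name inode 42.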
NFS does not use a server:path format for file lookup, as that would not be compatible with other Unix filesystems. Instead, the client binds to the remote filesystem at mount time. The client cannot access the filesystem until the mount is complete.
VFS is an abstraction layer on top of native filesystems. Clients can program against the VFS APIs and transparently interact with multiple filesystems.
VFS is implemented as a structure that contains the operations which can be applied to a whole filesystem. A vnode is a structure that contains all the operations for a node, where a node is a file or a directory.
Each mounted filesystem has an associated VFS structure in the kernel, and each active node has a vnode associated with it. Each vnode contains two VFS pointers: one to its parent VFS and one to a mounted-on VFS. This way the client can navigate to any part of the file tree without any knowledge of the underlying filesystem.
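The two VFS pointers are what allow a path walk to cross from one filesystem into another. The sketch below models that with plain Python objects; `Vnode`, `Vfs`, and `walk` are illustrative, the dictionary-based `lookup` stands in for a real filesystem's directory search, and none of the field names are taken from the kernel sources.

```python
class Vnode:
    def __init__(self, vfs):
        self.vfs = vfs             # parent VFS: the filesystem this node lives in
        self.mounted_here = None   # mounted-on VFS, set if this is a mount point
        self.children = {}         # name -> Vnode (toy directory contents)

    def lookup(self, name):
        return self.children[name]


class Vfs:
    def __init__(self, name):
        self.name = name
        self.root_vnode = Vnode(self)

    def root(self):
        return self.root_vnode


def walk(start, path):
    """Resolve `path` one component at a time, crossing mount points."""
    node = start
    for name in filter(None, path.split("/")):
        node = node.lookup(name)
        if node.mounted_here is not None:
            # Step into the root of the filesystem mounted on this node.
            node = node.mounted_here.root()
    return node


local, remote = Vfs("local"), Vfs("remote")
mount_point = Vnode(local)
local.root().children["mnt"] = mount_point
remote_file = Vnode(remote)
remote.root().children["file"] = remote_file
mount_point.mounted_here = remote  # "mount" the remote filesystem on /mnt
```

With this setup, `walk(local.root(), "/mnt/file")` returns `remote_file`; the caller never has to know that the last component was resolved in a different filesystem.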
File System Operation
Name | Description |
---|---|
mount() | system call to mount filesystems |
mount_root() | mount filesystem as root |
VFS Operation
Name | Return | Description |
---|---|---|
unmount(vfs) | () | unmount the filesystem |
root(vfs) | (vnode) | returns the vnode of the filesystem root |
statfs(vfs) | (fsstatbuf) | returns filesystem statistics |
sync(vfs) | () | flush delayed writes |
Vnode Operation
Name | Return | Description |
---|---|---|
open(vnode, flags) | () | marks a file open |
close(vnode, flags) | () | marks a file closed |
rdwr(vnode, uio, rwflag, flags) | () | read or write a file |
ioctl(vnode, cmd, data, rwflag) | () | do I/O control operation |
select(vnode, rwflag) | () | do select |
getattr(vnode) | (attr) | returns file attributes |
setattr(vnode, attr) | () | set file attributes |
access(vnode, mode) | () | check access permission |
lookup(dvnode, name) | (vnode) | lookup file name in directory |
create(dvnode, name, attr, excl, mode) | (vnode) | create a file |
remove(dvnode, name) | () | remove file name from directory |
link(dvnode, todvnode, toname) | () | link to a file |
rename(dvnode, name, todvnode, toname) | () | rename a file |
mkdir(dvnode, name, attr) | (dvnode) | create a directory |
rmdir(dvnode, name) | () | remove a directory |
readdir(dvnode) | (entries) | read directory entries |
symlink(dvnode, name, attr, to_name) | () | create symbolic link |
readlink(vp) | (data) | read value of symlink |
fsync(vnode) | () | flush dirty blocks of a file |
inactive(vnode) | () | mark inactive, do cleanup |
bmap(vnode, blk) | (devnode, mappedblk) | map block number |
strategy(bp) | () | read and write filesystem block |
bread(vnode, blockno) | (buf) | read a block |
brelse(vnode, buf) | () | release a block buffer |
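
One way to picture the vnode operations above is as an interface that every filesystem type implements in its own way. In the hedged sketch below, a local vnode and a remote vnode both offer `getattr` and `read` (a simplification of `rdwr`), and the generic `cat` routine works with either. All class names are hypothetical, and `ToyServer` merely stands in for an NFS server reached over RPC.

```python
class LocalVnode:
    """Vnode backed by in-memory bytes (stands in for a local disk file)."""

    def __init__(self, data):
        self.data = data

    def getattr(self):
        return {"size": len(self.data)}

    def read(self, offset, count):
        return self.data[offset:offset + count]


class ToyServer:
    """Stand-in for a remote server; a real client would issue RPCs."""

    def __init__(self, files):
        self.files = files  # file handle -> bytes

    def getattr(self, fh):
        return {"size": len(self.files[fh])}

    def read(self, fh, offset, count):
        return self.files[fh][offset:offset + count]


class RemoteVnode:
    """Vnode that forwards each operation to a server, passing a file handle."""

    def __init__(self, server, fh):
        self.server = server
        self.fh = fh

    def getattr(self):
        return self.server.getattr(self.fh)

    def read(self, offset, count):
        return self.server.read(self.fh, offset, count)


def cat(vnode, chunk=4):
    """Filesystem-independent code: uses only the vnode operations."""
    size = vnode.getattr()["size"]
    out = b""
    for offset in range(0, size, chunk):
        out += vnode.read(offset, chunk)
    return out


local_file = LocalVnode(b"local data")
remote_file_vnode = RemoteVnode(ToyServer({5: b"remote data"}), 5)
```

Calling `cat` on either object returns that file's full contents, which is the transparency the VFS layer is meant to provide.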