
= An Overview of The Linux VFS Layer =

The virtual file system interface sits right below Linux's implementation of the file I/O API defined by the file-related system calls. In this session, we will consider how the VFS abstracts out and supplies common operations.

Key Idea: This API simultaneously supports the conventional semantics and syntax of the Unix file-related system calls and gives the kernel a place to delegate to the specific file system responsible for managing the file referenced by the syscall. Hence, you have two APIs layered one on top of the other (syscall-->VFS), enabling (1) stability of user-level programs, (2) stability of the syscall definitions and core functionality, and (3) specialization for specific data and file system types.
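The layering can be illustrated with a small user-space sketch. Everything here is hypothetical (the "toy_" types, the "plainfs" and "zerofs" file systems, and toy_sys_read are invented for illustration and are not kernel code); the only idea taken from the notes is that a stable upper entry point delegates through a per-file table of function pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Hypothetical, much-reduced stand-in for struct file_operations. */
struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t count);
};

/* Hypothetical stand-in for struct file. */
struct toy_file {
    const struct toy_file_operations *f_op; /* chosen by the owning file system */
    const char *data;                       /* backing bytes (a C string) */
    size_t pos;                             /* current read offset */
};

/* "plainfs": copy out the stored bytes and advance the offset. */
static long plainfs_read(struct toy_file *f, char *buf, size_t count)
{
    size_t len = strlen(f->data);
    assert(f->pos <= len);
    size_t n = (count < len - f->pos) ? count : len - f->pos;
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (long)n;
}

/* "zerofs": fill the caller's buffer with NUL bytes, whatever the offset. */
static long zerofs_read(struct toy_file *f, char *buf, size_t count)
{
    (void)f;
    memset(buf, 0, count);
    return (long)count;
}

const struct toy_file_operations plainfs_ops = { .read = plainfs_read };
const struct toy_file_operations zerofs_ops  = { .read = zerofs_read };

/* Upper API: one stable entry point that delegates through f_op. */
long toy_sys_read(struct toy_file *f, char *buf, size_t count)
{
    if (f == NULL || f->f_op == NULL || f->f_op->read == NULL)
        return -1; /* no file system routine to delegate to */
    return f->f_op->read(f, buf, count);
}
```

A caller only ever uses toy_sys_read; swapping plainfs_ops for zerofs_ops in a toy_file changes the behavior without changing the caller, which is the stability-plus-specialization point made above.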


 * major data structures
 * support for delegating operations to specific file system implementations
 * support for canonicalization and path lookup

Major Kernel Data Structures Supporting File Systems

struct super_block

struct file_operations

struct inode_operations

struct super_operations

struct inode

Major Kernel Data Structures Supporting Files

struct inode (The generic in-memory inode data structure helps unify specific file systems with particular files, hence I list it under both categories)

struct file

struct path

struct dentry

Major Functions of the VFS

...

Example: The Ext4 inode

The ext4 inode structure represents the "on disk" inode. Observe its block-indexing array (i_block).
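To make the indexing array concrete, here is a minimal sketch of how a logical block number maps onto it. It assumes the classic indirect-block scheme (12 direct slots, then single, double, and triple indirect blocks, each holding 32-bit block numbers), which is how ext2/ext3 files and ext4 files without extents are mapped; extent-mapped ext4 files use the same array differently. The names toy_block_level and "toy_level" are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NDIR_BLOCKS 12 /* direct slots at the front of the array */

enum toy_level { TOY_DIRECT, TOY_INDIRECT, TOY_DOUBLE, TOY_TRIPLE, TOY_TOO_BIG };

/* Which kind of slot in the indexing array covers logical block lblock? */
enum toy_level toy_block_level(uint64_t lblock, uint32_t block_size)
{
    assert(block_size >= 4 && block_size % 4 == 0);
    uint64_t ptrs = block_size / 4; /* 32-bit block numbers per block */

    if (lblock < TOY_NDIR_BLOCKS)
        return TOY_DIRECT;
    lblock -= TOY_NDIR_BLOCKS;

    if (lblock < ptrs)
        return TOY_INDIRECT;
    lblock -= ptrs;

    if (lblock < ptrs * ptrs)
        return TOY_DOUBLE;
    lblock -= ptrs * ptrs;

    if (lblock < ptrs * ptrs * ptrs)
        return TOY_TRIPLE;
    return TOY_TOO_BIG; /* beyond what this scheme can address */
}
```

With 4096-byte blocks each indirect block holds 1024 block numbers, so under these assumptions the first 12 blocks are direct, the next 1024 go through the single indirect block, and so on.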

A Journey From Open to Read to Close

 * The open(2) syscall entry point: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/fs/open.c#L1053

Observe how it creates a file descriptor (a small integer) based on the file name / path and then associates that with a "struct file" kernel data structure, so that the task_struct's "files" member records which files the process currently has open.


 * struct file: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/include/linux/fs.h#L908
 * do_filp_open: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/fs/namei.c#L1669
 * task_struct's files member: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/include/linux/sched.h#L1372
 * which is of type "struct files_struct" http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/include/linux/fdtable.h#L43
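 * notice the array of pointers to "struct file" objects: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/include/linux/fdtable.h#L57
 * the embedded array has NR_OPEN_DEFAULT entries (BITS_PER_LONG, so 32 or 64); the table can grow past that, up to the per-process open files limit, which is typically many more (see ulimit -a)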

[locasto@csl ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127443
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[locasto@csl ~]$
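The "open files" value can also be queried from a program. The sketch below uses the POSIX getrlimit call with RLIMIT_NOFILE; the wrapper name toy_open_files_limit is hypothetical and exists only to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Fetch the soft and hard limits on open file descriptors for the
 * calling process. Returns 0 on success, -1 if getrlimit fails.
 */
int toy_open_files_limit(unsigned long long *soft, unsigned long long *hard)
{
    struct rlimit rl;

    assert(soft != NULL && hard != NULL);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft = (unsigned long long)rl.rlim_cur; /* the limit currently enforced */
    *hard = (unsigned long long)rl.rlim_max; /* ceiling for raising the soft limit */
    return 0;
}
```

Printing the soft value with printf("%llu\n", soft) should match what the shell reports for "open files" in a session like the one above, assuming the shell displays the soft limit by default, as bash does.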

The "fd_install" function stores the new "struct file" in the table of open files reachable through task_struct's "files" member.


 * fd_install: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/fs/open.c#L1018
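The relationship between a descriptor and its "struct file" can be sketched in user space as an array indexed by the descriptor. This is a simplified model, not the kernel's implementation: the "toy_" names and the fixed TOY_MAX_FDS size are assumptions, and the real kernel reserves a descriptor in a bitmap before installing the file, which this sketch skips.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FDS 32 /* hypothetical fixed table size */

struct toy_open_file { const char *name; };

/* Hypothetical stand-in for struct files_struct. */
struct toy_files_struct {
    struct toy_open_file *fd_array[TOY_MAX_FDS]; /* slot index == descriptor */
};

/* Return the lowest unused descriptor, or -1 if the table is full. */
int toy_get_unused_fd(const struct toy_files_struct *files)
{
    for (int fd = 0; fd < TOY_MAX_FDS; fd++)
        if (files->fd_array[fd] == NULL)
            return fd;
    return -1;
}

/* Publish the open file in the slot named by fd. */
void toy_fd_install(struct toy_files_struct *files, int fd,
                    struct toy_open_file *file)
{
    assert(fd >= 0 && fd < TOY_MAX_FDS);
    assert(files->fd_array[fd] == NULL); /* slot must be free */
    files->fd_array[fd] = file;
}

/* "open": pick a descriptor, install the file, hand the integer back. */
int toy_open(struct toy_files_struct *files, struct toy_open_file *file)
{
    int fd = toy_get_unused_fd(files);
    if (fd >= 0)
        toy_fd_install(files, fd, file);
    return fd;
}

/* Look up the open file behind a descriptor, or NULL if there is none. */
struct toy_open_file *toy_fget(const struct toy_files_struct *files, int fd)
{
    if (fd < 0 || fd >= TOY_MAX_FDS)
        return NULL;
    return files->fd_array[fd];
}

/* "close": free the slot so the descriptor can be reused. */
void toy_close(struct toy_files_struct *files, int fd)
{
    if (fd >= 0 && fd < TOY_MAX_FDS)
        files->fd_array[fd] = NULL;
}
```

Because the lowest free slot is always chosen, closing descriptor 0 and opening another file hands descriptor 0 back out, which mirrors the familiar behavior of open(2).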


 * The "read(2)" syscall entry point: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/fs/read_write.c#L372


 * The "vfs_read" kernel function: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/fs/read_write.c#L277
 * Invoking either the generic synchronous "read" functionality or the file system specific read:

        if (file->f_op->read)
                ret = file->f_op->read(file, buf, count, pos);
        else
                ret = do_sync_read(file, buf, count, pos);

Look at how this code uses the "read" member of the function pointer table (f_op) attached to the "struct file".


 * struct file_operations: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/include/linux/fs.h#L1482


 * do_sync_read: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/fs/read_write.c#L252

Observe how this code uses the aio_read file operation function.

To tie this all together, what if the "file->f_op->read" member is not NULL? In this case, the VFS delegates to the file system's specific read routine. In the case of ext2, where/how is this function pointer assigned?


 * ext2_file_operations: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/fs/ext2/file.c#L41

We see that "read" points to the kernel's "do_sync_read" routine, which we've seen above. And its aio_read function points to:


 * generic_file_aio_read: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/mm/filemap.c#L1271

which handles asynchronous I/O and delegates to the block layer and its built-in caching (i.e., keeping "disk" data in memory so that access is fast).

This can call:


 * do_generic_file_read: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/mm/filemap.c#L986
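The read path traced in this section can be modeled in a few lines of user-space C. This is a sketch under stated assumptions: all "toy_" names are hypothetical, the "asynchronous" read completes immediately, and there is no caching or block layer. What it keeps is the shape of the dispatch: prefer the file system's "read", otherwise fall back to a synchronous wrapper around "aio_read".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_rfile;

/* Hypothetical stand-in for struct file_operations with two members. */
struct toy_fops {
    long (*read)(struct toy_rfile *f, char *buf, size_t count);
    long (*aio_read)(struct toy_rfile *f, char *buf, size_t count);
};

struct toy_rfile {
    const struct toy_fops *f_op;
    const char *data; /* backing bytes (a C string) */
    size_t pos;       /* current read offset */
};

/* Stand-in for generic_file_aio_read; here it completes immediately. */
long toy_generic_aio_read(struct toy_rfile *f, char *buf, size_t count)
{
    size_t len = strlen(f->data);
    assert(f->pos <= len);
    size_t n = (count < len - f->pos) ? count : len - f->pos;
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (long)n;
}

/* Stand-in for do_sync_read: a synchronous wrapper around aio_read. */
long toy_do_sync_read(struct toy_rfile *f, char *buf, size_t count)
{
    return f->f_op->aio_read(f, buf, count);
}

/* Stand-in for vfs_read: use read if present, else the sync wrapper. */
long toy_vfs_read(struct toy_rfile *f, char *buf, size_t count)
{
    if (!f->f_op || (!f->f_op->read && !f->f_op->aio_read))
        return -1; /* nothing to delegate to */
    if (f->f_op->read)
        return f->f_op->read(f, buf, count);
    return toy_do_sync_read(f, buf, count);
}

/* Wired the way the ext2 table is described above. */
const struct toy_fops toy_ext2_like_ops = {
    .read     = toy_do_sync_read,
    .aio_read = toy_generic_aio_read,
};

/* A table that supplies only aio_read, to exercise the fallback branch. */
const struct toy_fops toy_aio_only_ops = {
    .aio_read = toy_generic_aio_read,
};
```

Both tables end up in toy_generic_aio_read: the first reaches it because "read" is explicitly set to the sync wrapper, the second because toy_vfs_read falls back to that wrapper when "read" is NULL.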

Scribe Notes

 * s1
 * s2
 * s3

Reading
LKD, Chapter 13