An Overview of The Linux VFS Layer
The virtual file system interface sits right below Linux's implementation of the file I/O API defined by the file-related system calls. In this session, we will consider how the VFS abstracts out and supplies common operations.
Key Idea: This API simultaneously supports the conventional semantics and syntax of the Unix file-related system calls and provides a place for the kernel to delegate to the specific file system responsible for managing the file referenced by the syscall API. Hence, you have two APIs sitting one on top of the other (syscall --> VFS), enabling (1) stability of user-level programs, (2) stability of the syscall definitions and core functionality, and (3) specialization for specific data and file system types.
- major data structures
- support for delegating operations to specific file system implementations
- support for canonicalization and path lookup
Major Kernel Data Structures Supporting File Systems
Major Kernel Data Structures Supporting Files
struct inode (The generic in-memory inode data structure helps unify specific file systems with particular files, hence I list it under both categories)
Major Functions of the VFS
Example: The Ext4 inode
ext4 inode (observe how this represents the "on disk" inode). Observe the array of block indices, i_block.
A Journey From Open to Read to Close
- The open(2) syscall entry point: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/fs/open.c#L1053
Observe how it creates a file descriptor (a small integer) based on the file name / path and then associates that with a "struct file" kernel data structure, so that the task_struct's "files" member records which files the process currently has open.
- struct file: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/include/linux/fs.h#L908
- do_filp_open: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/fs/namei.c#L1669
- task_struct's files member: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/include/linux/sched.h#L1372
- which is of type "struct files_struct" http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/include/linux/fdtable.h#L43
- notice the array of "struct file" pointers: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/include/linux/fdtable.h#L57
- this embedded array holds NR_OPEN_DEFAULT entries (BITS_PER_LONG, i.e., 32 or 64); the table grows on demand, up to the per-process limit on open files, which is typically many more (see ulimit -a)
    [locasto@csl ~]$ ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 127443
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 10240
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 1024
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
    [locasto@csl ~]$
The "fd_install" function stores the new "struct file" pointer in the slot for the new descriptor in the open-file table reached through task_struct's "files" member.
- The "read(2)" syscall entry point: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/fs/read_write.c#L372
- The "vfs_read" kernel function: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/fs/read_write.c#L277
- Invoking either the generic synchronous "read" functionality or the file system specific read:
    if (file->f_op->read)
            ret = file->f_op->read(file, buf, count, pos);
    else
            ret = do_sync_read(file, buf, count, pos);
Look at how this code uses the "read" member of the function pointer table ("f_op") attached to the "struct file".
- struct file_operations: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/include/linux/fs.h#L1482
Observe how the generic fallback, do_sync_read, uses the aio_read() file operation function.
To tie this all together, what if the "file->f_op->read" member is not NULL? In this case, the VFS delegates to the file system's specific read routine. In the case of ext2, where/how is this function pointer assigned?
- ext2_file_operations: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/fs/ext2/file.c#L41
We see that "read" points to the kernel's "do_sync_read" routine, which we've seen above. And its aio_read function points to:
- generic_file_aio_read: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/mm/filemap.c#L1271
which handles asynchronous I/O and delegates to the block layer and its built-in caching (i.e., keeping "disk" data in memory so access is fast).
This can call:
- do_generic_file_read: http://lxr.cpsc.ucalgary.ca/#linux+v2.6.32/mm/filemap.c#L986
LKD, Chapter 13