Learning FUSE and Python's C interface
For some time now, I have wanted to look at Python's interfaces to C libraries. And yesterday I wondered how to implement a filesystem with FUSE. So, in this article we will explore implementing a user-space filesystem with Python.
WORK IN PROGRESS
Installing FUSE
I'm using Xubuntu, so the installation instructions will be Debian/Ubuntu specific. And unspectacular:

    ~/fuse-test$ sudo apt-get install fuse

Now, for some mounts.
Using sshfs
What is sshfs? sshfs is a popular FUSE filesystem that allows you to mount a directory from another host, using ssh as transport.
First, let's prepare a directory to mount.
    ~/fuse-test$ ssh remotehost
    murat@remotehost:~$ mkdir test
    murat@remotehost:~$ touch test/testfile
    murat@remotehost:~$ exit
For sshfs there's a package readily available in the Ubuntu apt mirrors. Also, we need a directory to mount to.
    ~/fuse-test$ sudo apt-get install sshfs
    ~/fuse-test$ mkdir mountpoint

Now we can mount the remote directory test from remotehost.
    ~/fuse-test$ sshfs remotehost:test mountpoint
    ~/fuse-test$ cat /etc/mtab | grep remotehost
    remotehost:test /home/murat/fuse-test/mountpoint fuse.sshfs rw,nosuid,nodev,user=murat 0 0

mtab is looking good. So, what's inside our mount?
    ~/fuse-test$ ls mountpoint
    testfile

Cake for everyone! And unmount, using fusermount.

    ~/fuse-test$ fusermount -u mountpoint
Hello World FS
Looking for an easy intermediate step to writing my own file system, I stumbled over the examples that are shipped with FUSE. In particular, there is the Hello World filesystem. Let's make that one work.
The examples are contained in the development package for libfuse, so we install that.

    ~/fuse-test$ sudo apt-get install libfuse-dev
    ~/fuse-test$ ls /usr/share/doc/libfuse-dev/examples/
    cusexmp.c  fioclient.c  fselclient.c  fusexmp_fh.c  hello_ll.c  null.c
    fioc.c     fsel.c       fusexmp.c     hello.c       Makefile
    ~/fuse-test$ cp -r /usr/share/doc/libfuse-dev/examples ./
    ~/fuse-test$ cd examples/
Compile it.
    ~/fuse-test/examples$ make
    cc -Wall -D_FILE_OFFSET_BITS=64 -I/usr/include/fuse fusexmp.c -lfuse -o fusexmp
    # more make output here

We can now go ahead and mount the filesystem.

    ~/fuse-test/examples$ ./hello ../mountpoint/
    ~/fuse-test/examples$ cd ../mountpoint/
    ~/fuse-test/mountpoint$ ls
    hello
    ~/fuse-test/mountpoint$ cat hello
    Hello World!
    ~/fuse-test/mountpoint$ cd ..
    ~/fuse-test$ fusermount -u mountpoint

Let's have a look at the available Python bindings.
fuse-python
For starters, I'll have a look at the official Python bindings.
Installation
Probably a piece of work: it's ancient. Installation is trivial, though, because it's available as a package in Ubuntu: apt-get install python-fuse. (Yes, the name is correct. The package name follows Debian's naming conventions for Python packages.)
After installing, its examples can be found in /usr/share/doc/python-fuse/examples: namely hello.py and xmp.py. They work out of the box:
    ~/fuse-test$ python /usr/share/doc/python-fuse/examples/hello.py mountpoint/
    ~/fuse-test$ cat mountpoint/hello
    Hello World!
NopFS: The Bare Minimum
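The most minimal filesystem I could come up with is the following NopFS.

    import fuse

    # Declare which version of the fuse-python API this module targets.
    fuse.fuse_python_api = (0, 2)


    class NopFS(fuse.Fuse):
        # No operations implemented: mountable, but does nothing.
        pass


    def main():
        server = NopFS()
        server.parse()   # parse the command line (mount point, options)
        server.main()    # hand control to FUSE


    if __name__ == '__main__':
        main()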
It shows nicely how the API works:
- Declare the targeted version of the FUSE API by setting fuse.fuse_python_api.
- Subclass fuse.Fuse.
- Call the class's main() function when the module is executed.
It can already be mounted like the previous examples:
~/fuse-test$ python nopfs.py mountpoint
When doing an ls, FUSE notices that this functionality has not been implemented yet:

    ~/fuse-test$ cd mountpoint && ls
    ls: reading directory .: Function not implemented
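The message is not specific to ls. As far as I know, libfuse answers requests for operations that a filesystem does not implement with the error number ENOSYS, and ls merely prints the C library's text for that number. The following sketch needs no FUSE at all; it only looks up the name and message for an error number, and shows the "negated errno" convention that fuse-python handlers use to report failures. The helper names describe_errno and failing_handler are made up for this illustration.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, human-readable message) for an errno value."""
    return errno.errorcode[code], os.strerror(code)


def failing_handler():
    """Hypothetical handler: signal 'no such file' the way fuse-python
    handlers do, by returning the negated error number."""
    return -errno.ENOENT


name, message = describe_errno(errno.ENOSYS)
# The message text comes from the platform's C library.
print(name, "->", message)
print("handler result:", failing_handler())
```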
LsFS: Directory Contents
So, let's make ls work. For that, at least the following functions are necessary: getattr(path) and readdir(path, offset). readdir provides the list of files in the root directory, and getattr gives information on each file, so ls knows whether a path identifies a directory or a file, who owns it, and so on.
By the way, how do you know what methods to implement? Find out in this note about system calls.
Let us add one file some_file and one directory some_dir to the root directory. The following LsFS filesystem does just that.
    import errno
    import stat

    import fuse


    class LsFS(fuse.Fuse):

        def getattr(self, path):
            st = fuse.Stat()
            st.st_nlink = 1
            # 0o755 / 0o644 are octal literals accepted by Python 2.6+ and 3.
            if path[1:] == "some_dir" or path == '/':
                st.st_mode = stat.S_IFDIR | 0o755
            elif path[1:] == "some_file":
                st.st_mode = stat.S_IFREG | 0o644
            else:
                return -errno.ENOENT
            return st

        def readdir(self, path, offset):
            if path == "/":
                for name in [".", "..", "some_file", "some_dir"]:
                    yield fuse.Direntry(name)
- getattr's job is to return a fuse.Stat object filled with information about path. It contains information about type, access rights and times, and so on.
- fuse.Stat is initialized with “undefined” values. Usually, that boils down to a zero, which is harmless in most cases. Two fields need to be set: st_nlink (number of hard links to that inode) and st_mode (type, access rights, …). Here, the hard link count is initialized wrongly, as for example directories should have at least two hard links. But I just want to make the code work.
- FUSE always provides the absolute path relative to the mount root. That is why the first character, usually a '/' on UNIX, is removed with path[1:].
- The Stat structure is filled according to the path, making some_dir a globally readable and searchable directory, and so on.
- For anything else, the error code for “unknown file” (ENOENT) is returned.
- readdir should return a generator producing a fuse.Direntry object for each entry in the directory. Note there is another, more low-level mode of operation which uses the offset parameter. More about it in the API docs.
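To make the st_mode field less opaque: it packs the file type and the permission bits into one integer. The sketch below uses only Python's standard stat module (Python 3.3 or newer for stat.filemode) to build the two values used in LsFS and to take them apart again. The helper name describe_mode is made up for this illustration.

```python
import stat

# The same values LsFS assigns to st_mode.
dir_mode = stat.S_IFDIR | 0o755
file_mode = stat.S_IFREG | 0o644


def describe_mode(mode):
    """Split a mode into (kind, permission bits, ls-style string)."""
    if stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    else:
        kind = "other"
    return kind, stat.S_IMODE(mode), stat.filemode(mode)


for mode in (dir_mode, file_mode):
    kind, perms, text = describe_mode(mode)
    print(kind, oct(perms), text)
```

The ls-style strings are what shows up in the first column of ls -l once the filesystem is mounted.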
After mounting, an ls reveals:
    ~/fuse-test$ ls -l mountpoint/
    total 0
    drwxr-xr-x 2 root root 0 Jan  1  1970 some_dir
    -rw-r--r-- 1 root root 0 Jan  1  1970 some_file
Yay!
A “few” things were omitted. At least:
- Making some_dir's contents actually readable.
- Setting other stat-related fields.
- Checking access rights.
By the way, ever run into an Invalid argument error? Weirdly enough, this need not have anything to do with invalid arguments. Read the note on this error to learn more.
For a fuse-python implementation, there are two design choices to be considered.
PyFS: Reading and Writing Files
We can use the methods above to implement the Python filesystem. I will be using the OO approach. Since the complete code has too much detail, I'll only show the most important snippets here. Refer to the source code of PyFS for the complete implementation. By default, I am importing and making available three modules under /lib:
    ~/fuse-test$ ls mountpoint/lib/
    json  os  sys
The next obvious feature is reading. For that, we implement the read method on the file access class.

    class FileMapping(object):

        def read(self, size, offset):
            return self._read_from_string(
                self.get_text(),
                size,
                offset,
            )
get_text() retrieves the string representation of the path. For /lib/os/pathsep, it resolves the name and returns a single colon (:). _read_from_string returns the proper range of the text, taking size and offset into account. (Obviously, this is inefficient. Don't care, though.)
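Neither helper is shown in this article, so here is a minimal sketch of what they might look like. It is not the PyFS implementation: the function names resolve and read_from_string, and the assumption that every path has the form /lib/<module>/<attribute>/..., are mine.

```python
import importlib
import os


def resolve(path):
    """Map a path such as /lib/os/pathsep to the Python object it names.

    Assumes the first component is 'lib' and the second a module name.
    """
    parts = path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "lib":
        raise KeyError(path)
    obj = importlib.import_module(parts[1])
    for name in parts[2:]:
        obj = getattr(obj, name)
    return obj


def read_from_string(text, size, offset):
    """Return at most `size` characters of `text`, starting at `offset`."""
    return text[offset:offset + size]


print(repr(resolve("/lib/os/pathsep")))
print(read_from_string("Hello World!\n", 5, 6))
```

Slicing past the end of the string simply yields an empty string, which is how a read at end-of-file is reported.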
Something very nice that we can do already is making the files that represent functions executable. For that to work, getattr needs to set the executable bit in the st_mode field, and read needs to return an executable file that will call the function and print its result. Implemented, this looks like so:
    ~/fuse-test$ mountpoint/lib/os/path/join /this/is/a "path/to" be/joined
    /this/is/a/path/to/be/joined
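How could read produce such an executable file? One option, sketched here under my own assumptions and not taken from the PyFS source, is to return the text of a tiny script that imports the function and calls it with the command-line arguments. The names STUB_TEMPLATE, make_stub and EXECUTABLE_MODE are hypothetical.

```python
import stat

# Text of the script that read() would hand out for a function file.
STUB_TEMPLATE = """#!/usr/bin/env python
import sys
from {module} import {func}
print({func}(*sys.argv[1:]))
"""

# Mode that getattr() would report: regular file, executable for everyone.
EXECUTABLE_MODE = stat.S_IFREG | 0o755


def make_stub(module, func):
    """Build the script text for module.func, e.g. ('os.path', 'join')."""
    return STUB_TEMPLATE.format(module=module, func=func)


print(make_stub("os.path", "join"))
```

All arguments arrive as strings, which happens to be what os.path.join and re.match expect; functions with other argument types would need some conversion step.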
The next thing we can do is add modules to /lib by appending to the pseudo-file /run/modules. We could use an executable to be called like so: /bin/import re. That is nicely readable, but means we're only doing reading again (reading the script that will parse the input and add it to the filesystem), which is boring. Also, since this executable would be run as a new process, we'd need inter-process communication, which is annoying. So, what about this: echo "re" >> /run/modules? For this to work, we need getattr to make this file writable, and the write() method needs to process the module name that will be written to it:
    def write(self, buf, offset):
        if not self.append and offset != 0:
            raise IOError(errno.EPERM)
        addlib(buf.strip())
        return len(buf)

First, we make sure that a module is either added to the existing ones, or replaces all of them. Then, the addlib() function adds the module with the given name to the list of modules accessible under /lib. Put together:

    ~/fuse-test$ ls mountpoint/lib/
    json  os  sys
    ~/fuse-test$ echo re >> mountpoint/run/modules
    ~/fuse-test$ ls mountpoint/lib/
    json  os  re  sys
    ~/fuse-test$ mountpoint/lib/re/match ".*(l+).*" hello
    <_sre.SRE_Match object at 0x7ff2a94b44e0>
By the way, you might run into “Invalid Argument” trouble here. The good news: You might be innocent; it's a bug! Have a look at this note regarding write errors.
What about echo "re" > /run/modules? This would clear the list of modules first, only leaving the re module mounted afterwards. To support truncating a file, we would need to implement truncate() on the filesystem object, and ftruncate() on the file class. Since the echo command we want will use the former, here it is:
    def truncate(self, path, len):
        if path != "/run/modules":
            raise IOError(-errno.EPERM)
        if len != 0:
            raise IOError(-errno.EPERM)
        clear_modules_list()

The first check makes sure we only clear the module list when called on the special file. The second check ensures we only do complete truncations. Put together:

    ~/fuse-test$ ls mountpoint/lib/
    json  os  sys
    ~/fuse-test$ echo re > mountpoint/run/modules
    ~/fuse-test$ ls mountpoint/lib/
    re
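The helpers addlib() and clear_modules_list() are not shown here. A minimal sketch of how they could work, assuming the module list is nothing more than a dictionary from name to imported module (the _modules registry and the listlib() helper are my inventions, not PyFS code):

```python
import importlib

# Hypothetical registry behind /lib: module name -> imported module.
_modules = {}


def addlib(name):
    """Import the named module and expose it under /lib."""
    _modules[name] = importlib.import_module(name)


def clear_modules_list():
    """Forget all modules, as a truncation of /run/modules would."""
    _modules.clear()


def listlib():
    """What `ls /lib` would show."""
    return sorted(_modules)


for default in ("json", "os", "sys"):
    addlib(default)

addlib("re")              # echo re >> /run/modules
print(listlib())

clear_modules_list()      # echo re > /run/modules truncates first ...
addlib("re")              # ... and then writes
print(listlib())
```

An unknown module name makes importlib.import_module raise ImportError; a real write() handler would have to turn that into an error number.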
Concluding thoughts
- Not maintained.
- Code-wise, it's a mess.
- API-wise, too. (Either make it pythonic, or don't. Whatever you do, do it right.)
- Documentation: None. Examples only.
- Buggy in release.
So, that's it for fuse-python. The complete source code of all examples is available on Github.
fusepy
fusepy provides ctypes based Python bindings to FUSE.
Installation
Can't be installed in a virtualenv. That means: it must be installed system-wide, and you can't test fuse-python and fusepy at the same time, since both use "fuse" as module name. Meh. This is what you get for unintentionally importing the wrong one:

    Traceback (most recent call last):
      File "/home/murat/ws/fuse-and-python/phuse/fusepython/pyfs.py", line 238, in <module>
        class PyFS(fuse.Fuse):
    AttributeError: 'module' object has no attribute 'Fuse'
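If you are unsure which of the two bindings an import fuse picked up, looking at the attributes the module exposes is a cheap check. The sketch below relies only on what the examples in this article use: fuse-python provides fuse.Fuse, fusepy provides fuse.Operations and fuse.FUSE. The function name detect_binding is made up, and the demonstration uses empty stand-in modules so that it runs without either binding installed.

```python
import types


def detect_binding(module):
    """Guess which FUSE binding a module object called 'fuse' belongs to."""
    if hasattr(module, "Fuse"):
        return "fuse-python"
    if hasattr(module, "Operations") and hasattr(module, "FUSE"):
        return "fusepy"
    return "unknown"


# Stand-ins that only mimic the attribute names of the real modules.
fake_fuse_python = types.ModuleType("fuse")
fake_fuse_python.Fuse = object

fake_fusepy = types.ModuleType("fuse")
fake_fusepy.Operations = object
fake_fusepy.FUSE = object

print(detect_binding(fake_fuse_python))
print(detect_binding(fake_fusepy))
```

With a real installation, import fuse followed by detect_binding(fuse) tells you which one you got before the AttributeError does.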
NopFS: The bare minimum
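Again, here's the minimal filesystem: doing nothing, but mountable.

    import sys

    import fuse


    class NopFS(fuse.Operations):
        # No operations overridden: fusepy's defaults apply.
        pass


    if __name__ == '__main__':
        # The mount point is the first command-line argument.
        fuse.FUSE(NopFS(), sys.argv[1])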
The code is self-explanatory. The differences to fuse-python are worth mentioning, though.
- We subclass fuse.Operations. As the fuse.FUSE(...) call shows, this is not the class we hand control to. fusepy prefers composition over inheritance for separating the filesystem logic from the generic code. Makes things much more understandable for me.
- We have to retrieve the mountpoint argument ourselves (sys.argv[1]) and then pass it to fusepy.
Now, this can be mounted as before (without the -s, though). An ls on the directory reveals something nice about fusepy.

    ~/fuse-test$ ls mountpoint/
    ~/fuse-test$ ls -hal mountpoint/
    total 4,0K
    drwxr-xr-x  2 root  root     0 Jan  1  1970 .
    drwxrwxr-x 10 murat murat 4,0K Dez 26 16:28 ..
fusepy comes along with some minimal, but nice defaults. In this case, readdir always returns ['.', '..'] and that's why we're not running into errors immediately.
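The combination of composition and default methods can be illustrated without FUSE. The following is a simplified stand-in, not fusepy's actual code: DefaultOperations plays the role of a base class with harmless defaults, and Driver plays the role of the generic part that is handed an operations object. Both class names are made up, and the getattr default is an assumption chosen to match the ls -hal output shown earlier in this section.

```python
import errno
import stat


class DefaultOperations(object):
    """Stand-in base class: every operation has a harmless default."""

    def getattr(self, path, fh=None):
        # Assumed default: only the root directory exists.
        if path != "/":
            raise OSError(errno.ENOENT, "No such file or directory")
        return dict(st_mode=(stat.S_IFDIR | 0o755), st_nlink=2)

    def readdir(self, path, fh):
        return [".", ".."]


class Driver(object):
    """Stand-in for the generic code: it owns an operations object
    instead of being subclassed by the filesystem."""

    def __init__(self, operations):
        self.operations = operations

    def list_directory(self, path):
        return list(self.operations.readdir(path, None))


class NopFS(DefaultOperations):
    pass


print(Driver(NopFS()).list_directory("/"))
```

Because the filesystem object is merely passed in, it can be instantiated and tested on its own, without mounting anything.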
LsFS: Directory Contents
How do we add a file and a directory with fusepy? It's fairly simple:

    import errno
    import stat

    import fuse


    class LsFS(fuse.Operations):

        def getattr(self, path, fh=None):
            # 0o755 / 0o644 are octal literals accepted by Python 2.6+ and 3.
            if path[1:] == "some_dir" or path == '/':
                return dict(
                    st_mode=(stat.S_IFDIR | 0o755),
                    st_nlink=2,
                )
            elif path[1:] == "some_file":
                return dict(
                    st_mode=(stat.S_IFREG | 0o644),
                    st_nlink=1,
                )
            else:
                raise fuse.FuseOSError(errno.ENOENT)

        def readdir(self, path, fh):
            if path == "/":
                return [".", "..", "some_file", "some_dir"]
4: Subclass fuse.Operations