Learning FUSE and Python's C interface

For some time now, I have wanted to have a look at Python's interfaces to C libraries. And yesterday I wondered how to implement a filesystem with FUSE. So, in this article we will explore implementing a user-space filesystem with Python.

WORK IN PROGRESS

Installing FUSE

I'm using Xubuntu, so the installation instructions will be Debian/Ubuntu specific. And unspectacular:

    ~/fuse-test$ sudo apt-get install fuse

Now, for some mounts.

Using sshfs

What is sshfs? sshfs is a popular FUSE filesystem that allows you to mount a directory from another host, using ssh as transport.

First, let's prepare a directory to mount.

    ~/fuse-test$ ssh remotehost
    murat@remotehost:~$ mkdir test
    murat@remotehost:~$ touch test/testfile
    murat@remotehost:~$ exit

For sshfs there's a package readily available in the Ubuntu apt mirrors. Also, we need a directory to mount to.

    ~/fuse-test$ sudo apt-get install sshfs
    ~/fuse-test$ mkdir mountpoint
Now, we can mount the remote directory test from remotehost.

    ~/fuse-test$ sshfs remotehost:test mountpoint
    ~/fuse-test$ cat /etc/mtab | grep remotehost
    remotehost:test /home/murat/fuse-test/mountpoint fuse.sshfs rw,nosuid,nodev,user=murat 0 0

mtab is looking good. So, what's inside our mount?

    ~/fuse-test$ ls mountpoint
    testfile

Cake for everyone! And unmount, using fusermount.

    ~/fuse-test$ fusermount -u mountpoint
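
The mtab check can also be scripted. Here is a minimal sketch, assuming the six whitespace-separated fields of an mtab line as in the output above. MountEntry, parse_mtab_line and is_fuse_mount are names made up for this illustration.

```python
from collections import namedtuple

# Hypothetical helper type: one field per column of an mtab line.
MountEntry = namedtuple("MountEntry", "source mountpoint fstype options dump passno")


def parse_mtab_line(line):
    """Split one mtab line into its six whitespace-separated fields."""
    source, mountpoint, fstype, options, dump, passno = line.split()
    return MountEntry(source, mountpoint, fstype, options.split(","), int(dump), int(passno))


def is_fuse_mount(entry):
    """Assumption: FUSE mounts are listed with type 'fuse' or 'fuse.<name>'."""
    return entry.fstype == "fuse" or entry.fstype.startswith("fuse.")


line = "remotehost:test /home/murat/fuse-test/mountpoint fuse.sshfs rw,nosuid,nodev,user=murat 0 0"
entry = parse_mtab_line(line)
print(entry.fstype, entry.mountpoint, is_fuse_mount(entry))
```

To check a live system, the same function could be applied to each line read from /etc/mtab.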

Hello World FS

Looking for an easy intermediate step to writing my own file system, I stumbled over the examples that are shipped with FUSE. In particular, there is the Hello World filesystem. Let's make that one work.

The examples are contained in the development package for libfuse, so we install that.

    ~/fuse-test$ sudo apt-get install libfuse-dev
    ~/fuse-test$ ls /usr/share/doc/libfuse-dev/examples/
    cusexmp.c  fioclient.c  fselclient.c  fusexmp_fh.c  hello_ll.c  null.c
    fioc.c     fsel.c       fusexmp.c     hello.c       Makefile
    ~/fuse-test$ cp -r  /usr/share/doc/libfuse-dev/examples ./
    ~/fuse-test$ cd examples/

Compile it.

    ~/fuse-test/examples$ make
    cc -Wall -D_FILE_OFFSET_BITS=64 -I/usr/include/fuse   fusexmp.c -lfuse   -o fusexmp
    # more make output here

We can now go ahead and mount the filesystem.

    ~/fuse-test/examples$ ./hello ../mountpoint/
    ~/fuse-test/examples$ cd ../mountpoint/
    ~/fuse-test/mountpoint$ ls
    hello
    ~/fuse-test/mountpoint$ cat hello
    Hello World!
    ~/fuse-test/mountpoint$ cd ..
    ~/fuse-test$ fusermount -u mountpoint

Let's have a look at the available Python bindings.

fuse-python

For starters, I'll have a look at the official Python bindings.

Installation

Probably a piece of work: it's ancient. Installation is trivial, though, because it's available as a package in Ubuntu: apt-get install python-fuse. (Yes, the name is correct. The package name follows Debian's naming conventions for Python packages.)

After installing, its examples can be found in /usr/share/doc/python-fuse/examples: namely hello.py and xmp.py. They work out of the box:

    ~/fuse-test$ python /usr/share/doc/python-fuse/examples/hello.py mountpoint/
    ~/fuse-test$ cat mountpoint/hello 
    Hello World!

NopFS: The Bare Minimum

The most minimal filesystem I could come up with is the following NopFS.

import fuse

fuse.fuse_python_api = (0, 2)


class NopFS(fuse.Fuse):
    pass


def main():
    server = NopFS()
    server.parse()
    server.main()

if __name__ == '__main__':
    main()

It shows nicely how the API works: declare the API version, subclass fuse.Fuse, then call parse() and main() on an instance.

It can already be mounted like the previous examples:

~/fuse-test$ python nopfs.py mountpoint

When doing an ls, FUSE notices that this functionality has not been implemented yet:

~/fuse-test$ cd mountpoint && ls
ls: reading directory .: Function not implemented
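
The message ls prints is simply the text for an errno code. As a small sketch using only the standard library (no FUSE involved), this lists the codes that come up in this article. The message strings come from the operating system, so they may differ between platforms.

```python
import errno
import os

# Error codes that show up in this article.
codes = [errno.ENOSYS, errno.ENOENT, errno.EPERM, errno.EINVAL]

for code in codes:
    # errno.errorcode maps the number back to its symbolic name;
    # os.strerror asks the C library for the human-readable message.
    print(errno.errorcode[code], code, os.strerror(code))
```

On Linux, ENOSYS should be the one that reads "Function not implemented".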

LsFS: Directory Contents

So, let's make ls work. For that, the following functions are necessary at least: getattr(path) and readdir(path, offset). readdir provides the list of files in the root directory, and getattr gives information on each file, so ls knows whether a path identifies a directory or a file, who owns it, and so on.
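
How ls tells a directory from a file comes down to the st_mode field, which packs the file type and the permission bits into one integer. Here is a sketch with the standard stat module, independent of FUSE (stat.filemode needs Python 3.3 or later):

```python
import stat

dir_mode = stat.S_IFDIR | 0o755   # directory, rwxr-xr-x
file_mode = stat.S_IFREG | 0o644  # regular file, rw-r--r--

for mode in (dir_mode, file_mode):
    print(
        stat.filemode(mode),      # the string ls -l shows in its first column
        stat.S_ISDIR(mode),       # True for directories
        oct(stat.S_IMODE(mode)),  # permission bits only
    )
```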

By the way, how do you know what methods to implement? Find out in this note about system calls.

Let us add one file some_file and one directory some_dir to the root directory. The following LsFS filesystem does just that.

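import errno
import stat

import fuse

fuse.fuse_python_api = (0, 2)


class LsFS(fuse.Fuse):
    def getattr(self, path):
        st = fuse.Stat()
        if path[1:] == "some_dir" or path == '/':
            # Directory with rwxr-xr-x; two links: "." and the parent's entry.
            st.st_mode = stat.S_IFDIR | 0o755
            st.st_nlink = 2
        elif path[1:] == "some_file":
            # Regular file with rw-r--r--.
            st.st_mode = stat.S_IFREG | 0o644
            st.st_nlink = 1
        else:
            # Errors are reported as negative errno values.
            return -errno.ENOENT
        return st

    def readdir(self, path, offset):
        # Only the root directory has entries so far.
        if path == "/":
            for name in [".", "..", "some_file", "some_dir"]:
                yield fuse.Direntry(name)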

After mounting, an ls reveals:

~/fuse-test$ ls -l mountpoint/
total 0
drwxr-xr-x 2 root root 0 Jan  1  1970 some_dir
-rw-r--r-- 1 root root 0 Jan  1  1970 some_file

Yay!

A “few” things were omitted, at least: making some_dir's contents actually readable, setting other stat-related fields, and checking access rights.
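
One way to make some_dir readable is to drive both methods from a single table. This is a sketch in plain Python, not fuse-python code: TREE, find, lookup_mode and list_dir are hypothetical names, and a real getattr/readdir pair would still have to wrap the results in fuse.Stat and fuse.Direntry objects.

```python
import errno
import stat

# Hypothetical in-memory tree: a dict is a directory, None is an empty file.
TREE = {"some_file": None, "some_dir": {"nested_file": None}}


def find(path):
    """Walk TREE along path; raise ENOENT if a component cannot be found."""
    node = TREE
    for part in [p for p in path.split("/") if p]:
        if not isinstance(node, dict) or part not in node:
            raise OSError(errno.ENOENT, "No such file or directory", path)
        node = node[part]
    return node


def lookup_mode(path):
    """Return the st_mode a getattr implementation could report."""
    if isinstance(find(path), dict):
        return stat.S_IFDIR | 0o755
    return stat.S_IFREG | 0o644


def list_dir(path):
    """Return the names a readdir implementation could yield."""
    node = find(path)
    if not isinstance(node, dict):
        raise OSError(errno.ENOTDIR, "Not a directory", path)
    return [".", ".."] + sorted(node)


print(list_dir("/"))
print(list_dir("/some_dir"))
```

Adding a file then means adding one key to the table, and both methods pick it up.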

By the way, ever run into an Invalid argument error? Weirdly enough, this need not have anything to do with invalid arguments. Read the note on this error to learn more.

For a fuse-python implementation, there are two design choices to be considered.

PyFS: Reading and Writing Files

We can use the methods above to implement the Python filesystem. I will be using the OO approach. Since the full code has too many details, I'll only show the most important snippets here. Refer to the source code of PyFS for the complete implementation. By default, I am importing and making available three modules under /lib:

~/fuse-test$ ls mountpoint/lib/
json  os  sys
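
How a module might map to directory entries can be sketched without FUSE. This is an assumption about the approach, not the actual PyFS code: resolve and entries are hypothetical helpers that walk a path below /lib with importlib and getattr, and list the public names of whatever they find.

```python
import importlib

# Modules exposed under /lib by default, as in the listing above.
MODULES = ["json", "os", "sys"]


def resolve(path):
    """Map a path such as /lib/os/path to the Python object it names."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "lib":
        raise KeyError(path)
    if len(parts) == 1:
        return None  # the /lib directory itself
    if parts[1] not in MODULES:
        raise KeyError(path)
    obj = importlib.import_module(parts[1])
    for name in parts[2:]:
        obj = getattr(obj, name)  # AttributeError if the name does not exist
    return obj


def entries(path):
    """Names to show when listing path: modules for /lib, public attributes below."""
    obj = resolve(path)
    if obj is None:
        return sorted(MODULES)
    return sorted(name for name in dir(obj) if not name.startswith("_"))


print(entries("/lib"))
print("join" in entries("/lib/os/path"))
print(repr(resolve("/lib/os/sep")))
```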

The next obvious feature is reading: For that we implement the read method on the file access class.

class FileMapping(object):
    def read(self, size, offset):
        return self._read_from_string(
            self.get_text(),
            size,
            offset,
        )

get_text() retrieves the string representation of the path. For /lib/os/pathsep, it resolves the name and returns :. _read_from_string returns the proper range of the text, taking size and offset into account. (Obviously, this is inefficient. Don't care, though.)
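
The range logic is small enough to sketch. This is an assumption about what _read_from_string does, written as a free function read_from_string. It encodes the text first, since size and offset count bytes.

```python
def read_from_string(text, size, offset):
    """Return at most size bytes of text, starting at byte offset."""
    data = text.encode("utf-8")
    # Slicing past the end yields a shorter (or empty) result,
    # which is how a reader learns it has hit end-of-file.
    return data[offset:offset + size]


content = "Hello World!\n"
print(read_from_string(content, 5, 0))
print(read_from_string(content, 4096, 6))
print(read_from_string(content, 4096, 100))
```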

Something very nice that we can do already is to make the files representing functions executable. For that to work, getattr needs to set the executable bit in the st_mode field, and read needs to return an executable file that will call the function and print its result. Implemented, this looks like so:

~/fuse-test$ mountpoint/lib/os/path/join /this/is/a "path/to" be/joined
/this/is/a/path/to/be/joined
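
What such an executable file could contain is sketched below. It is a guess at the idea, not the PyFS source: make_script is a hypothetical helper that builds the text read would return for a function, and EXECUTABLE_MODE is the kind of mode getattr could report to make the file executable.

```python
import stat
import sys

# Mode a getattr implementation could report: regular file, r-xr-xr-x.
EXECUTABLE_MODE = stat.S_IFREG | 0o555

SCRIPT_TEMPLATE = """#!{python}
import sys
from {module} import {function}
print({function}(*sys.argv[1:]))
"""


def make_script(module, function):
    """Build a script that calls module.function with the command-line arguments."""
    return SCRIPT_TEMPLATE.format(python=sys.executable, module=module, function=function)


script = make_script("os.path", "join")
print(script)
```

All arguments arrive as strings, which is fine for functions like os.path.join but would need conversion for anything expecting numbers.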

The next thing we can do is add modules to /lib by appending to the pseudo-file /run/modules. We could use an executable to be called like so: /bin/import re. That is nicely readable, but means we're only doing reading again (reading the script that will parse the input and add it to the filesystem), which is boring. Also, since this executable would be run as a new process, we'd need inter-process communication, which is annoying. So, what about this: echo "re" >> /run/modules? For this to work, we need getattr to make this file writable, and the write() method needs to process the module name that will be written to it:

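    def write(self, buf, offset):
        # Accept appends, or writes that start at the beginning of the file.
        if not self.append and offset != 0:
            # The two-argument form sets the exception's errno attribute.
            raise IOError(errno.EPERM, "Operation not permitted")
        # The written text is the name of the module to add under /lib.
        addlib(buf.strip())
        # Report the whole buffer as written.
        return len(buf)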

First, we make sure that a module is either added to the existing ones, or replaces all of them. Then, the addlib() function adds the module with the given name to the list of modules accessible under /lib. Put together:

~/fuse-test$ ls mountpoint/lib/
json  os  sys
~/fuse-test$ echo re >> mountpoint/run/modules 
~/fuse-test$ ls mountpoint/lib/
json  os  re  sys
~/fuse-test$ mountpoint/lib/re/match ".*(l+).*" hello
<_sre.SRE_Match object at 0x7ff2a94b44e0>
  

By the way, you might run into “Invalid Argument” trouble here. The good news: You might be innocent; it's a bug! Have a look at this note regarding write errors.
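
Where could a flag like self.append come from? The shell opens the target of >> with O_APPEND set, and the open flags are handed to the filesystem. Here is a sketch with a hypothetical helper describe_flags (how PyFS actually derives its flag is not shown in this article):

```python
import os


def describe_flags(flags):
    """Decode the parts of an open() flags value that matter for writing."""
    return {
        "writable": (flags & os.O_ACCMODE) in (os.O_WRONLY, os.O_RDWR),
        "append": bool(flags & os.O_APPEND),
        "truncate": bool(flags & os.O_TRUNC),
    }


# Flags a shell typically uses for `>>` and `>` redirections.
append_flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
overwrite_flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC

print(describe_flags(append_flags))
print(describe_flags(overwrite_flags))
```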

What about echo "re" > /run/modules? This would clear the list of modules first, only leaving the re module mounted afterwards. To support truncating a file, we would need to implement truncate() on the filesystem object, and ftruncate() on the file class. Since the wanted echo command will use the former, here it is:

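    def truncate(self, path, length):
        # Raised with a positive errno, matching write() above.
        # Only the special file may be truncated.
        if path != "/run/modules":
            raise IOError(errno.EPERM, "Operation not permitted")
        # Only complete truncation is supported.
        if length != 0:
            raise IOError(errno.EPERM, "Operation not permitted")
        clear_modules_list()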

The first check makes sure we only clear the module list when called upon the special file. The second check ensures we only do complete truncations. Put together:

~/fuse-test$ ls mountpoint/lib/
json  os  sys
~/fuse-test$ echo re > mountpoint/run/modules 
~/fuse-test$ ls mountpoint/lib/
re
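
The two redirections can be modelled without mounting anything. The sketch below is a stand-in, not PyFS itself: ModuleList and its methods are hypothetical and only mirror the checks in write() and truncate() shown above.

```python
import errno


class ModuleList(object):
    """Stand-in for the state behind /run/modules."""

    def __init__(self, names):
        self.names = list(names)

    def truncate(self, length):
        # Only complete truncation is supported, as with `echo re > file`.
        if length != 0:
            raise OSError(errno.EPERM, "partial truncation is not supported")
        self.names = []

    def write(self, buf, offset, append):
        # Accept appends, or writes that start at the beginning of the file.
        if not append and offset != 0:
            raise OSError(errno.EPERM, "writes must append or start at offset 0")
        self.names.append(buf.strip())
        return len(buf)


# echo re >> /run/modules: a single append.
appended = ModuleList(["json", "os", "sys"])
appended.write("re\n", 0, append=True)
print(sorted(appended.names))

# echo re > /run/modules: truncate to zero, then write at offset 0.
replaced = ModuleList(["json", "os", "sys"])
replaced.truncate(0)
replaced.write("re\n", 0, append=False)
print(sorted(replaced.names))
```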

Concluding thoughts

So, that's it for fuse-python. The complete source code of all examples is available on GitHub.

fusepy

fusepy provides ctypes based Python bindings to FUSE.

Installation

Can't be installed in a virtualenv. That means: it must be installed system-wide, and I can't test fuse-python and fusepy at the same time, since both use "fuse" as module name. Meh. This is what you get for unintentionally importing the wrong one:

Traceback (most recent call last):
  File "/home/murat/ws/fuse-and-python/phuse/fusepython/pyfs.py", line 238, in <module>
    class PyFS(fuse.Fuse):
AttributeError: 'module' object has no attribute 'Fuse'
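
A guard at import time makes the mix-up easier to spot. This is a sketch: detect_binding is a hypothetical helper that relies only on the attribute names seen in this article (fuse.Fuse for fuse-python, fuse.Operations for fusepy). It takes the module as an argument, so it can be tried without either package installed.

```python
import types


def detect_binding(module):
    """Guess which Python FUSE binding a module named 'fuse' really is."""
    if hasattr(module, "Fuse"):
        return "fuse-python"
    if hasattr(module, "Operations"):
        return "fusepy"
    return "unknown"


# Stand-in modules for demonstration; with a real install you would
# pass the result of `import fuse` instead.
fake_fuse_python = types.ModuleType("fuse")
fake_fuse_python.Fuse = object
fake_fusepy = types.ModuleType("fuse")
fake_fusepy.Operations = object

print(detect_binding(fake_fuse_python))
print(detect_binding(fake_fusepy))
```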
  

NopFS: The bare minimum

Again, here's the minimal filesystem: doing nothing, but mountable.

import sys

import fuse


class NopFS(fuse.Operations):
    pass

if __name__ == '__main__':
    # Mount at the directory given as the first command-line argument.
    fuse.FUSE(NopFS(), sys.argv[1])

The code is self-explanatory. The differences from fuse-python are worth mentioning, though.

Now, this can be mounted as before (without the -s, though). An ls on the directory reveals something nice about fusepy.

~/fuse-test$ ls mountpoint/
~/fuse-test$ ls -hal mountpoint/
total 4,0K
drwxr-xr-x  2 root  root     0 Jan  1  1970 .
drwxrwxr-x 10 murat murat 4,0K Dez 26 16:28 ..
  

fusepy comes along with some minimal, but nice defaults. In this case, readdir always returns ['.', '..'] and that's why we're not running into errors immediately.
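
The idea of a base class with harmless defaults can be sketched in a few lines. This is a stand-in written for illustration, not fusepy's source: DefaultOperations is hypothetical, and its fallback for read is an assumption.

```python
import errno


class DefaultOperations(object):
    """Hypothetical base class: every handler has a fallback."""

    def readdir(self, path, fh):
        # An empty directory still contains itself and its parent.
        return [".", ".."]

    def read(self, path, size, offset, fh):
        # Assumed fallback for handlers a subclass did not provide.
        raise OSError(errno.ENOSYS, "Function not implemented")


class NopOperations(DefaultOperations):
    pass


class ListingOperations(DefaultOperations):
    def readdir(self, path, fh):
        return [".", "..", "some_file", "some_dir"]


print(NopOperations().readdir("/", None))
print(ListingOperations().readdir("/", None))
```

A subclass only overrides what it cares about, which is why the empty NopFS can already be listed.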

LsFS: Directory Contents

How do we add a file and a directory with fusepy? It's fairly simple:

import errno
import stat

import fuse


class LsFS(fuse.Operations):
    def getattr(self, path, fh=None):
        if path[1:] == "some_dir" or path == '/':
            return dict(
                st_mode=(stat.S_IFDIR | 0o755),
                st_nlink=2,
            )
        elif path[1:] == "some_file":
            return dict(
                st_mode=(stat.S_IFREG | 0o644),
                st_nlink=1,
            )
        else:
            raise fuse.FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        # Only the root directory has entries so far.
        if path == "/":
            return [".", "..", "some_file", "some_dir"]