TahoeLAFSNotes (16 Jun 2011, TobyCabot)
Tahoe-LAFS is described as "the first decentralized storage system with provider-independent security". Its name says "file system", but it differs from traditional file systems in ways that are important to understand before you start using it. This page tries to explain at a high level, in plain English, how Tahoe-LAFS works, and provides links that will let you learn about it in detail.

Before we go any further, please read the one-page summary, then come back here. As you saw on that page, Tahoe-LAFS provides a guarantee that you can store your data on servers that you don't trust, and the administrators of those servers won't be able to read your data. It does this by encrypting the data before it stores it on those servers, so that all they see is random-looking bits and they can't recover the actual content of your files. Tahoe-LAFS also guards against the failure of the storage servers by storing the same data on more than one of them. Of course, this will use more disk storage than simply storing the file once, but you can decide how you'd like to trade off extra storage for fault-tolerance.

Capabilities (vs. Access Control)

Before we get into how Tahoe-LAFS stores files, it's useful to recap how a traditional file system works. Traditional file systems start at a well-known "root" and let users explore from there. Because the root is well-known, you can go to it and list the files in it; you can also go "up" from any directory to its parent. Since users can explore the file system this way, each user could do anything they wanted unless some permission check stood in the way, so these file systems implement "Access Control List" (ACL) permission checks. These checks specify which users may access each file, and they stop users from doing things that they can figure out how to do but are not permitted to do. In other words, I can discover a directory's existence and learn its name, but I might not be allowed to read from it.

To enforce this, the file system has to know who you are, so you need to log in, and to prove that you are who you claim to be you have to provide a password and/or other credentials. Someone also has to specify who has what kind of access to each file and directory. This approach works well, but it is complex, and that complexity makes it very difficult to be sure that it's secure.

Tahoe-LAFS does away with the complexity inherent in the ACL approach and uses a much simpler one, called "capabilities". Access to each file (and directory) in Tahoe-LAFS is granted by a "capability", which is a string of characters that looks something like URI:CHK:riplmjitnwh25ur3jomzyxrww4:et4gkxykswl7lstw5q4g5suf6y2xyyphvid5nn2r3ktvhytbs5da:3:10:3472. A file can have more than one capability: for example, one capability might allow you to read the file while a different one allows you to both read and write it. Each capability contains the two things that you need to access the file: how to find the encrypted bits (the "storage index"), and how to decrypt them (the "encryption key").
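To make the shape of the example string concrete, here is a minimal sketch that splits it into its colon-separated fields. The field names are my own reading of the Tahoe-LAFS URI format (a key, an integrity hash, two share counts, and a size), so treat them as assumptions and check them against the Tahoe-LAFS documentation; `ChkCap` and `parse_chk_cap` are hypothetical helpers, not part of Tahoe-LAFS.

```python
from typing import NamedTuple


class ChkCap(NamedTuple):
    """Fields of a URI:CHK capability string (names are this sketch's assumptions)."""
    key: str             # base32-looking text, read here as the encryption key
    integrity_hash: str  # base32-looking text, read here as an integrity-check hash
    needed_shares: int   # assumed: how many shares are needed to rebuild the file
    total_shares: int    # assumed: how many shares are stored in total
    size_bytes: int      # assumed: file size in bytes


def parse_chk_cap(cap: str) -> ChkCap:
    """Split a capability string on ':' and convert the numeric fields."""
    parts = cap.split(":")
    if len(parts) != 7 or parts[0] != "URI" or parts[1] != "CHK":
        raise ValueError(f"not a URI:CHK capability: {cap!r}")
    _, _, key, integrity_hash, needed, total, size = parts
    return ChkCap(key, integrity_hash, int(needed), int(total), int(size))


# The example capability from the text above.
EXAMPLE_CAP = (
    "URI:CHK:riplmjitnwh25ur3jomzyxrww4:"
    "et4gkxykswl7lstw5q4g5suf6y2xyyphvid5nn2r3ktvhytbs5da:3:10:3472"
)
cap = parse_chk_cap(EXAMPLE_CAP)
print(cap)
```

Nothing in the string names a user, a group, or a server: everything needed to use the file travels inside the capability itself.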

Access to any given file is a simple yes/no proposition: if you know that file's capability you can read it; if you don't, you can't. It doesn't matter who you are, or what group you're in, or whether you're a "superuser". In fact, Tahoe-LAFS doesn't have any sense of "identity" at all: you don't have to sign in or provide credentials to prove who you are, because Tahoe-LAFS doesn't know or care.
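The yes/no rule can be modelled with a toy in-memory store. This is only a sketch of the access rule, not of how Tahoe-LAFS works internally: it does no encryption and has no storage servers, and `ToyCapStore` is a hypothetical name invented for the illustration.

```python
import secrets


class ToyCapStore:
    """Hypothetical store in which the capability string is the only lookup key."""

    def __init__(self):
        self._blobs = {}  # maps capability string -> stored bytes

    def put(self, data: bytes) -> str:
        """Store data and return a fresh, random capability for it."""
        cap = secrets.token_urlsafe(16)  # built from 16 random bytes
        self._blobs[cap] = data
        return cap

    def get(self, cap: str) -> bytes:
        """Return the data for a capability; note there is no user argument."""
        try:
            return self._blobs[cap]
        except KeyError:
            raise PermissionError("unknown capability") from None


store = ToyCapStore()
my_cap = store.put(b"hello")
print(store.get(my_cap))
```

The `get` method takes no user name, group, or password. Whoever holds the string gets the data, and anyone else gets an error.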

It's important to understand that a capability specifies the location of a file, but it's different from a traditional file system "path". Tahoe-LAFS has no well-known "root", so there's no way to poke around and try to discover things inside it. Each directory and file can be found only by its capability and can't be discovered in any other way. (How many bits in a capability, i.e. how hard would it be to guess?) A directory capability acts like a traditional file system directory in that users can browse down from it to see the files in it and in the tree below it, but they can't browse "up" to see other directories within the same Tahoe-LAFS file system. It's as if each directory in Tahoe-LAFS is a root directory. Users cannot discover things that they're not supposed to know, so the inline ACL checks implemented by traditional file systems are unnecessary.
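The parenthetical question above can be roughed out from the example string itself. The sketch below rests on assumptions of mine: that the first field of the example capability is the secret part, that it is base32 text (5 bits per character), and that an attacker can try a hypothetical 10^12 guesses per second. It gives an upper bound on the bits, not an authoritative figure for Tahoe-LAFS.

```python
import math

SECRET_FIELD = "riplmjitnwh25ur3jomzyxrww4"  # first field of the example capability
BITS_PER_CHAR = 5            # assumption: base32 text, 32 symbols -> 5 bits each
GUESSES_PER_SECOND = 1e12    # hypothetical attacker speed, chosen for illustration
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def max_entropy_bits(field: str, bits_per_char: int = BITS_PER_CHAR) -> int:
    """Upper bound on the number of secret bits the text field can carry."""
    return len(field) * bits_per_char


def expected_years_to_guess(bits: int, rate: float = GUESSES_PER_SECOND) -> float:
    """Years to find one specific value, trying on average half the space."""
    expected_guesses = 2 ** (bits - 1)
    return expected_guesses / rate / SECONDS_PER_YEAR


bits = max_entropy_bits(SECRET_FIELD)
years = expected_years_to_guess(bits)
print(f"up to {bits} bits; roughly 10^{math.log10(years):.0f} years to guess")
```

Under these assumptions, guessing a capability is not a practical attack. That matters because an unguessable capability is the only thing standing between a stranger and the file.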

If you're curious about the capability model, it's worth taking some time to learn more about it:


As you can imagine, Tahoe-LAFS's capability model makes file sharing easy: to share a file with another person, just give them that file's capability string. Once you've done that, they will be able to do everything that the capability enables. If you share a read-only capability, then the person you shared it with will be able to read the file but not write to it.
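One way to picture a read-only capability is as something derived from the read-write capability by a one-way function, so the holder of the weaker string cannot work backwards to the stronger one. The sketch below is a toy illustration of that idea using SHA-256. The `TOY:` prefixes, `ToyFile`, and the derivation are all hypothetical, and it is not a description of Tahoe-LAFS's actual derivation, which relies on cryptography rather than the explicit checks used here.

```python
import hashlib
import secrets


def new_write_cap() -> str:
    """Create a random read-write capability (toy format)."""
    return "TOY:RW:" + secrets.token_hex(16)


def read_cap_from(write_cap: str) -> str:
    """Derive the read-only capability; hashing makes this a one-way step."""
    if not write_cap.startswith("TOY:RW:"):
        raise ValueError("expected a toy read-write capability")
    digest = hashlib.sha256(write_cap.encode("ascii")).hexdigest()
    return "TOY:RO:" + digest[:32]


class ToyFile:
    """Hypothetical file that honours exactly what each capability enables."""

    def __init__(self):
        self.write_cap = new_write_cap()
        self.read_cap = read_cap_from(self.write_cap)
        self._data = b""

    def read(self, cap: str) -> bytes:
        if cap not in (self.read_cap, self.write_cap):
            raise PermissionError("unknown capability")
        return self._data

    def write(self, cap: str, data: bytes) -> None:
        if cap != self.write_cap:
            raise PermissionError("this capability does not allow writing")
        self._data = data


f = ToyFile()
f.write(f.write_cap, b"draft 1")
print(f.read(f.read_cap))  # the read-only holder can still read
```

Handing someone `f.read_cap` lets them read but not write, while handing them `f.write_cap` lets them do both, since the read capability can always be recomputed from it.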

