The Shoes of the Fisherman's Wife Are Some Jive-Ass Slippers

tpot (at) frungy . org

Tue, 25 Nov 2003

Samba 4

Tridge, and a handful of others on the Samba Team, have been working on a rewrite of Samba. Slashdot trolls notwithstanding, it's coming along very nicely and at a much greater speed than I had expected.

There are a number of interesting design patterns that have emerged.

  • Test-driven development

    Samba 4 started out as a rewrite of the lowest level protocol layer in CIFS, the SMB layer. Each SMB (there are seventy-three distinct SMB messages) was re-implemented from scratch, and tests were written to exercise every possible field of each SMB. For parts of the protocol that return the same information, such as the seventeen different ways of asking for the length of a file, that information is cross-checked against the other methods. Using this technique Tridge found a number of bugs in Windows 2003 Server, including some that could only be fixed by rebooting the server or completely reinstalling the operating system.

    Once there is a body of test code to be used, refactoring the code becomes a much more manageable task. This is one of the tenets of Extreme Programming. Having test code also encourages a culture of test case development, especially if the tests can be run easily. Contributors to the project can be confident that their change is good if the existing and any new tests pass.
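
    To make the cross-checking idea concrete, here is a rough sketch, not Samba's actual test code. It assumes a POSIX system and uses three local calls (stat, fstat, lseek) as stand-ins for the different SMB queries; the names size_via_* and cross_check_size are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* One independent way of asking for a file's length.
 * Hypothetical stand-ins for the SMB queries mentioned in the text. */
typedef long long (*size_query_fn)(const char *path);

static long long size_via_stat(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long long)st.st_size : -1;
}

static long long size_via_fstat(const char *path)
{
    struct stat st;
    long long size = -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) == 0)
        size = (long long)st.st_size;
    close(fd);
    return size;
}

static long long size_via_lseek(const char *path)
{
    long long size;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    size = (long long)lseek(fd, 0, SEEK_END);
    close(fd);
    return size;
}

/* Returns 1 if every query succeeds and agrees with the first one,
 * 0 if any query fails or reports a different length. */
int cross_check_size(const char *path)
{
    static const size_query_fn queries[] = {
        size_via_stat, size_via_fstat, size_via_lseek
    };
    const size_t n = sizeof(queries) / sizeof(queries[0]);
    long long expected;
    size_t i;

    assert(path != NULL);
    expected = queries[0](path);
    if (expected < 0)
        return 0;
    for (i = 1; i < n; i++) {
        if (queries[i](path) != expected)
            return 0;
    }
    return 1;
}
```

    Adding another way of asking the same question is then just one more entry in the table, and any disagreement between methods shows up as a failed check.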

  • Use of code generation tools

    With the low level SMB layer complete, focus has moved to the RPC layer. Again there is a suite of tests for all known RPC operations, written in parallel with the code. All the RPC related code (header files, marshaling, unmarshaling and debug code) is generated from IDL files using an IDL compiler written in Perl.

    Previously Samba had handwritten marshaling code that was painful to maintain and hard to write in the first place. The advantage of automatically generated code is that alignment bugs can be fixed in the compiler, and thus whole classes of bugs can be fixed at once instead of just one instance.

    Now the really neat thing is that there are tests that check the marshaling and unmarshaling code at the same time. When a blob of data is marshaled, it is also passed through the unmarshaler and the two blobs compared. If they are not equal then there is a bug somewhere.
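
    The round-trip check can be sketched roughly as follows. This is hand-written C for a made-up structure, not output from Samba's IDL compiler: struct file_info, its 8-byte little-endian layout and the file_info_* names are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire structure: a 16-bit level, two bytes of padding,
 * then a 32-bit size aligned on a 4-byte boundary. */
struct file_info {
    uint16_t level;
    uint32_t size;
};

#define FILE_INFO_WIRE_LEN 8

/* Round an offset up to the next multiple of n (n must be a power of two). */
static size_t align_up(size_t off, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return (off + n - 1) & ~(n - 1);
}

/* Marshal to little-endian wire format.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t file_info_push(const struct file_info *in, uint8_t *buf, size_t len)
{
    size_t off = 0;
    if (len < FILE_INFO_WIRE_LEN)
        return 0;
    memset(buf, 0, FILE_INFO_WIRE_LEN);      /* padding bytes are zero */
    buf[off++] = (uint8_t)(in->level & 0xff);
    buf[off++] = (uint8_t)(in->level >> 8);
    off = align_up(off, 4);
    buf[off++] = (uint8_t)(in->size & 0xff);
    buf[off++] = (uint8_t)((in->size >> 8) & 0xff);
    buf[off++] = (uint8_t)((in->size >> 16) & 0xff);
    buf[off++] = (uint8_t)((in->size >> 24) & 0xff);
    return off;
}

/* Unmarshal from wire format.
 * Returns the number of bytes consumed, or 0 if the blob is too short. */
size_t file_info_pull(const uint8_t *buf, size_t len, struct file_info *out)
{
    size_t off;
    if (len < FILE_INFO_WIRE_LEN)
        return 0;
    out->level = (uint16_t)(buf[0] | (buf[1] << 8));
    off = align_up(2, 4);
    out->size = (uint32_t)buf[off]
              | ((uint32_t)buf[off + 1] << 8)
              | ((uint32_t)buf[off + 2] << 16)
              | ((uint32_t)buf[off + 3] << 24);
    return off + 4;
}

/* Round trip: unmarshal a wire blob, marshal the result again and
 * compare the two blobs. Returns 1 if they are identical, 0 otherwise. */
int file_info_roundtrip_ok(const uint8_t *wire, size_t len)
{
    struct file_info v;
    uint8_t again[FILE_INFO_WIRE_LEN];

    if (file_info_pull(wire, len, &v) != FILE_INFO_WIRE_LEN)
        return 0;
    if (file_info_push(&v, again, sizeof(again)) != FILE_INFO_WIRE_LEN)
        return 0;
    return memcmp(wire, again, FILE_INFO_WIRE_LEN) == 0;
}
```

    If the push and pull sides ever disagree about alignment or field order, the memcmp() fails, which is the whole point of testing both directions at once.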

  • Pool based memory allocation

    Allocating memory in pools is a nice technique for managing dynamic memory allocation in the face of complicated data structures. The idea is that all memory allocated is associated with a "pool" which can be freed with a single function call. This frees the programmer from having to iterate over the elements of a list, array or other deep structure, calling free() on memory blocks in the correct order. Samba uses the routines in talloc.c, or "trivial alloc", which is simply a structure that holds a linked list of pointers to allocated blocks. The talloc_free() function simply iterates over the list and frees each block.

    One participant on the #samba-technical IRC channel said that using talloc() was tantamount to "giving up on doing memory allocation properly". While there is something to be said for donning the hair shirt and making sure every single malloc() is matched with a corresponding call to free(), this rapidly becomes a difficult task, especially with large nested data structures. Being able to allocate memory and not have to worry about the consequences is almost like using a modern language with built-in garbage collection like Python or Perl. (-:

    Memory bugs in Samba 4, and to a lesser extent Samba 3, are now reduced to simply forgetting to free a talloc context, or allocating memory from the wrong context. The "correct context" is the talloc context with the shortest lifetime and is usually obvious from reading the code.
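
    For the curious, the linked-list-of-blocks idea can be sketched in a few lines of C. This is my own minimal illustration, not the code in talloc.c: the pool_* names and the block counter are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* One node per allocated block. */
struct pool_chunk {
    struct pool_chunk *next;
    void *ptr;
};

/* The pool itself: just the head of the list and a block count. */
struct pool {
    struct pool_chunk *chunks;
    size_t count;
};

/* Create an empty pool; returns NULL if out of memory. */
struct pool *pool_init(void)
{
    return calloc(1, sizeof(struct pool));
}

/* Allocate size bytes and remember the block in the pool. */
void *pool_alloc(struct pool *p, size_t size)
{
    struct pool_chunk *c;

    assert(p != NULL);
    c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    c->ptr = malloc(size);
    if (c->ptr == NULL) {
        free(c);
        return NULL;
    }
    c->next = p->chunks;
    p->chunks = c;
    p->count++;
    return c->ptr;
}

/* Free every block in the pool, then the pool itself.
 * Returns the number of blocks that were released. */
size_t pool_free(struct pool *p)
{
    struct pool_chunk *c, *next;
    size_t freed = 0;

    if (p == NULL)
        return 0;
    for (c = p->chunks; c != NULL; c = next) {
        next = c->next;
        free(c->ptr);
        free(c);
        freed++;
    }
    free(p);
    return freed;
}
```

    A caller creates one pool per request (or whatever the natural lifetime is), allocates from it freely, and makes a single pool_free() call at the end, with no need to walk the data structure it built.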

Any discussion of patterns would be incomplete without some cool anti-patterns. There are still a number of things that annoy me about Samba.

  • Global prototype file

    Samba has a big honking automatically generated header file that contains the function prototypes for all non-static functions. While this is a quick way of keeping header file prototypes up to date, it encourages monolithic design because it's easy just to add a function to a random file, type make proto and continue on your way. Samba should have a small number of utility libraries that export interfaces to be used by other parts of Samba, or third party programs.

    Tridge is very much against removing the global header file for a number of reasons. I think the issues are a bit confused. I break them down like this:

    • Problem: It's too hard to manage the dependencies of system header files.

      Autoconf does a great job of working out which header files are where. Why not switch to a global include file that includes every system header from the right place and in the right order?

    • Problem: The global header file is needed to keep function prototypes automatically up to date.

      I think this argument is particularly bogus as gcc does plenty of checking at compile time to ensure the header and its implementation are consistent. It's a simple matter to cut and paste the prototype or just edit it by hand. Exactly how many functions are you going to be adding or changing at any one time anyway? The Ethereal project has header files maintained by hand and it is not really too much trouble to update the .h file if you change the .c file.
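
    A small illustration of the compile-time check I have in mind, squeezed into one file so it stands alone. The util.h/util.c split and the util_clamp() function are hypothetical, and the check only happens when the .c file includes its own header.

```c
#include <assert.h>

/* --- util.h (hypothetical, maintained by hand) --- */
int util_clamp(int value, int lo, int hi);

/* --- util.c (hypothetical); in a real tree this would begin with
 *     #include "util.h" so the compiler sees both declarations --- */

/* Restrict value to the range [lo, hi]. */
int util_clamp(int value, int lo, int hi)
{
    assert(lo <= hi);
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

/* If the definition were changed to take, say, "long value" while the
 * prototype above stayed as it is, gcc would refuse to compile the
 * file with a "conflicting types" error for util_clamp. */
```

    So a stale hand-written prototype gets caught the next time the implementation file is compiled, without any generated proto.h.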

  • No header file dependencies in build system

    Traditionally, the lack of an accurate representation of header file dependencies is one of the main failings of large build systems. This is a hard problem, as maintaining the dependencies by hand is next to impossible, so one is left with the various automatic solutions based on scripts or gcc compiler extensions. Usually broken header dependencies are a result of using recursive make (see my favourite discussion of the topic here), but in Samba's case it is laziness encouraged by the global header include file.

    A symptom of bad dependencies is when a make clean is required before make will rebuild files that need to be recompiled. In Samba's case this problem is linked to the previous one about a global include file.

    This is a sign that the project is badly organised, with no separation between the application logic and the groups of utility functions needed to implement that logic. Samba 2/3 depends on a large set of files in the lib and libsmb directories, which in turn depend on random parts of each other. This makes the job of dividing the code into modular sections hard.

    My proposed solution is to use some automated generation of header file dependencies, as seen in many other projects (cf. Ethereal). Unfortunately(?) most of these techniques require the use of GNU make. It would be nice to assert that a requirement for building Samba is that you must have GNU make. (Ha ha - can't compile GNU make on your system). Another solution is to only enable header file dependencies on systems that have GNU make installed. Samba development is primarily done on these systems anyway.

    My final comment is that fixing header file dependencies will require the global include file to be replaced with several smaller files. The reason is that, since everything depends on proto.h, changing anything at all in Samba requires every object file to be rebuilt.
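
    For reference, one common way to get automatic header dependencies with gcc and GNU make looks roughly like the fragment below. It is a sketch, not a patch against Samba's Makefile: the program and source file names are hypothetical, and it relies on make's built-in rule for compiling .c files, which passes CPPFLAGS to the compiler.

```make
# Hypothetical program and source names, for illustration only.
PROG = demo
SRCS = main.c util.c
OBJS = $(SRCS:.c=.o)
DEPS = $(OBJS:.o=.d)

# -MMD makes gcc write a .d file next to each object, listing the
# user headers that object depends on. -MP adds a phony target for
# each header so that deleting a header does not break the build.
CPPFLAGS += -MMD -MP

$(PROG): $(OBJS)
	$(CC) $(LDFLAGS) -o $@ $(OBJS)

# Pull in the generated dependency files; the leading "-" tells
# GNU make to carry on quietly when they do not exist yet.
-include $(DEPS)
```

    With something like this in place, touching a header rebuilds only the objects that include it, which is exactly why one giant proto.h defeats the purpose.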

Despite the above two gripes, Samba 4 is shaping up to be a major architectural and technical improvement over Samba 3.

posted at: 14:15 | path: /software/samba