tCompressedString in SharePoint 2010 Content Database

http://www.digitude.net/blog/?p=362

To understand some aspects of SharePoint, it is sometimes useful to peek into the databases. The use of content types is one aspect where the backend storage matters. When a content type is assigned to a list, this association is recorded in the content database. The data that is actually stored depends on how the association is made: it can be made through a feature, or it can be made through the UI.

The content type association is stored in the ‘tp_ContentTypes’ field of the ‘AllLists’ table. In SharePoint 2007 this field contained literal XML fragments. In SharePoint 2010 it contains compressed data, so it is not directly readable.

The data type of the ‘tp_ContentTypes’ field is tCompressedString, a user-defined alias type based on varbinary(max). The compression format used for these fields is described in [MS-WSSFO2] (File Operations Database Communications Version 2 Protocol Specification).

Section 2.2.5.8 (WSS Compressed Structures) of this document shows that the data is compressed with zlib. The structure diagram also shows that the compressed data starts at an offset of 12 bytes; the zlib data begins there. As I understand it, zlib (RFC 1950) is an envelope format: a 2-byte header, a compressed data stream, and a 4-byte Adler-32 checksum. The header has a field for the compression method, but deflate (RFC 1951) is the only method the specification defines. The .NET Framework supports deflate through the ‘DeflateStream’ class, which reads raw deflate data without the zlib envelope, so the 2-byte zlib header has to be skipped as well. I have read in some articles that there are more robust ways to handle deflate data, but for our purposes ‘DeflateStream’ will do. I compiled the code for .NET Framework 4.0 in order to use the ‘CopyTo’ method, which makes reading from the underlying stream much easier.
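The two bytes right after the 12-byte WSS header are the zlib header defined in RFC 1950: a CMF byte and a FLG byte. Instead of skipping them blindly, one could check that they really announce a deflate stream. The following is a minimal sketch, written as a C# 9 top-level program so it runs on its own; the helper names `IsZlibDeflateHeader` and `HasZlibPayload` are hypothetical and not part of the original utility.

```csharp
using System;

// Hypothetical helper: checks whether CMF/FLG form a valid zlib header (RFC 1950)
// announcing deflate data without a preset dictionary.
static bool IsZlibDeflateHeader(byte cmf, byte flg)
{
    // CMF bits 0-3: compression method; 8 means deflate.
    bool isDeflate = (cmf & 0x0F) == 8;
    // CMF bits 4-7: CINFO (log2 of the window size minus 8); must be <= 7.
    bool validWindow = (cmf >> 4) <= 7;
    // FLG bit 5: FDICT; if set, a 4-byte dictionary ID would follow the header.
    bool noPresetDictionary = (flg & 0x20) == 0;
    // FCHECK: CMF * 256 + FLG must be a multiple of 31.
    bool checkBitsOk = (cmf * 256 + flg) % 31 == 0;
    return isDeflate && validWindow && noPresetDictionary && checkBitsOk;
}

// Hypothetical helper: assumes the zlib header starts right after the
// 12-byte WSS compressed structure header.
static bool HasZlibPayload(byte[] buffer, int offset = 12)
{
    return buffer.Length >= offset + 2
        && IsZlibDeflateHeader(buffer[offset], buffer[offset + 1]);
}

Console.WriteLine(IsZlibDeflateHeader(0x78, 0x9C)); // True
Console.WriteLine(IsZlibDeflateHeader(0x78, 0x9D)); // False
```

If `HasZlibPayload` returns false, the value is probably not a tCompressedString (or the header offset is different), and handing it to `DeflateStream` would most likely fail with an exception or produce garbage.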

The following code snippet shows how to decompress the data from the ‘tp_ContentTypes’ field:

```csharp
using System;
using System.IO;
using System.IO.Compression;

private string Decompress(byte[] compressedBytesBuffer)
{
    string uncompressedString = String.Empty;

    using (MemoryStream compressedMemoryStream = new MemoryStream(compressedBytesBuffer))
    {
        compressedMemoryStream.Position += 12; // WSS compressed structure header, per [MS-WSSFO2].
        compressedMemoryStream.Position += 2;  // zlib header (CMF and FLG); DeflateStream expects raw deflate data.

        using (DeflateStream deflateStream = new DeflateStream(compressedMemoryStream, CompressionMode.Decompress))
        {
            using (MemoryStream uncompressedMemoryStream = new MemoryStream())
            {
                // Stream.CopyTo is available from .NET Framework 4.0 onwards.
                deflateStream.CopyTo(uncompressedMemoryStream);

                uncompressedMemoryStream.Position = 0;

                using (StreamReader streamReader = new StreamReader(uncompressedMemoryStream))
                {
                    uncompressedString = streamReader.ReadToEnd();
                }
            }
        }
    }

    // Note: the 4-byte Adler-32 trailer of the zlib stream is not verified here.
    return uncompressedString;
}
```
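The method above never reads the 4-byte Adler-32 checksum that ends a zlib stream. Below is a sketch of a more defensive variant, written as a C# 9 top-level program; `Adler32` and `DecompressVerified` are hypothetical names. It ASSUMES the zlib stream, including its trailer, runs to the end of the varbinary value, and the demo buffer uses zero placeholder bytes for the 12-byte WSS header rather than real header fields. (On .NET 6 and later, `System.IO.Compression.ZLibStream` reads the zlib header itself, so only the 12 WSS header bytes would need skipping.)

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Adler-32 as defined in RFC 1950: two running sums modulo 65521.
static uint Adler32(byte[] data)
{
    const uint Mod = 65521;
    uint a = 1, b = 0;
    foreach (byte value in data)
    {
        a = (a + value) % Mod;
        b = (b + a) % Mod;
    }
    return (b << 16) | a;
}

// Hypothetical variant of Decompress: skips the 12-byte WSS header and the
// 2-byte zlib header, inflates the data, then checks the Adler-32 trailer.
// ASSUMPTION: the last 4 bytes of the buffer are the zlib checksum.
static string DecompressVerified(byte[] buffer)
{
    const int PayloadOffset = 12 + 2;
    if (buffer.Length < PayloadOffset + 4)
        throw new InvalidDataException("Buffer too short for a WSS compressed structure.");

    byte[] uncompressed;
    using (var input = new MemoryStream(buffer, PayloadOffset, buffer.Length - PayloadOffset))
    using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
    using (var output = new MemoryStream())
    {
        deflate.CopyTo(output); // stops at the final deflate block; trailer is ignored
        uncompressed = output.ToArray();
    }

    // The trailer stores the checksum big-endian (most significant byte first).
    int t = buffer.Length - 4;
    uint expected = ((uint)buffer[t] << 24) | ((uint)buffer[t + 1] << 16)
                  | ((uint)buffer[t + 2] << 8) | buffer[t + 3];
    if (Adler32(uncompressed) != expected)
        throw new InvalidDataException("Adler-32 checksum mismatch.");

    using (var reader = new StreamReader(new MemoryStream(uncompressed)))
        return reader.ReadToEnd();
}

// Round trip with a synthetic buffer (made-up sample text, placeholder header).
byte[] text = Encoding.UTF8.GetBytes("<Sample>hello</Sample>");
using var packed = new MemoryStream();
packed.Write(new byte[12], 0, 12);               // placeholder WSS header
packed.WriteByte(0x78); packed.WriteByte(0x9C);  // zlib header: deflate, 32K window
using (var compressor = new DeflateStream(packed, CompressionMode.Compress, leaveOpen: true))
    compressor.Write(text, 0, text.Length);
uint sum = Adler32(text);
packed.WriteByte((byte)(sum >> 24)); packed.WriteByte((byte)(sum >> 16));
packed.WriteByte((byte)(sum >> 8)); packed.WriteByte((byte)sum);

Console.WriteLine(DecompressVerified(packed.ToArray())); // <Sample>hello</Sample>
```

Verifying the checksum turns a silently wrong result (for example, a value copied incompletely from the query window) into an explicit error.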

The method accepts a byte array. Because the utility receives the byte sequence as a hexadecimal string typed into a textbox, a helper function converts that string into the corresponding bytes. For example, it converts the string “A8B2” into the byte sequence {0xA8, 0xB2}.
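The original helper is not shown, so here is a hypothetical sketch of such a conversion, written as a C# 9 top-level program; the name `HexStringToBytes` is made up. It ASSUMES the input may carry a leading "0x", which is how SQL Server Management Studio displays varbinary values, and skips that prefix.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: converts a hex string such as "A8B2" into {0xA8, 0xB2}.
// ASSUMPTION: an optional "0x" prefix (as shown by SSMS) is skipped.
static byte[] HexStringToBytes(string hex)
{
    hex = hex.Trim();
    if (hex.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
        hex = hex.Substring(2);
    if (hex.Length % 2 != 0)
        throw new FormatException("A hex string must contain an even number of digits.");

    var bytes = new byte[hex.Length / 2];
    for (int i = 0; i < bytes.Length; i++)
    {
        // Each pair of hex digits becomes one byte.
        bytes[i] = byte.Parse(hex.Substring(i * 2, 2), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }
    return bytes;
}

byte[] result = HexStringToBytes("A8B2");
Console.WriteLine(BitConverter.ToString(result)); // A8-B2
```

On .NET 5 and later, `Convert.FromHexString` performs the same digit-pair conversion, though it does not strip a "0x" prefix.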

I have created a small Windows Forms application that performs this decompression. The interface is very basic, as I put the utility together rather quickly.

The source code for this small utility can be found here. The code is written in C# using Visual Studio 2010.

Category: MOSS technology
Posted @ 2011-04-12 11:45 by 一只老鼠