Hiatus

In 15 days’ time, I’ll be taking off on a four-month trip around the world. I’ll be hiking and travelling my way through Japan (March-May), the United States (May-July) and the Netherlands (July). I’ve been planning this trip for a long time, ever since I came back from my first visit to Japan in August 2007 (the photo on the right was taken near the Tsurugaoka Hachiman-gu Shrine in Kamakura), and have taken a long leave from work to accomplish it. What this means is that from March 10th you can expect a long pause in my semi-regular updates.

I am not dead :)

If you’re a reader who lives along my path and want to meet up for a beer and a lively chat, ping me via the contact form. I’ll be reading my emails every once in a while, so don’t worry if you only get an answer after a few days.

If you want to read about my trip, I’ll be posting regular updates to my new travel blog.

Circumventing the KB957543 .NET 3.5 SP1 Regression Bug

A couple of days ago I hit a regression bug in .NET 3.5 SP1: when you have a generic class that implements ISerializable and has static variables, you cannot serialize it using a BinaryFormatter without your application either hanging (x86) or throwing an exception (x64 – a TargetInvocationException wrapping an OutOfMemoryException). This only happens if you use a reference type as a generic argument.

The bug is already well known, but I have yet to find a workaround documented anywhere. You could simply install the hotfix, but I wouldn’t if I were you – it hasn’t been thoroughly tested yet. Moreover, you might not even be able to, due to internal politics, strict IT rules or the fact that you simply have no control over the hosting server.

Let’s take the simplest class that causes the issue:

[Serializable]
public class MyClass<T> : ISerializable
{
    private static int list = 0;

    public MyClass()
    {
    }

    protected MyClass(SerializationInfo info, StreamingContext context)
    {
    }

    void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
    {
    }
}

When using the class as such:

using (MemoryStream stream = new MemoryStream())
{
    BinaryFormatter formatter = new BinaryFormatter();
    formatter.Serialize(stream, new MyClass<string>());
    stream.Position = 0;
    MyClass<string> item = (MyClass<string>)formatter.Deserialize(stream);
}

The last line will hit the bug.

To work around this issue, simply move your static variables into a new subclass:

[Serializable]
public class MyClass<T> : ISerializable
{
    private static class KB957543
    {
        public static int list = 0;
    }

    public MyClass()
    {
    }

    protected MyClass(SerializationInfo info, StreamingContext context)
    {
    }

    void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
    {
    }
}

You can still access all of your static variables and you don’t hit the bug.

Note that when you use anonymous methods or lambdas, they are cached in static variables of the type, meaning that you will have to manually hoist all of your lambdas in the same manner.

Here’s an example of such a type that is prone to the bug:

[Serializable]
public class MyClass<T> : ISerializable
{
    public MyClass()
    {
    }

    public static string Concat(IEnumerable<int> numbers)
    {
        return string.Join(", ", numbers.Select(i => i.ToString()).ToArray());
    }

    protected MyClass(SerializationInfo info, StreamingContext context)
    {
    }

    void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
    {
    }
}

If we look through Reflector, we can see that there is a cached delegate in our type:
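The compiler-generated field looks roughly like the following (the exact name varies between compiler versions and is not valid C# source; it is shown here the way Reflector displays it, for illustration only):

```csharp
// Inside the compiled MyClass<T> – the compiler caches the
// lambda from Concat in a static field of the generic type:
[CompilerGenerated]
private static Func<int, string> CS$<>9__CachedAnonymousMethodDelegate1;
```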

Since this is a static member, it makes the type susceptible to the bug and we now need to manually create the cached member ourselves:

[Serializable]
public class MyClass<T> : ISerializable
{
    private static class KB957543
    {
        public static readonly Func<int, string> ToString = i => i.ToString();
    }

    public MyClass()
    {
    }

    public static string Concat(IEnumerable<int> numbers)
    {
        return string.Join(", ", numbers.Select(KB957543.ToString).ToArray());
    }

    protected MyClass(SerializationInfo info, StreamingContext context)
    {
    }

    void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
    {
    }
}

Let SQL Server Tell You Which Indexes to Rebuild

When index fragmentation becomes too high, indexes become very inefficient. Other than planning a good index design, you should rebuild or reorganize your indexes every once in a while.

SELECT 'ALTER INDEX [' + ix.name + '] ON [' + s.name + '].[' + t.name + '] ' +
       CASE WHEN ps.avg_fragmentation_in_percent > 40 THEN 'REBUILD' ELSE 'REORGANIZE' END +
       CASE WHEN pc.partition_count > 1 THEN ' PARTITION = ' + CAST(ps.partition_number AS nvarchar(max)) ELSE '' END
FROM   sys.indexes AS ix
       INNER JOIN sys.tables t
           ON t.object_id = ix.object_id
       INNER JOIN sys.schemas s
           ON t.schema_id = s.schema_id
       INNER JOIN (SELECT object_id, index_id, avg_fragmentation_in_percent, partition_number
                   FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL)) ps
           ON t.object_id = ps.object_id AND ix.index_id = ps.index_id
       INNER JOIN (SELECT object_id, index_id, COUNT(DISTINCT partition_number) AS partition_count
                   FROM sys.partitions
                   GROUP BY object_id, index_id) pc
           ON t.object_id = pc.object_id AND ix.index_id = pc.index_id
WHERE  ps.avg_fragmentation_in_percent > 10 AND
       ix.name IS NOT NULL

The above query will give you a list of recommended index rebuild / reorganize statements for your database, according to Pinal Dave’s 10-40 rule, although you are welcome to tweak it to your liking. It supports both non-partitioned and partitioned indexes. If you want a more intensive check for fragmentation, change the last NULL in the dm_db_index_physical_stats call to ‘SAMPLED’ or even ‘DETAILED’ (include the quotes).
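For a fragmented index, each row of output is a ready-to-run statement along these lines (the object names here are illustrative, not from a real database):

```sql
ALTER INDEX [IX_Orders_CustomerId] ON [dbo].[Orders] REBUILD
ALTER INDEX [IX_Orders_OrderDate] ON [dbo].[Orders] REORGANIZE PARTITION = 3
```

You can then review the list and execute only the statements you agree with.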

It’s a handy little tool for database administrators and saves a lot of the hassle of monitoring index fragmentation.

Update: Added multi-schema support as suggested by MJ12 and another check for null index names.

SQL Server Management Studio 2008 IntelliSense Doesn’t Recognize Special Characters

I just filed a new bug with Microsoft Connect. I certainly hope this one doesn’t get shrugged off like many of my other bugs did.

Want to reproduce it yourself? Just create a table whose name contains a character that is only valid inside the context of brackets (like a comma or braces) and then try to select from it. Don’t keep the CREATE clause in the same query window or it might just work. See the screenshot below.
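A minimal repro might look like this (the table name is illustrative; run the CREATE first, then the SELECT in a fresh query window):

```sql
CREATE TABLE [My,Table] (Id int);

-- In a new query window, IntelliSense fails to recognize the
-- bracketed name and autocompletes something else entirely:
SELECT * FROM [My,Table];
```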

This is very frustrating because this means that not only is there no IntelliSense for these objects, but IntelliSense now gets in the way of actual querying as it will autocomplete incorrect names.

Adobe AIR and Hebrew Fonts

(I’ve noticed quite a few people were having this problem, so I decided to blog about it. This might apply to all non-English fonts, but I experienced it only with Hebrew.)

The only reason I use Adobe AIR is for the Twitter application Twhirl. However, I noticed the Hebrew text was incorrectly displayed (word order was reversed).

I did not find a solution for this online and the guys at Twhirl didn’t know what to make of this.

After toying with a few of the options, I finally found the answer – the font Twhirl used does include Hebrew, but AIR wasn’t playing nicely with it. I switched from Calibri to Tahoma and found that the text displayed just fine.

Both @effifuks and @JonathanRauch (with TweetDeck, where he didn’t see Hebrew text at all – see the left screenshot) experienced the same issue.

MSBuild Script to Compress All JavaScript Files in a Project

I’ve got one project in my solution which has a lot of JavaScript files, and they keep on coming. We’ve been using the YUI Compressor for quite a while and it’s proven an effective tool. After a lot of fiddling with the project’s MSBuild script, I came up with the following:

<Target Name="BeforeBuild">
  <MakeDir
    Directories="compressed\%(Content.RelativeDir)"
    Condition="(%(Content.Extension) == '.js') And (!(Exists('compressed\%(Content.RelativeDir)')))" />
  <Exec
    Command="java -jar yuicompressor-x.y.z.jar --type js -o compressed\%(Content.Identity) %(Content.Identity)"
    Condition="%(Content.Extension) == '.js'" />
  <CreateItem
    Include="compressed\%(Content.Identity)"
    Condition="%(Content.Extension) == '.js'" />
</Target>

The above takes all of the JavaScript files in your project, compresses them (in the same relative directories) into the compressed directory and then adds them to the project, in case it gets deployed anywhere.

Note that I’m using the Extension, Identity and RelativeDir well-known item metadata attributes in order to impose batching, since batching produces loops instead of the string concatenation that happens when you reference the items themselves.

Your Mouth Says Windows-1255, But Your Eyes Say ISO-8859-1

I recently wrote an engine that fetches XML files stored on our clients’ servers using HTTP requests. One of our clients decided to serve the XML file with one encoding while encoding the file itself with another. This posed a problem for XDocument.

The client decided to encode their XML using the Windows-1255 encoding (Hebrew), noting the encoding correctly in the XML’s declaration, but served the file stating the ISO-8859-1 (Latin) encoding. This meant that I couldn’t just use XDocument’s normal Load method to load directly from the stream because XDocument looks at the HTTP headers and takes the document’s encoding from them.

Here’s a snippet of the code I used to get over that:

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    // Use the response's charset, defaulting to ISO-8859-1.
    var encoding = Encoding.GetEncoding("ISO-8859-1");
    if (!string.IsNullOrEmpty(response.CharacterSet))
        encoding = Encoding.GetEncoding(response.CharacterSet);

    byte[] bytes = ReadStream(response.GetResponseStream());

    // Decode the XML with the response's charset.
    string xml = new string(encoding.GetChars(bytes));
    int endOfDeclaration = xml.IndexOf("?>");
    if (endOfDeclaration != -1)
    {
        // Try to find out the encoding from the declaration.
        string decl = xml.Substring(0, endOfDeclaration + 2) + "<duperoot />";
        XDocument declDoc = XDocument.Parse(decl);
        var docEncoding = Encoding.GetEncoding(declDoc.Declaration.Encoding);
        if (docEncoding == encoding)
            return xml;
        else
            return new string(docEncoding.GetChars(bytes));
    }
    else
    {
        // Not XML or something... Send up.
    }
}

What I did here was create a new document from the original XML’s declaration (the Latin characters that make up the declaration always occupy the same byte positions), append a dupe root and parse that to get the name of the encoding used by the document. I then use that encoding to decode the document correctly.

Note that I’m using ISO-8859-1 as the default response encoding, since that is what the HTTP specification mandates.
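The ReadStream helper isn’t shown in the snippet above; a minimal version might look like this:

```csharp
private static byte[] ReadStream(Stream stream)
{
    using (MemoryStream memory = new MemoryStream())
    {
        // Copy the response stream into memory in 4 KB chunks.
        byte[] buffer = new byte[4096];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            memory.Write(buffer, 0, read);
        return memory.ToArray();
    }
}
```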

HTML to JavaScript HTML DOM Converter

One of our products requires converting quite a large amount of HTML into equivalent JavaScript that creates said HTML. Looking around the Internet, I found no tool that would automate this process for me, so I went ahead and created this simple little application.

I’ve written it using HTML Agility Pack and Simple CSS Parser. It’s hardly perfect and might incorrectly reference attributes, but I’ve tweaked it long enough for it to work, I’m guessing, 95% of the time.

The source is not yet included (I want to set it up as a project on CodePlex later on), but for now you can download it from here.

The Death of System.DateTime?

It’s been a few months since I started getting acquainted with the System.DateTimeOffset type and I can honestly say I don’t see any reason to use System.DateTime anymore.

I’ve even gone as far as asking whether anyone knew when you would rather use DateTime over DateTimeOffset. The responses I got were along the lines of ‘backwards compatibility’ or ‘when you need an abstract time’. My recommendation is that if you haven’t yet looked at the type, go do it now, and after that, start using it.

So what is this DateTimeOffset? When representing a date/time, especially in an internationally-facing system, you have to include a time-zone. DateTime did a very poor job of handling time-zones – it is largely insensitive to them. DateTimeOffset is the exact same thing as DateTime, only it takes heed of time-zones. For instance, comparing two DateTimeOffsets with the same UTC time in different time-zones will result in equality.
Moreover, DateTime had only three modes – Local, UTC and Unspecified – whereas DateTimeOffset has an Offset property, allowing you to create dates in any time-zone you like.
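The equality behaviour described above can be seen in a few lines (the times are arbitrary; both values represent the same instant):

```csharp
DateTimeOffset utc = new DateTimeOffset(2009, 1, 1, 12, 0, 0, TimeSpan.Zero);
DateTimeOffset israel = new DateTimeOffset(2009, 1, 1, 14, 0, 0, TimeSpan.FromHours(2));

// Equality compares the UTC instant, not the local clock time:
Console.WriteLine(utc == israel); // True
```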

Things to note:

  1. DateTime can be implicitly converted to DateTimeOffset, but not vice-versa. To do that, you would have to use DateTimeOffset’s DateTime property.
    When converting this way, the DateTime’s kind will always be Unspecified.
    DateTimeOffset dateTimeOffset = DateTimeOffset.UtcNow;
    DateTime dateTime = dateTimeOffset.DateTime;
  2. When parsing a DateTimeOffset, note that you can specify AssumeUniversal and AssumeLocal using the DateTimeStyles enum. These come in handy when the string you’re parsing has no time-zone data.
    if (!DateTimeOffset.TryParse(myDateString, CultureInfo.InvariantCulture.DateTimeFormat, DateTimeStyles.AssumeUniversal, out dateTimeOffset))
    dateTimeOffset = default(DateTimeOffset);
  3. It is a best practice to store all of your dates as UTC in the database, regardless of the physical location of your users / servers. When doing this, be sure to manually change your DateTimeOffset objects to UTC using ToUniversalTime and only then use the DateTime property as I have previously noted.
    SaveTimeToDatabase(dateTimeOffset.ToUniversalTime().DateTime);

    Note that you do not need to convert DateTimeOffset objects to any time-zone to do calculations / comparisons. The only time you need to convert them to a time-zone is for displaying them to the user.

  4. If you want to store a user’s time-zone (on a database that doesn’t support date/times with time-zones), it would be best to have a translation table and use the TimeZoneInfo class’s Ids (for instance: TimeZoneInfo.Local.Id). Then you could use TimeZoneInfo.FindSystemTimeZoneById to translate that value to a TimeZoneInfo and use that object’s BaseUtcOffset property to get the difference from UTC.

    This may seem a bit cumbersome, but considering that time-zones change due to daylight saving time, you can’t just store the difference from UTC; you’re better off letting Windows take care of these issues for you.

    Here’s a sample of this method of conversion:

    string id = "Israel Standard Time";
    DateTimeOffset utcnow = DateTimeOffset.UtcNow;
    DateTimeOffset now = utcnow.ToOffset(TimeZoneInfo.FindSystemTimeZoneById(id).BaseUtcOffset);

Side note: When storing these in the database, it would be prudent to use SQL Server 2008’s datetimeoffset type, which is the equivalent of DateTimeOffset and takes care of the time-zones in the same manner.

My View of C# 4.0

I’ve known a bit about C# 4.0 for a while now and have had time to think about it. I’ve just re-read the New features in C# 4.0 paper published by Microsoft and would like to offer the following critique of the language’s new features:

Dynamic Lookup

This feature just makes me cringe, just like anonymous methods made me cringe when they were introduced in C# 2.0. To this day, I hardly use them, as they have always felt like a kludge to me (lambda expressions fixed that).
The dynamic keyword is as open to abuse as anything could be. It takes the principles of static typing and throws the baby out with the bathwater.

What is wrong with it

When looked at initially, the dynamic keyword is great, because it simplifies and speeds up what is usually done with Reflection and Primary Interop Assemblies, both in the aspect of development times and the aspect of run time. Unfortunately, too much of a good thing is bad for you. Imagine the following:

public dynamic GetCustomer()
{
    // mystery...
}

What do we have here, then? I don’t know, and neither does IntelliSense. I guess we’ll have to go with trial and error.
I admit this is quite the dramatization, but you get my point: it’s ripe for abusing an otherwise perfectly fine static syntax.

Moreover, the dynamic keyword’s syntax does what no other feature of C# has ever done – it breaks existing syntax. Should I define a type named dynamic in C# 3.0, the following piece of code will take on a whole different meaning in C# 4.0:

public dynamic GetCustomer()
{
    dynamic customer = GetCustomerCOMObject();
    return customer;
}

How it can be fixed

Using the dynamic keyword is actually a built-in form of Duck Typing. The idea is good and should be introduced into the language, but I’d like to suggest a different way of doing it:

public ICustomer GetCustomer()
{
    dynamic ICustomer customer = GetCustomerCOMObject();
    return customer;
}

Here, what I get back is a dynamic dispatch object that must adhere to a specific interface. This means that the object graph is checked for conformity against ICustomer the moment it is cast in the dynamic scope (i.e. returned from GetCustomerCOMObject), and from then on it is a typed object with dynamic dispatch under the hood. We couldn’t care less whether the object uses dynamic dispatch, since we now treat it as a POCO.
This, along with removing the ability to pass dynamic dispatch objects through the call-stack (as parameters and return types), bringing them down to the level of anonymous types, would help stop the deterioration of C# into a dynamic language.

Named and Optional Arguments

This is just silly. Really, it looks like some people cried “we don’t like overloads” hard enough and got some VB into the C# that the rest of us liked the way it was. If you want to initialize your method with only some of its parameters, use a builder pattern with an object initializer instead.

Here, I’ll take the sample at the bottom of page 6 and fix it, C# 3.0 style:

public void M(int x, MBuilder builder)
{
    // ... the actual implementation, reading builder.Y and builder.Z ...
}

public void M(int x)
{
    this.M(x, new MBuilder());
}

public class MBuilder
{
    public MBuilder() { this.Y = 5; this.Z = 7; }
    public int Y { get; set; }
    public int Z { get; set; }
}

M(1, new MBuilder { Y = 2, Z = 3 }); // ordinary call of M
M(1, new MBuilder { Y = 2 });        // omitting z – equivalent to M(1, 2, 7)
M(1);                                // omitting both y and z – equivalent to M(1, 5, 7)

Yes, I do realize it’s mainly there for COM interop, but most people will just get confused by all the syntax, abuse it or simply forget it ever existed.

What is wrong with it

It exists.

How it can be fixed

Remove it from C#. There – fixed.
If you want optional parameters in your COM interop calls, just implement the correct overloads in the interface you create for use with the dynamic keyword (see my suggestion for dynamic lookups) and the binding will be done at run time by the parameter names.

Variance, Covariance and Contravariance

These three features are long overdue and finally make an appearance in the language. They’re great and I would love to integrate them into my code as soon as I possibly can.
I would also love to know whether there are plans to include not only reference conversions, but also the implicit and explicit conversion operators as qualifiers for variance.
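To illustrate the feature itself: with the out modifier, C# 4.0 allows covariant reference conversions that C# 3.0 rejects (a minimal sketch; IEnumerable<T> is declared as IEnumerable<out T> in .NET 4.0):

```csharp
IEnumerable<string> strings = new List<string> { "a", "b" };

// Legal in C# 4.0 thanks to covariance; a compile error in C# 3.0:
IEnumerable<object> objects = strings;
```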

What is wrong with it

Although variance is implicit, covariance and contravariance are explicit. Using the Type<in T> / Type<out T> notation is good when you want to be explicit (for instance, when you expect your interface to be expanded in the future), but it doesn’t have to be mandatory and can become a bit annoying over time.

How it can be fixed

The compiler can very easily infer that your interface is input-only or output-only and mark it as such for you. Language-wise, the explicit version should be kept available for when you want to prevent someone (or yourself) from mistakenly adding a new method that breaks your input-only / output-only design.

Summary

It looks to me like the team behind C# is going in the wrong direction (DLR) instead of the right direction (Spec#), slowly turning C# into a dynamic language. It looks like all of this is done for the sake of easy interop with dynamic languages and COM objects. It looks as though the designers have succumbed to peer pressure. There are so many features missing from C# and the above are nowhere near the top of my list.

I can only hope someone is listening.