Splitting Hairs

I love reading Phil Haack’s weblog, because he writes about common problems and is also generally a nice guy. :)
His posts usually get me thinking, so I end up responding to them quite a bit. In one of his latest posts, Splitting Pascal/Camel Cased Strings, he shows a technique for splitting Pascal-cased strings into separate words.

Here’s my take on this:

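using System;
using System.Text;
using System.Globalization;
using System.Collections;

namespace Example
{
    class Program
    {
        public static class UnicodeUtils
        {
            // Lazily built cache: UnicodeCategory -> char[] of all characters in that category.
            private static Hashtable unicodeCategories;

            /// <summary>
            /// Gets an array of all characters in current default encoding that are of a specific unicode category.
            /// </summary>
            /// <param name="category">The category to filter according to.</param>
            /// <returns>An array of characters that represents the category.</returns>
            public static char[] GetCategoryChars(UnicodeCategory category)
            {
                if (unicodeCategories == null)
                {
                    try
                    {
                        unicodeCategories = new Hashtable(Enum.GetValues(typeof(UnicodeCategory)).Length);

                        foreach (UnicodeCategory uc in Enum.GetValues(typeof(UnicodeCategory)))
                        {
                            ArrayList list = new ArrayList();

                            // Scan one byte's worth of characters for single-byte encodings, otherwise the whole char range.
                            for (char c = (char)0; (int)c < (Encoding.Default.IsSingleByte ? byte.MaxValue : ushort.MaxValue); c = ((char)((int)c + 1)))
                            {
                                if (char.GetUnicodeCategory(c) == uc)
                                {
                                    list.Add(c);
                                }
                            }

                            unicodeCategories.Add(uc, ((char[])(list.ToArray(typeof(char)))));
                        }
                    }
                    catch
                    {
                        // Don't keep a half-built cache around; reset it and let the caller see the error.
                        unicodeCategories = null;
                        throw;
                    }
                }

                return ((char[])(unicodeCategories[category]));
            }
        }

        /// <summary>
        /// Gets a sentence from a pascal/camel cased name.
        /// </summary>
        /// <param name="name">The name to convert.</param>
        /// <returns>The sentence that was once the name.</returns>
        private static string GetSentence(string name)
        {
            // IndexOfAny throws if the start index is past the end of the string, so bail out early.
            if (string.IsNullOrEmpty(name))
            {
                return name;
            }

            int index = 1, charsFound = 0;
            char[] upperCaseChars = UnicodeUtils.GetCategoryChars(UnicodeCategory.UppercaseLetter);
            StringBuilder builder = new StringBuilder(name);

            // Start at 1 so a leading capital doesn't get a space in front of it.
            while ((index = name.IndexOfAny(upperCaseChars, index)) != -1)
            {
                // charsFound offsets the insert position by the spaces already added to the builder.
                builder.Insert(index++ + charsFound++, ' ');
            }

            return builder.ToString();
        }

        // Small demo entry point so the sample compiles as a console program.
        static void Main()
        {
            Console.WriteLine(GetSentence("SplittingHairs"));
        }
    }
}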
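
For comparison, here's a minimal sketch of the same idea built on char.IsUpper instead of a cached table of category characters. The class name SentenceSketch is made up for this example, and it assumes that treating every uppercase letter after the first character as a word boundary is good enough.

```csharp
using System;
using System.Text;

// Hypothetical helper, named only for this sketch.
public static class SentenceSketch
{
    // Puts a space in front of every uppercase letter except the first character.
    public static string GetSentence(string name)
    {
        if (string.IsNullOrEmpty(name))
        {
            return name;
        }

        // Worst case is one extra space per character.
        StringBuilder builder = new StringBuilder(name.Length * 2);
        builder.Append(name[0]);

        for (int i = 1; i < name.Length; i++)
        {
            if (char.IsUpper(name[i]))
            {
                builder.Append(' ');
            }

            builder.Append(name[i]);
        }

        return builder.ToString();
    }
}
```

Calling SentenceSketch.GetSentence("SplittingHairs") gives "Splitting Hairs". A run of capitals gets split letter by letter, so "XMLParser" comes out as "X M L Parser", which is the same thing the IndexOfAny loop does.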


4 thoughts on “Splitting Hairs”

  1. Hey Omer,

    Just so I understand, is there a problem with the Char.IsUpper method? According to the docs, it returns whether or not a unicode character is uppercase. Does it not handle that correctly?

    I’m trying to understand how your solution differs from mine.


  2. Nah, it’s the same. The only reason I created something else is because of two things:
    1. I use StringBuilder.
    2. It doesn’t split into words and then recombine.

    I originally forgot that string.Split removes the character it splits on, and thought using it would take way less code. :P

    Anyway, no oneupmanship here. Just got a bit bored. :)

  3. Yeah, the reason I have the array is I actually wrote that method first, because I needed it. Then for the purpose of that article, I wrote the Join version, which was pure laziness. ;)

    Jon Galloway demonstrates the regex version in my comments, which is probably faster.
