Splitting Hairs

I love reading Phil Haack’s weblog, because he writes about common problems and is also generally a nice guy. :)
He usually provokes me into thinking, so I end up responding to his posts quite a bit. In one of his latest posts, Splitting Pascal/Camel Cased Strings, he shows a technique for splitting Pascal-cased strings.

Here’s my take on this:

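using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

namespace Example
{
    class Program
    {
        public static class UnicodeUtils
        {
            // Built once, before the table is first read. The runtime runs a static
            // field initializer exactly once, so no locking or reset-on-failure logic is needed.
            private static readonly Dictionary<UnicodeCategory, char[]> unicodeCategories = BuildCategoryTable();

            /// <summary>
            /// Gets an array of all characters in the current default encoding's range that are of a specific Unicode category.
            /// </summary>
            /// <param name="category">The category to filter by.</param>
            /// <returns>An array of characters that belong to the category.</returns>
            public static char[] GetCategoryChars(UnicodeCategory category)
            {
                return unicodeCategories[category];
            }

            private static Dictionary<UnicodeCategory, char[]> BuildCategoryTable()
            {
                // Inclusive upper bound (the original loop stopped one short). The values are
                // UTF-16 code units, so for a single-byte default encoding this covers
                // U+0000..U+00FF, not that code page's own characters.
                int max = Encoding.Default.IsSingleByte ? (int)byte.MaxValue : (int)char.MaxValue;

                Dictionary<UnicodeCategory, List<char>> buckets = new Dictionary<UnicodeCategory, List<char>>();
                foreach (UnicodeCategory uc in Enum.GetValues(typeof(UnicodeCategory)))
                    buckets[uc] = new List<char>();

                // One pass over the range, filing each character under its category,
                // instead of rescanning the whole range once per category.
                // An int counter is used because a char counter would wrap to 0 at char.MaxValue.
                for (int i = 0; i <= max; i++)
                {
                    char c = (char)i;
                    buckets[char.GetUnicodeCategory(c)].Add(c);
                }

                Dictionary<UnicodeCategory, char[]> table = new Dictionary<UnicodeCategory, char[]>(buckets.Count);
                foreach (KeyValuePair<UnicodeCategory, List<char>> pair in buckets)
                    table.Add(pair.Key, pair.Value.ToArray());

                return table;
            }
        }

        /// <summary>
        /// Gets a sentence from a Pascal/camel cased name.
        /// </summary>
        /// <param name="name">The name to convert.</param>
        /// <returns>The name with a space inserted before each uppercase letter after the first character.</returns>
        private static string GetSentence(string name)
        {
            // IndexOfAny throws for an empty string, because startIndex 1 would exceed its length.
            if (string.IsNullOrEmpty(name))
                return name;

            char[] uppercase = UnicodeUtils.GetCategoryChars(UnicodeCategory.UppercaseLetter);
            StringBuilder builder = new StringBuilder(name);
            int index = 1, charsFound = 0;

            // Search the original string from index 1, so a leading capital gets no space.
            // Every space already inserted shifts later positions in the builder by one,
            // hence the insert position index + charsFound.
            while ((index = name.IndexOfAny(uppercase, index)) != -1)
                builder.Insert(index++ + charsFound++, ' ');

            return builder.ToString();
        }

        static void Main()
        {
            // Prints "Splitting Pascal Cased Strings".
            Console.WriteLine(GetSentence("SplittingPascalCasedStrings"));
        }
    }
}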
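The same single-pass insertion can probably be written without a precomputed category table, by asking char.IsUpper about each character as it is copied. This is a minimal sketch under assumptions: the names PascalCaseSplitter and ToSentence are hypothetical, and char.IsUpper is assumed to agree with UnicodeCategory.UppercaseLetter for ordinary identifiers. Unlike GetSentence above, it does not limit itself to the default encoding's range.

```csharp
using System.Text;

public static class PascalCaseSplitter
{
    // Hypothetical helper, not the post's code: intended to match GetSentence's
    // output for ordinary identifiers, but tests each character with char.IsUpper
    // instead of searching for members of a precomputed table.
    public static string ToSentence(string name)
    {
        if (string.IsNullOrEmpty(name))
            return name;

        // Worst case is one extra space per character after the first.
        StringBuilder builder = new StringBuilder(name.Length * 2);
        builder.Append(name[0]);

        // Start at 1 so a leading capital gets no space in front of it.
        for (int i = 1; i < name.Length; i++)
        {
            if (char.IsUpper(name[i]))
                builder.Append(' ');
            builder.Append(name[i]);
        }

        return builder.ToString();
    }
}
```

With this sketch, PascalCaseSplitter.ToSentence("SplittingPascalCasedStrings") should return "Splitting Pascal Cased Strings". Like GetSentence, it splits runs of capitals letter by letter, so "XMLParser" becomes "X M L Parser".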


4 thoughts on “Splitting Hairs”

  1. Hey Omer,

    Just so I understand, is there a problem with the Char.IsUpper method? According to the docs, it returns whether or not a Unicode character is uppercase. Does it not handle that correctly?

    I’m trying to understand how your solution differs from mine.

    Thanks,
    Phil

  2. Nah, it’s the same. I only wrote something different for two reasons:
    1. I use StringBuilder.
    2. It doesn’t split into words and then recombine.

    I originally forgot that string.Split removes the character it splits on, and thought using it would take far less code. :P

    Anyway, no one-upmanship here. Just got a bit bored. :)

  3. Yeah, the reason I have the array is I actually wrote that method first, because I needed it. Then for the purpose of that article, I wrote the Join version, which was pure laziness. ;)

    Jon Galloway demonstrates the regex version in my comments; it is probably faster.
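The last comment mentions a regex version by Jon Galloway. His actual pattern is not reproduced here; the following is only a minimal sketch of what such a version might look like, assuming the .NET Regex class and the \p{Lu} (uppercase letter) Unicode category. The names RegexPascalCaseSplitter and ToSentence are hypothetical, and no claim is made about its speed relative to the StringBuilder versions.

```csharp
using System.Text.RegularExpressions;

public static class RegexPascalCaseSplitter
{
    // Hypothetical pattern, not Jon Galloway's: \p{Lu} matches any Unicode
    // uppercase letter, and (?<!^) rejects a match at position 0, so a
    // leading capital gets no space in front of it.
    private static readonly Regex UpperAfterStart =
        new Regex(@"(?<!^)(\p{Lu})", RegexOptions.Compiled);

    public static string ToSentence(string name)
    {
        // Regex.Replace throws on null, so pass null/empty straight through.
        if (string.IsNullOrEmpty(name))
            return name;

        // " $1" re-emits the matched capital with a space before it.
        return UpperAfterStart.Replace(name, " $1");
    }
}
```

The output should match the loop-based versions, including splitting "XMLParser" into "X M L Parser"; the regex simply moves the "uppercase after the first character" test into the pattern.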
