Yesterday, in front of the staff and students at my college, I presented my final project for my B.Sc. in Computer Science. Once I complete it and it is reviewed this September, I will have fulfilled the requirements for the degree.
The project is research I’m doing for nuconomy, and I’ll release the code once it’s complete. It uses the .NET Framework 3.5 (with C# 3.0) and SQL Server 2005 Integration Services’ NLP engine.
The following is the abstract and you can also download the short presentation.
The advent of Web 2.0, with its introduction of user-generated data, has posed several problems for developers aiming to make navigating such data as simple as possible.
The problem was commonly met by coupling meta-data (tags) to the user-generated content itself. This posed another problem: the small number of website operators could not cope with the vast amounts of data that needed tagging. Thus the concept of the folksonomy, or social tagging, was introduced. It took advantage of the content’s users, asking them, in an engaging way, to explain what the content was about.
Unfortunately, creating a working folksonomy requires a large and cooperative user base, something that can’t be relied upon.
Automation can be introduced into such communities to relieve most of the pressure classic folksonomies place on the user base. By automatically analyzing the user-generated content and meta-content and applying a base set of tags to it, such automation spares users the need to come up with those tags in the first place, leaving only the easier process of correction.
Mechsonomy consists of the following building blocks:
- Plain-Text Tagging – user-generated content is taken as-is and processed by a Term Extraction engine to retrieve ‘relevant’ tags.
- Markup Analysis – the placement of the retrieved terms in the marked-up source is examined, altering the terms’ significance.
- Web Analysis – the relationship between units of content is examined, altering terms’ significance.
- Machine Learning – users interact and rank tags’ relevance to the content, allowing Mechsonomy to learn the impact the site’s markup has on the content’s relevance.
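
To make the first two building blocks more concrete, here is a minimal sketch in Python of how plain-text term counting and markup analysis might be combined. It is not the project's code (which is written in C# and uses the SQL Server term extractor). The names `TAG_WEIGHTS`, `STOPWORDS`, `WeightedTermCounter` and `suggest_tags` are all hypothetical, the weights are made-up values, and simple word counting stands in for a real Term Extraction engine.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: how much a term counts when it appears inside a tag.
# In the scheme described above, these are the values user feedback would tune.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "strong": 2.0, "b": 2.0}
DEFAULT_WEIGHT = 1.0
# A tiny illustrative stopword list; a real term extractor does far more.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


class WeightedTermCounter(HTMLParser):
    """Accumulates a score per term, weighted by the heaviest enclosing tag."""

    def __init__(self, tag_weights):
        super().__init__()
        self.tag_weights = tag_weights
        self.open_tags = []      # stack of currently open tags
        self.scores = Counter()  # term -> accumulated weighted score

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Text outside any weighted tag gets the default weight.
        weight = max(
            (self.tag_weights.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for word in re.findall(r"[a-z]+", data.lower()):
            if word not in STOPWORDS and len(word) > 2:
                self.scores[word] += weight


def suggest_tags(html, top_n=3, tag_weights=TAG_WEIGHTS):
    """Return the top_n highest-scoring terms as a base set of tags."""
    parser = WeightedTermCounter(tag_weights)
    parser.feed(html)
    parser.close()
    return [term for term, _ in parser.scores.most_common(top_n)]


page = (
    "<html><head><title>Python parsing</title></head>"
    "<body><p>We like cooking and cooking and parsing.</p></body></html>"
)
print(suggest_tags(page, top_n=2))
```

In this example "cooking" appears most often in the body, but "python" and "parsing" outrank it because they appear in the title. The Machine Learning block would then correspond to adjusting values like those in `TAG_WEIGHTS` based on how users rank the suggested tags.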