The Syntax Highlighter - Part 1

In my last post (The Cookie Problem), I started using code blocks and decided that I wanted a syntax highlighter on my blog. I googled a few JavaScript plug-ins and found no shortage of options out there, but I couldn't (quickly) find something that would highlight both JavaScript and ColdFusion.

I have had a number of projects where I tried to create a syntax highlighter, but I could never figure out how to successfully apply the correct styles to several similar definitions. Thankfully, nowadays we have many more open source projects to get ideas from and to point one in the right direction. And seeing as how this is a blog about my learning to code things (often from scratch), I have decided to build my own syntax highlighter.

So where do I start?

The easiest thing is to start with a code sample to apply my highlighting to.

// test syntax highlighting...
private void function moo() {} // finish this later

var x = y;  
if ((x == y) && (y != cow)) {  
    y++;
}
// that was fun

The next step is to isolate the terms I want to highlight. The easiest way I know how to do that is using regular expressions. Thankfully we have an awesome tool called RegExr to help with that part. After some tinkering, I have a few regular expressions for my definitions, which for convenience I am storing in a JSON-style array of objects.

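// Each definition pairs a class name with a regular expression source string.
var _patterns = [{
    // keywords, each followed by one whitespace character
    class: 'const',
    pattern: "(?:(var|new|function|private|if|else)\\s)"
}, {
    // runs of operator/bracket characters, or a "/" (plus the character before it) that does not start a comment
    class: 'operator',
    pattern: "((?:[\\+\\-\\=\\!\\|\\(\\)\\{\\}]|\&){1,}|(?:[^\\*\\/]\\/(?!\\/|\\*)))"
}, {
    // line comments up to the end of the line, plus block comment delimiters
    // (the character class now excludes only line breaks, so "(", ")" and "|" no longer cut a comment short)
    class: 'comment',
    pattern: "\\/\\/[^\\n\\r]+|\\/\\*|\\*\\/"
}];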
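
Before going further, it can help to check what a single definition actually matches. The sketch below runs the keyword pattern against part of the sample code. `findMatches` is a hypothetical helper name made up for illustration, not part of any library.

```javascript
// Hypothetical helper (not from any library): list everything one pattern matches.
function findMatches(pattern, code) {
    // The "g" flag finds every match instead of stopping at the first one.
    return code.match(new RegExp(pattern, "g")) || [];
}

// Part of the sample code from earlier in the post.
var sample = "var x = y;\nif ((x == y) && (y != cow)) {\n    y++;\n}\n// that was fun";

// The keyword pattern, copied from the definitions.
var keywordPattern = "(?:(var|new|function|private|if|else)\\s)";

var keywordMatches = findMatches(keywordPattern, sample);
console.log(keywordMatches); // [ 'var ', 'if ' ]
```

Note that each match includes the trailing whitespace, because the pattern consumes the `\s` after the keyword.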

Now how do I apply these to my code?

My first inclination is to apply each definition to my code in turn. However, as soon as I apply my operator definition, all hell breaks loose: my previously highlighted code is now being mangled by the operator highlighting, and the definitions that follow are unable to highlight successfully.
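
Here is a small sketch of that failure. It assumes each match gets wrapped in a `<span class="...">` tag, which is my guess at the markup and may not be exactly what the highlighter ends up producing. `highlightPass` is a hypothetical helper name.

```javascript
// Assumption: highlighting means wrapping each match in <span class="...">.
// highlightPass is a hypothetical helper that applies ONE definition to the code.
function highlightPass(code, definition) {
    var regex = new RegExp(definition.pattern, "g");
    return code.replace(regex, function (match) {
        return '<span class="' + definition.class + '">' + match + '</span>';
    });
}

// The keyword and operator definitions, copied from the post.
var keywords = {
    class: "const",
    pattern: "(?:(var|new|function|private|if|else)\\s)"
};
var operators = {
    class: "operator",
    pattern: "((?:[\\+\\-\\=\\!\\|\\(\\)\\{\\}]|&){1,}|(?:[^\\*\\/]\\/(?!\\/|\\*)))"
};

// First pass: keywords only. The markup is still intact here.
var afterKeywords = highlightPass("var x = y;", keywords);
console.log(afterKeywords); // <span class="const">var </span>x = y;

// Second pass: the operator pattern now also sees the "=" inside class="const"
// and the "</" of the closing tag, so it wraps pieces of the earlier markup.
var afterOperators = highlightPass(afterKeywords, operators);
console.log(afterOperators);
```

The second pass cannot tell the original code apart from the markup added by the first pass, so the earlier spans get broken.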

So that's not going to work at all.

It's now time to scour the internet for some inspiration. Blogs, GitHub, and SourceForge eventually point me to an interesting idea. Instead of capturing and processing each definition one at a time, let us capture all our definitions at once, and iterate over those captured groups after the fact to apply our definitions. To do that, we join all our regular expressions together into one big group.

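// Join every pattern with "|" and wrap the result in one outer capture group.
var _computedRegExString = "";
for (var i = 0; i < _patterns.length; i++) {
    _computedRegExString += _patterns[i].pattern + "|";
}
// Drop the trailing "|" before closing the group.
_computedRegExString = "(" + _computedRegExString.replace(/\|$/, "") + ")";
var _computedRegEx = new RegExp(_computedRegExString, "gi");

// _codeInput is assumed to hold the source code to highlight.
// "$1" puts each match back unchanged for now.
var _codeOutput = _codeInput.replace(_computedRegEx, "$1");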
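
To get a feel for what the combined expression captures, here is a rough sketch that collects every match in a single pass and then works out which definition each one belongs to. It uses simplified versions of only two definitions. `definitions`, `classify`, and `found` are hypothetical names, and this is just one possible way to do the "iterate after the fact" step.

```javascript
// Simplified versions of two definitions; the names here are hypothetical.
var definitions = [
    { class: "const", pattern: "(?:(var|new|function|private|if|else)\\s)" },
    { class: "comment", pattern: "\\/\\/[^\\n\\r]+" }
];

// Join every pattern into one big alternation wrapped in a single group.
var combined = new RegExp("(" + definitions.map(function (d) {
    return d.pattern;
}).join("|") + ")", "g");

// Hypothetical helper: return the class of the first definition
// whose pattern matches the whole piece of text, or null if none does.
function classify(text) {
    for (var i = 0; i < definitions.length; i++) {
        if (new RegExp("^(?:" + definitions[i].pattern + ")$").test(text)) {
            return definitions[i].class;
        }
    }
    return null;
}

// One pass over the code: record each match, but leave the code unchanged.
var found = [];
"var x = y; // that was fun".replace(combined, function (match) {
    found.push({ text: match, class: classify(match) });
    return match;
});

console.log(found);
// [ { text: 'var ', class: 'const' },
//   { text: '// that was fun', class: 'comment' } ]
```

Because the original code is only scanned once, no definition ever sees markup added on behalf of another definition.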

So where do we go from here? That'll be part two of this article.