Hacking Konqueror for Interface and jQuery 1.1
Update: I am proud to announce that my suggestion on the regexp code has been applied to the KDE trunk.
I use Konqueror on a daily basis, and I am a little frustrated with new web applications.
Many of these applications simply don't work at all with Konqueror. Some services, like Gmail, even refuse to serve JavaScript until I tell Konqueror to identify itself as Firefox. Once I do, Gmail works almost entirely well (except for the chat).
As a web developer, I am looking for a JavaScript toolkit that is entirely cross-browser compatible without being completely bloated with browser-specific hacks.
I think that jQuery is a good candidate because it seems simple and easy to extend. Furthermore, there is Interface (a collection of rich interface components), which works quite well with Konqueror, and so with any browser.
So when version 1.1 of jQuery and Interface was released, I was disappointed to find that Konqueror had become slow as hell with these scripts.
Profile Konqueror
What happened between these two releases that could degrade Konqueror's performance so badly? I launched my favorite Linux profiler to find out more.
Sysprof is clear about the performance bottleneck. More than 80% of the time is wasted in a function called KJS::RegExp::match, which calls into PCRE (Perl Compatible Regular Expressions), where the time goes into validating UTF-8 strings. I searched the jQuery code to find where regular expressions are used, but I learned that jQuery doesn't use the "match" function but "test" instead.
So if it's not in jQuery, where is it? I thought it was certainly in the Dean Edwards Packer used to compress the JavaScript files, which has a "Fast Decode" option. Maybe the jQuery guys forgot to check this option?
I searched a little more on the PCRE options and found this in the PCRE man pages:
The following comments apply when PCRE is running in UTF-8 mode: 1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects are checked for validity on entry to the relevant functions. If an invalid UTF-8 string is passed, an error return is given. In some situations, you may already know that your strings are valid, and therefore want to skip these checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag at compile time or at run time, PCRE assumes that the pattern or subject it is given (respectively) contains only valid UTF-8 codes. In this case, it does not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program may crash.
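To get a feel for why this check can dominate a profile, here is a minimal C++ sketch. It is not PCRE's code: isValidUtf8 is a simplified validator (it does not reject overlong encodings or surrogates), and validationCost is a hypothetical stand-in for a loop that enters the matcher many times on the same subject. It assumes the check walks the whole subject on every entry, so the total work grows with the number of calls multiplied by the subject length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified UTF-8 validator (illustration only, not PCRE's implementation).
// Adds the number of bytes it accepted to bytesScanned.
bool isValidUtf8(const std::string& s, std::size_t& bytesScanned) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) extra = 0;
        else if ((lead & 0xE0) == 0xC0) extra = 1;
        else if ((lead & 0xF0) == 0xE0) extra = 2;
        else if ((lead & 0xF8) == 0xF0) extra = 3;
        else return false;                      // invalid lead byte
        if (extra > n - i - 1) return false;    // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
        }
        i += extra + 1;
        bytesScanned += extra + 1;
    }
    return true;
}

// Hypothetical stand-in for `calls` successive entries into a matcher on the
// same subject. Returns the total number of bytes spent on validation.
std::size_t validationCost(const std::string& subject, std::size_t calls,
                           bool skipCheck) {
    std::size_t scanned = 0;
    for (std::size_t c = 0; c < calls; ++c) {
        if (!skipCheck) {
            const bool ok = isValidUtf8(subject, scanned);
            assert(ok);  // the sketch assumes a well-formed subject
            (void)ok;
        }
        // The actual matching work would happen here.
    }
    return scanned;
}
```

With a 1000-byte subject and 50 calls, validationCost reports 50000 bytes scanned when the check is on and 0 when it is skipped, which is the kind of saving the PCRE_NO_UTF8_CHECK flag is meant to offer when the input is already known to be valid.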
Quick hack on KJS
There is a way to disable this UTF-8 validity check, so I decided to try hacking Konqueror's JavaScript implementation. I downloaded the kdelibs sources and found that the code of the KJS::RegExp::match function is located in kdelibs/kjs/regexp.cpp. The prototype of the function is the following:
UString RegExp::match(const UString &s, int i, int *pos, int **ovector)
There are two uses of the pcre_exec function in RegExp::match. I simply added PCRE_NO_UTF8_CHECK to the sixth parameter to make sure that the validation is not done. Basically, it looks like this:
if (pcre_exec(pcregex, NULL, buffer, bufferSize, startPos,
              m_notEmpty ? (PCRE_NOTEMPTY | PCRE_ANCHORED | PCRE_NO_UTF8_CHECK) : 0 | PCRE_NO_UTF8_CHECK, // see man pcretest
              ovector ? *ovector : 0L, ovecsize) == PCRE_ERROR_NOMATCH)
{
    // Failed to match.
    if ((flgs & Global) && m_notEmpty && ovector)
    {
        // We set m_notEmpty ourselves, to look for a non-empty match
        // (see man pcretest or pcretest.c for details).
        // So we don't stop here, we want to try again at i+1.
#ifdef KJS_VERBOSE
        fprintf(stderr, "No match after m_notEmpty. +1 and keep going.\n");
#endif
        m_notEmpty = 0;
        if (pcre_exec(pcregex, NULL, buffer, bufferSize, nextPos, 0 | PCRE_NO_UTF8_CHECK,
                      ovector ? *ovector : 0L, ovecsize) == PCRE_ERROR_NOMATCH)
            return UString::null;
    }
    else // done
        return UString::null;
}
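A note on the option expression in the first call: in C++, `|` binds tighter than `?:`, so the trailing `0 | PCRE_NO_UTF8_CHECK` belongs to the "else" branch only. The flag still ends up set in both cases, because it is also listed inside the parentheses. The small sketch below checks this with stand-in flag values (kNotEmpty, kAnchored and kNoUtf8Check are hypothetical names and values chosen for illustration; the real constants come from pcre.h) and shows an arguably easier-to-read equivalent.

```cpp
#include <cassert>

// Stand-in flag values for illustration only (not taken from pcre.h).
constexpr int kNotEmpty    = 0x1;
constexpr int kAnchored    = 0x2;
constexpr int kNoUtf8Check = 0x4;

// Same shape as the sixth argument of the patched call. Because `|` has
// higher precedence than `?:`, this parses as
//   notEmpty ? (A | B | C) : (0 | C)
constexpr int optionsAsWritten(bool notEmpty) {
    return notEmpty ? (kNotEmpty | kAnchored | kNoUtf8Check) : 0 | kNoUtf8Check;
}

// Equivalent form that states the intent directly:
// pick the base flags first, then always add the "skip validation" flag.
constexpr int optionsExplicit(bool notEmpty) {
    return (notEmpty ? (kNotEmpty | kAnchored) : 0) | kNoUtf8Check;
}

// Runtime check that both spellings yield the same option bits.
void verifyEquivalence() {
    assert(optionsAsWritten(true) == optionsExplicit(true));
    assert(optionsAsWritten(false) == optionsExplicit(false));
}
```

Both forms yield the same bits, so the patch behaves as intended; the explicit version is only a readability suggestion.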
I recompiled the library and looked at the Interface website. Now Konqueror runs quickly :). Despite this, I feel that it is a dirty hack. I then looked at the Interface website with Safari to see whether it suffers from the same issue, but it does not. The source code of RegExp::match in the WebKit code is quite different and makes no use of PCRE_NO_UTF8_CHECK, so the problem is solved elsewhere, and certainly more elegantly.
No future?
Konqueror is a browser with excellent support for CSS3, but it lacks some important features like CSS opacity and Rich Text Editing. I hope that with KDE4, KHTML and WebKit developers will work together on a single code base to make a better web browser for Linux, OS X and Windows users.