Template talk:Strlen quick

This is the discussion/talk-page for: Template:Strlen_quick.

Created

The fast string-length counter, Template:Strlen_quick, was created by long-term user Wikid77 on 30 January 2011 to provide a very fast string-length template, optimized for improved performance with actual Wikipedia data. It is also optimized to use limited wiki-markup resources in the NewPP MediaWiki preprocessor, by using an expansion depth of only 5 levels, rather than the 9 to 14 levels used by other string-length templates. -Wikid77 10:09, 30 January 2011 (UTC)

Optimizing for actual string lengths

30-Jan-2011: Template:Strlen_quick was created as a faster alternative to {{str_len}}, by optimizing for real string data as used in articles. From the actual string searches in existing Wikipedia articles, it is possible to determine the most-likely string lengths, such as 17/18 characters for titles, and then optimize to match those lengths faster. For example, suppose the top 1,000 articles all used an infobox code of 9 letters; in that case, checking for length 9 first could avoid checking other lengths. In the case of the 353,000 articles using {{Italic_title}}, the string lengths range from 2 to 99 letters, with the most-common lengths between 16 and 19, and 88% of all titles shorter than 30. The distribution of title lengths has been as follows:

  • 84% > 10, 12% < 10, 51% in 10-19, 25% in 20-29, 7% in 30-39, 1.7% in 40-49, 0.6% >50.

For lengths 0-9, the increase is dramatic: almost no titles are 1 or 2 characters, a few are 3, some are 4, then more have lengths 5, 6, 7, 8, with 9 being 19 times more common than length 3. To match title length quickly, check for the most common first, as lengths 9-to-1 in reverse order.
Among lengths 10-19, the most common are 17/18, with fewer titles farther away, and 10 being the least-frequent length among those. Above 20, the lengths decrease in frequency, 21-to-29, as the reverse of 9-1, so checking 21 first is 3 times more likely to match than 29. Among 30-39, the titles are quite rare, with 31 being as rare as length 5, and 39 being 3 times rarer, occurring in only 43 per 10,000 titles. By optimizing for the actual lengths of titles, those lengths can be matched perhaps twice as quickly. A pure binary search would give an unfair advantage to rare lengths, so the string search should be prioritized in favor of the more common lengths.
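The effect of ordering tests by frequency can be estimated as a frequency-weighted average of how many range tests each title needs. The sketch below is only an illustration: it uses the bucket percentages quoted above (which sum to about 97%, so they are normalized), assumes a layout that tests for length >= 20 first, then >= 10 or >= 30, then >= 40, and compares it with a hypothetical baseline that tests the thresholds in ascending order. The names `SHARES`, `PRIORITIZED`, `ASCENDING` and `expected_tests` are made up for this example.

```python
# Approximate share of titles per length bucket, taken from the figures
# quoted above; 40-49 and >50 are merged into one "40+" bucket.
SHARES = {"<10": 12.0, "10-19": 51.0, "20-29": 25.0, "30-39": 7.0, "40+": 1.7 + 0.6}

# Assumed number of range tests needed to reach each bucket when the
# layout tests >=20 first, then >=10 (or >=30, then >=40).
PRIORITIZED = {"<10": 2, "10-19": 2, "20-29": 2, "30-39": 3, "40+": 3}

# Hypothetical baseline: thresholds tested in ascending order (10, 20, 30, 40).
ASCENDING = {"<10": 1, "10-19": 2, "20-29": 3, "30-39": 4, "40+": 4}


def expected_tests(shares, depths):
    """Frequency-weighted average number of range tests per title."""
    total = sum(shares.values())  # normalize, since the shares do not sum to 100
    return sum(shares[bucket] * depths[bucket] for bucket in shares) / total


if __name__ == "__main__":
    print("prioritized:", round(expected_tests(SHARES, PRIORITIZED), 3))
    print("ascending:  ", round(expected_tests(SHARES, ASCENDING), 3))
```

This counts only the range tests that select a bucket; the cost of scanning the cases inside each bucket depends on per-length frequencies that are described above but not tabulated here.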

The markup logic, below, uses prioritized steps (the actual markup handles lengths over 70):

LOGIC to match 1-to-60 lengths in order of most common real data:
{{#ifeq: x{{{1}}}|x{{padleft:{{{1}}}|20}}
| {{#ifeq: x{{{1}}}|x{{padleft:{{{1}}}|30}}
  | {{#ifeq: x{{{1}}}|x{{padleft:{{{1}}}|40}}
    | {{#switch: x{{{1}}}
      | {{padleft:|41|x{{{1}}}}} = 40
      | {{padleft:|42|x{{{1}}}}} = 41
      | {{padleft:|43|x{{{1}}}}} = 42
      | {{padleft:|44|x{{{1}}}}} = 43
      | {{padleft:|45|x{{{1}}}}} = 44
      | {{padleft:|46|x{{{1}}}}} = 45
      | {{padleft:|47|x{{{1}}}}} = 46
      | {{padleft:|48|x{{{1}}}}} = 47
      | {{padleft:|49|x{{{1}}}}} = 48
      | {{padleft:|50|x{{{1}}}}} = 49
      | {{padleft:|51|x{{{1}}}}} = 50
      | {{padleft:|52|x{{{1}}}}} = 51
      | {{padleft:|53|x{{{1}}}}} = 52
      | {{padleft:|54|x{{{1}}}}} = 53
      | {{padleft:|55|x{{{1}}}}} = 54
      | {{padleft:|56|x{{{1}}}}} = 55
      | {{padleft:|57|x{{{1}}}}} = 56
      | {{padleft:|58|x{{{1}}}}} = 57
      | {{padleft:|59|x{{{1}}}}} = 58
      | {{padleft:|60|x{{{1}}}}} = 59
      | #default= 60 <!--when >= 60 and none of the above-->
      }}<!--endsw 40's++ -->
    | {{#switch: x{{{1}}}
      | {{padleft:|31|x{{{1}}}}} = 30
      | {{padleft:|32|x{{{1}}}}} = 31
      | {{padleft:|33|x{{{1}}}}} = 32
      | {{padleft:|34|x{{{1}}}}} = 33
      | {{padleft:|35|x{{{1}}}}} = 34
      | {{padleft:|36|x{{{1}}}}} = 35
      | {{padleft:|37|x{{{1}}}}} = 36
      | {{padleft:|38|x{{{1}}}}} = 37
      | {{padleft:|39|x{{{1}}}}} = 38
      | #default= 39
      }}<!--endsw 30's-->
    }}<!--endifeq 40-->
  | {{#switch: x{{{1}}}
    | {{padleft:|21|x{{{1}}}}} = 20
    | {{padleft:|22|x{{{1}}}}} = 21
    | {{padleft:|23|x{{{1}}}}} = 22
    | {{padleft:|24|x{{{1}}}}} = 23
    | {{padleft:|25|x{{{1}}}}} = 24
    | {{padleft:|26|x{{{1}}}}} = 25
    | {{padleft:|27|x{{{1}}}}} = 26
    | {{padleft:|28|x{{{1}}}}} = 27
    | {{padleft:|29|x{{{1}}}}} = 28
    | #default= 29
    }}<!--endsw 20's-->
  }}<!--endifeq 30-->
| {{#ifeq: x{{{1}}}|x{{padleft:{{{1}}}|10}}
  | {{#switch: x{{{1}}}
    | {{padleft:|18|x{{{1}}}}} = 17
    | {{padleft:|19|x{{{1}}}}} = 18
    | {{padleft:|17|x{{{1}}}}} = 16
    | {{padleft:|20|x{{{1}}}}} = 19
    | {{padleft:|16|x{{{1}}}}} = 15
    | {{padleft:|15|x{{{1}}}}} = 14
    | {{padleft:|14|x{{{1}}}}} = 13
    | {{padleft:|13|x{{{1}}}}} = 12
    | {{padleft:|12|x{{{1}}}}} = 11
    | #default= 10 <!--when >= 10 and none of above-->
    }}<!--endsw 10's++ -->
  | {{#switch: x{{{1}}}
    | {{padleft:|10|x{{{1}}}}} = 9
    | {{padleft:|9|x{{{1}}}}} = 8
    | {{padleft:|8|x{{{1}}}}} = 7
    | {{padleft:|7|x{{{1}}}}} = 6
    | {{padleft:|6|x{{{1}}}}} = 5
    | {{padleft:|5|x{{{1}}}}} = 4
    | {{padleft:|4|x{{{1}}}}} = 3
    | {{padleft:|3|x{{{1}}}}} = 2
    | #default= 1
    }}<!--endsw 1's-->
  }}<!--endifeq 10-->
}}<!--endifeq 20-->
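For readers who want to trace the markup above by hand, here is a rough Python model of the same decision order. It is a sketch, not the template itself: `padleft` is a simplified stand-in for the MediaWiki parser function (it ignores MediaWiki's cap on padded length and any whitespace trimming), and `strlen_quick` is a hypothetical name for the modelled logic, which stops at 60 just as the excerpt does.

```python
def padleft(value: str, length: int, pad: str = "0") -> str:
    """Simplified model of {{padleft:value|length|pad}} (an assumption)."""
    missing = length - len(value)
    if missing <= 0 or not pad:
        return value
    # Repeat the padding string and truncate it to the exact width needed.
    filler = (pad * (missing // len(pad) + 1))[:missing]
    return filler + value


def strlen_quick(s: str) -> int:
    """Model of the 1-to-60 logic: range tests first, then ordered cases."""
    key = "x" + s

    def at_least(n: int) -> bool:
        # Mirrors {{#ifeq: x{{{1}}}|x{{padleft:{{{1}}}|n}}}}: zero-padding
        # leaves the string unchanged only when it already has n characters.
        return key == "x" + padleft(s, n)

    def switch(cases, default):
        # Mirrors {{#switch: x{{{1}}} | {{padleft:|width|x{{{1}}}}} = result }}:
        # the padded value equals the key only when the key is `width` long.
        for width, result in cases:
            if padleft("", width, key) == key:
                return result
        return default

    if at_least(20):
        if at_least(30):
            if at_least(40):
                return switch([(n + 1, n) for n in range(40, 60)], 60)
            return switch([(n + 1, n) for n in range(30, 39)], 39)
        return switch([(n + 1, n) for n in range(20, 29)], 29)
    if at_least(10):
        # Most common title lengths (17, 18, 16, 19) are tried first.
        order = [17, 18, 16, 19, 15, 14, 13, 12, 11]
        return switch([(n + 1, n) for n in order], 10)
    return switch([(n + 1, n) for n in range(9, 1, -1)], 1)


if __name__ == "__main__":
    for sample in ["", "abc", "a" * 17, "a" * 45]:
        print(repr(sample[:5]), len(sample), strlen_quick(sample))
```

In this model an empty string falls through every case to `#default= 1`, which is consistent with the behaviour reported under "Zero length string returns length=1" on this page.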

Tests of the above code show that it does, in fact, process actual title lengths about twice as fast as the binary-search markup logic which has been used in template {{str_len}}. -Wikid77 10:09, 30 January 2011, revised 01:21, 22 February 2011 (UTC)

Zero length string returns length=1

testing:

I think the last three are in error. -DePiep (talk) 07:54, 15 June 2012 (UTC)