Commit graph

103 commits

Author SHA1 Message Date
Dmitry Olshansky
0d29cdc3d6 Introduce UTF Matchers into std.uni.
It's step zero toward a decode-less std.regex.
UTF matchers are efficient functors around a set of
specific tries. Enables processing Unicode characters
without decoding at speeds on par with decoding itself.

Along the way, make staticIota 'package'-protected and reuse it.

Fix a shameful typo in setSearcher.
2014-04-13 14:31:05 +04:00
Dmitry Olshansky
e33cfdb04f Use plain uint array for InversionList
Storage for static data was compressed long ago, so it no longer plays a role at compile time.
Memory requirements at run-time are modest anyway.
2014-04-03 01:49:19 +04:00
Jakob Ovrum
bc161b24e6 Enable DDoc for std.uni.byCodePoint
See also #1985
2014-03-23 09:34:09 +09:00
Andrei Alexandrescu
e289a7cdd7 Revert "Merge pull request #1685 from blackwhale/utf8-matcher"
This reverts commit 216ca01ca8, reversing
changes made to d56c1dbc08.
2014-03-15 17:11:17 -07:00
Andrei Alexandrescu
216ca01ca8 Merge pull request #1685 from blackwhale/utf8-matcher
Introduce UTF matchers into std.uni.
2014-03-15 15:32:38 -07:00
Dmitry Olshansky
1b03e409c0 another DDoc tweak 2014-03-16 01:35:00 +04:00
Dmitry Olshansky
0d60e46fa9 address review issues
Spelling, style etc.
2014-03-15 22:26:54 +04:00
k-hara
6f80f6aa03 Add more local imports 2014-03-13 13:13:04 +09:00
Dmitry Olshansky
fa5aa82318 more checking of UTF errors
Overlong sequences and wrong continuation bytes for UTF-8.
Lone high surrogate for UTF-16.
2014-03-11 00:40:37 +04:00
Dmitry Olshansky
a9b5c139fb test both narrow strings and range of char for matcher trait 2014-03-10 15:15:39 +04:00
Dmitry Olshansky
6635c8f036 hide subMatcher until its API and utility is proven 2014-03-10 14:47:06 +04:00
Walter Bright
be12a3b1f4 enable Ddoc for byGrapheme 2014-03-08 13:07:19 -08:00
Dmitry Olshansky
528099a600 fold in review comments 2014-03-08 13:55:33 +04:00
Dmitry Olshansky
1e771c06c9 @safety bags for utfMatchers
Granularity is horribly high. Auto-inference for templates has the
downside that it leaves no explanation of why inference failed.
2014-03-08 13:55:33 +04:00
Dmitry Olshansky
c264bd6a51 purify std.uni constructs, blocked by std.algorithm.sort 2014-03-08 13:55:33 +04:00
Dmitry Olshansky
cbd9bd3b79 const/pure annotations 2014-03-08 13:55:32 +04:00
Dmitry Olshansky
d0e408d5f4 work around stable sort (std.move) not being CTFE-able 2014-03-08 13:55:32 +04:00
Dmitry Olshansky
1c86ecf0c4 Introduce UTF Matchers into std.uni.
It's step zero toward a decode-less std.regex.
UTF matchers are efficient functors around a set of
specific tries. Enables processing Unicode characters
without decoding at speeds on par with decoding itself.

Along the way, make staticIota 'package'-protected and reuse it.

Fix a shameful typo in setSearcher.
2014-03-08 13:55:32 +04:00
Dmitry Olshansky
330d9528a6 Speed up trie construction by a factor of ~3.
Take advantage of word-at-once checking for slices of bit-packed
arrays in the trie.
2014-02-27 17:07:43 +04:00
Peter Alexander
53d4e03255 Reserve appender length for the common case 2014-02-22 21:03:42 +00:00
Peter Alexander
372bf6352a Fix Issue 11017 - Improve performance of toLower etc.
Changed to use `appender` instead of repeated `~=`.

https://d.puremagic.com/issues/show_bug.cgi?id=11017
2014-02-22 20:33:25 +00:00
monarch dodra
b82adb42db Fix unicode character ambiguity
'LATIN SMALL LETTER A' => 'CYRILLIC SMALL LETTER A'
2014-02-18 00:47:28 +11:00
k-hara
b391b2ec9f Convert to new alias syntax 2014-02-11 15:27:05 +09:00
Daniel Murphy
a656f26e9e Remove use of automatic adjacent string literal concatenation from phobos 2014-01-20 03:42:21 +11:00
Brad Anderson
e8f706f1b7 Fix two WEB DDoc macros 2014-01-10 23:27:59 -07:00
Dmitry Olshansky
ac81385b2b adjust documentation to reflect new capabilities 2014-01-06 15:06:30 +04:00
Dmitry Olshansky
4e5b777432 fix issue 11808 2014-01-06 15:00:03 +04:00
Dmitry Olshansky
35f4e3e08f [trivial] remove trailing whitespace 2014-01-06 02:21:22 +04:00
k-hara
3e791f6bc6 Add import declarations for issue 313 & 314 2013-12-24 09:39:21 +09:00
Martin Nowak
c445f1c288 Merge pull request #1766 from jmdavis/deprecations
Move various deprecations along.

Conflicts:
    std/algorithm.d
    std/zip.d
2013-12-19 23:33:47 +01:00
Ilya
f1775bdc27 Fix issue #11771
improve std.uni intersect: `ref intersect()(dchar ch)`
2013-12-19 15:15:55 +03:00
Daniel Murphy
654cb520ab Merge pull request #1785 from 9il/patch-2
std.uni: allow using "in"
2013-12-19 03:04:07 -08:00
Ilya
db17a50241 Update uni.d 2013-12-19 12:54:29 +03:00
Ilya
b8acdb3375 add unittest for bool opBinaryRight(string op: "in", U)(U ch) const 2013-12-19 12:34:12 +03:00
Ilya
e93d09e83a Update uni.d 2013-12-19 12:10:44 +03:00
Ilya
d8f9b59c15 allow using "in"
opIndex is const
2013-12-19 12:06:04 +03:00
jmdavis
efd6ea0cbf Move various deprecations along. 2013-12-11 23:44:12 -08:00
monarch dodra
8deaf7f5a7 Merge pull request #1699 from blackwhale/tweak-uni
Tweak unicode Trie generation speed
2013-12-10 12:05:16 -08:00
k-hara
64e0573940 fix property enforcement 2013-12-07 22:26:10 +09:00
Dmitry Olshansky
cf7c701c08 other tweaks for std.regex/ctRegex 2013-12-04 23:51:12 +04:00
Dmitry Olshansky
378224db3a tweak algorithm to actually fast-track zero-pages
Also significantly speed up replicateBits for the single-bit pattern case.
2013-12-04 23:50:33 +04:00
Dmitry Olshansky
f5b012eab7 refactor TrieBuilder 2013-12-04 23:49:53 +04:00
Dmitry Olshansky
42eb21616f Make MultiArray CTFE-able, work around a CTFE bug. 2013-12-04 23:48:05 +04:00
Jakob Ovrum
31a4357955 Add std.uni.byGrapheme and std.uni.byCodePoint 2013-12-03 22:06:30 +09:00
Adam D. Ruppe
5861654c7a import std.typecons outside unittests since it is needed 2013-11-27 09:53:33 -05:00
Dmitry Olshansky
87bff6186c split off rarely used unicode tables
This avoids parsing large files, reducing parse time by ~30 ms for me
(it took ~70 ms to parse the tables, now ~40 ms).
Also move the Hangul sets to Trie tables.
It also saves around 30 KB on a "hello world" app.
2013-10-17 18:49:11 +04:00
Martin Nowak
5cecc7622d smaller executables
- Move all tables into functions or structs so that
  dmd's multilib will put them into separate archive
  objects. This allows the linker to only pick the
  tables that are actually used.
2013-10-16 09:03:53 +02:00
Martin Nowak
2ffda8c61c missing imports when using codepointTrie 2013-10-16 01:39:40 +02:00
Martin Nowak
4027518b74 use functions where applicable to reduce compile time/template bloat
- The semantic analysis and object generation only
  needs to be done once when building phobos.
  Using those overloads becomes a simple link dependency.

- add overloads for most common cases
2013-10-14 01:04:02 +02:00
Martin Nowak
26edfc624f deduplicate CodepointTries and leave the data in libphobos2.a
- Store the static immutable CodepointTries in separate functions.
2013-10-14 00:38:30 +02:00