Update DustMite

This updates the distributed version of DustMite to the latest upstream version,
including the improvements described in
https://dlang.org/blog/2020/04/13/dustmite-the-general-purpose-data-reduction-tool/

* d785720 Add "indent" split mode
* e77126f splitter: Speed up optimization
* e0138ca dustmite: Preserve no-remove flags when applying "Concat" reductions
* d501228 Don't move code marked as no-remove
* 4d361cb dustmite: Use templates instead of delegates for dump
* 772a8fb dustmite: Use lockingBinaryWriter
* 621991b dustmite: Bulk adjacent writes
* dbd493a splitter: Don't compile in a debug-only field
* d19b15e Add coverage analysis and upload to Coveralls
* 630cf9c Track each entity's parent entity
* f56f6a4 splitter: Make Entity members used only by splitter private
* be2c452 splitter: Fix Dscanner warning
* 70d5503 splitter: Delete unused label
* 0e788b5 Revert "Track each entity's parent entity"
* 1f1f732 splitter: Don't dereference enum AAs at runtime
* 3fea926 dustmite: Mark final classes as such
* 02f8b2e splitter: Mark Entity class final
* 2ca0522 Overhaul tree representation and edit algorithms
* d9da7cf dustmite: Fail no-op concat reductions
* d439fed dustmite: Remove the concatPerformed bodge
* 0ec8fc5 dustmite: Log inapplicable reductions
* 2e19085 dustmite: Fix a TODO
* ad4124f dustmite: Start a new iteration after a successful Concat reduction
* 6d0cd9f dustmite: Remove `reduction == initialReduction` lookahead hack
* f197986 dustmite: Get rid of the "root" global
* 690ab07 dustmite: Update the lookahead iterator's root according to predictions
* 0dc5e04 dustmite: Remove special handling for the first lookahead step
* 8b5f639 dustmite: Handle inapplicable reductions in lookahead
* fd45d61 dustmite: Fix placement of --version in help text
* bf407bc dustmite: Make descendant recounting incremental
* 6878138 dustmite: Clean up test directories before writing to them
* a269d25 dustmite: Distinguish zero and uninitialized digests in lookahead queue
* 9eb4126 dustmite: Move lookahead saving and process creation into worker thread
* 5034c01 polyhash: Initial commit
* 3d28c6e polyhash: Add some comments
* 751ea2b polyhash: Optimize calculating powers of p
* f675253 polyhash: Use decreasing powers of p, instead of increasing
* b1b76cd polyhash: Convert to an output range interface
* 62d145b License under Boost Software License 1.0
* 19f0200 polyhash: Add mod-q (with non-power-of-two q) support
* 5b80b03 Unify incremental and full updates of computed Entity fields
* f85acdf Switch to incremental polynomial hashing
* 401d408 Work around DMD bug 20677
* 575406e Speed up applyReduction.edit
* 23a67fb Re-optimize incrementally for the Concat reduction
* 80b7ba4 dustmite: Speed up removing dependencies under removed nodes
* ec81973 Speed up address comparison
* 26f2039 dustmite: Tweak tree initialization order
* d5523e1 splitter: Clear hash for killed entities
* 048a0fd Keep children of removed nodes in the tree
* 48ed0a5 dustmite: Make findEntity traverse dead nodes
* 196f5f7 dustmite: Improve dump formatting of redirects and dead entities
* 404c8f9 dustmite: With both --trace and --dump, save dumps during trace
* 72cd08c dustmite: Don't attempt to concatenate dead files
* 53d3bf6 dustmite: Recalculate dead entities recursively too
* c3d1215 dustmite: Traverse dead entities when editing them, too
* 226a651 dustmite: Do not copy dead entities for editing
* b8f2844 Maintain cached cumulative dependents per-node
* 9f5a4f1 dustmite: Create less garbage during I/O
* df752dc Maintain cached full content of each node
* 4b165e6 Revert "Maintain cached full content of each node"
* 965fbc3 dustmite: Speed up strategy iteration over dead nodes
* 15d0a8f dustmite: Remove use of lazy arguments in address iteration
* b5c1ec0 splitter: Fix lexing D raw string literals (r"...")
* 2630496 dustmite: Fix "reduced to empty set" message
* 9505bf6 dustmite: Fix recursion for dead nodes in recalculate
* 6764e8d dustmite: Add --in-place
* 3a76633 dustmite: Remove Reduction.target
* c04c843 dustmite: Replace Reduction.address with an Address*
* d2cfa23 dustmite: Allow the test function to take an array of reductions
* 5e510c8 dustmite: Introduce Reduction.Type.Swap
* d4303ca dustmite: Add fuzzing mode
* 5fffd18 dustmite: Split up the DustMiteNoRemove string literals
* 714ea99 dustmite: Allow --reduce-only and --no-remove rules to stack
* ca18a07 dustmite: Add --remove switch
* de92616 dustmite: Reorder --help text
* 157b305 dustmite: Remove trailing punctuation from --help text
* 6746464 Add --white-out option
* 6705a94 dustmite: Update tagline
* e76496f splitter: Make EntityRef.address const
* 4cfed4c dustmite: Add debug=DETERMINISTIC_LOOKAHEAD
* 214d000 dustmite: Add reduction application cache
* e859e86 dustmite: Grow reduction application cache dynamically
* fd3ad29 dustmite: Speed up dependency recalculation
* a10ef7f dustmite: Fix crash with --whiteout + --trace
* 256a651 dustmite: Speed up dumping
* df42f62 dustmite: Add --max-steps
* 886c6f2 dustmite: Make measure's delegate scoped
* 732d0f1 dustmite: Add more performance timers
* 05acf86 dustmite: Implement non-linear lookahead prediction
* 0a7a937 dustmite: Improve prediction formula
* 990b3bc splitter: Improve parsing of successive keyword-prefixed blocks
* cb0855d dustmite: Make detection of suspicious files non-fatal
Author: Vladimir Panteleev, 2020-11-21 21:18:44 +00:00 (committed by Mathias LANG)
Parent: 42ab606a3b
Commit: 39c73d82c2
6 changed files with 1863 additions and 566 deletions


@@ -1,6 +1,10 @@
-This is DustMite, a D source code minimization tool.
+DustMite is a general-purpose data reduction tool.
+Its main use is to reduce D programs into minimal examples
+exhibiting some behavior (usually a compiler bug).
 For documentation, see the GitHub wiki:
 https://github.com/CyberShadow/DustMite/wiki
-DustMite was created by Vladimir Panteleev
-and is released into the Public Domain.
+DustMite was created by Vladimir Panteleev,
+and is distributed under the Boost Software Licence, version 1.0.

File diff suppressed because it is too large.

DustMite/polyhash.d (new file, 413 lines)

@@ -0,0 +1,413 @@
/// Polynomial hash for partial rehashing.
/// http://stackoverflow.com/a/42112687/21501
/// Written by Vladimir Panteleev <vladimir@thecybershadow.net>
/// License: Boost Software License, Version 1.0
module polyhash;
import std.range.primitives;
import std.traits;
struct PolynomialHash(Value)
{
Value value; /// The hash value of the hashed string
size_t length; /// The length of the hashed string
// Cycle length == 2^^30 for uint, > 2^^46 for ulong
// TODO: find primitive root modulo 2^^32, if one exists
enum Value p = Value(269);
private
{
/// Precalculated table for (p^^(2^^i))
alias Power2s = Value[size_t.sizeof * 8];
static Power2s genTable()
{
Value[size_t.sizeof * 8] result;
Value v = p;
foreach (i; 0 .. result.length)
{
result[i] = v;
v *= v;
}
return result;
}
static if (is(typeof({ enum table = genTable(); })))
static immutable Power2s power2s = genTable(); // Compute at compile-time
else
{
static immutable Power2s power2s;
// Compute at run-time (initialization)
shared static this() { power2s = genTable(); }
}
}
/// Return p^^power (mod q).
static Value pPower(size_t power)
{
Value v = 1;
foreach (b; 0 .. power2s.length)
if ((size_t(1) << b) & power)
v *= power2s[b];
return v;
}
void put(char c)
{
value *= p;
value += Value(c);
length++;
}
void put(in char[] s)
{
foreach (c; s)
{
value *= p;
value += Value(c);
}
length += s.length;
}
void put(ref typeof(this) hash)
{
value *= pPower(hash.length);
value += hash.value;
length += hash.length;
}
static typeof(this) hash(T)(T value)
if (is(typeof({ typeof(this) result; .put(result, value); })))
{
typeof(this) result;
.put(result, value);
return result;
}
unittest
{
assert(hash("").value == 0);
assert(hash([hash(""), hash("")]).value == 0);
// "a" + "" + "b" == "ab"
assert(hash([hash("a"), hash(""), hash("b")]) == hash("ab"));
// "a" + "bc" == "ab" + "c"
assert(hash([hash("a"), hash("bc")]) == hash([hash("ab"), hash("c")]));
// "a" != "b"
assert(hash("a") != hash("b"));
// "ab" != "ba"
assert(hash("ab") != hash("ba"));
assert(hash([hash("a"), hash("b")]) != hash([hash("b"), hash("a")]));
// Test overflow
assert(hash([
hash("Mary"),
hash(" "),
hash("had"),
hash(" "),
hash("a"),
hash(" "),
hash("little"),
hash(" "),
hash("lamb"),
hash("")
]) == hash("Mary had a little lamb"));
}
}
unittest
{
PolynomialHash!uint uintTest;
PolynomialHash!ulong ulongTest;
}
unittest
{
PolynomialHash!(ModQ!(uint, 4294967291)) modQtest;
}
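The unit tests above exercise the central property of the polynomial hash: the hash of a concatenation can be computed from the hashes of the parts alone. A minimal Python sketch of the same scheme follows, using the modulo-2^64 fallback (`PolynomialHash!ulong`) rather than `ModQ`; the helper names are illustrative, not part of the D API:

```python
P = 269                 # the hash base, matching PolynomialHash's enum p
M = 2 ** 64             # PolynomialHash!ulong works modulo 2^64 (wraparound)

def poly_hash(s: bytes):
    """Hash s as sum(s[i] * P^(n-1-i)) mod M; returns (value, length)."""
    h = 0
    for c in s:
        h = (h * P + c) % M     # mirrors put(char c)
    return h, len(s)

def concat(a, b):
    """Combine two (value, length) pairs into the hash of the
    concatenated string; mirrors put(ref typeof(this) hash).
    pow(P, blen, M) is what pPower computes, using its precalculated
    table of P^(2^i) powers instead of modular exponentiation."""
    ah, alen = a
    bh, blen = b
    return (ah * pow(P, blen, M) + bh) % M, alen + blen

# Same properties as the unittest block: "a" + "bc" == "ab" + "c"
assert concat(poly_hash(b"a"), poly_hash(b"bc")) == poly_hash(b"abc")
assert poly_hash(b"ab") != poly_hash(b"ba")
```

This property is what makes partial rehashing possible: when one subtree of the program changes, only its hash and the combinations along the path to the root need recomputing.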
// ****************************************************************************
/// Represents a value and performs calculations in modulo q.
struct ModQ(T, T q)
if (isUnsigned!T)
{
T value;
this(T v)
{
debug assert(v < q);
this.value = v;
}
bool opEquals(T operand) const
{
debug assert(operand < q);
return value == operand;
}
void opOpAssign(string op : "+")(typeof(this) operand)
{
T result = this.value;
result += operand.value;
if (result >= q || result < this.value || result < operand.value)
result -= q;
this.value = result;
}
void opOpAssign(string op : "*")(typeof(this) operand)
{
this.value = longMul(this.value, operand.value).longDiv(q).remainder;
}
T opCast(Q)() const if (is(Q == T)) { return value; }
// Ensure this type is supported when it is instantiated,
// instead of when the operator overloads are
private static void check() { typeof(this) m; m *= typeof(this)(0); }
}
unittest
{
alias M = ModQ!(ushort, 100);
M value;
value += M(56);
value += M(78);
assert(value == 34);
}
unittest
{
alias M = ModQ!(ushort, 100);
M value;
value += M(12);
value *= M(12);
assert(value == 44);
}
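The `+=` overload above avoids needing a wider integer type by detecting wraparound: both operands are already reduced modulo q, so the true sum is below 2q and at most one subtraction of q is needed; a wrapped sum reveals itself by being smaller than either operand. A Python sketch of that check (simulating fixed-width arithmetic with a mask; function names are illustrative):

```python
def modq_add(a: int, b: int, q: int, bits: int = 16) -> int:
    """Mirror ModQ.opOpAssign!"+": add in 'bits'-bit wraparound
    arithmetic, then subtract q once if the result is >= q or the
    addition wrapped (result smaller than an operand)."""
    mask = (1 << bits) - 1
    r = (a + b) & mask
    if r >= q or r < a or r < b:
        r = (r - q) & mask
    return r

def modq_mul(a: int, b: int, q: int) -> int:
    """Mirror ModQ.opOpAssign!"*": the D code uses longMul/longDiv to
    get the double-width product; Python's integers are unbounded."""
    return (a * b) % q

assert modq_add(56, 78, 100) == 34   # same result as the first unittest
```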
// ****************************************************************************
private:
import std.traits;
/// Get the smallest built-in unsigned integer type
/// that can store this many bits of data.
template TypeForBits(uint bits)
{
static if (bits <= 8)
alias TypeForBits = ubyte;
else
static if (bits <= 16)
alias TypeForBits = ushort;
else
static if (bits <= 32)
alias TypeForBits = uint;
else
static if (bits <= 64)
alias TypeForBits = ulong;
else
static assert(false, "No integer type big enough for " ~ bits.stringof ~ " bits");
}
struct LongInt(uint bits, bool signed)
{
TypeForBits!bits low;
static if (signed)
Signed!(TypeForBits!bits) high;
else
TypeForBits!bits high;
}
alias LongInt(T) = LongInt!(T.sizeof * 8, isSigned!T);
alias Cent = LongInt!long;
alias UCent = LongInt!ulong;
version (X86)
version = Intel;
else
version (X86_64)
version = Intel;
// Hack to work around DMD bug https://issues.dlang.org/show_bug.cgi?id=20677
version (Intel)
public enum modQSupported = size_t.sizeof == 8;
else
public enum modQSupported = false;
version (Intel)
{
version (DigitalMars)
enum x86RegSizePrefix(T) =
T.sizeof == 2 ? "" :
T.sizeof == 4 ? "E" :
T.sizeof == 8 ? "R" :
"?"; // force syntax error
else
{
enum x86RegSizePrefix(T) =
T.sizeof == 2 ? "" :
T.sizeof == 4 ? "e" :
T.sizeof == 8 ? "r" :
"?"; // force syntax error
enum x86SizeOpSuffix(T) =
T.sizeof == 2 ? "w" :
T.sizeof == 4 ? "l" :
T.sizeof == 8 ? "q" :
"?"; // force syntax error
}
enum x86SignedOpPrefix(T) = isSigned!T ? "i" : "";
}
LongInt!T longMul(T)(T a, T b)
if (is(T : long) && T.sizeof >= 2)
{
version (Intel)
{
version (LDC)
{
import ldc.llvmasm;
auto t = __asmtuple!(T, T)(
x86SignedOpPrefix!T~`mul`~x86SizeOpSuffix!T~` $3`,
// Technically, the last one should be "rm", but that generates suboptimal code in many cases
`={`~x86RegSizePrefix!T~`ax},={`~x86RegSizePrefix!T~`dx},{`~x86RegSizePrefix!T~`ax},r`,
a, b
);
return typeof(return)(t.v[0], t.v[1]);
}
else
version (GNU)
{
T low = void, high = void;
mixin(`
asm
{
"`~x86SignedOpPrefix!T~`mul`~x86SizeOpSuffix!T~` %3"
: "=a" low, "=d" high
: "a" a, "rm" b;
}
`);
return typeof(return)(low, high);
}
else
{
T low = void, high = void;
mixin(`
asm
{
mov `~x86RegSizePrefix!T~`AX, a;
`~x86SignedOpPrefix!T~`mul b;
mov low, `~x86RegSizePrefix!T~`AX;
mov high, `~x86RegSizePrefix!T~`DX;
}
`);
return typeof(return)(low, high);
}
}
else
static assert(false, "Not implemented on this architecture");
}
version (Intel)
unittest
{
assert(longMul(1, 1) == LongInt!int(1, 0));
assert(longMul(1, 2) == LongInt!int(2, 0));
assert(longMul(0x1_0000, 0x1_0000) == LongInt!int(0, 1));
assert(longMul(short(1), short(1)) == LongInt!short(1, 0));
assert(longMul(short(0x100), short(0x100)) == LongInt!short(0, 1));
assert(longMul(short(1), short(-1)) == LongInt!short(cast(ushort)-1, -1));
assert(longMul(ushort(1), cast(ushort)-1) == LongInt!ushort(cast(ushort)-1, 0));
version(X86_64)
{
assert(longMul(1L, 1L) == LongInt!long(1, 0));
assert(longMul(0x1_0000_0000L, 0x1_0000_0000L) == LongInt!long(0, 1));
}
}
struct DivResult(T) { T quotient, remainder; }
DivResult!T longDiv(T, L)(L a, T b)
if (is(T : long) && T.sizeof >= 2 && is(L == LongInt!T))
{
version (Intel)
{
version (LDC)
{
import ldc.llvmasm;
auto t = __asmtuple!(T, T)(
x86SignedOpPrefix!T~`div`~x86SizeOpSuffix!T~` $4`,
// Technically, the last one should be "rm", but that generates suboptimal code in many cases
`={`~x86RegSizePrefix!T~`ax},={`~x86RegSizePrefix!T~`dx},{`~x86RegSizePrefix!T~`ax},{`~x86RegSizePrefix!T~`dx},r`,
a.low, a.high, b
);
return typeof(return)(t.v[0], t.v[1]);
}
else
version (GNU)
{
T low = a.low, high = a.high;
T quotient = void;
T remainder = void;
mixin(`
asm
{
"`~x86SignedOpPrefix!T~`div`~x86SizeOpSuffix!T~` %4"
: "=a" quotient, "=d" remainder
: "a" low, "d" high, "rm" b;
}
`);
return typeof(return)(quotient, remainder);
}
else
{
auto low = a.low;
auto high = a.high;
T quotient = void;
T remainder = void;
mixin(`
asm
{
mov `~x86RegSizePrefix!T~`AX, low;
mov `~x86RegSizePrefix!T~`DX, high;
`~x86SignedOpPrefix!T~`div b;
mov quotient, `~x86RegSizePrefix!T~`AX;
mov remainder, `~x86RegSizePrefix!T~`DX;
}
`);
return typeof(return)(quotient, remainder);
}
}
else
static assert(false, "Not implemented on this architecture");
}
version (Intel)
unittest
{
assert(longDiv(LongInt!int(1, 0), 1) == DivResult!int(1, 0));
assert(longDiv(LongInt!int(5, 0), 2) == DivResult!int(2, 1));
assert(longDiv(LongInt!int(0, 1), 0x1_0000) == DivResult!int(0x1_0000, 0));
assert(longDiv(LongInt!short(1, 0), short(1)) == DivResult!short(1, 0));
assert(longDiv(LongInt!short(0, 1), short(0x100)) == DivResult!short(0x100, 0));
assert(longDiv(LongInt!short(cast(ushort)-1, -1), short(-1)) == DivResult!short(1));
assert(longDiv(LongInt!ushort(cast(ushort)-1, 0), cast(ushort)-1) == DivResult!ushort(1));
version(X86_64)
{
assert(longDiv(LongInt!long(1, 0), 1L) == DivResult!long(1));
assert(longDiv(LongInt!long(0, 1), 0x1_0000_0000L) == DivResult!long(0x1_0000_0000));
}
}
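Taken together, `longMul` and `longDiv` provide the widening multiply and narrowing divide that `ModQ`'s `*=` is built on. Since Python integers are arbitrary-precision, the same operations can be sketched without inline assembly (function names are illustrative, mirroring the D unittests above):

```python
def long_mul(a: int, b: int, bits: int = 32):
    """Widening multiply: return (low, high) halves of a*b,
    like the x86 MUL instruction used by longMul."""
    mask = (1 << bits) - 1
    p = a * b
    return p & mask, (p >> bits) & mask

def long_div(low: int, high: int, b: int, bits: int = 32):
    """Narrowing divide: (high:low) / b -> (quotient, remainder),
    like the x86 DIV instruction used by longDiv."""
    n = (high << bits) | low
    return n // b, n % b

assert long_mul(0x1_0000, 0x1_0000) == (0, 1)        # as in the unittest
assert long_div(0, 1, 0x1_0000) == (0x1_0000, 0)     # as in the unittest
```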


@@ -1,6 +1,6 @@
 /// Simple source code splitter
 /// Written by Vladimir Panteleev <vladimir@thecybershadow.net>
-/// Released into the Public Domain
+/// License: Boost Software License, Version 1.0
 module splitter;
@@ -17,8 +17,49 @@ import std.string;
import std.traits;
import std.stdio : stderr;
import polyhash;
/// Represents an Entity's position within a program tree.
struct Address
{
Address* parent; /// Upper node's Address. If null, then this is the root node (and index should be 0).
size_t index; /// Index within the parent's children array
size_t depth; /// Distance from the root address
Address*[] children; /// Used to keep a global cached tree of addresses.
ref Address* child(size_t index) const
{
auto mutableThis = cast(Address*)&this; // Break const for caching
if (mutableThis.children.length < index + 1)
mutableThis.children.length = index + 1;
if (!mutableThis.children[index])
mutableThis.children[index] = new Address(mutableThis, index, depth+1);
return mutableThis.children[index];
}
}
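The caching in `child` above means that equal paths always yield the same `Address` object, so addresses can be compared and looked up by identity rather than by walking the path. A Python sketch of the same lazily built tree (a loose translation, not the D API):

```python
class Address:
    """Sketch of splitter.Address: a lazily cached tree of tree positions."""
    def __init__(self, parent=None, index=0, depth=0):
        self.parent, self.index, self.depth = parent, index, depth
        self._children = []          # cache of child Addresses, grown on demand

    def child(self, index):
        # Grow the cache as needed, so the same path always returns
        # the same Address object (identity comparison works).
        while len(self._children) <= index:
            self._children.append(None)
        if self._children[index] is None:
            self._children[index] = Address(self, index, self.depth + 1)
        return self._children[index]

root = Address()
assert root.child(2) is root.child(2)        # cached: same object both times
assert root.child(2).child(0).depth == 2
```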
struct EntityRef /// Reference to another Entity in the same tree
{
Entity entity; /// Pointer - only valid during splitting / optimizing
const(Address)* address; /// Address - assigned after splitting / optimizing
}
enum largest64bitPrime = 18446744073709551557UL; // 0xFFFFFFFF_FFFFFFC5
// static if (is(ModQ!(ulong, largest64bitPrime)))
static if (modQSupported) // https://issues.dlang.org/show_bug.cgi?id=20677
alias EntityHash = PolynomialHash!(ModQ!(ulong, largest64bitPrime));
else
{
pragma(msg,
"64-bit long multiplication/division is not supported on this platform.\n" ~
"Falling back to working in modulo 2^^64.\n" ~
"Hashing / cache accuracy may be impaired.\n" ~
"---------------------------------------------------------------------");
alias EntityHash = PolynomialHash!ulong;
}
/// Represents a slice of the original code.
class Entity
final class Entity
{
string head; /// This node's "head", e.g. "{" for a statement block.
Entity[] children; /// This node's children nodes, e.g. the statements of the statement block.
@@ -29,14 +70,18 @@ class Entity
bool isPair; /// Internal hint for --dump output
bool noRemove; /// Don't try removing this entity (children OK)
bool clean; /// Computed fields are up-to-date
bool removed; /// For dangling dependencies
Entity[] dependencies; /// If any of these entities are omitted, so should this entity.
bool dead; /// Tombstone or redirect
EntityRef[] dependents;/// If this entity is removed, so should all these entities.
Address* redirect; /// If moved, this is where this entity is now
int id; /// For diagnostics
size_t descendants; /// For progress display
DSplitter.Token token; /// Used internally
size_t descendants; /// [Computed] For progress display
EntityHash hash; /// [Computed] Hashed value of this entity's content (as if it were saved to disk).
const(Address)*[] allDependents; /// [Computed] External dependents of this and child nodes
string deadContents; /// [Computed] For --white-out - all of this node's contents, with non-whitespace replaced by whitespace
EntityHash deadHash; /// [Computed] Hash of deadContents
this(string head = null, Entity[] children = null, string tail = null)
{
@@ -45,11 +90,10 @@ class Entity
this.tail = tail;
}
string[] comments;
@property string comment()
{
string[] result = comments;
string[] result;
debug result = comments;
if (isPair)
{
assert(token == DSplitter.Token.none);
@@ -60,10 +104,33 @@ class Entity
return result.length ? result.join(" / ") : null;
}
override string toString()
override string toString() const
{
return "%(%s%) %s %(%s%)".format([head], children, [tail]);
}
Entity dup() /// Creates a shallow copy
{
auto result = new Entity;
foreach (i, item; this.tupleof)
result.tupleof[i] = this.tupleof[i];
result.children = result.children.dup;
return result;
}
void kill() /// Convert to tombstone/redirect
{
dependents = null;
isPair = false;
descendants = 0;
allDependents = null;
dead = true;
}
private: // Used during parsing only
DSplitter.Token token; /// Used internally
debug string[] comments; /// Used to debug the splitter
}
enum Mode
@@ -79,6 +146,7 @@ enum Splitter
words, /// Split by whitespace
D, /// Parse D source code
diff, /// Unified diffs
indent, /// Indentation (Python, YAML...)
}
immutable string[] splitterNames = [EnumMembers!Splitter].map!(e => e.text().toLower()).array();
@@ -95,6 +163,7 @@ struct ParseOptions
bool stripComments;
ParseRule[] rules;
Mode mode;
uint tabWidth;
}
/// Parse the given file/directory.
@@ -123,15 +192,15 @@ Entity loadFiles(ref string path, ParseOptions options)
enum BIN_SIZE = 2;
void optimize(Entity set)
void optimizeUntil(alias stop)(Entity set)
{
static void group(ref Entity[] set, size_t start, size_t end)
static Entity group(Entity[] children)
{
//set = set[0..start] ~ [new Entity(removable, set[start..end])] ~ set[end..$];
auto children = set[start..end].dup;
if (children.length == 1)
return children[0];
auto e = new Entity(null, children, null);
e.noRemove = children.any!(c => c.noRemove)();
set.replaceInPlace(start, end, [e]);
return e;
}
static void clusterBy(ref Entity[] set, size_t binSize)
@@ -141,16 +210,14 @@ void optimize(Entity set)
auto size = set.length >= binSize*2 ? binSize : (set.length+1) / 2;
//auto size = binSize;
auto bins = set.length/size;
if (set.length % size > 1)
group(set, bins*size, set.length);
foreach_reverse (i; 0..bins)
group(set, i*size, (i+1)*size);
set = set.chunks(size).map!group.array;
}
}
static void doOptimize(Entity e)
void doOptimize(Entity e)
{
if (stop(e))
return;
foreach (c; e.children)
doOptimize(c);
clusterBy(e.children, BIN_SIZE);
@@ -159,6 +226,8 @@ void optimize(Entity set)
doOptimize(set);
}
alias optimize = optimizeUntil!((Entity e) => false);
private:
/// Override std.string nonsense, which does UTF-8 decoding
@@ -219,6 +288,9 @@ Entity loadFile(string name, string path, ParseOptions options)
case Splitter.diff:
result.children = parseDiff(result.contents);
return result;
case Splitter.indent:
result.children = parseIndent(result.contents, options.tabWidth);
return result;
}
}
assert(false); // default * rule should match everything
@@ -279,20 +351,24 @@ struct DSplitter
generated0, /// First value of generated tokens (see below)
max = generated0 + tokenLookup.length
max = tokenText.length
}
enum Token[string] tokenLookup = // DMD pr/2824
static immutable string[] tokenText =
{
auto result = new string[Token.generated0];
Token[string] lookup;
auto t = Token.generated0;
Token add(string s)
{
auto p = s in lookup;
if (p)
return *p;
return lookup[s] = t++;
Token t = cast(Token)result.length;
result ~= s;
lookup[s] = t;
return t;
}
foreach (pair; pairs)
@@ -304,21 +380,21 @@ struct DSplitter
foreach (i, synonyms; separators)
foreach (sep; synonyms)
add(sep);
return lookup;
}();
static immutable string[Token.max] tokenText =
{
string[Token.max] result;
foreach (k, v; tokenLookup)
result[v] = k;
return result;
}();
static Token lookupToken(string s)
{
if (!__ctfe) assert(false, "Don't use at runtime");
foreach (t; Token.generated0 .. Token.max)
if (s == tokenText[t])
return t;
assert(false, "No such token: " ~ s);
}
enum Token tokenLookup(string s) = lookupToken(s);
struct TokenPair { Token start, end; }
static Token lookupToken(string s) { return tokenLookup[s]; }
static TokenPair makeTokenPair(Pair pair) { return TokenPair(tokenLookup[pair.start], tokenLookup[pair.end]); }
static TokenPair makeTokenPair(Pair pair) { return TokenPair(lookupToken(pair.start), lookupToken(pair.end)); }
alias lookupTokens = arrayMap!(lookupToken, const string);
static immutable TokenPair[] pairTokens = pairs .arrayMap!makeTokenPair();
static immutable Token[][] separatorTokens = separators.arrayMap!lookupTokens ();
@@ -338,11 +414,11 @@ struct DSplitter
{
switch (t)
{
case tokenLookup[";"]:
case tokenLookup!";":
return SeparatorType.postfix;
case tokenLookup["import"]:
case tokenLookup!"import":
return SeparatorType.prefix;
case tokenLookup["else"]:
case tokenLookup!"else":
return SeparatorType.binary;
default:
if (pairTokens.any!(pair => pair.start == t))
@@ -449,7 +525,7 @@ struct DSplitter
advance();
break;
case 'r':
if (consume(`r"`))
if (consume(`"`))
{
result = Token.other;
while (advance() != '"')
@@ -645,7 +721,6 @@ struct DSplitter
return r;
}
tokenLoop:
while (true)
{
Token token;
@@ -771,7 +846,7 @@ struct DSplitter
for (size_t i=0; i<entities.length;)
{
auto e = entities[i];
if (e.head.empty && e.tail.empty && e.dependencies.empty)
if (e.head.empty && e.tail.empty && e.dependents.empty)
{
assert(e.token == Token.none);
if (e.children.length == 0)
@@ -813,7 +888,7 @@ struct DSplitter
auto head = entities[0..i] ~ group(e.children);
e.children = null;
auto tail = new Entity(null, group(entities[i+1..$]), null);
e.dependencies ~= tail;
tail.dependents ~= EntityRef(e);
entities = group(head ~ e) ~ tail;
foreach (c; entities)
postProcessDependency(c.children);
@@ -825,11 +900,10 @@ struct DSplitter
if (!entities.length)
return;
foreach_reverse (i, e; entities[0..$-1])
if (e.token == tokenLookup["!"] && entities[i+1].children.length && entities[i+1].children[0].token == tokenLookup["("])
if (e.token == tokenLookup!"!" && entities[i+1].children.length && entities[i+1].children[0].token == tokenLookup!"(")
{
auto dependency = new Entity;
e.dependencies ~= dependency;
entities[i+1].children[0].dependencies ~= dependency;
dependency.dependents = [EntityRef(e), EntityRef(entities[i+1].children[0])];
entities = entities[0..i+1] ~ dependency ~ entities[i+1..$];
}
}
@@ -838,27 +912,22 @@ struct DSplitter
{
foreach (i, e; entities)
if (i && !e.token && e.children.length && getSeparatorType(e.children[0].token) == SeparatorType.binary && !e.children[0].children)
e.children[0].dependencies ~= entities[i-1];
entities[i-1].dependents ~= EntityRef(e.children[0]);
}
static void postProcessBlockKeywords(ref Entity[] entities)
{
for (size_t i=0; i<entities.length;)
foreach_reverse (i; 0 .. entities.length)
{
if (blockKeywordTokens.canFind(entities[i].token) && i+1 < entities.length)
{
auto j = i + 1;
if (j < entities.length && entities[j].token == tokenLookup["("])
if (j < entities.length && entities[j].token == tokenLookup!"(")
j++;
j++; // ; or {
if (j <= entities.length)
{
entities = entities[0..i] ~ group(group(entities[i..j-1]) ~ entities[j-1..j]) ~ entities[j..$];
continue;
}
}
i++;
}
}
@@ -879,23 +948,23 @@ struct DSplitter
return false;
}
if (consume(tokenLookup["if"]) || consume(tokenLookup["static if"]))
consume(tokenLookup["else"]);
if (consume(tokenLookup!"if") || consume(tokenLookup!"static if"))
consume(tokenLookup!"else");
else
if (consume(tokenLookup["do"]))
consume(tokenLookup["while"]);
if (consume(tokenLookup!"do"))
consume(tokenLookup!"while");
else
if (consume(tokenLookup["try"]))
if (consume(tokenLookup!"try"))
{
while (consume(tokenLookup["catch"]))
while (consume(tokenLookup!"catch"))
continue;
consume(tokenLookup["finally"]);
consume(tokenLookup!"finally");
}
if (i == j)
{
j++;
while (consume(tokenLookup["in"]) || consume(tokenLookup["out"]) || consume(tokenLookup["body"]))
while (consume(tokenLookup!"in") || consume(tokenLookup!"out") || consume(tokenLookup!"body"))
continue;
}
@@ -917,7 +986,7 @@ struct DSplitter
{
// Create pair entities
if (entities[i].token == tokenLookup["{"])
if (entities[i].token == tokenLookup!"{")
{
if (i >= lastPair + 1)
{
@@ -932,7 +1001,7 @@ struct DSplitter
lastPair = i + 1;
}
else
if (entities[i].token == tokenLookup[";"])
if (entities[i].token == tokenLookup!";")
lastPair = i + 1;
i++;
@@ -948,7 +1017,7 @@ struct DSplitter
auto pparen = firstHead(entities[i+1]);
if (pparen
&& *pparen !is entities[i+1]
&& pparen.token == tokenLookup["("])
&& pparen.token == tokenLookup!"(")
{
auto paren = *pparen;
*pparen = new Entity();
@@ -1041,7 +1110,7 @@ struct DSplitter
if (entity.token == Token.other && isValidIdentifier(id) && !entity.tail && !entity.children)
lastID = id;
else
if (lastID && entity.token == tokenLookup["("])
if (lastID && entity.token == tokenLookup!"(")
{
size_t[] stack;
struct Comma { size_t[] addr, after; }
@@ -1060,7 +1129,7 @@ struct DSplitter
afterComma = false;
}
if (entity.token == tokenLookup[","])
if (entity.token == tokenLookup!",")
{
commas ~= Comma(stack);
//entity.comments ~= "Comma %d".format(commas.length);
@@ -1112,7 +1181,7 @@ struct DSplitter
return;
}
else
if (entity.token == tokenLookup["!"])
if (entity.token == tokenLookup!"!")
{}
else
if (entity.head || entity.tail)
@@ -1142,7 +1211,7 @@ struct DSplitter
debug e.comments ~= "%s param %d".format(id, i);
funRoot.children ~= e;
foreach (arg; args)
arg.dependencies ~= e;
e.dependents ~= EntityRef(arg);
}
}
}
@@ -1241,6 +1310,50 @@ Entity[] parseDiff(string s)
;
}
Entity[] parseIndent(string s, uint tabWidth)
{
Entity[] root;
Entity[]*[] stack;
foreach (line; s.split2!("\n", ""))
{
size_t indent = 0;
charLoop:
foreach (c; line)
switch (c)
{
case ' ':
indent++;
break;
case '\t':
indent += tabWidth;
break;
case '\r':
case '\n':
// Treat empty (whitespace-only) lines as belonging to the
// immediately higher (most-nested) block.
indent = stack.length;
break charLoop;
default:
break charLoop;
}
auto e = new Entity(line);
foreach_reverse (i; 0 .. min(indent, stack.length)) // non-inclusively up to indent
if (stack[i])
{
*stack[i] ~= e;
goto parentFound;
}
root ~= e;
parentFound:
stack.length = indent + 1;
stack[indent] = &e.children;
}
return root;
}
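The stack-of-indent-levels approach above, which powers the new "indent" split mode, can be sketched in Python (a loose translation of `parseIndent`, with dicts standing in for `Entity`):

```python
def parse_indent(text: str, tab_width: int = 8):
    """Build a tree where each line becomes a node, nested under the
    nearest less-indented line above it."""
    root = []     # top-level nodes
    stack = []    # stack[i]: children list owning indent level i, or None
    for line in text.splitlines():
        indent = 0
        for c in line:
            if c == ' ':
                indent += 1
            elif c == '\t':
                indent += tab_width
            else:
                break
        if not line.strip():
            indent = len(stack)   # blank lines belong to the innermost block
        node = {'line': line, 'children': []}
        # Find the nearest enclosing block strictly above this indent level.
        for i in range(min(indent, len(stack)) - 1, -1, -1):
            if stack[i] is not None:
                stack[i].append(node)
                break
        else:
            root.append(node)
        # Truncate the stack to this level and register this node's
        # children list as the owner of indent level `indent`.
        del stack[indent:]
        stack.extend([None] * (indent - len(stack)))
        stack.append(node['children'])
    return root

tree = parse_indent("a\n  b\n  c\nd")
assert [n['line'] for n in tree] == ['a', 'd']
assert [n['line'] for n in tree[0]['children']] == ['  b', '  c']
```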
private:
bool isNewline(char c) { return c == '\r' || c == '\n'; }


@@ -74,8 +74,8 @@ changed: $(ROOT)/changed
 dman: $(ROOT)/dman
 dustmite: $(ROOT)/dustmite
-$(ROOT)/dustmite: DustMite/dustmite.d DustMite/splitter.d
-	$(DMD) $(DFLAGS) -version=Dlang_Tools DustMite/dustmite.d DustMite/splitter.d -of$(@)
+$(ROOT)/dustmite: DustMite/dustmite.d DustMite/splitter.d DustMite/polyhash.d
+	$(DMD) $(DFLAGS) -version=Dlang_Tools DustMite/dustmite.d DustMite/splitter.d DustMite/polyhash.d -of$(@)
 $(TOOLS) $(DOC_TOOLS) $(CURL_TOOLS) $(TEST_TOOLS): $(ROOT)/%: %.d
 	$(DMD) $(DFLAGS) -of$(@) $(<)


@@ -65,8 +65,8 @@ $(ROOT)\rdmd.exe : rdmd.d
 $(ROOT)\ddemangle.exe : ddemangle.d
 	$(DMD) $(DFLAGS) -of$@ ddemangle.d
-$(ROOT)\dustmite.exe : DustMite/dustmite.d DustMite/splitter.d
-	$(DMD) $(DFLAGS) -of$@ -version=Dlang_Tools DustMite/dustmite.d DustMite/splitter.d
+$(ROOT)\dustmite.exe : DustMite/dustmite.d DustMite/splitter.d DustMite/polyhash.d
+	$(DMD) $(DFLAGS) -of$@ -version=Dlang_Tools DustMite/dustmite.d DustMite/splitter.d DustMite/polyhash.d
 $(ROOT)\changed.exe : changed.d
 	$(DMD) $(DFLAGS) -of$@ changed.d