Update DustMite

This updates the distributed version of DustMite to the latest upstream version,
including the improvements described in
https://dlang.org/blog/2020/04/13/dustmite-the-general-purpose-data-reduction-tool/

* d785720 Add "indent" split mode
* e77126f splitter: Speed up optimization
* e0138ca dustmite: Preserve no-remove flags when applying "Concat" reductions
* d501228 Don't move code marked as no-remove
* 4d361cb dustmite: Use templates instead of delegates for dump
* 772a8fb dustmite: Use lockingBinaryWriter
* 621991b dustmite: Bulk adjacent writes
* dbd493a splitter: Don't compile in a debug-only field
* d19b15e Add coverage analysis and upload to Coveralls
* 630cf9c Track each entity's parent entity
* f56f6a4 splitter: Make Entity members used only by splitter private
* be2c452 splitter: Fix Dscanner warning
* 70d5503 splitter: Delete unused label
* 0e788b5 Revert "Track each entity's parent entity"
* 1f1f732 splitter: Don't dereference enum AAs at runtime
* 3fea926 dustmite: Mark final classes as such
* 02f8b2e splitter: Mark Entity class final
* 2ca0522 Overhaul tree representation and edit algorithms
* d9da7cf dustmite: Fail no-op concat reductions
* d439fed dustmite: Remove the concatPerformed bodge
* 0ec8fc5 dustmite: Log inapplicable reductions
* 2e19085 dustmite: Fix a TODO
* ad4124f dustmite: Start a new iteration after a successful Concat reduction
* 6d0cd9f dustmite: Remove `reduction == initialReduction` lookahead hack
* f197986 dustmite: Get rid of the "root" global
* 690ab07 dustmite: Update the lookahead iterator's root according to predictions
* 0dc5e04 dustmite: Remove special handling for the first lookahead step
* 8b5f639 dustmite: Handle inapplicable reductions in lookahead
* fd45d61 dustmite: Fix placement of --version in help text
* bf407bc dustmite: Make descendant recounting incremental
* 6878138 dustmite: Clean up test directories before writing to them
* a269d25 dustmite: Distinguish zero and uninitialized digests in lookahead queue
* 9eb4126 dustmite: Move lookahead saving and process creation into worker thread
* 5034c01 polyhash: Initial commit
* 3d28c6e polyhash: Add some comments
* 751ea2b polyhash: Optimize calculating powers of p
* f675253 polyhash: Use decreasing powers of p, instead of increasing
* b1b76cd polyhash: Convert to an output range interface
* 62d145b License under Boost Software License 1.0
* 19f0200 polyhash: Add mod-q (with non-power-of-two q) support
* 5b80b03 Unify incremental and full updates of computed Entity fields
* f85acdf Switch to incremental polynomial hashing
* 401d408 Work around DMD bug 20677
* 575406e Speed up applyReduction.edit
* 23a67fb Re-optimize incrementally for the Concat reduction
* 80b7ba4 dustmite: Speed up removing dependencies under removed nodes
* ec81973 Speed up address comparison
* 26f2039 dustmite: Tweak tree initialization order
* d5523e1 splitter: Clear hash for killed entities
* 048a0fd Keep children of removed nodes in the tree
* 48ed0a5 dustmite: Make findEntity traverse dead nodes
* 196f5f7 dustmite: Improve dump formatting of redirects and dead entities
* 404c8f9 dustmite: With both --trace and --dump, save dumps during trace
* 72cd08c dustmite: Don't attempt to concatenate dead files
* 53d3bf6 dustmite: Recalculate dead entities recursively too
* c3d1215 dustmite: Traverse dead entities when editing them, too
* 226a651 dustmite: Do not copy dead entities for editing
* b8f2844 Maintain cached cumulative dependents per-node
* 9f5a4f1 dustmite: Create less garbage during I/O
* df752dc Maintain cached full content of each node
* 4b165e6 Revert "Maintain cached full content of each node"
* 965fbc3 dustmite: Speed up strategy iteration over dead nodes
* 15d0a8f dustmite: Remove use of lazy arguments in address iteration
* b5c1ec0 splitter: Fix lexing D raw string literals (r"...")
* 2630496 dustmite: Fix "reduced to empty set" message
* 9505bf6 dustmite: Fix recursion for dead nodes in recalculate
* 6764e8d dustmite: Add --in-place
* 3a76633 dustmite: Remove Reduction.target
* c04c843 dustmite: Replace Reduction.address with an Address*
* d2cfa23 dustmite: Allow the test function to take an array of reductions
* 5e510c8 dustmite: Introduce Reduction.Type.Swap
* d4303ca dustmite: Add fuzzing mode
* 5fffd18 dustmite: Split up the DustMiteNoRemove string literals
* 714ea99 dustmite: Allow --reduce-only and --no-remove rules to stack
* ca18a07 dustmite: Add --remove switch
* de92616 dustmite: Reorder --help text
* 157b305 dustmite: Remove trailing punctuation from --help text
* 6746464 Add --white-out option
* 6705a94 dustmite: Update tagline
* e76496f splitter: Make EntityRef.address const
* 4cfed4c dustmite: Add debug=DETERMINISTIC_LOOKAHEAD
* 214d000 dustmite: Add reduction application cache
* e859e86 dustmite: Grow reduction application cache dynamically
* fd3ad29 dustmite: Speed up dependency recalculation
* a10ef7f dustmite: Fix crash with --whiteout + --trace
* 256a651 dustmite: Speed up dumping
* df42f62 dustmite: Add --max-steps
* 886c6f2 dustmite: Make measure's delegate scoped
* 732d0f1 dustmite: Add more performance timers
* 05acf86 dustmite: Implement non-linear lookahead prediction
* 0a7a937 dustmite: Improve prediction formula
* 990b3bc splitter: Improve parsing of successive keyword-prefixed blocks
* cb0855d dustmite: Make detection of suspicious files non-fatal
Author: Vladimir Panteleev, 2020-11-21 21:18:44 +00:00 (committed by Mathias LANG)
Parent: 42ab606a3b
Commit: 39c73d82c2
6 changed files with 1863 additions and 566 deletions


@@ -1,6 +1,10 @@
-This is DustMite, a D source code minimization tool.
+DustMite is a general-purpose data reduction tool.
+Its main use is to reduce D programs into minimal examples
+exhibiting some behavior (usually a compiler bug).
 For documentation, see the GitHub wiki:
 https://github.com/CyberShadow/DustMite/wiki
-DustMite was created by Vladimir Panteleev
-and is released into the Public Domain.
+DustMite was created by Vladimir Panteleev,
+and is distributed under the Boost Software Licence, version 1.0.

File diff suppressed because it is too large.

DustMite/polyhash.d (new file, 413 lines)

@@ -0,0 +1,413 @@
/// Polynomial hash for partial rehashing.
/// http://stackoverflow.com/a/42112687/21501
/// Written by Vladimir Panteleev <vladimir@thecybershadow.net>
/// License: Boost Software License, Version 1.0
module polyhash;
import std.range.primitives;
import std.traits;
struct PolynomialHash(Value)
{
Value value; /// The hash value of the hashed string
size_t length; /// The length of the hashed string
// Cycle length == 2^^30 for uint, > 2^^46 for ulong
// TODO: find primitive root modulo 2^^32, if one exists
enum Value p = Value(269);
private
{
/// Precalculated table for (p^^(2^^i))
alias Power2s = Value[size_t.sizeof * 8];
static Power2s genTable()
{
Value[size_t.sizeof * 8] result;
Value v = p;
foreach (i; 0 .. result.length)
{
result[i] = v;
v *= v;
}
return result;
}
static if (is(typeof({ enum table = genTable(); })))
static immutable Power2s power2s = genTable(); // Compute at compile-time
else
{
static immutable Power2s power2s;
// Compute at run-time (initialization)
shared static this() { power2s = genTable(); }
}
}
/// Return p^^power (mod q).
static Value pPower(size_t power)
{
Value v = 1;
foreach (b; 0 .. power2s.length)
if ((size_t(1) << b) & power)
v *= power2s[b];
return v;
}
void put(char c)
{
value *= p;
value += Value(c);
length++;
}
void put(in char[] s)
{
foreach (c; s)
{
value *= p;
value += Value(c);
}
length += s.length;
}
void put(ref typeof(this) hash)
{
value *= pPower(hash.length);
value += hash.value;
length += hash.length;
}
static typeof(this) hash(T)(T value)
if (is(typeof({ typeof(this) result; .put(result, value); })))
{
typeof(this) result;
.put(result, value);
return result;
}
unittest
{
assert(hash("").value == 0);
assert(hash([hash(""), hash("")]).value == 0);
// "a" + "" + "b" == "ab"
assert(hash([hash("a"), hash(""), hash("b")]) == hash("ab"));
// "a" + "bc" == "ab" + "c"
assert(hash([hash("a"), hash("bc")]) == hash([hash("ab"), hash("c")]));
// "a" != "b"
assert(hash("a") != hash("b"));
// "ab" != "ba"
assert(hash("ab") != hash("ba"));
assert(hash([hash("a"), hash("b")]) != hash([hash("b"), hash("a")]));
// Test overflow
assert(hash([
hash("Mary"),
hash(" "),
hash("had"),
hash(" "),
hash("a"),
hash(" "),
hash("little"),
hash(" "),
hash("lamb"),
hash("")
]) == hash("Mary had a little lamb"));
}
}
unittest
{
PolynomialHash!uint uintTest;
PolynomialHash!ulong ulongTest;
}
unittest
{
PolynomialHash!(ModQ!(uint, 4294967291)) modQtest;
}
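The unit tests above exercise the central property of the polynomial hash: the hash of a concatenation can be computed from the hashes of the parts alone. A minimal Python sketch of the same scheme follows, using the modulo-2^64 fallback (`PolynomialHash!ulong`) rather than `ModQ`; the helper names are illustrative, not part of the D API:

```python
P = 269                 # the hash base, matching PolynomialHash's enum p
M = 2 ** 64             # PolynomialHash!ulong works modulo 2^64 (wraparound)

def poly_hash(s: bytes):
    """Hash s as sum(s[i] * P^(n-1-i)) mod M; returns (value, length)."""
    h = 0
    for c in s:
        h = (h * P + c) % M     # mirrors put(char c)
    return h, len(s)

def concat(a, b):
    """Combine two (value, length) pairs into the hash of the
    concatenated string; mirrors put(ref typeof(this) hash).
    pow(P, blen, M) is what pPower computes, using its precalculated
    table of P^(2^i) powers instead of modular exponentiation."""
    ah, alen = a
    bh, blen = b
    return (ah * pow(P, blen, M) + bh) % M, alen + blen

# Same properties as the unittest block: "a" + "bc" == "ab" + "c"
assert concat(poly_hash(b"a"), poly_hash(b"bc")) == poly_hash(b"abc")
assert poly_hash(b"ab") != poly_hash(b"ba")
```

This property is what makes partial rehashing possible: when one subtree of the program changes, only its hash and the combinations along the path to the root need recomputing.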
// ****************************************************************************
/// Represents a value and performs calculations in modulo q.
struct ModQ(T, T q)
if (isUnsigned!T)
{
T value;
this(T v)
{
debug assert(v < q);
this.value = v;
}
bool opEquals(T operand) const
{
debug assert(operand < q);
return value == operand;
}
void opOpAssign(string op : "+")(typeof(this) operand)
{
T result = this.value;
result += operand.value;
if (result >= q || result < this.value || result < operand.value)
result -= q;
this.value = result;
}
void opOpAssign(string op : "*")(typeof(this) operand)
{
this.value = longMul(this.value, operand.value).longDiv(q).remainder;
}
T opCast(Q)() const if (is(Q == T)) { return value; }
// Ensure this type is supported when it is instantiated,
// instead of when the operator overloads are
private static void check() { typeof(this) m; m *= typeof(this)(0); }
}
unittest
{
alias M = ModQ!(ushort, 100);
M value;
value += M(56);
value += M(78);
assert(value == 34);
}
unittest
{
alias M = ModQ!(ushort, 100);
M value;
value += M(12);
value *= M(12);
assert(value == 44);
}
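The `+=` overload above avoids needing a wider integer type by detecting wraparound: both operands are already reduced modulo q, so the true sum is below 2q and at most one subtraction of q is needed; a wrapped sum reveals itself by being smaller than either operand. A Python sketch of that check (simulating fixed-width arithmetic with a mask; function names are illustrative):

```python
def modq_add(a: int, b: int, q: int, bits: int = 16) -> int:
    """Mirror ModQ.opOpAssign!"+": add in 'bits'-bit wraparound
    arithmetic, then subtract q once if the result is >= q or the
    addition wrapped (result smaller than an operand)."""
    mask = (1 << bits) - 1
    r = (a + b) & mask
    if r >= q or r < a or r < b:
        r = (r - q) & mask
    return r

def modq_mul(a: int, b: int, q: int) -> int:
    """Mirror ModQ.opOpAssign!"*": the D code uses longMul/longDiv to
    get the double-width product; Python's integers are unbounded."""
    return (a * b) % q

assert modq_add(56, 78, 100) == 34   # same result as the first unittest
```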
// ****************************************************************************
private:
import std.traits;
/// Get the smallest built-in unsigned integer type
/// that can store this many bits of data.
template TypeForBits(uint bits)
{
static if (bits <= 8)
alias TypeForBits = ubyte;
else
static if (bits <= 16)
alias TypeForBits = ushort;
else
static if (bits <= 32)
alias TypeForBits = uint;
else
static if (bits <= 64)
alias TypeForBits = ulong;
else
static assert(false, "No integer type big enough for " ~ bits.stringof ~ " bits");
}
struct LongInt(uint bits, bool signed)
{
TypeForBits!bits low;
static if (signed)
Signed!(TypeForBits!bits) high;
else
TypeForBits!bits high;
}
alias LongInt(T) = LongInt!(T.sizeof * 8, isSigned!T);
alias Cent = LongInt!long;
alias UCent = LongInt!ulong;
version (X86)
version = Intel;
else
version (X86_64)
version = Intel;
// Hack to work around DMD bug https://issues.dlang.org/show_bug.cgi?id=20677
version (Intel)
public enum modQSupported = size_t.sizeof == 8;
else
public enum modQSupported = false;
version (Intel)
{
version (DigitalMars)
enum x86RegSizePrefix(T) =
T.sizeof == 2 ? "" :
T.sizeof == 4 ? "E" :
T.sizeof == 8 ? "R" :
"?"; // force syntax error
else
{
enum x86RegSizePrefix(T) =
T.sizeof == 2 ? "" :
T.sizeof == 4 ? "e" :
T.sizeof == 8 ? "r" :
"?"; // force syntax error
enum x86SizeOpSuffix(T) =
T.sizeof == 2 ? "w" :
T.sizeof == 4 ? "l" :
T.sizeof == 8 ? "q" :
"?"; // force syntax error
}
enum x86SignedOpPrefix(T) = isSigned!T ? "i" : "";
}
LongInt!T longMul(T)(T a, T b)
if (is(T : long) && T.sizeof >= 2)
{
version (Intel)
{
version (LDC)
{
import ldc.llvmasm;
auto t = __asmtuple!(T, T)(
x86SignedOpPrefix!T~`mul`~x86SizeOpSuffix!T~` $3`,
// Technically, the last one should be "rm", but that generates suboptimal code in many cases
`={`~x86RegSizePrefix!T~`ax},={`~x86RegSizePrefix!T~`dx},{`~x86RegSizePrefix!T~`ax},r`,
a, b
);
return typeof(return)(t.v[0], t.v[1]);
}
else
version (GNU)
{
T low = void, high = void;
mixin(`
asm
{
"`~x86SignedOpPrefix!T~`mul`~x86SizeOpSuffix!T~` %3"
: "=a" low, "=d" high
: "a" a, "rm" b;
}
`);
return typeof(return)(low, high);
}
else
{
T low = void, high = void;
mixin(`
asm
{
mov `~x86RegSizePrefix!T~`AX, a;
`~x86SignedOpPrefix!T~`mul b;
mov low, `~x86RegSizePrefix!T~`AX;
mov high, `~x86RegSizePrefix!T~`DX;
}
`);
return typeof(return)(low, high);
}
}
else
static assert(false, "Not implemented on this architecture");
}
version (Intel)
unittest
{
assert(longMul(1, 1) == LongInt!int(1, 0));
assert(longMul(1, 2) == LongInt!int(2, 0));
assert(longMul(0x1_0000, 0x1_0000) == LongInt!int(0, 1));
assert(longMul(short(1), short(1)) == LongInt!short(1, 0));
assert(longMul(short(0x100), short(0x100)) == LongInt!short(0, 1));
assert(longMul(short(1), short(-1)) == LongInt!short(cast(ushort)-1, -1));
assert(longMul(ushort(1), cast(ushort)-1) == LongInt!ushort(cast(ushort)-1, 0));
version(X86_64)
{
assert(longMul(1L, 1L) == LongInt!long(1, 0));
assert(longMul(0x1_0000_0000L, 0x1_0000_0000L) == LongInt!long(0, 1));
}
}
struct DivResult(T) { T quotient, remainder; }
DivResult!T longDiv(T, L)(L a, T b)
if (is(T : long) && T.sizeof >= 2 && is(L == LongInt!T))
{
version (Intel)
{
version (LDC)
{
import ldc.llvmasm;
auto t = __asmtuple!(T, T)(
x86SignedOpPrefix!T~`div`~x86SizeOpSuffix!T~` $4`,
// Technically, the last one should be "rm", but that generates suboptimal code in many cases
`={`~x86RegSizePrefix!T~`ax},={`~x86RegSizePrefix!T~`dx},{`~x86RegSizePrefix!T~`ax},{`~x86RegSizePrefix!T~`dx},r`,
a.low, a.high, b
);
return typeof(return)(t.v[0], t.v[1]);
}
else
version (GNU)
{
T low = a.low, high = a.high;
T quotient = void;
T remainder = void;
mixin(`
asm
{
"`~x86SignedOpPrefix!T~`div`~x86SizeOpSuffix!T~` %4"
: "=a" quotient, "=d" remainder
: "a" low, "d" high, "rm" b;
}
`);
return typeof(return)(quotient, remainder);
}
else
{
auto low = a.low;
auto high = a.high;
T quotient = void;
T remainder = void;
mixin(`
asm
{
mov `~x86RegSizePrefix!T~`AX, low;
mov `~x86RegSizePrefix!T~`DX, high;
`~x86SignedOpPrefix!T~`div b;
mov quotient, `~x86RegSizePrefix!T~`AX;
mov remainder, `~x86RegSizePrefix!T~`DX;
}
`);
return typeof(return)(quotient, remainder);
}
}
else
static assert(false, "Not implemented on this architecture");
}
version (Intel)
unittest
{
assert(longDiv(LongInt!int(1, 0), 1) == DivResult!int(1, 0));
assert(longDiv(LongInt!int(5, 0), 2) == DivResult!int(2, 1));
assert(longDiv(LongInt!int(0, 1), 0x1_0000) == DivResult!int(0x1_0000, 0));
assert(longDiv(LongInt!short(1, 0), short(1)) == DivResult!short(1, 0));
assert(longDiv(LongInt!short(0, 1), short(0x100)) == DivResult!short(0x100, 0));
assert(longDiv(LongInt!short(cast(ushort)-1, -1), short(-1)) == DivResult!short(1));
assert(longDiv(LongInt!ushort(cast(ushort)-1, 0), cast(ushort)-1) == DivResult!ushort(1));
version(X86_64)
{
assert(longDiv(LongInt!long(1, 0), 1L) == DivResult!long(1));
assert(longDiv(LongInt!long(0, 1), 0x1_0000_0000L) == DivResult!long(0x1_0000_0000));
}
}
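Taken together, `longMul` and `longDiv` provide the widening multiply and narrowing divide that `ModQ`'s `*=` is built on. Since Python integers are arbitrary-precision, the same operations can be sketched without inline assembly (function names are illustrative, mirroring the D unittests above):

```python
def long_mul(a: int, b: int, bits: int = 32):
    """Widening multiply: return (low, high) halves of a*b,
    like the x86 MUL instruction used by longMul."""
    mask = (1 << bits) - 1
    p = a * b
    return p & mask, (p >> bits) & mask

def long_div(low: int, high: int, b: int, bits: int = 32):
    """Narrowing divide: (high:low) / b -> (quotient, remainder),
    like the x86 DIV instruction used by longDiv."""
    n = (high << bits) | low
    return n // b, n % b

assert long_mul(0x1_0000, 0x1_0000) == (0, 1)        # as in the unittest
assert long_div(0, 1, 0x1_0000) == (0x1_0000, 0)     # as in the unittest
```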


@@ -1,6 +1,6 @@
 /// Simple source code splitter
 /// Written by Vladimir Panteleev <vladimir@thecybershadow.net>
-/// Released into the Public Domain
+/// License: Boost Software License, Version 1.0
 module splitter;
@@ -17,8 +17,49 @@ import std.string;
import std.traits;
import std.stdio : stderr;
import polyhash;
/// Represents an Entity's position within a program tree.
struct Address
{
Address* parent; /// Upper node's Address. If null, then this is the root node (and index should be 0).
size_t index; /// Index within the parent's children array
size_t depth; /// Distance from the root address
Address*[] children; /// Used to keep a global cached tree of addresses.
ref Address* child(size_t index) const
{
auto mutableThis = cast(Address*)&this; // Break const for caching
if (mutableThis.children.length < index + 1)
mutableThis.children.length = index + 1;
if (!mutableThis.children[index])
mutableThis.children[index] = new Address(mutableThis, index, depth+1);
return mutableThis.children[index];
}
}
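The caching in `child` above means that equal paths always yield the same `Address` object, so addresses can be compared and looked up by identity rather than by walking the path. A Python sketch of the same lazily built tree (a loose translation, not the D API):

```python
class Address:
    """Sketch of splitter.Address: a lazily cached tree of tree positions."""
    def __init__(self, parent=None, index=0, depth=0):
        self.parent, self.index, self.depth = parent, index, depth
        self._children = []          # cache of child Addresses, grown on demand

    def child(self, index):
        # Grow the cache as needed, so the same path always returns
        # the same Address object (identity comparison works).
        while len(self._children) <= index:
            self._children.append(None)
        if self._children[index] is None:
            self._children[index] = Address(self, index, self.depth + 1)
        return self._children[index]

root = Address()
assert root.child(2) is root.child(2)        # cached: same object both times
assert root.child(2).child(0).depth == 2
```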
struct EntityRef /// Reference to another Entity in the same tree
{
Entity entity; /// Pointer - only valid during splitting / optimizing
const(Address)* address; /// Address - assigned after splitting / optimizing
}
enum largest64bitPrime = 18446744073709551557UL; // 0xFFFFFFFF_FFFFFFC5
// static if (is(ModQ!(ulong, largest64bitPrime)))
static if (modQSupported) // https://issues.dlang.org/show_bug.cgi?id=20677
alias EntityHash = PolynomialHash!(ModQ!(ulong, largest64bitPrime));
else
{
pragma(msg,
"64-bit long multiplication/division is not supported on this platform.\n" ~
"Falling back to working in modulo 2^^64.\n" ~
"Hashing / cache accuracy may be impaired.\n" ~
"---------------------------------------------------------------------");
alias EntityHash = PolynomialHash!ulong;
}
/// Represents a slice of the original code.
class Entity
final class Entity
{
string head; /// This node's "head", e.g. "{" for a statement block.
Entity[] children; /// This node's children nodes, e.g. the statements of the statement block.
@@ -29,14 +70,18 @@ class Entity
bool isPair; /// Internal hint for --dump output
bool noRemove; /// Don't try removing this entity (children OK)
bool clean; /// Computed fields are up-to-date
bool removed; /// For dangling dependencies
Entity[] dependencies; /// If any of these entities are omitted, so should this entity.
bool dead; /// Tombstone or redirect
EntityRef[] dependents;/// If this entity is removed, so should all these entities.
Address* redirect; /// If moved, this is where this entity is now
int id; /// For diagnostics
size_t descendants; /// For progress display
DSplitter.Token token; /// Used internally
size_t descendants; /// [Computed] For progress display
EntityHash hash; /// [Computed] Hashed value of this entity's content (as if it were saved to disk).
const(Address)*[] allDependents; /// [Computed] External dependents of this and child nodes
string deadContents; /// [Computed] For --white-out - all of this node's contents, with non-whitespace replaced by whitespace
EntityHash deadHash; /// [Computed] Hash of deadContents
this(string head = null, Entity[] children = null, string tail = null)
{
@@ -45,11 +90,10 @@ class Entity
this.tail = tail;
}
string[] comments;
@property string comment()
{
string[] result = comments;
string[] result;
debug result = comments;
if (isPair)
{
assert(token == DSplitter.Token.none);
@@ -60,10 +104,33 @@ class Entity
return result.length ? result.join(" / ") : null;
}
override string toString()
override string toString() const
{
return "%(%s%) %s %(%s%)".format([head], children, [tail]);
}
Entity dup() /// Creates a shallow copy
{
auto result = new Entity;
foreach (i, item; this.tupleof)
result.tupleof[i] = this.tupleof[i];
result.children = result.children.dup;
return result;
}
void kill() /// Convert to tombstone/redirect
{
dependents = null;
isPair = false;
descendants = 0;
allDependents = null;
dead = true;
}
private: // Used during parsing only
DSplitter.Token token; /// Used internally
debug string[] comments; /// Used to debug the splitter
}
enum Mode
@@ -79,6 +146,7 @@ enum Splitter
words, /// Split by whitespace
D, /// Parse D source code
diff, /// Unified diffs
indent, /// Indentation (Python, YAML...)
}
immutable string[] splitterNames = [EnumMembers!Splitter].map!(e => e.text().toLower()).array();
@@ -95,6 +163,7 @@ struct ParseOptions
bool stripComments;
ParseRule[] rules;
Mode mode;
uint tabWidth;
}
/// Parse the given file/directory.
@@ -123,15 +192,15 @@ Entity loadFiles(ref string path, ParseOptions options)
enum BIN_SIZE = 2;
void optimize(Entity set)
void optimizeUntil(alias stop)(Entity set)
{
static void group(ref Entity[] set, size_t start, size_t end)
static Entity group(Entity[] children)
{
//set = set[0..start] ~ [new Entity(removable, set[start..end])] ~ set[end..$];
auto children = set[start..end].dup;
if (children.length == 1)
return children[0];
auto e = new Entity(null, children, null);
e.noRemove = children.any!(c => c.noRemove)();
set.replaceInPlace(start, end, [e]);
return e;
}
static void clusterBy(ref Entity[] set, size_t binSize)
@@ -141,16 +210,14 @@ void optimize(Entity set)
auto size = set.length >= binSize*2 ? binSize : (set.length+1) / 2;
//auto size = binSize;
auto bins = set.length/size;
if (set.length % size > 1)
group(set, bins*size, set.length);
foreach_reverse (i; 0..bins)
group(set, i*size, (i+1)*size);
set = set.chunks(size).map!group.array;
}
}
static void doOptimize(Entity e)
void doOptimize(Entity e)
{
if (stop(e))
return;
foreach (c; e.children)
doOptimize(c);
clusterBy(e.children, BIN_SIZE);
@@ -159,6 +226,8 @@ void optimize(Entity set)
doOptimize(set);
}
alias optimize = optimizeUntil!((Entity e) => false);
private:
/// Override std.string nonsense, which does UTF-8 decoding
@@ -219,6 +288,9 @@ Entity loadFile(string name, string path, ParseOptions options)
case Splitter.diff:
result.children = parseDiff(result.contents);
return result;
case Splitter.indent:
result.children = parseIndent(result.contents, options.tabWidth);
return result;
}
}
assert(false); // default * rule should match everything
@@ -279,20 +351,24 @@ struct DSplitter
generated0, /// First value of generated tokens (see below)
max = generated0 + tokenLookup.length
max = tokenText.length
}
enum Token[string] tokenLookup = // DMD pr/2824
static immutable string[] tokenText =
{
auto result = new string[Token.generated0];
Token[string] lookup;
auto t = Token.generated0;
Token add(string s)
{
auto p = s in lookup;
if (p)
return *p;
return lookup[s] = t++;
Token t = cast(Token)result.length;
result ~= s;
lookup[s] = t;
return t;
}
foreach (pair; pairs)
@@ -304,21 +380,21 @@ struct DSplitter
foreach (i, synonyms; separators)
foreach (sep; synonyms)
add(sep);
return lookup;
}();
static immutable string[Token.max] tokenText =
{
string[Token.max] result;
foreach (k, v; tokenLookup)
result[v] = k;
return result;
}();
static Token lookupToken(string s)
{
if (!__ctfe) assert(false, "Don't use at runtime");
foreach (t; Token.generated0 .. Token.max)
if (s == tokenText[t])
return t;
assert(false, "No such token: " ~ s);
}
enum Token tokenLookup(string s) = lookupToken(s);
struct TokenPair { Token start, end; }
static Token lookupToken(string s) { return tokenLookup[s]; }
static TokenPair makeTokenPair(Pair pair) { return TokenPair(tokenLookup[pair.start], tokenLookup[pair.end]); }
static TokenPair makeTokenPair(Pair pair) { return TokenPair(lookupToken(pair.start), lookupToken(pair.end)); }
alias lookupTokens = arrayMap!(lookupToken, const string);
static immutable TokenPair[] pairTokens = pairs .arrayMap!makeTokenPair();
static immutable Token[][] separatorTokens = separators.arrayMap!lookupTokens ();
@@ -338,11 +414,11 @@ struct DSplitter
{
switch (t)
{
case tokenLookup[";"]:
case tokenLookup!";":
return SeparatorType.postfix;
case tokenLookup["import"]:
case tokenLookup!"import":
return SeparatorType.prefix;
case tokenLookup["else"]:
case tokenLookup!"else":
return SeparatorType.binary;
default:
if (pairTokens.any!(pair => pair.start == t))
@@ -449,7 +525,7 @@ struct DSplitter
advance();
break;
case 'r':
if (consume(`r"`))
if (consume(`"`))
{
result = Token.other;
while (advance() != '"')
@@ -645,7 +721,6 @@ struct DSplitter
return r;
}
tokenLoop:
while (true)
{
Token token;
@@ -771,7 +846,7 @@ struct DSplitter
for (size_t i=0; i<entities.length;)
{
auto e = entities[i];
if (e.head.empty && e.tail.empty && e.dependencies.empty)
if (e.head.empty && e.tail.empty && e.dependents.empty)
{
assert(e.token == Token.none);
if (e.children.length == 0)
@@ -813,7 +888,7 @@ struct DSplitter
auto head = entities[0..i] ~ group(e.children);
e.children = null;
auto tail = new Entity(null, group(entities[i+1..$]), null);
e.dependencies ~= tail;
tail.dependents ~= EntityRef(e);
entities = group(head ~ e) ~ tail;
foreach (c; entities)
postProcessDependency(c.children);
@@ -825,11 +900,10 @@ struct DSplitter
if (!entities.length)
return;
foreach_reverse (i, e; entities[0..$-1])
if (e.token == tokenLookup["!"] && entities[i+1].children.length && entities[i+1].children[0].token == tokenLookup["("])
if (e.token == tokenLookup!"!" && entities[i+1].children.length && entities[i+1].children[0].token == tokenLookup!"(")
{
auto dependency = new Entity;
e.dependencies ~= dependency;
entities[i+1].children[0].dependencies ~= dependency;
dependency.dependents = [EntityRef(e), EntityRef(entities[i+1].children[0])];
entities = entities[0..i+1] ~ dependency ~ entities[i+1..$];
}
}
@@ -838,27 +912,22 @@ struct DSplitter
{
foreach (i, e; entities)
if (i && !e.token && e.children.length && getSeparatorType(e.children[0].token) == SeparatorType.binary && !e.children[0].children)
e.children[0].dependencies ~= entities[i-1];
entities[i-1].dependents ~= EntityRef(e.children[0]);
}
static void postProcessBlockKeywords(ref Entity[] entities)
{
for (size_t i=0; i<entities.length;)
foreach_reverse (i; 0 .. entities.length)
{
if (blockKeywordTokens.canFind(entities[i].token) && i+1 < entities.length)
{
auto j = i + 1;
if (j < entities.length && entities[j].token == tokenLookup["("])
if (j < entities.length && entities[j].token == tokenLookup!"(")
j++;
j++; // ; or {
if (j <= entities.length)
{
entities = entities[0..i] ~ group(group(entities[i..j-1]) ~ entities[j-1..j]) ~ entities[j..$];
continue;
}
}
i++;
}
}
@@ -879,23 +948,23 @@ struct DSplitter
return false;
}
if (consume(tokenLookup["if"]) || consume(tokenLookup["static if"]))
consume(tokenLookup["else"]);
if (consume(tokenLookup!"if") || consume(tokenLookup!"static if"))
consume(tokenLookup!"else");
else
if (consume(tokenLookup["do"]))
consume(tokenLookup["while"]);
if (consume(tokenLookup!"do"))
consume(tokenLookup!"while");
else
if (consume(tokenLookup["try"]))
if (consume(tokenLookup!"try"))
{
while (consume(tokenLookup["catch"]))
while (consume(tokenLookup!"catch"))
continue;
consume(tokenLookup["finally"]);
consume(tokenLookup!"finally");
}
if (i == j)
{
j++;
while (consume(tokenLookup["in"]) || consume(tokenLookup["out"]) || consume(tokenLookup["body"]))
while (consume(tokenLookup!"in") || consume(tokenLookup!"out") || consume(tokenLookup!"body"))
continue;
}
@@ -917,7 +986,7 @@ struct DSplitter
{
// Create pair entities
if (entities[i].token == tokenLookup["{"])
if (entities[i].token == tokenLookup!"{")
{
if (i >= lastPair + 1)
{
@@ -932,7 +1001,7 @@ struct DSplitter
lastPair = i + 1;
}
else
if (entities[i].token == tokenLookup[";"])
if (entities[i].token == tokenLookup!";")
lastPair = i + 1;
i++;
@@ -948,7 +1017,7 @@ struct DSplitter
auto pparen = firstHead(entities[i+1]);
if (pparen
&& *pparen !is entities[i+1]
&& pparen.token == tokenLookup["("])
&& pparen.token == tokenLookup!"(")
{
auto paren = *pparen;
*pparen = new Entity();
@@ -1041,7 +1110,7 @@ struct DSplitter
if (entity.token == Token.other && isValidIdentifier(id) && !entity.tail && !entity.children)
lastID = id;
else
if (lastID && entity.token == tokenLookup["("])
if (lastID && entity.token == tokenLookup!"(")
{
size_t[] stack;
struct Comma { size_t[] addr, after; }
@@ -1060,7 +1129,7 @@ struct DSplitter
afterComma = false;
}
if (entity.token == tokenLookup[","])
if (entity.token == tokenLookup!",")
{
commas ~= Comma(stack);
//entity.comments ~= "Comma %d".format(commas.length);
@@ -1112,7 +1181,7 @@ struct DSplitter
return;
}
else
if (entity.token == tokenLookup["!"])
if (entity.token == tokenLookup!"!")
{}
else
if (entity.head || entity.tail)
@@ -1142,7 +1211,7 @@ struct DSplitter
debug e.comments ~= "%s param %d".format(id, i);
funRoot.children ~= e;
foreach (arg; args)
arg.dependencies ~= e;
e.dependents ~= EntityRef(arg);
}
}
}
@@ -1241,6 +1310,50 @@ Entity[] parseDiff(string s)
;
}
Entity[] parseIndent(string s, uint tabWidth)
{
Entity[] root;
Entity[]*[] stack;
foreach (line; s.split2!("\n", ""))
{
size_t indent = 0;
charLoop:
foreach (c; line)
switch (c)
{
case ' ':
indent++;
break;
case '\t':
indent += tabWidth;
break;
case '\r':
case '\n':
// Treat empty (whitespace-only) lines as belonging to the
// immediately higher (most-nested) block.
indent = stack.length;
break charLoop;
default:
break charLoop;
}
auto e = new Entity(line);
foreach_reverse (i; 0 .. min(indent, stack.length)) // non-inclusively up to indent
if (stack[i])
{
*stack[i] ~= e;
goto parentFound;
}
root ~= e;
parentFound:
stack.length = indent + 1;
stack[indent] = &e.children;
}
return root;
}
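The stack-of-indent-levels approach above, which powers the new "indent" split mode, can be sketched in Python (a loose translation of `parseIndent`, with dicts standing in for `Entity`):

```python
def parse_indent(text: str, tab_width: int = 8):
    """Build a tree where each line becomes a node, nested under the
    nearest less-indented line above it."""
    root = []     # top-level nodes
    stack = []    # stack[i]: children list owning indent level i, or None
    for line in text.splitlines():
        indent = 0
        for c in line:
            if c == ' ':
                indent += 1
            elif c == '\t':
                indent += tab_width
            else:
                break
        if not line.strip():
            indent = len(stack)   # blank lines belong to the innermost block
        node = {'line': line, 'children': []}
        # Find the nearest enclosing block strictly above this indent level.
        for i in range(min(indent, len(stack)) - 1, -1, -1):
            if stack[i] is not None:
                stack[i].append(node)
                break
        else:
            root.append(node)
        # Truncate the stack to this level and register this node's
        # children list as the owner of indent level `indent`.
        del stack[indent:]
        stack.extend([None] * (indent - len(stack)))
        stack.append(node['children'])
    return root

tree = parse_indent("a\n  b\n  c\nd")
assert [n['line'] for n in tree] == ['a', 'd']
assert [n['line'] for n in tree[0]['children']] == ['  b', '  c']
```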
private:
bool isNewline(char c) { return c == '\r' || c == '\n'; }


@@ -74,8 +74,8 @@ changed: $(ROOT)/changed
 dman: $(ROOT)/dman
 dustmite: $(ROOT)/dustmite
-$(ROOT)/dustmite: DustMite/dustmite.d DustMite/splitter.d
-	$(DMD) $(DFLAGS) -version=Dlang_Tools DustMite/dustmite.d DustMite/splitter.d -of$(@)
+$(ROOT)/dustmite: DustMite/dustmite.d DustMite/splitter.d DustMite/polyhash.d
+	$(DMD) $(DFLAGS) -version=Dlang_Tools DustMite/dustmite.d DustMite/splitter.d DustMite/polyhash.d -of$(@)
 $(TOOLS) $(DOC_TOOLS) $(CURL_TOOLS) $(TEST_TOOLS): $(ROOT)/%: %.d
 	$(DMD) $(DFLAGS) -of$(@) $(<)


@@ -65,8 +65,8 @@ $(ROOT)\rdmd.exe : rdmd.d
 $(ROOT)\ddemangle.exe : ddemangle.d
 	$(DMD) $(DFLAGS) -of$@ ddemangle.d
-$(ROOT)\dustmite.exe : DustMite/dustmite.d DustMite/splitter.d
-	$(DMD) $(DFLAGS) -of$@ -version=Dlang_Tools DustMite/dustmite.d DustMite/splitter.d
+$(ROOT)\dustmite.exe : DustMite/dustmite.d DustMite/splitter.d DustMite/polyhash.d
+	$(DMD) $(DFLAGS) -of$@ -version=Dlang_Tools DustMite/dustmite.d DustMite/splitter.d DustMite/polyhash.d
 $(ROOT)\changed.exe : changed.d
 	$(DMD) $(DFLAGS) -of$@ changed.d