Make -linkonce-templates *not* tamper with the general template
emission algorithm anymore (so on top of default non-allinst or
-allinst modes), and keep those tweaks as experimental
-linkonce-templates-aggressive.
Compiling the druntime/Phobos unittests is only marginally slowed
down compared to the more aggressive variant (~1.5% for debug,
~2.5% for release). It does show some rough 10% increase in required
memory, but that's in line with non-linkonce-templates.
The more aggressive variant has the advantage of skipping
needsCodegen() and potentially codegen'ing less symbols. The problem
is that if an instantiated symbol isn't explicitly referenced, for
instance a CRT ctor, it might not be codegen'd at all.
* Improve ftime-trace implementation.
- Rewrite ftime-trace to our own implementatation instead of using LLVM's time trace code. The disadvantage is that this removes LLVM's work from the trace (optimization), but has the large benefit of being able to tailor the tracing output to our needs.
- Add memory tracing to ftime-trace (not possible with LLVM's implementation)
- Do not output the sum for each "category"/named string. This causes the LLVM output to be _very_ long, because we put more information in each time segment name. Tooling that processes the time trace output can do this summing itself (i.e. Tracy), and makes the time trace much more pleasant to view in trace viewers.
- Use MonoTime, move timescale calculation to output stage, 'measurement' stage uses ticks as unit
- Fix crash on `ldc2 -ftime-trace` without files passed.
To pave the way for simple DLL generation (end goal:
druntime-ldc-shared.dll).
The dllexport storage class for functions definitions is enough; the
automatically generated import .lib seems to resolve the regular symbol
name to the actual symbol (__imp_*), so dllimport for declarations seems
superfluous.
For global variables, things are apparently different unfortunately.
And streamline implicit -singleobj with DMD, also enforcing it when
compiling a *single* module *and* specifying the -of name.
[-makedeps currently depends on -singleobj.]
This uses LLVM's TimeProfiler functionality, such that LLVM's work is traced in the same profile (optimization and machine code gen).
Functionality is meant to be identical to Clang and LLD's --ftime-trace.
I.e., *define* templated symbols in each referencing compilation unit
when using discardable linkonce_odr linkage, analogous to C++.
This makes each compilation unit self-sufficient wrt. templated symbols,
which also means increased opportunity for inlining and less need for
LTO. There should be no more undefined symbol issues caused by buggy
template culling.
The biggest advantage is that the optimizer can discard unused
linkonce_odr symbols early instead of optimizing and forwarding to the
assembler. So this is especially useful with -O to decrease compilation
times and can at least in some scenarios greatly outweigh the
(potentially very much) higher number of symbols defined by the glue
layer.
Libraries compiled with -linkonce-templates can generally not be linked
against dependent code compiled without -linkonce-templates; the other
way around works.
Incl. making sure `-cov=N ... -cov[=ctfe]` doesn't reset the required
percentage to 0.
Use a dummy *bool* option for a better help output (displaying `--cov`,
not `--cov=<value>`).