Robust binary support

TODO understanding _start

Without symbols, the only known entry point is _start, which does little more than call a library function. However, _start has a regular structure, so the address of main can be recovered from it with some care. Unfortunately this cannot necessarily be done portably, so I would like to move it out of the core.
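
A minimal sketch of such a heuristic, assuming an x86-64 binary with a glibc-style _start that loads the address of main into rdi before calling the library's startup function. The function name guess_main_from_start is hypothetical, and the byte scan is not a real disassembler, so it can be fooled by patterns that straddle instruction boundaries.

```python
import struct
from typing import Optional


def guess_main_from_start(code: bytes, start_addr: int) -> Optional[int]:
    """Guess main's address from the raw bytes of an x86-64 _start stub.

    Assumption: main is the last value loaded into rdi inside the stub,
    either as an absolute immediate (non-PIE) or rip-relative (PIE).
    """
    guess = None
    for off in range(len(code) - 6):
        opcode = code[off:off + 3]
        if opcode == b"\x48\xc7\xc7":
            # mov rdi, imm32: absolute address (assumed to lie below 2 GiB)
            guess = struct.unpack_from("<I", code, off + 3)[0]
        elif opcode == b"\x48\x8d\x3d":
            # lea rdi, [rip+disp32]: rip points past this 7-byte instruction
            disp = struct.unpack_from("<i", code, off + 3)[0]
            guess = start_addr + off + 7 + disp
    return guess
```

Other architectures and C libraries need their own patterns, which is one more argument for keeping this logic outside the core.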


TODO finding _start for !ELF

COFF and Mach-O binaries have an entry point too. Unfortunately LLVM does not expose this information directly, so the code currently does some magic for ELF but nothing for other formats. It might be possible to find something in LLDB, which we may want long-term anyway to add debugging capabilities.

TODO finding plausible function prologs

Some nice heuristics (implemented as plugins) to recover at least some functions even if we cannot properly find function calls.
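
As an illustration of what one such plugin might look like, here is a sketch that scans for the classic x86 frame-setup byte sequences. The plugin interface (a callable from bytes to offsets) and all names are assumptions made for this example, not the project's actual plugin API.

```python
from typing import Callable, Iterable, List

# Hypothetical plugin interface: takes raw code bytes, yields candidate offsets.
Plugin = Callable[[bytes], Iterable[int]]


def pattern_plugin(pattern: bytes) -> Plugin:
    """Build a plugin that reports every occurrence of a fixed byte pattern."""
    def scan(code: bytes) -> Iterable[int]:
        off = code.find(pattern)
        while off != -1:
            yield off
            off = code.find(pattern, off + 1)
    return scan


PLUGINS: List[Plugin] = [
    pattern_plugin(b"\x55\x48\x89\xe5"),  # x86-64: push rbp; mov rbp, rsp
    pattern_plugin(b"\x55\x89\xe5"),      # x86-32: push ebp; mov ebp, esp
]


def candidate_functions(code: bytes, base: int,
                        plugins: Iterable[Plugin] = PLUGINS) -> List[int]:
    """Collect the candidate function addresses reported by all plugins."""
    hits = {base + off for plugin in plugins for off in plugin(code)}
    return sorted(hits)
```

The results are only candidates: optimized code often omits the frame pointer, and the same bytes can occur inside data or in the middle of another instruction.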


TODO routing data through serializing point

Send all relevant data to some management instance. The management instance then notifies all stakeholders of the update. This would make sure the resulting state of the system is reproducible from the serialized stream and is network-streamable.

One will need to take care of potential performance issues.
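
The idea can be sketched as an append-only log with subscribers. The class UpdateHub, the JSON encoding and the update fields are all hypothetical and chosen only to keep the example short.

```python
import json
from typing import Callable, Dict, List


class UpdateHub:
    """Hypothetical management instance: every update passes through publish()."""

    def __init__(self) -> None:
        self.log: List[str] = []  # the serialized stream, append-only
        self.subscribers: List[Callable[[Dict], None]] = []

    def subscribe(self, callback: Callable[[Dict], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, update: Dict) -> None:
        line = json.dumps(update, sort_keys=True)  # serialize first ...
        self.log.append(line)
        for callback in self.subscribers:          # ... then notify from the
            callback(json.loads(line))             # serialized form only

    def replay(self, callback: Callable[[Dict], None]) -> None:
        """Feed the whole stream to a late joiner."""
        for line in self.log:
            callback(json.loads(line))


def apply_update(state: Dict[str, str], update: Dict) -> None:
    """Toy stakeholder: keeps a mapping from address to comment."""
    state[update["address"]] = update["comment"]
```

Because stakeholders only ever see the deserialized form, a state rebuilt by replay() matches the live one. The extra encode/decode round trip on every update is exactly the kind of cost the performance note refers to.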

TODO read/write files

What we actually want is a bag of transactions and a logical (partial) order on them. The idea is to mostly store XML in a ZIP container.
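
A rough sketch of such a container, assuming one XML file per transaction and an <after ref="..."/> element for each ordering constraint. The element names, the tx/ directory layout and both function names are invented for this example. It needs Python 3.9 or later for graphlib.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter


def write_container(transactions) -> bytes:
    """transactions: iterable of (tx_id, ids_it_must_follow, payload_text)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for tx_id, deps, payload in transactions:
            root = ET.Element("transaction", id=tx_id)
            for dep in deps:
                ET.SubElement(root, "after", ref=dep)
            ET.SubElement(root, "payload").text = payload
            zf.writestr(f"tx/{tx_id}.xml", ET.tostring(root, encoding="unicode"))
    return buf.getvalue()


def read_in_order(container: bytes):
    """Return (tx_id, payload) pairs in an order that respects every <after>."""
    payloads, graph = {}, {}
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        for name in zf.namelist():
            root = ET.fromstring(zf.read(name))
            tx_id = root.get("id")
            graph[tx_id] = {a.get("ref") for a in root.findall("after")}
            payloads[tx_id] = root.findtext("payload")
    order = TopologicalSorter(graph).static_order()
    return [(tx_id, payloads[tx_id]) for tx_id in order]
```

The order in which files are stored in the archive does not matter; only the recorded constraints do, which is what makes the container a bag rather than a list.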

TODO network stuff

If we can serialize to ZIP containers, we can stream them over the network. The idea is to use XMPP as transport, with a central instance that supplies new participants with all past information and a MUC (multi-user chat) room to which updates are sent.

Master/slave setups should be easy. Normal editing should also be almost conflict-free. It probably needs some locking to avoid conflicts when scripts run over the whole binary and add information everywhere.

Graph layout

TODO special entry (exit?) points

A block B0 carrying metainformation about the function. This also gives the graph a ⊤ and a ⊥, which is nice for all kinds of algorithms.
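
A small sketch of the graph side of this, assuming the control-flow graph is a plain mapping from block name to successor list. The names ENTRY and EXIT and the function add_virtual_nodes are placeholders for this example.

```python
from typing import Dict, List

ENTRY, EXIT = "ENTRY", "EXIT"  # hypothetical names for the virtual blocks


def add_virtual_nodes(cfg: Dict[str, List[str]],
                      first_block: str) -> Dict[str, List[str]]:
    """Return a copy of cfg with a unique entry and a unique exit node."""
    out = {block: list(succs) for block, succs in cfg.items()}
    for succs in out.values():
        if not succs:              # block without successors: route it to EXIT
            succs.append(EXIT)
    out[ENTRY] = [first_block]     # the virtual entry leads into the function
    out[EXIT] = []
    return out
```

With a single source and a single sink, algorithms such as dominator and post-dominator computation need no special cases for functions with several return blocks.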

TODO routing edges

Edges that skip one layer of blocks sometimes pass through (below) other blocks. Fixing that turns the whole thing into a complex problem, while currently it is a set of rather simple heuristics.

TODO Dominator -> arrange blocks one below the other if totally dominated

Currently blocks are ordered by address. Sometimes a block comes semantically strictly after another but appears before it in the address space. It would be visually nicer to use a semantic order, so that backward (upward) edges really only occur in loops.
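
One simple way to approximate such an order is a reverse postorder of a depth-first search. This is a swapped-in technique rather than the dominator-based placement named in the heading, but for reducible control flow it leaves only loop back edges pointing upward. The sketch assumes the graph is a mapping from block name to successor list; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def reverse_postorder(cfg: Dict[str, List[str]], entry: str) -> List[str]:
    """Order blocks so that every non-loop edge points downward."""
    seen, postorder = set(), []

    def visit(block: str) -> None:  # recursive; very deep graphs need a stack
        seen.add(block)
        for succ in cfg.get(block, []):
            if succ not in seen:
                visit(succ)
        postorder.append(block)

    visit(entry)
    return postorder[::-1]


def upward_edges(cfg: Dict[str, List[str]],
                 order: List[str]) -> List[Tuple[str, str]]:
    """Edges whose target is placed at or above their source in `order`."""
    rank = {block: i for i, block in enumerate(order)}
    return [(src, dst) for src in order for dst in cfg.get(src, [])
            if rank[dst] <= rank[src]]
```

For a function whose blocks are scattered in the address space, comparing upward_edges() for the address order and for the reverse postorder shows how many upward edges are artifacts of the layout rather than real loops.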


TODO design reasonable API

SWIG? We really want an API that looks native in every supported scripting language and that is semantically the same across all of them.

TODO python

We have Guile implemented and working. Python seems to be highly popular, so we will want to support it as well in the not too distant future.

Non-.text stuff

TODO finding data and strings

Identify data types by interpreting the instruction sequences that reference a datum. We probably want to collect all instructions referencing an address in the data segments and do some type narrowing based on that. Function calls can probably contribute as well.
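
Type narrowing can be sketched as intersecting candidate sets: every referencing instruction rules some types out. The evidence labels and the candidate sets below are purely illustrative assumptions; a real implementation would derive them from the decoded instructions.

```python
from typing import Dict, Iterable, Set

# Hypothetical mapping from an observation about one referencing instruction
# to the types that remain plausible for the referenced datum.
EVIDENCE: Dict[str, Set[str]] = {
    "read_1_byte": {"char", "int8"},
    "read_4_bytes": {"int32", "float"},
    "read_8_bytes": {"int64", "double", "pointer"},
    "passed_to_float_op": {"float", "double"},
    "value_dereferenced": {"pointer"},
}


def narrow(observations: Iterable[str]) -> Set[str]:
    """Intersect the candidate sets of all observations for one datum.

    No observations leaves every type possible; an empty result means the
    observations contradict each other (for example a union or a wrong guess).
    """
    candidates = set().union(*EVIDENCE.values())
    for obs in observations:
        candidates &= EVIDENCE.get(obs, candidates)  # unknown labels: no effect
    return candidates
```

Arguments passed to functions with known prototypes would fit in as further observations with their own candidate sets.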

Annotating stuff

TODO notification of annotations to stakeholders

TODO Configuration stuff


TODO build up instruction analysis for !arm !x86

TODO instruction alignment on RISC

TODO stop on decoding error?

TODO hlt in _start / general

TODO do not create functions for plt entries

TODO blocks not displayed in i4/cip


Deduce structure

TODO Natural loops

TODO trivial control-flow-split

Non-.text stuff

TODO plt stuff and finding API via parsing /usr/include

We already have a C parser (clang), so when we see a call to function@plt we can look for potential prototypes of that function in /usr/include, which would give nice type information for C libraries. We also want to display the man page for that function if available.