The Ethereum platform has attracted considerable interest in recent years, not only from academic fields such as computer science, economics, and law, but also from the financial sector and law enforcement.
Even though the platform is now more than five years old, little open-source tooling is available to efficiently analyze the entire data corpus. One notable exception is ethereumetl, a project that converts Ethereum’s basic data structures into a relational representation, i.e., SQL tables or CSV files. Ethereumetl relies on off-the-shelf database software to facilitate the analysis and interpretation of the dataset.
With EtherSci, we set out to build a purpose-built database system for efficient and intuitive analysis of data residing on the Ethereum platform. The system architecture is inspired by BlockSci, a popular Bitcoin analysis platform developed at Princeton University. Although the architecture is set, Ethereum's data representations pose many unique challenges that still need to be overcome.
The aim of each thesis is to advance the development of EtherSci and thus make data on the platform more accessible and transparent to interested observers.
Topics could be:
- Compact representation for smart contract calls, events, return values etc.
- Implementation of a scripting frontend (e.g., Python)
- EVM integration for re-computation/simulation of state transitions
- De-duplication approaches for deployed source code
- Plug-in system to support different smart contract data representations
- Web dashboard and configuration system
Your own ideas are welcome. We plan to assign several theses on EtherSci-related projects.
What we have so far:
- EtherSci prototype written in Rust; support for the main data structures, efficient point queries and in-order traversal
- Basic de-duplication of addresses etc.
- Rainbow table lookup for smart contract calls to resolve parameters
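To give a feel for the last item, here is a minimal sketch of a precomputed lookup from 4-byte function selectors to function signatures. It is not taken from the EtherSci code base: the type `SelectorTable` and its methods are hypothetical names, and we assume that "rainbow table" here means a precomputed selector-to-signature map. The four selectors are the well-known ERC-20 values. A real table would derive them as the first four bytes of the Keccak-256 hash of each signature rather than hard-coding them.

```rust
use std::collections::HashMap;
use std::convert::TryInto;

/// Hypothetical lookup table from 4-byte function selectors to signatures.
/// (Illustrative only; not the actual EtherSci data structure.)
struct SelectorTable {
    entries: HashMap<[u8; 4], &'static str>,
}

impl SelectorTable {
    /// Builds a table pre-filled with a few well-known ERC-20 selectors.
    /// Assumption: the selectors are hard-coded here instead of being
    /// computed with Keccak-256, to keep the sketch dependency-free.
    fn with_erc20_defaults() -> Self {
        let mut entries = HashMap::new();
        entries.insert([0xa9, 0x05, 0x9c, 0xbb], "transfer(address,uint256)");
        entries.insert([0x09, 0x5e, 0xa7, 0xb3], "approve(address,uint256)");
        entries.insert([0x70, 0xa0, 0x82, 0x31], "balanceOf(address)");
        entries.insert(
            [0x23, 0xb8, 0x72, 0xdd],
            "transferFrom(address,address,uint256)",
        );
        Self { entries }
    }

    /// Resolves the signature of a call from its raw input data.
    /// Returns `None` if the data is shorter than four bytes or the
    /// selector is not in the table.
    fn resolve(&self, calldata: &[u8]) -> Option<&'static str> {
        let selector: [u8; 4] = calldata.get(..4)?.try_into().ok()?;
        self.entries.get(&selector).copied()
    }
}

fn main() {
    let table = SelectorTable::with_erc20_defaults();

    // Call data of a `transfer` call: selector followed by two 32-byte words.
    let mut calldata = vec![0xa9, 0x05, 0x9c, 0xbb];
    calldata.extend_from_slice(&[0u8; 64]);

    match table.resolve(&calldata) {
        Some(signature) => println!("resolved: {}", signature),
        None => println!("unknown selector"),
    }
}
```

Once the signature is known, the parameter types it lists (here `address` and `uint256`) indicate how to decode the remaining bytes of the call data.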