Text publishing technologies for Ethereum dapps

Posted January 18, 2016 by Jonathan Brown ‐ 13 min read

Originally published on jonathanpatrick.me. Retrieved from the Wayback Machine.

Building something like a Reddit or Twitter dapp (distributed app) on Ethereum is certainly an attractive idea, but the obvious question is "where do the messages get stored?" If an autonomous Git or Wikipedia were built on Ethereum, the storage requirements would be even greater.

As outlined in a previous post, compressing text before storing it is a very good idea, especially if the storage medium is expensive.
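
As a rough illustration of the saving, the sketch below compresses a short message with Python's standard zlib module. zlib is used here only because it ships with Python; it is a stand-in, not the compressor from the earlier post, and the sample message is made up. Real ratios depend on the compressor and on the text.

```python
import zlib


def compress_text(text: str) -> bytes:
    """Encode text as UTF-8 and compress it at the highest zlib level."""
    return zlib.compress(text.encode("utf-8"), 9)


def decompress_text(blob: bytes) -> str:
    """Inverse of compress_text."""
    return zlib.decompress(blob).decode("utf-8")


# Made-up sample message, for illustration only.
message = (
    "Storing messages on a blockchain is expensive, so every byte counts. "
    "Compressing a message before storing it means fewer bytes to pay for, "
    "and the reader simply decompresses the message after fetching it."
)

blob = compress_text(message)
print(len(message.encode("utf-8")), "bytes before,", len(blob), "bytes after")
```

Very short messages may not shrink at all with a general-purpose compressor, because the format's fixed overhead can outweigh the saving.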

There are many different decentralized technologies that can be used to store and share these blobs. A key factor is permanence. I have analysed the properties of some of these technologies, ordered from most to least permanent:

  1. Contract State

    When a smart contract executes a transaction it can modify its state. The latest version of every contract's state is always available.

    • Cost

      Most expensive. Each non-zero byte in the transaction payload costs 68 gas. That's 2,176 gas for 32 bytes. The smart contract must then store the blob in its state. Using a new 32-byte state slot costs 20,000 gas. Overwriting an existing slot costs 5,000. Zeroing an existing slot gives a refund of 15,000. There is an ongoing discussion about charging rent for state storage.

      0.050589 USD / kB at time of writing. This would be expected to go down in the long term as Ethereum becomes more efficient.

      It should be noted that if an external storage system is used, an identifier would probably need to be stored in contract state anyway. This should be taken into account when comparing costs.
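
The gas figures above can be combined into a rough estimate. This is a minimal sketch under stated assumptions: every payload byte is non-zero (roughly true for compressed data), every slot is new, and the base transaction fee and contract execution overhead are ignored. The function name is hypothetical.

```python
GAS_PER_NONZERO_PAYLOAD_BYTE = 68  # transaction payload, per non-zero byte
GAS_PER_NEW_SLOT = 20_000          # writing a previously unused 32-byte slot
SLOT_SIZE = 32                     # bytes per state slot


def state_storage_gas(blob_size: int) -> int:
    """Gas to send a blob as payload and store it in fresh state slots.

    Assumes all payload bytes are non-zero and ignores the base
    transaction fee and execution overhead.
    """
    slots = -(-blob_size // SLOT_SIZE)  # ceiling division
    return blob_size * GAS_PER_NONZERO_PAYLOAD_BYTE + slots * GAS_PER_NEW_SLOT


print(32 * GAS_PER_NONZERO_PAYLOAD_BYTE)  # 2176, the payload figure quoted above
print(state_storage_gas(32))              # 22176
print(state_storage_gas(1024))            # 709632
```

Under these assumptions the slot writes account for roughly nine tenths of the total, so the payload cost is the minor term.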

    • Access to blob contents from a contract

      A contract can potentially read or write a blob stored in state, although reading may not be useful if the blob is compressed or encrypted. Autonomous bots could be written to respond to and/or create posts.

    • Max size

      4kB with current block gas limit, but this will increase in future.

    • Longevity

      Forever.

    • Propagation time

      Instantaneous. Blobs can be read from the state of the "pending" block if not yet mined.

    • Redundancy

      Every full node in the network (on the correct shard), including those that are pruning, will maintain a copy of the blob.

    • Sharding

      Once Ethereum becomes a sharding blockchain it will be possible for contracts to be on separate shards, so only nodes syncing a contract's shard would be storing the blobs for that contract.

    • Full node read performance

      A full node (with the correct shard) has the whole contract state (even if it is pruning) so read is instantaneous.

    • Light client read performance

      Light clients will be able to read contract state from full nodes and each other. The performance characteristics of this look very promising.

    • Technology maturity

      Operational. No sharding yet.

    • Use cases

      • Very small blobs, e.g. single line text fields.
      • Blobs that you consider to be of very high importance and for which you are prepared to pay a premium to archive for eternity.

  2. Transaction Log

    When a smart contract executes a transaction it can emit events to the transaction log. For example, a dapp user interface could update the display when an event occurs. These transaction logs can also be used for storing blobs of data. Storing in contract state the number of the block that the log was written in is a good idea because this can make it much easier to find the log. This will become even more important with light clients and sharding nodes. Typically the block number would be stored anyway to timestamp the message.

    • Cost

      Expensive. Each non-zero byte of data in the transaction payload costs 68 gas. Writing a blob to a log costs 8 gas per byte. Typically a log would be written anyway and the blob can just piggyback on that.

      0.005548 USD / kB at time of writing. This would be expected to go down in the long term as Ethereum becomes more efficient.

      A typical 195 byte comment compresses to 117 bytes with Brotli. This currently costs 0.000649116 USD.
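
The arithmetic behind these figures can be sketched as follows. The sketch assumes every byte is non-zero, ignores the fixed per-log and per-transaction costs, and scales the quoted per-kB price linearly with 1 kB taken as 1000 bytes, which is the reading that reproduces the comment cost above. The function names are hypothetical.

```python
GAS_PER_PAYLOAD_BYTE = 68  # non-zero byte in the transaction payload
GAS_PER_LOG_BYTE = 8       # byte of log data
USD_PER_KB = 0.005548      # the per-kB figure quoted above, at time of writing


def log_storage_gas(blob_size: int) -> int:
    """Gas for a blob's payload plus its log data.

    Assumes all bytes are non-zero; ignores fixed log and transaction costs.
    """
    return blob_size * (GAS_PER_PAYLOAD_BYTE + GAS_PER_LOG_BYTE)


def log_storage_usd(blob_size: int) -> float:
    """Scale the quoted per-kB price linearly, taking 1 kB = 1000 bytes."""
    return USD_PER_KB * blob_size / 1000


print(log_storage_gas(117))  # 8892
print(log_storage_usd(117))  # approximately 0.000649116
```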

      In theory the cost could be reduced by only storing the blob in the transaction payload and not in the log, but this has a number of disadvantages:

      • Every transaction in the block needs to be checked to confirm that it was sent to the correct address, and then its payload must be hashed to check whether it is the correct transaction. For recent transactions there would be ambiguity over which block the transaction is in, so multiple blocks would need to be checked.
      • This technique may not work well or at all for light clients or blocks that have been pruned.
      • Unique blob-searching code would have to be written for blobs that are relayed or generated by other contracts.
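
To make the first disadvantage concrete, here is a minimal sketch of the scan a client would have to perform. The Tx and Block types, the addresses, and the find_blob helper are all hypothetical, and SHA-256 stands in for the hash function (Ethereum itself uses Keccak-256, which Python's hashlib does not provide). A real payload would also carry call data around the blob.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Tx:
    to: str         # recipient address
    payload: bytes  # transaction input data


@dataclass
class Block:
    number: int
    transactions: List[Tx]


def blob_hash(data: bytes) -> bytes:
    # Stand-in hash for illustration; not the hash Ethereum uses.
    return hashlib.sha256(data).digest()


def find_blob(blocks: List[Block], contract: str,
              wanted: bytes) -> Optional[Tuple[int, bytes]]:
    """Scan candidate blocks for a payload sent to `contract` whose hash is `wanted`."""
    for block in blocks:
        for tx in block.transactions:
            if tx.to != contract:
                continue  # wrong recipient, skip without hashing
            if blob_hash(tx.payload) == wanted:
                return block.number, tx.payload
    return None


blocks = [
    Block(100, [Tx("0xother", b"unrelated"), Tx("0xforum", b"first post")]),
    Block(101, [Tx("0xforum", b"second post")]),
]
print(find_blob(blocks, "0xforum", blob_hash(b"second post")))  # (101, b'second post')
```

Every candidate block is walked in full, which is the overhead that writing the blob to a log avoids.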

    • Access to blob contents from a contract

      A contract can write a blob. It is not possible for a contract to read from the transaction log without an oracle.

    • Max size

      40kB with current block gas limit, but this will increase in future.
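
The size limits follow from dividing the block gas limit by the per-byte costs. The sketch below assumes a block gas limit of 3,141,592, which is not stated in this article and should be treated as an assumption, and it ignores the base transaction fee and execution overhead. The function names are hypothetical.

```python
BLOCK_GAS_LIMIT = 3_141_592  # assumed limit at time of writing; not from the article
GAS_PER_PAYLOAD_BYTE = 68    # non-zero byte in the transaction payload
GAS_PER_LOG_BYTE = 8         # byte of log data
GAS_PER_NEW_SLOT = 20_000    # new 32-byte state slot
SLOT_SIZE = 32


def max_log_blob(gas_limit: int) -> int:
    """Largest blob in bytes whose payload and log data fit in one block."""
    return gas_limit // (GAS_PER_PAYLOAD_BYTE + GAS_PER_LOG_BYTE)


def max_state_blob(gas_limit: int) -> int:
    """Largest blob in bytes (whole slots) that fits in state within one block."""
    gas_per_slot = SLOT_SIZE * GAS_PER_PAYLOAD_BYTE + GAS_PER_NEW_SLOT
    return (gas_limit // gas_per_slot) * SLOT_SIZE


print(max_log_blob(BLOCK_GAS_LIMIT))    # 41336
print(max_state_blob(BLOCK_GAS_LIMIT))  # 4512
```

Under these assumptions the results come out at roughly 40 kB for logs and 4 kB for contract state, in line with the Max size figures given for each.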

    • Longevity

      Forever, although really old blobs may only be stored in archival nodes which may charge for access.

    • Propagation time

      Immediate. Geth currently requires one confirmation before the log can be read, although it is possible to write code to extract a blob from transaction payload as described in the cost section above.

      I am an advocate of being able to read zero-confirmation log events in Geth.

    • Redundancy

      Transaction logs are stored by every full node, although at some point it will be possible for full nodes to prune. Pruning nodes will not retain older transactions, blocks, and logs. This means there will be less redundancy for older blobs. Very old blobs might only be stored in archival nodes that might charge for access.

    • Sharding

      Currently all nodes store everything. This means to synchronize with "light" contracts you also need to synchronize with "heavy" contracts such as ones storing blobs in transaction logs that you may not have any use for. This is undesirable. Once Ethereum becomes a sharding blockchain it will be possible for a contract to be on its own shard and nodes only have to synchronize with heavy shards if they want to.

      Potentially each language could be on its own shard. Non-light client users could sync the shards of the languages they understand.

    • Full node read performance

      Instantaneous if the node has not pruned the block with the log. Otherwise, revert to light client behaviour.

    • Light client read performance

      Light clients will be able to read contract state from full nodes and other light clients. The performance characteristics of this look very promising. Older blobs will have worse retrieval performance as fewer nodes will have the blob due to pruning.

    • Technology maturity

      Operational. No sharding yet.

      Currently each blob stored in a log has to be stored twice on each node: once in the transaction payload and once in the log. Ideally Ethereum would have some sort of blob store built in so that blobs could be attached to transactions and then simply referred to by an identifier. These blobs would also be able to be read and written by contracts.

    • Use cases

      Potentially this could be used for storing compressed Reddit messages. They would be archived in the "Reddit" shard for eternity. Anyone syncing this shard would be storing everyone's messages, although with pruning some nodes would be only storing more recent messages. Light client users would only get the messages they are interested in.

      Of course, any links to external resources, such as Swarm or IPFS blobs, would not be guaranteed to be archived.

  3. Swarm

    This is the decentralized Dropbox that has been planned since the inception of Ethereum.

    • Cost

      Currently unknown, but it will be an ongoing free-market fee.

    • Access to blob contents from a contract

      Only with an oracle.

    • Max size

      As much as you pay for.

    • Propagation time

      Immediate.

    • Longevity

      As much as you pay for.

    • Redundancy

      As much as you pay for.

    • Read performance

      Very fast, especially for popular content.

    • Technology maturity

      No PoC yet.

      Other similar technologies: Storj, Filecoin, Sia, MaidSafe.

    • Use case

      This makes sense for large blobs, but is not good for archiving as someone has to keep paying forever if the content is not popular. A lot of older content would "go missing". For some users this would be acceptable and probably cheaper than log storage.

  4. IPFS

    InterPlanetary File System can be a powerful technology for sharing content between devices, but there is currently no financial incentivization scheme for the network to store your content as there is with Swarm.

    • Cost

      Free.

    • Access to blob contents from a contract

      No read or write access without an oracle.

    • Max size

      No maximum.

    • Propagation time

      Blobs do not propagate unless another party requests them. This means that whenever you post something, you need to initially stay online for other parties to be able to read the messages.

    • Longevity

      Your own responsibility. Popular content will have greater network longevity.

    • Redundancy

      Your own responsibility. Popular content will have greater network redundancy.

    • Read performance

      Fast, especially for popular content.

    • Technology maturity

      Operational.

      Other similar technologies: BitTorrent, GNUnet, Freenet, Retroshare, Tahoe-LAFS.

    • Use cases

      Because content does not propagate until it is requested, IPFS is only really suitable for devices that are always online.

      It would actually be possible to write a smart contract that would log the IPFS id of every blob that is associated with a specific purpose. That way any interested party could pin all of the IPFS files associated with a "subreddit" for example. The disadvantage is that, unlike with log storage, there is no financial penalty for instructing others to store your blobs.
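
The registry idea can be modelled in a few lines. This is an in-memory sketch, not contract code: BlobRegistry, publish, and ids_to_pin are hypothetical names, and the identifier is a plain hex SHA-256 digest used for illustration rather than a real IPFS id.

```python
import hashlib
from typing import Dict, List


def content_id(blob: bytes) -> str:
    # Hypothetical identifier: a hex SHA-256 digest, not a real IPFS id.
    return hashlib.sha256(blob).hexdigest()


class BlobRegistry:
    """In-memory model of a contract that logs blob ids per topic."""

    def __init__(self) -> None:
        self._log: Dict[str, List[str]] = {}

    def publish(self, topic: str, blob: bytes) -> str:
        """Record the blob's id under `topic`; the blob itself stays off-chain."""
        cid = content_id(blob)
        self._log.setdefault(topic, []).append(cid)
        return cid

    def ids_to_pin(self, topic: str) -> List[str]:
        """Ids an interested party would pin to mirror the topic."""
        return list(self._log.get(topic, []))


registry = BlobRegistry()
registry.publish("ethereum", b"hello")
registry.publish("ethereum", b"world")
registry.publish("cats", b"meow")
print(len(registry.ids_to_pin("ethereum")))  # 2
```

Only the fixed-size identifier is recorded, so the cost of publishing does not grow with the size of the blob, which is why nothing discourages a poster from asking others to store large files.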

  5. Whisper

    This is the intended decentralized message passing protocol for Ethereum dapps.

    • Cost

      Free.

    • Access to blob contents from a contract

      No read or write access without an oracle.

    • Max blob size

      No maximum, but the system is intended for messages less than 64kB, typically around 256 bytes. Bigger payloads are more likely to be ignored by other nodes.

    • Propagation time

      Instantaneous, but public messages with high TTL may take longer.

    • Longevity

      Max TTL 2 days.

    • Redundancy

      Message will be stored throughout the network until TTL expires.

    • Read performance

      Immediate once the message has been received.

    • Technology maturity

      Only PoC at this time. Unknown when it will be production ready, but it is to be a core Ethereum technology.

      Other similar technologies: ØMQ, Bitmessage, Telehash, Tox.

    • Use case

      Building Twitter or Reddit on Whisper would result in a very different sort of platform compared to building them on a more permanent storage system. However, a more ephemeral system does make more sense for a lot of use cases. Most of what people say is not of any great importance except in the moment between the parties involved. There is no reason why it should be archived, and people have a different psychological attitude to communicating when they don't have to assume that everything is being publicly archived for eternity.