Eth 2.0 Dev Update #56 — “Road to Mainnet”

https://medium.com/prysmatic-labs/eth-2-0-dev-update-56-road-to-mainnet-3fbd50dde484?source=rss----33260c5177c6---4

🆕 Road to Launch

🔹 Mainnet release public checklist

https://github.com/ethereum/eth2.0-pm/projects/1

A public checklist for the eth2 phase 0 mainnet launch has finally been created and released to the public here. If you’re curious about our progress towards mainnet and when we might launch, this project board is a great way to follow along without resorting to speculation about dates. To give a more granular perspective of our team’s focus before mainnet:

  • Second security audit
  • Implementing the eth2.0-apis standard in Prysm for client interoperability
  • Wrapping up voluntary exits in Prysm
  • A comprehensive web UI for Prysm!
  • Fuzz testing and resolving important bugs before we go to mainnet
  • Slasher improvements
  • Common slashing protection format for transporting keys between eth2 clients
  • Weak subjectivity sync

Out of these, only a few are features, which means that we can likely perform a feature freeze by mid-October, allowing us to work solely on security improvements and UX before going live. If all goes well, November is still looking good for a launch from our perspective.

🔹 Audit by Trail of Bits

We are pleased to announce that the Prysm project is being audited by security firm Trail of Bits. Going into mainnet, having 2 full code audits of our eth2 client is critical for the safety of our stakers and helps us identify ways of improving our client with code best practices. Having two separate organizations, Quantstamp and Trail of Bits, review our code independently is also beneficial: Trail of Bits can start looking at the code in the context of the previous audit and identify places where the code has changed since then or would benefit from further review. In particular, this new audit focuses heavily on slasher, slashing protection, and core specification attack vectors. For the sake of optimization, we sometimes diverge from the spec in certain places, and this audit will help determine the safety of our approach.

📝 Merged Code, Pull Requests, and Issues

🔹 Significant security improvements against denial of service attacks

Ethereum Foundation researcher Protolambda has helped us a lot over the past 2 weeks with security analysis for denial of service in Prysm. He has shared with us multiple places where we failed to perform proper checks on inputs, which could allow a node to be overwhelmed by data coming from the outside world. We have been tightening up this code, with many bug fixes merged recently that help prevent catastrophic scenarios on mainnet. We are looking forward to the upcoming audit to further analyze potential issues.

🔹 Removal of hosted eth1 node support, to more closely simulate mainnet conditions

As many node operators are aware, Prysmatic Labs has been offering its own hosted eth1 nodes to people running Prysm beacon nodes for a long time now. While participating in the testnet over the previous months, stakers didn’t need to run their own eth1 node because their beacon nodes would connect by default to https://goerli.prylabs.network. However, as mainnet gets closer, we will not be hosting eth1 nodes for the public to use. The expectation is that you must either run your own eth1 node or use a third party provider such as Infura or Alchemy. Running an eth1 node is important if you are running validators, because validators include the latest eth1 block root and other information in their blocks for use in a voting process within the beacon chain. To add your own eth1 node, you can follow the instructions in our documentation portal here. As of the last few weeks, Prysm beacon nodes no longer connect to our hosted eth1 nodes by default.

🔹 Voluntary exits implemented, with the ability to submit an exit from the Prysm CLI

Our teammate Radek took ownership of the voluntary exits feature in Prysm. We have promised our users for a while that we would allow simple voluntary exits from the command line with Prysm, and we decided to prioritize that as we get closer to mainnet launch. Radek implemented a command: `prysm.sh validator accounts-v2 exit` which guides stakers through an interactive process in which they can submit an exit to their beacon node. Given exits are irreversible, a lot of steps are in place to ensure users know what they are doing before they successfully complete the process. You can see the implementation here.

🔜 Upcoming Work

🔹 Advanced peer scoring added to our p2p routing

Enabling peer scoring and evaluation in Prysm nodes is an ongoing effort which eventually will result in beacon nodes favoring well-behaved peers, while restricting less useful ones.

The problem is being tackled from two sides: scoring peers’ behavior against application-level invariants, and then descending to lower-level network invariants. Application-level scoring allows us to restrict peers based on higher-level scenarios (e.g. restricting peers that consistently return a below-average number of blocks during two consecutive epochs). Network-level invariants help nodes build a healthy network mesh based on the network performance of surrounding peers.

The application-level peer scorer was added in #6579 and #6709. It is still highly experimental and will therefore sit behind the `--dev` flag for quite some time (once we sort out the network mesh scoring, extending it to the application level is just a matter of injecting our scoring function into the GossipSub parameters).

When it comes to enabling network-level scoring, it was blocked by an issue in the upstream protocol which has recently been resolved (GossipSub support for dynamically setting topic scoring parameters was merged just a couple of days ago; see this PR in go-libp2p-pubsub). With this update to GossipSub, we have been able to make progress on introducing network-level peer scoring into our p2p routing. If you wish to follow the development, refer to issue #6043 and its corresponding PR #6943. That PR is still a work in progress, but we hope to merge it into master in the upcoming weeks.
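To make the injection concrete, here is a minimal, hypothetical sketch of wiring an application-level scoring function into go-libp2p-pubsub’s peer scoring machinery. The weights, thresholds, and `appScore` function are illustrative placeholders, not Prysm’s actual parameters.

```go
package p2p

import (
	"context"
	"time"

	"github.com/libp2p/go-libp2p-core/host"
	"github.com/libp2p/go-libp2p-core/peer"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

// appScore is a stand-in for an application-level scorer, e.g. one that
// penalizes peers consistently returning a below-average number of blocks.
func appScore(p peer.ID) float64 {
	return 0 // neutral; a real scorer would consult tracked peer stats
}

func newScoredGossipSub(ctx context.Context, h host.Host) (*pubsub.PubSub, error) {
	return pubsub.NewGossipSub(ctx, h,
		pubsub.WithPeerScore(
			&pubsub.PeerScoreParams{
				AppSpecificScore:  appScore,
				AppSpecificWeight: 1,
				DecayInterval:     time.Minute, // how often scores decay
				DecayToZero:       0.01,        // counters below this snap to zero
			},
			// Peers falling below these scores are progressively cut off
			// from gossip, then from publishing, then graylisted entirely.
			&pubsub.PeerScoreThresholds{
				GossipThreshold:   -100,
				PublishThreshold:  -200,
				GraylistThreshold: -300,
			},
		),
	)
}
```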

🔹 Weak subjectivity sync

One of the beautiful aspects of eth2 is the concept of “chain finality”: given the consensus voting rules, there exist certain checkpoints beyond which the chain cannot be reverted at all, as defined by the protocol. Proof of work chains can always be reverted if an attacker has enough mining power to force the majority to switch to their chain. However, given how fork choice works in proof of stake along with the rules of consensus, proof of stake defines explicit finality in which the protocol itself makes it impossible to revert past a certain checkpoint.

An obvious example is the genesis block, which by definition is meant to be irreversible and is agreed upon by all participants as the starting point of chain sync. However, given enough time and finality, we can pick another checkpoint which is not the genesis block from which it is safe for nodes to sync while still having significant validation of the blockchain. This sort of sync is known as “weak subjectivity sync” and has been the subject of much research by the Ethereum Foundation over the past years. Before we go to mainnet, implementing weak subjectivity sync is important to mitigate certain attacks on the chain and also to avoid having to hard fork in the future to add such a feature. The official write-up on how weak subjectivity will work on eth2 is located here, and our teammate Terence has already started incorporating this into Prysm.

🔹 Common slashing protection format for client interoperability

Exporting keystores from Prysm and importing them into Lighthouse, or vice versa, is not enough to protect users during a catastrophe. During the Medalla testnet incident, we saw several validators get slashed when they transitioned from Prysm to another client. This happened because Prysm implements a slashing protection feature that is not compatible with other clients, and the slashing protection history does not get exported when a user moves their keystores to another client. As we get closer to mainnet, making this protection a common format between clients is critical, as is having clearly documented migration paths between clients that keep users safe. Michael Sproul from Sigma Prime (Lighthouse) started an initiative for slashing protection compatibility, and our teammate Shay has been working with him and other eth2 implementers to ensure we include this feature in Prysm.
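To illustrate the shape of what must travel along with the keys, here is a rough Go sketch of a portable slashing protection record. The exact field names and encoding are what the cross-client effort is standardizing, so treat these types as hypothetical.

```go
package protection

// Hypothetical sketch of a portable slashing protection record. The core
// idea: for each validator public key, export every signed block slot and
// every signed attestation's source/target epochs, so the next client can
// refuse to sign anything that would conflict with this history.

type SignedBlock struct {
	Slot        uint64 `json:"slot,string"`
	SigningRoot string `json:"signing_root,omitempty"`
}

type SignedAttestation struct {
	SourceEpoch uint64 `json:"source_epoch,string"`
	TargetEpoch uint64 `json:"target_epoch,string"`
	SigningRoot string `json:"signing_root,omitempty"`
}

type ValidatorHistory struct {
	Pubkey             string              `json:"pubkey"`
	SignedBlocks       []SignedBlock       `json:"signed_blocks"`
	SignedAttestations []SignedAttestation `json:"signed_attestations"`
}
```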

🔹 Implementing the eth2.0-apis standard in Prysm

Our teammate Ivan has been working on a standard implementation of https://github.com/ethereum/eth2.0-APIs, which has been adopted as the REST API standard for eth2 clients. Over the past year, we have maintained our own API under the repository https://github.com/prysmaticlabs/ethereumapis, which has powered Prysm through multiple testnets and served block explorers such as https://beaconcha.in and https://beaconscan.com.

However, it is really critical that all teams align to a standard as much as possible before a mainnet launch. In eth1, the two major clients, geth and parity, had significant mismatches in their API endpoints, which made interoperability difficult and a pain for many node operators, block explorers, and companies. Post-mainnet, teams will likely be too busy with maintenance and improvements to implement such a radical overhaul of their API. This is why we aim to finish our compatibility with eth2.0-APIs before mainnet launch. Ivan has been working on defining all of the protobuf definitions necessary for the API standard here:

https://github.com/prysmaticlabs/ethereumapis/pull/184, and over the coming two weeks we will start implementing them in Prysm. It is important to note that we will still support ethereumapis for those who wish to use it.
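For a sense of what the standard looks like to consumers, here is a minimal sketch that queries a beacon node’s standardized node version endpoint. The `/eth/v1/node/version` path comes from the eth2.0-APIs spec; the localhost address and port are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// GET /eth/v1/node/version is one of the standardized endpoints in
	// the eth2.0-APIs spec; http://localhost:3500 is a placeholder for
	// wherever your beacon node serves its REST API.
	resp, err := http.Get("http://localhost:3500/eth/v1/node/version")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Standard responses wrap their payload in a top-level "data" field.
	var body struct {
		Data struct {
			Version string `json:"version"`
		} `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		panic(err)
	}
	fmt.Println("beacon node version:", body.Data.Version)
}
```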

Miscellaneous

🔹 Awesome project built for Prysm: Typescript Remote Signer Server!

We want to highlight an awesome project for Prysm built by one of our stakers this past week. Sven from our discord server recently published https://github.com/ethpos/remote-signer-ts, a remote signer implementation compatible with Prysm and written in Typescript! Remote signers are the most secure kind of wallet setup for anyone participating in eth2, as they completely separate the validating keys from the beacon node, accessed over a network connection. You can connect Sven’s remote signer to your Prysm validator client to perform signing of data and block proposals remotely. For reference, we have a dedicated page on remote signers in our docs portal here. This page includes all information regarding how a remote signer works, what it takes to build one, and how to use it as your wallet in Prysm. Check out Sven’s project :).

Interested in Contributing?

We are always looking for devs interested in helping us out. If you know Go or Solidity and want to contribute to the forefront of research on Ethereum, please drop us a line and we’d be more than happy to help onboard you :).

Check out our contributing guidelines and our open projects on Github. Each task and issue is grouped into the Phase 0 milestone along with a specific project it belongs to.

As always, follow us on Twitter or join our Discord server and let us know what you want to help with.

Official, Prysmatic Labs Ether Donation Address

0x9B984D5a03980D8dc0a24506c968465424c81DbE

Official, Prysmatic Labs ENS Name

prysmatic.eth



Eth2 Medalla Testnet Incident

https://medium.com/prysmatic-labs/eth2-medalla-testnet-incident-f7fbc3cc934a?source=rss----33260c5177c6---4

Summary

The eth2 public testnet, Medalla, spiraled into a series of cascading failures this past weekend, exposing several vulnerabilities and process faults in how best to handle critical scenarios. It started with bad responses from 6 different time servers, which threw off most nodes running our Prysm client at the same time; our team rushed to push a fix, but that fix contained a critical flaw which removed all features necessary for our nodes to function. This led to network partitions, with everyone synchronizing the chain at the same time but unable to find a healthy peer. Medalla had a very eventful weekend and offered us the greatest learning experience yet for preventing this from happening again, especially on mainnet. This post covers the full summary of the incident, its consequences, lessons learned, and concrete action plans moving forward before a mainnet launch for eth2.

Clarifying Questions

Did the eth2 testnet fail because of relying on cloudflare.com’s time servers? Does eth2 need to rely on a single point of failure for timestamping?

No. We were leveraging roughtime cloud servers to give users feedback that their system time might be off, dynamically adjusting their time based on the responses of these servers as a convenience; this was not necessary at all and instead proved problematic. It is indeed a security risk to rely on a single point of failure for something as important as timestamping in eth2, and it is unnecessary. Starting from this incident, we will rely on system time only. If a validator’s time is off, we can tell them, but we will not forcefully change it. Other eth2 client implementations only use system time and we will too.

Is the testnet dead?

No. As long as it is possible to run a node and as long as validators can validate, the testnet can always go back to being fully operational. At the moment, client implementation teams have hardened their chain sync code to allow for smooth running of nodes, which will boost validator participation and allow the chain to finalize again. We still have hope. Participation has now climbed from 0–5% to 40%. The chain needs > 66% to finalize.

How does this affect the mainnet release? Is there a new delay?

We believe this incident does not inherently affect the launch date. The Prysmatic Labs team recommends the eth2 launch schedule continue with no delay. The incident from this weekend was a good stress test for many clients and actually checks off a few requirements on the launch checklist. While the launch date has not been set, we believe the expected launch target of 2 to 3 months from Medalla genesis is still an ideal timeline. There will be a public checklist of requirements for an eth2 launch, and this Medalla incident will definitely add a lot of new items to the list regarding client resilience, security, and proper releases. That’s as much information as we have today.

Timeline of events

First signs of trouble

Almost immediately after the incident started, users noticed their Prysm nodes reporting that their clocks were off and that they were seeing blocks from the future. At the time, Prysm used the roughtime protocol to adjust the client clock automatically, querying a set of roughtime servers and determining the appropriate clock offset. The roughtime protocol works by querying a set of servers in sequence with a chain of signed requests/responses, and the client takes the average of these responses to adjust its clock.

[Figure: midpoints (timestamps) returned by each of the six roughtime servers around the time of the incident]

Notice something off about these responses? The 6th server in the list reported a time 24 hours ahead of the other 5 servers. When the roughtime client took the average of these results, it concluded that the correct time was Fri, 14 Aug 2020 22:20:23 GMT. One of the key components of the roughtime protocol is accountability: given that there is a signed “clockchain” of sorts, we could prove that the TickTock server misbehaved and report the issue effectively. Unfortunately, Prysm did not log the signed chain and we do not have it stored anywhere from the time of the event. After a quick look at the logs, TickTock was off by 24 hours, which explains why we had a 4-hour offset with 6 servers (24 / 6 = 4). We got in contact with the maintainers of the roughtime project from cloudflare and they have since determined how to make their client more robust in case of server faults:

Client robustness to misbehaving server · Issue #22 · cloudflare/roughtime
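A toy calculation shows how badly a single faulty server skews a naive average; the numbers below are illustrative, with one of six midpoints a full day ahead dragging the result four hours into the future.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	now := time.Date(2020, 8, 14, 18, 20, 23, 0, time.UTC)

	// Five servers agree on the true time; the sixth is 24 hours ahead.
	midpoints := []time.Time{now, now, now, now, now, now.Add(24 * time.Hour)}

	var sum int64
	for _, m := range midpoints {
		sum += m.Unix()
	}
	avg := time.Unix(sum/int64(len(midpoints)), 0).UTC()

	fmt.Println(avg)          // 2020-08-14 22:20:23 +0000 UTC
	fmt.Println(avg.Sub(now)) // 4h0m0s: a 24h error spread over 6 servers
}
```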

Emergency fix update (alpha.21)

Although we didn’t know that one of the roughtime servers was reporting 24 hours into the future, we knew that something was wrong with roughtime and that it needed to be disabled immediately. Rather than deleting the roughtime code entirely, we modified it to require a runtime flag to adjust the clock instead of adjusting it automatically by default. The network was in a rough state and we wanted to act fast. We decided to push an “emergency release” and ask everyone to update to the new code immediately. However, right before we did this, the roughtime servers recovered.

The mass slashing event

We should have seen this coming in hindsight, but it still shocked everyone as it happened. At approximately 2AM UTC, every validator that had been active during the roughtime incident was now broadcasting messages that could get it slashed! It became quite the carnage, with over 3000 slashing events broadcast in a short amount of time and all of our internal validators slashed. We had not configured local slashing protection in time for our own internal validators, as we had been busy improving the user experience for testnet users.

Slashing protection in action

Fortunately, Prysm nodes ship by default with a simple slashing protection mechanism that keeps track of the attestations and blocks each validator produces, preventing them from signing slashable messages. For many users, this saved them from catastrophe!
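The mechanism is conceptually a pre-signing check against a local history of everything the validator has already signed. The sketch below shows the idea for attestations; it is illustrative only, not Prysm’s actual implementation, and conservatively refuses any repeat of a target epoch.

```go
package protection

// AttestationRecord is a minimal record of a previously signed attestation.
type AttestationRecord struct {
	SourceEpoch uint64
	TargetEpoch uint64
}

// safeToSign returns false if signing next would conflict with anything in
// history under the slashing rules: a repeated target epoch (potential
// double vote) or a surround vote in either direction.
func safeToSign(history []AttestationRecord, next AttestationRecord) bool {
	for _, prev := range history {
		switch {
		case prev.TargetEpoch == next.TargetEpoch:
			return false // same target twice: potential double vote
		case next.SourceEpoch < prev.SourceEpoch && next.TargetEpoch > prev.TargetEpoch:
			return false // next would surround a previously signed vote
		case next.SourceEpoch > prev.SourceEpoch && next.TargetEpoch < prev.TargetEpoch:
			return false // next would be surrounded by a previous vote
		}
	}
	return true
}
```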

The real problem: bug discovered in alpha.21

Worried by the urgency of the original problem, we didn’t think too much about all the implications of a potential fix, focusing more on quickly releasing it than on carefully checking whether it would break anything else in our nodes. Our teammate Nishant was the first to point out that our release alpha.21 was critically flawed. The network could have recovered on its own if we had not acted at all.

In releasing this fix, we accidentally removed the initialization of all critical features our eth2 beacon node needs to function, making the problem infinitely worse. After we announced the release to everyone in our discord server and on twitter, stakers quickly started updating their nodes, which is when we realized just how badly this had gone wrong. Even worse, the roughtime servers had fully recovered by then, which would likely have fixed the issues in the network had we not acted so swiftly.

Rollback and syncing troubles

After realizing the scope of the mistake, we immediately recommended users roll back to a previous release, now that the roughtime issue had been resolved. This ended up being a really rough move, as the network had become heavily partitioned and users were confused about why there were so many updates in such a short span of time. With most nodes having been down for a while and users restarting their nodes frequently to fetch updates, almost everyone in the network was trying to sync at the same time, making it impossible to reach the chain head. With so many bad peers in the network, good peers were needles in a haystack, which made it really difficult for nodes to resolve forks. Moreover, resource consumption was climbing through the roof: other client implementations were seeing massive memory consumption, and Prysm nodes were suffering from significant CPU usage, which didn’t help when trying to resolve forks.

Current Status

The incident exposed several key, flawed assumptions our node was making in terms of handling forked blocks in the event of a highly-partitioned chain. We were not handling several code paths where there could be multiple blocks in a certain slot time, which was causing our nodes to often get stuck. Moreover, even after we resolved these issues, nodes could not resync to the chain head if they had fallen behind. It is easy to write code that assumes chain stability, but having it function equally well in times of many bad peers, forks, and network partitions is another beast altogether. Our sync logic was not robust enough to handle these scenarios, but through a change in assumptions, our teammate Victor Farazdagi was able to resolve the problem quickly. We have since pushed a fix that has led Prysm nodes to sync to the chain head and remain in sync! At the time of writing, most nodes are updating to this version, and at this point we just need more validators to come online and start attesting to and proposing blocks with their synced beacon nodes. For next steps, we are monitoring chain participation and getting in touch with as many individuals as possible who run validators to understand if they still run into issues.

If you are running Prysm, you can download the latest version from our releases page https://github.com/prysmaticlabs/prysm/releases or follow our detailed instructions from our documentation portal here https://docs.prylabs.network/docs/install/install-with-script. We will keep updating on the status via our Discord as the situation progresses.

Lessons Learned

Don’t rush to merge in fixes

This entire incident could have been avoided if we had not rushed to fix the roughtime bug. The reason we got into this state in the first place was a faulty pull request we merged which reverted all critical features needed for our nodes to function. As described above, we focused more on quickly releasing a fix than on carefully checking whether it would break anything else, and the network could likely have recovered on its own if we had not acted at all.

Due to the sense of urgency from seeing all of our client’s nodes in the network suffering, we wanted to ease users’ concerns as fast as possible. Although the fix was originally created by an outside contributor, it was our fault that it was not reviewed with the utmost care. There was a single line of code that unset all global configurations for our nodes: https://github.com/prysmaticlabs/prysm/pull/6898/files#diff-fb86a5d3c2b85d3e68cad741d5957c29L263. Moving forward, every release candidate produced in the middle of a difficult situation or crisis needs to be:

  1. Reviewed by the entire team, plus someone external to the team such as an Ethereum researcher
  2. Tested in a staging environment for a certain period of time, either the Prysm eth2 attack net or a local testnet that reproduces the same bug users are experiencing

Our team typically uses the practice of canary deployments, which run newly merged pull requests side-by-side with production deployments to understand how they perform relative to a baseline over a period of time. However, given every node was unhealthy, including those we run internally, there was no way to run a canary against a baseline. We rushed to fix the issue and publicized the release to users as soon as we had it, not realizing it was completely broken. This will not happen again, and we have learned a costly lesson. Even when validator balances are decreasing and the chain is not finalizing, we need 100% confidence in fixes released during such tumultuous periods.

Careful external communication regarding updating nodes in periods of instability is critical

Another mistake during the incident was our external communication regarding updating nodes. Given 90% or more of nodes were having critical issues and were far from synced to the head of the chain, telling everyone “hey, quickly update your nodes!” led to absolute chaos. Nodes also have a default cap of 30 max peers; for people who did not know how to raise this cap, it meant almost all of their peers were bad or likely also trying to sync. Not everyone has notifications enabled for our discord announcements, and people in different time zones may have been away during the time we asked everyone to update. Having clear communication, and understanding the state of the network and the implications of asking everyone to update their nodes, are among the main lessons we learned from this incident.

Make migrations to other eth2 clients seamless and well-documented for users

One of eth2’s main talking points is its decentralized development, with 5 different, independent teams building client implementations of the protocol. In the Medalla public testnet, we had 5 clients participating at genesis. Although not every client had the same readiness status, many had improved significantly since the last testnet experiments. At the time of the incident, over 65% of the network was running our Prysm client, which contributed to the network catastrophe once all Prysm nodes went down. The general idea of network resilience is to be able to easily switch between clients in the event of a single client having a critical bug. Unfortunately, the release of the Medalla testnet coincided with teams working on standardizing how their clients manage validator keys, which meant we weren’t all 100% prepared on documentation for migrating between Prysm and Lighthouse, for example. This is a high-priority action item moving forward, and something we’ll be adopting into our public documentation portal. Users should be able to easily switch clients whenever they wish, while abiding by security best practices which we need to publicly announce.

Important takeaways for stakers

This was the best thing to happen to a testnet

It would have been really terrifying if the Medalla public testnet had run uninterrupted, with perfect performance right before mainnet, and this bug had then occurred with real money at stake once eth2 launched. In terms of worst-case scenarios for a blockchain, having the client that runs the majority of the network contain a bug that takes all of its nodes offline is indeed a nightmare, and it manifested itself in Medalla. Knowing what to do in this situation, and being equipped to migrate to a different client if needed, is really important for participating stakers.

The risk of eth2 phase 0

Eth2 phase 0 is a highly ambitious project that poses technical risks for those joining at its genesis. It is a complete revamp of the Ethereum protocol, introducing Proof of Stake, which will make its debut following the Casper FFG (friendly finality gadget) research that has been ongoing for several years. Although the client implementations have matured significantly and will come out of this incident much stronger than before, eth2 is an experiment that can have serious consequences for those joining without understanding the risks. Had this happened on mainnet, with several days without finality, people would have lost millions if not tens of millions of dollars in collective penalties, which would be extremely painful for all, including the most ardent supporters. We want to make it clear that these risks are very real and that technical risk is impossible to eliminate. The Eth2 launchpad contains a section on the very real risks of early adoption, and we encourage you to think carefully about whether staking on eth2 is right for you.

The risk of client dominance

At the time of writing, our Prysm client runs over 78% of publicly accessible eth2 beacon nodes

Before the incident, that number was around 62%, which is still unreasonably high. Eth2 development is focused on multiple, independent implementations of the protocol, allowing users to switch between them in times of crisis or when a bug is found in a single implementation. The current issues in the Medalla testnet are a consequence of having a single client disproportionately run the network, and we believe this is something that should change. Other client teams are making strides in updating their documentation, improving their resilience, and making it easier for people to run them. A lot of stakers have picked our client because it has been easy to set up in their personal or cloud environments, and we are humbled by this. However, it is good practice to try other implementations and experiment with switching between them as needed, especially for stakers operating many validators in situations like these.

How to keep your nodes updated

Convincing people to update their nodes is the eternal bane of client implementation teams. As networks become more and more decentralized, there isn’t a single source of communication where we can notify all node operators to update their nodes to fix critical bugs. The best teams can do is rely on their own forms of communication, such as their Discord servers, Github release pages, or even Twitter accounts. Despite telling everyone “hey, update your nodes!”, operators might be in different time zones and therefore offline, or simply might not use social media. Relying on everyone updating at the same time can also spell disaster: if nodes are offline for a long time and then everyone tries to update at once, the network floods with peers that are all trying to sync. Having a more detailed strategy for releases, and ensuring “official” channels are known to people, is critical. Moreover, large holders of ETH and people running many nodes should have easy access to communication with our dev team. We want to work more closely with operators to ensure they know when they should update, and to discuss with them the changelog items relevant to them in every update.

What if this happens in mainnet?

Needless to say, this sort of situation is one of the worst-case scenarios for mainnet. Although this series of events was the best thing to happen to the testnet, giving us a taste of how to resolve a network catastrophe, it cannot happen the way it did when there is real money at stake. Even with security audits, careful code reviews, and staging, the reality is there will be attacks on the network in the same way eth1 was attacked and DDoS’d many years ago, and this is something we need to prepare for. The folks behind eth1 are battle-hardened and have accumulated much knowledge regarding appropriate responses to catastrophes. This testnet gave us the following lessons and requirements for client teams if we want to deploy to mainnet:

  1. Have checklists for everything, including release candidates, staging, external communications, monitoring
  2. Have clear instructions for users to migrate between eth2 clients as needed
  3. Have a step-by-step guidebook followed by an eth2 client response squad. Prysmatic Labs has its own internal playbook, but coordinating a central one between eth2 clients would ease a lot of concerns
  4. Have a detailed plan for communicating with stakeholders, node operators, and regular users regarding updating their nodes in critical periods

This scenario truly changed how we approach eth2 development from now on. We have always understood the high stakes of this project, but are now much more equipped to understand how to react in times of crisis, how to keep our cool, and what NOT to do when many are relying on our client to stake.

Conclusion

In conclusion, the Medalla eth2 testnet suffered from cascading failures due to bad decisions on our part regarding handling response fixes to a problem affecting many nodes at once. The testnet did not get into this state only due to roughtime, or due to a central point of failure, but rather through a series of events that culminated in various network partitions. This is the best possible thing that could have happened in a testnet, and all eth2 client teams will now be extremely prepared to avoid any scenario of this kind in mainnet. We will focus on more process, security, and appropriate responses to improve the resilience of eth2 by working together with all the client implementation teams towards these goals.



Quantstamp Security Audit Results for the Prysm Eth2 Client

https://medium.com/prysmatic-labs/quantstamp-security-audit-results-for-the-prysm-eth2-client-7f949c6c866f?source=rss----33260c5177c6---4

Overview

Quantstamp, a blockchain security firm, recently completed its audit of our Ethereum 2.0 client, Prysm. Over the course of 2 months, Quantstamp examined our full codebase for critical security vulnerabilities, gaps in testing, and important end-user considerations for minimizing security risks. Read the full audit report here.

Quantstamp’s audit of Prysmatic Labs’ eth2 client involved ten engineers who carefully reviewed the following key aspects of our implementation:

  1. Core beacon chain logic
  2. Conformance to the official eth2 specification
  3. P2P networking layer security risks
  4. RPC-API for the Prysm client
  5. Low-level database attack vectors
  6. Account management and key storage
  7. Validator client logic
  8. All of our shared utils and lower-level helper functions

We are pleased to announce the Quantstamp report found minimal critical issues, and most of them had been resolved by our team by the time of the audit’s completion.

We originally defined the scope of a security audit as follows:

  • Operational threats
  • Docker deployment ./prysm.sh start script
  • Potential security pitfalls in client side interaction and configuration
  • Data to/from external sources
  • Data to/from internal sources
  • Control flow integrity
  • Potential current exploitable active vulnerabilities
  • Potential security gaps in user interactions
  • Security assumptions, potential future weaknesses in design and implementation
  • Strength of existing security controls and potential improvements that could be made
  • A high-level security review of Prysm dependencies

The expectation for the audit was as follows:

For each vulnerability, detailed information containing:

  • Vulnerability description
  • Likelihood of exploitation
  • Impact qualification
  • Overall vulnerability severity
  • Recommended mitigative action
  • Detailed actions to perform to mitigate the vulnerability
  • Recommendation complexity analysis
  • Reproducible/automatable verification of mitigation, where applicable

Quantstamp did an excellent job not only explaining some of the vulnerabilities to our team in great depth, but also including a comprehensive appendix with suggested best practices to reduce further attack surface. Upon completion of the audit, we created a comprehensive tracking issue to resolve all items found by Quantstamp here.

We are now almost completely done with the important vulnerabilities and are confident our codebase has become more robust as a result.

Security Vulnerability Highlights

To give an idea of the type of vulnerabilities found during the audit, here are 2 critical bugs that required immediate attention from our team and have since been resolved.

QSP-2 The functions IntersectionUint64(), IntersectionInt64() and IntersectionByteSlices() may return union instead of intersection

A few helper functions we use in Prysm, namely IntersectionUint64(), IntersectionInt64() and IntersectionByteSlices(), work correctly only for 2 arguments, i.e., they return intersections. If more arguments are provided, the functions may return the union instead. These are important helpers used in consensus-critical code paths. Quantstamp was able to detect how they could be misused, and we were able to quickly patch the vulnerability.
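To see how such a bug slips in, here is a hypothetical reconstruction of the pattern rather than the literal Prysm code: counting occurrences across all slices and keeping anything seen more than once is correct for two inputs, but drifts toward a union for three or more.

```go
package main

import "fmt"

// buggyIntersection keeps any value that appears in more than one slice
// (inputs assumed de-duplicated). With exactly two slices that is the
// intersection; with three or more it keeps values missing from some
// slices, drifting toward a union.
func buggyIntersection(slices ...[]uint64) []uint64 {
	counts := make(map[uint64]int)
	for _, s := range slices {
		for _, v := range s {
			counts[v]++
		}
	}
	var out []uint64
	for v, c := range counts {
		if c > 1 {
			out = append(out, v)
		}
	}
	return out
}

// correctIntersection keeps only values present in every slice.
func correctIntersection(slices ...[]uint64) []uint64 {
	counts := make(map[uint64]int)
	for _, s := range slices {
		for _, v := range s {
			counts[v]++
		}
	}
	var out []uint64
	for v, c := range counts {
		if c == len(slices) {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	a, b, c := []uint64{1, 2}, []uint64{2, 3}, []uint64{3, 4}
	fmt.Println(buggyIntersection(a, b, c))   // contains 2 and 3, but 3 is not in a!
	fmt.Println(correctIntersection(a, b, c)) // empty: no value is in all three
}
```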

QSP-3 Potential issues due to granularity of timestamps

Prysm utilizes a timestamp utility created by cloudflare called roughtime which synchronizes system-time offsets with a series of cloud NTP servers to ensure minimal clock disparity. However, there was a minor detail that can prove to be critical in production regarding these timestamps’ granularities.

Suppose the beacon chain’s slot time is 5000 ms and the actual time represented by roughtime.Now() is 5900 ms. Since roughtime.Now().Unix() returns time in seconds, it returns 5 s, and currentTime ends up being 5000 ms, the same as slotTime. The actual current time, however, is 5900 ms, which is 900 ms past slotTime, and 900 ms is bigger than the 500 ms MAXIMUM_GOSSIP_CLOCK_DISPARITY. The real time difference therefore exceeds the allowed clock disparity even though the second-granularity check passes.

Quantstamp’s Recommendation: If the tolerance is less than a second, it makes more sense to do time computations in milliseconds directly, rather than using seconds and then multiplying by 1000. Use https://golang.org/pkg/time/#Time.UnixNano instead of just .Unix(). The same applies to validating other things, e.g., attestations.
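A minimal sketch of the recommended fix, comparing times at sub-second granularity; `maxClockDisparity` stands in for MAXIMUM_GOSSIP_CLOCK_DISPARITY and the sample values mirror the scenario above.

```go
package main

import (
	"fmt"
	"time"
)

const maxClockDisparity = 500 * time.Millisecond // MAXIMUM_GOSSIP_CLOCK_DISPARITY

// withinClockDisparity compares times at nanosecond granularity instead of
// truncating to whole seconds, so a 900ms skew can no longer hide behind a
// same-second timestamp.
func withinClockDisparity(now, slotStart time.Time) bool {
	diff := now.Sub(slotStart)
	if diff < 0 {
		diff = -diff
	}
	return diff <= maxClockDisparity
}

func main() {
	slotStart := time.Unix(5, 0)                      // slot begins at 5000ms
	now := time.Unix(5, int64(900*time.Millisecond))  // actual time is 5900ms
	fmt.Println(now.Unix() == slotStart.Unix())       // true: whole seconds match
	fmt.Println(withinClockDisparity(now, slotStart)) // false: 900ms > 500ms
}
```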

We successfully patched the vulnerabilities above thanks to the great communication from the Quantstamp team, and have also completed their team’s documentation improvement suggestions as well as a full tracking issue for best practices.

Disclaimer

It is important to note that security audits are not meant to be guarantees of code being bulletproof. No audit can catch every single vulnerability, but having as many experts review our lines of code as Quantstamp dedicated to the project was a humbling and productive experience.




Eth 2.0 Dev Update #53 — Altona Testnet Launched

https://medium.com/prysmatic-labs/eth-2-0-dev-update-53-altona-testnet-launched-bf41173a8513?source=rss----33260c5177c6---4


Altona Multiclient Testnet Launched

https://eth2stats.io supporting the new Altona testnet for eth2

There’s a new testnet in town, with incredible results so far since its launch. Altona is a coordinated, multi-client testnet for eth2 phase 0; unlike Prysmatic Labs’ Onyx testnet, it had 4 eth2 clients (Prysm, Lighthouse, Teku, and Nimbus) at genesis instead of just Prysm! Altona was a coordinated effort organized by Afri Schoedon and the Ethereum Foundation as a smaller multiclient testnet. The goal of Altona is to ensure some degree of stability before there is an official announcement of a large scale, “official” multiclient testnet for eth2. You can connect Prysm to Altona using the `--altona` flag when running your beacon node and validator. Additionally, you can monitor the Altona network on https://altona.beaconcha.in created by Bitfly, showing some awesome graphs regarding its liveness so far. You can monitor some of the nodes in the network by using https://eth2stats.io/altona-testnet. Stay posted for the big announcement of a large-scale multi-client testnet within the coming weeks!

Merged Code, Pull Requests, and Issues

New state management refactor for speed and improved memory usage

Improved non-finalized state access with no database reads.

Some Onyx users have noticed a regression in beacon node restart times since Topaz due to the new state management design. This regression is observed whenever the user initiates a restart to download the latest Prysm release or in the unlikely event that Prysm crashes and needs to be restarted. In either case, these few moments of downtime are critical to user profitability so we have spent much of this week going over the existing design and reworking it to significantly reduce the start up time for an already fully synced node. In this process, we found many areas of improvement that help with the code clarity and overall performance of Prysm during initial synchronization as well as normal runtime processing. Read the full design document and track Github issue 6325.

Significant improvements to syncing with full verification

Prysm syncing Altona at around 100 bps with full signature verification

Initial beacon chain synchronization has been one of the core areas of focus for the Prysmatic Labs team in Q2, and we’ve made significant improvements. One of the shortcuts Prysm has used in the past was to skip signature verification of block attestations in initial sync. While this gives a significant boost in sync speed, it is a major risk, since we must “not trust but verify” everything, even finalized data. At the time of writing, Prysm takes around 6 minutes to fully sync the Altona chain with full signature verification. Expect even more improvements in the coming weeks, as we’ve begun work in areas that should yield an additional 10% to 20% improvement in initial sync block processing speed. As these changes land in Q3, the default behavior will soon be to verify all block signatures in preparation for mainnet release.

Audit resolution fixes

As mentioned in previous updates, Prysm recently went through a full security audit from Quantstamp. We will be announcing the specific results of the audit and our experience with it in a separate post next week. However, we have already patched up more than 75% of all audit-related issues brought up by their report, and are making rapid progress towards making Prysm safer to use and prevent unexpected scenarios from happening in mainnet.

Enforced static checks in Prysm for randomness

Our teammate Victor Farazdagi completed a smart static analyzer for Prysm to enforce the usage of strong, cryptographic PRNGs (pseudo-random number generators). “math/rand” from the go standard library does not provide sufficiently strong randomness, so the static analyzer now forbids any package containing /rand from being imported in a Prysm package unless it is our fully-controlled prysmaticlabs/prysm/shared/rand package. Victor’s functionality allows seeds to be generated using a cryptographically secure source (crypto/rand); once a seed is obtained and the generator is seeded, subsequent generations are deterministic and thus fast. We use randomness quite a bit in our codebase, and ensuring all of those uses are hard to bias is critical to prevent potential attacks on nodes during mainnet. You can read more about how Victor implemented this here.
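The pattern described above looks roughly like the following simplified sketch (not the actual prysmaticlabs/prysm/shared/rand package): draw the seed from crypto/rand, then hand it to a fast deterministic generator.

```go
package main

import (
	crand "crypto/rand"
	"encoding/binary"
	"fmt"
	mrand "math/rand"
)

// newGenerator seeds a fast math/rand generator from a cryptographically
// secure source, so the seed itself cannot be predicted or biased.
func newGenerator() *mrand.Rand {
	var b [8]byte
	if _, err := crand.Read(b[:]); err != nil {
		panic(err) // no entropy available; cannot continue safely
	}
	seed := int64(binary.LittleEndian.Uint64(b[:]))
	return mrand.New(mrand.NewSource(seed))
}

func main() {
	g := newGenerator()
	fmt.Println(g.Intn(100)) // deterministic (and fast) once seeded
}
```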

Upcoming Work

Validator accounts revamp underway

After a fair bit of design discussion, we have decided how to proceed with a revamp of Prysm validator accounts: https://hackmd.io/@Yl0VNGYRR6aeDrHHQNhuaA/Hyxr5YM08. We aim to make accounts-v2 extensible, well-documented, general enough, and aligned with the security best practices we have learned from others’ recommendations. Accounts-v2 will support direct EIP-2335 keystore.json accounts, derived HD wallet accounts, and remote signers by default, making it easy for users to go from a simple setup to a complex and highly-available one, all within the Prysm validator’s commands.

Here is the proposed terminology for Prysm accounts-v2:

  • A wallet is the tangible, on-disk metadata about the various accounts that a user owns; there may be multiple accounts within a given wallet. The ./prysm.sh validator accounts command, for example, interacts with an on-disk wallet at a specified directory path to perform its responsibilities
  • An account is a unique namespace that identifies a keystore and its associated metadata. We propose that one account should correspond to a single identifier (this is the same as popular eth1 wallets such as Metamask, in which one account is uniquely identified by its name such as account1)
  • A keymanager defines a software interface which provides keystore access and management: can either be “remote”, “derived”, “direct”, “unencrypted” (for interop / local dev). Proposed by Jim Mcdee
  • A keystore, based on EIP-2335 is “a mechanism for storing private keys. It is a JSON file that encrypts a private key and is the standard for interchanging keys between devices as until a user provides their password, their key is safe”

We are almost done wrapping up the EIP-2335 keystore-compliant wallet functionality, and will be working harder on this revamp over the coming weeks.

Preparation for “official”, public multiclient testnet

Even though the Altona multi-client testnet keeps going strong, it is not large scale enough to represent what mainnet will be. Within a few weeks, there will be a public, “official” multiclient testnet launch announced by the Ethereum Foundation, which Prysmatic Labs is thrilled to be a part of at genesis. This is the testnet many have been waiting for, as it will be the defining factor for mainnet readiness of eth2. So far, Altona has had excellent performance, but it mostly comprises eth2 client teams running validators. It remains to be seen whether a large scale multi-client testnet with public participation will break down or not. This public testnet aims to use the “eth2 deposit launchpad”, an initiative that is currently open source and will be the canonical place everyone can use to join as a validator in eth2.

eth2 deposit launchpad — An interface for the first world computer




Eth 2.0 Dev Update #52 — Onyx testnet launched

https://medium.com/prysmatic-labs/eth-2-0-dev-update-52-onyx-testnet-launched-a87a937f292e?source=rss----33260c5177c6---4


V0.12.1 Onyx Testnet is Stable

Since its launch 5 days ago on Sunday, our new Onyx testnet for eth2 phase 0 has been a success! Now with over 20,000 validators and stable participation, we’re very happy with Onyx’s performance so far. One important feature change in Onyx is that our new state management is now the default after weeks of work, so knowing with certainty that this important functionality is stable enough for production is fantastic!

On top of great stability in our testnet, what makes it more impressive is that we only control <20% of the network! This is great for testing our infrastructure, and really tests the chain’s stability under normal conditions since our unstable changes wouldn’t affect the network nearly as much as when we had the majority of validators.

Merged Code, Pull Requests and Issues

Critical proposer index mismatch bug fixed

Every time we relaunch our testnets we have a unique opportunity to discover consensus breaking bugs or issues that may otherwise not manifest in previous deployments. A new testnet launch entails coordinating many people to prepare for a genesis event of a new chain, often involving unexpected problems. Our teammate Nishant recently fixed a critical bug that has manifested thanks to multiclient testnet experiments here.

Many security patches thanks to the Quantstamp security audit

We have been working very closely with Quantstamp to get Prysm audited for the phase 0 launch. Quantstamp has been doing an excellent job communicating potential bugs as they discover them. We are lucky to have chosen such a great partner. Here’s a list of the potential vulnerabilities discovered by Quantstamp:

  • 6029 — functions that return the intersection of int or byte slices don’t work if the input has more than 2 slices
  • 6034 — deletion in the db has a potential vulnerability under certain conditions
  • 6038 — potential overflow vulnerability in the pending blocks queue for syncing
  • 6103 — edge case scenario where the time difference is higher than intended for a newly arrived block
  • 6049 — potential overflow vulnerability in one of the RPC services
  • 6215 — potential disk space exhaustion when finality cannot be reached

Quantstamp will be providing us with an initial audit report next week. We look forward to that and can’t wait to share the initial result with the community.

Validator duties streaming

Currently, validators retrieve their assigned duties per epoch via a polling mechanism that requests them from a beacon node through RPC. This suffers from several problems in various network conditions, such as reorgs, in which assignments may change. We want to ensure a validator client is “dumb” in the sense that it relies fully on the beacon node for information. It shouldn’t need to “know” what a reorg is. Instead, it should simply receive new assignments via a push mechanism if they exist or if they changed. We refactored our validator client implementation to use a streaming mechanism for duties. Instead of polling, validators subscribe to a server-side stream which gives them the necessary information to perform their duties. If a reorg occurs, the beacon node simply sends out new duties over the stream in case they change.
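Schematically, the client side turns into a blocking receive loop on a server-side stream. The sketch below uses placeholder types standing in for the generated RPC definitions, which are not shown in this post.

```go
package validator

import (
	"context"
	"io"
)

// Placeholder types standing in for the real gRPC-generated definitions.
type DutiesRequest struct{ PublicKeys [][]byte }
type DutiesResponse struct{ /* slots, committees, proposal duties... */ }

type DutiesStream interface {
	Recv() (*DutiesResponse, error)
}

type BeaconClient interface {
	StreamDuties(ctx context.Context, req *DutiesRequest) (DutiesStream, error)
}

// runDuties blocks on the stream and reacts whenever the beacon node
// pushes fresh assignments, e.g. after a reorg; no polling required.
func runDuties(ctx context.Context, client BeaconClient, keys [][]byte, handle func(*DutiesResponse)) error {
	stream, err := client.StreamDuties(ctx, &DutiesRequest{PublicKeys: keys})
	if err != nil {
		return err
	}
	for {
		duties, err := stream.Recv()
		if err == io.EOF {
			return nil // beacon node closed the stream
		}
		if err != nil {
			return err // broken stream; reconnect logic would go here
		}
		handle(duties)
	}
}
```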

Significant slasher resource consumption improvements

Our team members Shay and Ivan have implemented a fantastic improvement to our handling of the slasher database. Previously, historical validator information was stored in a map, meaning saving to and reading from the DB required many expensive operations each time it was interacted with. Our latest changes move this to a much flatter data structure, reducing the slasher’s disk I/O immensely!

Shay also removed some restrictions on the slasher receiving blocks from the beacon node to help with double block detection; before this change, catching double blocks with the slasher was very difficult.

Improved p2p peer handling through connection gating

Users that participated in the Sapphire and Topaz testnet likely remember how many noisy logs there were regarding peer disconnections and connection reattempts. Part of Onyx and v0.12.1 of the spec includes better handling of peers through some novel features in the libp2p library. Our teammate Nishant recently merged in a PR that utilizes these new features for connection gating and better management of peers’ lifecycles within Prysm.

Use Connection Gater to Manage Peer Connections by nisdas · Pull Request #6243 · prysmaticlabs/prysm

Upcoming Work

Revamp of validator account management

One of the largest remaining hurdles for Prysm to be ready for phase 0 mainnet launch is to have a robust, easy-to-use, and safer validator account ecosystem. At the moment, Prysm’s validator accounts utilize “key managers” with confusing or difficult to use input parameters. While these key managers have been working well for early testing, these implementations are not secure enough or easy enough to use for mainnet.

Working design for Prysm’s account management hierarchy

In the last few months, we’ve been listening closely to user feedback around accounts and here are some of the common themes:

  • I forgot my password, how do I restore my account?
  • Is it possible to have multisig for withdrawal key?
  • How can I move around my validator to a different computer? Is there something I need to do to move my withdrawal key to cold storage? (some users don’t even know there is a withdrawal key they need to keep safe)
  • I had an account on a different computer I want to move to this computer, but the account had a different password
  • How can I list my public keys / private keys?
  • How can I regenerate my deposit data for my validator? How can I copy paste it if I’m using a pure terminal environment without a mouse? What is this deposit data?
  • How can I make 100+ deposits? How can I manage multiple accounts in the same validator client?
  • What is the keymanager? What are keymanager opts? What are all these flags?
  • Why does a validator client have a datadir and how do I move it to a different computer?
  • How can I specify the password to my validator client as an ENV var or config file?
  • How can I delete an inactive key?
  • How can I initiate a voluntary exit to stop validating?
  • Is there documentation for accounts management? How can I provide a remote signer?

In this new design for Prysm accounts, all of the above issues will be resolved with excellent documentation on par with the Beacon Chain documentation. Subscribe to this github issue for the latest updates and progress on validator account management redesign.

Beacon state representation refactor for improved memory usage

Since we started Prysm, we have used protobufs to represent most critical data structures that are sent via p2p or RPC. In particular, we represented the BeaconState, the most critical data structure in eth2, as a protobuf. This is problematic because the state is never sent over p2p or RPC and could therefore be represented by a more optimal data structure, such as a pure Go struct. Protolambda from the Ethereum Foundation shared with us some excellent recommendations on how to refactor our state data structure for this use case. There is a lot of room for fancy tricks on the beacon state, including memory pooling, zero-allocation merkleization, and more. We are excited to start working on this feature very soon, as it will be critical for mainnet.
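As a purely illustrative sketch of why a native struct helps (this is not Prysm’s actual design), a hand-written type can track dirty fields and cache per-field roots, which a generated protobuf struct cannot easily do:

```go
package state

// Illustrative only: a hand-written struct can track which fields changed
// since the last hash-tree-root, so only dirty leaves are re-merkleized,
// and instances can be recycled through a memory pool.
type Validator struct {
	PublicKey        [48]byte
	EffectiveBalance uint64
}

type BeaconState struct {
	slot       uint64
	validators []*Validator

	dirty     map[string]bool     // fields mutated since last merkleization
	rootCache map[string][32]byte // cached per-field hash tree roots
}

// SetSlot mutates the slot and marks only that field dirty, so the next
// hash-tree-root computation re-hashes a single leaf, not the whole state.
func (b *BeaconState) SetSlot(s uint64) {
	b.slot = s
	b.dirty["slot"] = true
}
```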

Attestation packing optimizations

For several weeks we have been investigating the best way to optimize attestation packing, both in terms of runtime efficiency and profitability (the more attesters we can include in a block, the better). This week we merged the code for a max k-cover algorithm (#6205), which will be used both for attestation aggregation and for selecting the most profitable attestations to pack within a block. That way we will include more attesters’ votes in the most efficient manner possible. Over the next couple of weeks, Victor will wrap up the integration, benchmarking, and testing of this newly merged code with our current implementation.
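For intuition, the classic greedy approximation to maximum coverage captures the core idea: repeatedly take the aggregate whose attester bitfield covers the most not yet covered validators. This sketch is illustrative; the implementation merged in #6205 is more involved.

```go
package attestations

// greedyMaxCover picks up to k bitfields (all assumed the same length)
// that together cover as many validator indices as possible, by always
// taking the bitfield with the largest number of newly covered bits.
func greedyMaxCover(bitlists [][]bool, k int) []int {
	covered := make([]bool, len(bitlists[0]))
	var chosen []int
	for len(chosen) < k {
		best, bestGain := -1, 0
		for i, bl := range bitlists {
			gain := 0
			for j, bit := range bl {
				if bit && !covered[j] {
					gain++
				}
			}
			if gain > bestGain {
				best, bestGain = i, gain
			}
		}
		if best < 0 {
			break // nothing new can be covered
		}
		chosen = append(chosen, best)
		for j, bit := range bitlists[best] {
			if bit {
				covered[j] = true
			}
		}
	}
	return chosen
}
```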




Introducing the Onyx Testnet

https://medium.com/prysmatic-labs/introducing-the-onyx-testnet-6dadbd95d873?source=rss----33260c5177c6---4

This update was proudly written by the Prysmatic Labs team.

Several months ago, we successfully launched the Topaz testnet, the first public testnet with mainnet-scale parameters for eth2. Topaz had a great run, kicking off with a successful genesis launch event live streamed on youtube here. Now, we are taking our testnet to the next level and ramping it up to the latest official specification for eth2, the v0.12.1 release which is aimed to be the final, non-trivial spec update before mainnet for phase 0.

We are proud to announce our new and improved testnet for Ethereum 2.0 Phase 0: The Onyx Test Network. This testnet is targeting v0.12.1 of the official Ethereum 2.0 specification, which is intended to be the final, multiclient-ready specification before a mainnet launch.

The 5 Milestones for Prysmatic Labs’ Journey to Eth2 Mainnet

Topaz Testnet Recap

Topaz testnet progressions

The previous testnet, Topaz, allowed validators to deposit the full 32 ETH on the Goerli eth1 testnet to participate. At the time of writing, the Topaz testnet has 39,823 active validators, with less than 35% of the network controlled by the Prysmatic Labs team. The testnet has sustained a significant degree of node decentralization, with some nodes reporting peer counts as high as 700. Despite these great numbers, the Topaz testnet was not multiclient compatible due to a consensus error that would prevent other client teams from successfully syncing the chain.

What’s New In Onyx

The next iteration of Topaz, the Onyx testnet, is a new and improved release containing several important quality-of-life improvements, revamped p2p message handling, and alignment with the latest specification for eth2. The v0.12.1 release of the official specs is the target for a mainnet release barring any significant errors, and Onyx is fully up to date with it. Among the high-level changes included in the testnet are:

  • Better handling of attestation subnets over p2p, improving the robustness of the networking implementation for eth2
  • Significantly improved testing around dangerous consensus code such as rewards/penalties calculations
  • Improved eth1 data handling
  • Ensure balances remain unchanged for optimal validators during inactivity leak, which will significantly improve user experience

The Onyx test network is now accepting genesis deposits. Soon, we will start winding down our Topaz validators and begin sending new genesis deposits. Onyx is a new blockchain with new validators, so you will need to go through the deposit process once again.

If you would like to be a genesis validator, send your deposit before 17:00 UTC Wednesday, June 10th, 2020 as that is when Prysmatic will begin sending deposits en masse to kick off the test network. Get started on the testnet onboarding site and join us on Discord!

We are aiming for the Onyx testnet to be multiclient compatible. We have put in extra work to prevent consensus bugs in this new release, and once other client teams are fully up to date with v0.12.1, we expect a significant number of nodes from different eth2 teams to join the Onyx testnet.

REMINDER: Onyx is still a testnet using fake, test ETH. Never send any real ETH to the deposit contract!

Onyx Testnet Information

Deposit contract address:

0x0f0f0fc0530007361933eab5db97d09acdd6c1c8

Fun fact: this address starts with the hex code of the Onyx color, #0F0F0F!

Configuration: Mainnet

Spec version: v0.12.1 (Latest release)

Prysm version: v1.0.0-alpha10

Testnet site: https://prylabs.net

Prysm API: https://api.prylabs.net

Requesting Goerli ETH Form

We have put together a form for requesting bulk amounts of Goerli ETH to join the testnet here. If you want to run a larger number of validators, please fill out the form!

System Requirements

Recommended Specifications:

Operating System: 64-bit Linux, Mac OS X, Windows

Processor: Intel Core i7-4770 or AMD FX-8310 or better

Memory: 8GB RAM

Storage: 100GB available space SSD

Internet: Broadband connection

Minimum Requirements:

Operating System: 64-bit Linux, Mac OS X, Windows

Processor: Intel Core i5-760 or AMD FX-8100 or better

Memory: 4GB RAM

Storage: 20GB available space SSD

Internet: Broadband connection

Interested in Contributing?

We are always looking for devs interested in helping us out. If you know Go or Solidity and want to contribute to the forefront of research on Ethereum, please drop us a line and we’d be more than happy to help onboard you :).

Check out our contributing guidelines and our open projects on Github. Each task and issue is grouped into the Phase 0 milestone along with a specific project it belongs to.

As always, follow us on Twitter or join our Discord server and let us know what you want to help with.

Official, Prysmatic Labs Ether Donation Address

0x9B984D5a03980D8dc0a24506c968465424c81DbE

Official, Prysmatic Labs ENS Name

prysmatic.eth


Introducing the Onyx Testnet was originally published in Prysmatic Labs on Medium, where people are continuing the conversation by highlighting and responding to this story.

Eth 2.0 Dev Update #49 — “Multiclient Testnet + Security Audit”

https://medium.com/prysmatic-labs/eth-2-0-dev-update-49-multiclient-testnet-security-audit-741ae1049ebf?source=rss-c1f08bb57df2------2

Our biweekly updates written by the entire Prysmatic Labs team on the Ethereum Serenity roadmap.

🆕 Eth2 Multiclient Testnet

Schlesi network more stable than anticipated, now with 3 clients

One of the most anticipated announcements in eth2 was that of a successful multiclient testnet launching. In the past 2 weeks, the Schlesi testnet, composed of 50/50 Lighthouse and Prysm validators, was bootstrapped by Ethereum developer Afri Schoedon in a very exciting announcement. Even better, the Teku Java implementation of eth2, built by ConsenSys, was able to successfully join Schlesi, giving us a three-pronged, real, public multiclient network! Although the network is small (around 20–25 peers and only 240 validators), it is a huge achievement, and it has performed better than expected.

https://eth2stats.io shows the Schlesi testnet

This sort of testnet is useful for client teams to perform rapid iteration and determine whether there are any critical consensus bugs between clients. The plan was to restart Schlesi every week or as needed, but given its decent track record of network liveness, it has been kept alive since genesis. Expect even larger multiclient deployments over the coming weeks, with validator counts more closely resembling mainnet itself.

Additionally, Bitfly’s block explorer now has native support for the Schlesi testnet which you can see here.

📝 Security Auditor Selected

Quantstamp selected as security auditor

In April, we put out a request for proposal for a comprehensive security review of our eth2 client Prysm. We reviewed about 10 excellent proposals over the following weeks. It was difficult to choose a single vendor, but we were ultimately most excited about working with Quantstamp. What impressed us most was their in-depth proposal and detailed answers to our many questions as we vetted the various proposals. We believe Quantstamp has an excellent team of engineers with ample experience in software security auditing, eth2 domain knowledge, and the quirks of Go. We are also considering a follow-up security review after Quantstamp’s final report, as we understand the importance of a secure validating client for the Phase 0 launch later this year.

📝 Merged Code, Pull Requests, and Issues

Native Windows Support

We know there may be users who wish to be a part of Topaz but have no experience with the command line or Linux. Fear not, for community member pragmatics has made a Prysm GUI for Windows! It uses a new prysm.bat script that he contributed as well. If you’re interested, ask about it on our Discord or check out our docs for Windows. Thank you pragmatics! We appreciate your contributions!

More memory reductions

As we head towards mainnet, it’s important for the client itself to be mainnet ready. We have been monitoring memory usage and optimizing it where we see fit. Some of these memory issues only show up with a huge number of validators. Since the last update, we have incorporated multiple optimization techniques to reduce memory usage, as seen in #5778, #5760 and #5737. These reduce memory usage from initial syncing all the way through regular syncing. Please do let us know if you have any memory related issues.

Awesome validator dashboards from contributors!

If watching the terminal to see how your validator is doing isn’t cool enough for you, there are thankfully 2 new Grafana dashboards made by community members! Metanull and Ocaa Grums have each built an awesome dashboard. If you’d like to check them out, run Grafana and point your dashboard at https://localhost:8080 for its metrics.

Ocaa’s dashboard

Metanull’s dashboard

Histograms for Block Network Propagation

We have added histogram metrics for beacon nodes sending and receiving beacon blocks. These metrics help us debug issues such as missed attestation votes and beacon nodes failing to receive blocks. The histograms have been key to addressing these issues and to producing concise reports such as https://hackmd.io/dVbmIMHNQ6aby77g0-ME8A. With these metrics, we can begin to study network propagation related issues.
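
As a rough sketch of this kind of metric, using the Prometheus Go client (the metric name and buckets below are illustrative, not our actual definitions):

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Bucketing arrival latency exposes the full distribution of block
// propagation delays, not just an average.
var blockArrivalLatency = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:    "beacon_block_arrival_latency_seconds",
	Help:    "Delay between a block's slot start time and its receipt over p2p.",
	Buckets: prometheus.LinearBuckets(0.5, 0.5, 12), // 0.5s through 6s
})

// Called from the p2p handler whenever a block arrives.
func observeBlockArrival(delaySeconds float64) {
	blockArrivalLatency.Observe(delaySeconds)
}
```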

🔜 Upcoming Work

Multiclient infrastructure and readiness

Although the multiclient Schlesi testnet is running successfully, we need ways of performing rapid experimentation on multiclient readiness with Prysm. That is, we want to be able to spin up short testnet runs with custom configurations between Prysm and Lighthouse to ensure we can meet the standards required for a public testnet restart. Currently, we operate the Topaz testnet, which is stable and going strong; however, it is not the target spec for multiclient. We have been working internally on revamping our testnet infrastructure to better support multiple clients, including the ability to successfully run Lighthouse in our cloud cluster alongside our nodes. Having this new infra will accelerate the launch of a coordinated, multiclient testnet.

Slashing Protection Middleware

Our team member Shay has implemented protection middleware that checks whether a given block or attestation would be a slashable vote. This allows validator client setups to use the slasher’s collection of validator voting history to keep a user’s validators safe. Very useful for anyone who is concerned about moving validator setups around, or just about keeping their validators safe!
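
The core check, sketched here with hypothetical simplified types, boils down to detecting double votes and surround votes against a validator's recorded history:

```go
package protection

// An attestation is rejected if it double-votes a target epoch already
// in the validator's history, or surrounds / is surrounded by a
// previous vote (the slashable conditions for attestations).
type attRecord struct {
	Source, Target uint64 // epochs of a previously signed attestation
}

func isSlashableAttestation(history []attRecord, source, target uint64) bool {
	for _, prev := range history {
		if prev.Target == target {
			return true // double vote on the same target epoch
		}
		if source < prev.Source && target > prev.Target {
			return true // new vote surrounds a previous vote
		}
		if source > prev.Source && target < prev.Target {
			return true // new vote is surrounded by a previous vote
		}
	}
	return false
}
```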

Interested in Contributing?

We are always looking for devs interested in helping us out. If you know Go or Solidity and want to contribute to the forefront of research on Ethereum, please drop us a line and we’d be more than happy to help onboard you :).

Check out our contributing guidelines and our open projects on Github. Each task and issue is grouped into the Phase 0 milestone along with a specific project it belongs to.

As always, follow us on Twitter or join our Discord server and let us know what you want to help with.

Official, Prysmatic Labs Ether Donation Address

0x9B984D5a03980D8dc0a24506c968465424c81DbE

Official, Prysmatic Labs ENS Name

prysmatic.eth


Eth 2.0 Dev Update #49 — “Multiclient Testnet + Security Audit” was originally published in Prysmatic Labs on Medium, where people are continuing the conversation by highlighting and responding to this story.

Eth 2.0 Dev Update #48 — “Eth2 Topaz Testnet Going Strong”

https://medium.com/prysmatic-labs/eth-2-0-dev-update-48-eth2-topaz-testnet-going-strong-b7b8cd2fb244?source=rss----33260c5177c6---4

Our biweekly updates written by the entire Prysmatic Labs team on the Ethereum Serenity roadmap.

🆕 Topaz Eth2 Testnet Recap

Topaz Testnet’s Successful Genesis Launch Event!

Our new eth2 public testnet, the Topaz Testnet, had a very successful launch, with a ton of community members involved in making it a reality. The event was livestreamed on YouTube here from a Zoom call in which attendees were given proof-of-attendance tokens (POAP) for being at the Topaz launch event! The testnet was publicly open for anyone to join, and once the genesis validator threshold was reached, genesis occurred at midnight of the following day, the same procedure as the real mainnet launch of eth2.

At the start of the chain, we controlled only around 67% of the network, much lower than in our previous testnet releases. Additionally, the Topaz testnet is special because it shares the exact same parameter configuration as mainnet — that is, 32 ETH deposits, the same time-based parameters, and more. Block explorers such as Etherscan (https://beacon.etherscan.io) and Bitfly (https://beaconcha.in) updated their portals for Topaz.

We are very grateful for the overwhelming support we received for this event, as it marks an important milestone for eth2. This testnet is the basis for multiclient experimentation, although it is not the multiclient testnet. There will be another testnet restart, coordinated by multiple client teams, with various clients having equal participation at genesis, which is as close to the real thing as we can get before launch.

Validator Participation at 97%, Only 1 Finality Incident

Topaz has been running with incredible stability and a remarkable amount of participation, far beyond what we imagined. We control only 59% of the testnet, and we are seeing 97.4% of active validators consistently proposing and voting on blocks. We are seeing nodes with over 300 peers active in the network, while we run only a total of 8 nodes internally at Prysmatic Labs. The one incident that caused finality downtime was due to many nodes dying because of an experimental feature that was enabled by default. We quickly resolved it by disabling the feature and notifying our users to update their nodes. Soon after, the chain reached an all-time-high level of participation.

Consensus Bug in Topaz When Performing Interop Testing

While Topaz has been running, we have been working extensively on the side with the Sigma Prime team on getting their Lighthouse client to interoperate with Prysm. After attempting chain sync with the testnet, Sigma Prime and EF researcher Protolambda discovered a state root failure, which typically points to a consensus bug. Upon further investigation, Prysm turned out to have an order-of-operations bug in its rewards/penalties computation logic, leading to a divergence in states after block processing between the two clients. Prysm was the source of the bug, meaning that Lighthouse will not be able to sync with the Topaz testnet unless we coordinate a hard fork.
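
To see why order of operations is consensus-critical here, consider a toy example (numbers invented for illustration): balance decreases saturate at zero in eth2, so applying a penalty before a reward can produce a different final balance than the reverse.

```go
package main

import "fmt"

// decreaseBalance mirrors the spec's saturating subtraction.
func decreaseBalance(balance, delta uint64) uint64 {
	if delta > balance {
		return 0 // balances never go below zero
	}
	return balance - delta
}

func main() {
	const balance, reward, penalty = 5, 10, 8
	penaltyFirst := decreaseBalance(balance, penalty) + reward // 0 + 10 = 10
	rewardFirst := decreaseBalance(balance+reward, penalty)    // 15 - 8 = 7
	fmt.Println(penaltyFirst, rewardFirst)                     // 10 7: states diverge
}
```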

Given the gravity of the situation, it makes the most sense to focus on short-lived, private testnets for interoperability testing to ensure we iron out any consensus bugs. At the moment, we do not want to restart the Topaz testnet just to fix this bug; instead, we will be working hard on the side to ensure client compatibility is top-notch. Once those items are resolved, we will announce a scheduled restart of the network in which Lighthouse will also be participating.

The EF research team quickly stepped up to improve the rewards portions of the spec and ensure there is more test coverage for these types of functions. Functions involving rewards/penalties, unfortunately, have a lot of potential edge cases that are very hard to exhaust through unit tests.

📝 Merged Code, Pull Requests, and Issues

Memory leak in node identified and patched

An unfortunate consequence of the initial Topaz launch was the huge increase in memory usage nodes were seeing over the course of a few days. We tried pinning down the issue, but it was difficult to diagnose. Over time, we realized that nodes with the `--disable-new-state-mgmt` flag performed far better in memory usage than regular nodes. The new state management feature was causing a significant memory burden because it kept many copies of the beacon state in memory without properly removing them from a cache. After disabling the feature, all of our prod nodes went back down to under 1GB of memory, and users’ nodes also stopped hogging computer resources. With further investigation, the team root-caused the memory leak to unnecessary state copies when verifying incoming attestations. In the eth2 world, attestations flow far more frequently than blocks; if the node needs to copy the beacon state for every incoming attestation just to verify it, it will soon run out of memory. The issue has been resolved by #5584, and `--enable-new-state-mgmt` is safe to use again.
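
The idea behind the fix, in a simplified sketch with illustrative names: verification only reads the state, so verifiers can share a read-only view instead of receiving a deep copy per attestation.

```go
package statemgmt

import "sync"

// A read-only view shares the underlying data; no copying occurs.
type beaconState struct {
	mu       sync.RWMutex
	balances []uint64
}

type readOnlyView struct{ st *beaconState }

func (v readOnlyView) Balance(i int) uint64 {
	v.st.mu.RLock()
	defer v.st.mu.RUnlock()
	return v.st.balances[i]
}

// verifyAttestation needs O(1) extra memory with the view, versus a
// full beacon state copy per attestation under the old scheme.
func verifyAttestation(v readOnlyView, validatorIdx int, minBalance uint64) bool {
	return v.Balance(validatorIdx) >= minBalance
}
```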

Better handling of eth1 chain downtime

We provide all of our testnet users access to our own eth1 nodes running the Goerli testnet, which is used to onboard validators into our eth2 testnet. Nodes need constant access so they can track chain logs and block heights of deposits made to the deposit contract. Validators include the latest eth1 block hash and other metadata as part of blocks in eth2, which undergo a voting process to determine the best block. If there is any downtime on eth1, our validators would not be able to propose blocks successfully, making eth1 a single point of failure. If this happens, we should instead include some random data in the eth2 block. This way, the chain will not stall and validators will still get rewarded, but the eth1 data voting process will be halted until the eth1 chain is back up. Our teammate Preston worked on resolving this issue, and the fix is now included in the latest master branch of Prysm here.
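
A sketch of that fallback behavior, with a hypothetical client interface standing in for Prysm's actual eth1 plumbing:

```go
package proposer

import (
	"context"
	"crypto/rand"
)

type Eth1Data struct {
	DepositRoot  [32]byte
	DepositCount uint64
	BlockHash    [32]byte
}

type eth1Client interface {
	LatestEth1Data(ctx context.Context) (Eth1Data, error)
}

// eth1DataForProposal never fails the block proposal: if the eth1 node
// is unreachable, it votes placeholder data, stalling only the eth1
// voting process rather than the whole chain.
func eth1DataForProposal(ctx context.Context, c eth1Client) Eth1Data {
	data, err := c.LatestEth1Data(ctx)
	if err != nil {
		var junk Eth1Data
		rand.Read(junk.BlockHash[:]) // random placeholder while eth1 is down
		return junk
	}
	return data
}
```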

Revamped the documentation portal for Prysm

Our documentation portal became one of our biggest action items during the Topaz launch. It was really critical for us to have clear support for the various operating systems used to run Prysm. As part of our knowledge base, we also made our internal incident reports and testnet postmortems publicly available in our docs portal here. We also now have revised instructions for the various operating systems, including Docker instructions for Windows available here. Thanks to Celeste, our docs expert, we continue to grow our knowledge base for anyone running Prysm and getting involved in eth2.

🔜 Upcoming Work

Getting rid of the archival service in beacon nodes

With the capability of generating a historical state at any arbitrary slot, we are excited to finally get rid of the archival service. This PR has been pending further testing. As soon as we are satisfied with the performance, the PR will be merged and the archival service deprecated in the canonical code base. This improvement will make the code base cleaner and require fewer services at runtime.

Eth2 API standardization for mainnet

Over the last year, we have collected user feedback and suggestions around the need for eth2 data API access. In this time, we launched the v1 alpha version of EthereumAPIs, which has enabled block explorers, staking pools, and data-focused developers to access the information they need to build upon phase 0. As we work towards our v1 beta and the mainnet launch of phase 0, our team has been carefully collaborating with other teams and users to find an excellent minimal API that fulfills the needs of data consumers in eth2. If you have feedback, comments, concerns, or suggestions around the Prysm API, let us know in our Discord or on GitHub.

Interested in Contributing?

We are always looking for devs interested in helping us out. If you know Go or Solidity and want to contribute to the forefront of research on Ethereum, please drop us a line and we’d be more than happy to help onboard you :).

Check out our contributing guidelines and our open projects on Github. Each task and issue is grouped into the Phase 0 milestone along with a specific project it belongs to.

As always, follow us on Twitter or join our Discord server and let us know what you want to help with.

Official, Prysmatic Labs Ether Donation Address

0x9B984D5a03980D8dc0a24506c968465424c81DbE

Official, Prysmatic Labs ENS Name

prysmatic.eth


Eth 2.0 Dev Update #48 — “Eth2 Topaz Testnet Going Strong” was originally published in Prysmatic Labs on Medium, where people are continuing the conversation by highlighting and responding to this story.

Eth 2.0 Dev Update #47 — “Multiclient Target Testnet Restart & Security Audit RFP”

https://medium.com/prysmatic-labs/eth-2-0-dev-update-47-multiclient-target-testnet-restart-security-audit-rfp-9c6cf095802c?source=rss-c1f08bb57df2------2

Our biweekly updates written by the entire Prysmatic Labs team on the Ethereum Serenity roadmap.

🆕 Testnet Restart Approaching

https://beaconcha.in has been tracking our testnet since its genesis!

🔹Simple Recap

Our current testnet has been live for an incredible 3 months, marking a significant milestone for our team as we have increased the stability of our client and worked closely with block explorers, validators, and the community to make Prysm safer, more reliable, and better documented for folks wanting to interact with eth2 at the cutting edge. To everyone that has participated in this experiment so far: we sincerely thank you. It has been a joy to have people trying out our experimental features and helping us significantly improve the client’s resource usage and our documentation.

Starting next week, we are planning on restarting our testnet to match the latest specification for eth2, which includes some important, consensus-breaking improvements. Behind the scenes, the whole team has been working on revamping Prysm to prepare for a testnet restart to spec version v0.11.1. This version is the most important release yet, as it will serve as the multiclient testnet target; that is, clients will interop with each other starting from this upcoming testnet restart. This restart is also the debut of novel features in our beacon chain that offer a significant reduction in node resource usage, including a critical improvement to network bandwidth. Among the changes included are:

Revamped state management:

Instead of saving the beacon state to the database every slot, we only persist states after a certain interval, significantly reducing disk writes and disk reads. Upon reading a historical state, we replay blocks on top of the closest persisted state and recreate whatever is requested quite efficiently. Users can also configure this feature to write states more or less often, trading off state regeneration time against database size. This will be enabled by default in our testnet restart, and we believe users will experience a much smoother run with the beacon chain.
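
Roughly, the scheme looks like the following sketch, where the interval, types, and DB interface are all illustrative:

```go
package stategen

// States are persisted every saveInterval slots; any other state is
// rebuilt by replaying blocks on top of the closest persisted ancestor.
const saveInterval = 64 // example value; configurable in practice

type State struct{ Slot uint64 }
type Block struct{ Slot uint64 }

type database interface {
	StateAtSlot(slot uint64) (State, error)
	BlockAtSlot(slot uint64) (Block, bool)
}

// processBlock stands in for the full state-transition function.
func processBlock(st State, b Block) State {
	st.Slot = b.Slot
	return st
}

func stateAtSlot(db database, slot uint64) (State, error) {
	base := slot - slot%saveInterval // nearest persisted checkpoint below
	st, err := db.StateAtSlot(base)
	if err != nil {
		return State{}, err
	}
	for s := base + 1; s <= slot; s++ {
		if blk, ok := db.BlockAtSlot(s); ok { // skipped slots have no block
			st = processBlock(st, blk)
		}
	}
	return st, nil
}
```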

Easier and more efficient slashing detection:

Currently, detecting whether a validator has proposed two blocks with different inner data fields is a daunting task, as we must also know the list of committees at that slot to figure out who the proposer is. The new spec instead includes the validator’s proposer index as a new field in blocks; this way, our slasher can almost instantly detect whether someone has committed a slashable block offense.
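
With that field in hand, the detection itself becomes a simple bookkeeping exercise, sketched here with hypothetical types:

```go
package slasher

// Remember (proposer, slot) -> block root; two distinct roots for the
// same pair constitute a slashable double proposal.
type proposalKey struct {
	ProposerIndex uint64
	Slot          uint64
}

type proposalWatcher struct {
	seen map[proposalKey][32]byte
}

// Observe reports whether a block conflicts with an earlier one signed
// by the same proposer at the same slot.
func (w *proposalWatcher) Observe(proposer, slot uint64, root [32]byte) bool {
	k := proposalKey{ProposerIndex: proposer, Slot: slot}
	if prev, ok := w.seen[k]; ok {
		return prev != root // same slot, different block: slashable
	}
	w.seen[k] = root
	return false
}
```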

Only subscribing to attestations nodes care about via p2p:

Currently, nodes are bombarded with attestation messages from 51000+ validators via p2p every epoch. This has prompted many to ask whether the network requirements for eth2 are reasonable and whether any work is being done to mitigate their weight. The answer is a resounding yes! It turns out only certain nodes care about all the raw, unaggregated attestations floating around the network, for the purposes of aggregating them and packing them into smaller messages. Now, nodes only subscribe to what we call attestation subnets based on their needs. Most nodes, especially nodes with 0 validators attached, will see a considerable decrease in bandwidth and memory from this new feature, which we are thrilled to pilot soon.
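
A sketch of the subscription logic; note the subnet mapping and topic strings below are simplified relative to the spec's compute_subnet_for_attestation and full gossip topic format:

```go
package p2p

import "fmt"

// The spec defines 64 attestation subnets.
const attestationSubnetCount = 64

// Simplified subnet mapping for illustration.
func subnetForCommittee(committeeIndex uint64) uint64 {
	return committeeIndex % attestationSubnetCount
}

// topicsToJoin returns the deduplicated gossip topics for the
// committees a node's validators are assigned to; a node with no
// validators attached joins none of them.
func topicsToJoin(assignedCommittees []uint64) []string {
	seen := make(map[string]bool)
	var topics []string
	for _, c := range assignedCommittees {
		t := fmt.Sprintf("/eth2/beacon_attestation_%d", subnetForCommittee(c))
		if !seen[t] {
			seen[t] = true
			topics = append(topics, t)
		}
	}
	return topics
}
```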

Initial sync fully revamped:

Initial chain sync has changed completely since we brought in our new teammate, Victor Farazdagi. Victor has been working for weeks on ensuring our sync is a smooth experience that can intelligently navigate difficult edge cases such as peers going away, long periods without finality, and catching up to the chain head after falling behind for any reason. Previously, our sync model would (a) request a series of blocks, and then (b) sequentially process them. Victor revamped the system to separate block downloading from block processing, making the experience feel smooth and uninterrupted.

📝 Merged Code, Pull Requests, and Issues

Beacon State Regeneration Complete

With the completion of the new state generator service, a beacon node is now capable of generating a historical state on demand. This is a big feature, as a Prysm beacon node can now serve awesome data providers such as Etherscan with historical chain info.

There’s still minor work to route existing RPC calls to the new state generator service, but it is not a testnet blocker. These features can gradually roll in as the testnet launches. We are super excited to get feedback from the community on what additional historical data daily users and stakers would like to see.

Initial sync batch save blocks

During initial sync, a typical workflow is that a syncing node requests a batch of blocks from a serving node with the intention of catching up to head. Given a proper response, the syncing node processes each individual block in the batch sequentially. As the syncing node processes the blocks and computes the new state, it saves each processed block to the DB one at a time. Saving objects to the DB is a non-negligible bottleneck, and instead of saving blocks individually, a node can save them in a batch. As it turns out, this provided a great performance gain: a node was able to sync 0.8x faster in the worst case. We encourage everyone to try this out, as this feature has been rolled out as the default.
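
The gist of the change, sketched against BoltDB, which Prysm uses for storage (the bucket layout and encoding here are illustrative): one transaction, and therefore one commit, for the whole batch.

```go
package db

import (
	"encoding/binary"

	bolt "go.etcd.io/bbolt"
)

// saveBlocksBatch writes all encoded blocks in a single transaction,
// amortizing the per-transaction commit overhead across the batch.
func saveBlocksBatch(database *bolt.DB, encodedBySlot map[uint64][]byte) error {
	return database.Update(func(tx *bolt.Tx) error {
		bkt, err := tx.CreateBucketIfNotExists([]byte("blocks"))
		if err != nil {
			return err
		}
		for slot, encoded := range encodedBySlot {
			key := make([]byte, 8)
			binary.BigEndian.PutUint64(key, slot)
			if err := bkt.Put(key, encoded); err != nil {
				return err
			}
		}
		return nil
	})
}
```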

Proposer Slashing Detection

We are now able to perform proposer slashing detection on blocks in our chain. Thanks to the latest specification, blocks now contain a ProposerIndex field which we can track to figure out whether someone committed a slashable offense. Our teammate Shay took charge of this proposer slashing detection feature, and it is now available in our slasher codebase. We have already successfully caught a number of attester slashings over the last couple of weeks, and are looking forward to policing more evil activity in the network 👿

Fuzz Testing P2P to Detect DoS Vectors

Image credit: embedwise.com

Last update, we mentioned how Prysm is now being fuzz tested under the targets provided by beacon-fuzz by Sigma Prime. We have started experimenting with new fuzz testing candidates for randomized, coverage-based testing; any external source of input is an ideal candidate for this type of testing. So we started fuzz testing a simple p2p status message and almost immediately found a bug where the test input was an 86-byte message that claimed to have a length of over 9000 petabytes. This crashed the application when it tried to allocate 9000+ petabytes of memory, and we learned that Prysm was not enforcing a maximum message size. This type of issue was a simple oversight in development, but a critical bug of the highest degree: an external user would have been able to crash any Prysm node by sending a hello or status message!
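
The general class of fix looks like the following sketch (the cap value is illustrative): validate any claimed length prefix against a hard maximum before allocating.

```go
package encoder

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
)

const maxChunkSize = 1 << 20 // 1 MiB example cap

func readSizedMsg(r io.Reader) ([]byte, error) {
	br := bufio.NewReader(r)
	claimed, err := binary.ReadUvarint(br)
	if err != nil {
		return nil, err
	}
	if claimed > maxChunkSize {
		// Without this check, a tiny message claiming petabytes would
		// trigger an enormous allocation and crash the node.
		return nil, fmt.Errorf("message claims %d bytes, cap is %d", claimed, maxChunkSize)
	}
	buf := make([]byte, claimed) // safe: bounded allocation
	_, err = io.ReadFull(br, buf)
	return buf, err
}
```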

Completely New Initial Sync Enabled By Default

The new initial sync is smoother than ever, able to catch tricky edge cases, and can intelligently deal with common network scenarios such as important peers going away and catching up after we have fallen behind. It works thanks to a priority queue that sequences blocks ready for processing and pushes those that must be processed later down the queue, using two separate goroutines: one for downloading blocks, and one for processing the items that are ready from the priority queue. You can read more about the detailed design of the feature, created by our teammate Victor, here.
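
In sketch form, with illustrative types, the processing side looks roughly like this: batches arrive out of order from the fetcher goroutine, and a slot-ordered min-heap holds them until the next expected batch is ready.

```go
package initialsync

import "container/heap"

type Batch struct {
	StartSlot, EndSlot uint64
}

// batchHeap is a min-heap ordered by start slot.
type batchHeap []Batch

func (h batchHeap) Len() int            { return len(h) }
func (h batchHeap) Less(i, j int) bool  { return h[i].StartSlot < h[j].StartSlot }
func (h batchHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *batchHeap) Push(x interface{}) { *h = append(*h, x.(Batch)) }
func (h *batchHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// processInOrder runs in its own goroutine; a fetcher goroutine feeds
// the channel as downloads complete, in whatever order peers respond.
func processInOrder(fetched <-chan Batch, startSlot uint64, process func(Batch)) {
	pq := &batchHeap{}
	heap.Init(pq)
	next := startSlot
	for b := range fetched {
		heap.Push(pq, b)
		for pq.Len() > 0 && (*pq)[0].StartSlot == next {
			ready := heap.Pop(pq).(Batch)
			process(ready)
			next = ready.EndSlot + 1
		}
	}
}
```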

New, Fast Serialization Enabled by Default

We are now using a lightning-fast simple serialize (SSZ) implementation, created by one of our contributors, Ferran, here. His approach was to create a code generator that, given a set of data structure definitions in Go, outputs marshaling code that intelligently reuses byte buffers and navigates structs much faster than anything we could implement by hand. Due to the lack of generics in the language, code generation is the next best thing for getting the fastest possible performance for something such as serialization in Go. This is now plugged into Prysm by default, and we believe it will serve a key purpose in reducing memory/CPU load for nodes serving other peers in the network.

🔜 Upcoming Work

Multiclient Testnet

Both Prysm and other teams such as Lighthouse are in the same boat of needing to restart their testnets to v0.11.1 before focusing fully on multiclient work. This will be our full focus after the testnet restart. Most of the initial experiments will involve being able to successfully and comfortably sync to the head of the chain, culminating in both Prysm validators and Lighthouse validators producing blocks and attestations for the network. Stay posted here on our Medium to track the latest progress after we restart our testnet.

Attestation aggregation optimizations

Currently, aggregating attestations is done in the most naive way possible for the sake of simplicity. However, as the number of validators in the network grows, it quickly becomes a client bottleneck which must be addressed. Improving this aggregation routine is non-trivial, as it is an NP-hard problem. Our teammate Victor is now looking into heuristic approaches for improving the speed of our aggregation in Prysm. Regarding handling aggregation at the networking layer, there have been some very insightful research pieces posted by the Ethereum research team, such as this one by Hsiao-Wei Wang https://notes.ethereum.org/@hww/aggregation#A-note-on-Ethereum-20-attestation-aggregation-strategies with strategies we are currently using for phase 0.
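
For reference, the naive baseline looks roughly like this sketch (BLS signature aggregation is elided): attestations with identical data merge only if their participation bitfields do not overlap, and pairwise scanning like this is O(n^2), which is why smarter strategies matter as validator counts grow.

```go
package aggregation

// canMerge reports whether two participation bitfields are disjoint.
func canMerge(a, b []byte) bool {
	for i := range a {
		if a[i]&b[i] != 0 {
			return false // a validator signed both; can't combine
		}
	}
	return true
}

func mergeBits(a, b []byte) []byte {
	out := make([]byte, len(a))
	for i := range a {
		out[i] = a[i] | b[i]
	}
	return out
}

// naiveAggregate repeatedly folds compatible bitfields together.
func naiveAggregate(bitfields [][]byte) [][]byte {
	var result [][]byte
	for _, bf := range bitfields {
		merged := false
		for i, agg := range result {
			if canMerge(agg, bf) {
				result[i] = mergeBits(agg, bf)
				merged = true
				break
			}
		}
		if !merged {
			result = append(result, bf)
		}
	}
	return result
}
```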

Prysm External Security Audit

Prysmatic Labs is seeking an external security audit of our Ethereum 2.0 client, Prysm. If you are a security focused team, please review our request for proposal.

Interested in Contributing?

We are always looking for devs interested in helping us out. If you know Go or Solidity and want to contribute to the forefront of research on Ethereum, please drop us a line and we’d be more than happy to help onboard you :).

Check out our contributing guidelines and our open projects on Github. Each task and issue is grouped into the Phase 0 milestone along with a specific project it belongs to.

As always, follow us on Twitter or join our Discord server and let us know what you want to help with.

Official, Prysmatic Labs Ether Donation Address

0x9B984D5a03980D8dc0a24506c968465424c81DbE

Official, Prysmatic Labs ENS Name

prysmatic.eth


Eth 2.0 Dev Update #47 — “Multiclient Target Testnet Restart & Security Audit RFP” was originally published in Prysmatic Labs on Medium, where people are continuing the conversation by highlighting and responding to this story.

Eth 2.0 Dev Update #45 — “Cross-Compiles & Slashing Protection”

https://medium.com/prysmatic-labs/eth-2-0-dev-update-45-cross-compiles-slashing-protection-2e6359e15195?source=rss----33260c5177c6---4

Our biweekly updates written by the entire Prysmatic Labs team on the Ethereum Serenity roadmap.

🆕 New Testnet Updates

Stability and validators balances growing!

Our testnet has become significantly more resilient: major network bugs have been squashed, nodes are using fewer resources, and most importantly, validators are making money.

Despite these improvements, we aren’t complacent at all! There is a long way to go on bottlenecks, user experience, documentation, build requirements, and more. Memory requirements have dropped drastically, and most of the nodes we run internally hover around the 2GB mark. Over the coming 2 weeks we’ll be including some massive changes that revamp how we handle the eth2 beacon state and serialization of our data structures. Expect further order-of-magnitude improvements to running a beacon node very soon.

Consistent progress for the Bitfly block explorer

As usual, the Bitfly team has been killing it with new features for their block explorer, https://beaconcha.in, adding a much-requested validator monitoring dashboard, where you can plug in as many validators as you want and track their total and relative performance, proposal history, and more. Additionally, you can now see if your validator went offline, and the team is working on opt-in notifications to let you know if your validator has been away for a long time and is being penalized.

📝 Merged Code, Pull Requests, and Issues

Slasher service functional, first testnet slashing coming soon!

If you’ve kept up with our progress, you’ll know we’ve been working hard on the Hash Slinging Slasher to help detect slashings the moment they occur in the network. We’re proud to announce it is now functional! By recording each validator’s vote history along with all broadcasted attestations, the slasher is able to determine when a slashable act has been committed by any validator. After it finds the conflicting vote, it creates a slashing object and sends it to the beacon chain, to be put into a block.

We’ve confirmed it to work on local testnets, now we need to get it working in mainnet conditions which will definitely stress it differently. We’re hoping to get a slashing on the public testnet soon!

Cross-compilation functional for Prysm

Thanks to SuburbanDad, Prysm finally has the appropriate tooling in place to cross compile for the most common architectures. Why is this a big deal? The Go compiler has built-in cross compilation, but this only works for Go code. Given the growing complexity of Prysm and its dependencies, we needed to cross compile C++ code in addition to Go code. This issue had been in progress for many months until Gitcoin increased the bounty to $1000. Thanks Gitcoin, and thanks SuburbanDad! Users can expect to see compiled binaries uploaded to regular Prysm version releases as we continue to improve our QA and continuous integration pipelines. Good news for anyone on Windows or Linux ARM64 (like Raspberry Pi!) who wants to use binary executables without the overhead of running a Docker container.

Separating block fetching from block processing in sync

Work has started on improving initial synchronization, with the immediate focus on refactoring and better utilization of resources by moving to a more concurrent model. At the moment, block fetching and processing are done sequentially, one after the other, in a blocking fashion. By decoupling those operations, we will be able to download and process blocks at the same time (concurrently), allowing a smoother experience during initial synchronization.

In case you want to follow the progress, the relevant GitHub issue is #4815. We’ve already implemented the fetching part (#4978), and are currently working on completing the queue/processing part.
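
A minimal sketch of the intended shape, with illustrative names: one goroutine fetches while the other processes, so network waits and CPU-bound work overlap instead of alternating.

```go
package initialsync

type Block struct{ Slot uint64 }

// syncConcurrently decouples fetching from processing via a channel.
func syncConcurrently(fetchBatch func(startSlot uint64) []Block, process func(Block)) {
	batches := make(chan []Block, 4) // small buffer lets the fetcher run ahead
	go func() {
		defer close(batches)
		for start := uint64(0); ; start += 64 {
			batch := fetchBatch(start)
			if len(batch) == 0 {
				return // caught up to the chain head
			}
			batches <- batch
		}
	}()
	for batch := range batches { // processing side (here: the caller)
		for _, blk := range batch {
			process(blk)
		}
	}
}
```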

🔜 Upcoming work

New serialization library for Prysm: hundreds of times better in speed and 8000 times better in terms of memory usage

Mind = Blown

It is no mystery that Golang is painful for generic tasks such as serialization. That is, given some data structure, can we serialize it according to a set of rules, regardless of what data it contains and what it looks like?

Our current approach is to use our own serialization library, https://github.com/prysmaticlabs/go-ssz, for every marshal/unmarshal use case in Prysm. Every time we save a block to the database, save the state, or retrieve the state, we have to use this slow library. A better approach is to create a code generator which, given a set of data structures, outputs code that can marshal/unmarshal the data. Generated code is a great alternative to having generics in Go, and can lead to some of the fastest code imaginable for a given use case. We put out a bounty a while back for contributors to hand-write custom marshal/unmarshal functions for our data structures, but one of our contributors took the opportunity to write a generator that will work for anything, as long as it’s valid under the serialization rules for eth2.
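
For a feel of why generated code wins, here is the shape such generated marshaling takes (illustrative, not the generator's actual output) for the eth2 Checkpoint container:

```go
package ssz

import "encoding/binary"

type Checkpoint struct {
	Epoch uint64
	Root  [32]byte
}

// MarshalSSZTo appends the SSZ encoding of c to dst and returns it, so
// callers can reuse one buffer across many objects: straight-line
// appends, no reflection, no intermediate allocations.
func (c *Checkpoint) MarshalSSZTo(dst []byte) []byte {
	var epoch [8]byte
	binary.LittleEndian.PutUint64(epoch[:], c.Epoch) // SSZ uses little-endian
	dst = append(dst, epoch[:]...)
	dst = append(dst, c.Root[:]...)
	return dst
}
```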

This new library is currently being wrapped up before we integrate it into Prysm, but the benchmarks for marshaling a mainnet, full beacon block speak for themselves:

We see an 8700x improvement in memory and almost 250x in pure speed. We are really thrilled to use this new code in Prysm, and we believe it will significantly improve the experience of syncing with the testnet. We’ll keep everyone posted as soon as this is included in our master branch.

Latest spec version, v0.10.1 complete, pending testnet restart and ready to work hard on multiclient

Currently we have been running the v0.9.4 version of the spec in our Sapphire Testnet. In parallel, there has been an effort to update the repo to the v0.10.1 version of the spec over here. The shift to v0.10.x brings some major changes to how we sign objects with their respective domains, and it aligns the spec with the current IETF BLS signature draft. With this update complete, we can now proceed with our testnet restart and multiclient efforts. Given that the majority of clients are targeting the v0.10.x version of the spec, this is a good base to start off our interop work. With the spec now stable and no major changes planned, all client teams can work from a stable base for multiclient.

Miscellaneous

2 projects built on Prysm at ETHLondon to help eth2 stakers

Our community is always very keen to find ways of improving how validators function in Prysm, whether via improvements to block explorers, enhanced logging/UX for Prysm nodes, or more. At ETHLondon, 2 teams created projects built on Prysm’s tools. The first is a slashing monitor for eth2 called StakeMon, created by Ken Chan: “Staking Monitor monitors the beacon chain for validator pubkeys that have committed double signing (slashable), or are offline. Validators can input their pubkeys into the Staking Monitor interface and get notified immediately via Telegram if their validator pubkey was at risk.”

The other project was a validator coordinator and protector which helps prevent slashable offenses, created by Julian Koh here. Julian built middleware that does the following: “Many node operators run 2 or more instances of a validator. This is for redundancy — if 1 goes offline, the other can still validate blocks and earn rewards. However, this greatly increases the risk of slashing, since the protocol does not allow a validator to publish two different blocks at the same height.

We have built a validator coordinator which, for every block height, picks ONE validator to sign the block, even if the operator is running multiple instances of it. Before signing a block, each validator will ask the coordinator if it is safe to sign or not. If the coordinator has seen a validator already try to sign this same block, it will tell this current validator that it is UNSAFE to sign.”

All of this is super cool and will play a role in helping people run validators in production with real money at stake. We’ll be looking into further elaborating on these projects as we improve the user experience of running a Prysm node.

Interested in Contributing?

We are always looking for devs interested in helping us out. If you know Go or Solidity and want to contribute to the forefront of research on Ethereum, please drop us a line and we’d be more than happy to help onboard you :).

Check out our contributing guidelines and our open projects on Github. Each task and issue is grouped into the Phase 0 milestone along with a specific project it belongs to.

As always, follow us on Twitter or join our Discord server and let us know what you want to help with.

Official, Prysmatic Labs Ether Donation Address

0x9B984D5a03980D8dc0a24506c968465424c81DbE

Official, Prysmatic Labs ENS Name

prysmatic.eth


Eth 2.0 Dev Update #45 — “Cross-Compiles & Slashing Protection” was originally published in Prysmatic Labs on Medium, where people are continuing the conversation by highlighting and responding to this story.