Author: Kan Qiao
Why Test Storage Infrastructure? How Is It Different?
Storage infrastructure testing is critical because it helps ensure the reliability, availability, and security of our data storage systems. Robust and correct storage infrastructure makes application developers’ lives easier: it allows them to focus on application logic without worrying about the underlying storage layer.
In this article, we delve into storage infrastructure testing, with a specific focus on its application within blockchain data systems. An appendix at the end provides a more comprehensive list of testing types.
Testing Storage Infrastructure for Blockchain Data Systems
What Is a Blockchain Storage System?
A blockchain data storage system is responsible for storing and updating blockchain data in real-time. It then serves the data to clients, including Extract, Transform, Load (ETL) or analytics jobs. Many blockchain infrastructure companies, including Sentio, maintain such systems. In this explanation, we will use Sentio as an example to provide more detailed information on the testing process.
We maintain an in-house system that stores all blockchain data in real time and exposes various APIs to different internal clients. There are multiple data pipelines, storage systems, services, and many layers of abstraction. The obvious question is how to ensure the correctness of a system that is too complicated for any one person to reason about.
Unit Tests, Integration Tests and More
We write and run a lot of unit tests. Beyond standard unit and integration tests, additional tests are employed to detect more complex issues. For example, we use Go's built-in race detector (a thread sanitizer) to identify multi-threading issues. One real-world issue discovered this way:
```
WARNING: DATA RACE
Write at 0x00c009afb8e0 by goroutine 1119:
Previous write at 0x00c009afb8e0 by goroutine 1116:
  external/org_golang_x_sync/errgroup/errgroup.go:75 +0x86

Goroutine 1119 (running) created at:
  driver/indexer/chain_indexer.go:639 +0x364

Goroutine 1116 (finished) created at:
```
You may notice from the report that a variable at line 477 is being read and written by more than one goroutine at the same time without proper synchronization, creating a race condition.
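To illustrate the kind of bug the race detector catches, here is a minimal sketch (not Sentio's actual code) of a shared counter; without the mutex, `go run -race` would flag the concurrent writes:

```go
package main

import (
	"fmt"
	"sync"
)

// Counter guards its shared field with a mutex. Remove the locking and the
// race detector reports concurrent writes, much like the trace above.
type Counter struct {
	mu sync.Mutex
	n  int
}

func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

func main() {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc()
		}()
	}
	wg.Wait()
	fmt.Println(c.Value()) // prints 100; run with `go run -race` to verify no race remains
}
```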
A simulation test (defined in the appendix) for a blockchain storage system is less complex than for traditional systems, because data on the blockchain is deterministic. If the program is deterministic as well, identical inputs always produce identical results. By taking a snapshot of the current output (the golden snapshot) and comparing it against future outputs, we can spot discrepancies and identify bugs.
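A minimal sketch of the golden-snapshot idea; the SHA-256 fingerprint and function names here are illustrative, not Sentio's actual format:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// snapshotHash reduces a deterministic pipeline output to a fingerprint
// that can be stored as the golden snapshot and compared on later runs.
func snapshotHash(rows []string) string {
	h := sha256.New()
	for _, r := range rows {
		h.Write([]byte(r))
		h.Write([]byte{'\n'})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	golden := snapshotHash([]string{"block 1", "block 2"}) // taken on a known-good run
	current := snapshotHash([]string{"block 1", "block 2"})
	if current != golden {
		fmt.Println("regression: output diverged from golden snapshot")
		return
	}
	fmt.Println("output matches golden snapshot")
}
```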
We are also particularly interested in whether the system behaves correctly under abnormal conditions (restarts, network issues, pods being evicted, nodes being shut down, etc.). To address this, we introduced a framework that runs tests in which failures of all dependencies are simulated, ensuring the system is robust and can handle adverse conditions without producing incorrect results.
All test scenarios and configurations are described in a yaml file:

```yaml
- type: intermittent-error
```
The test result aggregates historical results and is compared with the golden snapshot. Here is a list of historical setups.
We also fold performance tests into the same run, which enables us to track the historical performance histogram and detect performance regressions.
To capture the individual performance of each component we depend on, we also record a performance histogram of outbound calls made to key APIs.
Testing For Blockchain Reorg
A blockchain reorg occurs when there are two competing chains: the longer chain becomes the canonical chain, and the shorter one becomes the orphan chain. However, clients may still be using the orphan chain, since they may not learn of the reorg until a new block from the canonical chain is received. To test this scenario, we verify eventual consistency by running a test that follows a node for a day or two. A second test is then run to catch up to the same state as the first. If the system is eventually consistent, the second test should reach the same state as the first.
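The convergence property can be sketched as follows: an indexer that rolls back to the fork point on a reorg will reach the same head as one that never saw the orphan branch. The `chainState` type and block names are hypothetical:

```go
package main

import "fmt"

// chainState tracks the canonical chain as a slice of block hashes.
type chainState struct {
	blocks []string
}

// applyChain handles a reorg by truncating back to the fork point (the
// longest common prefix) before appending the canonical suffix, so orphan
// blocks are rolled back rather than left in place.
func (s *chainState) applyChain(canonical []string) {
	i := 0
	for i < len(s.blocks) && i < len(canonical) && s.blocks[i] == canonical[i] {
		i++
	}
	s.blocks = append(s.blocks[:i], canonical[i:]...)
}

func (s *chainState) head() string { return s.blocks[len(s.blocks)-1] }

func main() {
	a := &chainState{}
	b := &chainState{}
	a.applyChain([]string{"g", "a1", "a2"}) // a followed the branch that becomes orphaned
	b.applyChain([]string{"g", "b1"})
	canonical := []string{"g", "b1", "b2", "b3"}
	a.applyChain(canonical)
	b.applyChain(canonical)
	fmt.Println(a.head() == b.head()) // true: both indexers converged
}
```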
Sometimes we find hotspots in the code. We use the pprof tool to identify the root cause of a hotspot. Here is an example of pprof output indicating that the hotspot is decoding:
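For reference, a CPU profile can be captured programmatically with Go's standard `runtime/pprof` package and then inspected with `go tool pprof`; this is a minimal sketch, with the busy loop standing in for the decoding-heavy workload:

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// profileTo runs work under the CPU profiler and writes the profile to path
// for later inspection with `go tool pprof <binary> <path>`.
func profileTo(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	defer pprof.StopCPUProfile()
	work()
	return nil
}

func main() {
	err := profileTo("cpu.prof", func() {
		// stand-in for the real workload (e.g. block decoding)
		s := 0
		for i := 0; i < 1e7; i++ {
			s += i
		}
		_ = s
	})
	fmt.Println("profile written:", err == nil)
}
```

Long-running services typically expose the same data over HTTP instead, by importing `net/http/pprof` and serving `/debug/pprof` endpoints.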
Sentio has an SDK that allows users to integrate Sentio into their own systems. To maintain backward compatibility, we run a compatibility test that checks each supported SDK version against the latest version of Sentio. Here is an example of a compatibility matrix.
From the figure, you can see a compatibility issue between SDK v1.13 and runtime v1.11. We need to fix that issue before we can release runtime v1.11.
We set up a number of metrics and alerts to monitor the system. Here is an example of how we monitor our chain synchronization speed.
If the chain synchronization speed is too slow, we will get an alert.
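The alerting condition can be sketched as a simple lag check; the function name and threshold are illustrative (in practice this is driven by the metrics pipeline rather than application code):

```go
package main

import "fmt"

// syncLagAlert returns true when the indexer has fallen more than
// maxLag blocks behind the chain head.
func syncLagAlert(chainHead, indexedHeight, maxLag int64) bool {
	return chainHead-indexedHeight > maxLag
}

func main() {
	fmt.Println(syncLagAlert(1_000_050, 1_000_000, 25)) // true: fire the alert
	fmt.Println(syncLagAlert(1_000_010, 1_000_000, 25)) // false: within tolerance
}
```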
At Sentio, we place a strong emphasis on testing to ensure the correctness and performance of our system. We employ a wide range of tests and monitoring to ensure that our system is functioning properly. We hope that this article has provided valuable insights and can assist in building a robust testing system for your blockchain storage system.
A Little About Sentio
Sentio is an observability platform for Web3. Sentio generates metrics, logs, and traces from existing smart contract data through our low-code solution, which can be used to build dashboards, set up alerts, analyze user behavior, create APIs/webhooks, simulate and debug transactions, and more. Sentio supports Ethereum, BSC, Polygon, Solana, and Aptos. Sentio is built by veteran engineers from Google, LinkedIn, Microsoft, and TikTok, and backed by top investors like Lightspeed Venture Partners, Hashkey Capital, and Canonical Crypto.
Appendix: Common Steps for Testing Storage Infrastructure
Early detection of issues is crucial for efficient problem-solving. Unit tests serve as the initial line of defense against bugs and other problems. These tests, written and executed by the developer, examine the smallest units of code, typically functions or methods. While some may argue that unit tests are unnecessary, they are vital for ensuring code quality and integrity.
- Unit tests serve not only as a means of detecting issues, but also as a valuable documentation tool. They document the expected behavior of the code, making it easier for others to understand and utilize. By reading the unit tests, one can gain insight into how the code should behave, providing a clear and concise understanding of its functionality.
- Unit tests provide a safeguard against regression, which occurs when a bug is introduced into the code and causes unexpected behavior. By running unit tests, regressions can be identified and prevented from being incorporated into the code, ensuring that the code behaves as intended.
There are a few more things regarding advanced unit tests:
- Testing against external resources, such as remote procedure calls (RPCs) and databases, is a common practice. To ensure accurate testing, it is important to utilize mocking of these external resources. This allows for testing the code in isolation, without interference from external factors, and provides more reliable and consistent results.
- Testing asynchronous or multithreaded code can present challenges due to the possibility of non-deterministic results. Two common approaches for testing such code include:
- Thread sanitizer: One method for testing asynchronous code is to use a thread sanitizer. This tool can detect issues related to race conditions and provide valuable insights for resolving them.
- Test interceptor: Another obstacle that can arise when testing asynchronous or multithreaded code is non-determinacy in the order of execution. A solution to this problem is the use of a test interceptor, which can intercept the execution of the code and ensure that it runs in the expected order.
- Memory or address issues refer to accessing invalid memory, which can be difficult to detect due to their non-deterministic nature. Utilizing a memory or address sanitizer can aid in identifying such issues.
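The mocking point above can be sketched as follows; `BlockClient`, `mockClient`, and `blocksBehind` are hypothetical names introduced for illustration:

```go
package main

import "fmt"

// BlockClient abstracts the remote dependency so tests can substitute a mock.
type BlockClient interface {
	LatestHeight() (int64, error)
}

// mockClient returns canned data, keeping the unit test hermetic: no network,
// no external state, fully deterministic results.
type mockClient struct{ height int64 }

func (m mockClient) LatestHeight() (int64, error) { return m.height, nil }

// blocksBehind is the logic under test; it never knows whether the client
// behind the interface is real or mocked.
func blocksBehind(c BlockClient, indexed int64) (int64, error) {
	h, err := c.LatestHeight()
	if err != nil {
		return 0, err
	}
	return h - indexed, nil
}

func main() {
	lag, _ := blocksBehind(mockClient{height: 120}, 100)
	fmt.Println(lag) // 20
}
```

The production code would pass a real RPC-backed implementation of `BlockClient`; the test passes the mock, isolating the logic from external factors.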
If writing unit tests proves difficult and complex, that often indicates the code is not well designed; in such cases, it is worth refactoring the code to make it more testable. To prevent the introduction of obvious issues, unit tests are typically configured to run before code is committed. This approach ensures that the code is thoroughly tested and of high quality.
Unit tests examine code in isolation, but integration tests take it a step further by evaluating how the code functions within the context of the entire system. This is achieved by testing the code using real remote procedure calls (RPCs) and real clusters. While the workload generated during integration testing is simpler and easier to debug, it is still far from the real-world scenario. Integration tests are usually scheduled to run after code is committed, or at regular intervals such as hourly. This is because they are more resource-intensive and may not be feasible to run before every commit.
Simulation tests are considered the most challenging type of test to write, and they also require significant ongoing effort from developers to maintain. Application developers may choose not to implement such tests if the system is not complex enough to justify the effort. However, many infrastructure systems, such as Google Spanner and Apple's FoundationDB, have this type of test in place, particularly for storage systems. This is a clear indication of their importance and value.
- On the write path, a randomized workload is generated and the transaction logs along with their commit timestamps are retrieved. This workload is then replayed against the storage system to ensure that the results match.
- On the read path, a randomized (but valid) abstract syntax tree (AST) is generated. Complex query optimizations are bypassed and a simple query execution engine is implemented. This engine prioritizes correctness over performance. The query is then executed against the storage system and the results are compared to ensure consistency and accuracy.
- During the test runs, two operations are performed continuously:
- Fault injection: Faults are deliberately injected into the system to test its resilience. These may include simulating scenarios such as temporary loss of a dependency. The system is then evaluated to ensure that it can continue to function properly even in adverse conditions.
- Upgrade/Rollback: A key aspect of testing infrastructure systems is simulating system upgrades and rollbacks. This helps to ensure that the system’s logic can handle such scenarios and is robust enough to survive rolling upgrades, which are a common occurrence in real-world environments.
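The write-path step above can be sketched as a replay check: replaying the logged operations in commit-timestamp order must yield the same final state regardless of the order the log happens to be stored in. All names here are illustrative:

```go
package main

import (
	"fmt"
	"math/rand"
	"reflect"
	"sort"
)

// op is one logged write with its commit timestamp.
type op struct {
	ts  int64
	key string
	val int
}

// apply replays a log in commit-timestamp order and returns the final state.
func apply(log []op) map[string]int {
	sorted := append([]op(nil), log...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ts < sorted[j].ts })
	state := map[string]int{}
	for _, o := range sorted {
		state[o.key] = o.val // last write (by timestamp) wins
	}
	return state
}

func main() {
	rng := rand.New(rand.NewSource(7))
	var log []op
	for ts := int64(0); ts < 100; ts++ { // randomized workload with unique timestamps
		log = append(log, op{ts: ts, key: fmt.Sprintf("k%d", rng.Intn(5)), val: rng.Int()})
	}
	first := apply(log)
	// Shuffle and replay: the final state must match however the log is ordered.
	rng.Shuffle(len(log), func(i, j int) { log[i], log[j] = log[j], log[i] })
	second := apply(log)
	fmt.Println(reflect.DeepEqual(first, second)) // true
}
```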
It is important to note that real-world systems are significantly more intricate than those described above. However, experience has shown that these testing methods are the most effective at identifying bugs. The complexity of these systems makes them hard for humans to fully understand and difficult to write deterministic tests for. Simulation tests are typically scheduled to run periodically, such as every day, and multiple sets of simulation tests may be used to cover different configurations.
Client libraries often need to support legacy versions for an extended period, such as 18 months. A common approach for testing these libraries is to create a matrix of all supported versions and test all possible combinations. These tests, which are typically integration tests, ensure that the client libraries remain compatible with the server. They should be run periodically, such as every few hours, against automatic builds of the most recent commits.
Load Tests and Profiling
Load tests are utilized to evaluate the performance of a system and detect performance regressions. This is typically achieved by generating a workload and measuring the system’s performance, either on a regular basis or before each release to the production environment. If a performance regression is detected, profiling tools can be used to identify the underlying cause. Two common types of profiling are:
- CPU profiling: Used to identify hotspots in the code. It shows where the program spends most of its CPU cycles.
- Memory profiling: Used to analyze memory usage. It is quite useful for debugging memory leaks or OOM (out-of-memory) issues.
Release testing is a critical step that is carried out before each release to the production environment. It ensures that the system is thoroughly tested and ready to be deployed. This testing process typically includes all the tests mentioned above, and is essential in ensuring that the system is stable and reliable.
The distinction between testing and monitoring becomes less clear at this stage. While it is not strictly testing, monitoring is an important step in ensuring the reliability and stability of the system. Traditional testing methods, such as randomized tests, may not be able to detect all issues, as the workloads used may not be realistic enough. Monitoring involves collecting metrics, logs, traces, and alerts from the production environment and analyzing them to identify potential issues. In our experience, this approach is effective in identifying issues before they impact the users, as it allows for setting up alerts on deeper service objectives.
This step is often overlooked but is crucial for making consistent progress. Once a bug is detected and fixed through production monitoring, a test should be implemented to prevent it from recurring. As issues may be difficult to reproduce, particularly in the case of asynchronous systems, test interceptors can be used to force a specific execution order and make it deterministic. This feedback loop ensures that the system is continuously improving and that bugs are less likely to occur in the future.
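One way to build such a test interceptor is a channel gate that forces the problematic interleaving to happen every time, turning a flaky reproduction into a deterministic regression test. This is a minimal sketch with hypothetical step names:

```go
package main

import (
	"fmt"
	"sync"
)

// runOrdered forces two concurrent steps into a fixed order by gating the
// second on a signal from the first, making an otherwise non-deterministic
// interleaving reproducible in a regression test.
func runOrdered() []string {
	var (
		order []string
		mu    sync.Mutex
		first = make(chan struct{})
		wg    sync.WaitGroup
	)
	record := func(s string) { mu.Lock(); order = append(order, s); mu.Unlock() }

	wg.Add(2)
	go func() { // the step that must run first
		defer wg.Done()
		record("write")
		close(first) // interceptor: signal that the write completed
	}()
	go func() { // the step that must observe the write
		defer wg.Done()
		<-first // interceptor: block until the write completed
		record("read")
	}()
	wg.Wait()
	return order
}

func main() {
	fmt.Println(runOrdered()) // always [write read], on every run
}
```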