
Dongxu Huang: Building a Database Startup in China

Company

  • Founded four years ago. The first two years were spent writing code; over the last year and a half, 100 to 200 companies have started using the product.
  • Infrastructure software has a significant advantage in China: the market is large and companies adopt new technology quickly and aggressively, so good infrastructure products get put to use rapidly.
  • None of the co-founders have database experience.
  • The open-source model is the future; with closed source, every deal has to be negotiated one customer at a time.
  • The most important business decision: the moat does not come from how advanced the algorithms are or how strong the team is. It comes from 1) the community and 2) the MySQL interface (leveraging the MySQL community is crucial, and SQL support matters, since even Kafka and Spark support SQL).

TiDB Database Principles

The more senior the engineer, the more they tend to favor raw performance, believing that faster is always better. TiDB's primary goal, however, is not to be the fastest but to deliver availability, reliability, stability, I/O, and unlimited scalability. Peak performance comes at too high a cost, because it has to be tuned to each user's hardware, and a general-purpose database cannot be optimized for every scenario. Keep it simple for users and complex for ourselves (contrary to AWS's approach).

They believe eventual consistency is a pseudo-concept; in practice, it means no consistency or weak consistency. With Cassandra, for example, once W, R, and N (write quorum, read quorum, and replica count) are configured, how long does the data take to converge? It is uncertain. This is too complex; ideally, users should not have to worry about such settings.
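To make the complaint concrete, here is a minimal sketch, assuming the WRN settings mentioned above are the Dynamo-style quorum parameters (N replicas, W write acknowledgements, R read acknowledgements). The function names are hypothetical and nothing here calls Cassandra itself. It brute-forces the well-known rule that a read is guaranteed to touch at least one replica holding the latest acknowledged write only when R + W > N; any other setting leaves the convergence time open.

```python
from itertools import combinations


def quorums_always_overlap(n, w, r):
    """Brute force: does every possible write set of size w share at least
    one replica with every possible read set of size r?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def is_strict_quorum(n, w, r):
    """Closed-form version of the same check (pigeonhole argument)."""
    return r + w > n


if __name__ == "__main__":
    for w, r in [(1, 1), (2, 2), (3, 1)]:
        print(f"N=3 W={w} R={r} overlap={quorums_always_overlap(3, w, r)}")
```

With N=3, the common W=1, R=1 setting fails the overlap check, which is exactly the kind of tuning decision the speaker would rather not push onto users.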

Benchmark scores are not the only metric; high TPC-C or TPC-H scores offer little practical guidance. Databases are refined through use. For instance, the first customer, a gaming company, created an astonishing 30,000 tables, and the JSON metadata made connecting very slow.

The architecture is not P2P; different roles are clearly defined.

The KV layer uses RocksDB. LSM trees typically have a write amplification of about 15x; here this has been addressed by extracting values out of the LSM tree.
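The idea of extracting values can be sketched as follows. This is an illustration of generic key-value separation under stated assumptions, not TiKV's or RocksDB's actual implementation: the class names, the 32-byte threshold, and the 16-byte pointer size are all hypothetical, and a plain dict stands in for the LSM tree. Large values go to an append-only log and the tree keeps only a small pointer, so compaction has far fewer bytes to rewrite.

```python
POINTER_SIZE = 16  # assumed on-disk size of an (offset, length) pointer


class ValueLog:
    """Append-only log that stores large values outside the LSM tree."""

    def __init__(self):
        self._buf = bytearray()

    def append(self, value):
        offset = len(self._buf)
        self._buf += value
        return (offset, len(value))

    def read(self, pointer):
        offset, length = pointer
        return bytes(self._buf[offset:offset + length])


class SeparatedStore:
    """A dict stands in for the LSM tree; only keys, small values, and
    pointers live in it."""

    def __init__(self, threshold=32):
        self.index = {}
        self.vlog = ValueLog()
        self.threshold = threshold

    def put(self, key, value):
        if len(value) >= self.threshold:
            self.index[key] = ("ptr", self.vlog.append(value))
        else:
            self.index[key] = ("inline", value)

    def get(self, key):
        kind, payload = self.index[key]
        return self.vlog.read(payload) if kind == "ptr" else payload

    def index_bytes(self):
        """Bytes the tree itself would have to rewrite during compaction."""
        total = 0
        for key, (kind, payload) in self.index.items():
            total += len(key) + (POINTER_SIZE if kind == "ptr" else len(payload))
        return total
```

Storing a 1,000-byte value under a 2-byte key leaves only 18 bytes in the index in this model, which is where the reduction in rewritten data comes from. The trade-off is an extra lookup on reads and the need to garbage-collect the value log.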

Supports MySQL clients and also supports reading from SparkSQL.

The SQL layer does not use MySQL modules. There was an initial attempt, but 1) the MySQL code was hard to make distributed and 2) the code quality was too poor to modify easily. Heavily modifying it might have taken six months, but the long-term maintenance cost would have been higher. Rewriting is not a burden; they have already refactored three times. Go makes refactoring more convenient than C, C++, or Rust.

Initially, they only wanted to build the F1-style SQL layer and collaborate with CockroachDB. Later, as CockroachDB itself moved toward SQL, they had to build the storage layer as well.

They hired two members from the Rust core team. Rust developers are hard to recruit; typically, they hire C++ developers and move them to Rust, and these developers find that many conventions they used to follow by hand are now enforced by the compiler.

Do not underestimate the difficulty of building industrial-grade components. Relying on gRPC, Raft, and RocksDB means that users benefit directly from new developments in the industry.

Chunk (region) splitting took two months, while merging took three years. Merging has undergone formal verification.
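For readers unfamiliar with the terms, a minimal sketch of what splitting and merging mean at the data-structure level, assuming a region is a half-open key range [start, end). The `Region`, `split`, and `merge` names are hypothetical. The sketch deliberately ignores replication, in-flight requests, and membership changes, which is where the real difficulty of merging lies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: bytes  # inclusive lower bound
    end: bytes    # exclusive upper bound


def split(region, key):
    """Cut one region into two at `key`; needs only local information."""
    if not (region.start < key < region.end):
        raise ValueError("split key must fall strictly inside the region")
    return Region(region.start, key), Region(key, region.end)


def merge(left, right):
    """Join two regions; they must be adjacent, so two parties must agree."""
    if left.end != right.start:
        raise ValueError("only adjacent regions can be merged")
    return Region(left.start, right.end)
```

Even in this toy form, the asymmetry shows: a split involves a single region, while a merge has to coordinate two regions that may be changing at the same time.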

Why now?

  1. Hardware
  2. Hot/cold data -> warm data
  3. Log is the new database

Everything is pluggable. The top-level API remains unchanged, while the underlying components are plug-and-play and replaceable.
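A minimal sketch of what "pluggable" can look like in code, assuming a small storage interface that the top-level API depends on. All class and method names here are hypothetical and are not TiDB's real interfaces; the point is only that `Database` never changes when the engine underneath is swapped.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """The contract every replaceable storage component must satisfy."""

    @abstractmethod
    def put(self, key, value):
        ...

    @abstractmethod
    def get(self, key):
        ...


class DictEngine(StorageEngine):
    """Hash-map engine: overwrites in place."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class LogEngine(StorageEngine):
    """Append-only engine: reads scan backwards for the newest entry."""

    def __init__(self):
        self._log = []

    def put(self, key, value):
        self._log.append((key, value))

    def get(self, key):
        for k, v in reversed(self._log):
            if k == key:
                return v
        return None


class Database:
    """Top-level API; identical no matter which engine is plugged in."""

    def __init__(self, engine):
        self._engine = engine

    def set(self, key, value):
        self._engine.put(key, value)

    def lookup(self, key):
        return self._engine.get(key)
```

Callers write `Database(DictEngine())` or `Database(LogEngine())` and use the same `set` and `lookup` calls either way.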

Distributed Transactions

  • 2PC is the only option
  • Challenges: reduce round-trips
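The two points above can be sketched with a textbook two-phase commit. This is a simplified in-memory illustration, not TiDB's actual transaction protocol: the `Participant` class and `two_phase_commit` function are hypothetical, and failures, timeouts, and coordinator recovery are left out. Each method call on a participant stands for one network round-trip, so the `calls` counter shows the cost the speaker wants to reduce.

```python
class Participant:
    """One node holding part of the data; each method call models one RPC."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.staged = None
        self.data = {}
        self.calls = 0

    def prepare(self, writes):
        self.calls += 1
        if not self.can_commit:
            return False
        self.staged = dict(writes)  # hold the writes without applying them
        return True

    def commit(self):
        self.calls += 1
        self.data.update(self.staged)
        self.staged = None

    def abort(self):
        self.calls += 1
        self.staged = None


def two_phase_commit(participants, writes):
    """Return True if all participants committed, False if the txn aborted."""
    prepared = []
    # Phase 1: ask every participant to prepare (vote).
    for p in participants:
        if p.prepare(writes):
            prepared.append(p)
        else:
            # Any "no" vote aborts everyone who already prepared.
            for q in prepared:
                q.abort()
            return False
    # Phase 2: all voted yes, so tell everyone to commit.
    for p in participants:
        p.commit()
    return True
```

In the success path every participant is contacted twice, once per phase, which is why cutting round-trips is the main optimization target.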

Multi-tenancy achieved through Kubernetes

China Biz Trend

  1. Chinese speed; once it's said, it must be done.
  2. Higher expectations for new technologies to empower businesses. Companies in second and third-tier cities or non-BAT firms are using new technologies to compete with giants. They must be able to alleviate users' technical anxieties.
  3. The talent pool for foundational software is gradually strengthening. PingCAP's output has contributed significantly through technical content marketing.
  4. Some core scenarios (core banking systems) dare to use domestic technologies.
  5. PingCAP's path: open source (internet/community) <-> commercialization.