namespace lambdai: January 2021

Friday, January 15, 2021

sample application in C++ with lib boringssl

BoringSSL is a fork of OpenSSL that is designed to meet Google's needs. The maintainers provide a branch (master-with-bazel) that ships BUILD files, which turns the library into an external Bazel dependency. However, that still leaves you one step away from a runnable main function.

Below is my scaffold.

WORKSPACE

workspace(name = "boringsslapp")

load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository") 

git_repository(
    name = "boringssl",
    commit = "bdbe37905216bea8dd4d0fdee93f6ee415d3aa15",
    remote = "https://boringssl.googlesource.com/boringssl",
)

source/main.cc

#include <iostream>

#include "external/boringssl/src/include/openssl/bio.h"
#include "external/boringssl/src/include/openssl/err.h"
#include "external/boringssl/src/include/openssl/ssl.h"

int main(int argc, char **argv) {
  // Create a TLS context and one connection object from it.
  SSL_CTX *ctx = SSL_CTX_new(TLS_method());
  SSL *ssl = SSL_new(ctx);
  std::cout << "Hello, boringssl " << ssl << std::endl;
  // Release in reverse order of creation.
  SSL_free(ssl);
  SSL_CTX_free(ctx);
  return 0;
}

source/BUILD

cc_binary(
    name = "main",
    srcs = ["main.cc"],
    deps = [
        "@boringssl//:crypto",
        "@boringssl//:ssl",
    ],
)

$ bazel run //source:main

Hello, boringssl 0x562960bbbbe8

Saturday, January 9, 2021

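My awe of the kernel is gradually fading

After reading all kinds of code over the past few days, userspace and kernel, C++ and Rust, the reverence I have always had for the Linux kernel is gradually changing.

My earlier half-truths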

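Kernel code is indestructible
I still think the quality of kernel code is high, because there are more eyes on it. The code reviewers are experts in their fields, and once code is released it cannot be iterated on as quickly as a service. Having grown up in such a strict environment, maintainers naturally lean toward keeping the code quality high.
Kernel code has no tech debt, and even when it does, the debt is eliminated quickly
Judging from the way list heads and entries are only gradually being distinguished by field name, tech debt is more likely to survive longer in the kernel. Only a veteran maintainer can push through such an invasive change, but by the time someone has that ability, these readability improvements probably no longer have much impact for them.
The kernel's development methodology should be extended to every non-kernel domain
The kconfig language is great, but I doubt other domains warrant such a system. Kernel code has a much broader range of authors, so GitHub may not be the best choice for it, but not every codebase reaches that scale. Meanwhile, could GitHub and other code hosting providers keep improving until they can serve a codebase of the kernel's size? I think it is not impossible. Rather, GitHub itself lacks the motivation, since codebases of that scale are probably rare.
The kernel's algorithms and data structures far surpass userspace
In terms of concurrency, the kernel has a more pressing objective need. After all, many userspace applications have no need to be deployed on machines with a three-digit number of cores. An application that needs a 64-core machine is rare, and the usual deployment puts a frontend in front for load balancing, with multiple replicas of the application behind it. Often, even if the software can scale to 1M connections, operators prefer multiple replicas that each handle xxK connections, to keep deployments controllable and limit their impact. The kernel, however, must support the latest processors, because you cannot run two kernels on one bare-metal machine.
Kernel code is more cutting-edge
In a sense, the kernel has to support the latest hardware from the device driver side, while the application layer relies on more stable and older APIs. Then again, both the kernel and applications are slaves to legacy APIs. Even a new device driver has to compromise with old APIs.
The kernel drives hardware development
It would be more accurate to say that the kernel, on behalf of both userspace and kernelspace, drives hardware development. DPDK shows this: the various kernel-bypass schemes do not abandon the kernel entirely. Instead they have the kernel provide a thinner wrapper, so that userspace has more room to use the hardware APIs. I believe that once this layer gets thinner, more hardware requirements will come from userspace.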

Friday, January 8, 2021

wait_queue api in kernel

For a long time I was under the impression that a `wait_queue` is always tied to a pending (sleeping) task. That impression is wrong.

The main wait_queue APIs are:

1. initialization

DEFINE_WAIT_FUNC(name, function), or init_waitqueue_entry / init_waitqueue_func_entry for an existing entry

2. add an entry to queue

void add_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);

Variations:
poll_wait(...)

3. Helper functions that do the real waiting by calling into the scheduler.


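// CPU0 waker: make the condition true first, then wake the queue

wake_up(&wq_head);


// CPU1 waiter: open-coded loop

for (;;) {
    prepare_to_wait(&wq_head, &wait, state);
    if (condition)  // check after queueing and before sleeping, so a wake-up is not lost
        break;
    schedule();
}
finish_wait(&wq_head, &wait);

// or use the macro, which sleeps until the condition becomes true
wait_event(wq_head, condition);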
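As a userspace analogy only (this is not kernel code), the same "check the condition, sleep, re-check after every wake-up" discipline can be sketched with std::condition_variable. The names ToyWaitQueue and run_wait_demo are made up for this sketch, and the mapping to prepare_to_wait()/schedule() in the comments is a rough one.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical userspace stand-in for a wait queue head; not the kernel type.
class ToyWaitQueue {
 public:
  // Rough analogue of wait_event(): sleep only while the condition is false,
  // and re-check it after every wake-up. The condition is evaluated under the
  // lock, so a wake-up between the check and the sleep cannot be lost.
  template <typename Cond>
  void wait_event(Cond cond) {
    std::unique_lock<std::mutex> lock(mu_);
    while (!cond()) {
      cv_.wait(lock);  // roughly prepare_to_wait() + schedule()
    }
  }

  // Rough analogue of the waker: update shared state under the lock, then notify.
  template <typename Update>
  void wake_up(Update update) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      update();
    }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
};

// Runs one waiter and one waker; returns the value the waiter observed.
int run_wait_demo() {
  ToyWaitQueue wq;
  int value = 0;  // written only inside wake_up(), i.e. under the queue's lock
  int observed = -1;

  std::thread waiter([&] {
    wq.wait_event([&] { return value != 0; });
    observed = value;  // safe: the write happened before the condition was seen
  });
  wq.wake_up([&] { value = 42; });
  waiter.join();

  assert(observed != -1);  // the waiter must have run to completion
  return observed;
}
```

Calling run_wait_demo() returns the value the waker stored, regardless of whether the waker or the waiter gets to run first.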

P.S. I am glad to see the naming move in a less misleading direction. The entry and the head are now named after their roles, although the underlying struct is always list_head.

wait_queue_t -> wait_queue_entry_t

wait_queue_t::task_list -> wait_queue_entry_t::entry

wait_queue_head_t::task_list -> wait_queue_head_t::head
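To come back to the point at the top of this post: a wait queue entry carries a callback, and waking the queue just means walking the list and calling each callback. Whether a callback wakes a sleeping task or does something else is up to the entry. Below is a minimal toy model of that idea in userspace C++. Every name in it (ToyWaitQueueEntry, ToyWaitQueueHead, toy_add_wait_queue, toy_wake_up) is hypothetical and only mirrors the roles of the kernel structures; it is not the kernel implementation.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Toy model only: plays the role of an entry and its wake callback.
struct ToyWaitQueueEntry {
  std::function<void()> func;  // invoked when the queue is woken
};

// Toy model only: the kernel uses an intrusive list_head instead of std::list.
struct ToyWaitQueueHead {
  std::list<ToyWaitQueueEntry*> head;
};

void toy_add_wait_queue(ToyWaitQueueHead& wq_head, ToyWaitQueueEntry& wq_entry) {
  wq_head.head.push_back(&wq_entry);
}

// Walk the list and invoke every callback. Nothing here knows or cares
// whether a callback wakes a task or does something else entirely.
void toy_wake_up(ToyWaitQueueHead& wq_head) {
  for (ToyWaitQueueEntry* entry : wq_head.head) {
    entry->func();
  }
}

// Registers two entries with different callbacks and records what each one did.
std::vector<std::string> run_callback_demo() {
  std::vector<std::string> log;
  ToyWaitQueueHead wq_head;

  // An entry whose callback stands in for "wake the sleeping task".
  ToyWaitQueueEntry sleeper{[&] { log.push_back("wake task"); }};
  // An entry with no sleeping task behind it, as a poll-style user might have.
  ToyWaitQueueEntry poller{[&] { log.push_back("mark fd ready"); }};

  toy_add_wait_queue(wq_head, sleeper);
  toy_add_wait_queue(wq_head, poller);
  toy_wake_up(wq_head);

  assert(log.size() == 2);  // both callbacks ran exactly once
  return log;
}
```

In this model the second entry never sleeps at all, which is the case the original impression ("always linked with a pending task") misses.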

Monday, January 4, 2021

qemu, kvm and rr

tldr

qemu is an emulator. The qemu process owns a huge state machine that simulates a PC. Neither the host OS nor the host CPU running qemu is aware of the guest VM. Guest CPU instructions differ only in the side effects they have on that state machine as they drive it.

From qemu's point of view, privileged instructions have no privilege: they are emulated like any other instruction. Most importantly, qemu provides many emulated hardware devices, such as the e1000 network card and the i440fx chipset.
Qemu requires no special host CPU features.
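The "state machine" view can be illustrated with a toy interpreter. This is a sketch under heavy assumptions: the instruction set, the ToyMachine struct and the device log are all invented here and have nothing to do with how qemu actually decodes or translates guest code. It only shows that an instruction that would be privileged on real hardware is, for an emulator, one more case that updates plain data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy ISA, invented for this sketch.
enum class Op : std::uint8_t { kLoadImm, kAdd, kOut, kHalt };

struct Insn {
  Op op;
  std::uint8_t reg;       // register index (0..3), not bounds-checked in this toy
  std::uint32_t operand;  // immediate for kLoadImm, source register for kAdd
};

// The whole "machine" is plain data owned by the emulator process.
struct ToyMachine {
  std::uint32_t regs[4] = {0, 0, 0, 0};
  std::size_t pc = 0;
  bool halted = false;
  std::vector<std::uint32_t> device_log;  // stands in for an emulated device
};

// Execute one instruction: each opcode is just a different update to the state.
void step(ToyMachine& m, const std::vector<Insn>& program) {
  const Insn& insn = program[m.pc++];
  switch (insn.op) {
    case Op::kLoadImm: m.regs[insn.reg] = insn.operand; break;
    case Op::kAdd:     m.regs[insn.reg] += m.regs[insn.operand]; break;
    // I/O and halt would be privileged on real hardware; here they are
    // ordinary cases with no special status.
    case Op::kOut:     m.device_log.push_back(m.regs[insn.reg]); break;
    case Op::kHalt:    m.halted = true; break;
  }
}

ToyMachine run(const std::vector<Insn>& program) {
  ToyMachine m;
  while (!m.halted && m.pc < program.size()) {
    step(m, program);
  }
  return m;
}

ToyMachine run_emulator_demo() {
  std::vector<Insn> program = {
      {Op::kLoadImm, 0, 2},   // r0 = 2
      {Op::kLoadImm, 1, 40},  // r1 = 40
      {Op::kAdd, 0, 1},       // r0 = r0 + r1
      {Op::kOut, 0, 0},       // write r0 to the emulated device
      {Op::kHalt, 0, 0},
  };
  ToyMachine m = run(program);
  assert(m.halted);  // the program ends with kHalt
  return m;
}
```

Nothing in this loop depends on the host CPU, which matches the claim that pure emulation needs no special host feature.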

kvm is an accelerator for qemu. It uses the CPU's virtualization extensions to run ordinary non-privileged guest instructions directly on the physical CPU and to take control back when the guest executes a privileged instruction (CPU virtualization), provided the host OS has populated the VMCS and set the control registers correctly. The guest OS image can be byte-for-byte identical to the host OS image, yet the CPU executes it differently, because flags in the physical CPU's registers are set to "guest mode" while guest instructions run.

From the point of view of the target process or binary, rr is a delicate debugger. Another way to see it is as a "malicious" OS that fools the target process into believing a requested syscall was executed, while actually returning a recorded value. rr does not live in the kernel for now, so it relies on the target process cooperating, that is, not intentionally reverting the benign temporary changes rr makes to it. Similar to qemu, rr also needs to understand the side effects of each CPU instruction so that it can record them for future replay, even when the instruction is not considered "privileged". An example is an instruction that reads the wall clock: rr needs to intercept it and return the recorded value to the target process.
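The record-then-replay idea for a single nondeterministic input can be sketched in a few lines. This is only an illustration of the principle, not rr's design: ClockTape, target and run_replay_demo are hypothetical names, and the clock is read through an ordinary function call rather than an intercepted instruction or syscall.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <deque>
#include <functional>

// Hypothetical sketch: a tape of recorded clock values.
class ClockTape {
 public:
  // Record mode: call the real source and remember what it returned.
  std::int64_t record(const std::function<std::int64_t()>& real_clock) {
    std::int64_t v = real_clock();
    tape_.push_back(v);
    return v;
  }

  // Replay mode: never touch the real source, hand back the recorded value.
  std::int64_t replay() {
    assert(!tape_.empty() && "replay diverged from the recording");
    std::int64_t v = tape_.front();
    tape_.pop_front();
    return v;
  }

 private:
  std::deque<std::int64_t> tape_;
};

// The "target program": its result depends on two clock reads.
std::int64_t target(const std::function<std::int64_t()>& now) {
  std::int64_t start = now();
  std::int64_t end = now();
  return end - start;
}

// The real, nondeterministic input: wall-clock time in nanoseconds.
std::int64_t real_now_ns() {
  using namespace std::chrono;
  return duration_cast<nanoseconds>(system_clock::now().time_since_epoch()).count();
}

// Runs the target once while recording and once while replaying;
// returns true when both runs computed the same result.
bool run_replay_demo() {
  ClockTape tape;
  std::int64_t recorded = target([&] { return tape.record(real_now_ns); });
  std::int64_t replayed = target([&] { return tape.replay(); });
  return recorded == replayed;
}
```

The replayed run sees exactly the values captured during recording, so the target's behavior is reproduced even though the real clock has moved on.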
