The Modern Server-Side Stack: Golang/Protobuf/gRPC

2019-05-08 11:28

EAWorld


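Translator's notes:

Concurrency vs. parallelism: concurrency is virtual parallelism, for example running several tasks on a single-core CPU through time slicing so that each user "believes" it has the CPU to itself. Parallelism means several tasks really do run at the same instant, which usually implies multiple CPU cores.

Queue vs. double-ended queue: a queue is first in, first out, with items added at one end and removed from the other. A double-ended queue (deque) allows items to be added and removed at both ends.

Blocking vs. non-blocking: these terms describe what a program does while it waits for a result. A blocking call suspends the caller, which does nothing until the result arrives; a non-blocking call lets the caller carry on with other work in the meantime.

Cooperative vs. preemptive: if the scheduler can interrupt a running task, for example to run a higher-priority one, it is preemptive. If tasks keep running until they yield voluntarily, it is cooperative.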

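Server-side programming has a crop of new faces, all with a Google pedigree. Golang attracted a great deal of attention soon after Google began using it in its production systems. With the rise of microservice architectures, people have also turned to modern data-communication solutions such as gRPC and Protobuf. This article gives a brief introduction to each of them.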

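1. Golang

Golang, also known as Go, is an open-source, general-purpose programming language developed by Google, and it keeps growing in popularity for a variety of reasons. Go is about 10 years old and, according to Google, has been used in production for nearly 7 years, which may surprise most people.

Go is designed to be simple, modern, easy to understand, and quick to pick up. Its creators designed it so that an average programmer can gain a working knowledge of the language over a weekend, and I can attest from experience that they succeeded. Those creators were also among the experts involved in the original draft of the C language, so Go comes with a solid pedigree.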

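Fair enough, but why do we need yet another programming language?

In most cases, we don't. Go does not solve any new problem that other languages or tools cannot. What it does address is a specific set of problems that people run into when they care about efficiency, elegance, and intuitiveness. Its main selling points are:

First-class concurrency support

A very simple core and an elegant, modern language

High performance

Native tooling for modern software development

I will briefly explain how Go delivers each of these. The official Go website covers more features and details.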

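First-class concurrency support

Concurrency is one of the main concerns of most server-side applications and, given the nature of modern microprocessors, it has become a primary concern of programming languages too. Go introduces the concept of the "goroutine". Think of a goroutine as a "lightweight user-space thread" (in reality it is far more complicated than that, since many goroutines are multiplexed onto the same thread, but this gives you the general idea). "Lightweight" here means that, because goroutines start with a very small stack, you can run millions of them at the same time; this is in fact the approach Go itself recommends. Any function or method can be run as a goroutine. For example, "go myAsyncTask()" starts the function "myAsyncTask" in a new goroutine. Here is an example: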

// performAsyncTasks performs the given tasks concurrently by spawning a goroutine
// for each of those tasks.
func performAsyncTasks(tasks []Task) {
	for _, task := range tasks {
		// This will spawn a separate goroutine to carry out this task.
		// This call is non-blocking.
		go task.Execute()
	}
}


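See how simple that was? Go is a simple language, so it is no surprise that it solves the problem this way. You can spawn a goroutine for every independent asynchronous task without giving it much thought. If the processor has multiple cores, the Go runtime automatically spreads goroutines across them so that they run in parallel. So how do goroutines communicate with each other? The answer is channels.

A "channel" is another Go concept, used for communication between goroutines. Through a channel you can pass any kind of value to another goroutine (a value of a built-in type, a struct, or even another channel). A channel is essentially a blocking queue; it can be bidirectional or restricted to a single direction. Channels also let goroutines coordinate: a goroutine can block on a channel until some condition is met and only then take its next step.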
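As a minimal sketch of goroutines talking over channels (the function name squareAll and the worker count are illustrative choices, not something from the original article), the example below hands integers to a few worker goroutines through one channel and collects their results through another:

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll fans the inputs out to `workers` goroutines over the jobs channel
// and collects the squared values from the results channel.
func squareAll(inputs []int, workers int) []int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Receiving blocks until a job arrives or jobs is closed.
			for n := range jobs {
				results <- n * n
			}
		}()
	}

	// Feed the jobs, then close the channel so the workers' loops end.
	go func() {
		for _, n := range inputs {
			jobs <- n
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []int
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	squares := squareAll([]int{1, 2, 3, 4}, 2)
	sum := 0
	for _, s := range squares {
		sum += s
	}
	// The order of the results is not deterministic, the count and sum are.
	fmt.Println(len(squares), sum) // prints "4 30"
}
```

Both channels are unbuffered here, so every send waits for a matching receive; that blocking behavior is what keeps the goroutines in step without any explicit locks.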

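Together, goroutines and channels give you a great deal of flexibility and simplicity when writing asynchronous or concurrent code. It is easy to build other useful libraries on top of them, such as a goroutine pool. Here is a simple example: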

package executor

import (
	"log"
	"sync/atomic"
)

// Task and Report are assumed to be defined elsewhere in this package (not shown here);
// a Task provides an Execute method that returns a Report.

// The Executor struct is the main executor for tasks.
// 'maxWorkers' represents the maximum number of simultaneous goroutines.
// 'ActiveWorkers' tells the number of active goroutines spawned by the Executor at a given time.
// 'Tasks' is the channel on which the Executor receives the tasks.
// 'Reports' is the channel on which the Executor publishes every task's report.
// 'signals' is the channel that can be used to control the Executor. Right now, only the termination
// signal is supported, which the client requests by sending '1' on this channel.
type Executor struct {
	maxWorkers    int64
	ActiveWorkers int64
	Tasks         chan Task
	Reports       chan Report
	signals       chan int
}

// NewExecutor creates a new Executor.
// 'maxWorkers' tells the maximum number of simultaneous goroutines.
// 'signals' channel can be used to control the Executor.
func NewExecutor(maxWorkers int, signals chan int) *Executor {
	chanSize := 1000
	if maxWorkers > chanSize {
		chanSize = maxWorkers
	}
	executor := Executor{
		maxWorkers: int64(maxWorkers),
		Tasks:      make(chan Task, chanSize),
		Reports:    make(chan Report, chanSize),
		signals:    signals,
	}
	go executor.launch()
	return &executor
}

// launch starts the main loop that polls all the relevant channels and handles the different
// messages.
func (executor *Executor) launch() int {
	reports := make(chan Report, executor.maxWorkers)
	for {
		select {
		case signal := <-executor.signals:
			if executor.handleSignals(signal) == 0 {
				return 0
			}
		case r := <-reports:
			executor.addReport(r)
		default:
			// ActiveWorkers is updated by worker goroutines, so read it atomically.
			if atomic.LoadInt64(&executor.ActiveWorkers) < executor.maxWorkers && len(executor.Tasks) > 0 {
				task := <-executor.Tasks
				atomic.AddInt64(&executor.ActiveWorkers, 1)
				go executor.launchWorker(task, reports)
			}
		}
	}
}

// handleSignals is called whenever anything is received on the 'signals' channel.
// It performs the relevant task according to the received signal (request) and then responds
// with either 0 or 1, indicating whether the request was respected (0) or rejected (1).
// The response is sent back on the same 'signals' channel, so the client has to read it.
func (executor *Executor) handleSignals(signal int) int {
	if signal == 1 {
		log.Println("Received termination request...")
		if executor.Inactive() {
			log.Println("No active workers, exiting...")
			executor.signals <- 0
			return 0
		}
		executor.signals <- 1
		log.Println("Some tasks are still active...")
	}
	return 1
}

// launchWorker is called whenever a new Task is received and the Executor is still allowed to
// spawn more workers.
// Each worker is launched on a new goroutine. It performs the given task and publishes the report on
// the Executor's internal reports channel.
func (executor *Executor) launchWorker(task Task, reports chan<- Report) {
	report := task.Execute()
	if len(reports) < cap(reports) {
		reports <- report
	} else {
		log.Println("Executor's report channel is full...")
	}
	atomic.AddInt64(&executor.ActiveWorkers, -1)
}

// AddTask is used to submit a new task to the Executor in a non-blocking way. The client can submit
// a new task using the Executor's tasks channel directly, but that will block if the tasks channel is
// full.
// Note that this method doesn't add the given task if the tasks channel is full;
// it is up to the client to try again later.
func (executor *Executor) AddTask(task Task) bool {
	if len(executor.Tasks) == cap(executor.Tasks) {
		return false
	}
	executor.Tasks <- task
	return true
}

// addReport is used by the Executor to publish the reports in a non-blocking way. If the client is not
// reading the reports channel, or is slower than the Executor publishing the reports, the Executor's
// reports channel is going to get full. In that case this method will not block and that report will
// not be added.
func (executor *Executor) addReport(report Report) bool {
	if len(executor.Reports) == cap(executor.Reports) {
		return false
	}
	executor.Reports <- report
	return true
}

// Inactive checks if the Executor is idle. This happens when there are no pending tasks, no active
// workers and no reports left to publish.
func (executor *Executor) Inactive() bool {
	return atomic.LoadInt64(&executor.ActiveWorkers) == 0 && len(executor.Tasks) == 0 && len(executor.Reports) == 0
}

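One detail of the executor is worth a note. AddTask and addReport compare len with cap before sending; if several goroutines submit at once, the channel can fill up between the check and the send, and the send would then block after all. A common Go idiom for a send that never blocks is select with a default case. The sketch below uses an int channel in place of Task, and trySend is a hypothetical helper name, not part of the original code:

```go
package main

import "fmt"

// trySend attempts to put v on ch without blocking.
// It reports whether the value was accepted.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		// The channel is full (or nobody is receiving): give up immediately.
		return false
	}
}

func main() {
	ch := make(chan int, 2)
	fmt.Println(trySend(ch, 1), trySend(ch, 2), trySend(ch, 3)) // prints "true true false"
}
```

The check and the send happen as one step here, so there is no window in which another sender can take the last free slot.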

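A very simple core and an elegant, modern language

Unlike most modern languages, Go does not offer many features. In fact, a deliberately restricted feature set is one of its defining traits. Go is not built around one programming paradigm the way Java is, nor does it support multiple paradigms the way Python does. Go is just the bare skeleton of a programming language: apart from the essentials, there is nothing else.

On first look, Go does not seem to follow any particular philosophy or design guideline. Every feature is there to solve one specific problem and goes no further. For example, Go has methods and interfaces but no classes; its compiler produces statically linked binaries, yet the language still has a garbage collector; it has strict static typing but, at the time of writing, no generics; and it has a lightweight runtime but no exceptions.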

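The idea behind this design is that, when expressing an idea, an algorithm, or a piece of logic in code, developers should spend little or no time thinking about "the best way to do this in language X", and that different developers should find each other's code easy to understand. The lack of generics and exceptions does leave Go short of perfect and makes it restrictive in many situations, which is why both were taken into consideration for "Go 2". (Generics were later added in Go 1.18.)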

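High performance

Single-threaded execution speed is not a good yardstick for judging a language, especially one that focuses on concurrency and parallelism. Even so, Go posts impressive benchmark numbers, beaten only by hardcore systems programming languages such as C, C++, and Rust, and it keeps improving. Considering that Go is a garbage-collected language, this is remarkable, and it makes Go fast enough for almost every use case.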

（Image Source： Medium）

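Native tooling for modern software development

Whether developers adopt a new language or tool depends directly on the developer experience, and Go's tooling is a major factor in its adoption. Like the minimal core, the tooling follows the same philosophy: minimal, but sufficient. Everything is done through the go command and its subcommands, entirely on the command line.

Go has no package manager like pip or npm. Yet a single command is all it takes to fetch any community package: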

go get github.com/farkaskid/WebCrawler/executor


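Yes, that works. You can pull packages directly from GitHub or anywhere else. Packages are distributed simply as source files.

What about something like package.json? There is no equivalent for go get. In Go you do not have to list all dependencies in a single file; you can simply write an import statement like the following directly in a source file: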

import "github.com/xlab/pocketsphinx-go/sphinx"


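When you then run go build, the go tool (not the runtime) can fetch the required dependencies for you, as if you had run go get yourself; the exact behavior depends on the Go version and module mode. The complete source looks like this: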

package main

import (
	"bytes"
	"encoding/binary"
	"log"
	"os/exec"

	pulse "github.com/mesilliac/pulse-simple" // pulse-simple
	"github.com/xlab/pocketsphinx-go/sphinx"
)

// buffSize is the size in bytes of one second of captured audio; it is set by createStream.
var buffSize int

// readInt16 decodes one little-endian 16-bit sample from buf.
func readInt16(buf []byte) (val int16) {
	binary.Read(bytes.NewBuffer(buf), binary.LittleEndian, &val)
	return
}

// createStream opens a 16 kHz, mono, 16-bit capture stream.
func createStream() *pulse.Stream {
	ss := pulse.SampleSpec{pulse.SAMPLE_S16LE, 16000, 1}
	buffSize = int(ss.UsecToBytes(1 * 1000000))
	stream, err := pulse.Capture("pulse-simple test", "capture test", &ss)
	if err != nil {
		log.Panicln(err)
	}
	return stream
}

// listen reads audio one buffer at a time and hands the samples to the decoder.
func listen(decoder *sphinx.Decoder) {
	stream := createStream()
	defer stream.Free()
	defer decoder.Destroy()
	buf := make([]byte, buffSize)
	var bits []int16
	log.Println("Listening...")
	for {
		_, err := stream.Read(buf)
		if err != nil {
			log.Panicln(err)
		}
		for i := 0; i < buffSize; i += 2 {
			bits = append(bits, readInt16(buf[i:i+2]))
		}
		process(decoder, bits)
		bits = nil
	}
}

// process runs one utterance through the decoder and acts on a confident hypothesis.
func process(dec *sphinx.Decoder, bits []int16) {
	if !dec.StartUtt() {
		panic("Decoder failed to start Utt")
	}
	dec.ProcessRaw(bits, false, false)
	dec.EndUtt()
	hyp, score := dec.Hypothesis()
	if score > -2500 {
		log.Println("Predicted:", hyp, score)
		handleAction(hyp)
	}
}

// executeCommand runs an external command with its arguments.
func executeCommand(commands ...string) {
	cmd := exec.Command(commands[0], commands[1:]...)
	if err := cmd.Run(); err != nil {
		log.Println("Command failed:", err)
	}
}

// handleAction maps a recognized phrase to a system command.
func handleAction(hyp string) {
	switch hyp {
	case "SLEEP":
		executeCommand("loginctl", "lock-session")
	case "WAKE UP":
		executeCommand("loginctl", "unlock-session")
	case "POWEROFF":
		executeCommand("poweroff")
	}
}

func main() {
	cfg := sphinx.NewConfig(
		sphinx.HMMDirOption("/usr/local/share/pocketsphinx/model/en-us/en-us"),
		sphinx.DictFileOption("6129.dic"),
		sphinx.LMFileOption("6129.lm"),
		sphinx.LogFileOption("commander.log"),
	)
	dec, err := sphinx.NewDecoder(cfg)
	if err != nil {
		panic(err)
	}
	listen(dec)
}


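The code above keeps every dependency declaration together with the source file that uses it.

As you can see, Go is simple and minimal, yet sufficient and elegant. It ships with built-in tooling for everything from unit tests to benchmarks and flame graphs. Of course, just as with its restricted feature set, the tooling has shortcomings. For instance, go get does not support versioning: once a URL is imported in a source file, you are locked to it. But Go is still evolving, and dependency-management tools are emerging.

Go was originally designed to solve problems Google faced with its own products, such as a massive codebase and a pressing need for efficient concurrent applications. It makes it easier to write applications and libraries that take advantage of the multiple cores of modern processors, without developers having to worry about the details. Go is a modern programming language with simplicity at its heart, and it never tries to reach beyond that.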

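2. Protobuf (Protocol Buffers)

Protobuf, or Protocol Buffers, is a binary communication format developed by Google for serializing structured data. A format? Like JSON? Yes. Protobuf is about 10 years old and has been used inside Google for quite some time.

JSON already exists and is widely used, so why do we need Protobuf?

Like Go, Protobuf does not really solve any new problem; it just solves existing problems more efficiently and in a more modern way. Unlike Go, it is not necessarily more elegant than the existing solutions. Its main features are:

Protobuf is a binary format, unlike JSON and XML, which are text-based; as a result it is considerably more space-efficient.

Protobuf provides sophisticated, first-class support for schemas.

Protobuf provides first-class, multi-language support for generating parsing and consumer code.

Protobuf's binary format also brings a speed advantage in transmission.

So is Protobuf really that fast? The short answer is yes. According to Google Developers, Protobuf messages are 3 to 10 times smaller than the equivalent XML and 20 to 100 times faster. The inevitable trade-off is that the binary serialized data is not human-readable.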
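To get a feel for where the size difference comes from, the sketch below hand-encodes a single integer field the way the Protobuf wire format does (a tag made of the field number and wire type, followed by a base-128 varint) and compares it with the JSON equivalent. It deliberately avoids the real Protobuf library; encodeVarintField is a hypothetical helper written for this illustration, and the field number 2 and value 150 are arbitrary:

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// wireVarint is the Protobuf wire type used for int32, int64, bool and enums.
const wireVarint = 0

// encodeVarintField hand-encodes one integer field: a tag
// (field number << 3 | wire type) followed by the value, both as varints.
func encodeVarintField(fieldNumber int, value uint64) []byte {
	buf := make([]byte, 2*binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, uint64(fieldNumber)<<3|wireVarint)
	n += binary.PutUvarint(buf[n:], value)
	return buf[:n]
}

func main() {
	pb := encodeVarintField(2, 150)
	js, err := json.Marshal(map[string]int{"id": 150})
	if err != nil {
		panic(err)
	}
	fmt.Printf("protobuf: % x (%d bytes)\n", pb, len(pb)) // protobuf: 10 96 01 (3 bytes)
	fmt.Printf("json:     %s (%d bytes)\n", js, len(js))   // json:     {"id":150} (10 bytes)
}
```

The binary form carries no field name, only its number, which is one reason both sides need the schema to make sense of the bytes.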

（Image Source： Beating JSON performance with Protobuf）

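Compared with other wire formats, Protobuf takes a more planned approach. You start by writing .proto files, which are similar to schemas but more powerful. In a .proto file you define the structure of your messages: which fields are required, which are optional, the data type of each field, and so on. The Protobuf compiler then generates data-access classes that you can use in your business logic to make data transfer easier.

Looking at a service's .proto file gives you a clear picture of how it communicates and what it exposes. A typical .proto file looks like this: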

syntax = "proto2"; // the 'required' and 'optional' labels below are proto2 syntax

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}


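Fun fact: Jon Skeet, of Stack Overflow fame, is one of the main contributors to the Protobuf project.

3. gRPC

gRPC, as the name suggests, is a full-featured modern RPC framework with built-in support for mechanisms such as load balancing, tracing, health checking, and authentication. Google open-sourced it in 2015, and it has been gaining popularity ever since.

We already have REST, so why bother with RPC?

In the SOA era, WSDL-based SOAP was for a long time the standard solution for communication between systems. Back then, communication contracts were strictly defined, and huge monolithic systems exposed large numbers of interfaces for extension.

With the rise of the browser/server (B/S) model, server and client became decoupled: the two could be developed independently without affecting how the service was called. A client that wanted information about a book would get back from the server a list of related items to browse. The REST paradigm mainly addresses this kind of scenario; it lets servers and clients communicate freely without a strict contract or semantics of their own.

In a sense, services at this point had started to look like monoliths again: for one specific request they would return a pile of unnecessary data just to satisfy the client's need to "browse". But that is not what every scenario calls for, is it?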

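Enter the microservices era

There are many reasons to adopt a microservice architecture. The one cited most often is that monoliths are too hard to scale. When a large system is designed as microservices, each business and technical requirement tends to be implemented as a cooperating component, and those components are the "micro" services.

A microservice does not need to answer a request with all-encompassing information; it only has to perform a specific task for the request and return the response that is needed. Ideally, microservices should behave like a set of functions that can be composed seamlessly.

REST becomes less effective as the communication paradigm for such services. REST APIs do make a service more expressive, but when that expressiveness is neither needed nor intended by the designer, we have to weigh other factors and consider other paradigms.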
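The "services as functions" idea can be sketched with Go's standard net/rpc package. This is not gRPC (which needs generated code and external modules); it only illustrates what calling a remote service like a function looks like. The Arith service, the Args type and the callMultiply helper are made up for this example, and an in-memory pipe stands in for the network:

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// Args is the request message of the illustrative Arith service.
type Args struct{ A, B int }

// Arith is a tiny "microservice" exposing exactly one task.
type Arith struct{}

// Multiply follows the net/rpc method shape: (args, *reply) error.
func (t *Arith) Multiply(args *Args, reply *int) error {
	*reply = args.A * args.B
	return nil
}

// callMultiply starts a server on one end of an in-memory pipe and calls
// the remote method from the other end.
func callMultiply(a, b int) (int, error) {
	server := rpc.NewServer()
	if err := server.Register(new(Arith)); err != nil {
		return 0, err
	}

	clientConn, serverConn := net.Pipe()
	go server.ServeConn(serverConn) // returns when the client hangs up

	client := rpc.NewClient(clientConn)
	defer client.Close()

	var product int
	err := client.Call("Arith.Multiply", &Args{A: a, B: b}, &product)
	return product, err
}

func main() {
	product, err := callMultiply(6, 7)
	fmt.Println(product, err) // prints "42 <nil>"
}
```

The caller sends exactly the arguments the task needs and gets back exactly one result, which is the contrast with a browsable REST resource that the text draws.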

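gRPC tries to improve on traditional HTTP requests in the following ways:

It uses HTTP/2 by default and gets all the benefits of that protocol.

It uses Protobuf for machine-to-machine communication.

It offers dedicated support for streaming calls, thanks to HTTP/2.

It provides pluggable support for all the commonly needed features, such as authentication, tracing, load balancing, and health checking.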

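Of course, since gRPC is an RPC framework, the concepts of service definitions and an interface description language (IDL) are back. They may feel alien to developers from the REST generation, but because gRPC uses Protobuf as its communication format, they are far less clumsy than they used to be.

Protobuf is designed so that it serves both as a communication format and as a protocol specification tool, with no extra work required. A typical gRPC service definition looks like this: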