Golang Image

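Preface

This article summarizes how to use Go's image package and explores how it is implemented. It has two parts:

1. The first part introduces basic usage of the image package.
2. The second part examines the underlying mechanisms of the image package, drawing on its source code and on background knowledge about image files.

Basic usage of the image package

The official Go documentation (https://golang.org/pkg/image/) describes the image package as follows: Package image implements a basic 2-D image library.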

The fundamental interface is called Image. An Image contains colors, which are described in the image/color package.

Usage example

The package documentation opens with: Values of the Image interface are created either by calling functions such as NewRGBA and NewPaletted, or by calling Decode on an io.Reader containing image data in a format such as GIF, JPEG or PNG. Decoding any particular image format requires the prior registration of a decoder function. Registration is typically automatic as a side effect of initializing that format's package so that, to decode a PNG image, it suffices to have

import _ "image/png"

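In other words, a value of the Image interface can be obtained either by calling NewRGBA or NewPaletted, or by reading a GIF, JPEG, or PNG file through an io.Reader and passing it to Decode (the corresponding format package must be imported so that its decoder is registered).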

Image decoding

The file decode_example_test.go in the image package shows how decoding is used:

func Example_decodeConfig() {

reader := base64.NewDecoder(base64.StdEncoding, strings.NewReader(data))

config, format, err := image.DecodeConfig(reader)

if err != nil {

log.Fatal(err)

}

fmt.Println("Width:", config.Width, "Height:", config.Height, "Format:", format)

}

func Example() {

// Decode the JPEG data. If reading from file, create a reader with

//

// reader, err := os.Open("testdata/video-001.q50.420.jpeg")

// if err != nil {

// log.Fatal(err)

// }

// defer reader.Close()

reader := base64.NewDecoder(base64.StdEncoding, strings.NewReader(data))

m, _, err := image.Decode(reader)

if err != nil {

log.Fatal(err)

}

bounds := m.Bounds()

// Calculate a 16-bin histogram for m's red, green, blue and alpha components.

//

// An image's bounds do not necessarily start at (0, 0), so the two loops start

// at bounds.Min.Y and bounds.Min.X. Looping over Y first and X second is more

// likely to result in better memory access patterns than X first and Y second.

var histogram [16][4]int

for y := bounds.Min.Y; y < bounds.Max.Y; y++ {

for x := bounds.Min.X; x < bounds.Max.X; x++ {

r, g, b, a := m.At(x, y).RGBA()

// A color's RGBA method returns values in the range [0, 65535].

// Shifting by 12 reduces this to the range [0, 15].

histogram[r>>12][0]++

histogram[g>>12][1]++

histogram[b>>12][2]++

histogram[a>>12][3]++

}

}

// Print the results.

fmt.Printf("%-14s %6s %6s %6s %6s\n", "bin", "red", "green", "blue", "alpha")

for i, x := range histogram {

fmt.Printf("0x%04x-0x%04x: %6d %6d %6d %6d\n", i<<12, (i+1)<<12-1, x[0], x[1], x[2], x[3])

}

// Output:

// bin red green blue alpha

// 0x0000-0x0fff: 364 790 7242 0

// 0x1000-0x1fff: 645 2967 1039 0

// 0x2000-0x2fff: 1072 2299 979 0

// 0x3000-0x3fff: 820 2266 980 0

// 0x4000-0x4fff: 537 1305 541 0

// 0x5000-0x5fff: 319 962 261 0

// 0x6000-0x6fff: 322 375 177 0

// 0x7000-0x7fff: 601 279 214 0

// 0x8000-0x8fff: 3478 227 273 0

// 0x9000-0x9fff: 2260 234 329 0

// 0xa000-0xafff: 921 282 373 0

// 0xb000-0xbfff: 321 335 397 0

// 0xc000-0xcfff: 229 388 298 0

// 0xd000-0xdfff: 260 414 277 0

// 0xe000-0xefff: 516 428 298 0

// 0xf000-0xffff: 2785 1899 1772 15450

}

const data = `

/9j/4AAQSkZJRgABAQIAHAAcAAD/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdA

SFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2Nj

Y2NjY2NjY2Nj ....`

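This code contains two functions, Example_decodeConfig and Example.

Example_decodeConfig calls image.DecodeConfig to obtain the image's width, height, and format. Its reader argument comes from base64.NewDecoder(base64.StdEncoding, strings.NewReader(data)): the constant data at the end of the listing is wrapped in an io.Reader by strings.NewReader, and base64.NewDecoder wraps that in a second io.Reader that yields the decoded binary bytes. Base64 is an encoding that turns a byte stream into a stream of printable characters; for details see http://www.sunshine2k.de/articles/coding/base64/understanding_base64.html.

Example obtains an io.Reader in the same way and decodes it with image.Decode, which returns a value of the image.Image interface. It then takes the image's full rectangle from Bounds, visits every pixel, collects the RGBA color values (described in detail later), and prints a 16-bin histogram.

The Image interface

The Image interface is defined in image/image.go: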

// Image is a finite rectangular grid of color.Color values taken from a color

// model.

type Image interface {

// ColorModel returns the Image's color model.

ColorModel() color.Model

// Bounds returns the domain for which At can return non-zero color.

// The bounds do not necessarily contain the point (0, 0).

Bounds() Rectangle

// At returns the color of the pixel at (x, y).

// At(Bounds().Min.X, Bounds().Min.Y) returns the upper-left pixel of the grid.

// At(Bounds().Max.X-1, Bounds().Max.Y-1) returns the lower-right one.

At(x, y int) color.Color

}

The interface declares three methods: ColorModel, Bounds, and At.

ColorModel returns a color.Model, which is itself an interface. It is defined in the color package, in image/color/color.go:

// Model can convert any Color to one from its own color model. The conversion

// may be lossy.

type Model interface {

Convert(c Color) Color

}

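Bounds returns a rectangle. In the image package this is a Rectangle (defined in image/geom.go), determined by two Points: the top-left corner Min and the bottom-right corner Max.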

// A Rectangle contains the points with Min.X <= X < Max.X, Min.Y <= Y < Max.Y.

// It is well-formed if Min.X <= Max.X and likewise for Y. Points are always

// well-formed. A rectangle's methods always return well-formed outputs for

// well-formed inputs.

//

// A Rectangle is also an Image whose bounds are the rectangle itself. At

// returns color.Opaque for points in the rectangle and color.Transparent

// otherwise.

type Rectangle struct {

Min, Max Point

}

// A Point is an X, Y coordinate pair. The axes increase right and down.

type Point struct {

X, Y int

}

At takes a coordinate and returns a value of the color.Color interface:

// Color can convert itself to alpha-premultiplied 16-bits per channel RGBA.

// The conversion may be lossy.

type Color interface {

// RGBA returns the alpha-premultiplied red, green, blue and alpha values

// for the color. Each value ranges within [0, 0xffff], but is represented

// by a uint32 so that multiplying by a blend factor up to 0xffff will not

// overflow.

//

// An alpha-premultiplied color component c has been scaled by alpha (a),

// so has valid values 0 <= c <= a.

RGBA() (r, g, b, a uint32)

}

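Color has a single method, RGBA. The meaning of R, G, and B is obvious; the alpha value is the opacity of the pixel.

RGB is one common color model; the common ones are RGB, CMYK, and YUV.

Color models

RGB

The RGB color space (or RGB color system) builds colors from combinations of red (R), green (G), and blue (B), one channel for each. It covers nearly all the colors human vision can perceive and is one of the most widely used color systems.

Red, green, and blue are also called the three primary colors. RGB is an additive color system: each light source emits only a specific band of wavelengths and is therefore colored light itself, and adding lights of different colors together produces new colors. A display is the typical example. https://www.zhihu.com/question/25074051?sort=created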


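Every color on a computer screen is a mixture of red, green, and blue light in different proportions. One group of red, green, and blue is the smallest display unit, and any color on the screen can be recorded and expressed as one set of RGB values. In a computer, the "amount" of R, G, or B means its intensity, expressed as an integer. Typically each of R, G, and B has 256 intensity levels, numbered 0, 1, 2, ... up to 255. The highest number is 255, but 0 is also a level, which gives 256 levels in total.

RGBA stands for Red, Green, Blue, and Alpha. Although it is sometimes described as a color space, it is really just the RGB model with one extra channel of information. The colors themselves are RGB and may belong to any RGB color space.

The alpha channel is generally used as an opacity parameter. A pixel whose alpha is 0% is fully transparent (invisible), while 100% means a fully opaque pixel (a traditional digital image). Values between 0% and 100% let the background show through the pixel, as if through glass (translucency), an effect that simple binary transparency (transparent or opaque) cannot achieve. This makes image compositing easy. An alpha value can be expressed as a percentage, as an integer, or, like the RGB parameters, as a real number from 0 to 1.

Go represents each channel with a uint32, although the value range is [0, 0xffff] and so only 16 bits are used. The remaining 16 bits guarantee that multiplying a channel by a blend factor of up to 0xffff does not overflow.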

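*Alpha and premultiplied alpha

Conceptually, alpha blending conveys how transparent a surface is. Consumer applications (games) typically use RGB to convey the color of the underlying surface and rely on the alpha channel to indicate how opaque that color is.

Specifically, when alpha blending is enabled in the pipeline, the alpha channel takes effect at render time. If a color C_s with opacity a_s is rendered onto a color C_d, developers typically compute the blended color C_o as:

C_o = a_s C_s + (1 − a_s) C_d

In the old fixed-function pipeline this was called "SourceAlpha, InvSourceAlpha"; it is also known as "post-multiplied alpha".

For example, take two adjacent pixels (8 bits per channel), red (255, 0, 0, 153) and opaque white (255, 255, 255, 255), and render the red one onto the white one. The red pixel's opacity is 153/255 = 0.6, so:

C_o = (255, 0, 0) * 0.6 + (255, 255, 255) * (1 − 0.6) = (255, 102, 102)

However, this form of alpha blending has a serious flaw: in many cases it produces wrong colors! (For a detailed explanation see https://developer.nvidia.com/content/alpha-blending-pre-or-not-pre and https://www.cnblogs.com/dins/p/premultiplied-alpha.html)

With premultiplied alpha, the RGB channels are stored already multiplied by the opacity, that is, as (r * a, g * a, b * a, a). C_s has therefore already been scaled by its alpha (written C_s'), and the red at 60% opacity above is stored as (255 * 0.6, 0, 0, 153) = (153, 0, 0, 153). The blending formula becomes:

C_o = C_s' + (1 − a_s) C_d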
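The two formulas can be checked against the red-on-white example with a few lines of integer arithmetic. This is an illustrative sketch only: the type rgb and the three functions are made up for this example and are not part of the image package.

```go
package main

import "fmt"

// rgb is a hypothetical 8-bit color triple used only for this illustration.
type rgb struct{ R, G, B int }

// blendStraight applies Co = as*Cs + (1-as)*Cd, with alpha given in [0, 255].
func blendStraight(src rgb, alpha int, dst rgb) rgb {
	mix := func(s, d int) int { return (s*alpha + d*(255-alpha)) / 255 }
	return rgb{mix(src.R, dst.R), mix(src.G, dst.G), mix(src.B, dst.B)}
}

// premultiply scales each color channel by alpha once, ahead of time.
func premultiply(c rgb, alpha int) rgb {
	return rgb{c.R * alpha / 255, c.G * alpha / 255, c.B * alpha / 255}
}

// blendPremultiplied applies Co = Cs' + (1-as)*Cd, where src is already
// premultiplied by alpha.
func blendPremultiplied(src rgb, alpha int, dst rgb) rgb {
	mix := func(s, d int) int { return s + d*(255-alpha)/255 }
	return rgb{mix(src.R, dst.R), mix(src.G, dst.G), mix(src.B, dst.B)}
}

func main() {
	red, white := rgb{255, 0, 0}, rgb{255, 255, 255}
	alpha := 153 // 153/255 = 0.6

	fmt.Println(blendStraight(red, alpha, white)) // {255 102 102}

	pre := premultiply(red, alpha)
	fmt.Println(pre)                                   // {153 0 0}
	fmt.Println(blendPremultiplied(pre, alpha, white)) // {255 102 102}
}
```

Both forms give the same result for a single blend; the premultiplied form simply moves one multiplication out of the blend step.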

CMYK

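CMYK (the four-color printing model) is the color-separation model used in color printing. It relies on mixing the three primary colors of pigment, plus black ink, four colors in total, which are overlaid to produce so-called "full-color printing". The four standard colors are C: Cyan, also called "sky blue" or "azure"; M: Magenta; Y: Yellow; K: blacK. CMYK is a subtractive color model.

Why black is added

Cyan, magenta, and yellow are the three primary colors of printing, and in theory the three can be mixed into black. In practice, because of manufacturing limits, ink purity is often unsatisfactory and the mixed black is not deep enough, so a purified black ink has to be used. Black ink also saves ink: without it, the black areas of a picture would be printed by mixing equal amounts of C, M, and Y, whereas with it they can be printed in black directly.

YUV

YUV is a color encoding scheme commonly used across video-processing components. When encoding photos or video, YUV takes human perception into account and allows the bandwidth of the chrominance to be reduced.

YUV is a family of color spaces used to encode true color. The terms Y'UV, YUV, YCbCr, and YPbPr may all be referred to as YUV and overlap with one another. "Y" is the luminance (Luminance or Luma), that is, the grayscale value. "U" and "V" are the chrominance (Chrominance or Chroma), which describe hue and saturation and so specify the color of a pixel.

The scopes of Y'UV, YUV, YCbCr, and YPbPr are often confused or overlap. Historically, YUV and Y'UV were used to encode analog television signals, while YCbCr describes digital video signals and is suited to compressing and transmitting video and images, for example in MPEG and JPEG. Today, however, the term YUV is widely used on computer systems as well.

The image package has a file ycbcr.go, which defines YCbCr as a struct: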

type YCbCr struct {

Y, Cb, Cr []uint8

YStride int

CStride int

SubsampleRatio YCbCrSubsampleRatio

Rect Rectangle

}

// YCbCr is an in-memory image of Y'CbCr colors. There is one Y sample per

// pixel, but each Cb and Cr sample can span one or more pixels.

// YStride is the Y slice index delta between vertically adjacent pixels.

// CStride is the Cb and Cr slice index delta between vertically adjacent pixels

// that map to separate chroma samples.

// It is not an absolute requirement, but YStride and len(Y) are typically

// multiples of 8, and:

// For 4:4:4, CStride == YStride/1 && len(Cb) == len(Cr) == len(Y)/1.

// For 4:2:2, CStride == YStride/2 && len(Cb) == len(Cr) == len(Y)/2.

// For 4:2:0, CStride == YStride/2 && len(Cb) == len(Cr) == len(Y)/4.

// For 4:4:0, CStride == YStride/1 && len(Cb) == len(Cr) == len(Y)/2.

// For 4:1:1, CStride == YStride/4 && len(Cb) == len(Cr) == len(Y)/4.

// For 4:1:0, CStride == YStride/4 && len(Cb) == len(Cr) == len(Y)/8.

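The comment below the struct lists six chroma subsampling schemes: 4:4:4, 4:2:2, 4:2:0, 4:4:0, 4:1:1, and 4:1:0. They are easier to understand alongside the figure below. https://zhuanlan.zhihu.com/p/85620611 http://dougkerr.net/Pumpkin/articles/Subsampling.pdf

[Figure: chroma subsampling layouts (left column) and J:a:b notation (right column)]

Luma (Y) is sampled for every pixel, whereas Cb and Cr are sampled once per one or more pixels.

In the left column of the figure, each small rectangle is an image pixel, each black-bordered rectangle is a chroma pixel, and each small black dot is a chroma sample value (Cb + Cr). Together they show the horizontal and vertical ratio between image pixels and chroma pixels. For example:

4:4:0: horizontal ratio 1/1, vertical ratio 1/2; one chroma pixel covers two image pixels.

4:2:2: horizontal ratio 1/2, vertical ratio 1/1; one chroma pixel covers two image pixels.

4:2:0: horizontal ratio 1/2, vertical ratio 1/2; one chroma pixel covers four image pixels.

The right column shows the subsampling notation, the J:a:b scheme. Solid black circles are pixels that carry a chroma sample (Cb + Cr); hollow circles are pixels that do not. J:a:b is defined around a reference block, a rectangle J pixels wide and 2 pixels high, where J is usually 4, so the reference block is 4 pixels wide and 2 pixels high. a is the number of chroma samples in the first row of the reference block, and b is the number of chroma samples in the second row.

4:4:0: the first row of the reference block has four chroma samples; the second row has none.

4:2:2: the first row of the reference block has two chroma samples and the second row also has two; the samples alternate with unsampled pixels.

4:2:0: the first row of the reference block has two chroma samples; the second row has none.

So the essence of pixel formats such as yuv444, yuv422, and yuv420 is this: every image pixel carries its own luma value, but several image pixels share one chroma value, and the sharing ratio is defined by the 4 x 2 reference block. With that, formats such as yuv440 and yuv420 are easy to understand.

The YCbCr struct records the subsampling scheme in SubsampleRatio, an enumeration value of type YCbCrSubsampleRatio. The two int fields YStride and CStride go with it; how they relate to each scheme is given in the comment.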

Image formats

Here is a program that uses the image package to read the width and height of every JPG image in the current folder:

package main

import (
	"fmt"
	"image"
	_ "image/gif"  // register the GIF decoder
	_ "image/jpeg" // register the JPEG decoder
	_ "image/png"  // register the PNG decoder
	"io/ioutil"
	"log"
	"os"
	"strings"
)

func main() {
	// Get the path of the current folder.
	dir, err := os.Getwd()
	if err != nil {
		log.Fatal(err)
	}

	// Read all files and folders in the current folder.
	files, err := ioutil.ReadDir(dir)
	if err != nil {
		log.Fatal(err)
	}

	// Filter: keep only the JPG images.
	var imgs []os.FileInfo
	for _, f := range files {
		if !f.IsDir() && strings.HasSuffix(f.Name(), ".jpg") {
			imgs = append(imgs, f)
		}
	}
	fmt.Printf("Found %d images\n", len(imgs))

	for _, img := range imgs {
		imgName := img.Name()
		imgRes, err := os.Open(imgName)
		if err != nil {
			fmt.Println(err)
			continue
		}
		// Decode only the header; the pixel data is not needed.
		cfg, _, err := image.DecodeConfig(imgRes)
		imgRes.Close()
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("Image [%s] width: %d px, height: %d px\n", imgName, cfg.Width, cfg.Height)
	}

	fmt.Println("Enter any text to close the window")
	var a string
	fmt.Scan(&a)
}

image.DecodeConfig returns an image.Config struct, defined as:

// Config holds an image's color model and dimensions.

type Config struct {

ColorModel color.Model

Width, Height int

}

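Besides the image package, which contains DecodeConfig, the import list also brings in image/jpeg, image/gif, and image/png. Without these three imports, images in the corresponding formats cannot be decoded; you can verify this with an experiment.

An import of the form import _ "image/jpeg" exists only for its side effect: it runs the package's initialization.

Each of image/jpeg, image/png, and image/gif defines an init function in its reader.go file that performs this initialization.

The init function of the image/jpeg package is implemented as: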

func init() {

image.RegisterFormat("jpeg", "\xff\xd8", Decode, DecodeConfig)

}

image.RegisterFormat is defined in the image package:

// RegisterFormat registers an image format for use by Decode.

// Name is the name of the format, like "jpeg" or "png".

// Magic is the magic prefix that identifies the format's encoding. The magic

// string can contain "?" wildcards that each match any one byte.

// Decode is the function that decodes the encoded image.

// DecodeConfig is the function that decodes just its configuration.

func RegisterFormat(name, magic string, decode func(io.Reader) (Image, error), decodeConfig func(io.Reader) (Config, error)) {

formatsMu.Lock()

formats, _ := atomicFormats.Load().([]format)

atomicFormats.Store(append(formats, format{name, magic, decode, decodeConfig}))

formatsMu.Unlock()

}

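RegisterFormat registers an image format for decoding. The name parameter is the format's name, such as "jpeg". magic is the binary prefix that identifies files of that format; a JPEG file always begins with "\xff\xd8". decode and decodeConfig are the functions that decode the image and decode just its configuration; each format supplies its own implementation.

image.RegisterFormat takes a mutex, then atomically stores a new formats slice with the format's name, magic prefix, and decode functions appended.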

The format struct in the image package is defined as follows:

// A format holds an image format's name, magic header and how to decode it.

type format struct {

name, magic string

decode func(io.Reader) (Image, error)

decodeConfig func(io.Reader) (Config, error)

}

Its fields are exactly the values passed to RegisterFormat.

Decoding an image and decoding its configuration

// Decode decodes an image that has been encoded in a registered format.

// The string returned is the format name used during format registration.

// Format registration is typically done by an init function in the codec-

// specific package.

func Decode(r io.Reader) (Image, string, error) {

rr := asReader(r)

f := sniff(rr)

if f.decode == nil {

return nil, "", ErrFormat

}

m, err := f.decode(rr)

return m, f.name, err

}

// DecodeConfig decodes the color model and dimensions of an image that has

// been encoded in a registered format. The string returned is the format name

// used during format registration. Format registration is typically done by

// an init function in the codec-specific package.

func DecodeConfig(r io.Reader) (Config, string, error) {

rr := asReader(r)

f := sniff(rr)

if f.decodeConfig == nil {

return Config{}, "", ErrFormat

}

c, err := f.decodeConfig(rr)

return c, f.name, err

}

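Decode first wraps the original io.Reader in the package's unexported reader interface. A reader is simply an io.Reader with an added Peek method, which makes it easy to inspect the first bytes of the image file without consuming them. Here are the definition of reader and the implementation of asReader: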

// A reader is an io.Reader that can also peek ahead.

type reader interface {

io.Reader

Peek(int) ([]byte, error)

}

// asReader converts an io.Reader to a reader.

func asReader(r io.Reader) reader {

if rr, ok := r.(reader); ok {

return rr

}

return bufio.NewReader(r)

}

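With a reader in hand, Decode calls sniff. sniff is the key to format detection: it identifies the format by examining the leading bytes of the image file. Reading the code confirms this.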

// Sniff determines the format of r's data.

func sniff(r reader) format {

formats, _ := atomicFormats.Load().([]format)

for _, f := range formats {

b, err := r.Peek(len(f.magic))

if err == nil && match(f.magic, b) {

return f

}

}

return format{}

}

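As the code shows, sniff iterates over all registered formats and compares the file's leading bytes against each format's magic field. This also shows why an image's format cannot be recognized unless that format has been registered.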

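Format example: JPEG

Now that we have a rough idea of how Go decodes images, let us take JPEG as an example and look more closely at how Decode and DecodeConfig are implemented.

Here are Decode and DecodeConfig from image/jpeg: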

// Decode reads a JPEG image from r and returns it as an image.Image.

func Decode(r io.Reader) (image.Image, error) {

var d decoder

return d.decode(r, false)

}

// DecodeConfig returns the color model and dimensions of a JPEG image without

// decoding the entire image.

func DecodeConfig(r io.Reader) (image.Config, error) {

var d decoder

if _, err := d.decode(r, true); err != nil {

return image.Config{}, err

}

switch d.nComp {

case 1:

return image.Config{

ColorModel: color.GrayModel,

Width: d.width,

Height: d.height,

}, nil

case 3:

cm := color.YCbCrModel

if d.isRGB() {

cm = color.RGBAModel

}

return image.Config{

ColorModel: cm,

Width: d.width,

Height: d.height,

}, nil

case 4:

return image.Config{

ColorModel: color.CMYKModel,

Width: d.width,

Height: d.height,

}, nil

}

return image.Config{}, FormatError("missing SOF marker")

}

Both Decode and DecodeConfig rely on decoder. The only difference between them is the value of the second argument, configOnly (a bool), that they pass to the decoder's decode method. decoder is defined as follows:

type decoder struct {

r io.Reader

bits bits

// bytes is a byte buffer, similar to a bufio.Reader, except that it

// has to be able to unread more than 1 byte, due to byte stuffing.

// Byte stuffing is specified in section F.1.2.3.

bytes struct {

// buf[i:j] are the buffered bytes read from the underlying

// io.Reader that haven't yet been passed further on.

buf [4096]byte

i, j int

// nUnreadable is the number of bytes to back up i after

// overshooting. It can be 0, 1 or 2.

nUnreadable int

}

width, height int

img1 *image.Gray

img3 *image.YCbCr

blackPix []byte

blackStride int

ri int // Restart Interval.

nComp int

// As per section 4.5, there are four modes of operation (selected by the

// SOF? markers): sequential DCT, progressive DCT, lossless and

// hierarchical, although this implementation does not support the latter

// two non-DCT modes. Sequential DCT is further split into baseline and

// extended, as per section 4.11.

baseline bool

progressive bool

jfif bool

adobeTransformValid bool

adobeTransform uint8

eobRun uint16 // End-of-Band run, specified in section G.1.2.2.

comp [maxComponents]component

progCoeffs [maxComponents][]block // Saved state between progressive-mode scans.

huff [maxTc + 1][maxTh + 1]huffman

quant [maxTq + 1]block // Quantization tables, in zig-zag order.

tmp [2 * blockSize]byte

}

This struct has many fields and may be hard to take in all at once, so we skip it for now.

The decoder's decode method is where the real work happens:

// decode reads a JPEG image from r and returns it as an image.Image.

func (d *decoder) decode(r io.Reader, configOnly bool) (image.Image, error) {

d.r = r

// Check for the Start Of Image marker.

if err := d.readFull(d.tmp[:2]); err != nil {

return nil, err

}

if d.tmp[0] != 0xff || d.tmp[1] != soiMarker {

return nil, FormatError("missing SOI marker")

}

// Process the remaining segments until the End Of Image marker.

for {

err := d.readFull(d.tmp[:2])

if err != nil {

return nil, err

}

for d.tmp[0] != 0xff {

// Strictly speaking, this is a format error. However, libjpeg is

// liberal in what it accepts. As of version 9, next_marker in

// jdmarker.c treats this as a warning (JWRN_EXTRANEOUS_DATA) and

// continues to decode the stream. Even before next_marker sees

// extraneous data, jpeg_fill_bit_buffer in jdhuff.c reads as many

// bytes as it can, possibly past the end of a scan's data. It

// effectively puts back any markers that it overscanned (e.g. an

// "\xff\xd9" EOI marker), but it does not put back non-marker data,

// and thus it can silently ignore a small number of extraneous

// non-marker bytes before next_marker has a chance to see them (and

// print a warning).

//

// We are therefore also liberal in what we accept. Extraneous data

// is silently ignored.

//

// This is similar to, but not exactly the same as, the restart

// mechanism within a scan (the RST[0-7] markers).

//

// Note that extraneous 0xff bytes in e.g. SOS data are escaped as

// "\xff\x00", and so are detected a little further down below.

d.tmp[0] = d.tmp[1]

d.tmp[1], err = d.readByte()

if err != nil {

return nil, err

}

}

marker := d.tmp[1]

if marker == 0 {

// Treat "\xff\x00" as extraneous data.

continue

}

for marker == 0xff {

// Section B.1.1.2 says, "Any marker may optionally be preceded by any

// number of fill bytes, which are bytes assigned code X'FF'".

marker, err = d.readByte()

if err != nil {

return nil, err

}

}

if marker == eoiMarker { // End Of Image.

break

}

if rst0Marker <= marker && marker <= rst7Marker {

// Figures B.2 and B.16 of the specification suggest that restart markers should

// only occur between Entropy Coded Segments and not after the final ECS.

// However, some encoders may generate incorrect JPEGs with a final restart

// marker. That restart marker will be seen here instead of inside the processSOS

// method, and is ignored as a harmless error. Restart markers have no extra data,

// so we check for this before we read the 16-bit length of the segment.

continue

}

// Read the 16-bit length of the segment. The value includes the 2 bytes for the

// length itself, so we subtract 2 to get the number of remaining bytes.

if err = d.readFull(d.tmp[:2]); err != nil {

return nil, err

}

n := int(d.tmp[0])<<8 + int(d.tmp[1]) - 2

if n < 0 {

return nil, FormatError("short segment length")

}

switch marker {

case sof0Marker, sof1Marker, sof2Marker:

d.baseline = marker == sof0Marker

d.progressive = marker == sof2Marker

err = d.processSOF(n)

if configOnly && d.jfif {

return nil, err

}

case dhtMarker:

if configOnly {

err = d.ignore(n)

} else {

err = d.processDHT(n)

}

case dqtMarker:

if configOnly {

err = d.ignore(n)

} else {

err = d.processDQT(n)

}

case sosMarker:

if configOnly {

return nil, nil

}

err = d.processSOS(n)

case driMarker:

if configOnly {

err = d.ignore(n)

} else {

err = d.processDRI(n)

}

case app0Marker:

err = d.processApp0Marker(n)

case app14Marker:

err = d.processApp14Marker(n)

default:

if app0Marker <= marker && marker <= app15Marker || marker == comMarker {

err = d.ignore(n)

} else if marker < 0xc0 { // See Table B.1 "Marker code assignments".

err = FormatError("unknown marker")

} else {

err = UnsupportedError("unknown marker")

}

}

if err != nil {

return nil, err

}

}

if d.progressive {

if err := d.reconstructProgressiveImage(); err != nil {

return nil, err

}

}

if d.img1 != nil {

return d.img1, nil

}

if d.img3 != nil {

if d.blackPix != nil {

return d.applyBlack()

} else if d.isRGB() {

return d.convertToRGB()

}

return d.img3, nil

}

return nil, FormatError("missing SOS marker")

}

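Throughout decode, bytes are read with the decoder's readFull method. What is special about the decoder's byte reading is that, after reading bytes from the io.Reader, it can move the read position back (unread). This has little to do with images, so it is not covered here.

After storing the reader in the decoder, decode reads the first two bytes and checks that they are the magic "\xff\xd8". The second byte, "\xd8", is called soiMarker; SOI stands for Start Of Image.

It then scans forward, reading two bytes at a time, until it reaches "\xff\xd9", the end-of-image marker, named eoiMarker in the code. Whenever it reads a "\xff" byte, it checks whether a marker starts there. "\xff\xd0" through "\xff\xd7" are rst0Marker through rst7Marker; decode does not parse them.

Likewise, "\xff\x00" is skipped.

Other markers, such as sof0Marker, sof1Marker, sof2Marker, dhtMarker, dqtMarker, sosMarker, driMarker, app0Marker, and app14Marker, do need to be parsed. When decode is called with configOnly set to true, the segments introduced by dhtMarker, dqtMarker, and driMarker are skipped without being parsed, and decoding stops at sosMarker. A JPEG file is stored as a sequence of segments, each a marker followed by (compressed) data. The structure of a segment is as follows:

[Figure: segment layout: marker (2 bytes), length (2 bytes), data]

So each time the code meets a marker (two bytes), it next reads the segment's length (two bytes). The length field counts its own two bytes, so the number of remaining bytes is n := int(d.tmp[0])<<8 + int(d.tmp[1]) - 2.

Each kind of segment has its own meaning.

[Table: JPEG segment markers and their meanings]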
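As a stand-in for that overview, the sketch below lists the marker codes handled by decode, with their meanings as given in the JPEG specification, and walks the header segments of a JPEG produced in memory. The names markerNames, segment, and listSegments are hypothetical, and the walker is deliberately simpler than the real decoder: it does not tolerate fill bytes or extraneous data.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"image"
	"image/jpeg"
)

// markerNames maps the byte that follows 0xff to the segment it introduces.
var markerNames = map[byte]string{
	0xc0: "SOF0 (start of frame, baseline sequential)",
	0xc1: "SOF1 (start of frame, extended sequential)",
	0xc2: "SOF2 (start of frame, progressive)",
	0xc4: "DHT (define Huffman table)",
	0xd8: "SOI (start of image)",
	0xd9: "EOI (end of image)",
	0xda: "SOS (start of scan)",
	0xdb: "DQT (define quantization table)",
	0xdd: "DRI (define restart interval)",
	0xe0: "APP0 (application data, e.g. JFIF)",
	0xee: "APP14 (application data, e.g. Adobe)",
	0xfe: "COM (comment)",
}

// segment is one marker segment: its marker code and payload length, that
// is, the stored 16-bit length minus the 2 bytes of the length field itself.
type segment struct {
	Marker byte
	N      int
}

// listSegments walks the header segments of a JPEG byte stream, up to and
// including SOS.
func listSegments(data []byte) ([]segment, error) {
	if len(data) < 2 || data[0] != 0xff || data[1] != 0xd8 {
		return nil, errors.New("missing SOI marker")
	}
	var segs []segment
	for pos := 2; pos+4 <= len(data); {
		if data[pos] != 0xff {
			return nil, errors.New("expected a marker")
		}
		marker := data[pos+1]
		// The length is stored big-endian and includes its own 2 bytes.
		n := int(binary.BigEndian.Uint16(data[pos+2:])) - 2
		if n < 0 {
			return nil, errors.New("short segment length")
		}
		segs = append(segs, segment{marker, n})
		if marker == 0xda { // SOS: entropy-coded data follows
			break
		}
		pos += 4 + n
	}
	return segs, nil
}

func main() {
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, image.NewGray(image.Rect(0, 0, 8, 8)), nil); err != nil {
		fmt.Println(err)
		return
	}
	segs, err := listSegments(buf.Bytes())
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, s := range segs {
		fmt.Printf("0xff%02x %-45s payload %d bytes\n", s.Marker, markerNames[s.Marker], s.N)
	}
}
```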

A JPEG image's width and height are stored in the SOF segment (SOF0 for baseline images).

Here is the code that processes SOF:

// Specified in section B.2.2.

func (d *decoder) processSOF(n int) error {

if d.nComp != 0 {

return FormatError("multiple SOF markers")

}

switch n {

case 6 + 3*1: // Grayscale image.

d.nComp = 1

case 6 + 3*3: // YCbCr or RGB image.

d.nComp = 3

case 6 + 3*4: // YCbCrK or CMYK image.

d.nComp = 4

default:

return UnsupportedError("number of components")

}

if err := d.readFull(d.tmp[:n]); err != nil {

return err

}

// We only support 8-bit precision.

if d.tmp[0] != 8 {

return UnsupportedError("precision")

}

d.height = int(d.tmp[1])<<8 + int(d.tmp[2])

d.width = int(d.tmp[3])<<8 + int(d.tmp[4])

if int(d.tmp[5]) != d.nComp {

return FormatError("SOF has wrong length")

}

for i := 0; i < d.nComp; i++ {

d.comp[i].c = d.tmp[6+3*i]

// Section B.2.2 states that "the value of C_i shall be different from

// the values of C_1 through C_(i-1)".

for j := 0; j < i; j++ {

if d.comp[i].c == d.comp[j].c {

return FormatError("repeated component identifier")

}

}

d.comp[i].tq = d.tmp[8+3*i]

if d.comp[i].tq > maxTq {

return FormatError("bad Tq value")

}

hv := d.tmp[7+3*i]

h, v := int(hv>>4), int(hv&0x0f)

if h < 1 || 4 < h || v < 1 || 4 < v {

return FormatError("luma/chroma subsampling ratio")

}

if h == 3 || v == 3 {

return errUnsupportedSubsamplingRatio

}

switch d.nComp {

case 1:

// If a JPEG image has only one component, section A.2 says "this data

// is non-interleaved by definition" and section A.2.2 says "[in this

// case...] the order of data units within a scan shall be left-to-right

// and top-to-bottom... regardless of the values of H_1 and V_1". Section

// 4.8.2 also says "[for non-interleaved data], the MCU is defined to be

// one data unit". Similarly, section A.1.1 explains that it is the ratio

// of H_i to max_j(H_j) that matters, and similarly for V. For grayscale

// images, H_1 is the maximum H_j for all components j, so that ratio is

// always 1. The component's (h, v) is effectively always (1, 1): even if

// the nominal (h, v) is (2, 1), a 20x5 image is encoded in three 8x8

// MCUs, not two 16x8 MCUs.

h, v = 1, 1

case 3:

// For YCbCr images, we only support 4:4:4, 4:4:0, 4:2:2, 4:2:0,

// 4:1:1 or 4:1:0 chroma subsampling ratios. This implies that the

// (h, v) values for the Y component are either (1, 1), (1, 2),

// (2, 1), (2, 2), (4, 1) or (4, 2), and the Y component's values

// must be a multiple of the Cb and Cr component's values. We also

// assume that the two chroma components have the same subsampling

// ratio.

switch i {

case 0: // Y.

// We have already verified, above, that h and v are both

// either 1, 2 or 4, so invalid (h, v) combinations are those

// with v == 4.

if v == 4 {

return errUnsupportedSubsamplingRatio

}

case 1: // Cb.

if d.comp[0].h%h != 0 || d.comp[0].v%v != 0 {

return errUnsupportedSubsamplingRatio

}

case 2: // Cr.

if d.comp[1].h != h || d.comp[1].v != v {

return errUnsupportedSubsamplingRatio

}

}

case 4:

// For 4-component images (either CMYK or YCbCrK), we only support two

// hv vectors: [0x11 0x11 0x11 0x11] and [0x22 0x11 0x11 0x22].

// Theoretically, 4-component JPEG images could mix and match hv values

// but in practice, those two combinations are the only ones in use,

// and it simplifies the applyBlack code below if we can assume that:

// - for CMYK, the C and K channels have full samples, and if the M

// and Y channels subsample, they subsample both horizontally and

// vertically.

// - for YCbCrK, the Y and K channels have full samples.

switch i {

case 0:

if hv != 0x11 && hv != 0x22 {

return errUnsupportedSubsamplingRatio

}

case 1, 2:

if hv != 0x11 {

return errUnsupportedSubsamplingRatio

}

case 3:

if d.comp[0].h != h || d.comp[0].v != v {

return errUnsupportedSubsamplingRatio

}

}

}

d.comp[i].h = h

d.comp[i].v = v

}

return nil

}

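The parameter n is the length of the SOF segment. From it the number of components, and hence the image's color mode, can be determined.

All n bytes are then read at once for analysis.

The code requires the first byte of the SOF segment to be 8 (a limitation of Go's standard package), meaning that only images with 8-bit precision can be handled.

The next two bytes are the image height:

d.height = int(d.tmp[1])<<8 + int(d.tmp[2])

The two bytes after that are the image width:

d.width = int(d.tmp[3])<<8 + int(d.tmp[4])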
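The same byte arithmetic can be tried on a hand-written payload. This is a minimal sketch of the fixed part of the SOF layout only: the type frameHeader and the function parseSOF are made up for illustration and leave out the per-component checks that processSOF performs.

```go
package main

import (
	"errors"
	"fmt"
)

// frameHeader holds the fields read from the start of a SOF payload.
type frameHeader struct {
	Precision     int
	Height, Width int
	Components    int
}

// parseSOF decodes the fixed part of a SOF payload (the n bytes that follow
// the 16-bit length): precision, height, width, and component count.
func parseSOF(p []byte) (frameHeader, error) {
	if len(p) < 6 {
		return frameHeader{}, errors.New("SOF payload too short")
	}
	h := frameHeader{
		Precision:  int(p[0]),
		Height:     int(p[1])<<8 + int(p[2]), // big-endian 16-bit value
		Width:      int(p[3])<<8 + int(p[4]), // big-endian 16-bit value
		Components: int(p[5]),
	}
	// Each component adds 3 bytes: identifier, sampling factors, table index.
	if len(p) != 6+3*h.Components {
		return frameHeader{}, errors.New("SOF has wrong length")
	}
	return h, nil
}

func main() {
	// A hand-written payload: 8-bit precision, height 0x01e0 = 480,
	// width 0x0280 = 640, and three components of 3 bytes each.
	payload := []byte{8, 0x01, 0xe0, 0x02, 0x80, 3,
		1, 0x22, 0, 2, 0x11, 1, 3, 0x11, 1}
	fmt.Println(parseSOF(payload)) // {8 480 640 3} <nil>
}
```

Note that the height comes before the width in the segment, the reverse of the order in which the two are usually quoted.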
