Golang Image
Preface
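This article summarizes how to use Go's image package and explores how it works internally. It has two parts:
1. The first part introduces the basic usage of the image package.
2. The second part analyzes the underlying mechanisms of Go's image package. It draws on the package's source code and on background knowledge about image file formats.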
Basic Usage of the Golang image Package
The official Go documentation (https://golang.org/pkg/image/) introduces the image package as follows: Package image implements a basic 2-D image library.
The fundamental interface is called Image. An Image contains colors, which are described in the image/color package.
Usage Examples
The package documentation begins: Values of the Image interface are created either by calling functions such as NewRGBA and NewPaletted, or by calling Decode on an io.Reader containing image data in a format such as GIF, JPEG or PNG. Decoding any particular image format requires the prior registration of a decoder function. Registration is typically automatic as a side effect of initializing that format's package so that, to decode a PNG image, it suffices to have
import _ "image/png"
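In other words, there are two ways to obtain a value of the Image interface:
1. Call a constructor such as NewRGBA or NewPaletted.
2. Read GIF, JPEG or PNG data through an io.Reader and pass it to Decode. The package for that format must be imported so that its decoder is registered.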
Decoding Images
The example file decode_example_test.go in the image package shows how to use it:
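package image_test

import (
	"encoding/base64"
	"fmt"
	"image"
	"log"
	"strings"

	// image/jpeg is not used directly; it is imported for its init side
	// effect, which registers the JPEG decoder so that image.Decode and
	// image.DecodeConfig understand JPEG data.
	_ "image/jpeg"
)

func Example_decodeConfig() {
	reader := base64.NewDecoder(base64.StdEncoding, strings.NewReader(data))
	config, format, err := image.DecodeConfig(reader)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Width:", config.Width, "Height:", config.Height, "Format:", format)
}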
func Example() {
	// Decode the JPEG data. If reading from file, create a reader with
	//
	//	reader, err := os.Open("testdata/video-001.q50.420.jpeg")
	//	if err != nil {
	//		log.Fatal(err)
	//	}
	//	defer reader.Close()
	reader := base64.NewDecoder(base64.StdEncoding, strings.NewReader(data))
	m, _, err := image.Decode(reader)
	if err != nil {
		log.Fatal(err)
	}
	bounds := m.Bounds()

	// Calculate a 16-bin histogram for m's red, green, blue and alpha components.
	//
	// An image's bounds do not necessarily start at (0, 0), so the two loops start
	// at bounds.Min.Y and bounds.Min.X. Looping over Y first and X second is more
	// likely to result in better memory access patterns than X first and Y second.
	var histogram [16][4]int
	for y := bounds.Min.Y; y < bounds.Max.Y; y++ {
		for x := bounds.Min.X; x < bounds.Max.X; x++ {
			r, g, b, a := m.At(x, y).RGBA()
			// A color's RGBA method returns values in the range [0, 65535].
			// Shifting by 12 reduces this to the range [0, 15].
			histogram[r>>12][0]++
			histogram[g>>12][1]++
			histogram[b>>12][2]++
			histogram[a>>12][3]++
		}
	}

	// Print the results.
	fmt.Printf("%-14s %6s %6s %6s %6s\n", "bin", "red", "green", "blue", "alpha")
	for i, x := range histogram {
		fmt.Printf("0x%04x-0x%04x: %6d %6d %6d %6d\n", i<<12, (i+1)<<12-1, x[0], x[1], x[2], x[3])
	}
	// Output:
	// bin               red  green   blue  alpha
	// 0x0000-0x0fff:    364    790   7242      0
	// 0x1000-0x1fff:    645   2967   1039      0
	// 0x2000-0x2fff:   1072   2299    979      0
	// 0x3000-0x3fff:    820   2266    980      0
	// 0x4000-0x4fff:    537   1305    541      0
	// 0x5000-0x5fff:    319    962    261      0
	// 0x6000-0x6fff:    322    375    177      0
	// 0x7000-0x7fff:    601    279    214      0
	// 0x8000-0x8fff:   3478    227    273      0
	// 0x9000-0x9fff:   2260    234    329      0
	// 0xa000-0xafff:    921    282    373      0
	// 0xb000-0xbfff:    321    335    397      0
	// 0xc000-0xcfff:    229    388    298      0
	// 0xd000-0xdfff:    260    414    277      0
	// 0xe000-0xefff:    516    428    298      0
	// 0xf000-0xffff:   2785   1899   1772  15450
}
const data = `
/9j/4AAQSkZJRgABAQIAHAAcAAD/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdA
SFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2Nj
Y2NjY2NjY2Nj ....`
This code contains two functions, Example_decodeConfig and Example.
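Example_decodeConfig calls image.DecodeConfig to obtain the image's width, height and format. Its reader argument comes from base64.NewDecoder(base64.StdEncoding, strings.NewReader(data)). strings.NewReader wraps the constant data at the end of the file as an io.Reader. base64.NewDecoder then turns that into an io.Reader that yields the decoded binary bytes. Base64 is an encoding that converts a byte stream into a stream of printable characters; for details see http://www.sunshine2k.de/articles/coding/base64/understanding_base64.html.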
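Example obtains an io.Reader the same way as Example_decodeConfig. It then decodes it with image.Decode, which returns a value of the image.Image interface. Next, it calls Bounds to get the image's full rectangle and iterates over every pixel. For each pixel it collects the RGBA color values (explained in detail later) and adds them to a 16-bin histogram.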
The Image Interface
image/image.go defines the Image interface:
// Image is a finite rectangular grid of color.Color values taken from a color
// model.
type Image interface {
	// ColorModel returns the Image's color model.
	ColorModel() color.Model
	// Bounds returns the domain for which At can return non-zero color.
	// The bounds do not necessarily contain the point (0, 0).
	Bounds() Rectangle
	// At returns the color of the pixel at (x, y).
	// At(Bounds().Min.X, Bounds().Min.Y) returns the upper-left pixel of the grid.
	// At(Bounds().Max.X-1, Bounds().Max.Y-1) returns the lower-right one.
	At(x, y int) color.Color
}
The interface defines three methods: ColorModel, Bounds and At.
ColorModel returns a color.Model. This type is defined in the color package and is itself an interface, declared in image/color/color.go:
// Model can convert any Color to one from its own color model. The conversion
// may be lossy.
type Model interface {
	Convert(c Color) Color
}
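Bounds returns a rectangle. In the image package this is a Rectangle (defined in image/geom.go), specified by two Points: Min, the top-left corner, which lies inside the rectangle, and Max, the bottom-right corner, which lies just outside it.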
// A Rectangle contains the points with Min.X <= X < Max.X, Min.Y <= Y < Max.Y.
// It is well-formed if Min.X <= Max.X and likewise for Y. Points are always
// well-formed. A rectangle's methods always return well-formed outputs for
// well-formed inputs.
//
// A Rectangle is also an Image whose bounds are the rectangle itself. At
// returns color.Opaque for points in the rectangle and color.Transparent
// otherwise.
type Rectangle struct {
	Min, Max Point
}

// A Point is an X, Y coordinate pair. The axes increase right and down.
type Point struct {
	X, Y int
}
At takes a coordinate and returns a value of the color.Color interface:
// Color can convert itself to alpha-premultiplied 16-bits per channel RGBA.
// The conversion may be lossy.
type Color interface {
	// RGBA returns the alpha-premultiplied red, green, blue and alpha values
	// for the color. Each value ranges within [0, 0xffff], but is represented
	// by a uint32 so that multiplying by a blend factor up to 0xffff will not
	// overflow.
	//
	// An alpha-premultiplied color component c has been scaled by alpha (a),
	// so has valid values 0 <= c <= a.
	RGBA() (r, g, b, a uint32)
}
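The Color interface has a single method, RGBA. R, G and B are self-explanatory. Alpha describes the color's transparency, or more precisely its opacity.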
RGB is a common color model. Other common color models include CMYK and YUV.
Color Models
RGB
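The RGB color space, or RGB color system, builds colors as combinations of red (R), green (G) and blue (B). RGB stands for these three channels. Together they can represent a large part of the colors that human vision can perceive, and RGB is one of the most widely used color systems.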
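Red, green and blue are also called the three primary colors, and RGB is an additive color system. Each light source emits only specific wavelengths and is itself colored light. Adding lights of different colors together produces new colors. A display screen is the typical example. https://www.zhihu.com/question/25074051?sort=created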
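Every color on a computer screen is produced by mixing red, green and blue light in different proportions. One red, one green and one blue element together form the smallest display unit, so any color on the screen can be recorded as a set of RGB values. On a computer, the "amount" of each RGB component is its brightness, expressed as an integer. Usually each of R, G and B has 256 brightness levels, numbered 0, 1, 2, ... up to 255. The highest number is 255, but 0 is also a level, so there are 256 levels in total.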
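RGBA stands for Red, Green, Blue and Alpha. It is sometimes described as a color space, but it is really the RGB model with one extra piece of information attached. Its colors are RGB colors and can belong to any RGB color space.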
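The alpha channel is normally used as an opacity parameter:
- Alpha 0% means the pixel is fully transparent, that is, invisible.
- Alpha 100% means the pixel is fully opaque, like a traditional digital image.
- Values in between let the background show through the pixel, as through glass (translucency). Simple binary transparency (transparent or opaque) cannot achieve this effect.
Alpha makes image compositing easy. It can be expressed as a percentage, as an integer, or, like RGB parameters, as a real number from 0 to 1.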
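Go represents each channel as a uint32, but values only range over [0, 0xffff], so only the low 16 bits are used. The remaining 16 bits leave headroom, so that multiplying a channel value by a blend factor of up to 0xffff (such as an alpha value) cannot overflow.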
Alpha and premultiplied alpha
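Conceptually, alpha blending conveys how transparent a surface is. Consumer applications such as games typically use RGB for the color of the underlying surface. They rely on the alpha channel to indicate how opaque that color is.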
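Specifically, when alpha blending is enabled in the pipeline, developers tend to blend in the following form. The alpha channel takes effect at render time through alpha blending. When a color C_s with alpha a_s is rendered over a color C_d, the blended color is: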
$$C_o = a_s C_s + (1 - a_s) C_d$$
In the old fixed-function pipeline this was called "SourceAlpha, InvSourceAlpha". It is also known as "post-multiplied alpha".
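For example, take two adjacent pixels with 8 bits per channel: red (255, 0, 0, 153) rendered onto white (255, 255, 255, 0). Here alpha 153 corresponds to 153/255 = 0.6, and the destination's alpha does not appear in this formula: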
$$C_o = (255, 0, 0) \times 0.6 + (255, 255, 255) \times (1 - 0.6) = (255, 102, 102)$$
However, this form of alpha blending has a serious flaw: in many cases it produces wrong colors. For a detailed explanation see https://developer.nvidia.com/content/alpha-blending-pre-or-not-pre and https://www.cnblogs.com/dins/p/premultiplied-alpha.html.
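With premultiplied alpha, colors are stored with the RGB channels already multiplied by alpha, i.e. as (r·a, g·a, b·a, a). C_s has therefore already been scaled by its alpha. The 60%-opaque red above is represented as (255×0.6, 0, 0, 153) = (153, 0, 0, 153), and the blending formula becomes: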
$$C_o = C_s' + (1 - a_s) C_d, \quad \text{where } C_s' = a_s C_s$$
CMYK
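CMYK (four-color process) is the color model used in color printing. It relies on the mixing of the three primary pigment colors, plus black ink; overlaying the four inks produces so-called "full-color printing". The four standard colors are:
- C: Cyan, also called sky blue or azure
- M: Magenta
- Y: Yellow
- K: blacK
CMYK is a subtractive color model.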
Why black is added
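Cyan, magenta and yellow are also the three primary printing colors, and in theory mixing them produces black. In practice, manufacturing limits keep ink purity low, so the black they produce is not deep enough, and a pure black ink is added. Black ink also saves ink. Without it, the black parts of an image must be printed with equal amounts of C, M and Y; with it, those parts can be printed in black ink alone.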
YUV
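YUV is a color encoding method commonly used in video processing components. When encoding photos or video, YUV takes human perception into account and allows the chrominance to use less bandwidth.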
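YUV refers to a family of true-color color spaces. Y'UV, YUV, YCbCr and YPbPr can all be called YUV, and the terms overlap. "Y" denotes luminance (luma), i.e. the grayscale value. "U" and "V" denote chrominance (chroma), which describes hue and saturation and specifies a pixel's color.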
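The scopes of Y′UV, YUV, YCbCr and YPbPr are often confused or overlap. Historically, YUV and Y'UV were used to encode analog television signals. YCbCr describes digital video signals and suits video and image compression and transmission, for example MPEG and JPEG. Today, however, the term YUV is widely used in computer systems.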
The image package's ycbcr.go file defines YCbCr as a struct:
// YCbCr is an in-memory image of Y'CbCr colors. There is one Y sample per
// pixel, but each Cb and Cr sample can span one or more pixels.
// YStride is the Y slice index delta between vertically adjacent pixels.
// CStride is the Cb and Cr slice index delta between vertically adjacent pixels
// that map to separate chroma samples.
// It is not an absolute requirement, but YStride and len(Y) are typically
// multiples of 8, and:
//	For 4:4:4, CStride == YStride/1 && len(Cb) == len(Cr) == len(Y)/1.
//	For 4:2:2, CStride == YStride/2 && len(Cb) == len(Cr) == len(Y)/2.
//	For 4:2:0, CStride == YStride/2 && len(Cb) == len(Cr) == len(Y)/4.
//	For 4:4:0, CStride == YStride/1 && len(Cb) == len(Cr) == len(Y)/2.
//	For 4:1:1, CStride == YStride/4 && len(Cb) == len(Cr) == len(Y)/4.
//	For 4:1:0, CStride == YStride/4 && len(Cb) == len(Cr) == len(Y)/8.
type YCbCr struct {
	Y, Cb, Cr      []uint8
	YStride        int
	CStride        int
	SubsampleRatio YCbCrSubsampleRatio
	Rect           Rectangle
}
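The comment describes six chroma subsampling schemes: 4:4:4, 4:2:2, 4:2:0, 4:4:0, 4:1:1 and 4:1:0. They are easier to understand with the figure below (sources: https://zhuanlan.zhihu.com/p/85620611 and http://dougkerr.net/Pumpkin/articles/Subsampling.pdf).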
The luma value Y is sampled for every pixel, while Cb and Cr are sampled once for every one or more pixels.
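The left column of the figure above uses three symbols:
- each small rectangle is an image pixel;
- each black-framed rectangle is a chroma pixel;
- each small black dot is a chroma value (Cb+Cr).
Together they show the horizontal and vertical ratio between image pixels and chroma pixels. For example: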
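4:4:0: horizontal ratio 1/1, vertical ratio 1/2, so one chroma pixel covers two image pixels.
4:2:2: horizontal ratio 1/2, vertical ratio 1/1, so one chroma pixel covers two image pixels.
4:2:0: horizontal ratio 1/2, vertical ratio 1/2, so one chroma pixel covers four image pixels.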
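The right column shows the subsampling notation J:a:b. A filled black circle marks a position that carries a chroma sample (Cb+Cr); a hollow circle marks one that does not. The notation is defined around a reference block, a rectangle J pixels wide and 2 pixels high. J is usually 4, so the reference block is 4 pixels wide and 2 pixels high. a is the number of chroma samples in the first row of the reference block, and b is the number in the second row.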
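4:4:0: the first row of the reference block contains four chroma samples; the second row contains none.
4:2:2: the first row contains two chroma samples and so does the second; the samples fall on every other pixel.
4:2:0: the first row contains two chroma samples; the second row contains none.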
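We can now see what pixel formats such as yuv444, yuv422 and yuv420 really mean. Every image pixel carries its own luma value, but several image pixels share one chroma value. The 4×2 reference block defines this sharing ratio, which makes formats such as yuv440 and yuv420 easy to understand.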
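The reference-block definition also tells us how much data each scheme stores. The sketch below makes two assumptions: samples are 8 bits, and a is counted as the chroma samples in row one and b as those in row two, as described above. bytesPerPixel is a hypothetical helper. Its results agree with the len(Cb) == len(Y)/k ratios in the ycbcr.go comment; for example, 4:2:0 gives 1 + 2×(1/4) = 1.5.

```go
package main

import "fmt"

// bytesPerPixel (hypothetical helper) returns the average number of 8-bit
// samples per pixel for a J:a:b scheme, using the J×2 reference block:
// every pixel has one Y sample, and each of Cb and Cr has a samples in the
// first row and b samples in the second row.
func bytesPerPixel(j, a, b int) float64 {
	luma := 2 * j         // one Y sample per pixel in the J×2 block
	chroma := 2 * (a + b) // Cb and Cr each contribute a+b samples
	return float64(luma+chroma) / float64(2*j)
}

func main() {
	for _, s := range [][3]int{{4, 4, 4}, {4, 2, 2}, {4, 2, 0}, {4, 4, 0}, {4, 1, 1}, {4, 1, 0}} {
		fmt.Printf("%d:%d:%d -> %.3g bytes/pixel\n", s[0], s[1], s[2], bytesPerPixel(s[0], s[1], s[2]))
	}
}
```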
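The YCbCr struct records the subsampling scheme in its SubsampleRatio field, an enumerated value of type YCbCrSubsampleRatio. The int fields YStride and CStride depend on this scheme as described in the comment.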
Image Formats
Here is a program that uses the image package to read the width and height of the images in the current folder:
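package main

import (
	"fmt"
	"image"
	"log"
	"os"
	"path/filepath"
	"strings"
)

// Blank imports register the decoders for these formats with package image.
import _ "image/jpeg"
import _ "image/gif"
import _ "image/png"

func main() {
	// Get the path of the current folder.
	dir, err := os.Getwd()
	if err != nil {
		log.Fatal(err)
	}
	// Read all files and subfolders in the current folder (Go 1.16+).
	entries, err := os.ReadDir(dir)
	if err != nil {
		log.Fatal(err)
	}
	var imgs []string
	for _, e := range entries {
		// Filter: keep only JPG images.
		if !e.IsDir() && strings.HasSuffix(strings.ToLower(e.Name()), ".jpg") {
			imgs = append(imgs, e.Name())
		}
	}
	fmt.Printf("Found %d images in total\n", len(imgs))
	for _, imgName := range imgs {
		imgRes, err := os.Open(filepath.Join(dir, imgName))
		if err != nil {
			fmt.Println(err)
			continue
		}
		// DecodeConfig reads only the header, not the pixel data.
		imgFile, _, err := image.DecodeConfig(imgRes)
		imgRes.Close()
		if err != nil {
			// Skip files that fail to decode instead of printing a bogus size.
			fmt.Println(imgName+":", err)
			continue
		}
		fmt.Printf("Image [%s] width: %d px, height: %d px\n", imgName, imgFile.Width, imgFile.Height)
	}
	fmt.Println("Type any key and press Enter to close the window")
	var a string
	fmt.Scan(&a)
}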
image.DecodeConfig returns an image.Config struct, defined as:
// Config holds an image's color model and dimensions.
type Config struct {
	ColorModel    color.Model
	Width, Height int
}
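Besides the image package, which contains DecodeConfig, the program also imports image/jpeg, image/gif and image/png. Without these three imports, images in the corresponding formats cannot be decoded; DecodeConfig then fails with image.ErrFormat ("image: unknown format"). You can verify this yourself.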
An import of the form import _ "image/jpeg" exists only for the package's initialization side effects.
The reader.go files of packages such as image/jpeg, image/png and image/gif each define an init function that performs this initialization.
The init function of image/jpeg is:
func init() {
	image.RegisterFormat("jpeg", "\xff\xd8", Decode, DecodeConfig)
}
image.RegisterFormat is defined in the image package:
// RegisterFormat registers an image format for use by Decode.
// Name is the name of the format, like "jpeg" or "png".
// Magic is the magic prefix that identifies the format's encoding. The magic
// string can contain "?" wildcards that each match any one byte.
// Decode is the function that decodes the encoded image.
// DecodeConfig is the function that decodes just its configuration.
func RegisterFormat(name, magic string, decode func(io.Reader) (Image, error), decodeConfig func(io.Reader) (Config, error)) {
	formatsMu.Lock()
	formats, _ := atomicFormats.Load().([]format)
	atomicFormats.Store(append(formats, format{name, magic, decode, decodeConfig}))
	formatsMu.Unlock()
}
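RegisterFormat registers an image format so that Decode can use it. Its parameters are:
- name: the format's name, such as "jpeg".
- magic: the binary prefix that identifies files in that format. A JPEG file always starts with "\xff\xd8". The magic may contain "?" wildcards that each match any one byte.
- decode and decodeConfig: the functions that decode the full image and only its configuration. Each format provides its own implementations.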
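RegisterFormat first locks the mutex formatsMu. It then loads the current []format slice from atomicFormats (an atomic.Value) and appends a format built from the name, magic prefix and two decode functions. Finally it stores the new slice back atomically. The mutex serializes concurrent registrations, while readers such as sniff can load the slice without taking the lock.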
The format struct defined in the image package is:
// A format holds an image format's name, magic header and how to decode it.
type format struct {
	name, magic  string
	decode       func(io.Reader) (Image, error)
	decodeConfig func(io.Reader) (Config, error)
}
Its fields are exactly the values passed to RegisterFormat.
Decode and DecodeConfig
// Decode decodes an image that has been encoded in a registered format.
// The string returned is the format name used during format registration.
// Format registration is typically done by an init function in the codec-
// specific package.
func Decode(r io.Reader) (Image, string, error) {
	rr := asReader(r)
	f := sniff(rr)
	if f.decode == nil {
		return nil, "", ErrFormat
	}
	m, err := f.decode(rr)
	return m, f.name, err
}

// DecodeConfig decodes the color model and dimensions of an image that has
// been encoded in a registered format. The string returned is the format name
// used during format registration. Format registration is typically done by
// an init function in the codec-specific package.
func DecodeConfig(r io.Reader) (Config, string, error) {
	rr := asReader(r)
	f := sniff(rr)
	if f.decodeConfig == nil {
		return Config{}, "", ErrFormat
	}
	c, err := f.decodeConfig(rr)
	return c, f.name, err
}
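Both functions first wrap the incoming io.Reader as an image.reader, an unexported interface. image.reader extends io.Reader with a Peek method, which looks at the first bytes of the file without consuming them. If the reader has no Peek method, asReader wraps it in a bufio.Reader. Here is the definition of image.reader and the implementation of asReader: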
// A reader is an io.Reader that can also peek ahead.
type reader interface {
	io.Reader
	Peek(int) ([]byte, error)
}

// asReader converts an io.Reader to a reader.
func asReader(r io.Reader) reader {
	if rr, ok := r.(reader); ok {
		return rr
	}
	return bufio.NewReader(r)
}
With the reader in hand, Decode calls sniff. sniff is the key to format detection: it identifies the format by inspecting the first bytes of the file. The code confirms this:
// Sniff determines the format of r's data.
func sniff(r reader) format {
	formats, _ := atomicFormats.Load().([]format)
	for _, f := range formats {
		b, err := r.Peek(len(f.magic))
		if err == nil && match(f.magic, b) {
			return f
		}
	}
	return format{}
}
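sniff iterates over all registered formats. For each one it peeks at as many bytes as the format's magic string has and compares them with the magic field. This also shows that an unregistered format cannot be recognized.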
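The match helper that sniff calls is not shown above. The version below is an illustrative sketch modeled on the behavior the RegisterFormat comment describes: equal length, with byte-for-byte equality except where the magic has "?". It is not a verbatim copy of the unexported function in package image. "GIF8?a" is the magic that image/gif registers, and its wildcard accepts both GIF87a and GIF89a.

```go
package main

import "fmt"

// match reports whether b matches magic; '?' in magic matches any one byte.
// Illustrative sketch of the rule described in the RegisterFormat comment.
func match(magic string, b []byte) bool {
	if len(magic) != len(b) {
		return false
	}
	for i, c := range b {
		if magic[i] != c && magic[i] != '?' {
			return false
		}
	}
	return true
}

func main() {
	const gifMagic = "GIF8?a"
	fmt.Println(match(gifMagic, []byte("GIF87a")), match(gifMagic, []byte("GIF89a")))
	fmt.Println(match("\xff\xd8", []byte{0xff, 0xd8})) // JPEG SOI
	fmt.Println(match("\xff\xd8", []byte("\x89P")))    // start of a PNG header: no match
}
```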
Format example: JPEG
With a rough picture of how Go decodes images, we now take JPEG as an example and look closely at how Decode and DecodeConfig are implemented.
Here are Decode and DecodeConfig from image/jpeg:
// Decode reads a JPEG image from r and returns it as an image.Image.
func Decode(r io.Reader) (image.Image, error) {
	var d decoder
	return d.decode(r, false)
}

// DecodeConfig returns the color model and dimensions of a JPEG image without
// decoding the entire image.
func DecodeConfig(r io.Reader) (image.Config, error) {
	var d decoder
	if _, err := d.decode(r, true); err != nil {
		return image.Config{}, err
	}
	switch d.nComp {
	case 1:
		return image.Config{
			ColorModel: color.GrayModel,
			Width:      d.width,
			Height:     d.height,
		}, nil
	case 3:
		cm := color.YCbCrModel
		if d.isRGB() {
			cm = color.RGBAModel
		}
		return image.Config{
			ColorModel: cm,
			Width:      d.width,
			Height:     d.height,
		}, nil
	case 4:
		return image.Config{
			ColorModel: color.CMYKModel,
			Width:      d.width,
			Height:     d.height,
		}, nil
	}
	return image.Config{}, FormatError("missing SOF marker")
}
Both Decode and DecodeConfig delegate to decoder. They differ only in the second argument, configOnly (a bool), that they pass to the decoder's decode method. decoder is defined as follows:
type decoder struct {
	r    io.Reader
	bits bits
	// bytes is a byte buffer, similar to a bufio.Reader, except that it
	// has to be able to unread more than 1 byte, due to byte stuffing.
	// Byte stuffing is specified in section F.1.2.3.
	bytes struct {
		// buf[i:j] are the buffered bytes read from the underlying
		// io.Reader that haven't yet been passed further on.
		buf  [4096]byte
		i, j int
		// nUnreadable is the number of bytes to back up i after
		// overshooting. It can be 0, 1 or 2.
		nUnreadable int
	}
	width, height int

	img1        *image.Gray
	img3        *image.YCbCr
	blackPix    []byte
	blackStride int

	ri    int // Restart Interval.
	nComp int

	// As per section 4.5, there are four modes of operation (selected by the
	// SOF? markers): sequential DCT, progressive DCT, lossless and
	// hierarchical, although this implementation does not support the latter
	// two non-DCT modes. Sequential DCT is further split into baseline and
	// extended, as per section 4.11.
	baseline    bool
	progressive bool

	jfif                bool
	adobeTransformValid bool
	adobeTransform      uint8
	eobRun              uint16 // End-of-Band run, specified in section G.1.2.2.

	comp       [maxComponents]component
	progCoeffs [maxComponents][]block // Saved state between progressive-mode scans.
	huff       [maxTc + 1][maxTh + 1]huffman
	quant      [maxTq + 1]block // Quantization tables, in zig-zag order.
	tmp        [2 * blockSize]byte
}
The struct has many fields and is hard to take in all at once, so we skip it for now.
The key, then, is the implementation of the decoder's decode method:
// decode reads a JPEG image from r and returns it as an image.Image.
func (d *decoder) decode(r io.Reader, configOnly bool) (image.Image, error) {
	d.r = r

	// Check for the Start Of Image marker.
	if err := d.readFull(d.tmp[:2]); err != nil {
		return nil, err
	}
	if d.tmp[0] != 0xff || d.tmp[1] != soiMarker {
		return nil, FormatError("missing SOI marker")
	}

	// Process the remaining segments until the End Of Image marker.
	for {
		err := d.readFull(d.tmp[:2])
		if err != nil {
			return nil, err
		}
		for d.tmp[0] != 0xff {
			// Strictly speaking, this is a format error. However, libjpeg is
			// liberal in what it accepts. As of version 9, next_marker in
			// jdmarker.c treats this as a warning (JWRN_EXTRANEOUS_DATA) and
			// continues to decode the stream. Even before next_marker sees
			// extraneous data, jpeg_fill_bit_buffer in jdhuff.c reads as many
			// bytes as it can, possibly past the end of a scan's data. It
			// effectively puts back any markers that it overscanned (e.g. an
			// "\xff\xd9" EOI marker), but it does not put back non-marker data,
			// and thus it can silently ignore a small number of extraneous
			// non-marker bytes before next_marker has a chance to see them (and
			// print a warning).
			//
			// We are therefore also liberal in what we accept. Extraneous data
			// is silently ignored.
			//
			// This is similar to, but not exactly the same as, the restart
			// mechanism within a scan (the RST[0-7] markers).
			//
			// Note that extraneous 0xff bytes in e.g. SOS data are escaped as
			// "\xff\x00", and so are detected a little further down below.
			d.tmp[0] = d.tmp[1]
			d.tmp[1], err = d.readByte()
			if err != nil {
				return nil, err
			}
		}
		marker := d.tmp[1]
		if marker == 0 {
			// Treat "\xff\x00" as extraneous data.
			continue
		}
		for marker == 0xff {
			// Section B.1.1.2 says, "Any marker may optionally be preceded by any
			// number of fill bytes, which are bytes assigned code X'FF'".
			marker, err = d.readByte()
			if err != nil {
				return nil, err
			}
		}
		if marker == eoiMarker { // End Of Image.
			break
		}
		if rst0Marker <= marker && marker <= rst7Marker {
			// Figures B.2 and B.16 of the specification suggest that restart markers should
			// only occur between Entropy Coded Segments and not after the final ECS.
			// However, some encoders may generate incorrect JPEGs with a final restart
			// marker. That restart marker will be seen here instead of inside the processSOS
			// method, and is ignored as a harmless error. Restart markers have no extra data,
			// so we check for this before we read the 16-bit length of the segment.
			continue
		}

		// Read the 16-bit length of the segment. The value includes the 2 bytes for the
		// length itself, so we subtract 2 to get the number of remaining bytes.
		if err = d.readFull(d.tmp[:2]); err != nil {
			return nil, err
		}
		n := int(d.tmp[0])<<8 + int(d.tmp[1]) - 2
		if n < 0 {
			return nil, FormatError("short segment length")
		}

		switch marker {
		case sof0Marker, sof1Marker, sof2Marker:
			d.baseline = marker == sof0Marker
			d.progressive = marker == sof2Marker
			err = d.processSOF(n)
			if configOnly && d.jfif {
				return nil, err
			}
		case dhtMarker:
			if configOnly {
				err = d.ignore(n)
			} else {
				err = d.processDHT(n)
			}
		case dqtMarker:
			if configOnly {
				err = d.ignore(n)
			} else {
				err = d.processDQT(n)
			}
		case sosMarker:
			if configOnly {
				return nil, nil
			}
			err = d.processSOS(n)
		case driMarker:
			if configOnly {
				err = d.ignore(n)
			} else {
				err = d.processDRI(n)
			}
		case app0Marker:
			err = d.processApp0Marker(n)
		case app14Marker:
			err = d.processApp14Marker(n)
		default:
			if app0Marker <= marker && marker <= app15Marker || marker == comMarker {
				err = d.ignore(n)
			} else if marker < 0xc0 { // See Table B.1 "Marker code assignments".
				err = FormatError("unknown marker")
			} else {
				err = UnsupportedError("unknown marker")
			}
		}
		if err != nil {
			return nil, err
		}
	}

	if d.progressive {
		if err := d.reconstructProgressiveImage(); err != nil {
			return nil, err
		}
	}
	if d.img1 != nil {
		return d.img1, nil
	}
	if d.img3 != nil {
		if d.blackPix != nil {
			return d.applyBlack()
		} else if d.isRGB() {
			return d.convertToRGB()
		}
		return d.img3, nil
	}
	return nil, FormatError("missing SOS marker")
}
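Throughout decode, bytes are read with the decoder's readFull and readByte methods. These read through the decoder's own buffer (the bytes field) rather than straight from the io.Reader. Unlike a bufio.Reader, that buffer can back up by more than one byte after reading too far (unread), which JPEG byte stuffing requires. This detail has little to do with images themselves, so we do not pursue it here.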
After setting the decoder's reader, decode reads the first two bytes and checks that the file begins with the magic "\xff\xd8". The second byte, 0xd8, is called soiMarker; SOI stands for Start Of Image.
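It then walks forward through the file two bytes at a time until it reaches "\xff\xd9", the End Of Image marker, named eoiMarker in the code. Whenever it reads a 0xff byte, it checks whether that byte starts a marker. "\xff\xd0" through "\xff\xd7" are the restart markers rst0Marker through rst7Marker. When they appear at this level, they are skipped rather than parsed.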
Likewise, "\xff\x00" is not processed: it is treated as extraneous data.
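All other markers, such as sof0Marker, sof1Marker, sof2Marker, dhtMarker, dqtMarker, sosMarker, driMarker, app0Marker and app14Marker, must be parsed. When decode is called with configOnly set to true, it skips the DHT, DQT and DRI segments without parsing them, and it returns as soon as it reaches the SOS marker.
A JPEG file is a sequence of segments, each consisting of a marker followed by data. A segment is laid out as follows:
- a 0xff byte;
- a marker byte;
- a 16-bit big-endian length that counts the two length bytes themselves;
- length − 2 bytes of payload.
SOI, EOI and RST0–RST7 consist of the marker alone and carry no length.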
So each time the code meets a marker (two bytes), it immediately reads the segment length (two more bytes). The payload length is n := int(d.tmp[0])<<8 + int(d.tmp[1]) - 2.
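The segments that decode handles are summarized below. The marker values come from the JPEG standard and match the constants in image/jpeg.
SOF0 / SOF1 / SOF2 (0xffc0 / 0xffc1 / 0xffc2): Start Of Frame for baseline, extended sequential and progressive DCT. Contains the precision, height, width and components.
DHT (0xffc4): Define Huffman Table(s).
RST0–RST7 (0xffd0–0xffd7): restart markers.
SOI (0xffd8): Start Of Image.
EOI (0xffd9): End Of Image.
SOS (0xffda): Start Of Scan. The entropy-coded image data follows.
DQT (0xffdb): Define Quantization Table(s).
DRI (0xffdd): Define Restart Interval.
APP0–APP15 (0xffe0–0xffef): application-specific data, e.g. JFIF in APP0 and Adobe in APP14.
COM (0xfffe): comment.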
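The width and height of a JPEG image are stored in the SOF segment: SOF0 for baseline images, with SOF1 and SOF2 using the same layout.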
Let us look at the code that processes SOF:
// Specified in section B.2.2.
func (d *decoder) processSOF(n int) error {
	if d.nComp != 0 {
		return FormatError("multiple SOF markers")
	}
	switch n {
	case 6 + 3*1: // Grayscale image.
		d.nComp = 1
	case 6 + 3*3: // YCbCr or RGB image.
		d.nComp = 3
	case 6 + 3*4: // YCbCrK or CMYK image.
		d.nComp = 4
	default:
		return UnsupportedError("number of components")
	}
	if err := d.readFull(d.tmp[:n]); err != nil {
		return err
	}
	// We only support 8-bit precision.
	if d.tmp[0] != 8 {
		return UnsupportedError("precision")
	}
	d.height = int(d.tmp[1])<<8 + int(d.tmp[2])
	d.width = int(d.tmp[3])<<8 + int(d.tmp[4])
	if int(d.tmp[5]) != d.nComp {
		return FormatError("SOF has wrong length")
	}

	for i := 0; i < d.nComp; i++ {
		d.comp[i].c = d.tmp[6+3*i]
		// Section B.2.2 states that "the value of C_i shall be different from
		// the values of C_1 through C_(i-1)".
		for j := 0; j < i; j++ {
			if d.comp[i].c == d.comp[j].c {
				return FormatError("repeated component identifier")
			}
		}

		d.comp[i].tq = d.tmp[8+3*i]
		if d.comp[i].tq > maxTq {
			return FormatError("bad Tq value")
		}

		hv := d.tmp[7+3*i]
		h, v := int(hv>>4), int(hv&0x0f)
		if h < 1 || 4 < h || v < 1 || 4 < v {
			return FormatError("luma/chroma subsampling ratio")
		}
		if h == 3 || v == 3 {
			return errUnsupportedSubsamplingRatio
		}
		switch d.nComp {
		case 1:
			// If a JPEG image has only one component, section A.2 says "this data
			// is non-interleaved by definition" and section A.2.2 says "[in this
			// case...] the order of data units within a scan shall be left-to-right
			// and top-to-bottom... regardless of the values of H_1 and V_1". Section
			// 4.8.2 also says "[for non-interleaved data], the MCU is defined to be
			// one data unit". Similarly, section A.1.1 explains that it is the ratio
			// of H_i to max_j(H_j) that matters, and similarly for V. For grayscale
			// images, H_1 is the maximum H_j for all components j, so that ratio is
			// always 1. The component's (h, v) is effectively always (1, 1): even if
			// the nominal (h, v) is (2, 1), a 20x5 image is encoded in three 8x8
			// MCUs, not two 16x8 MCUs.
			h, v = 1, 1

		case 3:
			// For YCbCr images, we only support 4:4:4, 4:4:0, 4:2:2, 4:2:0,
			// 4:1:1 or 4:1:0 chroma subsampling ratios. This implies that the
			// (h, v) values for the Y component are either (1, 1), (1, 2),
			// (2, 1), (2, 2), (4, 1) or (4, 2), and the Y component's values
			// must be a multiple of the Cb and Cr component's values. We also
			// assume that the two chroma components have the same subsampling
			// ratio.
			switch i {
			case 0: // Y.
				// We have already verified, above, that h and v are both
				// either 1, 2 or 4, so invalid (h, v) combinations are those
				// with v == 4.
				if v == 4 {
					return errUnsupportedSubsamplingRatio
				}
			case 1: // Cb.
				if d.comp[0].h%h != 0 || d.comp[0].v%v != 0 {
					return errUnsupportedSubsamplingRatio
				}
			case 2: // Cr.
				if d.comp[1].h != h || d.comp[1].v != v {
					return errUnsupportedSubsamplingRatio
				}
			}

		case 4:
			// For 4-component images (either CMYK or YCbCrK), we only support two
			// hv vectors: [0x11 0x11 0x11 0x11] and [0x22 0x11 0x11 0x22].
			// Theoretically, 4-component JPEG images could mix and match hv values
			// but in practice, those two combinations are the only ones in use,
			// and it simplifies the applyBlack code below if we can assume that:
			//	- for CMYK, the C and K channels have full samples, and if the M
			//	  and Y channels subsample, they subsample both horizontally and
			//	  vertically.
			//	- for YCbCrK, the Y and K channels have full samples.
			switch i {
			case 0:
				if hv != 0x11 && hv != 0x22 {
					return errUnsupportedSubsamplingRatio
				}
			case 1, 2:
				if hv != 0x11 {
					return errUnsupportedSubsamplingRatio
				}
			case 3:
				if d.comp[0].h != h || d.comp[0].v != v {
					return errUnsupportedSubsamplingRatio
				}
			}
		}

		d.comp[i].h = h
		d.comp[i].v = v
	}
	return nil
}
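The argument n is the length of the SOF payload. The payload is 6 bytes plus 3 bytes per component, so n determines the number of components and hence the color model: 1 component for grayscale, 3 for YCbCr or RGB, and 4 for CMYK or YCbCrK.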
All n bytes are then read at once for analysis.
The code requires the first payload byte, the sample precision, to be 8: Go's standard package only supports images with 8-bit precision.
The next two bytes (big-endian) are the image height:
d.height = int(d.tmp[1])<<8 + int(d.tmp[2])
and the two bytes after those are the image width:
d.width = int(d.tmp[3])<<8 + int(d.tmp[4])
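To make the byte offsets concrete, the sketch below decodes the fixed part of an SOF payload with the same offsets processSOF uses. The payload is the n bytes after the length field. parseSOF and sofInfo are hypothetical names, and the payload in main is hand-built test data (300×400, three components with 4:2:0 subsampling), not taken from a real file.

```go
package main

import (
	"errors"
	"fmt"
)

// sofInfo (hypothetical type) summarizes the fixed fields of an SOF payload.
type sofInfo struct {
	precision, height, width, nComp int
}

// parseSOF reads precision, height, width and component count, then checks
// that the payload has 3 bytes per component (id, h/v factors, Tq).
func parseSOF(p []byte) (sofInfo, error) {
	if len(p) < 6 {
		return sofInfo{}, errors.New("SOF too short")
	}
	s := sofInfo{
		precision: int(p[0]),
		height:    int(p[1])<<8 + int(p[2]), // big-endian
		width:     int(p[3])<<8 + int(p[4]),
		nComp:     int(p[5]),
	}
	if len(p) != 6+3*s.nComp {
		return sofInfo{}, errors.New("SOF has wrong length")
	}
	return s, nil
}

func main() {
	// Height 0x012c = 300, width 0x0190 = 400; components 1, 2, 3 with
	// h/v = 0x22 for Y and 0x11 for Cb and Cr (4:2:0).
	p := []byte{8, 0x01, 0x2c, 0x01, 0x90, 3, 1, 0x22, 0, 2, 0x11, 1, 3, 0x11, 1}
	fmt.Println(parseSOF(p))
}
```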