R:ggplot2数据可视化——进阶(1)
,分为三个部分,此篇为Part1,推荐学习一些基础知识后阅读~
-
Part 1: Introduction to ggplot2, 覆盖构建简单图表并进行修饰的基础知识
-
Part 2: Customizing the Look and Feel, 更高级的自定义图形
-
Part 3: Top 50 Ggplot2 Visualizations - The Master List, 应用Part1、2部分知识创建进阶图形
1 理解ggplot语法
(1)对数据框类型数据进行可视化
(2)可以叠加层来不断丰富图形信息
让我们基于midwest数据集来初始化一个基本的图形:
# Setup options (scipen=999) # 关闭科学计数表示法 1e+06 library (ggplot2) data ( "midwest" , package = "ggplot2" ) # 加载数据 # midwest <- read.csv("http://goo.gl/G1K41K") # alt source # 初始化 Ggplot ggplot (midwest, aes (x=area, y=poptotal)) # area 和 poptotal 是'midwest'中的列 |
aes()函数用来专门指定x和y轴,源数据框的任何信息都需要在这个函数中特意指定。
2 线性模型拟合散点图
1 2 3 | library (ggplot2) g <- ggplot (midwest, aes (x=area, y=poptotal)) + geom_point () + geom_smooth (method= "lm" ) # set se=FALSE to turnoff confidence bands plot (g) |
?geom_smooth 查询该函数帮助文档
3 调整x y轴范围
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | #Method 1: By deleting the points outside the range library (ggplot2) g <- ggplot (midwest, aes (x=area, y=poptotal)) + geom_point () + geom_smooth (method= "lm" ) # set se=FALSE to turnoff confidence bands # Delete the points outside the limits g + xlim ( c (0, 0.1)) + ylim ( c (0, 1000000)) # deletes points #Method 2: Zooming In library (ggplot2) g <- ggplot (midwest, aes (x=area, y=poptotal)) + geom_point () + geom_smooth (method= "lm" ) # set se=FALSE to turnoff confidence bands # Zoom in without deleting the points outside the limits. # As a result, the line of best fit is the same as the original plot. g1 <- g + coord_cartesian (xlim= c (0,0.1), ylim= c (0, 1000000)) # zooms in plot (g1) |
4 改变标题
1 2 3 4 5 6 7 8 9 10 11 | # Full Plot call library (ggplot2) ggplot (midwest, aes (x=area, y=poptotal)) + geom_point () + geom_smooth (method= "lm" ) + coord_cartesian (xlim= c (0,0.1), ylim= c (0, 1000000)) + labs (title= "Area Vs Population" , subtitle= "From midwest dataset" , y= "Population" , x= "Area" , caption= "Midwest Demographics" ) # or g1 + ggtitle ( "Area Vs Population" , subtitle= "From midwest dataset" ) + xlab ( "Area" ) + ylab ( "Population" ) |
5 改变点的颜色和大小
1 2 3 4 5 6 | library (ggplot2) ggplot (midwest, aes (x=area, y=poptotal)) + geom_point (col= "steelblue" , size=3) + # Set static color and size for points geom_smooth (method= "lm" , col= "firebrick" ) + # change the color of line coord_cartesian (xlim= c (0, 0.1), ylim= c (0, 1000000)) + labs (title= "Area Vs Population" , subtitle= "From midwest dataset" , y= "Population" , x= "Area" , caption= "Midwest Demographics" ) |
改变颜色以反应另一列变量的类型
1 2 3 4 5 6 7 | library (ggplot2) gg <- ggplot (midwest, aes (x=area, y=poptotal)) + geom_point ( aes (col=state), size=3) + # Set color to vary based on state categories. geom_smooth (method= "lm" , col= "firebrick" , size=2) + coord_cartesian (xlim= c (0, 0.1), ylim= c (0, 1000000)) + labs (title= "Area Vs Population" , subtitle= "From midwest dataset" , y= "Population" , x= "Area" , caption= "Midwest Demographics" ) plot (gg) |
color, size
, shape
, stroke
(thickness of boundary) and fill
(fill color) 均可指定
也可以改变调色板
1 | gg + scale_colour_brewer (palette = "Set1" ) # change color palette |
更多调色板可以在 RColorBrewer 包中找到
1 2 3 4 5 6 7 8 9 10 11 12 13 | library (RColorBrewer) head (brewer.pal.info, 10) # show 10 palettes #> maxcolors category colorblind #> BrBG 11 div TRUE #> PiYG 11 div TRUE #> PRGn 11 div TRUE #> PuOr 11 div TRUE #> RdBu 11 div TRUE #> RdGy 11 div FALSE #> RdYlBu 11 div TRUE #> RdYlGn 11 div FALSE #> Spectral 11 div FALSE #> Accent 8 qual FALSE |
6 改变x轴文本和刻度位置
breaks
and labels
Step 1: Set the breaks
scale_x_continuous
—— X 轴变量是连续变量
scale_x_date
——日期变量
1 2 3 4 5 6 7 8 9 10 11 | library (ggplot2) # Base plot gg <- ggplot (midwest, aes (x=area, y=poptotal)) + geom_point ( aes (col=state), size=3) + # Set color to vary based on state categories. geom_smooth (method= "lm" , col= "firebrick" , size=2) + coord_cartesian (xlim= c (0, 0.1), ylim= c (0, 1000000)) + labs (title= "Area Vs Population" , subtitle= "From midwest dataset" , y= "Population" , x= "Area" , caption= "Midwest Demographics" ) # Change breaks gg + scale_x_continuous (breaks= seq (0, 0.1, 0.01)) |
Step 2: Change the labels
改变 labels
at the axis ticks. labels
需要和 breaks向量长度保持一致
1 2 3 4 5 6 7 8 9 10 11 | library (ggplots) # Base Plot gg <- ggplot (midwest, aes (x=area, y=poptotal)) + geom_point ( aes (col=state), size=3) + # Set color to vary based on state categories. geom_smooth (method= "lm" , col= "firebrick" , size=2) + coord_cartesian (xlim= c (0, 0.1), ylim= c (0, 1000000)) + labs (title= "Area Vs Population" , subtitle= "From midwest dataset" , y= "Population" , x= "Area" , caption= "Midwest Demographics" ) # Change breaks + label gg + scale_x_continuous (breaks= seq (0, 0.1, 0.01), labels = letters [1:11]) |
1 2 | # Reverse X Axis Scale gg + scale_x_reverse () |
为轴标签自定义文本
Method 1: Using sprintf()
. (Have formatted it as % in below example)
Method 2: Using a custom user defined function. (Formatted 1000’s to 1K scale)
1 2 3 4 5 6 7 8 9 10 11 12 | library (ggplot2) # Base Plot gg <- ggplot (midwest, aes (x=area, y=poptotal)) + geom_point ( aes (col=state), size=3) + # Set color to vary based on state categories. geom_smooth (method= "lm" , col= "firebrick" , size=2) + coord_cartesian (xlim= c (0, 0.1), ylim= c (0, 1000000)) + labs (title= "Area Vs Population" , subtitle= "From midwest dataset" , y= "Population" , x= "Area" , caption= "Midwest Demographics" ) # Change Axis Texts gg + scale_x_continuous (breaks= seq (0, 0.1, 0.01), labels = sprintf ( "%1.2f%%" , seq (0, 0.1, 0.01))) + scale_y_continuous (breaks= seq (0, 1000000, 200000), labels = function (x){ paste0 (x/1000, 'K' )}) |
使用内置主题一次性自定义整个主题
?theme_bw
theme_set() to set the theme before drawing the ggplot. Note that this setting will affect all future plots. *
Draw the ggplot and then add the overall theme setting (eg. theme_bw()
)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | library (ggplot2) # Base plot gg <- ggplot (midwest, aes (x=area, y=poptotal)) + geom_point ( aes (col=state), size=3) + # Set color to vary based on state categories. geom_smooth (method= "lm" , col= "firebrick" , size=2) + coord_cartesian (xlim= c (0, 0.1), ylim= c (0, 1000000)) + labs (title= "Area Vs Population" , subtitle= "From midwest dataset" , y= "Population" , x= "Area" , caption= "Midwest Demographics" ) gg <- gg + scale_x_continuous (breaks= seq (0, 0.1, 0.01)) # method 1: Using theme_set() theme_set ( theme_classic ()) # not run gg # method 2: Adding theme Layer itself. gg + theme_bw () + labs (subtitle= "BW Theme" ) gg + theme_classic () + labs (subtitle= "Classic Theme" ) |
更多主题可以看看 the ggthemes package and the ggthemr package.
参考:
英文教程:http://r-statistics.co/Complete-Ggplot2-Tutorial-Part1-With-R-Code.html
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
· 理解Rust引用及其生命周期标识(上)
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
· 没有源码,如何修改代码逻辑?
· 分享4款.NET开源、免费、实用的商城系统
· 全程不用写代码,我用AI程序员写了一个飞机大战
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· 白话解读 Dapr 1.15:你的「微服务管家」又秀新绝活了
· 记一次.NET内存居高不下排查解决与启示