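Reading notes on 《基于plotly的可视化绘图》

《python数据分析 基于plotly的动态可视化绘图》 (Python Data Analysis: Dynamic Visualization with plotly), by 孙洋洋, 王硕, 邢梦来, 袁泉 and 吴娜, published by 电子工业出版社.

It reads a bit like a Chinese-language user manual for plotly, which is impressive.

Quick start

The book says pandas and plotly are the two heavyweight Python libraries for data analysis and quantitative analysis:
pandas is widely regarded as the best front-line solution for big-data engineering;
plotly is one of the best plotting tools ever made.

Installation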

pip install plotly -i https://pypi.tuna.tsinghua.edu.cn/simple

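My install came out as version 4.13.0, so I suspect some code written for the previously dominant v3 releases needs adjusting.
I created a folder named python on my desktop and work in Jupyter.
The URL given in the book also seems to have changed. It still redirects, but the current site appears to be plotly.com

More official examples are on the official site.

Change the URL to match whichever language you need.

Editor

The author then says to register an account so you can save plots online.
For people doing data analysis, offline plotting is the usual choice, so online plotting is really not necessary.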

py.offline.init_notebook_mode()  # initialize; tells plotly it is running in Jupyter

plotly.offline.plot()
plotly.offline.iplot()

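Both are offline plotting methods. The former opens a standalone HTML file; the statement below sets the file name, and the .html suffix seems to be optional.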

py.offline.plot(data, filename = 'first_offline_start')

The latter, with the extra i, renders directly below the cell in Jupyter.

An example to tell them apart:

py.offline.plot(data, filename = 'first_offline_start')  # same as the next line
py.offline.plot(data, filename = 'first_offline_start.html')  # opens the page first_offline_start.html
py.offline.plot(data)  # opens the page temp-plot.html
py.offline.iplot(data, filename = 'first_offline_start')  # no HTML page is generated
py.offline.iplot(data)  # same as above; renders directly below the cell

A simple example

[Figure: example]

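import plotly as py
from plotly.graph_objs import Scatter

py.offline.init_notebook_mode()

trace0 = Scatter(
    x=[1, 2, 3, 4],
    y=[10, 15, 13, 17]
)
trace1 = Scatter(
    x=[1, 2, 3, 4],
    y=[16, 5, 11, 9]
)
data = [trace0, trace1]  # plotly 4 deprecates the Data wrapper used in the book; a plain list works

py.offline.iplot(data)

trace0 and trace1 each hold the data for one series; they are collected in a list [] (the book passes that list to Data, which plotly 4 deprecates in favor of a plain list).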

Getting help

Remember: for Jupyter, just add the i

import plotly
help(plotly.offline.iplot)

Summary

Online plotting is:

import plotly.plotly as py  # v3 path; in plotly 4 this module moved to chart_studio.plotly
py.plot

Offline plotting is:

import plotly
plotly.offline.iplot

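Hopefully the difference between the two and how to use them is clear by now.

Basic plotly charts

The basic workflow template:

  • Add the trace data, e.g. Scatter
  • Set the figure layout with Layout
  • Combine the traces and the layout with Data and Figure
  • Output the figure with offline.iplot; the custom short alias used here is pyplt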
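
The four steps above can be sketched with plain dictionaries, since plotly also accepts a figure written as a dict with "data" and "layout" keys. This is a minimal sketch, not the book's code: the helper name build_figure and the sample values are made up for illustration, and the final iplot call is left commented out so the block runs without plotly installed.

```python
def build_figure(traces, title):
    """Combine trace dicts and a layout dict into one figure dict."""
    layout = {"title": title}                         # step 2: layout
    return {"data": list(traces), "layout": layout}   # step 3: combine


# Step 1: trace data. A dict with a "type" key stands in for go.Scatter here.
trace = {"type": "scatter", "x": [1, 2, 3], "y": [10, 15, 13], "mode": "markers"}
fig = build_figure([trace], "Workflow sketch")

# Step 4: output. Uncomment inside Jupyter with plotly installed.
# import plotly
# plotly.offline.iplot(fig)
```

Keeping the figure as a dict makes it easy to inspect or modify before plotting; the go.* classes used in the rest of these notes add validation on top of the same structure.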

Scatter plots

Simple example

import numpy as np
import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot  # custom short alias

N = 100
random_x = np.linspace(0, 1, N)
random_y0 = np.random.randn(N) + 5
random_y1 = np.random.randn(N)
random_y2 = np.random.randn(N) - 5


trace0 = go.Scatter(
    x = random_x,
    y = random_y0,
    mode = 'markers',  # markers only
    name = 'markers'   # trace name
)
trace1 = go.Scatter(
    x = random_x,
    y = random_y1,
    mode = 'lines+markers',  # markers joined by a line
    name = 'lines+markers'
)
trace2 = go.Scatter(
    x = random_x,
    y = random_y2,
    mode = 'lines',  # line only
    name = 'lines'
)

data = [trace0, trace1, trace2]
pyplt(data)

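[Figure: scatter plot 1]

Look closely: apart from the data, the three series are drawn differently because their mode values differ.

Styling

[Figure: scatter plot 2]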

import plotly as py
import plotly.graph_objs as go

import numpy as np

pyplt = py.offline.iplot

N = 500
x = np.random.randn(N)

trace0 = go.Scatter(
    x = np.random.randn(N),
    y = np.random.randn(N) + 2,
    name = 'Above',
    mode = 'markers+lines',
    marker = dict(
        size = 10,  # marker size
        color = 'rgba(152, 0, 0, .8)',  # marker fill color
        line = dict(
            width = 2,  # width of the marker outline
            color = 'rgb(0, 0, 0)'  # color of the marker outline
        )
    )
)


trace1 = go.Scatter(
    x = np.random.randn(N),
    y = np.random.randn(N) - 2,
    name = 'Below',
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgba(255, 182, 193, .9)',
        line = dict(
            width = 2,
        )
    )
)

data = [trace0, trace1]

layout = dict(title = 'Styled Scatter',
              yaxis = dict(zeroline = True),   # show the zero line of the y axis
              xaxis = dict(zeroline = False)   # hide the zero line of the x axis
              )

fig = dict(data=data, layout=layout)
pyplt(fig)

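Scatter has a great many parameters. I cannot paste all of the book's code and every parameter's meaning here, and I have not used them in real work yet.

If you need them, you can buy the e-book, which is easy to keep and convenient to search.

Bubble charts

Still the scatter command; only the marker sizes are adjusted.

Simple example

[Figure: bubble chart 1]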

import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot

trace0 = go.Scatter(
x=[1, 2, 3, 4],
y=[10, 11, 12, 13],
mode='markers',
marker=dict(  # marker settings: color, size, style
size=[40, 60, 80, 100],
)
)

data = [trace0]
pyplt(data)

Styling

[Figure: bubble chart 2]

import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot

trace0 = go.Scatter(
x=[1, 2, 3, 4],
y=[10, 11, 12, 13],
mode='markers',
text=['A<br>size: 40', 'B<br>size: 60', 'C<br>size: 80', 'D<br>size: 100'],  # hover text for each point
marker=dict(
color= [120, 125, 130, 135],  # color value of each point
opacity=[1, 0.8, 0.6, 0.4],  # opacity of each point
size=[40, 60, 80, 100],  # size of each point
showscale= True,  # whether to show the color bar on the right
)
)

data = [trace0]
pyplt(data)

Scaling

[Figure: bubble chart 3]

import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot

trace0 = go.Scatter(
x=[1, 2, 3, 4],
y=[10, 11, 12, 13],
text=['A</br>size: 40</br>default', 'B</br>size: 60</br>default', 'C</br>size: 80</br>default', 'D</br>size: 100</br>default'],
mode='markers',
name='default',
marker=dict(
size=[400, 600, 800, 1000],
sizemode='area',
)
)
trace1 = go.Scatter(
x=[1, 2, 3, 4],
y=[14, 15, 16, 17],
text=['A</br>size: 40</br>sizeref: 0.2', 'B</br>size: 60</br>sizeref: 0.2', 'C</br>size: 80</br>sizeref: 0.2', 'D</br>size: 100</br>sizeref: 0.2'],
mode='markers',
name = 'ref0.2',
marker=dict(
size=[400, 600, 800, 1000],
sizeref=0.2,
sizemode='area',
)
)

trace2 = go.Scatter(
x=[1, 2, 3, 4],
y=[20, 21, 22, 23],
text=['A</br>size: 40</br>sizeref: 2', 'B</br>size: 60</br>sizeref: 2', 'C</br>size: 80</br>sizeref: 2', 'D</br>size: 100</br>sizeref: 2'],
mode='markers',
name='ref2',
marker=dict(
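size=[400, 600, 800, 1000],  # size of each point; fine for a toy example, but what about a large real data set?
sizeref=2,  # scale factor: 2 shrinks each marker to 1/2 (of its area, given sizemode below)
sizemode='area',  # scale by area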
)
)

data = [trace0, trace1, trace2]
pyplt(data)
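
For a large real data set, where sizes cannot be typed in by hand, one option is to derive sizeref from the data. The sketch below uses the rule of thumb given in plotly's bubble chart documentation for sizemode='area': sizeref = 2 * max(size) / (desired maximum marker diameter in pixels) ** 2. The helper name area_sizeref and the 40-pixel target are assumptions made for illustration.

```python
def area_sizeref(sizes, max_diameter_px=40):
    """sizeref for sizemode='area' so the largest marker is about max_diameter_px wide."""
    if not sizes:
        raise ValueError("sizes must not be empty")
    return 2.0 * max(sizes) / (max_diameter_px ** 2)


sizes = [400, 600, 800, 1000]
marker = {
    "size": sizes,
    "sizemode": "area",
    "sizeref": area_sizeref(sizes, max_diameter_px=40),
}
```

The resulting marker dict can be passed as marker=marker to go.Scatter; raw values of any magnitude then map onto a bounded pixel range.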

Line charts

Basic example

[Figure: line chart 1]

import pandas as pd
import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot
# 600000 浦发银行 daily returns, 20170301-20170428; data source: Wind
profit_rate = [-0.001, -0.013, -0.004, 0.002, 0.003, -0.001, -0.009, 0.0, 0.007,
               -0.005, 0.0, 0.001, -0.006, -0.006, -0.009, -0.013, 0.005, 0.007,
               0.004, -0.006, -0.009, -0.004, 0.015, 0.007, 0.001, 0.003, -0.009,
               -0.005, 0.001, -0.008, -0.016, 0.002, -0.013, -0.009, -0.014, 0.009,
               -0.003, 0.002, -0.001, 0.011, 0.004]  # y axis
# One date per trading day, so that x has the same length as the 41 values in y.
# Assumption: besides weekends, the exchange was closed on 2017-04-03 and 2017-04-04 (Qingming).
date = pd.bdate_range(start = '2017-03-01', end = '2017-04-28', freq = 'C',
                      holidays = ['2017-04-03', '2017-04-04'])  # x axis
trace = [go.Scatter(
    x = date,
    y = profit_rate
)]

layout = dict(  # layout settings
    title = '浦发银行20170301-20170428涨跌幅变化',
    xaxis = dict(title = 'Date'),
    yaxis = dict(title = 'profit_rate')
)

fig = dict(data = trace, layout = layout)
pyplt(fig)

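For missing data, interpolation and the like, search online when the need arises.

Filled line charts

[Figure: line chart 2]

Take the book's example, which plots the high and low prices of a stock over a period. The chart has three lines: the daily opening price, high and low. The plotting code handles x like this: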

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x_rev = x[::-1]
x + x_rev
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

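The y values are handled the same way. Picture it: for the first day, the high sits at the position of the first 1 in the sequence above and the low at the position of the last 1; for the last day, the high sits at the left of the two middle 10s and the low at the right one. The combined sequence therefore traces the upper edge left to right and the lower edge right to left, which closes the band into a polygon.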
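
The construction just described can be wrapped in a small helper. This is an illustrative sketch: the function name band_polygon is made up, and the three sample values are the first three highs and lows of the first stock in the full code.

```python
def band_polygon(x, upper, lower):
    """Return (xs, ys): upper edge left to right, then lower edge right to left."""
    if not (len(x) == len(upper) == len(lower)):
        raise ValueError("x, upper and lower must have the same length")
    xs = list(x) + list(x)[::-1]
    ys = list(upper) + list(lower)[::-1]
    return xs, ys


xs, ys = band_polygon([1, 2, 3], [9.05, 9.03, 9.08], [8.86, 8.85, 8.64])
# xs and ys can be passed as x and y to a filled go.Scatter trace
```

Unlike the full code, which reverses each y_lower list in place before concatenating, the helper takes the lower values in their natural order and reverses them itself.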

Full code:

import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x_rev = x[::-1]

# Line 1 002104恒宝股份20170518-20170602
y1 = [8.86, 8.85, 8.69, 8.4, 8.62, 9, 8.99, 8.85, 8.59, 9.31]
y1_upper = [9.05, 9.03, 9.08, 8.76, 8.63, 9.04, 9.09, 9.16, 8.9, 9.45]
y1_lower = [8.86, 8.85, 8.64, 8.36, 8.33, 8.43, 8.93, 8.84, 8.53, 8.52]
y1_lower = y1_lower[::-1]  # reversed

# Line 2 002125湘潭电化20170518-20170602
y2 = [10.39, 10.35, 9.85, 9.73, 9.77, 9.8, 9.75, 9.65, 9.16, 9.34]
y2_upper = [10.58, 10.52, 10.34, 10.14, 9.87, 9.87, 9.94, 9.6, 9.42, 9.5]
y2_lower = [10.15, 10.21, 9.72, 9.68, 9.24, 9.48, 9.62, 9.12, 9.12, 9.34]
y2_lower = y2_lower[::-1]

# Line 3 002077大港股份20170518-20170602
y3 = [11.88, 13.07, 12.75, 12.02, 12.1, 12.61, 12.42, 12.42, 11.18, 10.72]
y3_upper = [11.98, 13.07, 13.4, 12.91, 12.45, 13.1, 12.61, 12.65, 12.45, 11.16]
y3_lower = [11.6, 11.75, 12.75, 12.02, 11.8, 11.92, 12.17, 12.29, 11.18, 10.35]
y3_lower = y3_lower[::-1]

trace1 = go.Scatter(
x = x + x_rev,
y = y1_upper + y1_lower,
fill = 'tozerox',
fillcolor = 'rgba(0,0,205,0.2)',
line = go.Line(color = 'rgba(255,0,0,0)'),  # changed from the book's version
opacity = 0,
showlegend = False,
name = '恒宝股份',
)
trace2 = go.Scatter(
x = x + x_rev,
y = y2_upper + y2_lower,
fill = 'tozerox',
fillcolor = 'rgba(30,144,255,0.2)',
line = go.Line(color = 'rgba(255,0,0,0)'),
name = '湘潭电化',
showlegend = False,
)
trace3 = go.Scatter(
x = x+x_rev,
y = y3_upper+y3_lower,
fill = 'tozerox',
fillcolor = 'rgba(112,128,144,0.2)',
line = go.Line(color = 'rgba(255,0,0,0)'),
showlegend = False,
name = '大港股份',
)
trace4 = go.Scatter(
x = x,
y = y1,
line = go.Line(color = 'rgb(0,0,205)'),
mode = 'lines',
name = '恒宝股份',
)
trace5 = go.Scatter(
x = x,
y = y2,
line = go.Line(color='rgb(30,144,255)'),
mode = 'lines',
name = '湘潭电化',
)
trace6 = go.Scatter(
x = x,
y = y3,
line = go.Line(color='rgb(112,128,144)'),
mode = 'lines',
name = '大港股份',
)

data = go.Data([trace1, trace2, trace3, trace4, trace5, trace6])
#"""
layout = go.Layout(
paper_bgcolor = 'rgb(255,255,255)',
plot_bgcolor = 'rgb(229,229,229)',
xaxis = go.XAxis(
gridcolor = 'rgb(255,255,255)',
range = [1,10],
showgrid = True,
showline = False,
showticklabels = True,
tickcolor = 'rgb(127,127,127)',
ticks = 'outside',
zeroline = False
),
yaxis = go.YAxis(
gridcolor = 'rgb(255,255,255)',
showgrid = True,
showline = False,
showticklabels = True,
tickcolor = 'rgb(127,127,127)',
ticks = 'outside',
zeroline = False
),
)
#"""
fig = go.Figure(data = data, layout = layout)
pyplt(fig)

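There is one thing I was not sure about: how to make the high-price and low-price lines transparent. The original code was:

    line = go.Line(color = 'transparent'),

I understood the logic, but I could not get it to run, possibly because of the version. I tried setting opacity directly, which did not work, so I temporarily replaced the value with 'whitesmoke' (picked at random).

Scratch that, I figured it out: the fourth component of rgba is the opacity, so setting it to 0 makes the line fully transparent, whatever the RGB part is:

    line = go.Line(color = 'rgba(255,0,0,0)'),

Neat.

Applied example

I added this example because it is the first time the book builds traces in a loop, and it uses a lot of parameters, so it is quite valuable. I studied the logic of the approach carefully and went over the parameters many times. The book says they were all covered earlier; in fact they were not.

I believe the following line in the book is a mistake:

color = colors,

I had my doubts, since code the author could run should not raise an error. But the loop variable is color, and the loop clearly runs four times, so it cannot be colors. I changed it to:

color = color,

At first glance I thought the rgba syntax was wrong. Thanks to this detour, though, I came up with the transparency trick used in the previous example.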

import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot

x_data = [  # x values, one list per line
    [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2013],
    [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2013],
    [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2013],
    [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2013],
]

y_data = [  # y values of the four lines
    [74, 82, 80, 74, 73, 72, 74, 70, 70, 66, 66, 69],
    [45, 42, 50, 46, 36, 36, 34, 35, 32, 31, 31, 28],
    [13, 14, 20, 24, 20, 24, 24, 40, 35, 41, 43, 50],
    [18, 21, 18, 21, 16, 14, 13, 18, 17, 16, 19, 23],
]

# Style settings, defined before the loop below that uses them
title = 'Main Source for News'
labels = ['Television', 'Newspaper', 'Internet', 'Radio']
colors = ['rgba(67,67,67,1)', 'rgba(115,115,115,1)', 'rgba(49,130,189, 1)', 'rgba(189,189,189,1)']
mode_size = [8, 8, 12, 8]
line_size = [2, 2, 4, 2]

traces = []

for i in range(0, 4):
    traces.append(go.Scatter(  # the four lines
        x = x_data[i],
        y = y_data[i],
        mode = 'lines',
        line = dict(color = colors[i], width = line_size[i]),
        connectgaps = True,  # not explained in the book: True joins the points on either side of a missing value
    ))

    traces.append(go.Scatter(  # highlight the first and the last point of each line
        x = [x_data[i][0], x_data[i][11]],
        y = [y_data[i][0], y_data[i][11]],
        mode = 'markers',
        marker = dict(color = colors[i], size = mode_size[i])
    ))


layout = go.Layout(
    xaxis = dict(
        showline = True,
        showgrid = False,
        showticklabels = True,  # True shows the tick labels
        linecolor = 'rgb(204, 204, 204)',  # color of the x axis line
        linewidth = 2,
        dtick = False,  # per the book: True drops some tick labels automatically, False keeps them as they are
        ticks = 'outside',  # draw the ticks inside or outside the plot
        tickcolor = 'rgb(204, 204, 204)',  # tick color
        tickwidth = 2,  # tick width
        ticklen = 10,  # tick length
        tickfont = dict(  # font family, size and color of the tick labels
            family = 'Arial',
            size = 12,
            color = 'rgb(82, 82, 82)',
        ),
    ),
    yaxis = dict(
        showgrid = False,
        zeroline = False,
        showline = False,
        showticklabels = False,
    ),
    autosize = False,
    margin = dict(
        autoexpand = False,
        l = 100,
        r = 20,
        t = 110,
    ),
    showlegend = False,
)  # so many parameters; every tiny setting takes its own line, which is not beginner friendly

annotations = []

# Label annotations
for y_trace, label, color in zip(y_data, labels, colors):
    # value label on the left side of the chart
    annotations.append(dict(xref = 'paper', x = 0.05, y = y_trace[0],  # x and y position the label on the page
                            xanchor = 'right', yanchor = 'middle',
                            text = label + ' {}%'.format(y_trace[0]),
                            font = dict(family = 'Arial',
                                        size = 16,
                                        color = color,),
                            showarrow = False))  # I tried True and the figure did change: an arrow is drawn to the anchor point
    # value label on the right side of the chart
    annotations.append(dict(xref = 'paper', x = 0.95, y = y_trace[11],
                            xanchor = 'left', yanchor = 'middle',
                            text = '{}%'.format(y_trace[11]),
                            font = dict(family = 'Arial',
                                        size = 16,
                                        color = color,),
                            showarrow = False))
# Title
annotations.append(dict(xref = 'paper', yref = 'paper', x = 0.0, y = 1.05,
                        xanchor = 'left', yanchor = 'bottom',
                        text = title,
                        font = dict(family = 'Arial',
                                    size = 30,
                                    color = 'rgb(37,37,37)'),
                        showarrow = False))
# Small source line at the bottom
annotations.append(dict(xref = 'paper', yref = 'paper', x = 0.5, y = -0.2,
                        xanchor = 'center', yanchor = 'top',
                        text = 'Source: PewResearch Center & ' +
                               'Storytelling with data',
                        font = dict(family = 'Arial',
                                    size = 12,
                                    color = 'rgb(150,150,150)'),
                        showarrow = False))

layout['annotations'] = annotations

fig = go.Figure(data = traces, layout = layout)
pyplt(fig)

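[Figure: line chart 3]

Bar charts

Basic bar chart

[Figure: basic bar chart]

This uses Bar from graph_objs; setting barmode in the layout produces different kinds of bar chart. Here is a simple example: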

import plotly as py
import plotly.graph_objs as go
pyplt = py.offline.iplot

# Trace
trace_basic = [go.Bar(
x = ['Variable_1', 'Variable_2', 'Variable_3','Variable_4','Variable_5'],
y = [1, 2, 3, 2, 4],
)]

# Layout
layout_basic = go.Layout(
title = 'The Graph Title',
xaxis = go.XAxis(range = [-0.5,4.5], domain = [0,1])
)

# Figure
figure_basic = go.Figure(data = trace_basic, layout = layout_basic)

# Plot
pyplt(figure_basic)

Grouped bars

Just add more data series to the basic code above:

import plotly as py
import plotly.graph_objs as go
pyplt = py.offline.iplot
# Traces
trace_1 = go.Bar(
x = ["上海物贸", "广东明珠", "五矿发展"],
y = [4.12, 5.32, 0.60],
name = "201609"
)

trace_2 = go.Bar(
x = ["上海物贸", "广东明珠", "五矿发展"],
y = [3.65, 6.14, 0.58],
name = "201612"
)

trace_3 = go.Bar(
x = ["上海物贸", "广东明珠", "五矿发展"],
y = [2.15, 1.35, 0.19],
name = "201703"
)

trace = [trace_1, trace_2, trace_3]

# Layout
layout = go.Layout(
title = '国际贸易板块净资产收益率对比图'
)

# Figure
figure = go.Figure(data = trace, layout = layout)

# Plot
pyplt(figure)

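[Figure: grouped bars]

Seen this way it is very simple; this chapter is presumably about learning the plotting functions.

For example, with the change below the data stays the same but the presentation changes: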

import plotly as py
import plotly.graph_objs as go
pyplt = py.offline.iplot
# Traces
trace_1 = go.Bar(
x = ["09/2016", "12/2016", "03/2017"],
y = [4.12, 3.65, 2.15],
name = "上海物贸"
)

trace_2 = go.Bar(
x = ["09/2016", "12/2016", "03/2017"],
y = [5.32, 6.14, 1.35],
name = "广东明珠"
)

trace_3 = go.Bar(
x = ["09/2016", "12/2016", "03/2017"],
y = [0.60, 0.58, 0.19],
name = "五矿发展"
)

trace = [trace_1, trace_2, trace_3]

# Layout
layout = go.Layout(
title = '国际贸易板块净资产收益率对比图'
)

# Figure
figure = go.Figure(data = trace, layout = layout)

# Plot
pyplt(figure)

[Figure: grouped bars 2]
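
Switching between the two grouped-bar views amounts to transposing the same table: one trace per period becomes one trace per company. A minimal sketch with plain dicts, using the values from the two examples above; the variable names are made up for illustration, and the dict traces stand in for go.Bar.

```python
companies = ["上海物贸", "广东明珠", "五矿发展"]
by_period = {                       # first view: one trace per period
    "201609": [4.12, 5.32, 0.60],
    "201612": [3.65, 6.14, 0.58],
    "201703": [2.15, 1.35, 0.19],
}

periods = list(by_period)
# Second view: for each company, collect its value from every period
by_company = {
    company: [by_period[p][i] for p in periods]
    for i, company in enumerate(companies)
}

traces = [
    {"type": "bar", "x": periods, "y": values, "name": company}
    for company, values in by_company.items()
]
```

With the data kept in one table, either view is a few lines away, so there is no need to retype the numbers for each chart.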

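Stacked bar charts

Also called a cumulative bar chart, controlled by barmode = 'stack'. An example:

It is like looking at several pie charts at once: each component's share is clearly visible.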

import plotly as py
import plotly.graph_objs as go
pyplt = py.offline.iplot

trace_1 = go.Bar(
x = ['华夏新经济混合', '华夏上证50', '嘉实新机遇混合', '南方消费活力混合','华泰柏瑞'],
y = [0.7252, 0.9912, 0.5347, 0.4436, 0.9911],
name = '股票投资'
)

trace_2 = go.Bar(
x = ['华夏新经济混合', '华夏上证50', '嘉实新机遇混合', '南方消费活力混合','华泰柏瑞'],
y = [0.2072, 0, 0.4081, 0.4955, 0.02],
name='其它投资'
)

trace_3 = go.Bar(
x = ['华夏新经济混合', '华夏上证50', '嘉实新机遇混合', '南方消费活力混合','华泰柏瑞'],
y = [0, 0, 0.037, 0, 0],
name='债券投资'
)

trace_4 = go.Bar(
x = ['华夏新经济混合', '华夏上证50', '嘉实新机遇混合', '南方消费活力混合','华泰柏瑞'],
y = [0.0676, 0.0087, 0.0202, 0.0609, 0.0087],
name='银行存款'
)

trace = [trace_1, trace_2, trace_3, trace_4]
layout = go.Layout(
title = '基金资产配置比例图',
xaxis = dict(tickangle = -45),
barmode='stack'
)

fig = go.Figure(data = trace, layout = layout)
pyplt(fig)

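[Figure: stacked bar chart]

Waterfall bar charts

Skipped; I doubt I will need them.

Horizontal bar charts

These share the same function as the bar chart; set orientation = 'h' in Bar. Setting barmode = 'stack' likewise gives stacked horizontal bars and waterfall-style horizontal bars.

Basic example

[Figure: basic horizontal bar chart]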

import plotly as py
import plotly.graph_objs as go
pyplt = py.offline.iplot

trace_1 = go.Bar(
y = ['华夏新经济混合', '嘉实新机遇混合', '南方消费活力混合','华泰柏瑞'],
x = [0.7252, 0.5347, 0.4436, 0.9911],
orientation = 'h',
name = '股票投资'
)

trace_2 = go.Bar(
y = ['华夏新经济混合', '嘉实新机遇混合', '南方消费活力混合'],
x = [0.2072, 0.4081, 0.4955],
orientation = 'h',
name='其它投资'
)

trace_3 = go.Bar(
y = ['华夏新经济混合', '嘉实新机遇混合', '南方消费活力混合'],
x = [0, 0.037, 0],
orientation = 'h',
name='债券投资'
)

trace_4 = go.Bar(
y = ['华夏新经济混合', '嘉实新机遇混合', '南方消费活力混合'],
x = [0.0676, 0.0202, 0.0609],
orientation = 'h',
name='银行存款'
)

trace = [trace_1, trace_2, trace_3, trace_4]
layout = go.Layout(
title = '基金资产配置比例图',
#xaxis = dict(tickangle = -45),
barmode='stack'
)

fig = go.Figure(data = trace, layout = layout)
pyplt(fig)

I do not much like how this axis looks, so let's find a way to show percentages:

import plotly as py
import plotly.graph_objs as go
pyplt = py.offline.iplot

trace_1 = go.Bar(
y = ['华夏新经济混合', '嘉实新机遇混合', '南方消费活力混合','华泰柏瑞'],
x = [i*100 for i in [0.7252, 0.5347, 0.4436, 0.9911]],
orientation = 'h',
name = '股票投资'
)

trace_2 = go.Bar(
y = ['华夏新经济混合', '嘉实新机遇混合', '南方消费活力混合'],
x = [i*100 for i in [0.2072, 0.4081, 0.4955]],
orientation = 'h',
name='其它投资'
)

trace_3 = go.Bar(
y = ['华夏新经济混合', '嘉实新机遇混合', '南方消费活力混合'],
x = [i*100 for i in [0, 0.037, 0]],
orientation = 'h',
name='债券投资'
)

trace_4 = go.Bar(
y = ['华夏新经济混合', '嘉实新机遇混合', '南方消费活力混合'],
x = [i*100 for i in [0.0676, 0.0202, 0.0609]],
orientation = 'h',
name='银行存款'
)

trace = [trace_1, trace_2, trace_3, trace_4]
layout = go.Layout(
title = '基金资产配置比例图',
barmode='stack',
xaxis = dict(
ticksuffix = '%'
)
)

fig = go.Figure(data = trace, layout = layout)
pyplt(fig)

[Figure: basic horizontal bar chart 2]
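
Multiplying by 100 and adding ticksuffix = '%' is one way to get a percentage axis. A possible alternative, sketched here with plain layout dicts, is to leave the data as fractions and set the axis tickformat to a d3-style percent format such as '.0%'. Python's own format mini-language treats '%' the same way, which makes the effect easy to preview without plotting. The variable names are illustrative.

```python
fractions = [0.7252, 0.5347, 0.4436, 0.9911]

# Approach used above: rescale the data, then append a suffix to every tick
layout_suffix = {"barmode": "stack", "xaxis": {"ticksuffix": "%"}}

# Alternative: keep the fractions and let the axis format them as percentages
layout_format = {"barmode": "stack", "xaxis": {"tickformat": ".0%"}}

# Preview of what a '.0%' format does to the raw values
labels = [format(v, ".0%") for v in fractions]
```

The second layout avoids touching the data, so hover values and any later calculations still see the original fractions.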

Advanced example

[Figure: advanced horizontal bar chart]

import numpy as np
import plotly as py
import plotly.graph_objs as go
from plotly import tools
pyplt = py.offline.iplot

y_saving = [1.3586, 2.2623000000000002, 4.9821999999999997, 6.5096999999999996,
            7.4812000000000003, 7.5133000000000001, 15.2148, 17.520499999999998
            ]  # left panel: bar lengths
y_net_worth = [93453.919999999998, 81666.570000000007, 69889.619999999995,
               78381.529999999999, 141395.29999999999, 92969.020000000004,
               66090.179999999993, 122379.3]  # right panel: line values, listed bottom to top
x_saving = ['Japan', 'United Kingdom', 'Canada', 'Netherlands',
            'United States', 'Belgium', 'Sweden', 'Switzerland']
x_net_worth = ['Japan', 'United Kingdom', 'Canada', 'Netherlands',
               'United States', 'Belgium', 'Sweden', 'Switzerland'
               ]
trace0 = go.Bar(
    x = y_saving,
    y = x_saving,
    marker = dict(
        color = 'rgba(50, 171, 96, 0.6)',  # bar fill color
        line = dict(
            color = 'rgba(50, 171, 96, 1.0)',  # bar outline color
            width = 1),
    ),
    name = 'Household savings, percentage of household disposable income',
    orientation = 'h',
)
trace1 = go.Scatter(
    x = y_net_worth,
    y = x_net_worth,
    mode = 'lines+markers',
    line = dict(
        color = 'rgb(128, 0, 128)'),  # line color
    name = 'Household net worth, Million USD/capita',
)
layout = dict(
    title = 'Household savings & net worth for eight OECD countries',
    # y axis of the left panel
    yaxis1 = dict(
        showgrid = False,  # whether to show horizontal grid lines
        showline = False,  # whether to show the axis line on the left
        showticklabels = True,  # whether to show the tick labels
        domain = [0, 0.85],
    ),
    # y axis of the right panel
    yaxis2 = dict(
        showgrid = False,
        showline = True,
        showticklabels = False,
        linecolor = 'rgba(102, 102, 102, 0.8)',  # color of this axis line
        linewidth = 2,
        domain = [0, 0.85],
    ),
    # x axis of the left panel
    xaxis1 = dict(
        zeroline = False,  # whether to draw the zero line
        showline = False,  # whether to show the axis line at the bottom
        showticklabels = True,
        showgrid = True,  # whether to show vertical grid lines
        domain = [0, 0.42],
    ),
    # x axis of the right panel
    xaxis2 = dict(
        zeroline = False,
        showline = False,
        showticklabels = True,
        showgrid = True,
        domain = [0.47, 1],
        side = 'top',  # put the tick labels on top (default is bottom)
        dtick = 25000,  # spacing between adjacent ticks: 25000
    ),
    legend = dict(
        x = 0.029,  # legend x position
        y = 1.038,  # legend y position
        font = dict(
            size = 10,  # legend font size
        ),
    ),
    margin = dict(
        l = 100,  # left margin
        r = 20,   # right margin
        t = 70,   # top margin
        b = 70,   # bottom margin
    ),
    paper_bgcolor = 'rgb(248, 248, 255)',  # background color of the whole figure
    plot_bgcolor = 'rgb(248, 248, 255)',   # background color of the plotting area
)

annotations = []

y_s = np.round(y_saving, decimals = 2)  # round to two decimals
y_nw = np.rint(y_net_worth)  # round to the nearest integer

for ydn, yd, xd in zip(y_nw, y_s, x_saving):  # pair up the data
    # labels for the line chart on the right
    annotations.append(dict(xref = 'x2', yref = 'y2',
                            y = xd, x = ydn - 20000,
                            text = '{:,}'.format(ydn) + 'M',  # thousands separator every three digits
                            font = dict(family = 'Arial', size = 12,
                                        color = 'rgb(128, 0, 128)'),  # label font, size and color
                            showarrow = False))  # whether to draw an arrow from the label to the point
    # labels for the horizontal bars on the left
    annotations.append(dict(xref = 'x1', yref = 'y1',
                            y = xd, x = yd + 3,
                            text = str(yd) + '%',
                            font = dict(family = 'Arial', size = 12,
                                        color = 'rgb(50, 171, 96)'),
                            showarrow = False))
# source note at the bottom
annotations.append(dict(xref = 'paper', yref = 'paper',  # position in paper coordinates
                        x = 0.3, y = -0.05,  # text position
                        text = 'OECD "' +
                               '(2015), Household savings (indicator), ' +
                               'Household net worth (indicator). doi: ' +
                               '10.1787/cfc6f499-en (Accessed on 05 June 2015)',  # text below the chart
                        font = dict(family = 'Arial', size = 10,  # font of the text below the chart
                                    color = 'rgb(150,150,150)'),
                        showarrow = False))

layout['annotations'] = annotations

# Two canvases, one on the left and one on the right
# shared_yaxes = False: separate y axes; shared_xaxes = True: shared x axis; rows = 1, cols = 2 gives two subplots
fig = tools.make_subplots(rows = 1, cols = 2,
                          shared_xaxes = True,
                          shared_yaxes = False)

fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 1, 2)

fig['layout'].update(layout)
pyplt(fig)

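This example comes from the plotly website, and the book's authors have annotated it in detail. In outline, the code is:

  • the data
  • the traces
  • the layout for each panel
  • the annotations for each panel
  • the canvas split in two

Gantt charts

Simple example

[Figure: Gantt chart]

Commonly used to show project progress, schedules and other time-related information.

The function is create_gantt from plotly.figure_factory; you pass each task (Task) with its start (Start) and end (Finish) to draw the chart.
The further left a bar starts, the earlier the task begins;
the further right it extends, the later the task finishes.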

import plotly as py
import plotly.figure_factory as ff

pyplt = py.offline.iplot
df = [dict(Task="Job A", Start='2009-01-01', Finish='2009-02-28', Complete=10),
dict(Task="Job B", Start='2008-12-05', Finish='2009-04-15', Complete=10),
dict(Task="Job C", Start='2009-02-20', Finish='2009-05-30', Complete=50),
dict(Task="Job D", Start='2009-03-20', Finish='2009-06-30', Complete=50),
dict(Task="Job E", Start='2009-01-12', Finish='2009-04-28', Complete=100),
dict(Task="Job F", Start='2009-03-07', Finish='2009-08-21', Complete=100)]

fig = ff.create_gantt(df, index_col='Complete', show_colorbar=True)  # Complete is a percentage-like progress value; it is passed to index_col as the color index
pyplt(fig)

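This chart is great; it really is well suited to showing how far a plan has progressed.

Applied example

Note the following points:

  • Times are precise to the second
  • The simple example above is indexed by a number; the one below is indexed by category
  • Colors are set through colors, passed to create_gantt

[Figure: Gantt chart, applied example]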

import plotly as py
import plotly.figure_factory as ff
pyplt = py.offline.iplot

df = [
dict(Task='Morning Sleep', Start='2016-01-01', Finish='2016-01-01 6:00:00', Resource='Sleep'),
dict(Task='Breakfast', Start='2016-01-01 7:00:00', Finish='2016-01-01 7:30:00', Resource='Food'),
dict(Task='Work', Start='2016-01-01 9:00:00', Finish='2016-01-01 11:25:00', Resource='Brain'),
dict(Task='Break', Start='2016-01-01 11:30:00', Finish='2016-01-01 12:00:00', Resource='Rest'),
dict(Task='Lunch', Start='2016-01-01 12:00:00', Finish='2016-01-01 13:00:00', Resource='Food'),
dict(Task='Work', Start='2016-01-01 13:00:00', Finish='2016-01-01 17:00:00', Resource='Brain'),
dict(Task='Exercise', Start='2016-01-01 17:30:00', Finish='2016-01-01 18:30:00', Resource='Cardio'),
dict(Task='Post Workout Rest', Start='2016-01-01 18:30:00', Finish='2016-01-01 19:00:00', Resource='Rest'),
dict(Task='Dinner', Start='2016-01-01 19:00:00', Finish='2016-01-01 20:00:00', Resource='Food'),
dict(Task='Evening Sleep', Start='2016-01-01 21:00:00', Finish='2016-01-01 23:59:00', Resource='Sleep')
]

colors = dict(Cardio = 'rgb(46, 137, 205)',
Food = 'rgb(114, 44, 121)',
Sleep = 'rgb(198, 47, 105)',
Brain = 'rgb(58, 149, 136)',
Rest = 'rgb(107, 127, 135)')

fig = ff.create_gantt(df, colors = colors, index_col = 'Resource', title = 'Daily Schedule',
bar_width = 0.8, showgrid_x = True, showgrid_y = True)
pyplt(fig)

The better these charts look, the more I want to know how to work from an actual data file in real work.
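
As one possible answer to that question, here is a minimal sketch of turning CSV rows into the list of dicts that create_gantt expects. The CSV content, the column layout and the timestamp format are all assumptions made for illustration; io.StringIO stands in for a real file opened with open(...).

```python
import csv
import io
from datetime import datetime

# Hypothetical file content; with a real file, use open("your_file.csv", newline="")
raw = io.StringIO(
    "Task,Start,Finish,Resource\n"
    "Breakfast,2016-01-01 07:00:00,2016-01-01 07:30:00,Food\n"
    "Work,2016-01-01 09:00:00,2016-01-01 11:25:00,Brain\n"
)
rows = [dict(record) for record in csv.DictReader(raw)]

# Sanity check before plotting: every task must end after it starts
fmt = "%Y-%m-%d %H:%M:%S"
durations = []
for row in rows:
    start = datetime.strptime(row["Start"], fmt)
    finish = datetime.strptime(row["Finish"], fmt)
    if finish < start:
        raise ValueError("task {!r} ends before it starts".format(row["Task"]))
    durations.append((finish - start).total_seconds() / 60)  # minutes
```

rows has the same shape as df in the examples above, so it could be passed to ff.create_gantt(rows, index_col='Resource', ...) in its place.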

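Area charts

These use Scatter, the same function as scatter plots, bubble charts and line charts.

On top of a line chart, you also need to set the fill parameter, e.g. fill = 'tonexty'.

Basic example

[Figure: area chart]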

import plotly as py
import plotly.graph_objs as go
import numpy as np

pyplt = py.offline.iplot


s1 = np.random.RandomState(8)  # local random seeds
s2 = np.random.RandomState(9)
rd1 = s1.rand(100)/10 - 0.02  # random returns for 100 trading days
rd2 = s2.rand(100)/10 - 0.02

initial1 = 100000  # starting capital
initial2 = 100000
total1 = []
total2 = []
for i in range(len(rd1)):
    initial1 = initial1*rd1[i] + initial1  # C(1+X)
    initial2 = initial2*rd2[i] + initial2
    total1.append(initial1)
    total2.append(initial2)
# note how smoothly the book generates exactly the numbers it needs

trace1 = go.Scatter(
    y = total1,
    fill = 'tonexty',
    name = "策略1"
)
trace2 = go.Scatter(
    y = total2,
    fill = 'tozeroy',
    mode = 'none',  # no boundary line; the trace above has one
    name = "策略2"
)

data = [trace1, trace2]

layout = dict(title = '策略净值曲线',
              xaxis = dict(title = '交易天数'),
              yaxis = dict(title = '净值'),
              )
fig = dict(data = data, layout = layout)
pyplt(fig)

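Area filled between lines

There is also an area chart that is filled only between the lines. Taking the example above, set the two fills to fill = None and fill = 'tonexty' respectively.
Note that in the example above the second trace uses fill = 'tozeroy'!
That leaves only the shading between the two lines.

[Figure: area filled between lines]

Stacked area charts

Very similar to a stacked bar chart. The difference lies in how the data is set up: each successive series must carry ever larger values, that is, the running total of the series below it.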
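
The "ever larger values" requirement can be met by accumulating the series before plotting. A minimal sketch with made-up numbers: each stacked line is the element-wise running total of the raw series, and each trace fills down to the one before it. The dict traces stand in for go.Scatter.

```python
from itertools import accumulate

raw = {                     # hypothetical raw series, one value per x position
    "A": [1, 2, 3],
    "B": [2, 1, 2],
    "C": [1, 1, 1],
}

# Element-wise running totals: A, then A+B, then A+B+C
stacked = list(accumulate(
    raw.values(),
    lambda total, ys: [t + y for t, y in zip(total, ys)],
))

traces = [
    {
        "type": "scatter",
        "mode": "lines",
        "name": name,
        "y": ys,
        # the first trace fills to zero, every later one to the trace below it
        "fill": "tozeroy" if i == 0 else "tonexty",
    }
    for i, (name, ys) in enumerate(zip(raw, stacked))
]
```

The order of the traces matters: 'tonexty' fills toward the previous trace in the data list, so the series must be listed bottom to top.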

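Histograms

Basic example

This uses Histogram from graph_objs.
Passing the data as x gives an ordinary histogram, and passing it as y gives a horizontal one. By default the axis shows sample counts; setting histnorm = 'probability' turns it into relative frequencies.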
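
To make the count versus probability distinction concrete, the sketch below bins a handful of made-up values by hand and divides each count by the sample size, which is the idea behind histnorm = 'probability'. The helper name and the bin parameters are illustrative; this is not how plotly chooses its bins automatically.

```python
import math
from collections import Counter


def bin_probabilities(values, start, size):
    """Map each bin's left edge to the share of values falling into that bin."""
    counts = Counter(math.floor((v - start) / size) for v in values)
    n = len(values)
    return {start + k * size: c / n for k, c in sorted(counts.items())}


probs = bin_probabilities([0.1, 0.4, 0.6, 1.2, 1.9, 2.5], start=0.0, size=1.0)
```

The shares always add up to 1, which is why the bar heights stay comparable across samples of different sizes.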

import plotly as py
import plotly.graph_objs as go
import numpy as np

pyplt = py.offline.iplot
s1 = np.random.RandomState(1)
x = s1.randn(1000)  # 1000 draws from the standard normal distribution

data = [go.Histogram(x=x, histnorm = 'probability')]
pyplt(data)

[Figure: simple histogram]

Overlaid histograms

Set with barmode='overlay'.

import plotly as py
import plotly.graph_objs as go
import numpy as np

pyplt = py.offline.iplot
s1 = np.random.RandomState(1)
x0 = s1.randn(1000)
x1 = s1.chisquare(5,1000)  # the one above is normal; this one is chi-square

trace1 = go.Histogram(
x = x0,
histnorm = 'probability',  # show relative frequencies
opacity = 0.75
)
trace2 = go.Histogram(
x = x1,
histnorm = 'probability',
opacity = 0.75
)

data = [trace1, trace2]
layout = go.Layout(barmode='overlay')  # the overlay is set here
fig = go.Figure(data = data, layout = layout)
pyplt(fig)

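[Figure: overlaid histograms]

Stacked histograms

Set with barmode='stack'.

[Figure: stacked histograms]

Cumulative histograms

I did not see any color parameters for histograms in the book, so I tried the settings used earlier, and they work. Nice: copying code is not enough, you have to think.

[Figure: cumulative histogram with color]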

import plotly as py
import plotly.graph_objs as go
import numpy as np

pyplt = py.offline.iplot
s1 = np.random.RandomState(1)
x0 = s1.randn(1000)
x1 = s1.chisquare(5,1000)

trace1 = [go.Histogram(
x = x0,
histnorm = 'probability',
cumulative=dict(enabled=True),
marker = dict(
color = 'rgb(50, 171, 96)',
line = dict(
color = 'rgb(50, 171, 96)'
),
),
opacity = 0.75
)]


pyplt(trace1)

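cumulative=dict(enabled=True) is the parameter for cumulative histograms. It has three keys: enabled (default False, meaning no accumulation), direction (whether the totals grow or shrink along the axis) and currentbin (per the book, usually set to half to avoid bias).
xbins sets the binning: start is the first coordinate, end is the last coordinate, and size is the bin width.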
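
The two settings can be illustrated by computing a cumulative histogram by hand: bins run from start to end in steps of size, and each bar holds the share of samples up to and including its bin. This is a sketch with made-up values and an illustrative helper name; the trace dict at the end shows where the same parameters would go in a plotly histogram.

```python
from itertools import accumulate


def cumulative_probabilities(values, start, end, size):
    """Cumulative share of the in-range values per bin, for bins of equal width."""
    n_bins = int(round((end - start) / size))
    counts = [0] * n_bins
    for v in values:
        if start <= v < end:
            counts[int((v - start) // size)] += 1
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


values = [0.1, 0.4, 0.6, 1.2, 1.9, 2.5]
cum = cumulative_probabilities(values, start=0.0, end=3.0, size=1.0)

trace = {
    "type": "histogram",
    "x": values,
    "histnorm": "probability",
    "cumulative": {"enabled": True, "direction": "increasing"},
    "xbins": {"start": 0.0, "end": 3.0, "size": 1.0},
}
```

With direction set to 'increasing' the last bar always reaches 1, as in the hand computation.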

Advanced example

[Figure: advanced histogram example]

import plotly as py
import plotly.graph_objs as go
import numpy as np
import plotly.figure_factory as ff
pyplt = py.offline.iplot


s1 = np.random.RandomState(12)
x1 = s1.standard_cauchy(200) - 4  # Cauchy distribution
x2 = s1.uniform(1,10,200)  # uniform distribution on [1, 10)
x3 = s1.standard_gamma(3,200) + 4  # Gamma distribution
x4 = s1.exponential(3,200) + 8  # exponential distribution


hist_data = [x1, x2, x3, x4]

group_labels = ['Group 1', 'Group 2', 'Group 3', 'Group 4']
fig = ff.create_distplot(hist_data, group_labels, bin_size=4)  # bin_size: width of each histogram bin
pyplt(fig)

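The plotting function changed here (create_distplot from figure_factory), and I did not understand bin_size at first. From experimenting, it sets the width of the histogram bins, so changing it rescales the bars.

Pie charts

These use Pie from graph_objs. It has two common parameters: values takes the data to visualize, and labels gives the label for each value.

Basic pie chart

[Figure: pie chart example]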

import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot
labels = ['上海国际集团有限公司', '中国移动通信集团',
          '富德生命人寿-传统', '富德生命人寿-资本金', '上海上国投资产管理有限公司']
values = [4222533311, 4103763711, 2138028672, 1356332558, 1073516173]
colors = ['#104E8B', '#1874CD', '#1C86EE', '#6495ED']

trace = [go.Pie(labels = labels,
                values = values,
                rotation = 30,  # rotation angle
                opacity = 1,
                showlegend = False,  # whether to show the legend
                pull = [0.1,0,0,0,0],  # how far each sector is pulled out of the pie
                hoverinfo = 'label+percent',  # what is shown when the mouse hovers over a sector
                textinfo = 'percent',  # text on the sectors: the share as here, or the raw value with textinfo = 'value'
                textfont = dict(size = 30, color = 'white'),
                marker = dict(colors = colors,  # style of each sector
                              line = dict(color = '#000000',  # sector outline
                                          width = 2)))]
fig = go.Figure(data = trace)
pyplt(fig)

Donut charts

hole = 0.7 controls the size of the empty center, or equivalently how thick the ring is. At 0 there is no hole and you get an ordinary pie.

labels = ['完成','未完成']
values = [0.7,0.3]
trace = [go.Pie(
labels = labels,
values = values,
hole = 0.7,
hoverinfo = "label + percent")]
layout = go.Layout(
title = '工作进度图'
)
fig = go.Figure(data = trace, layout = layout)
pyplt(fig)

[Figure: donut chart]

Advanced example

[Figure: pie chart example 3]

Here the thing to learn is how the data is passed in.

pyplt = py.offline.iplot
fig = {
"data": [
{
"values": [9884, 9510, 9363, 7961, 6755],
"labels": [
'金瑞期货',
'海通期货',
'国泰君安',
'银河期货',
'五矿经易'
],
'domain': {'x': [0, .6],
'y': [0, .5]},
"name": "AU.SHF多头持仓",
"hoverinfo":"label + percent + name",
"hole": .4,
"type": "pie"
},
{
"values": [8847, 6229, 2764, 2406, 2022],
"labels": [
'中信期货',
'招金期货',
'国贸期货',
'铜冠金源',
'中银国际'
],
'domain': {'x': [.2, 1],
'y': [0, .5]},
"name": "AU.SHF空头持仓",
"hoverinfo":"label + percent + name",
"hole": .4,
"type": "pie"
},
{
"values": [14393, 12220, 11824, 11233, 10072],
"labels": [
'中信期货',
'东证期货',
'海通期货',
'方正中期',
'国泰君安'
],
'domain': {'x': [0, .9],
'y': [.5, 1]},
"name": "AG.SHF多头持仓",
"hoverinfo":"label + percent + name",
"hole": .4,
"type": "pie"
},
{
"values": [30983, 20699, 16781, 15686, 14198],
"labels": [
'中信期货',
'国泰君安',
'海通期货',
'国贸期货',
'永安期货'
],
'domain': {'x': [0.5, 1],
'y': [.5, 1]},
"name": "AG.SHF空头持仓",
"hoverinfo":"label + percent + name",
"hole": .4,
"type": "pie"
}],

"layout": {
"title":"有色金属板块主力合约多空持仓分布图",
"annotations": [
{
"font": {
"size": 10
},
"showarrow": False,
"text": "AU.SHF多头持仓",
"x": 0.45,
"y": 0.754
},
{
"font": {
"size": 10
},
"showarrow": False,
"text": "AU.SHF空头持仓",
"x": 0.794,
"y": 0.754
},
{
"font": {
"size": 10
},
"showarrow": False,
"text": "AG.SHF多头持仓",
"x": 0.255,
"y": 0.23
},
{
"font": {
"size": 10
},
"showarrow": False,
"text": "AG.SHF空头持仓",
"x": 0.6,
"y": 0.23
}
]
}
}
pyplt(fig)
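
In the figure dict above, each pie is placed through its 'domain': x and y ranges in paper coordinates, where 0 to 1 spans the whole figure. Rather than typing the ranges by hand, they can be computed for a regular grid. A minimal sketch; the helper name grid_domains and the gap value are made up for illustration.

```python
def grid_domains(rows, cols, gap=0.05):
    """One {'x': [x0, x1], 'y': [y0, y1]} domain per grid cell, top row first."""
    width = (1 - gap * (cols - 1)) / cols
    height = (1 - gap * (rows - 1)) / rows
    domains = []
    for r in range(rows):
        y_top = 1 - r * (height + gap)
        for c in range(cols):
            x_left = c * (width + gap)
            domains.append({
                "x": [round(x_left, 4), round(x_left + width, 4)],
                "y": [round(y_top - height, 4), round(y_top, 4)],
            })
    return domains


domains = grid_domains(2, 2, gap=0.1)
```

Each entry can be assigned to one pie's 'domain' key, which keeps the four pies the same size and prevents their regions from overlapping.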

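Summary

This chapter covers the individual plotting functions: scatter plots, bar charts, line charts, horizontal bar charts, histograms, pie charts, area charts and Gantt charts.
plotly is clearly powerful and offers many chart types; later chapters will probably cover other approaches, such as producing different charts from one function by changing its parameters.
Speaking of parameters, there are a great many of them. I think they are best memorized through practice. At first there is no need to touch every one, and certainly no need to study them all compulsively. Even from this first pass I got a feel for the plotting logic, what each part does, and where to make changes for simple requirements. Most parameters also carry over between plotting functions.
Learn, learn the method, form a mental model, apply the model, learn the method.

trace->layout->annotations->axis
& dictionaries

Advanced plotly charts

Time series

I have never done time series in Python; for statistics EViews is probably the more specialized tool. Let's look at how it is written here.

[Figure: time series]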

import plotly as py
import plotly.graph_objs as go

from datetime import datetime

pyplt = py.offline.iplot

x_datetime = [datetime(year=2013, month=10, day=4),
datetime(year=2013, month=11, day=5),
datetime(year=2013, month=12, day=6)]
x_string = ['2013-10-04', '2013-11-05', '2013-12-06']

trace_datetime = go.Scatter(x=x_datetime, y=[1, 3, 6],name='trace_datetime')
trace_string = go.Scatter(x=x_string, y=[2, 4, 7],name='trace_string')
data = [trace_datetime, trace_string]
pyplt(data)

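Range selector

This is mainly used when plotting financial time series. When the data should be shown for a limited time window, you could write your own function to trim the data, but it is more convenient to set layout.xaxis.rangeselector.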

layout = dict(
    title='时间序列的滑块与选择器',
    xaxis=dict(
        rangeselector=dict(  # selector for the x-axis
            buttons=list([  # the selector's buttons
                dict(count=1,  # range covered = count * step
                     label='1m',  # button label
                     step='month',
                     stepmode='backward'),  # count backward from the end
            ])
        )
    )
)

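With count=1, step='year', stepmode='todate', the range does not run up to today's date. It is measured from the last date: given a last date of 2020-12-08, the button shows 2020-01-01 through 2020-12-08, that is, from the start of that year up to the last date.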
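
To see the difference between 'backward' and 'todate' in numbers, here is a small sketch that computes where the window would start. The helper `selector_start` is hypothetical and written only for illustration. It follows my reading of the plotly documentation for stepmode, handles only 'month' and 'year' steps, and is not how plotly itself is implemented.

```python
import calendar
from datetime import date

def selector_start(last, count, step, stepmode):
    """Hypothetical helper: first date a rangeselector button would show."""
    months = count * (12 if step == "year" else 1)   # only 'month' and 'year' are handled
    year, month0 = divmod(last.year * 12 + last.month - 1 - months, 12)
    month = month0 + 1
    day = min(last.day, calendar.monthrange(year, month)[1])  # clamp e.g. Mar 31 -> Feb 29
    shifted = date(year, month, day)
    if stepmode == "backward":
        return shifted                                # simply count * step earlier
    # 'todate': round the shifted date up to the next step boundary
    if step == "year":
        floor, nxt = date(shifted.year, 1, 1), date(shifted.year + 1, 1, 1)
    elif shifted.month == 12:
        floor, nxt = date(shifted.year, 12, 1), date(shifted.year + 1, 1, 1)
    else:
        floor, nxt = date(shifted.year, shifted.month, 1), date(shifted.year, shifted.month + 1, 1)
    return floor if shifted == floor else nxt

last = date(2020, 12, 8)
print(selector_start(last, 1, "year", "backward"))  # 2019-12-08
print(selector_start(last, 1, "year", "todate"))    # 2020-01-01
```

So '1y' with 'backward' always spans a full year, while 'YTD' with 'todate' spans only the part of the year that has elapsed.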

Range slider

The code is as follows:

import plotly as py
import plotly.graph_objs as go
import pandas as pd

pyplt = py.offline.iplot

df = pd.read_csv(r'C:\Users\ASUS\Desktop\python\Chapter03\dat/day01.csv',index_col=[0])

data = [go.Scatter(x=df.index,
y=df.close)]

layout = dict(
title='时间序列的滑块与选择器',
xaxis=dict(
rangeselector=dict(
buttons=list([
dict(count=1,
label='1m',
step='month',
stepmode='backward'),
dict(count=6,
label='6m',
step='month',
stepmode='backward'),
dict(count=1,
label='YTD', # year to date
step='year',
stepmode='todate'),
dict(count=1,
label='1y',
step='year',
stepmode='backward'),
dict(step='all')
])
),
rangeslider=dict(),
type='date'
)
)
fig = dict(data=data, layout=layout)
pyplt(fig)

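Tables

Table

At first I thought tables did not need plotly, but later sections show tables and charts displayed together, so in some cases it is still necessary.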

import plotly as py
import plotly.figure_factory as FF

pyplt = py.offline.iplot


data_matrix = [['国家', '年份', '人口'],
['中国',2000, 1267430000],
['美国', 2000, 282200000],
['加拿大', 2000, 27790000],
['中国', 2005, 1307560000],
['美国', 2005, 295500000],
['加拿大', 2005, 32310000],
['中国', 2010, 1340910000],
['美国', 2010, 309000000],
['加拿大', 2010, 34000000]]

colorscale = [[0, '#4d004c'], [.3, '#f2e5ff'], [1, '#ffffff']]
fontcolor = ['#FF0000', '#00EE00', '#FF3030']

table = FF.create_table(data_matrix, colorscale=colorscale, font_colors=fontcolor) # data_matrix can be replaced directly by a DataFrame read with pandas
table.layout.width = 700
pyplt(table)

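colorscale and fontcolor each take three values here. They do not correspond to three columns or anything like that; they apply to the header (the first row, plus the first column when an index is used), the odd rows, and the even rows.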
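
To make the three slots concrete, here is a small sketch that picks one of the three font colors for each table row. As far as I can tell the entries are applied per row: header first, then odd rows, then even rows. The helper `slot_for_row` is hypothetical, written only for illustration, and is not part of plotly.

```python
fontcolor = ['#FF0000', '#00EE00', '#FF3030']  # same list as in the table example

def slot_for_row(row_index):
    """Hypothetical helper: which of the three color slots a table row uses."""
    if row_index == 0:
        return 0                              # header row
    return 1 if row_index % 2 == 1 else 2     # odd rows use slot 1, even rows slot 2

rows = ['header', 'row 1', 'row 2', 'row 3']
for i, name in enumerate(rows):
    print(name, fontcolor[slot_for_row(i)])
```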

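Table with chart

Well, it really is a 2018 book: both code samples in this part raise errors. After searching Baidu with no luck, the official site solved the problem.
I also noticed that the variable names and some parameter settings in the book's examples are identical to those in the official examples, with only the data swapped.
Here are the code and the resulting figure: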

import plotly as py
import plotly.graph_objs as go
import plotly.figure_factory as ff

pyplt = py.offline.iplot


# Table data
table_data = [['团队', '赢', '输', '平'],
              ['清华大学', 18, 4, 0],
              ['北京大学', 18, 5, 0],
              ['中国<br>人民大学', 16, 5, 0],
              ['复旦大学', 13, 8, 0],
              ['上海<br>交通大学', 13, 8, 0],
              ['同济大学', 13, 8, 0]]
# Initialize a figure with ff.create_table(table_data)
fig = ff.create_table(table_data, height_constant=60)


# Data for the chart
teams = ['清华大学', '北京大学', '中国<br>人民大学',
         '复旦大学', '上海<br>交通大学', '同济大学']
scoreA = [3.54, 3.48, 3.0, 3.27, 2.83, 2.45]
scoreB = [2.17, 2.57, 2.0, 2.91, 2.57, 2.14]


# Add the bar traces to the figure
fig.add_trace(go.Bar(x=teams, y=scoreA, xaxis='x2', yaxis='y2',
                     marker=dict(color='#0099ff'),
                     name='分值A'))
fig.add_trace(go.Bar(x=teams, y=scoreB, xaxis='x2', yaxis='y2',
                     marker=dict(color='#404040'),
                     name='分值B'))



# Configure the figure layout:
# the chart's yaxis must be anchored to the chart's xaxis,
# and the figure margins are set here as well

fig.update_layout(
    title_text = '部分高校游戏比赛',
    height = 800,
    margin = {'t':75, 'l':50},  # top and left margins, in pixels
    yaxis = {'domain': [0, .45]},
    xaxis2 = {'anchor': 'y2'},
    yaxis2 = {'domain': [.6, 1], 'anchor': 'x2', 'title': '分值'}  # anchor yaxis2 to xaxis2
)

fig.show()

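[Figure: table and chart combined]

The layout above stacks the two in one column. To place the table and the chart side by side in one row, change the layout: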

fig.update_layout(
    margin = {'t':50, 'b':100},
    xaxis = {'domain': [0, .45]},    # look here: the split is now along x (one row)
    xaxis2 = {'domain': [0.6, 1.]},  # look here: the split is now along x (one row)
    yaxis2 = {'anchor': 'x2','title': '分值'},
    title_text = '高校游戏比赛'
)
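
The only real difference between the two arrangements is which axis gets the domain split. A minimal sketch of that idea follows. The helper `table_chart_layout` is hypothetical, and the split points 0.45 and 0.6 are simply the values used in the two layouts above.

```python
def table_chart_layout(side_by_side, table_end=0.45, chart_start=0.6):
    """Hypothetical helper: layout fragment for a table (x/y axes)
    plus a chart (x2/y2 axes), in one row or in one column."""
    if side_by_side:
        # split along x: table on the left, chart on the right
        return {
            "xaxis": {"domain": [0, table_end]},
            "xaxis2": {"domain": [chart_start, 1]},
            "yaxis2": {"anchor": "x2"},
        }
    # split along y: table in the lower band, chart in the upper band
    return {
        "yaxis": {"domain": [0, table_end]},
        "xaxis2": {"anchor": "y2"},
        "yaxis2": {"domain": [chart_start, 1], "anchor": "x2"},
    }

print(table_chart_layout(side_by_side=True))
print(table_chart_layout(side_by_side=False))
```

Assuming `fig` is the figure built with ff.create_table above, the result could be applied with `fig.update_layout(**table_chart_layout(side_by_side=True))`.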

Multiple chart types

For example, a single figure that contains both a bar chart and a line chart.

Multiple chart types

import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot

x = list('ABCDEF')
trace1 = go.Scatter(
x=x,
y=[1.5, 1, 1.3, 0.7, 0.8, 0.9],
name='line'
)
trace2 = go.Bar(
x=x,
y=[1, 0.5, 0.7, -1.2, 0.3, 0.4],
name = 'bar'
)

data = [trace1, trace2]

layout = dict(title = 'Bar-Line Demo')

fig = dict(data=data,layout=layout)


pyplt(fig, show_link=False)

Dual axes

Just like the same feature in Excel.

Pay attention to yaxis and yaxis2 in layout.

import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot

trace1 = go.Scatter(
x=[1, 2, 3],
y=[400, 500, 600],
name='yaxis 数据'
)
trace2 = go.Scatter(
x=[2, 3, 4],
y=[47, 52, 16],
name='yaxis2 数据',
yaxis='y2'
)
data = [trace1, trace2]
layout = go.Layout(
    title='Y轴双轴示例',
    yaxis=dict(
        title='yaxis 标题'
    ),
    yaxis2=dict(
        title='yaxis2 标题',
        titlefont=dict(
            color='rgb(148, 103, 189)'  # title color
        ),
        tickfont=dict(
            color='rgb(148, 103, 189)'  # tick label color
        ),
        overlaying='y',  # draw on top of the y axis
        side='right'  # side on which the axis is placed
    )
)
fig = go.Figure(data=data, layout=layout)
plot_url = pyplt(fig)
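
Stripped down, a dual-axis chart needs just two things: `yaxis='y2'` on the trace and `overlaying='y'` on yaxis2. The sketch below isolates them using plain dicts. The helper `secondary_yaxis` is hypothetical and the titles are placeholders.

```python
def secondary_yaxis(title, color, side="right"):
    """Hypothetical helper: settings for a second y axis drawn over the first."""
    return {
        "title": title,
        "tickfont": {"color": color},
        "overlaying": "y",   # share the plotting area with the primary y axis
        "side": side,        # which side the axis line is drawn on
    }

traces = [
    {"type": "scatter", "x": [1, 2, 3], "y": [400, 500, 600]},
    {"type": "scatter", "x": [2, 3, 4], "y": [47, 52, 16], "yaxis": "y2"},  # routed to yaxis2
]
layout = {
    "yaxis": {"title": "primary"},
    "yaxis2": secondary_yaxis("secondary", "rgb(148, 103, 189)"),
}
fig = {"data": traces, "layout": layout}  # could be passed to pyplt(fig)
print(fig["layout"]["yaxis2"])
```

Without `overlaying='y'` the second axis has no reason to share the plotting area with the first, which is why that one key matters most.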

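[Figure: dual axes]

Multiple axes

Multiple axes work the same way, so I won't go into detail.

Shared axes

See the figure:

[Figure: shared axes]

This should not be hard to follow: in the example above the x range barely changes while the y values span very different scales, so the traces are shown in separate bands. In real applications the ranges will certainly overlap in many cases.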

import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot

trace1 = go.Scatter(
x=[0, 1, 2],
y=[10, 11, 12]
)
trace2 = go.Scatter(
x=[2, 3, 4],
y=[100, 110, 120],
yaxis='y2'
)
trace3 = go.Scatter(
x=[3, 4, 5],
y=[1000, 1100, 1200],
yaxis='y3'
)
data = [trace1, trace2, trace3]
layout = go.Layout(
    yaxis=dict(
        domain=[0, 0.33]  # vertical range of plot 1
    ),
    legend=dict(
        traceorder='reversed'
    ),
    yaxis2=dict(
        domain=[0.33, 0.66]  # vertical range of plot 2
    ),
    yaxis3=dict(
        domain=[0.66, 1]  # vertical range of plot 3
    )
)
fig = go.Figure(data=data, layout=layout)
pyplt(fig)
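
The three domain values above were typed by hand. For a different number of bands they can be computed instead. A minimal sketch, where `stack_domains` is a hypothetical helper and the rounding to three decimals is just my choice for readability:

```python
def stack_domains(n, gap=0.0):
    """Hypothetical helper: split the vertical range [0, 1] into n equal bands."""
    height = (1 - gap * (n - 1)) / n
    layout = {}
    for i in range(n):
        start = i * (height + gap)
        key = "yaxis" if i == 0 else "yaxis%d" % (i + 1)   # yaxis, yaxis2, yaxis3, ...
        layout[key] = {"domain": [round(start, 3), round(start + height, 3)]}
    return layout

for name, axis in stack_domains(3).items():
    print(name, axis["domain"])
```

Each trace still has to name its band with yaxis='y2', yaxis='y3' and so on, exactly as in the example above. Since no xaxis is named, all bands share the one x axis.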

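Subplots

Subplots get their own section here, but two ways of building them have already appeared above.

This section also shows two methods, followed by a more advanced layout technique.

At the end there is also a subplot example in R (unrelated to plotly; I just think it looks nice).

Method 1

make_subplots

[Figure: method 1]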

from plotly import tools
import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot

trace1 = go.Scatter(x=[1, 2, 3],
y=[4, 5, 6],
mode='markers+text+lines',
text=['A', 'B', 'C'],
textposition='bottom center'
)
trace2 = go.Scatter(x=[20, 30, 40],y=[50, 60, 70])
trace3 = go.Scatter(x=[300, 400, 500],
y=[600, 700, 800],
mode='markers+text+lines',
text=['D', 'E', 'F'],
textposition='bottom center'
)
trace4 = go.Scatter(x=[4000, 5000, 6000], y=[7000, 8000, 9000])

fig = tools.make_subplots(rows=2, cols=2, subplot_titles=('Plot 1', 'Plot 2',
'Plot 3', 'Plot 4'))

fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 2)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 2)

fig['layout'].update(height=600, width=1000, title='Multiple Subplots2')

pyplt(fig)

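Method 2

I originally pasted the code first, but partway through I realized I did not really understand anchor, so I came back and found an example to explain it. That is why the code comes after the text.

[Figure: correct version]

The key thing to understand here is anchor. At first I had no idea what this parameter referred to, but once I commented out all the anchor settings it started to make sense.
The parameter names the axis that this axis should be bound to.
With all of them commented out, the figure looks like this:

[Figure: step 1]

This figure shows that the axes drawn correctly by default are x1 y1 x2 y3. So what if we change the code so that every x axis is bound to a y axis? Here is the code: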

xaxis=dict(
domain=[0, 0.45],
anchor = 'y1'
),
xaxis2=dict(
domain=[0.55, 1],
anchor = 'y2'
),
xaxis3=dict(
domain=[0, 0.45],
anchor='y3'
),
xaxis4=dict(
domain=[0.55, 1],
anchor='y4'
),

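This is the result:

[Figure: step 2]

Oddly, the axes are still not displayed correctly.
Think of it this way: the axes in the figure fall into two groups, those that are already in the right place by default and those that are not. So we should bind the axes that are displayed incorrectly, and each one should be bound to its partner axis in the same subplot.
The axes that need binding are y2 x3 x4 y4. What should each be bound to? x2 y3 y4 x4, respectively.
And that is exactly what the code does: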

import plotly as py
import plotly.graph_objs as go


pyplt = py.offline.iplot

trace1 = go.Scatter(
x=[1, 2, 3],
y=[4, 5, 6]
)
trace2 = go.Scatter(
x=[20, 30, 40],
y=[50, 60, 70],
xaxis='x2',
yaxis='y2'
)
trace3 = go.Scatter(
x=[300, 400, 500],
y=[600, 700, 800],
xaxis='x3',
yaxis='y3'
)
trace4 = go.Scatter(
x=[4000, 5000, 6000],
y=[7000, 8000, 9000],
xaxis='x4',
yaxis='y4'
)
data = [trace1, trace2, trace3, trace4]
layout = go.Layout(
xaxis=dict(
domain=[0, 0.45]
),
xaxis2=dict(
domain=[0.55, 1]
),
xaxis3=dict(
domain=[0, 0.45],
anchor='y3'
),
xaxis4=dict(
domain=[0.55, 1],
anchor='y4'
),
yaxis=dict(
domain=[0, 0.45]
),
yaxis2=dict(
domain=[0, 0.45],
anchor='x2'
),
yaxis3=dict(
domain=[0.55, 1]
),
yaxis4=dict(
domain=[0.55, 1],
anchor='x4'
)
)
fig = go.Figure(data=data, layout=layout)
pyplt(fig)
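
An alternative to working out which axes are already right is to bind every axis to its partner explicitly. The sketch below generates such a layout for any grid. `grid_layout` is a hypothetical helper, the 0.1 gap is chosen to reproduce the 0.45 / 0.55 split used above, and cells are numbered from the bottom-left to match the example.

```python
def grid_layout(rows, cols, gap=0.1):
    """Hypothetical helper: one x/y axis pair per cell, each anchored to its partner."""
    w = (1 - gap * (cols - 1)) / cols
    h = (1 - gap * (rows - 1)) / rows
    layout = {}
    for r in range(rows):              # r = 0 is the bottom row
        for c in range(cols):
            n = r * cols + c + 1
            suffix = "" if n == 1 else str(n)   # first pair is plain 'xaxis' / 'yaxis'
            x0, y0 = c * (w + gap), r * (h + gap)
            layout["xaxis" + suffix] = {
                "domain": [round(x0, 3), round(x0 + w, 3)],
                "anchor": "y" + suffix,
            }
            layout["yaxis" + suffix] = {
                "domain": [round(y0, 3), round(y0 + h, 3)],
                "anchor": "x" + suffix,
            }
    return layout

for name, axis in grid_layout(2, 2).items():
    print(name, axis)
```

For a 2 x 2 grid this gives the same domains as the layout above, with anchors on all eight axes instead of only the four that need them.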

Advanced layout

First look at the code below. The layout still uses Method 1, but with an extra specs argument.

fig = tools.make_subplots(rows=2, cols=2, specs=[[{}, {}], 
[{'colspan': 2}, None]],
subplot_titles=('First Subplot','Second Subplot', 'Third Subplot'))

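specs is a list that contains inner lists: there is one inner list per row, and each inner list holds one entry ({} or None) per column.

Below is a more complex example: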

from plotly import tools
import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot

trace1 = go.Scatter(x=[1, 2], y=[1, 2], name='(1,1)')
trace2 = go.Scatter(x=[1, 2], y=[1, 2], name='(1,2)')
trace3 = go.Scatter(x=[1, 2], y=[1, 2], name='(2,1)')
trace4 = go.Scatter(x=[1, 2], y=[1, 2], name='(3,1)')
trace5 = go.Scatter(x=[1, 2], y=[1, 2], name='(5,1)')
trace6 = go.Scatter(x=[1, 2], y=[1, 2], name='(5,2)')

fig = tools.make_subplots(rows=5, cols=2,
specs=[[{}, {'rowspan': 2}],
[{}, None],
[{'rowspan': 2, 'colspan': 2}, None],
[None, None],
[{}, {}]],
print_grid=True)

fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 2)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 3, 1)
fig.append_trace(trace5, 5, 1)
fig.append_trace(trace6, 5, 2)

fig['layout'].update(height=600, width=1000, title='specs examples')
pyplt(fig)

[Figure: advanced layout]

specs=[[{}, {'rowspan': 2}],
[{}, None],
[{'rowspan': 2, 'colspan': 2}, None],
[None, None],
[{}, {}]],

Compare this specs with the figure above: None means no subplot is drawn in that cell.
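
One way to read a specs list is to expand it into a grid and mark which subplot covers each cell. The sketch below does that in plain Python. `occupancy` is a hypothetical helper written for these notes; it imitates the idea behind print_grid but is not plotly code.

```python
def occupancy(specs):
    """Hypothetical helper: label each grid cell with the subplot that covers it."""
    rows, cols = len(specs), len(specs[0])
    grid = [["." for _ in range(cols)] for _ in range(rows)]
    count = 0
    for r, row in enumerate(specs):
        for c, cell in enumerate(row):
            if cell is None:               # None: no subplot starts in this cell
                continue
            count += 1
            for dr in range(cell.get("rowspan", 1)):
                for dc in range(cell.get("colspan", 1)):
                    grid[r + dr][c + dc] = str(count)
    return ["".join(line) for line in grid]

specs = [[{}, {'rowspan': 2}],
         [{}, None],
         [{'rowspan': 2, 'colspan': 2}, None],
         [None, None],
         [{}, {}]]

for line in occupancy(specs):
    print(line)
# 12
# 32
# 44
# 44
# 56
```

Six distinct labels appear, matching the six traces in the example. Every None sits in a cell that a neighboring rowspan or colspan already covers.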

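Subplots with shared axes

This section is hard to explain in words. It is more of a technique you get used to, so I show just one example to give a rough idea.

This is the case where both axes are shared.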

from plotly import tools
import plotly as py
import plotly.graph_objs as go

pyplt = py.offline.iplot

trace1 = go.Scatter(
x=[0, 1, 2],
y=[10, 11, 12]
)
trace2 = go.Scatter(
x=[2, 3, 4],
y=[100, 110, 120],
)
trace3 = go.Scatter(
x=[3, 4, 5],
y=[1000, 1100, 1200],
)
fig = tools.make_subplots(rows=3, cols=1, specs=[[{}], [{}], [{}]], # this specs argument can probably be omitted
shared_xaxes=True, shared_yaxes=True,
vertical_spacing=0.001)
fig.append_trace(trace1, 3, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 1, 1)

fig['layout'].update(height=600, width=600, title='Stacked Subplots with Shared X-Axes')
pyplt(fig)

[Figure: subplots with shared axes]

Customizing subplot axes

[Figure: customizing subplot axes]

These examples are very easy to follow.

from plotly import tools
import plotly as py
import plotly.graph_objs as go


pyplt = py.offline.iplot

trace1 = go.Scatter(x=[1, 2, 3], y=[4, 5, 6])
trace2 = go.Scatter(x=[20, 30, 40], y=[50, 60, 70])
trace3 = go.Scatter(x=[300, 400, 500], y=[600, 700, 800])
trace4 = go.Scatter(x=[4000, 5000, 6000], y=[7000, 8000, 9000])

fig = tools.make_subplots(rows=2, cols=2, subplot_titles=('Plot 1', 'Plot 2',
'Plot 3', 'Plot 4'))
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 2)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 2)

fig['layout']['xaxis1'].update(title='xaxis 1 title')
fig['layout']['xaxis2'].update(title='xaxis 2 title', range=[10, 50]) # x-axis range
fig['layout']['xaxis3'].update(title='xaxis 3 title', showgrid=False) # hide the grid lines
fig['layout']['xaxis4'].update(title='xaxis 4 title', type='log') # logarithmic axis

fig['layout']['yaxis1'].update(title='yaxis 1 title')
fig['layout']['yaxis2'].update(title='yaxis 2 title', range=[40, 80])
fig['layout']['yaxis3'].update(title='yaxis 3 title', showgrid=False)
fig['layout']['yaxis4'].update(title='yaxis 4 title')

fig['layout'].update(title='Customizing Subplot Axes')

pyplt(fig)

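Inset subplots

The method is to use domain to control the range, which shows how important it is to apply what you have already learned.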

import plotly as py
import plotly.graph_objs as go


pyplt = py.offline.iplot

trace1 = go.Scatter(
x=[1, 2, 3],
y=[4, 3, 2]
)
trace2 = go.Scatter(
x=[20, 30, 40],
y=[30, 40, 50],
xaxis='x2',
yaxis='y2'
)
data = [trace1, trace2]
layout = go.Layout(
xaxis2=dict(
domain=[0.6, 0.95],
anchor='y2',
showgrid=False
),
yaxis2=dict(
domain=[0.6, 0.95],
anchor='x2',
showgrid=False
)
)
fig = go.Figure(data=data, layout=layout)
pyplt(fig)
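
Since an inset is nothing more than an axis pair with a small domain, the pattern is easy to wrap up. A minimal sketch, where `inset_axes` is a hypothetical helper and the second inset is added purely for illustration:

```python
def inset_axes(x0, x1, y0, y1, index=2):
    """Hypothetical helper: an axis pair confined to a small rectangle of the figure."""
    for lo, hi in ((x0, x1), (y0, y1)):
        if not 0 <= lo < hi <= 1:
            raise ValueError("domains must satisfy 0 <= start < end <= 1")
    return {
        "xaxis%d" % index: {"domain": [x0, x1], "anchor": "y%d" % index, "showgrid": False},
        "yaxis%d" % index: {"domain": [y0, y1], "anchor": "x%d" % index, "showgrid": False},
    }

layout = inset_axes(0.6, 0.95, 0.6, 0.95)                 # top-right corner, as in the example
layout.update(inset_axes(0.05, 0.4, 0.6, 0.95, index=3))  # a second inset, top-left
print(sorted(layout))
```

A trace is placed in an inset by setting xaxis='x2', yaxis='y2' (or 'x3', 'y3') on it, as trace2 does above. The main plot keeps the default axes, which span the whole figure.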

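[Figure: inset subplot]

Once you see it, everything clicks.

Highlighting a time-series range

This part is really about drawing SVG shapes. I don't think I will need it, so I am skipping the details for now.

[Figure: time-series highlighting]

The highlight is not very bright... and the book does not say how to adjust it or whether other modes exist.
If I ever need this I will go to the official site. From this figure, all you can tell is that the shaded regions are the highlighted ones.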

import plotly as py
import plotly.graph_objs as go
import pandas as pd

pyplt = py.offline.iplot

df = pd.read_csv(r'C:\Users\ASUS\Desktop\python\Chapter03\dat\day01.csv', index_col=['date'], parse_dates=['date'])  # read the data
df.sort_index(inplace=True)  # sort by the index in ascending order (earliest date first)
df = df.iloc[-300:-100]  # take 200 rows of the data

trace0 = go.Scatter(x=df.index, y=df['close'], mode='lines', name='temperature')  # the book's code has mode='line'; changed to 'lines'

data = [trace0]
layout = {
    # Highlight a time interval by drawing a rectangle over it
    'shapes': [
        # First, highlight January 4 to March 6
        {
            'type': 'rect',
            # x is given in data coordinates (relative to the axis)
            'xref': 'x',
            # y is given in paper coordinates (relative to the plot area)
            'yref': 'paper',
            'x0': '2015-01-04',
            'y0': 0,
            'x1': '2015-03-06',
            'y1': 1,
            'fillcolor': '#d3d3d3',
            'opacity': 0.2,
            'line': {
                'width': 0,
            }
        },
        # Second, highlight April 20 to June 22
        {
            'type': 'rect',
            'xref': 'x',
            'yref': 'paper',
            'x0': '2015-04-20',
            'y0': 0,
            'x1': '2015-06-22',
            'y1': 1,
            'fillcolor': '#d3d3d3',
            'opacity': 0.2,
            'line': {
                'width': 0,
            }
        }
    ]
}

fig = {'data': data, 'layout': layout}
pyplt(fig)
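
The highlight is faint because of the light grey fill and the 0.2 opacity, both of which are ordinary keys on the shape. The sketch below wraps the rectangle in a small function so those two values are easy to change. `highlight` is a hypothetical helper, and the orange fill with 0.5 opacity is only an example of a stronger setting.

```python
def highlight(x0, x1, fillcolor="#d3d3d3", opacity=0.2):
    """Hypothetical helper: a full-height rectangle between two x values."""
    return {
        "type": "rect",
        "xref": "x", "yref": "paper",   # x in data units, y as a fraction of the plot height
        "x0": x0, "x1": x1, "y0": 0, "y1": 1,
        "fillcolor": fillcolor,
        "opacity": opacity,
        "layer": "below",               # keep the rectangle behind the line
        "line": {"width": 0},
    }

layout = {"shapes": [
    highlight("2015-01-04", "2015-03-06"),                                   # same look as above
    highlight("2015-04-20", "2015-06-22", fillcolor="orange", opacity=0.5),  # stronger highlight
]}
print(layout["shapes"][1]["fillcolor"], layout["shapes"][1]["opacity"])
```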

Everything inside layout is a dictionary.
That wraps up the book's chapter on advanced plotting.

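ggplot2 plotting example

The source code is on my GitHub; here is the figure:

In R, the plot_ly function can produce all kinds of charts just by changing its type parameter.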

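Summary

This chapter is mainly about drawing subplots, and there are two approaches. One is make_subplots: besides rows and cols you can also set specs to describe complex subplot placement, which takes some study of the examples to understand and to keep rows and columns straight. The other is to use domain to give each plot an explicit range, taking care to understand what anchor means.
Personally I think the combined table-and-chart figure and the subplot techniques deserve the most practice. I really don't work with time-series data much.
You should also have a firm grasp of layout and annotation. Beyond the plotting workflow itself, pay attention in practice to the functions involved, roughly what parameters they take, and what format those parameters use.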