正文

pandas documentation — pandas 1.5.2 documentation (pydata.org)

36 - Introduction to Pandas - Data reading and handling

绘制直方图

python

import pandas as pd
 
df = pd.read_csv('images/grains/grain_measurements.csv')
df['Area'].plot(kind='hist', title='Area', bins=50)

<AxesSubplot:title={'center':'Area'}, ylabel='Frequency'>

创建 DataFrame, 修改 index 和 columns

python

data = [[10, 200, 60],
        [12, 155, 45],
        [9, 50, -45.],
        [16, 240, 90]]
 
df = pd.DataFrame(data, index=[1, 2, 3, 4], columns=['Area', 'Intensity', 'Orientation'])
df

	Area	Intensity	Orientation
1	10	200	60.0
2	12	155	45.0
3	9	50	-45.0
4	16	240	90.0

查看各行缺失信息

python

import pandas as pd
 
df = pd.read_csv('data/manual_vs_auto.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  100 non-null    object 
 1   Image       100 non-null    object 
 2   Manual      94 non-null     float64
 3   Manual2     3 non-null      float64
 4   Auto_th_2   100 non-null    int64  
 5   Auto_th_3   100 non-null    int64  
 6   Auto_th_4   100 non-null    int64  
dtypes: float64(2), int64(3), object(2)
memory usage: 5.6+ KB

查看表格行列数

python

df.shape

(100, 7)

查看整个表格

python

df

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67
...	...	...	...	...	...	...	...
95	Set4	Image96	106.0	NaN	75	112	98
96	Set4	Image97	80.0	NaN	66	80	88
97	Set4	Image98	92.0	NaN	73	93	95
98	Set4	Image99	116.0	NaN	101	115	93
99	Set4	Image100	99.0	NaN	77	106	102

100 rows × 7 columns

查看表格前 7 行

python

df.head(7)

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67
5	Set1	Image6	115.0	NaN	82	124	105
6	Set1	Image7	102.0	NaN	68	103	93

查看表格后 7 行

python

df.tail(7)

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
93	Set4	Image94	81.0	NaN	65	90	70
94	Set4	Image95	NaN	NaN	104	122	88
95	Set4	Image96	106.0	NaN	75	112	98
96	Set4	Image97	80.0	NaN	66	80	88
97	Set4	Image98	92.0	NaN	73	93	95
98	Set4	Image99	116.0	NaN	101	115	93
99	Set4	Image100	99.0	NaN	77	106	102

将某一列改为 index

python

df1 = df.set_index('Image')
df1.head()

	Unnamed: 0	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
Image
Image1	Set1	92.0	93.0	70	87	82
Image2	Set1	87.0	83.0	60	85	83
Image3	Set1	104.0	98.0	74	99	94
Image4	Set1	99.0	NaN	73	101	109
Image5	Set1	89.0	NaN	59	90	67

查看 columns（列）名

python

df1.columns

Index(['Unnamed: 0', 'Manual', 'Manual2', 'Auto_th_2', 'Auto_th_3',
       'Auto_th_4'],
      dtype='object')

去重

python

df['Unnamed: 0'].unique()

array(['Set1', 'Set2', 'Set3', 'Set4'], dtype=object)

修改列名

python

df1 = df.rename(columns={'Unnamed: 0': 'Image_set'})
df1.columns

Index(['Image_set', 'Image', 'Manual', 'Manual2', 'Auto_th_2', 'Auto_th_3',
       'Auto_th_4'],
      dtype='object')

显示数据类型

python

df.dtypes

Unnamed: 0     object
Image          object
Manual        float64
Manual2       float64
Auto_th_2       int64
Auto_th_3       int64
Auto_th_4       int64
dtype: object

显示统计数据

python

df.describe()

	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
count	94.000000	3.000000	100.000000	100.000000	100.000000
mean	100.021277	91.333333	76.370000	97.580000	93.210000
std	11.285140	7.637626	11.971055	12.327337	14.128769
min	80.000000	83.000000	55.000000	71.000000	63.000000
25%	90.250000	88.000000	67.750000	89.500000	83.750000
50%	101.000000	93.000000	74.500000	98.500000	93.000000
75%	108.000000	95.500000	85.000000	106.000000	103.250000
max	120.000000	98.000000	109.000000	124.000000	129.000000

37 - Introduction to Pandas - Data Manipulation

python

import pandas as pd
 
df = pd.read_csv('data/manual_vs_auto.csv')
df.head()

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67

删除某列

python

df1 = df.drop('Manual2', axis=1)
df1.head()

	Unnamed: 0	Image	Manual	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	70	87	82
1	Set1	Image2	87.0	60	85	83
2	Set1	Image3	104.0	74	99	94
3	Set1	Image4	99.0	73	101	109
4	Set1	Image5	89.0	59	90	67

删除多列

python

df2 = df.drop(['Manual2', 'Auto_th_2'], axis=1)
df2.head()

	Unnamed: 0	Image	Manual	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	87	82
1	Set1	Image2	87.0	85	83
2	Set1	Image3	104.0	99	94
3	Set1	Image4	99.0	101	109
4	Set1	Image5	89.0	90	67

添加并填充一列

python

df['Date'] = '2019-06-24'
df.head()

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4	Date
0	Set1	Image1	92.0	93.0	70	87	82	2019-06-24
1	Set1	Image2	87.0	83.0	60	85	83	2019-06-24
2	Set1	Image3	104.0	98.0	74	99	94	2019-06-24
3	Set1	Image4	99.0	NaN	73	101	109	2019-06-24
4	Set1	Image5	89.0	NaN	59	90	67	2019-06-24

python

df.dtypes

Unnamed: 0     object
Image          object
Manual        float64
Manual2       float64
Auto_th_2       int64
Auto_th_3       int64
Auto_th_4       int64
Date           object
dtype: object

将字符串转换成时间

python

df['Date'] = pd.to_datetime('2019-06-24')
df.head()

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4	Date
0	Set1	Image1	92.0	93.0	70	87	82	2019-06-24
1	Set1	Image2	87.0	83.0	60	85	83	2019-06-24
2	Set1	Image3	104.0	98.0	74	99	94	2019-06-24
3	Set1	Image4	99.0	NaN	73	101	109	2019-06-24
4	Set1	Image5	89.0	NaN	59	90	67	2019-06-24

python

df.dtypes

Unnamed: 0            object
Image                 object
Manual               float64
Manual2              float64
Auto_th_2              int64
Auto_th_3              int64
Auto_th_4              int64
Date          datetime64[ns]
dtype: object

保存成 .csv 文件

python

df.to_csv('data/manual_vs_auto_updated.csv')

删除某行

python

df1 = df.drop(df.index[1])
df1.head()

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4	Date
0	Set1	Image1	92.0	93.0	70	87	82	2019-06-24
2	Set1	Image3	104.0	98.0	74	99	94	2019-06-24
3	Set1	Image4	99.0	NaN	73	101	109	2019-06-24
4	Set1	Image5	89.0	NaN	59	90	67	2019-06-24
5	Set1	Image6	115.0	NaN	82	124	105	2019-06-24

删除前 10 行

python

df1 = df.iloc[10:,]
df1.head()

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4	Date
10	Set1	Image11	91.0	NaN	61	87	77	2019-06-24
11	Set1	Image12	119.0	NaN	79	105	111	2019-06-24
12	Set1	Image13	NaN	NaN	65	90	84	2019-06-24
13	Set1	Image14	117.0	NaN	94	115	105	2019-06-24
14	Set1	Image15	91.0	NaN	66	99	70	2019-06-24

选取某些行

python

df1 = df[df['Unnamed: 0'] != 'Set1']
df1.head()

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4	Date
25	Set2	Image26	102.0	NaN	85	103	105	2019-06-24
26	Set2	Image27	93.0	NaN	76	84	98	2019-06-24
27	Set2	Image28	83.0	NaN	62	71	87	2019-06-24
28	Set2	Image29	110.0	NaN	92	117	85	2019-06-24
29	Set2	Image30	89.0	NaN	70	96	81	2019-06-24

38 - Introduction to Pandas - Data Sorting

排序

python

import pandas as pd
 
df = pd.read_csv('data/manual_vs_auto.csv')
df2 = df.sort_values('Manual', ascending=True)  # ascending: 升序

选取某行/列

python

df2[['Manual', 'Auto_th_2']]

	Manual	Auto_th_2
34	80.0	58
96	80.0	66
93	81.0	65
66	81.0	65
44	82.0	67
...	...	...
32	NaN	66
59	NaN	74
79	NaN	69
82	NaN	64
94	NaN	104

100 rows × 2 columns

python

df[20: 30]

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
20	Set1	Image21	89.0	NaN	65	94	86
21	Set1	Image22	88.0	NaN	66	96	83
22	Set1	Image23	106.0	NaN	71	112	105
23	Set1	Image24	107.0	NaN	92	91	111
24	Set1	Image25	108.0	NaN	93	113	115
25	Set2	Image26	102.0	NaN	85	103	105
26	Set2	Image27	93.0	NaN	76	84	98
27	Set2	Image28	83.0	NaN	62	71	87
28	Set2	Image29	110.0	NaN	92	117	85
29	Set2	Image30	89.0	NaN	70	96	81

loc 方法是通过行、列的名称或者标签来寻找我们需要的值。

Pandas 读取某列、某行数据——loc、iloc 用法总结_子木同学的博客-CSDN 博客_pandas iloc

python

df.loc[20: 30, ['Manual', 'Auto_th_2']]

	Manual	Auto_th_2
20	89.0	65
21	88.0	66
22	106.0	71
23	107.0	92
24	108.0	93
25	102.0	85
26	93.0	76
27	83.0	62
28	110.0	92
29	89.0	70
30	115.0	77

python

set2_df = df[df['Unnamed: 0'] == 'Set2']
set2_df.head()

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
25	Set2	Image26	102.0	NaN	85	103	105
26	Set2	Image27	93.0	NaN	76	84	98
27	Set2	Image28	83.0	NaN	62	71	87
28	Set2	Image29	110.0	NaN	92	117	85
29	Set2	Image30	89.0	NaN	70	96	81

选取最大值

python

max(set2_df['Manual'])

120.0

根据条件选取某些值

python

df['Manual'] > 100

0     False
1     False
2      True
3     False
4     False
      ...  
95     True
96    False
97    False
98     True
99    False
Name: Manual, Length: 100, dtype: bool

python

df[df['Manual'] > 100].head()

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
2	Set1	Image3	104.0	98.0	74	99	94
5	Set1	Image6	115.0	NaN	82	124	105
6	Set1	Image7	102.0	NaN	68	103	93
7	Set1	Image8	117.0	NaN	77	122	88
8	Set1	Image9	104.0	NaN	88	99	112

复合条件

python

df[(df['Manual'] > 100) & (df['Auto_th_2'] < 100)].head()

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
2	Set1	Image3	104.0	98.0	74	99	94
5	Set1	Image6	115.0	NaN	82	124	105
6	Set1	Image7	102.0	NaN	68	103	93
7	Set1	Image8	117.0	NaN	77	122	88
8	Set1	Image9	104.0	NaN	88	99	112

遍历某些行/列

python

for index, row in df.iterrows():
    average_auto = (row['Auto_th_2'] + row['Auto_th_3'] + row['Auto_th_4']) / 3
    print(round(average_auto), row['Manual'])

39 - Introduction to Pandas - Grouping Data

python

import pandas as pd
 
df = pd.read_csv('data/manual_vs_auto.csv')
df = df.rename(columns = {'Unnamed: 0': 'Image_set'})
df.head()

	Image_set	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67

python

df = df.drop('Manual2', axis=1)
df.head()

	Image_set	Image	Manual	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	70	87	82
1	Set1	Image2	87.0	60	85	83
2	Set1	Image3	104.0	74	99	94
3	Set1	Image4	99.0	73	101	109
4	Set1	Image5	89.0	59	90	67

以 Image_set 为分组做统计

python

group_by_file = df.groupby(by=['Image_set'])
set_data_count = group_by_file.count()
set_data_avg = group_by_file.mean()

python

set_data_count

	Image	Manual	Auto_th_2	Auto_th_3	Auto_th_4
Image_set
Set1	25	24	25	25	25
Set2	25	24	25	25	25
Set3	25	24	25	25	25
Set4	25	22	25	25	25

python

set_data_avg

	Manual	Auto_th_2	Auto_th_3	Auto_th_4
Image_set
Set1	100.666667	72.84	98.04	92.36
Set2	98.666667	75.40	98.00	93.44
Set3	100.000000	78.48	95.52	94.40
Set4	100.818182	78.76	98.76	92.64

统计关联性（手动与自动阈值）

python

df['Manual'].corr(df['Auto_th_2'])

0.7381233054217538

40 - Introduction to Pandas - Dealing with missing -null- data

python

import pandas as pd
 
df = pd.read_csv('data/manual_vs_auto.csv')
df.head(8)

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67
5	Set1	Image6	115.0	NaN	82	124	105
6	Set1	Image7	102.0	NaN	68	103	93
7	Set1	Image8	117.0	NaN	77	122	88

查看缺失值

python

df.isnull()

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	False	False	False	False	False	False	False
1	False	False	False	False	False	False	False
2	False	False	False	False	False	False	False
3	False	False	False	True	False	False	False
4	False	False	False	True	False	False	False
...	...	...	...	...	...	...	...
95	False	False	False	True	False	False	False
96	False	False	False	True	False	False	False
97	False	False	False	True	False	False	False
98	False	False	False	True	False	False	False
99	False	False	False	True	False	False	False

100 rows × 7 columns

python

df.isnull().sum()

Unnamed: 0     0
Image          0
Manual         6
Manual2       97
Auto_th_2      0
Auto_th_3      0
Auto_th_4      0
dtype: int64

删除缺失值

python

df = df.drop('Manual2', axis=1)
df2 = df.dropna()
df2.head(10)

	Unnamed: 0	Image	Manual	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	70	87	82
1	Set1	Image2	87.0	60	85	83
2	Set1	Image3	104.0	74	99	94
3	Set1	Image4	99.0	73	101	109
4	Set1	Image5	89.0	59	90	67
5	Set1	Image6	115.0	82	124	105
6	Set1	Image7	102.0	68	103	93
7	Set1	Image8	117.0	77	122	88
8	Set1	Image9	104.0	88	99	112
9	Set1	Image10	103.0	69	98	94

python

df = pd.read_csv('data/manual_vs_auto.csv')
df.describe()

	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
count	94.000000	3.000000	100.000000	100.000000	100.000000
mean	100.021277	91.333333	76.370000	97.580000	93.210000
std	11.285140	7.637626	11.971055	12.327337	14.128769
min	80.000000	83.000000	55.000000	71.000000	63.000000
25%	90.250000	88.000000	67.750000	89.500000	83.750000
50%	101.000000	93.000000	74.500000	98.500000	93.000000
75%	108.000000	95.500000	85.000000	106.000000	103.250000
max	120.000000	98.000000	109.000000	124.000000	129.000000

填充缺失值

python

df['Manual'].fillna(100, inplace=True)
df.head(10)

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67
5	Set1	Image6	115.0	NaN	82	124	105
6	Set1	Image7	102.0	NaN	68	103	93
7	Set1	Image8	117.0	NaN	77	122	88
8	Set1	Image9	104.0	NaN	88	99	112
9	Set1	Image10	103.0	NaN	69	98	94

使用平均值填充缺失值

python

import numpy as np
 
df = pd.read_csv('data/manual_vs_auto.csv')
df['Manual'] = df.apply(
    lambda row: (round((row['Auto_th_2'] + row['Auto_th_3'] + row['Auto_th_3']) / 3))  # 平均值
    if np.isnan(row['Manual'])  # 如果是缺失值的话
    else row['Manual'], axis=1)  # 填充在 Manual 列上

python

df.head(10)

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67
5	Set1	Image6	115.0	NaN	82	124	105
6	Set1	Image7	102.0	NaN	68	103	93
7	Set1	Image8	117.0	NaN	77	122	88
8	Set1	Image9	104.0	NaN	88	99	112
9	Set1	Image10	103.0	NaN	69	98	94

41 - Introduction to Pandas - Plotting

python

import pandas as pd
 
df = pd.read_csv('data/manual_vs_auto.csv')
df = df.rename(columns={'Unnamed: 0': 'Image_set'})
df.head()

	Image_set	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67

绘制折线图

python

df['Manual'].plot()

<AxesSubplot:>

绘制直方图

python

# 类型 hist，分组 30，标题 Manual Count，图像大小 12 * 10
df['Manual'].plot(kind='hist', bins=30, title='Manual Count', figsize=(12, 10))

<AxesSubplot:title={'center':'Manual Count'}, ylabel='Frequency'>

详解 pandas 中的 rolling

python

df['Manual'].rolling(3).mean().plot()

<AxesSubplot:>

绘制 box 图，判断平均数与中位数的关系 https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.boxplot.html

python

df['Manual'].describe()

count     94.000000
mean     100.021277
std       11.285140
min       80.000000
25%       90.250000
50%      101.000000
75%      108.000000
max      120.000000
Name: Manual, dtype: float64

python

df['Manual'].plot(kind='box', figsize=(8, 6))

<AxesSubplot:>

散点图

python

df.plot(kind='scatter', x='Manual', y='Auto_th_2', title='Manual vs Auto 2')

<AxesSubplot:title={'center':'Manual vs Auto 2'}, xlabel='Manual', ylabel='Auto_th_2'>

pd.apply(): Pandas 数据处理(五) — apply() 方法介绍！

python

def cell_count(x):
    if x <= 100.0:
        return 'low'
    else:
        return 'high'

python

df['cell_count_index'] = df['Manual'].apply(cell_count)
df.head()

	Image_set	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4	cell_count_index
0	Set1	Image1	92.0	93.0	70	87	82	low
1	Set1	Image2	87.0	83.0	60	85	83	low
2	Set1	Image3	104.0	98.0	74	99	94	high
3	Set1	Image4	99.0	NaN	73	101	109	low
4	Set1	Image5	89.0	NaN	59	90	67	low

python

df.to_csv('data/manual_vs_auto2.csv')

python

df.boxplot(column='Manual', by='cell_count_index')

<AxesSubplot:title={'center':'Manual'}, xlabel='cell_count_index'>

42 - Introduction to Seaborn Plotting in Python

Seaborn

python

import pandas as pd
 
df = pd.read_csv('data/manual_vs_auto.csv')
df['Manual'].fillna(100, inplace=True)
df = df.rename(columns={'Unnamed: 0': 'Image_Set'})

python

import seaborn as sns
 
sns.distplot(df['Manual'])

C:\Users\gzjzx\anaconda3\lib\site-packages\seaborn\distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)



<AxesSubplot:xlabel='Manual', ylabel='Density'>

sns.kdeplot()核密度估计图

python

sns.kdeplot(df['Manual'], shade=True)
sns.kdeplot(df['Auto_th_2'], shade=True)
sns.kdeplot(df['Auto_th_3'], shade=True)
sns.kdeplot(df['Auto_th_4'], shade=True)

<AxesSubplot:xlabel='Manual', ylabel='Density'>

sns.jointplot() 双变量关系图

python

sns.jointplot(x='Manual', y='Auto_th_2', data=df, kind='kde')

<seaborn.axisgrid.JointGrid at 0x212f9ad23d0>

sns.pairplot() 用来展示两两特征之间的关系

python

sns.pairplot(df, x_vars=['Auto_th_2', 'Auto_th_3', 'Auto_th_4'], y_vars='Manual', height=6)

<seaborn.axisgrid.PairGrid at 0x212f9bd0fd0>

sns.lmplot() 展示线性关系

python

sns.lmplot(x='Manual', y='Auto_th_2', data=df, order=1, hue='Image_Set')

<seaborn.axisgrid.FacetGrid at 0x212fa457f70>

python

from scipy import stats
 
slope, intercept, r_value, p_value, std_err = stats.linregress(df['Manual'], df['Auto_th_2'])
slope, intercept, r_value, p_value, std_err

(0.772483189743971,
 -0.8937686381919718,
 0.7058094587729904,
 2.396963973676236e-16,
 0.07831918096230937)

sns.swarmplot() 分簇散点图

python

df = pd.read_csv('data/manual_vs_auto2.csv')
df['Manual'].fillna(100, inplace=True)
df = df.rename(columns={'Unnamed: 0': 'Image_Set'})
 
sns.swarmplot(x='Image_Set', y='Manual', data=df, hue='cell_count_index', dodge=True)

<AxesSubplot:xlabel='Image_Set', ylabel='Manual'>

sns.heatmap() 热图

python

corr = df.loc[:,df.dtypes == 'int64'].corr() #Correlates all int64 columns
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap=sns.diverging_palette(220, 10, as_cmap=True))

<AxesSubplot:>