python - How to plot values for multiple factors from one column across several dates in pandas + matplotlib?
I've got a data frame that looks like this:
import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({
    'animal': {
        Timestamp('2014-11-12 00:00:00'): 'dog', Timestamp('2014-11-13 00:00:00'): 'rabbit',
        Timestamp('2014-11-14 00:00:00'): 'rabbit', Timestamp('2014-11-15 00:00:00'): 'rabbit',
        Timestamp('2014-11-16 00:00:00'): 'rabbit', Timestamp('2014-11-17 00:00:00'): 'rabbit',
        Timestamp('2014-11-18 00:00:00'): 'dog', Timestamp('2014-11-19 00:00:00'): 'rabbit',
        Timestamp('2014-11-20 00:00:00'): 'dog', Timestamp('2014-11-21 00:00:00'): 'dog',
        Timestamp('2014-12-01 00:00:00'): 'rabbit', Timestamp('2014-12-02 00:00:00'): 'dog',
        Timestamp('2014-12-03 00:00:00'): 'dog', Timestamp('2014-12-04 00:00:00'): 'rabbit',
        Timestamp('2014-12-05 00:00:00'): 'rabbit', Timestamp('2014-12-06 00:00:00'): 'dog',
        Timestamp('2014-12-07 00:00:00'): 'dog', Timestamp('2014-12-08 00:00:00'): 'rabbit',
        Timestamp('2014-12-09 00:00:00'): 'rabbit', Timestamp('2014-12-10 00:00:00'): 'rabbit',
        Timestamp('2014-12-11 00:00:00'): 'rabbit', Timestamp('2014-12-12 00:00:00'): 'rabbit',
        Timestamp('2014-12-13 00:00:00'): 'rabbit', Timestamp('2014-12-14 00:00:00'): 'rabbit',
        Timestamp('2014-12-15 00:00:00'): 'dog', Timestamp('2014-12-16 00:00:00'): 'dog',
        Timestamp('2014-12-17 00:00:00'): 'dog', Timestamp('2014-12-18 00:00:00'): 'rabbit',
        Timestamp('2014-12-19 00:00:00'): 'rabbit', Timestamp('2014-12-20 00:00:00'): 'dog'},
    'count': {
        Timestamp('2014-11-12 00:00:00'): 6136, Timestamp('2014-11-13 00:00:00'): 14620,
        Timestamp('2014-11-14 00:00:00'): 16437, Timestamp('2014-11-15 00:00:00'): 17273,
        Timestamp('2014-11-16 00:00:00'): 15302, Timestamp('2014-11-17 00:00:00'): 15180,
        Timestamp('2014-11-18 00:00:00'): 7177, Timestamp('2014-11-19 00:00:00'): 16193,
        Timestamp('2014-11-20 00:00:00'): 8226, Timestamp('2014-11-21 00:00:00'): 9741,
        Timestamp('2014-12-01 00:00:00'): 26237, Timestamp('2014-12-02 00:00:00'): 12146,
        Timestamp('2014-12-03 00:00:00'): 12910, Timestamp('2014-12-04 00:00:00'): 25820,
        Timestamp('2014-12-05 00:00:00'): 29323, Timestamp('2014-12-06 00:00:00'): 17294,
        Timestamp('2014-12-07 00:00:00'): 15219, Timestamp('2014-12-08 00:00:00'): 26174,
        Timestamp('2014-12-09 00:00:00'): 27112, Timestamp('2014-12-10 00:00:00'): 27131,
        Timestamp('2014-12-11 00:00:00'): 28268, Timestamp('2014-12-12 00:00:00'): 34059,
        Timestamp('2014-12-13 00:00:00'): 39162, Timestamp('2014-12-14 00:00:00'): 38314,
        Timestamp('2014-12-15 00:00:00'): 19807, Timestamp('2014-12-16 00:00:00'): 20606,
        Timestamp('2014-12-17 00:00:00'): 21552, Timestamp('2014-12-18 00:00:00'): 36499,
        Timestamp('2014-12-19 00:00:00'): 42163, Timestamp('2014-12-20 00:00:00'): 30301},
    'day': {
        Timestamp('2014-11-12 00:00:00'): 12, Timestamp('2014-11-13 00:00:00'): 13,
        Timestamp('2014-11-14 00:00:00'): 14, Timestamp('2014-11-15 00:00:00'): 15,
        Timestamp('2014-11-16 00:00:00'): 16, Timestamp('2014-11-17 00:00:00'): 17,
        Timestamp('2014-11-18 00:00:00'): 18, Timestamp('2014-11-19 00:00:00'): 19,
        Timestamp('2014-11-20 00:00:00'): 20, Timestamp('2014-11-21 00:00:00'): 21,
        Timestamp('2014-12-01 00:00:00'): 1, Timestamp('2014-12-02 00:00:00'): 2,
        Timestamp('2014-12-03 00:00:00'): 3, Timestamp('2014-12-04 00:00:00'): 4,
        Timestamp('2014-12-05 00:00:00'): 5, Timestamp('2014-12-06 00:00:00'): 6,
        Timestamp('2014-12-07 00:00:00'): 7, Timestamp('2014-12-08 00:00:00'): 8,
        Timestamp('2014-12-09 00:00:00'): 9, Timestamp('2014-12-10 00:00:00'): 10,
        Timestamp('2014-12-11 00:00:00'): 11, Timestamp('2014-12-12 00:00:00'): 12,
        Timestamp('2014-12-13 00:00:00'): 13, Timestamp('2014-12-14 00:00:00'): 14,
        Timestamp('2014-12-15 00:00:00'): 15, Timestamp('2014-12-16 00:00:00'): 16,
        Timestamp('2014-12-17 00:00:00'): 17, Timestamp('2014-12-18 00:00:00'): 18,
        Timestamp('2014-12-19 00:00:00'): 19, Timestamp('2014-12-20 00:00:00'): 20},
    'month': {
        Timestamp('2014-11-12 00:00:00'): 11, Timestamp('2014-11-13 00:00:00'): 11,
        Timestamp('2014-11-14 00:00:00'): 11, Timestamp('2014-11-15 00:00:00'): 11,
        Timestamp('2014-11-16 00:00:00'): 11, Timestamp('2014-11-17 00:00:00'): 11,
        Timestamp('2014-11-18 00:00:00'): 11, Timestamp('2014-11-19 00:00:00'): 11,
        Timestamp('2014-11-20 00:00:00'): 11, Timestamp('2014-11-21 00:00:00'): 11,
        Timestamp('2014-12-01 00:00:00'): 12, Timestamp('2014-12-02 00:00:00'): 12,
        Timestamp('2014-12-03 00:00:00'): 12, Timestamp('2014-12-04 00:00:00'): 12,
        Timestamp('2014-12-05 00:00:00'): 12, Timestamp('2014-12-06 00:00:00'): 12,
        Timestamp('2014-12-07 00:00:00'): 12, Timestamp('2014-12-08 00:00:00'): 12,
        Timestamp('2014-12-09 00:00:00'): 12, Timestamp('2014-12-10 00:00:00'): 12,
        Timestamp('2014-12-11 00:00:00'): 12, Timestamp('2014-12-12 00:00:00'): 12,
        Timestamp('2014-12-13 00:00:00'): 12, Timestamp('2014-12-14 00:00:00'): 12,
        Timestamp('2014-12-15 00:00:00'): 12, Timestamp('2014-12-16 00:00:00'): 12,
        Timestamp('2014-12-17 00:00:00'): 12, Timestamp('2014-12-18 00:00:00'): 12,
        Timestamp('2014-12-19 00:00:00'): 12, Timestamp('2014-12-20 00:00:00'): 12}})
I'm trying to plot a line graph of the counts for each of the two animals across a 7-day period; essentially, I'm aiming for a time series for each animal, displayed on the same graph.
Here's my code:
df['date'] = pd.to_datetime(df['date'], dayfirst=True, infer_datetime_format=True)
df['animal'] = df['animal'].astype('category')
df = df.set_index('date')
grouped = df.groupby('animal')
for key, group in grouped:
    data = group.groupby(lambda x: x.day)
    data['count'].plot(label=key)
plt.legend()
plt.show()
Instead of the counts for both animals being displayed, I get this:
I feel like I'm missing an obvious chunk here, but I can't quite figure it out.
Edit: I couldn't quite grasp how to order by both month and day, so I've appended that data to the data frame.
Make a day column to store the day numbers:
df['day'] = df.index.day
And since you want the days ordered along the x-axis, sort by that column too:
df = df.sort_values(by='day')
Then you can group by animal, and plot each subgroup:
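# Group by a plain column name so each key is a string such as 'dog'
# (recent pandas versions yield 1-tuples when a one-element list is passed).
grouped = df.groupby('animal')
fig, ax = plt.subplots()
for key, group in grouped:
    # Draw every animal's line on the same Axes
    group.plot('day', 'count', label=key, ax=ax)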
Note that group.plot calls DataFrame.plot, which allows you to specify the columns used for the x- and y-axes. In contrast, group['count'].plot calls Series.plot, which assumes the x-axis is the index and the y-axis is the Series's values.
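To make the difference concrete, here is a minimal sketch on hypothetical toy data (the frame `toy` and the names `ax_frame` and `ax_series` are made up for illustration, and the Agg backend is assumed only so that it runs without a display). It draws the same three counts once through DataFrame.plot and once through Series.plot:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, not the asker's: three dates for a single animal.
toy = pd.DataFrame(
    {"count": [10, 20, 30]},
    index=pd.to_datetime(["2014-12-25", "2014-12-26", "2014-12-27"]),
)
toy["day"] = toy.index.day

fig, (ax_frame, ax_series) = plt.subplots(1, 2)

# DataFrame.plot: the x and y data come from the named columns.
toy.plot(x="day", y="count", ax=ax_frame)

# Series.plot: the x data come from the index (dates here), y from the values.
toy["count"].plot(ax=ax_series)

# Inspect what was actually drawn on each Axes.
frame_x = list(ax_frame.get_lines()[0].get_xdata())
series_y = list(ax_series.get_lines()[0].get_ydata())
```

In the left panel the x data are the day numbers taken from the day column; in the right panel the x positions are derived from the date index, so no day column is needed.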
import matplotlib.pyplot as plt
import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({
    'animal': {12: 'dog', 44: 'dog', 47: 'dog', 69: 'rabbit', 76: 'rabbit', 84: 'dog', 122: 'rabbit',
               162: 'rabbit', 177: 'rabbit', 190: 'rabbit', 217: 'dog', 219: 'dog', 220: 'dog', 226: 'rabbit'},
    'count': {12: 34573, 44: 30676, 47: 41821, 69: 56880, 76: 73172, 84: 30581, 122: 52895,
              162: 58430, 177: 57132, 190: 53903, 217: 32001, 219: 35776, 220: 31095, 226: 53809},
    'date': {12: Timestamp('2014-12-29 00:00:00'), 44: Timestamp('2014-12-28 00:00:00'),
             47: Timestamp('2014-12-31 00:00:00'), 69: Timestamp('2014-12-29 00:00:00'),
             76: Timestamp('2014-12-31 00:00:00'), 84: Timestamp('2014-12-26 00:00:00'),
             122: Timestamp('2014-12-25 00:00:00'), 162: Timestamp('2014-12-30 00:00:00'),
             177: Timestamp('2014-12-27 00:00:00'), 190: Timestamp('2014-12-28 00:00:00'),
             217: Timestamp('2014-12-27 00:00:00'), 219: Timestamp('2014-12-30 00:00:00'),
             220: Timestamp('2014-12-25 00:00:00'), 226: Timestamp('2014-12-26 00:00:00')}})
df['animal'] = df['animal'].astype('category')
df = df.set_index('date')
df['day'] = df.index.day        # day of the month, taken from the DatetimeIndex
df = df.sort_values(by='day')   # order the rows along the x-axis

# Group by a plain column name so each key is a string such as 'dog'
grouped = df.groupby('animal')
fig, ax = plt.subplots()
for key, group in grouped:
    # DataFrame.plot: x='day', y='count', one line per animal on the same Axes
    group.plot('day', 'count', label=key, ax=ax)
plt.legend(loc='best')
plt.show()
For the revised question, if you want the entire date along the x-axis, perhaps the easiest approach is to use Series.plot (as you were doing in your original code):
import matplotlib.pyplot as plt
import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({
    'animal': ['dog', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'dog', 'rabbit', 'dog', 'dog',
               'rabbit', 'dog', 'dog', 'rabbit', 'rabbit', 'dog', 'dog', 'rabbit', 'rabbit', 'rabbit',
               'rabbit', 'rabbit', 'rabbit', 'rabbit', 'dog', 'dog', 'dog', 'rabbit', 'rabbit', 'dog'],
    'count': [6136, 14620, 16437, 17273, 15302, 15180, 7177, 16193, 8226, 9741,
              26237, 12146, 12910, 25820, 29323, 17294, 15219, 26174, 27112, 27131,
              28268, 34059, 39162, 38314, 19807, 20606, 21552, 36499, 42163, 30301],
    'date': [Timestamp('2014-11-12 00:00:00'), Timestamp('2014-11-13 00:00:00'),
             Timestamp('2014-11-14 00:00:00'), Timestamp('2014-11-15 00:00:00'),
             Timestamp('2014-11-16 00:00:00'), Timestamp('2014-11-17 00:00:00'),
             Timestamp('2014-11-18 00:00:00'), Timestamp('2014-11-19 00:00:00'),
             Timestamp('2014-11-20 00:00:00'), Timestamp('2014-11-21 00:00:00'),
             Timestamp('2014-12-01 00:00:00'), Timestamp('2014-12-02 00:00:00'),
             Timestamp('2014-12-03 00:00:00'), Timestamp('2014-12-04 00:00:00'),
             Timestamp('2014-12-05 00:00:00'), Timestamp('2014-12-06 00:00:00'),
             Timestamp('2014-12-07 00:00:00'), Timestamp('2014-12-08 00:00:00'),
             Timestamp('2014-12-09 00:00:00'), Timestamp('2014-12-10 00:00:00'),
             Timestamp('2014-12-11 00:00:00'), Timestamp('2014-12-12 00:00:00'),
             Timestamp('2014-12-13 00:00:00'), Timestamp('2014-12-14 00:00:00'),
             Timestamp('2014-12-15 00:00:00'), Timestamp('2014-12-16 00:00:00'),
             Timestamp('2014-12-17 00:00:00'), Timestamp('2014-12-18 00:00:00'),
             Timestamp('2014-12-19 00:00:00'), Timestamp('2014-12-20 00:00:00')]})
df = df.set_index('date')

# Group by a plain column name so each key is a string such as 'dog'
grouped = df.groupby('animal')
fig, ax = plt.subplots()
for key, group in grouped:
    # Series.plot: the DatetimeIndex supplies the x-axis, the counts the y-axis
    group['count'].plot(label=key, ax=ax)
plt.legend(loc='best')
plt.show()
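As an alternative to looping over the groups, the frame can be reshaped so that each animal becomes its own column, after which a single plot call draws one line per animal. This is a different technique from the loop above (a pivot, not a groupby), sketched here on a three-day subset of the December 25 to 27 rows from the first example; the names `long_df` and `wide` are made up for illustration, and the Agg backend is assumed only so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Long format: one row per (date, animal) pair.
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2014-12-25", "2014-12-25", "2014-12-26",
                            "2014-12-26", "2014-12-27", "2014-12-27"]),
    "animal": ["dog", "rabbit", "dog", "rabbit", "dog", "rabbit"],
    "count": [31095, 52895, 30581, 53809, 32001, 57132],
})

# Wide format: dates as the index, one column per animal.
wide = long_df.pivot(index="date", columns="animal", values="count")

fig, ax = plt.subplots()
wide.plot(ax=ax)  # one line per column, labelled with the column name

labels = [line.get_label() for line in ax.get_lines()]
```

This works cleanly when every date has a row for every animal. If a date is missing for one animal, the pivoted column holds NaN there and the line may show a gap, in which case the groupby loop is probably the safer choice.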