python - How do I combine Pandas dataframes by looking at dates in one dataframe that fall within a date range in another dataframe? -
i have 2 dataframes have employee data below. 1 data file has employee data including dates on employees sick, , other data file has dates on employees worked (i.e. presented date ranges). combine 2 files (hopefully in pandas) looking @ "sick day" particular employee falls in "work range". example, in image/data below, employee 1 sick on 11/25/2015, 12/23/2015, , 10/12/2015. these fall in "work ranges" 11/21/2015 - 11/29/2015, 12/21/2015 - 12/29/2015, , 10/9/2015 - 10/17/2015, respectively.
employee work dates data:
╔══════════╦════════════╦════════════╗ ║ employee ║ datein ║ dateout ║ ╠══════════╬════════════╬════════════╣ ║ 1 ║ 11/21/2015 ║ 11/29/2015 ║ ║ 2 ║ 12/9/2015 ║ 12/14/2015 ║ ║ 3 ║ 11/10/2015 ║ 11/19/2015 ║ ║ 4 ║ 11/11/2015 ║ 11/17/2015 ║ ║ 5 ║ 11/30/2015 ║ 12/8/2015 ║ ║ 1 ║ 12/21/2015 ║ 12/29/2015 ║ ║ 2 ║ 1/7/2016 ║ 1/12/2016 ║ ║ 3 ║ 12/10/2015 ║ 12/19/2015 ║ ║ 4 ║ 12/10/2015 ║ 12/16/2015 ║ ║ 5 ║ 12/30/2015 ║ 1/7/2016 ║ ║ 1 ║ 10/9/2015 ║ 10/17/2015 ║ ║ 2 ║ 10/27/2015 ║ 11/1/2015 ║ ║ 3 ║ 9/28/2015 ║ 10/7/2015 ║ ║ 4 ║ 9/29/2015 ║ 10/5/2015 ║ ╚══════════╩════════════╩════════════╝
employee sick dates data:
╔══════════╦════════════╦═══════════╗ ║ employee ║ sickdate ║ sickness ║ ╠══════════╬════════════╬═══════════╣ ║ 1 ║ 11/25/2015 ║ flu ║ ║ 10 ║ 11/21/2015 ║ hd ║ ║ 21 ║ 9/20/2015 ║ other ║ ║ 1 ║ 12/23/2015 ║ other ║ ║ 4 ║ 12/13/2015 ║ vacationx ║ ║ 7 ║ 7/21/2015 ║ cough ║ ║ 3 ║ 10/1/2015 ║ rash ║ ║ 4 ║ 10/5/2015 ║ other ║ ║ 5 ║ 1/7/2016 ║ eyex ║ ║ 2 ║ 12/12/2015 ║ tanx ║ ║ 1 ║ 10/12/2015 ║ fatiguex ║ ╚══════════╩════════════╩═══════════╝
consolidated data:
╔══════════╦════════════╦════════════╦════════════╦═══════════╗ ║ employee ║ datein ║ dateout ║ sickdate ║ sickness ║ ╠══════════╬════════════╬════════════╬════════════╬═══════════╣ ║ 1 ║ 11/21/2015 ║ 11/29/2015 ║ 11/25/2015 ║ flu ║ ║ 2 ║ 12/9/2015 ║ 12/14/2015 ║ 12/12/2015 ║ tanx ║ ║ 3 ║ 11/10/2015 ║ 11/19/2015 ║ ║ ║ ║ 4 ║ 11/11/2015 ║ 11/17/2015 ║ ║ ║ ║ 5 ║ 11/30/2015 ║ 12/8/2015 ║ ║ ║ ║ 1 ║ 12/21/2015 ║ 12/29/2015 ║ 12/23/2015 ║ other ║ ║ 2 ║ 1/7/2016 ║ 1/12/2016 ║ ║ ║ ║ 3 ║ 12/10/2015 ║ 12/19/2015 ║ ║ ║ ║ 4 ║ 12/10/2015 ║ 12/16/2015 ║ 12/13/2015 ║ vacationx ║ ║ 5 ║ 12/30/2015 ║ 1/7/2016 ║ 1/7/2016 ║ eyex ║ ║ 1 ║ 10/9/2015 ║ 10/17/2015 ║ 10/12/2015 ║ fatiguex ║ ║ 2 ║ 10/27/2015 ║ 11/1/2015 ║ ║ ║ ║ 3 ║ 9/28/2015 ║ 10/7/2015 ║ 10/1/2015 ║ rash ║ ║ 4 ║ 9/29/2015 ║ 10/5/2015 ║ 10/5/2015 ║ other ║ ╚══════════╩════════════╩════════════╩════════════╩═══════════╝
how do in pandas or python? (thank help!)
you need put data pd.dataframe( ... )
df1 , set_index('employee')
╔══════════╦════════════╦════════════╗ ║ employee ║ datein ║ dateout ║ ╠══════════╬════════════╬════════════╣ ║ 1 ║ 11/21/2015 ║ 11/29/2015 ║ ║ 2 ║ 12/9/2015 ║ 12/14/2015 ║ ║ 3 ║ 11/10/2015 ║ 11/19/2015 ║ ║ 4 ║ 11/11/2015 ║ 11/17/2015 ║ ║ 5 ║ 11/30/2015 ║ 12/8/2015 ║ ║ 1 ║ 12/21/2015 ║ 12/29/2015 ║ ║ 2 ║ 1/7/2016 ║ 1/12/2016 ║ ║ 3 ║ 12/10/2015 ║ 12/19/2015 ║ ║ 4 ║ 12/10/2015 ║ 12/16/2015 ║ ║ 5 ║ 12/30/2015 ║ 1/7/2016 ║ ║ 1 ║ 10/9/2015 ║ 10/17/2015 ║ ║ 2 ║ 10/27/2015 ║ 11/1/2015 ║ ║ 3 ║ 9/28/2015 ║ 10/7/2015 ║ ║ 4 ║ 9/29/2015 ║ 10/5/2015 ║ ╚══════════╩════════════╩════════════╝
then put data pd.dataframe( ... )
df2 , set_index('employee')
╔══════════╦════════════╦═══════════╗ ║ employee ║ sickdate ║ sickness ║ ╠══════════╬════════════╬═══════════╣ ║ 1 ║ 11/25/2015 ║ flu ║ ║ 10 ║ 11/21/2015 ║ hd ║ ║ 21 ║ 9/20/2015 ║ other ║ ║ 1 ║ 12/23/2015 ║ other ║ ║ 4 ║ 12/13/2015 ║ vacationx ║ ║ 7 ║ 7/21/2015 ║ cough ║ ║ 3 ║ 10/1/2015 ║ rash ║ ║ 4 ║ 10/5/2015 ║ other ║ ║ 5 ║ 1/7/2016 ║ eyex ║ ║ 2 ║ 12/12/2015 ║ tanx ║ ║ 1 ║ 10/12/2015 ║ fatiguex ║ ╚══════════╩════════════╩═══════════╝
finally, df = df1.join(df2).reset_index()
Comments
Post a Comment