It turned out that the best way was to take the year and month from datetime.now() and just replace the day, hour and minute for each tide time data point. Output for
print dtTideTimes[j], tideheights[j]
looks like this:
2015-10-01 09:21:22.975449 11.7
2015-10-01 21:42:22.976826 11.5
2015-10-01 03:48:22.977813 0.6
2015-10-01 16:07:22.978737 0.9
2015-10-02 10:00:22.979654 11.1
2015-10-02 22:23:22.980587 10.6
2015-10-02 04:27:22.981501 1.1
2015-10-02 16:47:22.982419 1.5
2015-10-03 10:37:22.983506 10.2
2015-10-03 23:03:22.984480 9.6
2015-10-03 05:06:22.985411 1.9
2015-10-03 17:27:22.986337 2.4
etc
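The seconds and microseconds in that output (for example 22.975449) are left over from datetime.now(), because replace() only changes the fields it is given. A minimal sketch of building each tide time with those fields zeroed is below. The helper name tide_datetime and its now parameter are hypothetical, added only for illustration, and the code is written to run under Python 2 or 3.

```python
import datetime as dt

def tide_datetime(day, hhmm, now=None):
    """Build a datetime for one tide entry in the current month.

    day  -- day of the month as an int (1-31)
    hhmm -- time string such as '09:21'
    now  -- reference datetime (hypothetical parameter); defaults to datetime.now()
    """
    if now is None:
        now = dt.datetime.now()
    # Keep the year and month from 'now' and replace everything else,
    # including the seconds and microseconds that now() leaves behind.
    return now.replace(day=day,
                       hour=int(hhmm[0:2]),
                       minute=int(hhmm[3:5]),
                       second=0,
                       microsecond=0)

# Fixed reference time so the result is reproducible.
example = tide_datetime(1, '09:21', now=dt.datetime(2015, 10, 15, 18, 30, 22, 975449))
print(example)  # 2015-10-01 09:21:00
```

Passing a fixed now makes the helper easy to check by hand; in the script it could simply be called without that argument.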
The problem now is that this is not sorted in strict time order. Each day is listed as HT, HT, LT, LT (the two high tides, then the two low tides). I created a dictionary thinking it would be easy to sort, but it's not.
I think I have a plan though, to find the two data points in the dictionary nearest to the current time.
import urllib2
from bs4 import BeautifulSoup
from time import sleep
import datetime as dt
#open site and grab html
rawhtml = urllib2.urlopen("http://www.ports.je/Pages/tides.aspx").read(40000)
soup = BeautifulSoup(rawhtml, "html.parser")
#get the tide data (it's all in <td> tags)
rawtidedata = soup.findAll('td')
#parse all data points (date, times, heights) to one big list
#format of the list is [day,tm,ht,tm,ht,tm,lt,tm,lt]
parsedtidedata = [cell.get_text() for cell in rawtidedata]
#extract each class of data (day, time , height) to a separate list (there are 10 data items for each day)
tidetimes=[]
tideheights=[]
tideday=[]
lastdayofmonth=int(parsedtidedata[-10])
for n in range(0,lastdayofmonth*10,10):
    tideday.append(parsedtidedata[n])
    tidetimes.extend([parsedtidedata[n+1],parsedtidedata[n+3],parsedtidedata[n+5],parsedtidedata[n+7]])
    tideheights.extend([parsedtidedata[n+2],parsedtidedata[n+4],parsedtidedata[n+6],parsedtidedata[n+8]])
#get time now:
currentTime = dt.datetime.now()
#create a list of all the tide times as datetime objects:
dtTideTimes=[]
for j in range (0,lastdayofmonth*4):
    #print tidetimes[j][0:2], tidetimes[j][3:6]
    if tidetimes[j]=='**':
        dtTideTimes.append('**')
    else:
        dtTideTimes.append(dt.datetime.now().replace(day=int(j/4+1), hour=int(tidetimes[j][0:2]), minute=int(tidetimes[j][3:5])))
    print dtTideTimes[j], tideheights[j]
#create a dictionary linking dtTideTimes:tideheights
tidedatadict={}
for k in range (0,lastdayofmonth*4):
    tidedatadict[dtTideTimes[k]]=tideheights[k]
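As for the plan of finding the two data points nearest to the current time, one possible approach is to skip the dictionary, sort (time, height) pairs directly, and use the standard bisect module to locate the current time in the sorted list. The sketch below is only an illustration: the function names sort_tides and surrounding_tides are hypothetical, "nearest" is assumed to mean the tide just before and the tide just after the current time, and the sample data is the first day from the output above. It is written to run under Python 2 or 3.

```python
import bisect
import datetime as dt

def sort_tides(times, heights):
    """Pair each time with its height, drop '**' placeholders, sort by time."""
    pairs = [(t, h) for t, h in zip(times, heights) if t != '**']
    return sorted(pairs, key=lambda pair: pair[0])

def surrounding_tides(sorted_pairs, now):
    """Return the (time, height) entries just before and just after 'now'.

    Either value is None if 'now' falls outside the range of the data.
    """
    times = [t for t, _ in sorted_pairs]
    i = bisect.bisect_right(times, now)  # index of the first time later than 'now'
    before = sorted_pairs[i - 1] if i > 0 else None
    after = sorted_pairs[i] if i < len(sorted_pairs) else None
    return before, after

# Sample data in the HT, HT, LT, LT order the script produces,
# with one '**' placeholder added to show that it is skipped.
sample_times = [dt.datetime(2015, 10, 1, 9, 21), dt.datetime(2015, 10, 1, 21, 42),
                dt.datetime(2015, 10, 1, 3, 48), dt.datetime(2015, 10, 1, 16, 7),
                '**']
sample_heights = ['11.7', '11.5', '0.6', '0.9', '**']

tides = sort_tides(sample_times, sample_heights)
before, after = surrounding_tides(tides, dt.datetime(2015, 10, 1, 12, 0))
print("previous: %s %s" % before)  # previous: 2015-10-01 09:21:00 11.7
print("next: %s %s" % after)       # next: 2015-10-01 16:07:00 0.9
```

In the script itself, dtTideTimes and tideheights could be passed to sort_tides in place of the sample lists, and currentTime to surrounding_tides.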