Thursday 15 October 2015

Tide Indicator Pi Project #6 - Converting tide times to datetime format

Crikey... that was tricky.

It ended up that the best way was to take the year and month from datetime.now() and just replace the day, hour and minute for each tide time data point. Output for

print dtTideTimes[j], tideheights[j]


looks like this:

2015-10-01 09:21:22.975449 11.7
2015-10-01 21:42:22.976826 11.5
2015-10-01 03:48:22.977813 0.6
2015-10-01 16:07:22.978737 0.9
2015-10-02 10:00:22.979654 11.1
2015-10-02 22:23:22.980587 10.6
2015-10-02 04:27:22.981501 1.1
2015-10-02 16:47:22.982419 1.5
2015-10-03 10:37:22.983506 10.2
2015-10-03 23:03:22.984480 9.6
2015-10-03 05:06:22.985411 1.9
2015-10-03 17:27:22.986337 2.4
etc

Problem now is that this is not sorted in strict time order. It is in the format HT, HT, LT, LT for each day. I created a dictionary thinking it would be easy to sort, but Python dictionaries aren't kept in any order, so it's not as simple as that.

I think I have a plan though, to find the two data points in the dictionary nearest to the current time.
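
Something like this might do it; just a rough sketch for now (nearest_tides is a name I've made up, and it assumes the tidedatadict and currentTime built in the code below, with the '**' placeholder entries skipped):

import datetime as dt

#rough sketch: find the tide times either side of 'now'
#keys of tidedatadict are datetimes (plus some '**' placeholders), values are heights
def nearest_tides(tidedatadict, now):
 times = sorted(t for t in tidedatadict if t != '**')
 before = [t for t in times if t <= now]
 after = [t for t in times if t > now]
 if not before or not after:
  return None #now falls outside the scraped month
 return before[-1], after[0]

#e.g. lasttide, nexttide = nearest_tides(tidedatadict, dt.datetime.now())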





import urllib2
from bs4 import BeautifulSoup
from time import sleep
import datetime as dt


#open site and grab html

rawhtml = urllib2.urlopen("http://www.ports.je/Pages/tides.aspx").read(40000)
soup = BeautifulSoup(rawhtml, "html.parser")


#get the tide data (it's all in 'td' tags)

rawtidedata = soup.findAll('td')


#parse all data points (date, times, heights) to one big list
#format of the list is [day,tm,ht,tm,ht,tm,lt,tm,lt]

n=0
parsedtidedata=[]
for i in rawtidedata: 
 parsedtidedata.append(rawtidedata[n].get_text())
 n += 1

#extract each class of data (day, time , height) to a separate list (there are 10 data items for each day)

tidetimes=[]
tideheights=[]
tideday=[]
lastdayofmonth=int(parsedtidedata[-10])

for n in range(0,lastdayofmonth*10,10):

 tideday.append(parsedtidedata[n])
 tidetimes.extend([parsedtidedata[n+1],parsedtidedata[n+3],parsedtidedata[n+5],parsedtidedata[n+7]])
 tideheights.extend([parsedtidedata[n+2],parsedtidedata[n+4],parsedtidedata[n+6],parsedtidedata[n+8]])

#get time now:

currentTime = dt.datetime.now()


#create a list of all the tide times as datetime objects:

dtTideTimes=[]

for j in range (0,lastdayofmonth*4):
 #print tidetimes[j][0:2], tidetimes[j][3:6]
 if tidetimes[j]=='**':
  dtTideTimes.append('**')
 else:
  dtTideTimes.append(dt.datetime.now().replace(day=int(j/4+1), hour=int(tidetimes[j][0:2]), minute=int(tidetimes[j][3:5])))
 print dtTideTimes[j], tideheights[j]
 
#create a dictionary linking dtTideTimes:tideheights

tidedatadict={}

for k in range (0,lastdayofmonth*4):
 tidedatadict[dtTideTimes[k]]=tideheights[k]
 
 


Friday 9 October 2015

Son of Thermobot - Mk2 Lives!



It's nearly time for the first Jersey Raspberry Jam and I'm taking along Thermobot. Thermobot Mk1 was taken apart for other projects so it was time to rebuild.

We have 3 HUGE tubs of lego, so it was pretty easy to source enough bits for a much better chassis:


check out the bolted stepper motor!
Then add the pi, thermometer and stepper controller board:



It took several goes and a nasty burning smell before I finally got all the wires in the right places, which is why I drew the above, so it's easier next time.

The code is the same as before, except that I haven't bothered to get it to graph the data this time.

With the new gearing, the robot moves 6 cm per rotation, so 3 cm per °C.
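
That works out as half a rotation per degree of temperature change. A rough sketch of the sums (the 200 steps-per-revolution figure is an assumption about the stepper, not a measurement, and this isn't the actual Thermobot code):

STEPS_PER_REV = 200   #assumed full steps per revolution of the stepper
CM_PER_REV = 6.0      #measured: 6 cm of travel per rotation with the new gearing
CM_PER_DEGREE = 3.0   #target: 3 cm of travel per degree C

def steps_for_temp_change(delta_c):
 #half a rotation per degree C, so 100 steps per degree with a 200-step motor
 revs = (delta_c * CM_PER_DEGREE) / CM_PER_REV
 return int(round(revs * STEPS_PER_REV))

#e.g. steps_for_temp_change(1.5) -> 150 steps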







Monday 5 October 2015

Tide Indicator Pi Project #5 - Parsing a month's worth of tide data into different lists

Today I learned to use 

list.extend([item[1],item[2]])

to split the parsed tide data into different lists. :-)
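
For the record, the difference from append (toy values):

tideheights = []
tideheights.extend(['11.7', '0.6'])  #adds each item separately -> ['11.7', '0.6']
tideheights.append(['11.5', '0.9'])  #adds the whole list as one item -> ['11.7', '0.6', ['11.5', '0.9']]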

full code below.



#A python program to import tide data from a portsofjersey website
#tidenow.py
#It pulls the data in from the tide site, a month at a time
#It looks for the class headers associated with date,time and height information
#and then creates lists of these data


import urllib2
from bs4 import BeautifulSoup
from time import sleep
import datetime as dt


#open site and grab html

rawhtml = urllib2.urlopen("http://www.ports.je/Pages/tides.aspx").read(40000)
soup = BeautifulSoup(rawhtml, "html.parser")


#get the tide data (it's all in 'td' tags)

rawtidedata = soup.findAll('td')


#get just the month and year (it's in the 1st 'h2' tag on the page)

rawmonthyear = soup.findAll('h2')[0].get_text()
print ('Month and Year: ' + rawmonthyear)

#strip the html and parse it all to one big list

n=0
parsedtidedata=[]
for i in rawtidedata: 
   parsedtidedata.append(rawtidedata[n].get_text())
 # print (parsedtidedata[n]) #leave in for debugging for now
   n += 1


#create lists for each class of data

tidetimes=[]
tideheights=[]
tideday=[]


#extract data to each list (there are 10 data items for each day)

lastdayofmonth=int(parsedtidedata[-10])

for n in range(0,lastdayofmonth*10,10):

   tideday.append(parsedtidedata[n])
   tidetimes.extend([parsedtidedata[n+1],parsedtidedata[n+3],parsedtidedata[n+5],parsedtidedata[n+7]])
   tideheights.extend([parsedtidedata[n+2],parsedtidedata[n+4],parsedtidedata[n+6],parsedtidedata[n+8]])

print('data for the 1st of the month')
n=0
print tideday[n]
print tidetimes[n:n+4]
print tideheights[n:n+4] 

Sunday 4 October 2015

Tide Indicator Pi Project #4 - Scraping a month's worth of tide data in one hit

Having realised in my previous post that I needed to move away from daily tide processing to collecting data for a longer period, I chose this site to gather from, as the html code looked easy to scrape.

The code below is a starting point. It collects a month's worth of tide data and parses it into one long list. I like this data better than the last site's because 'empty' data slots are filled with '***' as a useful place-holder. The output is shown below.


#A python program to import tide data from a Ports of Jersey website
#tidenow.py
#It pulls the data in from the tide site, a month at a time

#import tweepy
#import smtplib
import urllib2
#import re
from bs4 import BeautifulSoup
from time import sleep
import datetime as dt


#open site and grab html

rawhtml = urllib2.urlopen("http://www.ports.je/Pages/tides.aspx").read(40000)
soup = BeautifulSoup(rawhtml, "html.parser")


#get the tide data (it's all in 'td' tags)

rawtidedata = soup.findAll('td')


#get just the month and year (it's in the 1st h2 tag on the page)

rawmonthyear = soup.findAll('h2')[0].get_text()
print ('Month and Year: ' + rawmonthyear)

#parse it all to a list
n=0
parsedtidedata=[]
for i in rawtidedata: 
 parsedtidedata.append(rawtidedata[n].get_text())
 print (parsedtidedata[n])
 n += 1


Output:


Thursday 1 October 2015

Tide Indicator Pi Project #3 - Success! and Complete Rethink Required

This is the next step in my plan to collect tide data from the web and use it to make some kind of live tide gauge.

Darn it, I thought I had it:


#tides for #jerseyci today, Thursday 1 October:
03:48 0.6m
09:21 11.7m
16:07 0.9m
21:42 11.5m
data from http://mbcurl.me/13KDW

I've been successfully scraping daily tide data, and posting it on my Pi-hosted site here...

jcwyatt.ddns.net

and tweeting it here...

www.twitter.com/#jerseyci

which was a major goal (code is below). Cron runs this Python program every morning at 5:30am.
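
The crontab entry is just one line, something like this (the path to the script is only an example, not where it actually lives on my Pi):

#m h dom mon dow command
30 5 * * * python /home/pi/tidescrape6.0.py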

However, now that it has come to calculating live tide heights, I've hit a wall when trying to use the data I'm currently scraping.

I've been working on daily data, when what is needed is continuous data over a longer time. Once I have that I think I can calculate live tide height with a rolling algorithm. Tides don't fit in neat daily chunks.
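
I haven't settled on the maths yet, but one common approach (just a sketch of the idea, not necessarily what I'll end up using) is to assume the tide follows a half cosine curve between one turning point and the next:

import math

#sketch: estimate the height at time 'now' between two known tide points
#(t1, h1) and (t2, h2), assuming a half cosine rise/fall between them
def tide_height_now(t1, h1, t2, h2, now):
 frac = (now - t1).total_seconds() / (t2 - t1).total_seconds()
 return h1 + (h2 - h1) * (1 - math.cos(math.pi * frac)) / 2.0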

I'm going back to here: http://www.ports.je/Pages/tides.aspx to scrape a month's worth of data at a time and see how it goes.

The code below took a while and is pretty untidy, but it does what it needs to do, with some nifty string and list handling that I'm quite proud of. 



#A python program to import tide data from a gov.je website
#tidescrape6.0.py - working fine
#It pulls the data in from the gov.je tide site, which is updated daily
#It looks for the class headers associated with date,time and height information
#and then creates a list of these bits of html

#this version (6.0) is called from a crontab entry and tweets at 5:30am every day.

import tweepy
import smtplib
import urllib2
import re
from bs4 import BeautifulSoup
from time import sleep
import datetime as dt



#function to scrape tide data from website
def tidedatascrape():

 #open site
 rawhtml = urllib2.urlopen("http://www.gov.je/Weather/Pages/Tides.aspx").read(20000)

 soup = BeautifulSoup(rawhtml)

 #from http://stackoverflow.com/questions/14257717/python-beautifulsoup-wildcard-attribute-id-search

 #get the dates:
 tidedates = soup.findAll('td', {'class': re.compile('TidesDate.*')} )
 #get the times:
 tidetimes = soup.findAll('td', {'class': re.compile('TidesTime.*')} )
 #get the heights:
 tideheights = soup.findAll('td', {'class': re.compile('TidesHeight.*')} )

 #collect together the data for today

 todaysdate = tidedates[0].get_text()
 print (todaysdate)
 todaystimes = tidetimes[0].get_text()
 print (todaystimes)
 todaysheights = tideheights[0].get_text()
 print (todaysheights)


 #parse the times (always a 5 character string)
 ttime = [0,0,0,0]
 for i in range (0,4):
  ttime[i]=todaystimes[5*i:(5*i+5)]
  print ttime[i]


 #parse the heights (3 or 4 ch string delimited by 'm' e.g 2.5m3.4m etc)
 theight = ['','','','']
 list_index = 0
 for i in todaysheights:
  if i == 'm':
   list_index += 1
  else:
   theight[list_index] = theight[list_index] + i
 print theight[0]



 #create a tweetable string of all the data
 tweetstring = ('#tides for #jerseyci today, ' + todaysdate + ':\n')
 for i in range (0,4):
  tweetstring = tweetstring + (ttime[i] + ' ' + theight[i] + 'm\n')
 tweetstring = tweetstring + 'data from http://mbcurl.me/13KDW'
 print tweetstring
 return tweetstring
 

 #print len(tweetstring) #just to check it is within 140 characters

#function to write to a text file
def writetidestofile(tweetstring):
        with open('/var/www/dailytideoutput.txt','w') as f:
                f.write(str(tweetstring))


#function to tweet it
def tweettidedata(tweetstring):
 CONSUMER_KEY = '0000000000000000000'#keep the quotes, replace this with your consumer key
 CONSUMER_SECRET = '00000000000000000000000000000000000000'#keep the quotes, replace this with your consumer secret key
 ACCESS_KEY = '00000000000000000000000000000000000000'#keep the quotes, replace this with your access token
 ACCESS_SECRET = '00000000000000000000000000000000000000'#keep the quotes, replace this with your access token secret
 auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
 auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
 api = tweepy.API(auth)

 api.update_status(status=tweetstring) #THIS LINE TWEETS! - LEAVE DEACTIVATED UNTIL READY


#email it(commented out for now)
'''
fromaddr = 'jbloggs@gmail.com'
toaddr  = 'j.bloggette@free.sch.uk'

# Credentials (if needed)
username = raw_input('gmail un: ')
password = raw_input('gmail pw: ')

# The actual mail send
server = smtplib.SMTP('smtp.gmail.com:587')
server.ehlo()
server.starttls()
server.login(username,password)
headers = "\r\n".join(["from: " + fromaddr,
                       "subject: " + 'Tides Today',
                       "to: " + toaddr,
                       "mime-version: 1.0",
                       "cont#ent-type: text/html"])

# body_of_email can be plaintext or html!                    
content = headers + "\r\n\r\n" + tweetstring
server.sendmail(fromaddr, toaddr, content)
server.quit()
'''

#main prog
#collect data
tweetstring = tidedatascrape()
#output to file
writetidestofile(tweetstring)
#tweet data
tweettidedata(tweetstring)