Sunday 26 July 2015

Tide Indicator Pi Project #1 - Scraping the tidal data for St Helier

I have a plan to build a live tide indicator. It will pull tide data from the web, interpret it and interpolate between the published high and low waters to work out the current tide height, then output this live to the web and to a physical indicator.
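
For the "work out the current tide height" step, one common approximation is to assume the water follows half a cosine curve between consecutive high and low waters. This is a minimal sketch that assumes I already have the previous and next tide times and heights. The function name `tide_height` and the example values are hypothetical, not real St Helier predictions:

```python
import math
from datetime import datetime


def tide_height(now, prev_time, prev_height, next_time, next_height):
    """Estimate the tide height at `now` between two consecutive extremes.

    Hypothetical helper. Assumes the water follows half a cosine wave from
    the previous high/low water to the next one. This is an approximation
    of the curve, not a tidal prediction in its own right.
    """
    if not prev_time <= now <= next_time:
        raise ValueError("now must lie between the two tide times")
    # Fraction of the way from the previous extreme to the next one (0..1)
    fraction = (now - prev_time).total_seconds() / (next_time - prev_time).total_seconds()
    # Half-cosine weight: 0 at the start, 1 at the end, changing fastest mid-way
    weight = (1 - math.cos(math.pi * fraction)) / 2
    return prev_height + (next_height - prev_height) * weight


# Made-up example: low water 1.5 m at 06:00, high water 10.5 m at 12:00
low = datetime(2015, 7, 26, 6, 0)
high = datetime(2015, 7, 26, 12, 0)
print(round(tide_height(datetime(2015, 7, 26, 9, 0), low, 1.5, high, 10.5), 2))  # → 6.0
```

The same function works for a falling tide: pass the high water as the previous extreme and the low water as the next one.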

This is the data on the UK Hydrographic Office site:

(http://www.ukho.gov.uk/easytide/EasyTide/ShowPrediction.aspx?PortID=1605&PredictionLength=7)

And here's the HTML to scrape:

Tidal information is also available here: http://www.portofjersey.je/Pages/tides.aspx

And here*: http://www.gov.je/Weather/Pages/Tides.aspx

*Of these, the last one looks like it has the neatest, most 'scrapable' formatting, and the current day's tides will always be in the same position on the page.


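To scrape the data, I initially tried this:

# Python 2 program to download tide data from a website
import urllib2

# UKHO EasyTide prediction page for St Helier (PortID=1605), 7 days ahead
url = "http://www.ukho.gov.uk/easytide/EasyTide/ShowPrediction.aspx?PortID=1605&PredictionLength=7"

# Open the page and read at most the first 20000 bytes of raw HTML
rawhtml = urllib2.urlopen(url).read(20000)

print(rawhtml)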

This collected the raw HTML from the site, but extracting the meaningful data from it looked tricky.
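
For reference, in Python 3 urllib2 was split into urllib.request and urllib.error, so the same fetch looks slightly different. A minimal sketch: `fetch_html` is a hypothetical helper name, and decoding as UTF-8 is an assumption about the page's encoding:

```python
from urllib.request import urlopen


def fetch_html(url, max_bytes=20000):
    """Return up to max_bytes of the page at url, decoded to text.

    Hypothetical helper: a Python 3 version of the urllib2 snippet above.
    """
    with urlopen(url) as response:
        raw = response.read(max_bytes)  # read() returns bytes in Python 3
    # Assumes UTF-8; the real page may declare a different charset
    return raw.decode("utf-8", errors="replace")


# Offline check with a data: URL (urlopen handles these since Python 3.4)
print(fetch_html("data:text/html,<p>High%20water</p>"))  # → <p>High water</p>
```

Passing the UKHO URL instead should return the same raw HTML as the Python 2 version, network permitting; the data: URL is only there so the helper can be tried without a connection.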

I googled and found this: http://docs.python-guide.org/en/latest/scenarios/scrape/

Installing the lxml module with pip wasn't easy, but this worked:

sudo apt-get install python-lxml

The next job is to learn how to extract particular bits of data.
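
The guide above uses lxml with XPath queries. As a standard-library alternative (swapped in here so it runs without installing anything), this sketch uses html.parser to pull the text out of every table cell, row by row. The sample markup and its values are invented for illustration and won't match the real UKHO or gov.je pages; `TableCellParser` and `extract_rows` are hypothetical names:

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:   # tolerate a <td> with no enclosing <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def extract_rows(html_text):
    parser = TableCellParser()
    parser.feed(html_text)
    parser.close()
    # Drop rows with no <td> cells, such as a header row made of <th> only
    return [row for row in parser.rows if row]


# Hypothetical markup with made-up values, not real tide predictions
SAMPLE = """<table>
<tr><th>Tide</th><th>Time</th><th>Height (m)</th></tr>
<tr><td>Low</td><td>03:10</td><td>2.1</td></tr>
<tr><td>High</td><td>09:25</td><td>10.4</td></tr>
</table>"""

rows = extract_rows(SAMPLE)
print(rows[0])  # → ['Low', '03:10', '2.1']
```

If today's tides really do sit in the same position on the gov.je page, picking a fixed row such as `rows[0]` could be enough, though that depends on the page's actual structure.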
