This is the data on the UK Hydrographic Office site:
(http://www.ukho.gov.uk/easytide/EasyTide/ShowPrediction.aspx?PortID=1605&PredictionLength=7)
and here's the HTML to scrape:
Tidal info is also available here: http://www.portofjersey.je/Pages/tides.aspx
and here*: http://www.gov.je/Weather/Pages/Tides.aspx
*Of these, the last one actually looks to have the neatest, most 'scrapable' formatting, and the current day's tides will always be at the same position on the page.
To scrape the data I initially tried this:
# Python program to import tide data from a website (Python 2: urllib2)
import urllib2
# Open the EasyTide page and read at most the first 20000 bytes of its HTML
rawhtml = urllib2.urlopen("http://www.ukho.gov.uk/easytide/EasyTide/ShowPrediction.aspx?PortID=1605&PredictionLength=7").read(20000)
print(rawhtml)
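urllib2 only exists in Python 2; on Python 3 the same urlopen call lives in urllib.request, and read() returns bytes rather than text. Below is a rough Python 3 equivalent of the script above, wrapped in a small function so it can be reused. The name fetch_html, the 30-second timeout and the fall-back to UTF-8 when the server declares no charset are my own assumptions, not anything the UKHO site asks for.

```python
from urllib.request import urlopen

# Same UKHO EasyTide page as the script above (PortID=1605, 7-day prediction)
TIDE_URL = ("http://www.ukho.gov.uk/easytide/EasyTide/ShowPrediction.aspx"
            "?PortID=1605&PredictionLength=7")


def fetch_html(url, max_bytes=20000):
    """Download at most max_bytes of a page and return it as a str.

    fetch_html is a hypothetical helper name; the timeout and the UTF-8
    fallback below are assumptions, not requirements of the site.
    """
    with urlopen(url, timeout=30) as response:
        raw = response.read(max_bytes)
        # Use the charset the server declares, otherwise assume UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Calling print(fetch_html(TIDE_URL)) should print the same HTML as the original script, but as text rather than a bytes object.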
This collected the raw HTML from the site, but extracting the meaningful data from it looked tricky. I googled and found this guide: http://docs.python-guide.org/en/latest/scenarios/scrape/
But installing the lxml module with pip wasn't too easy, whereas this worked:
sudo apt-get install python-lxml
Next job is to learn how to extract particular bits of data.
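The guide linked above does its extraction with lxml and XPath. As a first sketch of pulling particular bits of data out of a page, the example below swaps in Python 3's standard-library html.parser instead, so it runs even without lxml installed; it simply collects the text of each table cell, row by row. The table_rows helper and the SAMPLE markup are hypothetical: the markup is made up for illustration and is not the real structure of the UKHO, Port of Jersey or gov.je pages, so the real page's HTML would need checking first.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self.rows:  # ignore stray cells outside any <tr>
                # Join the fragments and collapse runs of whitespace
                self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html):
    """Hypothetical helper: return the non-empty table rows found in html."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if row]


# Hypothetical markup for illustration only, NOT copied from any tide site
SAMPLE = """
<table>
  <tr><th>Tide</th><th>Time</th><th>Height (m)</th></tr>
  <tr><td>High</td><td>06:12</td><td>10.4</td></tr>
  <tr><td>Low</td><td>12:31</td><td>1.9</td></tr>
</table>
"""
```

With this sample, table_rows(SAMPLE)[1] gives ['High', '06:12', '10.4']. If the gov.je page really does keep the current day's tides in the same place, picking a fixed row index like this might be enough, though it would break quietly if the page layout ever changed.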