Sunday, 9 August 2015

Tide Indicator Pi Project #2 - Scraping the tidal data for St Helier

I have working code that extracts the tide data from the page and makes lists from it (with the HTML tags still attached for now):

#A python program to import tide data from a gov.je website
#tidescrape1.0.py - working fine
#It pulls the data in from the gov.je tide site, which is updated daily
#It looks for the td class names associated with date, time and height information
#and then creates a list of these bits of html

#next step - try to extract just the data from current day and tweet it.

import urllib2
import re
from bs4 import BeautifulSoup


#open site
rawhtml = urllib2.urlopen("http://www.gov.je/Weather/Pages/Tides.aspx").read(20000)

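#name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(rawhtml, "html.parser")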

#from http://stackoverflow.com/questions/14257717/python-beautifulsoup-wildcard-attribute-id-search
#get the dates:
tidedates = soup.findAll('td', {'class': re.compile('TidesDate.*')} )

print (tidedates[0])

#get the times:
tidetimes = soup.findAll('td', {'class': re.compile('TidesTime.*')} )

print (tidetimes[0])

#get the heights:
tideheights = soup.findAll('td', {'class': re.compile('TidesHeight.*')} )

print (tideheights[0])

Output looks like this:

<td class="TidesDate Weekend">Sunday 9 August</td>
<td class="TidesTime Weekend"><span style="color:#cc0000;">01:57</span><br/>08:42<br/>14:33<br/>21:27<br/></td>
<td class="TidesHeight Weekend"><span style="color:#cc0000;">8.5m</span><br/>3.6m<br/>8.4m<br/>3.7m<br/></td>



The next step is to strip the HTML tags away and keep just the text.
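One possible way to do that, as a rough sketch rather than finished code: turn each cell into a string, split it on the <br/> tags, and remove whatever tags are left with a regular expression. The function name cell_values is made up for this example, and the sample strings are copied from the output above so the snippet runs without fetching the page. With the real script you would pass in str(tidetimes[0]) and so on.

```python
import re

def cell_values(cell_html):
    """Return the text pieces of one table cell, one per <br/>-separated entry."""
    # Split on <br>, <br/> or <br /> so each tide entry becomes its own piece
    pieces = re.split(r'<br\s*/?>', str(cell_html))
    # Remove any remaining tags (<td>, <span>, ...) and surrounding whitespace
    cleaned = [re.sub(r'<[^>]+>', '', piece).strip() for piece in pieces]
    # Drop the empty piece left behind after the final <br/>
    return [text for text in cleaned if text]

# Sample cells copied from the output above (assumed to be typical of the page)
date_cell = '<td class="TidesDate Weekend">Sunday 9 August</td>'
time_cell = ('<td class="TidesTime Weekend"><span style="color:#cc0000;">01:57</span>'
             '<br/>08:42<br/>14:33<br/>21:27<br/></td>')
height_cell = ('<td class="TidesHeight Weekend"><span style="color:#cc0000;">8.5m</span>'
               '<br/>3.6m<br/>8.4m<br/>3.7m<br/></td>')

day = cell_values(date_cell)[0]
times = cell_values(time_cell)
heights = cell_values(height_cell)

print(day)
# Pair each time with the height listed in the same position
for tide_time, tide_height in zip(times, heights):
    print(tide_time + " " + tide_height)
```

This only uses the re module that the script already imports. BeautifulSoup also has its own ways of pulling text out of a tag (get_text and stripped_strings), which may turn out to be tidier than a regular expression.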
