Sunday, 9 August 2015

Tide Indicator Pi Project #2 - Scraping the tidal data for St Helier

I now have working code that extracts the tide data and makes lists from it (with the HTML tags still attached for now):

# A Python program to import tide data from a website - working fine
# It pulls the data in from the tide site, which is updated daily
# It looks for the class names associated with date, time and height information
# and then creates a list of these bits of HTML

# Next step - try to extract just the data for the current day and tweet it.

import urllib2
import re
from bs4 import BeautifulSoup

#open site
rawhtml = urllib2.urlopen("").read(20000)

#name the parser explicitly so bs4 does not have to guess (and warn about it)
soup = BeautifulSoup(rawhtml, "html.parser")

#get the dates:
tidedates = soup.findAll('td', {'class': re.compile('TidesDate.*')} )

print (tidedates[0])

#get the times:
tidetimes = soup.findAll('td', {'class': re.compile('TidesTime.*')} )

print (tidetimes[0])

#get the heights:
tideheights = soup.findAll('td', {'class': re.compile('TidesHeight.*')} )

print (tideheights[0])

Output looks like this:

<td class="TidesDate Weekend">Sunday 9 August</td>
<td class="TidesTime Weekend"><span style="color:#cc0000;">01:57</span><br/>08:42<br/>14:33<br/>21:27<br/></td>
<td class="TidesHeight Weekend"><span style="color:#cc0000;">8.5m</span><br/>3.6m<br/>8.4m<br/>3.7m<br/></td>

Next step is to somehow strip off the HTML tags and keep just the text.
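
One possible way to do that, as a rough sketch rather than the finished code: BeautifulSoup can hand back the text inside a tag through get_text() and stripped_strings, so the tags never have to be removed by hand. The example below works on the three sample cells from the output above instead of the live site. The names sample_html, date_text, times, heights and tides are made up for illustration, the surrounding table is only there so the sample parses as a fragment, and it assumes every height ends in "m".

```python
import re
from bs4 import BeautifulSoup

# Sample cells copied from the output above (not fetched from the live site).
# The <table><tr> wrapper is an assumption added only to hold the cells together.
sample_html = """
<table><tr>
<td class="TidesDate Weekend">Sunday 9 August</td>
<td class="TidesTime Weekend"><span style="color:#cc0000;">01:57</span><br/>08:42<br/>14:33<br/>21:27<br/></td>
<td class="TidesHeight Weekend"><span style="color:#cc0000;">8.5m</span><br/>3.6m<br/>8.4m<br/>3.7m<br/></td>
</tr></table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Find the first cell of each kind, matching on the start of the class name.
date_cell = soup.find("td", class_=re.compile("^TidesDate"))
time_cell = soup.find("td", class_=re.compile("^TidesTime"))
height_cell = soup.find("td", class_=re.compile("^TidesHeight"))

# get_text() returns all the text in the cell as a single string.
date_text = date_cell.get_text(strip=True)

# stripped_strings yields each piece of text separately, skipping the
# <span> and <br/> tags, so every time ends up as its own list item.
times = list(time_cell.stripped_strings)

# Heights look like "8.5m"; drop the trailing "m" and convert to a number.
heights = [float(h.rstrip("m")) for h in height_cell.stripped_strings]

# Pair each time with its height.
tides = list(zip(times, heights))

print(date_text)
for tide_time, tide_height in tides:
    print("%s  %.1fm" % (tide_time, tide_height))
```

The same three calls should work on tidedates[0], tidetimes[0] and tideheights[0] from the script above, since those are the same kind of tag object.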