Sunday, 4 October 2015

Tide Indicator Pi Project #4 - Scraping a month's worth of tide data in one hit

Having realised in my previous post that I needed to move away from daily tide processing to collecting data for a longer period, I chose the Ports of Jersey tides page to gather from (the URL is in the code below), as the HTML looked easy to scrape.

The code below is a starting point. It collects a month's worth of tide data and parses it into one long list. I like this data better than the last site's because 'empty' data slots are filled with '***' as a useful placeholder. The code and its output are shown below, and after the listing there's also a rough sketch of one way those '***' slots might be tidied up.


#A python program to import tide data from a Ports of Jersey website
#tidenow.py
#It pulls the data in from the tide site, a month at a time

#import tweepy
#import smtplib
import urllib2
#import re
from bs4 import BeautifulSoup
from time import sleep
import datetime as dt


#open site and grab html
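#(the read(40000) call just caps the download at the first 40000 bytes of the page)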

rawhtml = urllib2.urlopen("http://www.ports.je/Pages/tides.aspx").read(40000)
soup = BeautifulSoup(rawhtml, "html.parser")


#get the tide data (it's all in 'td' tags)

rawtidedata = soup.findAll('td')


#get just the month and year (it's in the 1st h2 tag on the page)

rawmonthyear = soup.findAll('h2')[0].get_text()
print ('Month and Year: ', rawmonthyear)

#parse it all to a list
parsedtidedata = []
for cell in rawtidedata:
    parsedtidedata.append(cell.get_text())
    print (parsedtidedata[-1])

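Once the flat list is built, it could be tidied up before anything else uses it. The sketch below is just that, a sketch: it swaps the '***' placeholders for None and chops the list into fixed-length rows. The tidy() helper and the CELLS_PER_ROW value are my own assumptions (I haven't counted the cells per day on the Ports of Jersey table), so they'd need checking against the real page before being trusted.


#tidysketch - a rough, untested follow-on to tidenow.py above
#CELLS_PER_ROW is an assumed value: count the 'td' cells per day on the
#real page and adjust it before relying on the grouping.

CELLS_PER_ROW = 13   #hypothetical row width - check against the site

def tidy(cells, width=CELLS_PER_ROW):
    #swap the '***' place-holders for None and strip stray whitespace
    cleaned = [None if c.strip() == '***' else c.strip() for c in cells]
    #chop the flat list into one sub-list per row of the table
    return [cleaned[i:i + width] for i in range(0, len(cleaned), width)]

rows = tidy(parsedtidedata)
for row in rows:
    print (row)


Grouping the cells into rows like this should make it easier to pull out a single day's tides later in the project.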

Output:

