The code below is a starting point. It collects a month's worth of tide data and parses it into one long list. I like this data better than the last site's because empty data slots are filled with '***' as a useful place-holder. The output is shown below.
#A Python program to import tide data from a Ports of Jersey website
#tidenow.py
#It pulls the data in from the tide site, a month at a time
#import tweepy
#import smtplib
import urllib2
#import re
from bs4 import BeautifulSoup
from time import sleep
import datetime as dt
#open site and grab html
rawhtml = urllib2.urlopen("http://www.ports.je/Pages/tides.aspx").read(40000)
soup = BeautifulSoup(rawhtml, "html.parser")
#get the tide data (it's all in 'td' tags)
rawtidedata = soup.findAll('td')
#get just the month and year (it's in the 1st h2 tag on the page)
rawmonthyear = soup.findAll('h2')[0].get_text()
print('Month and Year: ' + rawmonthyear)
#parse it all to a list
#loop over the cells directly - no manual counter needed
parsedtidedata = []
for cell in rawtidedata:
    text = cell.get_text()
    parsedtidedata.append(text)
    print(text)
Output:
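The script imports datetime as dt but never uses it; one natural use is turning the page's month-and-year heading into a real date object. Here's a minimal sketch, assuming the h2 text looks like "March 2016" (the exact wording on the live page may differ):

```python
import datetime as dt

# Hypothetical heading text as scraped from the first h2 tag
# (assumed format: full month name, then four-digit year)
rawmonthyear = "March 2016"

# %B matches a full month name, %Y a four-digit year
monthstart = dt.datetime.strptime(rawmonthyear.strip(), "%B %Y")

print(monthstart.year, monthstart.month)
```

With a proper date in hand, day-by-day calculations (like matching tide rows to dates) become straightforward.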
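Because empty slots arrive as '***', a natural next step is to strip those place-holders out and regroup the one long list into per-row records. Here's a minimal sketch using made-up sample data; the real number of columns per table row would need checking against the page:

```python
# Made-up sample of parsed cell text; '***' marks an empty slot
flat = ['04:12', '9.8', '***', '16:40', '10.1', '***', '***', '23:05']

# Drop the '***' place-holders to keep only real readings
readings = [cell for cell in flat if cell != '***']

# Regroup the flat list into rows of a fixed width
# (4 columns here is an assumption, not the page's actual layout)
cols = 4
rows = [flat[i:i + cols] for i in range(0, len(flat), cols)]
```

Keeping the '***' entries when chunking into rows is deliberate: they preserve the column alignment, so each slot still lines up with the right tide even when a reading is missing.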