I wish to capture data from the following page (HTML source shown below):
<html>
<head>
<title>Text Only Market Page</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body bgcolor="#FFFFFF">
<br><br>
<pre>
<b><font size=6>
Kitco Inc.
Text Only Market Page</font></b>
<a href="http://www.kitco.com/market/">Graphic version of this page</a>
<a href="http://www.kitco.com/market/LFrate.html">Precious Metals Lease Rates</a>
<a href="http://www.kitco.com/gold.londonfix.html">Historical Price Data</a>
<a href="http://www.kitco.com/market/marketnews.html">Precious Metals News Headlines</a>
<font size=4><b><a href="https://online.kitco.com/bullion/completelist_USD.html#gold">Buy gold and silver online direct from Kitco!</a>
Live quotes for all bullion products.</b></font>
--------------------------------------------------------------------------------
London Fix GOLD SILVER PLATINUM PALLADIUM
AM PM AM PM AM PM
--------------------------------------------------------------------------------
Jun 19,2012 1628.50 1625.50 28.8100 1486.00 1486.00 629.00 634.00
Jun 18,2012 1623.50 1615.50 28.4300 1486.00 1484.00 626.00 628.00
--------------------------------------------------------------------------------
New York Spot Price
MARKET IS OPEN
Will close in 4 hour 25 minutes
----------------------------------------------------------------------
Metals Bid Ask Change Low High
----------------------------------------------------------------------
Gold 1619.80 1620.80 -8.90 -0.55% 1616.60 1632.70
Silver 28.46 28.56 -0.28 -0.97% 28.24 28.95
Platinum 1479.00 1489.00 0.00 0.00% 1476.00 1500.00
Palladium 627.00 632.00 0.00 0.00% 622.00 639.00
----------------------------------------------------------------------
Last Update on Jun 19, 2012 at 12:50.59
----------------------------------------------------------------------
Asia / Europe Spot Price
MARKET IS OPEN
Will close in 4 hours 25 minutes
----------------------------------------------------------------------
Metals Bid Ask Change from NY close
----------------------------------------------------------------------
Gold 1619.80 1620.80 -8.90 -0.55%
Silver 28.46 28.56 -0.28 -0.97%
Platinum 1479.00 1489.00 +0.00 +0.00%
Palladium 627.00 632.00 +0.00 +0.00%
----------------------------------------------------------------------
Last Update on Jun 19, 2012 at 12:50.59
----------------------------------------------------------------------
<b> File created on Tue Jun 19 12:51:04 2012</b>
<style type="text/css"><!--
#main_container_footer {width:100%;text-align: center;}
#main_container_footer #footer_container {width:auto; margin:25px auto 25px auto;}
#main_container_footer #footer_container ul {margin:0; padding:0;}
#main_container_footer #footer_container ul li {float:left; display:inline; list-style:none; padding:0 8px; font-family:Verdana, Arial, Helvetica, sans-serif; font-size:12px; color:#000; border-right:1px #000 solid;}
#main_container_footer #footer_container ul li a {font-family:Verdana, Arial, Helvetica, sans-serif; font-size:12px; color:#000; text-decoration:underline; font-weight:normal;}
#main_container_footer #footer_container ul li a:hover {color:#ac1a2f; text-decoration:none; font-weight:normal;}
#main_container_footer #footer_container ul li.no_border {border:0px;}
--></style>
<table border="0" cellspacing="0" cellpadding="0"><tr><td>
<div id="main_container_footer">
<div id="footer_container">
<ul>
<li class="no_border"><script type="text/javascript">
copyright=new Date();
update=copyright.getFullYear();
document.write("© "+ update + " Kitco Metals Inc.");
</script></li>
<li><a href="https://corp.kitco.com/index.html">About Us</a></li>
<li><a href="http://www.kitco.com/TermsofUse/" target="_top" onclick="Window_open(this.href,'KITCO','top=120,left=250,width=500,height=350'); return false">Website Terms of Use</a></li>
<li><a href="https://online.kitco.com/help/privacy_policy.html" target="_top" onclick="Window_open(this.href,'KITCO','top=120,left=250,width=500,height=350'); return false">Privacy Policy</a></li>
<li><a href="http://www.kitco.com/ads/">Advertise With Us</a></li>
<li><a href="https://corp.kitco.com/en/corporate_culture.html">Careers</a></li>
<li><a href="https://corp.kitco.com/en/contact.html" target="_top" onclick="Window_open(this.href,'KITCO','top=120,left=250,width=500,height=350'); return false">Contact Us</a></li>
<li class="no_border"><a href="https://corp.kitco.com/en/feedback.html" target="_top" onclick="Window_open(this.href,'KITCO','top=120,left=250,width=500,height=350'); return false">Feedback</a></li>
</ul>
</div>
</div>
</td></tr></table><br /><br />
<script language="JavaScript" type="text/javascript">
<!--
function Window_open (Address) {
NewWindow = window.open(Address, "Popup", "width=695,height=600,left=100,top=200,resizable=yes,scrollbars=yes");
NewWindow.focus();
}
// -->
</script>
<!-- img src="http://www.kitco.com/scripts/counter/counter.pl?txtonlyE.txt" width="1" height="1" -->
<!-- Google-Analytics Code-->
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-4074364-3']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</body>
</html>
More specifically, I am looking to capture the following data:
--------------------------------------------------------------------------------
London Fix GOLD SILVER PLATINUM PALLADIUM
AM PM AM PM AM PM
--------------------------------------------------------------------------------
Jun 19,2012 1628.50 NA 28.8100 1486.00 1486.00 629.00 634.00
Jun 18,2012 1623.50 1615.50 28.4300 1486.00 1484.00 626.00 628.00
--------------------------------------------------------------------------------
Does anybody have suggestions for how I can do this using PHP?
1 Answer
Quick and dirty regex method:
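// Fetch the text-only market page.
$data = file_get_contents('http://www.kitco.com/texten/texten.html');
// Match a date such as "Jun 19,2012" followed by one to seven price fields (digits, dots, or "NA").
preg_match_all('/([A-Z]{3,5}\s+[0-9]{1,2},[0-9]{4}\s+([0-9.NA]{2,10}\s+){1,7})/i', $data, $result);
$records = array();
foreach ($result[1] as $row) {
    // PREG_SPLIT_NO_EMPTY drops the empty field left by the trailing whitespace.
    $temp = preg_split('/\s+/', $row, -1, PREG_SPLIT_NO_EMPTY);
    // Rebuild the date ("Jun 19,2012") to use as the array key.
    $index = array_shift($temp) . ' ' . array_shift($temp);
    // Store the remaining price fields as a comma-separated string.
    $records[$index] = implode(',', $temp);
}
print_r($records);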
Note: you'd probably want to add some validation, such as checking that the download succeeded and that each row has the expected number of fields.
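As a rough illustration of what such validation might look like, here is a minimal sketch. The function name `parse_london_fix` and the column names are hypothetical; the column order (gold AM/PM, silver, platinum AM/PM, palladium AM/PM) is an assumption read off the table header, not something the page states explicitly. It works on a string, so it can be tried on a saved copy of the page without any network access.

```php
<?php
// Hypothetical helper (not part of the original answer): parse London Fix
// rows from the page text and validate each one.
function parse_london_fix(string $text): array
{
    // Assumed column order, inferred from the table header.
    $columns = ['gold_am', 'gold_pm', 'silver',
                'platinum_am', 'platinum_pm', 'palladium_am', 'palladium_pm'];
    $records = [];

    // A row starts its line with a date like "Jun 19,2012", then price fields.
    $pattern = '/^[ \t]*([A-Za-z]{3}[ \t]+\d{1,2},\d{4})[ \t]+(.+?)[ \t\r]*$/m';
    preg_match_all($pattern, $text, $rows, PREG_SET_ORDER);

    foreach ($rows as $row) {
        $fields = preg_split('/\s+/', $row[2], -1, PREG_SPLIT_NO_EMPTY);

        // Validation 1: a row must have exactly seven fields.
        if (count($fields) !== count($columns)) {
            continue;
        }

        $values = [];
        foreach ($fields as $field) {
            if (strtoupper($field) === 'NA') {
                $values[] = null;            // fix not published yet
            } elseif (is_numeric($field)) {
                $values[] = (float) $field;
            } else {
                continue 2;                  // Validation 2: reject rows with unexpected tokens
            }
        }

        // Collapse repeated spaces in the date so the key is predictable.
        $date = preg_replace('/\s+/', ' ', $row[1]);
        $records[$date] = array_combine($columns, $values);
    }

    return $records;
}
```

To use it with the live page, you would pass in the result of `file_get_contents()` after checking that it did not return `false`. Rows that fail either check are silently skipped here; logging them instead may be more useful if the page layout changes.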
Source: http://stackoverflow.com/questions/11103001/web-data-extraction-scraping-data-from-kitco-inc-text-only-market-page