Thursday, 11 September 2014

Web Data Extraction / Scraping Data from Kitco Inc. Text Only Market Page

I wish to capture data from

<html>
<head>
<title>Text Only Market Page</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body bgcolor="#FFFFFF">
<br><br>
<pre>
<b><font size=6>
  Kitco Inc.

  Text Only Market Page</font></b>

    <a href="http://www.kitco.com/market/">Graphic version of this page</a>

    <a href="http://www.kitco.com/market/LFrate.html">Precious Metals Lease Rates</a> 
    <a href="http://www.kitco.com/gold.londonfix.html">Historical Price Data</a> 
    <a href="http://www.kitco.com/market/marketnews.html">Precious Metals News Headlines</a>

    <font size=4><b><a href="https://online.kitco.com/bullion/completelist_USD.html#gold">Buy gold and silver online direct from Kitco!</a>
   Live quotes for all bullion products.</b></font>


   --------------------------------------------------------------------------------
   London Fix          GOLD          SILVER       PLATINUM           PALLADIUM
                   AM       PM                  AM       PM         AM       PM
   --------------------------------------------------------------------------------
   Jun 19,2012   1628.50   1625.50   28.8100   1486.00   1486.00   629.00   634.00 
   Jun 18,2012   1623.50   1615.50   28.4300   1486.00   1484.00   626.00   628.00 
   --------------------------------------------------------------------------------


                  New York Spot Price
                MARKET IS OPEN
            Will close in 4 hour 25 minutes
   ----------------------------------------------------------------------
   Metals          Bid        Ask           Change        Low       High
   ----------------------------------------------------------------------
   Gold         1619.80     1620.80     -8.90  -0.55%    1616.60  1632.70
   Silver         28.46       28.56     -0.28  -0.97%      28.24    28.95
   Platinum     1479.00     1489.00      0.00   0.00%    1476.00  1500.00
   Palladium     627.00      632.00      0.00   0.00%     622.00   639.00
   ----------------------------------------------------------------------
   Last Update on Jun 19, 2012 at 12:50.59
   ----------------------------------------------------------------------


                Asia / Europe Spot Price
                MARKET IS OPEN
            Will close in 4 hours 25 minutes
   ----------------------------------------------------------------------
   Metals                      Bid          Ask      Change from NY close
   ----------------------------------------------------------------------
   Gold                      1619.80      1620.80     -8.90   -0.55%
   Silver                      28.46        28.56     -0.28   -0.97%
   Platinum                  1479.00      1489.00     +0.00   +0.00%
   Palladium                  627.00       632.00     +0.00   +0.00%
   ----------------------------------------------------------------------
   Last Update on Jun 19, 2012 at 12:50.59
   ----------------------------------------------------------------------


<b>   File created on Tue Jun 19 12:51:04 2012</b>


        <style type="text/css"><!--
 #main_container_footer {width:100%;text-align: center;}
    #main_container_footer #footer_container {width:auto; margin:25px auto 25px auto;}
    #main_container_footer #footer_container ul {margin:0; padding:0;}
    #main_container_footer #footer_container ul li {float:left; display:inline; list-style:none; padding:0 8px; font-family:Verdana, Arial, Helvetica, sans-serif; font-size:12px; color:#000; border-right:1px #000 solid;}
    #main_container_footer #footer_container ul li a {font-family:Verdana, Arial, Helvetica, sans-serif; font-size:12px; color:#000; text-decoration:underline; font-weight:normal;}
    #main_container_footer #footer_container ul li a:hover {color:#ac1a2f; text-decoration:none; font-weight:normal;}
    #main_container_footer #footer_container ul li.no_border {border:0px;}
--></style>
  <table border="0" cellspacing="0" cellpadding="0"><tr><td>
 <div id="main_container_footer">
        <div id="footer_container">
            <ul>
                <li class="no_border"><script type="text/javascript">
copyright=new Date();
update=copyright.getFullYear();
document.write("&copy; "+ update + " Kitco Metals Inc.");
</script></li>
                <li><a href="https://corp.kitco.com/index.html">About Us</a></li>
                <li><a href="http://www.kitco.com/TermsofUse/" target="_top" onclick="Window_open(this.href,'KITCO','top=120,left=250,width=500,height=350'); return false">Website Terms of Use</a></li>
                <li><a href="https://online.kitco.com/help/privacy_policy.html" target="_top" onclick="Window_open(this.href,'KITCO','top=120,left=250,width=500,height=350'); return false">Privacy Policy</a></li>
                <li><a href="http://www.kitco.com/ads/">Advertise With Us</a></li>
                <li><a href="https://corp.kitco.com/en/corporate_culture.html">Careers</a></li>
                <li><a href="https://corp.kitco.com/en/contact.html" target="_top" onclick="Window_open(this.href,'KITCO','top=120,left=250,width=500,height=350'); return false">Contact Us</a></li>
                <li class="no_border"><a href="https://corp.kitco.com/en/feedback.html" target="_top" onclick="Window_open(this.href,'KITCO','top=120,left=250,width=500,height=350'); return false">Feedback</a></li>
            </ul>
        </div>
    </div> 

    </td></tr></table><br /><br />
<script language="JavaScript" type="text/javascript">
<!--
function Window_open (Address) {
  NewWindow = window.open(Address, "Popup", "width=695,height=600,left=100,top=200,resizable=yes,scrollbars=yes");
  NewWindow.focus();
}
// -->
</script>
 <!-- img src="http://www.kitco.com/scripts/counter/counter.pl?txtonlyE.txt" width="1" height="1" -->
<!-- Google-Analytics Code-->
<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-4074364-3']);
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();
</script>
</body>
</html>

More specifically, I am looking to capture the following data:

--------------------------------------------------------------------------------
London Fix          GOLD          SILVER       PLATINUM           PALLADIUM
               AM       PM                  AM       PM         AM       PM
--------------------------------------------------------------------------------
Jun 19,2012   1628.50   NA        28.8100   1486.00   1486.00   629.00   634.00 
Jun 18,2012   1623.50   1615.50   28.4300   1486.00   1484.00   626.00   628.00 
--------------------------------------------------------------------------------

Does anybody have any suggestions how I can do this using PHP?



1 Answer



Quick and dirty regex method:

$data = file_get_contents('http://www.kitco.com/texten/texten.html');
preg_match_all('/([A-Z]{3,5}\s+[0-9]{1,2},[0-9]{4}\s+([0-9.NA]{2,10}\s+){1,7})/si',$data,$result);

$records = array();
foreach($result[1] as $date) {
    $temp = preg_split('/\s+/',$date);
    $index = array_shift($temp);
    $index.= array_shift($temp);
    $records[$index] = implode(',',$temp);
}
print_R($records);

Note, you'd probably want to add some validation, etc.


Source: http://stackoverflow.com/questions/11103001/web-data-extraction-scraping-data-from-kitco-inc-text-only-market-page

No comments:

Post a Comment