Sunday, November 6, 2011

How to Scrape Hebrew Data or Content from a Website Using PL/SQL



Here is a PL/SQL procedure that scrapes (downloads) content from an external website and stores it in a table:

CREATE OR REPLACE PROCEDURE download_html_to_clob (p_url IN VARCHAR2) AS
  l_clob  CLOB;
BEGIN
  -- Set the proxy: only needed if you are working behind a proxy server
  -- (e.g. on a corporate network). Replace host:port with your own.
  utl_http.set_proxy(proxy => 'www-proxy.us.mycompany.com:80');

  -- Set the character set of the retrieved document
  -- (in our example the web page contains Hebrew content).
  utl_http.set_body_charset('WINDOWS-1255');

  -- Fetch the page and return its body as a CLOB.
  l_clob := HTTPURITYPE.createuri(p_url).getclob();

  -- Insert the data into the table.
  INSERT INTO http_clob_test (id, url, data)
  VALUES (http_clob_test_seq.NEXTVAL, p_url, l_clob);
END download_html_to_clob;
/
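HTTPURITYPE hides the HTTP details. If you want more control, for example over how the response body is decoded, the same download can be written with UTL_HTTP directly. This is a sketch of my own, not the oracle-base code: the procedure name download_html_to_clob_utl is hypothetical, and it assumes the same http_clob_test table, sequence and proxy as above. As I understand the UTL_HTTP documentation, set_body_charset only applies when the server's Content-Type header does not already name a character set, so treat 'WINDOWS-1255' as a fallback rather than a forced conversion.

```sql
-- Sketch only: download_html_to_clob_utl is a hypothetical name.
-- Assumes the http_clob_test table and http_clob_test_seq sequence exist.
CREATE OR REPLACE PROCEDURE download_html_to_clob_utl (p_url IN VARCHAR2) AS
  l_req   UTL_HTTP.req;
  l_resp  UTL_HTTP.resp;
  l_text  VARCHAR2(32767);
  l_clob  CLOB;
BEGIN
  -- Only needed behind a proxy; replace host:port with your own.
  UTL_HTTP.set_proxy(proxy => 'www-proxy.us.mycompany.com:80');

  -- Temporary CLOB that accumulates the page body.
  DBMS_LOB.createtemporary(l_clob, FALSE);

  l_req  := UTL_HTTP.begin_request(p_url);
  l_resp := UTL_HTTP.get_response(l_req);

  -- Fallback charset for this response (used if the server names none).
  UTL_HTTP.set_body_charset(l_resp, 'WINDOWS-1255');

  -- Read the body in chunks until UTL_HTTP raises END_OF_BODY.
  -- 8000 characters per call keeps even 4-byte characters inside
  -- the 32767-byte VARCHAR2 buffer.
  BEGIN
    LOOP
      UTL_HTTP.read_text(l_resp, l_text, 8000);
      DBMS_LOB.writeappend(l_clob, LENGTH(l_text), l_text);
    END LOOP;
  EXCEPTION
    WHEN UTL_HTTP.end_of_body THEN
      UTL_HTTP.end_response(l_resp);
  END;

  -- Store the downloaded page, then release the temporary LOB.
  INSERT INTO http_clob_test (id, url, data)
  VALUES (http_clob_test_seq.NEXTVAL, p_url, l_clob);

  DBMS_LOB.freetemporary(l_clob);
EXCEPTION
  WHEN OTHERS THEN
    -- Close the response if it is still open, then re-raise the error.
    BEGIN
      UTL_HTTP.end_response(l_resp);
    EXCEPTION
      WHEN OTHERS THEN NULL;  -- never opened, or already closed
    END;
    RAISE;
END download_html_to_clob_utl;
/
```

You call it exactly like download_html_to_clob, followed by a COMMIT.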

How to use it (the procedure does not commit, so commit in the calling block):
BEGIN
  download_html_to_clob('http://www.google.co.il');
  COMMIT;
END;
/
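To check what was actually stored, a query like the one below lists each row with the length of the downloaded page and a short preview. It only uses standard DBMS_LOB functions; the 200-character preview size is an arbitrary choice. If the Hebrew text shows up as question marks in SQL*Plus, the client's NLS_LANG setting is a likely suspect rather than the download itself.

```sql
-- List stored pages: size in characters plus the first 200 characters.
SELECT id,
       url,
       DBMS_LOB.getlength(data)       AS data_length,
       DBMS_LOB.substr(data, 200, 1)  AS preview
FROM   http_clob_test
ORDER  BY id;
```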


The CLOB table that holds the HTML content received from the external website (create it, and the sequence below, before compiling the procedure):

CREATE TABLE http_clob_test (
  id    NUMBER(10),
  url   VARCHAR2(255),
  data  CLOB,
  CONSTRAINT http_clob_test_pk PRIMARY KEY (id)
);


The sequence that generates IDs for that table:
CREATE SEQUENCE http_clob_test_seq;
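Starting with Oracle Database 11g, outgoing network calls from PL/SQL (UTL_HTTP, and therefore HTTPURITYPE) are checked against network access control lists, and without an ACL entry the download typically fails with ORA-24247. A minimal sketch of the setup, run as a DBA, might look like this. The ACL file name http_scrape_acl.xml and the principal SCOTT are placeholders; use the upper-case name of the schema that owns the procedure. In 12c and later these two calls are deprecated in favour of DBMS_NETWORK_ACL_ADMIN.append_host_ace, though they should still work.

```sql
-- Run as a DBA. 'http_scrape_acl.xml' and 'SCOTT' are placeholders.
BEGIN
  -- Create the ACL and grant the 'connect' privilege to the schema.
  DBMS_NETWORK_ACL_ADMIN.create_acl(
    acl         => 'http_scrape_acl.xml',
    description => 'Allow outgoing HTTP for download_html_to_clob',
    principal   => 'SCOTT',
    is_grant    => TRUE,
    privilege   => 'connect');

  -- The database connects to the proxy host if you use one ...
  DBMS_NETWORK_ACL_ADMIN.assign_acl(
    acl  => 'http_scrape_acl.xml',
    host => 'www-proxy.us.mycompany.com');

  -- ... and to the target site when it connects directly.
  DBMS_NETWORK_ACL_ADMIN.assign_acl(
    acl  => 'http_scrape_acl.xml',
    host => 'www.google.co.il');

  COMMIT;
END;
/
```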


All the information here comes from (thanks to):
http://www.oracle-base.com/articles/misc/RetrievingHTMLandBinariesIntoTablesOverHTTP.php