CMX Ads Webmaster Resources for Success

Websites, Advertising, Scripts Tips n Snippets


Link and Content Ripper (Scraper) Utility Script Tutorial

Da Rippa. For scraping content. A webmaster's best friend.

Hi! Welcome to Da Rippa, the content and link ripping PHP utility script.
This is a must-have weapon in your arsenal of utility scripts for your websites.
If the content has been done before, why reinvent the wheel? Just rip it and share it!
You are providing a beneficial service to everyone.

Let us now get started.

This script is intended to be used as an include in the page where you want the
ripped content displayed, as shown below. To use it in multiple pages for different
content, copy it under a new script file name and change the name of the file the
content is written to.

To rip content we look for the tags that enclose it in the
webpage's source code.
So we have to know what tags and coding format the webpage uses.

So I suggest copying and pasting the source code from the website into a file first,
then going through the code to find the tags that enclose the content you want.
Hopefully the markup is consistent, but it could change with a new web designer for the site,
for example whether links are enclosed in single or double quotes.
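
One quick way to check the link format in a saved copy of the source is to count how many href attributes use each quote style. This is only a rough sketch: the helper name count_href_quote_styles and the sample markup are made up for the example and are not part of Da Rippa.

```php
<?php
// Hypothetical helper (not part of Da Rippa): counts how many href
// attributes in a page source use double quotes vs single quotes.
function count_href_quote_styles(string $source): array
{
    return [
        'double' => (int) preg_match_all('/href\s*=\s*"/i', $source),
        'single' => (int) preg_match_all('/href\s*=\s*\'/i', $source),
    ];
}

// Sample markup made up for the example. In practice you would load your
// saved copy of the page source, for example with file_get_contents().
$source = '<a href="/news">News</a> <a href=\'/about\'>About</a> <a href="/faq">FAQ</a>';
$styles = count_href_quote_styles($source);
echo "double: {$styles['double']}, single: {$styles['single']}\n";
```

If one style clearly dominates, that is the one to target in the link str_replace.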

We then set the rippa script's opening and closing tag variables for our preg match.
We set the webpage URL for curl to fetch the content from.
And we set the link setting so the site URL is added to the links if they are relative.
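
To see what the opening and closing tag settings do in the preg match, here is a small stand-alone sketch. The function name rip_between and the sample markup are assumptions for illustration. Unlike the search() function in the script, this variant returns only the text between the tags, not the tags themselves.

```php
<?php
// Sketch of the tag-matching step. rip_between() is a hypothetical name;
// it returns only the text found between the opening and closing tags.
function rip_between(string $start, string $end, string $html): array
{
    // preg_quote() escapes regex characters in the tags; "!" is the delimiter.
    // The "s" flag lets "." span newlines, "i" ignores case, and ".*?" stops
    // at the first closing tag instead of the last one.
    $pattern = '!' . preg_quote($start, '!') . '(.*?)' . preg_quote($end, '!') . '!is';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

// Sample markup made up for the example.
$sample = '<div class="first_tags"><a href="/a">A</a></div>'
        . '<p>ignored</p>'
        . '<div class="first_tags">B</div>';

print_r(rip_between('<div class="first_tags">', '</div>', $sample));
```

If the content you want contains nested div tags, the first closing tag will cut it short, which is why the end tag setting may need to be more specific.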

See the script comments for info on the link str_replace and on using a base URL in the page head.
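
As a rough illustration of the link str_replace, the sketch below prefixes root-relative links with the site URL for both quote styles in one call. The function name add_site_to_links and the example URL are assumptions, not part of the script.

```php
<?php
// Hypothetical helper: turns root-relative links (href="/page") into
// absolute links by putting the site URL in front of them.
function add_site_to_links(string $part, string $link): string
{
    $link = rtrim($link, '/'); // avoid a double slash after the host
    return str_replace(
        ['href="/', "href='/"],
        ['href="' . $link . '/', "href='" . $link . '/'],
        $part
    );
}

// Example URL and markup made up for the illustration.
echo add_site_to_links('<a href="/news">News</a>', 'https://site.com'), "\n";
echo add_site_to_links("<a href='/about'>About</a>", 'https://site.com/'), "\n";
```

Links that are already absolute are left alone. Protocol-relative links that start with two slashes would be rewritten incorrectly by this simple replace, so check the source for those first.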

Now the Magic

Introducing DaRippa!


<?php
// DARIPPA. ALL NEW! for 22. Webpage link and content ripper PHP utility script
// Copyright 2022 by george AT CMxads.com.
// Webmaster resources and promotions
// We can rip content and links from a webpage's source code.
// Include in your page and it will load external ripped content on your page.
//
//    require_once "ripit.php"; // this script
//
//
// wrap content in your styled division
////////////////////////////////////////////////

////////////////  SETTINGS

$url = 'https://site.com'; // website url ripping content from
$link = 'add_site_link'; // would be the url above for external site rip
$get_first = '<div class="first_tags">'; // first tag to look for to rip
$get_last = '</div>'; // end tag. may have to be more specific. like (</div><div class='next_tag'>)

/////////////////////////////////////////////////
// depending on your server you may have to use htmlentities on the file get contents
// $c = stripslashes(htmlentities($var));
// $v = html_entity_decode($c);
// and variables below and use html_entity_decode on the echoed output

// this script adds the external site url to the links
// you can use HTML base and not add url to links. like so
// <head>
// <base href="https://www.external_site.com"> // set external url in head
// </head>
// <body>
// <a href="html">Ripped Link</a> // url href is now "html" in links

function curl_get_file_contents($URL) // Curl content
{
    $c = curl_init();
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1); // return the page as a string
    curl_setopt($c, CURLOPT_URL, $URL);
    curl_setopt($c, CURLOPT_SSL_VERIFYPEER, false); // skips SSL certificate verification
    $contents = curl_exec($c);
    curl_close($c);

    if ($contents) return $contents;
    else return false;
}

function getURL($url) {
    if (!parse_url($url)) {
        return false;
    }
    $host = parse_url($url, PHP_URL_HOST);
    $scheme = parse_url($url, PHP_URL_SCHEME);
    switch ($scheme) {
        case 'https':
            $scheme = 'ssl://';
            $port = 443;
            break;
        case 'http':
        default:
            $scheme = '';
            $port = 80;
    }
    // Fsock content
    $fp = @fsockopen($scheme . $host, $port, $errno, $errstr, 30);
    if ($fp) {
        stream_set_timeout($fp, 5);
        $out = "GET / HTTP/1.1\r\n"; // note: always requests the site root (/)
        $out .= "Host: $host\r\n";
        $out .= "Connection: Close\r\n\r\n";
        fwrite($fp, $out);
        $body = false;
        $in = ''; // collects the response body
        while (!feof($fp)) {
            $s = fgets($fp, 1024);
            if ($body)
                $in .= $s;
            if ($s == "\r\n") // blank line marks the end of the headers
                $body = true;
        }
        fclose($fp);

        return $in;
    } else {
        return false;
    }
}

// we are checking if we have ripped content in our file
// we check file last modified and rip from site by our setting
// Ex: once a day or once a week etc
// if time stamp is greater than file last modified plus one week
// we will rip links once a week
$html = '';
if (file_exists('ripped_content.txt') && filesize('ripped_content.txt') > 25) {
    // if the file exists we check last modified date unix timestamp
    $last_mod = filemtime('ripped_content.txt');
    // delete date. One day is 86400 seconds, one week is 86400 * 7
    // we get the last modified unix time and add one week to it
    $delete_date = $last_mod + 86400 * 7;
    // if the current timestamp is greater than last modified plus one week
    // we rip again
    if (time() >= $delete_date) {
        $html = false;
        if (function_exists('curl_exec')) {
            $html = stripslashes(curl_get_file_contents($url));
        } else {
            $html = stripslashes(getURL($url));
        }
        if ($html) { // write html to file
            file_put_contents('ripped_content.txt', $html);
        }
    } else { // we get content from file
        $html = stripslashes(file_get_contents('ripped_content.txt'));
    }
} else { // if file does not exist get new content
    if (function_exists('curl_exec')) {
        $html = stripslashes(curl_get_file_contents($url));
    } else {
        $html = stripslashes(getURL($url));
    }
    if ($html) { // write html to file so the output below can find it
        file_put_contents('ripped_content.txt', $html);
    }
}

// get all matches
function search($start, $end, $string) {
    // "!" is the regex delimiter, so preg_quote must escape it too
    $reg = "!" . preg_quote($start, "!") . "(.*?)" . preg_quote($end, "!") . "!is";
    if (preg_match_all($reg, $string, $matches)) {
        // if (preg_match($reg, $string, $matches)) {
        return $matches[0];
    } else {
        return false;
    }
}

// so we should have html content to extract links etc
if (!empty($link) && file_exists('ripped_content.txt') && $html) {
    $parts = search($get_first, $get_last, $html);
    if ($parts) { // search() returns false when nothing matched
        foreach ($parts as $part) {
            if (strpos($part, 'href="/') !== false) { // sometimes they use ' ' sometimes " "
                // the below str_replace for link must be modified
                // based on the link format of the site you are ripping
                echo str_replace('href="/', 'href="' . $link . '/', $part);
            } else {
                echo str_replace("href='", "href='" . $link . "/", $part);
            }
            echo "<hr>";
        }
    }
// if not replacing link or just static content rip
} elseif (empty($link) && file_exists('ripped_content.txt') && $html) {
    $parts = search($get_first, $get_last, $html);
    if ($parts) {
        foreach ($parts as $part) {
            echo $part;
            echo "<hr>";
        }
    }
}

?>
 

Click here for the script in a text file.
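
The refresh check in the script (rip again when the saved file is missing, nearly empty, or more than a week old) can also be pulled out into one small function. This is a sketch under assumptions: the name cache_is_stale and its parameters are hypothetical, and the defaults simply copy the script's 25-byte and one-week values.

```php
<?php
// Hypothetical helper: true when the saved content should be ripped again.
function cache_is_stale(string $file, int $max_age = 604800, int $min_size = 25): bool
{
    clearstatcache(true, $file); // avoid cached file size / mtime results
    if (!file_exists($file) || filesize($file) <= $min_size) {
        return true; // missing or nearly empty file: rip again
    }
    // 604800 seconds = 86400 * 7 = one week
    return time() >= filemtime($file) + $max_age;
}

// Example with a throwaway temp file instead of ripped_content.txt.
$file = tempnam(sys_get_temp_dir(), 'rip');
file_put_contents($file, str_repeat('x', 100));
var_dump(cache_is_stale($file));      // freshly written file
touch($file, time() - 8 * 86400);     // pretend it was saved 8 days ago
var_dump(cache_is_stale($file));      // now older than one week
unlink($file);
```

Passing 86400 as the second argument would switch the refresh to once a day.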

Please help me I am handicapped and support myself

Donate With PayPal

 

Donate Bitcoin

Please donate and help the handicapped.

19DQT9KTHabkJ7dUCHpzdg5XdSA5mFkCyJ



name:Quyen Date:01.9.24 @ 22:14pm Country:
A great help thanks I sent you 35 cents.


name:Steve Date:01.9.24 @ 22:14pm Country:
Thanks I sent you 89 cents.


name:Arty Date:01.9.24 @ 22:14pm Country:
Thanks your the greatest.


name:Mark Date:01.9.24 @ 22:14pm Country:
Thanks I admire you I sent you 67 cents.


name:Gavin Date:01.9.24 @ 22:14pm Country:
I will give it a try. thanks.


name:Precious Date:01.9.24 @ 22:14pm Country:
Thanks I sent you 71 cents.


name:Shantelle Date:01.9.24 @ 22:14pm Country:
Thanks I sent you 89 cents.


name:Tosha Date:01.9.24 @ 22:14pm Country:
Thanks you are cool I sent you 7 cents.


name:Rich Date:01.9.24 @ 22:14pm Country:
Thanks!


name:Teddy Date:08.7.22 @ 06:38am IP:93141.2101.78.
Thanks for sharing.


name:Brandon Date:08.7.22 @ 06:38am IP:0.1.21311784.9
Thanks I sent you 71 cents.


name:Katerine Date:08.7.22 @ 06:38am IP:11..981702134.
This worked out great I sent you 19 cents.


name:Doloris Date:08.7.22 @ 06:19am IP:1714..1328.091
Thank you . I Sent you 5 cents.


name:Freddy Date:08.7.22 @ 06:19am IP:32918.4..07111
Worked out great for me I sent you 29 cents.




My websites do not use cookies or any google spyware.

 
