Simple Search and Spider Flatfile URL Submit Search Engine.
Hi again! This is a simple search and spider script that spiders a URL when it is submitted and writes the description to a file.
As with all my scripts, it uses a flatfile database, is packed with the essentials, and is Xtendable. CMXtendable.
How? You can run a cron job and re-spider.
You can rip links, write them to file, then spider and index them, and so on.
Click here for a DOM a-tag ripper.
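The cron re-spider idea could look something like the sketch below. It assumes the data line layout the script writes (description||url||ip||unix time). The function name respiderLines() and the $fetch callback are made up for illustration and are not part of the script; a real cron job would pass in a callback that spiders the page and returns the cleaned description.

```php
<?php
// Sketch only: refresh the description of every stored URL.
// respiderLines() and $fetch are hypothetical names, not part of the script.

/**
 * @param string[] $lines existing data lines (description||url||ip||unix)
 * @param callable $fetch function($url) returning a fresh description or false
 * @param int      $now   unix time to stamp refreshed entries with
 * @return string[] updated data lines
 */
function respiderLines(array $lines, callable $fetch, $now)
{
    $out = array();
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        $part = explode('||', $line);
        if (count($part) !== 4) {
            $out[] = $line; // keep lines we do not understand untouched
            continue;
        }
        list($desc, $url, $ip, $unix) = $part;
        $fresh = call_user_func($fetch, $url);
        if ($fresh !== false && $fresh !== '') {
            // remove the field separator from fetched text so the line format survives
            $desc = str_replace('||', ' ', $fresh);
            $unix = $now;
        }
        $out[] = "$desc||$url||$ip||$unix";
    }
    return $out;
}

// Usage with a stub fetcher instead of a real network call.
$lines = array("old text||http://www.example.com||127.0.0.1||1000");
$stub  = function ($url) { return 'new text'; };
$fresh = respiderLines($lines, $stub, 2000);
echo $fresh[0], "\n";
```

If a fetch fails, the old description and timestamp are kept, so one dead site does not wipe its entry.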
Features:
Full URL submit validation and input sanitizing.
Simple spider feature that spiders the validated URL, rips the description and keywords, and writes them to the database.
Spidering can be done with fsockopen, curl, URL fopen, or file_get_contents, depending on what's enabled on your server.
Fast search results using strpos
Added new mouseover link site preview feature.
Search speed print out with memory usage.
Logs IP and time along with validated url.
Scans file first on url submit for duplicate IP and url.
Unix time is appended to the end of each data line, so you can filter entries by date and delete old ones.
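As a rough illustration of the date filter mentioned in the last feature, the sketch below drops entries older than a given age using the timestamp at the end of each line. pruneOlderThan() is a hypothetical helper, not something shipped with the script, and the sample lines are invented.

```php
<?php
// Sketch only: drop data lines whose trailing unix timestamp is too old.
// pruneOlderThan() is a hypothetical helper, not part of the script.

function pruneOlderThan(array $lines, $maxAgeSeconds, $now)
{
    $keep = array();
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        // the timestamp is whatever follows the last "||"
        $pos = strrpos($line, '||');
        if ($pos === false) {
            continue; // not a data line
        }
        $stamp = (int) substr($line, $pos + 2);
        if ($now - $stamp <= $maxAgeSeconds) {
            $keep[] = $line;
        }
    }
    return $keep;
}

$lines = array(
    "old site||http://old.example.com||10.0.0.1||1000",
    "new site||http://new.example.com||10.0.0.2||9000",
);
$kept = pruneOlderThan($lines, 3600, 10000);
echo count($kept), "\n";
```

To apply it to the real file you would read the lines, prune them, and write the result back with an exclusive lock, for example file_put_contents($filename, implode("\n", $kept) . "\n", LOCK_EX).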
Backup results by StartPage.
Well, sort of. If no results are found, the StartPage search box loads with the keywords in the text box.
What's StartPage?
It's the only search engine I use for stealth and privacy.
They don't record your IP.
No referrer is sent when clicking search results.
No search string in URL
Total control over your searches with user settings
And for you Google lovers, which I am not, the results are enhanced by Google.
Check out the demo and give it a whirl on this site!
You can download it for free below. Please donate a dollar for me.
It works great and looks great too! I hope you find it useful. Check it out below. If you have a comment, stop by the forum!
Click here for search demo and search this site.
1: <?php
2: //CMXads-Simple Submit URL Search and Spider Script
3: //copyright 2015 HandicappedGeorge@CMXads.com
4: //Released under DONATEware terms. FREE FOR PERSONAL USE.
5: //Use change FOR YOUR PERSONAL USE. Redistribution not allowed.
6: date_default_timezone_set('America/New_York');
7: //maximum number of search words used per query
8: $word_count='4';
9: //your file duh!
10: $filename='temp.txt';
11: //leave these as is
12: $ok= '1';
13: $links ='';
14: $found='';
15: $needles='';
16: $part='';
17: //your search page name /scriptname
18: $comRefresh='mysearch.php';
19: $clientip= @$_SERVER['REMOTE_ADDR'];
20: $unix=time();
21:
22: function clean($text){
23: $text = strip_tags($text);
24: $text = htmlspecialchars($text, ENT_QUOTES);
25: $text = trim($text);
26: return ($text); }
27:
28: function highlight($need,$haystack) {
29: return str_ireplace($need,"<span style=\"font-weight:bold;color:darkred;\">".$need."</span>",$haystack);
30: }
31: //lock the file (shared lock) and read its contents
32: function flock_contents($filename){
33: $return = FALSE;
34: if(file_exists($filename) and is_readable($filename)){
35: if($handle = @fopen($filename, 'r')){
36: //flock blocks until the lock is granted; no retry loop, so an empty file cannot hang the script
37: if(flock($handle, LOCK_SH)){
38: $return = file_get_contents($filename);
39: flock($handle, LOCK_UN);
40: }
41: fclose($handle);
42: }}
43: return $return;
44: }
45:
46: //cuts our result page description text
47: function snippet($text,$length) {
48: $text = trim($text);
49: $txtl = strlen($text);
50: if($txtl > $length) {
51: for($i=1;$text[$length-$i]!=" ";$i++) {
52: if($i == $length) {
53: return substr($text,0,$length);
54: }
55: }
56: $text = substr($text,0,$length-$i+1);
57: }
58: return $text;
59: }
60: //multi get url contents, fsockopen works all the time
61: //see curl below if enabled on your server
62: //this gets http only
63: function fetchURL($url) {//expects a bare host name; see line 192 to change the fetcher
64: $host = $url;
65: $port = 80;
66: $fp = fsockopen($host, $port, $errno, $errstr, 1);
67: if ($fp) {
68: //stream_set_timeout($fp,1);
69: $out = "GET / HTTP/1.1\r\n";
70: $out .= "Host: $host\r\n";
71: $out .= "Connection: Close\r\n\r\n";
72: fwrite($fp, $out);
73: $body = false; $in = '';
74: while (!feof($fp)) {
75: $s = fgets($fp, 1024);
76: if ($body)
77: $in .= $s;
78: if ($s == "\r\n")
79: $body = true;
80: }
81: fclose($fp);
82:
83: return $in;
84: }
85: }
86: //this gets http and https
87: function getURL($url) {
88: $in='';
89: if (!parse_url($url)) {
90: return false;
91: }
92: $host= parse_url($url,PHP_URL_HOST);
93: $scheme= parse_url($url,PHP_URL_SCHEME);
94: switch ($scheme) {
95: case 'https':
96: $scheme = 'ssl://';
97: $port = 443;
98: break;
99: case 'http':
100: default:
101: $scheme = '';
102: $port = 80;
103: }
104:
105: $fp = @fsockopen($scheme . $host, $port, $errno, $errstr, 30);
106:
107: if ($fp) {
108: stream_set_timeout($fp,5);
109: $path = parse_url($url,PHP_URL_PATH); $out = "GET " . ($path ? $path : "/") . " HTTP/1.1\r\n";
110: $out .= "Host: $host\r\n";
111: $out .= "Connection: Close\r\n\r\n";
112: fwrite($fp, $out);
113: $body = false;
114: while (!feof($fp)) {
115: $s = fgets($fp, 1024);
116: if ($body)
117: $in .= $s;
118: if ($s == "\r\n")
119: $body = true;
120: }
121: fclose($fp);
122:
123: return $in;
124: }
125: }
126: //if curl enabled
127: function curl_get_file_contents($URL)
128: {
129: $c = curl_init();
130: curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
131: curl_setopt($c, CURLOPT_URL, $URL);
132: $contents = curl_exec($c);
133: curl_close($c);
134:
135: if ($contents) return $contents;
136: else return FALSE;
137: }
138:
139: //fallback for very old PHP versions without file_put_contents
140: if (!function_exists('file_put_contents')) {
141: function file_put_contents($filename, $data) {
142: $f = @fopen($filename, 'w');
143: if (!$f) {
144: return false;
145: } else {
146: $bytes = fwrite($f, $data);
147: fclose($f);
148: return $bytes;
149: }
150: }
151: }
152: if(isset($_POST['get_html'])){//if url submitted
153:
154: foreach (array('get_page','check','sum') as $key) {//only read the fields this form uses
155: $$key = isset($_POST[$key]) ? clean($_POST[$key]) : '';
156: }
157:
158: if($check != $sum or !is_numeric($check) or !is_numeric($sum) or strlen($check)>4 or strlen($sum)>4)
159: {
160: print "<br><br><b>Numeric entry error</b>";
161: echo("<meta http-equiv=\"Refresh\" content=\"2;URL=$comRefresh\">");
162: exit;
163: }
164:
165: $url=$get_page;
166: $string=flock_contents($filename);
167: if(!empty($string)){
168: if(strpos($string,$url)!==false or strpos($string,$clientip)!==false){
169: $ok= '';
170: }}
171: //simple url regex check
172: $urlregex = "^(https?|ftp)\:\/\/([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)*(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@/&%=+\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?\$";
173:
174: if ($ok and preg_match("$urlregex^", $url)) {
175: $ok= '1';
176: } else {
177: $ok= '';
178: echo "<p style=\"margin:auto;width:200px;text-align:center;font-weight:bold;color:crimson;\">bad or invalid link</p>";
179: }
180:
181: if ($ok and filter_var($url, FILTER_VALIDATE_URL) === FALSE) {
182: $ok= '';
183: echo "<p style=\"margin:auto;width:200px;text-align:center;font-weight:bold;color:crimson;\">bad or invalid link</p>";
184:
185: }
186:
187:
188: if ($ok){
189:
190: //we are using fsock with http and https switch
191:
192: $content= getURL($url);
193:
194: if($content){
195: //gets meta tags and html tag content for search string
196: function get_text_data($html) {
197: preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $html, $out,PREG_PATTERN_ORDER);
198:
199: $data=array();
200: for ($i=0;$i < count($out[1]);$i++) {
201: // loop through the meta data - add other meta tags here if you need
202: if (strtolower($out[1][$i]) == "description") $data[]= $out[2][$i];
203: if (strtolower($out[1][$i]) == "keywords") $data[]= $out[2][$i];
204: }
205: return $data;
206: }
207: function alphaNumSpace($c1)
208: {
209: $c1=trim(preg_replace('/(\s)\s+/', ' ', $c1));
210: return trim(preg_replace("/[^A-Za-z0-9\s]/", "",$c1));
211: }
212: if($tags=strtolower(alphaNumSpace(implode(' ', get_text_data($content))))){
213: if(($sl=strlen($tags))>=200){
214: $length='200';
215: }else{
216: $length=$sl;
217: }
218: $searchStr=snippet($tags,$length);
219: file_put_contents($filename,"$searchStr||$url||$clientip||$unix\n",LOCK_EX|FILE_APPEND);
220: echo "<span style=\"font-weight:bold;color:green;\">Your site has been successfully indexed in our search engine.</span>";
221: }else{
222: echo "<span style=\"font-weight:bold;color:crimson;\">bad or invalid link</span>";
223: }
224: } else{
225: echo "<span style=\"font-weight:bold;color:crimson;\">bad or invalid link</span>";
226: }
227:
228: }
229: }
230: ?>
231: <!doctype html>
232: <html lang="en">
233: <head>
234: <title>CMXads URL Submit Search and Simple Spider Flat File Search Engine</title>
235: <meta charset="utf-8">
236: <meta name="robots" content="noindex">
237: <meta name="description" content="Submit your website to the new search engine. With secure private backup search through https StartPage, with no tracking or referrer. Uses the free search submit and spider script by George from CMXads, another great simple flatfile database donateware script.">
238: <style type="text/css">
239: body, html {
240: margin-top:10px;
241: font-size:14px;
242: height:100%;
243: font-family: Tahoma, Arial;
244: background:#E0E0E0;
245: background: -webkit-gradient(linear, 0% 0%, 0% 100%, from(#FFFFFF), to(#E0E0E0));
246: background: -moz-linear-gradient(top, #FFFFFF 0%, #E0E0E0 100%);
247: background: -o-linear-gradient(top, #FFFFFF 0%, #E0E0E0 100%);
248: background: linear-gradient(to bottom, #FFFFFF 0%, #E0E0E0 100%);
249: }
250: #wrapper {
251: padding:10px;
252: width:700px;
253: margin: auto;
254: min-height:100%;
255: background-color: #ffffff;
256: border: 2px solid #333;
257: /* curved border radius */
258: -moz-border-radius:20px;
259: -webkit-border-radius:20px;
260: -o-border-radius:20px;
261: border-radius:20px;
262: }
263:
264: </style>
265:
266: </head>
267:
268: <body>
269: <div id="wrapper">
270: <div style="width:300px;height:140px;float:left;">
271: <h3>Search the Web with Handicapped George</h3>
272: <form name="searchform" id="searchform" action="" method="get">
273: <table>
274: <tr><td>
275: <input type="text" name="keyword" value="<?php echo isset($_GET['keyword']) ? htmlspecialchars($_GET['keyword'], ENT_QUOTES) : '';?>" id="keyword"></td><td>
276: <img src="th2.jpg" height="25px"></td><td>
277: <input type="submit" style="width:100px" value="Search now">
278: </td></tr>
279: </table>
280: </form>
281: </div>
282: <div style="width:300px;height:140px;float:right;">
283: <b>Submit</b><br>
284: We will spider your site.<br>
285: Enter url: http://www.example.com<br>
286: <form method="POST" action="">
287: <input type="text" size='40' maxlength='50' name="get_page" value=""><br>
288: <?php
289: $c1=rand(5, 105);
290: $c2=rand(5, 150);
291: $c3 = $c1 + $c2;
292: print "What is the sum of $c1+$c2=";
293: print "<input name='sum' type='hidden' value='$c3'>";
294: print "<input name='check' type='text' maxlength='4' size='8'><br>";
295: ?>
296: <input type="submit" name="get_html" value="Submit your site">
297: </form>
298: </div>
299: <hr style="clear:both;">
300:
301: <?php
302:
303:
304:
305: if(!empty($_GET['keyword'])){
306:
307: // Calculate the start time of page loading
308: $starttime = microtime();
309: $startarray = explode(" ", $starttime);
310: $starttime = $startarray[1] + $startarray[0];
311:
312: $keywords= clean($_GET['keyword']);
313: //remove extra spaces and trim keywords
314: $needle=preg_replace('/(\s)\s+/', ' ',$keywords);
315: $needle=trim(strtolower(preg_replace('#[^A-Za-z0-9\s]#', '',$needle)));
316: //if space
317: if(substr_count($needle,' ')>0){
318: //explode into array
319: $needles=explode(' ',$needle);
320:
321: //remove words less than 3 characters long
322: $needles = array_filter($needles, function($x) { return strlen($x) >= 3; });
323: //remove duplicate keywords
324: $needles=$needles3=array_unique($needles);
325: //slice array to keywords limit
326: if(count($needles)>$word_count){
327: $needles=$needles3=array_slice($needles,0,$word_count);
328: }
329: $cnt=count($needles);
330: $needles=implode(' ', $needles);
331: }
332: else{
333: $needles=$needle;
334: $cnt='1';
335: }
336:
337:
338: //count final keywords total after filter
339:
340:
341: //$c = count($words1);
342:
343: if($string=flock_contents($filename)){
344: $links .= '<table><tr><td><img src="th2.jpg" height="50px"></td><td><strong>Search results</strong></td></tr></table>';
345: //$links .= '<br style="clear:both;">';
346:
347: $parts=explode("\n", $string);
348: $parts = array_map('trim',$parts);
349: $parts =array_values(array_filter($parts));
350: $n=count($parts);
351: //echo $n;
352:
353:
354: foreach ($parts as $value){
355: if (strpos($value, $needles)!==false) {
356: $found='1';
357: $part=explode('||', $value);
358:
359: if($part){
360:
361:
362: //$links .= "<h4>Search results</h4>";
363: $links .= "<div class=\"break-word\" style=\"margin-left:100px;width:500px;\">Term found (".substr_count($part[0],$needles)."x) " . snippet(highlight($needles,ucwords($part[0])),'150')."\r\n";
364:
365: $links .= "<span><a href=\"$part[1]\" target=\"_blank\">$part[1]</a></span></div><hr>\r\n";
366: }
367: }}
368: if($found) {
369: echo $links;
370: }
371:
372: }
373:
374:
375:
376:
377: if(!$found) {
378: echo '<div style="width:450px;height:140px;margin:auto;">';
379: echo "<p>Sorry, your search: <b>&quot;" . $needles . "&quot;</b> returned zero results.<br>Search the web in complete privacy with StartPage enhanced by Google.</p>";
380: ?>
381:
382: <form method=POST accept-charset="UTF-8" action='https://startpage.com/do/search' style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" id=searchsys name="searchsys" onsubmit="javascript:document.searchsys.query.value=document.searchsys.keyword.value;">
383: <table style="border-bottom: #808080 1px solid; border-left: #808080 1px solid; border-top: #808080 1px solid; border-right: #808080 1px solid" border=0 cellspacing=0 cellpadding=0 height=38>
384: <tbody>
385: <tr>
386: <td align=middle valign=bottom style="padding-bottom: 1px; padding-left: 0px; padding-right: 5px; padding-top: 0px"><a style="background:url(https://startpage.com/graphics/startpage_searchbox_logo.jpg) no-repeat;filter: progid:dximagetransform.microsoft.alphaimageloader(src= https://startpage.com/graphics/startpage_searchbox_logo.jpg,sizingmethod='scale'); width: 70px; display: inline-block; height: 38px" href= 'https://startpage.com' class='no-cpt-prgpt'><!--apply full url path for background--><img border=0 alt=StartPage src= https://startpage.com/graphics/spacer.gif width=70 height=38></a></td>
387: <td style="height: 38px;" height=38 ><input type=hidden name=query><input class="width_update_class" style="width : 260px; _height : 23px; font : 14px verdana, arial, sans-serif; padding : 0px 6px; *padding : 1px 6px 6px; padding-top:1px !important; padding-bottom:1px !important;outline: none; background:white;" name=keyword value="<?php echo $needles;?>"><input value=sb type=hidden name=frm><input value=process_search type=hidden name=cmd><input value=english type=hidden name=language></td>
388: <td style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; height: 38px; padding-top: 0px" height=38 ><input style="margin: 0px 5px" value="Search" type=submit></td></tr></tbody></table>
389: <style type=text/css>.ie9fixq_opt1{height:23px!important; padding-top:0px !important; padding-bottom:0px !important;}</style>
390: </form>
391: </div>
392: <?php
393:
394: }
395: $links .= '</div>';
396:
397:
398: if(!empty($_GET['keyword'])){
399:
400: // Calculates the end time and subtracts the start time to get the search duration
401: $endtime = microtime();
402: $endarray = explode(" ", $endtime);
403: $endtime = $endarray[1] + $endarray[0];
404: $totaltime = $endtime - $starttime;
405: $totaltime = round($totaltime,4);
406: echo "<div id='timetaken'><p>This search took $totaltime seconds to complete.";
407:
408: function convert($size)
409: {
410: $unit=array('b','kb','mb','gb','tb','pb');
411: return @round($size/pow(1024,($i=floor(log($size,1024)))),2).' '.$unit[$i];
412: }
413:
414: echo ' Memory usage ' . convert(memory_get_usage(true)) . '</p></div>'; // 123 kb
415:
416: }//end nc
417: // echo '</div>';
418: }
419: echo '</div><br>';
420: echo '<p><a href="http://www.cmxads.com/search-and-spider-script.php">Cmxads.com Search and spider script</a></p>';
421: echo '</body>';
422: echo '</html>';
423: ?>
424:
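One thing to keep in mind when extending the search: the listing above joins the filtered keywords with spaces and looks for that whole phrase in each data line with strpos, so the words only match when they appear together and in the same order. If you would rather match lines that contain every word anywhere in the description, a small helper along these lines might do it. matchesAllWords() is a hypothetical name and the sample data line is invented; treat it as a starting point rather than a drop-in replacement.

```php
<?php
// Sketch only: match a data line when every keyword occurs in its description.
// matchesAllWords() is a hypothetical helper, not part of the script.

function matchesAllWords($line, array $words)
{
    $part = explode('||', $line);
    $desc = strtolower($part[0]); // search the description field only
    foreach ($words as $word) {
        if ($word === '' || strpos($desc, strtolower($word)) === false) {
            return false; // a keyword is empty or missing
        }
    }
    return count($words) > 0; // an empty keyword list matches nothing
}

$line = "free flatfile search engine script||http://www.example.com||127.0.0.1||1000";
var_dump(matchesAllWords($line, array('search', 'free')));  // both words present, any order
var_dump(matchesAllWords($line, array('search', 'mysql'))); // one word missing
```

Searching only the first field also keeps a keyword from matching a stored URL or IP address by accident.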