How to create a link checker
In this tutorial I will show you how to create your own basic link validation script that checks the availability of the links on any site.
Creating a basic link checker script is not a very complicated task. If we think about it a bit, we can summarize the important sub-tasks as follows:
- Create an HTML form to get the URL to check
- Open the main URL and store its content in a string
- Analyze the string, collect all URLs and store them in an array
- Go through all of the array elements (URLs) and check their validity
- Display the result to the visitor
Step 1.
As a first step we focus on URL processing with PHP. The HTML part (sub-tasks 1 and 5) comes at the end of the tutorial.
You can open a URL using PHP's built-in fopen function (this requires the allow_url_fopen setting to be enabled) and then read the content with fread. In this tutorial we create a new function, getPage(), which accepts a URL string as its parameter. Inside this function we first try to open the URL and then read its content in 1 KB steps, appending each piece to the $content variable. At the end, $content holds the complete HTML code of the requested URL, and the function returns this string. The PHP code looks like this:
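<?php
function getPage($link){
    // Start with an empty string so the function returns a string even on failure
    $content = '';
    // @ suppresses the warning when the URL cannot be opened
    if ($fp = @fopen($link, 'r')) {
        // Read the page in 1 KB chunks until the end of the stream
        while (!feof($fp)) {
            $line = fread($fp, 1024);
            if ($line === false) break;
            $content .= $line;
        }
        fclose($fp);
    }
    return $content;
}
?>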
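As an alternative sketch, the same download can be done with file_get_contents and a stream context that sets a timeout, so a slow server cannot block the script for long. The function name getPageWithTimeout and the 10-second default are assumptions made for this example; they are not part of the original script.

```php
<?php
// Hypothetical variant of getPage(); the name and the default timeout are assumptions.
function getPageWithTimeout($link, $timeout = 10) {
    // The 'http' context options are used for http:// and https:// URLs
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeout),
    ));
    // file_get_contents returns false on failure; @ hides the warning
    $content = @file_get_contents($link, false, $context);
    // Always hand back a string so callers can pass it on without extra checks
    return ($content === false) ? '' : $content;
}
```

A call such as getPageWithTimeout('http://www.phptoys.com', 5) would give up after about five seconds of waiting and return an empty string.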
Step 2.
Now we have the HTML code, so the next step is to create a function that analyzes this string and collects all URL references inside it. In HTML code the URLs appear inside an <a> tag like this:
<a href="http://www.phptoys.com" title="PhpToys">PhpToys</a>
So we need to find every <a> tag in the string. You can do this with the strpos function. However, finding the <a> tags is not enough, as you only need the URL from the tag's attributes. So you need another search to find the href attribute inside each <a> tag, and then the quotes that enclose its value. Each URL found is put into an array, which is returned to the caller.
The function looks like this:
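<?php
function checkPage($content){
    $links = array();
    $startPos = 0;
    // Walk through the string one <a> tag at a time; strpos returns false when none is left
    while (($spos = strpos($content, '<a ', $startPos)) !== false) {
        // Find the href attribute, then the quotes around its value
        $hpos = strpos($content, 'href', $spos);
        if ($hpos === false) break;
        $qpos = strpos($content, '"', $hpos);
        if ($qpos === false) break;
        $epos = strpos($content, '"', $qpos + 1);
        if ($epos === false) break;
        $link = substr($content, $qpos + 1, $epos - $qpos - 1);
        // Keep only absolute links that start with http://
        if (strpos($link, 'http://') === 0) $links[] = $link;
        // Continue the search after the closing quote
        $startPos = $epos;
    }
    return $links;
}
?>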
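The strpos approach expects double quotes and a lowercase <a tag. As a hedged alternative, the same collection step can be sketched with one regular expression that also accepts single quotes and uppercase tags. The helper name extractLinks and the sample HTML are assumptions made for this illustration. For real-world pages a proper HTML parser such as PHP's DOMDocument is usually the more robust choice.

```php
<?php
// Hypothetical helper; the name extractLinks is an assumption for this sketch.
function extractLinks($content) {
    $links = array();
    // Match <a ... href="..."> or href='...', case-insensitively
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is';
    if (preg_match_all($pattern, $content, $matches)) {
        foreach ($matches[2] as $link) {
            // Keep only absolute links that start with http://
            if (strpos($link, 'http://') === 0) {
                $links[] = $link;
            }
        }
    }
    return $links;
}

// Sample input: two absolute links and one relative link
$html = '<p><a href="http://www.phptoys.com">PhpToys</a> '
      . "<A class='x' HREF='http://example.com/a'>A</A> "
      . '<a href="/local">local</a></p>';
print_r(extractLinks($html));
```

For the sample string the returned array holds the two absolute links; the relative link /local is skipped.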
Step 3.
The last PHP function we need checks the validity of a link. To do this we use the fopen function again. In this case we don't want to read the HTML content of the link: if fopen returns a valid file handle, we can say that the link is alive. The implementation is quite simple and a bit similar to our first function:
<?php
function pingLink($domain){
    // @ suppresses the warning that fopen raises for unreachable URLs
    $file = @fopen($domain,"r");
    if (!$file) {
        return -1; // Link is dead or the site is down
    }
    fclose($file);
    return 1; // Link is alive
}
?>
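pingLink() only reports alive or dead. If you also want to know why a link failed, one possible extension is to read the HTTP status code with get_headers. The sketch below is an illustration under assumptions: both function names are invented for this example, and 0 is used here as an arbitrary "no response" value.

```php
<?php
// Hypothetical helpers; both function names are assumptions for this sketch.

// Extract the numeric status code from a line such as "HTTP/1.1 404 Not Found".
function parseStatusLine($line) {
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
        return (int) $m[1];
    }
    return 0; // Not an HTTP status line
}

// Return the HTTP status code of a URL, or 0 when no response was received.
function linkStatusCode($url) {
    // get_headers returns false on failure; @ hides the warning
    $headers = @get_headers($url);
    if ($headers === false || !isset($headers[0])) {
        return 0;
    }
    // The first element is the status line of the first response
    return parseStatusLine($headers[0]);
}
```

With this, a table row could show 404 or 500 instead of a plain INVALID label.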
Step 4.
The only missing part is an environment for our new functions. So we need to create an HTML page with a form where the visitor can provide the requested URL. After the form is submitted, the code checks the URL and calls our first and second functions to get the list of URLs. With this list we build a table where each row shows a link and its status. To avoid a long wait, we send each row to the browser as soon as its link has been checked by calling ob_flush and flush. These functions force PHP to send the current output buffer to the browser.
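The row-by-row output described above can be sketched as follows. The helper name renderRow and the example data are assumptions made for this illustration; in the real script the status would come from pingLink().

```php
<?php
// Hypothetical helper; the name renderRow is an assumption for this sketch.
function renderRow($link, $status) {
    // Escape the link so that special characters cannot break the table markup
    $safeLink = htmlspecialchars($link, ENT_QUOTES);
    return "<tr><td align='left'>$safeLink</td><td>$status</td></tr>\n";
}

// Example data standing in for real check results
$results = array(
    'http://www.phptoys.com' => 'OK',
    'http://example.com/?a=1&b=2' => 'INVALID',
);

echo "<table width='100%'>\n";
foreach ($results as $link => $status) {
    echo renderRow($link, $status);
    // Push PHP's output buffer (if one is active), then the system buffer
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}
echo "</table>\n";
```

Whether the visitor really sees the rows one by one also depends on buffering in the web server and the browser, which this sketch does not control.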
That's it!
You can find a complete link checker tool among the Products on this site.
The complete source code of the script follows.
Complete code
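<?php
function getPage($link){
    // Start with an empty string so the function returns a string even on failure
    $content = '';
    // @ suppresses the warning when the URL cannot be opened
    if ($fp = @fopen($link, 'r')) {
        // Read the page in 1 KB chunks until the end of the stream
        while (!feof($fp)) {
            $line = fread($fp, 1024);
            if ($line === false) break;
            $content .= $line;
        }
        fclose($fp);
    }
    return $content;
}
function pingLink($domain){
    // @ suppresses the warning that fopen raises for unreachable URLs
    $file = @fopen($domain,"r");
    if (!$file) {
        return -1; // Link is dead or the site is down
    }
    fclose($file);
    return 1; // Link is alive
}
function checkPage($content){
    $links = array();
    $startPos = 0;
    // Walk through the string one <a> tag at a time; strpos returns false when none is left
    while (($spos = strpos($content, '<a ', $startPos)) !== false) {
        // Find the href attribute, then the quotes around its value
        $hpos = strpos($content, 'href', $spos);
        if ($hpos === false) break;
        $qpos = strpos($content, '"', $hpos);
        if ($qpos === false) break;
        $epos = strpos($content, '"', $qpos + 1);
        if ($epos === false) break;
        $link = substr($content, $qpos + 1, $epos - $qpos - 1);
        // Keep only absolute links that start with http://
        if (strpos($link, 'http://') === 0) $links[] = $link;
        // Continue the search after the closing quote
        $startPos = $epos;
    }
    return $links;
}
?>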
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd">
<html>
<body>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post" name="domain" id="domain">
<table width="100%">
<tr><td>URL to check:</td><td><input class="text" name="myurl" type="text" size="45"></td></tr>
<tr><td align="center" colspan="2"><br/><input class="text" type="submit" name="submitBtn" value="Check links"></td></tr>
</table>
</form>
<?php
if (isset($_POST['submitBtn'])){
$url = isset($_POST['myurl']) ? trim($_POST['myurl']) : '';
// Prepend http:// only when the visitor did not type a scheme
if (strpos($url,'http://') !== 0 && strpos($url,'https://') !== 0) $url = 'http://'.$url;
?>
<table width="100%">
<?php
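$txt = getPage($url);
$linkArray = checkPage($txt);
foreach ($linkArray as $value) {
    // Check the link and translate the numeric result into a label
    if (pingLink($value) <= 0){
        $status = "INVALID";
    } else {
        $status = "OK";
    }
    // Escape the link before printing it into the table
    $safeLink = htmlspecialchars($value);
    echo "<tr><td align='left'>$safeLink</td><td>$status</td></tr>";
    // Send the row to the browser right away instead of waiting for the whole table
    @ob_flush();
    flush();
}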
?>
</table>
<?php
}
?>
</body>