There's no need to reinvent the wheel on this one. There's already a PEAR library that interfaces with the W3C HTML Validator API. They're willing to do the work for you, so why not let them? :)
While it isn't strictly PHP (it is an executable), one I really like is W3C's HTML Tidy. It will show what is wrong with the HTML and fix it if you want it to. It also beautifies HTML so it doesn't look like a mess. It runs from the command line and is easy to integrate into PHP.
Check it out: http://www.w3.org/People/Raggett/tidy/
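For what it's worth, here is a rough sketch of calling the tidy executable from PHP. It assumes a `tidy` binary is on the PATH and uses its `-q` (quiet) and `-e` (report errors and warnings only) switches; `runTidy()` is just a name I made up for illustration, not part of any library.

```php
<?php
// Sketch: run the tidy executable on an HTML string and collect its messages.
// Assumes a `tidy` binary is on the PATH; runTidy() is a hypothetical helper.
function runTidy(string $html): ?array
{
    $spec = [
        0 => ['pipe', 'r'], // stdin: the HTML to check
        1 => ['pipe', 'w'], // stdout: drained but not used
        2 => ['pipe', 'w'], // stderr: tidy's warnings and errors
    ];
    $proc = @proc_open('tidy -q -e', $spec, $pipes);
    if (!is_resource($proc)) {
        return null; // the process could not be started
    }

    @fwrite($pipes[0], $html);
    fclose($pipes[0]);
    stream_get_contents($pipes[1]);
    $report = (string) stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    $status = proc_close($proc);

    // tidy exits with 0 (clean), 1 (warnings) or 2 (errors); anything else
    // most likely means the binary is missing or failed to run.
    if (!in_array($status, [0, 1, 2], true)) {
        return null;
    }

    return [
        'status'   => $status,
        'messages' => preg_split('/\R/', trim($report), -1, PREG_SPLIT_NO_EMPTY),
    ];
}

$result = runTidy('<p>Hello <b>world</p>');
if ($result === null) {
    echo "tidy is not available\n";
} else {
    echo implode("\n", $result['messages']), "\n";
}
```

Writing the whole input before reading the report is fine for small pages, but very large documents would need the pipes handled more carefully.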
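Maybe you need to check whether the string is well formed.
I would use a function like this:
function check($string) {
    // Find the first '<' and the last '>' that follows it.
    $start = strpos($string, '<');
    if ($start === false) {
        return false; // no markup at all, so nothing to parse
    }
    $end = strrpos($string, '>', $start);
    $len = strlen($string);
    if ($end !== false) {
        $string = substr($string, $start);
    } else {
        $string = substr($string, $start, $len - $start);
    }
    // Collect libxml errors instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    libxml_clear_errors();
    $xml = simplexml_load_string($string);
    // Well formed means the XML parser reported no errors.
    return count(libxml_get_errors()) == 0;
}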
Just a warning: HTML permits unbalanced markup like the following. It is not a valid XML chunk, but it is legal HTML:
<ul><li>Hi<li> I'm another li</li></ul>
Disclaimer: I've modified the code (without testing it) in order to detect well-formed HTML inside the string.
A last thought: maybe you should use strip_tags to control user input (as I've seen in your comments).
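As a quick illustration of the strip_tags idea, here is a small sketch. The input string is made up for the example. Note that strip_tags only removes tags: it does not validate anything, the text inside removed tags stays behind, and any tags you allow keep their attributes.

```php
<?php
// Hypothetical user input, for illustration only.
$input = '<script>alert(1)</script><b>hi</b> <a href="#" onclick="x()">link</a>';

// Remove every tag; the text between the tags is kept.
$plain = strip_tags($input);

// Keep only an allow-list of tags (here just <b>).
$limited = strip_tags($input, '<b>');

echo $plain, "\n";   // alert(1)hi link
echo $limited, "\n"; // alert(1)<b>hi</b> link
```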
You can use DOMDocument's loadHTML method.
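A minimal sketch of that approach, assuming the DOM extension is enabled. `htmlParseMessages()` is a hypothetical helper name. Keep in mind that loadHTML uses libxml's forgiving HTML parser, so it reports parse problems rather than doing strict validation, and depending on the libxml2 version it can complain about newer HTML5 elements.

```php
<?php
// Sketch: parse an HTML fragment with DOMDocument::loadHTML and return
// whatever messages libxml recorded. htmlParseMessages() is a hypothetical helper.
function htmlParseMessages(string $html): array
{
    // Buffer libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// The unclosed <li> chunk that is legal HTML but not well-formed XML.
print_r(htmlParseMessages("<ul><li>Hi<li> I'm another li</li></ul>"));

// A stray closing tag, the kind of problem the parser may report.
print_r(htmlParseMessages('<p>text</div>'));
```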
If you want to validate (X)HTML documents, you can use PHP's native DOM extension:
DOMDocument::validate: Validates the document based on its DTD
Example from Manual:
$dom = new DOMDocument;
$dom->load('book.xml'); // see docs for load, loadXml, loadHtml and loadHtmlFile
if ($dom->validate()) {
echo "This document is valid!\n";
}
If you want the individual errors, call libxml_use_internal_errors(true) beforehand and fetch them with libxml_get_errors().
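Putting those two pieces together, a sketch might look like this. The `dtdValidationMessages()` name and the tiny `note` document with its inline DTD are invented for the example.

```php
<?php
// Sketch: validate an XML string against its own DTD and return the
// individual libxml messages. dtdValidationMessages() is a hypothetical helper.
function dtdValidationMessages(string $xml): array
{
    // Buffer errors so they can be read with libxml_get_errors().
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument;
    if ($dom->loadXML($xml)) {
        $dom->validate(); // validity problems end up in the error buffer
    }

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages; // an empty array means no problems were reported
}

// Made-up document: the DTD requires <note> to contain exactly one <to>.
$dtd = '<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>';
$good = '<?xml version="1.0"?>' . $dtd . '<note><to>Bob</to></note>';
$bad  = '<?xml version="1.0"?>' . $dtd . '<note><from>Al</from></note>';

var_dump(dtdValidationMessages($good) === []); // bool(true)
print_r(dtdValidationMessages($bad));
```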
I asked a similar question and you might check out some of the answers there.
In summary, I would recommend either running the HTML through tidy on the host or writing a short script to validate through W3C remotely. Personally, I don't like the tidy option because it reformats your code and I hate how it puts <p> tags on every line.
Here's a link to tidy and here's a link to the various W3C validation tools.
One thing to keep in mind is that HTML validation doesn't work with server-side code; it only works after your PHP is evaluated. This means that you'd need to run your code through the host's PHP interpreter and then 'pipe' it to either the tidy utility or the remote validation service. That command would look something like:
$ php myscript.php | tidy #options go here
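If you would rather drive this from PHP and go the remote route, here is a rough sketch. It assumes the W3C Nu HTML Checker endpoint at https://validator.w3.org/nu/ with `out=json`, which, as far as I know, accepts the document as a POST body and answers with a JSON object containing a `messages` list; check the service's documentation before relying on that. It also needs allow_url_fopen and HTTPS support in your PHP build. All three function names and the User-Agent string are made up for the example.

```php
<?php
// Sketch: render a PHP script to a string, then send the result to a remote checker.
// renderScript(), buildCheckerContext() and checkRemotely() are hypothetical helpers.

function renderScript(string $path): string
{
    ob_start();      // capture everything the script prints
    include $path;   // run the script in the current process
    return (string) ob_get_clean();
}

function buildCheckerContext(string $html)
{
    return stream_context_create(['http' => [
        'method'        => 'POST',
        'header'        => [
            'Content-Type: text/html; charset=utf-8',
            'User-Agent: my-validation-script', // placeholder value
        ],
        'content'       => $html,
        'ignore_errors' => true, // still read the body on HTTP error statuses
        'timeout'       => 30,
    ]]);
}

function checkRemotely(string $html): ?array
{
    // Assumed endpoint and response shape, see the note above.
    $url  = 'https://validator.w3.org/nu/?out=json';
    $json = @file_get_contents($url, false, buildCheckerContext($html));
    if ($json === false) {
        return null; // network problem or URL wrappers disabled
    }
    $data = json_decode($json, true);
    return is_array($data) ? ($data['messages'] ?? null) : null;
}

// Usage (not run here because it needs network access):
// $messages = checkRemotely(renderScript('myscript.php'));
```

Be polite to the shared service: cache results and don't call it on every page load.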
Personally, I eventually chose to forgo the headache and simply render the page, copy the source and validate via direct input on the W3C validation utility. There are only so many times you need to validate a page anyway and automating it seemed more trouble than it's worth.
Good luck.
You do not validate the PHP file on the W3C Validator. What the validator validates is the (X)HTML markup, the output of your PHP pages.
If your PHP project is hosted on the net, simply give the URL to the validator.
If your PHP project is hosted locally, you will have to save the output to a text file and upload this to the validator. To do this, from your browser, open the "File" menu and choose "Save Page As"... (or press CTRL + S).
Alternatively, there are a variety of validator plugins available for your browser. Here are some for Firefox and Chrome.
W3C doesn't standardize PHP, so you can't validate the PHP files themselves. But to validate the HTML or XHTML content of your site, just paste the URL into the W3C validator.