This starts to run, and then a dialog box appears:
The program can't start because SSLEAY32_.dll is missing from your computer. ...
This seems to be because google.com has an https:// link:
W3C Link Checker version 4.81 (c) 1999-2011 W3C
GET http://www.google.com/ fetched in 1.44 seconds
Processing http://www.google.com/
Settings used:
- Accept: text/html, application/xhtml+xml;q=0.9, application/vnd.wap.xhtml+xml;q=0.6, */*;q=0.5
- Accept-Language: (not sent)
- Referer: sending
- Cookies: not used
- Sleeping 1 second between requests to each server
Parsing...
done (6 lines in 0.06 seconds).
Checking anchors...
done.
Checking link http://www.google.com/intl/en/ads/
HEAD http://www.google.com/intl/en/ads/
-> HEAD https://www.google.com/intl/en/ads/ fetched in 59.67 seconds
Checking link https://mail.google.com/mail/?tab=wm
HEAD https://mail.google.com/mail/?tab=wm fetched in 1.00 seconds
I have looked through the code and would suggest fixing this by changing line 29 of checklink.pl from:
$ENV{PATH} = undef;
to:
$ENV{PATH} = join($Config{path_sep}, $Config{installbin}, $Config{sitebin});
I have tested that change with Strawberry Perl on Win7 x64 and with perl on Ubuntu 16.04, and it works OK. On Windows, a path is often needed to locate libraries such as SSL.
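As a minimal sketch of that change in isolation: the helper name `perl_bin_path` is hypothetical and not part of checklink.pl, and the sketch assumes the core `Config` module is loaded so that `%Config` is available. It skips any entry that is undefined or empty on a given Perl build.

```perl
use strict;
use warnings;
use Config;    # core module; provides %Config (path_sep, installbin, sitebin)

# Hypothetical helper: build a minimal PATH from the directories that
# Perl itself was installed into, skipping undefined or empty entries.
sub perl_bin_path {
    my @dirs = grep { defined && length } @Config{qw(installbin sitebin)};
    return join $Config{path_sep}, @dirs;
}

# Replace the inherited PATH with the restricted one.
$ENV{PATH} = perl_bin_path();
print "PATH is now: $ENV{PATH}\n";
```

The values come from Perl's own build configuration rather than from the environment, which is why they should be acceptable under taint mode.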
Enhancement:
I am in the process of setting up an internal server cron job to check websites periodically and send an email if they have broken links or bad HTML/CSS. Currently, checklink.pl doesn't do HTML validation; it just points to a separate validation service in its report. I could use a list of links in a script or batch file to call a separate validation program for each link, or I could modify checklink.pl to do that internally. I have tried to modify checklink.pl to do both:
To output a list of the pages checked, at line 542, insert:
To call the Nu validator for each page checked, at line 1354, insert:
# XXXX RB add page validation:
{
    use Capture::Tiny ':all';

    # Run the Nu validator on the page, capturing stdout and stderr together.
    # The list form of system() avoids the shell.
    my $out = capture_merged {
        # Linux/Unix variant:
        # system ('/usr/bin/java', '-Xss1024k','-jar','./vnu.jar','--format','text','--asciiquotes','--skip-non-html',$response->{absolute_uri});
        system ('C:\ProgramData\Oracle\Java\javapath\java.exe', '-Xss1024k','-jar','vnu.jar','--format','text','--asciiquotes','--skip-non-html',$response->{absolute_uri});
    };
    # A non-zero exit status is treated as a validation failure.
    if ($? >> 8) {
        print "\nPage Validation: $response->{absolute_uri}\n";
        print "$out \n";
    }
    else {
        print "\nPage Validation: $response->{absolute_uri} - OK\n" unless $Opts{Summary_Only};
    }
}
Note that the system call for Linux/Unix is commented out. Because this program runs under taint mode, a system("whole command line string") call won't work, so the command is passed as a list. Since checklink.pl sets $ENV{PATH} = undef, there is no path to search for the Java runtime on Windows. (The suggested bugfix above provides useful paths for Perl programs, but not for Java.) On Linux, you could try /usr/local/bin:/usr/bin and hope it works most of the time.
An alternative is to pass the path to java in from a command line argument or an entry in the config file. A command line argument opens the security issues that taint mode tries to prevent, as a user could specify any malicious program as "java". That said, taint mode seems severe for a non-setuid command line program.
Putting this in the config file would open up the opportunity for the user to specify a config file with W3C_CHECKLINK_CFG that similarly contains a malicious path. However, this does not seem of much use unless checklink.pl is running with elevated privileges, which it doesn't need.
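One way to narrow that risk, sketched here under assumptions rather than taken from checklink.pl: validate whatever path the user supplies before handing it to the list form of system(). The helpers `untaint_exe_path` and `vnu_command` are hypothetical names, and the allowed-character pattern is an illustrative choice; the validator flags are the ones used in the snippet above.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: accept a configured executable path only if it is
# absolute, contains no characters outside a conservative whitelist, and
# names an executable file. Returns the untainted path, or undef.
sub untaint_exe_path {
    my ($path) = @_;
    return unless defined $path;
    return unless File::Spec->file_name_is_absolute($path);
    # Capturing through a regex is what clears the taint flag.
    my ($clean) = $path =~ m{\A([\w:\\/. ()-]+)\z} or return;
    return unless -f $clean && -x _;
    return $clean;
}

# Hypothetical helper: build the argument list for the Nu validator so it
# can be passed to the list form of system(), which avoids the shell.
sub vnu_command {
    my ($java, $url) = @_;
    return ($java, '-Xss1024k', '-jar', 'vnu.jar',
            '--format', 'text', '--asciiquotes', '--skip-non-html', $url);
}

# Show which candidate paths would be accepted on this machine.
for my $candidate ('/usr/bin/java', 'java; rm -rf /') {
    my $ok = untaint_exe_path($candidate);
    printf "%-20s => %s\n", $candidate, defined $ok ? 'accepted' : 'rejected';
}
```

A caller would then run `system(vnu_command($java, $uri))` only when `untaint_exe_path` returned a defined value. This does not stop a user from pointing at a different real executable; it only rules out relative paths and shell metacharacters.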
I would be interested in your thoughts on the bugfix and enhancement ideas.
I've been using the LinkChecker as installed by CPAN and have two points:
Using Strawberry Perl on a Win7 x64 machine, executing checklink.bat results in a complaint about taint mode:
This can be worked around by invoking the script directly from perl, such as:
C:\util\html>perl -wT \Strawberry\perl\site\bin\checklink www.google.com