SOB: Son of a Bos'n

There are three states of being... Alive, Dead or at Sea.


Monthly Archives: March 2016

HTTrack

Posted on March 31, 2016 by Bob

HTTrack is a free (GPL) and easy-to-use offline browser utility.

Basically, it allows you to download the contents of an internet site to a local directory. It recursively builds a complete set of directories, getting HTML, images, and other files from the server and stashing them on your computer.  These are static HTML snapshots of the original site, even if it was built with a database-driven, dynamic page tool.

I find it great for archiving copies of my sites before making major changes, or shutting them down.

Using HTTrack

There are versions of HTTrack for multiple OS environments.  The one I use is for a standard Linux system.  I have configured it to run from a script as a cron job.  The script reads a series of files, each listing a small collection of web sites.  It only processes one site at a time, to avoid overloading remote sites that are on shared servers. It stashes each collection in a designated directory on my local server for local backup and browsing.

One nice feature of the -%L option is that HTTrack automatically builds an index of the site collections in the target folder.

httrack -%U apache -F "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1" \
     -%L LinkList-01 -O /home/Mirror/Mirror-01 --update

The file list (LinkList-01) is a simple list of targeted sites, one per line.   I found that WordPress sites seem to like to be listed with the full URL, as “http://sob.boatswain.us/”, while my Mediawiki sites won’t work with that and need to be listed without the “http://” prefix and trailing slash, simply as “sysadm.equoria.com”.
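As a rough illustration of the list file described above, the snippet below writes a two-line LinkList-01 using the two example entries from this post. The LIST_FILE variable is only a convenience added for this sketch; it is not something HTTrack itself reads.

```shell
# Sketch only: create a list file for -%L with one site per line.
# LIST_FILE is a name made up for this example; it defaults to LinkList-01.
LIST_FILE="${LIST_FILE:-LinkList-01}"

cat > "$LIST_FILE" <<'EOF'
http://sob.boatswain.us/
sysadm.equoria.com
EOF
```

The first entry uses the full URL form that worked for the WordPress site, the second the bare host name that worked for the Mediawiki site.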

The user agent (-F) is explained in the next section.
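For readers who want a starting point, here is a minimal sketch of the kind of wrapper script described in this section. It is not the actual script used here. The function name, the HTTRACK override variable, and the directory layout (list files named LinkList-NN, mirrors stored as Mirror-NN) are assumptions modeled on the names in the command above. Note that -%U apache only makes sense when the script runs as root and an apache user exists.

```shell
#!/bin/sh
# Sketch of a cron-driven mirror wrapper (hypothetical, not the post's script).
# Assumes list files named LinkList-NN in one directory and one mirror
# directory per list, named Mirror-NN.

HTTRACK="${HTTRACK:-httrack}"   # set HTTRACK=echo for a dry run
USER_AGENT="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1"

mirror_collections() {
    list_dir="$1"
    mirror_root="$2"
    for list in "$list_dir"/LinkList-*; do
        [ -f "$list" ] || continue        # skip if the glob matched nothing
        suffix="${list##*LinkList-}"      # e.g. "01"
        target="$mirror_root/Mirror-$suffix"
        mkdir -p "$target"
        # Collections are handled sequentially; nothing is started in parallel.
        "$HTTRACK" -%U apache -F "$USER_AGENT" \
            -%L "$list" -O "$target" --update
    done
}

# Example call (paths are placeholders):
# mirror_collections /home/Mirror/lists /home/Mirror
```

Saved as a script with the example call uncommented, this could be run from a crontab entry such as "0 2 * * 0 /usr/local/bin/mirror-sites.sh", where both the schedule and the path are placeholders.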

User agent 403 rejections

Many sites appear to have a problem with HTTrack's default User Agent identification.

Like a good boy, HTTrack identifies itself when it connects, and immediately gets rejected.

Using wget as a testing tool, you can see that it is the HTTrack User Agent that triggers the forbidden message.

[root@neptune temp]# ls -l
total 0
[root@neptune temp]#
[root@neptune temp]# wget -U "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" www.garg.com 2>&1 | egrep HTTP
HTTP request sent, awaiting response... 403 Forbidden
[root@neptune temp]# ls -l
total 0
[root@neptune temp]#
[root@neptune temp]# wget www.garg.com 2>&1
[root@neptune temp]# ls -l
total 4
-rw-r--r--. 1 root root 3288 Mar 24 09:55 index.html
[root@neptune temp]#

This is handled by the security software on the server. The problem is that the software simply does not have HTTrack registered in its database of approved agents.
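The wget test above can be wrapped in two small helper functions so the same check is easy to repeat for other sites. This is only a sketch: the function names are made up for this example, and it assumes GNU wget, whose English status line ("HTTP request sent, awaiting response...") is what the parser looks for. The --spider option checks the URL without saving a file; some servers may answer that kind of request differently from a normal download.

```shell
# Print the status code from the last "awaiting response" line of
# wget output read on stdin (the last line matters after redirects).
last_status() {
    sed -n 's/^HTTP request sent, awaiting response\.\.\. \([0-9][0-9][0-9]\).*/\1/p' |
        tail -n 1
}

# Report the HTTP status a site returns for a given user agent string.
# LC_ALL=C keeps wget's messages in English so last_status can parse them.
status_for_agent() {
    url="$1"
    agent="$2"
    LC_ALL=C wget --spider -U "$agent" "$url" 2>&1 | last_status
}

# Example (requires network access):
# status_for_agent www.garg.com "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"
```

Running status_for_agent once with the HTTrack agent string and once with a browser agent string shows whether the user agent alone changes the response.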

Use the -F option in httrack to change the user agent message.

F  user-agent field (-F "user-agent name") (--user-agent)

In the example above, I used:

-F "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1"

The user agent text “Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1” came from Firefox pages on User Agent String.Com.

For additional information, open the HTTrack Users Guide and scroll down to the section on Browser Options.

Posted in Technology

Republican Politics

Posted on March 30, 2016 by Bob

I wrote the following post in 2009. I received a call from a close friend the other day. After “catching up”, he shifted to the reason behind his call. He and his wife were bothered by the debate on health care. Knowing that I was involved with the Republican Party, he called to express his concerns. His concerns were simple. In all of the reports on the Health Care debate, he did not see a true debate. He heard over … Continue reading →

Posted in At Sea, Commentary

Apple vs Lazy Big Brother

Posted on March 10, 2016 by Bob

OK, color me clueless, but I simply do not understand the government’s position.  They simply do not appear to know what it takes to decrypt an iPhone, so they are taking the easy way out.  Make someone else do it.

1. If you have too many failed attempts, using the iPhone to access the data, the phone will erase the disk. So, trying different passwords is not an option.
2. The first thing you do in a case like … Continue reading →

Posted in Commentary, Politics

© 2017 eQuoria