Apache Lounge


Topic: strange "deep path" entries in access.log?

Joined: 02 Apr 2016
Posts: 5
Location: Taiwan, Taichung

Posted: Sat 02 Apr '16 10:19    Post subject: strange "deep path" entries in access.log?

Hi all, from a forum newbie :)

I have many such strange entries in my /var/log/apache2/access.log:
- - [02/Apr/2016:05:35:03 +0800] "GET /~ckhung/index.php/a/a/b/al/b/al/b/al/b/svg/dl/b/tk/b/dg/c/s/b/pr/ HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
- - [02/Apr/2016:06:35:24 +0800] "GET /~ckhung/index.php/a/a/b/al/b/al/b/al/b/svg/p/algotutor/b/svg/b/al/mm/c/l/ HTTP/1.1" 200 4097 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
- - [02/Apr/2016:08:50:14 +0800] "GET /~ckhung/index.php/a/a/b/al/b/al/b/al/b/svg/p/algotutor/s/c/v/c/mentor/c/p/toy/ HTTP/1.1" 200 4100 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

As you can see, there are 20 (sometimes up to 30!) levels of subdirectories in the path, yet my deepest document is only 5 levels deep (e.g. /~ckhung/i/jq/presentation/images/structure).

Initially I thought it was just an annoying but harmless harvester making some honest mistakes. But on second thought, it just doesn't look right: this IP repeatedly requests such nonsense deep paths and rarely any meaningful short/shallow paths. I can't find any such discussion by googling, so here are a few questions:
1. Have you seen similar things in your access.log?
2. Is there a known (perhaps old) Apache vulnerability related to this kind of request?
3. According to the host command, and also according to http://whatismyipaddress.com/ip/ , this IP belongs to Microsoft. Is there a reasonable explanation for that?

In case it helps, here is how you can check your own log for this pattern:

perl -ne 'print "$1\n" if m#GET ((/[^/\s]*)+) #' /var/log/apache2/access.log | sed 's#[^/]##g' | sort | uniq -c
perl -ne 'print if m#GET ((/[^/\s]*){20,}) #' /var/log/apache2/access.log

The first command produces simple statistics on path depths: each output line is a count followed by a run of slashes, one slash per level. The second command prints the entries whose depth is 20 or more.
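For readers who prefer Python, the two Perl one-liners above can be sketched as follows. This is an approximation, not a drop-in replacement: it reuses the same regular expression, counts depth as the number of slashes (as the sed step does), and reads the whole log into memory. The function names are hypothetical.

```python
import re
from collections import Counter

# Same pattern as the Perl one-liners: the request path after "GET ",
# built from "/segment" pieces that contain no slash or whitespace.
REQUEST_PATH = re.compile(r'GET ((?:/[^/\s]*)+) ')


def path_depth(path):
    """Depth as the Perl/sed pipeline counts it: the number of slashes."""
    return path.count('/')


def depth_histogram(lines):
    """Map depth -> number of GET requests at that depth (first command)."""
    counts = Counter()
    for line in lines:
        m = REQUEST_PATH.search(line)
        if m:
            counts[path_depth(m.group(1))] += 1
    return counts


def deep_requests(lines, min_depth=20):
    """Yield log lines whose GET path has at least min_depth slashes (second command)."""
    for line in lines:
        m = REQUEST_PATH.search(line)
        if m and path_depth(m.group(1)) >= min_depth:
            yield line


def report(log_path='/var/log/apache2/access.log', min_depth=20):
    """Print the depth histogram, then every line at least min_depth deep."""
    with open(log_path, encoding='utf-8', errors='replace') as f:
        lines = f.readlines()
    for depth, n in sorted(depth_histogram(lines).items()):
        print(f'{n:7d} depth {depth}')
    for line in deep_requests(lines, min_depth):
        print(line, end='')
```

Calling `report()` with the log path prints roughly what the two shell commands print, except that depths are shown as numbers instead of runs of slashes.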

Many thanks for your time, and many thanks to Apache Lounge for providing this forum!

OS: Debian GNU/Linux 7 (wheezy)
apache: 2.2.22-13+deb7u6
James Blond

Joined: 19 Jan 2006
Posts: 7374
Location: Germany, Next to Hamburg

Posted: Mon 04 Apr '16 11:49

I have not seen that before. I wonder why your PHP script doesn't send a 404 in such a case. That would stop the bot from crawling further.
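The poster's pages are PHP; as a language-neutral illustration of the 404 suggested above, here is a minimal Python WSGI sketch (not the poster's setup; the app and its page body are hypothetical). CGI-style gateways, WSGI included, pass whatever follows the script's own URL as PATH_INFO, so a page that expects no extra path can reject requests where it is non-empty.

```python
def page_app(environ, start_response):
    """Hypothetical WSGI page that answers 404 when extra path info is present."""
    # For a request like /~ckhung/index.php/a/a/b/al/..., a gateway that mounts
    # the script at /~ckhung/index.php passes '/a/a/b/al/...' as PATH_INFO.
    if environ.get('PATH_INFO', ''):
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'Not Found\n']
    start_response('200 OK', [('Content-Type', 'text/html; charset=utf-8')])
    return [b'<html><body>page content</body></html>\n']
```

This assumes the app is mounted at the script's URL (SCRIPT_NAME), as CGI or mod_wsgi would arrange; served standalone from the site root, every path would count as extra path info and get a 404.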

Joined: 02 Apr 2016
Posts: 5
Location: Taiwan, Taichung

Posted: Mon 04 Apr '16 17:06

A decade ago, or maybe earlier, I converted all my static pages from .shtml to .php. So in most of my PHP pages the only PHP statements are include directives; the rest (the majority) of the content is plain HTML. To me the more intriguing question is: why would a robot invent, and insist on visiting only, such strange deep paths if its author had no bad intentions? Or, if it is indeed a bad robot, what is it trying to accomplish?
James Blond

Joined: 19 Jan 2006
Posts: 7374
Location: Germany, Next to Hamburg

Posted: Mon 04 Apr '16 21:23

Well, robots often try various URL parameters. In your case the server returns a 200 instead of a 404, which is why the bot keeps looking for different content.

Is it a bad robot? Maybe. Some hacker or script-kiddie tools try hard to break into a server, and the weak spot is usually a script, not the server itself.

Since you are on Linux, you can use iptables to block certain IPs.

Joined: 02 Apr 2016
Posts: 5
Location: Taiwan, Taichung

Posted: Tue 05 Apr '16 10:44

Indeed, just before posting this question I had written a fail2ban filter to block such requests. I wanted to know whether it was worth sharing, and to be able to explain what it was blocking.
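The fail2ban filter itself is not shown in the thread. As a hypothetical illustration of the kind of rule it applies, the Python sketch below counts deep-path GET requests per client address in Combined Log Format lines and reports addresses over a limit. The depth threshold of 20 mirrors the second Perl command in the first post; the limit of 3 hits is an arbitrary assumption, and unlike fail2ban this scans the whole log rather than a sliding time window.

```python
import re
from collections import Counter

# Combined Log Format: client address first, then ident, user, [time],
# and the quoted request line; only GET requests are considered here.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "GET ((?:/[^/\s]*)+) ')


def offenders(lines, min_depth=20, max_hits=3):
    """Return {address: count} for clients with more than max_hits deep GETs.

    min_depth counts slashes, as in the Perl one-liner; max_hits is a
    hypothetical threshold, not a fail2ban default.
    """
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(2).count('/') >= min_depth:
            hits[m.group(1)] += 1
    return {addr: n for addr, n in hits.items() if n > max_hits}
```

The addresses it returns are the ones a jail with a similar rule would consider banning; the real decision, ban duration and firewall action are left to fail2ban.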

And thanks for pointing out the 200 success code; that's very strange. I also tried to detect such requests in my PHP scripts, as you suggested. However, according to https://stackoverflow.com/questions/22742086/getting-parameters-in-a-url-without-question-mark and https://stackoverflow.com/questions/13813316/removing-question-mark-from-query-string-with-apache-and-htaccess , it seems that without mod_rewrite enabled, my PHP script would not be able to process a request that has no question mark. And indeed I verified with "apache2ctl -M" that mod_rewrite is not enabled.

So I guess the relevant question is: shouldn't Apache reply with a 404 to such a request when mod_rewrite is not enabled?
James Blond

Joined: 19 Jan 2006
Posts: 7374
Location: Germany, Next to Hamburg

Posted: Tue 05 Apr '16 12:22

It is this setting: https://httpd.apache.org/docs/current/mod/core.html#acceptpathinfo

Just turn it off.
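A rough model of what AcceptPathInfo controls may help here. Apache maps the leading part of the URL that names an existing file (here index.php) to that file and treats the remainder as trailing path info; the directive then decides whether such requests are served. The Python below is a simplified sketch, assuming a hypothetical set of existing file paths and a plain script/static distinction; the real lookup also involves directories, aliases and handler configuration.

```python
def split_path_info(url_path, existing_files):
    """Split url_path into (file_path, path_info) under a simplified model.

    existing_files is a hypothetical set of URL paths that map to real files.
    Returns (None, None) when no leading part of the path is a file.
    """
    segments = url_path.split('/')
    # Walk the path one segment at a time and stop at the first existing file;
    # whatever follows it is the trailing path info.
    for i in range(1, len(segments) + 1):
        candidate = '/'.join(segments[:i])
        if candidate in existing_files:
            rest = segments[i:]
            path_info = '/' + '/'.join(rest) if rest else ''
            return candidate, path_info
    return None, None


def status_for(url_path, existing_files, accept_path_info='Default', is_script=True):
    """Rough status Apache would give for each AcceptPathInfo value.

    is_script=True stands in for a script handler such as PHP, and assumes
    the script itself answers 200, as the poster's pages did.
    """
    file_path, path_info = split_path_info(url_path, existing_files)
    if file_path is None:
        return 404                      # nothing on disk matches at all
    if not path_info or accept_path_info == 'On':
        return 200
    if accept_path_info == 'Off':
        return 404                      # trailing path info is rejected
    # 'Default': the handler decides; script handlers generally accept path
    # info, while the core handler for plain files rejects it.
    return 200 if is_script else 404
```

Under this model, the logged 200s for the deep paths are what the Default setting gives a PHP page, and Off should turn them into 404s.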

Joined: 02 Apr 2016
Posts: 5
Location: Taiwan, Taichung

Posted: Wed 06 Apr '16 16:05

Many thanks, James! I will try that tomorrow when my server is up again (it's on a power-off vacation for a few days ^_^). If I don't report back, it means this solved my problem beautifully.

Joined: 02 Apr 2016
Posts: 5
Location: Taiwan, Taichung

Posted: Sat 09 Apr '16 6:43

Hmm, no luck. I added

AcceptPathInfo Off

to /etc/apache2/mods-available/userdir.conf, and the apache2 service restarted successfully. But once I removed the fail2ban jails, those deep-path requests came swarming back within a few hours.
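AcceptPathInfo only changes the status Apache returns; the requests themselves will keep arriving, at least until the bot reacts to the 404s. One way to tell whether the directive took effect is therefore to look at the status codes logged for the deep-path requests rather than at their arrival. A minimal Python sketch, assuming Combined Log Format lines like the entries quoted in the first post (the function name is hypothetical):

```python
import re
from collections import Counter

# Request path and status code from a Combined Log Format line.
REQUEST = re.compile(r'"GET ((?:/[^/\s]*)+) [^"]*" (\d{3}) ')


def deep_status_counts(lines, min_depth=20):
    """Count status codes returned for GET requests at least min_depth slashes deep."""
    counts = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m and m.group(1).count('/') >= min_depth:
            counts[m.group(2)] += 1
    return counts
```

If the directive is active where these requests are handled, entries logged after the restart should show 404 instead of 200.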
