Tuesday, October 27, 2009

feed finder in php

Here is a PHP snippet which finds the RSS and Atom links of a web site by parsing the feed <link> elements in the head section of the page.
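Below is a minimal sketch of the idea, assuming PHP's bundled DOMDocument class; the function name findFeeds is my own placeholder, not from the original snippet.

<?php
// Minimal sketch: collect the feed URLs advertised in the page head.
function findFeeds($url)
{
    $html = file_get_contents($url);

    $doc = new DOMDocument();
    @$doc->loadHTML($html);   // @ suppresses warnings from sloppy markup

    $feedTypes = array('application/rss+xml', 'application/atom+xml');
    $feeds = array();
    foreach ($doc->getElementsByTagName('link') as $link) {
        $rel  = strtolower($link->getAttribute('rel'));
        $type = strtolower($link->getAttribute('type'));
        if ($rel == 'alternate' && in_array($type, $feedTypes)) {
            $feeds[] = $link->getAttribute('href');
        }
    }
    return $feeds;
}

print_r(findFeeds('http://asitdhal.blogspot.com/'));
?>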

Friday, October 23, 2009

feed finder in python

Here is a Python code snippet which finds the RSS and Atom feed links of any web site...

import sys
from urllib2 import urlopen
from urlparse import urljoin
from HTMLParser import HTMLParser, HTMLParseError

class FeedAutodiscoveryParser(HTMLParser):
    # These are the MIME types of links accepted as feeds
    FEED_TYPES = ('application/rss+xml',
                  'application/atom+xml')

    def __init__(self, base_href):
        HTMLParser.__init__(self)
        self.base_href = base_href
        self.feeds = []

    def handle_starttag(self, tag, attrs_tup):
        tag = tag.lower()
        attrs = dict([(k.lower(), v) for k, v in attrs_tup])
        # an explicit <base> tag overrides the page URL for relative links
        if tag == "base" and 'href' in attrs:
            self.base_href = attrs['href']
        if tag == "link":
            rel = attrs.get("rel", "")
            type = attrs.get("type", "")
            title = attrs.get("title", "")
            href = attrs.get("href", "")
            if rel == "alternate" and type in self.FEED_TYPES:
                self.feeds.append({
                    'type' : type,
                    'title' : title,
                    'href' : href
                })

def getFeedsDetail(url):
    data = urlopen(url).read()
    parser = FeedAutodiscoveryParser(url)
    try:
        parser.feed(data)
    except HTMLParseError:
        # broken markup after the <head> section doesn't matter
        pass
    # resolve relative feed URLs against the base href
    for feed in parser.feeds:
        feed['href'] = urljoin(parser.base_href, feed['href'])
    return parser.feeds

def getFeeds(url):
    return [ x['href'] for x in getFeedsDetail(url) ]

def main():
    url = sys.argv[1]
    feeds = getFeedsDetail(url)
    print "Site %s : " % url
    print "###########################################"
    for feed in feeds:
        print "Title : '%(title)s' \nType : %(type)s \nURI : %(href)s" % feed
        print "------------------------------------------------------------------------"

if __name__ == "__main__":
    main()
The usage is...

F:\Python26>python minifeedfinder.py http://www.timesofindia.com/

Site http://www.timesofindia.com/ :

Title : ''
Type : application/rss+xml
URI : http://www.timesofindia.com/rssfeedsdefault.cms

F:\Python26>python minifeedfinder.py http://asitdhal.blogspot.com/

Site http://asitdhal.blogspot.com/ :

Title : 'Life like this - Atom'
Type : application/atom+xml
URI : http://asitdhal.blogspot.com/feeds/posts/default
Title : 'Life like this - RSS'
Type : application/rss+xml
URI : http://asitdhal.blogspot.com/feeds/posts/default?alt=rss

The equivalent PHP code is in the feed finder in php post above.

Saturday, October 17, 2009

Country and City from ip address (php)

Here is some code I wrote from scratch to get geographical information from an IP address.

This code does not validate the address in any way. Since it depends on a whois server to perform the lookup, the output takes some time to appear.
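Here is a minimal sketch of that approach, assuming a RIR whois server (RIPE here) and the plain-text country:/city:/address: fields in its reply; the function name geoLookup and the server choice are my placeholders.

<?php
// Minimal sketch: ask a whois server (port 43) about an IP address and
// grep the geographical fields out of the plain-text reply.
function geoLookup($ip, $server = 'whois.ripe.net')
{
    $sock = fsockopen($server, 43, $errno, $errstr, 30);
    if (!$sock) {
        die("Connection failed: $errstr ($errno)");
    }
    fputs($sock, $ip . "\r\n");   // the whois protocol: query + CRLF

    $info = array();
    while (!feof($sock)) {
        $line = fgets($sock, 1024);
        if (preg_match('/^(country|city|address):\s*(.+)/i', $line, $m)) {
            $info[strtolower($m[1])][] = trim($m[2]);
        }
    }
    fclose($sock);
    return $info;
}

print_r(geoLookup('193.0.6.139'));
?>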

Wednesday, October 7, 2009

link extractor in python

During my engineering studies, I coded a Python script that extracts the links from a web page.
Here is the code...

import urllib
import sys
import os.path
import sgmllib

print "\n\n\t\tlipun4u[at]gmail[dot]com"
print "\t\t------------------------"

appname = os.path.basename(sys.argv[0])

class MyParser(sgmllib.SGMLParser):
    "A simple parser class."

    def __init__(self, verbose=0):
        "Initialise an object, passing 'verbose' to the superclass."
        sgmllib.SGMLParser.__init__(self, verbose)
        self.hyperlinks = []

    def parse(self, s):
        "Parse the given string 's'."
        self.feed(s)
        self.close()

    def start_a(self, attributes):
        "Process a hyperlink and its 'attributes'."
        for name, value in attributes:
            if name == "href":
                self.hyperlinks.append(value)

    def get_hyperlinks(self):
        "Return the list of hyperlinks."
        return self.hyperlinks

if len(sys.argv) != 2 or "-h" in sys.argv or "--help" in sys.argv:
    print "Usage : " + appname + " <site>"
    print "e.g.  : " + appname + " www.google.com"
    sys.exit(1)

site = sys.argv[1].replace("http://", "")
site = "http://" + site.lower()

print "Target : " + site
try:
    site_data = urllib.urlopen(site)
    parser = MyParser()
    parser.parse(site_data.read())
except Exception, msg:
    print "Error in connecting site ", site
    print msg
    sys.exit(1)

links = parser.get_hyperlinks()
print "Total no. of hyperlinks : " + str(len(links))
print ""
for l in links:
    print l

Here is the help output

Usage : linkscan1.py <site>
e.g.  : linkscan1.py www.google.com

I:\Python26>linkscan1.py www.iter.ac.in

Target : http://www.iter.ac.in
Total no. of hyperlinks : 12



But some guys added some spice to it; look at what they made...

key logger in C++

Now it's time to do some bad work. Here I will give you the code of a key logger that works fine on the Windows NT platform. The code is not mine, unlike the previous one. The original link is http://www.rohitab.com/discuss/index.php?showtopic=19360.

Before this, read this link to know what this stuff is.

Here goes the code...

Compile it in Microsoft Visual Studio 6.0; I don't know whether it can be compiled with any other compiler. When you run it, it logs every key pressed to a keys.log file in the same folder as the executable. You will see the console window, since no code has been added to hide it.

Now add some code in the main function to make it invisible...

Now the key logger is ready to run. If you want to know how to start it during Windows start-up, see http://packetstormsecurity.org/Win/auto.txt.

N.B. This is for educational purposes only. If anyone gets busted using this code, I won't be responsible.

Sunday, October 4, 2009

timeout in Session (PHP)

  • Sessions allow a PHP script to store data on the web server that can be used later, even across requests to different PHP pages.

  • When a session is created, a flat file is created on the server. Since each session ID is unique, every session gets its own file, and those files accumulate over time.

  • The PHP garbage collector deletes old session files from time to time. But the garbage collector is only invoked with a certain probability, not on every request the web server handles.

  • The default timeout for session files is 1440 seconds, or 24 minutes. So a session file may be deleted after that timeout, but it may also reside on the server longer, depending on the number of sessions created; this is where the probability comes into play.

  • The session cookie may last until the browser is closed, but the garbage collector might delete the session file much earlier. In that case, if a session request arrives after the file has been deleted, a new session is created and the old session information is lost. This is annoying.

  • There are 3 variables in the php.ini file which deal with the garbage collector:

    Variable                  Default value   Changeable
    session.gc_maxlifetime    1440 seconds    PHP_INI_ALL
    session.gc_probability    1               PHP_INI_ALL
    session.gc_divisor        100             PHP_INI_ALL

    session.gc_probability together with session.gc_divisor controls the probability that the gc (garbage collection) routine is invoked on a request. The probability is gc_probability/gc_divisor, so the defaults above give a 1/100 = 1% chance per request.

  • The garbage collection timeout can be changed.

    $timeout = 7200; // 7200 seconds = 2 hours
    ini_set('session.gc_maxlifetime', $timeout);

  • The session timeout can also be reduced programmatically, without changing the global variable.

    // set timeout period in seconds
    $inactive = 600;
    if (isset($_SESSION['timeout'])) {
        $session_life = time() - $_SESSION['timeout'];
        if ($session_life > $inactive) {
            // idle too long: destroy the session and redirect to the logout page
            session_destroy();
            header("Location: logoutpage.php");
            exit;
        }
    }
    $_SESSION['timeout'] = time();

Friday, October 2, 2009

login page in PHP(naive)

This example shows how to design the login page of a web site in PHP, for naive programmers.

First create the database table (I am using MySQL). A minimal schema looks like this (the user_id and user_name column definitions are assumed):

CREATE TABLE users (
    user_id INT NOT NULL AUTO_INCREMENT,
    user_name VARCHAR(30) NOT NULL,
    password VARCHAR(16) NOT NULL,
    PRIMARY KEY(user_id)
);

Now, let's insert some data.

INSERT INTO users (user_name, password)
VALUES ('asit', 'lipu');

INSERT INTO users (user_name, password)
VALUES ('google', 'yahoo');

Now this is the HTML page that displays the login form.
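A minimal sketch of such a form; the field names user_name and password and the target login.php are assumptions that match the script further below.

<html>
<body>
<form action="login.php" method="post">
User name : <input type="text" name="user_name" /><br />
Password : <input type="password" name="password" /><br />
<input type="submit" value="Login" />
</form>
</body>
</html>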


Now let's write the PHP code that makes the necessary database connection.


<?php
$hostname = "localhost";
$username = "root";
$password = "iitiit";
$database = "db2";
$link = mysql_connect($hostname, $username, $password) or die("MySQL can't be connected");
mysql_select_db($database, $link) or die("Database can't be selected");
?>

As the user submits the username and password, the form data is sent to the server and the login.php script is invoked.
Here is the code
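A minimal sketch of login.php under the schema above; the include file name connect.php is an assumed placeholder for the connection code shown earlier.

<?php
session_start();
include("connect.php");   // assumed name for the connection code above

// escape the form values before putting them into the query
$user = mysql_real_escape_string($_POST['user_name']);
$pass = mysql_real_escape_string($_POST['password']);

$result = mysql_query("SELECT user_id FROM users " .
                      "WHERE user_name = '$user' AND password = '$pass'");

if (mysql_num_rows($result) == 1) {
    // the credentials matched exactly one row: log the user in
    $_SESSION['user'] = $user;
    header("Location: welcome.php");
} else {
    header("Location: error.html");
}
exit;
?>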


If the user successfully logs in, then the page is redirected to welcome.php, otherwise to error.html



A few caveats

  • No real web site uses this technique, because the login information is sent unencrypted and the password is stored in plain text.

Thursday, October 1, 2009

Euclidean Algorithm(GCD)

The greatest common divisor can be calculated easily using the Euclidean algorithm.

This is the non-recursive pseudocode

function gcd(a, b)
    while b ≠ 0
        t := b
        b := a mod b
        a := t
    return a

And an equivalent C implementation (this variant uses repeated subtraction instead of the modulo operator)

int gcd(int a, int b)
{
    if (a == 0)
        return b;
    while (b != 0) {
        if (a > b)
            a = a - b;
        else
            b = b - a;
    }
    return a;
}

This is the recursive pseudocode

function gcd(a, b)
    if b = 0
        return a
    return gcd(b, a mod b)

int gcd(int a, int b)
{
    if (b == 0)
        return a;
    return gcd(b, a % b);
}
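As a quick check, gcd(252, 105): 252 mod 105 = 42, 105 mod 42 = 21, 42 mod 21 = 0, so the answer is 21. A small test driver (mine, just for illustration):

#include <stdio.h>

int gcd(int a, int b)
{
    if (b == 0)
        return a;
    return gcd(b, a % b);
}

int main(void)
{
    printf("gcd(252, 105) = %d\n", gcd(252, 105));   /* prints 21 */
    return 0;
}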

get IP address in C(windows)

I wrote this code snippet in my 2nd year; it finds the IP address of a Windows machine. This is simple code. Just go through this...
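Below is a minimal sketch of the usual Winsock approach, assuming gethostname() plus gethostbyname() (link with ws2_32.lib); it is a reconstruction, not the original listing.

#include <stdio.h>
#include <string.h>
#include <winsock2.h>

int main(void)
{
    WSADATA wsa;
    char hostname[256];
    struct hostent *he;
    struct in_addr addr;
    int i;

    /* Winsock must be initialised before any other socket call */
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0) {
        printf("WSAStartup failed\n");
        return 1;
    }

    gethostname(hostname, sizeof(hostname));
    printf("Host name  : %s\n", hostname);

    /* a host can have several addresses; print every one of them */
    he = gethostbyname(hostname);
    if (he != NULL) {
        for (i = 0; he->h_addr_list[i] != NULL; i++) {
            memcpy(&addr, he->h_addr_list[i], sizeof(struct in_addr));
            printf("IP address : %s\n", inet_ntoa(addr));
        }
    }

    WSACleanup();
    return 0;
}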

explicit keyword in C++

Look at the following code.
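A minimal reconstruction, keeping the ABC class and the value 100 used below:

#include <iostream>

class ABC {
    int x;
public:
    ABC(int i) : x(i) {}              // single-argument constructor
    int get() const { return x; }
};

int main()
{
    ABC a = 100;   // compiles: the int is implicitly converted to an ABC
    std::cout << a.get() << std::endl;
    return 0;
}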

Here ABC a = 100; is equivalent to ABC a(100);
This is known as an implicit conversion. It reduces the readability of the code, and it can be avoided by using the keyword explicit.

By prefixing the constructor with the explicit keyword, we can prevent the compiler from using that constructor for implicit conversions.

Look at the following code.
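Again a minimal reconstruction, this time with the constructor marked explicit:

#include <iostream>

class ABC {
    int x;
public:
    explicit ABC(int i) : x(i) {}     // cannot be used for implicit conversion
    int get() const { return x; }
};

int main()
{
    // ABC a = 100;   // error: implicit conversion is no longer allowed
    ABC a(100);       // OK: explicit constructor notation
    std::cout << a.get() << std::endl;
    return 0;
}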

In this case ABC a = 100; will be an error. We can only create the object using the constructor notation, ABC a(100);.

Some more information about explicit...

  • The explicit keyword is used to declare a single-argument constructor that can only be called explicitly. It is useless on a constructor that takes multiple arguments, because such a constructor cannot take part in an implicit conversion anyway.

  • It may only be used in declarations of constructors within a class declaration.