This lesson assumes the reader has some Tcl knowledge.
The tutorial can also be applied to the Tcl http package with
some minor tweaks, but for our purposes we will focus
on egghttp.tcl.
1. Loading and Checking for egghttp.tcl
To load egghttp.tcl, use the source command in your
eggdrop's config file, or in a script.
e.g. source scripts/egghttp.tcl
(assuming egghttp.tcl is in your scripts directory)
To check from within your script whether egghttp.tcl has been loaded successfully,
test for the existence of the variable egghttp(version).
e.g.
if {![info exists egghttp(version)]} {
    putlog "egghttp.tcl was NOT successfully loaded."
}
2. Opening a connection to a web page
To open a connection to a page, we use the egghttp:geturl command. It
returns the connection descriptor, which we may or may not need later.
You need to specify which server/page you want to fetch,
and which procedure should be called once the connection has
been established and the response has been received.
e.g.
set sock [egghttp:geturl http://www.yourserver.tld/index.shtml your_callbackproc]
The procedure your_callbackproc (which you can name whatever you like)
will be called with one parameter by default: the connection descriptor
(a.k.a. socket id).
3. Callback procedure and obtaining data
As mentioned, the callback procedure will be called with the connection descriptor,
which identifies the server connection we are dealing with.
This comes in handy in case we have opened multiple connections to multiple servers.
Your callback procedure will take the form of:
proc your_callbackproc {sock} {
    # stuff here
}
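To illustrate why the descriptor matters with several requests in flight, here is a minimal sketch of one possible bookkeeping scheme. The array pending and the helpers remember_request and url_for are hypothetical names invented for this example and are not part of egghttp.tcl. The socket names are made up so the snippet runs in a plain tclsh.

```tcl
# Hypothetical bookkeeping: map each connection descriptor to the URL requested.
array set pending {}

# Record which URL a descriptor belongs to.
proc remember_request {sock url} {
    global pending
    set pending($sock) $url
}

# Look up (and forget) the URL for a descriptor; returns "" if unknown.
proc url_for {sock} {
    global pending
    if {![info exists pending($sock)]} {
        return ""
    }
    set url $pending($sock)
    unset pending($sock)
    return $url
}

# Inside eggdrop the descriptor would come from egghttp:geturl;
# here two made-up descriptors stand in for it.
remember_request sock12 "http://www.yourserver.tld/index.shtml"
remember_request sock13 "http://www.yourserver.tld/other.shtml"

set looked_up [url_for sock13]
puts $looked_up
```

In a real script, the callback would call url_for with the descriptor it receives, assuming the callback only fires after egghttp:geturl has returned.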
To obtain the HTML data from the page within your callback procedure, we
use the procedures egghttp:headers, which returns all the data
before the <HTML> tag, and egghttp:data, which returns all other data, including
the <HTML> tag. For both routines, you must specify the connection
descriptor whose data you want.
e.g.
proc your_callbackproc {sock} {
    set headers [egghttp:headers $sock]
    set body [egghttp:data $sock]
}
4. Parsing the HTML data
This is probably the most important part of the procedure, and probably the most difficult.
One thing to note is that eggdrop sockets insert line breaks after a certain number
of characters (256, if I remember correctly) and also strip out blank lines, so the data
obtained in the variable $body won't resemble the original page itself. A good strategy
to handle this is to get rid of all newlines (\n's) and insert your own where
you want them.
For example, let's say the page we are interested in looks like this:
<HTML>
<BODY>
<font>Welcome to My site</font><br>
<center>
this is some random text... blah... blah...blah.....<br>
blah...blah....<br>
<br>
We have served <b>578</b> people to date!<br>
<br>
</center>
</BODY>
</HTML>
When egghttp:data is called, any lines longer than 256 characters will be wrapped onto the next line
and blank lines will be removed. This could cause problems in
finding what we are looking for if we assume that the data looks like the original HTML source.
To handle this, we will format it ourselves, so we know how it will look.
For our example, we will get rid of all newlines (\n's) and insert our own wherever there is a <br>:
proc your_callbackproc {sock} {
    set headers [egghttp:headers $sock]
    set body [egghttp:data $sock]
    # Drop every newline, then start a new line after each <br>.
    regsub -all "\n" $body "" body
    regsub -all -nocase {<br>} $body "<br>\n" body
}
So now, we know the HTML source in $body will look something like:
<HTML><BODY><font>Welcome to My site</font><br>
<center>this is some random text... blah... blah...blah.....<br>
blah...blah....<br>
<br>
We have served <b>578</b> people to date!<br>
<br>
</center></BODY></HTML>
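The effect of the two regsub calls can be tried outside eggdrop in a plain tclsh. The sample string below is made up for this sketch: it imitates a body that was wrapped in the middle of the sentence we care about, which is an assumption about what egghttp:data might hand back.

```tcl
# Made-up sample: the line break falls inside "people".
set body "<HTML>\n<BODY>\nWe have served <b>578</b> peo\nple to date!<br>\n</BODY>\n</HTML>"

# Step 1: drop every newline the socket layer introduced.
regsub -all "\n" $body "" body

# Step 2: put a newline after each <br> so every line ends at a known place.
regsub -all -nocase {<br>} $body "<br>\n" body

set lines [split $body "\n"]
puts "number of lines: [llength $lines]"
puts "first line: [lindex $lines 0]"
```

After normalization the sentence is whole again on one line, so a later search no longer depends on where the original wrap happened.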
Now, let us say we are interested in finding out how many people have been served to date in the webpage.
One strategy is to use regexp to pull that data.
For example:
regexp {We have served <b>(.*)</b> people to date!} $body - served
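This stores "578" in the variable $served. Without going into too much detail about regexp, the (.*) in our
regular expression tells Tcl we are interested in extracting the text between "We have served <b>" and
"</b> people to date!", and storing it in a variable, which we have named "served". The "-" is a throwaway
variable that receives the whole match, which we do not need. Note that (.*) is greedy: if the closing text
appears more than once in $body, the capture runs to its last occurrence.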
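Since regexp returns 1 on a match and 0 otherwise, the result can be checked before the variable is used. The sketch below wraps this in a helper called extract_served, a hypothetical name chosen for illustration. It also swaps (.*) for ([0-9]+), on the assumption that the counter is always a plain number.

```tcl
# Hypothetical helper: returns the counter as a string, or "" if it is not found.
proc extract_served {body} {
    if {[regexp {We have served <b>([0-9]+)</b> people to date!} $body - served]} {
        return $served
    }
    return ""
}

# Made-up inputs: one page with the counter, one without.
set found   [extract_served {<center>We have served <b>578</b> people to date!<br>}]
set missing [extract_served {<center>Counter temporarily unavailable<br>}]

puts "found='$found' missing='$missing'"
```

Without such a check, a page that lacks the sentence leaves the variable unset, and reading it raises a Tcl error.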
Now, some of you may not be comfortable working with regular expressions yet, so another method is to loop through
the body line by line and use string functions.
For example:
foreach line [split $body \n] {
    if {[string match "*We have served*" $line]} {
        set start [string first "We have served <b>" $line]
        set start [expr {$start + 18}] ;# 18 is the number of characters in "We have served <b>"
        set end [string first "</b> people to date!" $line]
        set end [expr {$end - 1}] ;# We don't want the "<" from "</b>"
        set served [string range $line $start $end]
    }
}
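A variation on the string-function approach, sketched under the assumption that the surrounding text is fixed: string length computes the offset instead of the hard-coded 18, and each string first result is checked for -1 before it is used. The variable names prefix and suffix are introduced only for this example.

```tcl
# Made-up sample line, as it would look after the <br> normalization.
set line {We have served <b>578</b> people to date!<br>}

set prefix "We have served <b>"
set suffix "</b> people to date!"
set served ""

set start [string first $prefix $line]
if {$start >= 0} {
    # Skip past the prefix without counting characters by hand.
    set start [expr {$start + [string length $prefix]}]
    # Only look for the suffix after the prefix.
    set end [string first $suffix $line $start]
    if {$end >= 0} {
        set served [string range $line $start [expr {$end - 1}]]
    }
}

puts "served='$served'"
```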
Note: Because HTML code often repeats the same patterns of text, you may also need a loop when using 'regexp',
to make sure you are searching from a position where you know the text you want is located.
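One way to act on this note, shown as a sketch on a made-up string: find a piece of text that is unique on the page, then start the regexp search from that position with -start. The example also shows what the greedy (.*) captures when <b> tags repeat, and how -all -inline can collect every match.

```tcl
# Made-up body with two <b>...</b> pairs on the same line.
set body {Hits today: <b>12</b>. We have served <b>578</b> people to date!<br>}

# Greedy (.*) runs from the first <b> to the LAST </b>.
regexp {<b>(.*)</b>} $body - greedy

# Jump to a known landmark first, then search only from there onward.
set served ""
set pos [string first "We have served" $body]
if {$pos >= 0} {
    regexp -start $pos {<b>([^<]*)</b>} $body - served
}

# Collect every bold value; -all -inline returns match/capture pairs.
set all {}
foreach {whole inner} [regexp -all -inline {<b>([^<]*)</b>} $body] {
    lappend all $inner
}

puts "greedy='$greedy'"
puts "served='$served'"
puts "all='$all'"
```

The pattern ([^<]*) stops at the next "<", which keeps each capture inside a single tag pair.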
5. Final product
Putting together all of what we have discussed in this tutorial, we end up with something like this:
--------------------------------
# egghttp_example.tcl
# Config
set url "http://www.yourserver.tld/index.shtml"
set dcctrigger "example"
# End of config
if {![info exists egghttp(version)]} {
    putlog "egghttp.tcl was NOT successfully loaded."
    putlog "egghttp_example.tcl has not been loaded as a result."
} else {
    # Called by egghttp with the connection descriptor once the page has been fetched.
    proc your_callbackproc {sock} {
        global url
        set headers [egghttp:headers $sock]
        set body [egghttp:data $sock]
        # Drop every newline, then start a new line after each <br>.
        regsub -all "\n" $body "" body
        regsub -all -nocase {<br>} $body "<br>\n" body
        # regexp returns 0 when nothing matches; served is then never set.
        if {[regexp {We have served <b>(.*)</b> people to date!} $body - served]} {
            putlog "Website '$url' has served $served people so far."
        } else {
            putlog "Could not find the visitor count on '$url'."
        }
    }
    bind dcc o|o $dcctrigger our:dcctrigger
    # DCC command handler: starts the request; the callback logs the result.
    proc our:dcctrigger {hand idx text} {
        global url
        set sock [egghttp:geturl $url your_callbackproc]
        return 1
    }
    putlog "egghttp_example.tcl has been successfully loaded."
}
------------------------------------