Friday, February 29, 2008

Defend Your ASP.NET Web Sites against Evil Bots

Robots are taking control of the Internet! Don’t let them overwhelm your Web site with their unrelenting, self-serving probes. Now you can fight back with this free control that allows you to discriminate between human and computer visitors.
While this might sound like a sci-fi promotion for the next Terminator or Transformers movie, in a way, that ominous sci-fi future is already here. But don’t be too afraid — just like in the movies, there are robots here to help us, too.


Find more about Defending Your ASP.NET Web Sites against Evil Bots at
http://steveorr.net/articles/CAPTCHASP.aspx

Thursday, February 28, 2008

How to use JavaScript with ASP.NET 2.0

ASP.NET 2.0 has made quite a few enhancements over ASP.NET 1.x in terms of handling common client-side tasks. It has also introduced new classes, properties, and methods for working with JavaScript code. This article explores the enhancements and the various ways of injecting JavaScript programmatically into ASP.NET 2.0 pages.

More info can be found at:
http://dotnetslackers.com/articles/aspnet/JavaScript_with_ASP_NET_2_0_Pages_Part1.aspx

Unlocker



Ever had Windows give you an annoying message like this?


It has many other flavors:
>> Cannot delete file: Access is denied.
>> There has been a sharing violation.
>> The source or destination file may be in use.
>> The file is in use by another program or user.
>> Make sure the disk is not full or write-protected and that the file is not currently in use.

Unlocker is the solution!
For more information, refer to the following URL:
http://ccollomb.free.fr/unlocker/

Top 10 Largest Databases in the World


We all collected things as children. Rocks, baseball cards, Barbies, perhaps even bugs -- we all tried to gather up as much stuff as possible to compile the biggest most interesting collection possible. Some of you may have even been able to amass a collection of items numbering into the hundreds (or thousands).
As the story always goes, we got older, our collections got smaller, and eventually our interests died out...until now.
There are currently organizations around the world in the business of amassing collections of things, and their collections number into and above the trillions. In many cases these collections, or databases, consist of items we use every day.
In this list, we cover the top 10 largest databases in the world:

10. Library of Congress
Not even the digital age can prevent the world's largest library from ending up on this list. The
Library of Congress (LC) boasts more than 130 million items ranging from cook books to colonial newspapers to U.S. government proceedings. It is estimated that the text portion of the Library of Congress would comprise 20 terabytes of data. The LC expands at a rate of 10,000 items per day and takes up close to 530 miles of shelf space -- talk about a lengthy search for a book.
If you're researching a topic and cannot find the right information on the internet, the Library of Congress should be your destination of choice. For users researching U.S. history, around 5 million pieces from the LC's collection can be found online at
American Memory.
Unfortunately for us, the Library of Congress has no plans of digitizing the entirety of its contents and limits the people who can check out materials to Supreme Court Justices, members of Congress, their respective staff, and a select few other government officials; however, anyone with a valid Reader Identification Card (the LC's library card) can access the collection.
By the Numbers
130 million items (books, photographs, maps, etc)
29 million books
10,000 new items added each day
530 miles of shelves
5 million digital documents
20 terabytes of text data

9. Central Intelligence Agency
The Central Intelligence Agency (CIA) is in the business of collecting and distributing information on people, places and things, so it should come as no surprise that they end up on this list. Although little is known about the overall size of the CIA's database, it is certain that the agency has amassed a great deal of information on both the public and private sectors via field work and digital intrusions.
Portions of the CIA database available to the public include the
Freedom of Information Act (FOIA) Electronic Reading Room, The World Fact Book, and various other intelligence related publications. The FOIA library includes hundreds of thousands of official (and occasionally ultra-sensitive) U.S. government documents made available to the public electronically. The library grows at a rate of 100 articles per month and contains topics ranging from nuclear development in Pakistan to the type of beer available during the Korean War. The World Fact Book boasts general information on every country and territory in the world including maps, population numbers, military capabilities and more.
By the Numbers
100 FOIA items added each month
Comprehensive statistics on more than 250 countries and entities
Unknown amount of classified information

8. Amazon
Amazon, the world's biggest retail store, maintains extensive records on its 59 million active customers including general personal information (phone number, address, etc.), receipts, wish lists, and virtually any sort of data the website can extract from its users while they are logged on. Amazon also keeps more than 250,000 full-text books available online and allows users to comment and interact on virtually every page of the website, making Amazon one of the world's largest online communities.
This data coupled with millions of items in inventory Amazon sells each year -- and the millions of items in inventory Amazon associates sell -- makes for one very large database. Amazon's two largest databases combine for more than
42 terabytes of data, and that's only the beginning of things. If Amazon published the total number of databases they maintain and volume of data each database contained, the amount of data we know Amazon houses would increase substantially.
But still, you say, 42 terabytes doesn't sound like so much? In relative terms, 42 terabytes of data would convert to
37 trillion forum posts.
By the Numbers
59 million active customers
More than 42 terabytes of data

7. YouTube
After less than two years of operation
YouTube has amassed the largest video library (and subsequently one of the largest databases) in the world. YouTube currently boasts a user base that watches more than 100 million clips per day accounting for more than 60% of all videos watched online.
In August of 2006, the Wall Street Journal projected YouTube's database to the sound of
45 terabytes of videos. While that figure doesn't sound terribly high relative to the amount of data available on the internet, YouTube has been experiencing a period of substantial growth (more than 65,000 new videos per day) since that figure's publication, meaning that YouTube's database size has potentially more than doubled in the last five months.
Estimating the size of YouTube's database is particularly difficult due to the varying sizes and lengths of each video. However, if one were truly ambitious (and a bit forgiving), one could project that the YouTube database will grow by as much as 20 terabytes of data in the next month.
Given: 65,000 videos per day × 30 days per month = 1,950,000 videos per month; 1 terabyte = 1,048,576 megabytes. If we assume each video averages 1 MB in size, YouTube would be expected to grow by 1.86 terabytes next month; similarly, at 10 MB per video, it would grow by 18.6 terabytes.
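The projection above is easy to check with a few lines of arithmetic. Here is a sketch using the article's own figures; the 1 MB and 10 MB per-video sizes are the stated assumptions, not measured values.

```python
# Back-of-the-envelope projection of YouTube's monthly database growth,
# using the figures quoted above (65,000 new videos/day; 1 TB = 1,048,576 MB).
# The per-video sizes passed in (1 MB, 10 MB) are assumptions.

VIDEOS_PER_DAY = 65_000
DAYS_PER_MONTH = 30
MB_PER_TB = 1_048_576  # binary terabyte

videos_per_month = VIDEOS_PER_DAY * DAYS_PER_MONTH  # 1,950,000

def monthly_growth_tb(avg_video_mb):
    """Projected growth in terabytes for a given average video size."""
    return videos_per_month * avg_video_mb / MB_PER_TB

print(round(monthly_growth_tb(1), 2))   # ~1.86 TB at 1 MB per video
print(round(monthly_growth_tb(10), 2))  # ~18.6 TB at 10 MB per video
```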
By the Numbers
100 million videos watched per day
65,000 videos added each day
60% of all videos watched online
At least 45 terabytes of videos

6. ChoicePoint
Imagine having to search through a phone book containing
a billion pages for a phone number. When the employees at ChoicePoint want to know something about you, they have to do just that. If printed out, the ChoicePoint database would extend to the moon and back 77 times.
ChoicePoint is in the business of acquiring information about the American population -- addresses and phone numbers, driving records, criminal histories, and so on; ChoicePoint has it all. For the most part, the data found in ChoicePoint's database is sold to the highest bidders, including the American government.
But how much does ChoicePoint really know? In 2002 ChoicePoint was able to help authorities
solve a serial rapist case in Philadelphia and Fort Collins after producing a list of 6 potential suspects by data mining their DNA and personal records databases. In 2001, ChoicePoint helped identify the remains of World Trade Center victims by matching DNA found in bone fragments against information provided by victims' family members in conjunction with data found in its databases.
By the Numbers
250 terabytes of personal data
Information on 250 million people

5. Sprint
Sprint is one of the world's largest telecommunication companies as it offers mobile services to more than 53 million subscribers, and prior to being sold in May of 2006, offered local and long distance land line packages.
Large telecommunication companies like Sprint are notorious for having immense databases to keep track of all of the calls taking place on their network. Sprint's database processes more than
365 million call detail records and operational measurements per day. The Sprint database is spread across 2.85 trillion database rows making it the database with the largest number of rows (data insertions if you will) in the world. At its peak, the database is subjected to more than 70,000 call detail record insertions per second.
By the Numbers
2.85 trillion database rows.
365 million call detail records processed per day
At peak, 70,000 call detail record insertions per second
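A rough cross-check of the Sprint numbers above (a back-of-the-envelope sketch: the input figures are the article's, the derived numbers are our own, and the "years of history" estimate assumes every row is a daily-rate call record, which is a simplification):

```python
# Sanity-checking the Sprint figures: average insert rate implied by the
# daily record count, and how long the stated row count would take to
# accumulate at today's rate.
RECORDS_PER_DAY = 365_000_000
TOTAL_ROWS = 2_850_000_000_000
PEAK_PER_SECOND = 70_000

avg_per_second = RECORDS_PER_DAY / 86_400       # 86,400 seconds in a day
days_to_accumulate = TOTAL_ROWS / RECORDS_PER_DAY

print(round(avg_per_second))      # ~4,225 records/second on average
print(round(days_to_accumulate))  # ~7,808 days, i.e. roughly 21 years
```

So the quoted 70,000 insertions per second at peak is more than sixteen times the average rate, which is plausible for a telecom workload with strong daily peaks.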

4. Google
Although there is not much known about the true size of
Google's database (Google keeps their information locked away in a vault that would put Fort Knox to shame), there is much known about the amount of and types of information Google collects.
On average, Google is subjected to
91 million searches per day, which accounts for close to 50% of all internet search activity. Google stores each and every search a user makes in its databases. After a year's worth of searches, this figure amounts to more than 33 billion database entries. Depending on the architecture of Google's databases, this figure could comprise hundreds of terabytes of information.
Google is also in the business of collecting information on its users. Google combines the queries users search for with information provided by the Google cookies stored on a user's computer to create virtual profiles.
To top it off, Google is currently experiencing record
expansion rates by assimilating into various realms of the internet including digital media (Google Video, YouTube), advertising (Google Ads), email (GMail), and more. Essentially, the more Google expands, the more information their databases will be subjected to.
In terms of internet databases, Google is king.
By the Numbers
91 million searches per day
accounts for 50% of all internet searches
Virtual profiles of countless number of users

3. AT&T
Similar to Sprint, the United States'
oldest telecommunications company AT&T maintains one of the world's largest databases. Architecturally speaking, the largest AT&T database is the cream of the crop as it boasts titles including the largest volume of data in one unique database (312 terabytes) and the second largest number of rows in a unique database (1.9 trillion), which comprises AT&T's extensive calling records.
The
1.9 trillion calling records include data on the number called, the time and duration of the call, and various other billing categories. AT&T is so meticulous with their records that they've maintained calling data from decades ago -- long before the technology to store hundreds of terabytes of data ever became available. Chances are, if you're reading this and have ever made a call via AT&T, the company still has all of your call's information.
By the Numbers
312 terabytes of information
1.9 trillion phone call records

2. National Energy Research Scientific Computing Center
The second largest database in the world belongs to the
National Energy Research Scientific Computing Center (NERSC) in Oakland, California. NERSC is owned and operated by the Lawrence Berkeley National Laboratory and the U.S. Department of Energy. The database is privy to a host of information including atomic energy research, high-energy physics experiments, simulations of the early universe, and more. Perhaps our best bet at traveling back in time is to fire up NERSC's supercomputers and observe the big bang.
The NERSC database encompasses
2.8 petabytes of information and is operated by more than 2,000 computational scientists. To put the size of NERSC into perspective, the total of all words ever spoken in the history of humanity is estimated at 5 exabytes; in relative terms, the NERSC database is equivalent to about 0.055% of that figure.
Although that may not seem like a lot at first glance, when you factor in that 6 billion humans around the globe speak more than
2,000 words a day, the sheer magnitude of that number becomes apparent.
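The 0.055% comparison checks out with a quick calculation (assuming, as the figure suggests, binary units where 1 exabyte = 1,024 petabytes):

```python
# Sanity check of the comparison above: 2.8 petabytes of NERSC data
# relative to the estimated 5 exabytes of all words ever spoken.
NERSC_PB = 2.8
SPOKEN_WORDS_PB = 5 * 1024  # 5 exabytes in (binary) petabytes

ratio_percent = NERSC_PB / SPOKEN_WORDS_PB * 100
print(f"{ratio_percent:.3f}%")  # 0.055%
```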
By the Numbers
2.8 petabytes of data
Operated by 2,000 computational scientists

1. World Data Centre for Climate
If you had a 35 million euro super computer lying around what would you use it for? The stock market? Building your own internet? Try extensive climate research -- if there's a machine out there that has the answer for global warming, this one might be it. Operated by the
Max Planck Institute for Meteorology and German Climate Computing Centre, The World Data Centre for Climate (WDCC) is the largest database in the world.
The WDCC boasts
220 terabytes of data readily accessible on the web, including information on climate research and anticipated climatic trends, as well as 110 terabytes (or 24,500 DVDs) worth of climate simulation data. To top it off, six petabytes worth of additional information are stored on magnetic tapes for easy access. How much data is six petabytes, you ask? Try three times the amount of all the U.S. academic research libraries' contents combined.
By the Numbers
220 terabytes of web data
6 petabytes of additional data


* Additional Databases

The following databases were unique (and massive) in their own right, and just fell short of the cut on our top 10 list.

Nielsen Media Research / Nielsen Net Ratings
Best known for its television audience size and composition rating abilities, the U.S. firm
Nielsen Media Research is in the business of measuring mass-media audiences including television, radio, print media, and the internet. The database required to process such statistics as Google's daily internet searches is nothing short of massive.

Myspace
It would seem appropriate that the world's largest social networking site,
Myspace, has a rather large database to keep up with all of its users' content.
United States Customs
The
U.S. Customs database is unique in that it requires information on hundreds of thousands of people and objects entering and leaving United States borders. For this to be possible, the database was specially programmed to process queries nearly instantaneously.

HPSS
There are various databases around the world using technology similar to that found in NERSC, our countdown's second-largest database. The technology is known as
High Performance Storage System, or HPSS. Other massive HPSS installations can be found at Lawrence Livermore National Laboratory, Sandia National Laboratories, Los Alamos National Laboratory, the Commissariat a l'Energie Atomique (Direction des Applications Militaires), and more.

C# Generics

A good article which shows the beauty of generics:
http://msdn2.microsoft.com/en-us/library/ms379564(vs.80).aspx

Google's famous MapReduce algorithm

MapReduce is a distributed programming model intended for processing massive amounts of data in large clusters. Find more about this at:

http://www.theserverside.com/tt/knowledgecenter-tc/knowledgecenter-tc.tss?l=MapReduce
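The model is easiest to grasp through the canonical word-count example, sketched here in Python on a single machine. Real MapReduce distributes these same three phases (map, shuffle, reduce) across a cluster; this is only an illustration of the programming model, not of the distributed runtime.

```python
from collections import defaultdict
from functools import reduce

# Minimal single-process sketch of MapReduce: map emits (key, value)
# pairs, shuffle groups them by key, reduce folds each group to a result.

def map_phase(document):
    """Emit (word, 1) for every word in the document."""
    return [(word, 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group intermediate pairs by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Fold the values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(intermediate).items())
print(counts["the"])  # 3
print(counts["fox"])  # 2
```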

Performance Comparison between SQL server 2005 32bit and 64bit versions

Quite interesting findings:

http://sqlblog.com/blogs/linchi_shea/archive/2007/01/02/32-bit-vs-x64.aspx

Dynamic Stored Procedures

Can you change the behavior of your stored procedure on the fly? You can: just build the SQL as a string and execute it with EXEC.
http://www.4guysfromrolla.com/webtech/020600-1.shtml

If you would like the query plan to be cached and reused, sp_executesql can be used instead:
http://www.4guysfromrolla.com/webtech/sqlguru/q120899-2.shtml
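The same "build SQL as a string, then execute it" idea can be sketched outside T-SQL with Python's sqlite3 module; the orders table and its columns below are invented for illustration. The trade-off carries over from the two articles above: concatenating values into the string (like raw EXEC) re-plans every time and invites SQL injection, while a parameterized query (like sp_executesql) is safer and lets plans be reused.

```python
import sqlite3

# Dynamic SQL sketch: only the *structure* (the filter column) is built
# dynamically, and it is checked against a whitelist; the *value* is
# passed as a bound parameter, never concatenated into the string.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "open"), (2, "closed"), (3, "open")])

filter_column = "status"                 # decided at run time
assert filter_column in {"id", "status"}  # whitelist the dynamic part

sql = f"SELECT COUNT(*) FROM orders WHERE {filter_column} = ?"
count = conn.execute(sql, ("open",)).fetchone()[0]
print(count)  # 2
```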

Top 10 Icon Design Mistakes

Top 10 Icon Design Mistakes -- a must-read for BA/UI folks.

http://turbomilk.com/truestories/cookbook/criticism/10-mistakes-in-icon-design/


Running IE6 and IE7 on the same computer

For browser compatibility testing, you might need to run both versions of IE on the same computer.
But ordinarily this is not possible, since installing IE 7 overwrites IE 6.
There’s a solution provided by the following site to overcome this problem.
http://tredosoft.com/Multiple_IE

They provide IE 3, 4, 5.5 & 6.
For IE 7, you can use the conventional installer available at
http://download.microsoft.com/download/3/8/8/38889DC1-848C-4BF2-8335-86C573AD86D9/IE7-WindowsXP-x86-enu.exe

Semantic Web Concept

What is semantic web?
http://www.w3.org/2001/sw/

Example (check out this different dimension of presenting data):
http://www.silobreaker.com/

Understanding the difference between “IS NULL” and “= NULL”

When a variable is created in SQL with the DECLARE statement, it is created with no data, and an entry for it is stored in the variable table (vtable) inside SQL Server's memory space. The vtable contains the name and memory address of the variable. However, when the variable is created, no memory address is allocated to it, and thus the variable is not defined in terms of memory.

When you SET the variable it is allotted a memory address and the initial data is stored in that address. When you SET the value again the data in the memory address pointed to by the variable is then changed to the new value.

Now for the difference and why each behaves the way it does.


“= NULL”

“= NULL” is an expression of value. Meaning, if the variable has been set and memory created for the storage of data, it has a value. A variable can in fact be set to NULL, which means the data value of the object is unknown. If the value has been set like so:


DECLARE @val CHAR(4)
SET @val = NULL

You have explicitly set the value of the data to unknown and so when you do:

If @val = NULL
It will evaluate as a true expression.


But if I do:

DECLARE @val CHAR(4)
If @val = NULL
It will evaluate to false.


The reason for this is the fact that I am checking for NULL as the value of @val. Since I have not SET the value of @val no memory address has been assigned and therefore no value exists for @val.

Note: See the section on SET ANSI_NULLS (ON|OFF) below, due to differences in the SQL Server 7 and 2000 defaults that can cause these examples not to work. This is based on SQL Server 7.



“IS NULL”

Now “IS NULL” is a little trickier and is the preferred method for evaluating the condition of a variable being NULL. When you use the “IS NULL” clause, it checks both the address of the variable and the data within the variable as being unknown. So if I for example do:

DECLARE @val CHAR(4)
If @val IS NULL
PRINT ‘TRUE’
ELSE
PRINT ‘FALSE’

SET @val = NULL
If @val IS NULL
PRINT ‘TRUE’
ELSE
PRINT ‘FALSE’


Both outputs will be TRUE. The reason: in the first test, @val has only been declared and no address space for data has been set, which “IS NULL” checks for; in the second, the value has been explicitly set to NULL, which “IS NULL” also checks.


SET ANSI_NULLS (ON|OFF)

Now let me throw a kink in the works. In the previous examples you saw that = NULL will work as long as the value is explicitly set. However, when you SET ANSI_NULLS ON, things will behave a little differently.

Ex.
DECLARE @val CHAR(4)
SET @val = NULL
SET ANSI_NULLS ON
If @val = NULL
PRINT ‘TRUE’
ELSE
PRINT ‘FALSE’

SET ANSI_NULLS OFF
If @val = NULL
PRINT ‘TRUE’
ELSE
PRINT ‘FALSE’

You will note that the first time you run the = NULL comparison, after SET ANSI_NULLS ON, you get FALSE, and after setting it OFF you get TRUE. The reason is as follows.


Excerpt from SQL BOL article “SET ANSI_NULLS”

The SQL-92 standard requires that an equals (=) or not equal to (<>) comparison against a null value evaluates to FALSE. When SET ANSI_NULLS is ON, a SELECT statement using WHERE column_name = NULL returns zero rows even if there are null values in column_name. A SELECT statement using WHERE column_name <> NULL returns zero rows even if there are nonnull values in column_name.
When SET ANSI_NULLS is OFF, the Equals (=) and Not Equal To (<>) comparison operators do not follow the SQL-92 standard. A SELECT statement using WHERE column_name = NULL returns the rows with null values in column_name. A SELECT statement using WHERE column_name <> NULL returns the rows with nonnull values in the column. In addition, a SELECT statement using WHERE column_name <> XYZ_value returns all rows that are not XYZ value and that are not NULL.
End Excerpt

So, as defined by SQL-92, “= NULL” should always evaluate to FALSE. Even setting the value explicitly means you will never satisfy the = NULL condition, and your code may not work as intended. The biggest way = NULL will shoot you in the foot is this: SQL Server 7, as shipped and installed, defaults to ANSI_NULLS OFF, but SQL Server 2000 defaults to ANSI_NULLS ON. Of course you can alter this in several ways, but if you upgraded a database from 7 to 2000 and relied on = NULL working when the value was set explicitly, then when you roll out a default 2000 server your code breaks and can cause data issues.

This is yet another reason to use IS NULL instead: under SQL-92 guidelines it will still evaluate to TRUE, and thus your code is safer across server upgrades.
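The SQL-92 rule quoted above is easy to demonstrate in any engine that follows the standard. Here is a sketch using Python's sqlite3 module; SQLite has no ANSI_NULLS switch, so it always behaves like SET ANSI_NULLS ON.

```python
import sqlite3

# "= NULL" never matches under SQL-92 semantics (the comparison is
# UNKNOWN), while "IS NULL" matches rows whose value really is NULL.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("a",), (None,), (None,)])

eq_null = conn.execute("SELECT COUNT(*) FROM t WHERE val = NULL").fetchone()[0]
is_null = conn.execute("SELECT COUNT(*) FROM t WHERE val IS NULL").fetchone()[0]
print(eq_null)  # 0 -- no rows qualify, even though two vals are NULL
print(is_null)  # 2
```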




Summary

In summary: unless you specifically need to test that a variable was explicitly set to NULL and you are running with SET ANSI_NULLS OFF, always use the “IS NULL” clause to check whether a variable is NULL. Using = NULL instead can cause you a lot of headaches in troubleshooting issues that may arise from it, now or unexpectedly in the future.

Find tons of mp3s in google

Here’s a little search query that allows you to easily find tons of mp3s in Google:
-inurl:(php|htm|html|asp) +"index of" +(mp3|ogg|wma) +mozart

Obviously you can replace “mozart” with anything you want.

"*=" vs OUTER JOIN in T-SQL

I recently ran into some problems with a query in T-SQL. When I moved the LEFT JOIN down into the WHERE clause as *=, it worked as it should.

select col1, col2, col3
from table1 LEFT JOIN table2
ON tablecol1 = tablecol2
where
table1_key = table2_key
and table1_category = 1
and table2_category = 2

vs

select col1, col2, col3
from table1, table2
where
table1_key = table2_key
and table1_category = 1
and table2_category = 2
and table1_key *= table2_key

Table1 contains thousands of activities, and additional activity info is located in table2 (sometimes not present), so I want to print NULL values when there is no additional activity info. With the first example I get only 172 rows, but 1000+ rows when executing the second one. Therefore I looked for the differences between those two examples and found the following interesting explanation.

Here is how OUTER JOINs work in SQL-92. Assume you are given:

Table1 Table2
a b a c
==== ======
1 w 1 r
2 x 2 s
3 y 3 t
4 z

and the outer join expression :

Table1 LEFT OUTER JOIN Table2
ON Table1.a = Table2.a <== join condition
AND Table2.c = 't'; <== single table condition

We call Table1 the "preserved table" and Table2 the "unpreserved table" in the query. What I am going to give you is a little different, but equivalent to the ANSI/ISO standards.

1) We build the CROSS JOIN of the two tables. Scan each row in the result set.
2) If the predicate tests TRUE for that row, then you keep it. You also remove all rows derived from it from the CROSS JOIN.
3) If the predicate tests FALSE or UNKNOWN for that row, then keep the columns from the preserved table, convert all the columns from the unpreserved table to NULLs, and remove the duplicates.

So let us execute this by hand:
Let @ = passed the first predicate

Let * = passed the second predicate

Table1 CROSS JOIN Table2
a b a c
=========================
1 w 1 r @
1 w 2 s
1 w 3 t *
2 x 1 r
2 x 2 s @
2 x 3 t *
3 y 1 r
3 y 2 s
3 y 3 t @* <== the TRUE set
4 z 1 r
4 z 2 s
4 z 3 t *


Table1 LEFT OUTER JOIN Table2
a b a c
=========================
3 y 3 t <= only TRUE row
-----------------------
1 w NULL NULL Sets of duplicates
1 w NULL NULL
1 w NULL NULL
-----------------------
2 x NULL NULL
2 x NULL NULL
2 x NULL NULL
3 y NULL NULL <== derived from the TRUE set - Remove
3 y NULL NULL
-----------------------
4 z NULL NULL
4 z NULL NULL
4 z NULL NULL


the final results:
Table1 LEFT OUTER JOIN Table2

a b a c
=========================
1 w NULL NULL
2 x NULL NULL
3 y 3 t
4 z NULL NULL


The basic rule is that every row in the preserved table is represented in the results in at least one result row.
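The hand-derived result above can be confirmed by running the same two tables through a real engine. Here is a sketch using Python's sqlite3 module:

```python
import sqlite3

# The Table1/Table2 example worked by hand above: a LEFT OUTER JOIN with
# both the join condition and the single-table condition in the ON clause.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (a INTEGER, b TEXT)")
conn.execute("CREATE TABLE Table2 (a INTEGER, c TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(1, "w"), (2, "x"), (3, "y"), (4, "z")])
conn.executemany("INSERT INTO Table2 VALUES (?, ?)",
                 [(1, "r"), (2, "s"), (3, "t")])

rows = conn.execute("""
    SELECT Table1.a, Table1.b, Table2.a, Table2.c
    FROM Table1 LEFT OUTER JOIN Table2
      ON Table1.a = Table2.a AND Table2.c = 't'
    ORDER BY Table1.a
""").fetchall()
for row in rows:
    print(row)
# (1, 'w', None, None)
# (2, 'x', None, None)
# (3, 'y', 3, 't')
# (4, 'z', None, None)
```

Every preserved-table row appears exactly once, and only (3, y) found a match, just as the manual derivation showed.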

There are limitations and very serious problems with the extended equality version of an outer join used in some diseased mutant products. Consider the two Chris Date tables

Suppliers SupParts
supno supno partno qty
========= ==============
S1 S1 P1 100
S2 S1 P2 250
S3 S2 P1 100
S2 P2 250

and let's do an extended equality outer join like this:
SELECT * FROM Supplier, SupParts
WHERE Supplier.supno *= SupParts.supno
AND qty < 200

If I do the outer first, I get:
Suppliers LOJ SupParts
supno supno partno qty
=======================
S1 S1 P1 100
S1 S1 P2 250
S2 S2 P1 100
S2 S2 P2 250
S3 NULL NULL NULL

Then I apply the (qty < 200) predicate and get
Suppliers LOJ SupParts
supno supno partno qty
===================
S1 S1 P1 100
S2 S2 P1 100

Doing it in the opposite order
Suppliers LOJ SupParts
supno supno partno qty
===================
S1 S1 P1 100
S2 S2 P1 100
S3 NULL NULL NULL

Sybase does it one way, Oracle does it the other and Centura (nee Gupta) lets you pick which one -- the worst of both non-standard worlds! In SQL-92, you have a choice and can force the order of execution. Either do the predicates after the join ...

SELECT *
FROM Supplier
LEFT OUTER JOIN SupParts
ON Supplier.supno = SupParts.supno
WHERE qty < 200

... or do it in the joining:
SELECT *
FROM Supplier
LEFT OUTER JOIN SupParts
ON Supplier.supno = SupParts.supno
AND qty < 200
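The two SQL-92 variants can be run side by side. Here is a sketch of the Suppliers/SupParts example using Python's sqlite3 module, with the (qty < 200) predicate applied once in the WHERE clause and once in the ON clause:

```python
import sqlite3

# Predicate placement changes a LEFT OUTER JOIN's result: a filter in
# the ON clause is applied during the join (unmatched preserved rows
# survive as NULL-extended rows), while a filter in the WHERE clause
# runs after the join and discards them.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Suppliers (supno TEXT)")
conn.execute("CREATE TABLE SupParts (supno TEXT, partno TEXT, qty INTEGER)")
conn.executemany("INSERT INTO Suppliers VALUES (?)", [("S1",), ("S2",), ("S3",)])
conn.executemany("INSERT INTO SupParts VALUES (?, ?, ?)",
                 [("S1", "P1", 100), ("S1", "P2", 250),
                  ("S2", "P1", 100), ("S2", "P2", 250)])

in_where = conn.execute("""
    SELECT Suppliers.supno FROM Suppliers
    LEFT OUTER JOIN SupParts ON Suppliers.supno = SupParts.supno
    WHERE qty < 200
""").fetchall()
in_on = conn.execute("""
    SELECT Suppliers.supno FROM Suppliers
    LEFT OUTER JOIN SupParts
      ON Suppliers.supno = SupParts.supno AND qty < 200
""").fetchall()
print(sorted(in_where))  # [('S1',), ('S2',)] -- S3's NULL row filtered out
print(sorted(in_on))     # [('S1',), ('S2',), ('S3',)] -- S3 preserved
```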

Another problem is that you cannot show the same table as preserved and unpreserved in the extended equality version, but it is easy in SQL-92. For example, to find the students who have taken Math 101 and might have taken Math 102:

SELECT C1.student, C1.math, C2.math
FROM (SELECT * FROM Courses WHERE math = 101) AS C1
LEFT OUTER JOIN
(SELECT * FROM Courses WHERE math = 102) AS C2
ON C1.student = C2.student;