There are many techniques, both voluntary and involuntary on the user's part, for identifying web surfers. Every request (a click or keystroke) carries across the web not only the information relevant to that particular request but also some highly pertinent data in the information packet. The receiving server, and every server the packet passes through on its journey that is able to decipher it, can record this data in a database.
Considering only the ways of collecting data that do not rely on users entering it themselves, the information packets described above make it possible to build a database profile of each visitor containing the site they came from, their entire journey on the site in question (content, clicks), the browser and operating system they are using, the language(s) they prefer, their geographical location and Internet Service Provider (ISP) (based on IP address), and the search terms they used to arrive at the site.
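As a rough illustration of how such a profile could be assembled from a single request, the following Python sketch reads a few standard HTTP request headers (Referer, User-Agent, Accept-Language) together with the connecting IP address. The function name build_profile, the sample header values and the assumption that the search terms arrive in a "q" query parameter are all hypothetical and chosen only for the example; geographical and ISP lookups, which need an external IP database, are left out.

```python
from urllib.parse import urlparse, parse_qs


def build_profile(headers, remote_addr):
    """Collect the passively available details of one HTTP request.

    `headers` is a plain dict of request headers and `remote_addr` is the
    IP address the request came from (both supplied by the web server).
    """
    referer = headers.get("Referer", "")
    referer_parts = urlparse(referer)

    # Assumption: the referring search engine puts the terms in a "q" parameter.
    query = parse_qs(referer_parts.query)
    search_terms = query.get("q", [""])[0]

    # Accept-Language looks like "en-GB,en;q=0.8"; keep the tags, drop the weights.
    languages = [
        part.split(";")[0].strip()
        for part in headers.get("Accept-Language", "").split(",")
        if part.strip()
    ]

    return {
        "ip_address": remote_addr,
        "referring_site": referer_parts.netloc,
        "search_terms": search_terms,
        # The raw string names the browser and operating system.
        "user_agent": headers.get("User-Agent", ""),
        "languages": languages,
    }


# Hypothetical request, using reserved example addresses and domains.
sample_headers = {
    "Referer": "http://search.example/results?q=database+privacy",
    "User-Agent": "ExampleBrowser/1.0 (ExampleOS)",
    "Accept-Language": "en-GB,en;q=0.8",
}
profile = build_profile(sample_headers, "192.0.2.10")
```

None of these values is typed in by the visitor; the browser sends them with every request, which is why this kind of collection goes unnoticed.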
With analysis software, any site that has collected such data can attempt to identify its visitors by IP address, or attach the gathered data to a user account that the visitor has explicitly logged in to.
Where an accurate means of identification exists (login to an account, a long-lasting and undeleted cookie, identification by undetected malicious software, etc.), it is fairly easy for a well-designed database to have entity integrity, which by definition means that no duplicate records exist and that the primary key is never null. Database software systems (such as MS SQL Server) can be configured to enforce that records are not duplicated; however, this is only as good as the data collected. When relying solely upon gathered information, the primary key (or a secondary key) is likely to be the IP address, as this is the only candidate for a unique key, yet it can change depending on how and where the visitor is surfing from. Multiple visitors may use the same IP address (if allocated dynamically by their ISP), and a visitor may use different machines, so that cookies and any other software method of identification are lost. This would result in duplicate records for the same person in the database, and the duplication could not be detected.
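The limitation described above can be sketched in a few lines of Python. The example uses the standard sqlite3 module with an in-memory database rather than MS SQL Server, purely so that it is self-contained; the table name visitor, its columns and the helper record_visit are hypothetical. The primary key constraint guarantees entity integrity for the rows themselves, but it cannot tell whether two addresses belong to one person or one address to two people.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The IP address is the primary key: it must be unique and must not be null.
# (NOT NULL is stated explicitly because SQLite does not imply it for a
# non-integer primary key.)
conn.execute(
    "CREATE TABLE visitor ("
    " ip_address TEXT NOT NULL PRIMARY KEY,"
    " browser TEXT)"
)


def record_visit(ip_address, browser):
    """Try to insert a visitor; return False if the database rejects the row."""
    try:
        conn.execute(
            "INSERT INTO visitor (ip_address, browser) VALUES (?, ?)",
            (ip_address, browser),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first = record_visit("192.0.2.10", "BrowserA")

# A different person who has been given the same address is rejected as a duplicate.
same_ip = record_visit("192.0.2.10", "BrowserB")

# A record without a key is rejected as well.
null_key = record_visit(None, "BrowserA")

# The first visitor returning from a new address is accepted as a new person.
new_ip = record_visit("198.51.100.7", "BrowserA")

rows = conn.execute("SELECT COUNT(*) FROM visitor").fetchone()[0]
```

The table ends up with two rows that satisfy the constraint yet may describe a single visitor, while a genuinely different visitor was turned away, which is the sense in which enforcement is only as good as the data collected.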
In conclusion, I agree somewhat with the CEO of Sun that there is no such thing as privacy on the web and, in my opinion, using the web carries an implied term of use: if you are concerned about privacy, don’t use it.
Coronel, Morris & Rob (2009) Database Systems: Design, Implementation, and Management (9th Edition). Cengage Learning.
Rosella (2005) Web Mining: Web search and Web navigation Pattern Analyzer [Online]. Available at http://www.roselladb.com/surf-pattern-analyzer.htm (Accessed 11 April 2010).
Webcredible (2005) Why track your visitor’s behaviour? [Online]. Available at http://www.webcredible.co.uk/user-friendly-resources/web-usability/track-visitors.shtml (Accessed 11 April 2010).