How to Defensibly Collect Web Pages and Social Media Posts When Risk of Spoliation is High or it is Infeasible to Collect from the Web Host
By Paul Easton and Tom Klaff
We’ve come a long way from the dark ages of electronic discovery, when courts regarded Web-based evidence as “voodoo information”, “adequate for almost nothing.”[i] Today’s courts tend to treat Web-based documents the same as any other electronic file,[ii] and documents originating on the Web are now routinely introduced into evidence. As companies increasingly engage their customers online and as individuals spend more time interacting with each other on social networks, the volume of Web-based evidence continues to grow exponentially.
This has created a bonanza of relevant, relatively unfiltered, and often public information that would have been difficult to obtain or would not have existed a few years ago. Savvy attorneys will mine this vast new vein of information. Pages on company websites and posts to personal blogs and social networking sites have been used to support or challenge claims in a wide variety of civil litigations, including: workers’ compensation and personal injury suits,[iii] workplace harassment claims,[iv] allegations of defamation and false advertising,[v] contract disputes,[vi] matrimonial litigation,[vii] and intellectual property cases.[viii]
On the criminal side, new laws have been enacted to counter cyber-bullying,[ix] prosecution of which will depend upon Web-based evidence. There are also the perplexing and often amusing cases where criminals boast of, or even engage in, illegal activities on social networking sites.[x] As people increasingly stream contemporaneous and intimate details of their lives online, the prominence of Web-based evidence in litigation will only grow.
Collecting Web-based evidence, especially from highly dynamic social networking sites and blogs, creates a number of challenges. A harassing or libelous post, for example, is easily removed and may prove impossible, or exceedingly expensive, to retrieve through discovery requests and subpoenas to the poster or the site operator. Therefore, in many instances, it is important to capture Web-based information quickly and before the content poster is aware that you are collecting it. To ensure that any important evidence is not lost, many attorneys will attempt to capture it by taking screenshots or printing it out themselves, or by instructing their clients to do so.
There are a variety of methods used to capture Web pages. Most Web browsers make it easy to save the page files and associated images to your hard drive or to save a “Web archive” file. There is also software that will follow hyperlinks to attempt to crawl and download multiple Web pages or an entire website. Such software is limited, however, and is unable to pull the code and content from scripts, programs, and databases used to generate more dynamic sites. If such data is required, you will have to obtain that data through discovery requests and subpoenas to the opposing side and/or the website operator. Where the content as displayed in an end-user’s browser is all that is required, taking screenshots, printing to PDF, or recording a screen-video capture will generally suffice.[xi]
Self-collection, however, creates unnecessary risks. Attorneys who collect Web-based evidence risk becoming witnesses in their own trials, putting themselves into the “unseemly and ineffective position of arguing [their] own credibility.”[xii] Also, attorneys and their staff, or their clients, are generally not well-versed in best practices for Web-based evidence collection and do not have the tools for digitally sealing the evidence. Having a third-party expert collect Web-based evidence can protect against challenges of its authenticity.
Whether a resource in the law firm or an outside expert conducts the collection, there are a number of best practices that should be followed. Chief among these is sealing the Web-based evidence in time by providing a fail-safe means of detecting tampering and attaching a witnessed timestamp. A proper digital seal serves two important functions. First, it provides a “digital fingerprint” of the data to show that it has not changed since the time of the collection. Second, it includes a digital timestamp proving the date that the data was collected.
The most common method of taking a digital file’s fingerprint is the use of cryptographic hash functions: mathematical algorithms applied to a digital file, which return a fixed-length “hash value” that is, for practical purposes, unique to that file. The main benefits of cryptographic hash functions for authenticating Web-based evidence are: (1) they make it infeasible to alter a document without also altering its hash value and (2) it is infeasible that any two documents would have the same hash value. Because of these attributes, courts have accepted the use of cryptographic hash values to authenticate digital documents.[xiii]
Hash values are commonly used in litigation support to find identical documents, in order to deduplicate documents for review or production, and also to verify that a digital file has not been altered. There are a number of hash-function types, including MD5, MD6, SHA-256, and RIPEMD-160, just to name a few. Of these, MD5 is by far the most commonly used by litigation-support applications. The MD5 hash function, however, is no longer considered secure.[xiv] While many litigation-support professionals argue that the ramifications of this for file verification in the litigation-support context have been overblown,[xv] more secure alternatives are readily available and many tools and service providers support newer, more secure hash functions.
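As a minimal sketch of the fingerprinting described above, the following Python example uses SHA-256 from the standard library to compute a hash value for captured content and to check later copies against it. The function names `fingerprint` and `fingerprint_file`, and the sample page content, are illustrative assumptions and are not drawn from any particular litigation-support product.

```python
import hashlib


def fingerprint(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hexadecimal hash value of the given bytes."""
    return hashlib.new(algorithm, data).hexdigest()


def fingerprint_file(path: str, algorithm: str = "sha256",
                     chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large captures need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical captured page and a copy altered by a single character.
original = b"<html><body>Example post</body></html>"
altered = b"<html><body>Example post.</body></html>"

# The value recorded at collection time.
sealed_value = fingerprint(original)

print(fingerprint(original) == sealed_value)  # True: copy is unchanged
print(fingerprint(altered) == sealed_value)   # False: copy was altered
```

Even a one-character change produces a different hash value, which is what makes the recorded value useful for later verification. The hash alone says nothing about when the capture was made.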
In addition to taking the digital fingerprint of the screen capture or downloaded Web files, a good sealing program should also securely timestamp the documents to prove that the documents existed on, and have not been changed since, the date when they were sealed. It is often important to prove when content appeared on a website.[xvi] The date and time printed on the header or footer of a webpage printout, or the file’s metadata date attributes for a screen capture, are unreliable as they depend upon the collecting computer’s time and date settings, which can be easily manipulated.
Third-party digital notary services that seal the documents using trusted digital-timestamping technology, along with the file’s hash value, provide the strongest proof that data existed in a specific form on a specific date. Digital timestamping is most commonly used to protect intellectual property. For example, some companies use this technology to seal digital lab notebooks to help prove that they were the first to invent and that they own a particular invention. This technology is also useful to help prove the date a screenshot of a website was taken.
When evaluating digital-timestamping technologies, you should ensure that any product or service you are considering (1) provides long-lasting protection, (2) is independently verifiable, and (3) is based upon national and international standards. Meeting these three requirements helps ensure that you can prove a file, such as a screenshot of a Web page, existed on a specific date.
Of these three requirements, the first is particularly tricky. Technology changes and hash functions become outdated or compromised. It is important that the technology you use is able to extend the life of a seal beyond the life of the hash function used to create it. If the hash function you use to seal your evidence is compromised, you will want the ability to update your seals using a newer, secure hash function.
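One way to picture seal renewal is sketched below in Python. It is an illustration under stated assumptions, not a description of how any vendor or standard implements renewal: the seal is modeled as a plain dictionary, the helper names (`make_seal`, `renew_seal`, `verify_seal`) are hypothetical, and the trusted timestamp that a real service would attach to each seal is omitted. The idea shown is only that the new seal is computed, with a newer hash function, over both the data and the old seal's value, so that the older seal stays bound into the record.

```python
import hashlib


def _digest(algorithm: str, *parts: bytes) -> str:
    """Hash the concatenation of the given byte strings."""
    hasher = hashlib.new(algorithm)
    for part in parts:
        hasher.update(part)
    return hasher.hexdigest()


def make_seal(data: bytes, algorithm: str) -> dict:
    """Create an initial seal (hypothetical structure, no timestamp)."""
    return {"algorithm": algorithm,
            "digest": _digest(algorithm, data),
            "previous": None}


def verify_seal(data: bytes, seal: dict) -> bool:
    """Check the data against a seal and every older seal it wraps."""
    previous = seal["previous"]
    if previous is None:
        return _digest(seal["algorithm"], data) == seal["digest"]
    expected = _digest(seal["algorithm"], data,
                       previous["digest"].encode("ascii"))
    return expected == seal["digest"] and verify_seal(data, previous)


def renew_seal(data: bytes, old_seal: dict, new_algorithm: str) -> dict:
    """Wrap an existing seal using a newer hash function."""
    if not verify_seal(data, old_seal):
        raise ValueError("data does not match the existing seal")
    return {"algorithm": new_algorithm,
            "digest": _digest(new_algorithm, data,
                              old_seal["digest"].encode("ascii")),
            "previous": old_seal}


capture = b"hypothetical screenshot bytes"
first = make_seal(capture, "sha256")
renewed = renew_seal(capture, first, "sha3_256")
print(verify_seal(capture, renewed))  # True
```

For a renewal of this kind to carry weight, it would have to be performed while the older hash function is still considered sound.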
It is also important that any digital-date-stamping technology you use on your collected Web evidence can be independently verified. One way to ensure this is to use “hash-chain linking,” where a time value is bound to an electronic record by combining and hashing the file and time value, and then linking the results into a continuous hash chain maintained by a digital-notary service. The integrity of the chain itself is protected and auditable through the use of a widely witnessed process, such as by publishing the authenticated data stream in a printed newspaper with a large circulation. This enables a third party to validate a seal without placing any trust in another party's people, processes, or systems. Such an open, widely witnessed process makes it impossible for anyone, including the vendor, to backdate timestamps or validate electronic records that were not exact copies of the originals.
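The linking idea can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function names, the `|` separator, the all-zero starting value, and the sample entries are hypothetical, and the sketch combines the file's hash value (rather than the file itself) with the time value. A real digital-notary service would follow its own published scheme.

```python
import hashlib


def link(previous_link: str, file_digest: str, time_value: str) -> str:
    """Bind a file digest to a time value and chain it to the prior link."""
    record = hashlib.sha256(
        (file_digest + "|" + time_value).encode("utf-8")).hexdigest()
    return hashlib.sha256(
        (previous_link + record).encode("utf-8")).hexdigest()


def build_chain(entries, genesis: str = "0" * 64) -> list:
    """Return the running chain value after each (digest, time) entry."""
    links = []
    current = genesis
    for file_digest, time_value in entries:
        current = link(current, file_digest, time_value)
        links.append(current)
    return links


# Hypothetical sealed records: (hash value of file, time value).
entries = [
    (hashlib.sha256(b"capture one").hexdigest(), "2010-12-01T10:00:00Z"),
    (hashlib.sha256(b"capture two").hexdigest(), "2010-12-01T10:05:00Z"),
    (hashlib.sha256(b"capture three").hexdigest(), "2010-12-01T10:09:00Z"),
]
chain = build_chain(entries)

# Backdating the second record changes every chain value from that
# point on, including the final value that would have been published.
backdated = list(entries)
backdated[1] = (entries[1][0], "2010-11-01T10:05:00Z")
tampered_chain = build_chain(backdated)

print(chain[0] == tampered_chain[0])    # True: earlier link unaffected
print(chain[-1] == tampered_chain[-1])  # False: published value differs
```

Because the final chain value depends on every earlier record, anyone holding the widely published value can detect a backdated or altered entry without relying on the service that maintains the chain.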
Selecting a digital-timestamping service that follows national and international standards provides another layer of defensibility to your process. Make sure that the digital-timestamping technology you select is compliant with both the ISO/IEC 18014-3 and ANSI X9.95 timestamping standards. These international and U.S. standards codify industry best practices for trusted timestamping.
It is important to remember, however, that a digital seal only proves that the data existed in a particular form at a particular time. It does not prove that the data was not contaminated or tampered with during the collection process. Therefore, it is also important to fully document the collection process and the chain of custody. While your documentation can and should involve manually recording your activities using checklists and logs, you should also conduct a deep packet capture (DPC) of your activities during the collection. A DPC will capture complete network packets, both header and payload, crossing the network as you navigate to, and download or print, the evidence-bearing Web page(s).
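The manual logging described above can be supplemented with a structured record for each collection step. The Python sketch below is one possible shape for such a record; the field names, the `log_entry` helper, and the sample values are hypothetical. The time recorded here comes from the collecting computer's clock, so it serves only as a working note and is no substitute for a trusted timestamp.

```python
import hashlib
import json
from datetime import datetime, timezone


def log_entry(url: str, collector: str, action: str,
              captured: bytes, when: datetime = None) -> dict:
    """Build one chain-of-custody record for a collection step."""
    when = when or datetime.now(timezone.utc)
    return {
        "recorded_at_utc": when.isoformat(),  # local clock, not proof
        "collector": collector,
        "url": url,
        "action": action,
        "sha256": hashlib.sha256(captured).hexdigest(),
        "size_bytes": len(captured),
    }


# Hypothetical collection step with a fixed time for illustration.
entry = log_entry(
    url="http://www.example.com/post/123",
    collector="J. Doe",
    action="Saved page as PDF",
    captured=b"%PDF-1.4 hypothetical file contents",
    when=datetime(2010, 12, 1, 15, 30, tzinfo=timezone.utc),
)
print(json.dumps(entry, indent=2))
```

Recording the hash value of each captured file in the log ties the written account of the collection to the specific files that were later sealed.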
To avoid polluting your DPC with unrelated Internet activity, and to avoid risking confidentiality by commingling other client data that may be transmitted over the Internet while you are conducting the capture, you should apply a filter so that only activity involving the IP addresses or domain names of the targeted host website is captured. If you do many Web collections, it is prudent to use a computer dedicated to this purpose and set up on its own subnetwork, to help minimize the capture of unrelated Internet traffic.
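As a small illustration of such a filter, the Python sketch below builds a host-based filter expression in the libpcap filter syntax used by common packet-capture tools. The helper name `build_capture_filter` and the addresses (taken from ranges reserved for documentation) are assumptions; in practice you would first determine the actual addresses of the targeted website, which may change or be shared with other sites.

```python
import ipaddress


def build_capture_filter(addresses) -> str:
    """Return a filter expression matching traffic to or from the hosts."""
    if not addresses:
        raise ValueError("at least one target address is required")
    terms = []
    for address in addresses:
        # ip_address() raises ValueError for anything that is not a
        # valid IPv4 or IPv6 address, which guards against typos.
        parsed = ipaddress.ip_address(address)
        terms.append(f"host {parsed}")
    return " or ".join(terms)


# Hypothetical addresses for the targeted host website.
target_addresses = ["192.0.2.10", "192.0.2.11"]
capture_filter = build_capture_filter(target_addresses)
print(capture_filter)  # host 192.0.2.10 or host 192.0.2.11
```

Validating the addresses before starting the capture reduces the chance that a mistyped address silently results in an empty or incomplete capture.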
Capturing Web evidence is not voodoo magic. But it can be tricky. Taking the measures discussed in this article will help protect your Web-based evidence from challenges to its authenticity. Perhaps the most important step you can take to protect your Web-based evidence from authentication challenges is selecting a vendor with expertise in such collections. A third-party vendor can provide convenience, experience, defensible tools and processes, and expert witnesses to testify regarding its technology and processes. Doing a Web-based collection right matters: a screenshot may be the only shot you get in a case.
About the Authors
Paul C. Easton, Esq., Managing Director, Global Colleague LLC. Mr. Easton is responsible for the rigorous oversight and training of the firm’s full-time team of Indian and Taiwanese lawyers, translators, and IT professionals. His time is divided between work in Pune, India and Taichung, Taiwan. Mr. Easton’s background includes the management of high-volume document productions in massive litigations and regulatory investigations for numerous organizations and their legal counsel.
Tom Klaff, CEO, Surety LLC. Mr. Klaff brings over twenty years of high-tech management experience to Surety, most recently as Founder and CEO of Reliacast, Inc., a leading digital media software company. Prior to Reliacast, Mr. Klaff founded College Town, Inc., the first web portal for college admission widely used by students to purchase college-related items and to apply for financial aid. After College Town, Mr. Klaff established a management consulting firm to provide contract marketing and sales services to Internet-centric businesses. In that capacity, he developed strategic marketing and sales plans, managed large projects and helped his clients build an effective sales force.
Mr. Klaff received a Bachelor of Arts degree in English from Brown University and a Master of Science in Industrial Administration from the Graduate School of Industrial Administration, Carnegie Mellon University.
[i] St. Clair v. Johnny's Oyster & Shrimp, Inc., 76 F. Supp. 2d 773, 775 (S.D. Texas 1999).
[ii] See, e.g., Arteria Property Pty Ltd. v. Universal Funding V.T.O., Inc., Civil Action 05-4896 (PGS), 2008 US Dist LEXIS 77199 (D.N.J. Oct. 1, 2008)(“This Court sees no reason to treat websites differently than other electronic files.”); Griffin v. State, 192 Md. App. 518; 995 A2d 791; 2010 Md App LEXIS 87 (Md. Ct. Spec. App. May 27, 2010)(finding that a printout from MySpace, like any other electronic communication, “may be authenticated under existing evidentiary rules governing authentication by circumstantial evidence”); Leduc v. Roman, 2009 CanLII 6838, para. 27 (ON S.C.)(a Canadian personal injury case in which the court explained that Facebook profiles are “‘data and information in electronic form’ producible as ‘documents’ under the Rules of Civil Procedure”).
[iii] See, e.g., Order Regarding Plaintiffs’ Motion for Protective Order Pursuant to Fed. R. Civ. P. 26(c) Regarding Subpoenas Issued to Facebook, My Space, Inc., and Meetup.com, Ledbetter v. Wal-Mart Stores, Inc., No. 06-01958 (D. Colo., Apr. 21, 2009); Romano v Steelcase Inc., 2010 NY Slip Op 20388, 1 (N.Y. Sup. Ct. Sept. 21, 2010); McMillen v. Hummingbird Speedway, Inc., No. 113-2010 CD (Penn. C.P. Jefferson, Sept. 9, 2010)(plaintiff, an injured racecar driver, required to produce information from his MySpace and Facebook accounts); Leduc, 2009 CanLII 6838.
[iv] See, e.g., EEOC v. Simply Storage Mgmt., No. 09-1223, 2010 U.S. Dist. LEXIS 52766 (S.D. Ind. May 11, 2010)(claimants in sexual harassment case required to produce their Facebook and MySpace profiles).
[vi] See, e.g., TEKsystems, Inc. v. Hammernick, No. 10-00819 (D.Minn. 2010)(former employee sued for violating a non-solicitation provision in an employment agreement based upon Facebook messages).
[vii] Big Surge in Social Networking Evidence Says Survey of Nation's Top Divorce Lawyers, Am. Acad. of Matrimonial Lawyers (Feb. 10, 2010), http://www.aaml.org/about-the-academy/press/press-releases/e-discovery/big-surge-social-networking-evidence-says-survey- (“81% of AAML members cited an increase in the use of evidence from social networking websites during the past five years….”).
[viii] See, e.g., Netbula v. Chordiant Software, Inc., No. C08-00019 (N.D. Cal., Oct. 15, 2009)(Plaintiffs in copyright case ordered to produce Web pages).
[ix] Missouri was one of the first states to pass legislation revising its harassment statutes in response to the well-publicized Lori Drew cyber-bullying case. Mo. Ann. Stat. § 565.090 (West 1999 & Supp. 2009). 46 states have enacted or proposed anti-bullying laws, 32 of which include electronic harassment and six of which specifically include the term “cyber-bullying.” Sameer Hinduja and Justin W. Patchin, State Cyberbullying Laws: A Brief Review of State Cyberbullying Laws and Policies (Cyberbullying Research Center, December 2010), available at http://www.cyberbullying.us/Bullying_and_Cyberbullying_Laws.pdf.
[x] Griffin v. State, 2010 Md. App. LEXIS 87 (Md. Ct. Spec. App. May 27, 2010)(girlfriend of the defendant in a murder trial posted threats to witnesses on MySpace); Carlin DeGuerin Miller, Facebook Fugitive Chris Crego Gives Police Plenty of Help, Current Status: "Arrested", CBS News (Feb. 9, 2010), http://www.cbsnews.com/8301-504083_162-6186573-504083.html; Facebook Is Not The Place To Brag About Your Alleged Act Of Vandalism, The Smoking Gun (Nov. 2, 2010), http://www.thesmokinggun.com/buster/facebook/facebook-not-place-brag-about-your-alleged-act-vandalism.
[xi] Griffin, 192 Md. App. 518.
[xii] Model Code of Prof'l Responsibility DR 5-102 (1980). See also, Model Rules of Prof’l Conduct R. 3.7 (2010).
[xiii] See, e.g., Lorraine v. Markel Am. Ins. Co., 241 F.R.D. 534, 542-43 (D. Md. 2007).
[xiv] Dep’t of Homeland Security, U.S. Computer Emergency Readiness Team, Vulnerability Note VU#836068: MD5 Vulnerable to Collision Attacks, http://www.kb.cert.org/vuls/id/836068 (Rev. 13, last updated 2009-01-21).
[xv] See, for example, the lively discussions on this topic at the LitSupport Yahoo Group. E.g., Orchatechherb, Here we go again--hash values rear their ugly heads, Message #44877, Wed Nov 24, 2010 8:28 pm, The Litigation Support List, Yahoo Groups, http://finance.groups.yahoo.com/group/litsupport/messages/44877 (free subscription required).
[xvi] See, e.g., In re F.P., 2005 PA Super 220, 878 A.2d 91, 95-96 (Pa. Super. Ct. 2005) (holding that evidence regarding content and timing of threatening instant messages was sufficient to authenticate them).