In every investigation, whether it is a
criminal or legal investigation, there are five golden Ws that the investigator
must answer in order to be successful. These are:
WHO is it about?
WHAT happened?
WHEN did it take place?
WHERE did it take place?
WHY did it happen?
Some might even consider two other pertinent questions:
HOW did it happen?
HOW MUCH or HOW OFTEN?
How do you sift through the vast volume of data
to find these answers? Manually analyzing the data is very time-consuming and
requires a lot of resources, because you do not know exactly what you are
looking for. What words should you search for in order to find the “smoking
gun”? How do you find patterns among the words? Do you highlight the words as
you manually review each document, then go back and see how they are
connected? It’s not that simple. Criminals use aliases, and transfers may be
made by unknown offshore companies or through unknown bank accounts. All of
this complicates and slows down the investigation.
In addition, the volume of electronic data that
needs to be investigated continues to grow and to become more complex, which
exacerbates the problem. Of course, there are technologies that can help
expedite the investigation. Computers can analyze large data sets for specific
patterns at tremendous speed. Combined with other technological advances,
including text mining, computational linguistics, statistics, machine learning,
and even artificial intelligence, this makes it much easier to analyze the
data with a specific focus on answering the five golden Ws.
Modern text mining and content analytics can
search at a higher level than keywords alone.
For example, with text mining, linguistic patterns like ‘someone pays
someone else’ or ‘someone meets someone else at a certain location and at a
certain time’ can be identified without using the exact names or amounts. By
extracting such patterns combined with simple statistics, one can easily
identify unknown persons, companies, bank account numbers, and also spot code
names and aliases.
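To make the idea concrete, here is a minimal sketch of a ‘someone pays someone else’ pattern combined with simple counting. It is an illustration only: the sample sentences, the names in them, and the helper functions `extract_payments` and `count_parties` are all hypothetical, and plain regular expressions stand in for the linguistic analysis a real text mining product would use.

```python
import re
from collections import Counter

# Hypothetical sample sentences; real input would come from the case documents.
DOCUMENTS = [
    "Mr. Green paid Falcon Holdings 25,000 dollars on Monday.",
    "Falcon Holdings transferred 40,000 euros to Mr. Green.",
    "The Gardener paid Falcon Holdings 12,500 dollars in cash.",
]

# Crude stand-in for named-entity recognition: a run of capitalized words,
# each optionally followed by a period (as in "Mr.").
NAME = r"(?:[A-Z][a-z]+\.?\s)*[A-Z][a-z]+"
AMOUNT = r"(?P<amount>\d[\d,]*)\s(?P<currency>dollars|euros)"

# Two surface forms of the same abstract pattern: PAYER pays PAYEE an AMOUNT.
# No specific name or amount appears in the patterns themselves.
PAYMENT_PATTERNS = [
    re.compile(rf"(?P<payer>{NAME}) paid (?P<payee>{NAME}) {AMOUNT}"),
    re.compile(rf"(?P<payer>{NAME}) transferred {AMOUNT} to (?P<payee>{NAME})"),
]


def extract_payments(documents):
    """Return (payer, payee, amount, currency) tuples found in the documents."""
    payments = []
    for text in documents:
        for pattern in PAYMENT_PATTERNS:
            for match in pattern.finditer(text):
                amount = int(match["amount"].replace(",", ""))
                payments.append(
                    (match["payer"], match["payee"], amount, match["currency"])
                )
    return payments


def count_parties(payments):
    """Count how often each party appears as payer or payee."""
    counts = Counter()
    for payer, payee, _amount, _currency in payments:
        counts.update([payer, payee])
    return counts


payments = extract_payments(DOCUMENTS)
party_counts = count_parties(payments)
for party, occurrences in party_counts.most_common():
    print(party, occurrences)
```

Nobody searched for ‘Falcon Holdings’ or ‘The Gardener’ here; they surface because they fill a role in the pattern. A party that keeps turning up in payment patterns, or a name that looks like an alias, becomes a lead for the investigator to check.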
Criminals will try to cover up illegal
activities by hiding information in non-searchable file formats or by embedding
different types of electronic objects within complex compound files where the
most relevant information is often hidden in the deepest layers. Your solution
needs to identify information even at those deepest layers and be
able to search seemingly unsearchable formats such as bitmaps, images,
non-searchable PDFs, audio files, or even video. Combining text mining with
advanced analytics identifies relevant information many times faster and more
efficiently than human reviewers could manage alone. Investigators
can then validate that information, which helps prevent so-called tunnel vision
and expose invalid evidence or unproductive lines of investigation.
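As a rough illustration of what reaching the deepest layers of a compound file involves, the sketch below unpacks ZIP archives nested inside other ZIP archives and yields every file it finds, however deep. It assumes ZIP containers only and uses Python's standard `zipfile` module; the function names `walk_archive` and `make_zip` and the sample file names are hypothetical. Other container types (e-mail stores, embedded objects in office documents) and the OCR or speech recognition needed for images and audio are outside its scope.

```python
import io
import zipfile


def walk_archive(data, path=""):
    """Yield (nested_path, content_bytes) for every non-archive file, at any depth."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            payload = archive.read(info)
            inner_path = f"{path}/{info.filename}" if path else info.filename
            if zipfile.is_zipfile(io.BytesIO(payload)):
                # The member is itself an archive: descend one layer deeper.
                yield from walk_archive(payload, inner_path)
            else:
                yield inner_path, payload


def make_zip(files):
    """Build an in-memory ZIP from a {name: bytes} mapping (demo helper only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


# Hypothetical evidence file: a note buried three archive layers deep.
inner = make_zip({"notes.txt": b"meet at the harbor at 9pm"})
middle = make_zip({"photos/inner.zip": inner, "readme.txt": b"nothing here"})
outer = make_zip({"backup/middle.zip": middle})

found = dict(walk_archive(outer))
for nested_path in found:
    print(nested_path)
```

Each extracted file keeps its full nested path, so a hit in the innermost layer can still be traced back to the container it came from.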
Over the years, I
have seen many real-life cases where this hybrid man-machine approach has identified
twice the amount of relevant information with half the resources in half the
time! This is a great example where Big Data analytics can lead to Big Savings!
Johannes C. Scholtes, Ph.D. is chairman and chief strategy officer of ZyLAB. Scholtes, who was the company’s president and CEO from 1989 to 2009, shaped ZyLAB as an information management powerhouse in countries across the globe. His leadership and vision led to ZyLAB being selected for historic and high-profile engagements including the United Nations War Crime Tribunals, FBI-Enron investigations, and the United States White House Executive Office of the President.
Before joining ZyLAB, Scholtes was a lieutenant in the intelligence department of the Royal Dutch Navy. Scholtes holds a Master of Science degree in Computer Science from Delft University of Technology and a Ph.D. in Computational Linguistics from the University of Amsterdam. As of 2008, he holds the Extraordinary Chair in Text Mining from the Department of Knowledge Engineering at the University of Maastricht. In January 2010, Scholtes joined the board of directors of the Association for Information and Image Management (AIIM), the worldwide leading authority for standards and education in Enterprise Information Management.