Blog Post

The Modern Data Paradox: Power and Problems in Discovery and Investigations

Tim Anderson

Senior Managing Director, FTI Consulting

Jerry Bui

Managing Director, FTI Consulting

E-discovery has always been a challenging field. Again and again, old problems are made new as the breadth and complexity of information sources change. Paper documents gave way to email, which has now been surpassed by a rising volume of chat and short-form messages from collaboration applications, mobile devices and social media. Today’s data landscape is rife with technical complexities that interfere with traditional e-discovery workflows — but it’s also rich with context and details that can inform strategy, decision making and more.

The evolution of the internet from Web 1.0 to 2.0 to the present emergence of 3.0 has played a significant role in shaping the modern data paradox in e-discovery and investigations. In Web 1.0, the internet was centered around browser searches and email use, while Web 2.0 brought the dawn of social media, smartphones, text messaging, collaboration and a blurring of lines between personal and professional communications. The result today is a vast, diverse, rapidly growing data universe.

For instance, there are currently more than 4.6 billion social media users in the world, and the average user interacts with 7.5 different social platforms each month. More than 23 billion texts are sent worldwide each day. Microsoft Teams reached 270 million users this year, and more than 1.5 billion messages are reportedly sent via Slack each month. As Web 3.0 catches on, the landscape will shift again as user-generated, user-owned content and the metaverse bring new dynamics to the e-discovery and investigations arena.

Moreover, preferred platforms and the ways in which they are used continue to vary by region. While Teams and Slack are widely used in business, especially in the U.S., applications such as WhatsApp, WeChat, Telegram and other, lesser-known options are preferred in Europe, South America and Asia.

All these data sources and messaging channels are relevant in the context of legal and regulatory matters — whether or not the legal team is aware of or familiar with the formats or the regional nuances of how they are used. The implications are significant and legal teams must begin to reorient their awareness of the modern data environment, as well as the workflows and e-discovery strategies with which they approach it. Doing so will also open the door to opportunities to gain faster access to insights and uncover a more robust, enriched view of the facts and context of a matter.

In this landscape, key challenges, considerations and strategies include:

Preparing for common data requests and challenges. There’s been a significant uptick in discovery requests involving linked content (i.e., cloud-based documents that are shared via links rather than as traditional attachments in email or messages) and specific versions of dynamic cloud-based files. Identifying and collecting linked content, verifying the correct version and matching it to a corresponding message is a highly complex process for which no standardized workflow currently exists. While this can be a challenging and sometimes disproportionate task, it is usually technically possible, a fact that has made it difficult for organizations to successfully argue against producing it in an investigation. Early case law has suggested that hyperlinks do not qualify as attachments. However, given the increasing prevalence of linked content in corporate environments, debate on this subject will continue in the industry and the courts.
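
To make the version-matching step concrete, here is a minimal Python sketch of one possible rule: treat the latest version saved at or before the message's send time as the version the sender shared. This rule, the `version_at` helper and the sample data are illustrative assumptions, not a standardized workflow, and real platforms expose version histories in different ways.

```python
from bisect import bisect_right
from datetime import datetime


def version_at(versions, sent_at):
    """Return the latest (saved_at, version_id) saved at or before sent_at.

    `versions` must be sorted by saved_at in ascending order.
    Returns None if no version existed yet when the message was sent.
    """
    saved_times = [saved_at for saved_at, _ in versions]
    idx = bisect_right(saved_times, sent_at)
    return versions[idx - 1] if idx else None


# Hypothetical version history for one linked cloud document.
versions = [
    (datetime(2022, 3, 1, 9, 0), "v1"),
    (datetime(2022, 3, 1, 15, 30), "v2"),
    (datetime(2022, 3, 4, 11, 0), "v3"),
]

# Hypothetical send time of the message that contained the link.
message_sent = datetime(2022, 3, 2, 10, 0)

match = version_at(versions, message_sent)
print(match[1] if match else "no version existed at send time")
```

Whether the version at send time, the version at the time a recipient opened the link, or the current version is the right one to produce is a legal question that a sketch like this cannot settle.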

Another common challenge with emerging data sources is the collection and review of short-form messages (such as those from chat and collaboration apps). The e-discovery industry has begun experimenting with new tools and approaches that aim to integrate lengthy threads or groups of chat messages into predictable EDRM workflow processes. Several methodologies for finding and parsing relevant information within large volumes of chat messages have been introduced. These include grouping all messages from a 24-hour window (this is the most common format used and requested) or collecting a certain number of messages from before and after a message with a keyword hit.
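
The two methodologies just described can be sketched in a few lines of Python. This is a simplified illustration, not any vendor's implementation: messages are assumed to be `(timestamp, sender, text)` tuples already sorted chronologically, the 24-hour window is interpreted as a calendar day, and the function names and sample messages are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def group_by_day(messages):
    """Group messages into one review unit per calendar day."""
    units = defaultdict(list)
    for msg in messages:
        units[msg[0].date()].append(msg)
    return dict(units)


def keyword_windows(messages, keyword, before=2, after=2):
    """Return runs of messages surrounding each keyword hit.

    Overlapping or adjacent windows are merged so that no message
    appears in more than one returned unit.
    """
    hits = [i for i, m in enumerate(messages) if keyword.lower() in m[2].lower()]
    ranges = []  # half-open (start, end) index ranges
    for i in hits:
        start = max(0, i - before)
        end = min(len(messages), i + after + 1)
        if ranges and start <= ranges[-1][1]:
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return [messages[s:e] for s, e in ranges]


messages = [
    (datetime(2022, 5, 1, 9, 0), "ana", "Morning"),
    (datetime(2022, 5, 1, 9, 5), "ben", "Did the invoice go out?"),
    (datetime(2022, 5, 1, 9, 6), "ana", "Yes, sent it last night"),
    (datetime(2022, 5, 1, 17, 0), "ben", "Lunch tomorrow?"),
    (datetime(2022, 5, 2, 8, 30), "ana", "Sure"),
    (datetime(2022, 5, 2, 8, 45), "ben", "Great"),
]

daily_units = group_by_day(messages)
hit_units = keyword_windows(messages, "invoice", before=1, after=1)
print(len(daily_units), "daily units;", len(hit_units), "keyword unit(s)")
```

Note how the daily grouping splits the lunch exchange across two units simply because it spans midnight, which is the kind of arbitrary cutoff that can separate a message from its context.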

These approaches, however, are mostly ineffective at clearing out the non-relevant noise in chat threads, preserving message context and converting relevant portions of chat threads into a document format that can be reviewed within an e-discovery platform. Our teams at FTI Technology have developed alternative approaches to chat thread unitization that do not rely on arbitrary cutoffs, such as a 24-hour time frame, but instead analyze the back-and-forth frequency between chat participants and let the density of that communication dictate the natural document cutoffs. This density-sensitive approach uses statistically sound logic and has stood up to review scrutiny among clients. Clients have also reported better predictive coding results with chat transcripts, because the chat documents are rich with conceptually consistent topics.
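
The post does not detail the logic behind the density-sensitive approach, so the following Python sketch is only a generic illustration of the underlying idea, not FTI Technology's method. It assumes a simple rule: start a new document wherever the pause between consecutive messages is much longer than the thread's typical pause. The `split_on_lulls` name, the median-based threshold and the `factor` parameter are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from statistics import median


def split_on_lulls(timestamps, factor=5.0):
    """Split sorted timestamps into units at unusually long pauses.

    A new unit starts whenever the gap to the previous message exceeds
    `factor` times the median gap across the whole thread.
    """
    if not timestamps:
        return []
    if len(timestamps) < 2:
        return [list(timestamps)]
    gaps = [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]
    threshold = factor * median(gaps)
    units, current = [], [timestamps[0]]
    for ts, gap in zip(timestamps[1:], gaps):
        if gap > threshold:
            units.append(current)
            current = [ts]
        else:
            current.append(ts)
    units.append(current)
    return units


# Hypothetical thread: two bursts of conversation about two hours apart.
start = datetime(2022, 5, 1, 9, 0)
offsets_in_minutes = [0, 1, 2, 4, 120, 121, 123, 125]
timestamps = [start + timedelta(minutes=m) for m in offsets_in_minutes]

units = split_on_lulls(timestamps)
print([len(u) for u in units])
```

Because the threshold scales with each thread's own rhythm, a rapid-fire channel and a slow-moving one are split at different absolute pause lengths, rather than at the same fixed cutoff.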

Understanding local and regional nuances. It is essential to know which local technologies are permitted and used region to region, and how jurisdictional state secrecy and data privacy laws govern the management and transmittal of data that is identified and acquired for purposes of e-discovery.

Taking stock of the state of devices and text messages. Personal devices are commonplace in the work environment today. One study found that 75% of employees use their personal devices for work (more than half said this use increased during the pandemic) and 17% report using their personal devices for work without telling their organization’s IT department. This is notable for several reasons. First, obtaining and collecting data from personal devices is in and of itself a challenge, especially if the organization doesn’t have clear, enforced acceptable use policies in place. Most of the time, an individual device owner must be willing to cooperate and provide consent to give the investigatory team access to “BYOD” devices and data.

Additionally, some legal teams have made a practice of collecting text messages from devices by allowing custodians to take screenshots of chat messages. However, screenshots of chat messages can be faked and it’s very difficult to verify the authenticity of messages produced via screenshot.

Another interesting development on the mobile device front is the recent release of Apple’s iOS 16. The full release includes numerous features that will in some scenarios impede investigators’ ability to uncover the content of edited and recalled iMessages acquired from iOS devices. Conversely, iOS 16 also includes a feature to recover deleted messages, a reversal from previous iOS versions, which were set up to permanently remove nearly all traces of deleted content immediately. This new feature could be helpful for investigators who know where to look and how to uncover these artifacts. Several of our colleagues in our Digital Forensics & Investigations practice recently published detailed findings from initial testing of the features in the iOS 16 Beta release.

Exploring how advanced technology and AI can enrich the discovery process. When collecting and reviewing data from multiple dynamic sources, legal teams need a way to preserve the metadata and context to enrich the overall understanding of the facts. Advanced analytics and AI tools that can be plugged into datasets with numerous formats are becoming more and more effective at providing a 360-degree view of a matter. For example, computer vision technology can be used to analyze photos, and new AI tools for analyzing video are emerging. Both can reveal artifacts that might otherwise have remained unknown in an investigation.

Addressing structured data and unstructured data within a single view. Structured data, such as financial records, customer lists and databases of contact information, often comes into scope during e-discovery. Structured data may be exported in a variety of formats (such as CSV, JSON and XML), which are typically not compatible with traditional document review models and platforms. Legal teams should work with digital forensics experts to establish new custom models and workflows when needed to overlay structured data with unstructured data and gain a clear view of all facts in the context of the full dataset.
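
As a small illustration of why these exports need custom handling, the Python sketch below loads the same kind of customer record from CSV, JSON and XML into one common shape (a list of dictionaries with string values) using only the standard library. The field names, sample records and helper functions are hypothetical, and real exports are rarely this tidy.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def rows_from_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_json(text):
    """Parse a JSON array of flat objects, converting values to strings."""
    return [{k: str(v) for k, v in rec.items()} for rec in json.loads(text)]


def rows_from_xml(text, record_tag):
    """Parse XML in which each <record_tag> element holds flat child fields."""
    root = ET.fromstring(text)
    return [
        {child.tag: (child.text or "") for child in rec}
        for rec in root.iter(record_tag)
    ]


csv_text = "id,name\n1,Acme\n"
json_text = '[{"id": 2, "name": "Globex"}]'
xml_text = "<customers><customer><id>3</id><name>Initech</name></customer></customers>"

combined = (
    rows_from_csv(csv_text)
    + rows_from_json(json_text)
    + rows_from_xml(xml_text, "customer")
)
for row in combined:
    print(row["id"], row["name"])
```

Once records share a common shape, they can be keyed to custodians, dates or account identifiers and laid alongside the unstructured documents under review.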

The data landscape will never be static. Old problems will continually be made new as technology advances. Legal teams and e-discovery practitioners will need to be flexible and nimble in the face of ongoing change. Recognizing that change is inevitable and embracing the opportunities it provides (despite the complexities) will be key to managing risk and navigating evidence obstacles.

The views expressed herein are those of the author(s) and not necessarily the views of FTI Consulting, its management, its subsidiaries, its affiliates, or its other professionals.