Boolean Searches, DeNISTing, and Slack Space: A Glossary of eDiscovery Terms


The world of eDiscovery definitely has a language all its own. Here’s a list of terms to help you get more familiar with the industry, learn a new phrase or two, and maybe even find a few gems for your next Scrabble game.

If you want to “discover” more about how Avalon Legal can assist with your electronic or paper discovery needs, contact us today!

A

Admissible: Evidence that is allowable in court.

Analytics: The various technologies used to provide multiple views into a data set.

Archive: Long-term repository for the storage of records and files.

Assisted Review: This method of review utilizes advanced machine learning, including predictive coding, to apply reviewers’ coding decisions to large amounts of data.

Attachment: A document or file that is connected to another document or file either externally, e.g. a document connected to an email, or embedded, e.g. an image in a word processing document.

B

Backup: Both the action of and the result of creating a copy of data as a precaution against the loss or damage of the original data.

Backup Tape: Portable media used to store copies of data that are created as a precaution against the loss or damage of the original data.

Batching: The process of gathering large amounts of electronically stored information together in batches. Typically, this process is done so that documents can be allocated to reviewers for tagging.

Big Data: Term used to describe data sets so large and complex that it becomes difficult to process them using traditional data processing applications. 

Boolean Search: A technique that connects individual keywords or phrases within a single query in order to reduce false positives and accurately pinpoint documents of interest. Typical connectors are terms such as AND, OR, and NOT.
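As a rough illustration of how connectors combine, the sketch below runs the query (merger AND contract) AND NOT lunch over three made-up documents. The document IDs, the sample text, and the helper names `contains` and `boolean_search` are hypothetical and not taken from any review platform.

```python
# Hypothetical sample documents, keyed by an assumed document ID.
documents = {
    "DOC-001": "The merger contract was signed in March.",
    "DOC-002": "Lunch menu for the merger celebration.",
    "DOC-003": "Draft contract for office cleaning services.",
}


def contains(text, term):
    """Case-insensitive check that a whole word appears in the text."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return term.lower() in words


def boolean_search(docs):
    """Return IDs matching: (merger AND contract) AND NOT lunch."""
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if contains(text, "merger")
        and contains(text, "contract")
        and not contains(text, "lunch")
    )


hits = boolean_search(documents)
print(hits)
```

Only the first document satisfies every connector: the second lacks "contract" and the third lacks "merger".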

C

Chain-of-Custody: The order in which a piece of criminal evidence should be handled by persons investigating a case, specifically the unbroken trail of accountability that ensures the physical security of samples, data, and records in a criminal investigation.

Child Document: A file that is attached to another communication file, e.g. the attachment to an email or a spreadsheet embedded in a word processing document.

Civil Procedure Rules: In United States federal courts, these are the Federal Rules of Civil Procedure (FRCP), which govern civil proceedings in the United States District Courts. Their purpose is “to secure the just, speedy, and inexpensive determination of every action and proceeding.”

Cloud: A network connection providing access to computers and software applications.

Cloud Computing: The practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer.

Coding (Objective/Subjective): The method of entering fields of information from a document and saving them in a format that will be linked to that particular document within a database. There are two types of coding: objective and subjective. Objective coding records information that anyone who can read the language of the document can identify, such as the date on a scan. Subjective coding requires knowledge of the underlying investigation, such as, “Is this information good for our case?”

Collect, Collection: Gathering electronically stored information for discovery.

Compliance: Efforts by organizations to comply with laws, policies, and regulations.

Concept Search: A method of searching for files not based on keywords, but on the subject matter of the document, paragraph, or sentence. This differs from keyword searching, which requires an exact keyword hit.

Container File: This is a single file that contains multiple other files or documents, such as a zip file. Container files are typically used due to their considerably smaller file size. Extracted contents are usually anywhere from 50% to 250% larger in size than the original container file.

Counsel (Inside/Outside): Counsel refers to legal representation. Inside counsel refers to attorneys who work inside a corporation, and outside counsel refers to attorneys who work outside of the corporation, typically for a law firm.

Culling: The process of eliminating files from a collection of electronic files to reduce the number of documents to be reviewed. Culling techniques include de-duplication, near-de-duplication, email thread analysis, deNISTing, and filtering.

Custodian: Refers to the individual who has electronically stored information relevant to the pending litigation.

D

Data Extraction: The process of parsing data from electronic documents to identify their metadata and body contents.

Data Integrity: Maintaining accuracy of data through its life cycle.

Data Mapping: The process of identifying and recording the location and types of electronically stored information within an organization’s network, and policies and procedures related to that information.

Data Security: The protection of digital data from access or alteration by unauthorized parties.

Deduplication (Deduping): The process of comparing the characteristics of electronic documents to identify and/or remove duplicate records to reduce review time and increase coding consistency. Files that share the same hash value are deemed to be exact duplicates, and all but one copy of each is removed from the data set.
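A minimal sketch of hash-based deduplication, assuming SHA-256 as the hash and a keep-the-first-copy policy. Both choices are assumptions for illustration, as are the file names and the `deduplicate` helper.

```python
import hashlib


def sha256_of(content: bytes) -> str:
    """Return the SHA-256 hash of a file's contents as a hex string."""
    return hashlib.sha256(content).hexdigest()


def deduplicate(files):
    """Keep the first file seen for each hash value; drop later exact copies."""
    seen = set()
    unique = []
    for name, content in files:
        digest = sha256_of(content)
        if digest not in seen:
            seen.add(digest)
            unique.append(name)
    return unique


# Hypothetical collection: b.txt is a byte-for-byte copy of a.txt.
files = [
    ("a.txt", b"Quarterly report"),
    ("b.txt", b"Quarterly report"),
    ("c.txt", b"Quarterly report v2"),
]
kept = deduplicate(files)
print(kept)
```

Because the comparison is on content hashes, a copy with a different file name is still caught, while a file that differs by even a few characters is kept.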

DeNISTing: The process of separating files generated by a computer system from those created by a user. This automated process compares each file against a reference list of known software files (see NIST List) published by the National Institute of Standards and Technology (NIST) and removes the matches.
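The sketch below shows the general idea of deNISTing under the assumption that the reference list is consulted by file hash. The one-entry `known_system_hashes` set is a hypothetical stand-in built from sample bytes, not real NIST data, and the file names are made up.

```python
import hashlib


def sha1_of(content: bytes) -> str:
    """Return the SHA-1 hash of a file's contents as a hex string."""
    return hashlib.sha1(content).hexdigest()


# Hypothetical stand-in for a reference list of known system-file hashes.
known_system_hashes = {sha1_of(b"fake operating system library")}


def denist(files, known_hashes):
    """Return the names of files whose hash is NOT on the reference list."""
    return [
        name
        for name, content in files
        if sha1_of(content) not in known_hashes
    ]


# Hypothetical collection: one system file and one user-created file.
collected = [
    ("kernel32.dll", b"fake operating system library"),
    ("memo.docx", b"Memo to the board"),
]
user_files = denist(collected, known_system_hashes)
print(user_files)
```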

Discovery: The process of identifying, securing, and reviewing information that is potentially relevant to a legal matter and producing information that can be utilized as evidence in the legal process.

Document Family: All parts of a group of documents that are connected to each other for purposes of communication, e.g. an email and its attachments.

E

Early Case Assessment (ECA): The tools or methods used for investigating and quickly learning about a document collection for the purposes of estimating the risks, costs, and time involved in pursuing a particular legal course of action.

Electronic Discovery (eDiscovery): The process of discovery in civil litigation, in which electronically stored information is identified, collected, prepared, reviewed, and produced in the context of a legal or investigative process.

Electronic Discovery Reference Model (EDRM): A model that outlines the different phases of discovery.

Electronic Evidence: Information that is stored in an electronic format. This is used to prove or disprove the facts of a legal matter.

Electronically Stored Information (ESI): Information stored in digital form, e.g. on computers and storage devices.

Email: An electronic communication sent or received via a data application designed for that purpose, e.g. MS Outlook, Google Gmail.

Email Threading: The process of compiling all the emails in a dataset and organizing them into conversations. Email threading can dramatically increase the speed of email review by allowing one reviewer to review the entire conversation and to read the final inclusive email rather than separate pieces of the conversation.
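One way to picture threading is to follow each reply back to the message that started the conversation. The sketch below assumes every message carries an ID and an optional reply-to ID, modeled on the Message-ID and In-Reply-To email headers. The field names, sample messages, and `build_threads` helper are hypothetical, and the sketch assumes the reply chains contain no loops.

```python
# Hypothetical messages; the field names are assumptions modeled on the
# Message-ID and In-Reply-To email headers.
messages = [
    {"id": "<1>", "in_reply_to": None, "subject": "Budget"},
    {"id": "<2>", "in_reply_to": "<1>", "subject": "RE: Budget"},
    {"id": "<3>", "in_reply_to": "<2>", "subject": "RE: Budget"},
    {"id": "<4>", "in_reply_to": None, "subject": "Lunch"},
]


def build_threads(msgs):
    """Group message IDs by the root message of their conversation."""
    parent = {m["id"]: m["in_reply_to"] for m in msgs}

    def root_of(msg_id):
        # Walk up the reply chain until a message has no known parent.
        while parent.get(msg_id) in parent:
            msg_id = parent[msg_id]
        return msg_id

    threads = {}
    for m in msgs:
        threads.setdefault(root_of(m["id"]), []).append(m["id"])
    return threads


threads = build_threads(messages)
print(threads)
```

Here the three budget messages land in one conversation and the lunch message forms its own, so a single reviewer could be assigned each thread.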

Endpoint: Computer hardware device on an IP network (laptops, desktops, smart phones, tablets, etc.).

F

Filtering: The process of applying specific parameters, e.g. date ranges and keywords, to remove groups of documents that do not fit those parameters, in order to reduce the volume of the data set.

Forensics: The handling of electronically stored information, including collection, examination, and analysis, in a manner that ensures its authenticity, so as to provide for its admission as evidence in a court of law.

Federal Rules of Civil Procedure (FRCP): The rules that govern eDiscovery and other aspects of the civil legal process in United States District Courts.

G

General Counsel (GC): Head corporate lawyer at a company. Sometimes called Chief Legal Officer. An executive level position on par with a vice president or a C-level officer.

General Data Protection Regulation (GDPR): The GDPR provides several broad protections for the personal data of individuals in the European Union, including the right to access one’s data and the right to have one’s data erased. It requires that companies justify their possession of personal data and carefully control what they do with it.

H

Harvesting: Also referred to as the collection of electronically stored information. Harvesting is the method of gathering electronic data for future use in your investigation or lawsuit, preferably while maintaining file and system metadata. 

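Hash: The value generated when an algorithm is applied to the contents of a document. Identical documents produce the same value and, for practical purposes, different documents produce different values. It is referred to as a digital fingerprint and is used to authenticate documents and to identify duplicate documents.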

Hold: Keeping items possibly pertinent to a matter in a safe and secure condition to be collected and used in discovery.

Hosting: A service provided by a third-party litigation support firm that provides access to documents relating to a particular matter within a review software platform. The platform can be accessed via the internet by logging in with a username and password.

I

Identification: The process of learning the location of all data which a law firm or client may have a duty to preserve and potentially disclose in a pending or prospective legal proceeding. This is typically done during the interview phase of a legal hold.

Image (Drive): To make an identical copy of a drive including its empty space; “mirror image.”

Image (File): To make a picture copy of a document. The most common image formats in eDiscovery are TIFF and PDF.

In-house: Corporate legal teams (contrasted against external law firms).

Information Governance (IG): Policies that affect the creation, management, and disposition of electronic and paper records.

Internet of Things (IoT): The interconnection via the Internet of computing devices embedded in everyday objects, e.g., Amazon Alexa or Apple Watch, enabling them to send and receive data.

L

Legacy Data: Data whose format has become obsolete.

Legal Hold: A communication requesting the preservation of information that is potentially relevant to a current or reasonably anticipated legal matter and the resulting preservation.

Legal Tech: The use of technology and software to provide legal services.

Load File: A file used to import data into an eDiscovery system. It defines document parameters for imaged documents and often contains metadata for all electronically stored information it relates to.

M

Machine Learning: Artificial intelligence (AI) allowing systems to learn and improve from experience without being directly programmed to do so.

Media: Devices used to store electronic information, e.g. hard drives, backup tapes, and DVDs.

Metadata: Often referred to as data about data, it is the information that describes the characteristics of electronically stored information, e.g. sender, recipient, author, date.

N

Native Format: A file that is maintained in the format in which it was created. This format preserves metadata and details about the data that might be lost when the documents are converted to image format, e.g. pivot tables in spreadsheets.

Near-duplicate: Two or more files that contain a specified percentage of similarity. Also, the process used to identify those nearly identical files.
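To make "a specified percentage of similarity" concrete, the sketch below scores two texts with Jaccard similarity on their word sets and compares the score to a threshold. Jaccard similarity and the 0.7 threshold are assumptions chosen for illustration; review tools may use other measures.

```python
def jaccard(a: str, b: str) -> float:
    """Share of distinct words that two texts have in common (0.0 to 1.0)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # Two empty texts are treated as identical.
    return len(set_a & set_b) / len(set_a | set_b)


def is_near_duplicate(a: str, b: str, threshold: float = 0.7) -> bool:
    """True when the similarity meets the (assumed) threshold."""
    return jaccard(a, b) >= threshold


# Hypothetical texts: the second differs from the first by one word.
original = "please review the attached contract before friday"
revised = "please review the attached contract before monday"
unrelated = "lunch is at noon"

print(jaccard(original, revised))
print(is_near_duplicate(original, revised))
print(is_near_duplicate(original, unrelated))
```

The first two texts share six of eight distinct words, so they clear the threshold, while the unrelated text shares none.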

NIST List: The National Software Reference Library published by the National Institute of Standards and Technology (NIST) of the U.S. Department of Commerce. These are common software files, such as operating system commands, libraries, and application executables and data files, designated non-discoverable or irrelevant to discovery because they contain no data that can be deemed as evidence to an action.

Normalization: Reformatting data so that it is stored in a standardized format.

O

Optical Character Recognition (OCR): The process of converting images of printed pages into electronic text.

P

Parent Document: A document to which other documents/files are attached.

Personal Storage Table (PST): A file format used to store copies of messages, calendar events, and other items within Microsoft software (like Microsoft Outlook, Microsoft Exchange Client, and Windows Messaging).

Portable Document Format (PDF): A file format that displays documents, including text and images, in an electronic form, independent of the software, hardware, or operating system the viewer is using.

Predictive Coding: A document categorization process that combines machine-learning technology with workflow methods such as keyword search, filtering, and sampling to automate portions of an eDiscovery document review, with the aim of reducing the number of non-responsive and irrelevant documents that reviewers must read.

Precision: In search results analysis, precision is the percentage of documents in the results set that are actually relevant to the query.

Processing: The eDiscovery workflow which ingests data, extracts text and metadata, and normalizes the data. Some systems include data indexing and de-duplication in their processing workflow.

Production: To make items which have been collected ready to deliver to a party, usually after they have been redacted as part of the discovery of the defendant or claimant. The production set consists of items that are responsive to the opposition’s request for documents, but not privileged.

Proportionality: Belief that the costs of a legal case should be related to its importance and value.

Q

Query: A formally phrased question.

R

Recall: In search results analysis, recall is the percentage of all relevant documents in the corpus that are returned in the results set.
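Recall is usually read alongside precision (see Precision), and a small worked example can help keep the two apart. The document IDs below are hypothetical, and `precision_recall` is an illustrative helper rather than part of any review tool.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a results set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant
    # Precision: share of returned documents that are relevant.
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of all relevant documents that were returned.
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents returned, 5 relevant in the corpus,
# and 2 documents (D1, D2) in both sets.
retrieved = ["D1", "D2", "D3", "D4"]
relevant = ["D1", "D2", "D5", "D6", "D7"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Two of the four returned documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4).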

Redact: To intentionally conceal, usually via an overlay, portions of a document considered privileged, proprietary, or confidential.

Request for Production of Documents: During discovery, a party may request that another party produce items pertaining to the matter. This is accomplished by providing the party with a formal request.

Review: During discovery, the producing parties are responsible for reviewing every item identified as potentially relevant to the matter and identifying those that are responsive to the request for production.

S

Search: The process of looking within a data set using specific criteria (a query). There are several types of search ranging from simple keyword to concept searches that identify documents related to the query, even when the query term is not present in the document.

Slack Space: The unused portion of a disk that exists when the data does not completely fill the space allotted for it. This space can be examined for otherwise unavailable data.
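The arithmetic behind slack space can be sketched as follows, assuming storage is allotted in fixed-size clusters. The 4,096-byte cluster size and the `slack_bytes` helper are assumptions for illustration, since actual cluster sizes depend on the file system.

```python
def slack_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes left unused in the last cluster allotted to a file."""
    # Round the number of clusters up using integer arithmetic.
    clusters = -(-file_size // cluster_size)
    allocated = clusters * cluster_size
    return allocated - file_size


# A 5,000-byte file needs two 4,096-byte clusters (8,192 bytes allotted).
print(slack_bytes(5000))
# A file that exactly fills its cluster leaves no slack.
print(slack_bytes(4096))
```

In the first case 3,192 bytes of the second cluster are unused, and that leftover region is where remnants of earlier data may remain.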

Social Discovery: Defined as the discovery of electronically stored information on the various social media sites used today, including but not limited to Facebook, Twitter, YouTube, LinkedIn, and Instagram. 

Spoliation: The destruction or alteration of evidence, or the failure to preserve the evidence properly.

Structured Data: Data stored in a structured format such as a database. Structured data can create challenges in eDiscovery.

System Files: Electronic files that are part of the operating system or other control program. These files are created by the computer, not the user of the computer. Common system files on a Windows computer include msdos.sys, io.sys, ntdetect.com, and ntldr.

T

Tagging: The process of assigning classifications, such as by relevance or privilege, to one or more documents.

Technology Assisted Review (TAR): Also known as computer-assisted review or predictive coding, this process uses software to sort through data for discovery purposes.

Tagged Image File Format (TIFF): This file format allows storage of multiple bitmap images and can store them without introducing compression artifacts, making it ideal for archiving intermediate files. TIFF is the most common file format for scanned hard copy documents.

U

Unallocated Space: Most often, this is space created on a hard drive when a file is marked for deletion. This space is no longer allocated to a specific file. Until it is overwritten, it still contains the previous data and can often be retrieved.

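Unicode: The code standard that provides for uniform representation of character sets for all languages. It is sometimes loosely referred to as double-byte language, although Unicode encodings such as UTF-8 use a variable number of bytes per character.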

Unitization: The process of splitting image files received in multiple page formats down into individual “documents.”

Unstructured Data: Information that does not exist in the usual row-column database. These text and multimedia data files, such as webpages, audio files, or videos, lack the ability to be organized effectively within a database, hence the name “unstructured.”

W

Work Product: All the writing that a lawyer creates on behalf of a party to a matter. Work product is privileged. 
