Collabware Blog

5-Step Discovery Strategy for Any Complex Search Scenario

Written by Jayson Kennedy | Oct 21, 2021 3:00:00 PM


Basic search is a common project that many information professionals are tasked with. These projects can range from extremely simple requests that can be completed in a few minutes to massive endeavors requiring multiple days and people.

Regardless of the size of the search, one of the most important factors for the success of the project is the strategy used to execute the search. This article will suggest a simple, 5-step strategy for executing a search project to ensure it is completed efficiently.

The basic search strategy can be broken down into the following steps:
  1. Identify the search topic
  2. Identify the desired outcome format
  3. Outline a search plan
  4. Execute the search plan
  5. Review and deliver the results

The following example scenario will be performed on Collabspace, a high-performance data lake that allows users to securely store and search across their go-to content repositories for one set of unified results. Read more on how this enterprise-wide search application reveals the content you need (including dark data) from one simple search query.

Okay, we’ve got the strategy and the tool. Now let’s go through and describe each step, using a simple example to be clear.

Step 1: Identify Search Topic

First, the Search Topic. Manager of Product Operations, Sam, has requested all information about a clinical trial the Research Team is working on. He has not specified a source of the information, such as email or network folder, so it is implied that all potential sources are valid. Our Search Topic would then be “clinical trial”.

Step 2: Identify Desired Outcome Format

Next, the Outcome Format. Typically, the user making the request does not specify the outcome format, so the decision is left to the information professional.

Using the example of “clinical trial”, we are going to assume that the desired Outcome Format is the full text of any content related to the Search Topic. This means the documents themselves, not the metadata about the documents or a list of the documents. If required, this should be clarified with the requestor to avoid confusion.

Step 3: Outline a Search Plan

The Search Plan is usually the most complex step, because it involves many smaller steps and decisions based on the type of content being searched for. It involves coming up with the proper search queries to execute. Developing a Search Plan can involve trial and error, especially when you are unfamiliar with the dataset.

At first, the Search Plan should be a pen and paper exercise; no need to get into a Search if you don’t know what you are going to search for. There are many things that can be identified immediately before any Searches are done, and these can then be folded into the Search to make the results more precise.

Here are some common things that can be identified:

  1. Identify the time-period that the search should be done for
  2. Identify the relevant people by name and email
  3. Identify the subject of the search, whether it be one word or “a phrase”
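
If it helps to keep the plan somewhere other than paper, the three items above can be captured as a small record. This is only a sketch: the SearchPlan class and its field names are made up for illustration and are not part of Collabspace.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SearchPlan:
    """A pen-and-paper search plan captured as a record (hypothetical structure)."""
    subject: str   # one word or an exact phrase
    start: date    # first day of the period of interest
    end: date      # last day of the period of interest
    people: dict[str, str] = field(default_factory=dict)  # name -> email

    def summary(self) -> str:
        who = ", ".join(sorted(self.people)) or "anyone"
        return f'"{self.subject}" from {self.start} to {self.end}, involving {who}'


# The people are left empty here because they have not been identified yet.
plan = SearchPlan(
    subject="clinical trial",
    start=date(2020, 1, 1),
    end=date(2020, 12, 31),
)
print(plan.summary())
```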

Once these have been identified, the next step is to execute an initial “exploratory test”. Normally, the exploratory test only uses the “search subject” and is intended to be as broad as possible.

Rather than using the Basic Search, which can be overly broad, it can be helpful to use Advanced Search and limit your subject to only the fields that would be relevant. File Name and Content Summary are going to be the best ones to use, because this selection filters out user fields, location fields, and additional metadata. File Name will capture any file name, including the subject line of an email, and Content Summary will contain all extracted text from a file, whether that be a Word document, an email, or a PDF. A screenshot of these two selected fields is below:


This won’t be your official search, as it’s only meant to gather additional information that can then be used in the main search. Here are some things to watch out for:

  • Are there users that appear repeatedly, either as a creator or contributor?
  • Is there a particular document location that all the content is in?
  • Are there additional phrases or keywords that appear consistently near the subject?
  • Are there different kinds of content that have different field names for the same piece of information?

All this information can be used to add to the search. For example, if several different users are appearing, then it might be best to do individual searches against each of these users. If email results are appearing, then a search dedicated to the communication between only these users can really narrow the scope. Likewise with additional phrases or keywords. A secondary exploratory search can be executed using these instead of the original subject.
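
If the exploratory results are copied out into a list, spotting the repeat users and locations is a simple counting exercise. The sketch below assumes hypothetical rows and field names ("creator", "location"); they are not Collabspace's actual column names, and "asmith" is an invented user.

```python
from collections import Counter

# Hypothetical rows copied out of an exploratory search.
results = [
    {"creator": "jkennedy", "location": "/research/trials"},
    {"creator": "gjames", "location": "/research/trials"},
    {"creator": "jkennedy", "location": "/research/trials"},
    {"creator": "asmith", "location": "/marketing"},
]


def recurring(rows, field, min_count=2):
    """Return (value, count) pairs for values of `field` seen at least `min_count` times."""
    counts = Counter(row[field] for row in rows)
    return [(value, n) for value, n in counts.most_common() if n >= min_count]


print(recurring(results, "creator"))
print(recurring(results, "location"))
```

Anything that shows up in these lists is a candidate for its own, narrower search.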

Tip for too many search results: another important thing to consider is the number of search results being returned. If the criteria are returning thousands of results, then it is time to consider breaking the search into smaller chunks that can be more easily understood. Shortening the time-period is a good technique, as is limiting the search to a specific user.
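
As a rough illustration of shortening the time-period, a long date range can be split into calendar months, with one search run per month. The month_chunks helper below is a hypothetical name and a sketch of the idea, not a Collabspace feature.

```python
from datetime import date, timedelta


def month_chunks(start: date, end: date):
    """Split the inclusive range [start, end] into calendar-month sub-ranges."""
    chunks = []
    current = start
    while current <= end:
        # Work out the first day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # The chunk ends on the last day of this month, or at `end` if sooner.
        chunk_end = min(next_month - timedelta(days=1), end)
        chunks.append((current, chunk_end))
        current = next_month
    return chunks


chunks = month_chunks(date(2020, 1, 1), date(2020, 12, 31))
print(len(chunks))
print(chunks[1])
```

Each pair can then be used as the Start and End date of one smaller search.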

In Collabspace, a Search can be limited to a specific date range using Advanced Search. First, it is important to know what kind of content is being searched for: for a document, Last Modified Date is the best field to use; for an email, Received Date is better. Next, make sure both the Start and End dates are represented in the Search Conditions.

For example, the query shown in the screenshot above will look for anything that was modified in the year 2020, which we know is the time period for the clinical trial.
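
The same start-and-end logic can be sketched outside the tool. The example below assumes hypothetical items and field names ("last_modified", "received"); it simply picks the date field by content type and checks that the date falls inside the period, with both boundaries included.

```python
from datetime import date

# Assumed mapping from content type to the date field worth filtering on.
# The keys and field names are illustrative, not Collabspace identifiers.
DATE_FIELD = {"document": "last_modified", "email": "received"}


def in_period(item, start, end):
    """True when the item's relevant date falls within [start, end], inclusive."""
    return start <= item[DATE_FIELD[item["type"]]] <= end


items = [
    {"type": "document", "name": "protocol.docx", "last_modified": date(2020, 3, 14)},
    {"type": "email", "name": "RE: trial update", "received": date(2021, 1, 5)},
    {"type": "email", "name": "Trial kickoff", "received": date(2020, 1, 1)},
]

start, end = date(2020, 1, 1), date(2020, 12, 31)
print([item["name"] for item in items if in_period(item, start, end)])
```

Leaving out either boundary turns the range into an open-ended search, which is why both conditions need to be present.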

Likewise, a Search for emails between two users can be done. Through our searches, we have identified that the team members working on the clinical trial are jkennedy and gjames. We can easily narrow our focus to the team members, as shown in the screenshot below:

Note that there are two sub-groups both using the “Match Any” flag, which means that either one of these conditions can be true for the “group” to be true. The “Match All” at the very top of the query means that both groups must return a pass, essentially limiting this to emails between these two users.
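
To make the grouping logic concrete, here is a small sketch of how nested "Match Any" and "Match All" groups evaluate. The query layout is an assumption: one group for the sender and one for the recipient, with made-up field names, and "asmith" is an invented user. It is not how Collabspace stores or runs its queries.

```python
def matches(node, email):
    """Evaluate a nested condition tree against one email record."""
    if "field" in node:
        # Leaf condition: the named field must equal the given value.
        return email.get(node["field"]) == node["value"]
    results = (matches(child, email) for child in node["conditions"])
    return all(results) if node["match"] == "all" else any(results)


query = {
    "match": "all",  # both groups must pass
    "conditions": [
        {"match": "any", "conditions": [  # sender is either team member
            {"field": "sender", "value": "jkennedy"},
            {"field": "sender", "value": "gjames"},
        ]},
        {"match": "any", "conditions": [  # recipient is either team member
            {"field": "recipient", "value": "jkennedy"},
            {"field": "recipient", "value": "gjames"},
        ]},
    ],
}

print(matches(query, {"sender": "jkennedy", "recipient": "gjames"}))
print(matches(query, {"sender": "jkennedy", "recipient": "asmith"}))
```

An email passes only when both the sender group and the recipient group are satisfied, so messages involving anyone outside the pair drop out.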

When searching for content, it is important to understand what other metadata is available to search by. To view this information, execute an exploratory search, then right-click on a Search Result and open Lifecycle Details, shown below.

Once in Lifecycle Details, the Versions tab will list all the metadata about an item, shown below.

Another way to view the metadata is under the Audit Delta, which will show both the new and the previous values of a metadata field. This can give you an idea of which metadata fields are actively populated by the users. To view the delta, select a specific Audit entry, shown below.

Using the techniques listed above, it’s time to craft what the final Search Query will look like. You will want to fold in all the information that has been identified by the exploratory searches. Also watch out for creating overly complex queries, as they might result in too few results. Don’t be afraid to split the Search out into smaller chunks.

One final note: if we are dealing with content that is very old, do not forget to use the Item Deleted and All Versions flags. These will expand your search and allow you to find information from the past as well. However, use them sparingly. They can add a huge bulk to the Search results, mostly from multiple versions of the same item, so be sure to add the Collabspace ID and Collabspace Version columns, which will tell you which item you are looking at and what version it is.
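
Once results with those two columns have been exported, the extra versions can be collapsed when only the latest copy of each item is needed. This sketch assumes hypothetical exported rows and assumes the version is a number that increases over time.

```python
def latest_versions(rows):
    """Keep only the highest-numbered version of each item, in first-seen order."""
    latest = {}
    for row in rows:
        key = row["Collabspace ID"]
        if key not in latest or row["Collabspace Version"] > latest[key]["Collabspace Version"]:
            latest[key] = row
    return list(latest.values())


# Hypothetical exported rows; the IDs and file names are invented.
rows = [
    {"Collabspace ID": "A1", "Collabspace Version": 1, "File Name": "protocol.docx"},
    {"Collabspace ID": "A1", "Collabspace Version": 3, "File Name": "protocol.docx"},
    {"Collabspace ID": "B7", "Collabspace Version": 2, "File Name": "results.xlsx"},
    {"Collabspace ID": "A1", "Collabspace Version": 2, "File Name": "protocol.docx"},
]

for row in latest_versions(rows):
    print(row["Collabspace ID"], row["Collabspace Version"])
```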

Step 4: Execute the Search Plan

Executing the Search should be done based on the Search Plan. This can involve one single large search or several smaller searches that are then stitched together. The query or queries should fold in everything that has been learned from the exploratory search. This can include narrowing the scope to the identified time period; targeting the users who are directly involved rather than secondary participants; and directly searching against known content locations rather than casting an overly wide net.
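
When several smaller searches are stitched together, the same item often turns up more than once. A minimal sketch of the stitching step, assuming each search was exported as a list of rows carrying the Collabspace ID and Collabspace Version columns (the rows themselves are invented):

```python
def stitch(*result_sets):
    """Merge several result lists, dropping rows already seen (same ID and version)."""
    seen = set()
    merged = []
    for results in result_sets:
        for row in results:
            key = (row["Collabspace ID"], row["Collabspace Version"])
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


# Hypothetical results from a per-user search and a per-location search.
by_user = [
    {"Collabspace ID": "A1", "Collabspace Version": 3},
    {"Collabspace ID": "B7", "Collabspace Version": 2},
]
by_location = [
    {"Collabspace ID": "B7", "Collabspace Version": 2},
    {"Collabspace ID": "C4", "Collabspace Version": 1},
]

print([row["Collabspace ID"] for row in stitch(by_user, by_location)])
```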

Step 5: Review and Deliver Results

Finally, all results should be reviewed before they are shared with the requestor. Check them for potential mistakes and make sure that no private information is being shared unintentionally. Once the review is complete, if you are using Collabspace, you can select a single result, multiple results, or all results to export and share with the requestor. Alternatively, the content can be delivered as a Saved Search, a list of matching results, or the results themselves, depending on what the requestor needs.

Conclusion

In summary, requested searches should be treated like a project. The phases of the project include planning, executing, reviewing, and delivering. The planning phase should involve identifying what kind of content needs to be returned and what search techniques should be used.

Don’t be hesitant to perform some preliminary searches to gather more information. Execution of the Search should be broken into chunks to ensure precision and thoroughness. Once the searches have been completed, it is best to Save the Search results, or even Export the Metadata. In the Review phase, it can be important to get the data out into a spreadsheet to make it more malleable.

Files can be marked with different tags based on a coding scheme, which can make it much easier to distribute the content based on how valuable it is and who should see it. Finally, with Collabspace, the content can be delivered as a Saved Search, a list of matching results, or the results themselves, depending on what the requestor needs.
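
For the spreadsheet part of the review, a coding scheme can be as simple as a lookup from tag to audience. The tags, file names, and column names below are invented for illustration; a real scheme would come from your own review process.

```python
import csv
import io

# Hypothetical coding scheme: tag -> who should see the content.
CODING_SCHEME = {
    "key": "requestor",
    "supporting": "requestor",
    "private": "withhold",
}


def export_review_sheet(rows):
    """Write reviewed rows to CSV text, adding an audience column based on the tag."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "tag", "audience"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "audience": CODING_SCHEME[row["tag"]]})
    return buffer.getvalue()


sheet = export_review_sheet([
    {"name": "trial-protocol.docx", "tag": "key"},
    {"name": "patient-list.xlsx", "tag": "private"},
])
print(sheet)
```

The resulting text opens in any spreadsheet tool, where it can be sorted or filtered by tag before anything is delivered.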

Interested in learning more about content search? We’ve got articles breaking down e-Discovery and how to accelerate your content discovery with shareable search templates. You can also contact us with your questions, or read about how you can reveal your dark data with Collabspace DISCOVERY. As a bonus, we’re including our free webinar recording about Supercharging your eDiscovery below: