Detect and/or Redact Personally Identifiable Information v1.0.0 Help

Inspects text for personally identifiable information (PII) entities and returns details about them. The Step can also redact identified PII entities with the masks you provide. Detection is based on named entity recognition (NER).

How can I use the Step?

The Step lets you find and redact PII entities in text. This way, you can automate PII data collection or implement specific policies for handling sensitive personal data. The Step supports English only.

How does the Step work?

A PII entity is a text reference to information that identifies a person, such as an address, bank account number, or driver's license.

For example, in the text "Dear John Doe! The credit balance on your card number 0000-1111-0000-1111 has been updated," the Step recognizes John Doe as a name and 0000-1111-0000-1111 as a creditDebitNumber.

In addition, the Step assigns a confidence score to each PII entity found in a text. This score indicates confidence that the Step correctly identified the PII entity type. To learn more, see the Output example.

You can also mask detected PII entities using one of the Redaction options.

Input settings

To set up the section, do the following:

  1. For Operations, select at least one of the following options: Detect, Redact.

  2. For Input text, enter the text to analyze.

  3. For PII entity types, select the entity types you want to detect or redact in the text.

Input text

The input text must be a UTF-8 string. The string must contain at least 1 character. The maximum string size is 100 KB. English is the only valid language.
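
The limits above can be checked before the text reaches the Step. The following is a minimal sketch, not part of the Step itself: the function name `validate_input_text` is hypothetical, and it assumes that "100 KB" means 100 * 1024 bytes measured on the UTF-8 encoding (the page does not say which definition applies).

```python
# Assumption: "100 KB" means 100 * 1024 bytes of UTF-8 encoded text.
MAX_INPUT_BYTES = 100 * 1024


def validate_input_text(text: str) -> bytes:
    """Return the UTF-8 encoding of text, or raise ValueError if a limit is broken."""
    if not isinstance(text, str):
        raise ValueError("input text must be a string")
    if len(text) < 1:
        raise ValueError("input text must contain at least 1 character")
    try:
        encoded = text.encode("utf-8")
    except UnicodeEncodeError as exc:
        # Lone surrogates, for example, cannot be encoded as UTF-8.
        raise ValueError("input text is not valid UTF-8") from exc
    if len(encoded) > MAX_INPUT_BYTES:
        raise ValueError(
            f"input text is {len(encoded)} bytes; the limit is {MAX_INPUT_BYTES}"
        )
    return encoded
```

The size is measured in bytes rather than characters, so text with many non-ASCII characters reaches the limit sooner than its character count suggests.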

PII entity types

The Step uses a set of 22 PII entity types, which you can find in the following table:

PII entity type | Description
address | A physical address, such as "100 Main Street, Anytown, USA" or "Suite #12, Building 123". An address can include a street, building, location, city, state, country, county, zip, precinct, neighborhood, and
age | An individual's age, including the quantity and unit of time. For example, in the phrase "I am 40 years old," the Step recognizes "40 years" as an age.
awsAccessKey | A unique identifier that's associated with a secret access key; the access key ID and secret access key are used together to sign programmatic AWS requests cryptographically.
awsSecretKey | A unique identifier that's associated with an access key; the access key ID and secret access key are used together to sign programmatic AWS requests cryptographically.
bankAccountNumber | A US bank account number. These are typically 10 to 12 digits long, but the Step also recognizes bank account numbers when only the last 4 digits are present.
bankRouting | A US bank account routing number. These are typically 9 digits long, but the Step also recognizes routing numbers when only the last 4 digits are present.
creditDebitCvv | A 3-digit card verification code (CVV) that is present on VISA, MasterCard, and Discover credit and debit cards. On American Express credit or debit cards, it is a 4-digit numeric code.
creditDebitExpiry | The expiration date for a credit or debit card. This number is usually 4 digits long and formatted as month/year or MM/YY. For example, the Step can recognize expiration dates such as 01/21, 01/2021, and Jan 2021.
creditDebitNumber | The number for a credit or debit card. These numbers can vary from 13 to 16 digits in length, but the Step also recognizes credit or debit card numbers when only the last 4 digits are present.
dateTime | A date can include a year, month, day, day of week, or time of day. For example, the Step recognizes "January 19, 2020" or "11 am" as dates. The Step also identifies partial dates, date ranges, and date intervals, as well as decades, such as "the 1990s".
driverId | The number assigned to a driver's license, which is an official document permitting an individual to operate one or more motorized vehicles on a public road. A driver's license number consists of alphanumeric characters.
email | An email address, such as marymajor@email.com.
ipAddress | An IPv4 address, such as 198.51.100.0.
macAddress | A media access control (MAC) address is a unique identifier assigned to a network interface controller (NIC).
name | An individual's name. This entity type does not include titles, such as Mr., Mrs., Miss, or Dr. The Step does not apply this entity type to names that are part of organizations or addresses. For example, the Step recognizes the "John Doe Organization" as an organization, and it recognizes "Jane Doe Street" as an address.
passportNumber | A US passport number. Passport numbers range from 6 to 9 alphanumeric characters.
password | An alphanumeric string that is used as a password, such as "Very20special#pass".
phone | A phone number. This entity type also includes fax and pager numbers.
pin | A 4-digit personal identification number (PIN) that allows someone to access their bank account information.
ssn | A Social Security Number (SSN) is a 9-digit number that is issued to US citizens, permanent residents, and temporary working residents. The Step also recognizes Social Security Numbers when only the last 4 digits are present.
url | A web address, such as www.example.com.
username | A user name that identifies an account, such as a login name, screen name, nickname, or handle.

Datetime settings

The Datetime feature converts dates found in the text from one timezone to another and returns the converted date in a selected date and time format.

To set up the section, follow these steps:

  1. For Timezone, select input and output timezones for date conversion.
  2. For Output format, select the date and time format and specify options that suit your application.
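
To illustrate what a timezone conversion with a chosen output format involves, here is a rough sketch in Python. It is not the Step's implementation: the function `convert_datetime` and its parameters are hypothetical, the timezones are expressed as fixed UTC offsets for simplicity, and the formats use Python's `strftime` codes, which may differ from the options the Step offers.

```python
from datetime import datetime, timedelta, timezone


def convert_datetime(
    value: str,
    input_format: str,
    input_offset_hours: float,
    output_offset_hours: float,
    output_format: str,
) -> str:
    """Parse value in the input timezone and render it in the output timezone."""
    # Assumption: timezones are given as fixed UTC offsets in hours.
    source_tz = timezone(timedelta(hours=input_offset_hours))
    target_tz = timezone(timedelta(hours=output_offset_hours))
    # Attach the source timezone to the parsed (naive) datetime.
    parsed = datetime.strptime(value, input_format).replace(tzinfo=source_tz)
    # Shift to the target timezone and format the result.
    return parsed.astimezone(target_tz).strftime(output_format)


# 11:00 at UTC-5 corresponds to 16:00 at UTC.
print(convert_datetime("2020-01-19 11:00", "%Y-%m-%d %H:%M", -5, 0, "%Y-%m-%d %H:%M"))
```

Fixed offsets ignore daylight saving time; a named-timezone database would be needed to handle it.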

Redaction options

The Redact operation lets you mask PII entities using one of the following two options:

  • Entity type mask (default)
  • Custom mask

Entity type mask

Entity type mask replaces each PII entity with a label for its predefined PII type.

For example, using the text, "Dear John Doe! The credit balance on your card number 0000-1111-0000-1111 has been updated," with a Redact operation and Entity type mask, the Step returns the following text:

"Dear [NAME]! The credit balance on your card number [CREDIT DEBIT NUMBER] has been updated,"

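The replacement shown in the Entity type mask example can be reproduced from the entity offsets the Step reports. The sketch below is an illustration only, not the Step's own code: the helper names are hypothetical, and the label format (camelCase type converted to upper-case words in square brackets) is inferred solely from the `[NAME]` and `[CREDIT DEBIT NUMBER]` labels in the example.

```python
import re


def type_label(entity_type: str) -> str:
    """Turn a camelCase type such as 'creditDebitNumber' into '[CREDIT DEBIT NUMBER]'."""
    # Insert a space at each lower-case/digit to upper-case boundary.
    words = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", entity_type)
    return f"[{words.upper()}]"


def redact_with_type_mask(text: str, entities: list[dict]) -> str:
    """Replace each entity span with its type label."""
    # Work from the last entity to the first so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["beginOffset"], reverse=True):
        label = type_label(entity["type"])
        text = text[: entity["beginOffset"]] + label + text[entity["endOffset"] :]
    return text


text = "Dear John Doe! The credit balance on your card number 0000-1111-0000-1111 has been updated"
entities = [
    {"type": "name", "beginOffset": 5, "endOffset": 13},
    {"type": "creditDebitNumber", "beginOffset": 54, "endOffset": 73},
]
print(redact_with_type_mask(text, entities))
```

Replacing from the end of the text backwards matters because each label has a different length than the span it replaces.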
Custom mask

Custom mask works similarly to Entity type mask but redacts PII entities with characters you provide instead of predefined PII type labels.
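
As a rough illustration of a custom mask, the sketch below overwrites every character of each entity with a single mask character. This is an assumption: the page does not say whether the Step replaces each character or the entity as a whole, and the function name `redact_with_custom_mask` is hypothetical.

```python
def redact_with_custom_mask(text: str, entities: list[dict], mask_char: str = "*") -> str:
    """Overwrite each character inside an entity span with mask_char."""
    # Assumption: one mask character per original character.
    if len(mask_char) != 1:
        raise ValueError("this sketch expects a single mask character")
    chars = list(text)
    for entity in entities:
        for i in range(entity["beginOffset"], entity["endOffset"]):
            chars[i] = mask_char
    return "".join(chars)


print(redact_with_custom_mask(
    "Dear John Doe!",
    [{"type": "name", "beginOffset": 5, "endOffset": 13}],
    mask_char="#",
))
```

Because the masked text keeps its original length, the entity offsets remain valid after redaction in this variant.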

Output and exit behavior

To set up this section, take the following steps:

  1. For Output data options, select the appropriate options to configure the output structure. This setting is available only for the Detect operation.
  2. In Output data structure, ensure that the output structure suits your application.

Merge field settings

The Step returns the result as a JSON object and stores it in the Merge field variable, so you can access the output JSON object from any point in your Flow. To learn more about this Step's output, see the Output example.

Skip logic exit

Use this setting to handle cases where a Merge field variable with the same name already exists in your Flow and the previously defined variable holds a value.

By default, in such cases, the Step overwrites the existing variable with the new value. Another option is to skip the Step execution and direct the Flow down the selected exit. To do so, follow these steps:

  1. Enable the Skip step execution if existing merge field has data toggle.
  2. In the Skip logic exit list, select the exit to direct the Flow down.

Output example

The Step's output contains information about each detected PII entity, including its type, confidence score, start and end points in the text, and redacted text (if applicable).

For example, using the Detect and Redact operations with default settings and the input text "Dear John Doe! The credit balance on your card number 0000-1111-0000-1111 has been updated," the Step returns the following JSON object:

{
  "count": 2,
  "byOrder": [
    {
      "score": 0.9999272227287292,
      "type": "name",
      "beginOffset": 5,
      "endOffset": 13,
      "text": "John Doe"
    },
    {
      "score": 0.9999970197677612,
      "type": "creditDebitNumber",
      "beginOffset": 54,
      "endOffset": 73,
      "text": "0000-1111-0000-1111"
    }
  ],
  "redacted": "Dear [NAME]! The credit balance on your card number [CREDIT DEBIT NUMBER] has been updated"
}
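
A consumer of this output might filter entities by confidence score or check that the offsets line up with the input text. The sketch below does both on the sample output above. The helper names are hypothetical, and it assumes that offsets are character positions with an exclusive end, which is what the sample values suggest.

```python
import json


def confident_entities(output: dict, min_score: float) -> list[dict]:
    """Keep only the entities whose score is at least min_score."""
    return [e for e in output["byOrder"] if e["score"] >= min_score]


def offsets_match(source_text: str, entity: dict) -> bool:
    """Check that the reported span of the input equals the entity text."""
    # Assumption: beginOffset is inclusive and endOffset is exclusive.
    span = source_text[entity["beginOffset"] : entity["endOffset"]]
    return span == entity["text"]


source_text = (
    "Dear John Doe! The credit balance on your card number "
    "0000-1111-0000-1111 has been updated,"
)
step_output = json.loads("""
{
  "count": 2,
  "byOrder": [
    {"score": 0.9999272227287292, "type": "name",
     "beginOffset": 5, "endOffset": 13, "text": "John Doe"},
    {"score": 0.9999970197677612, "type": "creditDebitNumber",
     "beginOffset": 54, "endOffset": 73, "text": "0000-1111-0000-1111"}
  ]
}
""")

for entity in step_output["byOrder"]:
    print(entity["type"], offsets_match(source_text, entity))
```

In a Flow, the same object would come from the Merge field variable rather than from a JSON literal.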

Error Handling

By default, the Step handles errors using a separate exit. So if any error occurs during the Step execution, the Flow proceeds down the error exit.

Note: If you disable the Handle error toggle, the Step does not handle errors. With this setup, if any error occurs during the Step execution, the Flow fails immediately after exceeding the Flow's timeout. To prevent the Flow from being suspended while continuing to handle errors in the Flow, place the Flow Error Handling Step before the main Flow logic.

Reporting

The Step reports once after its execution. You can change the Step's log level and add new tags in this section.

Log level

By default, the Step inherits its log level from the Flow's log level. You can change the Step's log level by selecting an appropriate option from the Log level list.

Tags

Tags help organize and filter session information when generating reports. You can specify the tag category, label, and value when adding a new tag.

Service dependencies

  • flow builder - v2.28.3
  • event-manager - v2.3.0
  • deployer - v2.6.0
  • comprehend provider - v0.9.0

Release notes

v1.0.0

  • Initial release