How to Extract Email Addresses from Text File in Linux

Text files contain a continuous stream of characters with no predefined format. Some file formats are built on top of plain text (e.g. JSON, YAML) and expect the data to follow a particular structure, but ordinary '.txt' files follow no such convention. Hence, retrieving a specific line, phrase, or string from a text file has to be done with generic Linux tools.

The grep command in Linux searches for a substring or a text pattern in a file or in standard input, and prints every line in which a match is found.

The syntax for using the grep command is as follows:

$ grep <substring> <filename/standard input>

For example, to search for the substring "Name" in the file 'test.txt' (contents of which are shown in the screenshot), run the following.

$ grep "Name" test.txt
Find a String in File
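
Because the contents of test.txt appear only in the screenshot, the following is a minimal sketch of the same search on a stand-in file. The file name sample.txt and its contents are made up for illustration.

```shell
# Create a small sample file (hypothetical contents, for illustration only).
printf '%s\n' 'Name: Alice' 'City: Pune' 'Name: Bob' > sample.txt

# Print every line that contains the substring "Name".
matches=$(grep "Name" sample.txt)
echo "$matches"
```

Only the two lines containing "Name" are printed; the "City" line is skipped.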

Today, we will see how to extract Email addresses out of text files using the grep command.

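As we know, an Email address has the format:

<user_id>@<domain>.<tld>

Here, user_id is a unique identifier string chosen by the user, while domain and tld (the top-level domain) together identify the Email service provider (e.g. gmail.com).

In this article we assume a simplified form in which the domain and top-level domain contain only letters, whereas user_id can contain letters, digits, and other common characters such as the period (.) and underscore (_). Real addresses can also contain other characters, such as hyphens and digits in the domain or a plus sign in user_id, which this simplified form does not cover.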

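As this is a definite pattern that is to be searched, we can describe it with a regular expression. grep already interprets its pattern as a basic regular expression by default; the '-e' flag explicitly marks the argument that follows as the pattern, which is useful when specifying several patterns or a pattern that begins with a hyphen.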

Thus, the syntax of grep with '-e' is:

$ grep -e <regular_expression> <filename/standard input>
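
One common use of '-e' is to supply more than one pattern in a single command. The sketch below assumes a made-up file named contacts.txt; both the name and the contents are hypothetical.

```shell
# Hypothetical sample data, made up for illustration.
printf '%s\n' 'Name: Alice' 'Phone: 12345' 'Email: alice@example.com' > contacts.txt

# Each -e introduces one pattern; a line is printed if it matches any of them.
either=$(grep -e "Name" -e "Email" contacts.txt)
echo "$either"
```

The "Phone" line matches neither pattern, so it is not printed.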

Based on the pattern of an Email address discussed before, we can form the following regular expression:

[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+

Here, 'a-zA-Z' matches any letter, '0-9' matches any digit, and '._' matches a period or an underscore. The characters '\+' mean that the preceding bracketed character set should appear one or more times. The '\.' between the two domain parts matches a literal period; an unescaped '.' would match any single character.
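
To see how the individual pieces behave, here is a small sketch that counts matching lines with grep's '-c' flag. The input strings are made up, and '\+' is assumed to be available as it is in GNU grep.

```shell
# An unescaped '.' matches any single character, so both lines match.
any_char=$(printf '%s\n' 'a.c' 'abc' | grep -c -e 'a.c')

# An escaped '\.' matches only a literal period, so only the first line matches.
literal=$(printf '%s\n' 'a.c' 'abc' | grep -c -e 'a\.c')

# '\+' (a GNU grep extension in basic regular expressions) requires at least one 'a'.
one_or_more=$(printf '%s\n' 'aab' 'b' | grep -c -e 'a\+b')

echo "$any_char $literal $one_or_more"   # prints: 2 1 1
```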

We will use this regular expression to extract Email addresses from the file 'test2.txt'.

First, view the contents of the file test2.txt:

$ cat test2.txt
View Contents of File

Next, run the following command to extract Email addresses from the file.

$ grep -e "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+" test2.txt
Extract Email Addresses from File
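
The contents of test2.txt are visible only in the screenshot, so the sketch below runs the same command on a stand-in file. The name emails_sample.txt and the addresses in it are hypothetical.

```shell
# Hypothetical stand-in for test2.txt; the real contents are not reproduced here.
printf '%s\n' \
  'Contact Alice at alice_01@example.com for details.' \
  'This line has no address.' \
  'Bob: bob.smith@mail.org' > emails_sample.txt

# Prints the complete lines that contain something matching the pattern.
lines=$(grep -e "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+" emails_sample.txt)
echo "$lines"
```

The first and third lines are printed in full; the second line contains no match and is omitted.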

As we can see, grep identified the Email addresses successfully. However, each address is displayed along with the complete line in which it appears.

To display only the matched Email addresses, use the '-o' flag along with '-e' as shown.

$ grep -oe "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+" test2.txt
Find Email Addresses in File
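
With '-o', each match is printed on its own line, even when several addresses share one input line. A minimal sketch, using a made-up input string:

```shell
# Hypothetical input line containing two addresses.
line='Write to alice_01@example.com or bob.smith@mail.org today.'

# -o prints only the matched text, one match per output line.
found=$(printf '%s\n' "$line" | grep -oe "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+")
echo "$found"
```

Here the surrounding words are dropped and the two addresses appear on separate lines.
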
Conclusion

In this article, we have seen how to extract Email addresses from a text file in Linux using the handy command-line tool grep. The extracted addresses can also be written to a file using redirection.
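
As a rough sketch of the redirection step, the '>' operator sends grep's output to a file instead of the terminal. The file names emails_input.txt and extracted_emails.txt, and the sample addresses, are assumptions made for illustration.

```shell
# Hypothetical input file, made up for illustration.
printf '%s\n' \
  'Mail alice_01@example.com' \
  'No address here' \
  'Mail bob.smith@mail.org' > emails_input.txt

# Write only the matched addresses to a new file (overwrites it if it exists).
grep -oe "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+" emails_input.txt > extracted_emails.txt

# Show what was saved.
cat extracted_emails.txt
```

Using '>>' instead of '>' would append to the output file rather than overwrite it.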

If you have any questions or feedback, let us know in the comments below.

Ravi Saive
I am an Experienced GNU/Linux expert and a full-stack software developer with over a decade in the field of Linux and Open Source technologies. Founder of TecMint.com, LinuxShellTips.com, and Fossmint.com. Over 150+ million people visited my websites.
