grep is a utility program available on any Unix-like operating system.
The most common use of grep is to look for a matching string in a file or,
with the -r option, in whole directories of files:
grep -rn variable_name ./src
The -n option makes it print the line number of each match. Because several files are searched, each match
is also prefixed with the path of its file, here relative to the current working directory.
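To see the output format without needing a real project, here is a small sketch that builds a throwaway directory first. The file names and their contents are invented for the illustration, and it assumes a grep that supports -r (GNU and BSD grep both do).

```shell
# Build a throwaway tree so the example does not depend on a real project.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src/util"
printf 'int counter;\nint variable_name = 1;\n' > "$demo_dir/src/main.c"
printf 'extern int variable_name;\n' > "$demo_dir/src/util/util.h"

# -r descends into ./src, -n prefixes each match with its line number.
# sort only makes the order predictable, directory traversal order is not.
rn_output=$(cd "$demo_dir" && grep -rn variable_name ./src | sort)
printf '%s\n' "$rn_output"

# Clean up the temporary tree.
rm -rf "$demo_dir"
```

Each output line has the form path:line number:matching line.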
Another common use is finding the large directories on a system, which is
very practical when one runs out of storage space.
du -h ~/Downloads | grep '^[0-9.]*G'
1.1G /home/etag/Downloads/snap/firefox/common/.cache/mozilla/firefox/szwshp2r.default/cache2/entries
1.1G /home/etag/Downloads/snap/firefox/common/.cache/mozilla/firefox/szwshp2r.default/cache2
1.1G /home/etag/Downloads/snap/firefox/common/.cache/mozilla/firefox/szwshp2r.default
1.1G /home/etag/Downloads/snap/firefox/common/.cache/mozilla/firefox
1.1G /home/etag/Downloads/snap/firefox/common/.cache/mozilla
1.1G /home/etag/Downloads/snap/firefox/common/.cache
1.9G /home/etag/Downloads/snap/firefox/common
1.9G /home/etag/Downloads/snap/firefox
1.3G /home/etag/Downloads/snap/teams-for-linux
3.8G /home/etag/Downloads/snap
6.7G /home/etag/Downloads/archives
1.4G /home/etag/Downloads/cube_mx
31G /home/etag/Downloads/
It printed all directories whose size du reports in gigabytes, that is 1 gigabyte or larger.
du is a utility that estimates the storage space used by files or directories. Without the -a option it only
lists directories, and the -h option makes it print the sizes in a human-friendly format.
The output of du is then passed to the standard input of grep, which
filters it by applying a simple regular expression: ^[0-9.]*G
A line must begin with any number of characters from the class [0-9.], that is the digits 0-9 and the dot, followed
by a single capital letter G.
The caret ^ means 'beginning of the line'.
The brackets [0-9.] specify a character class. Ranges such as 'a-z', 'A-Z' and '0-9' can be used, which is
of course more practical than typing out [0123456789] each time. Inside the brackets the dot is a literal dot.
The asterisk * means zero or more occurrences, and it always applies to the preceding character or character class.
If the * were omitted in the above example, it would only match lines beginning with a single digit or dot
followed by a G, such as:
1G
9G
.G
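The difference the asterisk makes can be checked on a few sample strings. A minimal sketch, where the size strings are invented stand-ins for the first column of du -h output:

```shell
# Invented sizes, one per line, in the style of the first column of `du -h`.
sizes=$(printf '1G\n31G\n1.9G\n.G\n512M\n4.0K\n')

# With *: any run of digits and dots (including none), then G.
with_star=$(printf '%s\n' "$sizes" | grep '^[0-9.]*G')

# Without *: exactly one digit or dot, then G.
without_star=$(printf '%s\n' "$sizes" | grep '^[0-9.]G')

printf 'with *:\n%s\n\nwithout *:\n%s\n' "$with_star" "$without_star"
```

The first filter keeps 1G, 31G, 1.9G and .G, while the second keeps only 1G and .G.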
Since du can also print file names with the -a option, I often use it to get a list of all file names
with a particular extension. While I agree that the find utility is probably faster and better suited to the job,
my problem with it is that I can never remember its syntax or options :) So I tend to use it only in scripts
and look up the manual page each time.
du -a src | grep "\.h$"
This outputs a list of all C header files present in the src directory.
The dollar sign $ means 'end of the line'.
The dot . means 'any single character'.
Consider the following command:
du -a src | grep .h$
It nearly achieves the desired result, but it also prints every name ending in any character followed by h, such as sh,
so shell scripts as well as C header files.
To match only lines ending in '.h', the dot has to be escaped with a backslash \
so that it represents the actual character.
Note that the quotes are necessary here, because the shell also uses the backslash to escape characters
that have a 'special meaning', and would remove it before grep sees it.
Alternatively, the backslash itself can be escaped so that the shell passes a single backslash on to grep:
du -a src | grep \\.h$
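The effect of escaping the dot can be reproduced on a fixed list of names. This is only a sketch: the file names are made up, and it uses single quotes, which also keep the shell from touching the backslash.

```shell
# Made-up file names standing in for the second column of `du -a src`.
files=$(printf 'src/main.c\nsrc/widget.h\nsrc/build.sh\nsrc/notes.txt\n')

# Unescaped dot: any character followed by h at the end of the line.
unescaped=$(printf '%s\n' "$files" | grep '.h$')

# Escaped dot: a literal ".h" at the end of the line.
escaped=$(printf '%s\n' "$files" | grep '\.h$')

printf 'unescaped:\n%s\n\nescaped:\n%s\n' "$unescaped" "$escaped"
```

The unescaped pattern keeps both src/widget.h and src/build.sh, the escaped one keeps only src/widget.h.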
The dot is a very powerful concept. Let's print the first 10 words in the English dictionary
that begin with 'th' followed by exactly 3 more characters:
grep ^th...$ /usr/dict/en/words.txt | head -n 10
thack
thais
thala
thana
thane
thank
tharf
tharm
thatd
thatn
One can find some unusual words :)
Another common everyday use of the caret is filtering the output of ls -l
to show only directories
ls -l ~/src/lvgl | grep ^d
drwxrwxr-x 15 etag etag 4096 Dec 28 13:56 demos
drwxrwxr-x 9 etag etag 4096 Dec 28 13:56 docs
drwxrwxr-x 8 etag etag 4096 Dec 16 13:17 env_support
drwxrwxr-x 14 etag etag 4096 Dec 16 13:17 examples
drwxrwxr-x 3 etag etag 4096 Dec 16 13:17 libs
drwxrwxr-x 5 etag etag 4096 Dec 28 13:56 scripts
drwxrwxr-x 17 etag etag 4096 Dec 28 13:56 src
drwxrwxr-x 11 etag etag 4096 Dec 28 13:56 tests
drwxrwxr-x 2 etag etag 4096 Dec 28 13:56 xmls
drwxrwxr-x 2 etag etag 4096 Dec 16 13:17 zephyr
To show everything except directories, one can use the -v option, which inverts the match and prints only the lines that do not match the expression:
ls -l ~/src/lvgl | grep -v ^d | head -n 6
total 232
-rw-rw-r-- 1 etag etag 1518 Dec 28 13:56 CMakeLists.txt
-rw-rw-r-- 1 etag etag 2818 Dec 16 13:17 CMakePresets.json
-rw-rw-r-- 1 etag etag 2508 Dec 16 13:17 component.mk
-rw-rw-r-- 1 etag etag 204 Dec 16 13:17 idf_component.yml
-rw-rw-r-- 1 etag etag 57414 Dec 28 13:56 Kconfig
But if the 'total' line is not needed, match the leading - of regular files instead:
ls -l ~/src/lvgl | grep ^- | head -n 6
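The three filters can be compared side by side by counting lines with -c instead of printing them. A small sketch on a temporary directory, with invented entry names:

```shell
# Temporary directory with 2 subdirectories and 3 regular files.
listing_dir=$(mktemp -d)
mkdir "$listing_dir/docs" "$listing_dir/src"
touch "$listing_dir/Kconfig" "$listing_dir/Makefile" "$listing_dir/README"

# LC_ALL=C keeps the "total" line untranslated.
listing=$(LC_ALL=C ls -l "$listing_dir")

# -c prints the number of matching lines instead of the lines themselves.
dir_count=$(printf '%s\n' "$listing" | grep -c '^d')       # directories only
non_dir_count=$(printf '%s\n' "$listing" | grep -vc '^d')  # files plus the "total" line
file_count=$(printf '%s\n' "$listing" | grep -c '^-')      # regular files only

printf 'directories=%s non-directories=%s regular files=%s\n' \
    "$dir_count" "$non_dir_count" "$file_count"

rm -rf "$listing_dir"
```

The count from -v is one higher than the number of regular files, because the 'total' line does not start with d either.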
When working with LVGL, the following expression can be used to locate
the definition of a function:
grep -rn "^[a-z_\ *]*\ lv_obj_set_size.*[^;/]$" src
src/core/lv_obj_pos.c:213:void lv_obj_set_size(lv_obj_t * obj, int32_t w, int32_t h)
Let's break it down. The line must begin with zero or more characters from the character class [a-z_\ *],
which allows lowercase letters, underscores, spaces and the * character.
So it covers storage class specifiers, return types and pointers.
There must be at least one space before the function name, which can be followed by any number of characters,
but the line must not end with ; or /
That last part is the job of [^;/]: a caret as the first character inside the brackets inverts the meaning of the
character class, so it matches any character except the ones listed.
With the caret omitted, as in [;] below, it outputs the declaration and the calls to the function instead:
grep -rn "^[a-z_\ *]*\ lv_obj_create.*[;]$" src
src/core/lv_obj.h:229:lv_obj_t * lv_obj_create(lv_obj_t * parent);
src/widgets/tabview/lv_tabview.c:277: lv_obj_create(obj);
tests/src/test_cases/widgets/test_obj_tree.c:18: lv_obj_create(lv_screen_active());
tests/src/test_cases/widgets/test_obj_tree.c:20: lv_obj_create(lv_screen_active());
To show only declarations, filter the output further by keeping the matches found in header files:
grep -rn "^[a-z_\ *]*\ lv_obj_create.*[;]$" src | grep \\.h
src/core/lv_obj.h:229:lv_obj_t * lv_obj_create(lv_obj_t * parent);
This expression works because the coding convention used by LVGL states that
the opening brace { of a function body always goes on the line below.
The / in [^;/] is needed because the expression could otherwise match a call that is followed by a comment:
something(); /* some comment */
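The behaviour of the negated class can be checked on a handful of lines without a copy of LVGL at hand. In this sketch the sample lines are invented, apart from the definition taken from the output above, and the pattern is written in single quotes with plain spaces, which should behave like the backslash-space form used earlier.

```shell
# Four kinds of lines: a definition, a declaration, a call,
# and a call followed by a comment.
lines=$(printf '%s\n' \
    'void lv_obj_set_size(lv_obj_t * obj, int32_t w, int32_t h)' \
    'void lv_obj_set_size(lv_obj_t * obj, int32_t w, int32_t h);' \
    '    lv_obj_set_size(obj, 10, 20);' \
    '    lv_obj_set_size(obj, 10, 20); /* resize */')

# [^;/]$ : the last character must be neither ; nor /
definitions=$(printf '%s\n' "$lines" | grep '^[a-z_ *]* lv_obj_set_size.*[^;/]$')

# [;]$ : the last character must be ; (declarations and plain calls)
decl_call_count=$(printf '%s\n' "$lines" | grep -c '^[a-z_ *]* lv_obj_set_size.*[;]$')

printf 'definition: %s\n' "$definitions"
printf 'declarations and calls: %s\n' "$decl_call_count"
```

Only the first line survives the negated class. The call with the trailing comment ends in / and is rejected by both expressions.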
Expressions usually depend on the style of the code.
Some projects put return types and storage class specifiers on a separate line above the function name:
void
myfunction(int x, int y)
{
/* body */
}
In such projects the following expression can be used to locate the definition
grep -rn ^myfunction src
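As a quick check of that expression, the sketch below feeds a small invented source file to grep on standard input. The main function and the call inside it are additions for the illustration.

```shell
# A tiny C file in the "return type on its own line" style.
source_text=$(printf '%s\n' \
    'void' \
    'myfunction(int x, int y)' \
    '{' \
    '    /* body */' \
    '}' \
    '' \
    'int main(void)' \
    '{' \
    '    myfunction(1, 2);' \
    '    return 0;' \
    '}')

# The indented call does not start at column one, so only the definition matches.
definition_line=$(printf '%s\n' "$source_text" | grep -n '^myfunction')
printf '%s\n' "$definition_line"
```

Only line 2, the definition, is reported. The call is indented and therefore does not match.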
There are many more use cases for grep. It is a very practical tool that I use on a daily basis
to filter logs, find occurrences and extract matching strings with the -o option.
The only catch is that it performs pattern matching on one line at a time.
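Both points can be illustrated with a short sketch. The log lines are invented, and -o is assumed to be available. It is an extension found in GNU and BSD grep, not part of POSIX.

```shell
# Invented log lines.
log=$(printf '%s\n' \
    'sshd: connection from 10.0.0.5 port 22' \
    'sshd: connection from 192.168.1.20 port 2222' \
    'cron: job started')

# -o prints only the part of each line that matched, here four dot-separated numbers.
ips=$(printf '%s\n' "$log" |
    grep -o '[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*')
printf '%s\n' "$ips"

# A pattern never spans two lines: the dot cannot match the newline between
# "void" and "myfunction". grep -c prints 0 and exits non-zero, hence "|| true".
cross_line_matches=$(printf 'void\nmyfunction(int x, int y)\n' |
    grep -c 'void.myfunction' || true)
printf 'cross-line matches: %s\n' "$cross_line_matches"
```

The first command prints the two addresses on their own lines, and the second reports zero matches.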