Tips on using grep


grep is a utility program available on any Unix-like operating system.


The most common use of grep is to look for a matching string in a file or, with the -r option, in a whole directory of files

grep -rn variable_name ./src

The -n option makes it print the line number of each match. Because more than one file is searched, grep also prints the path of each file, relative to the current working directory
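
To see what -r and -n add to the output, here is a minimal sketch that runs the same command against a throwaway directory. The file names and contents are made up for illustration, only the grep options come from the text above.

```shell
# Build a small throwaway tree (paths and contents are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/src/util"
printf 'int variable_name = 0;\nint other = 1;\n' > "$demo/src/main.c"
printf '/* helper */\nextern int variable_name;\n' > "$demo/src/util/helper.h"

# -r descends into directories, -n prefixes each match with its line number.
# The output is sorted only to make the order predictable for this demo.
matches=$(cd "$demo" && grep -rn variable_name ./src | sort)
printf '%s\n' "$matches"
```

Each output line has the form path:line:text, which is handy for jumping straight to the match in an editor.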


Another common use is finding the large directories on a system, which is very practical when one runs out of storage space.

du -h ~/Downloads | grep '^[0-9.]*G'
1.1G	/home/etag/Downloads/snap/firefox/common/.cache/mozilla/firefox/szwshp2r.default/cache2/entries
1.1G	/home/etag/Downloads/snap/firefox/common/.cache/mozilla/firefox/szwshp2r.default/cache2
1.1G	/home/etag/Downloads/snap/firefox/common/.cache/mozilla/firefox/szwshp2r.default
1.1G	/home/etag/Downloads/snap/firefox/common/.cache/mozilla/firefox
1.1G	/home/etag/Downloads/snap/firefox/common/.cache/mozilla
1.1G	/home/etag/Downloads/snap/firefox/common/.cache
1.9G	/home/etag/Downloads/snap/firefox/common
1.9G	/home/etag/Downloads/snap/firefox
1.3G	/home/etag/Downloads/snap/teams-for-linux
3.8G	/home/etag/Downloads/snap
6.7G	/home/etag/Downloads/archives
1.4G	/home/etag/Downloads/cube_mx
31G	/home/etag/Downloads/

It printed all directories that are 1 gigabyte or larger (a directory reported in terabytes, with a T suffix, would not match)


du is a utility that estimates the storage space used by files or directories (in this example only directories), and the -h option makes it print the size in a human-friendly format.


The output of du is then passed to the standard input of grep, which filters it by applying a simple regular expression: ^[0-9.]*G


The line must begin with any number of characters from the class [0-9.], that is digits and the dot, followed by a single capital letter G
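
The filter can be tried without scanning a disk. The sketch below feeds grep a few lines shaped like du -h output; the sizes and paths are invented for the demonstration.

```shell
# Simulated "du -h" output: size, a tab, then the path (all values invented).
du_output=$(printf '512K\t./notes\n1.1G\t./cache\n4.0M\t./Git\n31G\t.\n')

# Keep only lines that start with digits or dots immediately followed by G.
big=$(printf '%s\n' "$du_output" | grep '^[0-9.]*G')
printf '%s\n' "$big"
```

The ./Git line is dropped even though it contains a capital G, because the G has to come directly after the leading digits and dots.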


The caret ^ means 'beginning of the line'


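The brackets [0-9.] are used to specify a character class. Ranges such as 'a-z', 'A-Z' and '0-9' can be used, which is of course more practical than typing out [0123456789] each time. A range like 'A-z' is best avoided, since in ASCII it also covers the punctuation characters between Z and a.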


The asterisk * means zero or more occurrences, and it always applies to the previous character or character class. If the * is omitted in the above example, it only matches lines beginning with a single digit or dot followed by a G, for example:

1G
9G
.G
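
A quick way to check the effect of the asterisk is to count matches with and without it. This is only a sketch, and the sample sizes are made up.

```shell
# Five sample "sizes", one per line (invented values).
sizes=$(printf '1G\n9G\n.G\n12G\n1.5G\n')

# With *, any run of digits and dots before the G is accepted.
with_star=$(printf '%s\n' "$sizes" | grep -c '^[0-9.]*G')

# Without *, exactly one digit or dot must come before the G.
without_star=$(printf '%s\n' "$sizes" | grep -c '^[0-9.]G')

echo "with *: $with_star, without *: $without_star"
```

The -c option prints the number of matching lines instead of the lines themselves.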

Since du also prints out file names when given the -a option, I often use it to get a list of all file names with a particular extension. While I agree that the find utility is probably faster and more optimized, my problem with it is that I can never remember the syntax or options :) So I tend to use it only in scripts and look up the manual page each time.

du -a src | grep "\.h$"

This outputs a list of all C header files present in the src directory


The dollar sign $ means 'end of the line'

The dot . means 'any single character'


Consider the following command

du -a src | grep .h$

It would nearly achieve the desired result, but it also prints names ending in any character followed by h, such as 'sh', so shell scripts are listed along with C header files.


To match lines ending in '.h' the dot has to be escaped with a backslash \

to represent the actual character.


Note that the quotes are necessary here, because the shell also uses the backslash to escape characters that have a 'special meaning'. Without the quotes the shell would consume the backslash before grep ever sees it.


One can also use the variant below to prevent the shell from interpreting the backslash

du -a src | grep \\.h$
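
The difference between the escaped and the unescaped dot is easy to verify on a short list of names. The sketch below uses invented file names and single quotes, which hand the pattern to grep untouched.

```shell
# A few file names, one per line (hypothetical).
files=$(printf 'src/main.c\nsrc/util.h\nsrc/build.sh\nsrc/graph\n')

# Unescaped dot: any character followed by h at the end of the line.
unescaped=$(printf '%s\n' "$files" | grep '.h$')

# Escaped dot: a literal ".h" at the end of the line.
escaped=$(printf '%s\n' "$files" | grep '\.h$')

printf 'unescaped:\n%s\n\nescaped:\n%s\n' "$unescaped" "$escaped"
```

The unescaped pattern lets build.sh and even graph through, while the escaped one keeps only util.h.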

The dot is a very powerful concept. Let's print out the first 10 words in the English dictionary that begin with 'th' followed by exactly 3 more characters

grep ^th...$ /usr/dict/en/words.txt | head -n 10
thack
thais
thala
thana
thane
thank
tharf
tharm
thatd
thatn

One can find some unusual words :)
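
Keep in mind that the dot accepts any character, not only letters. Here is a small sketch with a made-up word list (not the dictionary file used above) that compares the dot with an explicit letter class.

```shell
# A tiny invented word list, one entry per line.
words=$(printf 'the\nthank\nthanks\nthink\nth-em\nthrow\nother\n')

# "th" followed by exactly three characters of any kind.
any_char=$(printf '%s\n' "$words" | grep '^th...$')

# "th" followed by exactly three lowercase letters.
letters_only=$(printf '%s\n' "$words" | grep '^th[a-z][a-z][a-z]$')

printf 'dot:\n%s\n\nletters only:\n%s\n' "$any_char" "$letters_only"
```

The entry th-em passes the dot version but not the letter version, which may explain some of the odd entries a real dictionary file returns.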


Another common everyday use of the caret is filtering the output of ls -l

to show only directories

ls -l ~/src/lvgl | grep ^d
drwxrwxr-x 15 etag etag  4096 Dec 28 13:56 demos
drwxrwxr-x  9 etag etag  4096 Dec 28 13:56 docs
drwxrwxr-x  8 etag etag  4096 Dec 16 13:17 env_support
drwxrwxr-x 14 etag etag  4096 Dec 16 13:17 examples
drwxrwxr-x  3 etag etag  4096 Dec 16 13:17 libs
drwxrwxr-x  5 etag etag  4096 Dec 28 13:56 scripts
drwxrwxr-x 17 etag etag  4096 Dec 28 13:56 src
drwxrwxr-x 11 etag etag  4096 Dec 28 13:56 tests
drwxrwxr-x  2 etag etag  4096 Dec 28 13:56 xmls
drwxrwxr-x  2 etag etag  4096 Dec 16 13:17 zephyr

To only show files, one can use the -v option to invert the meaning of the expression

ls -l ~/src/lvgl | grep -v ^d | head -n 6
total 232
-rw-rw-r--  1 etag etag  1518 Dec 28 13:56 CMakeLists.txt
-rw-rw-r--  1 etag etag  2818 Dec 16 13:17 CMakePresets.json
-rw-rw-r--  1 etag etag  2508 Dec 16 13:17 component.mk
-rw-rw-r--  1 etag etag   204 Dec 16 13:17 idf_component.yml
-rw-rw-r--  1 etag etag 57414 Dec 28 13:56 Kconfig

But if the 'total' line is not needed:

ls -l ~/src/lvgl | grep ^- | head -n 6
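
The three filters can be compared side by side on a scratch directory. This is a sketch with made-up entries; it assumes ls -l prints its usual 'total' line first.

```shell
# Scratch directory with two subdirectories and two regular files.
demo=$(mktemp -d)
mkdir "$demo/docs" "$demo/src"
touch "$demo/Kconfig" "$demo/README.md"

# Lines starting with d are directories, lines starting with - are files.
dirs=$(ls -l "$demo" | grep -c '^d')
files=$(ls -l "$demo" | grep -c '^-')

# -v keeps everything that is NOT a directory, including the "total" line.
not_dirs=$(ls -l "$demo" | grep -vc '^d')

echo "directories: $dirs, files: $files, non-directory lines: $not_dirs"
```

The count for -v is one higher than the file count, which is exactly the extra 'total' line.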

When working with LVGL, to locate the definition of a function this

expression can be used

grep -rn "^[a-z_ *]* lv_obj_set_size.*[^;/]$" src
src/core/lv_obj_pos.c:213:void lv_obj_set_size(lv_obj_t * obj, int32_t w, int32_t h)

Let's break it down. The line must begin with zero or more characters from the character class [a-z_ *], which allows lowercase letters, underscores, spaces and the * character. Inside the quotes the spaces need no backslash.


So, it covers storage specifiers, return types and pointers.


There must be at least one space before the function name, which can be followed by any number of characters, but the line must not end with ; or /


With the character class [^;/] replaced by [;], it will output the declaration and the calls to the function instead. This is because the caret inverts the meaning of a character class, so without it the line is required to end with ; rather than forbidden to

grep -rn "^[a-z_ *]* lv_obj_create.*[;]$" src
src/core/lv_obj.h:229:lv_obj_t * lv_obj_create(lv_obj_t * parent);
src/widgets/tabview/lv_tabview.c:277:    lv_obj_create(obj);
tests/src/test_cases/widgets/test_obj_tree.c:18:    lv_obj_create(lv_screen_active());
tests/src/test_cases/widgets/test_obj_tree.c:20:    lv_obj_create(lv_screen_active());

To show only declarations, filter the output further

grep -rn "^[a-z_ *]* lv_obj_create.*[;]$" src | grep \\.h
src/core/lv_obj.h:229:lv_obj_t * lv_obj_create(lv_obj_t * parent);

This expression works because the coding convention used by LVGL states that the opening brace { of a function body always goes on the line below, so the line holding a definition ends with the closing parenthesis


The / is excluded because otherwise the expression could match a call followed by a comment instead

something(); /* some comment */
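
The effect of excluding ; and / can be reproduced on a tiny source tree. The files below are invented and only imitate LVGL's style; the pattern is a variant of the one above, written in single quotes with plain spaces.

```shell
# Throwaway tree with a declaration, a definition and two calls (all invented).
demo=$(mktemp -d)
mkdir "$demo/src"
cat > "$demo/src/obj.h" <<'EOF'
void lv_obj_set_size(lv_obj_t * obj, int32_t w, int32_t h);
EOF
cat > "$demo/src/obj.c" <<'EOF'
void lv_obj_set_size(lv_obj_t * obj, int32_t w, int32_t h)
{
}

static void resize(lv_obj_t * obj)
{
    lv_obj_set_size(obj, 10, 10); /* call with a trailing comment */
    lv_obj_set_size(obj, 20, 20);
}
EOF

# Excluding both ; and / as the last character leaves only the definition.
definition=$(cd "$demo" && grep -rn '^[a-z_ *]* lv_obj_set_size.*[^;/]$' src)
printf '%s\n' "$definition"

# Excluding only ; lets the commented call slip through as well.
loose_count=$(cd "$demo" && grep -rn '^[a-z_ *]* lv_obj_set_size.*[^;]$' src | wc -l | tr -d ' ')
echo "matches when only ; is excluded: $loose_count"
```

With only ; excluded, the call that ends in a comment is reported next to the definition, which is why the / belongs in the class.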

Usually expressions depend on the style of the code. Some projects have return types and storage specifiers on a separate line above the function name

void
myfunction(int x, int y)
{
	/* body */
}

In such projects the following expression can be used to locate the definition

grep -rn ^myfunction src
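
Here is a small sketch of why anchoring on the function name works for this style. The source file is invented; the call inside main is indented, so it does not start the line.

```shell
# One invented source file in the "return type on its own line" style.
demo=$(mktemp -d)
mkdir "$demo/src"
cat > "$demo/src/f.c" <<'EOF'
void
myfunction(int x, int y)
{
    /* body */
}

int main(void)
{
    myfunction(1, 2);
    return 0;
}
EOF

# Only the definition has the name in the first column.
found=$(cd "$demo" && grep -rn '^myfunction' src)
printf '%s\n' "$found"
```

A call written in the first column would still match, so the trick relies on calls being indented inside function bodies.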

There are many more use cases for grep. It is a very practical tool that I use on a daily basis to filter logs, find occurrences and extract matching strings with the -o option. The only catch is that it can only perform pattern matching on one line at a time.
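
As an illustration of -o, the sketch below pulls the trailing IP address out of a few log lines. The log format and its contents are made up for the example.

```shell
# Three invented log lines; only two of them end with an address.
log=$(printf '%s\n' \
  '2024-01-05 12:00:01 sshd: accepted connection from 10.0.0.7' \
  '2024-01-05 12:00:09 cron: job finished' \
  '2024-01-05 12:01:42 sshd: accepted connection from 192.168.1.20')

# -o prints only the part of the line that matched, not the whole line.
# The pattern is four runs of digits separated by literal dots, at line end.
ips=$(printf '%s\n' "$log" | grep -o '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*$')
printf '%s\n' "$ips"
```

Lines without a match, like the cron entry, produce no output at all.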
