Strings (text)

We already saw a little bit about strings in chapter 2, when we talked about Python data types. Going back to the definition we used there, a string is a type of data that represents text, normally printed or exported by a program.

In summary, it’s a piece of text, and it can contain letters, numbers and symbols.

Defining strings

There are three main ways to define a string. The two most common are placing the text between single quotes (') or double quotes ("). In most cases, use whichever you prefer. Let’s see some examples:

string1 = "Hey Python"
string2 = 'Bye Python'
print(string1)
print(string2)

> Hey Python
> Bye Python

The third way is the multi-line string. These are defined with triple quotes: a sequence of three single quotes (''') or three double quotes ("""). In a multi-line string, escape sequences are still interpreted, and line breaks typed inside the string are kept as line breaks in the output. In the example below, the string definition only ends when you close the triple quotes, so you can use ENTER to break lines normally:

large_string = '''Hi. This is a large string in Python.
Here, you can use " or ' normally.
Characters are escaped as expected.
\t testing TAB so we can be done'''
print(large_string)

> Hi. This is a large string in Python.
> Here, you can use " or ' normally.
> Characters are escaped as expected.
> 	 testing TAB so we can be done
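As a small sketch of the same idea, the snippet below uses triple double quotes and checks that each line break typed in the source really ends up in the string as a newline character. The variable name poem and its text are made up for this illustration.

```python
# Triple double quotes behave like triple single quotes.
poem = """First line
Second line
Third line"""

# Each ENTER typed inside the literal is stored as "\n".
print(poem.count("\n"))                                # 2
print(poem == "First line\nSecond line\nThird line")   # True
```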

Escaping strings

Sometimes you want to create a string with single or double quotes in it. If you define a string with single quotes and want a double quote inside it, or the opposite, just write the character normally; it is valid and works as expected. See the example:

string3 = "Sinnead O'Connor is a singer"
string4 = 'Alfred said: "hey, check this out!"'
print(string3)
print(string4)

> Sinnead O'Connor is a singer
> Alfred said: "hey, check this out!"

Now, what if you want to use both in the same string, or you always use one type of quote to define your strings and need that same quote in the middle of one? In that case, you can escape the character. Escaping means marking a character so that Python treats it differently from usual: it lets you write characters with a special meaning, like a new line or a tab, or include a double or single quote without ending the string.

To escape a character in Python, we put a backslash (\) before it. Let’s see some examples:

string5 = "Alfred said: \"hey, check this out!\""
string6 = 'Sinnead O\'Connor said: "Nothing compares 2 u"'
print(string5)
print(string6)

> Alfred said: "hey, check this out!"
> Sinnead O'Connor said: "Nothing compares 2 u"

Finally, if you want a backslash in your string, you just use two backslashes (\\), which will turn into one backslash in the output:

string7 = "Escaping one \\"
print(string7)

> Escaping one \

The table below, taken from Python’s documentation, lists some escape sequences and their meanings:

Escape Sequence    Meaning
\newline           Backslash and newline ignored
\\                 Backslash (\)
\'                 Single quote (')
\"                 Double quote (")
\a                 ASCII Bell (BEL)
\b                 ASCII Backspace (BS)
\f                 ASCII Formfeed (FF)
\n                 ASCII Linefeed (LF)
\r                 ASCII Carriage Return (CR)
\t                 ASCII Horizontal Tab (TAB)
\v                 ASCII Vertical Tab (VT)
\ooo               Character with octal value ooo
\xhh               Character with hex value hh
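To make the table a bit more concrete, here is a minimal sketch that uses a few of these escape sequences. The variable names are only illustrative; the point is that each escape sequence becomes a single character in the resulting string.

```python
# A few escape sequences from the table, in action.
tab_example = "Name:\tFelipe"   # \t inserts a horizontal tab
hex_a = "\x41"                  # hex value 41 is the letter A
octal_a = "\101"                # octal value 101 is also the letter A

print(tab_example)
print("Line 1\nLine 2")         # \n starts a new line
print(hex_a, octal_a)           # A A

# An escape sequence counts as one character.
print(len("\n"))                # 1
```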

Strings as they are

Another way to define strings is by including an r character before the string definition. This creates a raw string literal, which means that the string is output exactly as it’s written: escape sequences are not replaced with their special meaning. Let’s see an example:

string8 = r"We use \n to start a new line"
print(string8)

> We use \n to start a new line

With this, we can create strings without having to worry about escaped characters.
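One place where this tends to help is text full of backslashes, such as Windows-style paths. The sketch below compares a normal string with a raw string; the path is hypothetical and only serves the example.

```python
# The same text written as a normal string and as a raw string.
normal = "C:\\new_folder\\table.txt"   # backslashes doubled by hand
raw = r"C:\new_folder\table.txt"       # backslashes kept as typed

print(normal == raw)   # True: both hold the same characters
print(len(r"\n"))      # 2: a backslash followed by the letter n
print(len("\n"))       # 1: a single newline character
```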

Including variables in a string

To use a value stored in a variable inside a string, you have multiple options. The first one is the % sign, which is a string formatting character in Python.

Although it allows more complex formatting, its most basic usage is through %s, to include other strings in a string, and %d, to include an integer. Let’s see examples of both.

name = "Felipe"
age = 30
print("My name is %s" % name)
print("I am %d years old" % age)

> My name is Felipe
> I am 30 years old

We can also include decimal numbers (float or Decimal) in strings, using %f. Let’s see some possibilities in the following examples. First, one example without any formatting.

a = 30.46257
print("Formatting decimals: %f" % a)

> Formatting decimals: 30.462570

If we add a dot and a number, we can define the number of decimal places to be shown in the string:

print("Formatting decimals: %.2f" % a)
print("Formatting decimals: %.3f" % a)

> Formatting decimals: 30.46
> Formatting decimals: 30.463

You can insert more than one value in the string by putting them, in order, inside parentheses and separated by commas. This data structure is a tuple, which we saw in an earlier chapter:

name = 'Felipe'
age = 30
print("My name is %s and I am %d years old." % (name, age))

> My name is Felipe and I am 30 years old.
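The three specifiers seen so far can also be combined in a single string. The snippet below is a small sketch of that; the height variable and its value are made up for the example.

```python
name = "Felipe"
age = 30
height = 1.7583  # hypothetical value, used only to show %.2f

# The values are taken from the tuple in the same order as the specifiers.
message = "%s is %d years old and %.2f m tall." % (name, age, height)
print(message)  # Felipe is 30 years old and 1.76 m tall.
```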


Another way to insert values in strings is through concatenation. The concatenation sign in Python is +. Through it, we join two strings together. Let’s see the example below:

name = 'Felipe'
print("Hi, my name is " + name)

> Hi, my name is Felipe

The only issue with concatenation is that conversion to string is not done automatically. This leads to errors like the one below:

name = 'Felipe'
age = 30
print("Hi, my name is " + name + " and I am " + age + " years old.")

> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: can only concatenate str (not "int") to str

That means that if you want to join an integer together with a string, you have to convert it explicitly, using the str class:

print("Hi, my name is " + name + " and I am " + str(age) + " years old.")

> Hi, my name is Felipe and I am 30 years old.

What about decimal numbers? Well, if you don’t need to format the number, you can use str again to create a string representation of it and include it through concatenation:

value = 503.78987
print("The value is " + str(value))

> The value is 503.78987

If you want to format the number first, you can use the format function. Its parameters are the value to be formatted and the format specification, which here sets the number of decimal places. You have already seen an example of this kind of formatting earlier. Let’s check the example:

print("The value is " + format(value, '.2f'))

> The value is 503.79
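The second argument of format accepts the same kind of specification used after the % sign. As a rough sketch, here are a few variations on the same value; the variable names are only illustrative.

```python
value = 503.78987

one_place = format(value, ".1f")    # one decimal place
no_places = format(value, ".0f")    # no decimal places
padded = format(value, "10.2f")     # two decimal places, padded to width 10

print(one_place)      # 503.8
print(no_places)      # 504
print(repr(padded))   # '    503.79'
```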

f strings

Released in Python 3.6, f-strings are a really simple way to include values in a string. Basically, if you prefix your string with the letter f, you can write a Python expression inside that string between curly braces {}. Let’s see an example:

age = 34
print(f"I am {age} years old")

> I am 34 years old
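Since the braces accept any expression, f-strings can do small calculations and apply the same format specifications used by format. The snippet below is a minimal sketch with made-up values.

```python
name = "Felipe"
age = 34
value = 503.78987

# Any expression can go between the braces.
next_year = f"Next year {name} will be {age + 1} years old"
print(next_year)   # Next year Felipe will be 35 years old

# A format specification can follow a colon, as in format().
rounded = f"The value is {value:.2f}"
print(rounded)     # The value is 503.79
```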

Strings as lists

In Python, a string behaves much like a list. We will talk more about lists in a later chapter, but in summary, a list is similar to an array in other programming languages: it stores a sequence of items. A string, in turn, is a sequence of characters.

This means that you can access any character through its index. The first character has index 0, and the last one has an index equal to the number of characters in the string minus one.

string9 = "Hi, my name is Felipe"
print(string9[0])
print(string9[2])
print(string9[20])

> H
> ,
> e

In the example above, the string we used has 21 characters, so the last valid index is 20. And what happens if we use an index that doesn’t exist? Well, Python raises an exception. Let’s see:

print(string9[21])

> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IndexError: string index out of range
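One simple way to avoid this error is to compare the index with the length of the string before using it. The sketch below assumes the same example string; the index 100 is arbitrary and chosen only because it is clearly too large.

```python
string9 = "Hi, my name is Felipe"

# The last valid index is always len(string) - 1.
last_index = len(string9) - 1
print(last_index, string9[last_index])   # 20 e

# Check the index before using it.
index = 100
if index < len(string9):
    print(string9[index])
else:
    print("Index", index, "is out of range")
```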

Useful functions to work with strings

Here, I will present some useful functions for working with strings and a little bit of how they work. I will not go into detail on every function that Python has, because there are too many of them, and the language documentation is a much better place to explain them. I’ll just present some of the functions that I consider important and useful when working with strings, with some clear examples.

capitalize(), lower() and upper()

The capitalize function transforms the first letter of a string to upper case and the remaining letters to lower case. The lower and upper functions transform the entire string to lower or upper case. Let’s see an example with capitalize:

string10 = "hello, my NAME is Felipe"
print(string10.capitalize())

> Hello, my name is felipe
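For completeness, here is a short sketch of lower and upper on the same example string. Note that none of these functions changes the original string; each call returns a new one.

```python
string10 = "hello, my NAME is Felipe"

lowered = string10.lower()
uppered = string10.upper()

print(lowered)    # hello, my name is felipe
print(uppered)    # HELLO, MY NAME IS FELIPE
print(string10)   # hello, my NAME is Felipe (unchanged)
```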

center(), ljust() and rjust()

The center function, as the name probably implies, centers a string within a certain number of characters, using a character of your choice to fill both sides. The ljust and rjust functions do something similar, but they fill only one side: ljust keeps the text on the left and fills the right, and rjust does the opposite. Let’s see an example with center; the other two work analogously:

string11 = "hi, my name is Felipe"
print(string11.center(49, "*"))

> **************hi, my name is Felipe**************
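Here is a quick sketch of all three functions side by side, using a short made-up string so the fill characters are easy to count. The first argument is the total width and the second is the fill character.

```python
short = "hi"  # short example string, chosen only for readability

left = short.ljust(6, "*")
right = short.rjust(6, "*")
middle = short.center(6, "*")

print(left)     # hi****
print(right)    # ****hi
print(middle)   # **hi**

# Without a fill character, spaces are used.
print(repr(short.rjust(6)))   # '    hi'
```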


find()

The find() function checks whether a substring is part of the string. If it is, it returns the position where the substring starts. If it doesn’t find anything, it returns -1. Let’s check the example:

string12 = "Hi, my name is Felipe"
substring1 = "my"
substring2 = "José"
print(string12.find(substring1))
print(string12.find(substring2))

> 4
> -1

You can also define the index where the search begins and ends, through the start and end parameters, like in the example:

print(string12.find(substring1, 7))
print(string12.find(substring1, 2))

> -1
> 4
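The end parameter works in a similar way: the search stops before that index. The sketch below also shows one way to use the start parameter to look for a second occurrence; the variable names are only illustrative.

```python
string12 = "Hi, my name is Felipe"

# Search only from index 0 up to, but not including, index 5.
limited = string12.find("my", 0, 5)
print(limited)    # -1: "my" ends at index 5, outside that range

# find() returns the first match; start the next search right after it.
first_e = string12.find("e")
second_e = string12.find("e", first_e + 1)
print(first_e, second_e)   # 10 16
```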

isalnum(), isalpha() and isnumeric()

These functions indicate whether a string consists, respectively, entirely of alphanumeric characters, alphabetic characters, or numeric characters. If the string has at least one character that breaks the condition, the function returns False. It’s worth noting that white space in the string makes it invalid for all 3 functions. Let’s see some examples. First, a string with only alphabetic characters:

string13 = "Felipe"
print(string13.isalnum())
print(string13.isalpha())
print(string13.isnumeric())

> True
> True
> False

Now, a string with only numbers:

string14 = "1234"
print(string14.isalnum())
print(string14.isalpha())
print(string14.isnumeric())

> True
> False
> True

And now, one with both numbers and alphabetical characters:

string15 = "Felipe1234"
print(string15.isalnum())
print(string15.isalpha())
print(string15.isnumeric())

> True
> False
> False
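To illustrate the remark about white space, here is a small sketch with strings that contain a space or punctuation. The example strings are made up; the same reasoning applies to any character that is neither a letter nor a digit.

```python
with_space = "Felipe 1234"   # hypothetical example with a space
decimal_text = "12.5"        # hypothetical example with a dot

# The space breaks all three checks.
print(with_space.isalnum())     # False
print(with_space.isalpha())     # False
print(with_space.isnumeric())   # False

# The dot is not a numeric character, so this is False as well.
print(decimal_text.isnumeric())   # False
```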


len()

The len() function returns the number of characters in a string. Note that, unlike most of the functions we have seen so far, len() is not called from the string; instead, we pass the string as a parameter. Let’s see:

string16 = "My name is Felipe"
print(len(string16))

> 17


replace()

The replace() function, as the name implies, replaces one part of a string with another. The first argument is the text to be replaced, and the second is the replacement. Let’s check the example:

string17 = "Hi, my name is Felipe"
print(string17.replace("Felipe", "José"))

> Hi, my name is José
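Two details are worth a quick sketch: replace() changes every occurrence it finds, and an optional third argument limits how many occurrences are replaced. The sentence used below is made up for the example.

```python
text = "Hi, my name is Felipe. Felipe is a common name."

# By default, every occurrence is replaced.
all_replaced = text.replace("Felipe", "José")
print(all_replaced)   # Hi, my name is José. José is a common name.

# The optional third argument limits the number of replacements.
first_only = text.replace("Felipe", "José", 1)
print(first_only)     # Hi, my name is José. Felipe is a common name.
```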

strip(), rstrip() and lstrip()

These 3 functions can be used to remove white space from the ends of a string. While rstrip() and lstrip() remove white space, respectively, from the right and from the left (hence the r and l before strip), strip() removes it from both sides. Let’s see some examples:

string18 = "   Hi, my name is Felipe      "
print(string18)
print(string18.strip())
print(string18.rstrip())
print(string18.lstrip())

>    Hi, my name is Felipe      
> Hi, my name is Felipe
>    Hi, my name is Felipe
> Hi, my name is Felipe      
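Because white space at the end of a line is hard to see in printed output, a small trick is to print the result of repr(), which shows the string between quotes. This is only a sketch of that idea, using the same example string.

```python
string18 = "   Hi, my name is Felipe      "

both = string18.strip()
right_only = string18.rstrip()
left_only = string18.lstrip()

# repr() wraps the text in quotes, making the remaining spaces visible.
print(repr(both))         # 'Hi, my name is Felipe'
print(repr(right_only))   # '   Hi, my name is Felipe'
print(repr(left_only))    # 'Hi, my name is Felipe      '
```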


There are certainly other useful functions related to strings in Python, but we covered some of the most important and commonly used ones. We saw how to define strings, store them in variables and format numbers.