Learn to Program using Python

Strings, Part II

by Richard G. Baldwin
baldwin.richard@iname.com

File Pyth0024.htm

June 30, 2000


Preface

This document is part of a series of online tutorial lessons designed to teach you how to program using the Python scripting language.

Something for everyone

Beginners start at the beginning, and experienced programmers jump in further along. Lesson 1 provides an overall description of this online programming course.

Introduction

What you have learned

You have learned how to write some simple programs and execute them interactively.

You have learned how to capture simple programs in script files and to execute those script files.

You have learned how to construct programs, including the indentation concepts involved in Python.

You have also learned some of the fundamental concepts involving strings.

What you will learn

This lesson will expand your knowledge of strings and will introduce two concepts, indexing and slicing, that are useful with other data types as well.

What is indexing?

According to a definition that I found on the web, "... an ordinal number is an adjective which describes the numerical position of an object, e.g., first, second, third, etc."

A practical example

Many years ago when I did a tour of duty as an enlisted man in the U.S. Air Force, they had a habit of lining us up and requiring us to "count off."

What this meant was that the first person in the line called out the number one, the person behind him called out the number two, the person behind him called out the number three, etc.  (Since learning about computer programming, I now wonder if the first person should have called out zero.)

Assigning an ordinal index

I'm sure they didn't realize that what they were doing was assigning an ordinal index value to each person in the line (and neither did I at the time).

Using an ordinal index

Even though they didn't know the technical details of ordinal indices, they didn't have any difficulty saying, "Number six, wash dishes, number fourteen, peel potatoes, number twenty-two, carry out the garbage, etc."

This is indexing

That is what indexing is all about.
 
In the context of this lesson, indexing is the process of assigning an ordinal index value to each data item contained in some sort of a container. 

In other words, we assign an ordinal number to each item, which describes the numerical position of the item in the container.
 
For example, if you were very careful, you could use a felt tip pen to assign an ordinal index to each of the twelve eggs contained in a carton containing a dozen eggs.  (Should you start with zero or one?) 

Then you could extract the egg whose index value is 9 from the container and eat it for breakfast.

Having assigned the index, we can use that index to access the data item corresponding to that index, as in "Number six, wash dishes."

(Note that this process is also referred to as a subscription in the Python Reference Manual.)

Index values automatically assigned

In this lesson, we will be using the index values that are automatically assigned to the characters in a string for the purpose of accessing those characters, both individually, and in groups.

What is slicing?

Here is what Magnus Lie Hetland has to say on the topic of slicing (and indexing as well). Although this quotation was taken from a discussion of lists, it applies equally well to strings.
 
"One of the nice things about lists is that you can access their elements separately or in groups, through indexing and slicing. 

Indexing is done (as in many other languages) by appending the index in brackets to the list. (Note that the first element has index 0). (This is the answer to the question about the first egg -- Baldwin)
...
Slicing is almost like indexing, except that you indicate both the start and stop index of the result, with a colon (":") separating them: 
...
Notice that the end is non-inclusive. If one of the indices is dropped, it is assumed that you want everything in that direction. i.e. list[:3] means "every element from the beginning of list up to element 3, non-inclusive." 
...
list[3:] would, on the other hand, mean "every element from list, starting at element 3 (inclusive) up to, and including, the last one." 

For really interesting results, you can use negative numbers too: list[-3] is the third element from the end of the list..." 

Some material deleted for brevity

I added the comment about the first egg for emphasis. I also deleted some of the material from this quotation for brevity, but I will cover that material later in conjunction with my discussion of indexing and slicing strings.
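The notation in the quotation can be tried directly. This is a minimal sketch assuming Python 3 (where print is a function). It uses a hypothetical five-element list named letters, which is not part of the original lesson, followed by a string of the same letters to show that the notation carries over to strings.

```python
# Sketch assumes Python 3. The names letters and word are illustrative only.
letters = ["a", "b", "c", "d", "e"]

print(letters[0])    # first element, at index 0: a
print(letters[:3])   # elements 0, 1, 2 (element 3 is excluded): ['a', 'b', 'c']
print(letters[3:])   # element 3 through the last element: ['d', 'e']
print(letters[-3])   # third element from the end: c

# The same notation works on a string.
word = "abcde"
print(word[:3], word[3:], word[-3])   # abc de c
```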

A Sample Program

I will illustrate indexing and slicing of strings using a sample program contained in a script file named String01.py.

The program listing

A complete listing of the program, and the output produced by the program, are provided at the end of the lesson.

Will discuss in fragments

I will discuss the program in fragments, illustrating particular aspects of indexing and slicing in each fragment.  This is a scheme that I will use frequently in this set of tutorial lessons.

Interesting Code Fragments

A single character can be extracted from a string by referring to the string and indicating the index of the character in square brackets, as shown in the code fragment in Figure 1.

(Note that this is a fragment from a script file, not from an interactive Python program.)
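The Figure 1 listing is not reproduced in this copy of the lesson. Based on the description that follows, the fragment might look something like this sketch. It assumes Python 3, where print is a function; the lesson predates Python 3, so the original would have used the older print statement.

```python
# Sketch assumes Python 3 (print is a function).
# Create a string and assign it to the variable aStr.
aStr = "This is a string"

# Index 0 holds the first character.
print(aStr[0])   # prints T

# Index 3 holds the fourth character, the s that ends the word This.
print(aStr[3])   # prints s
```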

First create a string to work with

The fragment in Figure 1 creates a string and assigns it to a variable named aStr. From this point on, the contents of the string can be accessed by referring to the variable.

Index values always begin with zero

Unlike eggs and Air Force enlisted men, the first character in a string is always located at index 0, as in aStr[0].

Thus, the second statement in the fragment extracts and prints the T, which is the first character in the word This.

Character at index 3, display yourself

Similarly, the last statement in the fragment extracts and prints the s from index position 3, as in aStr[3].

The character at this index position is the s that ends the word This.

The last statement is equivalent to the following request: "Will the character at index position 3 please display itself on the screen."

Is this the fourth character?

(You would probably refer to this as the fourth character, and you would be correct if you did.  The character at index 3 is the fourth character in the string.  The character at index 0 is the first character in the string.  First does not equate to index 1.  You need to think about this, because this can be a confusing topic for new programmers.)

An important, and potentially confusing point

At the risk of becoming boring, there is an important point here that you might as well get used to right now.

The s is located at index value 3.  However, according to the way you are probably accustomed to counting, this is actually the fourth character in the string.  You might be inclined to refer to this character as character number 4.

This is because index values always begin with zero, while you are probably accustomed to counting things beginning with one, not zero.

Not like eggs

If you access the egg at index value 4 in the container and eat it for breakfast, it cannot be accessed again (because it will be gone).

However, if you access the character at index value 4 in the string and use it for some purpose, what you really use is a copy of the character.  It is still there and can be accessed again.

(Some data containers do allow for the removal of data elements in much the same sense that we can remove an egg from its container.  However, a string is not such a container.)
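A short sketch, assuming Python 3, shows that accessing a character leaves the string intact. It uses index 3 rather than 4 only because index 4 holds a space, which is hard to see when printed; the variable name ch is illustrative.

```python
# Sketch assumes Python 3. The name ch is illustrative.
aStr = "This is a string"

# Indexing produces a one-character string; aStr itself is not changed.
ch = aStr[3]
print(ch)          # prints s
print(aStr[3])     # the same character can be accessed again: s
print(len(aStr))   # the string still contains 16 characters
```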

A simple slice

For convenience, here is another copy of the fragment that created the string.
 
aStr = "This is a string"

The fragment in Figure 2 cuts a couple of slices out of that string and displays them on the screen.

Slice Notation

The slice notation uses two index values separated by a colon, as shown in Figure 2.

The end is non-inclusive

As was indicated in the earlier quotation, "... the end is non-inclusive."  This means that the character whose index value is the number following the colon is not included in the slice.

Extract the first word in the string

Thus, the first statement containing the reference aStr[0:4] extracts and prints the character sequence beginning with index value 0 and ending with index value 3 (not 4).  This causes the word This to be extracted and printed.

Extract the last word in the string

Similarly, the second statement in Figure 2 (aStr[10:16]) extracts and prints the characters having index values from 10 through 15, inclusive (not 16). This causes the word string to be extracted and printed.
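Figure 2 itself is not reproduced here, but the two slices it uses are named in the text. A sketch of those statements, assuming Python 3:

```python
# Sketch assumes Python 3 (print is a function).
aStr = "This is a string"

# A slice runs from the first index up to, but not including, the second.
print(aStr[0:4])    # characters 0 through 3: This
print(aStr[10:16])  # characters 10 through 15: string
```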

Omitting the first index

If you omit the first index value, as shown in Figure 3, it defaults to the value zero.

Therefore, the statement in Figure 3 extracts and prints the first word in the string, which is This.
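A sketch of the idea in Figure 3, assuming Python 3 and an illustrative variable name first_word, confirms that an omitted first index behaves like 0:

```python
# Sketch assumes Python 3. The name first_word is illustrative.
aStr = "This is a string"

# Omitting the first index is the same as starting at index 0.
first_word = aStr[:4]
print(first_word)              # prints This
print(aStr[:4] == aStr[0:4])   # prints True
```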

Omitting the second index

If you omit the second index, as shown in Figure 4, it defaults to the length of the string, so the slice includes the last character in the string.

Thus, the statement in Figure 4 extracts and prints the last word in the string, which is string.
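A matching sketch for Figure 4, assuming Python 3 and an illustrative variable name last_word, shows that an omitted second index behaves like len(aStr):

```python
# Sketch assumes Python 3. The name last_word is illustrative.
aStr = "This is a string"

# Omitting the second index runs the slice through the last character,
# which is the same as using len(aStr) as the second index.
last_word = aStr[10:]
print(last_word)                         # prints string
print(aStr[10:] == aStr[10:len(aStr)])   # prints True
```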

Print the entire string

Figure 5 shows two different ways to extract and print the entire string.  I won't comment on this, but will leave the analysis as an exercise for the student.

(Hint:  Remember that the plus sign when used with strings is the string concatenation operator.)
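Figure 5 is not reproduced here, so the following is only one possibility consistent with the hint, not necessarily the original code. It assumes Python 3.

```python
# Sketch assumes Python 3. This is one possible answer to the exercise.
aStr = "This is a string"

# Omitting both indices produces a slice containing every character.
print(aStr[:])

# Concatenating two adjacent slices also rebuilds the whole string,
# because index 4 is excluded from the first slice and included in the second.
print(aStr[:4] + aStr[4:])
```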

Print an empty string

There are several ways that you can specify the index values that will produce an empty string.  One of those ways is shown following the plus sign in Figure 6.

In Figure 6, both index values are outside the bounds of the index values of the characters in the string, which range from 0 through 15 inclusive.
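The exact indices in Figure 6 are not shown in this copy, so the values 100 and 200 below are assumptions chosen to lie outside the range 0 through 15. The sketch, which assumes Python 3, also contrasts this with indexing a single character out of range, which raises an error rather than producing an empty string.

```python
# Sketch assumes Python 3. The indices 100 and 200 are illustrative.
aStr = "This is a string"

# Both indices lie beyond the last valid index (15), so the slice is empty.
empty = aStr[100:200]
print(repr(empty))            # prints ''

# Concatenating the empty string leaves the printed text unchanged.
print(aStr + aStr[100:200])   # prints This is a string

# Another way to get an empty slice: equal start and stop indices.
print(repr(aStr[5:5]))        # prints ''

# Indexing a single character out of range raises IndexError instead.
try:
    aStr[100]
except IndexError as err:
    print("IndexError:", err)
```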

Negative indices

Although it can be a little confusing, negative index values can be used to count from the right, as shown in Figure 7.

This fragment extracts and prints the characters tri from the word string, which is the last word in the string.
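Figure 7 is not reproduced here; aStr[-5:-2] is one slice that yields tri, though the original figure may have used different indices. This sketch assumes Python 3 and shows the equivalent nonnegative indices for comparison.

```python
# Sketch assumes Python 3.
aStr = "This is a string"

# Counting from the right: -1 is g, -2 is n, -3 is i, -4 is r, -5 is t.
print(aStr[-5:-2])   # characters at -5, -4, -3: tri
print(aStr[11:14])   # the same slice using nonnegative indices: tri
print(aStr[-6:])     # the last six characters: string
```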

Eliminating confusion

Once you allow negative indices for slicing, things can become very confusing. The following explanation of how indices work with slicing is attributed to Guido van Rossum.

In this example, Mr. van Rossum is referring to a five-character string with a value of "HelpA".
 
The best way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n, for example: 

     +---+---+---+---+---+ 
     | H | e | l | p | A |
     +---+---+---+---+---+ 
     0   1   2   3   4   5 
    -5  -4  -3  -2  -1

The first row of numbers gives the position of the indices 0...5 in the string; the second row gives the corresponding negative indices. 

The slice from i to j consists of all characters between the edges labeled i and j, respectively. 

For nonnegative indices, the length of a slice is the difference of the indices, if both are within bounds, e.g., the length of word[1:3] is 2. 

Hopefully, this explanation will help you to understand and to remember how index values are used for the extraction of substrings from strings using slicing.
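The diagram can be checked against the "HelpA" string from the quotation. This sketch assumes Python 3; the variable name word follows the quotation, and i is an illustrative split point.

```python
# Sketch assumes Python 3.
word = "HelpA"

# Indices point between characters, so word[1:3] holds
# the characters between edges 1 and 3.
print(word[1:3])        # prints el
print(len(word[1:3]))   # prints 2, which is 3 - 1

# Negative edges are counted from the right end of the string.
print(word[-2:])        # prints pA
print(word[:-2])        # prints Hel

# Splitting at any edge i and rejoining gives back the original string.
i = 2
print(word[:i] + word[i:] == word)   # prints True
```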

Getting the length of a string

And finally, a built-in function named len() can be used to determine the number of characters actually contained in a string as shown in Figure 8.

For the example string used in this lesson, Figure 8 gets and prints the length of the string as 16.

If you count the characters in the string (beginning with 1), you will conclude that there are 16 characters in the string.

Note the difference between the number of characters and the maximum index value

For a string containing 16 characters, the valid index values range from 0 through 15 inclusive.
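A sketch of Figure 8 and the relationship between length and maximum index, assuming Python 3 (the names length and last_index are illustrative):

```python
# Sketch assumes Python 3. The names length and last_index are illustrative.
aStr = "This is a string"

length = len(aStr)
print(length)                        # prints 16

# Valid index values run from 0 through length - 1.
last_index = length - 1
print(last_index)                    # prints 15
print(aStr[last_index])              # prints g
print(aStr[last_index] == aStr[-1])  # prints True
```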

The complete output

This Python script file produces the output shown in Figure 9 on my computer.

A String is Immutable

There is one more point that needs to be made here.  Although you can use indexing and slicing to access the characters in a string, you cannot use indexing and slicing to assign new character values to those characters.

This is because a Python string is immutable.  In other words, after it is created, it cannot be modified.
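A short sketch, assuming Python 3, shows what happens when you try to assign to an index, and one common way to get a modified value by building a new string from slices instead. The names raised and newStr are illustrative.

```python
# Sketch assumes Python 3. The names raised and newStr are illustrative.
aStr = "This is a string"

# Assigning to an index raises TypeError because str objects are immutable.
raised = False
try:
    aStr[0] = "t"
except TypeError as err:
    raised = True
    print("TypeError:", err)

# To get a modified value, build a new string from slices.
newStr = "t" + aStr[1:]
print(newStr)   # prints this is a string
print(aStr)     # prints This is a string (the original is unchanged)
```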

Complete Program Listing

A complete listing of the program is shown in Figure 10.
 

Review

1.  What is an ordinal number?

Ans:  An ordinal number is an adjective, which describes the numerical position of an object, e.g., first, second, third, etc.

2.  What is indexing?

Ans:  Indexing is the process of assigning an ordinal index value to each data item contained in some sort of a container.

3.  We assign index values to the characters in a string, True or False?

Ans:  False.  Index values are automatically assigned to the characters in a string.

4.  What is the syntax for accessing a character at a particular index in a string?

Ans:  Refer to the string, and include the index value in square brackets, as in aStr[3].

5.  What is the syntax for accessing a substring from a string using slicing?

Ans:  Include both the start and stop index in square brackets, separated by a colon, as in aStr[10:16].

6.  The second index of a slice is inclusive, True or False?

Ans:  False.  The character whose index value matches the second index of a slice is not included in the slice.

7.  What is the default value for the first index if you omit the first index value in a slice?

Ans:  The default value is zero (0).

8.  What is the default value for the second index if you omit the second index value in a slice?

Ans:  The default value is the length of the string (the actual number of characters in the string).

9.  What is the purpose of negative slice indices?

Ans:  Negative indices can be used to count from the right end of the string.

10.  What is the name and syntax of the function that can be used to find the length of a string?

Ans:  len(theString)
 
 

Copyright 2000, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.

About the author

Richard Baldwin is a college professor and private consultant whose primary focus is a combination of Java and XML. In addition to the many platform-independent benefits of Java applications, he believes that a combination of Java and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects involving Java, XML, or a combination of the two.  He frequently provides onsite Java and/or XML training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Java Programming Tutorials, which has gained a worldwide following among experienced and aspiring Java programmers. He has also published articles on Java Programming in Java Pro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

baldwin.richard@iname.com

-end-