Strings & Text Processing
Strings are immutable
What does this mean? Let's say we have the string:
s = "brahm"
Now, let's say that Brahm wants to change the string, so that the 'b' in his name is capitalized, like this: "Brahm"
To do that, Brahm might want to write a line of code like this, the same way you would change an element of a list:
s[0] = "B"
#this won't work!
But when he does this, Brahm runs into an error message:
TypeError: 'str' object does not support item assignment
This is a key difference between strings and lists: strings can't be mutated. The only way to change a string variable is to reassign it and make a new string.
So, Brahm can create a new string to capitalize the 'b' in his name:
s = "Brahm"
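Instead of retyping the whole name, the new string can also be built from the old one. The sketch below is one way to do that; it assumes string slicing (s[1:]), which this section does not otherwise cover.

```python
s = "brahm"

# s[0].upper() is "B"; s[1:] is everything after the first character ("rahm").
# Joining them builds a brand-new string, which is then assigned to s.
s = s[0].upper() + s[1:]

print(s)  # Brahm
```

The original string "brahm" is never modified; the variable s simply ends up referring to a different string.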
Immutability guarantees that a string passed to a function as a parameter won't be changed by that function.
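As a small illustration of that guarantee, the sketch below passes a string to a function that tries to "change" it. The function name shout and the variable names are made up for this example.

```python
def shout(word):
    # Reassigning the parameter makes the local name `word` refer to a new
    # string. The caller's string is untouched.
    word = word.upper() + "!"
    return word


name = "brahm"
louder = shout(name)

print(name)    # brahm
print(louder)  # BRAHM!
```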
String Functions
.upper() and .lower()
.upper() returns a new string with all the letters in upper case, and .lower() returns a new string with all the letters in lower case.
>>> s = "So long and thanks for all the fish"
>>> s.upper()
'SO LONG AND THANKS FOR ALL THE FISH'
>>> s.lower()
'so long and thanks for all the fish'
.replace(char1, char2)
.replace(char1, char2) returns a new string in which every instance of the first parameter is replaced with the second parameter.
>>> s = "So long and thanks for all the fish"
>>> s.replace("a", "e")
'So long end thenks for ell the fish'
>>> s.replace("s", "")
'So long and thank for all the fih'
Note that the capital "S" in "So" is not removed: .replace() is case-sensitive.
.find(char)
.find(char) returns the index of the first place where the parameter (a character) occurs in the string. If the character is not in the string, the function returns -1.
>>> s = "So long and thanks for all the fish"
>>> s.find("n")
5
>>> s.find("x")
-1
.strip()
.strip() returns a new string with all the whitespace removed from the beginning and end of the string.
>>> s = "   So long and thanks for all the fish   "
>>> s.strip()
'So long and thanks for all the fish'
.split()
.split() splits the string into a list of strings. If you don't pass any parameters, the string is split on whitespace (spaces, tabs, and newlines):
>>> s = "So long and thanks for all the fish"
>>> s.split()
['So', 'long', 'and', 'thanks', 'for', 'all', 'the', 'fish']
If you do pass a parameter, the string is split wherever that parameter occurs. So, if you pass in a comma, the string is split at each comma.
>>> names = "Bruce,Diana,Victor,Barry,Clark,Arthur,Hal"
>>> names.split(",")
['Bruce', 'Diana', 'Victor', 'Barry', 'Clark', 'Arthur', 'Hal']
Some more string functions:
>>> chant = "wakanda forever"
>>> stop_balrog = "YOU SHALL NOT PASS"
>>> spaces = "   "
>>> number = "42"
>>> chant.startswith("wak")
True
>>> stop_balrog.startswith("you")
False
>>> chant.endswith("ver")
True
>>> chant.title()
'Wakanda Forever'
>>> chant.islower()
True
>>> spaces.isspace()
True
>>> number.isdigit()
True
Because strings are immutable, none of these functions changes the original string. Functions such as .upper(), .replace(), and .strip() return a new string instead.
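In practice this means the result of a string function has to be stored somewhere, or it is lost. A short sketch, with variable names chosen only for illustration:

```python
s = "So long and thanks for all the fish"

s.upper()            # the new string is created, then thrown away
unchanged = s        # s still holds the original text

shouted = s.upper()  # store the result in a new variable...
s = s.replace("fish", "chips")  # ...or reassign the same variable

print(unchanged)  # So long and thanks for all the fish
print(shouted)    # SO LONG AND THANKS FOR ALL THE FISH
print(s)          # So long and thanks for all the chips
```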
How to Process A String
Processing a string involves transforming or inspecting the contents of the string, usually one character at a time.
If you need both the index (i) and the character (char), use a for i in range loop.
If you need just the character (char), use a for-each loop.
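The two loop styles can be sketched as follows. The string "fish" and the lists used to collect the results are illustrative choices, so that the two loops can be compared side by side.

```python
s = "fish"

# for i in range loop: gives the index i; look up the character with s[i].
pairs = []
for i in range(len(s)):
    char = s[i]
    pairs.append((i, char))

# for-each loop: gives each character directly, with no index.
chars = []
for char in s:
    chars.append(char)

print(pairs)  # [(0, 'f'), (1, 'i'), (2, 's'), (3, 'h')]
print(chars)  # ['f', 'i', 's', 'h']
```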
Reversing a string
We can use the code we wrote to reverse a string to check if a string is a palindrome! A palindrome is a word or phrase that reads the same way from the front and from the back, ignoring spaces, punctuation, and capitalization. Here are some examples: "kayak" and "A man, a plan, a canal - Panama!" Here's a palindrome in Hindi: कडक
def reverse_string(s):
    # Build the reversed string by adding each character to the front.
    reverse = ""
    for ch in s:
        reverse = ch + reverse
    return reverse

def is_palindrome(s):
    # Compare the normalized string with its own reverse.
    normalized = normalize(s)
    rev = reverse_string(normalized)
    return normalized == rev

def normalize(s):
    '''
    This function keeps only the letters of a string (removing spaces,
    punctuation, and digits) and converts them to lower case.
    '''
    normalized = ''
    for ch in s:
        if ch.isalpha():
            normalized += ch.lower()
    return normalized
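To see why reverse = ch + reverse reverses the string, the sketch below repeats the same loop but records the value of reverse after each step. The function name reverse_steps is hypothetical and exists only for this trace.

```python
def reverse_steps(s):
    """Return the intermediate values of `reverse` while reversing s."""
    steps = []
    reverse = ""
    for ch in s:
        reverse = ch + reverse  # put the new character in front
        steps.append(reverse)
    return steps


print(reverse_steps("fish"))  # ['f', 'if', 'sif', 'hsif']
```

Each new character is placed in front of what has been built so far, so the last character of s ends up first.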
Solving a Puzzle
Let's return to the story of Bright Simons, who came up with mPedigree. A significant problem that people with malaria face is counterfeit medicine. In 2013, the UN estimated that 700,000 people die each year because they are given counterfeit drugs. Bright Simons wanted to find a way to prevent malaria patients from receiving counterfeit medicine.
Here's how Bright Simons's solution works: Each box of medicine from the manufacturer has a label with a number on it. Someone can then take that number and text it to mPedigree. If that number comes from a manufacturer, mPedigree will let them know. If that number comes from a counterfeiter who just put a random number on the box, mPedigree can tell the patient not to take the medicine.
You now have all the knowledge needed to implement Bright Simons's solution in Python. How can we write a function that generates these numbers for his solution?
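One possible starting point is sketched below. It is not mPedigree's actual scheme, which this text does not describe. It assumes a toy rule invented for this example: a label is 10 digits long, and the last digit must equal the sum of the first nine digits modulo 10. The names CODE_LENGTH, check_digit, generate_code, and is_valid are all hypothetical.

```python
import random

CODE_LENGTH = 10  # hypothetical: total number of digits on a label


def check_digit(digits):
    """Return the sum of the digits, modulo 10, as a one-character string."""
    total = 0
    for ch in digits:
        total += int(ch)
    return str(total % 10)


def generate_code(rng):
    """Make CODE_LENGTH - 1 random digits, then append their check digit."""
    body = ""
    for i in range(CODE_LENGTH - 1):
        body += str(rng.randint(0, 9))
    return body + check_digit(body)


def is_valid(code):
    """Return True if code has the right length, is all decimal digits, and
    its last digit matches the check digit of the digits before it."""
    code = code.strip()  # ignore stray spaces from a text message
    # .isdecimal() is like .isdigit(), but stricter about what counts as a digit
    if len(code) != CODE_LENGTH or not code.isdecimal():
        return False
    return check_digit(code[:-1]) == code[-1]


rng = random.Random(2013)  # seeded only so the example is repeatable
code = generate_code(rng)
print(code, is_valid(code))    # the generated code always passes the check
print(is_valid("1234567890"))  # False: 1 through 9 sum to 45, so the last digit would have to be 5
```

With a rule this simple, about one random number in ten would still pass, so a real system would need a rule that is much harder to guess, or a record of the numbers that were actually issued.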