Before looking at what a substring is in Python, let’s first review what a string is, since a substring is simply a part of one.
- String
- What is a Substring?
- How a Substring can be generated from a given String
- Slicing in Python
- What is String Slicing In Python?
- Syntax of Slicing Operator
- Different Methods of Slicing Strings In Python
String
A string in Python is a sequence of characters, which may include letters, digits, and special characters. Strings are one of the most commonly used data types in Python. A string is created by enclosing characters in quotation marks, and Python treats single quotes the same as double quotes. Creating a string is as simple as assigning a value to a variable.
For Example:
Variable1 = "Hello Python"
Variable2 = "Welcome to the world of Python"
What is a Substring?
Imagine a car company that needs to pull out the last five digits of a chassis number quickly and efficiently. The solution to this problem lies in the concept of a substring. Let’s read along to know more about substrings.
In formal language theory and computer science, a substring is a contiguous sequence of characters within a string.
In other words, a substring is a part of a string, and Python offers several techniques for working with one, such as checking whether a string contains a substring or finding the index of a substring.
Put another way, a substring is a part of a string in which the characters keep their original order and stay next to each other.
For example, “This is great work. We must pursue it.” is a string, and “We must pursue it” is a substring of it.
In Python, a substring can be extracted by using slicing.
Programmers often need to split data into parts for a specific purpose. For example, if a developer has a user's full name and needs only the first name, the data has to be split into two parts: forename and surname.
So how does a developer do this in Python?
The answer is “string slicing.” In Python, string slicing is a technique for getting a specific part of a string, and that part is the “substring.”
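As a minimal sketch of the full-name case described above: the name below is made up for illustration, and the code assumes the name contains exactly one space separating forename and surname.

```python
# Hypothetical full name, used only for illustration.
full_name = "Ada Lovelace"

# Locate the space, then slice on either side of it.
space_at = full_name.find(" ")
forename = full_name[:space_at]        # characters before the space
surname = full_name[space_at + 1:]     # characters after the space

print(forename)  # Ada
print(surname)   # Lovelace
```

Real names can contain several spaces or none at all, so production code would need extra checks before slicing this way.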
How a Substring can be generated from a given String?
There are several techniques for generating a substring from a string in Python, and the slicing operation is one of the most widely used.
Slicing in Python
A string is a sequence of characters, and each character can be accessed by its position. This is known as indexing: it returns a one-character string at the specified position, or offset.
When a section of the string is required rather than a single character, slicing is the technique to use.
What is String Slicing In Python?
Slicing can be explained as a generalized form of indexing that returns an entire required section in a single step instead of a single item. With the help of slicing, many activities can be performed, like extracting columns of data, stripping off leading and trailing characters, and much more.
A very simple concept is used in slicing. When a string is indexed using a pair of offsets separated by a colon (:), Python returns a new string object that contains the section identified by the offset pair.
In the offset pair, the left offset (lower bound) is inclusive, and the right offset (upper bound) is non-inclusive. If an offset is omitted, the left bound defaults to 0 and the right bound defaults to the length of the string that you are slicing.
Let’s go into the details to understand the syntax of the Slicing operator.
Syntax of Slicing Operator
As we have already read earlier, the slicing operator is considered one of the best methods that can be used for the creation of a substring.
Let’s understand the syntax of the slicing operator:
string[startIndex:endIndex:step]
where,
startIndex: The starting index of the substring. The character at this index is included in the substring. If startIndex is not set, it is assumed to be 0.
endIndex: The index at which the substring stops. The character at this index is not included in the substring. If endIndex is not set, it defaults to the length of the string.
step: How many characters to move forward after each character is retrieved from the string. Its default value is 1.
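The three parameters can be seen side by side in a short sketch. The sample string and the variable names are chosen only for illustration.

```python
text = "pythonknowledge"  # sample string for illustration

# Start is inclusive, end is exclusive: indexes 0 through 5.
first_six = text[0:6]        # 'python'

# Omitted bounds fall back to the defaults (0 and len(text)).
same_first_six = text[:6]    # 'python'
tail = text[6:]              # 'knowledge'

# A step of 2 keeps every second character of the selected range.
every_second = text[0:6:2]   # 'pto'

# Out-of-range bounds are clamped rather than raising an error.
clamped = text[10:100]       # 'ledge'

print(first_six, tail, every_second, clamped)
```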
Different Methods of Slicing Strings In Python
There are several ways to create a substring, and most of them are forms of the slicing operator that produce different kinds of output. Let’s go through them one by one with examples.
Using start index and end index ([start:end])
When both the start index and the end index are specified in the slicing operator, the generated substring includes the character at the start index but excludes the character at the end index. Let’s understand this with an example.
Example:
Let’s see an example where the original string is sliced by passing both the start and the end value.
originalString = 'vectorAcademy'
subString = originalString[1:7]
print('originalString: ', originalString)
print('subString: ', subString)
Output:
originalString: vectorAcademy
subString: ectorA
Explanation:
Firstly, an original string is created.
Secondly, the slicing operator is used with both a startIndex and an endIndex.
Finally, in the resulting output, the character at startIndex is included while the character at endIndex is excluded.
Using start index without end index ([start:])
When only the start index is specified in the slicing operator and the end index is omitted, the generated substring begins at the start index and runs to the end of the string.
Let’s check the example of this type of case.
Example:
In this example, the slicing of the original string is done by only passing the start value.
originalString = 'pythonknowledge'
subString = originalString[5:]
print('originalString: ', originalString)
print('subString: ', subString)
Output:
originalString: pythonknowledge
subString: nknowledge
Explanation:
Firstly, an original string is created.
Then, a slicing operator is used in which a startIndex is passed.
Finally, in the received output, we see that the character at startIndex is included and the substring is generated till the end of the string.
Using end index without start index ([:end])
When only the endIndex is specified in the slicing operator and the startIndex is omitted, the substring begins at the start of the string and stops just before the endIndex.
Let’s check the example of this type of case.
Example:
In this example, the original string is sliced by passing only the endIndex.
originalString = 'vectorAcademy'
subString = originalString[:10]
print('originalString: ', originalString)
print('subString: ', subString)
Output:
originalString: vectorAcademy
subString: vectorAcad
Explanation:
Firstly, an original string is created.
Then, the slicing operator is used with only an endIndex.
In the final output, we find that a substring is generated which starts from the beginning of the string and ends at the position where endIndex is specified.
Using complete string ([:])
When neither the start index nor the end index is specified in the slicing operator, the generated substring runs from the beginning to the end of the string. In other words, it is a replica of the string.
Let’s check this case by example.
Example:
In this example, the original string is sliced by passing no values in the slicing operator.
originalString = 'pythonKnowledge'
subString = originalString[:]
print('originalString: ', originalString)
print('subString: ', subString)
Output:
originalString: pythonKnowledge
subString: pythonKnowledge
Explanation:
Firstly, an original string is created.
Then, a slicing operator is used to generate a substring in which no parameters are specified.
In the final result, we see that the output is just the same as the input.
Using a single character from a string ([index])
When a single index is given in the square brackets, the result is the single character at that index. Strictly speaking, this is indexing rather than slicing.
Let’s understand this by example.
Example:
In this example, one character is taken from the original string by passing a single index position.
originalString = 'vectorAcademy'
subString = originalString[5]
print('originalString: ', originalString)
print('subString: ', subString)
Output:
originalString: vectorAcademy
subString: r
Explanation:
Firstly, an original string is created.
After that, the string is indexed with a single position.
Finally, as an output, we get a character printed which was at the position where the index was specified.
Using start index, end index and step ([start:end:step])
When the start index, the end index, and the step are all specified in the slicing operator, the substring runs from the start index up to (but not including) the end index, taking one character every step positions. The default value of step is 1.
Example:
Let’s see an example where the original string is sliced by passing the start, end, and step values.
originalString = 'pythonknowledge'
subString = originalString[2:12:2]
print('originalString: ', originalString)
print('subString: ', subString)
Output:
originalString: pythonknowledge
subString: tokol
Explanation:
Firstly, an original string is created.
Then, the slicing operator is used with a startIndex, an endIndex, and a step.
In the final result, the character at startIndex is included, the character at endIndex is excluded, and the characters in between are picked at intervals of the given step.
Using Negative Index ([-index])
Python also supports negative indexing. In this scheme, the characters of the string are numbered from right to left with negative numbers, so -1 refers to the last character.
Example:
In this example, a character is picked from the original string by passing a negative (-) index.
originalString = 'vector Academy'
subString = originalString[-5]
print('originalString: ', originalString)
print('subString: ', subString)
Output:
originalString: vector Academy
subString: a
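Negative indexes also work inside a slice, which is one way to approach the chassis-number question raised earlier. The sketch below uses a made-up 17-character chassis number purely for illustration.

```python
# Hypothetical chassis number, invented for this example.
chassis_number = "MA1TA2B3C4D567890"

# A negative start counts from the right: begin five characters from the end.
last_five = chassis_number[-5:]

# A negative end stops five characters before the end.
without_last_five = chassis_number[:-5]

print(last_five)           # 67890
print(without_last_five)   # MA1TA2B3C4D5
```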
Using Positive Index ([start:end])
In this case, we use positive indexes to generate a substring from the string.
Example:
In this example, we slice the original string by passing only positive (+) values.
originalString = 'vectorAcademy'
subString = originalString[2:5]
print('originalString: ', originalString)
print('subString: ', subString)
Output:
originalString: vectorAcademy
subString: cto
Explanation:
First of all, we create the string from which the substring will be generated.
Then, we pass positive indexes to the slicing operator.
As a result, the substring is printed as the output.
Using List Comprehension
List comprehension is a technique that offers a shorter syntax for creating a new list based on the values of an existing list. For example, from a list of vegetables you might want a new list containing only the vegetables with the letter “c” in the name.
More generally, list comprehensions create new lists from any iterable, such as strings, tuples, or other lists.
A list comprehension consists of brackets containing an expression, which is evaluated for each element, together with a for clause that iterates over the elements.
Syntax:
This returns the new list, keeping the old list unchanged.
newList = [expression for item in iterables]
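The vegetable example mentioned above can be sketched as follows. The list contents are made up for illustration, and the optional if clause at the end of the comprehension does the filtering.

```python
# Hypothetical list of vegetables for illustration.
vegetables = ["carrot", "potato", "cucumber", "spinach", "pea"]

# Keep only the names that contain the letter "c".
with_c = [name for name in vegetables if "c" in name]

print(with_c)       # ['carrot', 'cucumber', 'spinach']
print(vegetables)   # the original list is left unchanged
```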
We can use the combination of list comprehension and string slicing to get all the substrings that can be generated by a string.
Example:
We will create all the possible substrings that can be generated by the word VECTOR.
originalString = 'VECTOR'
allSubstrings=[originalString[i:j] for i in range(len(originalString)) for j in range(i+1,len(originalString)+1)]
print(allSubstrings)
Output:
['V', 'VE', 'VEC', 'VECT', 'VECTO', 'VECTOR', 'E', 'EC', 'ECT', 'ECTO', 'ECTOR', 'C', 'CT', 'CTO', 'CTOR', 'T', 'TO', 'TOR', 'O', 'OR', 'R']
Explanation:
First, a string is created whose substrings are to be generated.
Then, a list comprehension is used together with the slicing operator. The outer loop (over i) supplies the starting position and the inner loop (over j) supplies the ending position of each slice.
Finally, the list of all substrings is printed.
Using itertools.combinations()
All substrings of a string can also be generated with the built-in combinations() function from the itertools library, which produces every possible pair of start and end positions for a slice.
Example:
Let’s have a look at how to generate all the substrings of a string using the library function combinations().
from itertools import combinations
originalString = 'VECTOR'
res = [originalString[x:y] for x, y in combinations(range(len(originalString) + 1), r = 2)]
print("All substrings of string are : " + str(res))
Output:
All substrings of string are : ['V', 'VE', 'VEC', 'VECT', 'VECTO', 'VECTOR', 'E', 'EC', 'ECT', 'ECTO', 'ECTOR', 'C', 'CT', 'CTO', 'CTOR', 'T', 'TO', 'TOR', 'O', 'OR', 'R']
Explanation:
It starts with importing the combinations() function from the itertools library.
Then a string is created whose substrings are to be generated. The created string is stored in a variable.
Then combinations() is used to produce each pair of start index and end index for the slices.
At last, the list of all the substrings is printed and we get the desired output.
Check if Python String Contains Substring Using in operator
The ‘in’ operator in Python can check whether a string contains a substring. This is the easiest way. It returns a boolean value, either True or False.
Example:
originalString = "pythonknowledge"
subString = "wledge"
if subString in originalString:
    print('found substring')
else:
    print('no substring found')
Output:
found substring
Explanation:
In this process, an original string and a substring are created and stored in two different variables.
Then, an if-else statement uses the ‘in’ operator to check whether the substring is present in the string or not.
Finally, the output states whether the substring is present in the string or not.
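One detail worth knowing is that the in operator compares characters exactly, so the check is case-sensitive. The sketch below shows this with a sample string, and lowercasing both sides is one simple way (an assumption about what you want, not the only option) to ignore case.

```python
originalString = "PythonKnowledge"   # sample string with capital letters
subString = "knowledge"

# Exact comparison: lowercase 'k' does not match uppercase 'K'.
exact_match = subString in originalString

# Lowercasing both sides makes the check ignore case.
ignoring_case = subString.lower() in originalString.lower()

print(exact_match)     # False
print(ignoring_case)   # True
```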
Using String.index() Method
The Python string index() method can be used to find the starting index of the first occurrence of a substring in a string.
If the substring is not found in the string, index() raises a ValueError, which needs to be handled with a try-except statement.
Syntax:
The index() method, called on a string, finds the index of a substring within that string. It takes three parameters, of which only the first is required:
Value: Value, whose index position is to be found in the string.
Start: It is the starting index. Its default value is 0.
End: It is the ending index. End of the string is its default value.
string.index(value, start, end)
Example:
originalString = "vectorAcademy"
subString = "damy"
try:
    originalString.index(subString)
except ValueError:
    print("substring not found")
else:
    print("substring found")
Output:
substring not found
Explanation:
An original string and a substring are created and stored in two different variables.
Then, a try-except-else statement is used, in which index() looks for the first occurrence of the substring.
Finally, the output states whether the substring is present in the string or not. If the substring is not present, the resulting ValueError is handled by the except block.
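The optional start and end parameters described in the syntax restrict the search to a window of the string. A short sketch, with variable names chosen only for illustration:

```python
originalString = "vectorAcademy"

# Without start/end, the first occurrence is reported.
first_c = originalString.index("c")        # 2

# Starting the search at index 3 skips the first "c".
second_c = originalString.index("c", 3)    # 7

# Indexes 3 to 6 contain no "c", so index() raises ValueError.
try:
    originalString.index("c", 3, 7)
except ValueError:
    found_in_window = False
else:
    found_in_window = True

print(first_c, second_c, found_in_window)
```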
Using String.find() Method
The string type has another method called find(), which is more convenient to use than index() because there are no exceptions to handle. It returns the index of the first occurrence of the substring in the string.
If find() does not find a match, it returns -1; otherwise, it returns the leftmost index of the substring in the larger string.
Syntax:
The find() method, called on a string, finds the index of a substring within that string. It takes the following parameters, of which only the first is required:
Value: Value whose index position is to be found in the string.
Start: It is a Starting index and its default value is 0.
End: It is an ending index and its default value is the end of the string.
string.find(value, start, end)
Example:
originalString = "pythonknowledge"
subString = "thonkn"
if originalString.find(subString) == -1:
    print('substring is not present in the original string')
else:
    print('substring is present in the original string')
Output:
substring is present in the original string
Explanation:
At the start, an original string and a substring are created and stored in two different variables.
Then, an if-else statement uses find() to check whether the substring is present in the string or not.
Finally, the output states whether the substring is present in the string or not. If the string doesn’t contain the searched substring, find() returns -1.
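Beyond a yes/no check, the value returned by find() can feed straight into a slice. The sketch below uses sample substrings chosen for illustration.

```python
originalString = "pythonknowledge"

position = originalString.find("know")   # 6, where "know" begins
missing = originalString.find("java")    # -1, because "java" is absent

# Check for -1 before slicing with the result.
if position != -1:
    rest = originalString[position:]     # 'knowledge'
else:
    rest = ""

print(position, missing, rest)
```

The check matters because -1 is itself a valid index (the last character), so slicing with an unchecked result would quietly return the wrong text instead of failing.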
Using Regular Expression
Regular expressions allow strings to be checked for pattern matches in a more flexible manner. To use regular expressions in Python, the re module is used. The re module has a function called search(), which looks for a pattern anywhere in the string. Because the substring is treated as a pattern, characters such as . or * have a special meaning unless they are escaped.
Example:
from re import search
originalString = "vectorAcademy"
subString = "orAca"
if search(subString, originalString):
    print('substring is present in the original string')
else:
    print('substring is not present in the original string')
Output:
substring is present in the original string
Explanation:
First of all, an original string and a substring are created and stored in two different variables.
Then, an if-else statement uses search() to check whether the substring is present in the string or not.
Finally, the output states whether the substring is present in the string or not.
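When the substring may contain characters that are special in regular expressions, re.escape() from the same module can be used to make them literal. The strings below are hypothetical and chosen to show the difference.

```python
import re

subString = "3.50"   # hypothetical substring containing a regex metacharacter

# Unescaped, the dot matches any character, so "3x50" counts as a match.
loose = re.search(subString, "total: 3x50") is not None

# re.escape() makes every character in the substring literal.
strict = re.search(re.escape(subString), "total: 3x50") is not None
literal = re.search(re.escape(subString), "total: 3.50") is not None

print(loose, strict, literal)   # True False True
```

For plain substring checks the in operator is usually simpler. A regular expression is a better fit when the thing being searched for is genuinely a pattern.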
Count of Substring Occurrence
In Python, the count() method is used to find the number of occurrences of a word or a substring in a string.
In the example below, we will see how count() is used to find how many times a substring occurs in a string.
Example:
originalString = 'this article is published on scaler topics.'
countOfSubStringS = originalString.count('s')
countOfSubStringIs = originalString.count('is')
print('count of substring s in original string: ', countOfSubStringS)
print('count of substring is in original string: ', countOfSubStringIs)
Output:
count of substring s in original string: 5
count of substring is in original string: 3
Explanation:
In the first step, an original string is created and stored in a variable.
In the second step, count() is called once for each substring to find how often it occurs in the string.
In the third step, the two counts are stored in two different variables.
Finally, the result is printed on the output screen.
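One point to keep in mind is that count() counts non-overlapping occurrences. If overlapping occurrences matter, a small loop around find() is one possible approach. The helper name count_overlapping below is hypothetical, not a built-in.

```python
text = "aaaa"   # sample string in which "aa" overlaps with itself

# count() skips past each match, so it sees "aa" at indexes 0 and 2 only.
non_overlapping = text.count("aa")


def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing them to overlap."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Move forward by one character so overlapping matches are seen.
        start = text.find(sub, start + 1)
    return count


overlapping = count_overlapping(text, "aa")

print(non_overlapping)   # 2
print(overlapping)       # 3
```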
Find all Indexes of a Substring
In Python, there is no built-in string method that returns the list of all indexes of a substring. For this, a user-defined function is needed, which can use find() repeatedly to collect every index of the substring.
Example:
def findAllIndexOfSubString(originalString, subString):
    index = []
    originalStringLength = len(originalString)
    currentIndex = 0
    while currentIndex < originalStringLength:
        indexOfOccurrence = originalString.find(subString, currentIndex)
        if indexOfOccurrence == -1:
            return index
        index.append(indexOfOccurrence)
        currentIndex = indexOfOccurrence + 1
    return index
originalString = 'the scaler topics is the best platform for python articles.'
subString = 'th'
print('all index of substring in the original string are: ',findAllIndexOfSubString(originalString, subString))
Output:
all index of substring in the original string are: [0, 21, 45]
Explanation:
Initially, a user-defined function is created which accepts two parameters, the original string and the substring.
Then a loop runs until the complete string has been searched.
Inside the loop, find() returns the index of the next occurrence of the substring, searching from the current position.
If the substring is not found again, find() returns -1 and the function returns the indexes collected so far.
Once the user-defined function is created, we call it to get the desired output.
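As an alternative sketch, re.finditer() from the standard library can collect the same indexes in a single list comprehension. This is a different technique from the function above, and re.escape() is used here so the substring is treated as literal text.

```python
import re

originalString = 'the scaler topics is the best platform for python articles.'
subString = 'th'

# Each match object reports where its match starts.
indexes = [match.start() for match in re.finditer(re.escape(subString), originalString)]

print(indexes)   # [0, 21, 45]
```

Note that re.finditer() reports non-overlapping matches, whereas the function above moves forward one character at a time and therefore also reports overlapping ones. For 'aa' in 'aaaa' the function gives [0, 1, 2], while re.finditer() gives [0, 2].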
Conclusion
I hope you have gone through the whole article carefully. To summarise:
We started with what a substring is in Python.
Then we learned how to create a substring in Python.
Then we studied several slicing methods for creating substrings in Python.
Then we saw how various methods can help us check whether a substring is present in a string or not.
These techniques are what you need to find, for example, the last 4 digits of a mobile number or the last 5 digits of a chassis number.
Finally, we can say that we have covered a range of methods that can be applied to a string to get different kinds of results.