chapter11-regular-expressions

import re
using re.search() like find() and startswith()
if re.search('From:',line): print line
if line.find('From:')>=0：print line 
if line.search('^From:',line): 
^X.*:    # began with'X'followed by any charactor(including space) many times (.*), then followed by':'
result X-Sieve-DSPAM: (X-Plane is big:)
fine-tuning match - no spaces between X and :, ^X-\S+:
'\S' means match any non-whitespace character, '+' means one or more times

using re.findall() like atpos=find ('@'),sppos=find(' ',atpos),var[atpos+1:sppos] 
to extract the matched string
x ='my 2 best numbers are 9 and 12'
y = re.findall('[0-9]+',x) # [] is one charactor, [0-9]+ means at least one digit
print y -> ['2','9','12']

x='From: Using the : character'
y=re.findall('^F.+:',x) --> print y --> ['From: Using the :']  # gready matching takes the largest string
y=re.findall('^F.+?:',x)--> print y --> ['From:']  # not gready matching takes the shortest string
y=re.findall('\S+@\S+',x)--> print y --> stephen.marquard@uct.ac.za
y=re.findall('^From (\S+@\S+)',x) --> print y --> stephen.marquard@uct.ac.za 
                      #()are not part of match, but where to start and stop in the return
y=re.findall('@()',x) # '@ meanslook through the string until you find an at sign       
            ('@([^ ]*)',x) # inside [], ^ means NOT, so [^ ]means a signle non-blank character, * means report [^ ] many times
            print y -> uct.ac.za
y=re.findall('^From .*@([^ ]*)',x)      print y -> uct.ac.za

stuff=re.findall('^X-DSPAM-Confidence: ([0-9.]+)',line)
if len(stuff) !=1 : continue
num = float(stuff[0])
numlist.append(num)
print max(numlist)

if you want a special regular expression character to just behave normally, prefix it with'\'
y=re.findall('\$[0-9.]+',x)   print y -> ['$10.00'] # \$ is a real dollar sigm, [0-9.] means a digit OR period, note that . is not a wild card here
[a-z0-9] match a lowercase OR a digit
The "+" matches at least one character and the "*" matches zero or more characters

^	Matches the beginning of a line
$	Matches the end of the line
.	Matches any character
\s	Matches whitespace
\S	Matches any non-whitespace character
*	Repeats a character zero or more times
*?	Repeats a character zero or more times (non-greedy)
+	Repeats a character one or more times
+?	Repeats a character one or more times (non-greedy)
[aeiou]	Matches a single character in the listed set
[^XYZ]	Matches a single character not in the listed set
[a-z0-9]	The set of characters can include a range
(	Indicates where string extraction is to start
)	Indicates where string extraction is to end

import re
handle = open (r'C:\Users\Sharon\Desktop\Python\regex_sum_335980.txt') #put r before the string will slove the '\r' problem
total = 0
for line in handle:
    numlist = re.findall('[0-9]+',line)  # the type of the retrun is list
    if len(numlist) == 0: continue
    for i in numlist:
        i = int(i)
        total = total + i
print total