forked from luyishisi/Anti-Anti-Spider
Commit
Showing 89 changed files with 353 additions and 25 deletions.
@@ -0,0 +1,12 @@
This section tests the handling of simple image captchas.

The main code is in recognise.py.

In main, you can substitute the path and file name of the image you want to parse.

The generated L.png, saved in this directory, is the image after grayscale and binary conversion.

The character segmentation results are also produced.

Users can manually label the segmented characters and add them to icon to form a new character set.
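The two preprocessing steps the README mentions (binary conversion, then character segmentation) can be sketched without any image library. This is only an illustration on a hypothetical grid of gray levels, not the commit's code: the helper names `binarize` and `column_runs` are assumptions, and the threshold of 120 is taken from the `convert_image` function in recognise.py.

```python
def binarize(gray, threshold=120):
    """Map gray levels below the threshold to 0 (black) and the rest to 255."""
    return [[0 if pix < threshold else 255 for pix in row] for row in gray]


def column_runs(binary):
    """Return (start, end) column ranges that contain at least one black pixel.

    `end` is exclusive, which matches the box convention of PIL's Image.crop.
    """
    width = len(binary[0])
    runs = []
    start = None
    for x in range(width):
        has_black = any(row[x] == 0 for row in binary)
        if has_black and start is None:
            start = x                    # first column of a new character
        elif not has_black and start is not None:
            runs.append((start, x))      # first blank column closes it
            start = None
    if start is not None:                # character touching the right edge
        runs.append((start, width))
    return runs


# Hypothetical 3x7 grayscale patch: two dark strokes separated by a light gap.
gray = [
    [250, 30, 40, 240, 245, 20, 250],
    [250, 35, 200, 240, 245, 25, 250],
    [250, 30, 40, 240, 245, 20, 250],
]
binary = binarize(gray)
runs = column_runs(binary)
print(runs)  # [(1, 3), (5, 6)]
```

Each `(start, end)` pair is the horizontal extent of one candidate character. Splitting on blank columns only works when the characters do not touch or overlap, which is why this approach is limited to simple captchas.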
@@ -0,0 +1,49 @@
# coding: utf-8

import base64
import getpass

import requests
from PIL import Image

from recognise import CaptchaRecognize


def login(username, passwd):
    session = requests.session()
    # Load the login page first so the session picks up its cookies.
    session.get('http://wsxk.hust.edu.cn/login.jsp')
    img = session.get('http://wsxk.hust.edu.cn/randomImage.action').content
    with open('captcha.jpeg', 'wb') as imgfile:
        imgfile.write(img)
    imageRecognize = CaptchaRecognize()
    image = Image.open('captcha.jpeg')
    result = imageRecognize.recognise(image)
    # Each result item is a (relevance, letter) tuple; join the letters.
    string = ''.join(item[1] for item in result)
    print(string)
    data = {
        'usertype': "xs",
        'username': username,
        'password': passwd,
        'rand': string,
        'sm1': "",
        'ln': "app610.dc.hust.edu.cn"
    }
    headers = {
        'Host': "wsxk.hust.edu.cn",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "en-US,en;q=0.5",
        "Connection": "keep-alive",
        'Referer': "http://wsxk.hust.edu.cn/login.jsp",
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0"}
    session.post('http://wsxk.hust.edu.cn/hublogin.action', data=data, headers=headers)
    html = session.get('http://wsxk.hust.edu.cn/select.jsp', headers=headers).text
    print(html)
    return session


def main():
    username = input('username:')
    # The password is submitted base64-encoded.
    passwd = base64.b64encode(getpass.getpass('Passwd:').encode()).decode()
    login(username, passwd)


if __name__ == '__main__':
    main()
@@ -0,0 +1,144 @@
# coding: utf-8

import math
import os
import time

from PIL import Image


def convert_image(image):
    """Convert to grayscale, then binarize: pixels darker than 120 become black."""
    image = image.convert('L')  # grayscale
    image2 = Image.new('L', image.size, 255)
    for x in range(image.size[0]):
        for y in range(image.size[1]):
            pix = image.getpixel((x, y))
            if pix < 120:  # gray level below 120 is set to 0
                image2.putpixel((x, y), 0)
    image2.save('L.png')  # save the binarized image to inspect the result
    return image2


def cut_image(image):
    """Cut characters by black-pixel continuity: a letter starts at the first
    column that contains black and ends at the first column that does not."""
    inletter = False
    foundletter = False
    letters = []
    start = 0
    for x in range(image.size[0]):
        for y in range(image.size[1]):
            pix = image.getpixel((x, y))
            if pix == 0:
                inletter = True
        if not foundletter and inletter:
            foundletter = True
            start = x
        if foundletter and not inletter:
            letters.append((start, x))
            foundletter = False
        inletter = False
    if foundletter:  # a letter that touches the right edge
        letters.append((start, image.size[0]))
    os.makedirs('./cat', exist_ok=True)
    stamp = int(time.time())
    images = []
    for start, end in letters:
        img = image.crop((start, 0, end, image.size[1]))
        # Save each slice to show the cut; the start column keeps names unique
        # (a timestamp alone would overwrite slices cut within the same second).
        img.save('./cat/%d_%d.png' % (stamp, start))
        images.append(img)
    return images


def buildvector(image):
    """Flatten the 2-D image into a 1-D vector stored as {index: pixel}."""
    return dict(enumerate(image.getdata()))


class CaptchaRecognize:
    def __init__(self):
        self.letters = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
        self.loadSet()

    def loadSet(self):
        """Load the labelled sample images stored under ./icon/<letter>/."""
        self.imgset = []
        for letter in self.letters:
            temp = []
            for img in os.listdir('./icon/%s' % letter):
                temp.append(buildvector(Image.open('./icon/%s/%s' % (letter, img))))
            self.imgset.append({letter: temp})

    @staticmethod
    def _value(pix):
        """Return a scalar for a pixel; multi-band pixels use their first band."""
        return pix[0] if isinstance(pix, tuple) else pix

    # Magnitude (Euclidean norm) of a vector.
    def magnitude(self, concordance):
        total = 0
        for count in concordance.values():
            total += self._value(count) ** 2
        return math.sqrt(total)

    # Cosine of the angle between two vectors.
    def relation(self, concordance1, concordance2):
        topvalue = 0
        for word, count in concordance1.items():
            if word in concordance2:
                topvalue += self._value(count) * self._value(concordance2[word])
        norm = self.magnitude(concordance1) * self.magnitude(concordance2)
        # An all-black vector has zero magnitude; treat it as unrelated.
        return topvalue / norm if norm else 0.0

    def recognise(self, image):
        image = convert_image(image)  # binarize
        images = cut_image(image)  # cut out each character
        vectors = [buildvector(img) for img in images]
        result = []
        for vector in vectors:
            guess = []
            for entry in self.imgset:
                for letter, temp in entry.items():
                    if not temp:  # no samples for this letter
                        continue
                    relevance = sum(self.relation(vector, img) for img in temp)
                    guess.append((relevance / len(temp), letter))
            guess.sort(reverse=True)
            result.append(guess[0])
        return result


if __name__ == '__main__':
    directory = './temp'
    name_list = []
    for root, dirs, files in os.walk(directory):
        for filename in files:
            name_list.append(os.path.join(root, filename))

    print(name_list)
    for name in name_list:
        print(name)
        imageRecognize = CaptchaRecognize()
        # Set the image path.
        image = Image.open(name)
        print(image.mode)
        result = imageRecognize.recognise(image)
        string = ''.join(item[1] for item in result)
        print(string)
        break  # only the first image is processed
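The matching step in `recognise` compares each cut character against the labelled samples by cosine similarity and keeps the label with the highest average score. A minimal standalone sketch of that idea follows. It assumes scalar pixel values, and the 4-pixel vectors and the `best_match` helper are hypothetical and not part of the commit.

```python
import math


def magnitude(vec):
    """Euclidean norm of a vector stored as {index: value}."""
    return math.sqrt(sum(v * v for v in vec.values()))


def cosine(vec1, vec2):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(v * vec2[k] for k, v in vec1.items() if k in vec2)
    norm = magnitude(vec1) * magnitude(vec2)
    return dot / norm if norm else 0.0


def best_match(vector, samples):
    """samples maps label -> list of vectors; returns (average score, label)."""
    scores = []
    for label, vecs in samples.items():
        if vecs:  # skip labels that have no samples
            avg = sum(cosine(vector, s) for s in vecs) / len(vecs)
            scores.append((avg, label))
    return max(scores)


# Hypothetical 4-pixel vectors (0 = black, 255 = white).
samples = {
    '1': [{0: 255, 1: 0, 2: 255, 3: 0}],
    '7': [{0: 0, 1: 255, 2: 0, 3: 255}],
}
unknown = {0: 255, 1: 0, 2: 255, 3: 255}
score, label = best_match(unknown, samples)
print(label, round(score, 3))  # 1 0.816
```

Because black pixels are stored as 0, only positions that are white in both images contribute to the dot product, so the score effectively measures how well the backgrounds overlap. Inverting the binarized image before building vectors would weight the strokes instead; that is a possible variation, not something the commit does.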