Dec 07 2012
I guess if you import enough libraries, just about anything can be made into a one-liner. If you have imported BeautifulSoup, re, requests, and sys, then in Python 3 you can simply do:
print(re.sub(r'^.*imgurl=([^&]+)&.*$', r'\1', str(BeautifulSoup(requests.get("http://images.google.com/search?num=50&hl=en&safe=off&site=&tbm=isch&source=hp&biw=1744&bih=1279&q=%s&oq=" % sys.argv[1]).text).find(href=re.compile("imgurl")))))
That finds the first hit on a Google Images search for the term in sys.argv[1]. Google will probably change its result URLs later today and it’ll stop working, but I wanted this for a random task.
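Here is a minimal sketch of what the re.sub at the end of the one-liner does. It applies the same pattern to a trimmed copy of the 2012-era result link quoted in the script's comments. The string is sample data for illustration, not a live Google response.

```python
import re

# Trimmed copy of a 2012-era Google Images result link (sample data only).
anchor = ('<a href="/imgres?imgurl=http://amirobyn.com/blog/wp-content/uploads/'
          '2009/07/monkeybreath03.jpg&imgrefurl=http://amirobyn.com/blog/%3Fp%3D16'
          '&h=318&w=620"><img alt="" /></a>')

# Same pattern as the one-liner: replace the whole string with whatever sits
# between "imgurl=" and the next "&".
image_url = re.sub(r'^.*imgurl=([^&]+)&.*$', r'\1', anchor)
print(image_url)  # → http://amirobyn.com/blog/wp-content/uploads/2009/07/monkeybreath03.jpg
```

If the pattern does not match, re.sub returns its input unchanged. A failed lookup therefore prints the whole string (or "None" when find() found nothing) instead of raising an error.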
E.g. “foo.py one+ring+to+bind+them” currently yields the URL for this beauty:
Or you can wimp out and do it the easy way:
#!/usr/local/bin/python3
#
# Search Google Images for a word (or several words, joined with pluses or
# quoted as a single argument) and print the URL of the first match.
#
# Usage: $0 name-to-search-for
#
#
# Google image URLs currently look like (for a search for "monkey+breath"):
#
# <a href="/imgres?imgurl=http://amirobyn.com/blog/wp-content/uploads/2009/07/monkeybreath03.jpg&imgrefurl=http://amirobyn.com/blog/%3Fp%3D16&usg=__y39gYotHzJkeYQ2RhxJAkQIbLf4=&h=318&w=620&sz=59&hl=en&start=1&zoom=1&tbnid=bUjKYBdrdgvHSM:&tbnh=70&tbnw=136&ei=kTrCUMzzCs_siQLd0oDwBQ&prev=/search%3Fq%3Dmonkey%2Bbreath%26hl%3Den%26safe%3Doff%26biw%3D1744%26bih%3D1279%26ie%3DUTF-8%26tbm%3Disch&itbs=1"><img src="http://t2.gstatic.com/images?q=tbn:ANd9GcS19-iKGCVUWOdUwzdighrxyDpU3HWLpDPiAcmdPHVDIgDG7U2Y5GAVX70L" alt="" width="136" height="70" /></a>
#
# A quick pass through Beautiful Soup and one regex substitution, and the real
# URL is yours; in this case, at this time:
#
# http://amirobyn.com/blog/wp-content/uploads/2009/07/monkeybreath03.jpg
#
from bs4 import BeautifulSoup
import re
import requests
import sys
if len(sys.argv) != 2:
    print("Usage: %s image-name-to-search-(can-use-pluses-between-multi-words)" % sys.argv[0])
    sys.exit(1)
# You can do it in one monster line... monster line... monster line....
# print(re.sub(r'^.*imgurl=([^&]+)&.*$', r'\1', str(BeautifulSoup(requests.get("http://images.google.com/search?num=50&hl=en&safe=off&site=&tbm=isch&source=hp&biw=1744&bih=1279&q=%s&oq=" % sys.argv[1]).text).find(href=re.compile("imgurl")))))
#
# or have a prayer of understanding it the usual way
#
url = "http://images.google.com/search?num=50&hl=en&safe=off&site=&tbm=isch&source=hp&biw=1744&bih=1279&q=%s&oq=" % sys.argv[1]
print("Searching for %s" % sys.argv[1])
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
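# find the first link whose href contains "imgurl"
big_link = soup.find(href=re.compile("imgurl"))
if big_link is None:
    sys.exit("No imgurl links found; Google has probably changed its markup.")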
real_link = re.sub(r'^.*imgurl=([^&]+)&.*$', r'\1', str(big_link))
print(real_link)
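Here is a variation that uses only the standard library. It is a swapped-in approach and not the method used above. It builds the query with urllib.parse.urlencode and reads the imgurl parameter with parse_qs rather than a regex. The helper names search_url, FirstImgurlLink, and first_image_url are hypothetical. The parameter set is a subset of the script's URL, and Google has almost certainly changed its markup since 2012, so treat this as a sketch of the parsing technique against the old /imgres?imgurl= links.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(term):
    # Hypothetical helper: urlencode escapes spaces and symbols in the term.
    # Parameters are a subset of the script's URL; Google may ignore them now.
    params = {"tbm": "isch", "safe": "off", "hl": "en", "num": 50, "q": term}
    return "http://images.google.com/search?" + urlencode(params)


class FirstImgurlLink(HTMLParser):
    """Remember the imgurl value of the first <a> whose href carries one."""

    def __init__(self):
        super().__init__()
        self.image_url = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.image_url is not None:
            return
        href = dict(attrs).get("href") or ""
        # parse_qs splits the query string on "&" and percent-decodes values.
        values = parse_qs(urlsplit(href).query).get("imgurl")
        if values:
            self.image_url = values[0]


def first_image_url(html):
    # Hypothetical helper: returns None when no link carries an imgurl.
    parser = FirstImgurlLink()
    parser.feed(html)
    parser.close()
    return parser.image_url


# Sample data: trimmed copy of the 2012-era result link from the comments above.
sample = ('<a href="/imgres?imgurl=http://amirobyn.com/blog/wp-content/uploads/'
          '2009/07/monkeybreath03.jpg&imgrefurl=http://amirobyn.com/blog/%3Fp%3D16'
          '&h=318&w=620"><img alt="" /></a>')
print(search_url("monkey breath"))
print(first_image_url(sample))
```

Unlike the regex, parse_qs still works when imgurl is the last parameter (the pattern above needs a trailing "&"), and it decodes percent-escaped URLs. To wire it in, something like print(first_image_url(requests.get(search_url(sys.argv[1])).text)) would replace the BeautifulSoup and re.sub steps. With urlencode, pass multiple words quoted with spaces ("one ring to bind them"), because a literal + would be escaped to %2B.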