Tool implementations
This package contains various implementations of the different search engine tools.
See also
interfaces: the tool interface definitions.
config: the default configuration instantiates tools from this package.
Savers
class swisstext.cmd.searching.tools.console_saver.ConsoleSaver(**kwargs)
Bases: swisstext.cmd.searching.interfaces.ISaver
Implementation of an ISaver useful for testing and debugging. It does not persist any results; instead, it prints everything to the console.
class swisstext.cmd.searching.tools.mongo_saver.MongoSaver(host='localhost', port=27017, db='st1', **kwargs)
Bases: swisstext.cmd.searching.interfaces.ISaver
This ISaver implementation persists everything to a MongoDB database.
See also
swisstext.mongo: package defining the Mongo collections.
__init__(host='localhost', port=27017, db='st1', **kwargs)
host and port locate the MongoDB server; db is the name of the database to use.
link_exists(url: str) → swisstext.cmd.searching.interfaces.ISaver.LinkStatus
Test whether url already exists in the persistence layer. Returns false by default.
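For unit tests, it can help to have a saver that keeps links in memory instead of in MongoDB. The sketch below is hypothetical. It does not subclass the real ISaver, and the helper add_url is invented for illustration. The LinkStatus members (NEW, KNOWN) are also invented, because the actual members of ISaver.LinkStatus are not documented here.

```python
from enum import Enum


class LinkStatus(Enum):
    # hypothetical members; the real ISaver.LinkStatus may differ
    NEW = "new"
    KNOWN = "known"


class InMemorySaver:
    """Hypothetical stand-in for MongoSaver that keeps URLs in a set."""

    def __init__(self):
        self._urls = set()

    def add_url(self, url: str) -> None:
        # hypothetical helper: not part of the documented saver API
        self._urls.add(url)

    def link_exists(self, url: str) -> LinkStatus:
        # test whether the url is already in the in-memory "persistence layer"
        return LinkStatus.KNOWN if url in self._urls else LinkStatus.NEW


saver = InMemorySaver()
saver.add_url("https://example.com/a")
print(saver.link_exists("https://example.com/a"))  # LinkStatus.KNOWN
print(saver.link_exists("https://example.com/b"))  # LinkStatus.NEW
```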
Searchers
This module lets you use the Google Custom Search API to retrieve URLs.
Example usage
Create a factory:
from swisstext.cmd.searching.tools.google_search import GoogleGeneratorFactory

factory = GoogleGeneratorFactory(apikey="your apikey")
To retrieve a list of results, you can use the interface’s top_results method:
# retrieve at most 13 results from a search using the interface
results = factory.top_results(query="es isch sone seich", max_results=13)
If you want to use the iterator, we advise using itertools.islice, which stops silently when the results run out instead of raising StopIteration:
# retrieve at most 13 results from a search using the builtin python iterator
import itertools

results_iterator = factory.search(query="es isch sone seich")
results = list(itertools.islice(results_iterator, 13))  # won't throw StopIteration
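To see why islice is convenient, the sketch below replaces factory.search with a hypothetical fake_search generator that yields only three URLs. Asking for 13 results simply returns the three available ones, and no StopIteration escapes.

```python
import itertools
from typing import Iterator


def fake_search(query: str) -> Iterator[str]:
    # hypothetical stand-in for factory.search(): only 3 results exist
    for i in range(3):
        yield f"https://example.com/{query}/{i}"


results = list(itertools.islice(fake_search("seich"), 13))
print(len(results))  # 3
```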
Iterators are useful when you need to process URLs in a loop, or when you have a more complex stop criterion than just the number of results. For that, you can use the builtin iterator interface:
# process results one by one using the builtin python iterator
import sys

# iter() works whether search() returns an iterator or just an iterable
results_iterator = iter(factory.search(query="es isch sone seich"))
for i in range(13):  # usually, this would be: while some_dynamic_condition
    try:
        url = next(results_iterator)
    except StopIteration:
        # even though Google is pretty good at retrieving billions of results,
        # you might hit the limit...
        print("Oops, no result left!", file=sys.stderr)
        break
    print(f"Processing url: {url}")
    # ... do something more with the result ...
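The loop above hints at a dynamic stop condition. As a sketch, the loop below stops once it has collected a given number of URLs from .ch domains, or when the results run out. Both fake_search and the .ch criterion are hypothetical illustrations and are not part of swisstext.

```python
import sys
from typing import Iterator
from urllib.parse import urlparse


def fake_search(query: str) -> Iterator[str]:
    # hypothetical stand-in for factory.search()
    yield from [
        "https://example.com/a",
        "https://example.ch/b",
        "https://example.org/c",
        "https://example.ch/d",
        "https://example.ch/e",
    ]


wanted = 2
swiss_urls = []
results_iterator = iter(fake_search("es isch sone seich"))
while len(swiss_urls) < wanted:  # dynamic stop criterion
    try:
        url = next(results_iterator)
    except StopIteration:
        print("Oops, no result left!", file=sys.stderr)
        break
    # keep only URLs whose host name ends with .ch
    if (urlparse(url).hostname or "").endswith(".ch"):
        swiss_urls.append(url)

print(swiss_urls)  # ['https://example.ch/b', 'https://example.ch/d']
```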
You can also use the GoogleGenerator.next() and GoogleGenerator.has_next() methods in place of the try-except block, like so:
# process results one by one using the Google Iterator methods
import sys

results_iterator = factory.search(query="es isch sone seich")
for i in range(13):
    if not results_iterator.has_next():
        print("Oops, no result left!", file=sys.stderr)
        break
    url = results_iterator.next()
    print(f"Processing url: {url}")
    # ... do something more with the result ...
swisstext.cmd.searching.tools.google_search.BASE_URL = 'https://www.googleapis.com/customsearch/v1'
The Google API URL. See https://developers.google.com/custom-search/json-api/v1/reference/cse/list#request for the JSON API reference.
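The sketch below shows how a request against BASE_URL could be assembled. It uses the standard Custom Search JSON API query parameters key, cx (the search engine context), q, start and num. Whether GoogleGenerator builds its URLs exactly this way is an assumption, and the API key is a placeholder. No request is sent.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/customsearch/v1"


def build_request_url(query: str, apikey: str, context: str,
                      start: int = 1, num: int = 10) -> str:
    # start is the 1-based index of the first result; num is the page size
    # (the JSON API returns at most 10 results per request)
    params = {"key": apikey, "cx": context, "q": query, "start": start, "num": num}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_request_url("es isch sone seich", apikey="YOUR_API_KEY",
                        context="015058622601103575455:cpfpm27mio8", start=11)
print(url)
```

Note that urlencode escapes the colon in the context to %3A and encodes spaces in the query as +.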
class swisstext.cmd.searching.tools.google_search.GoogleGenerator(query, apikey: str, context='015058622601103575455:cpfpm27mio8', qps=20, qpm=200)
Bases: collections.abc.Iterable, typing.Generic
A new GoogleGenerator should be created for each query. Results are loaded lazily, which spares API quota.
Warning
This generator might raise an Exception, for example if you have reached your daily quota limit. In this case, the exception message should contain the error code and error message as delivered by the Google API.
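Iteration can fail midway, for example when the daily quota is exhausted. In that case it may be worth keeping the URLs retrieved so far. The sketch below catches the exception and returns the partial results along with the error message. The failing_search function is a hypothetical stand-in that raises after two results, and its error message is made up. Catching a bare Exception mirrors the warning above, which does not name a more specific type.

```python
import itertools
from typing import Iterable, Iterator, List, Optional, Tuple


def collect_urls(results: Iterable[str], limit: int) -> Tuple[List[str], Optional[str]]:
    """Collect up to `limit` URLs; on error, return what was collected so far."""
    urls: List[str] = []
    try:
        # islice never pulls more than `limit` items from the generator
        for url in itertools.islice(results, limit):
            urls.append(url)
    except Exception as e:  # e.g. a quota error reported by the API
        return urls, str(e)
    return urls, None


def failing_search(query: str) -> Iterator[str]:
    # hypothetical stand-in: yields two results, then fails (made-up message)
    yield "https://example.com/1"
    yield "https://example.com/2"
    raise RuntimeError("403: Daily Limit Exceeded")


urls, error = collect_urls(failing_search("seich"), limit=13)
print(urls, error)
```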
__init__(query, apikey: str, context='015058622601103575455:cpfpm27mio8', qps=20, qpm=200)
Create a generator for query, authenticated with the Google Custom Search API key apikey and using the given search context (see ctx).
ctx = None
The context to use (see the official API reference for more info). The default context is usually fine: it is configured to search the entire web.
key = None
The Google Custom Search API key.
next() → str
Note
If you use this method directly (instead of the classic Python iterator interface), you must check yourself that results are available, using the has_next() method.
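The lazy loading described for GoogleGenerator can be sketched as an iterator that fetches a new page of results only when the current one is exhausted. It exposes the same has_next()/next() pair alongside the standard iterator protocol. The LazyPager class and its fetch_page callback are hypothetical; the real class queries the Google API instead.

```python
from typing import Callable, Iterator, List


class LazyPager:
    """Hypothetical lazy iterator that fetches result pages on demand."""

    def __init__(self, fetch_page: Callable[[int], List[str]]):
        self._fetch_page = fetch_page  # maps a page index to a list of URLs
        self._page = 0
        self._buffer: List[str] = []
        self._exhausted = False

    def has_next(self) -> bool:
        # only hit the (hypothetical) API when the local buffer is empty
        if not self._buffer and not self._exhausted:
            results = self._fetch_page(self._page)
            self._page += 1
            if results:
                self._buffer.extend(results)
            else:
                self._exhausted = True  # an empty page means no more results
        return bool(self._buffer)

    def next(self) -> str:
        if not self.has_next():
            raise StopIteration
        return self._buffer.pop(0)

    # also support the classic python iterator protocol
    def __iter__(self) -> Iterator[str]:
        return self

    def __next__(self) -> str:
        return self.next()


pages = [["url-a", "url-b"], ["url-c"]]
calls = []


def fetch(index: int) -> List[str]:
    calls.append(index)  # record each simulated API call
    return pages[index] if index < len(pages) else []


pager = LazyPager(fetch)
first = pager.next()  # triggers a single fetch
print(first, calls)  # url-a [0]
rest = list(pager)  # fetches the remaining pages, then stops
print(rest, calls)  # ['url-b', 'url-c'] [0, 1, 2]
```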
class
swisstext.cmd.searching.tools.google_search.GoogleGeneratorFactory(**kwargs)[source]¶ Bases:
swisstext.cmd.searching.interfaces.ISearcherThis factory creates a new
GoogleGeneratorfor each query.
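The factory pattern can be sketched as follows. Here search() returns a fresh generator for every call, and top_results() uses itertools.islice to collect a bounded list. SketchFactory is hypothetical; the real GoogleGeneratorFactory implements ISearcher and returns GoogleGenerator instances. The method names below follow the usage examples at the top of this module, but the real signatures may differ.

```python
import itertools
from typing import Callable, Iterator, List


class SketchFactory:
    """Hypothetical factory: one new generator per query."""

    def __init__(self, make_generator: Callable[[str], Iterator[str]]):
        self._make_generator = make_generator

    def search(self, query: str) -> Iterator[str]:
        # a new, independent generator is created for each call
        return self._make_generator(query)

    def top_results(self, query: str, max_results: int) -> List[str]:
        # bounded retrieval on top of the lazy generator
        return list(itertools.islice(self.search(query), max_results))


def fake_generator(query: str) -> Iterator[str]:
    # hypothetical stand-in for GoogleGenerator(query, apikey=...)
    for i in itertools.count():
        yield f"https://example.com/{query}/{i}"


factory = SketchFactory(fake_generator)
print(factory.top_results(query="seich", max_results=3))
# ['https://example.com/seich/0', 'https://example.com/seich/1', 'https://example.com/seich/2']
```

Because each call to search() builds a new generator, two searches for the same query do not share state.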
This module implements an ISearcher using http://startpage.com/.
Warning
This module hijacks startpage.com! There is no API available, so this is really a dirty hack, but we couldn't find a free API to run tests. So use this module sparingly and ONLY IN DEVELOPMENT. We decline all responsibility in case startpage detects the robot.
class swisstext.cmd.searching.tools.start_page.StartPageGenerator(query)
Bases: collections.abc.Iterable, typing.Generic
A new generator is created for each query. Its usage is similar to the generator described in the google_search module.
class swisstext.cmd.searching.tools.start_page.StartPageGeneratorFactory(**kwargs)
Bases: swisstext.cmd.searching.interfaces.ISearcher
Implementation of a searcher using startpage. Its usage is similar to the factory described in the google_search module.