swisstext.mongo
This package contains common definitions of the MongoDB collections used as part of the SwissText project.
Installation
Simply run:
python setup.py install
Warning
The cityhash library is compiled during installation, so you need GCC and G++ installed, as well as the Python header files. On Ubuntu, run:
apt install gcc g++ libdpkg-perl python3-dev
Collections
The available collections are:
seeds: the keywords used to search for new URLs using a search engine,
sentences: the Swiss-German sentences found,
urls: the URLs crawled so far that contained at least one Swiss German sentence,
blacklist: all other URLs crawled,
users: the users and their roles (for the frontend)
About the code
This package is required by swisstext.cmd, which uses MongoEngine, and by swisstext.frontend, which uses Flask-Mongoengine.
The only way I found to make classes reusable by both is described in this issue.
In short, the package swisstext.mongo.abstract defines “regular” MongoEngine documents, but also sets the meta flag abstract to True. Thus, those classes cannot be used as-is.
When using MongoEngine, just subclass each abstract class, as done in swisstext.mongo.models:
from swisstext.mongo.abstract import AbstractMongoURL

class MongoURL(AbstractMongoURL):
    pass
When using Flask-Mongoengine, do the same, but use the abstract class as a mixin so that documents also inherit from the Flask-Mongoengine Document class:
from flask_mongoengine import MongoEngine
from swisstext.mongo.abstract import AbstractMongoURL

db = MongoEngine()

class MongoURL(db.Document, AbstractMongoURL):
    pass