SwissText

Getting Started

  • Installation
    • Installation on a Linux server
    • Installation on macOS
  • Usage
    • Running MongoDB
    • Running the frontend
    • Running the backend
  • Configuration options
    • Current "best" pipeline configuration
      • Using default tools
      • Using extra tools
    • Tool dependencies

API

  • API Overview
    • swisstext.cmd
      • Configuration files
      • Link utilities
        • Dealing with links in a page
        • Dealing with search results
    • swisstext.cmd.scraping
      • st_scrape
        • dump_config
        • from_file
        • from_mongo
        • gen_seeds
      • Tool interfaces
      • Tool implementations
        • Deciders
        • Seed creators
        • Crawlers
        • Normalizers
        • Splitters
        • Sentence Filters
        • Link Filters
        • Language Detectors
        • Savers
      • Pipeline implementation
        • Configuration
        • Data structures
        • Queue
        • Pipeline
    • swisstext.cmd.searching
      • st_search
        • dump_config
        • from_file
        • from_mongo
      • Tool interfaces
      • Tool implementations
        • Savers
        • Searchers
      • Pipeline implementation
        • Configuration
        • Data structures
        • Search engine
    • swisstext.mongo
      • Installation
      • Collections
      • About the code
      • Abstract Definitions
        • Common structures and embedded documents
        • Seed collection
        • Sentences collection
        • URLs and Blacklist collections
        • Users collection
      • MongoEngine-ready classes
    • swisstext.alswiki
      • st_alswiki
        • download
        • parse
        • process
        • txt
      • Processing Alswiki dumps
      • Processing text files
  • TODOs

Other

  • Ideas
    • Sentence Filtering
    • URL Filtering
  • About this documentation
    • Generation
    • Deploying on github-pages

All modules for which code is available

  • mongoengine.fields
  • swisstext.cmd.base_config
  • swisstext.cmd.link_utils
  • swisstext.cmd.scraping.config
  • swisstext.cmd.scraping.data
  • swisstext.cmd.scraping.interfaces
  • swisstext.cmd.scraping.page_queue
  • swisstext.cmd.scraping.pipeline
  • swisstext.cmd.scraping.tools.basic_decider
  • swisstext.cmd.scraping.tools.basic_seed_creator
  • swisstext.cmd.scraping.tools.bs_crawler
  • swisstext.cmd.scraping.tools.console_saver
  • swisstext.cmd.scraping.tools.justext_crawler
  • swisstext.cmd.scraping.tools.mocy_splitter
  • swisstext.cmd.scraping.tools.mongo_saver
  • swisstext.cmd.scraping.tools.moses_splitter
  • swisstext.cmd.scraping.tools.norm_punc
  • swisstext.cmd.scraping.tools.pattern_sentence_filter
  • swisstext.cmd.scraping.tools.punkt_splitter
  • swisstext.cmd.scraping.tools.swigspot_langid
  • swisstext.cmd.searching.config
  • swisstext.cmd.searching.data
  • swisstext.cmd.searching.interfaces
  • swisstext.cmd.searching.pipeline
  • swisstext.cmd.searching.tools.console_saver
  • swisstext.cmd.searching.tools.google_search
  • swisstext.cmd.searching.tools.mongo_saver
  • swisstext.cmd.searching.tools.start_page
  • swisstext.mongo.abstract.generic
  • swisstext.mongo.abstract.seeds
  • swisstext.mongo.abstract.sentences
  • swisstext.mongo.abstract.urls
  • swisstext.mongo.abstract.users
  • swisstext.mongo.models

© Copyright 2018, Lucy Linder
