Veritable | Towards Human-Centered A.I.

This A.I.-powered article reader extracts the text content of a web page, splits the article into sentences, identifies the named entities in each sentence, and highlights (in yellow) the sentences that convey the article's core ideas.

The A.I. algorithm behind the reader is based on TextRank, which was originally designed for extractive summarization (that is, selecting sentences from the article to serve as the summary). Summarization unavoidably loses information, so instead of throwing out the unimportant sentences, we highlight the important ones and keep the rest intact. Readers can quickly scan the article for the key ideas and the entities mentioned, then read the paragraphs that interest them for more context and detail.
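The highlight-instead-of-discard idea can be sketched with a plain TextRank in pure Python. This is only an illustration under stated assumptions, not the algorithm the reader actually uses: the regex sentence splitter, the word-overlap similarity (in the style of the original TextRank paper), and the names `split_sentences`, `similarity`, `textrank_scores`, and `highlight` are all hypothetical choices made for this example.

```python
import math
import re


def split_sentences(text):
    """Naive regex splitter; a stand-in for a real sentence segmenter."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word-overlap similarity normalized by the log of the sentence sizes."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    # Guard against log(1) == 0 in the denominator.
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0
    overlap = len(words_a & words_b)
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Run weighted PageRank over the sentence similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    weights = [
        [0.0 if i == j else similarity(sentences[i], sentences[j]) for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on a share of its score proportional
            # to the weight of the edge j -> i.
            rank = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def highlight(text, top_k=2):
    """Return every sentence in order, flagged True if it ranks in the top_k."""
    sentences = split_sentences(text)
    scores = textrank_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:top_k])
    return [(sentence, i in top) for i, sentence in enumerate(sentences)]
```

Unlike a summarizer, `highlight` returns all sentences in their original order and only marks the top-ranked ones, so an interface can color them while leaving the rest of the article readable.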

The reader interface also provides an option to manually annotate (highlight) the important sentences (in green). These annotations can be used to create datasets for further fine-tuning the highlighting model. The reader provides full support for English and basic support for Japanese and Chinese.

We open-sourced a demo version of the improved TextRank algorithm at github.com/ceshine/textrank_demo (the actual algorithm used in the reader is a bit more sophisticated).

Technologies used in this project:

A.I. (NLP) Algorithms
- (PyTorch) summanlp/textrank (Base TextRank Implementation)
- (PyTorch) Sentence Transformers
- (PyTorch) LASER
- (TensorFlow) Multilingual Universal Sentence Encoder
- spaCy (English sentence segmentation and named entity recognition)
- Baidu NLP API (Chinese word segmentation)
- nagisa (Japanese word segmentation)
Interface
- Web Interface: Pure React
- Backend API Server: FastAPI
- Backend Server Infrastructure: Cloud Run (Google Cloud)
Web Content Extraction: python-readability

Augmented Reader

(Only available to internal users.)