The first thing you need to do before you start using Codeq's NLP API is to sign up to generate a User ID and a User Key. These two pieces of information are required to make requests. Go ahead and sign up if you have not done so yet.
Once you have obtained a User ID and User Key, you can make requests to our API in the two following ways:
1. By using our Python SDK.
2. By sending a POST request to the API endpoints.
There is a monthly rate limit of 1,000 requests per user. If you are interested in a custom plan, please take a look at our pricing plans or contact us.
The easiest way to call the API is by using our Python SDK (requires Python 3). To install it:
pip install codeq-nlp-api -U
Once installed, you can import the SDK and use it to initialize a client object. This client can be used to send a text request to the NLP Pipeline endpoint and retrieve a Document object, which encapsulates a list of Sentence objects with the analyzed information of the text:
from codeq_nlp_api import CodeqClient
client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")
text = "This model is an expensive alternative with useless battery life."
document = client.analyze(text)
for sentence in document.sentences:
    print(sentence.sentiments)
"""
Output:
>> ['Negative']
"""
It is also possible to pass a list of sentences as input. In this case, the API will not apply its own sentence segmentation:
sentences = ["This model is an expensive alternative.", "It has a useless battery life."]
document = client.analyze_sentences(sentences)
print(len(document.sentences))
"""
Output:
>> 2
"""
The client can also be used to call the Text Similarity endpoint and retrieve a textual similarity score between two texts:
from codeq_nlp_api import CodeqClient
client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")
text1 = "Some people are singing"
text2 = "A group of people is singing"
result = client.analyze_text_similarity(text1, text2)
print(result)
"""
Output:
>> {"text_similarity_score": 4.6}
"""
Alternatively, you can call the API by sending a POST request to one of the following endpoints:
URL | DESCRIPTION |
---|---|
https://api.codeq.com/v1 | NLP Pipeline. This endpoint receives one text and returns a JSON object containing a list of analyzed sentences. |
https://api.codeq.com/v1_text_similarity | Text Similarity. This endpoint receives two texts and returns a JSON object containing the text similarity score. |
The POST request to the NLP Pipeline endpoint must include the following parameters:
user_id: the ID provided during registration.
user_key: the key also generated during registration.
pipeline (optional): a string indicating the specific NLP annotators to apply.
And one of the following:
text: a string to be analyzed.
sentences: a list of strings to be analyzed (no sentence segmentation will be applied).
curl -X POST https://api.codeq.com/v1 \
  -d '{
    "user_id": "YOUR_USER_ID",
    "user_key": "YOUR_USER_KEY",
    "text": "This model is an expensive alternative with useless battery life."
  }'
# Output:
{
"sentences": [
{
"position": 0,
"raw_sentence": "This model is an expensive alternative with useless battery life.",
"tokens": ["This", "model", "is", "an", "expensive", "alternative", "with", ... ],
"pos_tags": ["DT", "NN", "VBZ", "DT", "JJ", "NN", "IN", "JJ", "NN", "NN", "."],
"speech_acts": ["Statement"],
"sentiments": ["Negative"],
...
}
]
}
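The same request can be sent with any HTTP client. Below is a minimal sketch using Python's requests library, assuming the endpoint accepts the JSON body exactly as in the curl example above:
import requests

# Minimal sketch: POST the same JSON body shown in the curl example.
response = requests.post(
    "https://api.codeq.com/v1",
    json={
        "user_id": "YOUR_USER_ID",
        "user_key": "YOUR_USER_KEY",
        "text": "This model is an expensive alternative with useless battery life.",
    },
)
document = response.json()
for sentence in document["sentences"]:
    print(sentence["sentiments"])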
The POST request to the Text Similarity endpoint must include the following parameters:
user_id: the ID provided during registration.
user_key: the key also generated during registration.
text1: the first document to be compared.
text2: the second document to be compared.
curl -X POST https://api.codeq.com/v1_text_similarity \
  -d '{
    "user_id": "YOUR_USER_ID",
    "user_key": "YOUR_USER_KEY",
    "text1": "Some people are singing",
    "text2": "A group of people is singing"
  }'
# Output:
{
"text_similarity_score": 4.6
}
Regardless of the method you use to call our API, we return a status code that is helpful for debugging any error you may encounter. The following table summarizes the status responses:
CODE | TEXT | DESCRIPTION |
---|---|---|
200 | Ok | The request was successfully processed. |
400 | Bad Request | We are not able to process your request, usually because of malformed JSON. |
401 | Unauthorized | The user key or user ID you submitted is unknown. |
404 | Not Found | We have no idea what you are looking for. |
413 | Request Entity Too Large | The text is longer than 30,000 characters (~30 KB). |
429 | Too Many Requests | You have used up your monthly quota. Wait, or talk to us to increase your quota. |
500 | Internal Server Error | Something is wrong in our spaghetti code that we will fix soon. |
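Since every response carries one of these status codes, it is worth checking the code before parsing the body. The following is a minimal sketch using Python's requests library; the retry delay is an illustrative choice, not an API requirement:
import time
import requests

payload = {
    "user_id": "YOUR_USER_ID",
    "user_key": "YOUR_USER_KEY",
    "text": "Some text to analyze.",
}
response = requests.post("https://api.codeq.com/v1", json=payload)
if response.status_code == 200:
    document = response.json()
elif response.status_code == 429:
    # Quota exhausted: wait before retrying, or contact us for a larger quota.
    time.sleep(60)
else:
    # 4xx: inspect your request; 500: server-side, safe to retry later.
    print("Request failed:", response.status_code)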
By default, when you call the NLP Pipeline API, you retrieve a text fully analyzed by our complete set of NLP Annotators (see the following sections).
Alternatively, you can specify a custom pipeline depending on your needs. For example, if you are only interested in the speech act and sentiment labels of a text, you can declare a "pipeline" key as a parameter in the Python SDK client object or in the body of the POST request, and send as its value a comma-separated string indicating the Annotators you need:
from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")
text = "This is getting very interesting!"
pipeline = "speechact, sentiment"
document = client.analyze(text, pipeline)
for sentence in document.sentences:
    print(sentence.speech_acts)
    print(sentence.sentiments)
"""
Output:
>> ['Statement']
>> ['Positive']
"""
The following sections show the complete list of Annotators of our NLP API, including the KEY you can use as the value of the pipeline parameter, as well as a description of each Annotator's output and its respective Python attribute.
KEY | NAME | DESCRIPTION | OUTPUT LABELS |
---|---|---|---|
language | Language Identifier | Generates a label indicating the language of the text and its probability. ATTR:document.language ATTR:document.language_probability | Supported languages: Afrikaans, Albanian, Arabic, Basque, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Pashto, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh, Wolof, Yiddish |
tokenize | Tokenization | Generates a list of words from raw text. ATTR:document.tokens ATTR:sentence.tokens | N/A |
ssplit | Sentence Segmentation | Generates a list of sentences from raw text. ATTR:document.sentences | N/A |
stopword | Stopword Removal | Produces a list of tokens after removing common stopwords from the text. ATTR:sentence.tokens_filtered | N/A |
stem | Stemming | Generates a list with the stem of each sentence token. ATTR:sentence.stems | N/A |
truecase | True casing | Produces a string with the true case of sentence tokens. ATTR:sentence.truecase_sentence | N/A |
detruecase | Detrue casing | Produces a string with the predicted original case of the tokens. ATTR:sentence.detruecase_sentence | N/A |
lemma | Lemmatization | Generates a list containing the lemma for each sentence token. ATTR:sentence.lemmas | N/A |
pos | Part of Speech Tagging | Generates a list containing the PoS-tag for each sentence token. ATTR:sentence.pos_tags | Penn Treebank |
parse | Dependency Parser | Generates a list of dependencies as 3-tuples consisting of head, dependent and relation. Head and dependent are in the format "token@@@position". Positions are 1-indexed, with 0 being the index of the root. ATTR:sentence.dependencies | Our dependency labels mostly follow the basic dependencies given in section 2 of the Stanford Parser Dependencies (3.5.2), except for the following distinctions: we do not use dependency labels which apply only to collapsed dependencies or which are labeled as "additional dependencies"; we do not use the labels "goeswith" or "cop"; we introduce the novel label "ncomp". |
chunk | Chunker | Groups the tokens of the sentence into small, non-overlapping groups based on prominent parts of speech, such as NP chunks ("the tall person") or VP chunks ("will leave"). ATTR:sentence.chunks | CONLL 2000 |
semantic_roles | Semantic Role Labelling | Generates a list of dictionaries containing the retrieved predicates of each sentence, their lemmas, the constituents of the sentence found to be arguments of each predicate, and the classified argument type. ATTR:sentence.semantic_roles | Agent/Experiencer, Patient/Theme/Affected, Beneficiary/Goal/Predicate/Comitative, Destination/EndingPoint/Source, Location, Speaker/Addressee/Conjunction/Interjection, Manner/Means/Extent, Modal, Cause, Temporal, EventModifier/Purpose, Negative |
twitter_preprocess | Twitter Preprocessing | Removes artifacts like user mentions and URLs, segments hashtags and generates a list of words from raw text. ATTR:sentence.tokens_clean | N/A |
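As a sketch of how these annotators combine, the following example requests only tokenization, PoS tagging and dependency parsing; the attribute names come from the table above, while the example text is illustrative:
from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")
text = "The tall person will leave tomorrow."
document = client.analyze(text, pipeline="tokenize, pos, parse")
for sentence in document.sentences:
    print(sentence.tokens)        # list of words
    print(sentence.pos_tags)      # Penn Treebank tags
    print(sentence.dependencies)  # 3-tuples: (head, dependent, relation)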
KEY | NAME | DESCRIPTION | OUTPUT LABELS |
---|---|---|---|
ner | Named Entity Recognition | Produces a list of named entities found in a sentence, containing the tokens of the entity, its type and its span positions. ATTR:sentence.named_entities | PER (person), LOC (location), ORG (organization), MISC (miscellaneous), DATE, MONEY, URL, PHONE, TWITTERNAME, TRACKINGNUMBER, AIRLINECODE, AIRLINENAME, AIRPORTCODE, AIRPORTNAME, EMOJI, SMILIE |
salience | Named Entity Salience | Produces a list of tuples indicating the salience of named entities, that is, how central they are to the content of the input document. Each tuple contains a boolean indicating whether the entity is salient, and its salience score. ATTR:sentence.named_entities_salience | boolean |
date | Date Resolution | Generates a list of tuples for each sentence with all resolved date entities given a reference date (by default: today). The output includes the date entity, its token span and the resolved timestamp. This annotator accepts an input variable date_referent to indicate the desired date to be used as referent for the resolution. The format of this variable needs to be "%Y-%m-%d %H:%M:%S": document = client.analyze(text, date_referent="2022-03-01 09:00:00") ATTR:sentence.dates | N/A |
coreference | Coreference Resolution | Generates a list of resolved pronominal coreferences. Each coreference is a dictionary that includes: mention, referent and first_referent, where each of those elements is a tuple containing a coreference id, the tokens and the span of the item. Additionally, each coreference dict contains a coreference chain (all the ids of the linked mentions) and the first referent of the chain. ATTR:sentence.coreferences | N/A |
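For example, a pipeline combining entity recognition with date and coreference resolution could be called as in the sketch below; the attributes are those listed above, and the example text is illustrative:
from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")
text = "Anna flew to Paris on June 3. She stayed there for a week."
document = client.analyze(text, pipeline="ner, date, coreference")
for sentence in document.sentences:
    print(sentence.named_entities)  # e.g. entities typed PER, LOC, DATE
    print(sentence.dates)           # resolved date entities, if any
    print(sentence.coreferences)    # resolved pronominal coreferences, if any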
KEY | NAME | DESCRIPTION |
---|---|---|
speechact | Speech Act Classifier | Generates a list of tags indicating the predicted speech acts of a sentence. ATTR:sentence.speech_acts OUTPUT LABELS: statement, command/request, question, desire/need/hope, commitment/promise, evaluation, speculation, suggestion/recommendation, regret, greeting, permission, offer, gratitude, congratulation, disagreement, apology/condolence, agreement, well-wishing, warning, introduction, unknown speech act |
question | Question Classifier | Generates a list of tags indicating the predicted type of question, if a sentence is classified as such. ATTR:sentence.question_types OUTPUT LABELS: qy: Yes/No question, qo: Open-ended question, qr: Or question, qt: Task question, UNKNOWN: Unknown question type |
sentiment | Sentiment Classifier | Generates a list of values for each sentence indicating the predicted sentiment label. ATTR:sentence.sentiments OUTPUT LABELS: Positive, Neutral, Negative |
emotion | Emotion Classifier | Generates a list of values for each sentence indicating the predicted emotion label. ATTR:sentence.emotions OUTPUT LABELS: Anger, Disgust/Dislike, Fear, Joy/Like, Sadness, Surprise, Excitement, Angst, No emotion |
sarcasm | Sarcasm Classifier | Generates a label predicting whether a sentence is sarcastic. ATTR:sentence.sarcasm OUTPUT LABELS: Sarcastic, Non-sarcastic |
abuse | Abuse Classifier | Generates a list of values for each sentence indicating the types of abuse detected. ATTR:sentence.abuse OUTPUT LABELS: Offensive, Obscene/scatologic, Threatening/hostile, Insult, Hate speech/racist, Unknown abuse, Non-abusive |
task | Task Classifier | Generates different values, including whether a sentence is predicted to be a task; if so, it returns a dictionary that includes a list of tags indicating its predicted task type and priority. ATTR:sentence.is_task ATTR:sentence.task_actions OUTPUT LABELS: is_task: boolean; task_actions['actions']: call, open_attachment, open_browser, text, deadline, open_calendar, unknown_task_action |
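These classifiers can be freely combined in one pipeline. A minimal sketch follows, with an illustrative input text; the inline comments show plausible, not guaranteed, labels:
from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")
text = "Could you send me the report by Friday? I really like this plan!"
document = client.analyze(text, pipeline="speechact, question, emotion, task")
for sentence in document.sentences:
    print(sentence.speech_acts)     # e.g. ['Question'] or ['Statement']
    print(sentence.question_types)  # e.g. ['qy: Yes/No question']
    print(sentence.emotions)        # e.g. ['Joy/Like']
    print(sentence.is_task)         # boolean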
KEY | NAME | DESCRIPTION |
---|---|---|
summarize | Summarization | Generates an extractive summary with the most relevant sentences of the input text. This annotator accepts an input variable summary_length to indicate the desired size of the output summary: document = client.analyze(text, pipeline="summarize", summary_length=2) ATTR:document.summary |
summarize_news | News Summarization | Generates an extractive summary with the most relevant sentences of a news article. The input text must be a valid URL. The content of the article is stored as the raw_text of the document. The metadata of the news article is stored in a single dict and includes the url, image, date and author. This annotator accepts an input variable summary_length to indicate the desired size of the output summary: document = client.analyze(text, pipeline="summarize_news", summary_length=2) ATTR:document.raw_text ATTR:document.summary ATTR:document.news_article |
compress | Sentence Compression | Provides, where applicable, a shortened version of a sentence that gives its main point without extraneous clauses. It uses the output of the Dependency Parser Annotator to determine which parts of the sentence serve to modify, explain or embellish the main points, and strips them off, leaving only the core information provided by the sentence. ATTR:sentence.compressed_sentence |
summarize_compress | Summarization with Compression | Generates an extractive summary with the most relevant sentences of the input text in their compressed form, regardless of whether the compress Annotator is specified in the pipeline. ATTR:document.compressed_summary |
keyphrases | Keyphrase Extraction | Generates a list of keyphrases that capture the topics covered by the document, ordered from most to least relevant. Keyphrases can be retrieved with or without relevance scores. ATTR:document.keyphrases ATTR:document.keyphrases_scored |
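The summarize call shown in the table can be used end to end as in the sketch below; the input text is a placeholder, and summary_length=2 is an illustrative choice:
from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")
text = "..."  # a long document to summarize

# Request a two-sentence extractive summary, as described above.
document = client.analyze(text, pipeline="summarize", summary_length=2)
print(document.summary)

# Keyphrases, ordered from most to least relevant.
document = client.analyze(text, pipeline="keyphrases")
print(document.keyphrases)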
The probability scores of some text classifiers are stored at the Sentence level in the variable sentence.scores. This variable is a dictionary that can be accessed using the same name as the annotator; each entry contains the assigned label as key and the probability as value.
ANNOTATOR | SCORE KEY |
---|---|
speechact | sentence.scores['speechact'] |
question | sentence.scores['question'] |
task | sentence.scores['task'] sentence.scores['task_actions'] |
sentiment | sentence.scores['sentiment'] |
emotion | sentence.scores['emotion'] |
sarcasm | sentence.scores['sarcasm'] |
abuse | sentence.scores['abuse'] |
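For example, after running the Sentiment Classifier you can read its probability score as sketched below; the printed dictionary is illustrative:
from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")
document = client.analyze("This battery life is useless.", pipeline="sentiment")
for sentence in document.sentences:
    print(sentence.sentiments)           # e.g. ['Negative']
    print(sentence.scores["sentiment"])  # e.g. {'Negative': 0.92} (illustrative)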
The following table lists the range of text similarity scores, which is based on the SemEval Semantic Textual Similarity (STS) tasks.
SCORE | DESCRIPTION |
---|---|
5 | The two sentences are completely equivalent, as they mean the same thing. |
4 | The two sentences are mostly equivalent, but some unimportant details differ. |
3 | The two sentences are roughly equivalent, but some important information differs or is missing. |
2 | The two sentences are not equivalent, but share some details or are on the same topic. |
1 | The two sentences are completely dissimilar. |
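If you need a hard decision rather than a graded score, you can threshold the similarity score. The sketch below assumes the SDK returns the dictionary shown earlier; the 4.0 cutoff is an illustrative choice, not part of the API:
from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")
result = client.analyze_text_similarity("Some people are singing", "A group of people is singing")
if result["text_similarity_score"] >= 4.0:  # illustrative cutoff: "mostly equivalent" or better
    print("The two texts are near-duplicates")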