Codeq NLP API Documentation

API Authentication

Before you start using Codeq's NLP API, you need to sign up to generate a User ID and User Key. These two credentials are required to make requests. Go ahead and sign up if you have not done so yet.

Get started

API Calls

Once you have obtained a User ID and User Key, you can make requests to our API in either of the following two ways:

1. By using our Python SDK.

2. By sending a POST request to the API endpoints.

Rate Limits

There is a monthly rate limit of 1,000 requests per user. If you are interested in a custom plan, please take a look at our pricing plans or contact us.

Python SDK

The easiest way to call the API is by using our Python SDK (requires Python 3). To install it:


pip install codeq-nlp-api -U
    

NLP Pipeline

Once installed, you can import the SDK and use it to initialize a client object. This client can be used to send a text request to the NLP Pipeline endpoint and retrieve a Document object, which encapsulates a list of Sentence objects with the analyzed information of the text:


from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")

text = "This model is an expensive alternative with useless battery life."
document = client.analyze(text)

for sentence in document.sentences:
    print(sentence.sentiments)

"""
Output:
>> ['Negative']
"""

It is also possible to pass a list of sentences as input. In this case, the API will not apply its own sentence segmentation:


sentences = ["This model is an expensive alternative.", "It has a useless battery life."]
document = client.analyze_sentences(sentences)
print(len(document.sentences))

"""
Output:
>> 2
"""
    

Text Similarity

The client can also be used to call the Text Similarity endpoint and retrieve a textual similarity score between two texts.


from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")

text1 = "Some people are singing"
text2 = "A group of people is singing"

result = client.analyze_text_similarity(text1, text2)
print(result)

"""
Output:
>> {"text_similarity_score": 4.6}
"""
                

HTTP Request

Alternatively, you can call the API by sending a POST request to one of the following endpoints:

URL                                        DESCRIPTION
https://api.codeq.com/v1                   NLP Pipeline. This endpoint receives one text and returns a JSON object containing a list of analyzed sentences.
https://api.codeq.com/v1_text_similarity   Text Similarity. This endpoint receives two texts and returns a JSON object containing the text similarity score.

NLP Pipeline

The POST request to this endpoint must include the following parameters:

user_id: the id provided during the registration process.

user_key: the key also generated during the registration process.

pipeline (optional): a string indicating the specific NLP annotators to apply.

And one of the following:

text: a string to be analyzed.

sentences: a list of strings to be analyzed (no sentence segmentation will be applied).


curl -X POST https://api.codeq.com/v1 \
    -d '{
        "user_id": "YOUR_USER_ID",
        "user_key": "YOUR_USER_KEY",
        "text": "This model is an expensive alternative with useless battery life."
    }'


# Output:
{
  "sentences": [
    {
      "position": 0,
      "raw_sentence": "This model is an expensive alternative with useless battery life.",
      "tokens": ["This", "model", "is", "an", "expensive", "alternative", "with", ... ],
      "pos_tags": ["DT", "NN", "VBZ", "DT", "JJ", "NN", "IN", "JJ", "NN", "NN", "."],
      "speech_acts": ["Statement"],
      "sentiments": ["Negative"],
      ...
    }
  ]
}
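The same request can also be composed in Python with the standard library plus the widely used requests package (a sketch: the payload fields mirror the curl example above, and actually sending the request requires valid credentials):

```python
import json

# Endpoint and payload fields taken from the curl example above.
API_URL = "https://api.codeq.com/v1"

payload = {
    "user_id": "YOUR_USER_ID",
    "user_key": "YOUR_USER_KEY",
    "text": "This model is an expensive alternative with useless battery life.",
}
body = json.dumps(payload)

# Sending the request needs valid credentials and the `requests` package:
# import requests
# response = requests.post(API_URL, data=body)
# response.raise_for_status()
# for sentence in response.json()["sentences"]:
#     print(sentence["sentiments"])
```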
            

Text Similarity

The POST request to this endpoint must include the following parameters:

user_id: the id provided during the registration process.

user_key: the key also generated during the registration process.

text1: the first document to be compared.

text2: the second document to be compared.


curl -X POST https://api.codeq.com/v1_text_similarity \
    -d '{
        "user_id": "YOUR_USER_ID",
        "user_key": "YOUR_USER_KEY",
        "text1": "Some people are singing",
        "text2": "A group of people is singing"
    }'

# Output:
{
  "text_similarity_score": 4.6
}
            

Response Status

Regardless of the method you use to call our API, we return a status code that helps you debug any error you may encounter. The following table summarizes the possible status responses:


CODE TEXT DESCRIPTION
200 OK The request was successfully processed.
400 Bad Request We are not able to process your request, usually because of malformed JSON.
401 Unauthorized The user key or user id you submitted is unknown.
404 Not Found We have no idea what you are looking for.
413 Request Entity Too Large The text is longer than 30,000 characters (~30 KB).
429 Too Many Requests You have reached your quota limit. Wait, or talk to us about increasing your quota.
500 Internal Server Error There is something wrong in our spaghetti code that we will fix soon.

Calling a specific NLP Pipeline

By default, when you call the NLP Pipeline API you will retrieve a text fully analyzed by our complete set of NLP Annotators (see following sections).

Or you can specify a custom pipeline depending on your needs. For example, if you are only interested in getting the speech act and sentiment labels of a text, you can declare a "pipeline" key as a parameter in the Python SDK client object or in the content of the POST request, and send as its value a comma-separated string indicating the Annotators you need:


client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")

text = "This is getting very interesting!"
pipeline = "speechact, sentiment"
document = client.analyze(text, pipeline)

for sentence in document.sentences:
    print(sentence.speech_acts)
    print(sentence.sentiments)

"""
Output:
>> ['Statement']
>> ['Positive']
"""
            

The following sections show the complete list of Annotators of our NLP API, including the KEY you can use as the value of the pipeline parameter, as well as a description of each Annotator's output and its respective Python attribute.


Linguistic Features

KEY NAME DESCRIPTION OUTPUT LABELS
language Language Identifier Generates a label indicating the language of the text and its probability.
ATTR:document.language
ATTR:document.language_probability
Supported languages:
Afrikaans Albanian Arabic Basque Bulgarian Catalan Chinese Croatian Czech Danish Dutch English Esperanto Estonian Finnish French Galician German Greek Hebrew Hindi Hungarian Icelandic Italian Japanese Korean Latvian Lithuanian Norwegian Pashto Polish Portuguese Romanian Russian Serbian Slovak Slovenian Spanish Swahili Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese Welsh Wolof Yiddish
tokenize Tokenization Generates a list of words from raw text.
ATTR:document.tokens
ATTR:sentence.tokens
N/A
ssplit Sentence Segmentation Generates a list of sentences from a raw text.
ATTR:document.sentences
N/A
stopword Stopword Removal Produces a list of tokens after removing common stopwords from the text.
ATTR:sentence.tokens_filtered
N/A
stem Stemming Generates a list of stems for the sentence tokens.
ATTR:sentence.stems
N/A
truecase True casing Produces a string with the true case of sentence tokens.
ATTR:sentence.truecase_sentence
N/A
detruecase Detrue casing Produces a string with the predicted original case of the tokens.
ATTR:sentence.detruecase_sentence
N/A
lemma Lemmatization Generates a list containing the lemma for each sentence token.
ATTR:sentence.lemmas
N/A
pos Part of Speech Tagging Generates a list containing the PoS-tag for each sentence token.
ATTR:sentence.pos_tags
Penn Treebank
parse Dependency parser Generates a list of dependencies in 3-tuples consisting of: head, dependent and relation. Head and dependent are in the format "token@@@position". Positions are 1-indexed, with 0 being the index for the root.
ATTR:sentence.dependencies
Our dependency labels mostly follow the basic dependencies given in section 2 of the Stanford Parser Dependencies (3.5.2) , except for the following distinctions:
* we do not use dependency labels which apply only to collapsed dependencies or which are labeled as "additional dependencies",
* we do not use the labels "goeswith" or "cop",
* we introduce the novel label "ncomp".
chunk Chunker Groups the tokens of the sentence into small, non-overlapping groups based on prominent parts of speech, such as NP chunks ("the tall person") or VP chunks ("will leave").
ATTR:sentence.chunks
CONLL 2000
semantic_roles Semantic Role Labelling Generates a list of dictionaries containing the retrieved predicates of each sentence, their lemmas, the constituents of the sentence found to be arguments of each predicate, and the classified argument type.
ATTR:sentence.semantic_roles
Agent/Experiencer
Patient/Theme/Affected
Beneficiary/Goal/Predicate/Comitative
Destination/EndingPoint/Source
Location
Speaker/Addressee/Conjunction/Interjection
Manner/Means/Extent
Modal
Cause
Temporal
EventModifier/Purpose
Negative
twitter_preprocess Twitter Preprocessing Removes artifacts like user mentions and URLs, segments hashtags and generates a list of words from raw text.
ATTR:sentence.tokens_clean
N/A
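As an illustration of the dependency format described above, here is a small sketch of unpacking the 3-tuples. The sample tuples are hypothetical, but follow the "token@@@position" convention (positions are 1-indexed, with 0 for the root):

```python
def parse_item(item):
    """Split a 'token@@@position' string into (token, position)."""
    token, _, position = item.rpartition("@@@")
    return token, int(position)

# Hypothetical dependencies for "This model is ...".
dependencies = [
    ("ROOT@@@0", "is@@@3", "root"),
    ("is@@@3", "model@@@2", "nsubj"),
    ("model@@@2", "This@@@1", "det"),
]

for head, dependent, relation in dependencies:
    head_token, head_pos = parse_item(head)
    dep_token, dep_pos = parse_item(dependent)
    print(f"{relation}: {dep_token} ({dep_pos}) -> {head_token} ({head_pos})")
```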

Named Entities

KEY NAME DESCRIPTION OUTPUT LABELS
ner Named Entity Recognition Produces a list of named entities found in a sentence, containing the tokens of the entity, its type and its span positions.
ATTR:sentence.named_entities
PER_(person)
LOC_(location)
ORG_(organization)
MISC_(miscellaneous)
DATE
MONEY
URL
PHONE
EMAIL
TWITTERNAME
TRACKINGNUMBER
AIRLINECODE
AIRLINENAME
AIRPORTCODE
AIRPORTNAME
EMOJI
SMILIE
salience Named Entity Salience Produces a list of tuples indicating the salience of named entities, that is, how central they are to the content of the input document. Each tuple contains a boolean indicating whether the entity is salient, and its salience score.
ATTR:sentence.named_entities_salience
boolean
date Date resolution Generates a list of tuples for each sentence with all resolved date entities given a relative date (by default: today). The output includes the date entity, its tokens span and the resolved timestamp.

This annotator accepts an input variable date_referent to indicate the desired date to be used as referent for the resolution.
The format of this variable needs to be "year-month-day hour:minutes:seconds", i.e. "%Y-%m-%d %H:%M:%S":

document = client.analyze(text, date_referent="2022-03-01 09:00:00")

ATTR:sentence.dates
N/A
coreference Coreference resolution Generates a list of resolved pronominal coreferences. Each coreference is a dictionary that includes: mention, referent, first_referent, where each of those elements is a tuple containing a coreference id, the tokens and the span of the item. Additionally, each coreference dict contains a coreference chain (all the ids of the linked mentions) and the first referent of a chain.
ATTR:sentence.coreferences
N/A
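A sketch of post-processing NER output, grouping entities by type. The per-entity fields (tokens, type, span) follow the description above, but the exact structure returned by the API may differ, and the sample entities are hypothetical:

```python
from collections import defaultdict

# Hypothetical entities following the description above:
# tokens of the entity, its type, and its span positions.
named_entities = [
    {"tokens": "New York", "type": "LOC", "span": [3, 5]},
    {"tokens": "ACME Corp", "type": "ORG", "span": [8, 10]},
]

by_type = defaultdict(list)
for entity in named_entities:
    by_type[entity["type"]].append(entity["tokens"])

print(dict(by_type))
# {'LOC': ['New York'], 'ORG': ['ACME Corp']}
```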

Text Classification

KEY NAME DESCRIPTION
speechact Speech Act Classifier Generates a list of tags indicating the predicted speech acts of a sentence.
ATTR:sentence.speech_acts
OUTPUT LABELS:

statement
A sentence which conveys information from the speaker to the hearer.

command/request
A sentence which attempts to either impart an obligation on the hearer to do a certain task for the speaker or which asks the hearer to do such a task.

question
A sentence which requests that the hearer give some type of information to the speaker.

desire/need/hope
A sentence which expresses or reports something that the speaker wants, needs, or hopes for, whether or not the hearer is involved or has the power to help realize this desire, need, or hope.

commitment/promise
A sentence which expresses that the speaker plans to do some stated action in the future.

evaluation
A sentence which expresses the speaker's subjective opinion about one or more people, things, events or other entities.

speculation
A sentence which expresses the speaker's uncertain belief about some event or state of affairs of the world, whether possible event/state of affairs would be in the past, present, or future.

suggestion/recommendation
A sentence which expresses the speaker's belief about the optimal course of action for some party, whether the group includes the hearer or not.

regret
A sentence which expresses that the speaker would prefer that some past event or state of affairs had not occurred the way it did, whether or not the situation was caused by the speaker.

greeting
A sentence which expresses acknowledgement to the hearer, often in a conventionalized way, often in the event of the speaker and the hearer meeting, parting company, or acknowledging some recognized holiday.

permission
A sentence which expresses to the hearer that the speaker allows the hearer to perform some action, and presupposes that the speaker believes that they have the authority to grant or withhold such permission.

offer
A sentence which expresses that the speaker is willing to give some object to, or do some task for, the hearer, if the hearer so desires and will accept the offer.

gratitude
A sentence which expresses that the speaker is thankful for something that some party, often but not always the hearer, has done or was involved with, often in a conventionalized way.

congratulation
A sentence which expresses that the speaker is proud of the hearer for some accomplishment.

disagreement
A sentence which expresses that the speaker disagrees with something that the hearer has recently said.

apology/condolence
A sentence which either expresses remorse for something that the speaker has done or failed to do, or which commiserates with some misfortune that the hearer has experienced.

agreement
A sentence which expresses that the speaker agrees with something that the hearer has recently said.

well-wishing
A sentence which expresses that the speaker hopes that the hearer will have good fortune in the future, often in a conventionalized way.

warning
A sentence which expresses that some misfortune may, or definitely will, befall some party, often the hearer but sometimes a third party or the speaker; sometimes the misfortune is unconditional, but sometimes the warning expresses that misfortune will occur unless someone performs or refrains from performing a certain action.

introduction
A sentence which acts to bring the hearer in acquaintance with some party, either the speaker or some third party.

unknown speech act
A sentence which cannot be classified in any of the above ways, sometimes because it cannot be interpreted; sometimes because it is in a different language than the one (English) that this classifier was built for; sometimes because it is a short interjection; etc.

question Question Classifier Generates a list of tags indicating the predicted type of question, if a sentence is classified as such.
ATTR:sentence.question_types
OUTPUT LABELS:
qy: Yes/No question
qo: Open-ended question
qr: Or question
qt: Task question
UNKNOWN: Unknown question type
sentiment Sentiment Classifier Generates a list of values for each sentence indicating the predicted sentiment label.
ATTR:sentence.sentiments
OUTPUT LABELS:
Positive
Neutral
Negative
emotion Emotion Classifier Generates a list of values for each sentence indicating the predicted emotion label.
ATTR:sentence.emotions
OUTPUT LABELS:
Anger
Disgust/Dislike
Fear
Joy/Like
Sadness
Surprise
Excitement
Angst
No emotion
sarcasm Sarcasm Classifier Generates a label predicting if a sentence is sarcastic or not.
ATTR:sentence.sarcasm
OUTPUT LABELS:
Sarcastic
Non-sarcastic
abuse Abuse Classifier Generates a list of values for each sentence indicating the types of abuse detected.
ATTR:sentence.abuse
OUTPUT LABELS:
Offensive
Obscene/scatologic
Threatening/hostile
Insult
Hate speech/racist
Unknown abuse
Non-abusive
task Task Classifier Predicts whether a sentence is a task and, if so, returns a dictionary that includes a list of tags indicating its predicted task type and priority.
ATTR:sentence.is_task
ATTR:sentence.task_actions
OUTPUT LABELS:
is_task:
boolean
task_actions['actions']:
email
call
open_attachment
open_browser
text
deadline
open_calendar
unknown_task_action
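For example, the task attributes above could be used to collect actionable sentences. The objects below are hypothetical dicts standing in for SDK Sentence objects (the real SDK exposes sentence.is_task and sentence.task_actions as attributes):

```python
# Hypothetical stand-ins for analyzed Sentence objects.
sentences = [
    {"raw": "Please email me the report by Friday.",
     "is_task": True, "task_actions": {"actions": ["email", "deadline"]}},
    {"raw": "The weather was lovely.",
     "is_task": False, "task_actions": None},
]

# Keep only the sentences predicted to be tasks.
tasks = [s for s in sentences if s["is_task"]]
for s in tasks:
    print(s["raw"], "->", s["task_actions"]["actions"])
```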

Summarization

KEY NAME DESCRIPTION
summarize Summarization Generates an extractive summary with the most relevant sentences of the input text.
This annotator accepts an input variable summary_length to indicate the desired size of the output summary:
document = client.analyze(text, pipeline="summarize", summary_length=2)
ATTR:document.summary
summarize_news News Summarization Generates an extractive summary with the most relevant sentences of a news article. The input text must be a valid URL. The content of the article is stored as the raw_text of the document. The metadata of the news article is stored in a single dict and includes the url, image, date and author.
This annotator accepts an input variable summary_length to indicate the desired size of the output summary:
document = client.analyze(text, pipeline="summarize_news", summary_length=2)
ATTR:document.raw_text
ATTR:document.summary
ATTR:document.news_article
compress Sentence compression Provides, where applicable, a shortened version of a sentence that gives its main point without extraneous clauses. It uses the output of the dependency parser Annotator to determine parts of the sentence that serve to modify, explain, or embellish the main points and strips them off, leaving only the core information provided by the sentence.
ATTR:sentence.compressed_sentence
summarize_compress Summarization
with compression
Generates an extractive summary with the most relevant sentences of the input text in compressed form, regardless of whether the compress Annotator is specified in the pipeline.
ATTR:document.compressed_summary
keyphrases Keyphrase Extraction Generates a list of keyphrases to capture the topics covered by the document, in order from most to least relevant. Keyphrases can be retrieved with or without relevance scores.
ATTR:document.keyphrases
ATTR:document.keyphrases_scored

Scores

The probability scores of some text classifiers are stored at the Sentence level in the variable sentence.scores. This variable is a dictionary keyed by annotator name; each entry contains the assigned label as key and its probability as value.

ANNOTATOR SCORE KEY
speechact sentence.scores['speechact']
question sentence.scores['question']
task sentence.scores['task']
sentence.scores['task_actions']
sentiment sentence.scores['sentiment']
emotion sentence.scores['emotion']
sarcasm sentence.scores['sarcasm']
abuse sentence.scores['abuse']
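A sketch of reading those probabilities; the dictionary below is a hypothetical example of what sentence.scores might contain after running the sentiment and speechact annotators:

```python
# Hypothetical sentence.scores contents: annotator -> {label: probability}.
scores = {
    "sentiment": {"Negative": 0.91},
    "speechact": {"Statement": 0.88},
}

# Pick the most probable label per annotator.
best = {annotator: max(labels, key=labels.get)
        for annotator, labels in scores.items()}
print(best)
# {'sentiment': 'Negative', 'speechact': 'Statement'}
```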

Text Similarity Output Score

The following table lists the range of text similarity scores, which is based on the SemEval Semantic Textual Similarity (STS) tasks.

SCORE DESCRIPTION
5 The two sentences are completely equivalent, as they mean the same thing.
4 The two sentences are mostly equivalent, but some unimportant details differ.
3 The two sentences are roughly equivalent, but some important information differs or is missing.
2 The two sentences are not equivalent, but share some details or are on the same topic.
1 The two sentences are completely dissimilar.
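The bands above can be turned into a small helper for interpreting raw scores (a sketch: it simply rounds to the nearest band and clamps to the 1-5 range):

```python
SCORE_BANDS = {
    5: "completely equivalent",
    4: "mostly equivalent",
    3: "roughly equivalent",
    2: "not equivalent, but share some details or topic",
    1: "completely dissimilar",
}

def describe_similarity(score):
    """Map a raw similarity score onto the nearest band description."""
    band = min(5, max(1, round(score)))
    return SCORE_BANDS[band]

print(describe_similarity(4.6))
# completely equivalent
```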

Take a look at our demos to find out more about the output of specific NLP Annotators.