Codeq NLP API Documentation

API Authentication

Before you start using Codeq's NLP API, you need to sign up to generate a User ID and User Key. These two credentials are required to make requests. Go ahead and sign up if you have not done so yet.

Get started

API Calls

Once you have obtained a User ID and User Key, you can make requests to our API in either of the following two ways:

1. By using our Python SDK.

2. By sending a POST request to the API endpoints.

Rate Limits

There is a monthly rate limit of 1,000 requests per user. If you are interested in a custom plan, please take a look at our pricing plans or contact us.

Python SDK

The easiest way to call the API is by using our Python SDK (requires Python 3). To install it:


pip install codeq-nlp-api -U
    

NLP Pipeline

Once installed, you can import the SDK and use it to initialize a client object. This client can be used to send a text request to the NLP Pipeline endpoint and retrieve a Document object, which encapsulates a list of Sentence objects with the analyzed information of the text:


from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")

text = "This model is an expensive alternative with useless battery life."
document = client.analyze(text)

for sentence in document.sentences:
    print(sentence.sentiments)

"""
Output:
>> ['Negative']
"""

It is also possible to pass a list of sentences as input. In this case, the API will not apply its own sentence segmentation:


sentences = ["This model is an expensive alternative.", "It has a useless battery life."]
document = client.analyze_sentences(sentences)
print(len(document.sentences))

"""
Output:
>> 2
"""
    

Text Similarity

The client can also be used to call the Text Similarity endpoint and retrieve a textual similarity score between two texts.


from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")

text1 = "Some people are singing"
text2 = "A group of people is singing"

result = client.analyze_text_similarity(text1, text2)
print(result)

"""
Output:
>> {"text_similarity_score": 4.6}
"""
                

HTTP Request

Alternatively, you can call the API by sending a POST request to one of the following endpoints:

URL                                        DESCRIPTION
https://api.codeq.com/v1                   NLP Pipeline. This endpoint receives one text and returns a JSON object containing a list of analyzed sentences.
https://api.codeq.com/v1_text_similarity   Text Similarity. This endpoint receives two texts and returns a JSON object containing the text similarity score.

NLP Pipeline

The POST request to this endpoint must include the following parameters:

user_id: the id provided during the registration process.

user_key: the key also generated during the registration process.

pipeline (optional): a string indicating the specific NLP annotators to apply.

And one of the following:

text: a string to be analyzed.

sentences: a list of strings to be analyzed (no sentence segmentation will be applied).


curl -X POST https://api.codeq.com/v1 \
    -d '{
        "user_id": "YOUR_USER_ID",
        "user_key": "YOUR_USER_KEY",
        "text": "This model is an expensive alternative with useless battery life."
    }'


# Output:
{
  "sentences": [
    {
      "position": 0,
      "raw_sentence": "This model is an expensive alternative with useless battery life.",
      "tokens": ["This", "model", "is", "an", "expensive", "alternative", "with", ... ],
      "pos_tags": ["DT", "NN", "VBZ", "DT", "JJ", "NN", "IN", "JJ", "NN", "NN", "."],
      "speech_acts": ["Statement"],
      "sentiments": ["Negative"],
      ...
    }
  ]
}
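The same request can also be composed in Python with the standard library plus the widely used requests package (a sketch: the payload fields mirror the curl example above, and actually sending the request requires valid credentials):

```python
import json

# Endpoint and payload fields taken from the curl example above.
API_URL = "https://api.codeq.com/v1"

payload = {
    "user_id": "YOUR_USER_ID",
    "user_key": "YOUR_USER_KEY",
    "text": "This model is an expensive alternative with useless battery life.",
}
body = json.dumps(payload)

# Sending the request needs valid credentials and the `requests` package:
# import requests
# response = requests.post(API_URL, data=body)
# response.raise_for_status()
# for sentence in response.json()["sentences"]:
#     print(sentence["sentiments"])
```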
            

Text Similarity

The POST request to this endpoint must include the following parameters:

user_id: the id provided during the registration process.

user_key: the key also generated during the registration process.

text1: the first document to be compared.

text2: the second document to be compared.


curl -X POST https://api.codeq.com/v1_text_similarity \
    -d '{
        "user_id": "YOUR_USER_ID",
        "user_key": "YOUR_USER_KEY",
        "text1": "Some people are singing",
        "text2": "A group of people is singing"
    }'

# Output:
{
  "text_similarity_score": 4.6
}
            

Response Status

Regardless of the method you use to call our API, we return a status code that helps you debug any error you may encounter. The following table summarizes the possible status responses:


CODE TEXT DESCRIPTION
200 OK The request was successfully processed.
400 Bad Request We are not able to process your request, usually because of malformed JSON.
401 Unauthorized The user key or user id you submitted is unknown.
404 Not Found We have no idea what you are looking for.
413 Request Entity Too Large The text is longer than 30,000 characters (~30 KB).
429 Too Many Requests You have reached your quota limit. Wait, or talk to us about increasing your quota.
500 Internal Server Error There is something wrong in our spaghetti code that we will fix soon.

Calling a specific NLP Pipeline

By default, when you call the NLP Pipeline API you will retrieve a text fully analyzed by our complete set of NLP Annotators (see following sections).

Or you can specify a custom pipeline depending on your needs. For example, if you are only interested in getting the speech act and sentiment labels of a text, you can declare a "pipeline" key as a parameter in the Python SDK client object or in the content of the POST request, and send as its value a comma-separated string indicating the Annotators you need:


client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_USER_KEY")

text = "This is getting very interesting!"
pipeline = "speechact, sentiment"
document = client.analyze(text, pipeline)

for sentence in document.sentences:
    print(sentence.speech_acts)
    print(sentence.sentiments)

"""
Output:
>> ['Statement']
>> ['Positive']
"""
            

The following sections show the complete list of Annotators of our NLP API, including the KEY you can use as the value of the pipeline parameter, as well as a description of each Annotator's output and its respective Python attribute.


Linguistic Features

KEY NAME DESCRIPTION OUTPUT LABELS
language Language Identifier Generates a label indicating the language of the text and its probability.
ATTR:document.language
ATTR:document.language_probability
Supported languages:
Afrikaans Albanian Arabic Basque Bulgarian Catalan Chinese Croatian Czech Danish Dutch English Esperanto Estonian Finnish French Galician German Greek Hebrew Hindi Hungarian Icelandic Italian Japanese Korean Latvian Lithuanian Norwegian Pashto Polish Portuguese Romanian Russian Serbian Slovak Slovenian Spanish Swahili Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese Welsh Wolof Yiddish
tokenize Tokenization Generates a list of words from raw text.
ATTR:document.tokens
ATTR:sentence.tokens
N/A
ssplit Sentence Segmentation Generates a list of sentences from a raw text.
ATTR:document.sentences
N/A
stopword Stopword Removal Produces a list of tokens after removing common stopwords from the text.
ATTR:sentence.tokens_filtered
N/A
stem Stemming Generates a list of stems for the sentence tokens.
ATTR:sentence.stems
N/A
truecase True casing Produces a string with the true case of sentence tokens.
ATTR:sentence.truecase_sentence
N/A
detruecase Detrue casing Produces a string with the predicted original case of the tokens.
ATTR:sentence.detruecase_sentence
N/A
lemma Lemmatization Generates a list containing the lemma for each sentence token.
ATTR:sentence.lemmas
N/A
pos Part of Speech Tagging Generates a list containing the PoS-tag for each sentence token.
ATTR:sentence.pos_tags
Penn Treebank
parse Dependency parser Generates a list of dependencies in 3-tuples consisting of: head, dependent and relation. Head and dependent are in the format "token@@@position". Positions are 1-indexed, with 0 being the index for the root.
ATTR:sentence.dependencies
Our dependency labels mostly follow the basic dependencies given in section 2 of the Stanford Parser Dependencies (3.5.2) , except for the following distinctions:
* we do not use dependency labels which apply only to collapsed dependencies or which are labeled as "additional dependencies",
* we do not use the labels "goeswith" or "cop",
* we introduce the novel label "ncomp".
chunk Chunker Groups the tokens of the sentence into small, non-overlapping groups based on prominent parts of speech, such as NP chunks ("the tall person") or VP chunks ("will leave").
ATTR:sentence.chunks
CONLL 2000
semantic_roles Semantic Role Labelling Generates a list of dictionaries containing the retrieved predicates of each sentence, their lemmas, the constituents of the sentence found to be arguments of each predicate, and the classified argument type.
ATTR:sentence.semantic_roles
Agent/Experiencer
Patient/Theme/Affected
Beneficiary/Goal/Predicate/Comitative
Destination/EndingPoint/Source
Location
Speaker/Addressee/Conjunction/Interjection
Manner/Means/Extent
Modal
Cause
Temporal
EventModifier/Purpose
Negative
twitter_preprocess Twitter Preprocessing Removes artifacts like user mentions and URLs, segments hashtags and generates a list of words from raw text.
ATTR:sentence.tokens_clean
N/A
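As an illustration of the dependency format described above, here is a small sketch of unpacking the 3-tuples. The sample tuples are hypothetical, but follow the "token@@@position" convention (positions are 1-indexed, with 0 for the root):

```python
def parse_item(item):
    """Split a 'token@@@position' string into (token, position)."""
    token, _, position = item.rpartition("@@@")
    return token, int(position)

# Hypothetical dependencies for "This model is ...".
dependencies = [
    ("ROOT@@@0", "is@@@3", "root"),
    ("is@@@3", "model@@@2", "nsubj"),
    ("model@@@2", "This@@@1", "det"),
]

for head, dependent, relation in dependencies:
    head_token, head_pos = parse_item(head)
    dep_token, dep_pos = parse_item(dependent)
    print(f"{relation}: {dep_token} ({dep_pos}) -> {head_token} ({head_pos})")
```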

Named Entities

KEY NAME DESCRIPTION OUTPUT LABELS
ner Named Entity Recognition Produces a list of named entities found in a sentence, containing the tokens of the entity, its type and its span positions.
ATTR:sentence.named_entities
PER_(person)
LOC_(location)
ORG_(organization)
MISC_(miscellaneous)
DATE
MONEY
URL
PHONE
EMAIL
TWITTERNAME
TRACKINGNUMBER
AIRLINECODE
AIRLINENAME
AIRPORTCODE
AIRPORTNAME
EMOJI
SMILIE
salience Named Entity Salience Produces a list of tuples indicating the salience of named entities, that is, how central they are to the content of the input document. Each tuple contains a boolean indicating whether the entity is salient, and its salience score.
ATTR:sentence.named_entities_salience
boolean
date Date resolution Generates a list of tuples for each sentence with all resolved date entities given a relative date (by default: today). The output includes the date entity, its tokens span and the resolved timestamp.

This annotator accepts an input variable date_referent to indicate the desired date to be used as referent for the resolution.
The format of this variable needs to be "year-month-day hour:minutes:seconds", i.e. "%Y-%m-%d %H:%M:%S":

document = client.analyze(text, date_referent="2022-03-01 09:00:00")

ATTR:sentence.dates
N/A
coreference Coreference resolution Generates a list of resolved pronominal coreferences. Each coreference is a dictionary that includes: mention, referent, first_referent, where each of those elements is a tuple containing a coreference id, the tokens and the span of the item. Additionally, each coreference dict contains a coreference chain (all the ids of the linked mentions) and the first referent of a chain.
ATTR:sentence.coreferences
N/A
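A sketch of post-processing NER output, grouping entities by type. The per-entity fields (tokens, type, span) follow the description above, but the exact structure returned by the API may differ, and the sample entities are hypothetical:

```python
from collections import defaultdict

# Hypothetical entities following the description above:
# tokens of the entity, its type, and its span positions.
named_entities = [
    {"tokens": "New York", "type": "LOC", "span": [3, 5]},
    {"tokens": "ACME Corp", "type": "ORG", "span": [8, 10]},
]

by_type = defaultdict(list)
for entity in named_entities:
    by_type[entity["type"]].append(entity["tokens"])

print(dict(by_type))
# {'LOC': ['New York'], 'ORG': ['ACME Corp']}
```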

Text Classification

KEY NAME DESCRIPTION
speechact Speech Act Classifier Generates a list of tags indicating the predicted speech acts of a sentence.
ATTR:sentence.speech_acts
OUTPUT LABELS:

statement
A sentence which conveys information from the speaker to the hearer.

command/request
A sentence which attempts to either impart an obligation on the hearer to do a certain task for the speaker or which asks the hearer to do such a task.

question
A sentence which requests that the hearer give some type of information to the speaker.

desire/need/hope
A sentence which expresses or reports something that the speaker wants, needs, or hopes for, whether or not the hearer is involved or has the power to help realize this desire, need, or hope.

commitment/promise
A sentence which expresses that the speaker plans to do some stated action in the future.

evaluation
A sentence which expresses the speaker's subjective opinion about one or more people, things, events or other entities.

speculation
A sentence which expresses the speaker's uncertain belief about some event or state of affairs of the world, whether possible event/state of affairs would be in the past, present, or future.

suggestion/recommendation
A sentence which expresses the speaker's belief about the optimal course of action for some party, whether the group includes the hearer or not.

regret
A sentence which expresses that the speaker would prefer that some past event or state of affairs had not occurred the way it did, whether or not the situation was caused by the speaker.

greeting
A sentence which expresses acknowledgement to the hearer, often in a conventionalized way, often in the event of the speaker and the hearer meeting, parting company, or acknowledging some recognized holiday.

permission
A sentence which expresses to the hearer that the speaker allows the hearer to perform some action, and presupposes that the speaker believes that they have the authority to grant or withhold such permission.

offer
A sentence which expresses that the speaker is willing to give some object to, or do some task for, the hearer, if the hearer so desires and will accept the offer.

gratitude
A sentence which expresses that the speaker is thankful for something that some party, often but not always the hearer, has done or was involved with, often in a conventionalized way.

congratulation
A sentence which expresses that the speaker is proud of the hearer for some accomplishment.

disagreement
A sentence which expresses that the speaker disagrees with something that the hearer has recently said.

apology/condolence
A sentence which either expresses remorse for something that the speaker has done or failed to do, or which commiserates with some misfortune that the hearer has experienced.

agreement
A sentence which expresses that the speaker agrees with something that the hearer has recently said.

well-wishing
A sentence which expresses that the speaker hopes that the hearer will have good fortune in the future, often in a conventionalized way.

warning
A sentence which expresses that some misfortune may, or definitely will, befall some party, often the hearer but sometimes a third party or the speaker; sometimes the misfortune is unconditional, but sometimes the warning expresses that misfortune will occur unless someone performs or refrains from performing a certain action.

introduction
A sentence which acts to bring the hearer in acquaintance with some party, either the speaker or some third party.

unknown speech act
A sentence which cannot be classified in any of the above ways, sometimes because it cannot be interpreted; sometimes because it is in a different language than the one (English) that this classifier was built for; sometimes because it is a short interjection; etc.

question Question Classifier Generates a list of tags indicating the predicted type of question, if a sentence is classified as such.
ATTR:sentence.question_types
OUTPUT LABELS:
qy: Yes/No question
qo: Open-ended question
qr: Or question
qt: Task question
UNKNOWN: Unknown question type
sentiment Sentiment Classifier Generates a list of values for each sentence indicating the predicted sentiment label.
ATTR:sentence.sentiments
OUTPUT LABELS:
Positive
Neutral
Negative
emotion Emotion Classifier Generates a list of values for each sentence indicating the predicted emotion label.
ATTR:sentence.emotions
OUTPUT LABELS:
Anger
Disgust/Dislike
Fear
Joy/Like
Sadness
Surprise
Excitement
Angst
No emotion
sarcasm Sarcasm Classifier Generates a label predicting if a sentence is sarcastic or not.
ATTR:sentence.sarcasm
OUTPUT LABELS:
Sarcastic
Non-sarcastic
abuse Abuse Classifier Generates a list of values for each sentence indicating the types of abuse detected.
ATTR:sentence.abuse
OUTPUT LABELS:
Offensive
Obscene/scatologic
Threatening/hostile
Insult
Hate speech/racist
Unknown abuse
Non-abusive
task Task Classifier Predicts whether a sentence is a task and, if so, returns a dictionary that includes a list of tags indicating its predicted task type and priority.
ATTR:sentence.is_task
ATTR:sentence.task_actions
OUTPUT LABELS:
is_task:
boolean
task_actions['actions']:
email
call
open_attachment
open_browser
text
deadline
open_calendar
unknown_task_action
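For example, the task attributes above could be used to collect actionable sentences. The objects below are hypothetical dicts standing in for SDK Sentence objects (the real SDK exposes sentence.is_task and sentence.task_actions as attributes):

```python
# Hypothetical stand-ins for analyzed Sentence objects.
sentences = [
    {"raw": "Please email me the report by Friday.",
     "is_task": True, "task_actions": {"actions": ["email", "deadline"]}},
    {"raw": "The weather was lovely.",
     "is_task": False, "task_actions": None},
]

# Keep only the sentences predicted to be tasks.
tasks = [s for s in sentences if s["is_task"]]
for s in tasks:
    print(s["raw"], "->", s["task_actions"]["actions"])
```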

Summarization

KEY NAME DESCRIPTION
summarize Summarization Generates an extractive summary with the most relevant sentences of the input text.
This annotator accepts an input variable summary_length to indicate the desired size of the output summary:
document = client.analyze(text, pipeline="summarize", summary_length=2)
ATTR:document.summary
summarize_news News Summarization Generates an extractive summary with the most relevant sentences of a news article. The input text must be a valid URL. The content of the article is stored as the raw_text of the document. The metadata of the news article is stored in a single dict and includes the url, image, date and author.
This annotator accepts an input variable summary_length to indicate the desired size of the output summary:
document = client.analyze(text, pipeline="summarize_news", summary_length=2)
ATTR:document.raw_text
ATTR:document.summary
ATTR:document.news_article
compress Sentence compression Provides, where applicable, a shortened version of a sentence that gives its main point without extraneous clauses. It uses the output of the dependency parser Annotator to determine parts of the sentence that serve to modify, explain, or embellish the main points and strips them off, leaving only the core information provided by the sentence.
ATTR:sentence.compressed_sentence
summarize_compress Summarization
with compression
Generates an extractive summary with the most relevant sentences of the input text in compressed form, regardless of whether the compress Annotator is specified in the pipeline.
ATTR:document.compressed_summary
keyphrases Keyphrase Extraction Generates a list of keyphrases to capture the topics covered by the document, in order from most to least relevant. Keyphrases can be retrieved with or without relevance scores.
ATTR:document.keyphrases
ATTR:document.keyphrases_scored

Scores

The probability scores of some text classifiers are stored at the Sentence level in the variable sentence.scores. This variable is a dictionary keyed by annotator name; each entry contains the assigned label as key and its probability as value.

ANNOTATOR SCORE KEY
speechact sentence.scores['speechact']
question sentence.scores['question']
task sentence.scores['task']
sentence.scores['task_actions']
sentiment sentence.scores['sentiment']
emotion sentence.scores['emotion']
sarcasm sentence.scores['sarcasm']
abuse sentence.scores['abuse']
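A sketch of reading those probabilities; the dictionary below is a hypothetical example of what sentence.scores might contain after running the sentiment and speechact annotators:

```python
# Hypothetical sentence.scores contents: annotator -> {label: probability}.
scores = {
    "sentiment": {"Negative": 0.91},
    "speechact": {"Statement": 0.88},
}

# Pick the most probable label per annotator.
best = {annotator: max(labels, key=labels.get)
        for annotator, labels in scores.items()}
print(best)
# {'sentiment': 'Negative', 'speechact': 'Statement'}
```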

Text Similarity Output Score

The following table lists the range of text similarity scores, which is based on the SemEval Semantic Textual Similarity (STS) tasks.

SCORE DESCRIPTION
5 The two sentences are completely equivalent, as they mean the same thing.
4 The two sentences are mostly equivalent, but some unimportant details differ.
3 The two sentences are roughly equivalent, but some important information differs or is missing.
2 The two sentences are not equivalent, but share some details or are on the same topic.
1 The two sentences are completely dissimilar.
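The bands above can be turned into a small helper for interpreting raw scores (a sketch: it simply rounds to the nearest band and clamps to the 1-5 range):

```python
SCORE_BANDS = {
    5: "completely equivalent",
    4: "mostly equivalent",
    3: "roughly equivalent",
    2: "not equivalent, but share some details or topic",
    1: "completely dissimilar",
}

def describe_similarity(score):
    """Map a raw similarity score onto the nearest band description."""
    band = min(5, max(1, round(score)))
    return SCORE_BANDS[band]

print(describe_similarity(4.6))
# completely equivalent
```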

Take a look at our demos to find out more about the output of specific NLP Annotators.