NLP - Linguistic Resources

Linguistic resources are needed both to write grammars in symbolic approaches and to train machine learning models. The word corpus means "body" in Latin; in linguistics, where it is used as a data source, it refers to a collection of texts. A collection of linguistic data, whether written texts or transcriptions of recorded speech, can serve as the starting point for linguistic description. The Linguistic Data Consortium (LDC) maintains a large catalog of written and spoken corpora covering a wide range of languages. ELRA, the European Language Resources Association, collects, distributes, and validates spoken, written, and terminological language resources, as well as software tools.

A corpus is a sizable, structured collection of machine-readable texts produced in natural communicative settings. The plural of corpus is corpora. Corpora can be built from many kinds of source, including existing electronic text, transcripts of spoken language, and text recovered from scanned pages by optical character recognition (OCR).

Sampling is another crucial part of corpus design, because it is closely tied to how representative and balanced a corpus is. Since no corpus can contain every text of the language or variety it is meant to represent, sampling is unavoidable in corpus building.
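One common way to keep a corpus balanced is stratified sampling: the pool of candidate texts is split into strata (here, genres), and texts are drawn at random from each stratum in fixed target proportions. The sketch below is a minimal illustration under assumed inputs only. The `(genre, text)` pair format, the genre labels, the target shares, and the function name `stratified_sample` are all hypothetical and are not taken from any particular corpus project.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(texts, target_shares, sample_size, seed=0):
    """Draw a sample whose genre mix follows target_shares (hypothetical helper).

    texts: list of (genre, text) pairs (assumed format).
    target_shares: dict mapping genre -> fraction of the sample (should sum to 1).
    sample_size: total number of texts to draw.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group candidate texts into strata by genre.
    by_genre = defaultdict(list)
    for genre, text in texts:
        by_genre[genre].append(text)

    sample = []
    for genre, share in target_shares.items():
        quota = round(share * sample_size)  # rounding may shift the total slightly
        pool = by_genre.get(genre, [])
        if quota > len(pool):
            raise ValueError(f"not enough {genre!r} texts: need {quota}, have {len(pool)}")
        # Random draw inside each stratum avoids hand-picking texts.
        sample.extend((genre, t) for t in rng.sample(pool, quota))
    return sample


# Illustrative population: 60% news, 30% fiction, 10% conversation transcripts.
population = (
    [("news", f"news text {i}") for i in range(600)]
    + [("fiction", f"fiction text {i}") for i in range(300)]
    + [("conversation", f"transcript {i}") for i in range(100)]
)
shares = {"news": 0.4, "fiction": 0.3, "conversation": 0.3}
sample = stratified_sample(population, shares, 100)
print(dict(Counter(genre for genre, _ in sample)))
# prints {'news': 40, 'fiction': 30, 'conversation': 30}
```

Although the candidate pool here is dominated by news, the sample follows the chosen 40/30/30 mix, which is the sense in which sampling controls balance. Because quotas are rounded, the total can differ slightly from `sample_size` when the shares do not divide it evenly.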

The size of a corpus depends on its intended use and on the following practical factors:

The kinds of questions users are expected to ask of it.
The methods users will apply to study the data.
The availability of source data.
