Natural Language Processing with Spark Introduction
“This book is about using Spark NLP to build natural language processing (NLP) applications. Spark NLP is an NLP library built on top of Apache Spark. In this book I’ll cover how to use Spark NLP, as well as fundamental natural language processing topics. Hopefully, at the end of this book you’ll have a new software tool for working with natural language and Spark NLP, as well as a suite of techniques and some understanding of why these techniques work.” (NLPwSprk 2020)
“Let’s begin by talking about the structure of this book. In the first part, we’ll go over the technologies and techniques we’ll be using with Spark NLP throughout this book. After that we’ll talk about the building blocks of NLP. Finally, we’ll talk about NLP applications and systems.” (NLPwSprk 2020)
“When working on an application that requires NLP, there are three perspectives you should keep in mind: the software developer’s perspective, the linguist’s perspective, and the data scientist’s perspective. The software developer’s perspective focuses on what your application needs to do; this grounds the work in terms of the product you want to create. The linguist’s perspective focuses on what it is in the data that you want to extract. The data scientist’s perspective focuses on how you can extract the information you need from your data.” (NLPwSprk 2020)
“Following is a more detailed overview of the book.” (NLPwSprk 2020)
Part I, Basics:
- Chapter 1 covers setting up your environment so you can follow along with the examples and exercises in the book.
- Chapter 2, Natural Language Basics is a survey of some of the linguistic concepts that help in understanding why NLP techniques work, and how to use NLP techniques to get the information you need from language.
- Chapter 3, NLP on Apache Spark is an introduction to Apache Spark and, most germane, the Spark NLP library.
- Chapter 4, Deep Learning Basics is a survey of some of the deep learning concepts that we’ll be using in this book. This book is not a tutorial on deep learning, but we’ll try and explain these techniques when necessary.
Part II, Building Blocks:
- Chapter 5, Processing Words covers the classic text-processing techniques. Since NLP applications generally require a pipeline of transformations, understanding the early steps well is a necessity.
- Chapter 6, Information Retrieval covers the basic concepts of search engines. Not only is this a classic example of an application that uses text, but many NLP techniques used in other kinds of applications ultimately come from information retrieval.
- Chapter 7, Classification and Regression covers some well-established techniques of using text features for classification and regression tasks.
- Chapter 8, Sequence Modeling with Keras introduces techniques used in modeling natural language text data as sequences. Since natural language is a sequence, these techniques are fundamental.
- Chapter 9, Information Extraction covers extracting facts and relationships from text.
- Chapter 10, Topic Modeling demonstrates techniques for finding topics in documents. Topic modeling is a great way to explore text.
- Chapter 11, Word Embeddings discusses one of the most popular modern techniques for creating features from text.
Part III, Applications:
- Chapter 12, Sentiment Analysis and Emotion Detection covers some basic applications that require identifying the sentiment of a text’s author — for example, whether a movie review is positive or negative.
- Chapter 13, Building Knowledge Bases explores creating an ontology, a collection of facts and relationships organized in a graph-like manner, from a corpus.
- Chapter 14, Search Engine goes deeper into what can be done to improve a search engine. Improving is not just about improving the ranker; it’s also about facilitating the user with features like facets.
- Chapter 15, Chatbot demonstrates how to create a chatbot — this is a fun and interesting application. This kind of application is becoming more and more popular.
- Chapter 16, Object Character Recognition introduces converting text stored as images to text data. Not all texts are stored as text data. Handwriting and old texts are examples of texts we may receive as images. Sometimes, we also have to deal with nonhandwritten text stored in images like PDF images and scans of printed documents.
Part IV, Building NLP Systems:
- Chapter 17, Supporting Multiple Languages explores topics that an application creator should consider when preparing to work with multiple languages.
- Chapter 18, Human Labeling covers ways to use humans to gather data about texts. Being able to efficiently use humans to augment data can make an otherwise impossible project feasible.
- Chapter 19, Productionizing NLP Applications covers creating models, Spark NLP pipelines, and TensorFlow graphs, and publishing them for use in production; some of the performance concerns that developers should keep in mind when designing a system that uses text; and the quality and monitoring concerns that are unique to NLP applications.
- Other Tools