---
# Intro
In this repository, Natural Language Processing (NLP) techniques are used to explore the speech style of Rachel from the TV series Friends, perform a multi-lingual analysis for English, and train a neural network to communicate in Rachel's style.

Style transfer is a popular NLP task with applications ranging from education to personalising electronic assistants. Large transformer models, which are strong at both natural language comprehension and imitating a wide variety of styles, have taken style transfer to a new level. Today, large language models such as GPT-3, with their billions of parameters, can closely learn the features of the training sample (i.e., the training distribution) and generate realistic text in a particular style.

In this post, I explore how well language models can generate text in the style of Rachel from the famous TV series "Friends". For this purpose, I use a corpus of English transcripts of the series, originally collected for a retrieval chatbot, and train bilingual models to communicate in the style of Rachel Green.

In addition, I conducted a style analysis to study the features of Rachel's speech.

The project can be roughly divided into three parts:

* Data collection
* Stylistic analysis of the characters' speech
* A framework for training models that write text in Rachel's style

You can find all the code in this HF repository.
# Data
## Selecting a character
I decided to keep using the TV series I settled on in the previous project: the sitcom "Friends", which ran from 1994 to 2004. Despite its age, it is still popular today. This comedy series follows the lives of six friends (Ross, Phoebe, Monica, Rachel, Joey, and Chandler), who live in New York and constantly get into trouble and funny situations. Why did I choose this particular series? For three reasons:

1. I found transcripts of 236 episodes in the public domain. That is a lot of data that I can use to train a language model.

2. The series contains dialogue from six characters rather than one, which makes comparative analysis possible.

3. This is a popular TV series that many of us have watched and know well. This means I can make assumptions about the data (e.g. that Phoebe uses simpler words) and judge how realistic the style of the generated text is based on my viewing experience.
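
To make the comparative-analysis idea a bit more concrete, here is a minimal sketch of how one might group utterances by character and compute a crude "simplicity" measure. It is only an illustration under assumptions: the `Speaker: utterance` line format, the helper names `group_by_speaker` and `mean_word_length`, and the sample text are all hypothetical and not taken from the actual corpus or project code.

```python
import re
from collections import defaultdict

# Assumed line format: "Speaker: utterance". The real transcripts may differ.
SPEAKER_LINE = re.compile(r"^(?P<speaker>[A-Za-z]+):\s*(?P<text>.+)$")

MAIN_CHARACTERS = {"Ross", "Phoebe", "Monica", "Rachel", "Joey", "Chandler"}


def group_by_speaker(transcript: str) -> dict[str, list[str]]:
    """Collect the utterances of each main character from one transcript."""
    lines_by_speaker = defaultdict(list)
    for raw_line in transcript.splitlines():
        match = SPEAKER_LINE.match(raw_line.strip())
        if match is None:
            continue  # skip scene descriptions, blank lines, and similar
        speaker = match.group("speaker").capitalize()
        if speaker in MAIN_CHARACTERS:
            lines_by_speaker[speaker].append(match.group("text").strip())
    return dict(lines_by_speaker)


def mean_word_length(utterances: list[str]) -> float:
    """Average word length, a very rough proxy for vocabulary simplicity."""
    words = [w for u in utterances for w in re.findall(r"[A-Za-z']+", u)]
    return sum(len(w) for w in words) / len(words) if words else 0.0


# Made-up sample text, not quoted from the show.
sample = """[Scene: a coffee shop]
Rachel: Oh my God, I love this coat!
Phoebe: It is so soft.
Rachel: I know, right?
"""

grouped = group_by_speaker(sample)
for name, utterances in grouped.items():
    print(name, len(utterances), round(mean_word_length(utterances), 2))
```

Average word length is a deliberately simple stand-in here; a real stylistic analysis would likely use richer features, but the same per-character grouping would apply.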