Generate Focus Key Phrase From A Page
Extracts key phrases from given text
Category: Text Analytics
Mar 15, 2018: A keyword, or focus keyword as some call it, is the word that best describes the content of your page or post. It’s the search term you want a given page to rank for, so that when people search for that keyword or phrase in Google or another search engine, they find that page.
Note
Applies to: Machine Learning Studio (classic)
This content pertains only to Studio (classic). Similar drag and drop modules have been added to Azure Machine Learning designer (preview). Learn more in this article comparing the two versions.
Module overview
This article explains how to use the Extract Key Phrases from Text module in Azure Machine Learning Studio (classic), to pre-process a text column. Given a column of natural language text, the module extracts one or more meaningful phrases. A phrase might be a single word, a compound noun, or a modifier plus a noun.
This module is a wrapper for natural language processing APIs for key-phrase extraction. The phrases are analyzed as potentially meaningful in the context of the sentence for various reasons:
- The phrase captures the topic of the sentence.
- The phrase contains a combination of modifier and noun that indicates sentiment.
For example, assume the sentence analyzed is: 'It was a wonderful hotel to stay at, with unique decor and friendly staff.'
The Extract Key Phrases from Text module might return these key phrases:
- wonderful hotel
- friendly staff
- unique decor
How to configure Extract Key Phrases from Text
To extract key phrases, you must connect a dataset that has a column of text.
Add the Extract Key Phrases from Text module to your experiment in Azure Machine Learning Studio (classic). Then, connect a dataset that has at least one full-text column.
Use the Column Selector to select a column of type string, from which to extract key phrases.
For Language, select a language to use when analyzing phrases. If you specify a language, only phrases in the target language will be output.
If the text column contains phrases in multiple languages, choose the option Language identified in columns. A new column selector is displayed that lets you select a column in your dataset that contains a language identifier. The language identifier can be either the language name or the ISO 639-1 culture identifier. For example, either 'English' or 'en' is acceptable.
Tip
Before running Extract Key Phrases from Text, use the Detect Languages module to identify the language in each row and generate the identifier for you. An error is raised if the language identifier column contains any languages not supported by Extract Key Phrases from Text.
Results
The output of the module is a dataset containing a column of comma-separated key phrases.
For example, the following results are for an input dataset containing reviews in multiple languages:
Key Phrases |
---|
novel,nuclear submarine,good book,adventure story,avalanche of events,good characters |
primer misterio,personajes,fan,aventura,isla |
All output phrases are contained in a single column; no other columns are passed through, and an identifier is not added. However, if you want to align the output phrases with the source text, you can recombine the output phrases with the input by using the Add Columns module.
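Outside Studio, the same row-wise recombination can be sketched in plain Python. This is only an illustration of the idea, assuming the output rows come back in the same order as the input rows; the sample data and the `align_phrases` helper are hypothetical and are not part of the module.

```python
# Hypothetical sample data: the module's output is a single column of
# comma-separated key phrases, assumed to be in the same row order as the input.
source_text = [
    "It was a wonderful hotel to stay at, with unique decor and friendly staff.",
]
key_phrase_column = [
    "wonderful hotel,unique decor,friendly staff",
]


def align_phrases(texts, phrase_rows):
    """Pair each input text with its list of key phrases, by row position."""
    if len(texts) != len(phrase_rows):
        raise ValueError("input and output must have the same number of rows")
    aligned = []
    for text, row in zip(texts, phrase_rows):
        # Split the comma-separated cell and drop empty entries.
        phrases = [p.strip() for p in row.split(",") if p.strip()]
        aligned.append({"text": text, "key_phrases": phrases})
    return aligned


aligned_rows = align_phrases(source_text, key_phrase_column)
print(aligned_rows[0]["key_phrases"])
```

Because no identifier is added to the output, row position is the only link between a phrase list and its source text, which is why the sketch refuses inputs of different lengths.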
The output of key-phrase extraction does not flag the language of individual phrases.
If a language is included that is not supported by the Extract Key Phrases module, an error is raised (0039). To avoid errors, be sure to filter out input text that has an incompatible language identifier.
If there are very few rows of other languages, you can also avoid the error by omitting the language identifier, and analyzing all text using a single language selection. However, when you do so, results are very poor, because entire sentences in the other languages might be output as a single key phrase.
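The filtering step can be sketched as follows. This is a minimal illustration, not the module's own logic: the supported languages are the six listed in this article, while the case-insensitive matching, the row layout, and the function names are assumptions made for the example.

```python
# The six languages this article lists as supported, with ISO 639-1 codes.
SUPPORTED = {
    "dutch": "nl", "english": "en", "french": "fr",
    "german": "de", "italian": "it", "spanish": "es",
}
# Accept either the language name or the two-letter code.
SUPPORTED_IDS = set(SUPPORTED) | set(SUPPORTED.values())


def is_supported(language_id):
    """Case-insensitive check (an assumption; the module may be stricter)."""
    return language_id.strip().lower() in SUPPORTED_IDS


def filter_supported(rows):
    """Keep (language_id, text) rows whose identifier is supported."""
    return [row for row in rows if is_supported(row[0])]


# Hypothetical input rows: (language identifier, text).
rows = [
    ("English", "A good adventure story."),
    ("es", "Un buen misterio."),
    ("pt", "Uma boa aventura."),  # Portuguese is not in the supported list
]
kept = filter_supported(rows)
print([language for language, _ in kept])
```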
Examples
The following example demonstrates how to use this module to extract key phrases and then build a word cloud from the phrases: Extract Key Phrases and Show Word Cloud
See the Azure AI Gallery for more examples of text processing using Azure Machine Learning.
Technical notes
This module currently supports the following languages:
- Dutch
- English
- French
- German
- Italian
- Spanish
For additional languages, consider using the Text Analytics API in Azure Cognitive Services. For more information, see How to extract key phrases in Text Analytics.
Expected inputs
Name | Type | Description |
---|---|---|
Dataset | Data Table | The table containing the text to be processed. |
Module parameters
Name | Type | Range | Optional | Default | Description |
---|---|---|---|---|---|
Culture-language column | ColumnSelection | | language:Column contains language | | Name or one-based index of the column containing the culture-language information. |
Text column | ColumnSelection | | Required | | Name or one-based index of the text column. |
Language | T_Language | English, Spanish, French, Dutch, German, Italian, Column contains language | Required | English | Select the language of the text to be processed. |
Outputs
Name | Type | Description |
---|---|---|
Results dataset | Data Table | The extracted key phrases |
Exceptions
Exception | Description |
---|---|
Error 0003 | Exception occurs if one or more of inputs are null or empty. |
Error 0010 | Exception occurs if input datasets have column names that should match but do not. |
Error 0016 | Exception occurs if input datasets passed to the module should have compatible column types but do not. |
Error 0008 | Exception occurs if parameter is not in range. |
For a list of errors specific to Studio (classic) modules, see Machine Learning Error codes.
For a list of API exceptions, see Machine Learning REST API Error Codes.
See also
Text Analytics
A-Z Module List
Built this package as a toy challenge to do the following:
1 - Compute the most important keywords (a keyword can be 1-3 words long)
2 - Choose the top n keywords from the previously generated list. Compare these keywords with all the words occurring in all of the transcripts.
3 - Generate a score (rank) for these top n keywords based on the analysed transcripts.
What this package does:
1 - Generates keywords (1-3 words in length) from a document, based on the RAKE algorithm
2 - Generates vector representations of all keywords and of the words in a test corpus, using Word2Vec
3 - Ranks keywords by comparing keyword vectors with paragraph/document vectors from the test corpus
4 - Saves the ranked keywords to a text file (and/or displays them on the console)
Installing dependencies
The code was developed with Python 3.5 and requires the following libraries/versions:
- gensim 2.0.0
- numpy 1.12.1
- scikit-learn 0.18.1
- wget 3.2
These dependencies are specified in requirements.txt and can be installed from that file, for example with pip.
Usage
Running the keyword_xtract file carries out the steps described above (keyword extraction -> compute vector representations -> rank keywords).
Models available:
A truncated version of Google's pre-trained Word2Vec model is available as default. GloVe Word2Vec models (https://nlp.stanford.edu/projects/glove/) can also be downloaded by specifying the model required at run time:
- glove_6B - Wikipedia 2014 + Gigaword 5 (6B tokens, 400K vocab, uncased, 50d, 100d, 200d, & 300d vectors, 822 MB download)
- glove_42B - Common Crawl (42B tokens, 1.9M vocab, uncased, 300d vectors, 1.75 GB download): glove.42B.300d.zip
- glove_840B - Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download): glove.840B.300d.zip
- glove_twitter - Twitter (2B tweets, 27B tokens, 1.2M vocab, uncased, 25d, 50d, 100d, & 200d vectors, 1.42 GB download)
Use the labels above as values for the '-m/--model' command line argument. If the selected model is not present locally, it will be downloaded; this may take some time. It is also possible to use a custom, user-defined Word2Vec model by supplying a path to the model.
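How such an argument might be interpreted can be sketched with argparse. Only the '-m/--model' flag and the four GloVe labels come from this README; the parser, the `classify_model` helper, and the use of `None` to mean "default model" are hypothetical and may not match the package's actual code.

```python
import argparse

# Labels taken from the list above; everything else here is hypothetical.
GLOVE_LABELS = ("glove_6B", "glove_42B", "glove_840B", "glove_twitter")


def build_parser():
    parser = argparse.ArgumentParser(description="Keyword extraction sketch")
    parser.add_argument(
        "-m", "--model", default=None,
        help="GloVe label or path to a custom model; omit to use the default",
    )
    return parser


def classify_model(value):
    """Return which kind of model the -m/--model value refers to."""
    if value is None:
        return "default"
    if value in GLOVE_LABELS:
        return "glove"
    # Anything else is treated as a path to a user-supplied model.
    return "custom_path"


args = build_parser().parse_args(["-m", "glove_6B"])
print(classify_model(args.model))
```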
NOTE - the default evaluation docs provided for ranking keywords are 3 document pages related to food, which were extracted from Wikipedia. Please provide your own relevant evaluation documents for accurate keyword ranking. Otherwise, keywords can simply be extracted and the ranking scores ignored.
RAKE algorithm + implementation
I modified an existing RAKE implementation to work with Python 3 and different parameters. In this implementation, RAKE does the following:
(i) Generates keyword candidates
(ii) Computes 'scores' for each candidate. Words are scored according to their frequency and the typical length of a candidate phrase in which they appear.
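The two steps can be sketched in a few lines of Python. This is a simplified illustration of RAKE as described by Rose et al., not the modified implementation this package uses: the stop-word list is a tiny made-up sample, the function names are hypothetical, and each word is scored as degree divided by frequency, which is one of the scoring options in the original paper.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "is", "it", "of", "on",
              "the", "to", "was", "with"}


def candidate_phrases(text, max_words=3):
    """Step (i): split text at punctuation and stop words into candidates."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9'-]+", fragment):
            if word in STOP_WORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    # Keep candidates of at most max_words words.
    return [p for p in phrases if len(p) <= max_words]


def word_scores(phrases):
    """Step (ii): score each word as degree / frequency."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    return {w: degree[w] / freq[w] for w in freq}


def rake(text, max_words=3):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text, max_words)
    scores = word_scores(phrases)
    ranked = {" ".join(p): sum(scores[w] for w in p) for p in phrases}
    return sorted(ranked.items(), key=lambda kv: (-kv[1], kv[0]))


keywords = rake("It was a wonderful hotel to stay at, "
                "with unique decor and friendly staff.")
print(keywords)
```

A word that tends to appear inside longer candidate phrases gets a higher degree relative to its frequency, so two-word candidates such as "friendly staff" outrank the single word "stay" in this example.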
Originally implemented by: https://github.com/aneesha/RAKE
Forked from: https://github.com/BelalC/RAKE-tutorial/tree/master
A Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry & J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley & Sons.
The source code is released under the MIT License.
Word2Vec + Ranking
Keyword vector representations are computed using gensim and pre-trained Word2Vec models. The vector representation of each evaluation document is computed by averaging the word vectors present in that document. The cosine similarity between each keyword vector and each evaluation document vector is then computed, and the similarities are averaged to give a single score that can be used as a 'rank' for the keyword.
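The ranking step can be illustrated without gensim by using a small made-up embedding table. This is a sketch under stated assumptions, not the package's code: the 3-dimensional vectors are invented stand-ins for pre-trained embeddings, the function names are hypothetical, and representing a multi-word keyword by the average of its word vectors is an assumption the README does not confirm.

```python
import math

# Toy 3-dimensional vectors standing in for pre-trained embeddings (made up).
VECTORS = {
    "pasta": [0.9, 0.1, 0.0],
    "sauce": [0.8, 0.2, 0.1],
    "bread": [0.7, 0.3, 0.0],
    "engine": [0.0, 0.1, 0.9],
}


def mean_vector(words, vectors):
    """Average the vectors of the words that have one; None if none do."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_keywords(keywords, documents, vectors):
    """Score each keyword by its mean cosine similarity to the documents."""
    doc_vectors = [v for v in (mean_vector(d, vectors) for d in documents) if v]
    ranked = []
    for keyword in keywords:
        kw_vector = mean_vector(keyword.split(), vectors)
        if kw_vector is None or not doc_vectors:
            continue  # skip keywords with no known words
        score = sum(cosine(kw_vector, d) for d in doc_vectors) / len(doc_vectors)
        ranked.append((keyword, score))
    return sorted(ranked, key=lambda kv: -kv[1])


# Two tiny food-related evaluation "documents", already tokenised.
documents = [["pasta", "sauce"], ["bread", "sauce"]]
ranking = rank_keywords(["engine", "pasta sauce"], documents, VECTORS)
print(ranking)
```

With food-related evaluation documents, a food-related keyword scores higher than an unrelated one, which is why the note above recommends supplying evaluation documents relevant to your own text.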
- Gensim - https://radimrehurek.com/gensim/index.html
- Vector representations of words and phrases - Distributed Representations of Words and Phrases and their Compositionality; Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg; Dean, Jeffrey; arXiv:1310.4546