An Introduction to Using TextBlob

Xinzhibang (新知榜) official account

2023-12-04 08:21:27

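Summary

This article shows how to use TextBlob, an open-source text-processing library written in Python. It handles many natural language processing tasks, such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and text translation. (The translation feature was deprecated in TextBlob 0.16.0.)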

Introduction


GitHub: https://github.com/sloria/TextBlob

Official documentation: https://textblob.readthedocs.io/en/dev/

Hands-on

  1. Installation

    Install: pip install textblob

    Install from a mirror in mainland China: pip install textblob -i https://pypi.tuna.tsinghua.edu.cn/simple

    Reference: https://textblob.readthedocs.io/en/dev/quickstart.html
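
After installing, it can help to confirm that the package is importable before running the examples. The sketch below uses only the standard library; the helper name `is_installed` is made up for illustration. TextBlob also depends on NLTK corpora, which the official documentation installs with `python -m textblob.download_corpora`.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    if is_installed("textblob"):
        print("textblob is available")
    else:
        print("textblob is missing; run: pip install textblob")
```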

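  2. Part-of-speech tagging

    from textblob import TextBlob

    # Sample text used by the examples in steps 2 to 6
    blob = TextBlob('I love natural language processing! I am not like fish!')
    print(blob.tags)
    # Output:
    # [('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'),
    #  ('I', 'PRP'), ('am', 'VBP'), ('not', 'RB'), ('like', 'IN'), ('fish', 'NN')]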
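
The tags are Penn Treebank labels (`NN` for nouns, `VBP` for present-tense verbs, and so on), so a common next step is to filter words by tag prefix. The following is a minimal sketch: the tag list is copied from the output above so that it runs without TextBlob installed, and `words_with_tag_prefix` is a hypothetical helper, not part of the TextBlob API.

```python
# (word, tag) pairs copied from the blob.tags output in this step
tags = [('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'), ('language', 'NN'),
        ('processing', 'NN'), ('I', 'PRP'), ('am', 'VBP'), ('not', 'RB'),
        ('like', 'IN'), ('fish', 'NN')]


def words_with_tag_prefix(tagged, prefix):
    """Keep the words whose tag starts with the given prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


nouns = words_with_tag_prefix(tags, 'NN')  # NN, NNS, NNP, ...
verbs = words_with_tag_prefix(tags, 'VB')  # VB, VBP, VBD, ...
print(nouns)  # ['language', 'processing', 'fish']
print(verbs)  # ['love', 'am']
```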
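  3. Noun phrase extraction

    # blob is TextBlob('I love natural language processing! I am not like fish!')
    np = blob.noun_phrases
    for w in np:
        print(w)
    # Output:
    # natural language processing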
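  4. Sentiment analysis

    # blob is a TextBlob holding several sentences
    for sentence in blob.sentences:
        print(sentence + '------>' + str(sentence.sentiment.polarity))
    # Output reported in the original article:
    # I love natural language processing!------>0.3125
    # i am not like you!------>0.05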
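
The polarity score is a float in the range [-1.0, 1.0], where negative values indicate negative sentiment. One way to turn it into a label is sketched below. The function name `label_polarity` and the 0.1 threshold are assumptions chosen for illustration; TextBlob itself returns only the numeric score.

```python
def label_polarity(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Scores taken from the output reported in this step
print(label_polarity(0.3125))  # positive
print(label_polarity(0.05))    # neutral
```

The threshold leaves a neutral band around zero, so that weakly scored sentences are not forced into either class.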
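  5. Tokenization (splitting text into sentences or words)

    # blob is TextBlob('I love natural language processing! I am not like fish!')
    token = blob.words
    for w in token:
        print(w)
    # Output (one word per line, shown here on a single line):
    # I love natural language processing I am not like fish

    sentence = blob.sentences
    for s in sentence:
        print(s)
    # Output:
    # I love natural language processing!
    # I am not like fish!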
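  6. Word inflection

    # blob is TextBlob('I love natural language processing! I am not like fish!')
    token = blob.words
    for w in token:
        # pluralize
        print(w.pluralize())
        # singularize
        print(w.singularize())
    # Output reported in the original article (one item per line, shown here as plural/singular pairs):
    # we I, love love, naturals natural, languages language, processings processing,
    # we I, ams am, nots not, likes like, fish fish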
  7. Lemmatization

    from textblob import Word

    w = Word('went')
    print(w.lemmatize('v'))  # 'v' marks the word as a verb
    w = Word('octopi')
    print(w.lemmatize())
    # Output:
    # go
    # octopus
  8. WordNet integration

    from textblob import Word
    from textblob.wordnet import VERB

    word = Word('octopus')
    syn_word = word.synsets
    for syn in syn_word:
        print(syn)
    # Output:
    # Synset('octopus.n.01')
    # Synset('octopus.n.02')

    # Restrict the returned synsets to verbs
    syn_word1 = Word("hack").get_synsets(pos=VERB)
    for syn in syn_word1:
        print(syn)
    # Output:
    # Synset('chop.v.05')
    # Synset('hack.v.02')
    # Synset('hack.v.03')
    # Synset('hack.v.04')
    # Synset('hack.v.05')
    # Synset('hack.v.06')
    # Synset('hack.v.07')
    # Synset('hack.v.08')

    # Look up the definitions of a word's synsets
    print(Word("beautiful").definitions)
    # Output:
    # ['delighting the senses or exciting intellectual or emotional admiration', '(of weather) highly enjoyable']
  9. Spelling correction

    from textblob import TextBlob, Word

    sen = 'I lvoe naturl language processing!'
    sen = TextBlob(sen)
    print(sen.correct())
    # Output:
    # I love nature language processing!

    # Word.spellcheck() returns spelling suggestions with their confidence
    w1 = Word('good')
    w2 = Word('god')
    w3 = Word('gd')
    print(w1.spellcheck())
    print(w2.spellcheck())
    print(w3.spellcheck())
    # Output:
    # [('good', 1.0)]
    # [('god', 1.0)]
    # [('go', 0.586139896373057), ('god', 0.23510362694300518), ('d', 0.11658031088082901),
    #  ('g', 0.03626943005181347), ('ed', 0.009067357512953367), ('rd', 0.006476683937823834),
    #  ('nd', 0.0038860103626943004), ('gr', 0.0025906735751295338), ('sd', 0.0006476683937823834),
    #  ('md', 0.0006476683937823834), ('id', 0.0006476683937823834), ('gdp', 0.0006476683937823834),
    #  ('ga', 0.0006476683937823834), ('ad', 0.0006476683937823834)]
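
According to the TextBlob documentation, spelling correction follows Peter Norvig's "How to Write a Spelling Corrector". The sketch below illustrates that general idea in a simplified form and is not TextBlob's implementation: it considers only candidates one edit away, and `WORD_COUNTS` is a tiny hypothetical frequency table, so its suggestions differ from the TextBlob output above.

```python
import string

# Hypothetical word counts; a real corrector learns these from a large corpus.
WORD_COUNTS = {"love": 50, "live": 30, "move": 20, "nature": 10, "natural": 40}


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def spellcheck(word):
    """Return (candidate, confidence) pairs, most likely first."""
    if word in WORD_COUNTS:
        return [(word, 1.0)]
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return [(word, 0.0)]  # nothing known nearby; design choice of this sketch
    total = sum(WORD_COUNTS[w] for w in candidates)
    ranked = sorted(candidates, key=lambda w: WORD_COUNTS[w], reverse=True)
    # Confidence is the candidate's share of the counts among all candidates
    return [(w, WORD_COUNTS[w] / total) for w in ranked]


print(spellcheck("lvoe"))
print(spellcheck("naturl"))
```

The confidence values sum to 1 across the candidates, which mirrors the shape of the `Word.spellcheck()` output shown in this step.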
  10. Parsing

    from textblob import TextBlob

    text = TextBlob('I lvoe naturl language processing!')
    print(text.parse())
    # Output:
    # I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O language/NN/I-NP/O processing/NN/I-NP/O !/./O/O
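
Each token in the parse output has four slash-separated fields: the word, its part-of-speech tag, its chunk tag, and its prepositional noun phrase tag. A minimal sketch for turning that string into structured records follows. `Token` and `split_parse` are hypothetical names, and the sketch assumes that no word contains a literal slash.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str    # part-of-speech tag
    chunk: str  # chunk tag, e.g. B-NP / I-NP / O
    pnp: str    # prepositional noun phrase tag


def split_parse(parsed: str) -> List[Token]:
    """Split 'word/POS/chunk/PNP' items separated by whitespace."""
    return [Token(*item.split("/")) for item in parsed.split()]


# String copied from the parse() output in this step
parsed = ("I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O "
          "language/NN/I-NP/O processing/NN/I-NP/O !/./O/O")
tokens = split_parse(parsed)
noun_phrase = [t.word for t in tokens if t.chunk.endswith("NP")]
print(noun_phrase)  # ['I', 'lvoe', 'naturl', 'language', 'processing']
```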
  11. N-grams

    from textblob import TextBlob

    text = TextBlob('I lvoe naturl language processing!')
    print(text.ngrams(n=2))
    # Output:
    # [WordList(['I', 'lvoe']), WordList(['lvoe', 'naturl']), WordList(['naturl', 'language']), WordList(['language', 'processing'])]
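
An n-gram list is a sliding window of n consecutive words. The plain-Python sketch below reproduces the windows shown above using ordinary lists instead of `WordList`; the function is an illustration and not TextBlob's code.

```python
def ngrams(words, n=3):
    """Return every run of n consecutive words, in order."""
    if n <= 0:
        return []
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = ['I', 'lvoe', 'naturl', 'language', 'processing']
print(ngrams(words, n=2))
# [['I', 'lvoe'], ['lvoe', 'naturl'], ['naturl', 'language'], ['language', 'processing']]
```

A text with fewer than n words yields an empty list, because no full window fits.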
  12. Hands-on with TextBlob: naive Bayes text classification

    Reference: https://textblob.readthedocs.io/en/dev/classifiers.html#classifiers
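
To show what a naive Bayes text classifier does internally, here is a small standard-library sketch. It is a simplified multinomial variant with add-one smoothing, not the classifier provided in `textblob.classifiers`. The class name `TinyNaiveBayes` and the training sentences are invented for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial naive Bayes over whitespace-split, lowercased words."""

    def __init__(self, train):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of documents
        self.vocab = set()
        for text, label in train:
            words = text.lower().split()
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Log prior plus the sum of log likelihoods (add-one smoothing)
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data
train = [
    ("i love this sandwich", "pos"),
    ("this is an amazing place", "pos"),
    ("i feel very good about these beers", "pos"),
    ("i do not like this restaurant", "neg"),
    ("i am tired of this stuff", "neg"),
    ("i can't deal with this", "neg"),
]
clf = TinyNaiveBayes(train)
print(clf.classify("i love this amazing place"))      # pos
print(clf.classify("i am tired of this restaurant"))  # neg
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps an unseen word from zeroing out a class.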

The code has been uploaded:

  1. https://github.com/yuquanle/StudyForNLP/blob/master/NLPtools/TextBlobDemo.ipynb
  2. https://github.com/yuquanle/StudyForNLP/blob/master/NLPtools/TextBlob2TextClassifier.ipynb

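You are welcome to follow the WeChat official account of the same name, AI小白入门, and make a little progress every day along with the author.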

Page URL: https://www.xinzhibang.net/article_detail-21965.html
