Wednesday 25 September 2013

Using the Stanford Named Entity Recognizer

The Stanford Named Entity Recognizer (NER) lets you extract entities from text. Here, 'entities' means named things such as a person, a place or an organization.

Start by downloading the NER from here. Stanford NER is written in Java, but we can conveniently use it from other programming languages; here we will use it from Python.
Python is predominant in NLP largely because of the rich support offered by libraries such as NLTK and scikit-learn.

Extract the zip file downloaded from Stanford, then open a terminal, change to the extracted directory and run:

$ java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier classifiers/english.muc.7class.distsim.crf.ser.gz -port 8082 -outputformat inlineXML

This starts the server; you can now use its service on the specified port (8082 here).
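Before moving on, it can help to confirm that something is actually listening on that port. The following is a minimal sketch using only Python's standard library. The helper name is_server_up is made up for this post, and the default port 8082 is simply the value passed to -port above. Note that it only checks that the port accepts TCP connections, not that the process behind it is really the NER server.

```python
import socket

def is_server_up(host="localhost", port=8082, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (connection refused, timeout, ...)
        # when nothing is listening on the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_server_up() right after launching the Java command should return True once the classifier has finished loading; loading can take a few seconds, so an early False does not necessarily mean the command failed.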

Now, in Python,

import ner
ner.SocketNER(host='localhost', port=8082).get_entities("text string")

will extract the named entities from the text. The ner module is assumed here to be the one installed by the pyner package.
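With -outputformat inlineXML the server marks entities with XML-style tags inside the text, for example <PERSON>...</PERSON>. As a rough illustration of how such output can be turned into a dictionary of entities, here is a small sketch. The function name parse_inline_xml and the sample string are invented for this example, and this is not necessarily how the ner module does it internally.

```python
import re
from collections import defaultdict

# Matches <TAG>some text</TAG>, where the closing tag repeats the opening one.
ENTITY_RE = re.compile(r"<([A-Z]+)>(.+?)</\1>")

def parse_inline_xml(tagged_text):
    """Group the entities found in inlineXML-tagged text by their tag name."""
    entities = defaultdict(list)
    for tag, value in ENTITY_RE.findall(tagged_text):
        entities[tag].append(value)
    return dict(entities)

# Hypothetical tagged output, written by hand for illustration.
sample = ("<ORGANIZATION>University of California</ORGANIZATION> is located in "
          "<LOCATION>California</LOCATION>, <LOCATION>United States</LOCATION>")
print(parse_inline_xml(sample))
# {'ORGANIZATION': ['University of California'], 'LOCATION': ['California', 'United States']}
```

A regular expression is enough here because the tags are flat and never nested; for anything more complicated a real XML parser would be the safer choice.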


Equivalently, you can use the service from any programming language that can open a TCP socket!
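To make that point concrete, here is a sketch of a client that uses nothing but a plain TCP socket. It assumes the server speaks a simple line-based protocol, in which the client sends one line of text and the server replies with the tagged text and then closes the connection. It also assumes UTF-8 on both sides. The function name tag_text is made up for this example. The same steps can be reproduced in any language that has a socket library.

```python
import socket

def tag_text(text, host="localhost", port=8082, timeout=10.0):
    """Send one line of text to the NER server and return its raw reply."""
    # The protocol is assumed to be line based, so collapse internal
    # newlines and runs of whitespace into single spaces.
    line = " ".join(text.split()) + "\n"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("utf-8"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").strip()
```

With the server from above running, tag_text("some sentence") should return the sentence with the inlineXML tags added around each entity.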