Social Media Sentiment Analysis - Connecting 3 open-source projects
We are a small research team from Germany and are currently working on a research project, for which we need someone who will assist us in implementing, connecting and modifying 3 open-source projects/libraries.
Our project is focused on social sentiment analysis. Most of the work has already been done.
These are the 3 open-source projects/libraries:
1. Kral: A tool which retrieves information from Twitter, Facebook, Reddit, and YouTube by using their respective APIs
2. Synt: A sentiment classification tool (using NLTK) which has been developed to work with Kral
3. OSAE (Open Sentiment Analysis Engine): It allows humans to flag sentiment for random Twitter samples, which are then used to train Synt.
(Information about the respective projects can be found at the bottom)
All 3 projects are from the same developer and should therefore be easy to connect.
For a full understanding, please have a look at the attached infographic.
Kral (1.) and Synt (2.) don’t need much modification and are well documented. You will need to modify OSAE (3.) so that the same set of sample tweets can be classified by several people, and so that only tweets with sufficient agreement (for example, classified as positive by 8 out of 10 people) are used to train Synt (2.). The tweets will also need to be shown in random order.
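To make the agreement rule concrete, here is a minimal Python sketch of what we have in mind. It is not OSAE code: the function names, the `votes` structure, and the default thresholds (10 votes, 80% agreement) are assumptions for illustration only.

```python
import random
from collections import Counter


def select_training_tweets(votes, min_votes=10, min_agreement=0.8):
    """Return {tweet_id: label} for tweets whose most common label
    reaches the agreement threshold.

    votes: dict mapping tweet_id -> list of labels from different people
    (hypothetical structure, not taken from OSAE).
    """
    selected = {}
    for tweet_id, labels in votes.items():
        if len(labels) < min_votes:
            continue  # not enough people have classified this tweet yet
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            selected[tweet_id] = label
    return selected


def shuffled_for_rater(tweet_ids, seed=None):
    """Return the tweet ids in random order, leaving the input untouched."""
    rng = random.Random(seed)
    order = list(tweet_ids)
    rng.shuffle(order)
    return order
```

With these defaults, a tweet flagged positive by 8 of 10 people is kept, while an evenly split tweet or one with too few votes is left out of the training set.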
We are currently running a Linux VPS and will need you to install all the required dependencies listed in the respective open-source projects (this should be a single command over SSH for each).
The finished product should allow us to do the following:
1. Define multiple keywords which are going to be analyzed (by Synt) at the same time
2. Have people classify sample tweets by using OSAE
3. Train Synt with the info from 2
4. Have Kral search through Twitter, Facebook, Reddit, and YouTube based on the keywords in 1
5. Analyze sentiment with Synt and store the results in a database.
• We want to be able to export this database to Excel and distinguish between the different keywords (possibly one database per keyword). We assume MySQL would be used.
• It’s important that we can graph sentiment vs. time in Excel
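As a rough sketch of the export we have in mind: a single table with a keyword column can be filtered per keyword and written out as CSV, which Excel opens directly. The example uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere; the table name `sentiment`, its columns, and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def export_keyword_csv(conn, keyword, out):
    """Write (created_at, score) rows for one keyword as CSV to a text
    file object and return the number of data rows written."""
    rows = conn.execute(
        "SELECT created_at, score FROM sentiment "
        "WHERE keyword = ? ORDER BY created_at",
        (keyword,),
    ).fetchall()
    writer = csv.writer(out)
    writer.writerow(["created_at", "score"])
    writer.writerows(rows)
    return len(rows)


# Demo with a hypothetical schema and made-up rows (sqlite3 stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sentiment (keyword TEXT, source TEXT, created_at TEXT, score REAL)"
)
conn.executemany(
    "INSERT INTO sentiment VALUES (?, ?, ?, ?)",
    [
        ("python", "twitter", "2012-01-01T10:00:00", 0.6),
        ("python", "reddit", "2012-01-01T11:00:00", -0.3),
        ("linux", "twitter", "2012-01-01T10:30:00", 0.1),
    ],
)

buffer = io.StringIO()
export_keyword_csv(conn, "python", buffer)
print(buffer.getvalue())
```

In Excel, the `created_at` column can then serve as the x-axis and `score` as the y-axis of a line chart. Keeping all keywords in one table is only one option; one database per keyword would work with the same query minus the `WHERE` clause.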
This should not be very complicated and it shouldn’t take a lot of time.
Most of the work has already been done: have a look at the code and judge for yourself (links can be found below). The code is well documented.
Also: To make sure that you actually read the whole thing and don’t give us one of those terrible automatic bids, please write “blue elephant” in your bid.
What’s in it for you?
1. You will get paid ☺
2. You will work on an interesting project
3. This is only the test project, so if you do a good job and the project works as desired, you will be able to work with us on developing it further.
What qualities/skills are we looking for?
1. Good knowledge of Python
2. Server-side scripting knowledge
3. Basic HTML
4. Preferably experience working with APIs
5. Knowledge of NLTK (Natural Language Toolkit) would be great but is not a requirement
6. Know your way around Linux servers (cPanel / WHM)
7. Fun to work with
8. Reliable and meets deadlines
9. Will provide future support
The detailed project description can be discussed personally or via Skype.
kral (pronounced: "crawl") is a Python library intended to be a flexible solution for retrieving live streaming data from a variety of social network APIs on given keyword(s), and yielding the retrieved data in a unified format.
• Ability to harvest user information and posts from Twitter, Facebook, Reddit, and YouTube (more to be added)
• Ability to expand all short-urls into full real URLs.
• Ability to track number of mentions of a given URL across multiple networks.
• Modular design. Easily add or disable plugins for different social networks.
Synt (pronounced: "cent") is a Python library for sentiment classification on social text.
The end-goal is to have a simple library that "just works". It should have an easy barrier to entry and be thoroughly documented.
• Can collect negative/positive tweets from Twitter and store them in a local database (can also fetch a pre-existing samples database)
• Can train a classifier based on a samples database
• Can classify text and output a score between -1 and 1 (where -1 is negative, +1 is positive, and anything close to 0 can be considered neutral)
• Ability to collect, train, guess, and test (accuracy) from the CLI
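For storing or charting results, a score in this range can be mapped to a label. The following is a minimal sketch only: the function name and the 0.2 cutoff for "close to 0" are our assumptions and are not part of Synt.

```python
def label_for_score(score, neutral_band=0.2):
    """Map a sentiment score in [-1, 1] to 'negative', 'neutral' or 'positive'.

    neutral_band is a hypothetical cutoff: scores whose absolute value
    is at most this number are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and 1")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"
```

The width of the neutral band is a judgment call and would need to be tuned against human-labelled samples.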
Open Sentiment Analysis Engine
Open Sentiment Analysis Engine - Allows humans to flag sentiment for random Twitter samples. The data is saved for use with machine learning sentiment classification projects like Synt.
Skills: twitter, facebook, youtube, linux, json