Technology Learning Snippets

1) Machine learning technology is here to stay.

2) Many of us often miss the key metrics and statistics that could help with decision making.

3) Computer technologies are growing fast… sometimes technology passes us by while we are busy with our daily lives. Below are recommendations for night reads and material to listen to while driving.

Reverse XML – my lessons learned, for millennials' sake

In April of 2001, I was deep into XML and had some innovative ideas of my own. I was working as a software engineer at Knowledgeview. The company was heavily focused on developing content syndication software for news agencies and newspapers. At that time Perl was still dominant for parsing text and Java was the popular language for websites. Knowledgeview was adamant about using standard specifications, including NewsML, NITF, RSS, and more. XML, first defined in 1998, was starting to become a topic of discussion within the office during the time I worked with them between 1999 and 2001. As a young programmer working mainly from the Lebanon office, I had less exposure to the technologies that the London office had at the time. But that did not stop me from trying to innovate. On April 23, 2001, I sent an email to the XML mail distro at Knowledgeview that said:

Date: Mon, 23 Apr 2001 11:39:03 +0100

Attached is a 4-page white paper about a concept that crossed my mind last week. The whole idea initially started after a brief conversation with Dr Ali about XML in which he mentioned that not all companies may integrate XML in their applications. Such a remark made me think of solutions that would keep one form of framework for such companies to exchange their data (which is based on customized and different structuring) without resorting to application modifications (expensive) but where standards (like any form of XML) still apply.

What I basically said in the document is that if the industry is heading towards XML as a standard form of communication across systems or applications, and if some companies may not be quick to jump onto XML, why not create an orchestration mechanism that lets a company A and a company B share data by bridging each of their own custom formats with the help of a middle player – a two-way translator. The steps would be as follows:

  1. Let each company declare its set of delimiters D for its content and publish the format on a common repository.
  2. Define a set of XSL rules that convert each set of delimiters D from (1) into a universal XML format X.
  3. Any time a company Y would like to leverage data from company X, company Y would query the common repository for company X's data specification and execute the set of rules in (2) to convert the text from company X into the format needed for company Y.

I named the technique Reverse XML.
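
To make the idea concrete for today's readers, here is a minimal Python sketch of the three steps, assuming a toy in-memory repository, made-up delimiter specs, and made-up element names (none of which come from the original white paper):

# Rough sketch of the "Reverse XML" flow described above.
# The repository, delimiter specs, and element names are hypothetical illustrations.
import xml.etree.ElementTree as ET

# Step 1: each company declares its delimiters and publishes the spec to a common repository.
repository = {
    "CompanyX": {"field_delimiter": "|", "fields": ["headline", "body", "date"]},
}

# Step 2: a generic rule that converts delimited content into a universal XML form.
def to_universal_xml(company, raw_text):
    spec = repository[company]
    values = raw_text.split(spec["field_delimiter"])
    root = ET.Element("item", attrib={"source": company})
    for name, value in zip(spec["fields"], values):
        ET.SubElement(root, name).text = value.strip()
    return ET.tostring(root, encoding="unicode")

# Step 3: Company Y queries the repository for Company X's spec and converts the shared text.
print(to_universal_xml("CompanyX", "Market update | Stocks rallied today. | 2001-04-23"))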

I did not hear back from anyone in the company about my idea. I was 27 at the time and still young in the industry. I did not push myself, nor did I know any better way to articulate my idea other than just emailing. A few months later I left the company, not because of this but because I decided to move to the United States and build a new future with my wife.

Why am I saying all this? I thought I had a great idea at the time. Given that I had limited resources, I really did not know whether there might have been a similar product out there. Maybe if I had known then what I know now, I could have been more aggressive in marketing my idea. I would also ask myself why I later sent a subsequent email saying that some competitor product existed when I had not really tried that product. Whatever the case, my idea was truly unique. Moreover, even if I did not hear from my management, I should have tried another way. Nevertheless, I was proud of my idea and the name that I gave it – Reverse XML. I tried to be imaginative and was thinking big. In my conclusion I wrote:

The application may be established as a free service where the following could be our revenue:

  1. Our database would include all companies’ source formats, where our content-representation language that should handle all of these formats may create a data bridge between all those companies whose applications are not XML-friendly yet.
  2. Having said point “1”, our database would also be valuable because we will be able to market-focus our products that may be of interest to these companies.

Note: “Reverse XML” may be free to use and our company may provide as a paid service the option to write the client’s “content key” files.

I was thinking of open sourcing the solution but providing a paid service for assisting companies.

Who knows, maybe it would have been a great business opportunity or a great success story. Maybe this idea might have turned out big, just like the JSON format nowadays. What if I had patented the idea or made something more of it? There is no shortage of one’s tendency to dream and think of great accomplishments. Why not? Unfortunately, I did not push for it and, at the same time, I could not convey its value in a more presentable fashion.

I later received a call from management, but eventually the idea was not really understood nor accepted. I still believe that, at the time, this idea could have had great potential. But that does not matter. What matters is not to give up on your ideas. Push for them. That said, if you are in your late twenties or early thirties – or any age, really – and you have a great idea, push for it with your heart and soul. It might win big, and if it does not, learn from your mistakes to do something even better the next time. Don’t just wait for someone to call you… be proactive and make the call – not once but many times.

Note: I have no intention of saying anything negative about my past employer. This post is only meant to illustrate the point that if you feel strongly about your idea, you should push harder and not just accept the status quo.

You can read my original document here.

Tarek and Computers

I believe that everyone should have a chance to build and grow in the computer world. Young or old, computers are beautiful and can always open windows to future opportunities, innovation, and personal accomplishments. Some of us might say “I don’t know where to start” or “no one asked me to do something” blah blah blah… We can all find millions of excuses not to start, or restart, in the magical computer world.

I tried to convey to my fellow Thomson Reuters colleagues in Hyderabad, India, what my passion for computers has been and which areas of the computer industry we need to stop fearing and start experimenting with. Everyone has a chance to do something innovative. We just need to start sometime, and that is NOW!

If you would like to discuss the topic with me, please contact me at t@tarek.computer or tarek.hoteit@tr.com

You can download my presentation here.

Thanks. Now let’s Code!

Python Getting Started

Getting started with Python

The best place to start with Python is the main Python site itself: https://www.python.org/about/gettingstarted

Sometimes playing around with Python libraries for different projects can mess things up, so it is best to create a separate virtual environment for each Python project. A quick and easy tool to use is virtualenv: https://virtualenv.pypa.io/en/latest
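
For example, setting up and using a virtual environment with virtualenv typically looks like this (the environment name is just a placeholder):

pip install virtualenv
virtualenv myproject-env              # create an isolated environment for one project
source myproject-env/bin/activate     # activate it (on Windows: myproject-env\Scripts\activate)
pip install requests                  # packages now install into this environment only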

A great tool for coding in Python is JetBrains PyCharm: http://www.jetbrains.com/pycharm

Restful API

What is a RESTful API? Check http://www.restapitutorial.com/lessons/whatisrest.html

Learning how to exercise REST APIs with SoapUI: https://www.soapui.org/rest-testing/getting-started.html

Leveraging RESTful APIs with Python:
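
As a minimal illustration (the endpoint URL and JSON fields below are placeholders, not a real service), Python's requests library makes calling a RESTful API straightforward:

# Minimal REST calls with the requests library; the endpoint and fields are made up.
import requests

response = requests.get("https://api.example.com/v1/articles", params={"q": "python"})
response.raise_for_status()              # raise an error on a non-2xx HTTP status
for article in response.json():          # assumes the endpoint returns a JSON list
    print(article["title"])

# Create a resource with a POST request carrying a JSON body
requests.post("https://api.example.com/v1/articles", json={"title": "Hello REST"})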

Build Websites using Python?

Django is the best place to be. Follow the Getting Started guide and you should be good to go.

Don’t forget to leverage a code repository and do proper testing!
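
To give a flavor of what the tutorial builds up to, a bare-bones Django view and URL route might look like the sketch below (the app name "hello" and the file layout are assumptions, and path() requires Django 2.0 or later):

# hello/views.py - a minimal view returning plain text
from django.http import HttpResponse

def index(request):
    return HttpResponse("Hello from Django!")

# urls.py - route the site root to the view above
from django.urls import path
from hello.views import index

urlpatterns = [
    path("", index, name="index"),
]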

Python for Data Science?

Invest time in learning Jupyter, which you can also use with other programming languages. A nice free course to take is https://www.edx.org/course/introduction-python-data-science-microsoft-dat208x-6 or check https://elitedatascience.com/learn-python-for-data-science
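
To get a feel for the kind of work you would do in a Jupyter notebook, here is a tiny pandas session (the CSV file and column names are made up):

# A small pandas session of the kind you would run in a Jupyter notebook.
import pandas as pd

df = pd.read_csv("sales.csv")                            # load a dataset
print(df.head())                                         # peek at the first rows
print(df["revenue"].describe())                          # quick summary statistics
df.groupby("region")["revenue"].sum().plot(kind="bar")   # simple aggregate plot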

Amazon Alexa Getting Started

Alexa development

Getting started with the Alexa Voice Service: https://developer.amazon.com/public/solutions/alexa/alexa-voice-service/getting-started-with-the-alexa-voice-service

Code to install the sample Alexa app https://github.com/alexa/alexa-avs-sample-app/wiki/Mac

Code to train Alexa with a new skill https://github.com/alexa/alexa-skills-kit-sdk-for-nodejs

Skills development

⁃ Read https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/understanding-the-different-types-of-skills
⁃ Develop a custom skill using AWS Lambda or a web service with HTTPS
⁃ A physical device is needed for full testing, but you can use the Service Simulator for basic testing

Custom Skill

A custom skill consists of:

⁃ A set of intents that represent actions users can perform with your skill
⁃ A set of sample utterances – these are mapped to the intents and together form the interaction model
⁃ An invocation name that identifies the skill and initiates the conversation
⁃ A cloud-based service that accepts the intents and is accessible via the internet; an endpoint needs to be provided for the skill
⁃ A configuration that ties all the information above together so Alexa can route the requests

Example: User: “Alexa, get high tide for Seattle from Tide Pooler.” Here “get high tide” comes from the sample utterances and the invocation name is “Tide Pooler”. Sample utterances include:

OneshotTideIntent get high tide
OneshotTideIntent get high tide for {City}
OneshotTideIntent tide information for {City}
OneshotTideIntent when is high tide in {City}
… (many more sample utterances)

To deploy the skill:

  • Create a Lambda Function for a Skill
  • Deploying a Sample Custom Skill to AWS Lambda
  • Hosting a Custom Skill as a Web Service
  • Deploying a Sample Custom Skill as a Web Service

Steps to Build a Custom Skill: https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/overviews/steps-to-build-a-custom-skill

Step 1: design the voice user interface
Step 2: set up the skill
Step 3: write and test the code
Step 4: submit the skill

Defining the Voice Interface https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/defining-the-voice-interface

Two main inputs:

  • Intent schema: a JSON structure defining the set of intents
  • Spoken input data: the sample utterances and the custom values needed for custom slots

Custom intent development: https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interaction-model-reference

Integrating with AWS Lambda https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/developing-an-alexa-skill-as-a-lambda-function
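
For illustration only, a bare-bones Lambda handler for a custom skill can also be written in plain Python by returning the Alexa response JSON directly; the sketch below reuses the OneshotTideIntent example from above, and everything else (texts, defaults) is made up rather than taken from the SDK linked here:

# Sketch of an AWS Lambda handler for a custom Alexa skill in plain Python.
# The intent and slot names mirror the Tide Pooler example; the rest is illustrative.
def lambda_handler(event, context):
    request = event["request"]
    if request["type"] == "LaunchRequest":
        text = "Welcome. Ask me for tide information."
    elif request["type"] == "IntentRequest" and request["intent"]["name"] == "OneshotTideIntent":
        city = request["intent"].get("slots", {}).get("City", {}).get("value", "Seattle")
        text = "Looking up high tide for " + city + "."
    else:
        text = "Sorry, I did not understand that."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }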

Developing using NodeJS https://github.com/alexa/alexa-skills-kit-sdk-for-nodejs

Retro: Atari & Commodore days 1981-1990

This is how my life with computers first started.

It all began in 1981 when I first saw my cousin in Beirut, Lebanon playing PacMan on the Atari 2600. The colors, the sounds, and the animation fascinated me.
I was six years old then, and that’s when my passion first started to take shape. We traveled to Nicosia, Cyprus, that year to avoid the start of the 1982 Lebanese civil war and Israeli invasion. My dad bought me the Atari 2600, which came with the game Combat. What a classic game, and one in which I picked up and mastered level 10! Level 10 basically puts two tanks in a grid to shoot at each other and allows bullets to bounce off walls. Hiding behind one “trench” and locking in the enemy was an art that made me win my battles every time. Ironically, the war game coincided with what was happening in Lebanon and, as a young boy, I felt emotionally stressed by the news, newspapers, and magazines that my late journalist dad brought home every day. Maybe the game made me feel like a soldier wanting to defend my country against the enemies.

Pac-Man and Combat were later followed by Defender. Oh, Defender! What a marvelous game that spanned multiple screens to the left and right.

By then my parents had noticed my passion for games. It only took one accidental visit to a computer club in the summer of 1984, held at the Cleopatra Hotel in Nicosia, Cyprus. The computer in that room rubber-stamped my passion for computers ever since: a white device with keys, connected to a color TV with a cyan screen and a flashing cursor. The all-mighty Commodore Vic20.

The most beautiful moment was when my dad bought me one. I vaguely remember whether the story was that the club was closing and they were selling off the computers. I can’t remember, but at least I got one: a beautiful white computer that became my best partner for a long time. My parents hired a tutor for my sister and me. Our tutor, whose first name was Chris (I can’t remember his last name), also took me and my dad to a computer store where we bought all sorts of educational games… Chemistry, Physics, Math, etc. I recall that my dad paid a lot for those but, for some reason, I was not interested in them. It’s not that I disliked anything related to computers, but educational games at 9 years of age simply had no meaning for me. I don’t know what my father was thinking (maybe a future strategy?) but I never opened them… barely once and that’s it. I had no games for my Vic20, nor was I able to easily get hold of BASIC programs from magazines that I could TYPE and RUN. Chris taught me the BASIC commands PRINT, INPUT, VAR, GOTO, and then I taught myself POKE and PEEK. But that was about it. I recall Chris showing me a Commodore 128 and how, when he typed “GO 64”, it took him to the Commodore 64 screen.

The Commodore 64… let’s talk about it. By 1984 I had come to know about other home computers – the Sinclair ZX81, the ZX Spectrum, the BBC Micro, and various other 8-bit machines. But the Commodore 64 was the most appealing to me from the computer magazines that I kept buying, especially Compute!’s Gazette and Commodore User.

I just could not get a Commodore 64 and, at the same time, I had run out of things to do with the Commodore Vic 20 and the Atari 2600. It’s not that I had perfected the machines – no, far from it. I was torn between the limitations that I quickly came to realize the Vic20 had and the newer computer with better graphics, better video, and better sound. Moreover, there were a couple of incidents that made me feel even more helpless, such as when my best friend and neighbor at the time, in 1984, showed me the text adventure game The Hobbit running on his Sinclair Spectrum.

After returning to Lebanon in 1985, I tried convincing my parents to get me a Commodore 64. It never worked. My frustration grew so much that I became obsessed with the Commodore 64 and built dreams around ultimately having one. The cost of the computer was $240 – a number that I can never forget. I think my parents could not see how a Commodore 64 was any different from a Vic-20 other than for playing games. They didn’t get it. I did. It was not about games, nor about how the computer looked. It was everything about the computer! The Computer! As an 11-year-old boy, the Commodore 64 was the only thing I ever wanted at that time. So what I did was start buying computer magazines with the pocket money my parents gave me, and I even began buying games for the Commodore 64 although I didn’t have the computer. The graphics on the tape covers and the vibrant colors in the magazines became my salvation rather than the computer itself. It then took another two years until I finally got the Commodore 64, with the help of my uncle who offered me the money to buy one.

After getting the Commodore 64, I found my passion in adventure games: Zork and The Hitchhiker’s Guide to the Galaxy. But the pleasure of finally getting the Commodore was short-lived and was replaced with a newfound passion for using computers with either a CP/M operating system, an MS-DOS 2.11 operating system, Microsoft Windows 3.0, or a Mac, in the period between 1990 and 1992. That will be the topic of the next story.

Recalling the PhD experience

I woke up at 5am this Saturday morning to do some research and work before the family woke up. While doing some digital cleanup of my old Evernote notes, I came across my PhD work at Walden University, which I completed in 2015. At the time, I created a Google Sites workspace, phd.hoteit.net, that I used to put together all my work, share progress with my supervisors, store drafts and to-do lists, and keep notes about the code I was working on. When I opened up the notes, everything about the PhD experience flashed back – the stress of writing with the thought of rejection by advisors, the obstacles faced when you think the data you need for your dissertation is within reach when in reality it is not, so you have to work harder to get to it, or the realization that your original intuition about the research prior to writing the dissertation did not align with the rational outcome of the research. But then you come to learn more about yourself, your thoughts, your discoveries, and even your own beliefs, regardless of what the research outcome ends up being. Developing and completing a dissertation is a personal feat that makes you a better person, an intellectual human being with a little more wisdom than you had before the journey. I strongly recommend the journey – not just for the sake of contributing to the scholarly world but for the sake of personal human development, now that we are racing not only against ourselves as a human race but also against our own artificial intelligence creations.

My dissertation: Effects of Investor Sentiment Using Social Media on Corporate Financial Distress

GitHub: hoteit/finSentiment

Financial Sentiment Analysis

As part of my PhD dissertation at Walden University, I developed an application that analyzed the sentiment of tweets containing the stock symbols of publicly held firms in the United States and correlated the results with the financial data of those firms during the period of the research. At the time of the coding, between the fourth quarter of 2014 and the first quarter of 2015, I could not find ready-made tools that I could use for conducting the data analysis. Some companies offered solutions at very high cost, while other tools had limited capabilities. So I went ahead and stitched together various tools and coding techniques for my own research. The key steps and scripting tools used were as follows:

  • Used the Twitter APIs and the Tweepy library in my Python code to extract the relevant tweets from Twitter on a streaming basis
  • Leveraged the Yahoo Developer Network from my Python code to extract the financial data of each publicly held firm in the United States
  • Extracted the stock symbols of all publicly held companies in the United States from nasdaq.com using IPython and the pandas library
  • Developed a portal using Python Django that helped me train the machine learning system to recognize negative and positive sentiments
  • Used the Stanford Core NLP Java modules, with the help of the trained data, to analyze the sentiment of all the tweets
  • Used IPython and pandas in a notebook format to conduct the data analysis

The source code is available in the finSentiment repository on GitHub. In subsequent posts I will explain what the scripts do, which parts are relevant and which are not, and how the scripts can be used in other projects.
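
To give a flavor of the streaming step, the sketch below shows the general shape of capturing tweets with the tweepy 3.x API; the credentials and tracked symbols are placeholders, and the actual implementation lives in the finSentiment repository:

# Rough sketch of streaming tweets that mention stock symbols (tweepy 3.x API).
# Credentials and tracked symbols are placeholders; see finSentiment for the real code.
import tweepy

class SymbolListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)    # in the real project, tweets were stored for later analysis

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
stream = tweepy.Stream(auth=auth, listener=SymbolListener())
stream.filter(track=["$AAPL", "$MSFT"])   # track example stock symbols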

Stanford Core NLP Sentiment Analysis

There is no shortage of natural language processing (NLP) algorithms on the Internet. Open source software for accessing NLP libraries is also available in nearly every major programming language. A few of the major NLP frameworks are:

Stanford Core NLP

Stanford Core NLP includes a set of libraries for natural language analysis. This includes a part-of-speech (POS) tagger, a named entity recognizer (NER), a statistical parser, a coreference resolution system, a sentiment analyzer, a pattern-based extraction system, and other tools. Stanford Core NLP is licensed under the GNU General Public License. The Stanford NLP Group offers a number of different software packages that you can check out at Stanford Core NLP Software.

I personally used Stanford Core NLP sentiment analysis in my dissertation. The original code runs in Java and requires a training dataset. You could use the Stanford Sentiment Treebank, or you could let the framework train on a treebank that you develop yourself as the training dataset for your specific research domain. To see the Stanford sentiment analyzer in action, check the Stanford Core NLP Live Demo. Source code instructions on how to retrain the model using your own data are available with the Stanford Core NLP Sentiment Analysis code. To run the already-trained Stanford model on movie reviews:

java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -file foo.txt    # foo.txt is some text file
java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin           # reads text from standard input

Training the Stanford CoreNLP sentiment analysis model requires a dataset in Penn Treebank (PTB) format:

java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz

Here ‘train.txt’ and ‘dev.txt’ would be the training and development subsets of your dataset, used for supervised training. The text must be in PTB format, such as:

(4 (4 (2 A) (4 (3 (3 warm) (2 ,)) (3 funny))) (3 (2 ,) (3 (4 (4 engaging) (2 film)) (2 .))))

where the numbers represent the sentiment annotations for each word and phrase in the parse tree. The Stanford Core NLP Java class PTBTokenizer can help you tokenize the text.

Training the model with your dataset takes a good amount of time. I highly recommend not running the model training on a cloud VM instance; instead, run it on a local machine. Training produces model.ser.gz, which is then used to perform sentiment analysis on new, untrained text.


In my application, I ran the model training on a local machine and then used Python to execute the Java edu.stanford.nlp.sentiment.SentimentPipeline class via a command pipeline for each input that I had previously captured using a different Python algorithm and stored in my MySQL/Django database. I will not go into detail about the results of my implementation; I will leave that for another blog post.
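
The general shape of that command pipeline looks roughly like the sketch below (the classpath, memory flag, and sample text are illustrative; the actual code is in the finSentiment repository):

# Sketch of piping text through the Java SentimentPipeline from Python.
import subprocess

def analyze_sentiment(text):
    proc = subprocess.Popen(
        ["java", "-cp", "*", "-mx5g",
         "edu.stanford.nlp.sentiment.SentimentPipeline", "-stdin"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = proc.communicate(text.encode("utf-8"))
    return out.decode("utf-8").strip()    # roughly one sentiment label per sentence, e.g. "Positive"

print(analyze_sentiment("The stock rallied after strong earnings."))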

Stanford CoreNLP is really cool. You will gain a lot of insight into natural language processing by working with such a model.