Sifan Guo


Data Science/Analysis - Cryptography - Blockchain

Portfolio


About


Sifan has been a student for almost 18 years and has no work experience yet. When people ask him why he has not looked for a job, he often says he was not fully prepared. In fact, he is more capable than that answer suggests; he had simply misunderstood what a job requires. He now realizes that he can keep learning a great deal on the job.

Sifan earned a Bachelor's degree at Wuhan University and is pursuing a Master's degree at the UT Austin iSchool. He loves using Python to talk with data, and he can apply a range of statistical learning methods and machine learning algorithms to build models. He also enjoys exploring cyber security and blockchain technologies.

It's not easy to summarize Sifan's 23 years. Still, he would be super happy as long as you know that he's a good student and can also be a good employee. So feel free to contact him if you happen to need a data scientist or cyber security engineer; he will stay humble and keep learning, as always.

Contact Me



Google Analytics Customer Revenue Prediction


The Google Merchandise Store (GStore) sales prediction project aims to provide insight into the small fraction of customers who produce most of the online store's revenue.
Our team aimed to predict the revenue generated by each customer in the first quarter of 2019. Our training dataset included hundreds of features obtained by parsing the raw dataset.
We applied dimensionality reduction to reduce the number of levels in the categorical features and determined the best number of components. We implemented and compared a light gradient boosting model (LGBM) and a long short-term memory (LSTM) network, achieving a best validation root mean squared error (RMSE) of 0.389.
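As a rough illustration of how a per-customer RMSE like the one above can be computed, here is a minimal sketch. It is not the team's actual code: the function names are hypothetical, and it assumes the target is the natural log of one plus each customer's summed revenue, which the description above does not confirm.

```python
import math
from collections import defaultdict


def customer_log_revenue(transactions):
    """Sum revenue per customer, then apply log(1 + total).

    `transactions` is an iterable of (customer_id, revenue) pairs.
    The log1p target is an assumption made for this sketch.
    """
    totals = defaultdict(float)
    for customer_id, revenue in transactions:
        totals[customer_id] += revenue
    return {cid: math.log1p(total) for cid, total in totals.items()}


def rmse(actual, predicted):
    """Root mean squared error over the customers present in `actual`."""
    squared_errors = [(actual[cid] - predicted[cid]) ** 2 for cid in actual]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Calling `rmse(customer_log_revenue(validation_rows), model_predictions)` would then give a single validation score to compare the two models on, where `validation_rows` and `model_predictions` are placeholders for the held-out data and a model's output.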


Transcripts Searchable Webpage Based on Flask


This West Wing transcript website gives West Wing fans an easy way to search the show's transcripts. Users can select the desired filter options on the left side of the page and then search the transcripts.
Main steps:
1. Scraper: Save webpages from http://www.westwingtranscripts.com/
2. Parser: Output a list of scene objects from webpages.
3. Save to Database: Save the scene object to the transcript database.
4. Create User Interface: Design the interaction with Flask.
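
The save and search steps above can be sketched roughly as follows. This is a simplified illustration, not the site's real code: it uses Python's built-in `sqlite3` module as a stand-in for the transcript database, and the `Scene` fields, table layout, and function names are all assumptions.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Scene:
    # Hypothetical fields; the real scene objects may hold more detail.
    season: int
    episode: int
    speaker: str
    line: str


def save_scenes(conn, scenes):
    """Step 3: store parsed scene objects in the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scenes "
        "(season INTEGER, episode INTEGER, speaker TEXT, line TEXT)"
    )
    conn.executemany(
        "INSERT INTO scenes VALUES (?, ?, ?, ?)",
        [(s.season, s.episode, s.speaker, s.line) for s in scenes],
    )
    conn.commit()


def search_scenes(conn, keyword, season=None, speaker=None):
    """Step 4 backend: keyword search with optional filters."""
    query = "SELECT season, episode, speaker, line FROM scenes WHERE line LIKE ?"
    params = [f"%{keyword}%"]
    if season is not None:
        query += " AND season = ?"
        params.append(season)
    if speaker is not None:
        query += " AND speaker = ?"
        params.append(speaker)
    query += " ORDER BY season, episode"
    return [Scene(*row) for row in conn.execute(query, params)]
```

In a Flask app, a view function would read the filter options from the request, pass them to something like `search_scenes`, and render the results. Parameterized queries (`?` placeholders) keep user input out of the SQL string.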


Prediction System Based on MIMIC-III Clinical Database


This was my first time working with a 25+ GB dataset stored in CSV files. I was responsible for database retrieval, data cleaning, and joining the CSV files on shared IDs. For the prediction part, we relied on SPSS and chose multiple linear regression for its interpretability.
The prediction results were reasonable, and the test RME was acceptable. Afterwards, I designed a user-friendly interactive interface using C++ in Qt 5.7.
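
To give a feel for joining large CSV files on a shared ID, here is a minimal sketch using only Python's standard library. It is an illustration rather than the project's actual pipeline: the function name is hypothetical, the column names are assumed for the example, and the sample rows are made up.

```python
import csv
import io


def join_on_id(small_file, large_file, key="SUBJECT_ID"):
    """Join two CSV files on `key`, streaming the larger one.

    The smaller table is indexed in memory; rows of the larger table are
    read one at a time, so the big file never has to fit in RAM.
    """
    index = {row[key]: row for row in csv.DictReader(small_file)}
    for row in csv.DictReader(large_file):
        match = index.get(row[key])
        if match is None:
            continue  # skip rows with no matching ID (inner join)
        merged = dict(match)
        merged.update(row)
        yield merged


# Made-up sample data standing in for two of the CSV files.
patients = io.StringIO("SUBJECT_ID,GENDER\n1,F\n2,M\n")
admissions = io.StringIO(
    "SUBJECT_ID,HADM_ID,ADMISSION_TYPE\n1,100,EMERGENCY\n3,101,ELECTIVE\n"
)
rows = list(join_on_id(patients, admissions))
```

Because `join_on_id` is a generator, the joined rows can be cleaned and written out incrementally instead of being collected into one large list as in this small demo.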


Random Number Generator


In this project, I created a random number generator based on the running status. Notably, I did not use an existing Python package for the MD5 hash function; I wrote my own MD5 hash function instead.
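
For readers curious what a hand-written MD5 looks like, below is a compact sketch that follows the algorithm in RFC 1321. It is not the project's original code. The `random_uint32` helper is a hypothetical example of hashing changing runtime state; treating a timer, the process ID, and a call counter as the "running status" is an assumption.

```python
import math
import os
import struct
import time

# Per-step left-rotation amounts, four rounds of 16 steps (RFC 1321).
SHIFTS = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4
          + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)
# Constants: floor(2^32 * |sin(i + 1)|) for i = 0..63.
CONSTANTS = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
MASK = 0xFFFFFFFF


def _rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def md5_digest(message: bytes) -> bytes:
    """Return the 16-byte MD5 digest of `message`."""
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    # Pad: a 0x80 byte, zeros up to 56 mod 64, then the bit length (64-bit LE).
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack("<Q", (len(message) * 8) & 0xFFFFFFFFFFFFFFFF)

    for offset in range(0, len(padded), 64):
        words = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + CONSTANTS[i] + words[g]) & MASK
            a, d, c, b = d, c, b, (b + _rotl(f, SHIFTS[i])) & MASK
        a0, b0 = (a0 + a) & MASK, (b0 + b) & MASK
        c0, d0 = (c0 + c) & MASK, (d0 + d) & MASK
    return struct.pack("<4I", a0, b0, c0, d0)


_calls = 0


def random_uint32():
    """Hypothetical generator: hash changing runtime state, keep 32 bits."""
    global _calls
    _calls += 1
    state = f"{time.perf_counter_ns()}:{os.getpid()}:{_calls}".encode()
    return int.from_bytes(md5_digest(state)[:4], "big")
```

A quick sanity check is to compare `md5_digest(b"abc").hex()` against the RFC 1321 test vector `900150983cd24fb0d6963f7d28e17f72`.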


VizWiz Label "answerable" Prediction


This is a binary classification problem. Initially, I tried to extract image features on my own by downloading all the images and converting them into matrices. However, I lacked the computational power, so I turned to the Microsoft Computer Vision API, which processes the image data and returns labels. I then used these labels as features to predict whether the question for an image is "answerable", and reached an accuracy of 0.73.
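
The idea of using returned labels as features can be sketched in plain Python. This is only an illustration under assumptions: the tags and training examples are made up, the API call itself is omitted, and logistic regression is a stand-in because the description above does not say which classifier was used.

```python
import math


def build_vocab(tag_lists):
    """Map every distinct tag to a column index."""
    tags = sorted({tag for tag_list in tag_lists for tag in tag_list})
    return {tag: i for i, tag in enumerate(tags)}


def to_multi_hot(tags, vocab):
    """Encode a list of tags as a 0/1 vector; unknown tags are ignored."""
    vector = [0.0] * len(vocab)
    for tag in tags:
        if tag in vocab:
            vector[vocab[tag]] = 1.0
    return vector


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression with batch gradient descent."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) - yi
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(tags, vocab, weights, bias):
    """Return 1 ("answerable") or 0 for one image's tags."""
    x = to_multi_hot(tags, vocab)
    return int(sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= 0.5)


# Made-up tags standing in for the labels returned for each image.
train_tags = [["text", "bottle"], ["text", "label"], ["bottle", "label", "text"],
              ["blurry", "dark"], ["blurry", "floor"], ["dark", "floor"]]
train_labels = [1, 1, 1, 0, 0, 0]

vocab = build_vocab(train_tags)
X = [to_multi_hot(tags, vocab) for tags in train_tags]
weights, bias = train_logistic(X, train_labels)
train_predictions = [predict(tags, vocab, weights, bias) for tags in train_tags]
```

On real data, the accuracy would be measured on held-out images rather than on the training examples used in this toy demo.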


muttlovedogs dog database


Click me to see the muttlovedogs

In this project, I was responsible for the MySQL and PHP connection. I didn't work on the UI, but it looks pretty good.


Lending Club Loan Status Prediction


Ongoing project
