Hi, this is Chris from Dr. Boncella’s class! Very nice article; your nifty trick made great progress toward your main goal. Nice use of logistic regression, thank you for sharing this!
Hi Chris, glad you liked it!
Hello, my name is Diana Martinez-Ponce. I am another student from Dr. Boncella's Data Mining class. Your explanation of the confusion matrix was great. I was always a bit confused by them, but I get it better now. I have heard about QGIS, so I liked reading about how you used it. I was wondering if you had any good resources for learning QGIS? Thanks,
Diana
Hi Diana, thanks for reaching out! I've used these tutorials and lessons for QGIS: https://www.qgistutorials.com/en/index.html There's also a great YouTube channel I like for some more advanced features: https://www.youtube.com/channel/UCABPfMswe_-Ywrj5pHiRUoA And if you can't figure something out, asking on GIS Stack Exchange is always a safe bet: https://gis.stackexchange.com/
Hi! My name is Nilam Dangi and I am a student in Dr. Boncella's Data Mining and Modelling class at Washburn University. I thoroughly enjoyed reading your article and found it really informative. After reading it, the Logistic Regression and Naive Bayes analysis that we are doing in class makes more sense. Thank you for the information!
Love to hear it, thanks Nilam!
Hello, I am Sangya Yogi, one of the students from Dr. Boncella's Data Mining and Modeling class at Washburn University. I really enjoyed your article. It was thorough and informative. The use of QGIS in the post was quite interesting to me, as it definitely makes the visualization more effective. I was also curious what other violations you would recommend adding to the model to improve its accuracy? Thank you!
Hi Sangya - if you look at the Excel file linked on this page, you can get more info about the types of Building Violations that NYC tracks: https://data.cityofnewyork.us/Housing-Development/DOB-Violations/3h2n-5cm9 C-Construction, UB-Unsafe Buildings, and Local Law 11/Facade violations could all be good options!
Hi! My name is Simon and I am a student in Dr. Boncella's Data Mining and Modeling class at Washburn University. I enjoyed reading your post and loved your simple, yet detailed, description of the process you followed. The initial steps taken to make the dataset ready for analysis using pandas reminded me of the Data Discovery and Management class I took last semester. Your work made me think about a possible research topic: the severity of the emergencies, where factors like the average age of buildings in the boroughs could be used to make the prediction. Also, what do you think about using Power BI to visualize your findings in a bubble map? I found your post really interesting and informative. Thank you!
Hi Simon, so glad you liked the piece. That sounds like an awesome research idea and next step. I've never actually used Power BI before, but it sounds like it could be a cool way to pull in and visualize data from a bunch of different sources: maybe NYC municipal data, 311 complaints, tax records, etc. If you end up exploring the severity of emergency/building age idea, I'd love to hear about it!
Hello, my name is Kristen H. I am enrolled in Dr. Boncella's Data Mining Course at Washburn University. I enjoyed your post! The layout of your investigation was very thorough and sequential, which lends itself to reproducibility of results. The post was also humanizing in that you focused on re-evaluating the logistic and Naive Bayes models. Testing models and variable combinations is one of the... time-consuming joys of modeling. I was curious if you considered topographic data such as elevation, flood zone, or other NOAA data for weather in the analysis? That may require significant effort, but it may add a dimension to the analysis that other data does not.
Hi Kristen, thanks for reaching out! It's funny - I considered seeing if I could pull in topographic data in some way because my *hunch* is that there were a lot of imminent emergencies reported in seaside/riverside neighborhoods during and immediately after Hurricane Sandy in 2012. Factoring in NOAA or particularly flood zone data could be a great way of testing that hypothesis!
Hello, I'm Peyton Wilson and I'm enrolled in Dr. Boncella's Data Mining and Modeling class at Washburn University. I really enjoyed reading through your process in this article. I am curious about how the addition of more data (census data, income data, etc.) would help or hurt the model. Do you believe there is a point where the model could have too much data? If so, I would like to hear your opinion on what you believe to be the optimal amount of data for this model (if you were to build on it).
Hi Peyton - ooh that's a good question and a tough one! I think at some point, you could add data that is not directly related to the core of the model - i.e. when and where buildings in NYC start to break down. For instance, you could add in data, like income or commercial status, which may be correlated with lots of emergencies but not causative. But I'm not sure! I think you'd really have to start digging in the data and see what you find :)
I'm Brandon M., a student in Dr. Boncella's Data Mining and Modeling class at Washburn University. I found this article incredibly insightful about how to use pandas to create machine learning algorithms! I want to learn how to use Python to create useful insights from data. Would you mind sending a copy of the code you wrote to my email, brandon.michael@washburn.edu? I would love to check it out and see if there are any insights to gain that I could use for my future in data science. Thank you!
Hi Brandon, yeah of course! Give me a day or two to add comments to the important parts of the code, tidy things up, and generally make it look presentable lol.