
Posted by: 小丸工具箱 | Published: 2022-08-07

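Reprint notice

This article is original content from 灯塔大数据. Individuals are welcome to share it on their WeChat Moments; other organizations that reprint it should include the following note at the top of the article:

“Reprinted from: 灯塔大数据; WeChat: DTbigdata”

Python has attracted a great deal of attention in the data science community in recent years, so here we list the top Python libraries for data scientists and engineers.

Because all of the libraries mentioned here are open source, we also note each library's commit count, contributor count, and other metrics, which serve as a rough indication of how popular each library is.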

1. NumPy

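(Commits: 15980; Contributors: 522)

When first working with Python for scientific tasks, we inevitably turn to Python's SciPy Stack, a collection of software designed specifically for scientific computing in Python, so any discussion of Python libraries has to start with it. The SciPy Stack is very broad, however: it includes more than a dozen libraries, and what we need to do is pick out the most important packages among them.

NumPy (short for Numerical Python) is the most fundamental package on which the scientific computation stack is built. It is feature-rich and covers the operations needed on n-dimensional arrays and matrices in Python. The library provides vectorization of mathematical operations on the NumPy array type, which improves performance and speeds up execution.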
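
As a rough illustration of what vectorization means in practice, the sketch below applies one arithmetic expression to a whole array and compares it with an explicit Python loop. The sample data and variable names are made up for this example and do not come from the original article.

```python
import numpy as np

# Hypothetical sample data: temperature readings in Celsius.
celsius = np.array([0.0, 25.0, 37.5, 100.0])

# Vectorized arithmetic: the expression is applied to every element,
# with the loop running inside NumPy's compiled code rather than in Python.
fahrenheit = celsius * 9.0 / 5.0 + 32.0

# The equivalent explicit Python loop, shown only for comparison.
fahrenheit_loop = np.array([c * 9.0 / 5.0 + 32.0 for c in celsius])

# A 2-D array (matrix) and a matrix-vector product.
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
vector = np.array([1.0, 1.0])
product = matrix @ vector

print(fahrenheit)
print(product)
```

On arrays this small the speed difference is negligible; the benefit of the vectorized form shows up as the arrays grow.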

2. SciPy

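(Commits: 17213; Contributors: 489)

SciPy is a library of software for engineering and science. You also need to understand the difference between the SciPy Stack and the SciPy library. SciPy contains modules for linear algebra, optimization, integration, and statistics. The main functionality of the SciPy library is built on NumPy, so it makes heavy use of NumPy arrays. Through its specific submodules it provides efficient numerical routines such as numerical integration and optimization. The functions in all SciPy submodules are well documented, which is another of its strengths.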
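
To make the submodule structure concrete, here is a minimal sketch that uses `scipy.integrate` for numerical integration and `scipy.optimize` for minimization. The function `parabola` is a hypothetical example chosen for illustration, not something taken from the original article.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: integrate sin(x) from 0 to pi.
# quad returns the estimate and an upper bound on the absolute error.
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)


def parabola(x):
    """Hypothetical objective with its minimum value 1.0 at x = 2.0."""
    return (x - 2.0) ** 2 + 1.0


# Scalar optimization: find the x that minimizes the objective.
result = optimize.minimize_scalar(parabola)

print(area)
print(result.x, result.fun)
```

Both routines take and return plain Python floats or NumPy arrays, which is what building on NumPy amounts to in everyday use.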


3. Pandas

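(Commits: 15089; Contributors: 762)

Pandas is a Python package that makes working with “labeled” and “relational” data simple and intuitive. Pandas is an ideal tool for data wrangling. It lets users carry out data manipulation, aggregation, and visualization quickly and easily.

The Pandas library has two main data structures:

“Series”: one-dimensional

“DataFrame”: two-dimensional

For example, appending a single row to a DataFrame by passing a Series gives you a new DataFrame built from these two data structures.

With Pandas you can do the following:

Easily delete columns from or add columns to a DataFrame

Convert data structures to DataFrame objects

Handle missing data, represented as NaNs

Powerful group-by functionality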
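
The operations listed above can be sketched roughly as follows. The data, column names, and variable names are invented for illustration. The sketch also assumes a recent pandas release, where a row held in a Series is added with `pd.concat` (the older `DataFrame.append` method was removed in pandas 2.0).

```python
import numpy as np
import pandas as pd

# Hypothetical data: convert a dict of lists into a DataFrame object.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo"],
    "temp": [3.0, np.nan, 5.0],  # NaN marks a missing reading
})

# Append a single row passed as a Series; the result is a new DataFrame.
new_row = pd.Series({"city": "Lima", "temp": 21.0})
df = pd.concat([df, new_row.to_frame().T], ignore_index=True)
df = df.astype({"temp": "float64"})  # restore a numeric dtype after concat

# Add a column, then delete it again.
df["temp_f"] = df["temp"] * 9.0 / 5.0 + 32.0
df = df.drop(columns=["temp_f"])

# Handle missing data: count the NaNs, then fill them with the column mean.
missing_count = int(df["temp"].isna().sum())
filled = df["temp"].fillna(df["temp"].mean())

# Group by city and average; NaN values are skipped by default.
means = df.groupby("city")["temp"].mean()

print(df)
print(means)
```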

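4. Matplotlib

(Commits: 21754; Contributors: 588)

Matplotlib is another core package of the SciPy Stack and another Python library, one that makes it easy to produce simple yet powerful visualizations. This top-notch package lets Python (with some help from NumPy, SciPy, and Pandas) compete with scientific tools such as MATLAB or Mathematica.

However, the library is fairly low-level, which means you need to write more code to reach advanced visualizations and will generally put in more effort than with higher-level tools, but overall it is still worth a try.

You can use it to produce all kinds of visualizations:

Line plots;

Scatter plots;

Bar charts and histograms;

Pie charts;

Stem plots;

Contour plots;

Quiver plots;

Spectrograms

You can also use Matplotlib to create labels, grids, legends, and many other formatting elements. Basically, everything is customizable.

The library is supported on many platforms and uses different graphical user interface (GUI) toolkits to render the resulting visualizations. Many IDEs (such as IPython) support Matplotlib's functionality.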
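
As a small sketch of a few of these plot types and formatting elements, the example below draws a line plot, a scatter plot, and a histogram with labels, a grid, and a legend. The data is synthetic, and the non-interactive `Agg` backend is an assumption made here so that the code runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen, no GUI toolkit needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 50)

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot and scatter plot on the same axes, with labels, grid and legend.
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.scatter(x[::5], np.cos(x[::5]), color="tab:orange", label="cos(x) samples")
ax_line.set_xlabel("x")
ax_line.set_ylabel("value")
ax_line.grid(True)
ax_line.legend()

# Histogram of synthetic, normally distributed samples.
rng = np.random.default_rng(0)
ax_hist.hist(rng.normal(size=500), bins=20)
ax_hist.set_title("Histogram of random samples")

# Render to an in-memory PNG; pass a file path instead to write to disk.
fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

The amount of setup needed even for this simple figure reflects the low-level nature of the library described above.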

5. Seaborn

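(Commits: 1699; Contributors: 71)

Seaborn focuses mainly on the visualization of statistical models, such as heat maps, which summarize the data while still depicting its overall distribution. Seaborn is based on Matplotlib and depends heavily on it.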

6. Bokeh

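(Commits: 15724; Contributors: 223)

Bokeh is another powerful visualization library, aimed at interactive visualizations. Unlike the previous library, it is independent of Matplotlib. Bokeh's main focus is interactivity, so it presents its output through modern browsers in the style of Data-Driven Documents (d3.js).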

7. Plotly

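(Commits: 2486; Contributors: 33)

Plotly is a web-based toolbox for building visualizations that exposes APIs to several programming languages, Python among them. The plot.ly website offers a number of robust, out-of-the-box graphics. Before using Plotly, you need to set up your API key. The graphics are processed on the server side and then published on the internet, although you can choose not to publish them.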

Original English text

Top 15 Python Libraries for Data Science in 2017

As Python has gained a lot of traction in the data science industry in recent years, I wanted to outline some of its most useful libraries for data scientists and engineers, based on recent experience.

And, since all of the libraries are open source, we have added commit counts, contributor counts, and other metrics from GitHub, which can serve as proxy metrics for library popularity.

Core Libraries.

1. NumPy (Commits: 15980, Contributors: 522)

When starting to deal with scientific tasks in Python, one inevitably turns for help to Python’s SciPy Stack, which is a collection of software specifically designed for scientific computing in Python (not to be confused with the SciPy library, which is part of this stack, or with the community around the stack). So we want to start with a look at it. However, the stack is pretty vast: there are more than a dozen libraries in it, and we want to focus on the core packages (particularly the most essential ones).

The most fundamental package, around which the scientific computation stack is built, is NumPy (which stands for Numerical Python). It provides an abundance of useful features for operations on n-dimensional arrays and matrices in Python. The library provides vectorization of mathematical operations on the NumPy array type, which improves performance and accordingly speeds up execution.

2. SciPy (Commits: 17213, Contributors: 489)

SciPy is a library of software for engineering and science. Again, you need to understand the difference between the SciPy Stack and the SciPy library. SciPy contains modules for linear algebra, optimization, integration, and statistics. The main functionality of the SciPy library is built upon NumPy, and it thus makes substantial use of NumPy arrays. It provides efficient numerical routines, such as numerical integration, optimization, and many others, via its specific submodules. The functions in all submodules of SciPy are well documented, which is another point in its favor.

3. Pandas (Commits: 15089, Contributors: 762)

Pandas is a Python package designed to make working with “labeled” and “relational” data simple and intuitive. Pandas is a perfect tool for data wrangling. It is designed for quick and easy data manipulation, aggregation, and visualization.

There are two main data structures in the library:

“Series”: one-dimensional

“DataFrame”: two-dimensional

For example, you can get a new DataFrame from these two structures by appending a single row, passed as a Series, to an existing DataFrame.

Here is just a small list of things that you can do with Pandas:

Easily delete columns from and add columns to a DataFrame

Convert data structures to DataFrame objects

Handle missing data, represented as NaNs

Powerful group-by functionality

4. Matplotlib (Commits: 21754, Contributors: 588)

Matplotlib is another SciPy Stack core package and another Python library, tailored for generating simple and powerful visualizations with ease. It is a top-notch piece of software that makes Python (with some help from NumPy, SciPy, and Pandas) a serious competitor to scientific tools such as MATLAB or Mathematica.

However, the library is pretty low-level, meaning that you will need to write more code to reach advanced levels of visualization and will generally put in more effort than with higher-level tools, but the overall effort is worth it.

With a bit of effort you can make just about any visualization:

Line plots;

Scatter plots;

Bar charts and Histograms;

Pie charts;

Stem plots;

Contour plots;

Quiver plots;

Spectrograms

There are also facilities for creating labels, grids, legends, and many other formatting entities with Matplotlib. Basically, everything is customizable.

The library is supported on different platforms and makes use of different GUI toolkits to render the resulting visualizations. Various IDEs (like IPython) support Matplotlib's functionality.

There are also some additional libraries that can make visualization even easier.

5. Seaborn (Commits: 1699, Contributors: 71)

Seaborn is mostly focused on the visualization of statistical models; such visualizations include heat maps, which summarize the data but still depict the overall distributions. Seaborn is based on Matplotlib and highly dependent on it.

6. Bokeh (Commits: 15724, Contributors: 223)

Another great visualization library is Bokeh, which is aimed at interactive visualizations. In contrast to the previous library, this one is independent of Matplotlib. The main focus of Bokeh, as we already mentioned, is interactivity, and it presents its output via modern browsers in the style of Data-Driven Documents (d3.js).

7. Plotly (Commits: 2486, Contributors: 33)

Finally, a word about Plotly. It is a web-based toolbox for building visualizations that exposes APIs to several programming languages (Python among them). There are a number of robust, out-of-the-box graphics on the plot.ly website. In order to use Plotly, you will need to set up your API key. The graphics will be processed server side and posted on the internet, but there is a way to avoid that.

Translation: 灯塔大数据
