Column Analysis

Get a compact, numerical summary of all analyses carried out. Visualizations and key figures give a quick overview of the main results and developments, ideal for evaluation at a glance.

Constant Object
Column: epitope_type
Number of Values: 946669
Number of Unique: 1
Number of Missing: 0 (0.0%)
Value Count Frequency
Linear peptide 946669 1.0
Statistic Value
Unique Categories: 1
Mode: Linear peptide
Entropy: -0.0
Gini: 0.0
Simpson Diversity: 1.0
Min Category Length: 14
Max Category Length: 14
Memory: 67213499
Cardinality Ratio: 0.0
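
The Entropy value of -0.0 reported for this constant column is most likely a floating-point artifact rather than a genuinely negative entropy. A minimal sketch, assuming the report computes Shannon entropy as the negated sum of p * log2(p) (the formula and the function name `shannon_entropy` are assumptions, not taken from the report):

```python
import math

def shannon_entropy(counts):
    """Shannon entropy in bits for a list of category counts (assumed formula)."""
    total = sum(counts)
    # Negating a sum that is exactly 0.0 yields IEEE 754 negative zero.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# epitope_type has a single category covering all 946669 rows.
h = shannon_entropy([946669])
print(h)         # -0.0
print(h == 0.0)  # True: negative zero compares equal to zero
```

For reporting purposes the value can be read as 0.0.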
Sequence
Column: peptide
Number of Values: 946669
Number of Unique: 357108
Number of Missing: 0 (0.0%)
Sequence type: protein
Top 20 Protein Sequences Analysis
Rank Sequence Count Length (AA) Frequency Molecular Weight (Da) Isoelectric Point (pI)
1 GILGFVFTL 114 9 (Gly 22.22 %, Ile 11.11 %, Leu 22.22 %, Phe 22.22 %, Thr 11.11 %, Val 11.11 %) 966.2 6.10
2 HLPETKFSEL 105 10 (Glu 20.00 %, His 10.00 %, Leu 20.00 %, Lys 10.00 %, Phe 10.00 %, Pro 10.00 %, Ser 10.00 %, Thr 10.00 %) 1200.4 5.49
3 NAPWAVTSL 102 9 (Ala 22.22 %, Asn 11.11 %, Leu 11.11 %, Pro 11.11 %, Ser 11.11 %, Thr 11.11 %, Trp 11.11 %, Val 11.11 %) 958.1 6.10
4 IAPTGHSL 89 8 (Ala 12.50 %, Gly 12.50 %, His 12.50 %, Ile 12.50 %, Leu 12.50 %, Pro 12.50 %, Ser 12.50 %, Thr 12.50 %) 794.9 7.55
5 FLPSDFFPSV 86 10 (Asp 10.00 %, Leu 10.00 %, Phe 30.00 %, Pro 20.00 %, Ser 20.00 %, Val 10.00 %) 1155.3 3.75
6 IIHEIAVLEL 81 10 (Ala 10.00 %, Glu 20.00 %, His 10.00 %, Ile 30.00 %, Leu 20.00 %, Val 10.00 %) 1149.4 4.25
7 SRYWAIRTR 72 9 (Ala 11.11 %, Arg 33.33 %, Ile 11.11 %, Ser 11.11 %, Thr 11.11 %, Trp 11.11 %, Tyr 11.11 %) 1208.4 12.20
8 SIPGGYNAL 71 9 (Ala 11.11 %, Asn 11.11 %, Gly 22.22 %, Ile 11.11 %, Leu 11.11 %, Pro 11.11 %, Ser 11.11 %, Tyr 11.11 %) 891.0 6.09
9 SIPEKNRPL 68 9 (Arg 11.11 %, Asn 11.11 %, Glu 11.11 %, Ile 11.11 %, Leu 11.11 %, Lys 11.11 %, Pro 22.22 %, Ser 11.11 %) 1053.2 9.70
10 VNLPINGNGKQ 67 11 (Asn 27.27 %, Gln 9.09 %, Gly 18.18 %, Ile 9.09 %, Leu 9.09 %, Lys 9.09 %, Pro 9.09 %, Val 9.09 %) 1153.3 9.70
11 YLPDGTASL 65 9 (Ala 11.11 %, Asp 11.11 %, Gly 11.11 %, Leu 22.22 %, Pro 11.11 %, Ser 11.11 %, Thr 11.11 %, Tyr 11.11 %) 936.0 3.75
12 SVPEAQSAL 65 9 (Ala 22.22 %, Gln 11.11 %, Glu 11.11 %, Leu 11.11 %, Pro 11.11 %, Ser 22.22 %, Val 11.11 %) 901.0 3.85
13 TAMDVVYAL 64 9 (Ala 22.22 %, Asp 11.11 %, Leu 11.11 %, Met 11.11 %, Thr 11.11 %, Tyr 11.11 %, Val 22.22 %) 982.2 3.75
14 NVKVDPEIQ 64 9 (Asn 11.11 %, Asp 11.11 %, Gln 11.11 %, Glu 11.11 %, Ile 11.11 %, Lys 11.11 %, Pro 11.11 %, Val 22.22 %) 1041.2 4.18
15 TVTEKPEVID 64 10 (Asp 10.00 %, Glu 20.00 %, Ile 10.00 %, Lys 10.00 %, Pro 10.00 %, Thr 20.00 %, Val 20.00 %) 1130.3 3.93
16 FAYDGKDYI 63 9 (Ala 11.11 %, Asp 22.22 %, Gly 11.11 %, Ile 11.11 %, Lys 11.11 %, Phe 11.11 %, Tyr 22.22 %) 1091.2 4.11
17 VVPEPGQPL 63 9 (Gln 11.11 %, Glu 11.11 %, Gly 11.11 %, Leu 11.11 %, Pro 33.33 %, Val 22.22 %) 935.1 3.85
18 SAPVGVTAL 63 9 (Ala 22.22 %, Gly 11.11 %, Leu 11.11 %, Pro 11.11 %, Ser 11.11 %, Thr 11.11 %, Val 22.22 %) 813.9 6.10
19 AGFAGDDAPR 63 10 (Ala 30.00 %, Arg 10.00 %, Asp 20.00 %, Gly 20.00 %, Phe 10.00 %, Pro 10.00 %) 976.0 4.11
20 YFAERVTSL 62 9 (Ala 11.11 %, Arg 11.11 %, Glu 11.11 %, Leu 11.11 %, Phe 11.11 %, Ser 11.11 %, Thr 11.11 %, Tyr 11.11 %, Val 11.11 %) 1085.2 6.40
Length Statistics
Min Length: 8 AA
Mean Length: 9.3 AA
Max Length: 11 AA
Molecular Weight Statistics
Min MW: 794.9 Da
Mean MW: 1021.1 Da
Max MW: 1208.4 Da
Isoelectric Point Statistics
Min pI: 3.75
Mean pI: 5.75
Max pI: 12.20
Hydrophobicity Statistics
Min Hydrophobicity: -1.233
Mean Hydrophobicity: 0.148
Max Hydrophobicity: 2.267
Charge Statistics
Min Charge: -2.00
Mean Charge: -0.39
Max Charge: 3.00
Amino Acid Composition (Aggregated from Top 20)
Amino Acid Three-Letter Count Percentage
L Leu 19 10.22%
P Pro 18 9.68%
A Ala 18 9.68%
V Val 17 9.14%
G Gly 13 6.99%
S Ser 13 6.99%
I Ile 12 6.45%
T Thr 11 5.91%
E Glu 11 5.91%
F Phe 9 4.84%
D Asp 9 4.84%
N Asn 7 3.76%
Y Tyr 7 3.76%
K Lys 6 3.23%
R Arg 6 3.23%
Q Gln 4 2.15%
H His 3 1.61%
W Trp 2 1.08%
M Met 1 0.54%
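
The aggregated composition can be reproduced by pooling the residues of the 20 listed peptides, counting each peptide once regardless of its Count value. A short sketch (the names `TOP20`, `counts` and `total` are illustrative and do not come from the report):

```python
from collections import Counter

# The 20 most frequent peptides, copied from the Top 20 table.
TOP20 = [
    "GILGFVFTL", "HLPETKFSEL", "NAPWAVTSL", "IAPTGHSL", "FLPSDFFPSV",
    "IIHEIAVLEL", "SRYWAIRTR", "SIPGGYNAL", "SIPEKNRPL", "VNLPINGNGKQ",
    "YLPDGTASL", "SVPEAQSAL", "TAMDVVYAL", "NVKVDPEIQ", "TVTEKPEVID",
    "FAYDGKDYI", "VVPEPGQPL", "SAPVGVTAL", "AGFAGDDAPR", "YFAERVTSL",
]

# Pool all residues; every peptide contributes once (unweighted).
counts = Counter("".join(TOP20))
total = sum(counts.values())

print(total)                                  # 186 residues in total
print(counts["L"])                            # 19
print(round(100 * counts["L"] / total, 2))    # 10.22
```

The total of 186 residues over 20 peptides also matches the reported mean length of 9.3 AA.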
Rank Sequence Length Hydrophobicity Charge Aliphatic Index Boman Index Aromaticity Instability
1 GILGFVFTL 9 2.267 -0.00 162.22 -2.67 0.222 20.86
2 HLPETKFSEL 10 -0.680 -0.91 78.00 1.70 0.100 64.90
3 NAPWAVTSL 9 0.456 -0.00 97.78 -0.26 0.111 0.96
4 IAPTGHSL 8 0.425 0.09 110.00 -0.24 0.000 22.21
5 FLPSDFFPSV 10 0.810 -1.00 68.00 -0.24 0.300 78.50
6 IIHEIAVLEL 10 1.690 -1.91 234.00 -1.22 0.000 40.60
7 SRYWAIRTR 9 -1.211 3.00 54.44 4.65 0.222 -6.31
8 SIPGGYNAL 9 0.233 -0.00 97.78 -0.37 0.111 9.97
9 SIPEKNRPL 9 -1.233 1.00 86.67 3.05 0.000 46.40
10 VNLPINGNGKQ 11 -0.709 1.00 97.27 1.39 0.000 5.36
11 YLPDGTASL 9 0.122 -1.00 97.78 0.25 0.111 12.48
12 SVPEAQSAL 9 0.156 -1.00 97.78 0.73 0.000 98.42
13 TAMDVVYAL 9 1.356 -1.00 130.00 -0.84 0.111 26.82
14 NVKVDPEIQ 9 -0.733 -1.00 107.78 2.25 0.000 20.27
15 TVTEKPEVID 10 -0.450 -2.00 97.00 2.00 0.000 29.61
16 FAYDGKDYI 9 -0.533 -1.00 54.44 1.40 0.333 25.77
17 VVPEPGQPL 9 0.000 -1.00 107.78 -0.18 0.000 92.40
18 SAPVGVTAL 9 1.367 -0.00 130.00 -1.29 0.000 32.82
19 AGFAGDDAPR 10 -0.570 -1.00 30.00 2.21 0.100 20.72
20 YFAERVTSL 9 0.200 -0.00 86.67 1.57 0.222 -0.54
Additional Property Statistics
Aliphatic Index from 30.00 to 234.00
Boman Index from -2.67 to 4.65
Aromaticity from 0.000 to 0.333
Instability Index from -6.31 to 98.42
Property Interpretation

Aliphatic Index: Relative volume of aliphatic side chains (Ala, Val, Ile, Leu); higher = more thermostable

Boman Index: Protein-protein interaction potential; higher = stronger predicted binding

Aromaticity: Fraction of aromatic amino acids (Phe, Trp, Tyr)

Instability Index: <40 = predicted stable, >40 = predicted unstable protein
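
Three of the tabulated properties can be recomputed with a few lines of code. The sketch below assumes the standard Ikai formula for the aliphatic index and the Kyte-Doolittle scale for hydrophobicity (GRAVY); the report does not name its formulas, but both choices reproduce the tabulated values for the peptides checked here. Function names are illustrative.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def aromaticity(seq):
    """Fraction of residues that are Phe, Trp or Tyr."""
    return sum(seq.count(aa) for aa in "FWY") / len(seq)

def aliphatic_index(seq):
    """Ikai aliphatic index from mole percentages of Ala, Val, Ile, Leu."""
    def pct(aa):
        return 100 * seq.count(aa) / len(seq)
    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))

def gravy(seq):
    """Mean Kyte-Doolittle hydropathy over all residues."""
    return sum(KD[aa] for aa in seq) / len(seq)

seq = "GILGFVFTL"
print(round(aromaticity(seq), 3))      # 0.222
print(round(aliphatic_index(seq), 2))  # 162.22
print(round(gravy(seq), 3))            # 2.267
```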

Most Common Peptide K-mers per Sequence

Rank Sequence: Peptide K-mer Count
1 GILGFVFTL: FTL 1, FVF 1, GFV 1, GIL 1, ILG 1
2 HLPETKFSEL: ETK 1, FSE 1, HLP 1, KFS 1, LPE 1
3 NAPWAVTSL: APW 1, AVT 1, NAP 1, PWA 1, TSL 1
4 IAPTGHSL: APT 1, GHS 1, HSL 1, IAP 1, PTG 1
5 FLPSDFFPSV: DFF 1, FFP 1, FLP 1, FPS 1, LPS 1
6 IIHEIAVLEL: AVL 1, EIA 1, HEI 1, IAV 1, IHE 1
7 SRYWAIRTR: AIR 1, IRT 1, RTR 1, RYW 1, SRY 1
8 SIPGGYNAL: GGY 1, GYN 1, IPG 1, NAL 1, PGG 1
9 SIPEKNRPL: EKN 1, IPE 1, KNR 1, NRP 1, PEK 1
10 VNLPINGNGKQ: GKQ 1, GNG 1, ING 1, LPI 1, NGK 1
11 YLPDGTASL: ASL 1, DGT 1, GTA 1, LPD 1, PDG 1
12 SVPEAQSAL: AQS 1, EAQ 1, PEA 1, QSA 1, SAL 1
13 TAMDVVYAL: AMD 1, DVV 1, MDV 1, TAM 1, VVY 1
14 NVKVDPEIQ: DPE 1, EIQ 1, KVD 1, NVK 1, PEI 1
15 TVTEKPEVID: EKP 1, EVI 1, KPE 1, PEV 1, TEK 1
16 FAYDGKDYI: AYD 1, DGK 1, DYI 1, FAY 1, GKD 1
17 VVPEPGQPL: EPG 1, GQP 1, PEP 1, PGQ 1, QPL 1
18 SAPVGVTAL: APV 1, GVT 1, PVG 1, SAP 1, TAL 1
19 AGFAGDDAPR: AGD 1, AGF 1, APR 1, DAP 1, DDA 1
20 YFAERVTSL: AER 1, ERV 1, FAE 1, RVT 1, TSL 1
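
The per-sequence k-mer lists appear to contain the five most frequent 3-mers of each peptide, with ties broken alphabetically. A minimal sketch under that assumption (k = 3 and the tie-breaking rule are inferred from the listed values; `top_kmers` is an illustrative name):

```python
from collections import Counter

def top_kmers(sequence, k=3, n=5):
    """Return the n most frequent k-mers, ties broken alphabetically."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

print(top_kmers("GILGFVFTL"))
# [('FTL', 1), ('FVF', 1), ('GFV', 1), ('GIL', 1), ('ILG', 1)]

# Pooling k-mers across peptides is more informative for short sequences.
pooled = Counter()
for seq in ("NAPWAVTSL", "YFAERVTSL"):
    pooled.update(seq[i:i + 3] for i in range(len(seq) - 2))
print(pooled["TSL"])  # 2
```

Because every peptide here is only 8 to 11 residues long, each 3-mer occurs once within its own sequence, so pooled counts across peptides reveal shared motifs that the per-sequence lists cannot.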
Missing Object
Column: Reference_Name
Number of Values: 740466
Number of Unique: 355539
Number of Missing: 206203 (21.78%)
Value Count Frequency
Eluted Peptide 850 100 0.0001350500900784101
Eluted Peptide 413 99 0.000133699589177626
Eluted Peptide 5918 83 0.00011209157476508037
Eluted Peptide 2636 78 0.00010533907026115986
NUP210 69 9.318456215410295e-05
Eluted Peptide 9062 68 9.183406125331885e-05
Eluted Peptide 7529 68 9.183406125331885e-05
Eluted Peptide 7040 68 9.183406125331885e-05
Eluted Peptide 1245 65 8.778255855096655e-05
HPVG analog 65 8.778255855096655e-05
Eluted Peptide 6060 65 8.778255855096655e-05
HSPA6 64 8.643205765018245e-05
Eluted Peptide 7012 63 8.508155674939835e-05
Eluted Peptide 4848 63 8.508155674939835e-05
Eluted Peptide 7363 62 8.373105584861426e-05
Eluted Peptide 3515 62 8.373105584861426e-05
Eluted Peptide 7282 58 7.832905224547784e-05
Eluted Peptide 1053 57 7.697855134469374e-05
Eluted Peptide 1517 56 7.562805044390964e-05
Eluted Peptide 14345 53 7.157654774155735e-05
Statistic Value
Unique Categories: 355539
Mode: Eluted Peptide 850
Entropy: 14.3
Gini: 1.0
Simpson Diversity: 268390.5
Min Category Length: 1
Max Category Length: 82
Memory: 58720857
Cardinality Ratio: 0.376
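
The missing percentage and the cardinality ratio for this column are both consistent with using the total row count (values plus missing) as the denominator. A small sketch of that arithmetic, with illustrative function names:

```python
def missing_percent(n_missing, n_rows):
    """Share of rows with no value, in percent."""
    return 100 * n_missing / n_rows

def cardinality_ratio(n_unique, n_rows):
    """Unique values divided by total rows (missing rows included)."""
    return n_unique / n_rows

n_values, n_missing, n_unique = 740466, 206203, 355539
n_rows = n_values + n_missing  # 946669 rows in the dataset

print(round(missing_percent(n_missing, n_rows), 2))   # 21.78
print(round(cardinality_ratio(n_unique, n_rows), 3))  # 0.376
```

Dividing by the 740466 non-missing values instead would give a ratio of about 0.48, which does not match the reported 0.376.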
Missing Number High Correlation
Column: start
Number of Values: 911982
Number of Unique: 5589
Number of Missing: 34687 (3.66%)
Number type: float64
Correlates with:
End: 0.9999995213213142
Value Count Frequency
2.0 2937 0.0032204582985190496
11.0 2491 0.0027314135586009375
58.0 2463 0.0027007111982473337
53.0 2428 0.0026623332478053294
76.0 2409 0.00264149950327967
41.0 2392 0.0026228587844935535
67.0 2392 0.0026228587844935535
34.0 2392 0.0026228587844935535
44.0 2360 0.002587770372660864
47.0 2355 0.0025822878083120063
74.0 2352 0.0025789982697026915
35.0 2350 0.0025768052439631484
7.0 2345 0.0025713226796142906
5.0 2344 0.002570226166744519
6.0 2336 0.0025614540637863468
69.0 2325 0.0025493924222188594
28.0 2325 0.0025493924222188594
56.0 2323 0.0025471993964793164
59.0 2322 0.002546102883609545
3.0 2308 0.0025307517034327434
Statistic Value
Minimum: 1.0
Maximum: 33966.0
Mean: 461.01
Median: 250.0
Mode: 2.0
Standard Deviation: 734.25
Sum: 420433542.0
Kurtosis: 180.38
Skewness: 8.03
Median Abs Deviation: 172.0
Coefficient of Variation: 1.59
Quantiles: 25 %: nan, 50 %: nan, 75 %: nan
Memory: 15146704
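
The quantiles are reported as nan only for numeric columns that contain missing values (start, end, quantitative), while the fully populated length column shows real quantiles. One plausible cause, which is an assumption and not confirmed by the report, is that the quantiles were computed without first dropping missing entries. A minimal standard-library sketch of the workaround, using hypothetical toy data:

```python
import math
import statistics

def quartiles_ignoring_nan(values):
    """Return [Q1, Q2, Q3] computed on the non-missing values only."""
    present = [v for v in values if not math.isnan(v)]
    # method="inclusive" treats the data as the full population.
    return statistics.quantiles(present, n=4, method="inclusive")

# Hypothetical toy column with one missing entry.
toy = [1.0, 2.0, float("nan"), 3.0, 4.0, 5.0]
print(quartiles_ignoring_nan(toy))  # [2.0, 3.0, 4.0]
```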
Missing Number High Correlation
Column: end
Number of Values: 911982
Number of Unique: 5562
Number of Missing: 34687 (3.66%)
Number type: float64
Correlates with:
Start: 0.9999995213213143
Value Count Frequency
11.0 2574 0.0028224241267919763
61.0 2511 0.0027533438159963682
86.0 2486 0.0027259309942520796
66.0 2449 0.002685360018070532
12.0 2433 0.0026678158121541872
49.0 2423 0.0026568506834564716
43.0 2401 0.0026327274003214974
15.0 2389 0.0026195692458842387
48.0 2372 0.0026009285270981227
19.0 2359 0.0025866738597910923
42.0 2357 0.0025844808340515493
68.0 2342 0.002568033141004976
13.0 2336 0.0025614540637863468
55.0 2303 0.0025252691390838856
71.0 2298 0.002519786574735028
47.0 2292 0.0025132074975163982
14.0 2287 0.0025077249331675404
29.0 2286 0.002506628420297769
85.0 2285 0.0025055319074279974
75.0 2284 0.002504435394558226
Statistic Value
Minimum: 8.0
Maximum: 33975.0
Mean: 469.31
Median: 258.0
Mode: 11.0
Standard Deviation: 734.25
Sum: 428004954.0
Kurtosis: 180.39
Skewness: 8.03
Median Abs Deviation: 172.0
Coefficient of Variation: 1.56
Quantiles: 25 %: nan, 50 %: nan, 75 %: nan
Memory: 15146704
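
The near-perfect correlation between start and end is expected if end is derived as start + length - 1, with peptide lengths of only 8 to 11 residues. That relationship is an assumption here, although the difference between the two means (469.31 - 461.01 = 8.30) is close to the mean length minus one. A sketch with hypothetical positions:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical start positions and peptide lengths (8 to 11 residues).
starts = [2, 11, 58, 53, 76]
lengths = [9, 10, 9, 8, 11]
ends = [s + n - 1 for s, n in zip(starts, lengths)]

r = pearson(starts, ends)
print(r > 0.999)  # True: length varies far less than position
```

With a correlation this high, one of the two columns is nearly redundant given the other plus the length column.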
Missing Object
Column: protein
Number of Values: 940112
Number of Unique: 60999
Number of Missing: 6557 (0.69%)
Value Count Frequency
unknown protein eluted from human MHC allele 26501 0.0281891944789557
Genome polyprotein 10214 0.010864662933778102
Replicase polyprotein 1ab 7380 0.007850128495328216
hypothetical protein 5752 0.00611841993294416
L protein 2973 0.003162389162142383
unnamed protein product 2824 0.0030038974079684123
nucleoprotein NP 2608 0.002774137549568562
involved in plaque and EEV formation 2435 0.0025901169222390524
polyprotein 2260 0.0024039688888132476
glycoprotein 2224 0.0023656755790799396
Hypothetical protein 2106 0.0022401586193985397
Nucleoprotein 1996 0.0021231512841023196
nucleocapsid protein 1789 0.0019029647531357966
Myosin-9 1450 0.0015423694198138094
unknown 1365 0.0014519546607212757
structural glycoprotein 1340 0.0014253620845175894
conserved hypothetical protein 1278 0.0013594124955324473
Protein A16 1244 0.0013232465918954336
Spike glycoprotein precursor 1217 0.0012945266095954525
IMV heparin binding surface protein 1143 0.0012158125840325408
Statistic Value
Unique Categories: 60999
Mode: unknown protein eluted from human MHC allele
Entropy: 13.54
Gini: 1.0
Simpson Diversity: 843.19
Min Category Length: 1
Max Category Length: 433
Memory: 85114587
Cardinality Ratio: 0.064
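
Several protein labels in the value counts differ only in letter case, for example "hypothetical protein" and "Hypothetical protein". The sketch below shows one possible way to merge such variants with case folding; whether merging is appropriate depends on the downstream analysis, and the dictionary holds only a few rows copied from the table.

```python
from collections import Counter

# A few rows copied from the value counts of the protein column.
protein_counts = {
    "hypothetical protein": 5752,
    "Hypothetical protein": 2106,
    "Nucleoprotein": 1996,
    "nucleoprotein NP": 2608,
}

# Merge labels that differ only by case or surrounding whitespace.
merged = Counter()
for label, count in protein_counts.items():
    merged[label.strip().casefold()] += count

print(merged["hypothetical protein"])  # 7858
print(merged["nucleoprotein"])         # 1996 (not merged with "nucleoprotein NP")
```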
Taxonomy invalid Missing Object
Column: source_organism
Number of Values: 940113
Number of Unique: 1290
Number of Missing: 6556 (0.69%)
Taxonomy: Invalid
Value Count Frequency
Homo sapiens 750132 0.7979168461663652
Vaccinia virus WR 32737 0.03482240964649994
unidentified 26501 0.0281891644940555
SARS coronavirus Tor2 8208 0.008730865332146242
Zaire ebolavirus 7278 0.007741622549629673
Mycobacterium tuberculosis 5908 0.00628435092377193
Giardia lamblia ATCC 50803 4382 0.004661141798911407
SARS-CoV1 4059 0.004317566079822319
Dengue virus 2 Jamaica/1409/1983 2063 0.0021944170541200896
Hepatitis B virus 2060 0.0021912259483700364
Vaccinia virus Copenhagen 2032 0.002161442294702871
Dengue virus 4 Philippines/H241/1956 1688 0.0017955288353634084
Escherichia coli O157:H7 1630 0.0017338341241957085
Dengue virus 1 Brazil/97-11/1997 1585 0.0016859675379449065
Influenza A virus 1543 0.0016412920574441583
Severe acute respiratory syndrome coronavirus 2 1475 0.0015689603271096134
Variola virus 1467 0.0015604507117761375
Sudan ebolavirus 1420 0.001510456721691967
Vibrio cholerae 1361 0.0014476983086075823
Dengue virus 3 Martinique/1243/1999 1360 0.0014466346066908979
Taxonomy
Adeno-associated virus - 8
Chlamydia trachomatis Serovar L2
Cryptococcus neoformans var. neoformans B-3501A
Dengue virus 2 S221
Dengue virus type 1 Hawaii
Frankia sp. EAN1pec
Giardia intestinalis ATCC 50581
Guanarito virus strain INH-95551
H1N1 subtype Influenza A/Oklahoma/7485/01
H1N1 swine influenza virus (A/swine/Korea/S10/2004(H1N1))
H9N2 subtype Influenza A virus (A/swine/Korea/S81/2004(H9N2))
H9N2 subtype Influenza A virus (A/swine/Korea/S83/2004(H9N2))
Hepatitis B virus subtype AYR
Hepatitis delta virus TW2667
Human gammaherpesvirus 4
Human herpesvirus 4 GD1
Human herpesvirus 4 WW1
Human herpesvirus 5 TB40
Human metapneumovirus
Human metapneumovirus A2 NL/00/17
Influenza A virus (A/X-31(H3N2)) A/X-31 X HK
Influenza A virus H3N2 (A/Resvir-9 (H3N2))
Junin virus strain MC2
Lymphocytic choriomeningitis virus (strain Armstrong) (clone 53b)
Machupo virus strain Carvallo
Monkeypox virus USA_2003_039
Pichinde virus strain Munchique
SARS-CoV1
SARS-CoV2 Alpha
SARS-CoV2 Beta
SARS-CoV2 Delta
SARS-CoV2 Epsilon
SARS-CoV2 Gamma
SARS-CoV2 Iota
SARS-CoV2 Kappa
SARS-CoV2 Omicron
Severe acute respiratory syndrome coronavirus 2 Canada/QC-L00478241001/2022
Severe acute respiratory syndrome coronavirus 2 HKG/HKU-001a/2020
Severe acute respiratory syndrome coronavirus 2 USA/NJ-CDC-LC0471426/2022
Severe acute respiratory syndrome coronavirus 2 Wuhan/Hu-1/2019
Ustilago maydis 521
Variola major virus India-1967
West Nile virus NY-99
Whitewater Arroyo virus strain AV9310135
Statistic Value
Unique Categories: 1290
Mode: Homo sapiens
Entropy: 2.0
Gini: 0.37
Simpson Diversity: 1.59
Min Category Length: 4
Max Category Length: 75
Memory: 67038049
Cardinality Ratio: 0.001
Object
Column: host
Number of Values: 946669
Number of Unique: 2
Number of Missing: 0 (0.0%)
Taxonomy: Valid
Value Count Frequency
Homo sapiens (human) 946628 0.9999566902475945
Homo sapiens 41 4.3309752405539845e-05
Statistic Value
Unique Categories: 2
Mode: Homo sapiens (human)
Entropy: 0.0
Gini: 0.0
Simpson Diversity: 1.0
Min Category Length: 12
Max Category Length: 20
Memory: 72893185
Cardinality Ratio: 0.0
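
The two host labels, "Homo sapiens (human)" and "Homo sapiens", appear to denote the same organism. A minimal sketch of one way to collapse them by stripping a trailing parenthetical (the cleanup rule and the name `normalize_host` are hypothetical, not part of the report):

```python
import re
from collections import Counter

def normalize_host(label):
    """Drop a trailing parenthetical such as ' (human)'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", label).strip()

host_counts = {"Homo sapiens (human)": 946628, "Homo sapiens": 41}

merged = Counter()
for label, count in host_counts.items():
    merged[normalize_host(label)] += count

print(dict(merged))  # {'Homo sapiens': 946669}
```

After this step the column would be constant, like epitope_type and mhc_class.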
Object
Column: method
Number of Values: 946669
Number of Unique: 24
Number of Missing: 0 (0.0%)
Value Count Frequency
cellular MHC/mass spectrometry 545348 0.5760704110940572
mass spectrometry 198332 0.20950511741696412
purified MHC/competitive/radioactivity 67023 0.07079876915796335
purified MHC/direct/fluorescence 63421 0.06699384895882299
secreted MHC/mass spectrometry 24875 0.02627634368506838
purified MHC/competitive/fluorescence 23853 0.025196768881203462
cellular MHC/direct/fluorescence 7650 0.00808096599761902
purified MHC/direct/radioactivity 7126 0.00752744623516773
cellular MHC/competitive/fluorescence 5531 0.005842591233049778
purified MHC 1256 0.0013267572932038549
x-ray crystallography 599 0.0006327449192906919
High throughput multiplexed assay 345 0.00036443572146124993
Edman degradation 308 0.0003253513107538115
cellular MHC 291 0.0003073936085368804
binding assay 230 0.0002429571476408333
lysate MHC/direct/radioactivity 207 0.00021866143287674995
cellular MHC/competitive/radioactivity 77 8.133782768845288e-05
cellular MHC/T cell inhibition 75 7.922515683940216e-05
cellular MHC/direct/radioactivity 43 4.542242325459057e-05
coelution 39 4.119708155648912e-05
Statistic Value
Unique Categories: 24
Mode: cellular MHC/mass spectrometry
Entropy: 1.93
Gini: 0.61
Simpson Diversity: 2.59
Min Category Length: 9
Max Category Length: 38
Memory: 80646976
Cardinality Ratio: 0.0
Object
Column: response_measured
Number of Values: 946669
Number of Unique: 13
Number of Missing: 0 (0.0%)
Value Count Frequency
ligand presentation 768928 0.8122458853094376
dissociation constant KD (~IC50) 52307 0.055253737050648115
dissociation constant KD (~EC50) 38325 0.0404840551449345
qualitative binding 31033 0.03278125722929556
half maximal inhibitory concentration (IC50) 24325 0.02569535920157943
dissociation constant KD 18799 0.01985804964565228
half life 10458 0.011047155869686237
half maximal effective concentration (EC50) 1003 0.0010595044307989381
3D structure 600 0.0006338012547152173
50% dissociation temperature 533 0.000563026781272018
MHC binding 345 0.00036443572146124993
off rate 12 1.2676025094304346e-05
association constant KA 1 1.0563354245253621e-06
Statistic Value
Unique Categories: 13
Mode: ligand presentation
Entropy: 1.17
Gini: 0.33
Simpson Diversity: 1.5
Min Category Length: 8
Max Category Length: 44
Memory: 73744381
Cardinality Ratio: 0.0
Missing Object
Column: units
Number of Values: 146362
Number of Unique: 6
Number of Missing: 800307 (84.54%)
Value Count Frequency
nM 134759 0.9207239584045039
min 10458 0.07145297276615514
angstroms 599 0.004092592339541685
°C 533 0.0036416556209945203
1/s 12 8.198849428130252e-05
1/M 1 6.832374523441877e-06
Unit Count
None 800307
nM 134759
min 10458
angstroms 599
°C 533
1/s 12
1/M 1
Statistic Value
Unique Categories: 6
Mode: nM
Entropy: 0.49
Gini: 0.98
Simpson Diversity: 49.05
Min Category Length: 2
Max Category Length: 9
Memory: 34270506
Cardinality Ratio: 0.0
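
The Entropy, Gini and Simpson Diversity figures for this column can be reproduced if proportions are taken over all 946669 rows, missing rows included, so that the proportions do not sum to 1. This is inferred from the numbers and is not documented in the report. Under that assumption, entropy is the negated sum of p * log2(p), Gini is 1 minus the sum of p squared, and Simpson Diversity is the reciprocal of the sum of p squared:

```python
import math

def categorical_summary(counts, n_rows):
    """Entropy (bits), Gini and inverse Simpson index with p = count / n_rows."""
    probs = [c / n_rows for c in counts if c > 0]
    sum_sq = sum(p * p for p in probs)
    return {
        "entropy": -sum(p * math.log2(p) for p in probs),
        "gini": 1 - sum_sq,
        "simpson": 1 / sum_sq,
    }

# Counts for nM, min, angstroms, degrees C, 1/s and 1/M from the table above.
units_counts = [134759, 10458, 599, 533, 12, 1]
summary = categorical_summary(units_counts, n_rows=946669)

print(round(summary["entropy"], 2))  # 0.49
print(round(summary["gini"], 2))     # 0.98
print(round(summary["simpson"], 2))  # 49.05
```

This also explains why columns with many missing values show a Gini close to 1.0 and a very large Simpson Diversity despite having only a handful of categories.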
Object
Column: qualitative
Number of Values: 946669
Number of Unique: 5
Number of Missing: 0 (0.0%)
Value Count Frequency
Positive 783658 0.8278057061126962
Negative 88365 0.09334307978818362
Positive-High 27282 0.028818943051900928
Positive-Low 25854 0.027310496065678713
Positive-Intermediate 21510 0.02272177498154054
Statistic Value
Unique Categories: 5
Mode: Positive
Entropy: 0.96
Gini: 0.3
Simpson Diversity: 1.44
Min Category Length: 8
Max Category Length: 21
Memory: 62052941
Cardinality Ratio: 0.0
Missing Object
Column: inequality
Number of Values: 85405
Number of Unique: 3
Number of Missing: 861264 (90.98%)
Value Count Frequency
> 45297 0.5303787834435923
= 39929 0.4675253205315848
>= 179 0.0020958960248229027
Statistic Value
Unique Categories: 3
Mode: >
Entropy: 0.4
Gini: 1.0
Simpson Diversity: 245.79
Min Category Length: 1
Max Category Length: 4
Memory: 32514117
Cardinality Ratio: 0.0
Missing Number
Column: quantitative
Number of Values: 149882
Number of Unique: 20212
Number of Missing: 796787 (84.17%)
Number type: float64
Value Count Frequency
20000.0 39086 0.2607784790702019
5000.0 2202 0.014691557358455318
70000.0 1774 0.011835977635740116
1.0 943 0.0062916160713094305
0.0 543 0.0036228499753139134
10000.0 533 0.0035561308229140258
12.0 458 0.003055737179914866
18.0 421 0.002808876316035281
77700.0 414 0.0027621729093553595
77900.0 410 0.0027354852483954042
78100.0 409 0.0027288133331554157
0.4 388 0.002588703113115651
24.0 343 0.0022884669273161556
100000.0 338 0.0022551073511162113
30.0 328 0.0021883881987163237
0.6 313 0.0020883094701164917
2.0 307 0.002048277978676559
0.1 306 0.0020416060634365703
0.5 300 0.0020015745719966374
0.3 299 0.0019949026567566484
Statistic Value
Minimum: 0.0
Maximum: 37000000.0
Mean: 15292.08
Median: 2700.0
Mode: 20000.0
Standard Deviation: 128360.12
Sum: 2292007181.06
Kurtosis: 49147.39
Skewness: 190.91
Median Abs Deviation: 2698.4
Coefficient of Variation: 8.39
Quantiles: 25 %: nan, 50 %: nan, 75 %: nan
Memory: 15146704
Object
Column: mhc_allele
Number of Values: 946669
Number of Unique: 255
Number of Missing: 0 (0.0%)
Value Count Frequency
HLA-A*02:01 129775 0.13708592971777886
HLA-B*07:02 40317 0.04258827531058902
HLA-B*27:05 33788 0.03569146132386294
HLA-B*57:01 32903 0.03475660447315799
HLA-A*24:02 29063 0.0307002764429806
HLA-B*40:02 27150 0.028679506775863582
HLA-A*01:01 26082 0.027551340542470493
HLA-B*51:01 23999 0.025350993853184164
HLA-B*40:01 23595 0.024924234341675917
HLA-B*15:01 22870 0.02415839115889503
HLA-A*11:01 22415 0.02367775854073599
HLA-A*03:01 21814 0.02304290095059625
HLA-B*15:02 20486 0.02164008750682657
HLA-B*44:02 20476 0.021629524152581313
HLA-B*44:03 19126 0.020203471329472075
HLA-C*05:01 15299 0.016160875659813514
HLA-A*29:02 14768 0.015599961549390548
HLA-B*08:01 13859 0.014639752648496993
HLA-A*68:02 13271 0.01401862741887608
HLA-B*58:01 11712 0.01237180049204104
Statistic Value
Unique Categories: 255
Mode: HLA-A*02:01
Entropy: 5.98
Gini: 0.97
Simpson Diversity: 30.06
Min Category Length: 6
Max Category Length: 12
Memory: 64271975
Cardinality Ratio: 0.0
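
The most frequent allele names follow the pattern HLA-gene*group:protein, for example HLA-A*02:01. The sketch below splits such names into their parts; the regular expression is an illustrative assumption, and since the minimum category length is 6, some labels in the column are shorter than this pattern and would not match.

```python
import re

# Matches names such as "HLA-A*02:01"; other shapes return None.
ALLELE_RE = re.compile(r"^HLA-([A-Z]+)\*(\d+):(\d+)$")

def parse_allele(name):
    """Return (gene, allele_group, protein) or None if the name does not match."""
    match = ALLELE_RE.match(name)
    return match.groups() if match else None

print(parse_allele("HLA-A*02:01"))  # ('A', '02', '01')
print(parse_allele("HLA-C*05:01"))  # ('C', '05', '01')
print(parse_allele("HLA-A2"))       # None (hypothetical short label)
```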
Constant Object
Column: mhc_class
Number of Values: 946669
Number of Unique: 1
Number of Missing: 0 (0.0%)
Value Count Frequency
I 946669 1.0
Statistic Value
Unique Categories: 1
Mode: I
Entropy: -0.0
Gini: 0.0
Simpson Diversity: 1.0
Min Category Length: 1
Max Category Length: 1
Memory: 54906802
Cardinality Ratio: 0.0
Object
Column: bind_class
Number of Values: 946669
Number of Unique: 2
Number of Missing: 0 (0.0%)
Value Count Frequency
Positive 832450 0.8793464241461376
Negative 114219 0.12065357585386233
Statistic Value
Unique Categories: 2
Mode: Positive
Entropy: 0.53
Gini: 0.21
Simpson Diversity: 1.27
Min Category Length: 8
Max Category Length: 8
Memory: 61533485
Cardinality Ratio: 0.0
Number
Column: length
Number of Values: 946669
Number of Unique: 4
Number of Missing: 0 (0.0%)
Number type: int64
Value Count Frequency
9 624589 0.6597754864688714
10 176860 0.18682348318155553
11 86272 0.09113216974465203
8 58948 0.062268860604921046
Statistic Value
Minimum: 8
Maximum: 11
Mean: 9.31
Median: 9.0
Mode: 9
Standard Deviation: 0.72
Sum: 8810477
Kurtosis: 0.63
Skewness: 0.93
Median Abs Deviation: 0.0
Coefficient of Variation: 0.08
Quantiles: 25 %: 9.0, 50 %: 9.0, 75 %: 10.0
Memory: 15146704
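
Because the length column has only four distinct values, its summary statistics can be checked directly from the value counts. The sketch below assumes the sample standard deviation (divisor n - 1); with this many rows the population form gives the same rounded result.

```python
import math

# Value counts of the length column, copied from the table above.
value_counts = {9: 624589, 10: 176860, 11: 86272, 8: 58948}

n = sum(value_counts.values())
total = sum(value * count for value, count in value_counts.items())
mean = total / n

# Weighted sum of squared deviations, then sample variance.
ss = sum(count * (value - mean) ** 2 for value, count in value_counts.items())
std = math.sqrt(ss / (n - 1))

print(n, total)              # 946669 8810477
print(round(mean, 2))        # 9.31
print(round(std, 2))         # 0.72
print(round(std / mean, 2))  # 0.08 (coefficient of variation)
```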